[gclist] Finalizer flame wars.

Paul R. Wilson wilson@cs.utexas.edu
Fri, 10 May 1996 20:00:59 -0500


I agree with Hans's many good points.

Any claim that anything "should be suitable for mission-critical applications"
is usually bogus, unless you build all of the other components of a system
suitable for mission-critical apps.  There are ways of doing these things,
but almost nobody actually does them, so why should GC be any different?

For example, we can make our GC absolutely hard real-time, and absolutely
hard real-space, but it would cost something that most users don't want
to pay, and the annoying problems are in the compiler more than in the GC.
(We intend to make this an option eventually, but it's just not what
most people are asking for.)

With respect to controlling external resources, you have to recognize that
resource usage is almost always implementation-dependent in some significant
ways, and that ensuring resource usage bounds takes thought, and in many
cases takes thought about how the implementation of the language works.

Here's an example of resource usage using finalizers that IS reliable,
but implementation- and application-dependent.  (It's a kind of
app our RScheme system is eventually intended to support.)

One of the things I'd like to do with RScheme is build a real-time
multimedia database system, where the interesting rich data structures
are in the GC'd heap.  Other data structures are in files, managed
with finalizers.  The idea is that the interesting, rich data structures
are in RAM at all times, managed by a hard-real-time GC.  Large, regular
volumes of data are out on disk somewhere, with file access methods ensuring
real-time delivery of streams of dumb (byte) data.  We have three levels
of real-time here: 

   1. No program pauses of more than about two milliseconds, so that
      we can process individual frames of 15 FPS interactive graphics
      in the language, rather than in low-level drivers.  (We can also
      handle interrupts in the GC'd language, if they are guaranteed
      not to come more often than once every few milliseconds... e.g.,
      "disk data ready" interrupts, network are-you-alive pings with
      10 msec timeouts, etc.)

   2. No program slowdowns of more than about 50% over any period of
      5 milliseconds---e.g., if you get a two-millisecond GC pause,
      you *won't* get one again for at least another two milliseconds.
      (A quick budget check for these first two bounds follows the
      list.)

   3. Once a connection is made (to a file stream or socket or whatever),
      it delivers data at some guaranteed rate like 100KB a second.
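
To make those budgets concrete, here's the arithmetic as a few lines
of Scheme.  It's plain Scheme, nothing RScheme-specific; the numbers
are just the ones above.

    ;; Frame budget at 15 FPS under bounds 1 and 2.
    (define frame-period (/ 1000.0 15))   ; ~66.7 msec per frame
    (define max-pause 2)                  ; msec, from bound 1
    (define max-gc-fraction 0.5)          ; bound 2: <= 50% per window

    ;; Even if the collector takes its full 50% of every window, the
    ;; mutator keeps at least half of each frame...
    (define mutator-time-per-frame
      (* frame-period (- 1 max-gc-fraction)))   ; ~33 msec of real work

    ;; ...and no single pause eats more than max-pause (2 msec) of it.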

We want to be able to write sophisticated programs that index data,
set up and break down connections, schedule media delivery, and so on,
in a hard-real-time-GC'd language.  This GC'd kernel also manages
external resources, partly via finalizers.
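
As a concrete (if hypothetical) sketch of the finalizer side:
register-finalizer! below is a made-up name standing in for whatever
hook the implementation really provides, stubbed as a no-op so the
sketch is self-contained, and delete-file is implementation-specific.

    ;; Stub: a real GC would call (proc obj) some time after obj
    ;; becomes unreachable.
    (define (register-finalizer! obj proc)
      #f)

    ;; Open a scratch file whose disk space is tied to the port's
    ;; lifetime: once the port is garbage, finalization closes it and
    ;; deletes the file.  Note that the finalizer takes the port as an
    ;; argument instead of capturing it: a closure over the port would
    ;; keep the port reachable forever.
    (define (open-scratch-file name)
      (let ((port (open-output-file name)))
        (register-finalizer! port
                             (lambda (p)
                               (close-output-port p)
                               (delete-file name)))  ; impl-specific
        port))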

For this kind of application, we can ensure that finalizers execute in a
timely manner, because the GC'd heap is small and can be fully GC'd in
bounded time, on a regular schedule if need be.  This ensures that
finalizers execute within a couple of GC cycles after finalizable objects
become garbage.  (A couple, not one, because an object that dies just
after a collection starts may not be detected until the following
collection completes.)

This is the kind of thing that will work fine, but you have to know what
you're doing when you build it.  You may decide *not* to use the generational
version of our GC, because it doesn't put as tight a bound on the (temporal)
conservatism of the GC.  If you want prompt finalizers, use the
non-generational GC, bound the amount of live data in the heap, and run
your GC fast enough to meet your deadlines.

Notice that for this application, we may be able to provide very hard
bounds on the time that finalizable objects go unfinalized.  For example,
if we have only a few MB of data on the heap, but may have gigabytes'
worth of large files on disk managed with finalizers, we can ensure that
the disk space wasted by the GC is bounded.  Suppose we do a full
incremental GC once every 15 seconds; with up to two cycles of detection
latency, that ensures we get all garbage files back within 30 seconds.

Data can only be written to disk at a few MB a second, so in that
30-second time window we can only write a few tens of megabytes---call
it about 100 MB---of data to disk.  Anything that's garbage is detected
and freed within 30 seconds, so we can only write about 100 MB of stuff
to disk before all the older garbage is reclaimed---we never waste more
than 100 MB of disk, or for more than 30 seconds.  (Note the "or" there,
not an "and".  We may waste storage for a 1 GB file for 30 seconds after
it becomes garbage, but if we already wrote it to disk, we must have had
that disk space when we did it.  In the meantime, we may have written
another 100 MB of undetected garbage files to disk.  So we only need
about 100 MB of "headroom" to make sure that our peak disk usage doesn't
exceed the disk capacity.  Any time we exceed that, we'd have exceeded a
100 MB smaller disk capacity even if we reclaimed all garbage
immediately.)
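
Here's that bound as arithmetic you can actually run.  The 3 MB/second
write rate is an assumed stand-in for "a few MB a second"; everything
else comes straight from the numbers above.

    ;; Worst-case disk "headroom" for finalizer-reclaimed files.
    (define gc-period 15)                     ; seconds per full GC
    (define reclaim-latency                   ; garbage detected within
      (* 2 gc-period))                        ;   two cycles = 30 seconds
    (define write-rate 3)                     ; MB/second, assumed

    ;; Undetected garbage files accumulate for at most reclaim-latency
    ;; seconds, written at at most write-rate:
    (define headroom
      (* write-rate reclaim-latency))         ; = 90 MB, i.e. ~100 MB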

The real point of the above example is that proving resource bounds
requires understanding of the program using the resources, and the
system managing the resources, but that doesn't mean it can't be done
in a GC'd language, and it doesn't mean it can't be done using
finalizers.  If you need prompter finalizers, you may be out of luck,
but to say that "finalizers are no good if they're not immediate" is
too strong.

To get back to agreeing with Hans's points, it's also too strong to say 
that if there's not a strict guarantee of promptness, finalizers are
useless.  You may have to do system-dependent things to make your
finalizers "prompt enough" for your purpose---e.g., forcing full GC's
every now and then, rather than just letting them happen when the
oldest generation fills up---but any time you do something resource-critical,
you have to think harder.  Most people don't bother to think that hard
until they find out they need to, and rightly so.
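
In code, "forcing full GC's every now and then" might be no more than a
background loop like the sketch below.  sleep and gc-full! are stubs for
implementation-specific calls; neither is standard Scheme, and gc-full!
in particular is a hypothetical hook for requesting a full collection.

    ;; Stubs: replace with the implementation's real calls.
    (define (sleep seconds) #f)    ; stand-in; not standard Scheme
    (define (gc-full!) #f)         ; hypothetical full-collection hook

    ;; Force "prompt enough" finalization: a full collection every
    ;; PERIOD seconds means finalizers run within about two periods of
    ;; their objects becoming garbage.
    (define (finalizer-pump period)
      (let loop ()
        (sleep period)
        (gc-full!)
        (loop)))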

I think it's perfectly reasonable to have finalizers that don't have
any particular guarantee of promptness, at least at the *normal*
programming language level.  Most programs will never notice, and if
they run into a bad case, they can be debugged.  If you're doing something
mission-critical, you're going to have to work out the interactions
between your program and the language implementation anyway.