[gclist] Precise GC's performance

boehm.PARC@xerox.com boehm.PARC@xerox.com
Wed, 6 Mar 1996 15:18:30 PST


Mark Tillotson writes:

"There is a straightforward method that DTRT for finalization, although
it does delay collection of finalizable objects somewhat: (hopefully
I'll get this right...)

You start by marking from all the real roots (precisely or
conservatively, it shouldn't matter), and then look at your list
(or whatever) of finalizable objects.  Any marked finalizable objects
stay on the list, but unmarked ones are _moved_ to a new "pending
finalizations" list.  These are the ones known to be inaccessable at
the point the GC was entered.

You then (and only then) mark (and mark from) all the objects on this
pending list.  (You need to do this because they are accessible to the
finalization code, which is part of the mutator). ..."

I do believe that this finalizes cycles, but not all of us agree that's an
advantage.  It is safe, at least in a weak sense, in that finalization code
does not see dangling pointers.  It also gives you prompt finalization for long
chains of finalizable objects.  However, it doesn't present a terribly pleasant
programming model, in that an object may be finalized after the objects it
references.  Hence finalization code may refer to objects whose memory is still
allocated, but which are no longer useful since they have already been
finalized.

The canonical example in which this fails is the buffered file data structure,
which contains a pointer to a file object.  The finalization code for a
buffered file flushes the buffer.  The finalization code for the file object
closes the file.  With these semantics if you drop a pair of these, it may well
happen that the file is closed first and then the program tries to flush the
buffer to a closed file.  (It's possible to avoid this by specially saving the
file pointer someplace and then having the buffered file object remove it.  But
it needs to be treated specially, and failure to do so can lead to errors that
manifest themselves only on rare occasions.)

The other alternative is to not finalize cycles, and give the programmer a
mechanism for breaking links that are no longer needed during finalization.
Empirically you need this mechanism very rarely.  (Cedar experience suggests
about twice per million lines of code.)  And if you forget to use it, you get a
warning about a finalization cycle instead of a crash.

In any case, it really probably doesn't make much difference which solution you
pick for a purely garbage collected environment.  In my experience, code
written for garbage collection typically has sufficiently few (but very
important) uses of finalization that a little bit of extra hassle in writing
those modules doesn't significantly affect overall code complexity.  This is
more of an issue in converting C++ code with heavy destructor use, or if you
need to deal with a third party library that expects explicit deallocation
calls.

Hans-J. Boehm
Standard disclaimer ...