[gclist] Precise GC's performance [really finalization protocol]

David Chase chase@centerline.com
Wed, 06 Mar 96 14:43:38 EST


> From: Mark Tillotson <markt@harlequin.co.uk>
>
> I don't believe so---IF the cycle is purely between the finalizable
> objects, AND if none of them are revived during the "kill methods", then
> the next time round the GC they are no longer on the list of
> finalizable objects, and hence form a detached cycle unreachable from the
> roots.

> Does this clarify things?

Not really.  Perhaps I don't understand how you intend it to work.  
I imagine two finalization policies:

1. containers (roots) are always finalized before containees (interior 
   nodes), which are not finalized if the containers revive themselves.  
   That is, you start finalizing at the various roots of the garbage 
   graphs, and either 

   (a) use a write barrier so you can know whether an object has 
       revived itself, OR 
    
   (b) finalize just "one deep" into the graph, and wait to see whether 
       those objects contained by the garbage roots are new "garbage 
       roots" after the next collection (see the first sketch after 
       this list). 

   Note that this policy is stuck, in the face of cycles, because 
   there is no root to the garbage.  If you choose one arbitrarily, 
   you can end up with the same problem faced by choice #2. 

2. everything unreachable is finalized, in no particular order.

   This deals with cycles, but because the garbage collector's view 
   of your program (a bunch of linked blobs of memory) may be unrelated 
   to your view of the program, the finalizer has to be written with 
   very few assumptions about the state of the components of an object.  
   What is conceptually "one thing" to a programmer (e.g., a splay 
   tree, or a hash table) might look like many random pieces of memory 
   to the GC (see the second sketch below).  This seems like a mess 
   to me.
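
To make 1(b) concrete, here is a rough sketch of what finalizing 
"one deep" might look like at each collection.  All the names are 
mine, invented for illustration; the map stands in for the 
collector's knowledge of which unreachable finalizable objects 
reference which others, and a garbage "root" is one that no other 
unreachable finalizable object points at:

    import java.util.*;

    class OneDeepFinalizer {
        // unreachable maps each unreachable finalizable object to the
        // unreachable finalizable objects it references.
        static void finalizeGarbageRoots(Map<Object, List<Object>> unreachable) {
            Set<Object> referenced = new HashSet<>();
            for (List<Object> refs : unreachable.values())
                referenced.addAll(refs);
            for (Object obj : unreachable.keySet()) {
                if (!referenced.contains(obj)) {
                    // A garbage "root": finalize it now.  Its containees
                    // stay pending; if the root revived itself, they are
                    // reachable again and drop off this map at the next
                    // collection.
                    runFinalizer(obj);
                }
            }
            // A pure cycle has no such root, so nothing in it is ever
            // finalized -- the "stuck" case noted under policy 1.
        }

        static void runFinalizer(Object obj) { /* invoke obj's finalizer */ }
    }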
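
And here is a sketch of the hazard under policy 2, again with 
invented names: a table and its nodes become garbage together, and 
since finalization order is unspecified, the nodes' finalizers may 
have already run when the table's runs:

    import java.io.FileOutputStream;

    class Node {
        FileOutputStream log;   // an external resource owned by this node
        Node next;

        protected void finalize() throws Throwable {
            if (log != null) log.close();   // release the resource
            super.finalize();
        }
    }

    class Table {
        Node[] buckets = new Node[16];

        protected void finalize() throws Throwable {
            // UNSAFE under unordered finalization: any Node reachable
            // from here may already have been finalized, so its log may
            // already be closed.
            for (Node n : buckets)
                for (Node p = n; p != null; p = p.next)
                    if (p.log != null)
                        p.log.write('\n');  // may write to a closed stream
            super.finalize();
        }
    }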

What is especially obnoxious in either case is how you deal with 
refinalization.  If you explicitly register an object for finalization 
(in the style of Modula-3, for instance), then you can detect which 
objects are eligible for future finalization.  But if finalization/destruction 
is instead type-linked (as in Java or C++), and you use a weak-pointer-based 
concurrent finalization scheme (I think this is a sensible assumption), then, 
lacking a write barrier, you have no way of knowing (at a future 
garbage collection) whether an object on the finalization queue stayed 
garbage (thus needing no further finalization) or was revived and 
then became garbage again (thus, it "should" have its finalizer run).  
Java deals with this, at least in the versions of the language that 
I've read about, by declaring that finalizers are run at most once, 
full stop.  I checked to see if this is really what was intended, and 
it really is.  Obviously, this has implications for what code you 
put in a finalizer, though I don't think they made this point in 
the version of the language spec that I read.  (This, by the way, 
is what I regard as the major problem for use of GC with legacy C++ 
code -- what do you do with the destructors?  How do they get run 
in the first place, and how do you deal with the re-running problem
in cases like this?)
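
For what it's worth, the run-once rule is easy to demonstrate.  Here 
is a minimal sketch (the class name is mine, and the sleeps merely 
give the finalizer thread a chance to run -- there is no guarantee): 
the object revives itself in finalize(), becomes garbage a second 
time, and the finalizer does not run again:

    class Phoenix {
        static Phoenix saved;     // a global root used for revival

        protected void finalize() throws Throwable {
            System.out.println("finalize() ran");
            saved = this;         // revival: reachable once more
            super.finalize();
        }

        public static void main(String[] args) throws InterruptedException {
            new Phoenix();        // immediately unreachable
            System.gc();
            Thread.sleep(100);    // finalizer runs and revives the object
            saved = null;         // unreachable a second time
            System.gc();
            Thread.sleep(100);    // finalize() is NOT run a second time
        }
    }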

David Chase