[gclist] Guardians

Hans Boehm boehm@hoh.mti.sgi.com
Wed, 16 Apr 1997 16:28:18 -0700


On Apr 16,  1:28pm, Carl Bruggeman wrote:
> Subject: Re: [gclist] Guardians
> Hans Boehm wrote:
> > Timely finalization is elusive in either scheme.  It's hard to
> > define what you mean by timely in a world in which that object in
> > question may have been promoted to a generation that will be
> > collected next week.

>
> Here we fundamentally disagree.  Due to this discussion, I have begun
> to recall some our design decisions from '93, and I now remember that
> we briefly considered what we are now calling "topological" finalization.
> It was precisely due to the issue of timely finalization that we
> dismissed the idea of topological finalization -- and it was
> precisely because we have a production quality generation-based
> collector and the bad interaction between topological finalization and
> generation-based collectors.
So do we.  But I suspect it's quite different from yours.  That may account for
some of the differences in opinion here.

>
> Topological finalization of a chain of N dependant objects requires N
> collections in a simple semi-space copying collector to finalize the
> entire chain of objects, while, in the same system, guardians requires
> a single collection.  In a generation-based collector N is the _lower
> bound_ on the number of collections required for topological
> finalization.  Since each collection potentially promotes live objects
> to older generations, and objects in older generations are, by
> definition collected less frequently, topological finalization
> guarantees that, the farther down the chain an object is, the older
> the generation it will end up in, and the longer it will take to be
> collected, because, again by definition, a generation-based collector
> will do several younger generation collections for each of the older
> generation collections.  For a chain or cycle of guardians, in which
> the oldest generation of any of the objects is G, finalization will
> take at most 1 collection of generation G.
Right.  And that's a good point.

My argument is that based on my experience, admittedly with a
different collector, this has never mattered.  I suspect the reason is that
N was never larger than 2 or so, at least after proper treatment of pointers
not needed for finalization.  (I would guess that actual path lengths can
be significantly longer.  But N here is the number of finalizable objects
in the chain, not the length of the chain.  And finalizable objects are usually
rare.)

If it is an issue, you can get guardian-like behavior by treating all pointers
as weak for finalization and storing the information you really need for
finalization in a global data structure.  This isn't pleasant.  But I think
it's an acceptable fallback for a case I've never encountered.

Also note that this whole argument does not hold for Java, or if you program
with guardians as you would in Java, i.e. if you deal with a dependency of A on
B by storing a global pointer to B until As finalizer is run.

>
> > People don't build lists of a billion finalizable objects.  Linear
> > lists of that length are rarely a reasonable dat a structure.
>
> I think "billion" was never meant to be considered seriously.  In a
> system like ours with 4 generations, however, it would only take a
> chain of length 4 with topological finalization (even if all the
> objects were initially in the youngest generation) to delay the
> finalization of the last object in the chain until a generation 4
> collection (which may indeed be once a week).
This only applies if objects are promoted on surviving a single collection.
(We also do this, but with only 2 generations.)

(I was exaggerating. I actually hope you don't really collect the last
generation once a week in an active Scheme workspace.  That would probably mean
that if read in a 100 MB data file, munge it for a few hours, and then dropped
it, it wouldn't get reclaimed for a week.  I've always found that moderately
frequent full collections are necessary for the kind of environment I've been
dealing with.)

>
> The next obvious thing to consider is:  how much is timely
> finalization worth?  We decided that unless a programmer has some
> guarantee about when finalization occurs, then the finalization
> feature is essentially useless.  Guardians give the same guarantee
> that the programmer has about when memory is freed by the collector
> and guardians are affected by the same mechanisms that the programmer
> has to force collections of particular generations at particular
> times.  With topological finalization the guarantees are data
> dependent (i.e.  they depend on how long the finalization chains are,
> which can vary greatly from run-to-run), and, as such, are very
> difficult for the programmer to reason about in general.
But isn't that in general the argument for garbage collection?  Object
lifetimes are data dependent and hard to predict, so the system should take
care of deallocation.  As I pointed out in my example, required finalization
ordering is also data dependent, which makes it difficult to program explicitly
with guardians.  I think this data dependence is precisely the reason
finalization ordering should be the collector's job.

I don't think there's a real difference in predictability.  The difference is
that objects reachable from finalizers are considered live for finalization
purposes, as well as memory reclammation purposes.  You only consider them live
for memory reclammation purposes (where you thus need 2 collections to reclaim
the memory.)


> There are a several other things from your message that I will comment
> on later, but I already used up my quota of time for mailing lists for
> this week at some point last week...
So did I ...

Hans


-- 
Hans-Juergen Boehm
boehm@mti.sgi.com