[gclist] Guardians

Carl Bruggeman bruggema@ranger.uta.edu
Thu, 17 Apr 1997 11:06:14 -0500


--------
> On Apr 16,  1:28pm, Carl Bruggeman wrote:
> > Subject: Re: [gclist] Guardians
> > Hans Boehm wrote:

 ...

This is important background for the last part of this message:

> > Topological finalization of a chain of N dependant objects requires N
> > collections in a simple semi-space copying collector to finalize the
> > entire chain of objects, while, in the same system, guardians requires
> > a single collection.  In a generation-based collector N is the _lower
> > bound_ on the number of collections required for topological
> > finalization.  Since each collection potentially promotes live objects
> > to older generations, and objects in older generations are, by
> > definition collected less frequently, topological finalization
> > guarantees that, the farther down the chain an object is, the older
> > the generation it will end up in, and the longer it will take to be
> > collected, because, again by definition, a generation-based collector
> > will do several younger generation collections for each of the older
> > generation collections.  For a chain or cycle of guardians, in which
> > the oldest generation of any of the objects is G, finalization will
> > take at most 1 collection of generation G.
>
> Right.  And that's a good point.

 ...

> (I was exaggerating.  I actually hope you don't really collect the
> last generation once a week in an active Scheme workspace.  That
> would probably mean that if I read in a 100 MB data file, munge it
> for a few hours, and then dropped it, it wouldn't get reclaimed for
> a week. 

No, just using your phrase from the example where you meant "for
quite some time".  By default, youngest generation collections are
triggered based on allocation and older generation collections are
triggered after 4 collections of the previous generation. It is
all user configurable, however.  The programmer can supply a function
(which can start a generation N collection or not or whatever) that is
called when the user specified allocation limited is exceeded.

> I've always found that moderately frequent full collections are
> necessary for the kind of environment I've been dealing with.)

They probably are necessary with only 2 generations.  This is why
timely finalization is so important (see below)

> > The next obvious thing to consider is:  how much is timely
> > finalization worth?  We decided that unless a programmer has some
> > guarantee about when finalization occurs, then the finalization
> > feature is essentially useless.  Guardians give the same guarantee
> > that the programmer has about when memory is freed by the collector
> > and guardians are affected by the same mechanisms that the programmer
> > has to force collections of particular generations at particular
> > times.  With topological finalization the guarantees are data
> > dependent (i.e.  they depend on how long the finalization chains are,
> > which can vary greatly from run-to-run), and, as such, are very
> > difficult for the programmer to reason about in general.
>
> But isn't that in general the argument for garbage collection? 
> Object lifetimes are data dependent and hard to predict, so the
> system should take care of deallocation.  As I pointed out in my
> example, required finalization ordering is also data dependent,
> which makes it difficult to program explicitly with guardians.  I
> think this data dependence is precisely the reason finalization
> ordering should be the collector's job.

But we already agreed that 80% of the time, finalization order is
independent (not data dependent) and, more importantly, there are some
cases (<10%) where the data dependent order is different from the
finalization order.  So, topological order gives you a mechanism that is
"natural" for 90% of the cases (where guardians gives it to you
"naturally" in 80% of the cases).  Both approaches require more work
in some of the other cases.  It not that topological ordering is a
silver bullet in 90% of the cases, its that it is a silver bullet in
10-20% of the cases, and is just one of the orderings which is
acceptable in the other 80% of the cases.  I really like "natural"
solutions when they work 100% of the the time (such as garbage
collection); they lose their luster quickly, for me, when they are not
the "right thing" in cases that you want to be able to support.

I think this just an area where we differ in opinion.  I prefer
a single less restrictive language feature to one that is restrictive
enough that it requires an escape mechanism for some applications.

I also think that we really have to leave open the question of which
approach is more "difficult" in the other cases until there is quite a
bit more experience with both approaches.

> I don't think there's a real difference in predictability.

I guess I wasn't clear.  Assume the programmer wishes to force the
finalization of all objects (you yourself noted that you found it
necessary to do maximum generation collections somewhat frequently). 
In our system with guardians, the programmer just does one maximum
generation collection to force the "finalization" of all objects in
the system.  In a system with topological finalization, however, the
programmer must know the maximum length of the longest finalization
dependency chain currently in the system, call it N, and then do N
collections to finalize all the objects.  This is inherent in the
topological finalization model:  you finalize the head of a dependency
chain and copy the rest as if it were alive.  Thus, you can finalize
only one element of a dependency chain per collection of the
generation the object is in.  We found that if you want to maximize
the performance of both the collector and assignments in a
generation-based collector, you have to collect generation N into
generation N+1 (yet another discussion I don't have time for, although
you have noted that you collector behaves the same), thus, in
high-performance implementation, this will generally require N maximum
generation collections rather than just N collections.

This is what I was trying to get at when I said that timely
topological finalization was data dependent while finalization with
guardians did not violate the "generational" principle that
generational collectors are based on.  Thus, my claim that
generation-based collectors and topological finalization were
fundamentally incompatible.  

I think this shows a marked difference in predictability.

> The difference is that objects reachable from finalizers are
> considered live for finalization purposes, as well as memory
> reclammation purposes.  You only consider them live for memory
> reclammation purposes (where you thus need 2 collections to reclaim
> the memory.)

This is a good point that I may not have made clear before.  One of
the design trade offs we made (and which is inescapable for systems
with "resurrection" semantics for objects) is that, although an object
will be placed on a guardian for finalization during one collection,
the storage for the object will not be reclaimed until after the next
collection of the generation the object is in (assuming the
finalizer(s) drop their pointers to the object).  We are trading off
increased memory usage for fully general programmer control over
finalized objects (in the same manner that any generation-based
collector trades off increased memory usage for faster collections of
younger generations).


Carl