[gclist] Guardians

Hans Boehm boehm@hoh.mti.sgi.com
Thu, 10 Apr 1997 22:32:07 -0700

On Apr 10,  8:30pm, Carl Bruggeman wrote:
> Subject: Re: [gclist] Guardians
> --------
> > I think this is the heart of the disagreement.  Topological ordering
> > relies on the assumption that if something is accessible from the
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > it may be referenced, and thus should be preserved, i.e. not finalized.  A
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> This may indeed be a significant part of the problem.  In our system,
> a pointer chain of guardian protected objects (even a circular chain)
> that becomes inaccessible will have all of its protected objects
> placed on their respective guardian queues during the same collection.
Right.  I don't consider that desirable.  But it's also basically what Java
does.  I don't think it's a disaster, but it's suboptimal.

> > > I generally favor
> > > erring on the side of putting power in the hands of programmers rather
> > > than tying their hands to save them from themselves.
> > So do I, up to a point.  I haven't been convinced that being able to see
> > post-finalized objects (without finalizer cooperation to make them
> > again) is any more of a useful feature than being able to look at
> > storage, which I could generally do without.  If the finalizer deallocates
> > malloc'ed objects, they are THE SAME.
> Ah, another part of the problem.  Guardians do not provide
> "finalization".  They provide some useful information from the
> collector, which can among other things be used to implement
> finalization.  When an object is placed on a guardian queue (or
> queues) during a collection, the collector is informing the program
> that that the object registered with the guardian became inaccessible,
> except via a guardian pointer(s), during the collection.  After all
> guardian protected objects are placed on their respective queues, all
> of the objects reachable from these objects are traced and copied.
> Thus guarded objects and all objects reachable from them are "normal"
> after being placed on a guardian queue and guardians can implement any
> type of service, not just finalization; there are no restrictions on
> what can be done with objects retrieved from a guardian; the object
> becomes inaccessible (and its storage is reclaimed) only when there is
> no strong pointer or guardian pointer to the object.
I think we're splitting hairs here.  The Cedar finalization mechanism also
places objects on queues.  Conversely, in a collector like ours, a procedure is
run (outside the collector), but often its only action will be to enqueue the
object.  The question is whether an object reachable from another guarded
object should be finalized/enqueued.  I've been arguing that it shouldn't.

I wasn't careful enough to explain the last sentence.  I was not claiming that
garbage collected objects can be prematurely deallocated leaving dangling
pointers.  But I strongly suspect that the most common use for finalization in
exisiting systems is to explicitly return resources allocated from third-party
libraries (which are typically written in C or C++).  This involves a call to a
"deallocate" procedure in the third-party library.  Typically this call will
explicitly deallocate memory (and possibly do other things).  Thus subsequent
calls into the library involving the deallocated resource are likely to result
in dangling pointers.  This is the kind of dangling pointer problem you
encounter with non-topological-order finalization.
> > The default behavior should be to err on the side of safety.  In the
> > of topologically ordered finalization, you (on rare occasions) do need a
> > mechanism that corresponds to explicitly clearing pointers.  That allows
> to
> > use other orderings when you have to.  (In the version implemented in our
> I believe that this confirms that topological ordering is not always
> the correct order.  You explicitly clear pointers to change the order
> of finalization (requiring global knowledge of the dependencies on the
> part of the programmer to do so), while we advocate using multiple
> guardian queues to handle finalization dependencies.
The right ordering is the topological ordering minus some of the dependencies.
In nearly all cases, the topological ordering itself is sufficient.  The
advantages of this restriction instead of explicitly programmed ordering are:

1) At least in our collector the pointers are actually cleared, not just
ignored.  Thus you can't get the kind of dangling pointer problems I described

2) Both schemes handle the case were there are no dependencies.  Topological
ordering also handles 90% of the remainder without programmer intervention.

3) If you mess up with topological ordering you have a memory leak.  Cycles
don't get finalized, or long chains get finalized too slowly.  The collector
detects such cycles without even trying.  Thus it's trivial to warn the user.
 If you mess up in one of the unordered scheme, you potentially get memory
smashes, which are hard to diagnose.

> > > Using guardians I would simply register type A objects with one guardian
> > > and type B objects with another guardian, and always finalize all objects
> > > on the type A guardian before finalizing objects on a type B guardian.
> > > The information is not global; it can be limited to the implementation of
> > > the A and B types.
> > Yes, but A and B may have been written by different people.  The
> of
> > A (some random user) shouldn't need to know about the implementation of B
> > (strings).  In fact B may refer to C, which refers to D, which requires
> > finalization.  The necessary information is global.  Equivalently, you have
> > to widen interface definitions to mention finalization constraints.  (This
> Its not global. A and B interact, B and C interact, and C and D interact.
> A does not need to know about C and D.
I disagree.  If my finalizer uses a string, I should not need to know whether
strings potentially require finalization.  I don't know whether you call that
global, but it's certainly not local enough for my taste.

If A's finalizer invokes a method on B, which invokes a method on C, which
invokes a method on D, then A does need to know about D.  A must be finalized
before D.
> > I also don't quire understand how your solution would work.  The type B
> > finalization can trigger another collection, enqueuing more type A and B
> > objects.  The newly enqueued type B objects will then be finalized before
> > theire corresponding type A objects.
> I don't understand your misunderstanding...  Perhaps you didn't
> realize that there are separate queues for A types and B types, and
> that any A-B chains or cycles will be placed on their queues during
> the same collection.  Also, you can't blindly go through the entire B
> queue; after each B you recheck the A queue.  This is a good example
> of the flexibility you get with guardians -- its all under programmer
> control.
This sounds highly nontrivial, especially in the presence of a concurrent
collector.  I check that the A queue is empty, the collector runs, and I then
empty the B queue.  Again things break.

> BTW, how do you handle finalization cycles?  Is that where you need
> pointer clearing?  If so, what happen if the programmer doesn't break
> the cycle by clearing pointers correctly?  Infinite loop?
Certainly not an infinite loop.  The cycle is not finalized, and the user may
be warned.

> > You also seem to be relying on the fact that a type B object will not be
> d
> > inaccessible before its corresponding type A object.  Since A points to B,
> > that's often true.  I'm not convinced it's true for our garbage collector.
> > can certainly contrive collectors for which it's false.
> I think I missed something here again.  This seems to be an assumption
> for both of our collector.
Consider a reference counting collector which incrementally traverses objects
with 0 reference count, adding them to appropriate queues if they are
guarded/finalizable.  It will add A to a queue before B.

I'm not arguing for reference counting.  But this does point out that you need
a somewhat nontraditional property from the collector.  I'm not sure whether or
not this is currently true of our conservative incremental collector.  It may
be, but it would certainly require a nontrivial proof.

> There are examples in our paper.
I couldn't immediately find one, other than the reference to cycles.  But they
are not an argument to push the whole problem back onto the programmer.

> > There are millions of lines of code written with topologically ordered
> > finalization.  The only known deficiency is that you would on rare occasion
> > like to ignore some pointers for finalization ordering.
> Exactly!  Any of your applications where you had to explicitly clear
> pointers because the topological order was not correct is a supporting
> datum for our claim that topological ordering is not always desirable
> and that some sort of user control is necessary (for you pointer
> clearing; for us guardians).  Which is more "dangerous" or relies on
> more "global" information is, to a large extent, a matter of taste.
I disagree with at least the "dangerous" part.  In the absence of
"resurrecting" finalizers, one allows access to post-finalized objects, and the
other doesn't.


Hans-Juergen Boehm