[gclist] Destructor FAQ

Robert A Duff bobduff@world.std.com
Sat, 16 Mar 1996 07:20:38 -0500


> I think that's an interesting analogy.  Finalization code in a language without
> constraints on finalization ordering essentially has to be prepared to see NIL
> anyplace.  And that unfortunately doesn't just mean code inside finalization
> procedures; it means code inside the procedures they call.

True.  But *I* wasn't advocating such a language.  In Ada and C++,
finalization order *is* well defined.  And I think if you add garbage
collection (*some* Ada and C++ compilers do support it), you can use
more-or-less the same ordering rules.  The garbage collector might
decide to finalize some stuff earlier than would happen otherwise.
However, when finalizing a bunch of stuff together, the order should be
well defined.  If you base this order purely on the pointers being
followed, then you have a problem with cycles.  But if you base it on
the declaration order of pointer *types*, as in Ada, then it's not as
big a problem.  (This order turns out in practice to be *almost* the
same as the order you would get by following pointers.)  You still can
run across NIL's, as it were, but you can minimize the cases in which
that happens, and know something about where it can happen.  What you
said above, about running across NIL's anyplace, is not true.

> And it's a little worse than that, since the language doesn't require that
> finalized objects be clearly recognizable like NIL; the program has to ensure
> that.

Granted.

> Perhaps I should restate my previous request to keep the terminology straight:
> Let's not dicuss "synchronous" destructors as in conventional C++.

Well, I'm still confused by your request.  Ada has something called
"finalization", and C++ has something called "destructors", and they're
both essentially the same thing, as far as I can see.  Are you insisting
that the correct term is "destructor"?  Sounds sort of like a C
programmer telling a Fortran programmer: stop calling those things
"subroutines"; the correct term is "void functions".  :-)  Is there some
precedent that Ada violated by calling it finalization?  I only know
about Ada and C++ w.r.t. destructors/finalization.  (Ada is guilty of
confusing/non-standard terminology in other areas -- e.g. Ada calls
pointers "access values" (yuck).)

Also, I don't really understand the distinction you're making.  I only
see one feature, call it what you want.  Heap-based objects have a bit
looser definition of when things happen, so what?  Integers on the heap
have a less well-defined lifetime than integers on the stack.  There's
still only one feature called "integer types".

Please explain.  You *seem* to be saying, if the order is well-defined,
then it's not interesting, and has nothing to do with GC, but that the
order *should* be well-defined.  So *none* of this has anything to do
with GC.  Obviously, I misunderstand you.

>...I've never
> missed them in a garbage collected language.  Since they occur at well-defined
> points, they are essentially syntactic sugar for a procedure call I could make
> explicitly.

Not easily.  That is, not without breaking abstractions.

>...And they don't have all that much to do with garbage collection
> (except that they are one of several hooks which together can be used to
> implement slow but portable GC).
> 
> The interesting issue is what happens when the garbage collector discovers that
> an object with a clean-up action has been dropped.  That's something the
> programmer can't handle without language or library support, and thus needs to
> be addressed.  (Integration of a synchronous mechanism with this mechanism is
> an issue, but I think it's very language specific.)

It is indeed language specific.

> "Can anybody think of a realistic example in which a cyclic structure has
> some pieces with finalization?  I can't right now."
> 
> The cycles are usually accidental and some of the links are irrelevant to
> finalization.  The canonical example (due to Chris Jacobi) is some object P
> with a finalizer (e.g. some window system resource)  that also happens to have
> an attached property list that can store arbitrary (name, value) pairs.  In a
> large system, you might easily get a path from one of the values back to P.
> And no individual module has enough information to determine that there might
> be such a path.

In Ada, you would probably define the pointer-to-window type after the
property list, so when you drop the whole cycle, the window will be
finalized first.  The pointers leading from window to items on property
list are, as you say, irrelevant to finalization.  So finalization of
the window wouldn't follow them.  This seems quite natural to me -- it
seems no different from a recursive tree walk -- if you hand it a cyclic
structure, it will loop infinitely, and there's not much a language can
do to prevent it.  If finalization *does* follow irrelevant pointers,
and gets back around the cycle, something will probably have been set to
"null" or "closed" or some such thing, and the bug, if it is a bug, will
be nowhere near as nasty as a dangling pointer.  GC tries to attack
dangling pointers, and storage leaks -- nobody ever said it has to
attack every imaginable type of bug.

Note that if you don't have a garbage collector, and you never
explicitly deallocate this stuff, then it will get finalized just before
program exit (assuming we're talking about the global heap).  Well, the
same issue of ordering applies then.  And Ada defines that order (a
partial order, actually), even in the presence of cycles.  (I presume
C++ does, too, but I'm less familiar with it.)  And the definition is
not based on chasing pointers around at run time.  (Note: In Ada and
C++, you can create cycles purely in the stack.  Again, finalization has
a well-defined order, and it's not based on chasing pointers.)  This is
why I'm confused by your assertion that when the GC happens to be the
thing invoking the finalization, it's a completely different feature,
and should be called by a different name.

> Of course this is easily fixable, since the finalizer presumably doesn't need
> the property list.  You could split P into P' and R, with P' pointing to R, but
> not the other way around, and R containing the information needed for
> finalization.  P' would contain the property list pointer.  All outside
> references would be to P', so the cycle no longer involves the finalizable
> object.  Or you can do the same thing with post-mortem finalization.  If you
> don't want to split the object, you can get the same effect by adding
> primitives similar to those in our collector.  Or ...

Or just define the order.

- Bob