[gclist] Finalization and death notices

Jerrold Leichter jerrold.leichter@smarts.com
Mon, 8 Oct 2001 15:39:00 -0400 (EDT)

| This seems to come back to the same old debate about GC for
| C++ -- to finalize or not to finalize, and if so how?  We
| go round and round looking for the "right" way, whereas Java
| just went ahead and did it wrong, and users soon learned
| that finalizers were nearly useless.  And lacking destructors
| they learned to manage resources with finally clauses.

Which actually brings us back to a fundamental point that gets lost in too
much of the discussion of C++, destructors, and finalizers:  The fundamental
difference between lexically and dynamically determined lifetimes.

When C++ programmers complain that finalizers don't run immediately, what they
usually have in mind is the "initialization is resource acquisition" (hence,
the "destruction is resource deacquisition") idiom.  However, this idiom
really only makes sense for stack-allocated objects.  Like a finally clause,
IRA/DRD deals with the need to clean up resources that live only in a
lexically defined scope - but a scope with multiple exits, often because of
the possibility of exceptions escaping the scope.  No one has proposed using
gc for stack-allocated objects - and if a compiler can determine the lexical
lifetime of a heap-allocated object, it can certainly arrange to have it
deleted directly, without gc involvement.

Dynamically created objects may well have references to things that need
to be cleaned up, but that's an entirely different thing.  The delete operator
in C++ is overloaded:  It means *both* "run the destructor" and "free the
underlying memory".  The second of these operations is invisible to a C++
program.  I've suggested in the past that, if you want to be friendly to both
C++ idioms and gc, you need to transparently decouple these two operations.
To do this, you require that the implementation keep track of whether a block
of memory has been destructed.  Then the user-visible "delete" operator becomes
just a destructor call; the actual memory may be freed later by the collector.
The collector has access to the "has been destructed" flag and thus can avoid
running the destructor twice.  If you care when the destructor runs, use
delete (as you do now.)  In correct - according to the current C++ definition
- code, you can't tell whether delete frees the memory immediately, or waits
until later:  Any access to a deleted object has undefined semantics.  Any
visible consequences of delete are necessarily the result of running the
destructor.
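
A minimal sketch of that decoupling, under stated assumptions: ToyHeap,
Block, destroy() and collect_all() are hypothetical names, not part of any
real collector, and the side table stands in for a per-block header.
destroy() plays the role of the user-visible delete; collect_all() stands in
for a collection in which every block turns out to be unreachable.

```cpp
#include <cassert>
#include <new>
#include <utility>
#include <vector>

// Hypothetical bookkeeping record; a real collector would more likely keep
// this in its own block header than in a side table.
struct Block {
    void* mem;                // start of the object's storage
    void (*destroy)(void*);   // type-erased destructor call
    bool destructed;          // the "has been destructed" flag
};

class ToyHeap {
    std::vector<Block> blocks_;
public:
    ~ToyHeap() { collect_all(); }

    template <class T, class... Args>
    T* create(Args&&... args) {
        void* mem = ::operator new(sizeof(T));
        T* obj;
        try { obj = new (mem) T(std::forward<Args>(args)...); }
        catch (...) { ::operator delete(mem); throw; }
        void (*dtor)(void*) = [](void* p) { static_cast<T*>(p)->~T(); };
        blocks_.push_back(Block{mem, dtor, false});
        return obj;
    }

    // The user-visible "delete": run the destructor now, keep the memory.
    void destroy(void* p) {
        for (Block& b : blocks_) {
            if (b.mem == p && !b.destructed) {
                b.destructed = true;
                b.destroy(b.mem);
            }
        }
    }

    // Stand-in for the collector: finalize anything not yet destructed,
    // then actually release the memory.
    void collect_all() {
        for (Block& b : blocks_) {
            if (!b.destructed) {
                b.destructed = true;
                b.destroy(b.mem);
            }
            ::operator delete(b.mem);
        }
        blocks_.clear();
    }
};

// Illustrative type whose destructor has a visible side effect.
struct Counted {
    int* runs;
    explicit Counted(int* r) : runs(r) {}
    ~Counted() { ++*runs; }
};
```

Calling destroy() twice on the same object runs the destructor once, and an
object that was never explicitly destroyed is finalized when the memory is
reclaimed.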

For an object with a vtable, a "has been destructed" flag is free:  Just
change the vtable pointer to something invalid.  (Better, change it to point
to a valid-looking vtable all of whose entries point to a routine that prints
a "member function of destructed object invoked" message and aborts.  This will
catch many, though not all, errors.)  Also, every implementation carries along
an actual allocated length for an object, and the bottom bit of that length is
pretty much always 0.  It's a perfectly good flag.  Since you can't delete one
element in the middle of an array, such a length is always available.  (You
*can* explicitly destruct one element of an array, and abuse of such an
operation would not necessarily be protected against, any more than it is now.)
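
The length-bit variant can be sketched as follows.  SizeWord and the 8-byte
rounding are assumptions made for illustration; real allocators differ in
header layout and granularity.  (The vtable-pointer variant is not shown,
since it cannot be written in portable C++.)

```cpp
#include <cassert>
#include <cstddef>

// Bit 0 of the stored length is always 0 after rounding, so it can carry
// the "has been destructed" flag.
constexpr std::size_t kDestructedBit = 1;

// Hypothetical allocator size word.
struct SizeWord {
    std::size_t bits;

    // Round the requested size up to a multiple of 8 (an assumed granule).
    explicit SizeWord(std::size_t n) : bits((n + 7) & ~std::size_t{7}) {}

    std::size_t size() const { return bits & ~kDestructedBit; }
    bool destructed() const { return (bits & kDestructedBit) != 0; }
    void mark_destructed() { bits |= kDestructedBit; }
};
```

Masking in size() keeps the allocated length readable after the flag has
been set.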

Given an implementation like this, if you do reference counting, it will work
just fine.  When your reference counter goes to zero, you call delete and the
object is immediately destructed, just as it is today.
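
A rough illustration of the reference-counting case, with hypothetical
names throughout (RcBox, retain(), release()).  The box owns only raw
storage; in the scheme described above that storage would live in the
collected heap and be reclaimed later, so release() runs the destructor and
nothing more.

```cpp
#include <cassert>
#include <new>
#include <utility>

template <class T>
struct RcBox {
    alignas(T) unsigned char storage[sizeof(T)];
    T* obj = nullptr;
    long refs = 0;
    bool destructed = false;

    // Construct the object in place and take the first reference.
    template <class... Args>
    T* init(Args&&... args) {
        obj = new (storage) T(std::forward<Args>(args)...);
        refs = 1;
        return obj;
    }

    void retain() { ++refs; }

    // At zero, destruct immediately; the storage itself is left alone.
    void release() {
        if (--refs == 0 && !destructed) {
            destructed = true;
            obj->~T();
        }
    }
};

// Illustrative resource whose cleanup must happen promptly.
struct FileLike {
    int* closed;
    explicit FileLike(int* c) : closed(c) {}
    ~FileLike() { ++*closed; }
};
```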

A gc intended to be used in this way might even provide a nice optional
sanity check:  When doing a delete, it could first do a gc, keeping track of
the number of live pointers to the object being deleted.  If it's more than
one ... error.  (Yes, there are subtleties because of legitimate copies of the
pointer on the call stack.  This is an area where generality isn't important,
however - if you can come up with a calling mechanism that works - e.g., you
must pass in a pointer to what should be the only pointer to the object - then
the rare and specialized code that wants to use this feature can do so.)
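
One way to picture the "pointer to the only pointer" calling convention is
the toy below.  ToyRoots and checked_destroy() are invented for this
sketch; a real collector would find the pointers by tracing rather than
from a hand-registered list of slots.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Toy stand-in for everything the collector would scan.
struct ToyRoots {
    std::vector<void**> slots;

    std::size_t count_pointers_to(const void* target) const {
        std::size_t n = 0;
        for (void** s : slots) {
            if (*s == target) ++n;
        }
        return n;
    }
};

// `sole` is the address of what should be the only pointer to the object.
// Returns false, and destructs nothing, if any other slot also points at it.
template <class T>
bool checked_destroy(const ToyRoots& roots, void** sole) {
    void* target = *sole;
    if (roots.count_pointers_to(target) != 1) return false;
    static_cast<T*>(target)->~T();
    *sole = nullptr;   // the one known pointer no longer dangles
    return true;
}

// Illustrative type with an observable destructor.
struct Widget {
    int* destroyed;
    explicit Widget(int* d) : destroyed(d) {}
    ~Widget() { ++*destroyed; }
};
```

Because the caller hands over the slot itself, no extra copy of the pointer
has to exist on the call stack during the check.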

							-- Jerry