[gclist] Finalization, collection, etc.

Robert A Duff bobduff@world.std.com
Fri, 15 Mar 1996 09:56:48 -0500


Are you trying to figure out how to retrofit finalization into a
language that has no stack-based variables?  Or are you trying to design
the perfect language that has GC and also finalization?  The latter
seems more interesting to me.  If the latter, then the answer is simple:
provide a stack-based kind of variable, in addition to heap-based, gc-ed
ones.

> In most languages today, we use static scoping, so only execution location, not 
> time, determines visibility.  Given this, the two attributes are, in 
> principle, decoupled:  Visibility is tied to space, lifetime to time.  However, 
> in languages with stack allocation, *for the most part*, we bind the two 
> together:  A variable's value remains live exactly as long as the variable 
> itself remains visible (i.e., the stack frame it's in continues to exist).  For 
> the most part, we specify visibility, imputing lifetime.  (The ALGOL OWN 
> declaration - or the C static storage class for an object declared within a 
> lexical scope - explicitly defines lifetime.  In these, and in most cases, the 
> only lifetime you can specify is "until the program exits".)

It's not really true that visibility=lifetime, for stack objects.  In
any language with modules, you can typically restrict visibility to a
small subset of lifetime.  E.g. Ada and Modula-2.  A variable in a
package lives as long as the package, but it is not visible everywhere
the package is visible -- only inside the package (assuming it's not
exported).  Algol's OWN and C's static are crude mechanisms that more or
less do the same thing.

Clearly, visibility has to be a subset of lifetime (it doesn't make
sense for a variable to be visible if it doesn't exist).

In C++, a stack-allocated object might be visible, but its components
are not (necessarily), except when executing inside a method of the
class.
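
To make the point concrete, here's a small C++ sketch (the class and
names are invented for illustration):

    class Counter {
    public:
        void bump() { n++; }   // 'n' is visible here, inside a member function
    private:
        int n = 0;             // lives exactly as long as the Counter object,
                               // but is only visible inside the class
    };

    void f() {
        Counter c;     // 'c' itself is visible for the rest of this block
        c.bump();
        // c.n = 0;    // error: 'n' exists right now (its lifetime is c's),
                       // but it is not visible from here
    }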

> The notion that lifetime and scope are one and the same is so pervasive in 
> traditional languages that people don't even realize that the two notions are 
> distinct.  C++ constructors and destructors for stack-allocated objects, for 
> example, are defined with this as the underlying assumption:  The constructor is 
> called exactly when the object 
> it constructs comes into scope; its destructor is called exactly when that 
> object goes out of scope.  All sorts of C++ techniques rely on this close 
> connection - e.g., classes that take out a lock on construction and release the 
> lock on destruction.

Yes, and this stack-like behavior is nice and simple.  Seems like a Good
Thing, to me, compared to the elaborate scheme described below.
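
For example, the lock idiom mentioned above looks roughly like this in
C++ (a minimal sketch; the mutex and names are invented):

    #include <mutex>

    class ScopedLock {
    public:
        explicit ScopedLock(std::mutex& m) : m_(m) { m_.lock(); }  // acquire
        ~ScopedLock() { m_.unlock(); }                             // release
    private:
        std::mutex& m_;
    };

    std::mutex table_mutex;

    void update_table() {
        ScopedLock hold(table_mutex);  // lock taken when 'hold' comes into scope
        // ... touch the shared data ...
    }   // destructor runs here, at block exit: lock released no matter
        // how we leave the block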

> Dynamically allocated objects in traditional, non-GC languages have a problem:  
> Explicitly allocated memory has a lifetime that extends to its deletion point.  
> This is completely independent of scoping, which is exactly why you can get both 
> memory leaks and dangling pointers.

Right.  But you seem to be drawing a distinction between stack/non-gc
languages and heap-only/pure-gc languages.  There's no reason why a
language can't have stack-like variables, and *also* have a
garbage-collected heap.  In fact, that's what *I* want in a language.

C makes some trouble for GC because it allows pointers into the stack,
and pointers into the middle of variables.  Ada does also, but (1) you
have to declare the pointed-at variable "aliased", and (2) there are
rules that prevent, or at least minimize, pointers from outer stack
frames to inner stack frames.  Pascal makes it even simpler, by
forbidding pointers into the stack.
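
For instance, both kinds of troublesome pointer are perfectly legal in
C and C++ (a small sketch):

    void g() {
        int  local  = 0;
        int  arr[4] = {0, 1, 2, 3};

        int* into_stack  = &local;    // points into g's own stack frame
        int* into_middle = &arr[2];   // points into the middle of a variable

        *into_stack += *into_middle;
    }
    // In Ada the target would have to be declared "aliased" before its
    // address could be taken; Pascal avoids the issue entirely by not
    // letting you point into the stack at all.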

> Languages with GC tie lifetime to accessibility, and accessibility is in turn 
> the transitive closure of visibility.  *In principle*, the lifetime of an object 
> ends the moment it is no longer accessible.  The problem is that, unlike 
> visibility from a stack frame, accessibility is usually expensive to check.  
> Only reference counting can practically detect loss of accessibility the moment 
> it occurs (at least in the acyclic case).

Well, sort of.  I prefer to look at it this way: an object has
potentially infinite lifetime.  If the implementation can prove that a
given object will no longer be referenced, then it can reclaim the
object.  Garbage collection can't predict the future perfectly -- it
necessarily uses a conservative method.  Just because an object is
accessible via pointers does *not* imply that it will be referenced
later.
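
A hypothetical illustration of that gap between "accessible" and "will
be referenced" (names invented):

    #include <vector>

    std::vector<int*> cache;          // some long-lived structure

    void remember() {
        cache.push_back(new int(42)); // the new object is now reachable
                                      // from 'cache' for the rest of the
                                      // program...
    }
    // ...but if nothing ever reads 'cache' again, the object is accessible
    // yet will never be referenced.  Reachability is only a conservative
    // approximation of "will be used later", so a collector must keep it.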

> Consider again C++ destructors for stack objects.  They execute only at block 
> end, even in cases where one could detect that the object has become inacces-
> sible earlier (i.e., just after the last reference to the variable in its 
> declared block).  So C++ destructors don't execute "as soon as needed" either; 
> in fact, a reference-counted implementation might very well execute finalizers 
> much earlier than C++ would!

Good point.  But to me, that implies that such a reference-counted
mechanism is necessarily broken.  Sometimes, a finalizable object is
declared *purely* for its finalization, and it is *never* referenced.
But it's important that it doesn't release the lock or whatever
prematurely.
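
Concretely, reusing the ScopedLock sketch from earlier (still just an
illustration):

    void update_table() {
        ScopedLock hold(table_mutex);   // declared *purely* for its finalizer;
                                        // 'hold' is never mentioned again
        // A scheme that finalizes an object as soon as it can no longer be
        // referenced would be free to release the lock right here, before
        // the shared data below has even been touched -- exactly the
        // premature release described above.
        // ... touch the shared data ...
    }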

> What the C++ design gives the programmer is a clear synchronization point:  At 
> the end of the block, all destructors "pending" for the block are executed.  
> This seems natural, again, because in C-like languages we identify visibility 
> with lifetime.

No, I don't think it's because we "identify visibility with lifetime".
We don't, and anyway, it's got nothing to do with visibility, and
everything to do with lifetime.  It's because lifetimes are nested
(stack-like, LIFO) that we have a clear synch point.
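
A tiny C++ illustration of that nesting (names invented):

    #include <cstdio>

    struct Noisy {
        const char* name;
        explicit Noisy(const char* n) : name(n) { std::printf("construct %s\n", name); }
        ~Noisy() { std::printf("destruct %s\n", name); }
    };

    void k() {
        Noisy a("a");
        Noisy b("b");
    }   // prints "destruct b", then "destruct a": lifetimes nest, so the
        // end of the block is one predictable synch point for everything
        // pending in it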

>...  For non-stack allocated objects, the delete statement is the 
> (perhaps incorrect, as there may be other pointers) synchronization point.
> 
> In the general GC case, there are no obvious synchronization points.  One could, 
> I suppose, call the GC at the exit from each block - recovering exactly the C++ 
> destructor semantics for objects that were only accessed from locals within the 
> block.  Combining a generational collector with appropriate compiler 
> optimizations (escape analysis), this could probably even give decent 
> performance.  For the general case, where the actual access pattern isn't really 
> stack-like, there is no obvious synchronization point.  The programmer will have 
> to provide one.  Using a delete statement as a hint - it not only kills the 
> reference to the object, but if the object has a finalizer triggers a GC "very 
> soon" to see if the finalizer needs to be called - will work, but puts the onus 
> on the *user* of the abstraction, not its implementor.  The way to put it back on 
> the implementor is to allow him to declare the object as "needing quick 
> finalization".  Any time a pointer to such an object is deleted, a GC is 
> triggered "very soon".  (Of course, we could just say that having a finalizer 
> implies this treatment!  Simpler, but less flexible, since there are probably 
> many finalizers that *don't* need special treatment.)  In a statically typed 
> language where this flag is an attribute of the type, this can be implemented 
> with little overhead.  (It's much more expensive in other cases.)

It seems strange to let the gc decide whether finalization is needed,
based on whether the thing is reachable.  If it's reachable, that's
probably a bug in my program.  It seems to me that stack-allocated
finalizable objects are far easier to deal with.  It still leaves open
the question of what to do about heap-allocated ones, but the answer can
be, "I don't care too much, because I generally put finalizable stuff on
the stack."

> Of course, any such triggered GC'ing is unlikely to be compatible with 
> incremental, much less real-time, collectors.  Reference counting seems to me 
> the only hope in such cases.
> 							-- Jerry

- Bob