[gclist] Re: Mark don't sweep

Amit J Patel amitp@Theory.Stanford.EDU
Thu, 9 May 1996 14:17:16 -0700


Charles Fiterman wrote to me on May 8:
> > Thanks!  I'll probably use this in my system!  :) It even seems to
> > support destructors/finalizers.  I can call the destructor for each
> > destroyed object during the next allocation pass, which acts as the
> > `sweep'.  Destructors aren't all called at once, so it's not going to
> > be a noticable slowdown, either.
> 
> Remember that finalizer objects get used as roots without being marked
> themselves. This allows finalizer dependencies to work. An addditional
> gimmic in addition to a mark() method that marks the stuff an object
> uses, a markFinal() that marks the stuff an object's finalizer uses.

Here's my design for an OO language w/finalizers:
(Note:  I'm *not* actually implementing this language; I'm going to
 use mark & don't sweep for something entirely different.)

	(Private) fields for the class are stored as ML-style records:
		private a, b, c : int
	==>
		{a:int,b:int,c:int}        ( {} is the ML record syntax )

A finalizer can be registered on an object.  It is a closure over the
fields of the object.  It does _not_ get the object itself.  (The
object is considered "dead" and cannot be resurrected.)  

So in this system, the finalizer is alive because it is registered
with the GC.  The fields are alive because the finalizer is a closure
which refers to the fields as free variables.  The GC doesn't have to
mark the object fields differently.

(Mutability is taken care of in ML by having `ref' variables, so if b
is mutable, it is declared as `int ref'.  The fields of the object are
then immutable; when you want a mutable field, you have an immutable
pointer to a mutable ref.  When the object dies, the refs are still
alive.)

The finalizer is a function that is invoked at some point after the
object is dead, but the time at which it is invoked is not guaranteed.

> Again I don't approve of finalizers. First a garbage collector deletes
> objects when they can't effect the program. A finalizer effects the
> program. This is original sin.

I have a slightly different view of the finalizers:  I view them as
functions that must be run before the program ends, but the time at
which they are executed is not determined.  The only guarantee is that
the function is executed between the point that the object becomes
garbage and the end of the program.

The object can't affect the execution of the program, so the garbage
collector deletes it.  The finalizer is a separate entity:  it can
affect the program, but it is going to be run anyway.  The GC just
happens to run it earlier than the end of the program.

Yes, it introduces non-determinism, but if you have non-determinism
anyway, then this isn't a problem.  :)  (I'd probably have a thread
that sits there to run finalizers.  After GC completes, it'd move
finalizers that need to be run from the registered finalizer list to
the ready to run queue.  The GC would then return immediately, and
finalizers run in the background.)

> Second finalizers must me prompt and sure. 

If you need to do something promptly, I think it's unwise to wait for
the next GC cycle (for non-ref-count GC).  There may not even be
another GC.  You have to have some other mechanism -- not GC-triggered
finalizers.

> If you have a chain of finalizer objects they only get killed at
> the rate of one per collection cycle. 

True.

(I don't expect any deep chains in my system.)

> A loop of finalizer objects can never die.

True.

I'd call it a `programming error' to write a program that has a loop
of finalizer objects.  ;) I don't have enough experience with
finalizers to make this judgement, however.  :( 

My intended use of finalizers in this OO language, however, is not for
loops: it is for `containment' of objects.  If a Car contains a Tire,
then the finalizer for Car might do something to Tire.  It's not for
`refers to' fields of objects.  If a Car refers to its Owner, the
owner is not part of the car; the finalizer should not do anything to
the owner.  In such a system, there can be no loops, and chains can be
bounded statically (because no object would contain an object of the
same class, even though it may refer to an object of the same class).

Perhaps the language should distinguish between these two types of
object fields.  HAS-A relationships would be acyclic, and the
contained object could be contained in only one other object.
REFERS-TO relationships could be arbitrary.

> In a reflexive language, I hope thats your intent, finalizers belong
> on top of the object system using it. There are choices of how to
> prevent dependency loops. Reference counting is ideal for finalizer
> objects because if you can prevent loops it is prompt and sure. 

My `other project', which is where I will probably use mark & don't
sweep, will use finalizers for external resources.  All the finalizers
will be in the implementation language (C++), but not in the
implement_ed_ language.  I expect to have files, communications ports,
and maybe windows on the screen in the implemented language, and the
implementation will have finalizers for these.  This way, I can
guarantee that there are no loops.  :-)

> I hope you are interested in partial evaluation technology. I think
> any real improvments in language design will come from there. 

I'm not sure what partial evaluation can do for finalizers.  (Did I
miss something?)

	- Amit

-- 
Amit J Patel, Computer Science Department, Stanford University
             "The works of a lone architect are more elegant 
              than those attempted by several together."    - Pascal