[gclist] Finalization and object orientation.

Hans Boehm boehm@hoh.mti.sgi.com
Fri, 28 Mar 1997 15:59:29 -0800

On Mar 28, 12:52pm, Eliot & Linda wrote:
> I wrote:
> > This issue was discussed in Mark Weiser's and my SP&E '88 paper.  Is x seen
> > by the garbage collector during the call to g in
> > { T x; f(x); g("abc"); }
> > For a naive compiler the answer is "yes".  For a CPS-based compiler, the
> > is "no".  For a typical compiler with global register allocation the answer
> > "maybe".
> Right. But which is more important, being able to rely on finalization,
> or
> speed?  Its certainly undecidable whether the performance gained by
> optimizations that break decidable reachability outweigh those gained
> through program simplifications made possible by decidable reachability!
> Take the non-termination example
> 	v := e. c(); e
> where c may not terminate.
> You're the expert here, but it seems to me that except in the above
> example
> the cost of providing decidable reachability, e.g. by nilling registers
> is minimal.
For languages other than Smalltalk, we still have to agree on how we define
reachability.  I think we're now talking about a
definition based on the transitive closure of the points-to relation, possibly
because of reflection in the language.  Assuming we can define the points-to
relation, this still depends on the root set.  It sounds like in Smalltalk this
includes all in-scope variables plus temporaries on the evaluation stack.
SML of NJ very explicitly makes a different choice, because they found that
this could cause too much memory to be retained.  The definition based on
continuation-passing-style avoids treating dead variables as roots.

Our conservative collector effectively does something incomparable to the
Smalltalk approach.  Some other memory may be ratained.  But some dead
variables are not scanned.

If we pick a definition, we can then ask about the cost of sticking to it
and guaranteeing that memory usage will never exceed the total reachable
storage size by more than a constant, and objects become finalizable as
expected.  I would guess the cost varies a lot, depending on the circumstances.
 My guess is that the cost of clearing variables when a scope is exited is not
huge, but significant.  The cost of not reusing registers containing dead but
in-scope variables is probably higher.  I'm quite sure that neither of them
would be considered acceptable by the compiler group here.

There are cases in which the cost involves much more than clearing variables,
even with the Smalltalk definition, though I'm not sure how much of a problem
that is if you only want finalizability "guarantees".  For a memory usage based
example, consider in a language that supports bignums, with builtin

	x := 17**n mod 217; f(); y := 17**n;

10**n can no longer be treated as a common subexpression, since doing
so can change the space complexity of the program.

My impression from Andrew Appel's group is that maintaining asymptotic space
complexity (with the CPS definition of reachability) is neither free nor
intolerably expensive.  It does inhibit some optimizations, and make it much
harder to get the compiler right.

> > There has been a significant amount of work on this kind of issue,
> > in the context of ML and logic programming languages.  (E.g. there was a
> > a few years back claiming significant space improvements in ML by treating
> > variables and data structure components with unconstrained type as dead.
> > their type is unknown, they can'e be used for anything.  They can be safely
> > reclaimed, in spite of the fact that this leaves "dangling" pointers.
 Appel is
> > careful to define a CPS-based notion of reachability for SML of NJ in order
> > avoid some problems they observed with early versions of their system.)
> Are you referring to
> 	Benjamin Goldberg: Tag-Free Garbage Collection for
> 	Strongly Typed Programming Languages. 165-176, PLDI 91
> (right next to your Mostly Parallel paper - shades of Princes Bride :))?
> Seems a little different, but with the Tag-free scheme in the above,
> ignoring the non-termination case, reachability is decidable from all
> call-points in the program, no?

Unfortunately, I don't have my Proceedings handy.  I think I'm referring to a
later paper that expanded on this idea.  If nobody else volunteers the
reference, I'll track it down later.

Effectively this scheme uses yet another definition of reachability, which is
an approximation of the one in the Java definition.  The type-based notion of
reachability is decidable.  But I don't think it's something you want to put in
a language definition.


Hans-Juergen Boehm