[gclist] Finalization and object orientation.

Eliot & Linda elcm@pacbell.net
Fri, 28 Mar 1997 18:30:07 -0800


Hans Boehm wrote:
> 
> On Mar 28, 12:52pm, Eliot & Linda wrote:
> > I wrote:
> > > This issue was discussed in Mark Weiser's and my SP&E '88 paper.  Is x seen
> > > by the garbage collector during the call to g in
> >
> > > { T x; f(x); g("abc"); }
> >
> > > For a naive compiler the answer is "yes".  For a CPS-based compiler, the
> answer
> > > is "no".  For a typical compiler with global register allocation the answer
> is
> > > "maybe".
> >
> > Right. But which is more important, being able to rely on finalization,
> > or
> > speed?  Its certainly undecidable whether the performance gained by
> > optimizations that break decidable reachability outweigh those gained
> > through program simplifications made possible by decidable reachability!
> >
> > Take the non-termination example
> >       v := e. c(); e
> > where c may not terminate.
> > You're the expert here, but it seems to me that except in the above
> > example
> > the cost of providing decidable reachability, e.g. by nilling registers
> > is minimal.
> >
> For languages other than Smalltalk, we still have to agree on how we define
> reachability.  I think we're now talking about a
> definition based on the transitive closure of the points-to relation, possibly
> because of reflection in the language.  Assuming we can define the points-to
> relation, this still depends on the root set.  It sounds like in Smalltalk this
> includes all in-scope variables plus temporaries on the evaluation stack.
> SML of NJ very explicitly makes a different choice, because they found that
> this could cause too much memory to be retained.  The definition based on
> continuation-passing-style avoids treating dead variables as roots.

What I think we're converging on is that if a language hopes to provide
a reliable finalization facility then it must define variable liveness.
So in
	{ T x; f(x); g("abc"); }

the language could define variable liveness as from the first binding of
a
variable to its last occurrence.  In reflective languages with reified
execution
state that choose a simplified execution model for pedagogical reasons
like
Smalltalk we might tolerate an inefficient but simple definition that
the
variable is reachable as long as its enclosing activation is reachable.

Church-Turing undecidability means that in general we have to accept a
naive lexical definition of liveness if we want decidable reachability.
E.g. in
	{ T x; f(x); c(); f(x)}
the third occurrence of x has to be assumed to be live for c's of any
significant algorithmic complexity.  (And in a late-bound language like
Java
most c's are unknown at compilation time.)

If the reachability of the the second occurrence of f(x) is undecidable
then its the programmer's responsibility to remove the reference to x,
because the compiler can't do the necessary proof.  Hence one would
require
tools to identify unreachable garbage (as one often does).

> Our conservative collector effectively does something incomparable to the
> Smalltalk approach.  Some other memory may be retained.  But some dead
> variables are not scanned.
> 
> If we pick a definition, we can then ask about the cost of sticking to it
> and guaranteeing that memory usage will never exceed the total reachable
> storage size by more than a constant, and objects become finalizable as
> expected.  I would guess the cost varies a lot, depending on the circumstances.
>  My guess is that the cost of clearing variables when a scope is exited is not
> huge, but significant.

In a traditional stack implementation it may be very cheap, since the
stack trace
is bounded by the base of the stack and the stack pointer.  But you're
right
that there are hidden costs because one may have to nil memory on
function entry.
Tricky.  And in a concurrent collector I can see that one is faced with
serious
difficulties.

>  The cost of not reusing registers containing dead but
> in-scope variables is probably higher.  I'm quite sure that neither of them
> would be considered acceptable by the compiler group here.

One could squander a register on an argument count or live register
count.
Wouldn't this have minimal cost on most 32-register RISCs, given that
most
calls have few arguments?  I'm out of my depth when thinking about
global
register optimizations.  I can believe that on ISAs like Intel there can
be significant pressure on registers (there is in our Smalltalk engine)
but I find it harder to believe that of standard 32 register RISCs.  But
I'll take your word for it.  But if you consider e.g. a register-poor
machine then the benefits of using registers for globals would be
out-weighed
by the the benefits of using available registers in inner loops, so no
register would be used for a single purpose through-out the entire
program,
unless that program was very simple.

> 
> There are cases in which the cost involves much more than clearing variables,
> even with the Smalltalk definition, though I'm not sure how much of a problem
> that is if you only want finalizability "guarantees".  For a memory usage based
> example, consider in a language that supports bignums, with builtin
> exponentiation:
> 
>         x := 17**n mod 217; f(); y := 17**n;
> 
> 10**n can no longer be treated as a common subexpression, since doing
> so can change the space complexity of the program.

Right.  And again it becomes the programmer's responsibility to write
	v := 17 ** n; x := v mod 217; f(); y := g(v)

One would want to take a Python-esque approach here and have the
compiler
emit an explanation of its inability to do c.s.e. here.
 
> My impression from Andrew Appel's group is that maintaining asymptotic space
> complexity (with the CPS definition of reachability) is neither free nor
> intolerably expensive.  It does inhibit some optimizations, and make it much
> harder to get the compiler right.

That makes sence to me.  Any references?

In summary it seems pointless to allow poor
language definition and optimization to prevent one from relying on
such a useful facility as finalization unless one has no choice.
If I may be so bold I'd say that your papers on conservative GC
used with stock C and C++ environments show how one can do an
excellent job in the face of adversity.  The Java situation is different
sicne it has GC as part fo the language.  My feeling is that the
benefits
(to programmers and program reliability) of providing decidable
reachability
for Java would out-weigh both the loss of optimization opportunities and
the difficulties imposed on implementors.

> > > There has been a significant amount of work on this kind of issue,
> especially
> > > in the context of ML and logic programming languages.  (E.g. there was a
> paper
> > > a few years back claiming significant space improvements in ML by treating
> > > variables and data structure components with unconstrained type as dead.
>  Since
> > > their type is unknown, they can'e be used for anything.  They can be safely
> > > reclaimed, in spite of the fact that this leaves "dangling" pointers.
>  Appel is
> > > careful to define a CPS-based notion of reachability for SML of NJ in order
> to
> > > avoid some problems they observed with early versions of their system.)
> >
> >
> > Are you referring to
> >       Benjamin Goldberg: Tag-Free Garbage Collection for
> >       Strongly Typed Programming Languages. 165-176, PLDI 91
> > (right next to your Mostly Parallel paper - shades of Princes Bride :))?
> >
> > Seems a little different, but with the Tag-free scheme in the above,
> > ignoring the non-termination case, reachability is decidable from all
> > call-points in the program, no?
> >
> 
> Unfortunately, I don't have my Proceedings handy.  I think I'm referring to a
> later paper that expanded on this idea.  If nobody else volunteers the
> reference, I'll track it down later.
> 
> Effectively this scheme uses yet another definition of reachability, which is
> an approximation of the one in the Java definition.  The type-based notion of
> reachability is decidable.  But I don't think it's something you want to put in
> a language definition.
> 
> Hans
> 
> --
> Hans-Juergen Boehm
> boehm@mti.sgi.com

Appologies for wading-in so ignorantly earlier.  This has been more than
interesting.
Thanks.
_______________,,,^..^,,,_______________
Eliot Miranda, ParcPlace-Digitalk