[gclist] Guardians

Carl Bruggeman bruggema@ranger.uta.edu
Sun, 13 Apr 1997 12:34:07 -0500

> On Apr 11,  9:29pm, Carl Bruggeman wrote:
> > I am beginning to believe that we have completely different targets
> > for finalization.  You apparently use finalization to deallocate C++
> > objects and find topological ordering very useful given the contraints
> > of C++ and its destroy methods, while I have viewed finalization only
> > as a means of managing externally allocated resources using the
> > collector, and thus have never had any need for topological
> > finalization.
> I should have responded to this earlier.  I do not understand what C++ has to
> do with this.  To the extent that there is a C++ standard, it says nothing

I said "apparently" only because I have often seen references by you
and others to finalization _methods_, and I assumed that these were C++
methods, and hoped that you would clarify this point if I was
incorrect (and you have clarified things, for me at least, with your
succinct summary here).

> My concern is that
> a) Nonresurrecting finalizers generally leave objects in an unusable state.
>  (E.g. closed files, objects explicitly deallocated by an OS library or a
> library you bought from a third party.  In my experience such libraries tend
> to be written in C and C++.  Even if they were written in Scheme, they might
> explicitly deallocate objects and keep them on an internal free list to reduce
> GC overhead.  If the objects are large, that typically even makes sense.)
> b) Accessing such objects after finalization is bad news.
> Thus
> c) The finalization facility should make it hard to accidentally
> access finalized objects.  If I have to write subtle code to get
> finalization order between a file buffer and the underlying file
> right, in my opinion something is wrong.  If the scheme breaks with
> a concurrent collector, something is wrong.

It's the characterization that an approach is "wrong" here that bothers
me.  In this finalization debate there is no "wrong" and no
"right" way; there are only design tradeoffs between the two
approaches that should be illuminated as clearly as possible (hence my
posts trying to discover what weak-for-finalization pointers were and
how they were used to influence finalization order).

As language designers we are in the business of restricting
programmers' freedom in order to make it "easier" in some sense for
them to express their algorithms.  After all, if they want to use the
full power of a machine architecture, they should program in
assembler.  I think we both would agree that programmers gain
significantly from copy-based garbage collection, in terms of better
locality and the elimination of "memory smashes", by giving up the
ability to construct arbitrary pointers (thereby allowing the
underlying system to choose the pointer representation).  As a language
designer I generally favor as restrictive a language feature as
possible, because it allows the system to do as much of the
programmer's work as possible.  I favor less restrictive features when
it is clear that there are occasions when the programmer _must_
circumvent the restrictions, because I would prefer that programmers be
more practiced and familiar with a feature than they would be
with a mechanism that is used only in exceptional circumstances.

You and everyone else following this debate (assuming we haven't lost
our entire audience by now) each have your own criteria for judging the
usefulness of a language feature, so it is our responsibility to
critically analyze our respective mechanisms and illustrate as clearly
as possible the tradeoffs we have made in our designs, so that
everyone can reach their own conclusions.

My understanding so far is:

Topological finalization:

You are proposing a language restriction that enforces a topological
finalization order.  This restriction correctly handles finalization
automatically N% of the time (I think you have characterized this as
90%, but the important part here is that N < 100%).  Thus, the
programmer must use an alternate mechanism in the remaining (100-N)% of
the cases.  Your mechanism for handling the cases where topological
ordering is inappropriate is weak-for-finalization pointers.

In exchange you are willing to place responsibility for timely
finalization in the hands of the programmer, who must use
weak-for-finalization pointers in those cases where there are long
dependency chains or cycles.  In other words, you do not get
_automatic_ timely finalization, because it requires programmer
intervention in some cases.
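To make the mechanism concrete, here is a minimal sketch in Python (a
stand-in for the Scheme systems under discussion; all class and
variable names are hypothetical) of what a weak-for-finalization link
buys: the Buffer below holds its File only weakly, so the File's
finalization is not held up by the Buffer.

```python
import gc
import weakref

log = []

class File:
    def __init__(self, name):
        self.name = name
        # runs once the File itself becomes unreachable
        weakref.finalize(self, log.append, "close " + name)

class Buffer:
    def __init__(self, file):
        # weak link: does NOT keep the File alive, so a long
        # Buffer -> File dependency chain cannot delay the File
        self._file = weakref.ref(file)
        weakref.finalize(self, log.append, "flush buffer")

    def file(self):
        return self._file()  # None once the File has been collected

f = File("data.txt")
b = Buffer(f)
del f            # only the weak link to the File remains
gc.collect()
# The File can now be finalized even though the Buffer survives
# (timing is CPython-specific); deciding that this is safe is the
# programmer's job.
print(log)
```

Note that the weak link makes the File finalizable _before_ the
Buffer; choosing where in a chain or cycle to place such a link is
exactly the kind of decision at issue here.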

It has also been asserted that weak-for-finalization decisions can be
made using only local information.  If this were true and could be
determined automatically (i.e. done by the compiler), it would in
essence eliminate the need for programmer declarations about weak
pointers, and you would again have automatic timely finalization.  I
have proposed a counterexample to this assertion, namely that for
cycles, the programmer must have information about an entire cycle to
break it correctly.  I also still believe that more than local
information is required for chains of dependencies.
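The cycle counterexample can be made concrete with a short Python
sketch (hypothetical names): once A and B refer to each other, neither
precedes the other topologically, and no rule based on one object's
local structure can pick a correct order.

```python
import gc
import weakref

order = []

class Node:
    def __init__(self, name):
        self.name = name
        self.peer = None
        # record when each node is finalized
        weakref.finalize(self, order.append, name)

a, b = Node("A"), Node("B")
a.peer, b.peer = b, a    # the cycle: A -> B and B -> A
del a, b
gc.collect()             # the cycle collector frees both together
# Both finalizers run, but in no topologically meaningful order;
# to break the cycle "correctly" the programmer must know which
# of the two links is safe to weaken -- information about the
# whole cycle, not about either node alone.
print(sorted(order))
```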

With guardians:

Timely finalization is automatic and requires no intervention on the
part of the programmer.

Finalization order is not automatic, but is completely under the
control of the programmer.  In M% of the cases, however, the guardian
code can be encapsulated in the object creation functions where it is
nearly as automatic as in the case of topological finalization.  
Unfortunately, the value for M% is rather fuzzy and will remain that
way until we have more experience with guardians.  Even if the simple
mechanism that I proposed for dealing with chains of dependencies
becomes overly complex, it does not rule out the possibility of other
ways of using guardians to handle finalization dependencies in a
simple modular fashion.  In order to consider alternate mechanisms I
would appreciate an example of finalization dependencies where
weak-for-finalization pointers shine.  I have seen someone (I think
it was Hans) allude to an "editor" with "cords" (Cedar ropes with
file backing?) as being an example.  I would appreciate as detailed
a description as possible (documented source if you wish).
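For readers following along without Chez Scheme at hand, here is a
rough emulation of the guardian pattern in Python (this is not Chez
Scheme's actual API, and Python cannot resurrect a dead object, so
this sketch hands back a resource handle saved at registration rather
than the object itself).  It shows the encapsulation point made above:
the guardian registration lives in the creation function, so clients
get finalization of the external resource essentially for free.

```python
import gc
import weakref
from collections import deque

class Guardian:
    def __init__(self):
        self._dead = deque()
        self._refs = set()           # keep the weakrefs themselves alive

    def guard(self, obj, handle):
        # when obj is collected, enqueue its handle for later cleanup
        ref = weakref.ref(obj, lambda r, h=handle: self._on_dead(r, h))
        self._refs.add(ref)

    def _on_dead(self, ref, handle):
        self._refs.discard(ref)
        self._dead.append(handle)

    def pop(self):
        # like calling a guardian: one dead object's handle, or None
        return self._dead.popleft() if self._dead else None

file_guardian = Guardian()

class ExtFile:
    pass

def open_ext_file(fd):
    # the creation function encapsulates the guardian registration
    f = ExtFile()
    f.fd = fd
    file_guardian.guard(f, fd)
    return f

f = open_ext_file(42)
del f                    # object becomes unreachable
gc.collect()
fd = file_guardian.pop() # retrieve the dead object's handle
print("would close fd", fd)
```

A real guardian returns the still-intact object itself; the
queue-of-handles simplification above is enough for the
external-resource use case described at the top of this message.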

My first cut at an analysis:

In I% of the cases the finalization order is independent
of any other finalization actions.  In my very limited experience, I%
has been 100% in production programs, but your experience with
real-world problems clearly shows otherwise.  In either case I% must
be less than N% (because topological ordering is one possible order
for independent finalization actions) and less than M% (because
guardians can be used to implement more than just independent
finalization actions with only trivial work on the part of the
programmer).

Question:  What is a realistic value for I% ?  I suspect that I% is
very large, on the order of 50-80%, but I would be interested in what
you have found out.

Summary:  I% < N% < 100%
          I% < M% < 100%

We don't have any hard data on whether N < M or M < N.  You clearly feel
that M < N, and I wouldn't currently dispute that, but I would like to
point out that since the programmer has complete control, each new
insight into the use of guardians pushes M higher.

> If the scheme breaks with a concurrent collector, something is wrong.

You have mentioned this point twice without explaining how guardians
suffer in a concurrent collector in ways that topological finalization
does not, so I am still completely in the dark about this assertion.

Carl Bruggeman -- bruggema@cse.uta.edu -- Phone: (817) 272-3600 Fax: 272-3784
Computer Science and Engineering Department -- University of Texas at Arlington