[gclist] Finalization and the insane postman bug.

Boehm, Hans hans_boehm@hp.com
Mon, 8 Oct 2001 22:54:05 -0700


The problem with omitting finalization from a language is not that you have
to do a few things differently.  If that were the only issue, I would agree
with you.  The problem is that if the language is missing finalization,
sooner or later you will need some other means to deallocate a non-memory
resource embedded deep in a data structure.  But there is no way to
determine when to deallocate that resource without redoing all of the work
already being done by the garbage collector.  A 100,000 line program
probably needs finalization in only one or two places; but doing without
finalization will involve essentially rewriting the whole program for manual
memory management, with all of the problems that entails.
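
As a rough illustration of "a non-memory resource embedded deep in a data
structure", here is a minimal sketch in Python (chosen only for brevity; it is
not one of the languages discussed in this thread).  The names Handle, Node,
build_tree and released are hypothetical.  The one finalizer use is the
weakref.finalize call; nothing else in the structure has to track ownership,
even though the structure contains a cycle.

```python
import gc
import weakref

released = []  # hypothetical log of which handles have been released


class Handle:
    """Stand-in for a non-memory resource (file descriptor, window, lock)."""

    def __init__(self, ident):
        self.ident = ident
        # Register cleanup to run once this object becomes unreachable.
        # The callback must not reference self, or the object could never die.
        weakref.finalize(self, released.append, ident)


class Node:
    """A node in some larger structure; only some nodes own a Handle."""

    def __init__(self, payload=None):
        self.payload = payload
        self.children = []


def build_tree():
    root = Node()
    inner = Node()
    leaf = Node(payload=Handle(7))  # resource buried two levels down
    inner.children.append(leaf)
    root.children.append(inner)
    leaf.children.append(root)      # a cycle: reference counting alone cannot free it
    return root


tree = build_tree()
assert released == []   # still reachable, so nothing has been released
del tree                # drop the only external reference
gc.collect()            # the collector finds the unreachable cycle
print(released)
```

The code that builds and drops the tree never mentions the handle.  Doing the
same without a finalizer would mean every owner of a Node has to know whether
a Handle is reachable from it and when the last reference goes away.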

I find it far easier to read and verify the correctness of a program that
includes one or two finalizer uses than one that is manually reference
counted because the built-in GC didn't quite have the necessary
functionality.

In my opinion, GC without finalization just doesn't quite work.  It gets you
quite a ways, but then you get stuck as you move to larger systems.  The
fact that systems without finalizers invariably have a mechanism like them
built in under the covers should be a strong hint that something is
incomplete.

I also disagree with the majority opinion here that finalization is somehow
insurmountably difficult.  Many languages (Cedar, Modula-3, I believe later
versions of Smalltalk) got it pretty much right a long time ago.  The
description was tiny compared to a modern language definition, and not very
complex.  In my opinion, Java got the ordering wrong, which makes things a
bit more complicated in a few cases, but is certainly not a fatal flaw.  I
think much of the perception of difficulty comes from:

a) The misconception that this is a replacement for all C++ destructors.

b) Early buggy Java implementations.

c) Poor and confusing advice on how to use finalizers.

Hans

-----Original Message-----
From: rog@vitanuova.com
To: gclist@lists.iecc.com
Sent: 10/8/01 7:58 AM
Subject: Re: [gclist] Finalization and the insane postman bug.

> > do we really need general finalisers?

> In my opinion, yes.  You clearly need them in a system that includes a
> legacy library requiring explicit deallocation/object destruction.
[...]
> Another good example is the "rope" data type that's included in the

hmm.
for any given system feature, there are bound to be many applications
one can think of that need that feature.  but conversely, it is always
possible to write that application in such a way that the feature is
not required.

when adding a feature to the language, surely one must consider in
their entirety the effects that the feature will have on the language
as a whole?  i don't dispute that general finalisers solve problems
like the ones you mention with elegant ease, but i'm not sure that the
resulting language is one that i'd find easier to use and debug.

the approach taken by the developers of Inferno/Limbo, which seems to
me quite reasonable, is that the garbage collector is used to pick off
all the easy fruit, the non-controversial memory reclamation, and a
few selected kernel-managed data structures and resources.

if you want anything else, you have to do it yourself; most APIs are
likely to require some discipline on the part of the programmer to
adhere to the semantics of that API. for years in the C world, we
malloced and freed data structures explicitly; post C, a given API
that needed a finaliser could still require explicit notification:

	x = API.getresource();
	x.dosomething();
	API.freeresource(x);

of course, this can drag in some of the old bugs (in particular,
memory leaks), but those are well understood and (relatively) easy to
find.  by going towards completely general finalisers, we exchange
this for a larger can of worms that wriggle more...
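
a minimal sketch of how the explicit-notification pattern above can leak, and
the usual discipline that avoids it, written in Python rather than Limbo.  the
API class here is a hypothetical stand-in that only mirrors the
getresource/freeresource names from the snippet; dosomething, naive, careful
and leaks are likewise invented for illustration.

```python
class API:
    """Hypothetical API mirroring getresource/freeresource from the snippet."""

    live = set()   # resources handed out and not yet freed
    _next = 0

    @classmethod
    def getresource(cls):
        cls._next += 1
        cls.live.add(cls._next)
        return cls._next

    @classmethod
    def freeresource(cls, x):
        cls.live.discard(x)


def dosomething(x, fail):
    if fail:
        raise RuntimeError("work failed")


def naive(fail):
    x = API.getresource()
    dosomething(x, fail)     # if this raises, the next line never runs
    API.freeresource(x)


def careful(fail):
    x = API.getresource()
    try:
        dosomething(x, fail)
    finally:
        API.freeresource(x)  # runs on both normal and exceptional exit


def leaks(f):
    """Return how many resources are still live after f fails part-way."""
    API.live.clear()
    try:
        f(fail=True)
    except RuntimeError:
        pass
    return len(API.live)


for f in (naive, careful):
    print(f.__name__, "leaked", leaks(f))
```

the leak in naive is visible by inspecting that one function, which is the
property being argued for here; the price is that every caller has to follow
the careful form.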

i might be old fashioned here, but i like to be able to verify the
correctness of a piece of code by inspection, and if arbitrary code is
potentially run as a result of pointer assignments, i lose that
potential, which ultimately impacts maintainability and the overall
cost of the s/w development cycle.

if we were to do a cost-benefit analysis of this feature, i think it
would be that which would ultimately swing my vote.  but then again, i
seem to spend most of my time inspecting, rearranging and fixing old
code, so i'm probably biased.

  cheers,
    rog.