[gclist] Concerns about GC

Richard A. O'Keefe ok@cs.rmit.edu.au
Tue, 1 Oct 1996 11:47:58 +1000 (EST)


Someone wrote
1. 	Furthermore, though one class
1. 	of problems disappear (memory leaks) another is created: garbage that
1. 	is trapped by badly written programs.

I asked
2.	How does this make GC worse than not GC-ing?  If you _don't_ GC,
2.	even *more* garbage will accumulate.

(My point, of course, was that this is not "another class", but basically
the *same* class.)

Greg Hudson replied
3.	Suppose I have a program which uses explicit memory management.  It
3.	allocates some big object, passes around a bunch of pointers to that
3.	object, and then frees it, leaving some dangling pointers to that
3.	object hanging around.  Say that, for complicated semantic reasons, it
3.	never uses any of those dangling pointers (maybe it's a global
3.	variable that's only used during the initialization stage).  We have a
3.	perfectly correct (if badly written) program which doesn't leak any
3.	memory.  But if we instead used garbage collection and didn't
3.	explicitly free the object, we would have a program with a memory
3.	leak.

There is a gap in your logic here.  You are assuming that the remaining
pointers will be traced.  This is not always the case.  In the case of
a conservative collector added after the event to a GC-hostile (C) or
GC-neutral (Ada) language, it probably will be.  But in the case of a
GC-favourable language with an optimising compiler (if you haven't got
an optimising compiler, you have worse problems to worry about) and a
co-designed GC, the compiler can arrange to nullify pointer variables
(even pushing this analysis into records) when data-flow analysis shows
them to become dead.  This doesn't solve all known problems:  NOTHING can.

The thing that garbage collection buys you is that if you revise the
program so that one of those dangling pointers _is_ used (and the
revision may be in a completely different module from the one where
the 'free' command is) the GC version will still be correct, while the
manual management version will have a serious mistake.

I see programs with elementary mistakes in spelling and grammar.
I see programs containing stuff like this:
	unsigned char a[2];
	...
	++*(int*)&a[1];
(I kid you not, I really did see that last year in a *fielded* C program,
embedded in a popular Australian gambling device.)
I run programs through LCLint and find all _sorts_ of serious mistakes.

I DON'T TRUST YOUR PROGRAMMERS.  If there are "complicated semantic
reasons" around, I expect errors.  *Especially* memory management errors.

You are also assuming that people who use GC have *only* the GC to rely
on and operate in total ignorance of what's actually happening.  This is
often regrettably the case, but it doesn't have to be so.  Check the
paper
	Lag, drag, void, and use
	heap profiling and space-efficient compilation revisited
	Niklas R\"ojemo and Colin Runciman
	pp 34--41, PLDI 96

It would be very useful to have this kind of profiling for _any_ language
that provides dynamic memory allocation:  Haskell certainly, Mercury (I
was asked for the full reference by someome in the Mercury team; they've
done such good things so far I wouldn't be surprised if they adopted
something like this), Ada, and especially C.  It isn't _enormously_ hard
to roll your own in C (one of the nicest things about C), but it would be
nice if all of us, both GC-ers and manual-memory-manglers, had *ready*
access to such a profiler.

3.	This kind of problem is probably less common than traditional memory
3.	leaks, but if your language supports automatic destruction when a
3.	variable goes out of scope (e.g. C++), then traditional memory leaks
3.	don't have to be very common either.

Um, I am aware of C++.  I have used it and studied it.  I have the draft
C++ standard.  I like constructors and destructors.  Ada has a similar
facility.  But they only provide automatic help with *unshared* structures.
And that means that your example _is_ a problem for C++; a C++ global
variable is not destroyed until the program exits.  (Link time data flow
analysis could improve this, but so it could for GCed systems.)

3.	All of which is not to say that I agree with Pinku, who is probably
3.	making the canonical mistake of confusing "garbage collection" with
3.	"emacs," but I've always valued garbage collection more for its
3.	contribution to language safety and interface design than for its
3.	purported ability to prevent memory leaks.

I think you are assailing a straw man here.  Nobody in his right mind would
say that garbage collection can PREVENT memory leaks.  Perfect garbage
collection is provably impossible, so NOTHING can prevent every imaginable
memory leak.  The issue in this thread had nothing to do with whether GC
can prevent memory leaks, but only with whether GC systems need _much more_
memory than non-GC systems, i.e. whether GC _causes_ a kind of leak.

In complex cases, good programmers don't dare to do manual deletion, at
least, not without _very_ good tools.  In the case you brought up, GC
would *in practice* not leak any more than manual management, because
manual management would *not* free the storage either.

I think it's time for my regular plug of LCLint.  If you are interested
in memory management, take the time to view

    http://larch-www.lcs.mit.edu:8001/larch/lclint/

which can help with memory management in C.  It uses annotations.
For example,

	struct Node {
	    /*@only*/ /*@null*/ struct Node *left;
	    /*@only*/ /*@null*/ struct Node *right;
	    /*@only*/ char *key;
	};

says
    left may be NULL; if it is not NULL, it is the only reference to
    the object it points to, and must eventually be free()d.
    the same about right.
    key may not be NULL.  If is the only reference to the object it
    points to, and must eventually be free()d.

If I have understood it correctly, it even supports the distinction
between "links" and "threads" presented in

	Static and Dynamic Partitioning of Pointers and Links and Threads
	D. S. Wise & J. Walgenbach
	pp 42--49, PLDI 96

via /*@owned*/ and /*@only*/ (links) and /*@dependent*/ (threads)
annotations.

The "only" annotation is similar to the uniqueness types of Clean and
the similar concept in Mercury.  _This_ kind of manual control over
storage allocation is _very_ nice to have, because it can be statically
checked, both that you are not freeing something shared and that you
*are* freeing something eventually that needs freeing.  However, even
this doesn't prevent what R\"ojemo & Runciman call "lag".