[gclist] What to say about GC.

Tom Lord (lord@LanMinds.Com)
Wed, 24 Jul 1996 16:17:49 -0700 (PDT)


Here is a clarification of my views on "What to say about GC."

I am not arguing that GC should not be used or that it is an inferior
technique to other approaches to memory management.  

I *am* arguing that collectors are hard to use well in complex
applications and are themselves a source of subtle errors.  I am
arguing that when we "sell" GC as a simpler technique than the
alternatives we are being misleading -- we should be selling GC as a
more powerful technique than the alternatives, but one that sometimes
is at least as tricky to use as malloc/free.

Sometimes people speak as if GC simply automated away all the
difficulties of memory management.  Common claims are that unneeded
objects are automatically freed (false) and that only the GC
implementor has to worry about GC internals (often false).  Let's
examine these.

The false idea that unneeded objects are automatically freed by a
collector stems from the fact that collectors automatically free all
objects not reachable by reference chains in the program's data
structures.  What collectors don't do, as I'm sure we all know, is
free those objects that are still reachable by reference chains but
that the running program will never use again.  For example, if there
is a list, L, and the program will never look at (cdr L), then (cdr L)
isn't needed, but it won't be freed.  Collectors require programmers
to manually maintain
the invariant that all objects which should be freed must not be
reachable by any chain of references -- therefore freeing unneeded
objects isn't automatic at all.  Any claim that collectors make it
trivial to ensure that all unneeded objects are freed is in essence a
claim that it is easy to manually maintain the invariant that unneeded
objects are not referenced -- a claim that is at best unsupported and
that can be seen to be highly unlikely.
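
To make the (cdr L) example concrete, here is a minimal C sketch,
assuming a Boehm-style conservative collector (GC_malloc, GC_gcollect
and GC_INIT from <gc.h>); the particulars are mine, for illustration
only:

    #include <gc.h>
    #include <stdio.h>

    struct cons { int car; struct cons *cdr; };

    /* A global root: everything reachable from here survives collection. */
    static struct cons *L;

    int main(void)
    {
        GC_INIT();

        /* Build a two-element list.  The rest of the program only reads
           L->car, so the second cell is unneeded -- but it stays reachable
           through L, and no collection will ever free it. */
        L = GC_malloc(sizeof *L);
        L->car = 1;
        L->cdr = GC_malloc(sizeof *L->cdr);
        L->cdr->car = 2;
        L->cdr->cdr = NULL;

        GC_gcollect();            /* the tail is reachable, so it is retained */
        printf("%d\n", L->car);

        /* Only after the programmer explicitly breaks the reference does
           the tail become unreachable and therefore collectable. */
        L->cdr = NULL;
        GC_gcollect();
        return 0;
    }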

The sometimes false idea that only the GC implementor has to worry
about GC internals is true for programmers working in high-level
languages on a fixed platform using a stable language implementation.
But it is misleading for programmers who use GC with lower-level
languages like C or C++, who extend an existing high-level language
implementation with new built-in primitives or types, or who must
port a GC beyond the platforms supported by its author.
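
Here is a hedged sketch of the kind of internals knowledge I mean,
written as a new built-in primitive in C for an imaginary runtime.
The names (obj, make_string, make_pair, gc_protect, gc_unprotect) are
invented for illustration and don't belong to any particular
implementation's API:

    /* Hypothetical runtime API -- names invented for illustration. */
    typedef struct obj obj;
    extern obj *make_string(const char *s);    /* may trigger a collection */
    extern obj *make_pair(obj *car, obj *cdr); /* may trigger a collection */
    extern void gc_protect(obj **root);        /* register a local as a root */
    extern void gc_unprotect(obj **root);

    /* A new primitive: build the pair ("hello" . "world").
       The second allocation may run the collector.  If `car' has not been
       registered as a root, a precise collector will not see the C local
       and may free (or, if copying, relocate) the first string out from
       under us.  That is exactly the GC-internal detail the extension
       writer has to know about. */
    obj *prim_hello_pair(void)
    {
        obj *car = make_string("hello");
        gc_protect(&car);                  /* keep `car' alive and traceable */
        obj *cdr = make_string("world");   /* a collection may happen here */
        obj *result = make_pair(car, cdr);
        gc_unprotect(&car);
        return result;
    }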

In simple, short-running, non-critical or otherwise permissive
programs GC can be used to automate memory management -- it won't
necessarily be 100% correct but will tend to err in the less harmful
direction of core leaks.  Most of the good press GC gets seems to
focus on this style of using it.

But in long-running or otherwise critical programs, programmers using
GC have to think about and program for memory management just as
carefully as when using malloc/free.  This is something I'd like to
see acknowledged more often.
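
As a small, admittedly contrived C sketch of that care (again assuming
a Boehm-style collector; the cache is invented for illustration),
consider a long-running server that remembers results:

    #include <gc.h>
    #include <stddef.h>

    #define CACHE_SIZE 1024

    /* An invented cache in a long-running server.  Entries stay
       reachable -- and therefore are never collected -- until the program
       explicitly drops them, just as it would have had to call free()
       under malloc/free. */
    static void *cache[CACHE_SIZE];

    void remember(size_t slot, size_t nbytes)
    {
        cache[slot] = GC_malloc(nbytes);
    }

    void forget(size_t slot)
    {
        /* Under GC, "freeing" becomes breaking the last reference;
           forgetting to do this is the collector's analogue of a
           malloc/free leak. */
        cache[slot] = NULL;
    }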

So why bother with GC in long-running or otherwise critical programs?
Because GC is a more powerful technique, able to deal with data
structures that cannot be managed using just malloc/free or reference
counts.
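
The classic example is a cycle.  Here is a small C sketch, once more
assuming a Boehm-style tracing collector, of a structure that
reference counts can never reclaim but a tracing GC can:

    #include <gc.h>
    #include <stddef.h>

    struct node { struct node *other; };

    void cycle_demo(void)
    {
        /* Two objects that refer to each other. */
        struct node *a = GC_malloc(sizeof *a);
        struct node *b = GC_malloc(sizeof *b);
        a->other = b;
        b->other = a;

        /* Drop the only outside references.  Under reference counting
           each object is still referenced by the other, so neither count
           reaches zero and the pair leaks.  A tracing collector sees that
           the whole cycle is unreachable from any root and can reclaim
           both objects at the next collection. */
        a = NULL;
        b = NULL;
        GC_gcollect();
    }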


	William Clinger wrote:

	For example, it seems to me that the compilers I have used and
	written tend to be considerably buggier than the garbage collectors
	I have used and written.  Paraphrasing Tom Lord, I could say
	
	    If compilers never changed or needed to be extended and
	    were known to be free of errors and completely reliable,
	    then we could claim that compilers were a general method
	    for translating high-level languages into correct machine
	    code, but in fact compilers do need to be changed and
	    extended and are prone to bugs.
	
	This would not be a very good argument against using compilers,
	and it doesn't change the fact that using a compiler is the only
	sane way to construct most software.  Compilers may not be an
	automated panacea that magically yields "error-free" generation
	of machine code, but they come pretty close, and many of their
	bugs arise from market pressure to emphasize efficiency over
	correctness.

This analogy is interesting and I think it underscores my points.

If we go back in time to, say, the early seventies, then I think the
hypothetical quote "If compilers never changed..." would carry a lot
of weight.  At that time (from what I can tell with hindsight) one
would have had to weigh carefully the decision to use many of the
compilers available because, indeed, the bugs might outweigh the
benefits.  The same is true today of new compilers, including
compilers for new languages.

Looking at the state of things today, the decision to use a compiler
is much easier -- but an analogy to GC still makes sense: using a
compiler doesn't necessarily save programmers from having to work very
carefully to make programs efficient.  A compiler's automated code
generation is good enough when performance isn't critical, but where
performance matters, programmers have to regard compilers as a
powerful means of explicitly controlling machine-code generation --
not as a black box that automates the problem away.  Analogously, for
applications whose memory usage is important,
GC is a powerful means of explicitly controlling object lifetimes --
not a black box that magically solves the problem.

-t