[gclist] What to say about GC.

=TL2=Tom Lord() lord@LanMinds.Com
Fri, 19 Jul 1996 13:07:53 -0700 (PDT)



    Richard Jones writes:

    The memory storage requirements of complex programs are extremely
    difficult to manage correctly by hand.  A single error may lead to
    indeterminate and inexplicable program crashes.  Worse still, failures
    are often unrepeatable and may surface only long after the program has
    been delivered to the customer.  The eradication of memory errors
    typically consumes a substantial amount of development time.  And yet
    the answer is relatively easy - garbage collection - removing the
    clutter of memory management from module interfaces which then frees
    the programmer to concentrate on the problem at hand rather than
    low-level book-keeping details.

    [....] This book considers how dynamic memory can be recycled
    automatically to guarantee error-free memory management.


I am a big fan of garbage collection and I'm looking forward to
seeing Richard Jones' book.

But I also think that the paragraphs quoted above are an exaggeration
of a sort that can mislead people about garbage collection and set
them up to be badly disappointed.  This particular exaggeration is
wide-spread; I'm sure I've passed it along myself.

The essence of the problem with the quotes comes in two parts.

First, the quotes speak of one type of memory management error,
prematurely freeing an object that is still needed, as if it is the
only kind of memory management error that needs to be considered when
in fact another type of error, failing to free memory that is no
longer needed, is just as important and is if anything, harder to get
right in a garbage collected system than in, say, a malloc/free
system.

Second, the paragraphs above suggest that using a garbage collector
completely and automatically solves at least the problem of
prematurely freeing an object.  If garbage collectors never changed or
needed to be extended and were known to be free of errors and
completely reliable, the quotes would be right but in fact garbage
collectors do need to be changed and extended and are prone to hard to
detect, repeat, and repair bugs.  On balance, a garbage collector may
be the best way to manage memory, but this option should be described
like reference counting, malloc/free and other approaches as a
fallible technique with its particular strengths and weaknesses that
needs to be applied with care.  GC is not an automated panacea that
magically yields "error-free" memory management.

We can say something along the same lines as the quotes, but less of
an exaggeration: garbage collection provides "fully general dynamic
lifetime analysis" and saves a programmer using a properly functioning
garbage collector from having to decide staticly where in a program to
free objects, so long as that programmer isn't worried about core
leaks.

When malloc and free are used, the lifetimes of objects are analyzed
*staticly* and calls to "free" are put at certain points in the
program.  To not prematurely free objects, the programmer has to
ensure that each call to free in the program is the last use to which
the freed object will be put.  To not have core leaks, the programmer
has to make sure that every object allocated is eventually freed.  For
some algorithms, it is impossible to staticly analyze the lifetimes of
objects and so malloc/free memory management is insufficient.

When reference counting is used, the lifetimes of objects are analyzed
*dynamically* and free is called when the run-time system notices an
object has died.  To use reference counting, programmers have to be
sure that every time a reference to some object is either stored or
overwritten, the reference count for that object is adjusted
accordingly.  To avoid premature freeing, programmers have to prove
that reference count adjustments are always made when needed and are
always properly balanced -- this is trivial because it can be
automated as an extension to the usual behavior of pointer assignment.
To avoid core leaks, programmers have to ensure that whenever a
particular reference to an object is no longer needed, that reference
is broken.  Deciding where to break references is always at least as
easy as deciding where to call "free" but whether it is really easy or
not is a property of the semantics of the application and the data
it manages.

For some algorithms, it is impossible to use reference counts.
Reference counting is not fully general -- it fails if objects refer
to one and other in a cycle.  I'm sure there are countless other
examples besides simple reference counting of memory management
techniques that dynamically compute lifetimes, but that are less than
fully general.

Garbage collection is any technique that computes lifetimes
dynamically, and that is fully general -- it isn't stumped by cycles
or any other kind of pattern of reference.  To prevent premature
freeing using GC, "all" a programmer needs to do is properly maintain
the GC implementation (more on this later).  To prevent core leaks,
the obligation is the same as with reference counting -- to break
references after they are no longer needed.  Whether it is easy or
hard or even possible to prevent core leaks when using GC depends on
what algorithms are used.

Beyond basic garbage collection are techniques that compute lifetimes
dynamically, that are fully general, and that support one or more
kinds of "conditional reference".  For example, a "weak reference"
from A to B might be automatically broken if no stronger references to
B exist.  Or a "tied weak reference" A to B might be broken if some
third object, C, no longer exists.  I suspect there are many
irreducible variations.  Weak references expand the range of garbage
collected systems -- there are algorithms made practical by their presence
that aren't practical otherwise.  But weaks reintroduce the problem of 
premature freeing in a form that is, if anything, more slippery than
the corresponding problem of a malloc/free environment.

Did I mentioned the bugs yet?  A garbage collector must be maintained,
ported to new platforms, and extended to deal with new data types.
All of those are intricate and error-prone activities.  Some garbage
collectors require programmers to follow strict conventions in their
code or face GC errors.  Fixing, repeating, and even detecting these
bugs is very hard and the penalties for mistakes are similar to those
for mistakes using malloc/free: corrupt memory, premature freeing, and
core leaks.

So then what is garbage collection since it is not a panacea that
yields "error-free memory management"?  It is not a perfect solution
to the problems of memory management it is simply a subset of
techniques known for dealing with them -- a range of trade-offs
available to programmers who need to adopt conventions for allocating
and freeing objects.

What is GC good for?

- Short-running, experimental, or fragmentary programs in which some
  storage reclamation is a good idea, but some core leaks are also
  acceptable.   These are the kinds of programs in which it is 
  most true that GC "frees the programmer to concentrate on the 
  problem at hand rather than low-level book-keeping details."

- Long-running, installed, complete programs in which storage
  reclamation is important and for which static techniques,
  reference counting and other less-than-fully-general approaches
  are inadequate or may be inadequate for foreseeable enhancements.

I think that combination of strengths makes GC an ideal choice for a
default development environment -- it imposes the fewest restrictions
on the forms programs can take and it supports exploratory, incremental, 
and interactive program development.   When Richard Jones implies that GC
is good for "the memory storage requirements of complex programs" I agree.
But "error-free memory management"?  Not even close.

-t
Tom Lord