[gclist] conservative vs. "precise" gc

Hans Boehm boehm.PARC@xerox.com
Sun, 25 Feb 1996 22:43:19 PST


Darius writes in response to my description of minimally conservative
collection:

"I'd be very interested in any true measure of this risk, and more
globally, with any measure which relates to where one hopes that
everything will go right, and where everything can go wrong.
Some people say real problem do not occur in practice very often,
but what is very often ?"

I don't know of any collector that actually only scans the top stack
frame(s) conservatively.  If you did this, the risk should be essentially
zero, since any false pointer will cease to be a false pointer if you
ever trigger a collection with a different frame on top of the stack.
Since you probably really have to be conservative only if the thread in
question was preempted, the collector would have to repeatedly preempt
a given thread with the same top stack frame containing the same bogus
pointer.

For higher degrees of conservativism, it's not clear how you measure the
risk.  My experience (based heavily on feedback from others), is that
it's rare to have any problems, even with full conservativism, assuming
the collector does all it can to minimize the risk.  I don't think
conservative collector leakage is a major factor in any of the Zorn
benchmarks.  (Zhong Shao looked at 4 or 5 of them, and essentially precluded the
possibility for all but one based on live data found by the collector.
And the problem with the last one may well have been a pointer variable
pointing to dead heap cells.)

In other code, there have been a few cases of significant leakage
reported by others.  But
so far they have all been fixable by providing a very small amount of
information about layout of heap objects to the collector.  The problems
are not equally distributed.  Certain kinds of data in the heap or static
variables are problematic unless the collector is informed that they are
not pointers.  Bitmaps (esp. compressed) and bignums come to mind.
A problem in the Sather compiler was due to an integer field in each
parse tree node that contained some cleverly encoded position information,
if I remember correctly.  Simply
using the pointerfree allocation primitive in one or two places is likely
to fix the problem, except in the Sather case, which required a slightly
more invasive solution.  None of these problems would have occurred if
only the stack was scanned conservatively.

Another common cause of conservative GC leakage in C are custom allocators
built on top of the garbage collecting allocator.  These allocators
tend to keep references to objects they consider to be free space, but
which isn't cleared.  Thus the collector will continue to traverse
such free lists etc. and mark objects accessible from them.  But this has
nothing to do with conservativism.  It has to do with C programming styles
that are poorly tuned for garbage collection.

Hans-J. Boehm
Standard disclaimer ...