[gclist] Experience with conservative GC sought

Hans Boehm boehm@hoh.mti.sgi.com
Sun, 25 Jan 1998 21:34:41 -0800


A few additions to the discussion:

> Has any imprecise collector been used in a production system
> (10,000+ daily users today) that has as high an allocation rate
> as emacs?

Yes.  Some Xerox high end printers use a fully conservative collector for much
of the software, including the PostScript interpreter.  I believe the user
number is in at least this vicinity.  I suspect the allocation rate is similar
as well.  The application is of course very different.

The standard Sun Java VM (and many derivatives, including SGIs) conservatively
scan the stack.  I suspect they have a few more than 10,000 users, in spite of
the fact that this is probably not the world's greatest garbage collector.

A number of language implementations on Xerox D-machines used conservative
stack scans.  So does GNU Common Lisp, as far as I know.

I have certainly seen applications that needed tweaking to work with
conservative GC, though a large number seem to work out of the box. I don't
know of any that failed but would have worked with a "precise" GC.

Some of the Univ. of Colorado papers give space usage numbers for
conservatively collected C programs and their malloc/free counterparts.  There
was little evidence of leakage even for the unmodified applications.

I have seen no cases in which a conservatively collected system worked for some
long running inputs and then failed on others due to leakage.  Pointer
misidentifications will vary.  But in nearly all cases, a single pointer
misidentification will at most double space requirements, and usually the
effect is MUCH less.  Thus usually (lots of handwaving here, but I expect it
should actually be possible to formalize this) the probability of conservative
collection increasing the asymptotic space requirements is zero.

There are two cases in which the above is false:

1) Infinitely growing data structures, most of which gradually becomes garbage
(queues, lazily evaluated lists).

2) High misidentification probablity.  An pointer misidentification causes
storage to be retained that on average includes at least one more misidentified
pointer.

Both tend to be seen during testing, at least for applications for which the
data structures are similar between runs.  Gnerally these can also be easily
avoided, with either explicit pointer clearing or very limited type
information, once the problem is understood.  (Lazy lists are probably the
hardest case.  But they don't seem to be used much in real applications.)

The problems are not fundamentally difficult to diagnose, though the tools tend
to be lacking.

Hans





-- 
Hans-Juergen Boehm
boehm@mti.sgi.com