[gclist] When to garbage collect

David Gadbois gadbois@cyc.com
Fri, 14 Nov 1997 11:07:30 -0600 (CST)


   From: JanFriso.Groote@cwi.nl
   Date: Fri, 14 Nov 1997 10:02:01 +0100 (MET)

   I would like to know whether there exist references to papers
   dealing with when to really do a Mark-Sweep garbage collection.

There is some discussion in Jones' and Lin's book, but techniques for
doing this well tend to be black art and highly application-specific.
Most systems use 1) triggering when some absolute heap size is
reached; 2) Doing some set amount of GC "work" per allocation; or 3)
heuristicating based on how much space was recovered in previous
collections.

I recently tried a technique that sounded promising but didn't work.
I will describe it here so you can avoid it:  Assume a generational GC
system with a mechanism for recording stores of references to young
objects into older object.  Trigger a GC when the number of these
references reach a certain level.  The intuitions of why this approach
could win are 1) Testing of whether to GC is moved out of the
frequently used allocation path and into the infrequently used write
barrier trap code; and 2) The amount of GC work to be done is a
function of how many forward generation references there are.  The
reason why this approach actually loses is that, for the application
in question at least, the Ordering Assumption does hold, merely
counting the number of forward references missed most allocation, and
the heap size tended to get out of hand.

An approach that works much better is to do a GC when the size of the
youngest generation hits some limit.  I initially used the size of
physical memory (the application has a heap size that is usually much
larger than physical memory) and eventually found a sweet spot at
about half the physical memory size.  Wilson suggests keeping the size
limit down to secondary cache size.

This was for a copying system.  Not sure how it would translate to
mark-and-sweep,

--David Gadbois