[gclist] zillion objects and gc

Boehm, Hans hans_boehm@hp.com
Tue, 28 Aug 2001 16:57:52 -0700


> From: David.Chung@USPTO.GOV [mailto:David.Chung@USPTO.GOV]
> 	Say I have 100 M worth of objects resident 
> in  memory. Most of them are not large, and their 
> pointers criss-cross everywhere. 
> 
> 	One characteristic of these objects is that a 
> fraction of them die, be it very slowly. If you GC'ed 
> these objects (via generational collector), when the 
> oldest generation has to be garbage collected, the 
> collector has to scan all the pointers that are boxed 
> in 100 M worth of objects. This can be CPU 
> intensive. 
> 
Just to be explicit about it: The issue here is latency not total CPU cost.
If you allow a constant factor of space overhead, and assume that your heap
is always significantly larger than the cache, but smaller than main memory,
and that the root size is much smaller than the heap, then the GC cost per
byte of allocation should be independent of the heap size.  But collecting
4GB will probably take a few seconds.

> 	If you impose the condition that there are no 
> cycles, then solving the preceding problem is easy: just use 
> reference counting. Because reference counting 
> only touches objects that are affected when a reference 
> is decremented, the overall cost to the above system is low, 
> even though one may have 100 M of small, pointer-filled 
> objects.
>
That's not necessarily true.  In general, if objects are small, and you
assign pointers significantly more frequently than you allocate objects, I
would expect the overall reference count cost to be higher than for a
tracing collector.  At least that's the way the experiment came out the last
time I tried it.  I'd expect it to be far higher if you have to synchronize
on reference count updates.  Also note that reference counting touches
objects that wouldn't otherwise be touched, especially when it deallocates a
large data structure that has just been dropped.  Reference counting tends
to do much better for larger objects.  And it has some other advantages.

Hans

> 	Of course, for the problem I am considering, 
> there is no guarantee that the chain of ponters 
> are acyclic.