[gclist] Finalizers & Reference counting.

Boehm, Hans hans_boehm@hp.com
Wed, 28 Aug 2002 11:25:58 -0700

> From: Charles Fiterman [mailto:cef@geodesic.com]
> Collection is divided by giving each processor an area to collect. If 
> processor A finds a pointer to processor B's data it notifies 
> processor B. 
> At small sizes this works fine, as we scale up it breaks down. GC 
> parallelizes quite well at one scale but not another.
I don't consider that the standard approach to parallelizing tracing.  Most parallel collectors that I know of use some sort of work-list algorithm, where the work list essentially contains grey objects.  There is no specific memory region associated with each process(or).  This assumes an SMP, or a NUMA machine with relatively small remote access penalties.
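A minimal sketch of the work-list idea, assuming a toy object model (the hypothetical `Obj` class and `parallel_mark` function are illustrative, not from any real collector). Grey objects (marked but not yet scanned) sit on one shared list, and any thread may take any of them, so no thread owns a memory region. For simplicity, a single lock protects the list and the mark bits. Real collectors typically use per-thread mark stacks and atomic mark-bit updates instead. Under CPython's GIL the threads interleave rather than run truly in parallel, so this sketch shows the structure of the algorithm, not its speedup.

```python
import threading


class Obj:
    """Hypothetical heap object: a mark bit plus outgoing pointer fields."""

    def __init__(self, name):
        self.name = name
        self.fields = []     # pointers to other Obj instances
        self.marked = False  # set once the object has been greyed


def parallel_mark(roots, num_threads=2):
    """Mark everything reachable from roots using a shared grey work list."""
    cond = threading.Condition()
    work = []  # grey objects: marked but not yet scanned
    busy = 0   # workers currently scanning an object they popped

    # Grey the roots before any worker starts.
    for r in roots:
        if not r.marked:
            r.marked = True
            work.append(r)

    def worker():
        nonlocal busy
        while True:
            with cond:
                # Sleep while the list is empty but another worker may still add to it.
                while not work and busy > 0:
                    cond.wait()
                if not work:
                    # Empty list and nobody scanning: tracing has terminated.
                    cond.notify_all()
                    return
                obj = work.pop()
                busy += 1
            # Read the object's pointer fields outside the lock.
            children = list(obj.fields)
            with cond:
                for child in children:
                    # Test-and-set the mark bit under the lock so each
                    # object is greyed (and later scanned) exactly once.
                    if not child.marked:
                        child.marked = True
                        work.append(child)
                busy -= 1
                cond.notify_all()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

For example, if `a` points to `b` and `b` points back to `a`, then `parallel_mark([a])` marks both objects exactly once. Any object reachable only from an unmarked object stays unmarked. Termination is detected when the list is empty and the `busy` count is zero, because only a thread that is scanning an object can add new grey objects.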

My impression (based on various papers I've seen and some of my own measurements, cf. http://lib.hpl.hp.com/techpubs/2000/HPL-2000-165.html) is that at least for a mark/sweep collector, this scales well unless/until it runs into memory bandwidth limitations of the underlying hardware.  Our collector takes in the vicinity of 1.5 seconds to trace a GB of pointer-containing small objects on a 2-processor Itanium 2 machine, and almost exactly twice that on a uniprocessor.
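The timing figures above imply the following back-of-envelope numbers, treating the uniprocessor time as exactly twice the 1.5 s two-processor time. The last line is a rough bandwidth-bound model added only for illustration. It is an assumption and was not measured here. In that line, p is the processor count, D is the amount of memory traced, and W is the sustainable memory bandwidth.

```latex
\begin{aligned}
T_2 &\approx 1.5\ \text{s}, \qquad T_1 \approx 2\,T_2 = 3.0\ \text{s} \\
S_2 &= \frac{T_1}{T_2} \approx 2, \qquad
\frac{1\ \text{GB}}{T_2} \approx 0.67\ \text{GB/s}, \qquad
\frac{1\ \text{GB}}{T_1} \approx 0.33\ \text{GB/s} \\
T_p &\approx \max\!\left(\frac{T_1}{p},\ \frac{D}{W}\right)
\end{aligned}
```

Under that model, each added processor keeps reducing trace time until T_1/p drops to D/W. Beyond that point, the memory system, not the processor count, limits tracing speed.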