[gclist] collector optimization

Boehm, Hans hans_boehm@hp.com
Mon, 12 Mar 2001 09:56:26 -0800

As David pointed out, it's very helpful to tell the collector which objects
are completely pointer-free.  Besides reducing the potential for false
pointers, this reduces the number of cache lines and pages touched during
GC, sometimes by a large fraction.
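
To make the effect concrete, here is a minimal toy sketch of a conservative
marker that skips objects flagged as pointer-free.  Everything in it
(toy_obj, toy_mark, toy_demo, the fixed 4-word payload) is hypothetical and
only models the idea; it is not the collector's code or data layout.  In the
real collector the same promise is made by allocating the object through the
pointer-free ("atomic") allocation call, GC_MALLOC_ATOMIC in gc.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_WORDS 4
#define TOY_MAX   16

/* Hypothetical toy object; NOT the collector's real object layout. */
typedef struct toy_obj {
    int pointer_free;             /* client promised: no pointers inside */
    int marked;
    uintptr_t words[TOY_WORDS];   /* payload */
} toy_obj;

typedef struct toy_result {
    size_t words_scanned;         /* payload words the marker had to read */
    int garbage_retained;         /* 1 if the unreachable object got marked */
} toy_result;

/* Conservative lookup: does word w equal the address of a heap object? */
static toy_obj *toy_find(toy_obj *heap, size_t n, uintptr_t w) {
    for (size_t i = 0; i < n; i++)
        if ((uintptr_t)&heap[i] == w) return &heap[i];
    return NULL;
}

/* Mark from root; returns how many payload words were examined. */
static size_t toy_mark(toy_obj *heap, size_t n, toy_obj *root) {
    toy_obj *stack[TOY_MAX];      /* each object is pushed at most once */
    size_t top = 0, scanned = 0;
    assert(n <= TOY_MAX && root != NULL);
    root->marked = 1;
    stack[top++] = root;
    while (top > 0) {
        toy_obj *o = stack[--top];
        if (o->pointer_free) continue;        /* payload is never touched */
        for (size_t i = 0; i < TOY_WORDS; i++) {
            toy_obj *t = toy_find(heap, n, o->words[i]);
            scanned++;
            if (t != NULL && !t->marked) { t->marked = 1; stack[top++] = t; }
        }
    }
    return scanned;
}

/* heap[0] is the root, heap[1] a data buffer, heap[2] unreachable garbage.
   The buffer holds an integer that happens to equal heap[2]'s address. */
toy_result toy_demo(int buffer_is_pointer_free) {
    toy_obj heap[3] = {{0}};
    toy_result r;
    heap[0].words[0] = (uintptr_t)&heap[1];   /* genuine pointer */
    heap[1].words[0] = (uintptr_t)&heap[2];   /* plain data, looks like a pointer */
    heap[1].words[1] = 12345;
    heap[1].pointer_free = buffer_is_pointer_free;
    r.words_scanned = toy_mark(heap, 3, &heap[0]);
    r.garbage_retained = heap[2].marked;
    return r;
}
```

With the buffer left unflagged, toy_demo(0) scans the payload of all three
objects and keeps the garbage object alive through the false pointer.  With
the buffer flagged, toy_demo(1) scans only the root's payload and the garbage
object stays unmarked.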

There are several ways to pass more detailed layout information to the
collector.  The C typed allocation interface is one.  Gcj uses another
that's more geared towards a world in which every object has a "vtable"
pointer anyway.  Both help primarily by reducing the potential for false
pointers.  The impact on typical GC time is usually minimal, since this
often doesn't change the set of cache lines that need to be touched during
GC by much.  (If you end up using primarily one of these facilities, it may
be worth restructuring the mark loop to check for the most common case
first.  Currently it assumes that it will most often be asked to scan
sequential ranges of memory, as opposed to ranges described by bit maps.)
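
The difference between scanning a sequential range and scanning a range
described by a bit map can be sketched as follows.  This is a toy model under
stated assumptions: scan_range, scan_bitmap and demo_candidates are
hypothetical names, and the convention that bit i (least significant first)
describes word i is chosen here for illustration.  The collector's own typed
interface (gc_typed.h, with GC_make_descriptor and
GC_malloc_explicitly_typed) should be consulted for the real descriptor
format.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Sequential-range scan: every nonzero word is a pointer candidate. */
size_t scan_range(const uintptr_t *obj, size_t nwords, uintptr_t *out) {
    size_t n = 0;
    for (size_t i = 0; i < nwords; i++)
        if (obj[i] != 0) out[n++] = obj[i];
    return n;
}

/* Bitmap scan: only words whose bit is set in layout are candidates.
   Bit i (counting from the least significant bit) describes word i. */
size_t scan_bitmap(const uintptr_t *obj, size_t nwords,
                   unsigned long layout, uintptr_t *out) {
    size_t n = 0;
    assert(nwords <= sizeof layout * CHAR_BIT);
    for (size_t i = 0; i < nwords; i++)
        if (((layout >> i) & 1UL) && obj[i] != 0) out[n++] = obj[i];
    return n;
}

/* Hypothetical 3-word object: { next pointer, integer count, data pointer }.
   Returns the number of pointer candidates the chosen scan produces. */
size_t demo_candidates(int use_bitmap) {
    uintptr_t obj[3] = { 0x1000, 42, 0x2000 };
    uintptr_t out[3];
    return use_bitmap ? scan_bitmap(obj, 3, 0x5UL, out)   /* words 0 and 2 */
                      : scan_range(obj, 3, out);
}
```

Both scans read the same three words, so the memory touched is the same; the
bit map only stops the integer field from being treated as a possible
pointer.  That matches the point above: the gain is mostly in false-pointer
avoidance, not in GC time.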

As David also points out, there are hooks for controlling when garbage
collections are triggered.  These are worth using if you notice that you are
spending significant amounts of time in badly timed collections, i.e.
collections in which nothing gets reclaimed.
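
As an illustration of the kind of client-side policy such hooks make
possible, here is a minimal sketch that backs off when a collection reclaims
little.  The gc_policy type, the function names, the one-quarter ratio and
the doubling step are all assumptions made up for this example; this is not
the collector's own triggering heuristic.  A real client would act on a
"collect now" answer through the collector's hooks, for example with an
explicit GC_gcollect() call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client-side bookkeeping; not part of the collector. */
typedef struct gc_policy {
    size_t threshold;   /* bytes to allocate between collections */
    size_t since_gc;    /* bytes allocated since the last collection */
} gc_policy;

/* Record an allocation; returns 1 if a collection should be requested. */
int policy_on_alloc(gc_policy *p, size_t bytes) {
    assert(p != NULL);
    p->since_gc += bytes;
    return p->since_gc >= p->threshold;
}

/* Call after a collection.  If less than a quarter of what was allocated
   since the previous collection came back, the collection was badly timed,
   so wait twice as long before the next one. */
void policy_after_gc(gc_policy *p, size_t reclaimed) {
    assert(p != NULL);
    if (reclaimed < p->since_gc / 4)
        p->threshold *= 2;
    p->since_gc = 0;
}

/* Runs one allocate/collect cycle and returns the resulting threshold. */
size_t policy_demo(size_t reclaimed) {
    gc_policy p = { 1000, 0 };
    int want_gc = policy_on_alloc(&p, 600);   /* 600 < 1000: not yet */
    assert(!want_gc);
    want_gc = policy_on_alloc(&p, 600);       /* 1200 >= 1000: collect */
    assert(want_gc);
    policy_after_gc(&p, reclaimed);
    return p.threshold;
}
```

In this sketch a collection that reclaims 100 of 1200 bytes doubles the
threshold to 2000, while one that reclaims 900 bytes leaves it at 1000.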

In your environment, you could probably get a significant win (10% of GC
time for X86?) if you are willing to tie the GC object code to a specific
machine.  Enabling prefetching in the marker often results in a significant
reduction of GC time. (See my ISMM 2000 paper.)  Code to do this currently
exists for Linux/X86, but not NT.  NT support should be easy to add, assuming
there is a way to tell the compiler to generate a prefetch instruction.  The
problem is that you need either a Pentium II+ or a recent AMD processor, and
the Intel and AMD prefetch instructions are incompatible.  I've been
considering optionally including all versions, and selecting among them with a
dynamic test for the processor type.  That is not implemented yet.
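
The general shape of prefetching in a marker can be sketched as below.  This
is a toy model, not the collector's mark loop: toy_node, toy_mark and
TOY_PREFETCH are hypothetical names, and the sketch assumes a compiler that
provides __builtin_prefetch (GCC and Clang do); elsewhere the macro expands
to nothing.  Mark bits are kept in a side table so that testing a mark bit
does not itself touch the object.

```c
#include <assert.h>
#include <stddef.h>

/* GCC and Clang provide __builtin_prefetch; elsewhere this is a no-op. */
#if defined(__GNUC__)
#define TOY_PREFETCH(addr) __builtin_prefetch(addr)
#else
#define TOY_PREFETCH(addr) ((void)(addr))
#endif

#define TOY_KIDS 2
#define TOY_MAX  64

/* Hypothetical toy node: children are heap indices, -1 means "none". */
typedef struct toy_node {
    int kid[TOY_KIDS];
} toy_node;

/* Marks everything reachable from root and returns the number of objects
   marked.  An object is prefetched when it is pushed, so that its cache
   line is (ideally) present by the time it is popped and scanned. */
size_t toy_mark(const toy_node *heap, size_t n,
                unsigned char *marks, int root) {
    int stack[TOY_MAX];           /* each object is pushed at most once */
    size_t top = 0, count = 0;
    assert(n <= TOY_MAX && root >= 0 && (size_t)root < n);
    if (marks[root]) return 0;
    marks[root] = 1;
    stack[top++] = root;
    while (top > 0) {
        const toy_node *o = &heap[stack[--top]];
        count++;
        for (int i = 0; i < TOY_KIDS; i++) {
            int k = o->kid[i];
            if (k < 0) continue;
            assert((size_t)k < n);
            if (marks[k]) continue;
            marks[k] = 1;
            TOY_PREFETCH(&heap[k]);   /* start the load now, use it later */
            stack[top++] = k;
        }
    }
    return count;
}

/* 0 -> {1,2}, 1 -> {2}, 2 -> {}, 3 -> {0}; node 3 is unreachable from 0.
   Returns the number marked, or -1 if node 3 was wrongly marked. */
int toy_prefetch_demo(void) {
    toy_node heap[4] = { {{1, 2}}, {{2, -1}}, {{-1, -1}}, {{0, -1}} };
    unsigned char marks[4] = { 0, 0, 0, 0 };
    size_t count = toy_mark(heap, 4, marks, 0);
    return marks[3] ? -1 : (int)count;
}
```

With a plain LIFO stack the most recently pushed object is popped almost
immediately, which leaves the prefetch little time to complete.  How far
ahead to prefetch is a tuning question that this sketch does not address.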

I would expect that for something like a Scheme implementation, the 6.x
versions of the collector will outperform the 5.x versions, due to a more
refined GC triggering heuristic.

I've found that under Linux the collector is now occasionally faster in
incremental/generational mode.  That's application dependent.  I'm not sure
whether that's true under NT, since (based on obsolete anecdotal evidence
only) I believe the signal/exception handling overhead for the VM write
barrier is higher under NT.


> -----Original Message-----
> From: Ji-Yong D. Chung [mailto:virtualcyber@erols.com]
> Sent: Saturday, March 10, 2001 3:12 PM
> To: gclist@iecc.com
> Subject: [gclist] collector optimization
>     Hi,
>     I just finished replacing my copying collector with 
> Boehm's collector.  (I used the included C++ interface 
> on VC++6.0, NT platform).  
>     Eventually, I would like to try optimizing it for
> speed.
>     Does anyone know if there are application specific 
> optimizations I can try with Boehm's collector? 
> More specifically, I am wondering if there are parts of Boehm's 
> code that are known to be hackable for application
> specific optimization  -- I mean no disrespect to 
> Boehm or to Boehm's collector, here  :)
>     I do not mean just changing values of the tuning 
> hooks that are provided, as I have done much of that.
>     Thanks in advance, for any information related
> to the collector optimization.
> Take Care
> Ji-Yong D. Chung