[gclist] My copying collector or Boehm's?

Boehm, Hans hans_boehm@hp.com
Fri, 16 Mar 2001 09:19:22 -0800


A few observations:
>     With Boehm's collector, my interpreter runs about 2.5
> times slower than before.  This is not a knock on Boehm's 
> collector.
I still find that surprising, since a 2.5x slowdown suggests that your
interpreter is spending >60% of its time (1.5/2.5 of the new running time)
allocating/collecting with our collector, something I don't normally see.
A profile and GC log would be interesting to me.  In particular, it would
be nice to know:
a) How much time is spent in the marker?
b) How much time is spent locking around allocation?  (The win32 threads
port uses EnterCriticalSection() and LeaveCriticalSection().  On many other
platforms a custom locking scheme is used instead, since the standard one
exhibited serious performance problems, some form of convoying being the
most common.  I've heard from several sources that at least some
implementations of the win32 primitives have similar problems, so a custom
solution should probably be used there as well.  That would be fairly easy,
but I haven't
gotten around to it yet.  A confirmation that it really is a problem would
help.)
c) What fraction of the time is spent context switching (see (b))?
d) Does the amount of live data in the GC log look right?
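
To make item (b) concrete: a custom allocation lock of the kind mentioned
there can be as small as a test-and-set spin lock.  The sketch below is only
an illustration written in C11 (which postdates this message), not the
collector's actual locking code.  The names alloc_lock, alloc_try_lock,
alloc_lock_acquire, alloc_lock_release and locked_alloc are made up for the
example, and malloc() stands in for the real allocation path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocation lock: a plain test-and-set spin lock. */
static atomic_flag alloc_lock = ATOMIC_FLAG_INIT;

/* Try once.  test_and_set returns the previous value, so "false"
   means the flag was clear and we now own the lock. */
bool alloc_try_lock(void) {
    return !atomic_flag_test_and_set_explicit(&alloc_lock,
                                              memory_order_acquire);
}

void alloc_lock_acquire(void) {
    while (!alloc_try_lock()) {
        /* Spin.  A real implementation would yield or sleep after a
           bounded number of attempts instead of spinning forever. */
    }
}

void alloc_lock_release(void) {
    atomic_flag_clear_explicit(&alloc_lock, memory_order_release);
}

/* Hypothetical locked allocation path; malloc() is a stand-in. */
void *locked_alloc(size_t n) {
    assert(n > 0);
    alloc_lock_acquire();
    void *p = malloc(n);
    alloc_lock_release();
    return p;
}
```

Whether something like this beats EnterCriticalSection() for a given
workload is exactly what the profile would have to show.  A pure spin lock
behaves badly if the lock holder is descheduled, which is why the comment
above suggests yielding after a few failed attempts.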
> The drop off in performance was expected, 
> because (1) my original collector is turned on and off 
> at precise points in my C++ code to minimize
> collection
That's potentially a big win, clearly.  You can also do that with our
collector, though it may be much less effective with a global heap than
with per-thread heaps.

> (2) my collector uses type information all the time, 
My experience has been that for small objects this matters in the expected
case only to the extent that it reduces the overhead of checking a real
pointer, i.e. only if you can actually reduce checking on a "pointer" field
to a comparison against null.  That requires that you disallow pointers to
statically allocated data, which typically requires some copying for
constants.  And even the significance of that has been decreasing as GC
costs become more dominated by the costs of cache misses on the data being
traced/copied.
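
The difference described above can be sketched as two per-field checks.
This is a simplified illustration, not code from either collector:
demo_heap, conservative_is_candidate and precise_needs_trace are
hypothetical names, and a real conservative collector does more validation
than a single range check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the collected heap. */
static char demo_heap[4096];

/* Without type information any word may or may not be a pointer, so
   each candidate must at least be range-checked against the heap
   (a real collector also validates the block and object start). */
bool conservative_is_candidate(uintptr_t word) {
    uintptr_t lo = (uintptr_t)demo_heap;
    uintptr_t hi = lo + sizeof demo_heap;
    return word >= lo && word < hi;
}

/* With type information, and with pointers to statically allocated
   data disallowed, a pointer field holds either NULL or a heap
   pointer, so a single comparison is enough. */
bool precise_needs_trace(const void *field) {
    return field != NULL;
}
```

Note that a pointer to a static constant passes the null test but does not
point into the heap.  That is why the cheap check is only valid if such
pointers are ruled out, for instance by copying constants into the heap.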

> (3) it uses no locks for allocation, because it has a
> separate heap for each thread, and
Potentially a large win, but very restrictive on the client code.
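
For illustration, lock-free allocation from a per-thread heap can look
roughly like the following bump allocator.  This is a sketch under
assumptions, not the original poster's collector: thread_heap,
thread_heap_used, thread_alloc and the arena size are hypothetical, and it
uses C11 thread-local storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define THREAD_HEAP_BYTES (16 * 1024)   /* hypothetical arena size */

/* Every thread gets its own arena and its own bump offset, so the
   allocation path needs no lock at all. */
static _Thread_local _Alignas(16) unsigned char thread_heap[THREAD_HEAP_BYTES];
static _Thread_local size_t thread_heap_used = 0;

void *thread_alloc(size_t n) {
    /* Round up so every object stays 16-byte aligned. */
    size_t rounded = (n + 15) & ~(size_t)15;

    /* Reject overflowed sizes and requests that do not fit.  A real
       collector would collect this thread's heap here instead. */
    if (rounded < n || rounded > THREAD_HEAP_BYTES - thread_heap_used)
        return NULL;

    void *p = thread_heap + thread_heap_used;
    thread_heap_used += rounded;
    return p;
}
```

The restriction on client code shows up as soon as an object allocated this
way is referenced from another thread: the owning thread can then no longer
collect or move it without coordinating with the others.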

> (4) heap residency 
> was low for the test cases, which favors a copying 
> collector over mark-sweep.
Somewhat.  But it also helps our collector a lot, provided you similarly
increase the heap size.  (This should work better in 6.0 than in the 5.x
releases.)  If it doesn't help, I would definitely suspect the locking code.
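
A rough way to see why a larger heap helps a tracing collector when
residency is low: under the simplified assumption that each collection
traces the live data once and frees everything else, the tracing work per
allocated byte is live / (heap - live).  The function below just encodes
that model; it is an illustration, not a measurement of either collector,
and its name is made up.

```c
#include <assert.h>

/* Simplified model (an assumption for illustration): the collector
   runs when the heap fills, does work proportional to the live data,
   and thereby frees (heap - live) bytes for new allocation.
   Returns bytes traced per byte allocated. */
double trace_cost_per_allocated_byte(double live_bytes, double heap_bytes) {
    assert(live_bytes >= 0.0 && heap_bytes > live_bytes);
    return live_bytes / (heap_bytes - live_bytes);
}
```

With 10 MB live in a 20 MB heap the model gives 1 byte traced per byte
allocated; growing the heap to 40 MB drops that to 1/3.  A semispace
copying collector fits the same model with heap_bytes taken as the size of
one semispace.  With the collector discussed here, the heap size can be
influenced through, for example, GC_expand_hp() or the
GC_free_space_divisor variable declared in gc.h.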

Hans