[gclist] A problem with the Baker treadmill.

David Petrie Stoutamire david@stoutamire.com
Tue, 14 Jan 1997 09:44:34 -0800

Charles Fiterman wrote:
> >Thats exactly what I'm doing and have run against.
> >If you're using a treadmill per size class then
> >you're only locking a treadmill for one size of object,
> >but still..
> I'm doing Solaris Threads for a start and the overhead
> is in the lock call. It doesn't matter if other threads
> are running. That means that since the lock is for a
> small number of instructions we should lock on the
> whole collector rather than lock twice. It suggests use
> of a spin lock written in assembler. In fact the requeue
> operation can profitably be in assembler with the spin lock.

I can confirm that writing your own spinlocks is the
right thing to do.  In the pSather implementation, which
aggressively combines active messages and multiprocessing,
I looked carefully at the performance of hand-coded spinlocks
vs. Solaris primitives.  The punchline: you can save up to two orders
of magnitude on some systems by using spinlocks when there is
little contention.  The Solaris primitives are not very efficient,
and you don't want to even consider using anything heavier than
mutex; all the others are extremely bad (as of 2.5).  You can
write your own eg. semaphores in terms of spinlocks and mutex
and they will be far faster because the common case won't require
system calls.
Of course you can only want to use spinlocks for short, non-blocking
critical sections, etc.  I use them in the internals of my GC for
Sather under Solaris.  These results come from 4 cpu SS10 and 
uniprocessor Ultra systems; I haven't analyzed the x86 implementation
yet.  I also can't speak for IRIX, etc.

Beware cache effects associated with context switching!  It's quite
easy to drop more orders of magnitude there.

   - Dave

David Stoutamire