[gclist] A problem with the Baker treadmill.

Jerry Leichter leichter@smarts.com
Wed, 22 Jan 1997 10:35:19 -0500


| ...More generally, it seems to me that this is one case in which solutions
| with good real-time properties don't behave all that well in the
| expected case.  If you want the lock to behave fairly, so that you
| get some guaranteed response from each thread, you should
| hand off the lock to a waiting thread when you release it.  But if
| you have more active threads than processors, that typically forces
| extra contex switches, very often one per lock acquisition.  The result
| may be fair, but only that you spend all your time in the scheduler and
| nobody gets anything done.

A general comment on this.

Mutexes are the machine language of parallel processing.  It's impossible to 
optimize them in any general setting because there's so little semantic 
information available.  I remember running into exactly the problem of extra 
context switches in an implementation of read/write locks:  When a writer exits, 
it walks a queue of waiting readers, waking them one at a time.  If you do this 
the obvious way, you wake a reader; it grabs the CPU, tries to take the mutex 
that controls the read/write lock internals; fails, because the writer still 
holds it; blocks; and the writer gains control to wake the next reader.  Much 
better is to pull all the readers who will need to be awakened off the queue 
first, unlock the read/write lock data structure, *then* start waking the 
readers - avoiding, for n pending readers, 2n unnecessary context switches.  
There's no way the cleverest mutex code could hope to make this optimization.

OS synchronization objects can be horribly inefficient because all sorts of 
extra features come to accrete on them.  I once needed to determine the fastest 
way to do a "wait until signaled by another process" with the System V IPC 
primitives.  You'd guess that you couldn't beat semaphores, but you'd be wrong:  
In several different implementations, messages queues were substantially faster, 
probably because of all the bookeeping that System V semaphores include.

Unfortunately, these features rarely, if ever, provide the additional semantic 
knowledge that would allow for smarter implementations.  They just make things 
complex.

The only way out of the trap is to choose good, semantically-meaningful higher- 
level abstractions.  While some of these have been proposed over the years, like 
GC, they've encountered a great deal of resistance.  *Real* programmers use 
malloc/free and spinlocks; anything else is "too inefficient".  Poor implemen- 
tations, in both cases, haven't helped.

There are glimmers of hope on the horizon for both GC and synchronization 
primitives.  At a low level, there's the apparent acceptance of the "signal one" 
vs. "signal all" distinction for condition variables.  At a high level, Java, 
for all its faults, is bringing monitors (and GC) to the masses.  It'll be 
interesting to see whether new Java implementations make monitors efficient 
enough so that the programming books can stop warning you to use "synchronized" 
only when absolutely necessary, though - if they don't, we're likely to see Java 
libraries containing mutexes for those who need "efficiency".

							-- Jerry