[gclist] A problem with the Baker treadmill.
Jerry Leichter
leichter@smarts.com
Wed, 22 Jan 1997 10:35:19 -0500
| ...More generally, it seems to me that this is one case in which solutions
| with good real-time properties don't behave all that well in the
| expected case. If you want the lock to behave fairly, so that you
| get some guaranteed response from each thread, you should
| hand off the lock to a waiting thread when you release it. But if
| you have more active threads than processors, that typically forces
| extra contex switches, very often one per lock acquisition. The result
| may be fair, but only that you spend all your time in the scheduler and
| nobody gets anything done.
A general comment on this.
Mutexes are the machine language of parallel processing. It's impossible to
optimize them in any general setting because there's so little semantic
information available. I remember running into exactly the problem of extra
context switches in an implementation of read/write locks: When a writer exits,
it walks a queue of waiting readers, waking them one at a time. If you do this
the obvious way, you wake a reader; it grabs the CPU, tries to take the mutex
that controls the read/write lock internals; fails, because the writer still
holds it; blocks; and the writer gains control to wake the next reader. Much
better is to pull all the readers who will need to be awakened off the queue
first, unlock the read/write lock data structure, *then* start waking the
readers - avoiding, for n pending readers, 2n unnecessary context switches.
There's no way the cleverest mutex code could hope to make this optimization.
OS synchronization objects can be horribly inefficient because all sorts of
extra features come to accrete on them. I once needed to determine the fastest
way to do a "wait until signaled by another process" with the System V IPC
primitives. You'd guess that you couldn't beat semaphores, but you'd be wrong:
In several different implementations, messages queues were substantially faster,
probably because of all the bookeeping that System V semaphores include.
Unfortunately, these features rarely, if ever, provide the additional semantic
knowledge that would allow for smarter implementations. They just make things
complex.
The only way out of the trap is to choose good, semantically-meaningful higher-
level abstractions. While some of these have been proposed over the years, like
GC, they've encountered a great deal of resistance. *Real* programmers use
malloc/free and spinlocks; anything else is "too inefficient". Poor implemen-
tations, in both cases, haven't helped.
There are glimmers of hope on the horizon for both GC and synchronization
primitives. At a low level, there's the apparent acceptance of the "signal one"
vs. "signal all" distinction for condition variables. At a high level, Java,
for all its faults, is bringing monitors (and GC) to the masses. It'll be
interesting to see whether new Java implementations make monitors efficient
enough so that the programming books can stop warning you to use "synchronized"
only when absolutely necessary, though - if they don't, we're likely to see Java
libraries containing mutexes for those who need "efficiency".
-- Jerry