[gclist] synchronization cost (was: Garbage collection and XML)

Ji-Yong D. Chung virtualcyber@erols.com
Fri, 9 Mar 2001 19:28:24 -0500


    Hi,


> > Is that an X86 machine?  I just timed a Pentium III/500/100 machine at
> > something near 25 cycles per "lock; cmpxchgl".  I'm interested because
> > I've sometimes heard the claim that X86 is particularly bad at this, but
> > that hasn't really been consistent with my experience.  Is this chipset
> > dependent, perhaps?

(1)    A few years ago, I had the opportunity to do some measurements
on CMPXCHG, and if I remember correctly, the 25-cycle figure above is
pretty close to what I got: I was seeing about 30-40 instructions per
CMPXCHG, and more for CMPXCHG8B, on a 200 MHz Pentium II (Windows NT 4.0).
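For anyone who wants to repeat this kind of measurement, here is a rough sketch in C11. It assumes that `atomic_compare_exchange_strong` on an `int` compiles to `lock cmpxchg` on x86 (GCC and Clang normally do this, but check the generated assembly). The helper name `time_cas` is hypothetical. It uses the C11 `timespec_get` wall clock and reports nanoseconds per operation; to get cycles, multiply by the clock rate in GHz. It measures only the best case: one thread, no contention, the word hot in cache.

```c
#include <stdatomic.h>
#include <time.h>

static atomic_int word;

/* Hypothetical helper: perform n uncontended compare-and-swap operations on
 * one word and return the average cost in nanoseconds. The word alternates
 * between 0 and 1, so every CAS is expected to succeed; the number of
 * successful swaps is stored in *ok so the caller can sanity-check the loop. */
static double time_cas(long n, long *ok)
{
    struct timespec t0, t1;
    long successes = 0;

    atomic_store(&word, 0);
    timespec_get(&t0, TIME_UTC);
    for (long i = 0; i < n; i++) {
        int expected = (int)(i & 1);          /* value currently in word */
        int desired = (int)((i + 1) & 1);     /* flip it */
        /* On x86 this typically becomes a single "lock cmpxchg". */
        if (atomic_compare_exchange_strong(&word, &expected, desired))
            successes++;
    }
    timespec_get(&t1, TIME_UTC);

    *ok = successes;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)n;
}
```

Calling `time_cas(10000000L, &ok)` from a small `main` and printing the result gives a per-operation figure. A loop this simple also counts loop overhead, so treat the number as an upper bound on the CMPXCHG cost itself.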

(2)   If one wants a faster locking mechanism
for a garbage collector on Windows NT (single process,
multithreaded), one might consider EnterCriticalSection.
In many cases it is MUCH faster than mutexes and other
synchronization mechanisms (which are likely based on CMPXCHG).

    However, see

http://www.cs.wustl.edu/~schmidt/win32-cv-1.html


(3) Does anyone know how EnterCriticalSection is implemented?

I tried writing semaphores, mutexes, and shared semaphores
based on CMPXCHG and CMPXCHG8B, but my implementations
were always much slower than EnterCriticalSection.  I suspected
that it was not using CMPXCHG and that this was why it
could be so fast, but I could never be sure.
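For comparison, here is a minimal sketch of the kind of CMPXCHG-based lock described in (3), written with C11 atomics. It is not a description of how EnterCriticalSection is actually implemented. The names `cas_lock` and `CAS_LOCK_SPINS` and the spin budget are hypothetical. The point it illustrates is the usual reason a user-mode lock beats a kernel mutex: when the lock is free, acquiring it costs one compare-and-swap and no system call. The POSIX `sched_yield` call stands in for whatever blocking mechanism a real implementation would use under contention.

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical CMPXCHG-based lock: state 0 = free, 1 = held. */
typedef struct {
    atomic_int state;
} cas_lock;

#define CAS_LOCK_SPINS 1000   /* assumed spin budget before yielding */

static void cas_lock_init(cas_lock *l)
{
    atomic_init(&l->state, 0);
}

static void cas_lock_acquire(cas_lock *l)
{
    for (;;) {
        /* Spin in user mode: read the word with a plain load and attempt
         * the compare-and-swap only when it looks free. No system call. */
        for (int i = 0; i < CAS_LOCK_SPINS; i++) {
            int expected = 0;
            if (atomic_load_explicit(&l->state, memory_order_relaxed) == 0 &&
                atomic_compare_exchange_weak_explicit(&l->state, &expected, 1,
                        memory_order_acquire, memory_order_relaxed))
                return;
        }
        /* Contended: give up the CPU, then spin again. */
        sched_yield();
    }
}

static void cas_lock_release(cas_lock *l)
{
    /* A plain release store suffices; only the holder writes 0. */
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

Reading the word before attempting the swap (test-and-test-and-set) keeps waiting threads from issuing locked instructions back to back, which matters given the per-CMPXCHG cost discussed in this thread.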