I also forgot to mention that EnterCriticalSection takes (from what I have read) about 6 CPU cycles in the optimal, uncontended case. I have seen implementations of non-reentrant spin locks that take 5 cycles per lock (implemented using LOCK, XCHG and MOV: LOCK takes 1 CPU cycle, XCHG takes 3 CPU cycles and MOV takes 1 cycle).
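
For reference, here is roughly what such a non-reentrant spin lock looks like. This is only a minimal sketch in portable C++, not the implementation I saw: the names `SpinLock` and `count_with_spinlock` are made up for illustration, and I have not measured any cycle counts with it. On x86, `exchange()` on an atomic normally compiles to XCHG with a memory operand (which is implicitly locked, so no explicit LOCK prefix is needed), and the release store in `unlock()` normally compiles to a plain MOV.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical non-reentrant spin lock (illustrative name, not a library type).
// A thread that calls lock() twice without unlock() will deadlock itself.
class SpinLock {
public:
    void lock() {
        // Atomically write "true" and read the previous value.
        // If the previous value was already true, someone else holds the lock.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Wait on a plain load so we do not hammer the bus with
            // locked exchanges while the lock is held.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
        }
    }

    // Single attempt: returns true if the lock was acquired.
    bool try_lock() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        // A release store is enough to hand the lock back.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical helper: several threads increment a shared counter
// under the spin lock and the final value is returned.
long count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // protected by the spin lock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Note that the fast path here is one exchange to take the lock and one store to release it, which is where per-instruction cycle estimates like the ones above come from. Under contention the cost is dominated by cache-line traffic, not by the instruction count.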