[gclist] Re: RE: synchronization cost (was: Garbage collection and XML)

Jan-Willem Maessen jmaessen@mit.edu
Fri, 16 Mar 2001 20:37:10 -0500


Manoj Plakal <plakal@cs.wisc.edu> replied to my post on Intel synchronization:
> >   The LOCK prefix can be prepended only to the following instructions
> >   and to those forms of the instructions that use a memory operand:
> >   ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, DEC, INC, NEG, NOT, OR, SBB,
> >   SUB, XOR, XADD, and XCHG. ...
> > 
> > That's a pretty long list, and one of the above instructions is
> > usually what you actually want.  For example (getting back to GC
> > here), BTC/BTR/BTS allow you to atomically update a shared
> > allocation/mark bitmap efficiently.
> 
>         If you look at the part of the Intel manuals describing
>         optimizations for the Pentium-II/III/IV, I think you'll
>         find that they deprecate the use of prefixes like this.

After a bit of back-and-forth with Manoj, my conclusion is that this
was probably a misreading of (paraphrasing) "Don't use prefixes except
for 0F".  This seems to refer specifically to the 0F escape byte, which
introduces multi-byte opcodes such as MMX, SIMD, CMPXCHG, and the like.
My impression is that the worry is that prefixes bottleneck instruction
fetch/decode.
There were various other warnings about avoiding the LOCK prefix
wherever possible, but I was unable to find anything specifically
blessing LOCK; CMPXCHG or deprecating other uses.
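
To make the mark-bitmap case from the quoted text concrete, here is a
minimal sketch in portable C, assuming a C11 compiler with
<stdatomic.h>.  The names mark_bitmap, MARK_WORDS, mark_object and
is_marked are hypothetical, made up for illustration.  On x86 a
compiler is free to lower atomic_fetch_or to a LOCK-prefixed OR or
BTS, but which instruction it picks is not guaranteed, so treat this
as an illustration of the idea rather than a statement about the
generated code.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared mark bitmap: one bit per heap object. */
#define WORD_BITS  (sizeof(unsigned long) * CHAR_BIT)
#define MARK_WORDS 1024
#define MARK_BITS  (MARK_WORDS * WORD_BITS)

static _Atomic unsigned long mark_bitmap[MARK_WORDS];

/* Atomically set the mark bit for object number idx.
 * Returns true if this call set the bit (it was clear before),
 * false if some other marker had already set it. */
static bool mark_object(size_t idx)
{
    assert(idx < MARK_BITS);
    unsigned long mask = 1UL << (idx % WORD_BITS);
    unsigned long old  = atomic_fetch_or(&mark_bitmap[idx / WORD_BITS], mask);
    return (old & mask) == 0;
}

/* Read the mark bit for object number idx without modifying it. */
static bool is_marked(size_t idx)
{
    assert(idx < MARK_BITS);
    unsigned long mask = 1UL << (idx % WORD_BITS);
    return (atomic_load(&mark_bitmap[idx / WORD_BITS]) & mask) != 0;
}
```

A marker thread would call mark_object(i) and push object i onto its
work list only when the call returns true, so that two threads racing
on the same object do not both scan it.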

That being said, kabbalistic readings of Intel documents are a blood
sport in multiprocessor memory model circles.  Manoj helpfully
provided a link to the appropriate documentation:
>         I was looking at the Pentium III Architecture Optimization
>         Manual at this URL:
>            http://developer.intel.com/design/pentiumii/manuals/245127.htm

You may be best off reading it and drawing your own conclusions!

-Jan-Willem Maessen
Eager Haskell project
jmaessen@mit.edu