[gclist] Re: RE: synchronization cost (was: Garbage collection and XML)
Jan-Willem Maessen
jmaessen@mit.edu
Fri, 16 Mar 2001 20:37:10 -0500
Manoj Plakal <plakal@cs.wisc.edu> replied to my post on Intel synchronization:
> > The LOCK prefix can be prepended only to the following instructions
> > and to those forms of the instructions that use a memory operand:
> > ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, DEC, INC, NEG, NOT, OR, SBB,
> > SUB, XOR, XADD, and XCHG. ...
> >
> > That's a pretty long list, and one of the above instructions is
> > usually what you actually want. For example (getting back to GC
> > here), BTC/BTR/BTS allow you to atomically update a shared
> > allocation/mark bitmap efficiently.
>
> If you look at the part of the Intel manuals describing
> optimizations for the Pentium-II/III/IV, I think you'll
> find that they deprecate the use of prefixes like this.
After a bit of back-and-forth with Manoj, my conclusion is that this
was probably a misreading of (paraphrasing) "Don't use prefixes except
for 0F". This seems to refer specifically to 0F introducing
multi-byte instructions such as MMX, SIMD, CMPXCHG, and the like. My
impression is that prefixes bottleneck instruction fetch/decode.
There were various other warnings about avoiding the LOCK prefix
wherever possible, but I was unable to find anything specifically
blessing LOCK; CMPXCHG or deprecating other uses.
That being said, kabbalistic readings of Intel documents are a blood
sport in multiprocessor memory model circles. Manoj helpfully
provided a link to the appropriate documentation:
> I was looking at the Pentium III Architecture Optimization
> Manual at this URL:
> http://developer.intel.com/design/pentiumii/manuals/245127.htm
You may be best off reading it and drawing your own conclusions!
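
For concreteness, here is a minimal sketch of the atomic mark-bitmap
update from my earlier post, written with C11 atomics rather than
inline assembly. The names mark_bitmap, mark_object, and is_marked,
and the bitmap size, are hypothetical and chosen only for
illustration. On x86 I would expect a compiler to lower
atomic_fetch_or to a LOCK-prefixed instruction (LOCK OR, LOCK BTS, or
a LOCK CMPXCHG loop), but which one is the compiler's choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes: 64 words * 32 bits = 2048 mark bits. */
#define MARK_WORDS    64u
#define BITS_PER_WORD 32u

/* Shared mark bitmap; static storage, so it starts out all zero. */
static _Atomic uint32_t mark_bitmap[MARK_WORDS];

/* Atomically set the mark bit for object index idx.
 * Returns true if this call set the bit, false if it was
 * already set (for example, by another thread). */
static bool mark_object(size_t idx)
{
    assert(idx < (size_t)MARK_WORDS * BITS_PER_WORD);
    uint32_t mask = UINT32_C(1) << (idx % BITS_PER_WORD);
    /* Atomic read-modify-write: OR the bit in, get the old word. */
    uint32_t old = atomic_fetch_or(&mark_bitmap[idx / BITS_PER_WORD], mask);
    return (old & mask) == 0;
}

/* Read the current mark bit for object index idx. */
static bool is_marked(size_t idx)
{
    assert(idx < (size_t)MARK_WORDS * BITS_PER_WORD);
    uint32_t mask = UINT32_C(1) << (idx % BITS_PER_WORD);
    return (atomic_load(&mark_bitmap[idx / BITS_PER_WORD]) & mask) != 0;
}
```

A marking thread can call mark_object(i) and push the object onto its
work list only when the call returns true, so each object is scanned
once even when several threads race to mark it.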
-Jan-Willem Maessen
Eager Haskell project
jmaessen@mit.edu