Sub-blocking vs cache-line-overwrite (Was Re: RE: RE: [gclist] reference counting)
Manoj Plakal
plakal@cs.wisc.edu
Thu, 14 Sep 2000 23:45:08 -0500
Boehm, Hans wrote (Thu, Sep 14, 2000 at 04:01:26PM -0700) :
> A number of the older GC measurements, including the one Henry Baker
> originally referred to, were made on machines with sub-blocking, notably
> certain models of MIPS-based DECstations(?). The net effect of this was
> that if you used a bump-the-pointer allocator, and generally wrote things in
> subblock increments, you could write, and subsequently read back, previously
> uncached memory with only memory traffic needed to write the evicted cache
> lines, and essentially no stalls for memory. In particular, there was never
> a need to read the rest of the cache line that you were writing, because it
> could simply be marked invalid by the hardware and, in almost all cases,
> would quickly be overwritten anyway.
I think I see why I was confused earlier. Thanks for
the clarification.
So, to check that I now understand correctly: if the sub-block
size equals the amount of data written by a single store
instruction, then you get this nice stall-free behavior, because
the CPU knows for sure that the whole sub-block is being
overwritten by that one instruction and doesn't need to fetch
the sub-block from memory first.
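To make that concrete, here is a rough sketch of a toy direct-mapped
cache with one valid bit per sub-block. It is only a model under my own
assumptions, not a description of any real machine: the cache geometry,
the names (cache_t, fills_for_init_then_read), and the policy that a
store covering a whole sub-block validates it without a fetch are all
made up for illustration. It counts how many sub-blocks have to be
fetched from memory when a region is written sequentially and then
read back.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative geometry only: 512 lines of 32 bytes, direct-mapped. */
enum { LINE_BYTES = 32, NUM_LINES = 512 };

typedef struct {
    uint64_t line_no[NUM_LINES];  /* memory line resident in each slot */
    uint8_t  in_use[NUM_LINES];   /* slot holds some line at all */
    uint8_t  valid[NUM_LINES];    /* bit i set: sub-block i is valid */
    size_t   sub_bytes;           /* 4..32, divides 32; 32 = no sub-blocking */
    unsigned long fills;          /* sub-block fetches from memory */
} cache_t;

static void cache_init(cache_t *c, size_t sub_bytes) {
    memset(c, 0, sizeof *c);
    c->sub_bytes = sub_bytes;
}

/* Find the slot and sub-block for addr; replace a different resident line. */
static size_t locate(cache_t *c, uint64_t addr, unsigned *sub) {
    uint64_t line = addr / LINE_BYTES;
    size_t slot = (size_t)(line % NUM_LINES);
    if (!c->in_use[slot] || c->line_no[slot] != line) {
        c->in_use[slot] = 1;
        c->line_no[slot] = line;
        c->valid[slot] = 0;       /* every sub-block starts out invalid */
    }
    *sub = (unsigned)((addr % LINE_BYTES) / c->sub_bytes);
    return slot;
}

/* Aligned store of `size` bytes, with size <= sub_bytes. */
static void cache_store(cache_t *c, uint64_t addr, size_t size) {
    unsigned sub;
    size_t slot = locate(c, addr, &sub);
    if (!(c->valid[slot] & (1u << sub))) {
        if (size < c->sub_bytes)
            c->fills++;           /* partial overwrite: fetch the rest first */
        /* A store covering the whole sub-block needs no fetch at all. */
        c->valid[slot] |= (uint8_t)(1u << sub);
    }
}

/* Load from addr; an invalid sub-block must be fetched from memory. */
static void cache_load(cache_t *c, uint64_t addr) {
    unsigned sub;
    size_t slot = locate(c, addr, &sub);
    if (!(c->valid[slot] & (1u << sub))) {
        c->fills++;
        c->valid[slot] |= (uint8_t)(1u << sub);
    }
}

/* Write total_bytes sequentially in store_bytes chunks, then read them back. */
unsigned long fills_for_init_then_read(size_t sub_bytes, size_t store_bytes,
                                       size_t total_bytes) {
    cache_t c;
    cache_init(&c, sub_bytes);
    for (uint64_t a = 0; a < total_bytes; a += store_bytes)
        cache_store(&c, a, store_bytes);
    for (uint64_t a = 0; a < total_bytes; a += store_bytes)
        cache_load(&c, a);
    return c.fills;
}
```

In this model, fills_for_init_then_read(8, 8, 4096) gives 0 fetches,
since every store covers a whole sub-block, while
fills_for_init_then_read(16, 8, 4096) gives 256: the first 8-byte store
into each 16-byte sub-block forces a fetch of the other half.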
The UltraSPARC-I and II have sub-blocking: 32-byte cache
lines with two 16-byte sub-blocks. Furthermore, the SPARC V9 ISA
also has a few new instructions, including "block load/store"
instructions that transfer 64 bytes to/from eight double-precision
FP registers. These bypass the ordering restrictions of the SPARC
memory model(s).
This can speed up bzero/bcopy. The UltraSPARC User Manual
says that these instructions DO NOT allocate blocks in
the L1 or L2 caches on a miss. Perhaps other ISAs (x86/MIPS/PA-RISC)
have something similar?
Manoj