Sub-blocking vs cache-line-overwrite (Was Re: RE: RE: [gclist]
Fri, 15 Sep 2000 01:37:28 -0400
Manoj Plakal wrote:
> So, to see if I now understand correctly: if the sub-blocking
> granularity is equal to the size of the data stored in a single
> store instruction, then you will get this nice stall-free behavior
> since the CPU knows for sure that the whole sub-block is being
> atomically overwritten in a single instruction and doesn't
> need to bring the sub-block into the cache.
Perhaps a smart enough CPU could also realize that multiple, successive
stores completely fill a subblock.
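To make that concrete, here is a rough sketch in C of the bookkeeping such a CPU (or its write buffer) might do. Everything in it is hypothetical: the names wbuf_entry, wbuf_open, wbuf_store and wbuf_needs_fetch are made up for illustration, and the 16-byte sub-block size is simply borrowed from the UltraSPARC figure quoted in this thread. It models the idea only, not any real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sub-block size; 16 bytes matches the UltraSPARC figure
 * quoted in this thread. Must be a power of two and at most 16 here,
 * because the valid mask below has one bit per byte. */
#define SUBBLOCK_BYTES 16u

/* Hypothetical write-buffer entry that tracks one sub-block. */
typedef struct {
    uint32_t base;       /* sub-block-aligned address */
    uint16_t valid_mask; /* bit i set => byte i has been overwritten */
} wbuf_entry;

/* Start tracking the sub-block that contains addr. */
static void wbuf_open(wbuf_entry *e, uint32_t addr) {
    e->base = addr & ~(SUBBLOCK_BYTES - 1u);
    e->valid_mask = 0;
}

/* Record a store of `size` bytes at `addr`.
 * Returns false (and records nothing) if the store does not lie
 * entirely inside the tracked sub-block. */
static bool wbuf_store(wbuf_entry *e, uint32_t addr, uint32_t size) {
    if (size == 0 || addr < e->base) return false;
    uint32_t off = addr - e->base;
    if (off >= SUBBLOCK_BYTES || size > SUBBLOCK_BYTES - off) return false;
    uint32_t bits = (1u << size) - 1u;      /* size <= 16, so no overflow */
    e->valid_mask |= (uint16_t)(bits << off);
    return true;
}

/* The old contents must be fetched only if some byte was never written. */
static bool wbuf_needs_fetch(const wbuf_entry *e) {
    return e->valid_mask != 0xFFFFu;
}
```

With this model, two successive 8-byte stores to 0x1000 and 0x1008 fill the mask, so wbuf_needs_fetch reports that no read of the old sub-block is required, which is the same outcome as a single 16-byte store.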
> The UltraSPARC-I and II have sub-blocking: 32-byte cache
> lines with 2 16-byte sub-blocks. Furthermore, the SPARC V9 ISA
> also has a few new instructions including "block load/store"
> instructions that transfer 64 bytes to/from 8 FP regs. These
> bypass the ordering restrictions of the SPARC memory model(s).
> This can speed up bzero/bcopy. The UltraSPARC User Manual
> says that these instructions DO NOT allocate blocks in
> the L1 or L2 caches on a miss. Perhaps other ISAs (x86/MIPS/PA-RISC)
> have something similar?
The trouble is that blocks are typically zeroed out for immediate reuse;
if the stores bypass the cache, then we endure the cost of a main memory
write _and_ a subsequent cache miss to load the block into the cache.
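A back-of-the-envelope model of that trade-off, written as C so the arithmetic is explicit. The struct, the function names and any cycle counts fed to it are hypothetical and not measurements of any machine; the model also ignores the eventual write-back of the dirty line and any overlap between operations.

```c
#include <assert.h>

/* Hypothetical per-block costs, in cycles. */
typedef struct {
    unsigned mem_write;   /* write one block straight to main memory */
    unsigned mem_read;    /* cache miss that fetches one block from memory */
    unsigned cache_write; /* overwrite one block that is resident in cache */
    unsigned cache_hit;   /* first reuse of a block that is resident in cache */
} block_costs;

/* Zero with cache-bypassing stores, then reuse immediately:
 * pay the memory write, then miss to bring the block back in. */
static unsigned cost_bypass(const block_costs *c) {
    return c->mem_write + c->mem_read;
}

/* Zero in cache without fetching the old contents, then reuse:
 * both the zeroing and the reuse stay in cache. */
static unsigned cost_allocate_no_fetch(const block_costs *c) {
    return c->cache_write + c->cache_hit;
}

/* Conventional fetch-on-write: read the old contents on the miss,
 * overwrite them, then reuse the block from cache. */
static unsigned cost_fetch_on_write(const block_costs *c) {
    return c->mem_read + c->cache_write + c->cache_hit;
}
```

Plugging in made-up costs such as mem_write = 50, mem_read = 50, cache_write = 4, cache_hit = 1 gives 100 cycles for the bypassing stores, 55 for fetch-on-write, and 5 for allocating without a fetch. The absolute numbers mean nothing; the point is only that bypassing pays for memory twice when the block is reused right away.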
I'm not really familiar with the block load/store capability. On the
other hand, write-allocate seems to be available on the Alpha EV6, and I
think I heard someone say MIPS and PA-RISC have it too.