Sub-blocking vs cache-line-overwrite (Was Re: RE: RE: [gclist] reference counting)

Jeff Sturm
Fri, 15 Sep 2000 01:37:28 -0400

Manoj Plakal wrote:
>         So, to see if I now understand correctly: if the sub-blocking
>         granularity is equal to the size of the data stored in a single
>         store instruction, then you will get this nice stall-free behavior
>         since the CPU knows for sure that the whole sub-block is being
>         atomically overwritten in a single instruction and doesn't
>         need to bring the sub-block into the cache.

Perhaps a smart enough CPU could also recognize when multiple successive
stores completely fill a sub-block, and skip the fetch in that case too.
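
To make that concrete, here is a minimal sketch in C of the bookkeeping
such a CPU would need: one "written" bit per byte of a sub-block, and no
fetch once every bit is set. The names (merge_buf, record_store,
can_skip_fetch) are made up for illustration, and the 16-byte size is
just the UltraSPARC figure quoted below; I am not claiming any real CPU
does it this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SUBBLOCK_BYTES 16u                      /* assumed sub-block size */
#define FULL_MASK ((1u << SUBBLOCK_BYTES) - 1u) /* all 16 byte-bits set */

/* Hypothetical merge buffer covering a single sub-block. */
typedef struct {
    uint32_t written;   /* bit i set => byte i has been overwritten */
} merge_buf;

/* Record a store of `len` bytes at byte `offset` within the sub-block. */
static void record_store(merge_buf *b, unsigned offset, unsigned len)
{
    assert(len > 0 && offset + len <= SUBBLOCK_BYTES);
    b->written |= ((1u << len) - 1u) << offset;
}

/* True once every byte is covered, so the old contents are never needed. */
static bool can_skip_fetch(const merge_buf *b)
{
    return b->written == FULL_MASK;
}
```

Two successive 8-byte stores at offsets 0 and 8 would set all sixteen
bits, at which point the sub-block could be marked valid without ever
reading memory.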

>         The UltraSPARC-I and II have sub-blocking: 32-byte cache
>         lines with 2 16-byte sub-blocks. Furthermore, the SPARC V9 ISA
>         also has a few new instructions including "block load/store"
>         instructions that transfer 64 bytes to/from 8 FP regs. These
>         bypass the ordering restrictions of the SPARC memory model(s).
>         This can speed up bzero/bcopy. The UltraSPARC User Manual
>         says that these instructions DO NOT allocate blocks in
>         the L1 or L2 caches on a miss. Perhaps other ISAs (x86/MIPS/PA-RISC)
>         have something similar?

The trouble is that blocks are typically zeroed out for immediate reuse;
if the stores bypass the cache, then we endure the cost of a main memory
write _and_ a subsequent cache miss to load the block back into the cache.
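
A back-of-the-envelope model of that cost, as a sketch only. It counts
memory transactions per cache line up to the first reuse, and it assumes
the alternative is zeroing the line in cache without fetching its old
contents. The struct and function names are invented for this example,
and the eventual write-back of dirty lines is deliberately left out.

```c
#include <assert.h>

typedef struct {
    unsigned mem_writes;   /* lines written to main memory */
    unsigned mem_reads;    /* lines fetched from main memory */
} traffic;

/* Zero `lines` lines with cache-bypassing stores, then touch each once. */
static traffic zero_bypassing_cache(unsigned lines)
{
    traffic t = {0, 0};
    t.mem_writes = lines;  /* the zeros go straight to memory */
    t.mem_reads = lines;   /* the first reuse misses and fetches them back */
    return t;
}

/* Zero `lines` lines in cache without a fetch, then touch each once. */
static traffic zero_in_cache(unsigned lines)
{
    traffic t = {0, 0};
    (void)lines;           /* reuse hits in cache; write-back not counted */
    return t;
}
```

Under these assumptions the bypassing version pays two memory
transactions per line before the mutator gets any use out of the block,
while the in-cache version pays none up front.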

I'm not really familiar with these block load/store instructions.  On the
other hand, write-allocate seems to be available on the Alpha EV6, and I
think I heard someone say MIPS and PA-RISC have it too.