Sub-blocking vs cache-line-overwrite (Was Re: RE: RE: [gclist] reference counting)

Manoj Plakal
Thu, 14 Sep 2000 23:45:08 -0500

Boehm, Hans wrote (Thu, Sep 14, 2000 at 04:01:26PM -0700) :
> A number of the older GC measurements, including the one Henry Baker
> originally referred to, were made on machines with sub-blocking, notably
> certain models of MIPS-based DECstations(?).  The net effect of this was
> that if you used a bump-the-pointer allocator, and generally wrote things in
> subblock increments, you could write, and subsequently read back, previously
> uncached memory with only memory traffic needed to write the evicted cache
> lines, and essentially no stalls for memory.  In particular, there was never
> a need to read the rest of the cache line that you were writing, because it
> could simply be marked invalid by the hardware and, in almost all cases,
> would quickly be overwritten anyway.

	I think I see why I was confused earlier. Thanks for
	the clarification.

	So, to see if I now understand correctly: if the sub-blocking
	granularity equals the amount of data written by a single
	store instruction, then you get this nice stall-free behavior,
	since the CPU knows for sure that the whole sub-block is being
	overwritten by that one instruction and doesn't need to fetch
	the sub-block's old contents from memory first.
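
	To make that concrete, here is a toy model in C (a sketch only,
	not a description of any real cache controller). It tracks one
	cache line with per-sub-block valid bits and counts how many
	bytes would have to be fetched from memory while a
	bump-the-pointer allocator initializes fresh memory. The
	32-byte line and 16-byte sub-block sizes are illustrative
	constants, the names store() and bump_fill() are made up for
	this example, and a store as wide as a sub-block is an
	assumption of the model. Write-back of evicted lines is not
	modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative geometry: 32-byte lines split into two 16-byte sub-blocks. */
enum { LINE_BYTES = 32, SUB_BYTES = 16, SUBS_PER_LINE = LINE_BYTES / SUB_BYTES };

/* A one-line "cache": just enough state to show the effect. */
struct line {
    bool   tag_valid;                 /* does the line hold any address yet? */
    size_t tag;                       /* which memory line it currently holds */
    bool   sub_valid[SUBS_PER_LINE];  /* per-sub-block valid bits */
};

/*
 * Perform one store of len bytes at addr and return the number of bytes
 * that had to be fetched from memory to complete it.
 * Assumption: a store never crosses a sub-block boundary.
 */
size_t store(struct line *c, size_t addr, size_t len, bool sub_blocking)
{
    size_t line_no = addr / LINE_BYTES;
    size_t sub     = (addr % LINE_BYTES) / SUB_BYTES;
    size_t fetched = 0;

    assert(len > 0 && addr % SUB_BYTES + len <= SUB_BYTES);

    if (!c->tag_valid || c->tag != line_no) {
        /* Line miss: take over the line and invalidate its sub-blocks. */
        c->tag_valid = true;
        c->tag = line_no;
        memset(c->sub_valid, 0, sizeof c->sub_valid);
        if (!sub_blocking) {
            /* Plain write-allocate: fetch the whole line before writing. */
            fetched += LINE_BYTES;
            for (size_t i = 0; i < SUBS_PER_LINE; i++)
                c->sub_valid[i] = true;
        }
    }

    if (!c->sub_valid[sub]) {
        /* A store covering the whole sub-block needs none of its old data. */
        bool covers_sub_block = (addr % SUB_BYTES == 0) && (len == SUB_BYTES);
        if (!covers_sub_block)
            fetched += SUB_BYTES;     /* partial write: fill the rest first */
        c->sub_valid[sub] = true;
    }
    return fetched;
}

/* Initialize total bytes of fresh memory in order, store_len bytes per store. */
size_t bump_fill(size_t total, size_t store_len, bool sub_blocking)
{
    struct line c = {0};
    size_t fetched = 0;

    for (size_t a = 0; a + store_len <= total; a += store_len)
        fetched += store(&c, a, store_len, sub_blocking);
    return fetched;
}
```

	In this model, bump_fill(1024, 16, true) fetches nothing,
	while bump_fill(1024, 16, false) fetches all 1024 bytes. With
	stores narrower than a sub-block, e.g. bump_fill(1024, 8, true),
	the benefit disappears again, which is the granularity
	condition described above.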

	The UltraSPARC-I and II have sub-blocking: 32-byte cache
	lines with 2 16-byte sub-blocks. Furthermore, the SPARC V9 ISA
	also has a few new instructions including "block load/store"
	instructions that transfer 64 bytes to/from 8 FP regs. These
	bypass the ordering restrictions of the SPARC memory model(s).
	This can speed up bzero/bcopy. The UltraSPARC User Manual
	says that these instructions DO NOT allocate blocks in
	the L1 or L2 caches on a miss. Perhaps other ISAs (x86/MIPS/PA-RISC)
	have something similar?
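
	On the x86 side, the closest thing I know of is the
	non-temporal ("streaming") store in the SSE/SSE2 extensions.
	The sketch below zeroes a buffer with the _mm_stream_si128
	intrinsic (MOVNTDQ). It is only an analogue of the SPARC block
	store, not an equivalent: the non-temporal hint asks the CPU
	to minimize cache pollution, and what actually happens on a
	miss depends on the implementation. The name zero_streaming()
	is made up for this example, and the code falls back to
	memset() when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Zero n bytes at dst.
 * Requirements of this sketch: dst is 16-byte aligned, n is a multiple of 16.
 */
void zero_streaming(void *dst, size_t n)
{
    assert((uintptr_t)dst % 16 == 0 && n % 16 == 0);
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)dst;

    for (size_t i = 0; i < n / 16; i++)
        _mm_stream_si128(p + i, zero);  /* 16-byte store with non-temporal hint */
    _mm_sfence();  /* order the streaming stores before any later stores */
#else
    memset(dst, 0, n);  /* portable fallback, no cache-bypass hint */
#endif
}
```

	Like the block store, this is mainly interesting for
	bzero/bcopy-style loops over memory that will not be read
	again soon. For an allocator that touches the fresh object
	right away, bypassing the cache may well hurt rather than help.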