[gclist] reference counting

Manoj Plakal plakal@cs.wisc.edu
Thu, 14 Sep 2000 16:07:51 -0500

Boehm, Hans wrote (Thu, Sep 14, 2000 at 10:00:00AM -0700) :
> If I understand it correctly, this is different from a cache that keeps
> multiple valid bits per line.  But it should help with our collector.  At
> least in some simple benchmarks a large fraction of the cache misses on
> write occur when an entire page is being written.  Thus it should be safe to
> ignore the previous contents of the page, and hence cache line.

	Correct me if I'm wrong, but ...

	I think there is some confusion here. Having multiple
	valid bits per line is what is called sub-blocking,
	if I remember my Hennessy & Patterson correctly. This
	reduces the amount of data exchanged between the cache
	and the memory system, thus reducing conflict misses,
	false sharing and bandwidth use. This is completely
	transparent to software.

	What you seem to need is an "overwrite-cache-line"
	kind of store instruction, which tells the cache that it
	does not need to fetch the line on a miss because the
	data is about to be overwritten. This seems tricky for
	software, since it has to guarantee that it will
	overwrite a whole cache line's worth of data, with no
	parts of the line "left over" that would still need
	to be fetched (I guess this is where sub-blocking
	would be useful).
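
	A minimal sketch of the bookkeeping this implies, assuming
	64-byte lines: given a write range, work out which lines are
	covered completely, since only those could skip the fetch.
	The name whole_lines and the LINE_SIZE constant are made up
	for illustration and are not part of any real API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u   /* assumed cache-line size in bytes (power of two) */

/*
 * Hypothetical helper: given a write of `len` bytes starting at `start`,
 * report the cache lines that the write covers completely.  Only those
 * lines could safely skip the fetch on a miss; partially covered lines
 * at either end still hold bytes that must be preserved.
 *
 * Stores the address of the first fully covered line in *first and
 * returns the number of fully covered lines (possibly 0).
 */
static size_t whole_lines(uintptr_t start, size_t len, uintptr_t *first)
{
    const uintptr_t mask = (uintptr_t)LINE_SIZE - 1;
    uintptr_t end, lo, hi;

    assert(first != NULL);
    end = start + len;
    lo = (start + mask) & ~mask;   /* round start up to a line boundary */
    hi = end & ~mask;              /* round end down to a line boundary */
    *first = lo;
    return hi > lo ? (size_t)((hi - lo) / LINE_SIZE) : 0;
}
```

	For example, a 200-byte write starting at offset 8 fully
	covers only the two lines at offsets 64 and 128; the partial
	lines at either end would still have to be fetched.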

	What might also be useful are the streaming multimedia
	instructions recently added to various CPUs: e.g., the
	SSE instructions in the Pentium III include some
	non-temporal (cache-bypassing) stores and prefetch
	hints, which might be useful for not polluting the
	cache, e.g., during the sweep phase of GC. I think you
	can load relatively large chunks into a set of registers,
	so you don't have to go to memory for each word.
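
	A rough sketch of how a sweep might clear a free block with
	such stores, using the _mm_stream_ps intrinsic (the MOVNTPS
	non-temporal store) from <xmmintrin.h>. The helper name
	clear_block_nt and its alignment requirements are assumptions
	for illustration; on targets without SSE it just falls back
	to memset. Whether this actually helps a collector would
	have to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/*
 * Hypothetical helper: zero `len` bytes at `dst` without pulling the
 * block into the cache where the hardware supports it.
 * Assumes dst is 16-byte aligned and len is a multiple of 16.
 */
static void clear_block_nt(void *dst, size_t len)
{
    assert(((uintptr_t)dst & 15u) == 0 && (len & 15u) == 0);
#if defined(__SSE__)
    {
        const __m128 zero = _mm_setzero_ps();   /* all-zero bit pattern */
        float *p = (float *)dst;
        size_t i;

        for (i = 0; i < len; i += 16)
            _mm_stream_ps(p + i / 4, zero);     /* 16 bytes per store */
        _mm_sfence();   /* order the streaming stores before later stores */
    }
#else
    memset(dst, 0, len);   /* portable fallback, goes through the cache */
#endif
}
```

	The 16-byte alignment requirement comes from the streaming
	store itself, so a real collector would need free blocks
	aligned accordingly or a scalar prologue and epilogue.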


> I'm not sure whether this is usable with a compacting/copying collector that
> uses a pointer increment allocator.  You have to be more careful, since at
> the time of allocation, the newly allocated object may share a cache line
> with a previously allocated object that has been evicted from the cache.
> You'd presumably have to issue the WH64 instructions ahead of time.  Has
> anyone with access to a new Alpha tried this?
> (Based on some Linux/alpha kernel discussion I found with a quick search,
> this seems a little tricky to use profitably.  Hopefully the instruction is
> a cheap no-op on older processors?  That's a serious issue with the
> multitude of X86 prefetch instructions, some of which trap on some
> processors.)
> Hans
> > -----Original Message-----
> > From: Jeff Sturm [mailto:jsturm1@home.com]
> ...
> > Alpha appears to have some support for write-allocate.  The WH64
> > instruction specifies that a 64-byte cache line containing a given
> > address may be overlaid with unspecified data.  Cache coherency is
> > still required for multiprocessors, so I assume WH64 has no effect on
> > addresses that are already resident in any cache.  A related
> > instruction, ECB, allows early eviction of cache lines.  Perhaps these
> > two could benefit a tracing collector?  I haven't experimented much
> > with them yet.
> > 
> > -Jeff
> >