[gclist] reference counting
Thu, 14 Sep 2000 16:07:51 -0500
Boehm, Hans wrote (Thu, Sep 14, 2000 at 10:00:00AM -0700) :
> If I understand it correctly, this is different from a cache that keeps
> multiple valid bits per line. But it should help with our collector. At
> least in some simple benchmarks a large fraction of the cache misses on
> write occur when an entire page is being written. Thus it should be safe to
> ignore the previous contents of the page, and hence cache line.
Correct me if I'm wrong, but ...
I think there is some confusion here. Having multiple
valid bits per line is what is called sub-blocking,
if I remember my Hennessy & Patterson correctly. Only
the sub-block that missed is exchanged between the
cache and the memory system, which reduces the miss
penalty, false sharing and bandwidth use. This is
completely transparent to software.
What you seem to need is an "overwrite-cache-line"
kind of store instruction which tells the cache that it
does not need to fetch the line on a miss, since
the data is going to be overwritten. That seems kind
of tricky for the software, since it has to guarantee
that it will overwrite a whole cache line's worth
of data, and make sure that no parts of the cache
line are "left over" and still need to be fetched
(I guess this is where sub-blocking would be useful).
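
To make the "whole cache line" requirement concrete, here is a
rough sketch in C of clearing a region while hinting only the
lines that lie entirely inside it. Everything in it is an
assumption for illustration: the 64-byte line size (taken from
the WH64 description quoted below), the names, and
hint_overwrite_line(), which is a hypothetical stand-in for a
real no-fetch hint and does nothing here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64u   /* assumed cache line size */

/* Hypothetical hook: a real port would issue the hardware
 * "this line will be overwritten" hint here. No-op in this sketch. */
static void hint_overwrite_line(uintptr_t line_addr)
{
    (void)line_addr;
}

/* Zero the n bytes starting at p. Only lines fully contained in
 * [p, p+n) are hinted; partial lines at either end may hold
 * someone else's data and must be fetched normally.
 * Returns the number of lines hinted. */
static size_t clear_with_hints(unsigned char *p, size_t n)
{
    const uintptr_t mask = ~(uintptr_t)(LINE_SIZE - 1);
    uintptr_t begin = (uintptr_t)p;
    uintptr_t end = begin + n;
    uintptr_t first_full = (begin + LINE_SIZE - 1) & mask; /* round up */
    uintptr_t last_full = end & mask;                      /* round down */
    size_t hinted = 0;

    /* If the region fits inside one line, first_full >= last_full
     * and nothing is hinted. */
    for (uintptr_t a = first_full; a < last_full; a += LINE_SIZE) {
        hint_overwrite_line(a);
        hinted++;
    }
    memset(p, 0, n);   /* the actual overwrite */
    return hinted;
}
```

The point is only the rounding: the first hinted line starts at
the region start rounded up, and hinting stops at the region end
rounded down, so the "left over" parts are never hinted.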
What might also be useful is the recent addition of
streaming multimedia instructions to various
CPUs: e.g., the SSE instructions in the Pentium III
include prefetch hints and cache-bypassing
(non-temporal) stores, which might be useful for not
polluting the cache while, e.g., doing the sweep phase
of GC. I think you can also load relatively large
chunks into a set of registers, so you don't have to
go to memory for each word.
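
As a rough illustration of a cache-bypassing store, the sketch
below clears a block with the SSE non-temporal store intrinsic
_mm_stream_ps (MOVNTPS) followed by _mm_sfence. The function
name and the alignment requirements are my assumptions, and it
falls back to memset when SSE is not available at compile time.
Whether this actually helps a sweep phase would have to be
measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Zero n bytes at p without (ideally) pulling them into the cache.
 * Assumes p is 16-byte aligned and n is a multiple of 16. */
static void clear_nontemporal(void *p, size_t n)
{
    assert(((uintptr_t)p & 15u) == 0);
    assert((n & 15u) == 0);
#if defined(__SSE__)
    float *f = (float *)p;
    const __m128 zero = _mm_setzero_ps();
    /* Each store writes 16 bytes (4 floats) with a non-temporal hint. */
    for (size_t i = 0; i < n / sizeof(float); i += 4)
        _mm_stream_ps(f + i, zero);
    /* Non-temporal stores are weakly ordered; fence before the
     * memory is handed to other code. */
    _mm_sfence();
#else
    memset(p, 0, n);   /* portable fallback, goes through the cache */
#endif
}
```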
> I'm not sure whether this is usable with a compacting/copying collector that
> uses a pointer increment allocator. You have to be more careful, since at
> the time of allocation, the newly allocated object may share a cache line
> with a previously allocated object that has been evicted from the cache.
> You'd presumably have to issue the WH64 instructions ahead of time. Has
> anyone with access to a new Alpha tried this?
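
One way to read "ahead of time" for a pointer-increment
allocator, sketched in C: keep a line-aligned frontier beyond
the allocation pointer and only ever hint lines at or past that
frontier, so a hinted line can never already hold a previously
allocated object. The struct, the names, the one-line lookahead
and hint_line() (a counting stand-in for WH64) are all
assumptions of mine, not anything from an existing collector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE  64u   /* assumed line size, as for WH64 */
#define AHEAD 1u    /* assumed lookahead, in lines */

typedef struct {
    uintptr_t next;    /* next free byte */
    uintptr_t limit;   /* end of the allocation region */
    uintptr_t hinted;  /* line-aligned: lines below this were hinted
                          before anything was allocated in them */
    size_t hints;      /* hints issued so far (illustration only) */
} bump_t;

/* Stand-in for the real hint instruction; only counts. */
static void hint_line(bump_t *b, uintptr_t line)
{
    (void)line;
    b->hints++;
}

static void bump_init(bump_t *b, void *base, size_t size)
{
    b->next = (uintptr_t)base;
    b->limit = b->next + size;
    /* A partial first line may be shared with older data: skip it. */
    b->hinted = (b->next + LINE - 1) & ~(uintptr_t)(LINE - 1);
    b->hints = 0;
}

static void *bump_alloc(bump_t *b, size_t n)
{
    if (n > b->limit - b->next)
        return NULL;
    uintptr_t obj = b->next;
    uintptr_t end = obj + n;
    uintptr_t target =
        ((end + LINE - 1) & ~(uintptr_t)(LINE - 1)) + AHEAD * LINE;

    /* Hint whole lines inside the region up to the target. Every
     * such line starts at or after the old frontier, so it holds
     * no earlier object. */
    while (b->hinted < target && b->hinted + LINE <= b->limit) {
        hint_line(b, b->hinted);
        b->hinted += LINE;
    }
    b->next = end;
    return (void *)obj;
}
```

Small objects that land in an already-hinted line cost no extra
hints; only crossing the frontier issues new ones.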
> (Based on some Linux/alpha kernel discussion I found with a quick search,
> this seems a little tricky to use profitably. Hopefully the instruction is
> a cheap no-op on older processors? That's a serious issue with the
> multitude of X86 prefetch instructions, some of which trap on some
> > -----Original Message-----
> > From: Jeff Sturm [mailto:firstname.lastname@example.org]
> > Alpha appears to have some support for write-allocate. The WH64
> > instruction specifies that a 64-byte cache line containing a given
> > address may be overlaid with unspecified data. Cache
> > coherency is still
> > required for multiprocessors, so I assume WH64 has no effect on
> > addresses that are already resident in any cache. A related
> > instruction, ECB, allows early eviction of cache lines. Perhaps these
> > two could benefit a tracing collector? I haven't
> > experimented much with
> > them yet.
> > -Jeff