[gclist] reference counting

Boehm, Hans hans_boehm@hp.com
Wed, 6 Sep 2000 09:41:57 -0700

> From: Henry G. Baker [mailto:hbaker@netcom.com]
> A very nice paper a few years ago showed that if:
> 1) you used a "pointer-bump" (contiguous) allocation GC and
> 2) you had a "write-allocate" cache,
> then a large fraction of allocations would live and die completely
> in the cache, and never cause any memory references or disk traffic
> at all (remember: only _live_ objects are touched by the GC).  I'm
> not sure that refcount can ever compete with this kind of efficiency.
> Henry
I believe this also requires that the cache has separate valid bits for each
section of a cache line that you might write, e.g. for each word.  Otherwise
when you write the first word in a cache line, that line will be read from
memory so that the entire line can be made valid in the cache.

As far as I have been able to determine, there are very few machines that
have the required cache property, notably one or two (now very obsolete) DEC
machines that were used in most of the experiments.  The hardware I have
worked with recently always seems to fetch the cache line when you write the
first word to it.  (I'm not a hardware expert, but I suspect it's hard to
maintain partially valid cache lines in a multiprocessor.)  If someone has
more information, please post.

Based on some very limited experiments, my impression is that modern
hardware tends to favor malloc/free over garbage collected systems, at least
in single-threaded environments.  It's best to reuse reclaimed memory before
it leaves the cache.  Malloc/free often manages to do that, even for the
fastest level of the cache.  (So does reference counting, presumably.)  In
my environment, it's hard to do that, usually even with generational
collection and for the slowest cache level (except on high end machines with
multiple MB cache).  ML garbage collectors probably have an easier time with
smaller first generations, and thus might be able to do better.

In multithreaded environments, most malloc/free and especially
reference-count implementation encounter substantial synchronization
overheads on modern hardware.  Tracing collectors often get by with a lot