[gclist] reference counting

David Chase chase@world.std.com
Tue, 05 Sep 2000 08:25:32 -0400

At 07:20 AM 9/5/00 -0400, Greg Morrisett wrote:
>(1) People also ignore the overheads of locking to do reference
>counting in a multi-threaded system.  It helps to have atomic
>increment/decrement and/or compare & swap, but sadly, few
>machines provide it.

Compare and swap or load-linked/store-conditional are
available on most machines.  PowerPC, Pentium, Sparc
v9, and MIPS support one or the other, I think 68k does
too.  Sparc v8 doesn't.

However, at least on Pentium, CMPXCHG is not atomic
w.r.t. memory unless you LOCK-prefix it.  It takes
some non-trivial number of memory bus cycles to

>(3) It's also not clear to me that pointer-bump allocation is
>as huge a win as people think.  A good non-compacting organization
>(as in the BDW collector) can allocate objects almost as fast.

How's that go on a multiprocessor?  A compacting collector
makes it relatively easy to carve your heap up into pieces,
one per thread, and allocate from those lock-free (*).  I imagine
you could do the same with your free lists (given that you're
using 64 or fewer, it's not that much overhead) but I haven't
hashed out all the details, and there surely are details.

(*) We also rolled our own threads on top of native threads;
this means we know which threads are "running" and which
are not, and hand out heap resources only to the relatively
small number of "running" threads.  There are also very many
details in rolling your own threads.

>Indeed, Fred Smith and I found that the "wins"
>of compaction were blown away by the space overheads of accuracy
>(needed header words on objects) and copying.

Seems like, for languages that aren't ML, that you may already
be paying for those header words.  This is certainly true for