[gclist] ref-counting performance cost

Simon Peyton-Jones simonpj@microsoft.com
Tue, 5 Sep 2000 00:27:31 -0700


There is a fourth cost, namely the effect on allocation cost.
Reference counting can't do compaction, so allocation has to be 
done off a free list.  To reduce fragmentation one typically has several
freelists, for objects of different sizes.  So the allocation code has to
see if there's enough space on the exact-fit list, and if not try some other
source of space.  As a result, allocation is typically done by an
out-of-line
call.   (Correct me if I'm wrong here.)

In contrast, if there's a single contiguous allocation area, allocation
consists
only of writing the contents of the object, all in-line.  10 words, 10
instructions.  One
has to check that there's enough space, but that only needs to be done once
per basic block, so the cost is amortised over all the allocations.
Furthermore,
if several objects are allocated, their addresses are all implicitly held by
the
allocation pointer (e.g. hp+10, hp+30, hp+38) which reduces register
pressure
somewhat.

Finally, there's the locality issue.  Contiguous allocation improves
locality.
Compaction improves locality.  On the other hand, reference counting 
re-uses hot addresses.  So it's entirely unclear to me what the tradeoff
in cache effects is.

For some reason the effect on allocation cost is never mentioned in papers 
about reference counting, but I believe it is fairly large.  I'd be
interested in any direct
measurements people have made.  I don't have any myself.

Simon

| There are three problems with naive reference counting.
| 
| 1.  Inability to clean up cycles.
| 
| 2.  The time taken to do the reference counting operations.
|     A simple
| 	x <- y
|     turns into
| 	ref(y)++
| 	if (ref(x)-- == 0) bury(x)
| 	x = y
| 
| 3.  The space required for the reference counts.  In a Lisp-like
|     system, a cons cell bloats from 2 pointers to 3.