[gclist] Finalizers & Reference counting.

Boehm, Hans hans_boehm@hp.com
Wed, 28 Aug 2002 16:06:46 -0700


[Discussion of Linus' post to gcc@gcc.gnu.org]
> One argument being made here is that reference counting is preferable
> to general purpose garbage collection because it achieves significantly
> better cache behavior.
This cuts both ways.  Tracing clearly has bad spatial locality, though hopefully you do it relatively rarely.  Generational GC may or may not help, depending on whether it's practical to make the young generation small enough.  References generated during tracing are somewhat predictable, so prefetching can sometimes help substantially.  But clearly directly reusing an object performs significantly better than having the collector reclaim it.
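
For what it's worth, here is a minimal sketch of the kind of prefetch-ahead marking loop I have in mind.  The node layout, the fixed-size mark stack, and the helper names are all made up for illustration; real collectors differ in the details.  The point is only that a prefetch issued when an object is pushed has usually completed by the time the object is popped and scanned.

    /* Illustrative only: a mark phase that prefetches objects when they
       are pushed, so the cache miss overlaps with scanning other objects. */
    struct node {
        int marked;
        struct node *left, *right;
    };

    #define STACK_MAX 4096
    static struct node *mark_stack[STACK_MAX];
    static int sp = 0;

    static void push(struct node *p) {
        if (p != 0 && !p->marked && sp < STACK_MAX) {
            __builtin_prefetch(p);          /* start the miss early (gcc builtin) */
            mark_stack[sp++] = p;
        }
    }

    static void mark_from(struct node *root) {
        push(root);
        while (sp > 0) {
            struct node *p = mark_stack[--sp];
            if (p->marked) continue;        /* may have been pushed twice */
            p->marked = 1;
            push(p->left);                  /* prefetched now, scanned later */
            push(p->right);
        }
    }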

On the other hand, reference counting has its own locality problems.  A shallow copy of a node brings its children into the cache and dirties them, since each child's count has to be incremented.  Dropping a large linked data structure forces the whole data structure into the cache just as you discovered that you're not going to touch it again.  (If you reallocate it before it leaves the cache, great.  If you don't, you lose.  If you drop your whole data structure just before you exit, you lose big.)  It tends to do bad things to code size in large systems, as opposed to the roughly fixed amount of code a tracing GC adds.  (For gcc, it's not quite fixed, but it's closer.)  Objects are often bigger due to the reference count, reducing effective cache capacity.
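
To make that concrete, here is a toy sketch (not real library code; the struct and helper names are invented) of the two operations in question.  Retaining a shared child writes its count, and releasing the last reference to a list walks and dirties every node right at the point where the program has decided it no longer needs them.

    #include <stdlib.h>

    struct rc_node {
        int refcount;
        struct rc_node *next;
    };

    /* A shallow copy of a parent must bump each shared child's count,
       pulling the child's cache line in and dirtying it. */
    static void rc_retain(struct rc_node *p) {
        if (p != 0) p->refcount++;
    }

    /* Dropping the last reference cascades: every node in the chain is
       read, written (the decrement), and freed, just as it becomes dead. */
    static void rc_release(struct rc_node *p) {
        while (p != 0 && --p->refcount == 0) {
            struct rc_node *next = p->next;
            free(p);
            p = next;
        }
    }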

> (The other point being made is that pointer
> chasing in general has bad spatial locality resulting in bad cache
> behavior.)
Unfortunately, I suspect that's hard to avoid in a complex compiler.

> 
> As a programmer I'm a big fan of the way GC makes my job a lot easier.
> However, the notion that GC has significant inherent performance
> problems bothers me. I'm curious whether anyone has any data and/or
> any experiments that might confirm or deny this claim.
So would I.  But it's very hard to do.

In general, it's easy to construct toy examples where either one wins, at least for the single-threaded case.  The problem with comparing real systems is that someone needs to construct both versions first.  In particular, someone has to build the reference-counted version that doesn't leak cycles.  In cases where that's already been done, it's easy for reference counting to encourage "optimizations" that only make sense with RC, so that it becomes impossible to cleanly extract the reference-counting code.  (This was true in at least some versions of Python.)
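
For anyone who hasn't run into the cycle problem, a minimal (made-up) C fragment shows why building that version is real work: once two counted objects point at each other, dropping the external references leaves both counts at one, and plain reference counting never frees either.

    #include <stdlib.h>

    struct obj {
        int refcount;
        struct obj *other;
    };

    int main(void) {
        struct obj *a = calloc(1, sizeof *a);
        struct obj *b = calloc(1, sizeof *b);
        a->refcount = b->refcount = 1;   /* one external reference each */
        a->other = b; b->refcount++;     /* a -> b                      */
        b->other = a; a->refcount++;     /* b -> a: a cycle             */
        a->refcount--;                   /* drop the external refs;     */
        b->refcount--;                   /* both counts stay at 1, so   */
        return 0;                        /* neither is ever freed       */
    }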

My only real experience was with the Russell compiler, back before caches mattered, when memory operations were relatively cheap, and threads weren't an issue.  (And before a lot of tuning for my collector.)  Heavily hand-optimized reference counts were a bit cheaper.  But I dropped them because they seemed to account for nearly half of the maintenance time, and I could figure out better things to do with that time ... 

Hans