[gclist] Finalizers & Reference counting.

Jerrold Leichter jerrold.leichter@smarts.com
Wed, 21 Aug 2002 13:26:43 -0400 (EDT)

| >
| > Reference counting has bugs, it can't deal with cyclic data
| > structures, it has excessive overhead but it has some striking
| > advantages. There are no serious SMP bottlenecks, it never has to
|               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| > stop the world.
|   ^^^^^^^^^^^^^^^
| I'm *certainly* no expert, but don't you need a lock on every object
| to update reference counts atomically? Moreover, modifying an object
| in SMP world means that you have to flush the processor's cache to
| make sure that every processor can see the changes -- and RC demands
| an object modification at every access. Together those seem like they
| would impose fairly severe limits on the scalability of SMP RC.

(Almost) all modern architectures provide cheap ways to increment or
decrement and count atomically, and for the decrement case determine whether
the result was 0.  The methods vary form interlocked increment/decrement
operations that set condition codes based on the result to combining fetch-
and-add operations to general load linked/store conditionals.  All of these
scale, to varying degrees - the more recent versions generally scaling better.
In practice, the hardware primitives for this kind of thing generally match
up pretty well with the actual scale of the hardware out there.

Yes, in the weaker memory models you have to worry about synchronization.  One
way or another, modifying a reference count had better either reach real memory
or at least grab ownership of the cache line.  Machines that try to scale
pretty much have to do cache snooping and ownership protocols, so this is
probably not that expensive.

(BTW, I had to say "almost" above because as far as I can tell the HP PA RISC
has only your basic semaphore primitive - an atomic "load and clear word"
instruction.  However, the architecture requires fully coherent caches even in

So ... yes, there's some cost; there's always a cost associated with synchro-
nization.  But is it overwhelming?  Is it enough to make RC impractical?
That seems unlikely.

| Something like the Doligez-Gonthier algorithm, which gives each thread
| its own youngest generation, and collects tenured objects with a
| variant of Djikstra's concurrent algorithm, would seem to be a way to
| go. And if that doesn't do it, then the only means left is to become a
| Real Programmer and preallocate all your data, like God and FORTRAN 77
| intended. :)

You could play a similar game with RC:  The "youngest generation" is unique
to a thread, hence requires no synchronization when accessing the reference
							-- Jerry