[gclist] Finalizers & Reference counting.
Tue, 27 Aug 2002 09:30:42 -0400 (EDT)
| On the architectures I've tried recently, atomic instructions (which hit the
| cache) seem to take on the order of 20 cycles, which I wouldn't call cheap.
| On a Pentium 4, unlike a Pentium II/III, the number seems to be on the order
| of 100. It may be slightly less for other architectures, but I don't think
| it's ever negligible. For a pointer assignment that used to be a register
| move, you're talking a slowdown of something like a factor of 50 or more for
| the reference-counted version.
Let's be clear what we are comparing. If you buy into RC - uniprocessor or no
- you are agreeing to pay an overhead at every pointer assignment. That's an
overhead GC doesn't have, and it's been a good argument for GC's efficiency
Even on a uniprocessor, RC maintenance will have to read and write memory,
which is expensive on modern machines. Adding a strong synchronization
requirement makes it worse. How much worse is non-trivial to quantify: You
end up taking just about as long in the uniprocessor case (well, maybe not for
x86), but some of the time gets hidden by pipelining since you needn't wait
for the write to be visible. On the other hand, RC code is easily visible to
the compiler, and even general optimizations help; specialized optimizations
could probably help quite a bit more. (Could optimizations help GC? Maybe,
but it seems a harder problem, since the optimizer doesn't have its hands on
the collector in any meaningful sense while it's producing the mutator. Some
compilers that target GC'ed languages actually produce code to help the
collector - e.g., specialized per-lass tracing code - but since RC code is
inherently in-line in any reasonable implementation, any optimization at least
potentially specializes the code for collection.)
Also, GC isn't free. The usual way to look at the cost is amortized per
byte allocated. We usually don't consider the cost for an additional
reference with GC, but of course there is one: That reference has to be
traced, and there's some cost even if tracing turns out to be trivial. Plus
if you have some kind of weak memory model you're going to have to do some
kind of synchronization to make sure the collector looks at valid data before
Of course, the GC cost is for a reference that's "live" at the moment the
collector runs, so it's not directly comparable to the cost at each reference
creation with RC. To even begin to make a comparison, you'd need to know how
long typical references stick around. The biggest expense with RC is dealing
with short-lived references - the wasteful increment/change a byte/decrement
sequences. That plays right into this calculation.
Personally, I'll take GC over RC if I can get it for almost all applications.
I just don't think you can pick one particular factor (like the cost of
synchronization) and say "Aha! RC loses!" The tradeoffs are complex, and
this is just one more factor to consider.
| > -----Original Message-----
| > From: David Chase [mailto:email@example.com]
| > Sent: Wednesday, August 21, 2002 4:06 PM
| > To: firstname.lastname@example.org
| > Subject: Re: [gclist] Finalizers & Reference counting.
| > At 01:26 PM 8/21/2002 -0400, Jerrold Leichter wrote:
| > >(Almost) all modern architectures provide cheap ways to increment or
| > >decrement and count atomically, and for the decrement case
| > determine whether
| > >the result was 0. The methods vary form interlocked
| > increment/decrement
| > >operations that set condition codes based on the result to
| > combining fetch-
| > >and-add operations to general load linked/store conditionals.
| > Is "modern architectures" code language for "not x86"?
| > LOCK-prefixed instructions there are not "cheap" in the
| > usual sense of the word, at least according to my understanding
| > and measurements (I generally care about locked CMPXCHG).
| > On a uniprocessor it is atomic if unlocked, but not on a
| > multiprocessor.
| > David Chase