[gclist] ref-counting performance cost

Bob Kerns Bob.Kerns@brightware.com
Wed, 11 Oct 2000 10:34:29 -0700

The use of reference counting on C++ strings was a dubious use of the
technique, and certainly a silly use now that serious investigations of its
performance have been published.

But a spurious application of a technique doesn't invalidate the technique.
For example, if space is absolutely more critical than time, or if the cost
of a deep copy involves a lot of network traffic, etc. etc.

This supports your point about algorithms being interesting based on their
characteristics. But I'd also suggest that it matters whether those
characteristics have an application. This thing called "pure research" is
valuable because we don't always know what applications may exist at the
time the algorithms are being investigated.

-----Original Message-----
From: Boehm, Hans [mailto:hans_boehm@hp.com]
Sent: Wednesday, October 11, 2000 9:54 AM
To: Bob Kerns; 'Andrew Shi-hwa Chen'; Bill Birch
Cc: gclist@iecc.com
Subject: %%: RE: Re: [gclist] ref-counting performance cost 

> -----Original Message-----
> From: Bob Kerns [mailto:Bob.Kerns@brightware.com]
> Perhaps your technique has relevance to non-traditional GC 
> environments,
> such as distributed GC, or other non-flat memory models, 
> where the cost
> tradeoffs are distinctly different?
Or embedded systems, where the space overhead of a tracing collector may not
be acceptable?

It seems to me that whether or not an improved reference counting algorithm
is interesting depends entirely on the characteristics of the algorithm.  If
you can reclaim cycles immediately without increasing the cost of pointer
assignments by more than a constant factor over a traditional reference
counter, I would find that astonishing, and almost certainly worthy of
publication, probably even as a purely theoretical result.  (In fact, I
would expect that to be provably impossible, though I don't recall such a
proof.  So the real question becomes what you lose in the process.)
> Anyway, as to your last point: this technique is not exactly 
> rare. It's
> often later optimized with a reference count (to avoid the copy) and
> copy-on-write (to preserve the un-shared semantics).
Unfortunately, this brings back memories of the C++ "string" class, in my
opinion a low point of the C++ standard.

The C++ string class ("basic_string" really) went this route.  Then the
standards committee determined that the existing reference counted
copy-on-write implementations actually didn't satisfy any reasonable
specifications, and there was no way to fix the implementations.  So they
attempted to legitimize them. (See 21.3, paragraphs 5 and 6.)  The result is
just about incomprehensible.  (If s is a string, the expression s[0] ==
s[1], the most natural way to compare the first two characters, still
appears to be illegal in some contexts, though not others.  Last I looked at
it more thoroughly, I convinced myself that the spec is also wrong, and the
traditional reference counted implementations are still broken.  But I
wouldn't bet on that one way or the other.)

Things get much worse if you add in threads ...

The SGI implementation ended up going back to deep copy (for "string", not
"rope").  I'm unconvinced that any other implementation of the string
interface is usable in the presence of threads.

The bottom line is:  If you choose to go here, tread VERY carefully.