[gclist] ref-counting performance cost

Jeff Sturm jsturm1@home.com
Fri, 15 Sep 2000 01:28:06 -0400

Kragen Sitaker wrote:
> I asserted that reference-counting is almost always much slower than
> well-implemented mark-and-sweep or copying collectors; he requested
> quantitative evidence.  Real quantitative evidence requires
> measurements of real programs, and doing those is a lot of work.  Has
> someone done them?

I haven't seen much recently.  Since the rules have changed a bit on
modern hardware, I resurrected some experiments I performed earlier this
year, and gathered a few more stats out of plain curiousity.

I started with a language that is (in)famous for reference counting,
Tcl/Tk.  The core of the interpreter is implemented in C using macros
for increment/decrement.
I modified the interpreter core to use the BDW conservative collector,
removing the refcount field from certain structs and changing the macros
to a no-op.  I also omitted the recursive tree deletion routines and
applied atomic (pointer-free) allocation for character arrays.

Immediately I observed a 10-30% runtime improvement in various moderate
to complex scripts.  I ran some detailed performance stats on one
benchmark, a sieve of eratosthenes in Tcl, using the hardware
performance counters of the Alpha CPU:

              Instruction Counts (in millions)

             cycles  issues  loads  stores
original        733     313     51      29
GC tclsh        548     226     36      25

Nothing too surprising.  All counts are down because the code is smaller
and less complex than the original refcount interpreter.  Although the
mark phase of GC is expensive relative to explicit deallocation, it may
not occur often enough to be statistically significant.

I was more interested in cache performance due to the bloated heap that
is characteristic of GC (I ran the GC test with the heap fixed at 16MB):

                    Cache Hits/Accesses

           instruction (16KB)  data (16KB)  external (1MB)
original            86%             90%           83%
GC tclsh            89%             93%           85%

Even though reference counting increases the chance that newly allocated
pages will be resident in cache (in theory, at least) I haven't been
very successful in demonstrating that.  (However to be fair I also
haven't considered the efficiency of the standard Tcl allocator.  The
free lists used by default could interfere with cache performance, among
other things.)

About the only code that really benefitted from reference counting is
Tcl's list operators, which often check for (refcount == 1) to avoid
performing a copy.  There is no simple way to do object copy-on-write
with a GC as far as I can tell.