GC performance - was Re: [gclist] GC topics

David Chase chase@centerline.com
Fri, 23 Feb 96 13:19:00 EST


> As I told you earlier, I do not have this limitation: I can use the most
> optimizing C compiler with YAFL, and then again, I believe this is not the
> proper way of answering the question: apparently, even you seem to admit
> that conservative GC *is* slower than precise GC, and given the liberty 
> to implement any of thoses systems in a brand new compiler, language
> or environment (so you optimizing compiler issue does not apply),
> precise GC has clear advantages.

It would also be helpful to see the sort of code that YAFL generates,
before it is fed to a C compiler.  For a language like Modula-3, 
when C is generated, it looks very similar to "what you would write" 
if the code were written in C instead of Modula-3.

For instance, suppose you have a piece of Modula-3 that takes
two vectors and computes the distance between them

  TYPE coord = OBJECT
     x : REF ARRAY OF REAL;
     ...
  END;

  PROCEDURE dist (a, b: coord) : REAL = 
    VAR  ax : REF ARRAY OF REAL := a.x;
         bx : REF ARRAY OF REAL := b.x;
        sum : REAL := 0.0;
        d1 : REAL;
        i : INTEGER;
        l : INTEGER = MIN(LAST(a.x), LAST(b.x)) - 1;
  BEGIN
    FOR i := 0 TO l DO
       d1 := ax[i] - bx[i];
       sum := sum + d1 * d1;
    END
    RETURN Real.Sqrt(sum);
  END dist;

This could be compiled to a piece of C that looks
something like this, applying a few obvious optimizations.

  float dist(coord * a, coord * b) {
    float * bx = a -> x.mem;
    float * ax = b -> x.mem;
    int l = min (a -> x.len, b -> x.len) - 1;
    int i;
    float sum = 0;
    float d1;

    /* Use some auto-increment addressing mode to
       reference the arrays. */
    for (i = 0; i <= l; i++) {
       d1 = *ax - *bx;
       ax++; bx++;
       sum = sum + d1 * d1;
    }
    
    return sqrt(sum); /* Should be tail-call optimized. */
  }

Do note that after the first iteration of the loop, there's no longer 
an obvious pointer to a, or b, or to either array (the registers 
ought to be reused, if the compiler is any good).  A conservative 
collector is still happy with this code -- I didn't have to do anything 
to protect my pointers, because it will still find them (but, see 
note).

I know this is not true for a precise collector, so you must be generating 
some different code.  What is it, and is it as amenable to optimization? 
How does it interact with the underlying C compiler's use of tail 
call elimination?  How do you cope with multi-threaded programs?
Or, if you have access to the web, wander over to

  http://www.centerline.com/people/chase/papers/thesis/chapter4.ps

It's a little dated (1987), but it's quite obvious that almost nobody 
has read it, and some of it is still relevant.  You can also see 
that my beliefs, then, were a good deal like yours are now.  I didn't
change my mind on a whim.

As to my "admission", if you wish to take the use of conservative 
engineering estimates as an "admission", then fine, but's that not 
what I said nor what I meant.  I was merely pointing out that GC 
performance is not the ultimate goal -- system performance (either 
throughput, latency, power consumptions, ROM footprint, whatever) 
is.  Functional languages, naively implemented, can do enough heap 
allocation that GC speed pretty much determines their performance, 
but this is not true of all languages or of all programming styles.
Use of a conservative collector lets you generate very straightforward
code, which typically gives you the best non-GC performance you
can hope to get.

(note)

Of course, it is also possible for an optimizer (but not a programmer, 
assuming strictly conforming Ansi C, as if that mattered) to write 
the loop like this, which will thwart a conservative collector, too.

   float * la = a + l;
   int bma = b - a;

   for ( ; a <= la ; a++) {
      d1 = *a - *(a+bma);
      sum = sum + d1*d1;
   }

Now, there's no pointer at all to b, not even an interior pointer. 
In practice, optimizers don't appear to do this.  I've studied the 
problem, so has Hans, and the only time I ever saw anything like 
it happen was when I fed a Fortran compiler a particularly tortured 
piece of code (even by Fortran standards) contrived for exactly the 
purpose of making this sort of thing happen. 

speaking for myself,

David Chase