GC performance - was Re: [gclist] GC topics
Darius Blasband
darius@phidani.be
Sat, 24 Feb 1996 15:02:54 +0100 (MET)
>
> It would also be helpful to see the sort of code that YAFL generates,
> before it is fed to a C compiler. For a language like Modula-3,
> when C is generated, it looks very similar to "what you would write"
> if the code were written in C instead of Modula-3.
>
> For instance, suppose you have a piece of Modula-3 that takes
> two vectors and computes the distance between them
>
> TYPE coord = OBJECT
> x : REF ARRAY OF REAL;
> ...
> END;
>
> PROCEDURE dist (a, b: coord) : REAL =
> VAR ax : REF ARRAY OF REAL := a.x;
> bx : REF ARRAY OF REAL := b.x;
> sum : REAL := 0.0;
> d1 : REAL;
> i : INTEGER;
> l : INTEGER = MIN(LAST(a.x), LAST(b.x)) - 1;
> BEGIN
> FOR i := 0 TO l DO
> d1 := ax[i] - bx[i];
> sum := sum + d1 * d1;
> END
> RETURN Real.Sqrt(sum);
> END dist;
The somehow comparable YAFL code would be:
IMPLEMENTATION MODULE Coord;
IMPORT SYSTEM;
IMPORT Math;
CLASS Coord;
VAR
x: ARRAY OF REAL; -- All YAFL arrays are similar to M3's REFS
METHOD GetX: ARRAY OF REAL;
BEGIN
RESULT := x;
END GetX;
END Coord;
--------------------------------------------
CLASS Other;
-- Since YAFL only supports classes the Dist method must be put
-- within a class for the sake of the example. Quite clearly,
-- the right way of doing this would have been to put this method
-- in the Coord class...
METHOD Dist (a, b: Coord): REAL;
VAR
i: INTEGER;
x, y: ARRAY OF REAL;
r: REAL;
BEGIN
ASSERT a <> VOID;
ASSERT b <> VOID;
x := a.GetX; -- Copying references, not duplicating the arrays
y := b.GetX;
ASSERT x <> VOID;
ASSERT y <> VOID;
FOR i := 0 TO SYSTEM.MIN (x.SIZE, y.SIZE) DO -- MIN is not predefined...
r := x[i] * y[i];
RESULT := RESULT + r*r;
END;
RESULT := Math.Sqrt (RESULT); -- No hidden goto -> no RETURN statement,
-- using RESULT as the result.
END Dist;
END Other;
END Coord;
Which generates the following C code (Sorry, it clear is far more
obfuscated than the C code generated by the corresponding M3 sample):
static double Y_12Dist YPARAMS6(obj_ptr ,Y_THIS,obj_ptr ,Y_a ,
obj_ptr ,Y_b ){
yint Y_i =(yint )0;
obj_ptr Y_x =(obj_ptr )0;
obj_ptr Y_y =(obj_ptr )0;
double Y_r =(double )0;
double Y_RESULT=(double )0;
obj_ptr **Y_18 ;
obj_ptr *Y_19 ;
SAVE_VSTACK(Y_19 ) /* Saving the stack pointers... */
SAVE_DSTACK(Y_18 )
CHECK_DUAL_OVFL(5,YNM_Coord ,YNC_CoordOther ,"Dist");
REG(&Y_x ); /* Registering whatever has to */
REG(&Y_y );
REG(&Y_THIS );
REG(&Y_a );
REG(&Y_b );
#ifdef YCHECK_VOID_OBJ
if(!Y_THIS)fail_void_obj(YNM_Coord ,YNC_CoordOther ,"Dist");
#endif
ENTER_METHOD(YNM_Coord ,YNC_CoordOther ,"Dist");
{
SET_LINE(26);
#ifdef YCHECK_ASSERT
if(!(Y_a !=(obj_ptr )0))fail_assert("Coord",26);
#endif
SET_LINE(27);
#ifdef YCHECK_ASSERT
if(!(Y_b !=(obj_ptr )0))fail_assert("Coord",27);
#endif
SET_LINE(28);ASSIGN_OBJ_PTR(Y_x ,Y_11GetX (Y_a )); /* expands to a simple */
SET_LINE(29);ASSIGN_OBJ_PTR(Y_y ,Y_11GetX (Y_b )); /* assignment */
SET_LINE(30);
#ifdef YCHECK_ASSERT
if(!(Y_x !=(obj_ptr )0))fail_assert("Coord",30);
#endif
SET_LINE(31);
#ifdef YCHECK_ASSERT
if(!(Y_y !=(obj_ptr )0))fail_assert("Coord",31);
#endif
SET_LINE(32);{{yint Y_20 ;
yint Y_21 ;
Y_21 =(yint )0;
/*
HIGH expands to something similar to what is found in M3 if in
production mode, or to a true function call which check for the
validity of the array if in debugging mode
*/
Y_20 =YG_MINSYSTEMSYSTEM (Y_SYSTEMSYSTEM ,HIGH(Y_x ),HIGH(Y_y ));
for(Y_i =Y_21 ;Y_i <=Y_20 ;Y_i ++){
{
/* ARRAY_ELEMENT works somehow similarly to HIGH (see above) */
SET_LINE(33);Y_r =(ARRAY_ELEMENT(Y_x ,double ,Y_i )*ARRAY_ELEMENT(
Y_y ,double ,Y_i ));
SET_LINE(34);Y_RESULT +=(Y_r *Y_r );
}}}
Y_i =0;
}
SET_LINE(36);Y_RESULT =YG_SqrtMathMath (Y_MathMath ,Y_RESULT );
}EXIT_METHOD; /* Method postlude, used if debugging is enabled */
RESTORE_DSTACK(Y_18 ) /* Restoring the stacks... */
RESTORE_VSTACK(Y_19 )
return Y_RESULT ; /* Returning the value... */
}
>
> I know this is not true for a precise collector, so you must be generating
> some different code. What is it, and is it as amenable to optimization?
> How does it interact with the underlying C compiler's use of tail
> call elimination? How do you cope with multi-threaded programs?
>
Please excuse me for not cleaning up any further the C code generated
by the YAFL compiler, but I hope you'll understand that C-level
readability has never been an issue to us.
There are a few optimization which are performed at the local level
(not applicable here) and some are applicable at the more global
level. In almost random order:
- At first sight, since the Dist method calls the Min and Sqrt methods,
it cannot be assumed that no memory allocation will be performed. Hence,
conservatively ( :-) ) all references to pointer-related variables
(objects and arrays) are pushed on the stack, using the REG macro.
- Even if it does not show here (absence is hard to demonstrate),
some local optimization has been performed by the compiler when
noticing that no allocation could take place within the loop, hence,
there is no need to save and restore the stack pointers for each
iteration.
- Then, the global optimizer can notice that the min and sqrt method
invocations are monomorphic, and the only version of these methods
which can be called here do not allocate anything, then all
the stuff to save and restored the stack pointers, including
registering all the locals and formals can be discarded. Then, you
get something amazingly similar to the M3-generated C-code, except
for the tail-call optimization which is not applicable as is to YAFL.
(Of course, further optimization might detect cases where it would
be possible, but....) The global optimization system would also
recognize the GetX method invocation as being monomorphic and trivial,
and it will be replaced by a direct assignment of the attribute of
the Coord instance it is being applied to. You can then get the best
of both worlds: true encapsulation, and the ability to redefine
the method in derived classes if you need to, and still, optimal
performance if you do not actually use these flexible features...
- Multi-threading is an open issue in YAFL, but I don't see it
(can prove wrong, of course) as a solutionless problem.
>
> (note)
>
> Of course, it is also possible for an optimizer (but not a programmer,
> assuming strictly conforming Ansi C, as if that mattered) to write
> the loop like this, which will thwart a conservative collector, too.
>
> float * la = a + l;
> int bma = b - a;
>
> for ( ; a <= la ; a++) {
> d1 = *a - *(a+bma);
> sum = sum + d1*d1;
> }
>
> Now, there's no pointer at all to b, not even an interior pointer.
> In practice, optimizers don't appear to do this. I've studied the
> problem, so has Hans, and the only time I ever saw anything like
> it happen was when I fed a Fortran compiler a particularly tortured
> piece of code (even by Fortran standards) contrived for exactly the
> purpose of making this sort of thing happen.
>
Even if I agree with the fact that this is a very unlikely situation,
it is also very typical of the kind of issues I did not want to get
involved with when developing the YAFL compiler. So far, it worked
with ALL the c compiler we tried, except sometimes for size issues,
but YAFL's GC works just the same under all flavours of Unix, Dos, NT,
Windows and MVS.
Regards...
Darius