GC performance - was Re: [gclist] GC topics

Gregory Morrisett jgm@CS.Cornell.EDU
Wed, 21 Feb 1996 14:41:43 -0500


> I've also heard lots of talk about stack allocating objects when it can
> be proved legal.  Does anyone know how much this helps in a Lisp-like
> langauge?

Lars Birkedal (CMU) and Mads Tofte (DIKU) built a compiler for
SML that allocates all values in a stack-like fashion.  They
used a variant of ML-type inference combined with Gifford-style
effects inference to assign a "region" to an object.  Regions
are allocated and deallocated in a stack-like fashion.  The
trick is that the regions aren't constrained in size -- so
they're actually regions of heaps.  (In some respects, their
assigning a generation to an object at compile time and using
a dynamically extensible, multi-generational collection 
scheme -- it's just that they have a way to determine when
an entire generation is "unreachable" in one go.)  

In their recent Princples of Programming Languages (PoPL '96)
paper, they combined the region inference with a size inference
algorithm.  The result is that most regions do have a statically
determinable size and thus, these regions can be stack allocated
and deallocated in a normal fashion (i.e., you don't need stack
frames of arbitrary size).  Of course, they have a different
way to deal with regions of arbitrary size.  The results
in terms of total space used are pretty impressive when
compared to SML/NJ.  The running times aren't that far off,
either, given that they don't do any major optimizations or
code transformations, when compared to SML/NJ.

However, you end up writing code in a funny fashion sometimes to get
proper space behavior.  For example, it's often better to make copies
of values to ensure that they can be allocated/ deallocated propertly
(not unlike, say Postscript.)  It's a fascinating paper and I urge all
to check it out.

Of course, Baker's "Cheney on the MTA" and the SML/NJ allocator
are two examples of this approach taken to an extreme -- 
simply stack allocate everything, including stack frames,
but never "return" from a procedure (i.e. convert to continuation-
passing style where every thing is a tail-call.)  The cost,
of course, is that you never do any stack pops, so you use
a tracing collector to "compress" the stack frames.  

When you step back and look at program evaluation abstractly,
you realize that all compilers and runtimes do a lot of integrated
memory management -- it's just that C only provides a stack-like
automatic allocator/deallocator.  When you factor the costs of
stack manipulation into the running times of typical C code, you'll 
see a fairly steep overhead compared to, say, typical Fortran
code.  

-Greg Morrisett
 jgm@cs.cornell.edu