GC performance - was Re: [gclist] GC topics
Gregory Morrisett
jgm@CS.Cornell.EDU
Wed, 21 Feb 1996 14:41:43 -0500
> I've also heard lots of talk about stack allocating objects when it can
> be proved legal. Does anyone know how much this helps in a Lisp-like
> langauge?
Lars Birkedal (CMU) and Mads Tofte (DIKU) built a compiler for
SML that allocates all values in a stack-like fashion. They
used a variant of ML-type inference combined with Gifford-style
effects inference to assign a "region" to an object. Regions
are allocated and deallocated in a stack-like fashion. The
trick is that the regions aren't constrained in size -- so
they're actually regions of heaps. (In some respects, their
assigning a generation to an object at compile time and using
a dynamically extensible, multi-generational collection
scheme -- it's just that they have a way to determine when
an entire generation is "unreachable" in one go.)
In their recent Princples of Programming Languages (PoPL '96)
paper, they combined the region inference with a size inference
algorithm. The result is that most regions do have a statically
determinable size and thus, these regions can be stack allocated
and deallocated in a normal fashion (i.e., you don't need stack
frames of arbitrary size). Of course, they have a different
way to deal with regions of arbitrary size. The results
in terms of total space used are pretty impressive when
compared to SML/NJ. The running times aren't that far off,
either, given that they don't do any major optimizations or
code transformations, when compared to SML/NJ.
However, you end up writing code in a funny fashion sometimes to get
proper space behavior. For example, it's often better to make copies
of values to ensure that they can be allocated/ deallocated propertly
(not unlike, say Postscript.) It's a fascinating paper and I urge all
to check it out.
Of course, Baker's "Cheney on the MTA" and the SML/NJ allocator
are two examples of this approach taken to an extreme --
simply stack allocate everything, including stack frames,
but never "return" from a procedure (i.e. convert to continuation-
passing style where every thing is a tail-call.) The cost,
of course, is that you never do any stack pops, so you use
a tracing collector to "compress" the stack frames.
When you step back and look at program evaluation abstractly,
you realize that all compilers and runtimes do a lot of integrated
memory management -- it's just that C only provides a stack-like
automatic allocator/deallocator. When you factor the costs of
stack manipulation into the running times of typical C code, you'll
see a fairly steep overhead compared to, say, typical Fortran
code.
-Greg Morrisett
jgm@cs.cornell.edu