[gclist] Re: Name that hypothesis

David Chase chase@centerline.com
Wed, 04 Dec 96 14:20:00 EST


> From: boehm@hoh.mti.sgi.com (Hans Boehm)

> Particular languages seem to encourage very specific programming styles, making
> it hard to generalize to anything else.  For example, Manuel Serrano concluded
> that ML programs (perhaps especially ML benchmarks), benefit much more from
> generational collection than even Scheme programs.

I've wondered for some time now how "my favorite data structures" 
would fare under different garbage collectors.  MFDS are (usually) 
a generic thing-to-small-integer map (implemented using a hash table), 
a growable-array (some of multiple and/or arbitrary dimension), and 
(at times) splay trees.  The style of code that results often looks 
a lot like Fortran, and I pine for typed integers in the languages 
I usually use (*), but I've had good luck with code reuse (the original 
thing-to-integer map was written in 1986, and is still in use). 

(*) This generic map was originally written as part of an implementation 
of "An Improvement to Bottom-Up Tree Pattern Matching", appearing 
in POPL 1987.  In that code, strings, trees, and bit vectors are 
all hashed to integers, which are logically quite distinct, but completely 
interchangeable from the point of view of the C type system.  The 
use of small integers led to substantial (~10x) performance improvements 
when building sets and precalculating the results of tree-building 
operations. 

However, if you consider distributions that result from this style, 
you can see that I end up with a few large, frequently modified objects 
(the arrays and hash tables) and many small rarely modified objects 
(the "actual" data being manipulated, when it is not in fact itself 
small integers).  The consequences of not quickly recycling the arrays 
when they grow (in this application) are pretty bad -- they are large, 
and they grow (that is, a growing array will have a new, different, 
size), and they consume the bulk of the space.  Furthermore, because
the frequently modified objects are large, they will be (relatively)
costly to scan for pointers-to-younger when a generational collection
takes place.

Splay trees, of course, are modified at almost every access.

David Chase