[gclist] Re: Name that hypothesis
David Chase
chase@centerline.com
Wed, 04 Dec 96 14:20:00 EST
> From: boehm@hoh.mti.sgi.com (Hans Boehm)
> Particular languages seem to encourage very specific programming styles, making
> it hard to generalize to anything else. For example, Manuel Serrano concluded
> that ML programs (perhaps especially ML benchmarks), benefit much more from
> generational collection than even Scheme programs.
I've wondered for some time now how "my favorite data structures"
would fare under different garbage collectors. MFDS are (usually)
a generic thing-to-small-integer map (implemented using a hash table),
a growable-array (some of multiple and/or arbitrary dimension), and
(at times) splay trees. The style of code that results often looks
a lot like Fortran, and I pine for typed integers in the languages
I usually use (*), but I've had good luck with code reuse (the original
thing-to-integer map was written in 1986, and is still in use).
(*) This generic map was originally written as part of an implementation
of "An Improvement to Bottom-Up Tree Pattern Matching", appearing
in POPL 1987. In that code, strings, trees, and bit vectors are
all hashed to integers, which are logically quite distinct, but completely
interchangeable from the point of view of the C type system. The
use of small integers led to substantial (~10x) performance improvements
when building sets and precalculating the results of tree-building
operations.
However, if you consider distributions that result from this style,
you can see that I end up with a few large, frequently modified objects
(the arrays and hash tables) and many small rarely modified objects
(the "actual" data being manipulated, when it is not in fact itself
small integers). The consequences of not quickly recycling the arrays
when they grow (in this application) are pretty bad -- they are large,
and they grow (that is, a growing array will have a new, different,
size), and they consume the bulk of the space. Furthermore, because
the frequently modified objects are large, they will be (relatively)
costly to scan for pointers-to-younger when a generational collection
takes place.
Splay trees, of course, are modified at almost every access.
David Chase