[gclist] Fast-allocating non-copying GC's
David Chase
chase@centerline.com
Fri, 15 Mar 96 09:46:35 EST
> From: David.Ungar@Eng.Sun.COM (David Ungar)
> Dave's note about ST systems having long-running images as a difference
> w.r.t. C and Modula programs is quite true!
> I think that PPS Smalltalk can trace its image back to 1976!
> But seriously folks, ST'ers run for hours without doing a global
> collection/compaction sometimes.
Well, the (lack of?) timing of memory movement has a lot to do with my
arguments against it. Keeping in mind that I'm thinking about mostly
statically typed procedural languages (Modula-3, Eiffel, Java), one
would like to be able to optimize them as well, or perhaps better,
than C. The low-level stuff appears to matter (for reference, see any
of the last N PLDI conferences, where people present incremental frobs
on the latest low-level analysis/optimization, and consider my own experience
working on a back end, in which "good performance" is reached only
at the end of a long, long march).
IF memory movement can occur at any time, then these low-level optimizations
are basically not possible (*). On the other hand, if movement can be
confined to certain locations (calls to procedures with a certain
signature -- you could quite easily piggy-back it on top of the
exception-handling signatures for Modula-3 or Java, for instance),
then optimization life might be wonderful again.
(*) In theory, the compiler and GC can cooperate very thoroughly
to make this happen, which is what the people at U Mass tried to
do in their Modula-3 compiler. See also the message of 29 Feb 1996
from Antony Hosking regarding their efforts to resurrect this
compiler.
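
Concretely, here is a minimal sketch (in C, with made-up gc_pending and
gc_collect hooks rather than anyone's real interface) of what confining
movement to known call sites buys the optimizer: between safe points
nothing can move, so pointers and derived addresses can live in registers.

    #include <stddef.h>

    extern volatile int gc_pending;   /* set asynchronously when a collection is wanted */
    extern void gc_collect(void);     /* the only routine that may move objects */

    /* Calls with the "may-GC" signature funnel through here; everywhere
       else the compiler may assume addresses are stable. */
    static void gc_safepoint(void)
    {
        if (gc_pending)
            gc_collect();
    }

    double sum(double *a, size_t n)
    {
        double s = 0.0;
        size_t i;
        /* No safe point inside the loop, so 'a' and any derived pointer
           (a + i) can sit in registers; the loop can be strength-reduced
           and unrolled exactly as it would be for C. */
        for (i = 0; i != n; i++)
            s += a[i];
        gc_safepoint();               /* objects may move only here */
        return s;
    }
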
> BTW, I tried a "customizing allocator" (aka multiple free lists)
> in the first version of Berkeley Smalltalk.
> My experience with it was so different from Paul and David Chase's that
> I think context (e.g. St vs. C, interactive vs. real-time)
> must play a very important role.
This is weird. I have gone out and measured this myself, and friends
of mine have measured it as well in other applications, and what
we've always discovered is that something like 90% of the objects
allocated fall into two or three (smallish-object-size) lists.
This supports your assertion that header size matters, because an
8-byte header is a big deal if the user only requests 16 bytes of
memory for data. (And this is why I like to use BIBOP whenever
possible, and why I think that the language's type system and
GC headers should be integrated whenever possible.)
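
For what it's worth, here is roughly the shape of the small-object side
I mean (a sketch only; GRAIN, NCLASSES, and refill are made-up names,
and error handling is mostly omitted). The point is that the two or
three popular size classes are served by a pop off a per-class list,
and in a BIBOP-style layout the size class can be recovered from the
page an object lives on rather than from a fat per-object header.

    #include <stddef.h>
    #include <stdlib.h>

    #define GRAIN    8                   /* size classes in 8-byte steps */
    #define NCLASSES 32                  /* small = up to a couple hundred bytes */

    struct free_node { struct free_node *next; };

    static struct free_node *freelist[NCLASSES];

    /* Carve a fresh chunk into objects of one class and chain them. */
    static void refill(size_t c)
    {
        size_t sz = c * GRAIN;
        char *chunk = malloc(64 * sz);
        size_t i;
        if (chunk == NULL)
            abort();                     /* sketch: no graceful failure */
        for (i = 0; i < 64; i++) {
            struct free_node *n = (struct free_node *)(chunk + i * sz);
            n->next = freelist[c];
            freelist[c] = n;
        }
    }

    void *small_alloc(size_t bytes)
    {
        size_t c = (bytes + GRAIN - 1) / GRAIN;
        struct free_node *n;
        if (c == 0)
            c = 1;
        if (c >= NCLASSES)
            return malloc(bytes);        /* large objects handled elsewhere */
        if (freelist[c] == NULL)
            refill(c);
        n = freelist[c];
        freelist[c] = n->next;
        return n;
    }
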
One thing to note is that there is a breakpoint in the freelist policy
of the allocators that I've played with. Based on nothing but a rough
rule of thumb (plus, it worked better when I was done, so I kept
the result -- nothing like Monte Carlo methodology, eh?), in recent
years I've always had a multi-freelist policy for small objects,
and an address-ordered freelist for large objects. But, again, this
is for malloc/free-style memory allocation. On the other hand, the Boehm-Weiser
collector looks pretty much like this, too.
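
And here is the corresponding large-object side, in the same sketchy
spirit (the struct and names are illustrative; a real version would
also coalesce with the predecessor, split oversized blocks, and grow
the arena on failure): one list kept sorted by address, so adjacent
frees can be merged and first-fit tends to pack allocations toward
low addresses.

    #include <stddef.h>

    struct big_free {
        size_t size;                   /* bytes in this free block */
        struct big_free *next;         /* next block, at a higher address */
    };

    static struct big_free *big_list;  /* kept sorted by increasing address */

    void big_release(struct big_free *blk)
    {
        struct big_free **pp = &big_list;
        while (*pp != NULL && *pp < blk)       /* find address-ordered slot */
            pp = &(*pp)->next;
        blk->next = *pp;
        *pp = blk;
        /* Merge with the successor if the two blocks are adjacent. */
        if (blk->next != NULL &&
            (char *)blk + blk->size == (char *)blk->next) {
            blk->size += blk->next->size;
            blk->next  = blk->next->next;
        }
    }

    void *big_alloc(size_t bytes)
    {
        struct big_free **pp;
        for (pp = &big_list; *pp != NULL; pp = &(*pp)->next) {
            if ((*pp)->size >= bytes) {        /* first fit, lowest address */
                struct big_free *blk = *pp;
                *pp = blk->next;
                return blk;                    /* no splitting in this sketch */
            }
        }
        return NULL;                           /* caller must grow the arena */
    }
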
The other major difference that I see (besides long-lived workspaces),
for both Smalltalk and Lisp, is that activation records are often
allocated on the heap. Do their sizes vary that much? (Should I
go drag out my Smalltalk implementation experience book to refresh
my memory?) I would think that their high traffic and usually-LIFO
behavior would justify use of a special region with a special policy.
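
What I have in mind by "a special region with a special policy" is
something like the mark/release region below (a sketch; the names and
the fixed 1 MB size are made up): allocation is a bump of a pointer,
and the usually-LIFO behavior means a procedure return is a single
store. Frames that escape the LIFO discipline would, of course, have
to be moved out to the general heap.

    #include <stddef.h>
    #include <stdlib.h>

    #define REGION_BYTES (1 << 20)      /* one fixed-size region for frames */

    static char  *region;               /* the contiguous chunk */
    static size_t region_top;           /* bump pointer */

    void *frame_alloc(size_t bytes)
    {
        void *p;
        if (region == NULL)
            region = malloc(REGION_BYTES);
        bytes = (bytes + 7) & ~(size_t)7;           /* 8-byte alignment */
        if (region == NULL || region_top + bytes > REGION_BYTES)
            return NULL;                /* overflow: punt to the general heap */
        p = region + region_top;
        region_top += bytes;
        return p;
    }

    /* A call records the mark; the matching return releases back to it. */
    size_t frame_mark(void)           { return region_top; }
    void   frame_release(size_t mark) { region_top = mark; }
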
Note, by the way, that most of the heuristics and optimizations are
chasing these same short-lived objects -- they're the easiest ones
to discover with compile-time analysis, they're most profitably collected
by generational collection, and I suspect that if you used the 1-bit
reference-count hack, it would pick up many of them as well.
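
For reference, the 1-bit hack I mean is roughly this (sketch only; the
header layout and the share/drop names are made up): one sticky
"shared" bit in the header is set the first time a second reference is
created, and an object whose bit is still clear when its reference dies
can be reclaimed on the spot, leaving only the shared minority for the
tracing collector.

    #include <stdlib.h>

    struct obj {
        unsigned shared : 1;    /* 0 = at most one reference has ever existed */
        /* ... object data ... */
    };

    /* Called wherever a reference is duplicated (stored a second time,
       passed and also kept, etc.).  The bit is sticky; never cleared. */
    static struct obj *share(struct obj *o)
    {
        if (o != NULL)
            o->shared = 1;
        return o;
    }

    /* Called when a reference dies (variable goes out of scope, slot is
       overwritten).  Uniquely-referenced objects are reclaimed at once;
       anything marked shared is left for the collector to find later. */
    static void drop(struct obj *o)
    {
        if (o != NULL && !o->shared)
            free(o);
    }
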
On another note, the GC FAQ has sat dormant for all of this week.
Sorry for the hiatus.
David Chase