[gclist] Fast-allocating non-copying GC's

David Chase chase@centerline.com
Fri, 15 Mar 96 09:46:35 EST


> From: David.Ungar@Eng.Sun.COM (David Ungar)
> Dave's note about ST systems having long-running images as a difference
> w.r.t. C and Modula programs is quite true!
> I think that PPS Smalltalk can trace its image back to 1976!
> But seriously folks, ST'ers run for hours without doing a global
> collection/compaction sometimes.

Well, the (lack of?) timing of memory movement has a lot to do with my
arguments against it.  Keeping in mind that I'm thinking about mostly
statically typed procedural languages (Modula-3, Eiffel, Java), one
would like to be able to optimize them as well as, or perhaps better 
than, C.  The low-level stuff appears to matter (for evidence, see any 
of the last N PLDI conferences, where people present incremental frobs 
on the latest low-level analysis/optimization, and my own experience 
working on a back-end, in which "good performance" was reached only 
at the end of a long, long march).

IF memory movement can occur at any time, then these low-level optimizations 
are basically not possible (*).  On the other hand, if they can be 
confined to certain locations (calls to procedures with a certain 
signature -- you could quite easily piggy-back it on top of the 
exception-handling signatures for Modula-3 or Java, for instance), 
then optimization life might be wonderful again. 

(*) In theory, the compiler and GC can cooperate very thoroughly 
to make this happen, which is what the people at U Mass tried to 
do in their Modula-3 compiler.  See also the message of 29 Feb 1996
from Antony Hosking regarding their efforts to resurrect this
compiler.
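
To make that point concrete, here is a rough sketch (plain C, with
invented names, not any particular compiler's output) of what the
back-end gets to do if objects may move only at call sites: a derived
pointer can live in a register across straight-line code, and only
needs rederiving after a call.

    struct Node { double vals[16]; };

    void log_progress(double x);        /* stands in for any call site         */

    double sum_vals(struct Node *base, int n) {
        double *p = &base->vals[0];     /* derived pointer, kept in a register */
        double  s = 0.0;
        for (int i = 0; i < n; i++)     /* no calls here, so p stays valid     */
            s += p[i];
        log_progress(s);                /* a call: the object may move here... */
        p = &base->vals[0];             /* ...so rederive p from the base,     */
        return s + p[0];                /* which the GC will have updated      */
    }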

> BTW, I tried a "customizing allocator" (aka multiple free lists)
> in the first version of Berkeley Smalltalk.
> My experience with it was so different from Paul and David Chase's that
> I think context (e.g. St vs. C, interactive vs. real-time)
> must play a very important role.

This is weird.  I have gone out and measured this myself, and friends 
of mine have measured it as well in other applications, and what 
we've always discovered is that something like 90% of the objects 
allocated fall into two or three (smallish-object-size) lists. 
This supports your assertion that header size matters, because an
8-byte header is a big deal if the user only requests 16 bytes of
memory for data.  (And this is why I like to use BIBOP whenever
possible, and why I think that the language's type system and
GC headers should be integrated whenever possible.)
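
Since I mentioned BIBOP ("big bag of pages"), here is a minimal sketch
of the idea, with an assumed 4K page size and invented names: the
allocator keeps one descriptor per page, so an object's size (or type)
is a table lookup on its address rather than a header word in front of
every object.

    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12                  /* assume 4K pages    */
    #define NPAGES     (1u << 20)          /* covers a 4GB heap  */

    /* One descriptor per page: every object on a page has the same
     * size class, so no per-object header is needed to find it.   */
    static uint16_t page_size_class[NPAGES];

    size_t object_size(void *obj) {
        uintptr_t page = (uintptr_t)obj >> PAGE_SHIFT;
        return page_size_class[page];
    }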

One thing to note is that there is a break point in the freelist policy 
of the allocators that I've played with.  Based on nothing but a random 
rule of thumb (plus, it worked better when I was done, so I kept 
the result -- nothing like Monte Carlo methodology, eh?), in recent 
years I've always had a multi-freelist policy for small objects, 
and an address-ordered freelist for large objects.  But, again, this 
is for malloc/free memory allocation.  On the other hand, the Boehm-Weiser 
collector looks pretty much like this, too. 
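
A hedged sketch of that break point (the 256-byte threshold, the
rounding grain, and all the names here are my own guesses for
illustration, not any particular allocator's numbers):

    #include <stddef.h>

    #define SMALL_MAX 256          /* assumed small/large break point   */
    #define GRAIN       8          /* small sizes rounded up to 8 bytes */

    typedef struct Block { struct Block *next; size_t size; } Block;

    static Block *small_lists[SMALL_MAX / GRAIN + 1]; /* one list per size class */
    static Block *large_list;                         /* kept sorted by address  */

    void release(Block *b) {
        if (b->size <= SMALL_MAX) {
            /* small: push onto the exact-size freelist, O(1) */
            size_t i = (b->size + GRAIN - 1) / GRAIN;
            b->next = small_lists[i];
            small_lists[i] = b;
        } else {
            /* large: insert in address order, so neighboring free
             * blocks can be found and coalesced cheaply           */
            Block **p = &large_list;
            while (*p && *p < b)
                p = &(*p)->next;
            b->next = *p;
            *p = b;
        }
    }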

The other major difference that I see (besides long-lived workspaces), 
for both Smalltalk and Lisp, is that activation records are often 
allocated on the heap.  Do their sizes vary that much?  (Should I 
go drag out my Smalltalk implementation experience book to refresh 
my memory?)  I would think that their high traffic and usually-LIFO 
behavior would justify use of a special region with a special policy.  
Note, by the way, that most of the heuristics and optimizations are 
chasing these same short-lived objects -- they're the easiest ones 
to discover with compile-time analysis, they're most profitably collected 
by generational collection, and I suspect that if you used the 1-bit 
reference-count hack, it would pick up many of them as well. 
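
For the record, a sketch of what I mean by the 1-bit reference-count
hack (the field and function names are invented, and this is only the
skeleton of the idea): each object carries a single "shared" bit
rather than a full count, so a uniquely-referenced object can be
reclaimed the instant its one reference dies, and anything that ever
gets copied is simply left for the tracing collector.

    #include <stdlib.h>

    typedef struct Obj {
        unsigned shared;    /* 0 = exactly one reference, 1 = possibly many */
        /* ... payload ... */
    } Obj;

    /* Copying a pointer marks the object as shared, permanently. */
    Obj *copy_ref(Obj *o) {
        o->shared = 1;
        return o;
    }

    /* Dropping a reference: uniquely-referenced objects die at once;
     * shared ones wait for the next real collection.                */
    void drop_ref(Obj *o) {
        if (!o->shared)
            free(o);
    }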

On another note, the GC FAQ has sat dormant for all of this week.  
Sorry for the hiatus.

David Chase