[gclist] Fast-allocating non-copying GC's

Paul R. Wilson wilson@cs.utexas.edu
Thu, 14 Mar 1996 08:57:22 -0600


>From David.Ungar@Eng.Sun.COM Wed Mar 13 17:38:41 1996
>Subject: Re: [gclist] Fast-allocating non-copying GC's
>Cc: wilson@cs.utexas.edu
>
>Paul,
>
>You seem to be arguing that if a system exhibits a high
>object allocation rate, it must be "broken" and therefore
>it would be better to redesign it than to worry too much about
>allocation efficiency.

No, I was just arguing that systems with high object allocation
rates often *are* in fact broken in that respect.  This doesn't
mean that there aren't fast-allocating programs.  It just means
that their importance is overrated.  How overrated?  It depends.

I'll even admit to a certain inconsistency.  If you look at my allocator
survey, you'll notice that one of the most important points is that
you can reduce the overhead of Knuth's boundary tag scheme (for
supporting general coalescing) from two words to one.  Standish
came up with this hack in 1978---and nobody noticed.  For over
15 years after that, most allocators didn't support general coalescing
or took a needless space hit of a word per object.  That probably amounts
to billions of dollars' worth of wasted memory.
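
To make Standish's trick concrete, here's a quick sketch in C (the
field names and helpers are mine, not his, and a real allocator needs
free lists and heap-end sentinels on top of this).  Only *free*
blocks carry a size footer, and a flag bit in each header records
whether the block to its left is free, so you still get constant-time
coalescing in both directions while live blocks pay just one word:

    #include <stddef.h>

    /* One-word boundary tags: a header holds the block's size plus
     * two flag bits.  Only FREE blocks also store their size in
     * their last word (the footer); the PREV_FREE bit in the next
     * block's header says that footer is valid. */
    #define THIS_FREE 0x1u
    #define PREV_FREE 0x2u
    #define FLAGS     0x3u

    typedef struct { size_t tag; } header_t;

    static size_t size_of(header_t *h) { return h->tag & ~(size_t)FLAGS; }
    static header_t *right_neighbor(header_t *h) {
        return (header_t *)((char *)h + size_of(h));
    }
    static size_t *footer_of(header_t *h) {      /* valid only if free */
        return (size_t *)right_neighbor(h) - 1;
    }

    /* Free a block, coalescing with both neighbors in O(1).
     * (Free-list unlinking is omitted.) */
    header_t *free_and_coalesce(header_t *h) {
        size_t sz = size_of(h);
        header_t *right = right_neighbor(h);
        if (right->tag & THIS_FREE)               /* absorb right */
            sz += size_of(right);
        if (h->tag & PREV_FREE) {                 /* absorb left  */
            size_t left_sz = *((size_t *)h - 1);  /* its footer   */
            h = (header_t *)((char *)h - left_sz);
            sz += left_sz;
        }
        h->tag = sz | THIS_FREE;     /* both neighbors now live   */
        *footer_of(h) = sz;          /* one footer for the whole  */
        right_neighbor(h)->tag |= PREV_FREE;
        return h;
    }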

So why don't I think a word per object is crucial for GC's?  Well, I
do, in a sense---your point is valid---but there are a lot of other
issues.  For example, some people aren't going to use sliding compaction
because it's slow.  (How slow?  Maybe not as slow as they think.
Maybe that should be in the FAQ.)

Others aren't going to use sliding compaction because they have more data
than will fit in RAM, and sliding compaction makes two (or more) passes
over the data, which is bad news locality-wise.  (How bad?  I don't know.)
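
To make that cost concrete, here's a rough sketch of a classic
LISP-2-style sliding compactor (the runtime hooks are assumptions,
not any real collector's API).  After marking, it sweeps the heap
three more times, and with more data than RAM each sweep re-pages
the whole heap:

    #include <string.h>
    #include <stddef.h>

    typedef struct obj obj_t;
    /* Assumed runtime hooks: heap iteration in address order, mark
     * bits, object sizes, a forwarding-address slot per object, and
     * a walker over an object's pointer fields. */
    extern obj_t *heap_first(void), *heap_next(obj_t *o);
    extern int    is_marked(obj_t *o);
    extern size_t obj_size(obj_t *o);
    extern void   set_forward(obj_t *o, char *to);
    extern char  *get_forward(obj_t *o);
    extern void   for_each_pointer(obj_t *o, void (*fn)(obj_t **slot));
    extern char  *heap_base;

    static void fixup(obj_t **slot) {
        if (*slot) *slot = (obj_t *)get_forward(*slot);
    }

    void slide_compact(void) {
        /* Pass 1: assign each live object its slid-down address. */
        char *to = heap_base;
        for (obj_t *o = heap_first(); o; o = heap_next(o))
            if (is_marked(o)) { set_forward(o, to); to += obj_size(o); }

        /* Pass 2: rewrite every pointer field (roots get the same
         * treatment elsewhere) to the new addresses. */
        for (obj_t *o = heap_first(); o; o = heap_next(o))
            if (is_marked(o)) for_each_pointer(o, fixup);

        /* Pass 3: slide objects down.  Moving in address order
         * means no destination overwrites an unmoved source. */
        for (obj_t *o = heap_first(); o; ) {
            obj_t *next = heap_next(o);   /* grab before o moves */
            if (is_marked(o)) memmove(get_forward(o), o, obj_size(o));
            o = next;
        }
    }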

Others aren't going to use sliding compaction because it only makes sense
for the oldest generation that fits in RAM.  (So far as I can tell.) 
The extra CPU and locality costs aren't worth it for younger generations.
Maybe those people should implement a hybrid (as you have), but some won't
bother, so they'll pick something else.

A lot of the people who won't implement a hybrid are implementors of weird
languages without state-of-the-art compilers.  A 10% difference in memory
usage in the oldest generation is not their biggest concern.  Many of
them don't have a generational GC yet.  Some of them are using reference
counting.  They've got bigger problems.  Some of them have crummy
compilers and bytecode interpreters, and a few weeks of their time are
better spent working on code generation than on garbage collection.

Some people who do have high-performance systems are more worried about
interoperability than a 10% difference in memory usage.  They want to
support multiple heaps for different languages, with pointers across
those heaps.  They may need to support conservative scanning of thread
stacks.  These things are much easier to do cleanly and efficiently in a
nonmoving collector.  (Maybe we should put a sliding compactor in our
oldest generation, but make it "mostly sliding" like Bartlett's "mostly
copying" collector.  If you'd like to implement the code, we'd be very
happy to roll it into our GC.  Please make it hard-real-time incremental.)
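
For anyone who hasn't seen it spelled out, conservative stack
scanning boils down to roughly the following sketch (the names are
invented, and real collectors filter candidate pointers much more
carefully).  Any stack word that could be a heap pointer marks, and
pins, the object it might refer to; since a stray integer can
"point" anywhere, pinned objects must never move, which is why a
nonmoving collector gets this almost for free:

    #include <setjmp.h>

    extern char *heap_lo, *heap_hi;         /* heap address range */
    extern int   plausible_object(char *p); /* aligned, live page */
    extern void  mark_and_push(char *p);

    /* Treat every word in [lo, hi) that could be a heap pointer
     * as a root.  False positives just retain extra objects. */
    static void scan_conservatively(void **lo, void **hi) {
        for (void **p = lo; p < hi; p++) {
            char *v = (char *)*p;
            if (v >= heap_lo && v < heap_hi && plausible_object(v))
                mark_and_push(v);
        }
    }

    /* Scan the current thread's stack, assuming it grows downward
     * from stack_base (recorded at thread start).  setjmp is a
     * common, platform-dependent trick to spill callee-saved
     * registers onto the stack so they get scanned too. */
    void scan_my_stack(void **stack_base) {
        jmp_buf regs;
        (void)setjmp(regs);
        void *top = &regs;
        scan_conservatively(&top, stack_base);
    }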


So basically, I agree with you that 10% matters.  It's just that many of
us have bigger fish to fry for the time being, so other factors matter
too.

For my own purposes, it's just not the most critical thing.  Our GC supports
C++ and RScheme.  For C++, we have to deal with the crummy pointer model,
which is sad.  For RScheme, we're not trying to build a compiler with
maximum performance---just good performance with extreme portability
and interoperability and extensibility.  These are different goals than
your goals for Self.   (Or at least, the goals are prioritized differently.)
(Maybe this is the real issue---we should just argue about whether everyone
should use Self for everything, starting now.)

I also have a hidden agenda that maybe I should make explicit.  We're working
on compressed virtual memory techniques that will make some of the cost of
the two extra words go away in the long term.

Those pointers are easy to compress away for paging purposes, and to some
degree for caching purposes.  This gives the effect of cdr-coding the
GC's own data structures, more or less.  It also does this in a general
way for the users' data structures, giving you a cleaner system structure.
Rather than wienering on the representation of each program's (or runtime
system's) data structures to "compress" the overhead out by hand, we want
to do something that's more general and easier to comprehend.
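
To give a feel for why this works (a toy model, not our actual
compressor): the pointers on a page tend to share their high bits
with other recently seen words, so a small dictionary plus a two-bit
tag per word captures most of them.  Just counting output bits is
enough to show the effect:

    #include <stdint.h>
    #include <stddef.h>

    /* Toy word-level compression model.  Each 32-bit word costs:
     *   2 bits           if it is zero,
     *   2 + 4 bits       if it exactly matches a dictionary entry
     *                    (tag plus 4-bit dictionary index),
     *   2 + 4 + 10 bits  if its high 22 bits match one (pointers
     *                    into one region differ only in low bits),
     *   2 + 32 bits      on a miss (the word is sent verbatim).
     * A real compressor emits the tags and payloads; we just count. */
    #define DICT 16
    #define HIGH(w) ((w) >> 10)

    size_t compressed_bits(const uint32_t *page, size_t nwords) {
        uint32_t dict[DICT] = {0};
        size_t bits = 0;
        for (size_t i = 0; i < nwords; i++) {
            uint32_t w = page[i];
            unsigned slot = (unsigned)(HIGH(w) % DICT);
            if (w == 0)                           bits += 2;
            else if (dict[slot] == w)             bits += 2 + 4;
            else if (HIGH(dict[slot]) == HIGH(w)) bits += 2 + 4 + 10;
            else                                  bits += 2 + 32;
            if (w) dict[slot] = w;   /* keep recent words around */
        }
        return bits;
    }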