[gclist] Fast-allocating non-copying GC's

Paul R. Wilson wilson@cs.utexas.edu
Wed, 13 Mar 1996 14:34:36 -0600


>From majordom@iecc.com Wed Mar 13 12:30:50 1996
>From: David.Ungar@Eng.Sun.COM (David Ungar)
>Subject: Re: [gclist] Fast-allocating non-copying GC's
>Cc: wilson@cs.utexas.edu
>
>Sorry to disagree with you twice in one day, Paul,
>but I'm afraid I must.

Well, go ahead, if you must.  I can live with it.

>First an extra word per object overhead may not seem like too
>much in a system with 10-word objects till you think of all
>the other nifty things that you could do with the extra word.
>(incremental stuff, profile info, you name it!)

We *do* use our extra words to support incremental collection.
Part of our motivation for using fake copying was to achieve hard
real time efficiently.

>For Self, we worked hard to get the total overhead down to
>2 words per object (including the "class" pointer).

Then you're ahead of us by one word.  It's not clear to me that that's
usually the critical factor.

>Secondly, IMHO fragmentation is a big concern.
>At ParcPlace, it seemed to be the way that most real ST apps
>really stressed the memory system.

Interesting.  More info would be very welcome.  What was the underlying
allocator?  Fragmentation can be very dependent on the choice of
allocator.  (E.g., best fit or address-ordered first fit usually has
an order of magnitude less fragmentation than next fit.)

From the data I've seen, the fragmentation problem is overrated---picking
a good allocator can help a *lot*.  I don't know if the data are
representative, though.

>Users tend to stuff as much as possible into their images and
>frequently run on the edge of thrashing.
>They notice how well the GC system keeps things compact.

I assume you're talking about a system with a big system image, like your
usual Smalltalk or Self environment.

Two relevant points:

  * This is much less of an issue with your average Modula-3 or C++ program.

  * You may want a hybrid GC that treats the image specially.  If you don't
    need to GC it, you don't need the two pointers.  (Or if you can GC it
    offline, or whatever.)  Or you can use a different online strategy
    for the big blob of old stuff.  (That's what you do anyway, right?)

Another issue is that we haven't clearly separated out the issue of
"nonmoving" from "non-copying."  I take it you're advocating a moving
(sliding compaction) GC for the old data, not a copying GC.  Copying
is a huge lose for big volumes of data that barely fit in memory.

>So, I would take issue with the statement that
>"For *most* apps, non-copying GC is a win"

Well, I'll take issue with your taking issue.  Now we can argue interminably
about how to figure out what "most" means.  Or I can just defend the
thesis that "it depends".  E.g., on:

  1. Do you need hard real-time?  What CPU cost will you pay for it
     (e.g., uniform indirections like Brooks' GC or Marc Feeley et al.'s
     sliding compaction GC)?

  2. Do you have a cooperative compiler?  How cooperative?  (Does it
     identify all pointers in the stack and registers unambiguously?
     If not, will it let you pre-fragment objects in the compiler
     by changing the layout decisions to reduce fragmentation for
     a non-copying GC?)

  3. Do you need to support interoperability with languages like C?
     What's your idea of an acceptable interface?

  4. Do you have a decent nonmoving allocator?

  5. Do you have a decent compactor or copying algorithm?

  6. Do you have a big, fat system image?  Can you avoid GC'ing it
     online?

  7. Do you have more data than will fit in memory, so that you're 
     worried about locality?

  8. Are your application data structures and algorithms favorable to
     your copying algorithm?  (Does it make locality better, or worse?)
     Are they favorable to an order-preserving compacting algorithm?
     (Do they naturally have good locality in allocation order, and
     just need the holes squeezed out?)  

  9. Are your data structures and algorithms favorable to a good
     allocator, in terms of fragmentation?

  10. Are you willing to implement a hybrid GC?  How hairy a hybrid?


My claim (for which the evidence is admittedly scanty, but I think generally
on my side) is that non-copying GC is good for most apps, taking lots
of considerations into account.

That's not to say it couldn't be bad enough for some apps that it comes
out worse on average.

It's also not to say that if you have a very friendly compiler, you couldn't
be better off with a copying collector---or, more likely, some kind of
hybrid collector that does some kind of compaction.

The data on all of this are very sketchy, so the bottom line really is
"it depends."  For the FAQ, though, I think it's important to defuse
the widespread impression that copying collection is clearly superior
to other forms of GC.  It just ain't so.


As far as my own personal axe-grinding goes, I just want more data.  I'd
be perfectly happy to find out that fragmentation is a bigger problem
than it has appeared from our recent experiments.  We were kind of
disappointed that the problem just seemed to go away when we picked
good allocators---we put a lot of effort into building fancy tools
for studying fragmentation, and it turned out to look like a non-problem.

(We'd be very interested in allocation traces of programs that exhibit
significant fragmentation for good allocators;  it'd give us something
to sink our teeth into.)

With respect to real-time GC's, we'd also be happy to find out that copying
or compaction is a big win.  The points we're working on mostly aren't
about whether the GC is copying or not.  We think you want to use the
same basic incremental strategy either way, to get real-time performance
without much overhead.