[gclist] Fast-allocating non-copying GC's

boehm.PARC@xerox.com
Fri, 15 Mar 1996 14:12:52 PST


Dave Ungar writes:

"With a contiguous allocator, you can keep the
allocation pointer in a register, and even put its
limit in either a register or an immediate field of an instruction.
The new, hot stuff is all contiguous in memory, which helps VM
and caching behavior--yes I know you can blow the cache anyway,
but it still seems better than following lists."
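
(For concreteness, the contiguous fast path Dave describes looks roughly like
the following C sketch.  The names are invented, and a real system would
inline this and pin alloc_ptr/alloc_limit in registers or an immediate field.)

    #include <stddef.h>

    extern void *alloc_slow_path(size_t n);  /* hypothetical: refill the
                                                region, collecting if needed */

    static char *alloc_ptr;    /* next free byte in the allocation region */
    static char *alloc_limit;  /* end of the allocation region            */

    void *bump_alloc(size_t n) /* n assumed already rounded to alignment  */
    {
        char *result = alloc_ptr;
        if ((size_t)(alloc_limit - result) < n)
            return alloc_slow_path(n);
        alloc_ptr = result + n;
        return result;
    }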

I disagree about the VM behavior part.  You do win a bit on allocation, but in
most situations you more than pay for it during collections, when you have to
touch the old live data (even pointer-free data, if you compact everything) and
the area for the compacted copy (which may overlap the old objects).

There is possibly a cache advantage, since you can probably confine the
youngest generation to a smaller region, but it's not clear how significant it
is.  A free-list-based collector can and should arrange to keep adjacent free
objects adjacent on a free list (see the sketch below).  Admittedly, you can
get old objects interspersed with new objects.  But my impression (based on
too many hours with debuggers rather than on precise measurements) is that
this is actually not the typical situation.  Data structures tend to be
allocated together and to die in large clumps, making contiguous sections
available again.  I have no idea whether this generalizes to images that last
for 20 years; it's not obvious to me one way or the other.  The same effect
also reduces the VM locality advantage of contiguous allocation.  Clearly your
mileage will vary depending on the application, etc.
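
For comparison, the non-copying fast path and sweep look roughly like this.
This is a sketch rather than any particular collector's code; sweeping each
page of same-sized objects in address order is what keeps adjacent free
objects adjacent on the free list:

    #include <stddef.h>

    #define PAGE_SIZE      4096   /* assumptions for the sketch */
    #define N_SIZE_CLASSES 64

    struct free_obj { struct free_obj *next; };  /* link stored in the
                                                    free object itself  */
    static struct free_obj *free_list[N_SIZE_CLASSES];

    extern void *alloc_slow(int size_class);  /* hypothetical: sweep more
                                                 pages or collect        */

    /* Allocation fast path: pop the head of the per-size list. */
    void *list_alloc(int size_class)
    {
        struct free_obj *result = free_list[size_class];
        if (result == NULL)
            return alloc_slow(size_class);
        free_list[size_class] = result->next;
        return result;
    }

    /* Sweep one page of same-sized objects, low address to high.
       Consecutive dead objects are pushed consecutively, so blocks
       that die together stay adjacent on the list.                 */
    void sweep_page(char *page, size_t obj_size, int size_class,
                    const unsigned char *mark_bits)  /* one byte/object */
    {
        size_t i, n = PAGE_SIZE / obj_size;
        for (i = 0; i < n; i++) {
            if (!mark_bits[i]) {
                struct free_obj *f =
                    (struct free_obj *)(page + i * obj_size);
                f->next = free_list[size_class];
                free_list[size_class] = f;
            }
        }
    }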

David Chase wrote:
"This is weird.  I have gone out and measured this myself, and friends
of mine have measured it as well in other applications, and what
we've always discovered is that something like 90% of the objects
allocated fall into two or three (smallish-object-size) lists.
This supports your assertion that header size matters, because an
8-byte header is a big deal if the user only requests 16 bytes of
memory for data.  (And this is why I like to use BIBOP whenever
possible, and why I think that the language's type system and
GC headers should be integrated whenever possible.)"
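
(For readers who haven't run into the term: BIBOP, "big bag of pages", gives
every object on a page the same size and type, recorded once in a side table
indexed by page number, so the per-object header disappears entirely.  A
minimal sketch, with made-up constants for a 32-bit address space:)

    #include <stddef.h>

    #define PAGE_SHIFT 12                          /* 4K pages, say */
    #define N_PAGES    (1ul << (32 - PAGE_SHIFT))  /* 2^20 pages    */

    struct page_info {
        unsigned short obj_size;   /* 0 => not a heap page              */
        unsigned short type_tag;   /* type of every object on the page  */
    };
    static struct page_info page_table[N_PAGES];   /* 4MB side table */

    /* Size and type follow from the address alone -- no header read. */
    size_t object_size(const void *p)
    {
        return page_table[(unsigned long)p >> PAGE_SHIFT].obj_size;
    }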

I agree that header size sometimes matters, even for C.  But the large number
of small objects is often only half the story.  What I remember from Cedar is
that most objects are in the 24- to 32-byte range.  (4 to 8 bytes of that could
be eliminated with some bit-twiddling, which would probably have been done for
a C program.)  But the average object size is roughly 100 bytes.  Much of the
heap is in fact consumed by a small number of large objects.  My impression is
that this isn't unusual for C programs either.  Average object sizes in the
Detlefs-Dosser-Zorn paper vary from 15.3 to 752.4 bytes.

Hans
Standard disclaimer ...