[gclist] object identity & copying collection

Paul R. Wilson wilson@cs.utexas.edu
Thu, 11 Dec 1997 07:31:50 -0600


>From majordom-gclist-out-owner-wilson=cs.utexas.edu@iecc.com Thu Dec 11 01:19:13 1997
>From: boehm@hoh.mti.sgi.com (Hans Boehm)
>Date: Wed, 10 Dec 1997 23:18:16 -0800
>To: Fergus Henderson <fjh@cs.mu.oz.au>,
>        Garbage Collection mailing list <gclist@iecc.com>
>Subject: Re: [gclist] object identity & copying collection
>
>On Dec 11,  5:51pm, Fergus Henderson wrote:
>> Subject: Re: [gclist] object identity & copying collection
>> Mark Tillotson <markt@harlequin.co.uk> wrote:
>> > (Incidentally
>> > the cost of fragmentation is surely more than one word per object?)
>>
>> Good question.  With the Boehm (et al) collector,
>> is there a simple way of measuring the cost of fragmentation?
>>
>According to some of the UTexas work, the answer is very unclear.  15%
>fragmentation loss or less seems to be typical for good (nonGC) allocators.
>That means that for typical object sizes it's borderline.

Actually, the answer recently got a whole lot clearer.  We re-did
our experiments and factored out miscellaneous overheads:  headers,
rounding up to minimum block sizes, and architurally-imposed alignment
constraints.  The true fragmentation for our good allocators is on
average a little less than ONE PERCENT for 8 real C and C++ programs
(including GCC, Ghostscript, and a variety of other programs).

When we published our survey, we knew that some of the 15% wasn't
real fragmentation, and guessed that maybe half was overheads.  We
should have been suspicious that the best allocators all performed
very similarly---the reason turns out to be that all we were seeing
was implementation overheads, and there was almost no real fragmentation,
hence no significant variation in the fragmentation.

Mark Johnstone just finished his dissertation, which has the new
results.

The bottom line is that you can definitely beat the 15% figure in
our survey.  Most of that is due to headers, and rounding small requests
up to minimum block sizes.  Those costs can mostly be eliminated if
you're clever.  Some of the remaining cost is due to rounding to
8-byte alignment, which is not the allocator's fault.  Interestingly,
the fragmentation cost is less than the cost of rounding up to
8 bytes.

We need to do more experiments with more real programs, but I think
it's pretty safe to say that for most programs most of the time, a
good allocator should give you fragmentation of no more than a few 
percent---significantly less than one word per object.  (That's
assuming a typical object size around 10 words.)