GC performance - was Re: [gclist] GC topics

Darius Blasband darius@phidani.be
Thu, 22 Feb 1996 09:08:26 +0100 (MET)


> [Me:]
> Doesn't the collector have to see them anyway to be able to mark them
> in order to prevent them from being deallocated ? Admittedly, they
> don't have to be scanned if talking of a conservative GC..."
> 
> [Hans:]
> In a simple collector yes.  However, if the collector cares about VM
> performance it should keep the mark bits on separate pages from the objects.
> That also sometimes makes it possible to avoid adding an extra word to every
> object.  If the mark bits are separate, then only the pages containing the mark
> bits need to be resident.  In our case this is roughly 1/32 the size of the
> data, and it could be less.  The data page is touched only shortly before we
> allocate from it again, when it needs to be touched anyway.
> 
> There may be extra cost involved (on the order of 4 or 5 memory references) to
> map an address to a mark bit.  In our case, that's mainly work needed for the
> pointer validity check anyway.  (Note that this work is almost never needed for
> invalid pointers, since most of those fail some very simple check.  It's the
> validity checks on valid pointers that are relatively expensive.  That's why
> information of the form "this word is not a pointer" is not all that helpful
> for typical case performance.)

I don't know. We already have an extra level of complexity, by discriminating
between small allocations, smaller than a threshold value (48 bytes right now)
which are allocated in fixed size entries within fixed size pages, and
large allocations, which are basically mapped to malloc. Separating mark
bits from the data would be rather easy for the fixed size allocations,
but don't you think that keeping the large allocations separate might
ruin our efforts, making the performance improvement negligible.

It is true that the main problem we have been facing with GC was VM. I guess
that conservative GC has the same problem: as soon as we talk about
large systems which make extensive use of VM, the performance of GC gets
pretty dramatic. We have been trying several things (even if we did not
consider separating mark bits from actual data), such as keeping a cache
of references to be marked, using most significant bits in the address
as hash key, in order to improve locality, that is, attempt to group marking
so that whenever a reference is marked, the probability for the next marking
to occur within the same page is increased. We did see a significant 
performance improvement, but far from what would be required to make
GC really practical in a VM-intensive environment.

Darius...