GC performance - was Re: [gclist] GC topics

boehm.PARC@xerox.com
Thu, 22 Feb 1996 11:55:17 PST


"I don't know. We already have an extra level of complexity, by discriminating
between small allocations, smaller than a threshold value (48 bytes right now)
which are allocated in fixed size entries within fixed size pages, and
large allocations, which are basically mapped to malloc. Separating mark
bits from the data would be rather easy for the fixed size allocations,
but don't you think that keeping the large allocations separate might
ruin our efforts, making the performance improvement negligible?"

We have a considerably higher threshold for small objects (512-4096 bytes,
depending on the platform), and basically allocate large objects on page
boundaries.  There are other downsides to this, but it does make it relatively
easy to separate mark bits.
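
To make the mark bit separation concrete, here is a rough C sketch of keeping
the mark bits in a per-page side table instead of in the objects themselves.
The page size, the small object threshold, and all of the names are made up
for illustration; they are not our actual data structures.

    #include <stddef.h>
    #include <string.h>

    #define PAGE_SIZE   4096
    #define SMALL_LIMIT 512   /* hypothetical small object threshold         */

    struct page_hdr {
        size_t obj_size;                        /* fixed object size for page */
        unsigned char mark_bits[PAGE_SIZE / 8]; /* one bit per byte of page   */
    };

    /* Set the mark bit for an object in a small-object page.  Only the page
       header is written; the data page itself stays clean during marking.   */
    static void set_mark(struct page_hdr *h, void *page_start, void *obj)
    {
        size_t i = (size_t)((char *)obj - (char *)page_start) / h->obj_size;
        h->mark_bits[i / 8] |= (unsigned char)(1u << (i % 8));
    }

    static void clear_marks(struct page_hdr *h)
    {
        memset(h->mark_bits, 0, sizeof h->mark_bits);
    }

The point is that the marker reads object pages but only writes the side
tables, so clean data pages are never dirtied just by being marked, and the VM
system has far fewer pages to write back during a collection.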

As Charles points out, there are cases in which separating out the mark bits
makes all the difference in the world.  I'm not sure how frequent those
applications are.  Nonetheless, you're gaining substantially in performance
robustness, possibly at a small loss in performance for small applications.  I
think it's the right decision.

(Page aligning large objects risks some performance anomalies due to cache
design.  But putting mark bits in fixed positions in malloc allocated objects
is also a bit risky for the same reason.)

"...  We did see a significant performance improvement, but far from what would
be required to make GC really practical in a VM-intensive environment."

I think separating mark bits is part of the answer.  The other crucial part is
incremental or concurrent collection, so that you still get some work done in
between paging by the collector.  The kinds of techniques you suggest
(concentrating the marker's attention on one section of memory at a time) also
help a bit.  Carl Hauser added code to do this in an older version of our
collector.  It sometimes reduced GC times by a bit more than a factor of 2.
The problem is that it's slower if you're not paging, and it tends to require
more space during collections with a large heap, making it more likely you'll
run out of memory during collection, and making it harder to recover.  Taking
care of these issues adds complexity.  I dropped it from the current version of
our collector since it wasn't clear the benefit was worth the complexity for
the average program, and it didn't really seem to make the difference between
usable and unusable applications.  I think both separate mark bits and
incremental collection often do make that difference.  But I could argue for
putting the code back in as well.
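
To give the flavor of the incremental part, here is a rough sketch in C.  It
just bounds the amount of marking done per increment so the mutator gets to
run in between; the budget, object layout, and mark stack are invented for
illustration, and both the region-concentration refinement discussed above and
mark stack overflow handling are omitted.

    #include <stddef.h>

    #define MARK_BUDGET 4096      /* objects traced per increment (invented) */
    #define STACK_CAP   (1 << 16)

    struct obj { struct obj *left, *right; int marked; };

    static struct obj *mark_stack[STACK_CAP];
    static size_t mark_sp;

    static void push(struct obj *p)
    {
        /* A real collector must deal with mark stack overflow; this sketch
           simply refuses to push.                                           */
        if (p != NULL && !p->marked && mark_sp < STACK_CAP)
            mark_stack[mark_sp++] = p;
    }

    /* Trace a bounded number of objects, then return to the mutator.
       Returns nonzero once the mark stack has drained.                      */
    static int mark_some(void)
    {
        size_t budget = MARK_BUDGET;

        while (budget > 0 && mark_sp > 0) {
            struct obj *p = mark_stack[--mark_sp];
            if (p->marked)
                continue;
            p->marked = 1;
            push(p->left);
            push(p->right);
            budget--;
        }
        return mark_sp == 0;
    }

Each call does a bounded slice of the mark phase, so any paging the marker
causes is interleaved with useful mutator work rather than concentrated in one
long pause.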

My experience with the combination of all of these techniques (and simple
generational collection, which also helps) is that the collector no longer
seems to make applications unusable in a VM-intensive environment.  This is
based on some very limited experiments with Cedar on a 10 MB SPARCstation.  (It
actually had more, but I convinced the kernel it had 10 MB.)  This was
unpleasant independent of garbage collection.  It was even less pleasant during
garbage collections, but it didn't become completely intolerable.  I suspect
this is the best you can hope for.  I haven't tried this experiment with more
recent versions of the collector, which run incrementally instead of in a
separate thread.

Hans

Standard disclaimer ...