[gclist] gc concerns

Boehm, Hans hans_boehm@hp.com
Wed, 13 Dec 2000 16:10:10 -0800

> From: Mike Roberts [mailto:miker@artifact-entertainment.com]
> > From: Boehm, Hans [mailto:hans_boehm@hp.com]
> > ...
> > 1) Response time.  The client may be stopped for long enough that
> > the pause
> > becomes visible to the user.
> the client is being done by a different team, so i know it 
> won't be written
> with gc (hopefully, they won't try to keep me from using it). 
> if the time
> spent in collection (stop-the-world) is small compared to the 
> amount of time
> between internet updates (2-3 times a second is our goal) 
> from the server, i
> don't think the client will see any noticeable pause.
As a very rough guess, a collector might trace 200MB/sec/processor on a
modern X86 machine.  (I don't have real numbers off the top of my head.
Perhaps someone can fill them in?  A third of the real memory bus bandwidth
is probably not too far off.)  That means you should be able to keep GC
pauses to 100msecs if you have to trace on the order of 20MB or less.  On a
4 processor, single bus machine with a paralel GC, that might go to 50MB or
500MB/sec trace speed.
> > 2) Multiprocessor throughput.  Even if you perfectly 
> parallelism the GC
> > client, the garbage collector may make the application
> > single-threaded for a
> > significant fraction of the execution time.  If 1/5 of the cycles
> > are spent
> > in the single-threaded portion of the GC, you'll be ending half the
> > wall-clock time in the GC on a 4 processor machine.
> that's half the gc's wall-clock time in the single threaded 
> portion. not
> half the application's wall-clock time in the gc, right?
Actually, I meant the latter, unfortunately.  For a sequential GC, most of
the allocation/GC time will normally be spent in the single-threaded part of
the GC.  If the allocation/GC time is 20% of the total application time, and
the rest of the application uses all processors, the rest is arithmetic.
Spending 20% of the time in the allocator usually requires a fairly
allocation-intensive application, but is not unusual.

Note that you can easily get some of these problems with malloc/free in
multithreaded applications as well.  Making malloc scalable is also
nontrivial.  (In some cases, it may actually be harder.  I can send you a
paper if you really want the details.)

This assumes that you do not have any concurrency between the client and the
GC.  There are many algorithms that allow you to violate that assumption.
But it tends to be tricky to make such things robust if you don't
parallelize the collector as well. 
> > ...
> > 1) Most modern computers are memory-poor in comparison to 
> the processor
> > speed.  Hence there is strong motivation to keep heap sizes 
> down, and thus
> > most applications run with heap sizes that are small enough that
> > stop-the-world collections finish in a reasonable amount of 
> time, and GC
> > pause times are often not much of an issue.
> we're looking at 2-4G of memory per system, more if necessary. the
> processors are most likely to be pentium iii xeons. this 
> server application
> will be the only application running on each server.

If the collected heap can occupy a significant fraction of those 2-4 GB,
then it sounds like you do need incremental collection.  As was pointed out,
the above observation doesn't apply to servers, or at least nowhere near
universally.  I was only trying to explain why there generally seems to be
relatively little interest in incremental GC, the unfortunate result being
that it isn't well supported.  I personally think it is useful.