[gclist] Java GC

Jeremy Fitzhardinge jeremy@zip.com.au
Sun, 01 Jun 1997 21:18:22 +1000

Mark Hamburg wrote:
> >From: Jeremy Fitzhardinge <jeremy@zip.com.au>
> >We've found in our Java implementation that after using a JIT to make
> >bytecode execution reasonably efficient, GC performance outweighs most
> >other factors for an interesting proportion of Java code.
> Where was your test code coming from? One of the things that can be
> relatively important to avoiding this sort of problem is being careful not
> to allocate more often than necessary.

We're using a number of common Java programs as benchmarks.  Our runtime
runs mostly CPU-bound code (like a number of the caffeinemark
benchmarks) significantly faster than the Sun interpreter.  However,
things like javac are helped less by the JIT because it basically comes
down to a comparison of GC performance.  As you say, Java encourages
heap allocation, and javac certainly takes advantage of it.

> So before going after garbage collector changes, one thing to look at is
> coding changes. GC provides the illusion of infinite memory. This should
> not, however, be confused with free memory.

Well, since we only supply a Java VM, we can't control what our
customers want to run under it.  Trying to keep down heap allocation is
a well-known technique for increasing Java performance, but its not
always obvious how to do it, and there's frequently counter-intuitive
results (for example, if you have DIY allocation by having a
manually-managed freelist of pre-allocated objects, you don't
necessarily see the performance improvement you'd expect, because the GC
still scans your freelist).

We're looking at getting the compiler to determine which allocations
have a well-defined scope and visibility, and using stack allocation for
those.  On the other hand, using a GC which is more finely tuned for
Java allocation may be easier to implement and more generally useful.

> If that fails, then my current thoughts about the best route for a Java
> oriented GC would be to assume that a lot of objects will never be
> referenced from the heap or globals. Keep a ZRC (zero-reference count) list
> that lists all of the objects suspected of only being referenced from the
> stack or registers and add a mark bit to each object that gets set when
> storing a pointer to the object into the heap or into a global.

Hm.  In programs with only a small number of threads, there is a limit
to the amount of heap which can be directly referenced from the stack
and registers.  The programs our VM is typically used for are servers
which run on multi-CPU machines.  They typically have large heap sizes
with relatively complexly constructed datastructures.  They also have
lots of threads, which could take advantage of your scheme, but I'd
guess that the amount of heap allocation solely referenced by thread
context is relatively low.  I suspect that better generational GC
support would be more useful.

>> At the expense of a bit more complexity and code on pointer storage, we can
> change the mark bit to a counter that covers the range 0, 1, 2. This then
> allows us to do recursive frees on objects for which only one pointer has
> been stored in the heap. The counter isn't a true reference count. It just
> satisfies the conditions that if it is zero, there are no references from
> the heap, and if it is one then there is at most one reference from the
> heap.

Maintaining any kind of reference count (or other operation needing
read-modify-write) is very expensive in threaded environments on most

> Either the first step alone or the two steps together provide a simple
> "generational" scheme that is compatible with mark-and-sweep garbage
> collection and hence more readily connected to environments like C where
> copying collectors are harder to use.

Since we use the Boehm collector, intermixing C and Java is relatively
straightforward.  However, since all future C code used in a Java
environment will use JNI, which allows more elaborate GC schemes,
there's not much point maintaining this sort of interface if it has a
significant impact on Java performance.

Our prime concern in a GC is not so much the efficiency of
marking/copying/whatever, but mainaining concurrenct with mutator
threads.  Ideally, we'd like to never have to interrupt a mutator
thread.  This is almost certainly impossible, but we'd certainly like to
avoid any algorithm which requires stopping all mutator threads (and if
it does have to do it, make sure it doesn't do any significant work).  
I've found a paper by Andrew Appel which describes a collector which is
close to what we'd like (something like "A real-time GC for stock
multiprocessors" - I don't have the exact reference on hand).