[gclist] Compiler tables for accurate gc

Hans Boehm boehm@hoh.mti.sgi.com
Wed, 17 Dec 1997 14:02:17 -0800


On Dec 17, 12:45pm, Eliot Moss wrote:

> In any case, I advocate accurate collection not so much because conservative
> collection does not work most of the time, but for these two primary reasons:
>
> 1) Conservative collection tends to restrict the algorithms you can use, and
>    which algorithm is best (e.g., mark-sweep versus copying) depends
>    substantially on application behavior.
>
> 2) Conservative collection can occasionally lead to drastic anomalous storage
>    retention, and surprises are bad and not necessarily easy for the typical
>    application programmer to solve.
>
I agree, mostly.  (I disagree slightly, in that I think it's imperative that
storage retention problems become easier to diagnose, conservative collector or
not.  You need basically the same tools in either case.  Other issues can be at
least as serious as conservative pointer identification.)
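
To make the retention point concrete, here is a minimal sketch in C against
the collector's public interface (gc.h).  It shows the classic failure mode:
an integer whose bit pattern happens to equal a heap address keeps a large
block alive under conservative scanning, where a precise collector would
reclaim it.

    #include <stdio.h>
    #include <stdint.h>
    #include <gc.h>                /* Boehm-Demers-Weiser collector */

    /* Not a pointer as far as the program is concerned, but the
       conservative collector cannot rule it out as one. */
    static uintptr_t disguised;

    int main(void)
    {
        GC_INIT();

        /* One megabyte we will never touch through a pointer again. */
        void *big = GC_MALLOC(1 << 20);

        /* Stash the address as an integer; this is the only "reference"
           left, then drop the real pointer. */
        disguised = (uintptr_t)big;
        big = NULL;

        GC_gcollect();

        /* Print the disguised value so the store cannot be optimized
           away.  The megabyte above stays retained, because the
           collector must assume 'disguised' might be a pointer; with
           precise root information it would have been reclaimed. */
        printf("disguised root: %lx\n", (unsigned long)disguised);
        return 0;
    }

This is exactly the kind of leak that is hard to see without tools, which is
why I think the diagnosis problem matters either way.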

However, especially in the case of Java, there is also a major disadvantage:

With accurate collection, Java/native code combinations are likely to be much
less reliable, harder to write, and considerably slower.  This is aggravated by
the fact that we
have two conflicting conventions for how to write native code that's callable
from Java.  Neither spec is easy to deal with.  I'm not convinced that either
is precise enough that I could write correct code to it.  There is no
reasonable way to test such code, since failures to obey the spec will with
high probability show up rarely and only on some systems.  And such failures
will be much harder to debug than conservative retention is (assuming
reasonable tools for both).
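
To make that concrete, here is a hedged sketch of the kind of bug I have in
mind, written against JNI (the class and method names are invented).  Caching
a local reference in a C static violates the spec, but it can appear to work
indefinitely on a VM whose heap is nonmoving and conservatively scanned, and
then fail only intermittently, and only on a VM with an accurate, moving
collector:

    #include <jni.h>

    /* A reference to a Java object cached across native calls. */
    static jobject cached_listener;

    /* WRONG: stores the raw local reference.  Local references become
       invalid when the native call returns; on an accurate, moving
       collector the object may also be relocated or reclaimed, so the
       stale handle fails rarely and unpredictably. */
    JNIEXPORT void JNICALL
    Java_Example_registerListener(JNIEnv *env, jobject self,
                                  jobject listener)
    {
        cached_listener = listener;            /* incorrect */
    }

    /* RIGHT: promote the reference so the collector knows about it. */
    JNIEXPORT void JNICALL
    Java_Example_registerListenerSafely(JNIEnv *env, jobject self,
                                        jobject listener)
    {
        if (cached_listener != NULL)
            (*env)->DeleteGlobalRef(env, cached_listener);
        cached_listener = (*env)->NewGlobalRef(env, listener);
    }

No amount of ordinary testing on one VM will reliably catch the first
version; it only breaks when the collector actually moves or frees the
object underneath it.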

As far as I can tell, Java/native code combinations are the norm rather than
the exception at the moment, even if you don't count the runtime system.  I see
no reason for that to change soon for server-side applications.

Altogether, it seems to me that a fully accurate collector is often not the
right technical tradeoff for Java.  I want a conservative GC in the presence of
native code.  I think the overall experience with Lisp foreign function
interfaces points in the same direction.

It seems to me that you can actually make this tradeoff dynamically in the
runtime, at least most of the time.  A (mostly) nonmoving, potentially
conservative old-generation collector seems to perform about as well as (or
better than) anything else for collecting old objects in memory.  You can add a
copying collector, based on precise type information, for young Java-allocated
objects.

At runtime, you have to make a choice between two modes:

1) Use the young generation collector.  Flush the young generation for native
calls.

2) Bypass the young generation.

In either case, anything for which no type information is provided is scanned
conservatively by the old generation collector.  The Java runtime itself
provides type information wherever it can at reasonable cost.  The native code
never sees young generation objects.
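
A hedged sketch of that control flow, with invented names and the allocators
stubbed out with malloc, might look like this in C; only the shape of the
logic is the point:

    #include <stdlib.h>
    #include <stdio.h>

    enum gc_mode { MODE_PRECISE_NURSERY = 1, MODE_BYPASS_NURSERY = 2 };
    static enum gc_mode mode = MODE_PRECISE_NURSERY;  /* chosen at run time */

    /* Stub allocators: in a real runtime, nursery_alloc would bump-allocate
       in the copying, precisely typed young generation, and old_gen_alloc
       would allocate in the nonmoving (possibly conservatively scanned)
       old generation. */
    static void *nursery_alloc(size_t n)  { return malloc(n); }
    static void *old_gen_alloc(size_t n)  { return malloc(n); }

    /* Stub for copying young-generation survivors into the old generation,
       using the precise type information, and resetting the nursery. */
    static void evacuate_nursery(void)    { /* copy survivors, reset */ }

    /* Mode 1: fast nursery allocation.  Mode 2: bypass the nursery. */
    static void *gc_alloc(size_t n)
    {
        return mode == MODE_PRECISE_NURSERY ? nursery_alloc(n)
                                            : old_gen_alloc(n);
    }

    /* Native call boundary: in mode 1 the nursery is flushed first, so
       native code only ever holds pointers into the nonmoving old
       generation and can be scanned conservatively with no further
       cooperation from the native side. */
    static void before_native_call(void)
    {
        if (mode == MODE_PRECISE_NURSERY)
            evacuate_nursery();
    }

    int main(void)
    {
        void *obj = gc_alloc(64);
        before_native_call();          /* e.g., about to enter native code */
        printf("allocated %p in mode %d\n", obj, (int)mode);
        free(obj);
        return 0;
    }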

This gives you fast allocation of short-lived objects for pure Java apps, which
would presumably always run in mode 1.  Any app that makes frequent native
calls would run in mode 2, thus getting the same performance and reliability
that you get with existing conservative implementations (except that it's not
hard to do better than many of them).

This also makes it much easier to debug the type information generated by the
runtime.  You can run in mode 2, and turn the type information off selectively
until things start working.

As usual, this is my personal opinion, which has nothing to do with anybody's
product plans, etc.

-- 
Hans-Juergen Boehm
boehm@mti.sgi.com