[gclist] zillion objects and gc

David F. Bacon dfb@watson.ibm.com
Wed, 29 Aug 2001 12:42:16 -0400

you might be interested in some work that we did at IBM watson in the
jalapeno JVM on concurrent cycle collection that was recently published
in PLDI and ECOOP; see
http://www.research.ibm.com/people/d/dfb/papers.html for copies.

more specific comments follow...

David.Chung@USPTO.GOV wrote:

> Hi
>         This question is a general one. In brief:
> if you have a large number of objects, a small
> fraction of which die (very slowly), is there a
> method of automatic memory management
> (i.e., gc), which would not (1) try to copy most of
> those objects or (2) trace most of those pointers?
>         Objects here can be cyclic.
> =====================================
>         Detailed description of the question.
>         Say I have 100 M worth of objects resident
> in  memory. Most of them are not large, and their
> pointers criss-cross everywhere.
>         One characteristic of these objects is that a
> fraction of them die, be it very slowly. If you GC'ed
> these objects (via generational collector), when the
> oldest generation has to be garbage collected, the
> collector has to scan all the pointers that are boxed
> in 100 M worth of objects. This can be CPU
> intensive.

presumably, if objects are dying slowly they are also being allocated
slowly, except for local spikes.  the other important factor,
particularly (as hans mentions) for reference counting, is the mutation
rate: how often pointers in the heap are updated.  mutations correspond
to barrier executions, which may be more costly than traditional
trace-based GC barriers.

finally, for cycle collection, an important factor is the degree of
interconnectivity of the object graph.  if your object graph consists of
one big SCC, then the cycle collection methods we published will run
into trouble because in order to find out if there is a garbage cycle
they will have to trace the whole structure.  on the other hand, if the
object graph consists of lots of small components, then each individual
cycle collection will be bounded.

to summarize: the new concurrent cycle collection techniques work best
with applications that have low mutation rates and heavily partitioned
object graphs.  if your application fits these criteria, or even leans
in that direction, i'd suggest you give cycle collection a look.

since you describe the application as a server cache whose objects die
slowly, i'm guessing that it's mostly read traffic, which means you meet
the first criterion (low mutation rate).

while some folks have mentioned some of the usual knocks on reference
counting, i doubt that they apply to your situation.  first of all, you
are trying to avoid latency from mature-space collection.  second, the
major issue is to keep server traffic working in the CPU cache or the
RAM cache and off the disk backing store.  if you are handling mostly
read traffic and objects are dying slowly, then the "large object death"
problem will not be much of an issue, particularly if the server is an
MP, since other threads will continue to service requests.

>         If you impose the condition that there are no
> cycles, then solving the preceding problem is easy: just use
> reference counting. Because reference counting
> only touches objects that are affected when a reference
> is decremented, the overall cost to the above system is low,
> even though one may have 100 M of small, pointer-filled
> objects.

our system includes a hack to the class loader that makes a quick,
conservative check for whether the class is inherently acyclic.  this
removes anywhere from 10-90% of objects from consideration when doing
cycle collection.  if you are willing to add application-specific
knowledge, or to perform global analyses of the program, you may do

>         Of course, for the problem I am considering,
> there is no guarantee that the chain of ponters
> are acyclic.
> Ji-Yong D. Chung

in general, i would say that conclusions that have been drawn from
benchmarks like SPEC are likely to be meaningless for the type of
application you describe.  when you have 50GB of RAM, it's a whole new
ball game.