[gclist] zillion objects and gc

Rafael Dueire Lins rdl@ee.ufpe.br
Fri, 31 Aug 2001 10:02:35 -0300

 Dear David and co-authors,

First of all, please let me thank you for the several references to my
papers and the kind words about my book on Garbage Collection
(written together with R. Jones).

However, I feel obliged to draw your attention to a few important points
related to the algorithm described in "Lins, R.D. Cyclic reference counting
with lazy mark-scan. Inf. Proc. Letters 44, 4 (Dec. 1992), 215-220", on
which your papers are based.

You say on page 4 of your paper "Concurrent Cycle Collection in
Reference Counted Systems":

  "There are two major changes that make the algorithm linear
  time: first of all, we add a buffered flag to every object, which
  is used to prevent the same object being added to the root
  buffer more than once per cycle collection. This in turn places
  a linear bound on the size of the buffer"

 A more careful reading of my algorithm shows that your "root buffer"
corresponds to what I called the "control_set". Notice that a set has no
repeated elements, and that when a cell is inserted in the control set it
is painted "purple", which prevents the same cell being inserted again.
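
Sketched in Python, with hypothetical names (neither my original
pseudo-code nor the paper's), the idea is roughly:

```python
# Sketch of duplicate-free control-set insertion (hypothetical names).
# A cell is painted "purple" when it becomes a candidate cycle root;
# an already-purple cell is never inserted a second time.

PURPLE, BLACK = "purple", "black"

class Cell:
    def __init__(self, name):
        self.name = name
        self.color = BLACK        # live, not yet suspected of a cycle

control_set = []                  # order kept; colors enforce set-ness

def suspect(cell):
    """Record a cell as a possible cycle root, at most once."""
    if cell.color != PURPLE:
        cell.color = PURPLE
        control_set.append(cell)

c = Cell("c")
suspect(c)
suspect(c)                        # no-op: c is already purple
print(len(control_set))           # -> 1
```

The colour test does exactly the work of a "buffered flag": membership
is decided in constant time without searching the set itself.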

In the same reference you continue:

  "Secondly, we analyse the entire transitive closure of Roots as
  a single graph, rather than a set of graphs. ..."

 A closer look at my book, on page 66, reveals a section entitled
"Control set strategies", in which I say that "Different strategies can
be easily incorporated to manage the control set". The strategy you
present is one of the several possibilities described and benchmarked
in reference [Lins and Vasques, 1991].

The example in which you say that my algorithm exhibits quadratic
complexity only shows that you chose an inadequate strategy for
managing the control set. Notice that your strategy has the drawback
of a much longer suspension time on uniprocessors, which may
be unsuitable for certain applications.

I recently developed a new algorithm which is far more efficient than my
previous ones and reduces the complexity from 3O to 2O, where O is
the size of the subgraph below the deleted pointer. This new algorithm
uses only two colours and is, in my opinion, extremely neat.
It was submitted to Information Processing Letters in December 2000.

I congratulate you and your colleagues on a most interesting piece of
work. It is extremely nice to see such good performance figures for
reference counting.

Best wishes and regards,


"David F. Bacon" wrote:

  you might be interested in some work that we did at IBM watson in the
  jalapeno JVM on concurrent cycle collection that was recently published
  in PLDI and ECOOP; see
  http://www.research.ibm.com/people/d/dfb/papers.html for copies.

more specific comments follow...

David.Chung@USPTO.GOV wrote:

> Hi
>         This question is a general one. In brief:
> if you have a large number of objects, a small
> fraction of which die (very slowly), is there a
> method of automatic memory management
> (i.e., gc), which would not (1) try to copy most of
> those objects or (2) trace most of those pointers?
>         Objects here can be cyclic.
> =====================================
>         Detailed description of the question.
>         Say I have 100 M worth of objects resident
> in  memory. Most of them are not large, and their
> pointers criss-cross everywhere.
>         One characteristic of these objects is that a
> fraction of them die, be it very slowly. If you GC'ed
> these objects (via generational collector), when the
> oldest generation has to be garbage collected, the
> collector has to scan all the pointers that are boxed
> in 100 M worth of objects. This can be CPU
> intensive.

presumably, if objects are dying slowly they are also being allocated
slowly, except for local spikes.  the other important factor,
particularly (as hans mentions) for reference counting, is the mutation
rate: how often pointers in the heap are updated.  mutations correspond
to barrier executions, which may be more costly than traditional
trace-based GC barriers.
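
a generic rc write barrier looks roughly like this (a python sketch of
the general technique, not our jalapeno code):

```python
# generic reference-counting write barrier (sketch, not real collector
# code).  every pointer store adjusts two counts, which is why a high
# mutation rate makes rc barriers costlier than typical tracing barriers.

class Obj:
    def __init__(self):
        self.rc = 0
        self.field = None

def free(obj):
    write_barrier(obj, None)      # drop the child reference, then reclaim
    # ...storage would be returned to the allocator here...

def write_barrier(obj, new):
    """store obj.field = new, keeping reference counts consistent."""
    if new is not None:
        new.rc += 1               # increment first, in case new is old
    old = obj.field
    if old is not None:
        old.rc -= 1
        if old.rc == 0:
            free(old)             # count hit zero: reclaim recursively
    obj.field = new

a, b, c = Obj(), Obj(), Obj()
write_barrier(a, b)               # a.field = b
write_barrier(a, c)               # b's count drops to zero; b is freed
print(b.rc, c.rc)                 # -> 0 1
```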

finally, for cycle collection, an important factor is the degree of
interconnectivity of the object graph.  if your object graph consists of
one big SCC, then the cycle collection methods we published will run
into trouble because in order to find out if there is a garbage cycle
they will have to trace the whole structure.  on the other hand, if the
object graph consists of lots of small components, then each individual
cycle collection will be bounded.
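
the effect of graph shape on tracing cost is easy to see with a toy
dfs count (illustrative sketch only):

```python
# sketch: why graph shape matters for trial-deletion cycle collection.
# the trace from a suspect node must cover everything it can reach, so
# per-collection work is bounded by the size of its component.

def reachable(graph, start):
    """count nodes reachable from start via iterative dfs."""
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(graph[n])
    return len(seen)

# one big cycle of six nodes: any suspect drags in the whole structure.
big_scc = {i: [(i + 1) % 6] for i in range(6)}
# three disjoint two-cycles: each trace stays inside a small component.
small = {0: [1], 1: [0], 2: [3], 3: [2], 4: [5], 5: [4]}

print(reachable(big_scc, 0))      # -> 6, the entire graph
print(reachable(small, 0))        # -> 2, just the local component
```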

to summarize: the new concurrent cycle collection techniques work best
with applications that have low mutation rates and heavily partitioned
object graphs.  if your application fits these criteria, or even leans
in that direction, i'd suggest you give cycle collection a look.

since you describe the application as a server cache whose objects die
slowly, i'm guessing that it's mostly read traffic, which means you meet
the first criterion (low mutation rate).

while some folks have mentioned some of the usual knocks on reference
counting, i doubt that they apply to your situation.  first of all, you
are trying to avoid latency from mature-space collection.  second, the
major issue is to keep server traffic working in the CPU cache or the
RAM cache and off the disk backing store.  if you are handling mostly
read traffic and objects are dying slowly, then the "large object death"
problem will not be much of an issue, particularly if the server is an
MP, since other threads will continue to service requests.

>         If you impose the condition that there are no
> cycles, then solving the preceding problem is easy: just use
> reference counting. Because reference counting
> only touches objects that are affected when a reference
> is decremented, the overall cost to the above system is low,
> even though one may have 100 M of small, pointer-filled
> objects.
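
that cost model can be sketched in a few lines of python (illustrative
only, not real collector code):

```python
# sketch: a decrement touches only the subgraph that actually dies,
# no matter how large the rest of the heap is.

class Node:
    def __init__(self, child=None):
        self.rc, self.child = 0, child
        if child is not None:
            child.rc += 1

touched = 0

def dec(node):
    """decrement node's count; cascade into children that die."""
    global touched
    touched += 1
    node.rc -= 1
    if node.rc == 0 and node.child is not None:
        dec(node.child)

heap = [Node() for _ in range(100_000)]   # big heap of unrelated objects
chain = Node(Node(Node()))                # three-node chain, about to die
chain.rc = 1                              # held by one (conceptual) root

dec(chain)                                # the root drops its reference
print(touched)                            # -> 3, not 100,003
```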

our system includes a hack to the class loader that makes a quick,
conservative check for whether the class is inherently acyclic.  this
removes anywhere from 10-90% of objects from consideration when doing
cycle collection.  if you are willing to add application-specific
knowledge, or to perform global analyses of the program, you may do
even better.
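
the idea of the check can be sketched like this (a hypothetical python
sketch, not the actual jalapeno class-loader code; java subtyping makes
the real conservative test more involved):

```python
# hypothetical sketch of a conservative "inherently acyclic" test over
# declared field types.  a class is inherently acyclic if no chain of
# field types can lead from it back to itself, so its instances can
# never take part in a pointer cycle.

def is_inherently_acyclic(cls, fields):
    """fields maps a class name to the class names of its pointer fields."""
    seen, stack = set(), list(fields.get(cls, []))
    while stack:
        t = stack.pop()
        if t == cls:
            return False          # a field chain reaches cls again
        if t not in seen:
            seen.add(t)
            stack.extend(fields.get(t, []))
    return True

fields = {
    "Pair":  ["Leaf", "Leaf"],    # points only "down": acyclic
    "Leaf":  [],
    "LNode": ["LNode"],           # list node can point at a list node
}
print(is_inherently_acyclic("Pair", fields))    # -> True
print(is_inherently_acyclic("LNode", fields))   # -> False
```

instances of inherently acyclic classes can then be skipped entirely
during cycle collection.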

>         Of course, for the problem I am considering,
> there is no guarantee that the chains of pointers
> are acyclic.
> Ji-Yong D. Chung

in general, i would say that conclusions that have been drawn from
benchmarks like SPEC are likely to be meaningless for the type of
application you describe.  when you have 50GB of RAM, it's a whole new
ball game.


  Rafael Dueire Lins
  Prof. Adjunto IV

  Departamento de Eletrônica e Sistemas
  Centro de Tecnologia e Geociências
  5th Floor - Room 427
  Universidade Federal de Pernambuco
  Phone: +55 81 271-8210 or 271-8211, Ext. 241
  Fax:   +55 81 453-6633
  e-mail: rdl@ee.ufpe.br