[gclist] zillion objects and gc

David F. Bacon dfb@watson.ibm.com
Tue, 04 Sep 2001 13:52:55 -0400


Rafael Dueire Lins wrote:

> Dear David and co-authors,
>
> First of all, please let me thank you for the several references to my
> papers and nice words about my book on Garbage Collection
> (together with R. Jones).

yes, the book really is a great reference that was sorely lacking in the
field.

> However, I feel obliged to draw your attention to a few important points
> related to the algorithm described in "Lins, R.D. Cyclic reference counting
> with lazy mark-scan. Inf. Proc. Letters 44, 4 (Dec. 1992), 215-220", on
> which your papers are based.
>
> You say on page 4 of your paper "Concurrent Cycle Collection in
> Reference Counted Systems":
>
>      "There are two major changes that make the algorithm linear
>      time: first of all, we add a buffered flag to every object,
>      which is used to prevent the same object being added to the
>      root buffer more than once per cycle collection. This in turn
>      places a linear bound on the size of the buffer"
>
> A more careful reading of my algorithm shows that your "root buffer"
> corresponds to what I called "control_set". Notice that a set has no
> repeated elements, and that if inserted in the control set the cell is
> painted "purple", avoiding repeated cell insertion.

it's true that you call it a set, and that objects are painted purple
when they are inserted into it.  but if i understand correctly you also
color an object black if its reference count is incremented.  if it is
later decremented again before the control set is processed, it must be
re-inserted into the control set.  however, since it has been colored
black, there is no longer a localized way to find out if it is in the
set (that is, it is no longer purple).

in practice, these sets have to be implemented as linear buffers for
efficiency.  so adding the "buffered" flag provides the additional
localized information needed to avoid duplicates in the buffer.
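to make the distinction concrete, here is a minimal sketch (not the
actual implementation from either paper; class, field, and function
names are my own illustrative choices) of why the color alone cannot
tell you whether an object is already in the buffer, but a separate
buffered flag can:

```python
class Obj:
    def __init__(self):
        self.rc = 1            # reference count
        self.color = "black"
        self.buffered = False  # true while the object sits in the root buffer
        self.children = []

root_buffer = []  # a linear buffer, not a true set

def increment(o):
    o.rc += 1
    o.color = "black"  # an increment repaints the object black

def decrement(o):
    o.rc -= 1
    if o.rc > 0:
        possible_root(o)  # may be the root of a garbage cycle

def possible_root(o):
    o.color = "purple"
    # the buffered flag, not the color, records membership in the
    # buffer: an intervening increment may have repainted the object
    # black, so checking for "purple" here would allow a duplicate
    if not o.buffered:
        o.buffered = True
        root_buffer.append(o)
```

a decrement, an increment, and a second decrement on the same object
leave exactly one entry in the buffer, even though the object was black
(not purple) when the second decrement arrived.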

> In the same reference you continue:
>
>      "Secondly, we analyse the entire transitive closure of Roots
>      as
>      a single graph, rather than a set of graphs. ...
>
> In a closer look at my book on page 66 you find a section entitled
> "Control set strategies" in which I say that "Different strategies can
>
> be easily incorporated to manage the control set". The strategy you
> present is one of the several possibilities described and benchmarked
> in reference [Lins and Vasques, 1991].

i was unable to get a copy of this technical report.  i would appreciate
a copy, electronic or otherwise.

> The example in which you say that my algorithm exhibits quadratic
> complexity only shows that you chose an inadequate strategy to
> manage the control set. Notice that your strategy has the drawback
> of having a much longer suspension time on uniprocessors, which may
> be unsuitable for certain applications.

there are two issues: how often you process the root buffer, and what
"strategy" you use to process it.  for the latter, once you decide to
process the root buffer, you want to do so as efficiently as possible.
and that means using the linear algorithm outlined in our paper, rather
than the quadratic algorithm outlined in the book.

once you choose the "strategy", you can vary the frequency of root
buffer collection.  indeed, you could collect it so often that it
always has only one element, in which case it degenerates to the
quadratic algorithm.  however, we found that for java programs,
processing small root buffers led to enormously increased overall GC
time without significantly shortening uniprocessor pause times for most
applications.  the reason is simple: there is often mutation to large,
live structures.
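the cost difference is easy to measure on shared structure.  the toy
sketch below (my own illustration, not code from either paper) counts
the nodes touched when the roots' subgraphs are traversed as one graph
with a shared visited set, versus restarting the traversal for each
root:

```python
class Node:
    def __init__(self):
        self.children = []

def scan_work(roots):
    """traverse the union of all roots' subgraphs once; return nodes touched."""
    seen, work, stack = set(), 0, list(roots)
    while stack:
        n = stack.pop()
        if id(n) in seen:
            continue
        seen.add(id(n))
        work += 1
        stack.extend(n.children)
    return work

def per_root_work(roots):
    """restart the scan for every root, revisiting shared structure."""
    return sum(scan_work([r]) for r in roots)

# ten roots all pointing into one shared chain of 100 nodes
chain = [Node() for _ in range(100)]
for a, b in zip(chain, chain[1:]):
    a.children.append(b)
roots = [Node() for _ in range(10)]
for r in roots:
    r.children.append(chain[0])
```

here the single-graph scan touches 110 distinct nodes, while the
per-root scans touch 1010: each of the ten roots re-walks the whole
chain, which is exactly the quadratic blowup in question.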

> I recently developed a new algorithm which is far more efficient than my
> previous ones and reduces the complexity from 3O to 2O, where O is
> the size of the subgraph below the deleted pointer. This new algorithm
> uses only two colours and is, in my opinion, extremely neat.
> It was submitted to Information Processing Letters in December/2000.

i would appreciate a copy of this.  reducing the number of graph scans
would be a major win.

> I congratulate you and your colleagues for the most interesting piece of
> work. It is extremely nice to see such good performance figures for
> reference counting.

thanks.  it is an interesting time to be working in the area.

regards,

david