[gclist] Demand driven generational collector

Nick Barnes Nick.Barnes@pobox.com
Fri, 05 Oct 2001 11:59:21 +0100


At 2001-10-05 02:31:10+0000, "Ji-Yong D. Chung" writes:
>     Here is one question about
> generational collectors.
> 
> =====================================
> 
>     (1) In implementing a generational collector,
> I would like to make my collections be demand-driven.

[...]

>     Say we know that generation n should
> be collected.  We determine if generation n + 1
> should be collected by seeing if there is
> enough space on the generation n + 1
> to accommodate a potential en-masse
> promotion of generation n.

Take care!  Suppose that generation N is nearly full, and mostly
survives collection into generation N+1.  Now generation N+1 is nearly
full, and the next time you decide to collect generation N you will
have to collect generation N+1 again (even though you collected it
recently).  I suffered exactly this in MLWorks: a pig in the python
could lead to very enthusiastic collection of the older generations.
I called it "bubble collections".  My advice is to take care with your
promotion policy: don't always promote survivors from generation N
into generation N+1.

The MLWorks collector had a dynamic number of dynamically-sized
generations.  The policies for collecting different generations
changed a number of times during the development of the collector.

At each minor collection, each generation would be considered for
collection.  A decision would be made for each generation.  A decision
to collect a generation would influence the decisions for younger
generations (there were no remembered sets for young-to-old
intergenerational references, so generations 0 to N-1 would be scanned
in collecting generation N, and it may be worth collecting them simply
to reduce their size before doing that, depending on the likely
mortality in those generations).

There was a maximum generation size (compile-time configurable, usual
values 16MB or 32MB; "large" objects were treated separately and so
were additional to this), and early versions of the collector would
drive collections based on that.  This is similar to what you describe
as demand-driven collection.  In these early versions, survivors from
generation N were always promoted to generation N+1.  This lead to the
"bubble collections" I describe above.

Once I reworked the promotion policy, this problem went away.  Having
decided which generations to collect, the collector would then decide,
for each generation N being collected, whether surviving objects would
be promoted to generation N+1 or would remain in generation N.  This
would depend on the predicted survival rate of the collection, on the
collection decisions for surrounding generations, and on whether there
was room in generation N+1 (If generation N+1 was being promoted,
there would certainly be room in generation N+1).

In fact, generations were a linked list, rather than an indexed array:
"generation N" just means "the Nth along the list".  When a generation
was actually collected, survivors always went into the next node on
this linked list.  So this promotion policy decided whether, and
where, new empty generations are to be added to this linked list
before the collections take place (empty generations were removed
after the collections were complete).

Then the collector would perform the selected collections, one at a
time, youngest generation first.  Each collection was non-incremental,
but a decision to collect an old generation could be rescinded if the
total pause time of younger collections had become excessive.

Performance was good, but the collector was not sufficiently adaptive.
The survival profiles of different applications are quite different,
and the collection and promotion policies were sensitive to predicted
survival rates.  So I developed an adaptive prediction algorithm to
figure out the likely cost and value of collecting each generation.
It adapted to survival profiles for the current application, and
included a non-linear cost metric for estimated pause times.

Nick Barnes
Ravenbrook Limited


> By "demand-driven,"  I mean that a collection of
> up to generation n occurs  when generations
> 1 to n - 1 are likely to run out of memory.
> Put it differently, the idea here is that when a
> generation is used up above a certain threshold,
> it is collected.  Traditionally, major collections are
> triggered when certain number of minor collections
> have taken place.
> 
>     Because the collection of each generation 
> is demand-driven, the collector will not attempt to 
> collect those generations merely because the
> younger generations were collected a prespecified
> number of times.
> 
>     Does anyone know if there has been
> a design of generational collector that works
> this way, demand-driven?  If so, how "well"
> does it perform?
> 
> ==================================
> 
> P.S.  
>      I have thought of one simple algorithm for
> measuring whether generation n is likely to run
> out of memory for the above "demand-driven"
> strategy.
> 
>     I would greatly appreciate any comments,
> critique of the algorithm, which is as follows:
> 
>     Say we know that generation n should
> be collected.  We determine if generation n + 1
> should be collected by seeing if there is
> enough space on the generation n + 1
> to accommodate a potential en-masse
> promotion of generation n.
> 
>     Here, 0 < n < max gen, so that
> the preceding criteria can be applied
> from generation 1 all the way to m,
> where m is the first generation that
> does not meets the criterion.
> 
>     The algorithm starts with generation 0
> (which is the nursery), whose
> collection is triggered when it is filled
> above certain threshold.
> 
>