[gclist] When to garbage collect

Fridtjof Siebert siebert@gr.opengroup.org
Mon, 17 Nov 1997 09:52:19 +0100

Hans Boehm wrote:
> On Nov 14,  3:15pm, Fridtjof Siebert wrote:
> > Here, the amount of work done by the gc on one allocation is bound by a
> > small constant if the mutator guarantees never to use more than a
> > certain percentage of the available memory (while this percentage does
> > not have to be known in advance).
> >
> I think you meant the amount of GC work done per allocated byte, not object.

I first considered equally sized objects and then extended the approach
to objects of arbitrary sizes, where the amount of work per allocation
then depends on the size of the object that is allocated. 

> This is the crucial property that needs to be guaranteed by a collection
> heuristic.  It follows roughly that if at all possible, you should avoid
> collecting unless the amount of allocation you have done since the last
> collection justifies the GC effort.  For a nongenerational collector, the GC
> effort is basically proportional to the amount of live data, which is bounded
> by the heap size.  

In my approach, the amount of work to be done per allocation is not
proportional to the amount of live data, but proportional to 1/f, where
f is the amount of free memory. I measure the collector's work by the
number of objects marked black in the mark phase of the collection (in
the case of objects of different sizes I use the number of bytes these
objects occupy, but for simplicity I refer here just to the number of
equal-sized objects).

We want to avoid running out of free memory, so we have to complete a
collection cycle within the next f+1 allocations. For a system that
can store m=1000 objects, if it is in a state where only f=9 objects are
free, we will have to finish a gc cycle within the next 10 allocations.
So on the next allocation, we should check 1/10 of the objects, which is
s=m/(f+1)=1000/10=100. On the allocation after that, with f'=f-1=8, the
amount of work will be s=m/(f'+1)=1000/9=111, unless the previous
allocation finished a gc cycle and increased f.
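The scheduling rule above can be written down directly. A minimal sketch
(the function name is mine; integer division is used to match the figures
in the example):

```python
# Work-scheduling rule s = m / (f + 1): m is the total number of
# (equal-sized) objects the heap can hold, f the number currently free.
# Each allocation must do enough marking that a full gc cycle finishes
# within the next f + 1 allocations.

def mark_work_per_allocation(m, f):
    """Number of objects to mark black on this allocation."""
    return m // (f + 1)

m = 1000
assert mark_work_per_allocation(m, 9) == 100  # f = 9: finish within 10 allocations
assert mark_work_per_allocation(m, 8) == 111  # one allocation later, f' = f - 1 = 8
```

Note that as f shrinks, the per-allocation work grows, so the cycle is
guaranteed to finish before free memory is exhausted.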

Using this rule, s=m/(f+1), for the amount of work has some advantages:
it guarantees an upper limit on the gc work done on a single allocation,
provided the number of reachable objects r is limited as well. And this
upper limit can easily be reduced simply by increasing the total amount
of available memory m.
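To illustrate the claim about the upper limit, here is my reading of it
as a sketch (ignoring floating garbage, and with a hypothetical live-data
bound of 900 objects): if at most `live` objects are reachable, f never
drops below m-live, so s is bounded by m/(m-live+1), and growing m while
the live data stays fixed shrinks that bound.

```python
# Worst-case per-allocation mark work under the rule s = m / (f + 1),
# assuming the free count never drops below m - live (a simplification).

def worst_case_work(m, live):
    return m / (m - live + 1)

live = 900  # hypothetical bound on reachable objects
print(worst_case_work(1000, live))  # about 9.9 objects marked per allocation
print(worst_case_work(2000, live))  # about 1.8: doubling m cuts the bound
```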

Additionally, I can infer a constraint on the number of reachable
objects r just from the number of allocated objects a=m-f. So, e.g., if
I have a/m=97%, I know that r/m>=85%, or --at least-- that r'/m>=85%
was true at some earlier stage of the program (the figures are just
guesses; the correct values are in my thesis, which I don't have here).
This information can be useful to determine when to increase the amount
of memory m by allocating a new chunk of memory from the OS.