[gclist] ratio of heap size to live data?

Boehm, Hans hans_boehm@hp.com
Wed, 4 Oct 2000 09:57:33 -0700



> -----Original Message-----
> From: Nick Barnes [mailto:Nick.Barnes@pobox.com]
> Sent: Wednesday, October 04, 2000 2:52 AM
> To: Norman Ramsey
> Cc: gclist@iecc.com
> Subject: Re: [gclist] ratio of heap size to live data? 
> 
> 
> At 2000-10-03 21:47:09+0000, Norman Ramsey writes:
> 
> > I'm looking for any research that would tell me about what ratio
> > of heap size to live data is needed for a modern copying or
> > mark-sweep collector to perform well.  Can anyone point me to papers?
> 
> This rather depends on what you mean by "live data" (and, for that
> matter, "heap size").  The heap H consists of:
> 
> - live objects L (i.e. data which will be used again);
> 
> - dead but reachable objects R (i.e. data which can be reached by
>   chasing pointers from the roots);
> 
> - unreachable objects U (i.e. objects which could be collected by a
>   GC); this may be divided into "strongly unreachable" (S) and "weakly
>   unreachable" (W), where W is treated as reachable by the collector
>   (due to the approximations which it makes).
> 
> - allocation overhead A (object headers, padding, etc);
> 
> - collection overhead C (collector data structures, to-space, etc).
> 
...
> So the question is, what ratio interests you?  The most interesting
> one is probably (L+R)/H, but to obtain this you need to know L+R, for
> which you need to do an accurate full GC, which may not be possible.
> In a conservative collector this is very hard (multiple runs with
> different base addresses, etc).
For a conservative collector running with C/C++ code, it's also sometimes
possible to get a lower bound on L+R: many of the benchmarks used in the
past were written for explicit deallocation, and some of those don't leak,
so you can count the amount of live memory in the explicitly managed
version of the program.  I remember that Zhong Shao tried this on some of
the Zorn benchmarks, and ended up with very small bounds on W in most, but
not all, cases.  (The bound may be large either because W is large, or
because the explicitly managed program deallocates memory without clearing
the pointers to it.)
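
One way to do that counting (a sketch only; the wrapper names here are
hypothetical, not taken from the Zorn benchmarks) is to interpose on
malloc/free in the explicitly managed version and track the bytes that are
currently allocated:

#include <stdio.h>
#include <stdlib.h>

/* Keep a running count of bytes allocated but not yet freed.  The peak
   of live_bytes gives a rough lower bound on L+R for the same program
   run under a collector.  (This ignores alignment subtleties, and the
   size header itself adds a little to A.) */

static size_t live_bytes = 0;
static size_t peak_live_bytes = 0;

void *counted_malloc(size_t n)
{
    size_t *p = malloc(n + sizeof(size_t));   /* prepend the object size */
    if (p == NULL) return NULL;
    *p = n;
    live_bytes += n;
    if (live_bytes > peak_live_bytes) peak_live_bytes = live_bytes;
    return p + 1;
}

void counted_free(void *q)
{
    if (q != NULL) {
        size_t *p = (size_t *)q - 1;
        live_bytes -= *p;
        free(p);
    }
}

void report_live(void)
{
    printf("live now: %lu bytes, peak: %lu bytes\n",
           (unsigned long)live_bytes, (unsigned long)peak_live_bytes);
}

Comparing what the conservative collector actually retains against this
count at corresponding points then yields the kind of bound on W mentioned
above.
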
> 
> As I recall, in MLWorks (generational exact copying collector), when
> L+R is 20-30MB, U is typically 10-20MB (mainly garbage building up in
> older generations).  A was small.  C was very small except during a
> collection of an older generation, when it could reach 16MB.  So the
> total heap most of the time would be 50-75% larger than the reachable
> set.
It would be nice to get some more numbers along these lines.  For our
collector, U is normally kept at 33-50% when L+R is on the order of 10MB,
i.e. large enough to swamp additive constants.  Fragmentation overhead
(part of A?) is typically small, but it has certainly been reported to
reach 50% in some cases, and can theoretically be much worse.  (It also
seems to be possible to trade off fragmentation overhead in a nonmoving
collector against collection frequency.  If you collect only when the heap
is full, the entire heap is broken up into smaller blocks during each
collection cycle, violating Phong Vo's Wilderness preservation principle.
In this respect, nonmoving-collector fragmentation is different from
malloc/free fragmentation.)
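
For what it's worth, the knobs involved look roughly like the following
for our collector (a sketch against gc.h; GC_get_free_bytes and the direct
use of GC_free_space_divisor may not be available under exactly these
names in every release):

#include <stdio.h>
#include "gc.h"

int main(void)
{
    GC_INIT();

    /* A larger divisor keeps less free space around (smaller U) at the
       cost of more frequent collections; the default is 3. */
    GC_free_space_divisor = 4;

    /* ... run the allocation-intensive part of the application ... */

    GC_gcollect();   /* force a full collection so the counts are current */

    {
        size_t heap = GC_get_heap_size();
        size_t unused = GC_get_free_bytes();
        printf("heap %lu, free %lu, retained %lu (%.0f%% of heap)\n",
               (unsigned long)heap, (unsigned long)unused,
               (unsigned long)(heap - unused),
               heap ? 100.0 * (heap - unused) / heap : 0.0);
    }
    return 0;
}

The "retained" figure here is roughly L+R+W plus overheads, i.e. the
numerator of the (L+R)/H ratio above, padded by whatever the collector
cannot distinguish from reachable data.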

My impression is that for copying collectors the to-space component of C
is often an issue, though probably less so for ML.  Even if the to-space
for the older generation is used only rarely, it still needs to be
resident at that point.  That means it must either remain resident all the
time, or be paged in and out around each older-generation collection.
Paging in (and out) something on the order of (L+R) is often a nontrivial
operation.
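
As a back-of-the-envelope illustration (the disk rate is an assumed
figure, not a measurement): with a to-space comparable to the 20-30MB
reachable set quoted above, say 25MB, and a disk that streams roughly
10MB/s, paging it out and back in costs about

    2 * 25MB / (10MB/s) = 5 seconds

before any seek or VM overhead.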

It's possible to make these older-generation collections even less
frequent by increasing the size of the younger generations.  For slightly
different reasons, this seems to be common practice for HotSpot.  But then
you may end up with a significantly larger value of U, and you reduce the
chance that the youngest generation fits in the cache.

Hans