[gclist] Name that hypothesis

Paul R. Wilson wilson@cs.utexas.edu
Thu, 5 Dec 1996 08:56:18 -0600


>From majordom@iecc.com Thu Dec  5 05:20:40 1996
>Sender: majordom@iecc.com (MajorDomo)
>To: gclist@iecc.com
>Subject: Re: [gclist] Name that hypothesis
>Date: Thu, 05 Dec 1996 11:09:23 +0000
>From: Nick Barnes <nickb@harlequin.co.uk>
>
>A more precise statement is that the instantaneous half-life increases
>monotonically with age.

In keeping with Dave's "curve schmurve" comment, I have to disagree
with this formulation.

In general, the "curve" is not monotonic, and doesn't have to be
for generational GC to work just fine.  (e.g., 90% of your objects
may die within 100 KB of allocation, and none may die between
100KB and 150KB, then you lose another 5% around 1MB... and the
rest live 10-100MB...)  Along this "curve", you may have discontinuities
where most objects die very soon, but that doesn't necessarily defeat
the generational principle.  (And it's an opportunity for tuning
or adaptiveness.)

Talking about half-lives can be very misleading, because programs
aren't stochastic processes.  (The allocator literature is a disaster
because people took the notion of stochastic processes and applied
it where it wasn't appropriate---see my allocator survey on my
web site if you're interested.)

We need to be careful about the characterization of younger-to-older
vs. older-to-younger pointers, too.  Pointers from older to younger
objects don't matter for most generational GC's, as long as they
don't cross generation boundaries.  I think it's pretty common
to create some data structure in some odd order, using assignment,
but if the whole data structure stays together in one generation,
and is advanced together to the next, it's not a problem.

This suggests that if you move to more and smaller generations, you
may get more problematic pointers from old-to-young, because you
have more generation boundaries that might get crossed.

My intuition is that this doesn't happen often for reasonable numbers
of generations (say, up to 4), but it would be great to see data for an
8-generation system.  (And lots of real programs, and lots of parameter
settings, of course.)

Even better, it'd be good to see data for a system like Lieberman
and Hewitt's original design, where "generations" are really age
cohorts, and you never take different batches of objects and mingle
them together in an "older" generation.  (That is, in principle
you can keep objects born at about the same time together in
birth-time batches, which you may have an indefinite number of,
rather than having a sort-of-logarithmic batching of objects by very
rough age.)  In practice, I think this would be a bad idea if done
naively, because it is good to lump several batches of objects together
and call them "a" generation, just so that you don't get lots of
old-to-new pointers.  But the data could be enlightening.  (Maybe
Marc's got data along those lines?)