[gclist] Name that hypothesis

David Ungar david.ungar@Sun.COM
Thu, 5 Dec 1996 11:51:52 -0800


Amen (to at least the first half of this), Paul!

- Dave


At 8:56 AM -0600 12/5/96, Paul R. Wilson wrote:
>>From majordom@iecc.com Thu Dec  5 05:20:40 1996
>>Sender: majordom@iecc.com (MajorDomo)
>>To: gclist@iecc.com
>>Subject: Re: [gclist] Name that hypothesis
>>Date: Thu, 05 Dec 1996 11:09:23 +0000
>>From: Nick Barnes <nickb@harlequin.co.uk>
>>
>>A more precise statement is that the instantaneous half-life increases
>>monotonically with age.
>
>In keeping with Dave's "curve schmurve" comment, I have to disagree
>with this formulation.
>
>In general, the "curve" is not monotonic, and doesn't have to be
>for generational GC to work just fine.  (e.g., 90% of your objects
>may die within 100 KB of allocation, and none may die between
>100KB and 150KB, then you lose another 5% around 1MB... and the
>rest live 10-100MB...)  Along this "curve", you may have discontinuities
>where most objects die very soon, but that doesn't necessarily defeat
>the generational principle.  (And it's an opportunity for tuning
>or adaptiveness.)
>
>Talking about half-lives can be very misleading, because programs
>aren't stochastic processes.  (The allocator literature is a disaster
>because people took the notion of stochastic processes and applied
>it where it wasn't appropriate---see my allocator survey on my
>web site if you're interested.)
>
>We need to be careful about the characterization of younger-to-older
>vs. older-to-younger pointers, too.  Pointers from older to younger
>objects don't matter for most generational GC's, as long as they
>don't cross generation boundaries.  I think it's pretty common
>to create some data structure in some odd order, using assignment,
>but if the whole data structure stays together in one generation,
>and is advanced together to the next, it's not a problem.
>
>This suggests that if you move to more and smaller generations, you
>may get more problematic pointers from old-to-young, because you
>have more generation boundaries that might get crossed.
>
>My intuition is that this doesn't happen often for reasonable numbers
>of generations (say, up to 4), but it would be great to see data for an
>8-generation system.  (And lots of real programs, and lots of parameter
>settings, of course.)
>
>Even better, it'd be good to see data for a system like Lieberman
>and Hewitt's original design, where "generations" are really age
>cohorts, and you never take different batches of objects and mingle
>them together in an "older" generation.  (That is, in principle
>you can keep objects born at about the same time together in
>birth-time batches, which you may have an indefinite number of,
>rather than having a sort-of-logarithmic batching of objects by very
>rough age.)  In practice, I think this would be a bad idea if done
>naively, because it is good to lump several batches of objects together
>and call them "a" generation, just so that you don't get lots of
>old-to-new pointers.  But the data could be enlightening.  (Maybe
>Marc's got data along those lines?)