[gclist] Name that hypothesis

Nick Barnes nickb@harlequin.co.uk
Fri, 06 Dec 1996 13:07:21 +0000


> >A more precise statement is that the instantaneous half-life increases
> >monotonically with age.
> 
> In keeping with Dave's "curve schmurve" comment, I have to disagree
> with this formulation.
> 
> In general, the "curve" is not monotonic, and doesn't have to be
> for generational GC to work just fine.  (e.g., 90% of your objects
> may die within 100 KB of allocation, and none may die between
> 100KB and 150KB, then you lose another 5% around 1MB... and the
> rest live 10-100MB...)  Along this "curve", you may have discontinuities
> where most objects die very soon, but that doesn't necessarily defeat
> the generational principle.  (And it's an opportunity for tuning
> or adaptiveness.)
> 
> Talking about half-lives can be very misleading, because programs
> aren't stochastic processes.  (The allocator literature is a disaster
> because people took the notion of stochastic processes and applied
> it where it wasn't appropriate---see my allocator survey on my
> web site if you're interested.)

I've read and enjoyed your allocator papers, and agree with the
substance of them. Stochastic models are overly simple. However, I
don't think that using continuous random processes to model the
behaviour of an unknown set of unknown mutators is, in itself, a bad
thing. When designing a garbage collector, one often wants to cater to
a wide range of mutator behaviours. There are a few obvious ways of
preparing for this:

(1) trace and measure the mutators one has available,
(2) construct benchmark mutators,
(3) construct mathematical models of mutators.

One would be a fool not to pursue (1), but the set of available
large-scale mutators is often very small; it is easy to end up with a
GC specialized to a couple of programs that then performs abysmally
on another user's code.

(2) is quick and easy, but most benchmarks are far too trivial to
stress a GC in realistic ways. Again, it should be done (if only to
eliminate trivial worst-case behaviour), but it is clearly
insufficient.
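
To make the triviality concrete, here is the sort of benchmark
mutator I have in mind. This is a sketch of my own (in C, with an
explicit free standing in for "the reference is dropped"), not taken
from any real suite; nearly every object dies at once and a fixed
trickle survives, which is exactly the case a generational collector
is already best at:

#include <stdio.h>
#include <stdlib.h>

/* A deliberately trivial benchmark mutator: all but one object in
   ten thousand dies immediately, and the survivors live to the end
   of the run.  Real mutators have far messier lifetime curves, so
   a GC tuned only against code like this can still behave badly. */

typedef struct node {
    struct node *next;
    long payload;
} node;

int main(void)
{
    node *survivors = NULL;
    long i, checksum = 0;

    for (i = 0; i < 1000000; i++) {
        node *n = malloc(sizeof *n);
        if (n == NULL)
            return 1;
        n->payload = i;
        n->next = NULL;
        if (i % 10000 == 0) {    /* one in 10000 survives long-term */
            n->next = survivors;
            survivors = n;
        } else {
            free(n);             /* stands in for dropping the reference */
        }
    }

    /* touch the survivors so the work cannot be optimized away */
    for (; survivors != NULL; survivors = survivors->next)
        checksum += survivors->payload;
    printf("checksum %ld\n", checksum);
    return 0;
}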

This leaves us with (3) as an essential tool for the GC researcher and
designer. And if one's observations from real large-scale mutators are
that some aspects of their behaviour can be realistically modelled by
a continuous random process, one is led naturally to investigate the
mathematical properties of continuous random models.

The principal danger here is that one must not pick too simple a
model. I agree that my use of the term "half-life" implies too simple
a model; I was using it to avoid discussing the second-order
derivatives that actually fall out of the simplest such model.
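
For the record, here is that simplest model spelled out, in standard
survival-analysis notation (my formulation, not anything from the
thread). Let $S(t)$ be the probability that an object survives to age
$t$, and let $h(t) = -S'(t)/S(t)$ be the hazard rate; define the
instantaneous half-life as $T(t) = (\ln 2)/h(t)$. The exponential
model $S(t) = e^{-\lambda t}$ gives a constant $T = (\ln 2)/\lambda$,
which is all the bare term "half-life" suggests. A heavy-tailed model
such as $S(t) = (t_0/t)^{\alpha}$ for $t \ge t_0$ gives
$h(t) = \alpha/t$ and hence $T(t) = (t \ln 2)/\alpha$: the
instantaneous half-life grows linearly with age, which is the
hypothesis this thread started from.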

> We need to be careful about the characterization of younger-to-older
> vs. older-to-younger pointers, too.  Pointers from older to younger
> objects don't matter for most generational GC's, as long as they
> don't cross generation boundaries.  I think it's pretty common
> to create some data structure in some odd order, using assignment,
> but if the whole data structure stays together in one generation,
> and is advanced together to the next, it's not a problem.

I think Mark Tillotson's observation here is a good one. What one
really cares about are "long" pointers; a little local old-to-young
does nobody any harm.
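
To illustrate what that buys a collector, here is a sketch of the
corresponding write barrier. This is my own illustration in C;
gen_of and remember are hypothetical helpers, not anyone's actual
interface. Only a store that creates a pointer from an older
generation into a younger one is recorded; a store within one
generation, whatever its direction inside that generation, goes
through for free:

typedef struct object object;

/* Hypothetical helpers: gen_of(obj) returns the generation number
   of obj (0 = youngest); remember(slot) records the slot in the
   remembered set for later scanning. */
extern int  gen_of(object *obj);
extern void remember(object **slot);

/* Barrier for the store "container->field = target". */
void write_barrier(object *container, object **slot, object *target)
{
    *slot = target;                 /* perform the store itself */
    if (target != NULL && gen_of(container) > gen_of(target))
        remember(slot);             /* "long" older-to-younger pointer */
    /* A same-generation store compares equal and is not remembered,
       which is why local old-to-young does nobody any harm. */
}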

Nick B