[gclist] Name that hypothesis
Nick Barnes
nickb@harlequin.co.uk
Tue, 10 Dec 1996 17:08:27 +0000
> >I've read and enjoyed your allocator papers, and agree with the
> >substance of them. Stochastic models are overly simple. However, I
> >don't think that using continuous random processes to model the
> >behaviour of an unknown set of unknown mutators is, in itself, a bad
> >thing.
>
> Hmmm... I think it's a bit dangerous to say that continuous random
> process are an acceptable model of an unknown set of unknown anythings.
> (I'm not sure we disagree, but I think the vocabulary can be misleading.)
> If something's unknown, you can't model it with any real confidence,
> and using "continuous random processes" is likely to introduce huge
> biases. Program behavior is often discontinuous and almost always
> nonrandom (i.e., it's usually phased and strongly patterned).
I am using "continuous" in the analytical sense, as distinct from
"discrete". I am not using it to mean "constant", or "steady state".
This is still a simplifying assumption, but not such a huge one (nor,
I believe, such a distorting one). I have a continuous random model
which can account for phases &c. I am in the process of refining this
and fitting it to the large-scale mutators which I have available for
study. This is all very much part-time, however; I hope that an
academic may be able to devote more resources to such a project.
I would note that very many other systems, including ones with little
actual randomness, have been successfully modelled with random models.
The key is to ensure that the randomness introduced corresponds to
actual variation in the system, or between the systems, under study.
This is where Markov models of allocators have failed, as you have
powerfully observed: they have added randomness which does not
correspond to real systems, and as a result have made predictions
which do not correspond to reality.
> What we found in studying allocators is that simple Markov-like models
> as traditionally applied to allocation problems are simply wrong, and
> deeply wrong.
Yes. Such Markov models are discrete, not continuous.
> [...] I can believe that you can come up with an interesting
> parameterized model that models *phases* and nestings of phases
> using random variables [...] You don't have a single parameter, or
> even a pair of parameters.
Indeed.
> But then you're left with the problem of trying to find combinations
> of parameters that generate realistic large-scale behaviors, and to
> do that, you need real data and sophisticated analyses.
Yes.
> [...]
> >(1) trace and measure the mutators one has available,
> >(2) construct benchmark mutators,
> >(3) construct mathematical models of mutators.
> >
> >One would be a fool not to pursue (1), but the set of available
> >large-scale mutators is often very small; it is easy to end up with a
> >GC specialized to a couple of programs which then performs abysmally
> >with another user's code.
>
> Right. But the only good solution is to gather more data, and to
> have a realistic idea of what behaviors matter. This is where
> allocator researchers went wrong in the 60's, and it led to
> decades of wasted effort. (That's why I'm pretty vehement about
> this. The GC literature is much better than the allocator literature,
> and I'd like it to stay that way.)
Right. I think that, in some GC areas at least, there is a good
intuitive understanding of what behaviours matter. We even have a
somewhat informal terminology for it ("pig in the python", "premature
tenuring", "clustering", "phase death", &c).
> Or if we're modeling the higher-level behavior that is directly
> relevant to the systems we're studying, we need to make that very
> clear, and point out how _ad_hoc_ the models are. (Ad hoc models
> are often good, but it's important to make clear which _hoc_ they're
> _ad_ to, i.e., which facts we're modeling, and which we're not,
> and why it's appropriate.)
Yes.
> I don't want to discourage modeling. I *do* want to discourage
> over-mathematization of things that aren't mathematically tractable.
I believe that things are either mathematically tractable or not
scientifically tractable at all. There is no science without numbers,
and any study of numbers is mathematics.
> [...]
> For synthetic benchmarks, I'd like to see programs that get by without
> random number generators. [...]
I agree. Some actual mutators use random numbers, or external
behaviour that has a random element (for an obvious example, see
http://www.harlequin.com/full/press/pr-att.html), so we need some
synthetic benchmarks which reflect that, but most programs do not (and
our models need to reflect that).
But, for instance, we might need to know that a compiler written in a
GCed language does not make a pathologically bad case for the GC
whatever the nature of the source code fed to it. To determine this,
one needs a model of the compiler behaviour which takes a number of
parameters describing the source, which parameters can be widely
varied. A random model is well suited to this.
Nick B