[gclist] Re: Name that hypothesis

Hans Boehm boehm@hoh.mti.sgi.com
Wed, 4 Dec 1996 10:46:25 -0800

On Dec 4, 10:54am, Charles Fiterman wrote:
> Subject: Re: [gclist] Re: Name that hypothesis
> > || What is the name of the following hypothesis?
> > ||
> > || 	"Most references in a system point backwards in time, i.e. from
> > || 	 younger objects to older objects."
> >
> >We tested that hypothesis.  It is incorrect.
> >
> >Specifically, here is how some well-known C programs behave:
> I see systematic sampling error here. The programs are all in C.

The problem is that the real world has committed a systematic sampling error:
All the real programs most of us use are written in C (probably with some
amount of C++ of various flavors thrown in, depending on the platform).

This means most of us have two choices when it comes to selecting sample programs
for such studies:

a) Ignore the real world, and consider programs written in a garbage collected
language in what we consider appropriate style (on which not many of us would
agree), or

b) Measure C programs, and perhaps try to compensate for artifacts of the
language and for inappropriate programming style.

Option (a) has many problems.  The samples tend to reflect the programming
styles of a few people, and tend to be biased by artifacts of one or two
implementations.  The samples tend to be small and sometimes artificial.  The
programs often have not gone through much performance tuning.  They're arguably
less likely to solve problems that most of us care about.

Particular languages seem to encourage very specific programming styles, making
it hard to generalize to anything else.  For example, Manuel Serrano concluded
that ML programs (perhaps especially ML benchmarks) benefit much more from
generational collection than even Scheme programs.

Given an exclusive choice, I greatly prefer option (b).  This does introduce
some biases.  But it's not clear to me why it should bias pointer direction
measurements, except in comparison to language implementations that make it
hard or impossible to introduce forward pointers.

In an ideal world, I'd like to see as many measurements as possible.  But I'm
certainly not willing to dismiss measurements because they only include C


Hans-Juergen Boehm