[gclist] Re: Name that hypothesis

Nick Barnes nickb@harlequin.co.uk
Thu, 05 Dec 1996 11:07:06 +0000


> The problem is that the real world has committed a systematic sampling error:
> All the real programs most of us use are written in C (probably with some
> amount of C++ of various flavors thrown in, depending on the platform).

Data point: about 90% of the cycles I use daily are ML. Many of the
rest are elisp.

> This means most of us have 2 choices when it comes to selecting
> sample programs for such studies:
> 
> a) Ignore the real world, and consider programs written in a garbage
> collected language in what we consider appropriate style (on which
> not many of us would agree), or
> 
> b) Measure C programs, and perhaps try to compensate for artifacts
> of the language and for inappropriate programming style.
> 
> Option (a) has many problems. The samples tend to reflect the
> programming styles of a few people, and tend to be biased by
> artifacts of one or two implementations. The samples tend to be
> small and sometimes artificial. The programs often have not gone
> through much performance tuning. They're arguably less likely to
> solve problems that most of us care about.

There is definitely a problem with benchmarks for functional programming
languages. For instance, in the ML world there is a habit of running a
set of about 8 benchmarks, none of which take very long and most of
which are very unrepresentative of real-world ML programs. It's
analogous to the situation in workstation benchmarks before the
formation of SPEC: everyone used Whetstones, Dhrystones, MIPS, and
similarly meaningless numbers.

> In an ideal world, I'd like to see as many measurements as possible.  But I'm
> certainly not willing to dismiss measurements because they only include C
> programs.

Absolutely. Those of us with an interest in memory managers for C and
C-like languages (Modula-2?) should take a keen interest in such
measurements. Mutatis mutandis for these other classes of language:

	- OO languages
	- strict functional languages
	- Lisps
	- logic languages
	- lazy functional languages
	- &c

Each of these classes of language encourages a certain style and
implementation strategy, which in turn shape memory use patterns
differently. Thus measurements for each are very interesting to those
involved in that class of language, and less interesting to those
involved in other classes.

Nick B