[gclist] Re: Name that hypothesis

Henry G. Baker hbaker@netcom.com
Wed, 4 Dec 1996 17:49:58 -0800 (PST)

> The most detailed measurements I know about generational GC in a lazy
> setting are in my FPCA93 paper with Patrick Sansom (though maybe I've missed
> some)
> 	http://www.dcs.gla.ac.uk/fp/papers/gen-gc-for-haskell.ps.Z
> Contrary to what Henry says, and contrary to our initial expectation,
> we found that generational GC worked very well.  While things are
> updated often, they are almost always updated when very young, so it
> doesn't matter much.
> And these measurements were with an Appel-style collector that is
> prone to premature tenuring (it promotes everything live at a minor GC).
> Simon Peyton Jones
> | I agree with Hans re programming styles.  I also forgot to mention
> | some work from the _lazy_ functional programming community, whose
> | results are much more like those of the logic programming community
> | than those of the strict functional programming community -- e.g., ML.
> | The reason is that the lazy values get updated _after_ the links to
> | them are created, and this screws up generational GC's pretty badly.
> | I seem to recall several papers in the past several years from FPCA
> | and some of the GC conferences that ran into this problem.  I also
> | seem to recall that the problem can be ameliorated, but I don't recall
> | off the top of my head how this was done.

After a brief perusal of my library, I found one paper which alluded
to some of the problems:

Seward, Julian.  "Generational Garbage Collection for Lazy Graph Reduction".

"Although not listed here, the total number of updates in old-space,
whether or not the overwriting cell contains pointers, was also
recorded.  This indicates that about two-thirds of all old-space
updates had to be entered into the forward reference table.  Ignoring
'Primes', this means that at worst 97.6% of all updates are _not_ in
[old-space]...."

"...We observe that the 'Primes' program has a large number of old-space
updates and hence an inordinately large collection time...."

"So there is at least one program to which generational collection
does not respond well.  Is that a good reason to abandon this
approach?  We think not."

"Although collector performance in this case ['Primes'] is not as good
as we would like, it is not bad: perhaps two or three times the cost of
semispace collection in a moderate-sized heap."

"...It seems a pity to throw away this large average-time performance gain."

"This paper provides strong evidence that generational garbage collection
is viable for lazy reduction systems...."

"Finally, some modifications were suggested.  Whether or not these are a good
idea awaits further work."


So, my brain apparently flagged a special case without taking into
consideration the excellent average performance mentioned by this author
and Simon Peyton Jones.

Seward also points out the different programming style used in the
'Primes' benchmark.

If one proposes a GC algorithm with statistically better performance,
perhaps there is a hard lower bound on the mean -- analogous to the
information-theoretic bounds in compression technology -- that forces
at least some programs to have _worse_ performance.  Just as we don't
throw away 'compress', 'gzip', 'pkzip', etc., because they do worse on
some files -- e.g., an already-compressed file -- we wouldn't throw
away a better GC algorithm, either, on the basis of datapoints of very
small measure (unless, of course, those datapoints cause the machine
to crash ;-) ;-).
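
The compression side of that analogy is not just an intuition; a
one-line counting (pigeonhole) argument shows why every lossless
compressor must do worse on some inputs:

```latex
% Of the 2^n bit-strings of length n, a lossless (injective)
% compressor can map at most
\sum_{k=0}^{n-1} 2^k \;=\; 2^n - 1
% of them to strictly shorter outputs, so at least one
% length-n input cannot shrink.
```

Whether GC algorithms admit an analogous distributional bound is, of
course, exactly the "perhaps" above.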

I will also defer to Simon regarding the performance of GC's in lazy
functional programming languages!

Henry Baker
www/ftp directory: