[gclist] Get Real! Require GC (was re: quality of impl., etc.)

Paul R. Wilson wilson@cs.utexas.edu
Sat, 20 Apr 1996 13:38:08 -0500


>A legal Lisp compiler could simply not have a garbage collector.

I think this is getting silly.

Maybe the language standard doesn't require garbage collection, _but_it_
_should_.  The fact that acceptable GC behavior is hard to define in the
hard cases does not mean that it shouldn't be required in the paradigmatic
and easy cases.  It doesn't mean that we shouldn't state our intent:
"This here language requires GC."

A Lisp without a GC is simply broken, and the language spec should say
so explicitly.  Language lawyers should take a tip from real lawyers, and
write specs that have a clear intent, even if they have fuzzy boundary cases.

For example, there are laws that talk about things like "intent to kill."
Defining "intent" precisely is impossible at the moment, because psychology
isn't far enough advanced.  Does that keep real legislators from outlawing
murder?  No.  It just means that sometimes weird cases end up in the
courts.

The idea that things like GC are "quality-of-implementation issues" is
a complete and serious cop-out, in my opinion.  Computer scientists try
*way* too hard to be like mathematicians and say that if you can't define it
precisely, it's meaningless.  This is bunk, it's unscientific, and it's
a big problem all through CS.

A language spec for a GC'd language should explicitly require GC (or
sufficient compiler analysis to avoid it, for some programs, blah blah
blah..).  There should be clear "usual cases" where the data memory
requirements of a program are within a constant factor of something
reasonable, plus or minus a reasonable constant.  (E.g., "shall not grow
without bound while we indefinitely loop to allocate and drop dumb objects")

This doesn't mean that we can't use conservative GC's that may leak in a
tiny minority of cases.  The language spec can explicitly say that it is
acceptable for an implementation to take a small risk like that, but if it
systematically loses on a significant fraction of programs, it's just
broken.  We can say things like this.  We're allowed.  Who's to stop us,
and who are they, anyway?

Consider laws about negligence and so on.  They don't say you can't take a
reasonable risk---just that you can't take *unreasonable* risks.  Sometimes
a jury has to decide whether a risk was reasonable or not.  Big deal.  That's
life.  In the case of a language spec, we don't have a court system to
adjudicate things, but we can certainly ridicule people and point to
the spec _if_it_says_what_it_should_say_.  If we leave important stuff out
because it's "not precise enough", we deserve what we get.  People can
claim to have a conforming implementation when they clearly violate the
intent of the language spec.

(Notice that in many cases, people who want to get off from a murder
charge have to plead insanity.  So should any implementor who claims
they have a Lisp implementation that doesn't have a garbage collector.)

To some degree, this can be addressed with validation suites.  Some languages
(like Ada) have official validation suites that you have to be able to
run to call a compiler a compiler for that language.  If you choke on those
programs, it's not a legal compiler, period.  That doesn't mean it's a
perfect, proven-correct compiler, but there isn't any such beast anyhow.

Other languages should have validation suites, too.  (Scheme has an
unofficial one.)  For a GC'd language, there should be a few simple
and reasonable programs that stress the GC a bit and see if it falls down
completely.  If it does, you shouldn't say you have a conforming
implementation.

If we're worried about complicated situations killing a compiler that takes
reasonable risks, don't put those complicated situations in the validation
suite, and _do_ say something about it in the language spec.  (E.g., "A
compiler shall make a reasonable effort to avoid leaks in this kind of
situation, but it is legal for a compiler to take a reasonable risk of
interactions between optimizations and GC [...], for example loop invariant
code motion may extend the life of a variable to that of its enclosing
block or even longer [...]")

That is, we should point out some clear cases, and say what must happen,
and we should point out some unclear cases, and explicitly say that 
there are no guarantees---but that anybody who _systematically_ loses
_is_a_loser_.

Ideally, we'd come up with some clear taxonomy of things that the GC must
handle and things it's allowed to act wierdly for, but that's not critical.
We don't have to have a taxonomy to state an intent.  Giving a few clear
examples should help, too, even in lieu of a real taxonomy.