[gclist] Guardians
Jerry Leichter
leichter@smarts.com
Thu, 10 Apr 1997 09:53:27 -0400
| Why is it that "closing a file" is such a common example of finalisers?
|
| If you assume a perfect world, so that the mere act of asking for something
| to be written is a guarantee that it will be, then closing by finalising
| makes sense. But in the real world, you really want something like
|
| out_ok = open ok and then write ok and then write ok and then close ok
|
| What happens if you are relying on close-by-finaliser and the attempt to
| close the file fails? Presumably the entire chunk of the program that
| knew about the file and so could have attempted some kind of recovery
| (perhaps by writing to a different file) has disappeared....
|
| The underlying assumption, in other words, appears to be that finalisers
| never fail, or that if they do, the program couldn't have done anything
| about it anyway....
|
| What have I misunderstood?
Nothing, as far as I can see.
Much of this debate reminds me of a line a friend of mine (Martin Minow) came up
with years ago: Virtual memory is fine if you want to do virtual work.
GC provides a program with an abstract view of memory. In this abstract view,
memory is infinite. It's great to program with abstractions. They hide the
grubby details that make programming different from pure mathematics. But
abstract interfaces have to have real implementations - and real implementations
of all but the most trivial abstract interfaces fail to "abstract away" some of
the grubby details. You may program on a garbage-collected system as if memory
were infinite, but it's actually quite finite. If you keep pointers to semantic
garbage around, your performance will suffer. If your actual semantically-live
data exceeds the physical memory available by a large enough factor, your
performance will disappear. Ultimately, if your actual semantically-live data
exceeds the available address space, your program won't run.
It happens that, with virtual memory on modern systems, most programs will only
rarely if ever run into the latter two limitations. As for the first, see
"Debugging Storage Management Problems in Garbage-Collected Environments" by
Detlefs and Kaslo (ftp://gatekeeper.dec.com/pub/misc/detlefs/detlefs-coots95.ps).
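To make that first limitation concrete, here is a minimal sketch in modern Java
(the names are purely illustrative): a lingering pointer keeps semantically dead
data reachable, so the collector can never reclaim it, and heap growth - and GC
cost - tracks the size of the history rather than of the live data.

    import java.util.ArrayList;
    import java.util.List;

    class SemanticGarbage {
        // Every request is remembered "just in case". The payloads are
        // semantically dead once handled, but the static list keeps them
        // reachable, so the collector must retain them forever.
        private static final List<byte[]> history = new ArrayList<>();

        static void handleRequest(byte[] payload) {
            history.add(payload);   // lingering pointer to dead data
            // ... process payload ...
        }
    }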
Now consider using garbage collection to recover file handles. While the memory
available on typical systems is hundreds of thousands of times the size of
typical requests, the number of file handles available is rarely more than a few
tens of times the size of a typical request (a single handle). For allocation in "file handle
space," running out of the total available space is quite easy. If it weren't,
all the discussion of the need for prompt finalization to recover file handles
from dead objects wouldn't exist.
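To see just how easy, consider this hedged Java sketch (the path and the loop
count are arbitrary): each iteration drops its stream and leaves the close to
finalization. With essentially no memory pressure, collections are rare, so on a
typical Unix-like system the process is likely to hit its file-descriptor limit
(often a few hundred to a few thousand) long before the heap fills.

    import java.io.FileInputStream;
    import java.io.IOException;

    class HandleExhaustion {
        public static void main(String[] args) throws IOException {
            for (int i = 0; i < 100000; i++) {
                FileInputStream in = new FileInputStream("/etc/hosts");
                in.read();
                // no close(): reclaiming the descriptor is left to the
                // collector, which has no reason to run any time soon.
            }
        }
    }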
Mr. O'Keefe is raising other ways in which the reality of underlying resources
may differ from the abstractions one may wish to build upon them. Sure, it's
nice to have abstract files that never run into I/O errors, never run out of
space, never become inaccessible because network links fail. People write
programs assuming such abstract files all the time, providing their users with
abstractions of databases that never fill or fail. And when problems at the
bottom level arise and come smashing through the layers of abstraction, they
smash, as it were, all the way through the video screen, leaving bits of broken
glass all over the user's lap. Users have to deal with real-world people,
objects, actions; the abstractions in their computers are only images of the
real world, and those images are never completely faithful.
GC provides a wonderful, powerful abstraction mechanism. Like all abstraction,
it's based on hiding some ugly realities - and, inevitably, ignoring others.
Files are themselves abstractions - the open/read/write/close model, which goes
back to OS/6, isn't a physical reality either: A disk knows nothing about open
files. The file abstraction, unlike the GC "infinite perfect memory" abstraction,
is at least structured so that the underlying reality can be handled in a
clean way (even if this potentiality isn't often properly realized).
Using finalizers to close files is never going to work correctly, since the GC
abstraction simply doesn't have any way to hide the details of file access. At
best, it can ignore those details - fine for simple-minded toy programs, useless
for real systems.
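For contrast, a hedged sketch (illustrative names throughout) of what explicit
closing buys you - exactly the recovery the quoted pseudo-code asks for. A failed
close() surfaces as an exception while the code that knows about the file is
still on the stack, so it can, say, try a different file; an exception thrown
from a finalizer is simply discarded by the collector, with no caller left to
react.

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    class ExplicitClose {
        static void writeReport(File primary, File fallback, byte[] data)
                throws IOException {
            try {
                writeTo(primary, data);
            } catch (IOException e) {
                writeTo(fallback, data);   // caller still knows enough to recover
            }
        }

        static void writeTo(File f, byte[] data) throws IOException {
            FileOutputStream out = new FileOutputStream(f);
            try {
                out.write(data);
            } finally {
                out.close();   // a failed close is visible right here
            }
        }
    }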
The right common abstraction to include both GC'd infinite memory and files
eliminates files entirely, and instead uses persistent objects. Failures can
still occur in the underlying real storage system, of course, but at least a
true persistent object store has enough access and control to be able to hide
them (with logging, redundancy, retries, whatever).
I'll advance a hypothesis: Finalizers (or weak references or any other
mechanisms of this general sort) are appropriate for improving performance
(e.g., getting rid of local proxies for no-longer-locally-accessed remote
objects); they are *in*appropriate for ensuring correctness. This hypothesis
shouldn't seem as radical as it looks: After all, *GC itself* was developed
as a mechanism for improving performance, *not* for providing correctness. If
the abstraction that GC provides is "infinite memory", then the null GC,
together with enough (for the program at hand) real memory, should work just as
well!
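By way of illustration (a sketch only, with hypothetical names): a cache of
local proxies for remote objects, held through weak references. If the collector
clears an entry, nothing breaks - the program just re-fetches the proxy and pays
a performance cost. Correctness never depended on when, or whether, the
collector ran.

    import java.lang.ref.WeakReference;
    import java.util.HashMap;
    import java.util.Map;

    class ProxyCache {
        private final Map<String, WeakReference<Object>> proxies = new HashMap<>();

        Object lookup(String remoteId) {
            WeakReference<Object> ref = proxies.get(remoteId);
            Object proxy = (ref == null) ? null : ref.get();
            if (proxy == null) {
                proxy = fetchProxy(remoteId);   // re-fetch if collected (or never cached)
                proxies.put(remoteId, new WeakReference<>(proxy));
            }
            return proxy;
        }

        private Object fetchProxy(String remoteId) {
            return new Object();   // stand-in for contacting the remote system
        }
    }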
-- Jerry