[gclist] Guardians

Jerry Leichter leichter@smarts.com
Thu, 10 Apr 1997 09:53:27 -0400


| Why is it that "closing a file" is such a common example of finalisers?
| 
| If you assume a perfect world, so that the mere act of asking for something
| to be written is a guarantee that it will be, then closing by finalising
| makes sense.  But in the real world, you really want something like
| 
| out_ok = open ok and then write ok and then write ok and then close ok
| 
| What happens if you are relying on close-by-finaliser and the attempt to
| close the file fails?  Presumably the entire chunk of the program that
| knew about the file and so could have attempted some kind of recovery
| (perhaps by writing to a different file) has disappeared....
| 
| The underlying assumption, in other words, appears to be that finalisers
| never fail, or that if they do, the program couldn't have done anything
| about it anyway....
| 
| What have I misunderstood?

Nothing, as far as I can see.

Much of this debate reminds me of a line a friend of mine (Martin Minow) came up 
with years ago:  Virtual memory is fine if you want to do virtual work.

GC provides a program with an abstract view of memory.  In this abstract view, 
memory is infinite.  It's great to program with abstractions.  They hide the 
grubby details that make programming different from pure mathematics.  But 
abstract interfaces have to have real implementations - and real implementations 
of all but the most trivial abstract interfaces fail to "abstract away" some of 
the grubby details.  You may program on a garbage-collected system as if memory 
were infinite, but it's actually quite finite.  If you keep pointers to semantic 
garbage around, your performance will suffer.  If your actual semantically-live 
data exceeds the physical memory available by a large enough factor, your 
performance will disappear.  Ultimately, if your actual semantically-live data 
exceeds the available address space, your program won't run.

With virtual memory on modern systems, most programs will only rarely, if ever, run 
into the latter two limitations.  As for the first, see "Debugging Storage Management 
Problems in Garbage-Collected Environments" by Detlefs and Kaslo 
(ftp://gatekeeper.dec.com/pub/misc/detlefs/detlefs-coots95.ps).
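That first kind of trouble is easy to fall into without noticing.  Here's a rough 
Java sketch - the class and its details are invented purely for illustration - of data 
that is semantically dead but still reachable, so the collector has to keep it:

    import java.util.ArrayList;
    import java.util.List;

    // Sketch only: "history" holds records that nothing will ever read
    // again, but because the list stays reachable the collector must
    // treat every entry as live.  The "infinite memory" abstraction
    // quietly degrades into paging and, eventually, an out-of-memory
    // failure.
    class RequestLog {
        private static final List<byte[]> history = new ArrayList<byte[]>();

        static void handle(byte[] request) {
            history.add(request);   // kept "just in case", read by nobody
            // ... process the request ...
        }
    }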

Now consider using garbage collection to recover file handles.  While the memory 
available on typical systems is hundreds of thousands of times the size of 
typical requests, the number of file handles available is rarely more than a few 
tens of times the size of a typical request (one handle).  For allocation in "file handle 
space," running out of the total available space is quite easy.  If it weren't, 
all the discussion of the need for prompt finalization to recover file handles 
from dead objects wouldn't exist.
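To make the arithmetic concrete, here's a minimal Java sketch (the loop is contrived, 
and the handle limit is whatever your system imposes) of a program that never strains 
memory but still runs out of handles, because it leaves closing to the collector:

    import java.io.FileInputStream;
    import java.io.IOException;

    // Sketch only: each iteration opens a file and then drops the
    // reference, trusting the collector (and the stream's finalizer)
    // to give the handle back.  Memory pressure stays tiny, so the
    // collector may not run at all, and open() fails with "Too many
    // open files" long before "infinite memory" is ever in question.
    class HandleLeak {
        public static void main(String[] args) throws IOException {
            for (int i = 0; i < 100000; i++) {
                FileInputStream in = new FileInputStream("/etc/passwd");
                in.read();    // use the file briefly
                in = null;    // the object is garbage; the handle is not yet free
                // no in.close() - recovery is left to finalization
            }
        }
    }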

Mr. O'Keefe is raising other ways in which the reality of underlying resources 
may differ from the abstractions one may wish to build upon them.  Sure, it's 
nice to have abstract files that never run into I/O errors, never run out of 
space, never become inaccessible because network links fail.  People write 
programs assuming such abstract files all the time, providing their users with 
abstractions of databases that never fill or fail.  And when problems at the 
bottom level arise and come smashing through the layers of abstraction, they 
smash, as it were, all the way through the video screen, leaving bits of broken 
glass all over the user's lap.  Users have to deal with real-world people, 
objects, actions; the abstractions in their computers are only images of the 
real world, and those images are never completely faithful.

GC provides a wonderful, powerful abstraction mechanism.  Like all abstractions, 
it's based on hiding some ugly realities - and, inevitably, ignoring others.  
Files are themselves abstractions - the open/read/write/close model, which goes 
back to OS/6, isn't a physical reality either:  A disk knows nothing about open 
files.  The file abstraction, unlike the GC "infinite perfect memory" abstraction, 
is at least structured so that the underlying reality can be handled in a clean way 
(even if this potential isn't often properly realized).

Using finalizers to close files is never going to work correctly, since the GC 
abstraction simply doesn't have any way to hide the details of file access.  At 
best, it can ignore those details - fine for simple-minded toy programs, useless 
for real systems.
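To connect this back to the question at the top, here's what close-by-finalizer looks 
like in Java-ish terms (the wrapper class is made up), and where the close error ends up:

    import java.io.FileOutputStream;
    import java.io.IOException;

    // Sketch only.  The interesting part is finalize(): by the time it
    // runs, the code that knew what the file was for - and could have
    // retried, written elsewhere, or told the user - is long gone.
    class FinalizedWriter {
        private final FileOutputStream out;

        FinalizedWriter(String name) throws IOException {
            out = new FileOutputStream(name);
        }

        void write(byte[] data) throws IOException {
            out.write(data);
        }

        protected void finalize() throws Throwable {
            try {
                out.close();   // close itself can fail, e.g. when a full
                               // disk or a dead server surfaces only now
            } catch (IOException e) {
                // ...and there is no caller left to report it to.  Compare
                // the explicit "open ok and then write ok ... and then
                // close ok" chain quoted above, where the failure reaches
                // code that can still do something about it.
            } finally {
                super.finalize();
            }
        }
    }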

The right common abstraction to include both GC'd infinite memory and files 
eliminates files entirely, and instead uses persistent objects.  Failures can 
still occur in the underlying real storage system, of course, but at least a 
true persistent object store has enough access and control to be able to hide 
them (with logging, redundancy, retries, whatever).

I'll advance a hypothesis:  Finalizers (or weak references or any other 
mechanisms of this general sort) are appropriate for improving performance 
(e.g., getting rid of local proxies for no-longer-locally-accessed remote 
objects); they are *in*appropriate for ensuring correctness.  This hypothesis 
shouldn't seem as radical as it looks:  After all, *GC itself* was developed 
as a mechanism for improving performance, *not* for providing correctness.  If 
the abstraction that GC provides is "infinite memory", then the null GC, 
together with enough (for the program at hand) real memory, should work just as 
well!
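
As an example of the first, performance-only use, here is a rough sketch of a 
weakly-held proxy cache in Java terms (RemoteProxy and its fetch routine are 
placeholders, not any real API):

    import java.lang.ref.WeakReference;
    import java.util.HashMap;
    import java.util.Map;

    // Sketch only: local proxies for remote objects, held weakly.  If the
    // collector clears an entry, the only cost is re-creating the proxy on
    // the next lookup; nothing about correctness depends on when - or
    // whether - the entry is cleared.
    class ProxyCache {
        private final Map<Long, WeakReference<RemoteProxy>> cache =
            new HashMap<Long, WeakReference<RemoteProxy>>();

        RemoteProxy lookup(long id) {
            WeakReference<RemoteProxy> ref = cache.get(id);
            RemoteProxy proxy = (ref == null) ? null : ref.get();
            if (proxy == null) {
                proxy = RemoteProxy.fetch(id);   // slow path: rebuild the proxy
                cache.put(id, new WeakReference<RemoteProxy>(proxy));
            }
            return proxy;
        }
    }

    class RemoteProxy {                          // placeholder for illustration
        static RemoteProxy fetch(long id) { return new RemoteProxy(); }
    }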

							-- Jerry