[gclist] What to say about GC.
Jerry Leichter
leichter@smarts.com
Thu, 25 Jul 1996 10:14:33 -0400
I recently had a battle with a bug that illustrates both sides of the arguments
here.
The bug was in code for an OO database, and manifested itself as random failures
when attempting to restore from a checkpoint. It turned out that the person who
implemented the restore code had neglected to clear a pointer to an object that
describes a pending transaction. Unfortunately, as this was C++ code, he had
(independently) deleted that object.
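The failure mode is that "delete the object" and "clear the pointer that says a transaction is pending" were two separate steps, and only one was done. Below is a minimal sketch of the corrected pattern in modern C++. It assumes an owning smart pointer is acceptable. The names `Transaction`, `Database`, `current_txn` and `restore_from_checkpoint` are hypothetical and are not taken from the actual database code.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the object describing a pending transaction.
struct Transaction {
    int pending_writes = 0;
};

// Hypothetical database state; only the field relevant to the bug is shown.
struct Database {
    // Owning pointer: null means "no active transaction".
    std::unique_ptr<Transaction> current_txn;

    bool has_active_transaction() const { return current_txn != nullptr; }

    void begin_transaction() { current_txn = std::make_unique<Transaction>(); }

    // The buggy restore did the equivalent of `delete txn;` but left `txn`
    // non-null. reset() destroys the object *and* nulls the pointer in one
    // step, so a later null check cannot see a dangling pointer.
    void restore_from_checkpoint() {
        current_txn.reset();
        assert(current_txn == nullptr);  // the invariant the buggy code broke
    }
};
```

With this shape, the "is a transaction active?" check after a restore takes the normal path rather than storing through freed memory.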
Now, consider what actually happened, versus what would have happened with GC:
1. Object is released, pointer becomes dangling. Code checks to
see if pointer is null, sees it isn't, decides there's an
active transaction, takes alternate path. This path ends
up producing few visible effects, but does store through the
dangling pointer, over-writing random objects in memory and
producing random, apparently unrelated symptoms later. (In
practice, in the particular configuration I was testing, the
random memory over-written was some other object's vtable
pointer, resulting in calls to random memory when any of its
virtual functions were invoked. Very confusing.)
vs. 2. Pointer isn't cleared, but there is no explicit release, so the
transaction object sticks around. Code checks to see if
pointer is null, sees it isn't, decides there's an active
transaction, takes alternate path. Stores through the pointer
go into the transaction object that "shouldn't be there".
There are few visible symptoms, though eventually the fact that
half the code thinks a transaction is in progress and half
doesn't is bound to cause some problem, such as an unexpected
transaction failure.
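Case 2 can be approximated in C++ by letting a reference-counted pointer stand in for the collector. This is only a rough analogy: `std::shared_ptr` is reference counting, not tracing GC. The sketch is hypothetical, including every name in it. It shows the property the post describes. The uncleared pointer still refers to a live `Transaction`, so the stray store changes that object's state and corrupts nothing else. The bug then shows up as two parts of the program disagreeing about whether a transaction is in progress.

```cpp
#include <memory>

// Hypothetical transaction object, as in case 2 of the post.
struct Transaction {
    int pending_writes = 0;
};

struct Database {
    // Rough stand-in for GC: the object lives as long as a reference exists.
    std::shared_ptr<Transaction> current_txn;  // non-null => "active"
    bool restore_thinks_active = false;        // the restore code's view
};

// Restore decides no transaction is active, but forgets to clear the
// pointer. Nothing frees the object, so it simply stays alive.
void buggy_restore(Database& db) {
    db.restore_thinks_active = false;
    // Bug: `db.current_txn.reset();` is missing.
}

// The "alternate path": it sees a non-null pointer and writes through it.
void alternate_path(Database& db) {
    if (db.current_txn) {
        db.current_txn->pending_writes += 1;  // lands in a real, live object
    }
}

// True when both parts of the program agree on whether a transaction exists.
bool views_agree(const Database& db) {
    return db.restore_thinks_active == static_cast<bool>(db.current_txn);
}
```

After `buggy_restore` followed by `alternate_path`, the write has gone into the still-live transaction object. `views_agree` returns false, which is the "half the code thinks a transaction is in progress" state rather than a smashed vtable.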
Now, one could argue that the GC failed to protect against, or even detect, the
bug. One might even argue that it served to *hide* the bug: Stores through the
uncleared pointer "should" have produced errors, but with GC they wouldn't. But
I know which class of symptoms *I'd* like to work with! GC may not be perfect,
but at least it maintains the program's abstractions. Stores through dangling
pointers don't.
For the record, tracking this down took over 3 weeks - by far the longest time
I've ever spent on one bug in the almost 25 years I've been programming. (Have
I lived a charmed life?) For reasons that I can't explain, *none* of the tools
I tried (Purify, Sentinel, the Solaris debugger's memory tracing facility) were
able to deal successfully with the code involved - a big, hairy, multi-threaded
server - either dying in unpleasant ways or just failing to start.
-- Jerry