[gclist] What to say about GC.

Jerry Leichter leichter@smarts.com
Thu, 25 Jul 1996 10:14:33 -0400


I recently had a battle with a bug that illustrates both sides of the arguments 
here.

The bug was in code for an OO database, and manifested itself as random failures 
when attempting to restore from a checkpoint.  It turned out that the person who 
implemented the restore code had neglected to clear a pointer to an object that 
describes a pending transaction.  Unfortunately, as this was C++ code, he had 
(independently) deleted that object.

Now, consider what actually happened, versus what would have happened with GC:

	1.  Object is released, pointer becomes dangling.  Code checks to
		see if pointer is null, sees it isn't, decides there's an
		active transaction, takes an alternate path.  This path ends
		up producing few visible effects, but does store through the
		dangling pointer, overwriting random objects in memory and
		producing random, apparently unrelated symptoms later.  (In
		practice, in the particular configuration I was testing, the
		memory that got overwritten was some other object's vtable
		pointer, resulting in calls to random memory when any of its
		virtual functions were invoked.  Very confusing.)

vs.	2.  Pointer isn't cleared, but there is no explicit release, so the
		transaction object sticks around.  Code checks to see if
		pointer is null, sees it isn't, decides there's an active
		transaction, takes an alternate path.  Stores through the pointer
		go into the transaction object that "shouldn't be there".
		There are few visible symptoms, though eventually the fact that
		half the code thinks a transaction is in progress and half
		doesn't is bound to cause some problem, such as an unexpected
		transaction failure.
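A minimal C++ sketch of case 2, for readers who want to see the shape of it.  All names here (Transaction, Database, restoreBuggy, restoreFixed, write) are hypothetical, not from the actual database.  std::shared_ptr stands in for a collector: it is reference counting rather than tracing GC, but for this point it behaves the same way, keeping the object alive while any pointer still refers to it.  Case 1 can't be shown meaningfully, since storing through a deleted object is undefined behavior.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the object describing a pending transaction.
struct Transaction {
    std::string log;  // data stored "inside" the transaction
};

// Hypothetical database state.  std::shared_ptr approximates GC here:
// the Transaction stays alive as long as `pending` still refers to it.
struct Database {
    std::shared_ptr<Transaction> pending;  // non-null means "transaction active"
    bool checkpointSaysActive = false;     // what the restored state believes

    // The buggy restore: resets the restored state but forgets `pending`.
    void restoreBuggy() { checkpointSaysActive = false; }

    // The fix: clear the pointer as part of restoring state.
    void restoreFixed() {
        checkpointSaysActive = false;
        pending.reset();
    }

    // Code that tests the pointer and takes the "alternate path".
    void write(const std::string& data) {
        if (pending) {
            // Lands in the stale but still-valid Transaction object,
            // so memory is never corrupted; the logic is just wrong.
            pending->log += data;
        }
    }
};
```

After restoreBuggy(), write() quietly appends to the old transaction while checkpointSaysActive is false: the "half the code thinks a transaction is in progress" state, with no corrupted vtables.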

Now, one could argue that the GC failed to protect against, or even detect, the 
bug.  One might even argue that it served to *hide* the bug:  Stores through the 
uncleared pointer "should" have produced errors, but with GC they wouldn't.  But 
I know which class of symptoms *I'd* like to work with!  GC may not be perfect, 
but at least it maintains the program's abstractions.  Stores through dangling 
pointers don't.

For the record, tracking this down took over 3 weeks - by far the longest time 
I've ever spent on one bug in the almost 25 years I've been programming.  (Have 
I lived a charmed life?)  For reasons that I can't explain, *none* of the tools 
I tried (Purify, Sentinel, the Solaris debugger's memory tracing facility) were 
able to deal successfully with the code involved - a big, hairy, multi-threaded 
server - either dying in unpleasant ways or just failing to start.

							-- Jerry