[gclist] GC debug (was: What to say about GC)

William R. Dieter dieter@lexmark.com
Mon, 29 Jul 1996 15:38:32 -0400 (EDT)


> I suppose that I could turn this into a question, and ask
> "What techniques have people found useful in GC debugging,
> as some of the usual methods don't work well?" The obvious
> ones are:
> - writing separate code that checks for GC induced errors
> - writing locations with values likely to cause bus errors,
>    to cause errors to show up quickly.
> 
> Any others? Or (on-line) places I can look for information
> on this? I want to be well prepared when I start working
> on GC for my system.
> 

In dealing with a mark and sweep compacting garbage collector, I have
found that most problems fall into one of three categories: "live
object was not marked", "pointer was not updated", or "pointer was
incorrectly updated".  I would expect a copying garbage collector to
have pretty much the same problems.  In a non-compacting mark and sweep
garbage collector you only have the "live object was not marked"
problems.  "Dead object was not garbage collected" is a problem in any
garbage collector, although it is harder to detect and doesn't
directly cause crashes.

As a really simple first check, I have a global flag that lets me turn
off garbage collection.  If a job still fails with garbage collection
turned off (for some reason other than running out of memory), it is
probably not the garbage collector's fault.
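
A minimal sketch of such a switch in C.  The names gc_enabled, gc_runs,
and gc_collect are hypothetical and only stand in for whatever your
collector's entry point looks like:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, not taken from any particular collector. */
static int gc_enabled = 1;   /* global switch: set to 0 to turn GC off */
static size_t gc_runs = 0;   /* number of collections that actually ran */

/* Stand-in for the real collector entry point. */
static void gc_collect(void)
{
    if (!gc_enabled)
        return;              /* collection disabled: the heap just grows */
    gc_runs++;
    /* ... mark, sweep, and compact would go here ... */
}
```

With the flag cleared, every call to gc_collect() is a no-op, so a
crash that still shows up has to come from somewhere else.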

I have found that writing all the locations you think are garbage with
some distinct value that will cause a bus error, as mentioned above,
helps flush out quite a few bugs.  Actually, this technique works
pretty well for non-GC memory management bugs as well.
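
Something along these lines would do the poisoning.  This is only a
sketch: the value 0xDEADBEEF and the names gc_poison and gc_is_poisoned
are my assumptions, and whether a given value really faults depends on
the machine.  An odd value tends to work on hardware that traps
misaligned word accesses; elsewhere you would pick an address you know
is unmapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed poison value: odd, so using it as a word pointer faults on
   machines with strict alignment.  Choose one that suits your target. */
#define GC_POISON ((uintptr_t)0xDEADBEEFu)

/* Fill a reclaimed region with the poison pattern, one word at a time.
   Any trailing bytes that do not fill a whole word are set to 0xEF. */
static void gc_poison(void *start, size_t nbytes)
{
    unsigned char *p = start;
    uintptr_t poison = GC_POISON;
    size_t nwords = nbytes / sizeof poison;

    for (size_t i = 0; i < nwords; i++)
        memcpy(p + i * sizeof poison, &poison, sizeof poison);
    memset(p + nwords * sizeof poison, 0xEF, nbytes % sizeof poison);
}

/* Heap-check helper: does this word-sized slot hold the poison value? */
static int gc_is_poisoned(const void *slot)
{
    uintptr_t v;

    memcpy(&v, slot, sizeof v);
    return v == GC_POISON;
}
```

The collector would call gc_poison() on every region it reclaims.
gc_is_poisoned() is handy in a separate heap checker: a live object
holding the poison value is pointing into memory the collector thought
was garbage.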

One trick I have used when I am hunting a GC bug is to run the garbage
collector twice in a row.  If the first run has modified a pointer to
point to the wrong location or freed something that was not garbage,
the second run will usually crash, because the garbage collector has to
follow all of the pointers in all live objects.  At this point you can
dump memory and try to figure out what went wrong, or re-run and see
what the data structure that is causing problems looked like before
garbage collection.  This probably won't help if you have an
incremental garbage collector.
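
To illustrate, here is a toy non-compacting mark and sweep heap in C.
It is a sketch under heavy assumptions, not a real collector: objects
live in a small fixed array, "pointers" are array indices, and
fault_free_index is a made-up hook that makes the sweep wrongly free
one live slot so there is a bug to find.  Instead of crashing, the
mark phase reports a pointer into a freed slot by returning 0.

```c
#include <assert.h>
#include <stddef.h>

#define NOBJ 8
#define NIL  (-1)

struct obj {
    int live;       /* 1 while allocated, 0 once swept */
    int marked;     /* set by the mark phase */
    int child[2];   /* indices into heap[], or NIL */
};

static struct obj heap[NOBJ];
static int fault_free_index = NIL;  /* hypothetical bug-injection hook */

/* Put the toy heap into a known empty state. */
static void heap_reset(void)
{
    for (int i = 0; i < NOBJ; i++) {
        heap[i].live = heap[i].marked = 0;
        heap[i].child[0] = heap[i].child[1] = NIL;
    }
}

/* Mark everything reachable from slot i.  Returns 0 if some pointer
   leads out of range or into a freed slot, 1 otherwise. */
static int mark(int i)
{
    if (i == NIL)
        return 1;
    if (i < 0 || i >= NOBJ || !heap[i].live)
        return 0;                       /* dangling pointer */
    if (heap[i].marked)
        return 1;
    heap[i].marked = 1;
    int a = mark(heap[i].child[0]);
    int b = mark(heap[i].child[1]);
    return a && b;
}

/* One collection: mark from the roots, then sweep unmarked slots.
   Returns 0 if marking ran into a bad pointer. */
static int gc_collect(const int *roots, int nroots)
{
    int ok = 1;

    for (int i = 0; i < NOBJ; i++)
        heap[i].marked = 0;
    for (int r = 0; r < nroots; r++)
        ok &= mark(roots[r]);
    for (int i = 0; i < NOBJ; i++)
        if (heap[i].live && (!heap[i].marked || i == fault_free_index))
            heap[i].live = 0;           /* second test is the injected bug */
    return ok;
}

/* Debug entry point: collect twice back to back.  Damage done by the
   first pass is found when the second pass follows every pointer. */
static int gc_collect_twice(const int *roots, int nroots)
{
    return gc_collect(roots, nroots) && gc_collect(roots, nroots);
}
```

With the hook left at NIL both passes succeed.  With it set to a live
slot, the first pass still reports success and only the second pass
notices the pointer into freed memory.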

In many cases I have to resort to printing out low level information
like "marked x bytes at address y" or "moved pointer a to pointer b".
If I have a pointer that is pointing at what looks like a bad location,
I will look back through the low level debug information about marking
to see if the pointer was marked.  If it wasn't marked, then it is a
"live object not marked" problem.  If it was marked, I look through
the update information to see how the pointer was updated.  This can
be pretty time consuming, especially if the system has to run for
quite a while before it crashes.
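
One way to cut down on the reading is to keep the same events in
memory and query them.  The following C sketch assumes a fixed-size
log and invented names (gc_log_event, gc_was_marked, gc_moved_to); a
real collector would call gc_log_event() from its mark and update
phases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum gc_event_kind { GC_EV_MARK, GC_EV_MOVE };

struct gc_event {
    enum gc_event_kind kind;
    void *from;      /* object address (mark) or old address (move) */
    void *to;        /* new address (move only), NULL for a mark */
    size_t nbytes;   /* size of the object */
};

#define GC_LOG_MAX 1024          /* assumed capacity; later events drop */
static struct gc_event gc_log[GC_LOG_MAX];
static size_t gc_log_len = 0;

/* Record one event and echo it to stderr in the style quoted above. */
static void gc_log_event(enum gc_event_kind kind, void *from, void *to,
                         size_t nbytes)
{
    if (gc_log_len < GC_LOG_MAX) {
        struct gc_event *e = &gc_log[gc_log_len++];
        e->kind = kind;
        e->from = from;
        e->to = to;
        e->nbytes = nbytes;
    }
    if (kind == GC_EV_MARK)
        fprintf(stderr, "marked %zu bytes at address %p\n", nbytes, from);
    else
        fprintf(stderr, "moved pointer %p to pointer %p\n", from, to);
}

/* Did any logged mark cover addr?  If not, suspect a "live object was
   not marked" bug. */
static int gc_was_marked(void *addr)
{
    uintptr_t a = (uintptr_t)addr;

    for (size_t i = 0; i < gc_log_len; i++) {
        uintptr_t lo = (uintptr_t)gc_log[i].from;
        if (gc_log[i].kind == GC_EV_MARK &&
            a >= lo && a - lo < gc_log[i].nbytes)
            return 1;
    }
    return 0;
}

/* Where was old_addr moved?  NULL if no move was logged for it. */
static void *gc_moved_to(void *old_addr)
{
    for (size_t i = 0; i < gc_log_len; i++)
        if (gc_log[i].kind == GC_EV_MOVE && gc_log[i].from == old_addr)
            return gc_log[i].to;
    return NULL;
}
```

Given a suspicious pointer, gc_was_marked() answers the first question
and gc_moved_to() the second, which can also be asked from a debugger
once the system has crashed.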

I, too, would be interested in hearing what other people do to debug.

Bill Dieter.
dieter@lexmark.com