[gclist] benchmarks

Hans Boehm boehm@hoh.mti.sgi.com
Thu, 6 Nov 1997 12:38:39 -0800


On Nov 6,  9:16am, Giuseppe Attardi wrote:
> In our experience we would probably not have figured out that Boehm was
> performing poorly if we had not had a comparison with CMM at hand:
> we would have assumed that the application was just too big to handle.
> And even then it took some effort to figure out that the reason was
> interior pointers or the allocation of big integers.

This is certainly a valid complaint.  But solving the problem is really just
the proverbial "small matter of programming".

I wrote a rudimentary tool for Cedar while I was at PARC that would generate a
random address in the heap, and then backtraced from that (using the obvious
method involving many sequential heap searches) until it found a root.  This
gave you a statistical sample of why objects were being retained.  For programs
that had run (and leaked) long enough, that was likely to point out the
problem.  The tool would run for a few minutes to get the necessary
backtraces, but that wasn't a big deal.
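
To make that concrete, here is a minimal sketch of the backtrace loop in C,
over a toy heap model.  The Object type, heap[] pool, and roots[] array are
all invented for illustration; a real tool would scan the actual heap and
would also need to guard against cycles in the retention path.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define HEAP_OBJS  8
#define MAX_FIELDS 2

/* Toy heap model: a fixed pool of objects with outgoing pointers. */
typedef struct Object {
    struct Object *fields[MAX_FIELDS];  /* NULL if unused */
} Object;

static Object heap[HEAP_OBJS];
static Object *roots[] = { &heap[0] };  /* toy root set */

/* One linear scan of the whole heap per backtrace step: the
   "obvious method involving many sequential heap searches". */
static Object *find_referrer(Object *target)
{
    for (int i = 0; i < HEAP_OBJS; i++)
        for (int f = 0; f < MAX_FIELDS; f++)
            if (heap[i].fields[f] == target)
                return &heap[i];
    return NULL;
}

static int is_root(Object *o)
{
    for (size_t i = 0; i < sizeof roots / sizeof roots[0]; i++)
        if (roots[i] == o)
            return 1;
    return 0;
}

int main(void)
{
    /* Build a small retention chain: root -> 1 -> 2 -> 3. */
    heap[0].fields[0] = &heap[1];
    heap[1].fields[0] = &heap[2];
    heap[2].fields[0] = &heap[3];

    srand((unsigned)time(NULL));

    /* Pick a "random address in the heap" and walk back to a root. */
    Object *obj = &heap[rand() % HEAP_OBJS];
    printf("sampled object %p\n", (void *)obj);

    while (!is_root(obj)) {
        Object *ref = find_referrer(obj);
        if (ref == NULL) {
            printf("  no referrer: object is unreachable\n");
            return 0;
        }
        printf("  retained by %p\n", (void *)ref);
        obj = ref;
    }
    printf("  reached a root\n");
    return 0;
}

Each step costs a full heap scan, which is why the real tool took minutes,
but since you only need a statistical sample of retention paths, that cost
is acceptable.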

It would be easy enough to combine this with an allocator that saved allocation
call stack summaries in objects, so that you could identify the objects in the
trace.  In this case, you would have noticed fairly easily that many of the
traces went through objects (digit sequences) that should not have contained
pointers.
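
For instance, a rough sketch of such an allocator in C follows.  The
AllocHeader layout and the dbg_malloc/DBG_MALLOC names are invented here,
and a real version would record a compact call-stack summary rather than
just a file and line.

#include <stdio.h>
#include <stdlib.h>

/* Header prepended to every allocation. */
typedef struct AllocHeader {
    const char *file;   /* allocation site: source file */
    int         line;   /* allocation site: line number */
    size_t      size;
} AllocHeader;

static void *dbg_malloc(size_t size, const char *file, int line)
{
    AllocHeader *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->file = file;
    h->line = line;
    h->size = size;
    return h + 1;  /* the caller sees only the payload */
}

/* Map a payload pointer (e.g. one found during a backtrace)
   back to its allocation record. */
static AllocHeader *dbg_header(void *payload)
{
    return (AllocHeader *)payload - 1;
}

#define DBG_MALLOC(size) dbg_malloc((size), __FILE__, __LINE__)

int main(void)
{
    void *p = DBG_MALLOC(128);
    AllocHeader *h = dbg_header(p);
    printf("%zu bytes allocated at %s:%d\n", h->size, h->file, h->line);
    free(h);
    return 0;
}

The backtrace tool can then map any object it encounters back to its
allocation site via the header, which is what would have flagged the digit
sequences here.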

The reason this hasn't been done in a more general setting is that the code
needs to use debugger information to print a reasonable description of these
traces.  As a result, none of it is likely to be very portable.  (Cedar had
semi-adequate information floating around inside the runtime.  Most C/C++
systems don't.)

Note that you need this tool even for a nonconservative collector.  Most of the
heap growth problems I've encountered were due to growing data structures, not
pointer-finding issues.  The tool is very useful for pinning those down, too.

Hans

-- 
Hans-Juergen Boehm
boehm@mti.sgi.com