[gclist] Two conservative collectors.

Fergus Henderson fjh@cs.mu.oz.au
Fri, 3 Dec 1999 16:25:29 +1100

[This mail was delayed due to the iecc.com mail server
blocking mail from sites running open mail relays.]

On 10-Nov-1999, Michael T. Richter <mtr@igs.net> wrote:
> I'm looking into the possibility of adding garbage collection (masquerading
> it as "run-time leak detection and correction" to managers convinced that
> GC is evil) to a C++ project.  I'd like to minimize the impact adding such
> GC to projects would entail.  This has led me to looking into conservative
> collectors.
> I've tracked down two conservative collectors which will work in the Win32
> environment: the Boehm-Demers-Weiser collector at
> http://reality.sgi.com/boehm_mti/gc.html and the Great Circle collector at
> http://www.geodesic.com/products/greatcircle.html.
> I'd like to get a feel for the costs of integrating either of these
> packages into existing code.  Can either of them be used "out of the box"
> with no source code changes whatsoever for at least a start?

Yes, the Boehm et al. collector can be used in exactly that fashion.

For "production" use, i.e. once you start getting significantly
concerned about performance and reliability, you should modify
the way you handle certain kinds of data structures.  For example,
you should make sure to zero out unused sections of arrays (otherwise
stale pointers left in them can keep dead objects alive), and you will
probably want to allocate pointer-free memory (e.g. strings, bitmaps,
etc.) using GC_MALLOC_ATOMIC() rather than GC_MALLOC(), so that the
collector does not have to scan it for pointers.
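Both habits can be sketched in a few lines of C.  This is only an illustration under stated assumptions: the `ptr_stack` type and `gc_strdup()` are hypothetical helpers, and the `USE_BOEHM_GC` switch is an assumed build flag.  With it defined you get the real `gc.h` (link with -lgc and call GC_INIT() early in main); without it, hypothetical stand-in macros let the sketch compile, but they never reclaim anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef USE_BOEHM_GC
#include <gc.h>            /* the real collector; link with -lgc */
#else
/* Hypothetical stand-ins so the sketch builds without the collector.
   They never reclaim anything; they only mimic the allocation calls. */
#include <stdlib.h>
#define GC_MALLOC(n) calloc(1, (n))     /* GC_MALLOC returns cleared memory */
#define GC_MALLOC_ATOMIC(n) malloc(n)   /* atomic memory is not cleared */
#endif

/* Hypothetical growable stack of pointers to GC-allocated objects. */
typedef struct {
    void **items;
    size_t len, cap;
} ptr_stack;

static int ptr_stack_push(ptr_stack *s, void *p) {
    assert(s->len <= s->cap);
    if (s->len == s->cap) {
        size_t ncap = s->cap ? 2 * s->cap : 8;
        /* The slots hold pointers, so this array must be scanned:
           use GC_MALLOC, not GC_MALLOC_ATOMIC. */
        void **n = GC_MALLOC(ncap * sizeof *n);
        if (n == NULL) return -1;
        if (s->len) memcpy(n, s->items, s->len * sizeof *n);
        s->items = n;   /* the old array becomes garbage */
        s->cap = ncap;
    }
    s->items[s->len++] = p;
    return 0;
}

static void *ptr_stack_pop(ptr_stack *s) {
    if (s->len == 0) return NULL;
    void *p = s->items[--s->len];
    /* Zero the vacated slot.  Otherwise the collector would still see the
       stale pointer there and could keep the popped object alive. */
    s->items[s->len] = NULL;
    return p;
}

/* Strings contain no pointers, so allocate them atomically: the
   collector will not scan their bytes for things that look like pointers. */
static char *gc_strdup(const char *src) {
    size_t n = strlen(src) + 1;
    char *dst = GC_MALLOC_ATOMIC(n);
    if (dst != NULL) memcpy(dst, src, n);
    return dst;
}
```

The key line is the `NULL` store in `ptr_stack_pop()`.  Because the collector is conservative, it treats any word in a scanned object that looks like a pointer as a live reference, including ones in slots the program considers empty.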

I haven't used the Great Circle collector, so I don't have any comments
about that.

> How reliable/unreliable are either of these?

In my experience, the Boehm collector is very reliable, if used correctly.
However, it is fairly easy to misuse.  I suggest you read the documentation
carefully before writing your code, and then read it carefully again
after writing your code but before running it.
Mistakes that I have made in the past include:

	- compiling the collector (and/or the application) with the
	  wrong set of `-D' options

	- pointing to memory allocated via GC_malloc() from
	  memory allocated via malloc(), without compiling the
	  collector with -DREDIRECT_MALLOC

	- forgetting to initialize the collector properly
	  (e.g. not calling GC_init(), and ...)
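The malloc() pitfall in the list above can be sketched as follows.  This assumes a collector built without -DREDIRECT_MALLOC.  In that case a table that outlives individual calls and points into the collected heap can be allocated with GC_MALLOC_UNCOLLECTABLE(), which the collector scans but never reclaims, and released explicitly with GC_FREE().  The `struct registry` type, its helpers and the `USE_BOEHM_GC` switch are hypothetical.  The fallback macros exist only so the sketch compiles without libgc.  With the real collector, GC_INIT() should still be called from main before the first allocation; the collector documentation recommends this, and some platforms require it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef USE_BOEHM_GC
#include <gc.h>            /* the real collector; link with -lgc */
#else
/* Hypothetical stand-ins so the sketch builds without the collector;
   nothing allocated through them is ever reclaimed automatically. */
#include <stdlib.h>
#define GC_MALLOC(n) calloc(1, (n))
#define GC_MALLOC_UNCOLLECTABLE(n) calloc(1, (n))
#define GC_FREE(p) free(p)
#endif

#define REGISTRY_CAP 16

/* Hypothetical long-lived table holding pointers to GC-allocated objects. */
struct registry {
    void *entries[REGISTRY_CAP];
    size_t count;
};

static struct registry *registry_new(void) {
    /* Pitfall: allocating this with plain malloc() would hide the entries
       from the collector (unless it was built with -DREDIRECT_MALLOC), so
       objects referenced only from here could be reclaimed while in use.
       Uncollectable objects are scanned for pointers but never reclaimed
       automatically; release them with GC_FREE(). */
    struct registry *r = GC_MALLOC_UNCOLLECTABLE(sizeof *r);
    if (r != NULL) memset(r, 0, sizeof *r);
    return r;
}

static void registry_add(struct registry *r, void *obj) {
    assert(r->count < REGISTRY_CAP);
    r->entries[r->count++] = obj;
}

static void registry_free(struct registry *r) {
    GC_FREE(r);
}
```

Registering the malloc()ed block as an extra root would be another way to make it visible to the collector.  The uncollectable allocation keeps the sketch self-contained.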

If something does go wrong, it can be quite hard to track down the
cause of the problem.

Unfortunately, the collector is not particularly portable, and this
tends to lead to minor problems when migrating to a new or obscure
platform or when upgrading to a new version of the OS and/or
standard library.
For example, as far as I know the Boehm collector does not yet
support shared libraries on Solaris/x86 (has anyone on this list
solved that yet?).

For various reasons, the Boehm collector is often suspected of being
responsible for many bugs that turn out to be something else.  If
something "inexplicable" is going on, the collector is always one of
the usual suspects.  That makes narrowing down the search a bit harder;
it's easy to waste a bit of debugging time investigating potential
problems with the collector when the real culprit is elsewhere.

But despite all that, using it is still significantly easier than
manual memory management, IMHO.

> How configurable are either of these?

The Boehm collector is highly configurable.  You can:

	- enable or disable incremental collection
	- configure it for large or small systems
	- set numerical parameters which control how much space it
	  tries to use
	- disable and re-enable collection and/or statistics printing
	  at runtime
	- try to get collection work done while waiting for user input
	- allow recognition of arbitrary interior pointers, or set it
	  to follow only pointers to the start of a memory block
	- tell it to allow code that uses the low bits of pointers as tags
	- have finalizers, and choose between several different models
	  for when finalizers will be invoked
	- set callback handlers so that the collector will notify your
	  application when starting a collection, or to handle warnings
	  and error messages (e.g. when running out of memory)
	- register your own root sets, and so on.

Sometimes I think it is *too* configurable ;-)
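A few of these knobs, sketched in C.  This assumes a 7.x-or-later gc.h, which provides setter functions such as GC_set_free_space_divisor() and GC_set_finalize_on_demand(); older releases exposed some of these settings as global variables instead.  The handler, the helper functions, the chosen values and the `USE_BOEHM_GC` switch are hypothetical.  The fallback macros are no-ops so the sketch compiles without libgc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#ifdef USE_BOEHM_GC
#include <gc.h>            /* the real collector; link with -lgc */
#else
/* Hypothetical no-op stand-ins so the sketch builds without libgc. */
typedef size_t GC_word;
#define GC_CALLBACK
#define GC_INIT() ((void)0)
#define GC_enable_incremental() ((void)0)
#define GC_set_free_space_divisor(d) ((void)(d))
#define GC_set_warn_proc(p) ((void)(p))
#define GC_set_finalize_on_demand(on) ((void)(on))
#define GC_collect_a_little() 0
#define GC_invoke_finalizers() 0
#endif

static unsigned long gc_warning_count;

/* Warning handler: the collector passes a printf-style message and one
   numeric argument.  Here they are logged side by side rather than
   formatted, to avoid using a non-literal format string. */
static void GC_CALLBACK log_gc_warning(char *msg, GC_word arg) {
    assert(msg != NULL);
    gc_warning_count++;
    fprintf(stderr, "gc warning (arg=%lu): %s", (unsigned long)arg, msg);
}

/* Call once at the start of main(), before any other allocation. */
static void configure_collector(void) {
    GC_INIT();
    GC_enable_incremental();        /* may be ignored on some platforms */
    GC_set_free_space_divisor(4);   /* default 3; larger: less space, more GC time */
    (void)GC_set_warn_proc(log_gc_warning);
    GC_set_finalize_on_demand(1);   /* finalizers run only when we ask */
}

/* Call from the event loop while waiting for user input. */
static void on_idle(void) {
    (void)GC_collect_a_little();    /* do a small increment of collection work */
    (void)GC_invoke_finalizers();   /* run any finalizers that are ready */
}
```

Finalize-on-demand is just one of the finalization models.  Setting it means finalizers run at a point the application controls, here its idle loop, rather than during an arbitrary allocation.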

Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>  |  of excellence is a lethal habit"
PGP: finger fjh@        |     -- the last words of T. S. Garp.