[gclist] Finalizers & Reference counting.

Jerrold Leichter jerrold.leichter@smarts.com
Fri, 30 Aug 2002 07:56:29 -0400 (EDT)


This whole discussion does bring up an interesting point, however.  Some
programs - compilers are a great example - go through phases with the
property that:

	a)  Each phase allocates and manipulates a significant amount of
		memory, much of which remains live throughout the phase;

	b)  Much of the memory allocated in a phase goes dead when the
		phase ends.

(In many cases, the "much" memory is actually identifiable at the point of
allocation.)

The classic method to deal with this in a non-GC environment is to support
the notion of an allocation domain (various name have been used for this).
Each phase creates an AD, and phase-specific memory is allocated from the
AD.  At the end of the phase, the entire AD is deleted, normally using a
fast mechanism that is often essentially independent of the number of
allocated blocks in the AD.  In a very simple case, an AD is a block of
contiguous memory, allocations just "bump the pointer", and deleting an AD
simply requires deleting the large block, ignoring its content.  This can be
surprisingly effective if AD-allocated memory is rarely deleted within the
phase.  (It also helps cache/working set locality.)

The question is:  How could you capture the same advantages in a GC'ed
system?  It seems that AD's, in some ways, are similar to generations.
However, generations have the wrong lifetime semantics:  Anything we would
likely have put into an AD will probably get promoted to a generation that
will stick around for long after the phase ends.  Still, the basic
*techniques* used to implement generational collection could perhaps be used
here.

Thus, imagine that, in a GC'ed system, we retain the same three user
visible primitives:

	ad <- createAD()
	pointer <- allocate(size,ad)
	destroy(ad)

We would like destroy() to be fast, as immediate as possible, but of course
safe.  If we maintain the invariant that no pointer to an object in an
AD can come from outside the AD, then destroy() becomes trivial - just put
everything in the AD - or the whole AD, if you allocate it as a big chunck -
on the free list.  This should be implementable with write barriers.  (There
are, of course, subtleties like allowing register and stack pointers to
*not* force stuff out of the AD, or everything will wander out of it rapidly.)

I'm sure a complete design and implementation would reveal all kinds of
issues.  Has anyone ever looked at this kind of "hinted" GC?

							-- Jerry