[gclist] Representation of pointer set

Robert A Duff bobduff@world.std.com
Tue, 2 May 2000 09:21:55 -0400 (EDT)


During the mark phase, a GC will typically keep a set of pointers to
objects that have been marked but their children have not yet been
marked.  In the worst case, this set can grow very large, but that seems
unlikely.

Representing the set as a linked list seems inefficient.  An array seems
like the right choice.

The literature is full of techniques for dealing with overflow of this
set.  They seem annoyingly complicated to me.

My question is: Why is this necessary, on a machine with a large virtual
address space?  Why not just allocate an enormous array in *virtual*
memory (1/4 of the address space?)?  Then the array can't overflow.  In
most cases, the array will use a reasonably small amount of *physical*
memory, so it will be efficient.  If somebody creates an extremely long
linked list, then the array might actually eat up large amounts of
physical memory, and therefore cause slowness due to paging.  But it
seems tolerable to have slowness in unlikely cases.  Pretty much any
modern GC has *that* property -- show me an incremental or generational
GC, and I can construct a pathological case that will kill its
performance.

This all assumes a reasonable operating system, which commits physical
memory pages only when needed.

- Bob