[gclist] Representation of pointer set

Boehm, Hans hans_boehm@hp.com
Tue, 2 May 2000 10:30:50 -0700


The issue is really whether you want to be able to recover from running out
of swap space.  If not, and if you have a 64-bit machine, then it seems
reasonable to me to just preallocate a huge region for the grey object
stack/queue.

My impression has been that if you want to convert C/C++ programmers, you
need to claim you can handle running out of swap space, and do roughly as
well as the average Unix application at doing so.  (Well-written
applications seem to be able to recover at least well enough to die
gracefully about 95% of the time.  The remaining 5% of the time you run out
of memory someplace other than in malloc, or some crucial system daemon runs
out of memory first, and life gets hard.)  Unfortunately, a garbage
collector with a mark stack tends to almost run out of space growing the
mark stack, usually because it just failed to grow the heap, and then
collected instead.  In my experience, you get much less than 95%
recoverability unless you actually deal properly with mark stack overflows
(which we try to do).

Unlike Eliot, I would be inclined to represent the mark stack as a
contiguous array, with no links, since the code to add/remove things from
the stack/queue tends to be on the critical path.  The links probably don't
slow things down in the typical case very much, but can add variance if you
happen to bounce back and forth across one of the links.  But this is a
minor point.

Hans

-----Original Message-----
From: Eliot Moss [mailto:moss@emperor.cs.umass.edu]
Sent: Tuesday, May 02, 2000 8:01 AM
To: Robert A Duff
Cc: gclist@iecc.com
Subject: [gclist] Representation of pointer set


Well, you have something of a point. In "the old days", the view was that
GC was run when memory was tight, so it needed not to demand much more
space. In practice, providing a fixed amount of space, and then having a
fall back if it ever overflowed, seemed to work well.

When using copying algorithms (generational or not), Cheney's technique
(dating from 1970) uses the copied-but-not-yet-scanned objects as an
implicit queue, avoiding the need for a separate data structure.

These days, having a growable stack/queue/set seems reasonable, and I'd be
inclined to do it using linked "chunks", where each chunk is an array. This
is more flexible in terms of the actual storage allocation than insisting
that all elements reside in a single contiguous array, but it's not much
more complicated and its density (useful info versus data structure
overhead) is high.
============================================================================
==
J. Eliot B. Moss, Associate Professor     http://www.cs.umass.edu/~moss
www
Department of Computer Science            +1-413-545-4206
voice
140 Governor's Drive, Room 372            +1-413-545-1249
fax
University of Massachusetts               moss@cs.umass.edu
email
Amherst, MA  01003-4610  USA              +1-413-545-3733 Priscilla Coe
sec'y
============================================================================
==