[gclist] Garbage collection FAQ.
David Chase
chase@centerline.com
Tue, 27 Feb 96 17:26:23 EST
> Should we have a garbage collection FAQ. It could also become part of
> other FAQs on the net. What should the topics be?
>
> Algorithms.
I think it would be appropriate to reference Paul Wilson's survey, but
that is a pretty large chunk of text. It might make sense to simply
record the existence of many of these things, so that people would not
regard the various incarnations of garbage collection as technology
indistinguishable from magic. So, by way of a straw man, I'll start
to fill in the blanks. Feel free to set fire to my straw man, of course.
Basic:
Reference counting.
for each "object" managed by the collector, maintain
a count of pointers referencing that object. Whenever
a pointer is assigned to, as in "P := Q", the referent
of Q has its reference count incremented, and the former
referent of P has its reference count decremented. If
the reference count ever falls to zero, then it is known
that there are no pointers to the object, and it may be
reclaimed. The converse is not true -- some unreferenced
objects may not have a reference count of zero, because
of cycles. Furthermore, the fields in which reference
counts are stored are typically small, so it is also possible
that they will overflow, in which case they "stick" at
their maximum value -- again, the object cannot be reclaimed.
See also....
Mark-and-sweep.
The garbage collector has some way of finding all "root"
pointers (e.g., global variables and automatic variables),
and for each referenced object, it has some way of finding
all the pointers contained within it. Via some iterative
process (it could be recursive, but it need not be, see
...) all objects reachable from the root set are found.
As objects are found, they are marked as "found". All other
objects not marked as "found", must be free, and are returned
to the free pool for future reallocation.
Copying-compacting.
Copying collection attempts to collect memory by removing all
active memory from a storage arena, and then reusing the arena.
This has the theoretical advantages of compacting the working
set of an application, and of being asymptotically more efficient
(because the cost of the collection is only proportional to the
size of the reachable data, and is unrelated to the size of the
arena itself. Mark-and-sweep collection must scan the arena).
However, theory and practice disagree (in practice). See
"Folk Myths".
Less basic:
Conservative collection.
Conservative garbage collection makes use of the observation
that if you are not relocating (copying) objects, then you
need not be quite so certain about exactly what is a pointer.
It suffices to scan the root set (and objects) for any pointer-like
bit patterns, and treat those pointer-like bit patterns as
actual pointers while marking. The result is that the "true"
reachable objects are all found, along with a few others.
This works surprisingly well in practice (if a few additional
tricks are added) and often permits the use of garbage collection
with oblivious source code and compilers.
Deferred reference counting.
Note that naive reference counting can take arbitrarily long
to return from a decrement, because freeing one object may
recursively trigger the freeing of other objects whose counts
drop to zero. Instead, an object whose reference count falls
to zero may be placed on a queue/list/stack of objects that
are no longer in use, but not yet processed onto the free list.
Alternatively, when an object is allocated off the free list,
the pointers to objects within it may be scanned.
Generational.
Snapshot mark-and-sweep.
Snapshot mark-and-sweep uses the observation that the set
of unreachable objects does not shrink. It is possible
(using, for instance, copy-on-write page mapping) to make
a copy of the address space, process it concurrently to
determine what is garbage, and send that information back
to the running process. More garbage may have been generated
in the interim, but that is ok.
Baker's real time algorithm.
Appel-Ellis-Li "real time" concurrent.
It isn't really real time (see question from the audience at
1988 SIGPLAN PLDI) but that's probably ok. (more details needed)
Bartlett's conservative-compacting collector.
Joel Bartlett's conservative-compacting collector fragments
the "new" and "old" spaces of a typical copying-compacting
collector into "new" and "old" pages. To add conservatism,
his collector classifies pointers into definite and indefinite.
A definite pointer is definitely a pointer -- if the object is
relocated, it pointer can be changed, because it is definitely
NOT an integer, floating point value, or portion of a bitmap.
An indefinite pointer (typically one found on the stack, where
an uncooperative compiler has placed variables willy-nilly)
cannot be changed, because it might not really be a pointer.
Thus, any object referenced by an indefinite pointer cannot be
relocated. In the simplest form, this collector scans from
all the indefinite pointers first (if only the stack is indefinite,
this is not hard), and all objects so referenced cannot be moved.
Finishing the collection, from definite pointers, objects may be
moved (this is not a good explanation, is it?)
Treadmill collector.
A treadmill collector is based on the observation that the regions
used in a copying-compacting collector (real-time or not) need not
be identified with address ranges. It is also possible to use
doubly-linked lists (unit-time insertion and deletion) together with
a few membership bits to indicate which list an arbitrary object is on.
Goldberg's tagless collection
Astoundingly, in a statically typed language with a type system
as complex as ML's, it is not necessary to tag pointers to make
a garbage collector work.
Assorted stuff:
Blacklisting in a conservative collector.
In a conservative collector, it is often useful to look
for bit patterns that aren't yet valid pointers, but that
are likely to become valid if the size of the heap is increased.
The pages (objects) that they reference should be "blacklisted"
so that they are never used, because if the bit pattern doesn't
change, it will cause spurious retention of any object allocated
at that address.
Hardware Support (Kelvin Nilsen's and others).
One-bit reference counts.
> OS Support.
Zeroing out memory (I take it for granted, it need
not be the case, especially on PCs).
User-supplied fault handlers?
Compiler support? (as opposed to "uncooperative environments" below)
Lack of compiler support.
Not hiding pointers (see Chase, Chase and Boehm)
Nulling out dead pointer variables.
Identifying pointers (Moss et al.)
Avoiding heap allocation (SETL, Lisp, Ruggieri, Chase)
Optimizations of reference counting (Barth paper, SETL work).
Folk myths?
"Optimal" strategies that fail miserably.
> Thread Support.
> Garbage Collection for distributed objects.
> Garbage Collection in uncoopertive environments.
> Who will volunteer for various segments?
> Who will run the show?