[gclist] Garbage collection FAQ.

David Chase chase@centerline.com
Tue, 27 Feb 96 17:26:23 EST


> Should we have a garbage collection FAQ. It could also become part of
> other FAQs on the net. What should the topics be?
> 
> Algorithms.

I think it would be appropriate to reference Paul Wilson's survey, but
that is a pretty large chunk of text.  It might make sense to simply
record the existence of many of these things, so that people would not
regard the various incarnations of garbage collection as technology
indistinguishable from magic.  So, by way of a straw man, I'll start
to fill in the blanks.  Feel free to set fire to my straw man, of course.

Basic:
  Reference counting.
    for each "object" managed by the collector, maintain
    a count of pointers referencing that object.  Whenever
    a pointer is assigned to, as in "P := Q", the referent
    of Q has its reference count incremented, and the former
    referent of P has its reference count decremented.  If
    the reference count ever falls to zero, then it is known
    that there are no pointers to the object, and it may be
    reclaimed.  The converse is not true -- some unreferenced
    objects may not have a reference count of zero, because
    of cycles.  Furthermore, the fields in which reference
    counts are stored are typically small, so it is also possible
    that they will overflow, in which case they "stick" at
    their maximum value -- again, the object cannot be reclaimed.

    See also....

  Mark-and-sweep.
    The garbage collector has some way of finding all "root"
    pointers (e.g., global variables and automatic variables),
    and for each referenced object, it has some way of finding
    all the pointers contained within it.  Via some iterative
    process (it could be recursive, but it need not be, see
    ...) all objects reachable from the root set are found.
    As objects are found, they are marked as "found".  All other
    objects not marked as "found", must be free, and are returned
    to the free pool for future reallocation.

  Copying-compacting.
    Copying collection attempts to collect memory by removing all
    active memory from a storage arena, and then reusing the arena.
    This has the theoretical advantages of compacting the working
    set of an application, and of being asymptotically more efficient
    (because the cost of the collection is only proportional to the
    size of the reachable data, and is unrelated to the size of the
    arena itself.  Mark-and-sweep collection must scan the arena).
    However, theory and practice disagree (in practice).  See
    "Folk Myths".

Less basic:
  Conservative collection.
    Conservative garbage collection makes use of the observation
    that if you are not relocating (copying) objects, then you
    need not be quite so certain about exactly what is a pointer.
    It suffices to scan the root set (and objects) for any pointer-like
    bit patterns, and treat those pointer-like bit patterns as
    actual pointers while marking.  The result is that the "true"
    reachable objects are all found, along with a few others.
    This works surprisingly well in practice (if a few additional
    tricks are added) and often permits the use of garbage collection
    with oblivious source code and compilers.

  Deferred reference counting.
    Note that naive reference counting can take arbitrarily long
    to return from a decrement, because freeing one object may
    recursively trigger the freeing of other objects whose counts
    drop to zero.  Instead, an object whose reference count falls
    to zero may be placed on a queue/list/stack of objects that
    are no longer in use, but not yet processed onto the free list.
    Alternatively, when an object is allocated off the free list,
    the pointers to objects within it may be scanned.

  Generational.

  Snapshot mark-and-sweep.
    Snapshot mark-and-sweep uses the observation that the set
    of unreachable objects does not shrink.  It is possible
    (using, for instance, copy-on-write page mapping) to make
    a copy of the address space, process it concurrently to
    determine what is garbage, and send that information back
    to the running process.  More garbage may have been generated
    in the interim, but that is ok.

  Baker's real time algorithm.

  Appel-Ellis-Li "real time" concurrent.
    It isn't really real time (see question from the audience at
    1988 SIGPLAN PLDI) but that's probably ok.  (more details needed)

  Bartlett's conservative-compacting collector.
    Joel Bartlett's conservative-compacting collector fragments
    the "new" and "old" spaces of a typical copying-compacting
    collector into "new" and "old" pages.   To add conservatism,
    his collector classifies pointers into definite and indefinite.
    A definite pointer is definitely a pointer -- if the object is
    relocated, it pointer can be changed, because it is definitely
    NOT an integer, floating point value, or portion of a bitmap.
    An indefinite pointer (typically one found on the stack, where
    an uncooperative compiler has placed variables willy-nilly)
    cannot be changed, because it might not really be a pointer.
    Thus, any object referenced by an indefinite pointer cannot be
    relocated.  In the simplest form, this collector scans from
    all the indefinite pointers first (if only the stack is indefinite,
    this is not hard), and all objects so referenced cannot be moved.
    Finishing the collection, from definite pointers, objects may be
    moved (this is not a good explanation, is it?)

  Treadmill collector.
    A treadmill collector is based on the observation that the regions
    used in a copying-compacting collector (real-time or not) need not
    be identified with address ranges.  It is also possible to use
    doubly-linked lists (unit-time insertion and deletion) together with
    a few membership bits to indicate which list an arbitrary object is on.

  Goldberg's tagless collection
    Astoundingly, in a statically typed language with a type system
    as complex as ML's, it is not necessary to tag pointers to make
    a garbage collector work.

Assorted stuff:
  Blacklisting in a conservative collector.
    In a conservative collector, it is often useful to look
    for bit patterns that aren't yet valid pointers, but that
    are likely to become valid if the size of the heap is increased.
    The pages (objects) that they reference should be "blacklisted"
    so that they are never used, because if the bit pattern doesn't
    change, it will cause spurious retention of any object allocated
    at that address.

  Hardware Support (Kelvin Nilsen's and others).

  One-bit reference counts.

> OS Support.
  Zeroing out memory (I take it for granted, it need
    not be the case, especially on PCs).
  User-supplied fault handlers?

Compiler support? (as opposed to "uncooperative environments" below)
  Lack of compiler support.
  Not hiding pointers (see Chase, Chase and Boehm)
  Nulling out dead pointer variables.
  Identifying pointers (Moss et al.)
  Avoiding heap allocation (SETL, Lisp, Ruggieri, Chase)
  Optimizations of reference counting (Barth paper, SETL work).

Folk myths?
  "Optimal" strategies that fail miserably.

> Thread Support.

> Garbage Collection for distributed objects.

> Garbage Collection in uncoopertive environments.

> Who will volunteer for various segments? 
> Who will run the show?