[gclist] taking bets on a new collector

Tom Lord lord@emf.net
Thu, 14 Aug 2003 11:40:44 -0700 (PDT)

The GC described in this _preliminary_ report has been coded, but not
yet put to work in an application.  It's only about 2K LOC so, if it's
a dumb idea, I can scrap it and go with something more conventional.
The question is, is it a dumb design?  Or a good one?  Or something in
between?  Gentlefolk, step up to the window and place your bets.

         A Brief, Preliminary Report on a New Memory Manager
                   for an Implementation of Scheme

* intended applications and motivation

  The collector is to be used in a run-time system suitable for an
  implementation of an extended dialect of the Scheme programming

  The Scheme implementation will itself be used to implement a GUI
  application framework whose design is strongly influenced by Emacs.

  Several elements of the framework which would more conventionally
  be coded in, for example, C, will in this system be coded in Scheme.
  Examples include the representation for buffers of attributed text
  and the display-updating algorithms for those buffers.
  Consequently, pressure on the memory manager is greater than in
  classic Emacs implementations such as GNU Emacs.

  Previous experiments have demonstrated the practicality of these
  goals.   Even very simple Scheme-to-C compilation strategies are
  sufficient to achieve more than ample rates of computation to do
  tasks such as display updating in Scheme.   Prospects for being
  able to handle such tasks in _interpreted_ code are good.

  But previous experiments have also demonstrated one problem: GC
  pause times.  For example: if, when a user is operating a scroll bar,
  display updating is interrupted for .6s, this is a problem.  In the
  previous experiments, scrolling worked well enough to be usable --
  the redisplay algorithm kept up and the experience wasn't _too_
  frustrating -- but the brief "glitches" in interation caused by
  using a classic mark/sweep collector were noticable and an

* other approaches

  Clearly there are many well established GC techniques and
  implementations which can be used to, at least with
  high-probability, keep pause times low enough for this application.

  Those other approaches are my back-up strategy.   I just wanted
  to try some "new" (though, without doubt, simply combining many
  old ideas in a fairly obvious way).

* this collector

  The new collector is an incremental mark/sweep collector with
  a bibop allocator.

** segregation of "gc types" on aligned pages

  The set of all allocated objects is partioned into disjoint
  subsets of "gc types".   All elements of a GC type have the
  same size and the same layout in memory.   

  Memory is divided into aligned "pages", each page containing
  256 pointer-sized words of memory management header, and
  3840 words of object storage (for a total of 4096 words).
  The page size could be multiplied by any power of 2, but it would
  be difficult to make it smaller.

  All allocated objects are 2^K words in size with 1 <= K <= 8.

  At least two GC bits are provided per object.   These are stored
  in the header of the page on which the object resides.

  Thus, for example, the page header of an object, given that object's
  address, can be found by masking out some low-order bits of the
  address.  The offset to the header word containing the the GC bits
  for the object and the offset of those bits within the word can be
  found by performing computations on the low order bits of the
  object's address.

        | header (w/GC bits) | ..........  objects ....... |
        address aligned by page size

** very literal implementation of the three-color abstraction

  The two GC bits are used to encode one of three possible values
  for each allocated object: black, white, and grey.

  The collector is conceptually organized as three cooperating
  coprocesses: the tracer, the sweeper, and the mutator.

  The mutator may read from any allocated object, at any time, 

  A write barrier is imposed on the mutator to preserve the 
  three-color invariant (that black (traced) objects may not 
  contain references to white (condemned) objects).  

  The performance implications of the write barrier is one of the
  unknowns in this new collector.   I've described it in greater
  detail below.

** space-efficient tracing and sweeping: page states

  The GC processes must maintain sets of objects.  For example, 
  the tracer must maintain a set of grey objects from which it
  can efficiently choose/remove items as they are traced.

  This collector represents those sets _imprecisely_, by storing
  sets of pages instead of sets of objects.

  Sets of pages are represented as queues implemented as 
  doubly linked lists.   Each page header contains two pointers
  for its position in such a queue, and a pointer to the queue
  header itself.

  For example, a newly created, initially empty page is part of a set
  of "collected" pages -- it is of no immediate interest to the tracer
  or sweeper.   Such a page is said to be in COLLECTED state.

  If the mutator greys an object on some page, that page is moved to a
  set of "grey" pages -- it is in GREY state.

  An incremental tracer step consists of choosing a GREY page,
  examining the GC bits in the header to find all grey objects,
  tracing those, and moving the page to MARKED state.

  There are, essentially, five page states:

	FREE		- an unused page
	COLLECTED	- a page containing only white objects
        GREY		- a page containing some grey objects
	MARKED		- a page containing only black and white objects
	TO-BE-SWEPT	- a page containing only black and white
                          objects, with all white objects known to be

  The state transition diagram for these five states is 
  included below (search for "page state map").

  The mutator will never encounter pages in FREE state (other than
  in a call to the allocator).    The mutator can safely read from
  a page in _any_ state, and can write to a page in any state other
  than TO-BE-SWEPT.    So this is another job for the write-barrier:
  in addition to maintaining the three-color invariant, it must trap
  writes to TO-BE-SWEPT pages.

** free-list based allocation

  Each page header contains a pointer to a list of unallocated objects
  on that page.   Normal allocation consumes objects from one of these
  lists.   When the free-list of a page is emptied, another page of
  the same type with a non-empty free-list is found or created.

  To avoid a potentially expensive search for pages of a given type with
  non-empty free-lists, the representation of the page-state sets is
  is used:

  For example, rather than a single queue of all COLLECTED pages,
  there are two queues of COLLECTED pages per GC-type: one the queue
  of COLLECTED pages with empty free-lists, the other the queue of
  COLLECTED pages with non-empty free-lists. 

  Similar to the write-barrier, the allocator can safely allocate only
  from pages in COLLECTED, GREY, or MARKED state.    Thus, the search
  for a page of the suitable type with a non-empty free-list, needed
  when the free-list for that type becomes empty, consists of checking
  those three page-state queues in an array indexed by type id.

** tracer steps and sweeper steps

  As mentioned previously, a tracer step consists of choosing a GREY
  page, examining the header to find grey objects, tracing those
  objects (possibly transitioning other pages to GREY state), and then
  transitioning the page to MARKED state.

  When the tracer runs out of GREY pages, a trace cycle has completed.
  All MARKED pages become TO-BE-SWEPT pages, all COLLECTED and
  TO-BE-SWEPT pages become FREE pages.

  Similarly, a sweeper step chooses a TO-BE-SWEPT page, scans it for 
  dead objects, and builds a new free-list for the page before
  transitioning it to COLLECTED state.

** two potential problems and one interesting opportunity

** interesting opportunity -- generational collection

  Let's suppose that, with a little instrumentation, the sweeper 
  notices that on a particular page, few objects die.

  Then when such a page is found to be in TO-BE-SWEPT state, it may
  make sense to, rather than sweeping the page, simply revert it to 
  MARKED state -- leaving the black objects black -- at least some
  "M out of N" times.

  People familiar with the implemtation of GNU Emacs might think of 
  this idea as "dynamically adjusted pure space".

*** potential problem -- is the write barrier too heavy

  How expensive can a write barrier be before it is "too expensive"?

  In addition to storing a value in an object slot, the mutator write
  barrier must (in effect) fetch and examine GC bits for two words,
  and fetch and examine the page state of one object.  If the GC bits
  say "white reference being stored in black object" or the page state
  says TO-BE-SWEPT, than an "exception" (just a procedure call) is
  made to appropriately update the GC bits and page state, invoking a
  sweeper-step if necessary.

  There is some flexability in how this can be coded.  The memory
  reads to retrieve the GC bits and page-state can precede the write
  itself, and the write itself can precede the conditional tests for
  that might invoke an exception.  Some of the extra reads can be
  avoided if an additional test of the value being stored determines
  that it is an "immediate value" (i.e., not an object reference).

  Additionally, a write into an object on a TO-BE-SWEPT page is
  _always_ a write into a black object.

  Thus, the write barrier can be coded something like:

	set_slot (obj, slot_n, value)
          obj_gc_color = get_gc_bits_of (obj);

          val_is_immediate = is_immediate_value (value);

          if (val_is_immediate)
            val_gc_color = get_gc_bits_of (value);
            val_gc_color = black;

          obj[slot_n] = value;

          if (obj_gc_color == black)
              obj_page_state = get_page_state_of_obj (obj);
              if ((val_gc_color == white) || (obj_page_state == TO-BE-SWEPT))
                set_slot_exception (obj, slot_n, value);

  Of course, "primitive" operations and optimized compiled scheme
  code which modify multiple slots can avoid using the write barrier
  in some circumstances.

** potential problem -- page hysteresis and GC scheduling

  Although tracer steps are incremental, because they are page-based,
  they are not flyweight.  It would be bad, I suspect, to have a busy
  page bouncing rapidly back and forth between GREY and MARKED state.

  _Perhaps_ treating the page-sets as queues is a boon here: the page
  most recently GREYed by the mutator is the page farthest away from
  being transitioned to MARKED by the tracer and, we hope, a likely to
  be mutated again soon.  On the other hand, that means that the
  tracer may wind up competing with the mutator for cache space.

  Closely related to this is the effect of tracer-step scheduling on
  heap growth.   If a given type has no objects with non-empty
  free-lists, and there are no FREE pages, and the GREY set contains
  more than a very small number of items -- what happens?   A sweeper
  step might be able to free some space for the allocation, but that
  isn't guaranteed.   The only "pauseless" resolution is to expand the
  heap and create new FREE pages.

  I'm at something of a loss for what heuristics are best to use to
  schedule tracer steps.

** the page state map

                 (memory from `sbrk')
            |            ^         |
            |            |         |
           [I]          [J]       [A]
            |                      |
            |            |         |
            |            |         |
            |           / \        |
            |      /---    ---\    V
          TO-BE-SWEPT          COLLECTED -->-
            ^    |              ^  |        |
            |    |              |  |        |
            |    ------[H]-------  |        |
            |                      |        |
            |                     [B]       |
            |                      |        |
           [G]                     |        |
            |                      |        |
            |                      |        |
            |                      |        |
            |   <-------[C]------  V        |
   [F]--->MARKED		GREY        |
    |     | ^   --------[D]----->           |
    ------- |                               |


        Allocation of an object for which the freelist is empty.  A
        FREE page is turned into a collected page of that object type.


        The tracer has found a reference to an object on this page.
        Objects on this page are _not_ leaf objects (i.e., they _may_
        contain references to other objects).

        The object is marked grey, and put the page in GREY state.


        The tracer has turned attention to a page in GREY state.
        All non-black objects referenced by grey objects on that page
        are marked grey (non-leaf objects) or black (leaf objects)
        and the pages of referenced non-leaf objects are transitioned
        to GREY (by transition [B] or [C]).

        Then all grey objects on this page are marked black, and 
        this page transitioned to MARKED state.

        Note that at the beginning of this tracer step, there may
        be grey objects on this page containing references to 
        white objects on this page -- the tracer much treat such
        white objects as if they were grey.


        The mutator has stored a reference to a white object into
        a black object on this page.   The mutated object is turned
        grey, and the page transitioned to GREY state.


        The tracer found a reference to an object on this page 
        from a grey object.   Objects on this page are leaf
        objects (they may not contain references to other objects).

        The referenced object is colored black, and the page is
        transitioned to MARKED state.


        The tracer found a reference to an object on this page 
        from a grey object.   Objects on this page are leaf
        objects (they may not contain references to other objects).

        The referenced object is colored black, and the page is
        state is unchanged.


         A tracer cycle has just completed (the set of GREY pages
         has transitioned to empty).   All MARKED pages are converted,
         en masse, to TO-BE-SWEPT state.


         The sweeper has turned its attention to this page.  
         All white objects recognized as dead and their storage
         is reclaimed.   All black objects are recognized as live,
         and their color reverted to white.  Some live objects 
         were found on this page by the sweeper.

     [I] TO-BE-SWEPT->FREE (1)

         The sweeper has turned its attention to this page.  
         The page is found to contain only white objects 
         which means that it contains no live objects.  Storage
         for any objects is reclaimed and the page becomes
         a FREE page.


         A tracer cycle has completed (the set of GREY pages has
         transitioned to empty).   At that time, it is known
         that any pages in TO-BE-SWEPT or COLLECTED state are
         not reachable -- are dead.   Therefore, these pages 
         contain nothing but dead objects.   Their storage
         is reclaimed, and they become FREE pages.

    Some handy invariants:


		COLLECTED pages contain only white objects.

                With reference to the most recently completed 
                trace, COLLECTED pages contain only live objects.

                The mutator may modify and allocate from COLLECTED

	GREY state:

		GREY pages contain at least one grey object.

                They may also contain white objects and black
                objects.   The black objects are those that have
                been traced in a concurrently running tracer cycle.

		A trace cycle completes whenever the set of GREY
                pages transitions to empty.

                The mutator may modify and allocate from GREY pages.

	MARKED state:

		MARKED pages contain only black and white objects
                and the set of black or the set of white objects
                on the page may be empty.

                All black objects on the page have been traced in
                a concurrent tracer cycle.

                When a tracer cycle completes (the set of GREY pages
                becomes empty), all MARKED pages become TO-BE-SWEPT

                The mutator may modify and allocate from MARKED pages.

	TO-BE-SWEPT state:

		TO-BE-SWEPT pages contain only black and white
                objects.   The black objects are live with respect
                to the most recently completed trace -- the white
                objects are dead.

		The mutator _may_not_ modify or allocate from a
                TO-BE-SWEPT page.  If the mutator wants to modify or
                allocate from a TO-BE-SWEPT page, it must first run
                a sweeper step on the page to transition it to
                COLLECTED state.

                The tracer _may_not_ change the color of an object on
                a TO-BE-SWEPT page yet must treat all objects on a
                TO-BE-SWEPT page _as_if_ they were white objects
                (which the tracer must color black or grey).
                Therefore, if the tracer discovers in some grey object
                a reference to an object in TO-BE-SWEPT state,
                it must invoke a sweeper step on the TO-BE-SWEPT 
                page, transitioning it to COLLECTED state.