[gclist] taking bets on a new collector
Tom Lord
lord@emf.net
Thu, 14 Aug 2003 11:40:44 -0700 (PDT)
The GC described in this _preliminary_ report has been coded, but not
yet put to work in an application. It's only about 2K LOC so, if it's
a dumb idea, I can scrap it and go with something more conventional.
The question is, is it a dumb design? Or a good one? Or something in
between? Gentlefolk, step up to the window and place your bets.
A Brief, Preliminary Report on a New Memory Manager
for an Implementation of Scheme
* intended applications and motivation
The collector is to be used in a run-time system suitable for an
implementation of an extended dialect of the Scheme programming
language.
The Scheme implementation will itself be used to implement a GUI
application framework whose design is strongly influenced by Emacs.
Several elements of the framework which would more conventionally
be coded in, for example, C, will in this system be coded in Scheme.
Examples include the representation for buffers of attributed text
and the display-updating algorithms for those buffers.
Consequently, pressure on the memory manager is greater than in
classic Emacs implementations such as GNU Emacs.
Previous experiments have demonstrated the practicality of these
goals. Even very simple Scheme-to-C compilation strategies are
sufficient to achieve more than ample rates of computation to do
tasks such as display updating in Scheme. Prospects for being
able to handle such tasks in _interpreted_ code are good.
But previous experiments have also demonstrated one problem: GC
pause times. For example: if, when a user is operating a scroll bar,
display updating is interrupted for .6s, this is a problem. In the
previous experiments, scrolling worked well enough to be usable --
the redisplay algorithm kept up and the experience wasn't _too_
frustrating -- but the brief "glitches" in interation caused by
using a classic mark/sweep collector were noticable and an
annoyance.
* other approaches
Clearly there are many well established GC techniques and
implementations which can be used to, at least with
high-probability, keep pause times low enough for this application.
Those other approaches are my back-up strategy. I just wanted
to try some "new" (though, without doubt, simply combining many
old ideas in a fairly obvious way).
* this collector
The new collector is an incremental mark/sweep collector with
a bibop allocator.
** segregation of "gc types" on aligned pages
The set of all allocated objects is partioned into disjoint
subsets of "gc types". All elements of a GC type have the
same size and the same layout in memory.
Memory is divided into aligned "pages", each page containing
256 pointer-sized words of memory management header, and
3840 words of object storage (for a total of 4096 words).
The page size could be multiplied by any power of 2, but it would
be difficult to make it smaller.
All allocated objects are 2^K words in size with 1 <= K <= 8.
At least two GC bits are provided per object. These are stored
in the header of the page on which the object resides.
Thus, for example, the page header of an object, given that object's
address, can be found by masking out some low-order bits of the
address. The offset to the header word containing the the GC bits
for the object and the offset of those bits within the word can be
found by performing computations on the low order bits of the
object's address.
page:
| header (w/GC bits) | .......... objects ....... |
^
|
address aligned by page size
** very literal implementation of the three-color abstraction
The two GC bits are used to encode one of three possible values
for each allocated object: black, white, and grey.
The collector is conceptually organized as three cooperating
coprocesses: the tracer, the sweeper, and the mutator.
The mutator may read from any allocated object, at any time,
directly.
A write barrier is imposed on the mutator to preserve the
three-color invariant (that black (traced) objects may not
contain references to white (condemned) objects).
The performance implications of the write barrier is one of the
unknowns in this new collector. I've described it in greater
detail below.
** space-efficient tracing and sweeping: page states
The GC processes must maintain sets of objects. For example,
the tracer must maintain a set of grey objects from which it
can efficiently choose/remove items as they are traced.
This collector represents those sets _imprecisely_, by storing
sets of pages instead of sets of objects.
Sets of pages are represented as queues implemented as
doubly linked lists. Each page header contains two pointers
for its position in such a queue, and a pointer to the queue
header itself.
For example, a newly created, initially empty page is part of a set
of "collected" pages -- it is of no immediate interest to the tracer
or sweeper. Such a page is said to be in COLLECTED state.
If the mutator greys an object on some page, that page is moved to a
set of "grey" pages -- it is in GREY state.
An incremental tracer step consists of choosing a GREY page,
examining the GC bits in the header to find all grey objects,
tracing those, and moving the page to MARKED state.
There are, essentially, five page states:
FREE - an unused page
COLLECTED - a page containing only white objects
GREY - a page containing some grey objects
MARKED - a page containing only black and white objects
TO-BE-SWEPT - a page containing only black and white
objects, with all white objects known to be
dead.
The state transition diagram for these five states is
included below (search for "page state map").
The mutator will never encounter pages in FREE state (other than
in a call to the allocator). The mutator can safely read from
a page in _any_ state, and can write to a page in any state other
than TO-BE-SWEPT. So this is another job for the write-barrier:
in addition to maintaining the three-color invariant, it must trap
writes to TO-BE-SWEPT pages.
** free-list based allocation
Each page header contains a pointer to a list of unallocated objects
on that page. Normal allocation consumes objects from one of these
lists. When the free-list of a page is emptied, another page of
the same type with a non-empty free-list is found or created.
To avoid a potentially expensive search for pages of a given type with
non-empty free-lists, the representation of the page-state sets is
is used:
For example, rather than a single queue of all COLLECTED pages,
there are two queues of COLLECTED pages per GC-type: one the queue
of COLLECTED pages with empty free-lists, the other the queue of
COLLECTED pages with non-empty free-lists.
Similar to the write-barrier, the allocator can safely allocate only
from pages in COLLECTED, GREY, or MARKED state. Thus, the search
for a page of the suitable type with a non-empty free-list, needed
when the free-list for that type becomes empty, consists of checking
those three page-state queues in an array indexed by type id.
** tracer steps and sweeper steps
As mentioned previously, a tracer step consists of choosing a GREY
page, examining the header to find grey objects, tracing those
objects (possibly transitioning other pages to GREY state), and then
transitioning the page to MARKED state.
When the tracer runs out of GREY pages, a trace cycle has completed.
All MARKED pages become TO-BE-SWEPT pages, all COLLECTED and
TO-BE-SWEPT pages become FREE pages.
Similarly, a sweeper step chooses a TO-BE-SWEPT page, scans it for
dead objects, and builds a new free-list for the page before
transitioning it to COLLECTED state.
** two potential problems and one interesting opportunity
** interesting opportunity -- generational collection
Let's suppose that, with a little instrumentation, the sweeper
notices that on a particular page, few objects die.
Then when such a page is found to be in TO-BE-SWEPT state, it may
make sense to, rather than sweeping the page, simply revert it to
MARKED state -- leaving the black objects black -- at least some
"M out of N" times.
People familiar with the implemtation of GNU Emacs might think of
this idea as "dynamically adjusted pure space".
*** potential problem -- is the write barrier too heavy
How expensive can a write barrier be before it is "too expensive"?
In addition to storing a value in an object slot, the mutator write
barrier must (in effect) fetch and examine GC bits for two words,
and fetch and examine the page state of one object. If the GC bits
say "white reference being stored in black object" or the page state
says TO-BE-SWEPT, than an "exception" (just a procedure call) is
made to appropriately update the GC bits and page state, invoking a
sweeper-step if necessary.
There is some flexability in how this can be coded. The memory
reads to retrieve the GC bits and page-state can precede the write
itself, and the write itself can precede the conditional tests for
that might invoke an exception. Some of the extra reads can be
avoided if an additional test of the value being stored determines
that it is an "immediate value" (i.e., not an object reference).
Additionally, a write into an object on a TO-BE-SWEPT page is
_always_ a write into a black object.
Thus, the write barrier can be coded something like:
set_slot (obj, slot_n, value)
{
obj_gc_color = get_gc_bits_of (obj);
val_is_immediate = is_immediate_value (value);
if (val_is_immediate)
val_gc_color = get_gc_bits_of (value);
else
val_gc_color = black;
obj[slot_n] = value;
if (obj_gc_color == black)
{
obj_page_state = get_page_state_of_obj (obj);
if ((val_gc_color == white) || (obj_page_state == TO-BE-SWEPT))
set_slot_exception (obj, slot_n, value);
}
}
Of course, "primitive" operations and optimized compiled scheme
code which modify multiple slots can avoid using the write barrier
in some circumstances.
** potential problem -- page hysteresis and GC scheduling
Although tracer steps are incremental, because they are page-based,
they are not flyweight. It would be bad, I suspect, to have a busy
page bouncing rapidly back and forth between GREY and MARKED state.
_Perhaps_ treating the page-sets as queues is a boon here: the page
most recently GREYed by the mutator is the page farthest away from
being transitioned to MARKED by the tracer and, we hope, a likely to
be mutated again soon. On the other hand, that means that the
tracer may wind up competing with the mutator for cache space.
Closely related to this is the effect of tracer-step scheduling on
heap growth. If a given type has no objects with non-empty
free-lists, and there are no FREE pages, and the GREY set contains
more than a very small number of items -- what happens? A sweeper
step might be able to free some space for the allocation, but that
isn't guaranteed. The only "pauseless" resolution is to expand the
heap and create new FREE pages.
I'm at something of a loss for what heuristics are best to use to
schedule tracer steps.
** the page state map
(memory from `sbrk')
|
|
V
----------->FREE--------
| ^ |
| | |
[I] [J] [A]
| |
| | |
| | |
| / \ |
| /--- ---\ V
TO-BE-SWEPT COLLECTED -->-
^ | ^ | |
| | | | |
| ------[H]------- | |
| | |
| [B] |
| | |
[G] | |
| | |
| | |
| | |
| <-------[C]------ V |
[F]--->MARKED GREY |
| | ^ --------[D]-----> |
------- | |
|-----------[E]------------------
[A] FREE->COLLECTED
Allocation of an object for which the freelist is empty. A
FREE page is turned into a collected page of that object type.
[B] COLLECTED->GREY
The tracer has found a reference to an object on this page.
Objects on this page are _not_ leaf objects (i.e., they _may_
contain references to other objects).
The object is marked grey, and put the page in GREY state.
[C] GREY->MARKED
The tracer has turned attention to a page in GREY state.
All non-black objects referenced by grey objects on that page
are marked grey (non-leaf objects) or black (leaf objects)
and the pages of referenced non-leaf objects are transitioned
to GREY (by transition [B] or [C]).
Then all grey objects on this page are marked black, and
this page transitioned to MARKED state.
Note that at the beginning of this tracer step, there may
be grey objects on this page containing references to
white objects on this page -- the tracer much treat such
white objects as if they were grey.
[D] MARKED->GREY
The mutator has stored a reference to a white object into
a black object on this page. The mutated object is turned
grey, and the page transitioned to GREY state.
[E] COLLECTED->MARKED
The tracer found a reference to an object on this page
from a grey object. Objects on this page are leaf
objects (they may not contain references to other objects).
The referenced object is colored black, and the page is
transitioned to MARKED state.
[F] MARKED->MARKED
The tracer found a reference to an object on this page
from a grey object. Objects on this page are leaf
objects (they may not contain references to other objects).
The referenced object is colored black, and the page is
state is unchanged.
[G] MARKED->TO-BE-SWEPT
A tracer cycle has just completed (the set of GREY pages
has transitioned to empty). All MARKED pages are converted,
en masse, to TO-BE-SWEPT state.
[H] TO-BE-SWEPT->COLLECTED
The sweeper has turned its attention to this page.
All white objects recognized as dead and their storage
is reclaimed. All black objects are recognized as live,
and their color reverted to white. Some live objects
were found on this page by the sweeper.
[I] TO-BE-SWEPT->FREE (1)
The sweeper has turned its attention to this page.
The page is found to contain only white objects
which means that it contains no live objects. Storage
for any objects is reclaimed and the page becomes
a FREE page.
[J] TO-BE-SWEPT->FREE (2) and COLLECTED->FREE
A tracer cycle has completed (the set of GREY pages has
transitioned to empty). At that time, it is known
that any pages in TO-BE-SWEPT or COLLECTED state are
not reachable -- are dead. Therefore, these pages
contain nothing but dead objects. Their storage
is reclaimed, and they become FREE pages.
Some handy invariants:
COLLECTED state:
COLLECTED pages contain only white objects.
With reference to the most recently completed
trace, COLLECTED pages contain only live objects.
The mutator may modify and allocate from COLLECTED
pages.
GREY state:
GREY pages contain at least one grey object.
They may also contain white objects and black
objects. The black objects are those that have
been traced in a concurrently running tracer cycle.
A trace cycle completes whenever the set of GREY
pages transitions to empty.
The mutator may modify and allocate from GREY pages.
MARKED state:
MARKED pages contain only black and white objects
and the set of black or the set of white objects
on the page may be empty.
All black objects on the page have been traced in
a concurrent tracer cycle.
When a tracer cycle completes (the set of GREY pages
becomes empty), all MARKED pages become TO-BE-SWEPT
pages.
The mutator may modify and allocate from MARKED pages.
TO-BE-SWEPT state:
TO-BE-SWEPT pages contain only black and white
objects. The black objects are live with respect
to the most recently completed trace -- the white
objects are dead.
The mutator _may_not_ modify or allocate from a
TO-BE-SWEPT page. If the mutator wants to modify or
allocate from a TO-BE-SWEPT page, it must first run
a sweeper step on the page to transition it to
COLLECTED state.
The tracer _may_not_ change the color of an object on
a TO-BE-SWEPT page yet must treat all objects on a
TO-BE-SWEPT page _as_if_ they were white objects
(which the tracer must color black or grey).
Therefore, if the tracer discovers in some grey object
a reference to an object in TO-BE-SWEPT state,
it must invoke a sweeper step on the TO-BE-SWEPT
page, transitioning it to COLLECTED state.
-t