[gclist] taking bets on a new collector

Tom Lord lord@emf.net
Fri, 15 Aug 2003 08:30:13 -0700 (PDT)

    > From: David Gadbois <gadbois@computer.org>

    > >   Memory is divided into aligned "pages", each page containing
    > >   256 pointer-sized words of memory management header, and
    > >   3840 words of object storage (for a total of 4096 words).
    > >   The page size could be multiplied by any power of 2, but it would
    > >   be difficult to make it smaller.

    > I highly recommend segregating the per-page meta information out
    > in a way that is locality sensitive.  For example, keep the
    > queue of MARKED pages separate so that the flip operation does
    > not have to go back out and touch the scanned pages again.

I thought about those issues.  The flip operations do _not_ touch all
the pages being flipped.  For example, when a tracer cycle ends, all
remaining TO-BE-SWEPT pages become FREE pages and all MARKED pages
become TO-BE-SWEPT.  The flip operation for a given type is, in

	append_queues (almost_free_queue, to_be_swept_queue);

        /* the to_be_swept_queue is now empty */

	tmp = to_be_swept_queue;
        to_be_swept_queue = marked_queue;
        marked_queue = tmp;

        to_be_swept_queue->type = this_is_a_to_be_swept_queue;
        marked_queue->type = this_is_a_marked_queue;

The append_queues operation touches, at most, three page headers.
The swap of queue pointers doesn't touch any.   The trade-off is that 
to compute which state a given page is in, given only a pointer to the
page header, two reads are needed, not one.

(Actually, for some types of objects, upon a tracer-cycle-end flip,
the TO-BE-SWEPT become a state I didn't mention, ALMOST-FREE, because
objects on those pages may need some form of finalization and can even
be resurrected.  But it's the same flip operation.)

    > Also, if there is a way to put the GC bits in the objects
    > themselves, that would lessen the likelyhood of having to read
    > from two cache lines in the event of a write.  Though it is a
    > tradeoff for a mark-and-sweep collector on write rate versus
    > sweep rate.

Yes it's that.   

I don't want to expand the size of my objects for this purpose (right
now, the overhead per object is just a bit over 2 bits) so it would
also mean imposing a non-noop read barrier.  I've used mark/sweep
system that squeeze a single GC bit in each object -- this would need
two -- that'd be a tough squeeze.  Finally, I'm hoping to preserve
properties such as boxed inexact floats fitting in 64 bit objects --
no room for GC bits at all.

    > Keeping the meta info separate would also greatly simplify the
    > task of supporting objects bigger than 256 words: You could
    > interpose a layer of abstraction where the GC operates on
    > regions of contiguous pages.

I should have added the size restriction to the list of questions.
It certainly means that the implementation of things like Scheme
vectors will be, uh, interesting.

If the GC were able to operate on regions of contiguous pages, that
would simplify large objects in some areas, but complicate other
things.   Surely I wouldn't want the tracer scanning entire large
objects -- so I'd be using more than one GC color per object
(something that's possible in this proposal, anyway, for objects
larger than 8 words -- but I first wanted to try it without having to
do that).   Additionally, I'd think that complicates the underlying
page allocator a bit:  for example, if a multi-page region becomes
FREE, is it then a candidate for busting up into smaller regions?