[gclist] taking bets on a new collector

Boehm, Hans hans_boehm@hp.com
Tue, 19 Aug 2003 10:57:15 -0700

Speaking from experience, there are two distinct problems with keeping metadata on the same page as the actual data:

1) Any part of the algorithm that looks mostly at a specific piece of metadata tends to perform poorly with low-associativity caches.  In a direct-mapped 4K cache, all instances of a given per-page metadata field collide.  For real caches, the effect is generally less severe, but still noticeable.

2) Changing the color of a pointer-free object touches the page.  If paging ever becomes an issue, this is likely to be fatal, since it will often double the number of pages that the collector needs to be memory resident.

Is there a reason not to sweep incrementally during allocation (see, e.g., my ISMM 2000 paper)?  That tends to reduce the memory cost of sweeping from one extra cache miss per object to none.

Overall, not having studied this in great detail, I would guess that this design will give you appreciably less throughput than our collector, even in its incremental mode, but probably with shorter pause times.


> -----Original Message-----
> From: Tom Lord [mailto:lord@emf.net]
> Sent: Friday, August 15, 2003 8:30 AM
> To: gclist@lists.iecc.com
> Subject: Re: [gclist] taking bets on a new collector
>     > From: David Gadbois <gadbois@computer.org>
>     > >   Memory is divided into aligned "pages", each page containing
>     > >   256 pointer-sized words of memory management header, and
>     > >   3840 words of object storage (for a total of 4096 words).
>     > >   The page size could be multiplied by any power of 2, but
>     > >   it would be difficult to make it smaller.
>     > I highly recommend segregating the per-page meta information out
>     > in a way that is locality sensitive.  For example, keep the
>     > queue of MARKED pages separate so that the flip operation does
>     > not have to go back out and touch the scanned pages again.
> I thought about those issues.  The flip operations do _not_ touch all
> the pages being flipped.  For example, when a tracer cycle ends, all
> remaining TO-BE-SWEPT pages become FREE pages and all MARKED pages
> become TO-BE-SWEPT.  The flip operation for a given type is, in
> pseudo-code:
> 	append_queues (almost_free_queue, to_be_swept_queue);
> 	/* the to_be_swept_queue is now empty */
> 	tmp = to_be_swept_queue;
> 	to_be_swept_queue = marked_queue;
> 	marked_queue = tmp;
> 	to_be_swept_queue->type = this_is_a_to_be_swept_queue;
> 	marked_queue->type = this_is_a_marked_queue;
> The append_queues operation touches, at most, three page headers.
> The swap of queue pointers doesn't touch any.   The trade-off is that 
> to compute which state a given page is in, given only a pointer to the
> page header, two reads are needed, not one.
> (Actually, for some types of objects, upon a tracer-cycle-end flip,
> the TO-BE-SWEPT become a state I didn't mention, ALMOST-FREE, because
> objects on those pages may need some form of finalization and can even
> be resurrected.  But it's the same flip operation.)
>     > Also, if there is a way to put the GC bits in the objects
> themselves, that would lessen the likelihood of having to read
>     > from two cache lines in the event of a write.  Though it is a
>     > tradeoff for a mark-and-sweep collector on write rate versus
>     > sweep rate.
> Yes, it's exactly that trade-off.
> I don't want to expand the size of my objects for this purpose (right
> now, the overhead per object is just a bit over 2 bits) so it would
> also mean imposing a non-noop read barrier.  I've used mark/sweep
> systems that squeeze a single GC bit into each object -- this would need
> two -- that'd be a tough squeeze.  Finally, I'm hoping to preserve
> properties such as boxed inexact floats fitting in 64 bit objects --
> no room for GC bits at all.
>     > Keeping the meta info separate would also greatly simplify the
>     > task of supporting objects bigger than 256 words: You could
>     > interpose a layer of abstraction where the GC operates on
>     > regions of contiguous pages.
> I should have added the size restriction to the list of questions.
> It certainly means that the implementation of things like Scheme
> vectors will be, uh, interesting.
> If the GC were able to operate on regions of contiguous pages, that
> would simplify large objects in some areas, but complicate other
> things.   Surely I wouldn't want the tracer scanning entire large
> objects -- so I'd be using more than one GC color per object
> (something that's possible in this proposal, anyway, for objects
> larger than 8 words -- but I first wanted to try it without having to
> do that).   Additionally, I'd think that complicates the underlying
> page allocator a bit:  for example, if a multi-page region becomes
> FREE, is it then a candidate for busting up into smaller regions?
> -t