[gclist] precise access barrier with hardware dirty bits

Hans Boehm boehm@hoh.engr.sgi.com
Thu, 8 Jul 1999 10:49:14 -0700


I haven't seen this idea before.  It sounds interesting.

My initial reaction is that it may be useful in some cases (64-bit addresses,
GC has direct access to VM hardware).  But there are disadvantages in other
cases:

- I've found that for our collector, dirty-bit handling at user level can be
quite expensive even with page-level granularity.  Tracking one mapping per
object will make it considerably more expensive.

- Some hardware and associated operating systems don't like to have the same
page mapped in more than one place.  At least early RS/6000s were in that
category.  (On other systems, it's tricky to get this right without losing
performance.)

- As was already pointed out, TLB misses are likely to be a problem.

- It sounds like this would add some allocation overhead, though that may not
be a large cost.
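
For readers unfamiliar with user-level dirty bits, one common emulation is to
write-protect pages and catch the resulting fault.  The following is only a
minimal sketch, assuming a POSIX system with mmap, mprotect and sigaction.  The
vdb_* names and the 64-page limit are hypothetical and not taken from any
particular collector.  POSIX does not list mprotect as async-signal-safe, so
calling it from the handler is an assumption that holds on common systems.

```c
#define _DEFAULT_SOURCE          /* MAP_ANONYMOUS under strict -std modes */
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define VDB_MAX_PAGES 64         /* hypothetical limit for this sketch */

static char *vdb_base;           /* start of the tracked region */
static size_t vdb_npages;        /* number of tracked pages */
static size_t vdb_pagesize;
static volatile sig_atomic_t vdb_dirty[VDB_MAX_PAGES];

/* Fault handler: record the page as dirty, then make it writable so the
   faulting store is retried and succeeds. */
static void vdb_on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    char *addr = (char *)info->si_addr;
    if (addr < vdb_base || addr >= vdb_base + vdb_npages * vdb_pagesize) {
        signal(sig, SIG_DFL);    /* not our region: let the retry crash */
        return;
    }
    size_t page = (size_t)(addr - vdb_base) / vdb_pagesize;
    vdb_dirty[page] = 1;
    mprotect(vdb_base + page * vdb_pagesize, vdb_pagesize,
             PROT_READ | PROT_WRITE);
}

/* Map npages of zeroed memory and install the handler.  NULL on failure. */
char *vdb_init(size_t npages)
{
    if (npages == 0 || npages > VDB_MAX_PAGES) return NULL;
    vdb_pagesize = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, npages * vdb_pagesize, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return NULL;
    vdb_base = p;
    vdb_npages = npages;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = vdb_on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    /* Some systems report protection faults as SIGBUS, so catch both. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return NULL;
    if (sigaction(SIGBUS, &sa, NULL) != 0) return NULL;
    return vdb_base;
}

/* Clear all dirty bits and write-protect the whole region. */
int vdb_reset(void)
{
    for (size_t i = 0; i < vdb_npages; i++) vdb_dirty[i] = 0;
    return mprotect(vdb_base, vdb_npages * vdb_pagesize, PROT_READ);
}

int vdb_is_dirty(size_t page) { return page < vdb_npages && vdb_dirty[page]; }

size_t vdb_page_size(void) { return vdb_pagesize; }
```

In this sketch the first write to each clean page costs a signal delivery plus
an mprotect call.  With one mapping per object, that cost would be paid per
object rather than per page.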

It seems to me that you can trade off dirty-bit overhead against granularity
by having several objects share a virtual page.  That probably helps a little.
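
To put rough numbers on that trade-off, here is a small arithmetic sketch.
The function names and the model are assumptions for illustration only: it
supposes that a write to a shared virtual page forces the collector to treat
every object in that group as possibly modified.

```c
#include <stddef.h>

/* Virtual mappings needed for one physical page that holds `objects`
   objects when `group` objects share each virtual page (rounded up). */
size_t mappings_per_page(size_t objects, size_t group)
{
    if (group == 0) return 0;                /* invalid grouping */
    return (objects + group - 1) / group;
}

/* Upper bound on objects flagged as possibly modified by a single write:
   everything that shares the written virtual page. */
size_t objects_flagged_per_write(size_t objects, size_t group)
{
    return group < objects ? group : objects;
}
```

For a page holding 64 objects, a group size of 1 needs 64 mappings and flags
1 object per write, a group size of 8 needs 8 mappings and flags up to 8, and
a group size of 64 is ordinary page-level tracking with a single mapping.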

I'm not sure in what sense the language needs to be strongly typed.  As far as
I can tell, this all works fine with C, to the same extent any other collector
does.  The allocator returns pointers to different virtual pages for objects
that share the same physical page.  In fact, for C it has the added benefit
that you can now tell whether a pointer is pointing just past the end of one
object or to the beginning of the next one, thus saving the space otherwise
spent on padding between objects to tell the two apart.
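
A minimal sketch of such an allocator's mapping step, assuming a POSIX system.
An unlinked temporary file stands in for the physical page, and phys_page,
phys_page_open and alias_object are hypothetical names.  A real allocator
would also have to manage address space and unmap aliases when objects die.

```c
#define _DEFAULT_SOURCE          /* mkstemp, ftruncate under strict -std modes */
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    int    fd;                   /* backing object standing in for the page */
    size_t pagesize;
} phys_page;

/* Create one page of shareable backing store.  Returns 0 on success. */
int phys_page_open(phys_page *pp)
{
    char path[] = "/tmp/gcaliasXXXXXX";
    pp->pagesize = (size_t)sysconf(_SC_PAGESIZE);
    pp->fd = mkstemp(path);
    if (pp->fd < 0) return -1;
    unlink(path);                /* the open descriptor keeps it alive */
    if (ftruncate(pp->fd, (off_t)pp->pagesize) != 0) {
        close(pp->fd);
        return -1;
    }
    return 0;
}

/* Map the whole page at a fresh virtual address and return a pointer to the
   object that lives at `offset` within it.  Each call yields a new alias. */
char *alias_object(const phys_page *pp, size_t offset)
{
    if (offset >= pp->pagesize) return NULL;
    void *base = mmap(NULL, pp->pagesize, PROT_READ | PROT_WRITE,
                      MAP_SHARED, pp->fd, 0);
    if (base == MAP_FAILED) return NULL;
    return (char *)base + offset;
}
```

If A occupies bytes 0..63 and B starts at byte 64, then alias_object(&pp, 0)
and alias_object(&pp, 64) return pointers into different virtual pages.  A
pointer one past the end of A stays on A's virtual page and never compares
equal to the pointer returned for B, even though both aliases read and write
the same underlying storage.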

On Jul 8,  7:21pm, Francois-Rene Rideau wrote:
> Subject: [gclist] precise access barrier with hardware dirty bits
>
> Dear all,
>    I had an idea last night that I wanted to share with you GC experts.
> The idea was to use dirty bits of paging hardware to achieve _precise_
> read or write barriers for a GC or persistent store,
> by using one logical page mapping by logical object on a same (physical)
> page?
> i.e. a same physical page P1 could contain three logical objects A, B, C,
> at different offsets, and would accordingly be mapped three times
> at different addresses in logical virtual memory. Assuming your language
> is strongly typed and your implementation is otherwise correct,
> you could then track down _precisely_, at a _fine-grained_ level,
> which of the objects on the page is being read or modified,
> despite the hardware having only page-level granularity!
>
> You could thus save some work for the GC,
> at the expense of wasting address space
> and increasing the cost of TLB management and misses
> (which probably makes the trick useless on 32-bit architectures
> and interesting only for 64-bit architectures).
> And since you may map a physical page exactly as many times
> as it has logical objects on it, you don't waste much physical resources
> (the per-object GC meta-information is moved from concatenated headers
> to the page table structure, which may also help in some cases,
> like not mixing administrative pointers and data,
> or not having a special case for page-aligned raw data buffers).
>
> Has anyone considered the efficiency tradeoffs of such an approach?
> Has anyone implemented it?
> Is it completely worthless for some reason I don't understand,
> or has it any value? Is it already well-known?
>



-- 
Hans-J. Boehm
boehm@sgi.com