[gclist] Program behaviour

Paul R. Wilson wilson@cs.utexas.edu
Tue, 15 Oct 1996 04:26:52 -0500


>From majordom@iecc.com Mon Oct 14 20:00:38 1996
>Subject: Re: [gclist] Program behaviour
>Cc: boehm@hoh.mti.sgi.com
>
>Marc Shapiro wrote:
>
>"A more detailed analysis of cycles yields the following information:
>
> - Most cycles only contain objects allocated about the same time (allocated
>   within ~150 allocations of each other); however Boehm's allocator breaks
>   this locality by dispersing objects in memory (up to 500K bytes distance
>   between objects in a cycle)."
>
>This surprises me somewhat, given that free objects on a page should be
>allocated to consecutive requests.  But distance is not a good measure of
>locality.  How many pages are spanned by those cycles?

This is a very good question.

In general, distance in address space doesn't matter at all as long as
things that are touched at about the same time are clustered onto a few 
pages.

This isn't strictly true, because you may get locality effects at a
larger-than-page granularity because of disk geometries and caches
in the disk drive, e.g., accessing several blocks in one track
may win relative to accessing several blocks scattered over several
tracks.

You can often do very drastic things to the relative locations of
particular objects without actually affecting locality much.

For example, if you have 100 objects of four sizes in a data
structure, and they take up 4 pages, and you touch most or all
of those objects whenever you touch any of them, the arrangement
of those objects within those 4 pages doesn't matter much at
all.  Whenever you touch the data structure you'll touch 4 pages,
independently of how the objects are divided between the pages.

What matters most turns out to be keeping objects that are sometimes
*not* accessed together on different pages.  Sometimes it is a win
to separate of objects of different types onto different pages,
even if the objects are linked together into a single data structure.
If some traversals of the data structure skip one kind of object,
it'll be good if those are on a different set of pages than the
ones you always touch---you don't always have to fault in the
whole set.

(This was the idea behind our "Object Type Directed Garbage Collection"
from the first IWMM proceedings.  The paper was weak, because it
had results for a very small system, but I think the point is good.)

By the way, my "secret" web page has a draft of most of a survey
paper on locality, which discusses issues like this.  Check out
http://www.cs.utexas.edu/users/wilson/notready if you're interested.
I could use feedback on the paper.  (I should get a nearly-complete
and generally improved draft out, but I don't seem to be getting
around to that.)