[gclist] When and where to allocate/free
Andrew Shi-hwa Chen
chenandr@cse.msu.edu
Thu, 29 Aug 2002 16:36:21 -0400
Garbage collection tries to answer the question of where to free, and
a given algorithm usually has its limitations on when to free.
The mutator answers the question of when to allocate, but the memory
management system (often intricately tied in with the garbage
collector) is the one that decides WHERE to allocate.
Allocation policies have been studied before, and will be studied
again, but here's a question:
If I knew, before allocating an object "A", that there was another
object "B" that would most likely refer to "A" for a substantial
period of "A"'s life, would it be worth trying to allocate "A" near
"B"? How close would it have to be to be worth it? The same page?
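To make that concrete, here's a minimal sketch (in C) of what such a
locality hint might look like at the allocator interface. Everything
in it - alloc_near, page_free_block, the fixed PAGE_SIZE - is
invented for illustration; no real allocator's API is implied.

    #include <stdint.h>
    #include <stdlib.h>

    #define PAGE_SIZE 4096
    #define PAGE_OF(p) ((uintptr_t)(p) & ~(uintptr_t)(PAGE_SIZE - 1))

    /* Stand-in for the per-page free-list lookup a real allocator
     * would keep; stubbed out here so the sketch compiles and
     * simply falls back to malloc. */
    static void *page_free_block(uintptr_t page, size_t size)
    {
        (void)page; (void)size;
        return NULL; /* pretend the neighbor's page is full */
    }

    /* Try to place the new object on the same page as "neighbor",
     * so that following the B -> A reference later touches one
     * page instead of two. */
    void *alloc_near(void *neighbor, size_t size)
    {
        void *p = page_free_block(PAGE_OF(neighbor), size);
        if (p != NULL)
            return p;
        return malloc(size); /* no room: default policy */
    }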
Next, suppose I have a recently freed object "C" that I _know_ is in
cache, but is far away from "B" (on another page, let's say). Do I
want to reuse "C"'s storage for the allocation of "A" if I can?
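Continuing the sketch above, the choice between reusing "C" and
staying on "B"'s page might look like this. Again, hot_free_block
and the prefer_cache flag are made up for illustration, and a real
allocator would also have to match block sizes:

    void *alloc_near(void *neighbor, size_t size); /* from above */

    static void *hot_free_block; /* recently freed "C", likely cached */

    void *alloc_for(void *referrer, size_t size, int prefer_cache)
    {
        if (prefer_cache && hot_free_block != NULL) {
            /* Reuse "C": cheap now (cache hit), but "A" may land
             * on a page far from "B" and cost TLB/page misses
             * later.  (A real allocator would also check that the
             * block is actually big enough.) */
            void *p = hot_free_block;
            hot_free_block = NULL;
            return p;
        }
        /* Otherwise pay the possible cache miss now to keep "A"
         * on "B"'s page and shrink the working set later. */
        return alloc_near(referrer, size);
    }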
Obviously, this is going to vary a lot on a per-application basis.
Reusing an object that is already in the cache gives a "hot-spot"
speedup, but it may cause longer-term performance penalties (TLB or
page misses) elsewhere in the program. Prefetching might hide those
misses, but then the prefetch instructions cause code bloat, so the
code might not fit in cache, etc....
But if I make it a point to allocate "A" on the same page as "B" (if
possible), then I will most likely decrease the working set (at least
somewhat) later on, at the cost of a cache penalty now. Then again, I
might not decrease the working set - it depends on how many
references to "A" there will be that aren't from "B". Again, this
will vary with the application.
But if I knew all that - that is, if I knew that "B" existed - AND if
I knew that there would be an incredibly small number of references
to "A" from anything other than "B", AND if I knew that the reference
from "B" to "A" would be followed quite a number of times, then
taking the cache miss now (by allocating on the same page as "B")
becomes worth it, because I _know_ I'm reducing my working set (fewer
pages).
Of course, we usually don't know all that. On a per-application basis
we could find it out, with sufficient profiling, if we wanted to.
(And it's something language designers can keep in mind if they'd
like good performance - make it possible to know all that and pass it
to the compiler.)
What's my point? My point is that saving a cache miss now only to
have page misses later on is short-sighted, and that might very well
be what happens (in a way that is potentially very difficult to
measure) if we continue down this path of optimizing things to bits.
This isn't an RC (allocate what's in the cache) versus GC (allocate
nearby if possible) discussion any more. This is about looking at the
small details versus the large picture. We can even think about it in
terms of how we trace: when we trace, should we check the TLB and try
to trace/mark first only those grey objects whose pages are in the
TLB?
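Purely as speculation, a TLB-aware trace might look something like
the sketch below. User code can't actually inspect the TLB, so
page_is_hot would have to approximate it (say, with a small table of
recently touched pages); obj_t and the grey stack are likewise
invented here:

    #include <stddef.h>

    typedef struct obj {
        struct obj *next_grey;
        /* ... object fields and child pointers ... */
    } obj_t;

    static obj_t *pop(obj_t **s)
    {
        obj_t *o = *s;
        if (o != NULL)
            *s = o->next_grey;
        return o;
    }

    static void push(obj_t **s, obj_t *o)
    {
        o->next_grey = *s;
        *s = o;
    }

    /* Stubs; a real collector supplies both.  scan() would mark
     * "o" and push its unmarked children onto the grey stack;
     * page_is_hot() would guess whether "o"'s page translation is
     * still resident. */
    static void scan(obj_t **grey, obj_t *o) { (void)grey; (void)o; }
    static int page_is_hot(obj_t *o) { (void)o; return 1; }

    void trace(obj_t **grey)
    {
        obj_t *deferred = NULL;
        obj_t *o;
        while (*grey != NULL || deferred != NULL) {
            /* Prefer grey objects on (probably) hot pages;
             * postpone the rest. */
            while ((o = pop(grey)) != NULL) {
                if (page_is_hot(o))
                    scan(grey, o);
                else
                    push(&deferred, o);
            }
            /* Grey stack drained: take one TLB miss; the children
             * it greys may be on hot pages again. */
            if ((o = pop(&deferred)) != NULL)
                scan(grey, o);
        }
    }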
At what point does the overhead of trying to ensure good locality
itself decrease locality and/or performance? It's a fine line, and it
will keep shifting as systems change.
--
Andrew Chen
chenandr@cse.msu.edu