[gclist] When and where to allocate/free

Andrew Shi-hwa Chen chenandr@cse.msu.edu
Thu, 29 Aug 2002 16:36:21 -0400


Garbage collection tries to answer the question of where to free, and 
the given algorithm usually has its own limitations on when to free.

The mutator answers the question of when to allocate, but the memory 
management system (often intricately tied in with the garbage 
collector) is the one that decides WHERE to allocate.

Allocation policies have been studied before, and will be studied 
again, but here's a question:

If I knew, before I allocated an object "A", that there was another 
object "B" that would most likely refer to "A" for a substantial 
period of "A"'s life, is it worth trying to allocate "A" near "B"? 
How close makes it worth it? Same page?

Next, suppose I have a recently freed object "C" that I _know_ is 
still in cache, but that is far from "B" (on another page, let's 
say). Do I want to reuse "C"'s memory for the allocation of "A" if I 
can?
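
To make the trade-off concrete, here's a minimal sketch in C of what 
such a policy hook might look like. Everything in it is hypothetical 
- the gc_alloc_near interface, the toy LIFO free list, the 4K page 
size - but the LIFO free list is one plausible way to approximate 
"most recently freed, probably still in cache":

    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define PAGE_SIZE 4096u   /* illustrative; really sysconf(_SC_PAGESIZE) */

    /* True if p and q lie on the same (illustrative 4K) page. */
    static int same_page(const void *p, const void *q)
    {
        return ((uintptr_t)p / PAGE_SIZE) == ((uintptr_t)q / PAGE_SIZE);
    }

    /* Toy stand-in for the allocator's free list; assume one object
       size for the sketch.  LIFO order means the most recently freed
       block (the "C" in the text) is likeliest to still be in cache. */
    static void *free_list[64];
    static size_t free_count;

    void gc_free(void *p)
    {
        if (free_count < 64)
            free_list[free_count++] = p;   /* newest on top */
    }

    /* Search the free list for a block on the same page as 'neighbor'. */
    static void *free_block_on_page_of(const void *neighbor)
    {
        for (size_t i = 0; i < free_count; i++) {
            if (same_page(free_list[i], neighbor)) {
                void *p = free_list[i];
                free_list[i] = free_list[--free_count];
                return p;
            }
        }
        return NULL;
    }

    /* Most recently freed block, if any: the cache-hot candidate "C". */
    static void *cache_hot_free_block(void)
    {
        return free_count ? free_list[--free_count] : NULL;
    }

    /* Allocate, optionally hinted to land near 'neighbor' ("B").
       The whole policy question is the ORDER of these two tests. */
    void *gc_alloc_near(size_t size, const void *neighbor)
    {
        void *p;
        if (neighbor && (p = free_block_on_page_of(neighbor)) != NULL)
            return p;        /* maybe a cache miss now, fewer page misses later */
        if ((p = cache_hot_free_block()) != NULL)
            return p;        /* "hot-spot" speedup now, bigger working set later? */
        return malloc(size); /* fall back to the default policy */
    }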

Obviously, this is going to vary a lot from application to 
application. Reusing an object that is still in cache gives a 
"hot-spot" speedup now, but it may cause longer-term performance 
penalties (TLB or page misses) elsewhere in the program. Prefetching 
might hide those misses, but then the prefetch instructions cause 
code bloat, so the code might no longer fit in cache, etc....

On the other hand, if I make it a point to allocate "A" on the same 
page as "B" (when possible), then I will most likely shrink the 
working set (at least somewhat) later on, at the price of a cache 
miss now. Then again, I might not shrink the working set at all - it 
depends on how many references to "A" there will be that aren't from 
"B". Again, this will vary with the application.

But if I knew all that - that is, if I knew that "B" existed - AND if 
I knew that there were an incredibly small number of references to 
"A" from anything other than "B", AND if I knew that the reference 
from "B" to "A" would be followed quite a number of times, then the 
cache miss taken by allocating on the same page becomes worth it, 
because I _know_ I'm reducing my working set (fewer pages).
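
To put rough, purely illustrative numbers on that: say the extra 
cache miss paid at allocation time costs on the order of 10^2 
cycles, an avoided TLB miss saves on the order of 10^2 cycles, and 
an avoided hard page fault saves on the order of 10^6 cycles or 
more. If f is the number of times the "B"-to-"A" reference is 
followed, and m is the fraction of those traversals that would 
otherwise miss the TLB or the page, then same-page allocation 
roughly pays off when

    f * m * (cycles saved per avoided miss)  >  ~10^2 cycles (the miss paid now)

so even a handful of avoided TLB misses - never mind a single 
avoided page fault - covers the up-front cost.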

Of course, we usually don't know all that. On a per-application 
basis we could find it out with sufficient profiling, if we wanted 
to. (And it's something language designers can keep in mind if 
they'd like good performance: gather all that information and pass 
it to the compiler.)

What's my point? My point is that saving a cache miss now only to 
take page misses later is short-sighted, and it may very well be 
what happens (in a way that is potentially very difficult to 
measure) if we continue down this path of optimizing things to bits.

This isn't an RC (allocate what's in the cache) versus GC (allocate 
nearby if possible) discussion any more. This is about looking at the 
small details versus the large picture. We can even think about it in 
terms of how we trace: when we trace, should we consult the TLB and 
try to trace/mark only those grey objects whose pages are 
TLB-resident, deferring the rest?
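
As a thought experiment only - there is no portable way for user 
code to ask the TLB what it holds - here is a sketch of what that 
might look like, assuming a hypothetical tlb_resident() predicate 
that in practice you could only approximate (say, by remembering the 
last N pages the collector touched):

    #include <stddef.h>

    /* Hypothetical predicate, stubbed here.  A real approximation
       might check p's page against the last N pages touched. */
    static int tlb_resident(const void *p)
    {
        (void)p;
        return 1;   /* stub */
    }

    typedef struct obj { struct obj *next_grey; /* ...fields... */ } obj;

    /* Two grey lists: trace objects on (probably) TLB-resident pages
       first, and defer everything else. */
    static obj *grey_hot;
    static obj *grey_cold;

    void push_grey(obj *o)
    {
        if (tlb_resident(o)) { o->next_grey = grey_hot;  grey_hot  = o; }
        else                 { o->next_grey = grey_cold; grey_cold = o; }
    }

    obj *pop_grey(void)
    {
        obj *o = grey_hot ? grey_hot : grey_cold;  /* hot first, cold when we must */
        if (o) {
            if (o == grey_hot) grey_hot  = o->next_grey;
            else               grey_cold = o->next_grey;
        }
        return o;
    }

Correctness still requires draining grey_cold eventually, of course; 
the split only reorders the trace so that the pages we expect to 
miss on get batched together at the end.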

At what point does the overhead of trying to ensure good locality 
itself start to decrease locality and/or performance? It's a fine 
line, and it will keep shifting as systems change.

--
Andrew Chen
chenandr@cse.msu.edu