[gclist] Compiler lifetime analysis
Paul R. Wilson
wilson@cs.utexas.edu
Sat, 17 Aug 1996 15:38:30 -0500
>From majordom@iecc.com Sat Aug 17 06:27:41 1996
>
>> Also, could many general GC allocators make use of a lifetime hint when
>> the object is allocated? The idea is that an object has sufficiently
>> complex usage characteristics that it still needs general GC, but the
>> compiler can guess that it has a short/long lifetime. I'm wondering if
>> such a hint could appreciably increase the efficiency of the allocator.
>
>This is exactly what you want:
>
> Barrett, David and Zorn, Benjamin. Using Lifetime Predictors to
> Improve Memory Allocation Performance. Proceedings of the SIGPLAN
> '93 Conference on Programming Language Design and
> Implementation, 187-196. Albuquerque, New Mexico, June 1993.
>
> - Dave
I forget which stuff is in this paper, but Zorn, Grunwald, Barrett and
Henderson (in different permutations) have papers on both normal
malloc-style allocators and GC's.
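To make that concrete, here's a toy per-allocation-site lifetime predictor
in C. This is not the Barrett/Zorn algorithm, just a sketch in its general
spirit: profile how objects allocated at each call site have died in the
past, and predict "short-lived" for sites whose objects have mostly died
young, so the allocator can place them accordingly. The names, table size,
and threshold are all made up for illustration.

    /* Toy lifetime predictor keyed by allocation site (illustrative only). */
    #include <stddef.h>

    #define MAX_SITES 1024

    struct site_stats {
        unsigned long n_short;   /* objects from this site that died young */
        unsigned long n_long;    /* objects from this site that lived on   */
    };

    static struct site_stats sites[MAX_SITES];

    /* Called by the profiler once an object's fate is known. */
    void record_death(unsigned site_id, int died_young)
    {
        if (died_young)
            sites[site_id % MAX_SITES].n_short++;
        else
            sites[site_id % MAX_SITES].n_long++;
    }

    /* Used at allocation time: nonzero means "probably short-lived", so
     * the allocator might place the object with other short-lived objects
     * (or the compiler might emit a hint).
     */
    int predict_short_lived(unsigned site_id)
    {
        const struct site_stats *s = &sites[site_id % MAX_SITES];
        return s->n_short > 4 * s->n_long;   /* made-up threshold */
    }
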
In a normal allocator (or nonmoving GC), you can decrease fragmentation
by allocating objects that are likely to die at about the same time together
in memory---they come free almost simultaneously, so you get back
contiguous hunks of memory.
What you need for this is actually weaker than lifetime prediction---you
don't have to know which objects are going to die when, you just have
to know which objects are likely to die together. We call this
"death time discrimination" Two things seem to be good predictors
of this:
1. Objects allocated at about the same time usually die at about
the same time, because they're used during a phase, and then
the whole data structure is deallocated at a whack. (Interestingly,
their lifetimes may be very different---some allocated at the
beginning of a phase, and some near the end, but they often die
together anyway. So you really do want death time discrimination,
not lifetime prediction. Luckily, the former is actually easier.)
2. Objects of the same type are especially likely to die at about
the same time, if they're allocated at about the same time.
For example, during a phase, some objects are used very transiently
(during some tiny intermediate phase) and others die at the end
of the phase, because they represent the main data structures
that are accumulated during the phase, but die at the end when
some final result is computed. Still other objects are part of
the result of the phase, and may live much longer.
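To illustrate how an allocator might exploit those two predictors, here is
a rough C sketch (not from any of the papers above) that groups allocations
by the current phase and a caller-supplied type tag, so objects that are
born together and share a type end up adjacent in memory. For simplicity it
frees a whole phase's arenas at once; a real malloc would of course still
free objects individually, and the payoff would just be that whole chunks
come free at nearly the same time.

    /* Sketch: group objects by (current phase, type tag) into arenas,
     * on the assumption that such objects tend to die together.
     * Illustrative only; error handling is omitted.
     */
    #include <stdlib.h>

    #define ARENA_BYTES (64 * 1024)
    #define MAX_TYPES   16

    typedef struct arena {
        char *base, *next, *limit;
        struct arena *prev;        /* earlier chunks for this type, this phase */
    } arena;

    static arena *arenas[MAX_TYPES];   /* arenas for the current phase */

    static void *arena_alloc(arena **ap, size_t size)
    {
        arena *a = *ap;
        size = (size + 7) & ~(size_t)7;               /* 8-byte alignment */
        if (a == NULL || (size_t)(a->limit - a->next) < size) {
            size_t chunk = size > ARENA_BYTES ? size : ARENA_BYTES;
            arena *fresh = malloc(sizeof *fresh);
            fresh->base  = malloc(chunk);
            fresh->next  = fresh->base;
            fresh->limit = fresh->base + chunk;
            fresh->prev  = a;
            *ap = a = fresh;
        }
        void *p = a->next;
        a->next += size;
        return p;
    }

    /* Allocate an object of a given type tag in the current phase. */
    void *phase_alloc(int type_tag, size_t size)
    {
        return arena_alloc(&arenas[type_tag % MAX_TYPES], size);
    }

    /* At the end of a phase, everything born in it comes free together,
     * in contiguous hunks.
     */
    void end_phase(void)
    {
        for (int t = 0; t < MAX_TYPES; t++) {
            for (arena *a = arenas[t]; a != NULL; ) {
                arena *prev = a->prev;
                free(a->base);
                free(a);
                a = prev;
            }
            arenas[t] = NULL;
        }
    }
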
At any rate, death time seems to be strongly correlated with birth
time, and apparently with type as well. We need to analyze more data
about this before we can quantify it.
For GC's the situation is more subtle and even more poorly understood.
It's not clear when it's worthwhile to use lifetime prediction or
death time discrimination, when the basic generational principle
is about as good as you're going to get, or when opportunistic GC
scheduling is the best thing. This has to do with the costs of GC algorithms'
operations and subtle issues of locality.
The first thing to notice is that allocating objects with long
expected lifetimes in an older generation can be expensive. To
preserve the GC's invariants, you generally need a write barrier
on initializing writes. For normal allocation in the youngest
generation, you don't, because there you can't be creating pointers
into a younger generation. So right off the bat, you're slowing
things down somewhat. You may be slowing things down later, too,
because objects allocated at about the same time may have pointers
to each other stored into them later. This may increase your
write barrier costs further.
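A small C sketch of that point (the object layout, the remembered set, and
the barrier itself are assumptions for illustration, not any particular
collector's design): stores that initialize a fresh nursery object can skip
the barrier, but initializing stores into a pretenured object cannot,
because the stored pointer may well still refer to a young object.

    /* Illustrative only: a crude generational write barrier and why
     * pretenured objects pay it even on initializing stores.
     */
    #include <stddef.h>

    typedef struct obj {
        int gen;                  /* generation: 0 = youngest, larger = older */
        struct obj *fields[4];    /* pointer fields, for illustration         */
    } obj;

    /* Crude remembered set: old objects that may point at younger ones. */
    static obj *remembered[1024];
    static size_t n_remembered;

    static void remember(obj *o)
    {
        if (n_remembered < 1024)
            remembered[n_remembered++] = o;
    }

    /* Full write barrier: record any store that creates an old-to-young
     * pointer, so the collector can find it when scavenging the young
     * generation.
     */
    static void write_field(obj *dst, int i, obj *src)
    {
        if (src != NULL && dst->gen > src->gen)
            remember(dst);
        dst->fields[i] = src;
    }

    /* Initializing a freshly allocated *nursery* object needs no check:
     * nothing can be younger than the youngest generation.
     */
    static void init_field_nursery(obj *dst, int i, obj *src)
    {
        dst->fields[i] = src;
    }

    /* But if dst was pretenured into an older generation, even its
     * initializing stores must go through the full barrier.
     */
    static void init_field_pretenured(obj *dst, int i, obj *src)
    {
        write_field(dst, i, src);
    }
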
An alternative is to use opportunism to get much of the same effect,
without as much cost (maybe). If you can give the system hints
about the ends of significant phases, or it can infer them, then
the GC can be scheduled to do GC's when there is little live data
to traverse, and lots of garbage space to get back. We did this
by noticing not-too-local minima in the stack height, which suggest returns
from procedures that created lots of data. Barry Hayes did this
by noticing when ``key objects'' become unreachable from the root
set, which is more direct and perhaps more reliable. I think the
right thing is an automatic opportunism mechanism augmented with
the ability to provide hints (derived from serious profiling) during
performance tuning.
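For what it's worth, the stack-height trick can be sketched in a few lines
of C; the thresholds here are invented, and a real system would hook this
into its allocation or safe-point machinery.

    /* Illustrative opportunism test: a large drop from the recent maximum
     * stack depth suggests we just returned from the procedures that built
     * a phase's data, so a collection should find little live data and
     * lots of garbage.  Called at safe points with the current stack depth
     * and the allocation volume since the last GC.
     */
    #include <stddef.h>

    int opportune_to_gc(size_t stack_depth, size_t bytes_since_gc)
    {
        static size_t recent_max_depth;

        if (stack_depth > recent_max_depth)
            recent_max_depth = stack_depth;

        if (bytes_since_gc > (size_t)1 << 20 &&    /* enough new allocation  */
            stack_depth < recent_max_depth / 2) {  /* well below recent max  */
            recent_max_depth = stack_depth;        /* track the next phase   */
            return 1;
        }
        return 0;
    }
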
All of this is especially tricky because what you're really
worried about is locality. In some cases, you may be better off
gc'ing things aggressively, because if you don't, you won't be
able to reuse space that's in fast memory. In other cases, you
may be better off *not* gc'ing things aggressively, because the
live data will just be paged out to slower storage, and not be
touched again---normal caching may work okay. What you have to
know is not just lifetimes, but future reference patterns.
Consider a case where you have both a fair amount of live data
and a fair amount of garbage. If the live data will not be touched
again before it becomes garbage, you may well be better off
letting it be paged out to slower storage, because you won't
page it in again anyway. If it's likely to be touched again,
though, you're better off compacting the data while it's still in
cache, and getting the space back.
This also interacts with the costs of misses during allocation.
In some systems, they're cheap, because write allocate caches
let you create blocks in the cache, without reading their old
garbage contents in from slower memory. (See Appel's and Moss,
Tarditi, and Diwan's papers on this, but realize it's a little
more complicated than they make it sound.) Similarly, if you
sbrk() or mmap() pages in a virtual memory system, those "fresh" pages
don't have to be read in from disk. You still have to pay
write-back costs to move something out to slower memory to make
room for the "newly created" memory in fast memory, but you
don't take write faults on allocation.
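The virtual-memory half of that can be seen in a tiny POSIX-specific
example: anonymous mmap hands back zero-filled pages, so first touches take
minor faults rather than reads from disk. This is just an illustration of
the point, not anyone's allocator.

    /* Illustration: "fresh" anonymous pages for a nursery are zero-filled
     * by the OS and never read in from disk; dirtying them may still force
     * other pages to be written back to make room.
     */
    #include <stdio.h>
    #include <stddef.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t nursery_size = 4u * 1024 * 1024;
        char *nursery = mmap(NULL, nursery_size,
                             PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (nursery == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        /* First touch of each page is a zero-fill (minor) fault, not a
         * disk read; an allocator can simply bump a pointer through this
         * region.
         */
        for (size_t i = 0; i < nursery_size; i += 4096)
            nursery[i] = 1;
        munmap(nursery, nursery_size);
        return 0;
    }
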
The bottom line is that a GC has to guess two things:
1. How much live data is there to traverse, vs. garbage space to
get back---if it GC's a generation before it grows too large
for a level of memory, will it be worth the CPU costs?
2. If it lets the generation grow larger than the level of
memory it's in now, what will the locality consequences
be? This includes the locality of the program per se---will
it ever traverse that data again, and will the data be
clustered well? It also includes the locality of the GC
itself---when it eventually has to traverse the live data,
how much stuff will it have to traverse, and how well
clustered will it be?
The GC has to decide whether to let sleeping dogs lie, realizing
that they may wake up and become nasty, but they may just die
in their sleep.
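If it helps, here's roughly how that guess might look as code; everything
in it (the estimates, the thresholds, the weighting) is invented for
illustration, since these are exactly the quantities a real collector can
only guess at.

    /* Back-of-the-envelope policy for "collect now or let it grow". */
    #include <stddef.h>

    struct gen_estimate {
        size_t gen_size;          /* current size of the generation          */
        size_t level_size;        /* size of the memory level it lives in    */
        double live_fraction;     /* guessed fraction of it that is live     */
        int    likely_retouched;  /* will the live data be used again soon?  */
    };

    /* Return nonzero if collecting now looks better than letting it grow. */
    int should_collect(const struct gen_estimate *e)
    {
        size_t live    = (size_t)(e->live_fraction * e->gen_size);
        size_t garbage = e->gen_size - live;

        /* Plenty of headroom in the fast level: let sleeping dogs lie. */
        if (e->gen_size < e->level_size / 2)
            return 0;

        /* About to outgrow the fast level: collect if the live data will
         * be touched again (better to compact it while it's still cheap
         * to reach) or if most of the space is garbage anyway.
         */
        if (e->gen_size >= e->level_size)
            return e->likely_retouched || garbage > live;

        /* In between: collect only when there's a lot of garbage to get
         * back relative to the live data we'd have to traverse.
         */
        return garbage > 2 * live;
    }
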
>--
>David Stoutamire
>http://www.icsi.berkeley.edu/~davids
>