[gclist] Question from a newbie

Wang, Daniel Chen-An (Dan) dcwang@agere.com
Tue, 6 May 2003 23:41:44 -0400


At 10:43 PM 5/6/2003, Eliot Moss wrote:
>I think by "comprehensive" you mean what I have usually called "complete",
>namely that the GC guarantees to reclaim all garbage eventually. Here are
>some things I know, since I have worked on algorithms w.r.t. this property:

Just to be a nit-picker, but what do you define as "garbage"? If you define 
garbage to be any memory that will never be touched by the program in the 
future, then there is no algorithm that has this complete garbage 
collection property. You basically have to solve the halting problem to 
collect all the "garbage" under this definition.

If you take the more traditional definition of garbage to be memory not 
reachable from any valid set of pointer deferences, then your notion of 
"complete" is perfectly sensible. However, the reclaim the garbage 
"eventually" definition is too loose. As I can implement a "complete" 
garbage collector by just waiting until the program terminates and reclaim 
all the space then and there.

A tighter formal definition of "complete" collection requires you to define 
a procedure to estimate the space complexity of a program based on your 
favorite notion of what "garbage" really is and a ideal GC. Then a 
particular GC is complete iff it reclaims enough space so that the amount 
of memory used by the program is asymptotically the same as the ideal GC.

If your program only allocates a constant amount of space, then reclaiming 
all the space at the end is just as good as an ideal GC that reclaims any 
unreachable data after every operation. Likewise, if you have a RC scheme 
that doesn't handle cycles but can prove that the number of cycles is 
statically bounded, then a RC scheme could be considered "complete" for a 
particular class of programs.

It is the case the generational GC are "sloppy" in that they may hold on to 
more data that a non-generational GC at any given point in the program, but 
I think it's fair to say that this sloppyness doesn't change the asymptotic 
space usage.

If you want to categorize GC schemes, I think the two dimensions you want 
to think about are an asymptotic notion of "completeness" as I've defined 
above, and another notion of "promptness". RC is prompt because they 
reclaim garbage as soon as possible.

Copying/mark sweep are complete but not as prompt as RC when the programs 
can be both handled by RC and GC. Generational schemes are less prompt, but 
sometimes being tardy is a good thing, if the amount of work you do is 
related to the amount of "live data" and not the amount of garbage.