[gclist] Question from a newbie

Ravi Jonnalagedda RaviJ@crisnet.com
Tue, 6 May 2003 23:57:31 -0600


Thank you, this is an excellent definition, which is only strengthening
my basic understanding of the algorithms. Are there any refreed
publications in which someone discusses this at great lengths? For
example, in my categorization of the algorithms in their techniques, I
am currently placing refn counting, generational and incremental
techniques as "incomplete" as there is no "complete" reclamation of
discarded memory in the heap. I am placing the copying collectors, MS/MC
algorithms as more "complete" collectors as they reclaim memory from the
heap, provided the mutator does not change the live graph.
With my understanding at this basic levels, am I fair in categorizing it
this way? Or for my report, should I be categorizing these into more
real-time and stop-the-world collectors and do a survey of the
techniques based on this classification? Can someone please point me out
to some published sources which I can use for my research? I currently
am searching on ACM and IEEE digital libraries, and so far have come up
with very few helpful articles on this one. Maybe I am not looking in
the right places, or is my topic too base which does not merit published
work?
If all else fails, can someone give me an idea if I am even looking in
the right direction for research? Or is there any other interesting
topic in GC which I can do research upon?

Thanks for all the help in advance,

Regards,
Ravi

-----Original Message-----
From: Wang, Daniel Chen-An (Dan) [mailto:dcwang@agere.com]
Sent: Tuesday, May 06, 2003 9:42 PM
To: moss@cs.umass.edu; Ravi Jonnalagedda
Cc: gclist@lists.iecc.com
Subject: Re: [gclist] Question from a newbie



At 10:43 PM 5/6/2003, Eliot Moss wrote:
>I think by "comprehensive" you mean what I have usually called
>"complete", namely that the GC guarantees to reclaim all garbage
>eventually. Here are some things I know, since I have worked on
>algorithms w.r.t. this property:

Just to be a nit-picker, but what do you define as "garbage"? If you
define
garbage to be any memory that will never be touched by the program in
the
future, then there is no algorithm that has this complete garbage
collection property. You basically have to solve the halting problem to
collect all the "garbage" under this definition.

If you take the more traditional definition of garbage to be memory not
reachable from any valid set of pointer deferences, then your notion of
"complete" is perfectly sensible. However, the reclaim the garbage
"eventually" definition is too loose. As I can implement a "complete"
garbage collector by just waiting until the program terminates and
reclaim
all the space then and there.

A tighter formal definition of "complete" collection requires you to
define
a procedure to estimate the space complexity of a program based on your
favorite notion of what "garbage" really is and a ideal GC. Then a
particular GC is complete iff it reclaims enough space so that the
amount
of memory used by the program is asymptotically the same as the ideal
GC.

If your program only allocates a constant amount of space, then
reclaiming
all the space at the end is just as good as an ideal GC that reclaims
any
unreachable data after every operation. Likewise, if you have a RC
scheme
that doesn't handle cycles but can prove that the number of cycles is
statically bounded, then a RC scheme could be considered "complete" for
a
particular class of programs.

It is the case the generational GC are "sloppy" in that they may hold on
to
more data that a non-generational GC at any given point in the program,
but
I think it's fair to say that this sloppyness doesn't change the
asymptotic
space usage.

If you want to categorize GC schemes, I think the two dimensions you
want
to think about are an asymptotic notion of "completeness" as I've
defined
above, and another notion of "promptness". RC is prompt because they
reclaim garbage as soon as possible.

Copying/mark sweep are complete but not as prompt as RC when the
programs
can be both handled by RC and GC. Generational schemes are less prompt,
but
sometimes being tardy is a good thing, if the amount of work you do is
related to the amount of "live data" and not the amount of garbage.