[gclist] Distributed Garbage Collection Revisited 1

Matthew Fuchs matt@wdi.disney.com
Tue, 2 Jul 96 10:25:49 PDT


Helena:
> 
> My name is Helena Rodrigues. I am a PhD student at the University of Kent, UK.
> I am working on distributed garbage collection as well.
> I am implementing a distributed algorithm for collecting cycles on the Network
> Objects system from DEC. My algorithm is based on the work by Richard Jones
> and Rafael Lins on partial mark-sweep and the idea of groups.
> 

Will you be making a description of your algorithm available over the
Web, or post it here?  Is this similar to Lang, et al.'s algorithm of
ever growing groups?

> I was studying all the stuff lately presented and I have some
> comments/questions about it. I think they are addressed to all the authors/works
> presented.
> 

One point which has not been fully addressed, although I have
mentioned it, is the presence of garbage processes.  This is discussed
in a paper by Dennis (?) Kafura on garbage collecting actors.  In this
case, a process is only garbage if it neither "sees" nor is "seen" by
a local root somewhere.  If a process can communicate with a live
object, then it can always pass it a pointer to itself or some other
object, making that object alive.

Without garbage processes, anytime an object finds out about a new
reference to it, it knows it is alive.  With them, it must always
check the new reference to find out if it is just the gyrations of a
garbage process which will soon be collected as well.

This must be considered in any GC for the Internet.

> You all seem to agree about the need of some barrier to catch the changes of
> the local IRG. Giuliano said:
> 
> >However, while the local mutator can't make a stub locally reachable, some remo
> te mutator may.
> >A locally unreachable stub's IRG can be altered, if the space recieves a messag
> e which
> >mentions one of the scions which can reach the stub. This can be detected by ma
> intaining a
> >barrier. Any scion mentioned by a message is marked dirty. Among other conditio
> ns, a stub is
> >considered reachable if one of the scions in its IRG is marked dirty. This may
> temporarily
> >prevent an unreachable stub from being collected. But that isn't a big deal. Wh
> at is important
> >is that this guarantees that a stub is not reclaimed too early.
> 
> >g
> 
> I think that the degree of conservatism you mentioned here is important. In
> your back search you can find some object marked locally reachable by the last
> local collector which is not locally reachable in the current state. This would
> make you giving up and some amount of work will be wasted!
> 

Since computation is not halted during DGC, it is inevitable that some
garbage will not be collected, although one should certainly try to
minimize this.  This is partly an implementation issue.  The local
system can (should?) choose to perform a local GC, or briefly delay
the algorithm until one can be performed, before continuing the
distributed trace or replying.  

> Furthermore, I think the algorithm is more complicated. I think there is a
> problem of termination introduced by the barriers. It may happen that a local
> collection is performed during the first or second trace. This local collection
> would delete the "dirty" information and consequently live objects can be
> wrongly collected or it would be necessary another confirming trace, and
> another.....
> 
I assume by "confirming trace" you mean a local collection from the
scions?  There are two ways to address this:
1) Use a different bit for marking nodes involved in a DGC.
2) Continue performing collections as before, which involves tracing
   from the scions.  This will remark all the necessary nodes.

> Matthew Fuchs's algorithm, whenever a new incoming reference appears, it
> starts a new distributed trace along that path instead of doing a confirming
> trace. In this case, how do you terminate?
> 

I am not sure I understand.  The algorithm would only not terminate if
an infinite number of new incoming references appear before the
distributed trace terminates.  The new incoming reference can only
come to a scion, since there cannot be a remote reference to any
object without a scion.  The new incoming reference can only belong to
either:

1) A remotely rooted object.  If there are no garbage processes, we
can just use this as proof the object is alive.

2) A remote process, or one manipulated by a remote process, but we
don't know if the process is alive or garbage.  On one level I can
argue that processes cannot find out about new network nodes without
knowing about a name service, which would mean the process is alive.
So if the original object was garbage when the collection began, the
number of new references which could arrive (since it is only one
remote reference per node) is strictly limited, and if the number
rises unreasonably it can just assume it must be alive (since the race
condition which could create an "infinite" number of references means
it must have been alive at the start of the collection).

On the other hand the real question is, how do we deal with runaway
processes on the network?  Obviously something must be done to make
sure their behavior is bounded.  Also, nodes need to be able to defend
themselves, such as refusing to consider new references to an object.
We design algorithms for a clean, nice network, but it won't be the
case.  One of the reasons I chose to trace the inverse reference graph
was to allow each object/node in the network to know who it was
dealing with.  Since each object knows who is looking at it, it knows
who it is dealing with, and can choose how to behave accordingly (such
as charging money to accept an incoming reference).  I've compared the
future Internet to midtown Manhattan at lunch.  Caveat emptor.

Matthew Fuchs
matt@wdi.disney.com
http://galt.cs.nyu.edu/students/fuchs
Mobile distributed objects, distributed coordination, and lots and
lots of languages