[gclist] Distributed Garbage Collection Revisited...

Giuliano Carlini giuliano@ix.netcom.com
Wed, 03 Jul 1996 21:37:55 -0700


> Gustavo
> > Giuliano
> >following. I believe that the cost of transmission is the sum of 1 to N =
> >N(N+1)/2. This "reduces" to a cost of O(N^2). I believe the transmission
> >costs of the algorithms which I, Matt, and Arturo propose are O(N).
> 
> Yes. This can cause a long token. However, I was thinking that the token does
> not need to physically travel around. It can physically reside in
> the node that started the back tracing and travel around just logically.

Yep. But I wouldn't call this "a single token message". I think that phrase
leads very easily to my misunderstanding. You may want to use a phrase
that connotes that a back-trace's information is sent back to, and maintained
by, the initiating node.
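To make the cost argument concrete, here is a small sketch (the function names and cost model are mine, not from either algorithm) comparing a token that physically travels and grows by one entry per node against per-node reports sent back to the initiator:

```python
# Hypothetical cost model, not either author's actual algorithm.
# A traveling token that accumulates one entry per node moves
# 1 + 2 + ... + N = N*(N+1)/2 entry-transmissions over N hops: O(N^2).
# Reporting each entry directly back to the initiator moves one
# fixed-size message per node: O(N).

def traveling_token_cost(n):
    """Total entries transmitted when a growing token visits n nodes."""
    return sum(size for size in range(1, n + 1))  # = n*(n+1)/2

def report_back_cost(n):
    """Total messages when each of n nodes reports once to the initiator."""
    return n

for n in (10, 100, 1000):
    print(n, traveling_token_cost(n), report_back_cost(n))
```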


> > The second reason is that this approach is likely to duplicate the
> >back-search many times. It is very likely that several nodes on the cycle
> >will start a search "at the same time". If you allow them all to proceed,
> >this increases the cost to O(N^3). If you suspend or abandon some, you
> >must deal with deadlock.

> Observe that delaying a back trace does not cause deadlock since the
> back-trace is not suspended.

I was wrong here. If you abandon, you can't deadlock. But, you do lose
useful information that will need to be recomputed. This will double
the cost on average.

> Thanks for the suggestion. However, the zombie with
> one reachable stub still has to be added to the token for the verification phase.
> Otherwise there is no way to detect changes in the inverse graph when the changes
> are done through this zombie.

If a change has occurred to the IRG through a node, some scion will have been
marked dirty, and the back search can return that the nodes are reachable. If no
changes have been made, the token doesn't need to store the back-links because
they are present in the node.
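A minimal sketch of that observation (the class and field names are mine, not from either algorithm): a mutation through a scion sets a dirty bit, so a back-search step can conservatively report "reachable" on a dirty scion and otherwise follow the back-links stored at the node, without the token carrying them:

```python
# Hypothetical sketch of the dirty-scion idea; names are illustrative only.

class Scion:
    def __init__(self, back_links):
        self.back_links = back_links  # stored inverse-reference-graph edges
        self.dirty = False            # set when the mutator changes the IRG

    def mutate(self, new_back_links):
        self.back_links = new_back_links
        self.dirty = True

def back_search_step(scion):
    """If the IRG changed through this scion, conservatively report it
    reachable; otherwise return the node's stored back-links."""
    if scion.dirty:
        return ('reachable', None)
    return ('follow', scion.back_links)

s = Scion(['A', 'B'])
print(back_search_step(s))   # ('follow', ['A', 'B'])
s.mutate(['A'])
print(back_search_step(s))   # ('reachable', None)
```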

> Maybe it seems that I am very stubborn about keeping all the state of a back
> trace in a single place.

Not at all. We need people with conviction to try multiple approaches. It's the
only way to discover what works in practice. While I believe that keeping all
the data in a single token won't work as well, I'd sure like someone to try it out
in practice.

> But a lot of things can go wrong, especially in a network of the size
> of the internet. If a back-trace is aborted for some reason (network
> crash, system crash etc.), you have to deal with cleaning up all the previous state.

I guess I don't see the difficulty here. If a node sees that it can't reach the
stub for a scion, it cleans up, and responds to the backsearch initiator that the
backsearch was abandoned due to a net failure. This propagates down each
discontinuous segment of the backsearch.
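A toy simulation of that abandon-and-notify step (all names and the link table are mine, not from the actual protocol): each step either visits a parent or, when the link is down, discards local state and logs the abandonment for the initiator:

```python
# Hypothetical simulation of abandon-and-notify; names are illustrative only.

reachable = {('N1', 'N2'): True, ('N2', 'N3'): False}  # link status table

def back_search(node, parents, initiator_log):
    """Visit each parent of `node`; on a dead link, report the abandonment
    to the initiator instead of blocking the whole back-search."""
    for parent in parents:
        if reachable.get((node, parent), False):
            initiator_log.append(('visited', parent))
        else:
            initiator_log.append(('abandoned', node, parent))  # net failure

log = []
back_search('N1', ['N2'], log)   # link up
back_search('N2', ['N3'], log)   # link down -> abandon this segment
print(log)
```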

> Having all the state in a single place makes the cleaning up process easier.

It also limits the scalability, which was one of my main concerns. On a
distributed system the size of the Internet this will be a real issue.

> However,
> if the state of the back trace is distributed all over the place, the clean up process
> will be painful. Especially in your approach, since only one back-trace can take place
> at once, and no other back-trace can start until the current one ends.

I think you misunderstood. My approach allows many simultaneous backsearches.
It even takes pains to merge those which are tracing the same cycle. My only
missing piece was avoiding deadlock. You address this later.

> This is implementation dependent. The implementations of garbage
> collectors I have seen allow allocation of scratch memory during
> garbage collection. During a collection it is sometimes necessary to
> allocate extra memory for the mark stack, dirty bits, black lists etc.
> Of course you may be thinking of implementing your algorithm using a local
> garbage collector that does not allow memory allocation during garbage collection.

I didn't want to rely on it. If I can allocate while GC'ing, great. But I don't
want to make a mess out of the allocator to allow this. If I can make
the allocator substantially faster by giving up allocating inside the
collector, that's a trade I'm willing to make. The mutator, of course,
must be able to allocate while the GC thread(s) are running.

> > The last problem to handle is avoiding a deadlock.
> Have you thought about using a total order on the back-search ids?

I considered it, but I wanted something a bit more nondeterministic, so that
the work is not constantly handed off to the same nodes. I'll probably use
this simple compare at first though. If my DGC works, later versions can
use a better deadlock avoidance algorithm.
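The "simple compare" could look something like this sketch (the id scheme and tie-breaking rule are my assumptions, not a worked-out design): each back-search carries a totally ordered id, and when two searches meet on the same cycle, the one with the smaller id proceeds while the other yields, so no cyclic wait can form:

```python
# Hypothetical deadlock-avoidance rule via a total order on back-search
# ids; the (initiator, sequence) scheme is an assumption of this sketch.

def resolve_conflict(search_a, search_b):
    """When two back-searches meet, the smaller id wins and continues;
    the loser is abandoned (its work may later be redone or merged).
    Tuples compare lexicographically, giving a total order."""
    return min(search_a, search_b)

a = ('node7', 3)   # (initiator id, sequence number)
b = ('node2', 9)
print(resolve_conflict(a, b))   # ('node2', 9) proceeds
```

Because every pair of ids is comparable and the loser always yields, two searches can never each wait on the other, which is the deadlock case the compare avoids.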

Thanks much Gustavo. You - like everyone else - have raised good issues.
Although my responses have been stated with great certainty, I'm actually
concerned about many of the issues you have all raised, and which I
hadn't thought of before. We'll see whose gut feel ends up being right.

g