[gclist] Dgc algorithm for catching cycles

Arturo Bejar arturo@communities.com
Fri, 7 Jun 96 17:47:39 +0100


Hi,

I apologise for not introducing myself earlier; I've been working hard 
to get this posted for everyone as soon as possible. My name is Arturo 
Bejar and I am one of the senior engineers at Electric Communities. I 
was responsible for implementing our Distributed Garbage Collector for 
E, a set of extensions to Java that allow for optimistic, distributed, 
and secure applications.

I just got permission from EC to post the algorithm to the list in the 
interest of contributing to the field. For those interested in the 
intellectual property status of the algorithm, see the end of this posting.

First some quick background:

- Piquer and Birrell did all the foundation work in using the IRG 
(inverse reference graph) for the purpose of distributed garbage 
collection. Fuchs built on it in his PhD dissertation; the URL is 
http://galt.cs.nyu.edu/students/fuchs, and you can find references to 
Piquer and Birrell in his paper.

- If you combine that with the local garbage collector, which does 
inexpensive cleanup of acyclic garbage, as Giuliano Carlini did in his 
earlier posting, then after some time all you have lying around is 
cycles. Actually, his algorithm is very similar to ours except for one 
detail; it was a good posting.

The idea for the algorithm came from topology, to be precise from a 
topological sort of a Hamiltonian chain. A Hamiltonian chain is a full 
traversal of a graph in which you touch each node exactly once. A 
topological sort is where you take the cycles in a graph and reduce 
them. (This is a massive simplification; for all the fellow 
mathematicians out there, I'll be happy to expound.)

So the starting point is to discover cycles as efficiently as possible: 
if there is a redundant path to a node in the IRG, that branch is 
disregarded in the evaluation. It turns out that if an object is in a 
cycle, all of the branches in its IRG end up disregarded.

Here's the algorithm:

We depend on the local garbage collector to clean up its pointers to 
remote objects when they're not being referred to by any local/persistent 
root.

A suspect node is one that is only referenced remotely and that refers 
to other objects remotely; otherwise it couldn't be in a cycle.
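
In a quick sketch (the class and field names here are invented for 
illustration, not taken from our E implementation), the suspect test 
amounts to:

    // Hypothetical per-object bookkeeping; the field names are invented
    // for illustration and are not from E's actual implementation.
    class GcRecord {
        boolean locallyRooted;  // reachable from a local/persistent root
        int remoteReferrers;    // remote objects that refer to this one
        int remoteReferents;    // remote objects this one refers to

        // Only a node kept alive solely by remote referrers, and itself
        // holding remote references, can be part of a distributed cycle.
        boolean isSuspect() {
            return !locallyRooted && remoteReferrers > 0 && remoteReferents > 0;
        }
    }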

When a suspect node [a] is to be inspected, it generates a big random 
identifier (n), and an AreYouRooted(n) message is sent to all the 
objects in the IRG of [a], propagating hop by hop under rule 1.2 below.

For every object in the IRG of [a], including [a] itself, the following rules apply:

1. Whenever an object [x] receives an AreYouRooted(n) message:

1.1 If the object is locally rooted on that machine, it replies with a 
Yes(n) message.

1.2 Otherwise, if this is the first time it has heard of (n), it stores 
the identifier and sends an AreYouRooted(n) message to all the objects 
that refer to it remotely.

1.3 *this is the key* If it already has (n) stored, it replies with a 
Disregard(n) message.

2. For every object in the IRG of [a], except [a] itself, that is processing (n):

2.1 If even a single Yes(n) message comes back, it replies with a Yes(n).

2.2 Otherwise, if all the responses it was waiting for are Disregard(n), 
it replies with a Disregard(n).

3. For [a] itself:

3.1 If a single Yes(n) comes back, [a] is remotely rooted, and it waits 
for a remote reference to it to drop, for time to pass, or for any other 
criterion before becoming suspect again.

3.2 If all the responses it was waiting for are Disregard(n), it 
self-destructs; that breaks the cycle, and the rest of it gets cleaned 
up as a consequence.
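
To make the rules concrete, here is a minimal Java sketch of rules 1-3, 
with direct method calls standing in for the network messages; every 
name in it is mine for illustration, and it is not our actual E 
implementation:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    // One object participating in the protocol; direct method calls
    // stand in for network messages. All names here are illustrative.
    class Node {
        final String name;
        final boolean locallyRooted;
        final List<Node> remoteReferrers = new ArrayList<>(); // IRG edges

        // Per-identifier state while an operation (n) passes through us.
        private static class Query {
            final Node asker;  // whom we must answer; null if we are [a]
            int awaited;       // replies still outstanding
            boolean answered;  // set once we have replied or concluded
            Query(Node asker, int awaited) {
                this.asker = asker;
                this.awaited = awaited;
            }
        }
        private final Map<Long, Query> pending = new HashMap<>();

        Node(String name, boolean locallyRooted) {
            this.name = name;
            this.locallyRooted = locallyRooted;
        }

        // A suspect [a] starts an inspection under a big random
        // identifier; a suspect has remote referrers by definition.
        void inspect() {
            long n = new Random().nextLong();
            pending.put(n, new Query(null, remoteReferrers.size()));
            for (Node r : remoteReferrers) r.areYouRooted(n, this);
        }

        // Rule 1.
        void areYouRooted(long n, Node from) {
            if (locallyRooted) {                  // 1.1: we are a root
                from.yes(n);
            } else if (!pending.containsKey(n)) { // 1.2: first we hear of (n)
                Query q = new Query(from, remoteReferrers.size());
                pending.put(n, q);
                if (remoteReferrers.isEmpty()) {  // vacuously all-Disregard (2.2)
                    q.answered = true;
                    from.disregard(n);
                } else {
                    for (Node r : remoteReferrers) r.areYouRooted(n, this);
                }
            } else {                              // 1.3: redundant path, prune
                from.disregard(n);
            }
        }

        // Rules 2.1 / 3.1: a single Yes(n) settles the question.
        void yes(long n) {
            Query q = pending.get(n);
            if (q == null || q.answered) return;
            q.answered = true;
            if (q.asker != null) q.asker.yes(n);                  // 2.1
            else System.out.println(name + ": remotely rooted");  // 3.1
        }

        // Rules 2.2 / 3.2: unrooted only if every reply was Disregard(n).
        void disregard(long n) {
            Query q = pending.get(n);
            if (q == null || q.answered || --q.awaited > 0) return;
            q.answered = true;
            if (q.asker != null) q.asker.disregard(n);            // 2.2
            else System.out.println(name + ": cyclic garbage, self-destruct"); // 3.2
        }

        public static void main(String[] args) {
            // A two-object distributed cycle with no local root anywhere:
            Node a = new Node("a", false), b = new Node("b", false);
            a.remoteReferrers.add(b); // b refers to a remotely
            b.remoteReferrers.add(a); // a refers to b remotely
            a.inspect();  // prints "a: cyclic garbage, self-destruct"
        }
    }

Note how rule 1.3 does the pruning: the second time (n) reaches a node, 
that redundant branch is answered with an immediate Disregard(n), so 
each node does real work for a given (n) only once.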

This is deadlock free, inexpensive, and can be run asynchronously. 
Actually, the lazier you are about running it the better, because after 
a while you can be pretty sure that what is left lying around is 
cyclical garbage.

To reestablish a node as suspect once it has been established as 
remotely rooted, you can watch for the remote references to it to 
change, especially for dropped references.

The issue of limiting the number of times a node is touched over a 
large number of concurrent gc operations can be addressed in different 
ways. The first is creative use of timing, i.e. waiting for some time 
before actually passing on a request, especially if the node is the 
starter of an operation, because then you can leverage the result of 
the evaluation already under way. Note that we don't actually block, 
because that can lead to a deadlock situation. The other way, which is 
in the current implementation, is to wait for a number of requests to 
pile up before sending them out: rather than sending n separate 
messages at any given time, we group the requests together, so any 
given node is touched once and the number of network messages stays down.
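
As a rough sketch of that batching idea (the queue, the duplicate 
check, and the flush threshold are assumptions for illustration, not 
our actual code):

    import java.util.ArrayDeque;
    import java.util.HashSet;
    import java.util.Queue;
    import java.util.Set;

    // Batches outgoing AreYouRooted requests for one remote peer:
    // instead of sending each query as it arises, we queue them and
    // flush in one network message, so the peer is touched once per
    // batch and each identifier is forwarded at most once.
    class RequestBatcher {
        private final Queue<Long> outgoing = new ArrayDeque<>();
        private final Set<Long> queued = new HashSet<>(); // dedupe (n)s
        private static final int FLUSH_THRESHOLD = 32;    // assumption

        void enqueue(long n) {
            if (queued.add(n)) outgoing.add(n);  // each (n) queued once
            if (outgoing.size() >= FLUSH_THRESHOLD) flush();
        }

        void flush() {
            // One message carrying all pending identifiers; stands in
            // for the real network send.
            System.out.println("send batch: " + outgoing);
            outgoing.clear();
            queued.clear();
        }
    }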

Electric Communities is seeking intellectual property protection on the 
key aspects of the algorithm; at the same time, it is more than open to 
discussing very liberal licensing terms with interested commercial 
parties.

For those interested in implementing optimistic, secure, distributed 
computation in a cool language, please check out our web site at 
http://www.communities.com.

Please let me know what you think; I hope this is useful to the group.

Arturo

___________________________________________________________________
Arturo Bejar                   "What you can see, you can achieve."
arturo@communities.com                          - J. Verne, sort of