[gclist] Dgc algorithm for catching cycles
Arturo Bejar
arturo@communities.com
Fri, 7 Jun 96 17:47:39 +0100
Hi,
I apologise for not introducing myself earlier, but I've been working
hard to get this posted for everyone as soon as possible. My name is
Arturo Bejar and I am one of the senior engineers at Electric
Communities. I was responsible for implementing our Distributed Garbage
Collector for E, a series of extensions to Java that allow for
optimistic, distributed and secure applications.
I just got permission from EC to post the algorithm to the list in the
interest of contributing to the field. For those interested in the
intellectual property status of the algorithm, see the end of the posting.
First some quick background:
- Piquer and Birrell did all the foundation work in using the IRG for
the purpose of distributed garbage collection. Fuchs contributed to it
in his PhD dissertation; the URL is http://galt.cs.nyu.edu/students/fuchs,
and you can find references to Piquer and Birrell in his paper.
- If you combine that with a local garbage collector that inexpensively
cleans up acyclic garbage, as Giuliano Carlini described in his earlier
posting, then after some time all you have lying around is cycles. His
algorithm is actually very similar to ours except for one detail; it was
a good posting.
The idea for the algorithm came from topology, to be precise from a
topological sort of a Hamiltonian chain. A Hamiltonian chain is a full
traversal of a graph that touches each node exactly once. A topological
sort takes the cycles in a graph and reduces them. (This is a massive
simplification; for all the fellow mathematicians out there, I'll be
happy to expound.)
So the starting point is to discover cycles as efficiently as possible:
if there is a redundant path to a node in the IRG, that branch is
disregarded in the evaluation. It turns out that if an object is in a
cycle, all of the branches in its IRG are disregarded.
Here's the algorithm:
We depend on the local garbage collector to clean up its pointers to
remote objects when they're not being referred to by any local/persistent
root.
A suspect node is one that is only referenced remotely and itself refers
to other objects remotely; otherwise it couldn't be in a cycle.
When a suspect node [a] is to be inspected, it generates a big random
identifier (n) and sends an AreYouRooted(n) message to all the objects
in the IRG of [a].
For every object in the IRG of [a], including [a] itself, the following rules apply:
1. Whenever an object [x] receives an AreYouRooted(n) message:
1.1 If the object is locally rooted in that machine it replies with a
Yes(n) message.
1.2 Otherwise, if this is the first time it has heard of (n), it stores
the identifier and sends an AreYouRooted(n) message to all the objects
that refer to it remotely.
1.3 *This is the key.* If it already has (n) stored, it replies with a
Disregard(n) message.
2. For every object in the IRG of [a], except [a] itself, that is processing (n):
2.1 If a single Yes(n) message comes back, it replies with a Yes(n).
2.2 Otherwise, if all the responses it was waiting for are Disregard(n),
it replies with a Disregard(n).
3. For [a]:
3.1 If a single Yes(n) comes back, it becomes remote rooted and waits
for either a remote reference to it to drop, or time to pass, or any
other criterion, before becoming suspect again.
3.2 If all the responses it was waiting for are Disregard(n), it
self-destructs, which breaks the cycle, and the rest of the cycle gets
cleaned up as a consequence.
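To make the steps concrete, here is a minimal single-process simulation
of the protocol, with synchronous recursion standing in for the
asynchronous messages. All names are my own illustration, not EC's code:

```python
# Toy simulation of the cycle-detection protocol described above.
import random

YES, DISREGARD = "Yes", "Disregard"

class Node:
    def __init__(self, name, locally_rooted=False):
        self.name = name
        self.locally_rooted = locally_rooted
        self.referrers = []   # objects that refer to this node (its IRG edges)
        self.seen = set()     # identifiers (n) already heard of (rules 1.2/1.3)

    def refers_to(self, other):
        other.referrers.append(self)

    def are_you_rooted(self, n):
        # Rule 1.1: a locally rooted object replies Yes(n).
        if self.locally_rooted:
            return YES
        # Rule 1.3 (the key): (n) already stored means a redundant
        # path, so this branch is disregarded.
        if n in self.seen:
            return DISREGARD
        # Rule 1.2: first time hearing of (n) -- store it and ask the
        # objects that refer to this node remotely.
        self.seen.add(n)
        replies = [r.are_you_rooted(n) for r in self.referrers]
        # Rules 2.1/2.2: a single Yes propagates back; all-Disregard
        # replies collapse into one Disregard.
        return YES if YES in replies else DISREGARD

def inspect_suspect(a):
    n = random.getrandbits(64)   # the big random identifier (n)
    a.seen.add(n)
    replies = [r.are_you_rooted(n) for r in a.referrers]
    # Rules 3.1/3.2.
    return "remote rooted" if YES in replies else "self-destruct"

# A pure two-node remote cycle, nothing locally rooted:
a, b = Node("a"), Node("b")
a.refers_to(b)
b.refers_to(a)
print(inspect_suspect(a))   # -> self-destruct
```

Adding a locally rooted referrer anywhere in the graph makes a Yes(n)
flow back to [a], so it is classified as remote rooted instead.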
This is deadlock free, inexpensive, and can be run asynchronously. In
fact, the lazier you are about running it the better, because after a
while you can be pretty sure that what's left lying around is cyclical
garbage.
To reestablish a node as suspect once it has been established as remotely
rooted, you can watch for remote references to it to change, especially
dropped references.
The issue of limiting the number of times a node is touched over a
large number of concurrent gc operations can be addressed in different
ways. The first is creative use of timing, i.e. waiting for some time
before actually passing on a request, especially if the node is the
starter of an operation, because then you can leverage the result of
that evaluation. Note that we don't actually block, because that can
lead to a deadlock situation. The other way, which is in the current
implementation, is to wait for a number of requests to pile up before
sending them out: rather than sending n messages at any time, we group
the requests together, so any given node is touched once and the number
of network messages stays down.
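The grouping idea can be sketched as a per-destination queue that
flushes as one message. All names here are mine; the actual EC
implementation is not public:

```python
# Toy sketch of the batching optimization: instead of forwarding each
# AreYouRooted(n) immediately, pending identifiers are queued per
# destination and flushed as one grouped message.
from collections import defaultdict

sent = []  # (destination, [n, ...]) messages actually put on the wire

def send_are_you_rooted_batch(destination, ns):
    # Stand-in for the real network send; one message per batch.
    sent.append((destination, list(ns)))

class RequestBatcher:
    def __init__(self, flush_threshold=8):
        self.flush_threshold = flush_threshold
        self.pending = defaultdict(list)   # destination -> queued (n)s

    def enqueue(self, destination, n):
        self.pending[destination].append(n)
        # Once enough requests pile up, send them as a single message,
        # so a given node is touched once per batch rather than per (n).
        if len(self.pending[destination]) >= self.flush_threshold:
            self.flush(destination)

    def flush(self, destination):
        batch = self.pending.pop(destination, [])
        if batch:
            send_are_you_rooted_batch(destination, batch)
```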
Electric Communities is seeking intellectual property protection on the
key aspects of the algorithm; at the same time, it is more than open to
discussing very liberal licensing terms with interested commercial
parties.
For those interested in implementing optimistic, secure, distributed
computation in a cool language, please check out our web site at
http://www.communities.com.
Please let me know what you think, I hope that this is useful to the
group.
Arturo
___________________________________________________________________
Arturo Bejar "What you can see, you can achieve."
arturo@communities.com - J. Verne, sort of