[gclist] Re: gclist-digest V1 #70

Arturo Bejar arturo@communities.com
Thu, 13 Jun 96 15:58:14 -0700


>I reiterate my comment that you might as well go forward because it is less
>costly.

If you go forward and discover that you are in a cycle, then what? There 
is no way to tell from the forward reference graph whether you're alive 
or not.

In our case there is no such expense, since the implementation 
generates the inverse reference graph, so the messages sent backward are 
exactly as expensive as the ones sent forward.

>More importantly, I think this algorithm is incorrect.  The mutator can be
>concurrently affecting the graph in a way that tricks this algorithm.
>Consider this simple graph:
>
>           x -----> A -------> B ------> C
>                    ^                    |
>                    |                    |
>                    ----------------------
>
>x is a root pointer accessible to the mutator.  Each object A, B, C has a
>field 'next' pointing to the next one in the circular list.  The following
>actions occur:
>
>        C sends areYouRooted to B
>        x := x.next.next     (x now points to C)
>        B sends areYouRooted to A
>        x := x.next.next     (x now points to B)
>        A sends areYouRooted to C
>        C responds Disregard to A
>        x := x.next          (x now points to C)
>        A responds Disregard to B
>        x := x.next          (x now points to A)
>        B responds Disregard to C
>        C self-destructs
>        x := x.next.next     Program crashes!
>
>To make the algorithm correct you need to add a write barrier between the
>forward and backward pass.  Even then, I would not trust it until I see a
>proof and an implementation.

No, you don't. I posted last week how we handle these situations in the 
concrete implementation; we've thrown very pathological cases at the dgc 
and it hasn't had any problems. In fact this implementation is very 
mutator friendly (in the distributed sense), because destruction looks 
the same as a network failure; we avoid terminate messages because those 
can be spoofed. The point you raise was initially brought to light by 
Rick Hudson, so here's a repost of the reply (for those of you who have 
already read it there's nothing new here).

Posted last week:

Rick,

Thanks for your feedback. As you probably know, in a distributed system 
there is not much one can do about constantly changing references; there 
is no inexpensive way to establish a lock the way you can with a local gc.

In a way, because of network latency, the algorithm runs at a very 
relative time t on the network: on the way out, the reference graph is 
seen at time t by the first node, at (t + latency) by the next, and at (t 
+ latency1 + ...) by any later node, and the latencies add up again on 
the way back. This would apply to any algorithm.

There are a number of ways to address your point. The formal one is that 
[A], like any other node, can detect a change in the set of objects 
directly referring to it, and you have to choose a policy for handling 
such changes. For example, if while [A] is waiting for a response from B 
it notices that a new reference to it (from [M]) has shown up, it sends a 
message with the same (n) to [M], which will be replied to positively. If 
[M] is within the same machine boundary as [A], [A] simply becomes 
locally rooted again.
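The policy above might be sketched as follows. This is a hypothetical, 
synchronous stand-in for the asynchronous message protocol; the names 
(Node, areYouRooted, handleNewReferrer) are illustrative, not the actual 
implementation:

```java
class Node {
    final String id;
    final String machine;            // machine boundary this node lives on
    boolean locallyRooted;

    Node(String id, String machine, boolean rooted) {
        this.id = id; this.machine = machine; this.locallyRooted = rooted;
    }

    // Synchronous stand-in for the areYouRooted(n) message: does this
    // node consider itself rooted in round n?
    boolean areYouRooted(int n) { return locallyRooted; }

    // Policy sketch: while waiting on round n, a new referrer m shows up.
    // A same-machine referrer makes this node locally rooted on the spot;
    // a remote one is re-queried with the same round number n.
    boolean handleNewReferrer(Node m, int n) {
        if (m.machine.equals(this.machine)) {
            locallyRooted = true;
            return true;
        }
        return m.areYouRooted(n);
    }
}
```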

We can detect these changes easily because we use the mark & sweep of the 
local Java GC to build the inverse reference graph of a specific set of 
objects, which we then work on.
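A minimal sketch of that idea: during an ordinary mark-style traversal 
from the roots, every forward edge that gets visited is recorded 
backwards. The class and map-of-strings representation are my own 
simplification, not the actual implementation:

```java
import java.util.*;

class InverseGraph {
    // Walk the forward reference graph from the roots (what a mark phase
    // does anyway) and record each visited edge obj -> ref backwards,
    // yielding for every reached object the set of its direct referrers.
    static Map<String, Set<String>> build(Map<String, List<String>> fwd,
                                          Collection<String> roots) {
        Map<String, Set<String>> inv = new HashMap<>();
        Deque<String> stack = new ArrayDeque<>(roots);
        Set<String> marked = new HashSet<>(roots);
        while (!stack.isEmpty()) {
            String obj = stack.pop();
            for (String ref : fwd.getOrDefault(obj, List.of())) {
                inv.computeIfAbsent(ref, k -> new HashSet<>()).add(obj);
                if (marked.add(ref)) stack.push(ref);
            }
        }
        return inv;
    }
}
```

Run on the x -> A -> B -> C -> A graph from the example above, this 
yields A referred to by {x, C}, B by {A}, and C by {B}, which is the 
graph the backward messages travel over.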

The second way of handling it is implementation related: we don't 
actually destroy A, we make it drop the remote references that it holds 
and the remote references to it (the pointer to B in this case), 
something that could equally happen in a network failure or partition. As 
is usual in distributed programming, any object that has to talk to 
another across a machine boundary needs to be able to cope with failure. 
Then we let the local GC clean it up.
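In sketch form, "destruction" is just severing remote edges in both 
directions; the RemoteObject class below is a hypothetical illustration, 
assuming in-process sets stand in for cross-machine reference tables:

```java
import java.util.*;

// Sketch: once a cycle member drops the remote references it holds and
// the remote references to it, the situation looks to its peers exactly
// like a network partition, and the local collector then reclaims
// whatever has become unreachable.
class RemoteObject {
    final String id;
    final Set<RemoteObject> outgoing = new HashSet<>(); // refs we hold
    final Set<RemoteObject> incoming = new HashSet<>(); // refs to us

    RemoteObject(String id) { this.id = id; }

    void refer(RemoteObject target) {
        outgoing.add(target);
        target.incoming.add(this);
    }

    void dropRemoteReferences() {
        for (RemoteObject t : outgoing) t.incoming.remove(this);
        for (RemoteObject s : incoming) s.outgoing.remove(this);
        outgoing.clear();
        incoming.clear();
    }
}
```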

We could follow a more aggressive strategy for cleaning up the cycle by 
sending terminate messages, but those could be spoofed. If all of the 
machines running the algorithm were within a trusted boundary, such as a 
single company, then we could pursue much more aggressive collection 
strategies.

Let me know what you think,
Arturo


___________________________________________________________________
Arturo Bejar                   "What you can see, you can achieve."
arturo@communities.com                          - J. Verne, sort of