[gclist] Distributed Garbage Collection Revisited...

Gustavo Rodriguez-Rivera grr@cs.purdue.edu
Sun, 30 Jun 1996 17:12:52 -0500 (EST)


These are my comments (which I hope are not too late) on what has been
said so far about distributed garbage collection with back-tracing:

---
> From Arturo:
>
>> From Gustavo
>>In my approach all the state related to a back tracing is represented
>>by a single token-message that has a list of the ids of the proxies
>>involved in the back tracing. The token-message is first created for
>>a suspect. When the token is created, all the ids of the proxies whose
>>scions reach the suspect are added to the token. The token is sent to
>>the node where the next unvisited proxy lives.
>
>Does sending this information around raise any security concerns?
>

Yes. It raises some security concerns that I am still working on. On one
hand, it is necessary to make sure that the information in the token has no
use other than for garbage collection. That could be accomplished by using
private and public keys, so that the node that receives the token can access
only the ids it is interested in. On the other hand, it is necessary to make
sure that tokens cannot be maliciously generated. This problem also exists
in the other approaches. I am still working on it, but I am sure the problem
of security can be worked out.
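As a rough illustration of the second concern, here is a minimal Python
sketch of authenticating token entries so that tokens cannot be forged.
The design above suggests public/private keys; this stdlib-only sketch
uses a shared secret (an HMAC) to make the same point, and every name in
it is hypothetical rather than part of the actual design:

    import hashlib
    import hmac

    def seal_entry(proxy_id, secret):
        # Attach a MAC so a receiver holding `secret` can check that the
        # entry was produced by a legitimate collector, not forged.
        tag = hmac.new(secret, proxy_id, hashlib.sha256).digest()
        return proxy_id, tag

    def verify_entry(proxy_id, tag, secret):
        expected = hmac.new(secret, proxy_id, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)

    # Example: an entry sealed on one node and checked on another.
    entry = seal_entry(b"node7/proxy42", b"per-pair secret")
    assert verify_entry(*entry, b"per-pair secret")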

---
> From Matthew:
>
>> From Gustavo
>> More than a year ago (April 28, 1995) in my preliminary
>> examination I proposed an algorithm for distributed garbage
>> collection similar to the one Carlini and Fuchs describe.
>> You can retrieve the report that I handed in to the members
>> of my committee prior to the preliminary examination from:
>>
>>    http://www.cs.purdue.edu/homes/grr/back-tracing-draft.ps
>>
>
>Does this mean that your algorithm will be the subject of your
>dissertation?
>

 Yes, it does. I hope to have a working prototype in the next few months.
Currently I am modifying our conservative garbage collector to
obtain the local back references. This is similar to the RV
collector that Giuliano proposes. As a first step I am going
to implement it in Lingua Franca, our distributed system, and later
implement it in ORBIX. The latter will be difficult since I am
not able to modify the ORBIX internals. My guess is that I will need to
instrument the code of each method call to add code for object-id
exchange and for the write barrier.
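To make the last point concrete, here is a minimal, self-contained Python
sketch (every name in it is hypothetical) of the kind of per-call
instrumentation I have in mind when the ORB internals cannot be changed:
each remote method is wrapped so the collector sees which remote ids are
touched (the write barrier) before the call goes out.

    touched_remote_ids = set()   # stand-in for the collector's state

    def gc_write_barrier(remote_id):
        # Record that a reference involving `remote_id` was exercised.
        touched_remote_ids.add(remote_id)

    def instrument(remote_id, method):
        # Wrap a remote method so every call runs the write barrier first.
        def wrapper(*args, **kwargs):
            gc_write_barrier(remote_id)
            return method(*args, **kwargs)
        return wrapper

    # Example: ping = instrument("node7/obj42", ping); ping()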

>> 
>> Also I would like to make the implementation of the distributed
>> garbage collector work transparently with a commercial CORBA
>> implementation such as ORBIX.
> 
>This is a great idea.  I would suggest we start looking at standard
>interfaces for DGC.
>

 Good idea. Pretty soon commercial object-oriented distributed systems
will feel a strong need for distributed garbage collectors. Having a
distributed garbage collection interface ready that fits everybody
will be useful.

> Have you considered a Linda implementation of your algorithm?  I'm not
> a Linda fan personally, but some of the ideas in your approach sound
> like they might map really nicely to Linda's distributed structures.
> 
> You might even be able to have a gc arise in a completely asynchronous
> fashion, i.e., objects start looking for gc tuples when they think
> they might be garbage, and any object in the operation which discovers
> it is alive can eat the tuple at once.  
>
> I can't speak to the efficiency of it, but it could be a pretty
> elegant approach.

Thanks for the suggestion. This could give rise to an interesting experiment.
Unfortunately, I may not be able to implement the garbage collection algorithm
in Linda because of time limitations. However, in my thesis I will include
a comparison between Linda's tuples and the tokens in my algorithm.
Thanks again for this idea.

-----
>
> From Giuliano:
>
> Looks like good work. I'll be looking at it more carefully in a few days.

Thanks.

> I've got a couple of quick comments, though.
>
>> From Gustavo:
>>It stores all the
>>back-tracing status in a single token-message to allow simultaneous
>>back-tracings (iii.). It uses the RPC counter of exported objects as
>>the barrier to verify the validity of back-references (iv.).
>
>At first, I thought about using this approach. I mailed a prior version 
>using this suggestion to Paul Wilson asking for his impressions. I 
>scrapped it for two reasons. First, due to the problem you note: the 
>message size increases with the length of the chain. As a chain becomes 
>very long, the cost of transmission becomes very expensive. It has been 
>far too long since I studied complexity theory, so I may be wrong on the 
>following. I believe that the cost of transmission is the sum of 1 to N, 
>which is N(N+1)/2. This "reduces" to a cost of O(N^2). I believe the 
>transmission costs of the algorithms which I, Matt, and Arturo propose 
>are O(N).
>
>
> While in practice it is unlikely for the back-trace messages to get large, 
>it will happen occasionally. I'm not sure, but I fear your approach may 
>fail catastrophically in such cases.
>

Yes. This can cause a long token. However, I was thinking that the token does
not need to physically travel around. It can physically reside in
the node that started the back tracing and travel around just logically.
That node can request from the other nodes the back pointers it needs to
construct the token. In this case the cost in space is O(n) and the cost of
transmission is proportional to the number of edges of the graph. The cost is
similar to that of the algorithm you propose.
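Here is a minimal Python sketch of that "logical travel" idea; the
fetch_back_pointers callback is hypothetical, standing in for one remote
request per proxy, so transmission is one request/reply per edge of the
inverse reference graph while all of the trace state stays on the origin:

    def back_trace(suspect, fetch_back_pointers):
        # fetch_back_pointers(proxy_id) stands in for a remote call
        # returning the ids of proxies whose scions reach proxy_id.
        token = [suspect]          # all of the trace state, held locally
        frontier = [suspect]
        while frontier:
            proxy = frontier.pop()
            for pred in fetch_back_pointers(proxy):  # one message per edge
                if pred not in token:
                    token.append(pred)
                    frontier.append(pred)
        return token

    # Example with an in-memory stand-in for the remote calls:
    graph = {"A": ["B"], "B": ["C"], "C": ["A"]}     # a garbage cycle
    print(back_trace("A", lambda p: graph.get(p, [])))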

A hybrid solution could be to start the token as before, and then when the
token reaches some length, send it back to the node that started the
back search. The construction of the token is then resumed, but now the
token travels around just logically.
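A minimal sketch of that hybrid decision (the threshold value is my
assumption, not part of the proposal):

    MAX_PHYSICAL = 64   # assumed cut-off for physical travel

    def next_hop(token, origin_node, next_node):
        # Once the token grows past the threshold, it goes home and the
        # origin node resumes the trace logically, as described above.
        if len(token) > MAX_PHYSICAL:
            return origin_node
        return next_node

    # Example: a short token keeps moving; a long one returns home.
    assert next_hop(list(range(10)), "origin", "n5") == "n5"
    assert next_hop(list(range(100)), "origin", "n5") == "origin"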

> The second reason is that this approach is likely to duplicate the 
>back-search many times. It is very likely that several nodes on the cycle 
>will start a search "at the same time". If you allow them all to proceed, 
>this increases the cost to O(N^3). If you suspend or abandon some, you 
>must deal with deadlock.

A mechanism can be devised where the start of a back search is delayed by
some back-off time if another back search is in progress. It is still possible
that more than one back trace could run simultaneously; however, only in very
improbable cases will all of the suspects start a back trace at the same time.
Observe that delaying a back trace does not cause deadlock, since the
back trace is not suspended.
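A minimal Python sketch of that back-off (the two callbacks are
hypothetical stand-ins for the real collector hooks):

    import random
    import time

    def start_back_search(search_in_progress, run_search,
                          base_delay=0.1, max_tries=5):
        # Delay with a randomized, growing back-off while another search
        # runs; never suspend, so no deadlock can arise.
        delay = base_delay
        for _ in range(max_tries):
            if not search_in_progress():
                return run_search()
            time.sleep(delay + random.random() * delay)
            delay *= 2
        return run_search()   # proceed anyway; duplicates are tolerated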

In cases where a lot of things can go wrong, it is not a bad idea to have
some extra redundancy. For example, TCP/IP has a lot of redundancy, but it
also has heuristics to reduce the extra overhead that comes with it. I think
this is why it has worked so well.

> If you continue with this approach, you may want to consider an 
> optimization I rejected in favor of my current approach. You need only add 
> a zombie to the list if it has more than one reachable stub. This works 
> because the only way to reach a scion in this list is through either the 
> original zombie or one which branched out. Thus, the cycle would have been 
> detected at those points, and there is no need to include scions which 
> reach a single stub in the message. While the cost is still O(N^2), the 
> proportionality constant is dramatically reduced.

Thanks for the suggestion. However, a zombie with one reachable stub still
has to be added to the token for the verification phase. Otherwise there is
no way to detect changes in the inverse graph when the changes are made
through this zombie.

Maybe it seems that I am very stubborn about keeping all the state of a back
trace in a single place. But a lot of things can go wrong, especially in a
network the size of the Internet. If a back trace is aborted for some reason
(network crash, system crash, etc.), you have to deal with cleaning up all
the previous state. Having all the state in a single place makes the cleanup
easier, whereas if the state of the back trace is distributed all over the
place, the cleanup will be painful. This is especially true in your approach,
since only one back trace can take place at a time, and no other back trace
can start until the current one ends.

I am getting a lot of insight from this discussion.
It is great that you brought distributed garbage collection to this forum.

	--Gustavo

----

P.S. Here are some more comments on Giuliano's first proposal.

>
> From Giuliano:
> REMOTE VISIBILITY SCANNER
>
>The remotely visible scanner uses the scions as its roots. It is
>responsible for setting the remotely visible bit for all objects
>it scans. It is also responsible for adding stubs to visas' reachable
>lists, and visas to the stubs' reached-from lists. If one of these lists
>is too small, it may not be possible to allocate memory to expand them
>to add new entries. In this case, the RV scanner will increment the
>deficiency counter for the list. After the local GC completes and the
>allocator state is consistent, these lists can be expanded. They will
>then be ready for the next run of the RV scanner.

This is implementation dependent. The implementations of garbage
collectors I have seen allow allocation of scratch memory during
garbage collection. During a collection it is sometimes necessary to
allocate extra memory for the mark stack, dirty bits, black lists, etc.
Of course, you may be thinking of implementing your algorithm on top of a
local garbage collector that does not allow memory allocation during a
collection.


>From Giuliano:
>
>  AVOIDING DEADLOCK:
>
> The last problem to handle is avoiding deadlock. A deadlock can occur
> when several zombies in a cycle start back-searches. As each comes
> upon the others' visas, they will see that the visa's back-search id
> is non-null and not equal to their own id, and suspend. Deadlock.

Have you thought about using a total order on the back-search ids?
That means having a way to compare any id against any other; the
back search with the maximum id is the one that progresses, and the other
ones are suspended. Matthew proposes this in his paper.
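A minimal Python sketch of such a total order (pairing a timestamp with a
node id is my assumption, just a common way to make ids unique and
comparable):

    def make_search_id(start_time, node_id):
        return (start_time, node_id)   # unique id, compared lexicographically

    def should_proceed(my_id, other_id):
        # The search with the larger id progresses; the other suspends.
        return my_id > other_id        # tuple comparison gives a total order

    # Example: two searches started at the same instant on nodes 3 and 7;
    # the node-id tie-break keeps the order total, so exactly one proceeds.
    a, b = make_search_id(100, 3), make_search_id(100, 7)
    assert should_proceed(b, a) and not should_proceed(a, b)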