[gclist] Re: gclist-digest V1 #70

Gustavo Rodriguez-Rivera grr@cs.purdue.edu
Sun, 9 Jun 1996 20:54:55 -0500 (EST)


Dear GC-LIST members,

My name is Gustavo Rodriguez-Rivera. I am a Ph.D. student
in the Computer Science Department at Purdue University.

More than a year ago (April 28, 1995), in my preliminary
examination, I proposed an algorithm for distributed garbage
collection similar to the ones Carlini and Fuchs describe.

You can retrieve the report that I handed in to the members
of my committee prior to the preliminary examination from:

    http://www.cs.purdue.edu/homes/grr/back-tracing-draft.ps

and the corresponding slides I used during my preliminary
examination:

    http://www.cs.purdue.edu/homes/grr/back-tracing-talk.ps

I invite you to read the report and send me your comments.

I did not want to publish this document before I had a working
implementation that could prove the usefulness of the algorithm.
However, given the circumstances, I am making the document available
to the public before having a working implementation.

Currently I am implementing the distributed garbage collection
algorithm using a conservative garbage collector of our own invention

  http://www.cs.purdue.edu/homes/grr/NON-INT-GC-MSPLS-95-ABSTRACT
  http://www.cs.purdue.edu/homes/grr/NON-INT-GC-MSPLS-95-SLIDES

and the Lingua Franca Distributed System

	http://www.cs.purdue.edu/homes/russo/LF.ps

I would also like to make the implementation of the distributed
garbage collector work transparently with a commercial CORBA
implementation such as ORBIX.



       DIFFERENCES BETWEEN CARLINI's, FUCHS', AND MY APPROACH
       ------------------------------------------------------

Carlini's, Fuchs', and my approaches share the same concept of
distributed garbage collection through back-tracing. However, there
are several differences among the proposals.

I consider that, in order to implement the back-tracing algorithm,
there are at least four issues that must be solved:

	i.   How to obtain the back-references needed for back-tracing
	     (local back-references and remote back-references).

	ii.  How to determine garbage suspects (zombies or GCRs).

	iii. How to synchronize multiple back-tracings running at the
	     same time.

	iv.  How to handle back-references that change during the
	     back-tracing.


1) Fuchs in his paper

	http://galt.cs.nyu.edu/students/fuchs/dist-gc.ps

describes the basic back-tracing algorithm and addresses ii. and iii.
However, he assumes that the back-references are already given (i.).
He also does not explain how to handle changes in the back-references
while the back-tracing is in progress (iv.). The latter could cause
live objects to be erroneously collected.

Carlini's proposals (ftp://iecc.com/pub/gclist/messages) address all
of these issues. However, his solutions for iii. and iv. are not yet
complete.

My approach addresses all four issues.
It uses auxiliary tracing to obtain local back-references, and
scion-stub pairs for remote back-references (i.). Suspects are chosen
from among the exported objects that are not reachable locally and
that have not been called for some time (ii.). It stores all the
back-tracing state in a single token-message to allow simultaneous
back-tracings (iii.). It uses the RPC counter of exported objects as
the barrier that verifies the validity of the back-references (iv.).

The solutions for i. and ii. are similar to Carlini's. The solutions
for iii. and iv. are different.
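
To make (ii.) concrete, here is a minimal sketch in Python of how
suspects could be chosen. All of the names here (ExportedObject,
SUSPECT_IDLE_SECONDS, pick_suspects) are hypothetical illustrations
of mine, not identifiers from the report:

    import time

    # Hypothetical idle threshold; the report does not prescribe a value.
    SUSPECT_IDLE_SECONDS = 600

    class ExportedObject:
        """An object reachable from other nodes through a scion."""
        def __init__(self, oid):
            self.oid = oid
            self.locally_reachable = True    # refreshed by the local trace
            self.last_rpc_time = time.time()

    def pick_suspects(exported_objects):
        """An exported object becomes a garbage suspect when it is not
        reachable from a local root and has not been called for some
        time."""
        now = time.time()
        return [o for o in exported_objects
                if not o.locally_reachable
                and now - o.last_rpc_time > SUSPECT_IDLE_SECONDS]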

Additionally, in the report I give a proof of correctness and
progress of the algorithm.


2) Carlini's approach has problems with iii. A deadlock may occur
if several zombies in a cycle start back-searches at the same time.
He tries to solve this problem by letting only one back-search
continue when a cycle is detected. The details of how to do this are
left as future work.

In my approach, all the state related to a back-tracing is represented
by a single token-message that carries a list of the ids of the proxies
involved in the back-tracing. The token-message is first created for
a suspect. When the token is created, the ids of all the proxies whose
scions reach the suspect are added to the token. The token is then sent
to the node where the next unvisited proxy lives. That node determines
whether any of the proxies in the list is reachable from a local root.
If so, the token is discarded. Otherwise, the current proxy is marked
visited in the token, and the ids of the unvisited proxies whose scions
reach the current proxy are added to the token. When all the proxies in
the token have been visited, the objects these proxies represent are
garbage. However, this is only true if the back-references have not
changed since the token was created. To verify that the back-references
have not changed, a verification phase starts; it is described below.
(If this explanation seems unfamiliar, I recommend reading the report
first.)
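
The following Python sketch simulates one hop of this token-message
protocol in a single process. Token, Node, and the method names are
assumptions of mine for illustration, not identifiers from the report
or from Lingua Franca:

    class Token:
        """All of the back-tracing state travels in this one message."""
        def __init__(self, suspect_id, initial_proxies):
            self.visited = {suspect_id}
            self.unvisited = set(initial_proxies)

    class Node:
        """Minimal stand-in for one address space in the system."""
        def __init__(self, owned, local_roots, back_refs):
            self.owned = owned              # proxy ids living on this node
            self.local_roots = local_roots  # proxy ids reachable locally
            self.back_refs = back_refs      # proxy id -> ids of proxies
                                            # whose scions reach it

        def reachable_from_local_root(self, p):
            return p in self.local_roots

        def proxies_reaching(self, p):
            return self.back_refs.get(p, set())

    def visit(node, token, proxy_id):
        """Handle the token at the node where proxy_id lives."""
        # If any proxy recorded in the token is reachable from a local
        # root on this node, the chain is live: discard the token.
        for p in token.visited | token.unvisited:
            if p in node.owned and node.reachable_from_local_root(p):
                return 'discard', None
        token.unvisited.discard(proxy_id)
        token.visited.add(proxy_id)
        # Add the unvisited proxies whose scions reach the current proxy.
        token.unvisited |= node.proxies_reaching(proxy_id) - token.visited
        if not token.unvisited:
            return 'verify', token    # all visited: start verification
        return 'forward', token       # send to next unvisited proxy's node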

Having all the state of the back-tracing stored in a single
token-message, rather than distributed among nodes, has the advantage
that no clean-up is necessary when abnormal events such as a system
crash, a network crash, or lost messages occur.

On the other hand, in the worst case the maximum size of the
token-message is the maximum number of edges that separate two
objects in the graph. This could be a problem if a cycle includes too
many objects.

3) Carlini's solution does not completely solve iv. Back-references
do not change only when new scions are added; they may also change as
a result of remote procedure calls (RPCs) on zombies.

    From Carlini's message:

    "But, there is a race condition. A back search may examine a stub,
    and start searches on each scion which can reach the stub. Each of
    these returns that no local reference was found. The back-search
    reports that no local reference was found. Then, a new scion is
    created which can reach the stub. The stub is now reachable, so the
    zombie is reachable. But, the back search for the stub has already
    reported that the stub was unreachable. Thus, the zombie may be
    collected."

    "The solution is to make a second pass, and to implement a "new
    scion" barrier. This is similar to the write barrier used by
    local collectors to determine that they must rescan an area of
    memory."

    "Whenever a stub has been scanned, the space is notified that new
    scions should be marked dirty. If the dirty scion is the first for
    the target, then the RV scan must be run on the scion. We then need
    only look at the target's visa. Any stub cross referenced by the visa
    needs to be marked as reachable. When the back-search returns to the
    stub, it must wait for all dirty scions created after the first scan
    to be checked. After they have, it can report on whether it is
    reachable or not."

Carlini's solution will fail in the following case.
Assume that zombie z1 reaches stub (proxy) p2. A back-search examines
p2 and starts a search on the scions that reach p2. It returns that
no local reference was found. Then an RPC on z1 assigns a pointer to
p2 to a global variable; now p2 is reachable from a local root. The
back-search starts the second pass. Since no scions were created, the
back-search reports that p2 is unreachable.

To solve this problem, I propose using an RPC counter as a barrier to
detect modifications of the back-references. Every exported object
has an RPC counter that is incremented every time an RPC on this
object occurs or a scion for this object is created. If this counter
has been incremented since the last time the back-references were
determined, the back-references below this object are invalid.

Note that the only ways a graph of objects that is reachable from
scions but not from local roots can be modified are through RPCs or
scion creation.
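
A minimal sketch of this barrier, extending the hypothetical
ExportedObject from the earlier sketch (again the method names are
mine, not the report's):

    class ExportedObject:
        """Exported object extended with the RPC counter barrier."""
        def __init__(self, oid):
            self.oid = oid
            self.rpc_counter = 0

        def on_rpc(self):
            # An RPC may modify the references below this object.
            self.rpc_counter += 1

        def on_scion_created(self):
            # A new scion may make this object reachable again.
            self.rpc_counter += 1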

In my solution, when a proxy id is added to the token-message,
the corresponding RPC counter of the object is also added to the token.

During the verification phase, the RPC counters in the token are
compared to the corresponding RPC counters of the objects. If they are
equal, the objects described in the token are garbage. Otherwise,
the information used for the back-tracing is no longer valid and the
token-message is discarded.
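
Continuing the same hypothetical sketch, and assuming the Token from
the earlier sketch also carries a counters dictionary, the
verification phase could look like this:

    def record_counter(token, obj):
        """Called when obj's proxy id is added to the token: store the
        current RPC counter next to the id."""
        token.counters[obj.oid] = obj.rpc_counter

    def verify(token, lookup):
        """Second pass over the objects named in the token.  'lookup'
        maps an object id back to its live ExportedObject.  A counter
        that moved means an RPC or a scion creation may have changed
        the back-references, so the token must be discarded."""
        return all(lookup(oid).rpc_counter == snapshot
                   for oid, snapshot in token.counters.items())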

I recommend reading the report to better understand the ideas
described here.

I strongly believe that a running prototype is necessary to fully
show that the algorithm is correct, works, and is useful.

My apologies for not making this information available earlier.

My appreciation also goes to Matthew Fuchs and Giuliano Carlini for
their illuminating ideas and for bringing distributed garbage
collection to the attention of this forum.


	Sincerely,

		Gustavo Rodriguez-Rivera
		Ph.D. Student
		Computer Science Department
		Purdue University
		West Lafayette, IN 47907

		grr@cs.purdue.edu
		http://www.cs.purdue.edu/people/grr