[gclist] Distr. GC general discussion for FAQ (was re: Weighted RC)
Paul R. Wilson
wilson@cs.utexas.edu
Thu, 21 Mar 1996 09:08:54 -0600
>From majordom@iecc.com Thu Mar 21 05:13:07 1996
>To: Marc Shapiro <shapiro@prof.inria.fr>
>From: Christian Fabre <fabre@gr.osf.org>
>
>that we have a maximum number of copies per object.
>
>Let us assume a 32-bit architecture. Each object starts with a
>Weight Credit of 2^31, and the Allocated Weight is divided by 2 at each
>duplication. In the worst case, if we duplicate 31 times we might end
>up with a duplicate having an AW of 1.
>
>1/2=0, so how do we proceed to duplicate that one?
>
>Or did I miss something obvious?
No, you're exactly right. The usual trick is to pin the weighted reference
count at 1, conservatively making that object seem "live" ever after,
and rely on a distributed tracing GC to reclaim the garbage. There are
also weighting schemes other than binary splitting, which don't run
out of weight as fast.
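To make the pinning trick concrete, here's a little sketch (Python, with
made-up names; it's not taken from any real collector). Duplicating a
reference splits its weight without talking to the owning node; once a
reference is down to weight 1 it gets pinned, and pinned references never
give their weight back, so the owner's total can never drain to zero and
the object is simply left for the tracing GC:

  MAX_WEIGHT = 2**31       # weight handed out when an object is first exported

  class Owner:
      """Stand-in for the per-object record on the owning node."""
      def __init__(self):
          self.total_weight = MAX_WEIGHT   # sum of all outstanding reference weights

      def return_weight(self, w):
          """In a real system this would be a message to the owning node."""
          self.total_weight -= w
          # total_weight == 0 means no outstanding references remain, so
          # the owner may reclaim the object (acyclic garbage only).

  class RemoteRef:
      def __init__(self, owner, weight):
          self.owner = owner
          self.weight = weight
          self.pinned = False

      def duplicate(self):
          """Copy this reference with no message to the owner."""
          if self.weight > 1:
              half = self.weight // 2
              self.weight -= half
              return RemoteRef(self.owner, half)
          # Out of weight: pin at 1.  Neither this reference nor the copy
          # will ever return weight, so the object looks live until a
          # distributed *tracing* GC proves otherwise.
          self.pinned = True
          copy = RemoteRef(self.owner, 1)
          copy.pinned = True
          return copy

      def drop(self):
          """Discard this reference, returning its weight to the owner."""
          if not self.pinned:
              self.owner.return_weight(self.weight)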
We need some background here...
Maybe this should go in the FAQ, although it's
my own analysis and may be controversial:
---
Distributed garbage collection is tricky, because tracing all reachable
data in a distributed system can take a very long time. If it takes
too long, the GC may effectively fail---you can run out of storage
space holding on to garbage. This is exacerbated by a potential
thrashing phenomenon. If too much garbage accumulates, and must be
paged to disk, the system may slow down even more.
Many designs use reference counting or reference listing to avoid
distributed tracing. Since reference counting can't reclaim cycles,
this can lead to leaks if cycles span nodes.
Many people who advocate reference counting/listing recommend using it in
conjunction with a more complete tracing GC. A common scheme is:
1. Use a local tracing collector at each node to get back local
garbage quickly. ("Local" means not reachable from *any* remote
object, live or dead.) The local tracing collection reveals
pointers to other nodes for the distributed GC or GC's.
2. Use distributed reference counting or reference listing to get
back acyclic distributed garbage fairly quickly.
3. Use distributed tracing of some sort to get back distributed
cyclic garbage (and anything reachable from the cycles) eventually.
Note that this is generally a parallel incremental collector, whose
increments are done at local tracing collections.
(Some GC's leave out one of these 3 components.)
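To make piece 2 a little more concrete, here is a toy sketch of reference
listing (Python, hypothetical names, not any particular collector): the
owning node keeps, per exported object, the set of nodes holding
references to it, and a node reports back, typically after its local GC,
when it has dropped its last reference to some entry.

  class ExportTable:
      """Per-node table of objects this node has handed out to others."""

      def __init__(self):
          self.holders = {}        # object id -> set of remote node names

      def exported(self, obj_id, to_node):
          """A reference to obj_id was just sent to to_node."""
          self.holders.setdefault(obj_id, set()).add(to_node)

      def released(self, obj_id, from_node):
          """from_node's local GC found it no longer uses obj_id."""
          nodes = self.holders.get(obj_id)
          if nodes is None:
              return
          nodes.discard(from_node)
          if not nodes:
              # No remote holders left: the entry goes away, and the next
              # *local* trace may reclaim obj_id if it is also locally
              # unreachable.
              del self.holders[obj_id]

      def remote_roots(self):
          """The conservative extra roots for this node's local GC."""
          return set(self.holders)

Listing the holder nodes rather than just counting them costs more space,
but it's generally easier to make robust against duplicated or lost
messages and against crashed nodes. Either way, a dead cycle that spans
nodes never shows up as "no holders left," which is why piece 3 is still
needed.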
Typically, the local tracing has a generation-like character with
respect to the distributed GC's. It first propagates reachability
information from true roots, to find outgoing pointers that are definitely
live. Then it propagates conservative information locally, from the
set of incoming pointers from other nodes (which may or may not actually
still be live); this ensures that it doesn't reclaim anything that's
reachable in the distributed sense until the distributed GC has made
sure it's not reachable anymore.
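In sketch form (hypothetical Python again, with the heap and the tables
stubbed out), one local collection might look like this: phase one marks
from the true roots, phase two marks from the incoming-reference table,
and every cross-node pointer found along the way is recorded for the
distributed GC, tagged with which phase found it:

  def local_collect(true_roots, incoming_entries, heap):
      """One local tracing collection on one node (sketch only).

      true_roots       - this node's stacks, registers, globals
      incoming_entries - objects other nodes may still reference
      heap             - maps an object to its outgoing references;
                         each reference is a (is_remote, target) pair
      """
      marked = set()
      outgoing = {"live": set(), "conservative": set()}

      def trace(start, kind):
          stack = [start]
          while stack:
              obj = stack.pop()
              if obj in marked:
                  continue
              marked.add(obj)
              for is_remote, target in heap[obj]:
                  if is_remote:
                      outgoing[kind].add(target)   # for the distributed GC
                  else:
                      stack.append(target)

      # Phase 1: definitely live, reachable from real roots.
      for root in true_roots:
          trace(root, "live")

      # Phase 2: conservatively retained; only the distributed GC is
      # allowed to decide that these entries are dead.
      for obj in incoming_entries:
          trace(obj, "conservative")

      return marked, outgoing     # survivors + cross-node pointers found

The distributed GC's job is then to work out, eventually, which of the
phase-2 entries are really still reachable from some node's true roots.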
Several potential problems lurk here.
First, distributed cycles are not unlikely.
It is fairly common for objects in distributed systems to have cyclic
connections, because objects that communicate with each other remotely are
likely to have pointers to each other, and often this communication is
bidirectional. For example, in client/server-style computing, an object
representing a server may have pointers to its clients, and the clients
of course have pointers to the server. More generally, directory
structures used to manage resources are likely to have pointers
to things on lots of nodes. So it's nice to use reference counting, but
you may still need a distributed tracing GC to get back the "hard stuff."
(Notice also that this favors nondestructive programming styles, and
bushy tree-like structures in preference to long linear lists.)
Second, long distributed lists are problematic.
A normal distributed tracing collection of a linear list can't go any faster
than a message send per cross-node link. If somebody creates a very long
distributed list---say, by doing a destructive and roughly fair merge
of two long lists on different nodes---the global distributed trace
may take much longer. A single user of a globally-collected system
may therefore lengthen the overall time to do a distributed GC for everyone.
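To put made-up numbers on it: a destructively merged list of a million
cells whose links alternate between two nodes has on the order of a
million cross-node links, and at, say, 10 ms per message round-trip, a
trace that must follow those links one after another needs something like
10^6 x 10 ms, i.e. a few hours, no matter how fast each node's own
collector is.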
Third, in the typical configuration, the distributed GC is dependent on
the local GC's to propagate information for it.
(This is a lot like Deutsch-Schiffman deferred reference counting.
In deferred reference counting, the reference counter's view of the graph
of objects is only updated at stack scans. In these distributed GC's, the
distributed GC's view is only updated at local traces.)
In between local traces, the distributed GC's don't notice changes in
reachability due to changes in the local graph. This could cause
performance problems in the face of highly-distributed lists, which
cross a lot of nodes. For distributed RC, it means that even acyclic
garbage can't be reclaimed faster than one distributed link per local
GC. (That is, transitive freeing is delayed at each node until the
next local GC.) For distributed tracing, it's even worse, because
the distributed trace can't complete---and therefore can't reclaim
*any* garbage---until the longest distributed list has been traced,
and the propagation of tracing is delayed until the next local trace
_at_each_node_crossing_.
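(Made-up numbers again, just to show the scale: if a dead structure
crosses 20 nodes and each node runs a local collection every few minutes,
distributed RC frees it one hop per local collection, so something like
an hour passes before the last piece goes away; and a distributed trace
over a graph whose longest chain crosses 20 nodes can't declare
*anything* dead for roughly that long.)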
For these reasons, conventional distributed GC's may have serious scalability
problems, especially if a local GC takes a long time---e.g., if nodes
have a lot of data in virtual memory and have to page to traverse it.
A different approach is to use an incremental local tracing GC, which
is always running in the background, and to propagate information for the
distributed trace asynchronously from local GC cycles. This can reduce
the time to propagate information to the distributed GC's, by
keeping a local stop-and-collect GC's entire cycle out of the critical
path of the distributed GC. (We built a simple distributed GC for
RScheme that works this way; it's one of the reasons we favor incremental
GC's.)
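The difference is roughly this (a hypothetical sketch, not the actual
RScheme collector): instead of handing the distributed GC a batch of
outgoing pointers when a whole local collection finishes, the incremental
marker forwards each cross-node pointer as soon as it is discovered, so
the distributed trace can keep moving while local marking is still in
progress:

  import queue

  to_distributed_gc = queue.Queue()   # drained by the distributed-GC machinery

  def incremental_mark_step(gray, heap, marked, budget=64):
      """Do a bounded chunk of local marking, interleaved with the
      mutator.  `gray` holds objects that are marked but not yet
      scanned; `heap` maps an object to (is_remote, target) pairs."""
      while budget > 0 and gray:
          obj = gray.pop()
          for is_remote, target in heap[obj]:
              if is_remote:
                  # Forward immediately; don't wait for the end of the
                  # local collection cycle.
                  to_distributed_gc.put(target)
              elif target not in marked:
                  marked.add(target)
                  gray.append(target)
          budget -= 1
      return gray

The local collection still has to finish before anything local is freed;
the point is just that the distributed trace's progress no longer waits
behind a whole stop-and-collect cycle on every node it crosses.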
Another approach is to have a completely separate distributed trace,
more or less independent of the local traces, rather than having the
local GC's propagate information for the distributed trace. This
can lead to extra work, but may greatly shorten the critical path
of the distributed GC. (I believe Jul and Juul's GC for Emerald is
something like this; there may be others.)
Still another approach is to ship data, not just marks. A conventional
tracing GC leaves the reachability graph where it is, distributed
across the system, and the conceptual "marks" of the tracing process
are propagated around the system. Another way to do it is to send
the data to a central service to be traced. On the face of it, this
seems expensive, copying data around the system---but it may greatly
shorten the critical path of a distributed trace.
A refinement of this is to use a _logically_ centralized service, which
conceptually gathers the whole reachability graph together so that
it can be traced locally. This does not actually require having
a copy of the whole graph on one machine, because the logically
centralized server may in fact be distributed. (This is what
Ladin and Liskov's collector does.) It is not clear what the
performance issues are here; if there is a good way to partition
the work among the actual servers, it may work fine. If not, you
may _introduce_ locality problems by separating data that _could_have_
been traversed locally. Much of the data could be replicated, so that
it can be traversed anywhere it's cheap to do so---i.e., where the
connected parts happen to be colocated---but it seems tricky and
dependent on some subtle locality issues that haven't been explained.
A related alternative is to use a data-shipping approach to GC, like
a distributed virtual memory, rather than a function-shipping approach.
It may be faster to page (or otherwise ship) data to the GC than to
force the GC's traversal to crawl around the network tracing the
data. For example, suppose a tracer is tracing a list that ping-pongs
back and forth between two nodes, but which has good locality on each
node. (E.g., you have a few pages of data on each node, but strung
together by links that cross nodes.) If you just page the data onto
one node, you can traverse them in a hurry, and then page them back
as needed. This may greatly reduce the number of message round-trips
in the critical path.
The problems outlined above make me think that distributed GC is
far from a solved problem; for the time being, I don't think you
can "garbage collect the world", just small or maybe medium-sized
networks, preferably without large volumes of data, and preferably
fairly reliable networks of fairly reliable machines. (These
problems are exacerbated by having to handle node failures.)
---