[gclist] Garbage collection FAQ: distributed systems

Charles Fiterman cef@geode.geodesic.com
Fri, 1 Mar 1996 11:45:49 -0600


> Here is an outline for the part of the FAQ on collecting distributed systems,
> file systems, databases, and caching stores.
> 
> DISTRIBUTED SYSTEMS
> 
> What is different about distributed systems?
>  - messages are costly
>  - asynchrony: slow messages, slow processors, no single global order
>  - failures 
>     * messages are lost, duplicated, delivered out-of-order 
>     * processes fail
>     * can't reliably detect failures
Missing here is that processors may be unavalable. However you can
allow leaks here if you simply do this. When processor A asks processor
B for a pointer to an object processor B remembers that it exists.
 
>  - consistency
> 
> Standard distributed systems algorithms (snapshots, transactions): costly,
>   don't scale, needlessly general.
> 
> Distributed reference counting
>  - the naive approach: unsafe, costly
>  - weighted reference count
>  - reference listing
>  - pro: scales; con: cyclic garbage, no root identity
> 
> Distributed mark and sweep
>  - the naive approach: costly
>  - Hugues algorithm: use timestamps (rather than mark bits); GVT
>  - Ladin and Liskov: centralized GC
>  - pro: fault tolerant; con: doesn't scale

Distributed Treadmill with synchronization.
   - Independent processors do their own gc treating externaly held
     pointers as roots. At synchronization time unavalable processors
     are regarded as keeping all their known pointers.
> 
> Partitioned GC (the hybrid approach)
>  - trace each subset part, count (or list) across boundaries
>  - Network Objects
>  - SSP Chains: fault tolerance
>  - Lang, Queinnec, Piquer (unnecessarily complicated): collecting cycles
>  - con: no root identity, no persistence
> 
> DSM, cached distributed stores (see also: GC of persistent stores)
>  - what is different: inconsistent data, variable caching
>  - key insights:
>     * a subset is not necessarily a process or a set of processes
>     * it is OK to reorder events, as long as creation events are ordered
>       before destruction events that (may) causally depend on the former. 
>  - Larchant
> 
> GARBAGE COLLECTING DATABASES, PERSISTENT STORES, AND FILE SYSTEMS
> 
> What is different:
>  - lots of data on disk, I/O is costly
>  - inconsistent data
>  - transaction rollback
>  - clustering
> 
> Reference counting and listing, mark and sweep: see distributed systems
> 
> Partitioned GC
>  - see distributed systems
>  - selecting the best partition to trace
>  - Naughton
>  - Sprite LFS
> 
> Tolerating transaction rollback and inconsistent data
>  - note the identity between the two problems
>  - Amsaleg
>  - Larchant
> 
> Clustering
>  - EOS
> 
> -- 
> 						Marc Shapiro
> 
> M. Shapiro, INRIA Rocquencourt, BP 105, 78153 Le Chesnay Cedex, France.
> Tel.: +33 (1) 39-63-53-25.  Fax: +33 (1) 39-63-53-30.
> e-mail: marc.shapiro@inria.fr.  http://www-sor.inria.fr/SOR/
> 
>