[gclist] Hughes' algorithm

Paul R. Wilson wilson@cs.utexas.edu
Fri, 15 Mar 1996 02:16:43 -0600


Does anybody out there really understand Hughes' algorithm?  I don't,
or if I do, then the algorithm itself doesn't seem correct.  I don't
understand the published descriptions I've seen, Hughes' own or
anybody else's.

My basic conception of Hughes' algorithm is that really it's a global
incremental tracing algorithm, where you can have multiple traces
going on at the same time.  If this is the right intuition, I don't
see how it can work correctly, due to termination detection problems.

Here's the minus-one'th-order version, as I see it.  (The fact that it's
distributed doesn't really matter for what I'm getting at; it's just an
incremental trace that happens to be really slow.)

  Start a GC by propagating marks from the roots.

  Propagate marks incrementally until done.  When you're done, anything
  that's not marked is garbage.

  As an optimization, you can start _another_ GC later, using a
  different mark.  Whenever a GC terminates---that is, when there
  are no more of its marks left unpropagated---anything that's not
  marked with that GC's mark is garbage, unless it was created since
  that GC started.  (Or something like that.)  This lets us pipeline
  multiple GC's---we don't have to wait for one to finish before we
  can begin another one.
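
To make that concrete, here's a little sketch in Python of the scheme
as I've just described it.  All the names are made up, and this is my
reading of the idea, not Hughes' actual algorithm: each in-flight GC
has its own mark and its own grey (to-scan) list, and the traces just
interleave.

  class Obj:
      def __init__(self):
          self.children = []
          self.marks = set()   # one entry per GC whose trace reached us

  class Trace:
      # One in-flight GC: its own mark, its own grey list.
      def __init__(self, mark, roots):
          self.mark = mark
          self.grey = []
          for r in roots:
              r.marks.add(mark)
              self.grey.append(r)

      def step(self):
          # Do one increment of marking; True once this trace is done.
          if not self.grey:
              return True      # no unpropagated marks of this kind left
          obj = self.grey.pop()
          for child in obj.children:
              if self.mark not in child.marks:
                  child.marks.add(self.mark)
                  self.grey.append(child)
          return False

You'd run, say, Trace(3, roots) and Trace(4, roots) concurrently, a
step at a time; once trace 3's step() reports True, anything without
mark 3 (and not allocated since trace 3 began) is garbage.  Note that
this ignores the mutator entirely, which is exactly the snapshot
vs. incremental-update question below.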

First question: is this even roughly right?  Second question: is the
trace a snapshot-at-the-beginning trace, or an incremental-update
trace?  (I may have figured that out tentatively at one time, years
ago, but if I did, I've forgotten.)
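
For reference, here's the standard distinction, written as write
barriers over the Obj from the sketch above.  Nothing here is specific
to Hughes, and grey is just a stand-in for however the collector
re-greys things:

  grey = set()     # objects the collector still has to scan

  def snapshot_write(obj, i, new):
      # Snapshot-at-the-beginning: grey the pointer being overwritten,
      # so everything reachable when the trace started stays retained.
      old = obj.children[i]
      if old is not None:
          grey.add(old)
      obj.children[i] = new

  def incremental_update_write(obj, i, new, marked):
      # Incremental update: grey the new target when an already-marked
      # object is made to point at an unmarked one.  Objects that
      # become garbage mid-trace can then go unmarked and be reclaimed.
      if obj in marked and new not in marked:
          grey.add(new)
      obj.children[i] = new

Which flavor it is determines exactly what each trace is obliged to
retain.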

Now we get to the zeroth-order approximation.

  Rather than spreading multiple completely independent marks around
  the system, we use timestamps.  Each mark we use for a GC is a number
  that's higher than the number used for GC's that were started
  earlier (but which may still be ongoing).

  Now, instead of doing termination detection separately for each
  trace, we do a global virtual time (GVT) trick.  We basically keep
  track of which timestamp is the earliest one that hasn't yet been
  propagated as far as it will go.

  When we spread marks, we exploit the fact that anything that's marked
  with respect to a later timestamp cannot be garbage with respect to an
  earlier timestamp; if it were garbage earlier, it can't be non-garbage
  later---nothing that's unreachable can become reachable again.
  Rather than propagating each mark separately, we _combine_ the marks
  by considering each timestamp to _include_ all lower timestamps.
  (E.g., propagating mark 4 implicitly propagates marks 3, 2, and 1.)

  If we're propagating a mark (say, 4) and we come to an object that's
  marked with a lower mark, we just clobber the lower mark with the
  higher one and spread the higher one (only) from there.
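
Here's the earlier sketch reworked for this version (again my reading,
with made-up names; the real algorithm surely differs).  A single
integer timestamp per object replaces the per-GC mark sets, plus a
flag saying whether that timestamp has been propagated onward yet:

  class Obj:
      def __init__(self, stamp=0):
          self.children = []
          self.stamp = stamp     # highest mark this object carries
          self.scanned = True    # has self.stamp been pushed onward?

  def step(obj):
      # Scan one object: push its mark to its children, clobbering
      # any lower marks (a stamp of 4 implicitly includes 3, 2, 1).
      for child in obj.children:
          if child.stamp < obj.stamp:
              child.stamp = obj.stamp   # the old mark is now hidden
              child.scanned = False     # child needs (re)scanning
      obj.scanned = True

  def gvt(objects):
      # The global-virtual-time part: the earliest timestamp that
      # hasn't yet been propagated as far as it will go.  GC t would
      # be declared terminated once this rises above t.
      todo = [o.stamp for o in objects if not o.scanned]
      return min(todo) if todo else None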

This is where I notice that I'm seriously confused.  If the above is
approximately right, I don't see how you can ever detect termination.
For example, there may be no unpropagated 3 marks left in the system,
but if there are unpropagated 4 marks or 5 marks, you can't tell
whether GC 3 has actually terminated or not.  Since 4 marks and 5
marks can clobber 3 marks, there may be objects that were reachable
for GC 3 _and_still_are_, but that the 3 trace hasn't actually reached
yet, because the pending 3 marks on the way to them were hidden by
clobbering them with 4's and 5's.  You can never be sure that GC 3
has terminated until you finish 4, and you can never be sure that 4
has terminated until you finish 5... so you never finish.
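
Using the definitions from the sketch above, here's the scenario I'm
worried about, on a three-object chain that is reachable throughout:

  root, a, b = Obj(), Obj(), Obj()
  root.children, a.children = [a], [b]
  heap = [root, a, b]

  root.stamp, root.scanned = 3, False   # GC 3 begins at the root
  step(root)                            # a now carries an unscanned 3
  root.stamp, root.scanned = 4, False   # GC 4 begins before 3 reaches b
  step(root)                            # a's pending 3 is clobbered to 4

  print(gvt(heap))   # -> 4: no unpropagated 3's, so GC 3 "looks" done
  print(b.stamp)     # -> 0: yet b is reachable and never got a mark >= 3

If GC 3's termination is declared at this point, b gets collected out
from under a live pointer; if it isn't, we're stuck waiting on GC 4,
and then 5, and so on.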

It seems to me that the only way to avoid this is to do something that
amounts to NOT clobbering 3's with 4's and so on.  The 4's would have
to push the 3's along ahead of them, and not hide them.  (Or you could
somehow distinguish marks that have clobbered lower marks from those
that haven't---but then what's the point of the ordered timestamps?)

I'd be very grateful if somebody who actually understands this algorithm
would explain the basic intuition to me.  I don't get it from Hughes' paper.