[gclist] Instanteneous GC

Jerrold Leichter jerrold.leichter@smarts.com
Thu, 29 Jul 1999 15:53:18 -0400 (EDT)


| 2. Faster implementation: Assign each object a positive number, starting
| with 0 at the root.  For each object, maintain for lists of references:
| (incoming, outgoing) x (to lower numbered objects, to higher numbered
| objects).  Enforce a rule that every object always has at least one incoming
| reference from a lowered number object.  Whenever a reference changes that
| breaks this rule, move the offending object to the end of the list (make it
| the highest numbered object), and readjust the other objects--which may
| cause them to recursively fall to the bottom of the list.  In this process,
| some objects that land at the bottom of the list will have no upward
| references, and those objects have immediately become garbage.  The overhead
| of this is (on a 32 bit machine) 8 bytes per object + 24 bytes per reference
| variable.  This has a lot of O(n^2) performance cases, for example many
| double-linked list implementations, but I think the algorithm can be
| restructured to never be worse than O(n).  Still this is impractically slow.

Fascinating.  You've come up with an algorithmic description of an approach
I've thought about for years, but which doesn't seem to produce any usable
implementations.

Imagine the GC problem as follows:  We have a room with a bunch of hooks on
the ceiling (the root set).  We also have a supply of S-shaped chain links
on the floor (the free set; each chain link is a node).  The (mutator's) sole
operation is to take any existing chain link; unhook it from zero or more of
the links it's currently attached to; and then hook it to zero or more other
links or hooks.  (Note that as written, this allows the mutator to hook free
links to each other, leaving them still free; and it can "allocate" multiple
links at the same time.  You could tighten the rules to eliminate these
operations, but they don't actually change anything.)

Needless to say, we assume lack of friction (i.e., links are only hooked to
each other when we say so - they don't hang in the air due to tangles) and
instantaneous transmission of forces.  The former is trivial in the software
translation, but the latter is the real nub of the problem:  If we have a
chain of n hooked links, the first hooked to a ceiling hook, and we unhook
that one from the ceiling hook, the bottom link can't begin to fall until
a wavefront (moving, nominally, at the speed of sound in the link material)
reaches it.  In software, this means that discovering that the chain is
garbage takes time linear in its length - no real surprise, if we contrast
this situation with the same chain, but now looped back so that both ends hang
from the ceiling hooks - and then remove on ceiling hook attachment point.

Mr. Sweeny's counts of in and out sets amount to keeping track of the force
being exerted on each link.  (In fact, you can see that what matters is only
the difference between the number of uplinks and downlinks - when the downward
force on a link exceeds the upward force on it, it's "garbage".)
 
| So, my question is: is there any reasonably efficient way to do
| instantaneous garbage collection, using a write-barrier function that can
| immediately determine that objects have become garbage?

I don't see how any algorithm can possibly do better than linearly in the
length of the longest chain of references.  If you can bound that - well, for
one thing, you'd be outlawing cycles, and then reference counting gives you
what you want!
							-- Jerry