(course on) garbage collecting

Francois-Rene Rideau rideau@clipper
Mon, 26 Dec 94 2:54:23 MET


Dear Joyous Tunespeople,

   First of all, I wish you a Merry Sir Isaacmas (on the 25th of
December 1642, Sir Isaac Newton was born).

   Now, I shall again remind you of the problem of garbage collection.
This is a foremost problem; a GC is compulsory in any dynamic
system (which our system should be).
   Persistence does not mean anything without a garbage collection policy,
and any trusted module must be garbage-collection aware. We *must* provide
standards for this, and it will most surely be a very basic module in the
system (what Mike may like to call a kernel; but the only kernel I reckon
in a FORTH system is the __NEXT__ macro !).
   Believe me; GC is a most important feature for our system. It is what
distinguishes a high-level-able system from a low-level-only one. See C,
FORTH, and so on. Even the newest C++ implementations will have to include
a GC !

   What is a GC, and how does it work ? First let me refer to the SICP
book, or any LISP-based or LISP-like system (including Smalltalk, or the
HP-28/48 series of calculators). All such systems manipulate objects.
Objects may be manipulated, duplicated, and forgotten; and unless all
objects are read-only and you have enough memory to afford actually
duplicating the full binary representation of every duplicated object,
you must duplicate only pointers, and thus have a problem when it comes
to freeing an object's resources, which you must do if you want your system
to be persistent, i.e. to run indefinitely without needing to reboot, and
perhaps to survive across reboots.

   So here is what finally appears to me as a course in garbage collection
for those of us who have never heard (or thought) about it. Search for [!!!]
to find TUNES-specific things.


   The most stupid method, the one that comes to any mind (and that's how
much industry-standard software does it), is to just keep a "reference
count" on each object, which you increase when creating a new independent
set of pointers to the object, and decrease when destroying such a set.
When the count reaches zero, you just delete the object from memory. Such
a method *is* stupid, and does not solve the problem. Firstly, there is the
Swiss cheese syndrome: the allocation-deallocation scheme quickly leads to a
memory with a lot of tiny "holes" of free space, so new objects cannot be
allocated even though there may be far more than enough total free
memory available. Secondly, unless objects contain only data, you must also
track down all pointers in a destroyed object to decrement the count of
any object it points at (which is the why of destructor functions in C++:
freeing a destroyed object's resources). Last but not least, even if you do
have such a destructor procedure, you have a resource leak as soon as
recursive objects come to be created: whenever some object A recursively
contains a pointer to itself, it becomes unremovable, even though nobody
may use it; with time, such objects (called pointer cycles) accumulate and
eventually fill memory with no way to repair it; this is not bearable in a
persistent system, and unavoidable if the user is free.
   Those problems are not specific to the reference-count method,
and arise in all GCs. Each problem alone leads to tons of complications:
problem #1 raises the question of allocating only aligned fixed-size
blocks, or else moving allocated blocks and updating pointers; problem #2
alone notably raises those of identifying all pointers inside an
allocated object, and of finding the allocation block corresponding to a
given pointer value; as for problem #3, it raises the problem of a
possibly long global (vs. inefficient local) garbage collection
mechanism, and thus of the real-time response of the system (even though a
global mechanism may *reduce* the total time spent on memory management, it
may greatly *increase* the time for an *atomic* memory transaction, when a
garbage-collection cycle occurs). Please think about it at length before
emitting any remark concerning pointers, memory management, and GC.


   Then, let me describe the two GC algorithms I know that solve problem
#1 (unless you want only fixed-size no-multiple-contiguous block allocation).
In both cases, you walk through the structure of useful objects in the system.
You begin from a "root", and whenever an object is reached, you know the
object is useful, and continue with all its sub-objects.
   The first algorithm is mark&sweep: first you walk through the set of
useful blocks, and mark them to differentiate them from useless ones;
then you remove the useless objects from the heap while moving the
useful objects down; *but* you must also update pointers, so you must
first simulate the move to compute the new addresses of the blocks, and
re-walk through the useful objects to update pointers *before* actually
moving the objects. This method has the advantage that it works in constant
memory space and wastes none (well, you still need to reserve a stack
at most twice as deep as the number of objects in the heap -- easy: let the
stack grow down while the heap grows up). *But* you must
walk through the useful objects *twice*, and you must run through all the
allocated objects once (to compute the new addresses). That's why it's used
in small systems (particularly the HP28/48 series of pocket calculators), or
when you expect the ratio of useful to useless objects to be high.
   The second algorithm is "stop&copy": there is only one walk, and only
useful objects are walked. You must have two object zones: a source
zone where the objects originally are, and a destination zone where objects
are copied. Whenever you reach a useful object, either it has already been
moved, in which case its old location points to its new location in the
destination zone, and you replace your pointer to the object by that
destination-zone pointer; or it hasn't been moved yet, in which case you
allocate space for it in the destination zone, and copy it there with
proper values. You still need a stack to walk (or you may again simulate
it easily by reserving some two extra words in each heap-allocated object;
but if you have VM & caching, an external stack seems faster to me).
After the GC, the two zones are simply logically exchanged, and the
destination zone of the current GC will be the source zone of the next one;
virtual memory makes this very easy, and if you can't afford using only half
of the address space (because it maps 1:1 to physical memory), it is easier
to use the disk as a (cached) destination zone (note that only sequential
access is done, so a tape is enough !), which is exactly how virtual memory
actually works, and is also how early LISP systems actually worked !!!
This algorithm seems better and easier to me in all respects than the
previous one.


   So, I had a discussion with Mike about GC; the unavoidable requirement
for a GC is that at any PAUSE and/or call to a GC-aware memory manager,
the system has some easy way to differentiate pointers from other data, to
solve problem #2. So either we set apart enough registers to separate
pointer registers and stacks from integer/data registers and stacks, or we
have to tag every single register/stack value (and thus lose a bit or more
per word) to tell the difference.
   At first sight, the first way seems conceptually the simplest, and that
may be how future C++ implementations will do it. But it also implies many
registers on the processor, and a lot of stacks (and thus stack-limit
problems), which may prevent porting to simpler (8-bit) architectures, or
even to the i386 (which has got only 8 registers). The opposite concept
(differentiating values, not containers) allows having only one stack of
mixed objects, simplifies everything (and particularly GC code), and still
allows efficient compilation toward the previous concept if you prefer,
while the opposite compilation produces inefficient code.
   I'm convinced [!!!] -- but I do not know enough about actual program
profiles to be *sure* about it, so you can still make me change my mind
if you have good arguments -- that differentiating by losing a bit in
every pointer-size cell is the cheapest way to go: on most architectures,
pointers must be aligned anyway, so this bit is no loss to them; as for
integers, 15 or 31 bits are large enough for most everyday-life cases, and
multi-word integers are still possible, while objects in memory (or
transiently on the stack) can very well be full 32-bit words if *really*
needed. Note that the restriction may hold only for on-stack objects, while
other objects in the heap may or may not follow it.


   Be aware that in both cases, all structures on the GC-able heap must be
properly tagged, so that you can easily identify where pointers are inside
them. A full descriptor is needed for every block, and calling a
user-defined descriptor, and/or decoding it, is a very time-costly
operation, and also a code-hungry one. It may quickly lead to a huge system
with a lot of GC-related code. If there is a quick way to differentiate a
pointer from a data value, the object is its own descriptor, and decoding
it becomes trivial and quick. In both cases, an accelerated descriptor
could consist of the number of pointers in the block, all pointers being
right after the beginning of the block. You may also use additional tagging
information inside pointers to ease object description, again using bit
patterns to identify object types, and/or a dynamic BIBOP (BIg Bag Of
Pages) to group objects into sets having the same GC management, to ease
code dispatch while reducing description overhead.
   In both cases, you need to identify the block pointed at by some
pointer value. So either you allow "infix" pointers, which may point
anywhere inside the block (in any case, *never* before or after the block,
as may be done in some "optimizations"), and face the tremendous cost of
determining the beginning of the block each time the GC mechanism deals
with a pointer; or you restrict pointer values (except in transient state,
of course) to the beginnings of blocks -- so to simulate C pointers in some
native (as opposed to unix/dos/whatever-emulation-in-a-VM-box) C
implementation with pointer arithmetic, you'll need to have a
"segment:offset" thing :). Again, the more restrictive case leads to a much
simpler, easier, smaller implementation, at the cost of a few possible
optimizations that can be done by a compiler back-end anyway.


   Binary code, with relative jumps, unaligned and/or masked pointers,
etc., is a special case of a structure that is very difficult to manage
with respect to GC, particularly as we want efficient code with inlined
access to objects [!!!]: we want an access to a memory object to be just a
MOV (or whatever) instruction, a function call to be just a CALL
instruction, etc. Now, what to do when binary code is moved due to a GC ?
Either we set up a fixed-size (well, more exactly a
don't-cross-the-alignment) page allocation for binary code, and then
(assuming we have a large enough static code address space, separate from
the data address space) we never need to move code, but we need static
code and page granularity. Or we allow dynamically evolving code (because
of, e.g., on-the-fly code compilation [!!!]), and then we do need code
moving. Note that the two systems can co-exist on large systems, and in
particular that real-time code, which must live outside a lagging GC, will
use the first model.
   Because objects migrate, inlined code may become obsolete. So I
proposed that, depending on the code concerned, and instead of updating
obsolete code (whether because of obsolete pointers or obsolete inlines),
the source code be kept, and the object recompiled on demand. Code would be
invalidated by replacing it with a stub. Some code may be updated in place
anyway, though. This process could be generalized to any kind of object:
when memory is tight, instead of swapping or migrating an object, it might
as well be forgotten and replaced by some lazily evaluated procedure that
would recompute it (if you're sure it won't take too long a time -- e.g.,
do not forget the results of long computations). Actually, the more you
study the problems, the more you see how deeply garbage collection,
persistence, and migration are linked topics.
   Code also raises again the problem of infix pointers. If you forbid
infix pointers, you must ensure that every return/continuation address
that may reside in a stack and be GC'ed or migrated is valid with respect
to the GC. Or you may selectively allow infix pointers for code alone (or,
some would say, for some objects, among which code), with a bit pattern to
recognize them. Well, code surely deserves some distinguishing pointer
pattern (and/or bunch of pages), because of the difficulties it raises.
I'm not completely decided with respect to these problems. Any help is
welcome. This *is* a serious and complex problem.


   Well, enough for today. I await your remarks. But please read the whole
message. This is a *serious* and *complex* problem, yes sir, and you're
all directly concerned, as the GC must be aware of anything you do, and
the simpler the GC we use, the more efficient we can make it.


P.S.:
   I also met Marc Shapiro at the French INRIA, who manages a research
team on Distributed OSes; he explained to me the difficulties of a
distributed garbage collector, and how his leading-edge-technology GC
worked. He proposed that I do my PhD thesis in his team, but I was not so
sure, and he gave me pointers to other teams too.
   The point about distributed GCs is that if the net is large enough
(e.g. world-wide), you can't achieve the synchronization necessary for a
full GC at a reasonable cost. That's why you can have only a conservative
GC (i.e. cycles that are spread widely enough are never destroyed). You
must also have some proper protocol to deal with asynchrony, and to
recover from host/line crashes, etc. He wrote his GC in C++ for a DSM
(Distributed Shared Memory). I'm convinced that both C++ and DSM are too
bulky and costly for us to choose if we can avoid them (and we actually
can); he used both because they were what was actually used in today's
software ):, and as a worst-case scenario (-8.

P.P.S.:
   I hope "walking through" was the right idiom.

P.P.P.S.:
   I wish you a merry X-mas too !

P.P.P.P.S.:
   Arg ! I post a bit too late for those !

--    ,        	                                ,           _ v    ~  ^  --
-- Fare -- rideau@clipper.ens.fr -- Francois-Rene Rideau -- +)ang-Vu Ban --
--                                      '                   / .          --
MOOSE project member. OSL developer.                      |   |   /
Dreams about The Universal (Distributed) Database.       --- --- //
Snail mail: 6, rue Augustin Thierry 75019 PARIS FRANCE   /|\ /|\ //
Phone: 033 1 42026735                                    /|\ /|\ /