Persistence: a proposal

Scott L. Burson <gyro@zeta-soft.com>
Thu, 22 May 1997 12:41:03 -0700 (PDT)


As I have mentioned, persistence has been an interest of mine for a long time.
In fact, I spent a good half of 1994 writing a persistent memory manager/OODB.
I got maybe 1/3 of the way through before I ran out of money.  So I've had to
leave the project on the back burner ever since.

Let me sketch for you the design of this memory manager, which is called
ZetaBase.  Being a memory manager, it is designed to be integrated into a
language implementation rather than tacked on from outside; and to be the sole
memory manager for the implementation, so that every object is potentially
persistent.

ZetaBase divides the persistent universe into "segments".  A process can be
accessing any number of segments at a time (subject only to address space
limitations).  Each segment is (normally) garbage-collected independently of
the others; the garbage collector is incremental (using a variation on the
Brooks algorithm [Brooks84]), compactifying, and generational.  Cross-segment
pointers are tracked automatically using reference counting.  Any pointer can
cross segments; there is no source-level distinction between intrasegment and
intersegment pointers.
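
To make the cross-segment bookkeeping concrete, here's roughly what the
pointer-store case looks like (a sketch only; the names and representation
are made up for illustration, not the actual ZetaBase code):

    /* Each segment is GC'd on its own; pointers coming in from other
       segments are treated as roots and tracked with reference counts. */

    typedef struct object  object_t;
    typedef struct segment segment_t;

    segment_t *segment_of(object_t *obj);       /* which segment owns obj */
    void incr_incoming_refcount(object_t *obj); /* one more external ref  */
    void decr_incoming_refcount(object_t *obj); /* one fewer external ref */

    /* Called when a pointer field in `container' is changed. */
    void note_pointer_store(object_t *container, object_t *old, object_t *new_)
    {
        segment_t *home = segment_of(container);

        if (old != 0 && segment_of(old) != home)
            decr_incoming_refcount(old);    /* loses an external ref */
        if (new_ != 0 && segment_of(new_) != home)
            incr_incoming_refcount(new_);   /* gains an external ref */

        /* Same-segment stores need no counting; the segment's own tracing
           collector sees those references. */
    }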

You've probably never heard of the Brooks algorithm, but the key to it is that
all objects are referenced via an indirection through what has come to be
called a "handle".  This does impose an additional cost on object access, of
course, but a number of object-oriented language implementations these days
use handles (though I don't know of any using Brooks' GC).  Anyway, in
ZetaBase we get many benefits from the use of handles; they really are what
make the whole thing work at all, and we get incremental GC besides.  I think
it's a good trade.
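
If you haven't seen the trick before, here is roughly what it looks like in C
(purely illustrative; these are not the actual ZetaBase declarations):

    /* Brooks-style handle indirection, in sketch form. */

    typedef struct handle {
        void *object;       /* current address of the object's storage */
        /* ... plus GC bookkeeping, segment id, etc. ... */
    } handle_t;

    /* A language-level "pointer" is really a pointer to a handle, so
       reading a field costs one extra load: */
    static void *deref(handle_t *h)
    {
        return h->object;   /* this load is the whole cost of the indirection */
    }

    /* When the incremental collector copies an object it updates just
       the one word in the handle, and every reference through that
       handle immediately sees the new location. */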

Incremental GC algorithms in general require the ability to trap when a
pointer is read from memory (a "read barrier") and/or when one is written to
memory (a "write barrier").  In ZetaBase specifically, the write barrier is
fairly complicated -- though I've tried to make it fast in the most common
cases -- but the read barrier consists simply of the handle indirection.
Beyond that, I use Unix memory protection calls and SEGV handling to trap
references to segments that aren't mapped in yet, and for a few other
purposes.  I have no actual measurements yet, but I expect the performance to
be pretty good -- not up to that of transient memory managers, but close
enough that the price is clearly worth paying for the vast majority of
applications.
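
To give the flavor of the trapping, here is a rough sketch of the SEGV path
(the helper names are hypothetical, and the real code has many more cases to
handle):

    #include <signal.h>
    #include <string.h>

    /* Does this address fall within a known, not-yet-mapped segment? */
    extern int  address_in_unmapped_segment(void *addr);
    /* Read the segment's pages in from disk and mprotect them read-write. */
    extern void map_segment_at(void *addr);

    static void segv_handler(int sig, siginfo_t *info, void *ctx)
    {
        void *fault_addr = info->si_addr;
        (void)sig; (void)ctx;

        if (address_in_unmapped_segment(fault_addr)) {
            map_segment_at(fault_addr);  /* returning retries the access */
            return;
        }
        /* Not one of ours: restore default handling so a genuine bug
           still dumps core. */
        signal(SIGSEGV, SIG_DFL);
        raise(SIGSEGV);
    }

    void install_segment_fault_handler(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = segv_handler;
        sa.sa_flags     = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, 0);
    }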

I've made every effort, in designing ZetaBase, to make it as easy to use as I
could.  I've already mentioned that persistence is orthogonal -- any object
can be persistent.  It's also automatic; any object reachable from a
persistent root is automatically persistent.  There is, thirdly, a mostly-
automatic clustering mechanism that attempts to place each object in the most
appropriate segment.  It's not perfect, but I expect it will handle most
cases.  (One can always cluster manually if necessary.)  In short, while I
can't say that ZetaBase makes persistence "totally transparent", I expect that
it will generally be much easier to use than disk files.
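
From the client's point of view, using it might look something like this (the
API names here are invented just for illustration; this is not a real
interface specification):

    typedef struct handle handle_t;

    /* Hypothetical client-level calls. */
    handle_t *zb_named_root(const char *name);       /* find/create a root */
    void      zb_root_set(handle_t *root, handle_t *value);
    handle_t *zb_cons(handle_t *car, handle_t *cdr); /* ordinary alloc */
    handle_t *zb_string(const char *s);

    void example(void)
    {
        handle_t *root = zb_named_root("my-database");

        /* Build an ordinary in-memory structure... */
        handle_t *entry = zb_cons(zb_string("hello"), zb_string("world"));

        /* ...and make it reachable from the root.  That's all: everything
           reachable from `root' persists, and the clustering machinery
           decides which segment each object lands in. */
        zb_root_set(root, entry);
    }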

Other features of ZetaBase include areas (used to further control locality
within segments), weak locations, provisions for versioning of segments, and
some special code in the GC to maximize locality of copied objects.  Objects
can be manually deallocated (the handle remains allocated, to permit trapping
of attempts to access the object, until the GC proves that no references to it
remain).  I'm pretty sure destructors can be supported.
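
The manual-deallocation story, in sketch form (a made-up representation, just
to show the idea):

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct handle {
        void *object;          /* 0 once the object has been deallocated */
    } handle_t;

    void zb_deallocate(handle_t *h)
    {
        free(h->object);
        h->object = 0;         /* the handle itself stays allocated... */
    }

    void *zb_deref_checked(handle_t *h)
    {
        if (h->object == 0) {  /* ...so a dangling access is caught here */
            fprintf(stderr, "access to deallocated object\n");
            abort();
        }
        return h->object;
    }

    /* The handle is reclaimed only once the GC has proved that no
       references to it remain. */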

Downsides: the major weakness of ZetaBase, at least from an OODB point of
view, is the poor support for small transactions.  Not only is locking done on
a segment basis -- extremely coarse-grained -- but committing changes is a
relatively expensive operation.  ZetaBase is designed for small numbers of
large transactions, not the other way around.  I don't think this is going to
be a problem very often, but it does admittedly leave the system short of the
full generality one would like.

It also must be said that, while ZetaBase can be used with just about any safe
language, Lisp is not the very best match as a host language.  First, Lisp
likes to use lots of itty bitty objects (conses, flonums, bignums, etc.), for
which the space overhead imposed by the use of handles starts to become
substantial.  And secondly, as a dynamically typed language, Lisp is forced to
invoke the write barrier on almost all writes to memory; by way of contrast, a
statically typed language like Java can know at compile time that a given
write is storing a non-pointer, which doesn't require use of the write
barrier.
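
Here is the contrast in a nutshell (illustrative C, with a made-up barrier
routine):

    typedef struct object object_t;

    /* Hypothetical write barrier for pointer stores. */
    void write_barrier(object_t **slot, object_t *value);

    typedef struct point {
        double    x, y;        /* raw, non-pointer fields */
        object_t *label;       /* a pointer field */
    } point_t;

    void set_x(point_t *p, double v)
    {
        p->x = v;              /* statically a non-pointer store: no barrier */
    }

    void set_label(point_t *p, object_t *label)
    {
        write_barrier(&p->label, label);   /* pointer store: barrier needed */
    }

    /* A dynamically typed implementation usually cannot tell at compile
       time whether a slot will hold a pointer or an immediate, so nearly
       every store has to go through the barrier (or at least test a tag
       at run time). */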

Implementation status: I'm a little unclear on this since it's been a while
since I've worked on it, but as I recall, a lot of the basic things are
working: the GC, intersegment pointer tracking, mapping segments in and
writing them out.  The work remaining to be done is of three major kinds.  One
kind is code to handle various kinds of special cases, such as what happens
when two segments want to use the same memory page.  This kind of code won't
need to reach its full complexity until the system is fairly heavily stressed
-- that is, until you have lots of segments mapped into one address space
and/or lots of processes mapping the same segments.  The second kind consists
of various
support utilities, notably an "offline" GC (meaning one that operates on
segments not currently in use) that can collect several segments
simultaneously (the only way to collect trans-segment circular garbage, since
the reference counts in a cross-segment cycle never drop to zero) and
perform certain optimizations that can't be done on segments while they're in
use.  And the third piece of work consists of designing and implementing
certain high-level aspects such as the transaction model.

Finally, there is a fair amount of work to do to incorporate ZetaBase into a
language implementation.  The compiler, in particular, has to know quite a bit
about what is going on, and has to inform the GC about the structure of stack
frames and any other stuff it creates.  If threads are to be persistent, which
I think they should be, this has to be designed in as well.  The combination
of handles, multithreading, and incremental GC poses a nasty little problem,
which is that an object can get moved between the instruction that loads its
actual address from the handle and the instruction that uses that address;
the compiler and runtime have to guard against this.
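
In sketch form, the hazard is this (hypothetical representation):

    typedef struct handle { void *object; } handle_t;

    void store_byte(handle_t *h, int i, char b)
    {
        char *base = (char *)h->object; /* (1) load the current address */
        /* If another thread's GC increment copies the object between (1)
           and (2), `base' is stale and the store lands in the old copy. */
        base[i] = b;                    /* (2) use the address */
    }

    /* Typical guards: make sure no GC safepoint can fall between (1) and
       (2), pin the object around the access, or have the compiler re-load
       the address after any point at which the collector could have run. */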

As for operating system requirements, ZetaBase requires at the very least a
lower layer that provides the various memory mapping primitives it uses; any
modern Unix qualifies.  I think, though I'm not entirely sure, that there are
additional benefits, both semantic and performance-wise, to be had by running
on top of a Mach-based system like MkLinux.  It's my hunch that there are yet
additional benefits to be gained by running on top of the Hurd, but I haven't
looked into this.

By now you may be wondering whether I'd be willing to let LispOS use ZetaBase.
Here's the situation.  I have some AI research I want to do which requires a
high-performance persistent Lisp with first-class continuations.  A persistent
version of T, the Yale dialect of Scheme, would do nicely.  However, I won't
have time in the foreseeable future to build this myself (alas; it would be
fun).  If someone out there is willing and able to do it, I will make the
ZetaBase sources available, under the condition that I reserve the right to
charge a license fee for commercial use.

Now frankly, I think this would be an excellent plan for at least the first
iteration of LispOS (you might well want to layer CL on top of T, but that
shouldn't be too hard).  I will explain why in a subsequent message.  For now
I just want to make clear that this is something that would benefit me and
which I would therefore be willing to support in this way.

I would be happy to explain more about how ZetaBase works; I've tried to keep
this message relatively brief, at the risk of making it too brief.

-- Scott