[gclist] using two types of garbage collectors for scheme interpreter

Ji-Yong D. Chung virtualcyber@erols.com
Fri, 20 Apr 2001 07:31:10 -0400


My basic question is:

    Would it be a good idea to use
both a copying collector and Boehm's
collector for a single Scheme interpreter?
(a general question).

    It seems that a Scheme interpreter
uses the garbage-collected heap (1) almost
like a stack, by constantly creating short-lived
objects needed simply to run the program, in
addition to (2) using the heap as a place to
create longer-lived "data objects."
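
    To make point (1) concrete: in a straightforward
interpreter, every procedure application heap-allocates
an environment frame that becomes garbage as soon as
the call returns, so the heap really is being used like
a call stack.  A minimal sketch in C, where heap_alloc,
bind_args and eval_body are all invented names for
illustration, not code from any particular system:

    #include <stddef.h>

    typedef struct obj obj;
    struct frame {
        struct frame *parent;     /* lexical link      */
        obj          *slots[4];   /* argument bindings */
    };

    /* Helpers assumed to exist elsewhere in the interpreter. */
    extern void *heap_alloc(size_t n);
    extern void  bind_args(struct frame *, obj *, obj **, int);
    extern obj  *eval_body(obj *closure, struct frame *env);

    obj *apply(obj *closure, obj **args, int nargs)
    {
        /* One heap allocation per call; it dies when the call
         * returns, unless the body captures it in a closure.  */
        struct frame *env = heap_alloc(sizeof *env);
        bind_args(env, closure, args, nargs);
        return eval_body(closure, env);   /* may build long-lived data */
    }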

    This suggests that using two types of
memory allocator (and thus two types of
garbage collector) in the interpreter may improve
performance: one collector for "running" the
interpreter itself, and another (Boehm's collector)
for long-lived data objects.
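
    Concretely (and purely as a sketch with invented
names, assuming the interpreter is written in C and
links against Boehm's libgc), the split might amount
to two allocation entry points:

    #include <gc.h>       /* Boehm-Demers-Weiser collector: GC_MALLOC() */
    #include <stddef.h>

    /* Hypothetical bump allocator backed by the Cheney copying
     * collector; its fast path is sketched further below.      */
    extern void *nursery_alloc(size_t n);

    /* Short-lived objects needed just to run the program. */
    void *alloc_transient(size_t n) { return nursery_alloc(n); }

    /* Long-lived "data objects" the program itself builds. */
    void *alloc_data(size_t n)      { return GC_MALLOC(n); }

    The hard part, of course, is deciding at each
allocation site which category an object belongs to,
since lifetimes are not always obvious at the point
of creation.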

    I just wondered what others
thought of the idea.  Perhaps the
preceding idea is not so good,
given that Boehm's collector
collects young objects first.

============================

    Here is some background
information.

    In setting up my baby Scheme
interpreter, I have used both Boehm's
collector and a simple Cheney-type
collector, each in turn, to see how
the interpreter performs.

    Because, at the time of testing, my
interpreter did not optimize tail calls or the
creation of closures (and, in addition, there is
no unboxing), I was creating far too many objects.
(For a single test run, I was creating 4 million
objects, whereas when I ran another interpreter,
it allocated only about 400,000 objects.)

    The result of all this object creation was that
the garbage collectors' memory allocators
(whether the Cheney-style collector's or
Boehm's) were pushed to the limit.
Put another way, the interpreter spent a great
deal of its time allocating and deallocating.

    Cheney's collector spent, at best, 4 instructions
per memory allocation.  Boehm's collector, from
what I gather, spends about 20 instructions
on its shortest path to allocate an object.
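
    For what it is worth, the copying collector's
number is plausible because its allocation fast path
is just a limit check and a pointer bump, roughly as
in the sketch below (my own simplification; alloc_slow
is a hypothetical out-of-line routine that collects and
retries).  Boehm's allocator, as I understand it, has
to index a size-segregated free list and handle the
empty-list case, so its shortest path is naturally
longer.

    #include <stddef.h>

    static char *heap_ptr;              /* next free byte in the nursery */
    static char *heap_limit;            /* end of the nursery            */

    extern void *alloc_slow(size_t n);  /* hypothetical: collect, retry  */

    static void *nursery_alloc(size_t n)
    {
        char *p = heap_ptr;
        if (p + n > heap_limit)         /* compare, branch               */
            return alloc_slow(n);
        heap_ptr = p + n;               /* add, store                    */
        return p;                       /* a handful of instructions     */
    }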

    Because the interpreter was almost exclusively
creating objects that were immediately garbage
collected, Cheney's collector was faster at both
allocation and garbage collection.

    In other words, the short-lived memory
allocation and deallocation needed to run
the Scheme interpreter (e.g., to pass certain kinds
of variable values, or to create short-lived closures
whose lifespan cannot be determined at "compile"
time) seems to be better served by Cheney's algorithm,
despite the fact that Boehm's collector can use a
generational collection method.

    Obviously, when a Scheme interpreter runs,
it has to create data objects whose lifespan is much
longer than one garbage collection cycle.  As the
lifespans of large objects increase, I can see the
copying collector starting to have difficulty,
because of the amount of copying it has to do.
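
    That is just how Cheney's algorithm behaves: the
cost of a collection is proportional to the data that
survives, because only live objects are copied and
scanned, while garbage is never touched.  Here it is
in miniature, restricted to fixed-size two-field cells
purely to keep the sketch short (a real collector also
swaps the semispaces afterwards and handles
variable-sized, tagged objects):

    #include <stddef.h>

    typedef struct cell { struct cell *car, *cdr; } cell;

    #define SEMI 4096
    static cell fromspace[SEMI], tospace[SEMI], *freep;

    static int in_from(cell *p) { return p >= fromspace && p < fromspace + SEMI; }
    static int in_to(cell *p)   { return p >= tospace   && p < tospace   + SEMI; }

    /* Evacuate one live cell, leaving a forwarding pointer behind. */
    static cell *copy(cell *p)
    {
        if (!in_from(p))   return p;        /* not a nursery pointer */
        if (in_to(p->car)) return p->car;   /* already forwarded     */
        cell *q = freep++;
        *q = *p;
        p->car = q;                         /* forwarding pointer    */
        return q;
    }

    void cheney_collect(cell **roots, size_t nroots)
    {
        size_t i;
        cell *scan;

        freep = tospace;
        for (i = 0; i < nroots; i++)        /* copy the roots */
            roots[i] = copy(roots[i]);
        for (scan = tospace; scan < freep; scan++) {
            scan->car = copy(scan->car);    /* breadth-first scan of   */
            scan->cdr = copy(scan->cdr);    /* the objects just copied */
        }
        /* Work done is proportional to the surviving cells;
         * dead cells in from-space were never even looked at. */
    }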

    The preceding observations lead to the following
thought.  Maybe one should use two types of collectors
for the interpreter: one for just "purely running" the
interpreter itself (Cheney's collector here), and
another (Boehm's mark-and-sweep collector) for
somewhat longer-lived objects.
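
    One way to glue the two together, sketched below
with invented names, would be to have the Cheney pass
"tenure" objects that survive a few copies by evacuating
them into Boehm's heap with GC_MALLOC, so that they stop
being recopied and are mark-swept instead.  The age
field, the threshold of 2, and tospace_alloc are all
assumptions of the sketch, not anything from an existing
system.

    #include <gc.h>         /* GC_MALLOC() from Boehm's collector */
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical object header used by the copying collector. */
    typedef struct header {
        size_t         size;      /* total object size in bytes       */
        unsigned       age;       /* collections survived so far      */
        struct header *forward;   /* forwarding pointer, if evacuated */
    } header;

    extern void *tospace_alloc(size_t n);  /* hypothetical: bump into to-space */

    /* Called by the Cheney pass for each live object it reaches. */
    static header *evacuate(header *h)
    {
        header *dst;

        if (h->forward)                    /* already moved this cycle */
            return h->forward;

        if (h->age >= 2)                   /* tenure threshold (an assumption) */
            dst = GC_MALLOC(h->size);      /* promote into Boehm's heap   */
        else
            dst = tospace_alloc(h->size);  /* keep copying in the nursery */

        memcpy(dst, h, h->size);
        dst->age = h->age + 1;
        return h->forward = dst;
    }

    The tricky part, I suspect, is that the two
collectors then have to know about each other's
pointers: references from Boehm's heap into the nursery
must be found and updated when nursery objects move,
and references from the nursery into Boehm's heap must
be visible to Boehm's collector (for instance, by
keeping the nursery in memory it already scans as
roots).  That bookkeeping could eat into whatever the
split buys.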

-------------------------------------------

    Is this a good idea??