[virtmach] VM with garbage collection
Stephan H.M.J. Houben
Fri, 11 Jan 2002 11:02:50 +0100
I am trying to build a VM for a garbage collected language.
The byte code will typically contain many references to
garbage collected objects. This means that the GC must be
somehow able to distinguish bytecode instructions from
pointers to GC-able objects.
Some approaches I have considered:
1. Since I am planning to represent small integers as tagged
words, I could represent every bytecode by such a tagged small
integer. This means that every opcode takes a full word, but
it is very simple for the GC, since it can treat a code vector
just like an ordinary array. In fact, in this scheme, a code
vector would *be* an ordinary array.
However, it seems somewhat wasteful of space.
2. Apparently, the Smalltalk 80 VM splits a code vector in 2 parts:
a part with the byte codes and a "constant vector" containing
pointers to GCable objects. The bytecode contains indexes into
the constant vector. This leads to 1 extra redirection whenever
a GCable object has to be found. Moreover, it leads to problems
when there are more than 256 constants in a procedure.
3. One could also directly embed the pointers in the instruction stream
and have a separate table that gives the offsets of all pointers, for
use by the GC. This is more complex and may lead to pointers not being
put on aligned addresses.
4. As in 3., but observe that if the GC understands the instruction format,
it doesn't need the table but can find the pointers itself. This is very complex.
I am leaning towards method 1. Any ideas if I'm going to lose big on space or time
efficiency? Any better ideas?