[virtmach] VM with garbage collection

Stephan H.M.J. Houben virtmach@iecc.com
Fri, 11 Jan 2002 11:02:50 +0100

Hello list,

I am trying to build a VM for a garbage collected language.
The byte code will typically contain many references to
garbage collected objects. This means that the GC must be
somehow able to distinguish bytecode instructions from
pointers to GC-able objects.

Some approaches I have considered:
1. Since I am planning to represent small integers as tagged
   words, I could represent every bytecode by such a tagged small
   integer. This means that every opcode takes a full word, but
   it is very simple for the GC, since it can treat a code vector
   just like an ordinary array. In fact, in this scheme, a code 
   vector would *be* an ordinary array.

   However, it seems somewhat wasteful of space.

2. Apparently, the Smalltalk 80 VM splits a code vector in 2 parts:
   a part with the byte codes and a "constant vector" containing 
   pointers to GCable objects. The bytecode contains indexes into
   the constant vector. This leads to 1 extra redirection whenever
   a GCable object has to be found. Moreover, it leads to problems
   when there are more than 256 constants in a procedure.

3. One could also directly embed the pointers in the instruction stream
   and have a separate table that gives the offsets of all pointers, for
   use by the GC. This is more complex and may lead to pointers not being
   put on aligned addresses.

4. As in 3., but observe that if the GC understands the instruction format,
   it doesn't need the table but can find the pointers itself. This is very complex.

I am leaning towards method 1. Any ideas if I'm going to lose big on space or time
efficiency? Any better ideas?