[virtmach] VM with garbage collection

Durchholz, Joachim virtmach@iecc.com
Fri, 11 Jan 2002 11:27:43 +0100


> 1. Since I am planning to represent small integers as tagged
>    words, I could represent every bytecode by such a tagged small
>    integer. This means that every opcode takes a full word, but
>    it is very simple for the GC, since it can treat a code vector
>    just like an ordinary array. In fact, in this scheme, a code 
>    vector would *be* an ordinary array.
> 
>    However, it seems somewhat wasteful of space.

You could put several bytecodes into one pointer-sized word and let the
associated pointers follow. 

> 2. Apparently, the Smalltalk 80 VM splits a code vector in 2 parts:
>    a part with the byte codes and a "constant vector" containing 
>    pointers to GCable objects. The bytecode contains indexes into
>    the constant vector. This leads to 1 extra redirection whenever
>    a GCable object has to be found. Moreover, it leads to problems
>    when there are more than 256 constants in a procedure.

Not necessarily. You can use a "constants pointer" that iterates through the
constants array as the instructio pointer progresses.

In both of the above schemes, you need to resynchronize the constants
pointer whenever you branch. IOW every branch target needs a
synchronization. For variant 1, it would suffice to require that branch
targets are at a word boundary. Alternatively, every branch target in the
code could have a reference into the constants array so that the constants
pointer can be reinitialized from it.

Having an extra indirection as for a naive Smalltalkish implementation would
probably be too much overhead.

> 3. One could also directly embed the pointers in the 
>    instruction stream
>    and have a separate table that gives the offsets of all 
>    pointers, for
>    use by the GC. This is more complex and may lead to 
>    pointers not being
>    put on aligned addresses.

Avoid pointers on nonaligned addresses. Most modern processors penalize them
heavily, some even disallow them.
Besides, making GC refer to an extra table to find out which values are
references slows down the mark phase by a considerable amount. This is a
general problem that also applies to data blocks that contain pointer data
and nonpointer data. (You should read on the GC literature to find out
what's the state of research here. Even better, use an existing GC and
tailor your VM to work well with that GC. The Boehm/Demers/Weiser collector
is what's mentioned most often in this context.)

Regards,
Joachim
--
This is not an official statement from my employer.