Some notes concerning VM :)

sig siguctua at gmail.com
Sat Apr 28 19:25:31 PDT 2007


Hello, I'm new here and have a particular interest in developing a Slate VM.
I haven't decided yet whether I will participate in this project in the
near future, so for now I just want to discuss some of my ideas with you.

Concerning the OMM (object memory model) and stack organization.
I'm referring to the tables you can see on this page:
http://slate.tunes.org/doc/mobius/

5.1.1 stack format.

Table 3. Block format.

I see little reason why selectors must be kept apart from literals.
The only place I see where they are handled differently is at the
interpretation stage, which uses a separate 'Load Selector' opcode.
If there's no extra processing beyond that, we can merge the selectors
into the literals slot and remove the 'Load Selector' opcode.

This is the code showing that they are handled in the same manner:

static INLINE void PSInterpreter_loadLiteral_(struct Interpreter * i,
                                              unsigned long int n)
{
  PSInterpreter_stackPush_(i, (((i -> method) -> literals) -> elements)[n]);
}

static INLINE void PSInterpreter_loadSelector_(struct Interpreter * i,
                                               unsigned long int n)
{
  PSInterpreter_stackPush_(i, (((i -> method) -> selectors) -> elements)[n]);
}

Maybe I'm not seeing potential optimization patterns which could make use
of such a distinction? Please let me know.

The same goes for the 'Push Integer' opcode. Encoding an integer constant
in the bytecode may be somewhat faster than an extra lookup in the
literals array, but only when the constant is between 0 and 14; otherwise
it requires extra processing: loading the next byte from the opcode
array, adding 15 to its value, and also checking whether its high bit
is 0...
This extra processing may be even more costly than simply loading the
literal from the literals array.
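To make the cost argument concrete, here is a minimal sketch of the two
decode paths described above. The function name and the exact encoding
(a 4-bit immediate field, value 15 signalling an extension byte whose
value is added to 15) are my assumptions for illustration, not taken
from the Slate sources:

```c
#include <assert.h>

/* Hypothetical sketch: values 0..14 fit in the opcode's immediate field;
 * anything larger needs a follow-up byte whose value is added to 15. */
enum { IMMEDIATE_MAX = 14 };

static int decode_push_integer(const unsigned char *stream, int *consumed)
{
    int imm = stream[0] & 0x0F;          /* low nibble = immediate value */
    if (imm <= IMMEDIATE_MAX) {
        *consumed = 1;                   /* fast path: one byte, no work */
        return imm;
    }
    /* slow path: immediate == 15 means "read the next byte and add 15" */
    *consumed = 2;
    return 15 + stream[1];
}
```

The slow path is an extra load, an add, and an extra branch per
instruction — which is the work a plain literals-array lookup avoids.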

You may say that merging all these opcodes to use a single literal array
will lead to a higher number of literals per method, which in turn leads
to a higher probability of a literal index greater than 14 in the
opcodes, and thus will increase both overall instruction processing time
and bytecode size.
But we have 2 unused bytecodes, yes? Then why can't we reuse them to
load literals with an index higher than 14, which already does not fit
in a single-byte opcode?
So, for high literal indexes we can introduce 'Load Literal2', and even
'Load Literal3': the first takes the index from the next byte of the
instruction stream, the second takes the next two bytes, for insanely
big indexes higher than 255.
Both of these instructions can be placed under a single opcode but with
different immediate values, which tell the interpreter how many bytes in
the instruction stream are reserved for the index.
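A sketch of what the shared decode for such an opcode could look like,
where the immediate value directly gives the number of index bytes that
follow (function and parameter names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative decode for a combined Load Literal2/Load Literal3 opcode:
 * 'index_bytes' comes straight from the opcode's immediate value, so the
 * interpreter just reads that many bytes — no range checks, no adding 15. */
static size_t read_literal_index(const unsigned char *stream,
                                 unsigned index_bytes,
                                 size_t *consumed)
{
    size_t index = 0;
    for (unsigned k = 0; k < index_bytes; ++k)
        index = (index << 8) | stream[k];   /* big-endian assembly */
    *consumed = index_bytes;
    return index;
}
```

With 1 index byte this covers indexes up to 255, with 2 bytes up to
65535, and the interpreter's work is identical in both cases.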
Honestly, I don't care much about a compact bytecode representation; I
care about speed and uniform data organization.
Using an immediate value in the opcode which, if > 14, must be added to
the next byte requires additional processing — adding 15 to the next
byte — instead of just loading the next byte(s) and using them directly
as the final index value.
I'd rather use the 4 bits of the immediate value to indicate the exact
number of bytes in the bytecode stream reserved for the index value,
instead of using the immediate value directly and also checking the
following bytes for the high bit (0x80) and so on.

Why must the interpreter spend its time calculating the index value (see
the PSInterpreter_decodeImmediate function), when its value is already
known at compile time and we can already say whether it fits in 1 byte,
2 bytes, or 3?
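In other words, the width decision can be made once, on the compiler
side. A hypothetical helper (not from the Slate compiler) makes the
point — the interpreter then never computes a width at run time:

```c
#include <assert.h>

/* Hypothetical compiler-side helper: the literal index is known at
 * compile time, so the compiler picks the narrowest encoding once. */
static unsigned index_width_in_bytes(unsigned long index)
{
    if (index <= 14)   return 0;  /* fits in the opcode's immediate    */
    if (index <= 0xFF) return 1;  /* 'Load Literal2': one index byte   */
    return 2;                     /* 'Load Literal3': two index bytes  */
}
```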


The next thing concerns stack organization.

Those of you who know the needs of Slate better than me may find this
unsuitable, less effective, or more complex. But first, read to the
end. :)

Let's review the Frame Format (in Table 2).
The need for a stack frame is the need to track the current context, the
return point, the current instruction pointer, and the previous stack
frame to which we must return on a local return.
Other values on the stack, like arguments and locals, are pushed by
opcode instructions, and their number depends on the current execution
point, context, etc.
But frames always have a fixed number of fields.
At some point I asked myself: what if we organize a separate stack for
frames? Could it improve execution speed as well as reduce complexity of
some sort?

So, let's divide our stack into two parts: in the first part we store all
variables used by methods/blocks, while in the second we store only
stack frames, which have a fixed size.
First, and most evident: we get much simpler traversal between stack
frames. Since they have a fixed size, we can traverse the frame stack by
simply adding/subtracting a constant value from the frame stack pointer.
Second: we don't need a Pop instruction in the bytecode. Pop is used to
remove values from the stack when returning from a method/block closure,
but our stack frame already knows its return point, so on returning from
a method we discard all values up to the point stored in the frame's
return point. Simple.
Third: accessing the lexical scope from nested blocks and/or checking
for an expired context requires much less processing.
Fourth: most of the variables responsible for the current interpretation
state can be moved from the Interpreter struct into the Frame struct,
leaving only a pointer to the current Frame. Then we don't need to
reload these variables each time we enter/return from a method: we
simply initialize them for the new Frame on entry and decrement the
Frame pointer on exit. No extra processing.
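The scheme above could be sketched like this. All field and function
names here are illustrative assumptions, not the real Slate structures;
the point is only that frames are fixed-size and that a return reduces
to one pop plus cutting the value stack back:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a separate, fixed-size frame stack. */
struct Frame {
    size_t return_sp;              /* value-stack height to restore   */
    const unsigned char *saved_ip; /* caller's instruction pointer    */
    void *lexical_context;         /* enclosing scope for blocks      */
};

enum { MAX_FRAMES = 1024 };

struct FrameStack {
    struct Frame frames[MAX_FRAMES];
    size_t top;                    /* index of next free frame slot   */
};

/* Entering a method: push one fixed-size frame (constant-offset step). */
static struct Frame *frame_push(struct FrameStack *fs, size_t value_sp)
{
    struct Frame *f = &fs->frames[fs->top++];
    f->return_sp = value_sp;
    f->saved_ip = NULL;
    f->lexical_context = NULL;
    return f;
}

/* Returning: pop the frame and report where the value stack should be
 * cut back to — no per-slot Pop instructions are needed. */
static size_t frame_pop(struct FrameStack *fs)
{
    return fs->frames[--fs->top].return_sp;
}
```

Walking to an enclosing frame is then just `fs->frames[fs->top - k]`,
which is the constant-offset traversal argued for above.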

--- There are so many things left to discuss, but I must go to sleep :)

I hope my ideas were worth reading :)

regards,
sig (Igor Stasenko)
