Some notes concerning VM :)

sig siguctua at gmail.com
Sun Apr 29 21:50:54 PDT 2007


> Well, I can say that for introspection, it is easier to tell what
> selectors a method sends if they are in a separate array, so easier
> to track down a call graph without referring to source code. Granted,
> in the general case, plenty of those literal Symbols are also used by
> sendTo: and variants, or Symbols can be generated at run-time by the
> method, but this makes the base case easier. Also granted, just doing
> "method literals select: [| :each | each is: Symbol]" isn't that much
> harder (although is: is kind of expensive in aggregate). So I
> wouldn't fight much for this one.

As I see it, the CompiledMethod structure holds a sourceTree object,
which I suppose holds exact answers on which literals are selectors and
which are symbols or something else.
I'm just against sacrificing the simplicity of the runtime model in
favor of analysis that can easily be made accessible through other
means.

> > Why must the interpreter spend its time calculating an index value
> > (see the PSInterpreter_decodeImmediate function) when that value is
> > already known at compilation time and we can already say whether it
> > fits in 1, 2, or 3 bytes of code?
>
> These are good points, and worth considering, but I think it'd be
> worth some profiling time to know what the real costs/benefits are.
>

My approach is simple: minimize the number of operations required to
evaluate a single opcode instruction. The fewer we need, the more we
gain.
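
For example, instead of decoding a variable-width literal index at run
time, the compiler can emit an opcode specialized for the width it
already knows, so the dispatch loop just reads a fixed number of bytes.
A minimal C sketch (the opcode names and structures here are mine, for
illustration only, not the actual Slate VM's):

    #include <stdint.h>

    enum { OP_PUSH_LIT_1, OP_PUSH_LIT_2, OP_HALT };

    typedef struct {
        uint8_t  *ip;        /* instruction pointer into the bytecode */
        void    **literals;  /* the method's literal frame            */
        void    **sp;        /* data stack pointer, grows upward      */
    } Interp;

    static void run(Interp *i)
    {
        for (;;) {
            switch (*i->ip++) {
            case OP_PUSH_LIT_1:            /* index fits in 1 byte */
                *i->sp++ = i->literals[i->ip[0]];
                i->ip += 1;
                break;
            case OP_PUSH_LIT_2:            /* index fits in 2 bytes */
                *i->sp++ = i->literals[i->ip[0] | (i->ip[1] << 8)];
                i->ip += 2;
                break;
            case OP_HALT:
                return;
            }
        }
    }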

> > The next thing concerns stack organization.

> Yes, these things are all true and your analysis seems sound.
>
> I think the balancing factor which Lee had in mind was the ability to
> share the stack format between the bytecode system and the more
> complex binary compiler run-time. I don't recall his reasoning on
> this, or how well he documented it, however. Perhaps simply by
> knowing this, you might see where his mind was going when he made
> these decisions.
>

Using one stack to run both native CPU code and Slate bytecode is
dangerous. In case of an exception it may be too hard to unwind the
stack to some safe point. If you know how this can be done safely, tell
me. I don't think it's impossible, it just demands some research.

Also, the GC at its mark phase must know the set of its root objects,
which means it must scan the stack to determine the roots. And it can
be too hard and expensive to scan a stack where real objects are mixed
with foreign data.
But! If we use dual stacks this becomes quite possible, because each
stack frame holds its starting and ending positions in the data stack
(the point where its locals and pushed values end; anything beyond that
point is of no interest to the GC).
Then we can use the data stack to operate on anything we want; we just
need to ensure that we don't exceed its size when executing system/VM
subroutines.
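
Roughly, in C (the structure layout here is mine, only to illustrate
the idea):

    #include <stddef.h>

    typedef struct Object Object;

    /* One record per frame on the control stack. */
    typedef struct Frame {
        struct Frame *caller;
        Object **dataStart;  /* first slot this frame owns on the data stack */
        Object **dataEnd;    /* one past its last live local/pushed value    */
    } Frame;

    /* Mark phase: only slots inside [dataStart, dataEnd) are roots;
       whatever the VM scribbled beyond dataEnd is simply ignored. */
    static void markStackRoots(Frame *top, void (*mark)(Object *))
    {
        for (Frame *f = top; f != NULL; f = f->caller)
            for (Object **slot = f->dataStart; slot < f->dataEnd; slot++)
                mark(*slot);
    }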

And one more thing: we don't need to check the stack boundaries each
time we push something on the stack. The maximum stack depth can easily
be calculated by the compiler for each compiled method, so on entering
a method we just need to ensure that the stack can hold this maximum.
This value can be encoded in the first bytecode instruction of the
method, or simply stored in an extra field of the CompiledMethod
struct.
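
Something like this in C (the field and function names are mine, just
to illustrate the idea, not the actual Slate VM's):

    typedef struct {
        unsigned maxStackDepth;   /* filled in by the compiler */
        /* ... bytecode, literal frame, etc. ... */
    } CompiledMethod;

    typedef struct {
        void **sp;          /* current top of the data stack */
        void **stackLimit;  /* one past the last usable slot */
    } VM;

    /* One bounds check on method entry replaces a check on every
       push: the body may then push up to maxStackDepth slots
       unchecked. */
    static int enterMethod(VM *vm, CompiledMethod *m)
    {
        if (vm->sp + m->maxStackDepth > vm->stackLimit)
            return 0;  /* caller grows the stack or signals overflow */
        /* ... build the frame and start interpreting m's bytecode ... */
        return 1;
    }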


