UVM and Microsoft
Michael Korns
mkorns@ix.netcom.com
Mon, 19 May 1997 14:59:47 -0700
John,
> When the code can run in-line, the argument passing is
> no longer an issue. Many other logistical issues arise
> however, such as how we store the implementation -
> and easier ways to produce the implementation than
> using assembly.
>
> Each 32 bit VM opcode is broken into four bytes as follows:
> opcode inlineArgModifier1 inlineArgModifier2 inlineArgModifier3
> <<
>
> What is your meaning of "inlineArgModifier" here? Do
> you mean a definition of how the arguments are
> used in terms of when the actual instruction is
> invoked?
These implementation details are relevant to fast VM emulation only. For
instance, the following hypothetical byte encoded VM instruction has one
in-line argument:
pushFrameInteger source
At VM emulation time, this instruction is represented by the following
hypothetical byte stream:
A2 0E
The VM emulator (hypothetically written here in C) appears as follows:
    int* VMRegister[_REGISTERCOUNT];  /* Array of VM register pointers */
    unsigned char* Ip;                /* VM instruction pointer (byte stream) */

    switch (*(Ip++))                  /* Jump to the proper VM instruction */
    {
    case pushFrameInteger:  /* Push Frame Integer onto Stack. Frame
                               register displacement is an inline argument. */
        *(VMRegister[_STACK]++) = VMRegister[_FRAME][*(Ip++)];
        break;
    }
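
For anyone who wants to run the idea, here is a minimal self-contained sketch
of the same dispatch loop. Only the opcode value A2 and the displacement 0E
come from the byte stream above. The names run_bytecode, OP_HALT and
OP_PUSH_FRAME_INTEGER, the halt opcode itself, and the bounds checks are
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode values. 0xA2 matches the byte stream "A2 0E";
   OP_HALT is invented here so the loop has a way to stop. */
enum { OP_HALT = 0x00, OP_PUSH_FRAME_INTEGER = 0xA2 };

/* Runs byte-encoded code until OP_HALT or an unknown opcode.
   Returns the number of integers pushed onto the stack. */
size_t run_bytecode(const unsigned char *ip,
                    const int *frame, size_t frame_len,
                    int *stack, size_t stack_cap)
{
    int *sp = stack;
    for (;;) {
        switch (*ip++) {                 /* jump to the proper VM instruction */
        case OP_PUSH_FRAME_INTEGER: {
            unsigned char disp = *ip++;  /* in-line argument: frame displacement */
            assert(disp < frame_len);
            assert((size_t)(sp - stack) < stack_cap);
            *sp++ = frame[disp];
            break;
        }
        case OP_HALT:
        default:
            return (size_t)(sp - stack);
        }
    }
}
```

Feeding it the bytes A2 0E 00 pushes the integer stored at frame displacement
0E and then stops.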
As we discussed in our previous email, word encoded VM's emulate faster
because it often takes four byte encoded VM instructions to do the work of one
word encoded VM instruction. Of course, word encoded VM instructions have
more in-line arguments plus argument modifiers.
> ie. could an arg modifier dictate that the
> arg is placed into a certain register? No?
Yes, that is exactly correct. Or, displaced off a certain register.
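
A rough sketch of how a word encoded instruction and its argument modifiers
might be decoded. The four-byte split (opcode plus three modifiers) is from
the discussion above. Everything else is an assumption for illustration: the
opcode sitting in the high byte, the modifier layout (high nibble = mode, low
nibble = register number), the two modes, and the names DecodedWord,
decode_word and resolve_operand.

```c
#include <stdint.h>

typedef struct {
    uint8_t opcode;
    uint8_t mod[3];   /* inlineArgModifier1..3 */
} DecodedWord;

/* Assumed layout: opcode in the most significant byte. */
DecodedWord decode_word(uint32_t w)
{
    DecodedWord d;
    d.opcode = (uint8_t)(w >> 24);
    d.mod[0] = (uint8_t)(w >> 16);
    d.mod[1] = (uint8_t)(w >> 8);
    d.mod[2] = (uint8_t)w;
    return d;
}

/* Hypothetical modifier modes. */
enum { MOD_REGISTER = 0, MOD_DISPLACED = 1 };

/* Returns a pointer to the operand selected by one modifier byte.
   MOD_REGISTER:  the operand is the register itself.
   MOD_DISPLACED: the operand is memory[register + displacement], where the
                  displacement is the next in-line word at *ip. */
int32_t *resolve_operand(uint8_t mod, int32_t *regs, int32_t *memory,
                         const uint32_t **ip)
{
    uint8_t mode = (uint8_t)(mod >> 4);
    uint8_t reg  = (uint8_t)(mod & 0x0F);
    if (mode == MOD_DISPLACED) {
        int32_t disp = (int32_t)*(*ip)++;   /* consume the in-line argument */
        return &memory[regs[reg] + disp];
    }
    return &regs[reg];
}
```

With this layout one 32 bit fetch yields the opcode and the addressing of all
three operands, which is where the saving over byte-at-a-time decoding comes
from.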
> And for added dynamics, perhaps even the ability to
> dynamically change instruction sets on the fly. There's a good
> reason for this - in a component-based environment, one
> component may be written in one language (ie. one model)
> and another component in another language (ie. another
> model).
Our current thinking is tending toward a VM for each Agent. This is not
very different from current thinking back in the LispOS group. They are
talking about VM's being first class LispOS objects. Maybe it is time for
both groups to attempt our first formal information exchange? How do we do
this?
AgentBase agents are cute little things which carry their own genomes and
self descriptions with them as they are distributed from one database
server to another. It would be relatively simple to add genetic information
about the Agent's VM and JIT compiler to the current AgentBase agent
structure. This would allow a separate VM for each child agent or for each
method of a class.
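
One way to picture a VM per agent is an agent record that carries a pointer
to its own instruction set. The sketch below is not the AgentBase agent
structure; the types VMState, InstructionSet and Agent, the two toy
instruction sets, and agent_step are all hypothetical.

```c
#include <stddef.h>

typedef struct { int acc; } VMState;             /* toy VM: one accumulator */
typedef void (*OpHandler)(VMState *vm, int arg);

typedef struct {
    const char *name;
    OpHandler   ops[4];                          /* opcode -> handler */
} InstructionSet;

typedef struct {
    const InstructionSet *iset;                  /* the agent's own VM definition */
    VMState vm;
} Agent;

static void op_add(VMState *vm, int arg) { vm->acc += arg; }
static void op_sub(VMState *vm, int arg) { vm->acc -= arg; }

/* Two hypothetical instruction sets that give opcode 1 different meanings. */
const InstructionSet SET_A = { "setA", { NULL, op_add } };
const InstructionSet SET_B = { "setB", { NULL, op_sub } };

/* Executes one instruction using the agent's own instruction set.
   Returns 0 on success, -1 if the opcode is not defined in that set. */
int agent_step(Agent *agent, unsigned opcode, int arg)
{
    if (opcode >= 4 || agent->iset->ops[opcode] == NULL)
        return -1;
    agent->iset->ops[opcode](&agent->vm, arg);
    return 0;
}
```

Two agents holding SET_A and SET_B interpret the same opcode differently, and
reassigning agent->iset switches instruction sets on the fly, much like the
"context" instruction John suggests.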
> >> You mentioned translation tables. I prefer rule base
> semantic analysis production systems. Since the database query language in
> AgentBase is Lisp, this is not a problem for us. It is easy to construct
> rules recognizing patterns of instructions to be translated into native
> binary bit streams. <<
>
> Can you expand on this? Would it handle
> storage optimisation? ie. if we wanted to produce our own,
> fully optimised, version of JVM bytecode spec - would
> it handle this?
AgentBase is an agent-oriented database server. It is designed for very
high volume databases, 1 million records up to 100 million records per
table. Recent research into clustered hardware configurations would
support a billion or more records per table. The database "query
language" is a flavor of Lisp which has been modified/extended to support
agent-oriented computing and database server activities in general. Agents
are expected to migrate from one server to another, across the Web, to
perform their analysis.
Because of the cheap availability of Lisp on the server-side, we routinely
use server agents which are chart parsers, forward rule production systems,
and semantic analyzers. We just invoke a specialized agent from the database
agent library. If one is not available, it's relatively easy to write
specialized server agents for such purposes.
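
To make "rules recognizing patterns of instructions" concrete, here is a much
simpler table-driven stand-in written in C. It is not the Lisp production
system described above. The Rule type, the translate function, the
longest-match policy, and any "native" bytes passed in are placeholders
invented for illustration, not real machine code.

```c
#include <stddef.h>
#include <string.h>

/* One rule: a sequence of VM opcodes and the bytes to emit when it matches. */
typedef struct {
    const unsigned char *pattern; size_t pattern_len;
    const unsigned char *native;  size_t native_len;
} Rule;

/* Translates code[] by repeatedly applying the longest rule that matches at
   the current position. Returns the number of bytes written to out, or
   (size_t)-1 if no rule matches or out is too small. */
size_t translate(const unsigned char *code, size_t code_len,
                 const Rule *rules, size_t rule_count,
                 unsigned char *out, size_t out_cap)
{
    size_t i = 0, n = 0;
    while (i < code_len) {
        const Rule *hit = NULL;
        for (size_t r = 0; r < rule_count; r++) {
            const Rule *c = &rules[r];
            if (c->pattern_len <= code_len - i &&
                memcmp(code + i, c->pattern, c->pattern_len) == 0 &&
                (hit == NULL || c->pattern_len > hit->pattern_len))
                hit = c;                 /* prefer the longest match */
        }
        if (hit == NULL || hit->pattern_len == 0 ||
            hit->native_len > out_cap - n)
            return (size_t)-1;
        memcpy(out + n, hit->native, hit->native_len);
        n += hit->native_len;
        i += hit->pattern_len;
    }
    return n;
}
```

A rule base adds conditions and semantic checks on top of this kind of
matching, but the core step is the same: recognize a run of VM instructions
and emit one native sequence for the whole run.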
> Could it handle diverse program-flow
> definitions? Such as adding specific extensions for
> large C style switch constructs? Or an event based
> program?
Yes, yes, and yes.
I am about to look at another approach to VM's known as Juice
(http://www.ics.uci.edu/~juice/). Let's talk further. This is getting
exciting.
----------
> From: John Wood <tenshon@msn.com>
> To: Michael Korns <mkorns@ix.netcom.com>
> Cc: Bill House <bhouse@dazsi.com>; Gilda Cabral <gcabral@dazsi.com>;
> lispvm@math.gatech.edu
> Subject: RE: UVM and Microsoft
> Date: Saturday, May 17, 1997 9:21 PM
>
> Michael, thanks for the reply. I've added some comments below -
>
> >> Our AgentBase Virtual Machine (ABVM) and the Java Virtual Machine (JVM)
> deliver highly efficient compiled code precisely because the VM
> instructions are constrained to a static set of low level instructions. In
> fact these low level instructions are precisely those which modern CISC
> cpu's can handle natively.<<
>
> May I add that if this UVM is actually "Universal" - it should be
> targetable to an ultimately universal range of processors.
> These processors may or may not be similar to our current
> CISC processors, and the target processor should be
> chronologically independent (if that's a valid term). By looking
> at the abstract definition of a processor, we can come up
> with something which meets these requirements not just
> today, but potentially for many years to come.
>
> >> A problem arises when we try to extend a word encoded VM with object
> methods calls. When we do this we get a degradation in run time speeds
> under emulation but not after compilation. This is because function and
> methods arguments cannot be handled in-line. <<
>
> You're assuming that the method will not run in-line. If
> it didn't run inline, then the JITC speeds would degrade
> to that of emulation because of the stack based calling
> for each instruction.
>
> We should be able to offer the ability to have an object-
> interface-orientated dynamic instruction set, and yet
> still retain the ability to effectively copy the code so that
> it runs in-line (in the case of JITC), or just stack-invoke
> it (call) in the case of emulation. Of course, in-line may
> not always be available - but it should certainly be an
> option.
>
> When the code can run in-line, the argument passing is
> no longer an issue. Many other logistical issues arise
> however, such as how we store the implementation -
> and easier ways to produce the implementation than
> using assembly.
>
> >>Each 32 bit VM opcode is broken into four bytes as follows:
> opcode inlineArgModifier1 inlineArgModifier2 inlineArgModifier3
> <<
>
> What is your meaning of "inlineArgModifier" here? Do
> you mean a definition of how the arguments are
> used in terms of when the actual instruction is
> invoked? ie. could an arg modifier dictate that the
> arg is placed into a certain register? No?
>
> >> So it would seem simple to use your idea of extending the word encoded VM
> by supplying loadable emulators (a new Class for each emulator i.e.
> instruction set). Then we get the following: extendedOpcode source1 source2
> target <<
>
> And for added dynamics, perhaps even the ability to
> dynamically change instruction sets on the fly. There's a good
> reason for this - in a component-based environment, one
> component may be written in one language (ie. one model)
> and another component in another language (ie. another
> model). This means that we may potentially have 2 or more
> copies of the VM going with 2 or more instruction sets loaded.
> Rather than doing it this way, we could have 1 copy of the VM
> loaded and both optimally and efficiently swap instruction sets
> with minimal overhead. I'd suggest a "context" instruction
> which sets the instruction-set class.
>
> btw, in your model - are the number of arguments
> constrained by design?
>
> >> You mentioned translation tables. I prefer rule base
> semantic analysis production systems. Since the database query language in
> AgentBase is Lisp, this is not a problem for us. It is easy to construct
> rules recognizing patterns of instructions to be translated into native
> binary bit streams. <<
>
> Can you expand on this? Would it handle
> storage optimisation? ie. if we wanted to produce our own,
> fully optimised, version of JVM bytecode spec - would
> it handle this? Could it handle diverse program-flow
> definitions? Such as adding specific extensions for
> large C style switch constructs? Or an event based
> program?
>
> >> Do we try to compile any language to all VM's? <<
>
> A compiler will output instruction-set specific code, or
> more accurately - code which reflects the model it is
> describing. It would be both inefficient and nonsensical
> to even attempt to compile one language to a foreign
> model. However, if we have 2 VMs which have similar
> models (eg. Java and Visual Basic) - we could use
> the VM to reverse-map instructions and perhaps use
> the information for optimisation (do you see where
> i'm coming from here?).
>
> >> Do we restrict
> compilation to language and VM pairs. If so, do all VM's have to be
> synchronized to have the same
> VM instruction (table, object, etc.) swap opcode? <<
>
> Can you explain the above a bit more?
>
> >> should I take more LSD? <<
>
> If it involves adding a "tangerine" instruction, then no.
>
> Thanks,
>
> John
>
>