Universal Virtual Machines

Ian Piumarta piumarta@prof.inria.fr
Sat, 28 Jun 1997 03:36:04 +0200


Mike,

I'm not sufficiently familiar with your intended approaches to be able
to say anything very concrete about them.  Instead I'll offer some
random (and subjective) thoughts on the issues.

> we are deliberating between two choices: create a UVM and port
> SmartLisp, Java, and Squeak to the UVM; or port the server-side
> portions of the Java and Squeak VM's into the AgentBase database
> server.
...
> Do you have any opinions on the matter that could save us grief
> later?

I'm afraid I have no easy answer about which approach would save you
the most grief.  I think it depends on what sort of grief you want to
avoid.

If I understand your goals correctly, you want to be able to download
server-side "agents" from clients written in several languages, in
bytecoded form, for execution in the server.

Porting the server-side portions of various VMs would undoubtedly be
the fastest route to interoperability.  The major challenges would be
environmental, rather than VM-related.  Smalltalk agents would expect
a significant amount of infrastructure (mainly a subset of the regular
class hierarchy) to exist in the server.  A useable subset of the
Smalltalk image would have to be identified, and bolted into the
server.  In addition to that, a de-facto "standard" would have to be
created for serialised Smalltalk code and data in the agents.  Java
and Lisp would be less trouble in this respect for various reasons:
neither of them takes objects and reflectivity to the extremes that
Smalltalk does (so a smaller intrinsic environment is needed to
support them); Lisp has (or can have) much less of an initial
environment in which to execute code; and Java already comes with
standards defining the basic environment and agent (applet) format.

The VM-related work, beyond the initial porting of the bytecode
interpreters, should be limited to interoperability between these
languages.  The difficulty of that depends on how transparent you want
to make the interaction between an agent and the underlying database,
and whether you also want to support interoperability between software
components written for different VMs.

None of the above should be an enormous amount of work, but the
"grief" is that a significant amount of the same process must be
repeated for each VM that you decide to support in the server.  (And
who's to say if Yet Another Internet Language isn't just around the
corner?)  The final artefact could be made to work like a UVM (but
limited by the amount of effort available to implement each specific
VM) if UVM state is compatible between VMs, and the execution context
knows which VM is appropriate for executing a particular activation.
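
To make the last point concrete, here is a minimal sketch in Python of
an execution context that picks the interpreter for each activation.
Everything in it is hypothetical: the two toy bytecode formats, the
Activation record, and the names run_toy_a, run_toy_b and execute are
invented for illustration and do not correspond to the Java, Squeak or
SmartLisp VMs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Activation:
    """Shared execution state that every interpreter understands."""
    vm: str                       # tag naming the VM that owns the code
    code: List[Tuple]             # bytecode in that VM's own encoding
    stack: List[int] = field(default_factory=list)


def run_toy_a(act: Activation) -> int:
    """Hypothetical VM 'a': instructions are ("push", n) or ("add",)."""
    for ins in act.code:
        if ins[0] == "push":
            act.stack.append(ins[1])
        elif ins[0] == "add":
            right, left = act.stack.pop(), act.stack.pop()
            act.stack.append(left + right)
        else:
            raise ValueError(f"unknown instruction {ins!r}")
    return act.stack.pop()


def run_toy_b(act: Activation) -> int:
    """Hypothetical VM 'b': (opcode, operand) pairs, 1 = push, 2 = multiply."""
    for opcode, operand in act.code:
        if opcode == 1:
            act.stack.append(operand)
        elif opcode == 2:
            right, left = act.stack.pop(), act.stack.pop()
            act.stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return act.stack.pop()


# The execution context only needs this table to choose an interpreter.
INTERPRETERS: Dict[str, Callable[[Activation], int]] = {
    "a": run_toy_a,
    "b": run_toy_b,
}


def execute(act: Activation) -> int:
    """Dispatch an activation to the VM named by its tag."""
    try:
        interpreter = INTERPRETERS[act.vm]
    except KeyError:
        raise LookupError(f"no VM registered for {act.vm!r}") from None
    return interpreter(act)
```

The sketch only works because both interpreters agree on the layout of
Activation, which is the "compatible state" condition above; each new
VM still costs a complete interpreter of its own.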

I don't think there's much to choose between the above approach and a
UVM in the IBM/Taligent sense---having a single opcode set designed to
support a fixed set of VMs (which is my inference based on the flimsy
statements they've made about it).  More work is necessary to
synthesize the "common instruction set", and to create an object
memory supporting all of the targeted languages.  Such a UVM would
still have to translate between the "natural" format of objects for
each VM and the object model used by the UVM (or this could be done in
the client when packaging an agent), but interoperability would be
cheaper in the executing code where only the single UVM object model
is being used.  The "grief" would appear when deciding (later) that
another VM should be supported: it might be difficult to accommodate
that VM in a UVM designed to support a particular set of other VMs.

There are at least two approaches to this kind of UVM.  The first,
which is what I think you imply when you say "porting X, Y, Z to...",
is to rewrite the compiler/debugger/etc. of language X to use the
common bytecode set (and object model) of the UVM.  The second is to
keep the bytecodes and object model of X, Y and Z, and to dynamically
translate them into a single, common executable representation in the
UVM at runtime.  Both require significant work, but I suspect the
latter is the better approach, since some kind of dynamic translation
is probably required anyway to achieve adequate performance.  The
disadvantage of the latter approach is that you need to implement the
transformation of execution state in both directions (at least for
Smalltalk, where execution contexts are defined by the language and
the debugger is written entirely in Smalltalk---but this also depends
on just how much of the Smalltalk development stuff you want to
support directly on your UVM; if you're looking for an "execution
only" UVM then the problem goes away).
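
A rough sketch of the second approach, in Python: each language keeps
its own bytecode format, and a per-language translator turns it into a
single common representation just before execution.  The formats for
"X" and "Y", the common operation names, and the functions
translate_x, translate_y and run_common are all assumptions made up
for this example.  It covers only the forward translation of code;
mapping execution state back into each language's own terms, which is
the hard part mentioned above, is not shown.

```python
from typing import List, Tuple

# Common executable representation: (operation name, integer argument).
CommonOp = Tuple[str, int]


def translate_x(bytecodes: bytes) -> List[CommonOp]:
    """Hypothetical language X: byte 0x01 n means push n, 0x02 means add."""
    ops: List[CommonOp] = []
    i = 0
    while i < len(bytecodes):
        opcode = bytecodes[i]
        if opcode == 0x01:
            if i + 1 >= len(bytecodes):
                raise ValueError("push is missing its operand")
            ops.append(("PUSH", bytecodes[i + 1]))
            i += 2
        elif opcode == 0x02:
            ops.append(("ADD", 0))
            i += 1
        else:
            raise ValueError(f"unknown X opcode {opcode:#04x}")
    return ops


def translate_y(instructions: List[str]) -> List[CommonOp]:
    """Hypothetical language Y: textual instructions 'lit N' and 'plus'."""
    ops: List[CommonOp] = []
    for text in instructions:
        parts = text.split()
        if len(parts) == 2 and parts[0] == "lit":
            ops.append(("PUSH", int(parts[1])))
        elif parts == ["plus"]:
            ops.append(("ADD", 0))
        else:
            raise ValueError(f"unknown Y instruction {text!r}")
    return ops


def run_common(ops: List[CommonOp]) -> int:
    """Execute the common representation on a simple operand stack."""
    stack: List[int] = []
    for name, arg in ops:
        if name == "PUSH":
            stack.append(arg)
        elif name == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown common operation {name!r}")
    return stack.pop()
```

Both run_common(translate_x(...)) and run_common(translate_y(...)) go
through the same execution engine, so only the translators differ
between languages.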

However, the UVM approach would be a *lot* more work to do "right".
You might not agree, but my opinion is that a UVM should be built
without a fixed set of supported VMs in mind.  Rather, it should be
extensible (so adding new VMs requires a minimum of effort) and
provide seamless interoperability between those VMs.  (Who wants to
rewrite a software component N times for N different supported
languages?  Better to spend N times the effort on a single
implementation and make it N times more robust, etc., then share it
between all possible clients.)
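
One way to picture "adding a new VM with a minimum of effort" is to
describe each VM's bytecode set as data rather than as code, so that a
single generic decoder serves all of them.  The Python sketch below is
only an illustration under that assumption: the description format,
the toy VMs "toy-x" and "toy-y", and the functions register_vm and
decode are hypothetical and not taken from any existing system.

```python
from typing import Dict, List, Tuple

# A VM description maps each opcode byte to
# (common operation name, number of operand bytes that follow).
VMDescription = Dict[int, Tuple[str, int]]

DESCRIPTIONS: Dict[str, VMDescription] = {}


def register_vm(name: str, description: VMDescription) -> None:
    """Add support for a VM by supplying data only, with no new decoder."""
    DESCRIPTIONS[name] = description


def decode(name: str, bytecodes: bytes) -> List[Tuple[str, Tuple[int, ...]]]:
    """Decode bytecodes of the named VM into common operations."""
    description = DESCRIPTIONS[name]
    ops: List[Tuple[str, Tuple[int, ...]]] = []
    i = 0
    while i < len(bytecodes):
        opcode = bytecodes[i]
        if opcode not in description:
            raise ValueError(f"{name}: unknown opcode {opcode:#04x}")
        operation, operand_count = description[opcode]
        operands = tuple(bytecodes[i + 1 : i + 1 + operand_count])
        if len(operands) != operand_count:
            raise ValueError(f"{name}: truncated operands for {operation}")
        ops.append((operation, operands))
        i += 1 + operand_count
    return ops


# Two toy VMs with different encodings for the same two operations.
register_vm("toy-x", {0x01: ("PUSH", 1), 0x02: ("ADD", 0)})
register_vm("toy-y", {0x10: ("ADD", 0), 0x20: ("PUSH", 1)})
```

A real description language would also have to cover object formats,
calling conventions and execution state, so this shows the shape of
the idea rather than its difficulty.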

The "grief" in this approach is significantly larger, but almost
entirely "up-front".  It involves designing a generic and extensible
UVM instruction set and object model, finding a single abstraction over
the native OS services for use by VM implementations, defining a "meta
language" for VM (and agent format) descriptions, and significant
effort to make the UVM execution mechanisms efficient.  But the
benefits are enormous, and such a UVM would provide solutions to many
problems---including agent-based servers, but extending far beyond
(which might mean it's "overkill" in your particular domain).  I could
write several paragraphs about the work involved and the expected
benefits, but I'll avoid repetition by referring you to

	http://www-sor.inria.fr/~riccardi/vvm3.html

instead.  (Let me know if the server refuses access: I'll send you the
PostScript from the LaTeX version.)

You will certainly succeed in making something useful, whether you
decide to reimplement each VM or to build a UVM supporting several
languages.  (I think "multiple" or "combined" VM is a better
name---since universality is a rather bold claim unless the thing can
be dynamically extended, on demand at "runtime", to execute an
arbitrary bytecode set.)  I think that both approaches have associated
grief, but the grief comes in different places.  It's not obvious (to
me) which particular bunch of grief would be most manageable.

(And if you ever decide to start building a *real* universal VM, with
properties similar to the one described in the above document, let me
know: I'll send you my CV by return of post! ;-)

Regards,

Ian
------------------------------- projet SOR -------------------------------
Ian Piumarta, INRIA Rocquencourt,          Internet: Ian.Piumarta@inria.fr
BP105, 78153 Le Chesnay Cedex, FRANCE         Voice: +33 1 39 63 52 87
----------------------- Systemes d'Objets Repartis -----------------------