[virtmach] Virtual Machines and Code Representation

John R Levine johnl@iecc.com
Mon, 24 Jul 2000 23:09:54 -0400 (EDT)

> > Me neither.  Despite claims to the contrary by some of its authors, it's yet
> > another attempt at an UNCOL, a universal intermediate language, and it's
> > failed for the same reasons that every other UNCOL project has failed

> I don't think this is quite true. Provided you limit your aims, and stay
> really low level, the right combination of care and brutality can be
> found. My own attempt at this, Mite (see my home page, URL below), is
> certainly flawed, but illustrates, I think, that it is possible to design
> a low-level VM that gives binary portable code for a wide range of
> languages.

I think it depends on what you mean by portable and a wide range of
languages.  (I'm not being snide; opinions legitimately differ.)  If you
restrict your target machines to byte-addressed two's-complement machines
with 8-bit bytes and ASCII-ish character codes, which is indeed the majority
of computers around today, and your input languages to ones that are sort of
like C, you can be somewhat successful.  But when you stretch beyond that,
heat death ensues, e.g.:

 -- Support Fortran as a source language, taking advantage of all of the 
    semantic restrictions that permit optimization, e.g., you can assume
    that arguments to a routine never alias each other nor any visible
    global static data.  Don't forget EQUIVALENCE, which lets you overlay
    different types of data.
 -- Support Common Lisp as a source language, with all of its data types,
    incremental compilation, and sophisticated garbage collection (mark
    and sweep is unlikely to be adequate)
 -- Support Cobol as a source language, particularly PICTURE data which
    simultaneously has character and numeric values depending on how you
    use it, and decimal (real or carefully simulated) arithmetic
 -- Support C++ and Smalltalk as source languages, both of which need lots
    of gross ad-hoc hacks to get decent code.
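
To make the Fortran point concrete, here is a minimal sketch of why the
no-alias rule matters to an optimizer.  It is in Python purely for
illustration, and the function names are made up.  The "hoisted" version
reads src[0] once before the loop, which is the sort of transformation a
Fortran compiler may do freely because arguments are assumed not to overlap.
If the two arguments can be the same array, as in a C-like language without
that guarantee, the two versions give different answers, so an intermediate
language that cannot express the guarantee has to keep the slow form.

```python
def scale_naive(dst, src, n):
    """Re-read src[0] on every iteration, as the source text literally says."""
    for i in range(n):
        dst[i] = src[0] * src[i]


def scale_hoisted(dst, src, n):
    """Load src[0] once; only valid if dst and src never overlap."""
    k = src[0]
    for i in range(n):
        dst[i] = k * src[i]


def run(func, src, dst=None):
    """Hypothetical helper: call func, aliasing dst to src when dst is None."""
    if dst is None:
        dst = src
    func(dst, src, len(src))
    return dst


# Distinct arrays: both versions agree.
print(run(scale_naive, [2, 3, 4], [0, 0, 0]))    # [4, 6, 8]
print(run(scale_hoisted, [2, 3, 4], [0, 0, 0]))  # [4, 6, 8]

# Aliased arrays: the first store changes src[0], so the results diverge.
print(run(scale_naive, [2, 3, 4]))    # [4, 12, 16]
print(run(scale_hoisted, [2, 3, 4]))  # [4, 6, 8]
```

The portable code has to carry the "these never alias" fact through to the
back end somehow, or the Fortran front end loses the very optimizations the
language was designed to allow.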

Once you've done that, support target architectures like the two Unisys
mainframe lines, one of which is 36-bit word-addressed, the other 48-bit
word- or character-addressed.
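
As a rough sketch of what word addressing does to a VM that assumes byte
pointers, consider simulating character access on a 36-bit word-addressed
memory.  The layout below (9-bit characters packed four to a word, first
character in the high-order bits) is an assumption chosen for illustration,
not a description of any particular Unisys compiler, and the names are
invented.  The point is that a "character pointer" is no longer a plain
machine address: every load and store turns into a divide, a shift and a
mask.

```python
WORD_BITS = 36          # assumed word size
CHAR_BITS = 9           # assumed character size (illustrative)
CHARS_PER_WORD = WORD_BITS // CHAR_BITS
CHAR_MASK = (1 << CHAR_BITS) - 1


def split_char_ptr(char_ptr):
    """Turn a flat character index into (word address, slot within word)."""
    return divmod(char_ptr, CHARS_PER_WORD)


def load_char(memory, char_ptr):
    """Fetch one character from a list of 36-bit words."""
    word_index, slot = split_char_ptr(char_ptr)
    shift = WORD_BITS - CHAR_BITS * (slot + 1)
    return (memory[word_index] >> shift) & CHAR_MASK


def store_char(memory, char_ptr, value):
    """Store one character without disturbing its neighbours in the word."""
    word_index, slot = split_char_ptr(char_ptr)
    shift = WORD_BITS - CHAR_BITS * (slot + 1)
    mask = CHAR_MASK << shift
    memory[word_index] = (memory[word_index] & ~mask) | ((value & CHAR_MASK) << shift)


memory = [0, 0]                     # two 36-bit words
store_char(memory, 5, ord("A"))     # character 5 lives in word 1, slot 1
print(load_char(memory, 5))         # 65
```

A VM whose instruction set bakes in byte-addressed loads and stores pays this
cost on every character access, and pointer casts between word and character
pointers stop being free.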

This is the swamp that every UNCOL has sunk into after initially
promising-looking results.  Don't go there unless you've thoroughly
researched why all the previous attempts sank and have a clear idea how you
will remain afloat where nobody has before.

John Levine, johnl@iecc.com, Primary Perpetrator of "The Internet for Dummies",
Information Superhighwayman wanna-be, http://iecc.com/johnl, Sewer Commissioner
Finger for PGP key, f'print = 3A 5B D0 3F D9 A0 6A A4  2D AC 1E 9E A6 36 A3 47