[virtmach] parrot VM?

Chris Lattner virtmach@iecc.com
Sat, 13 Jul 2002 12:26:31 -0500 (CDT)

> > > Yes.  I imagine register allocation (done right) would be a significant
> > > complication in the bytecode compiler.
> > Although the register allocator would give much better results than simple 
> > tree based heuristics used for stack machines.
> What kind of results do you mean?  The register allocator may reduce
> the number of VM instructions executed (although even that is not
> clear, given that it might produce explicit code for spilling, whereas
> spilling is implicit in stack-based code), but

The whole point over explicit register allocation, rather than using 
implicit allocation based on the stack, is that you actually have a decent 
chance of doing a good register allocation, regardless of how many 
registers your target architecture has.

To be more concrete, lets consider two machines: one is x86like (~6 
general purpose regs), one is mips/sparc'like (32 GPRs).  From what I 
understand of register allocation for stack machines, you basically assign 
slots in the stack to different registers.

For most stack machines, there is a very clear correspondence between the 
code as written by the human, and the stack code generated.  For example:

  X = (A+B)*(C+D)

would turn into:

load A
load B
load C
load D
store X

Or somesuch.  Stack based register allocation would trivially assign three 
registers to that expression (one for each of the adds and the mul).  
Depending on the sophistication of the VM, the variables themselves may be 
in registers, but that's irrelevant for this discussion.

This is a nice clean, simple approach to register allocation, for sure.  
The problem then is that you are depending on the human to do the register 
allocation for you, which means that you cannot do a good job for the 
machine you are targetting (you lose platform independence).  For example, 
on the x86 type of machine, you will probably do a somewhat reasonable 
allocation, but there is almost no chance that you will do a good 
allocation for the sparc (the stack will rarely grow to 32 entries deep).

One poster mentioned that a nice property of a stack machine is that you 
get explicit lifetime information from the stack properties (when a value 
is popped, it IS dead).  This simplifies liveness analysis used for 
register based graph coloring as well as the simple stack case:  The Latte 
JVM, for example, adds annotations to the registers when it converts from 
Java bytecode to its internal register based architecture.  It then uses 
these annotations to make liveness analysis very easy.  The only problem 
is that they must be tracked and updated when optimizing.

> - in a VM interpreter this reduction has to make up for the increased
> decoding overhead of register-based instructions before gaining any
> speed.

That's is certainly always a tradeoff faced by VMs: better optimizations 
must be paid for with repeated use of the generated code.  Having lots of 
spills do to poor allocation is a very good way to slow down the 
application though.

> - a VM-to-native compiler will reallocate the registers anyway for
> best results, specific to the number of registers etc. on the target
> machine, so an earlier register allocation effort is probably wasted.

I'm not talking about earlier register allocation, and perhaps that's 
what's confusing things.  I'm advocating an infinite register VM (for 
example see the SafeTSA paper, published in PLDI2001).  Part of my point 
is that if you are going to convert your stack machine to a register based 
machine, then there is no reason to use a stack machine in the first 

> > One very significant drawback of stack-based VMs is that they are almost 
> > impossible to do meaningful optimizations on.
> Well, for some reason the architects of the Amsterdam Compiler Kit,
> the MIPS compilers, and the Garden Point compilers (Queensland) chose
> stack-based intermediate representations for their compilers, and
> optimized them, so it cannot be that bad.

I am honestly curious what types of optimizations you performed, and what 
transformations you applied to the stack to transform it.  As I see it, 
optimizing a stack machine is effectively equivalent to optimizing an AST 
based representation, which is possible, but very painful.  Simple 
optimizations like LICM and CSE seem reasonable on a stack machine, but 
how do you handle things like load/store motion, PRE, loop 
transformations, scalar replacement, etc...

> I would not use a stack-based VM for most optimizations, though, and
> neither would I use a register-based VM.  VMs are for direct execution
> by an interpreter, or at least as a close-to-execution representation
> of a program; for most optimizations I would use data-flow graphs, SSA
> and such, and these can be created from stack-based code at least as
> easily as from register-based code.

Again, when I say "register based", I mean in contrast to stack based... 
SSA is a perfect representation for many sparse optimization (my work is 
in the context of an SSA based infinite register VM, f.e.).  So "register 
based" is already a great representation for optimization...