[virtmach] parrot VM?
Chris Lattner
virtmach@iecc.com
Sat, 13 Jul 2002 12:26:31 -0500 (CDT)
> > > Yes. I imagine register allocation (done right) would be a significant
> > > complication in the bytecode compiler.
> > Although the register allocator would give much better results than simple
> > tree based heuristics used for stack machines.
>
> What kind of results do you mean? The register allocator may reduce
> the number of VM instructions executed (although even that is not
> clear, given that it might produce explicit code for spilling, whereas
> spilling is implicit in stack-based code), but
The whole point over explicit register allocation, rather than using
implicit allocation based on the stack, is that you actually have a decent
chance of doing a good register allocation, regardless of how many
registers your target architecture has.
To be more concrete, lets consider two machines: one is x86like (~6
general purpose regs), one is mips/sparc'like (32 GPRs). From what I
understand of register allocation for stack machines, you basically assign
slots in the stack to different registers.
For most stack machines, there is a very clear correspondence between the
code as written by the human, and the stack code generated. For example:
X = (A+B)*(C+D)
would turn into:
load A
load B
add
load C
load D
add
mul
store X
Or somesuch. Stack based register allocation would trivially assign three
registers to that expression (one for each of the adds and the mul).
Depending on the sophistication of the VM, the variables themselves may be
in registers, but that's irrelevant for this discussion.
This is a nice clean, simple approach to register allocation, for sure.
The problem then is that you are depending on the human to do the register
allocation for you, which means that you cannot do a good job for the
machine you are targetting (you lose platform independence). For example,
on the x86 type of machine, you will probably do a somewhat reasonable
allocation, but there is almost no chance that you will do a good
allocation for the sparc (the stack will rarely grow to 32 entries deep).
One poster mentioned that a nice property of a stack machine is that you
get explicit lifetime information from the stack properties (when a value
is popped, it IS dead). This simplifies liveness analysis used for
register based graph coloring as well as the simple stack case: The Latte
JVM, for example, adds annotations to the registers when it converts from
Java bytecode to its internal register based architecture. It then uses
these annotations to make liveness analysis very easy. The only problem
is that they must be tracked and updated when optimizing.
> - in a VM interpreter this reduction has to make up for the increased
> decoding overhead of register-based instructions before gaining any
> speed.
That's is certainly always a tradeoff faced by VMs: better optimizations
must be paid for with repeated use of the generated code. Having lots of
spills do to poor allocation is a very good way to slow down the
application though.
> - a VM-to-native compiler will reallocate the registers anyway for
> best results, specific to the number of registers etc. on the target
> machine, so an earlier register allocation effort is probably wasted.
I'm not talking about earlier register allocation, and perhaps that's
what's confusing things. I'm advocating an infinite register VM (for
example see the SafeTSA paper, published in PLDI2001). Part of my point
is that if you are going to convert your stack machine to a register based
machine, then there is no reason to use a stack machine in the first
place!
> > One very significant drawback of stack-based VMs is that they are almost
> > impossible to do meaningful optimizations on.
>
> Well, for some reason the architects of the Amsterdam Compiler Kit,
> the MIPS compilers, and the Garden Point compilers (Queensland) chose
> stack-based intermediate representations for their compilers, and
> optimized them, so it cannot be that bad.
I am honestly curious what types of optimizations you performed, and what
transformations you applied to the stack to transform it. As I see it,
optimizing a stack machine is effectively equivalent to optimizing an AST
based representation, which is possible, but very painful. Simple
optimizations like LICM and CSE seem reasonable on a stack machine, but
how do you handle things like load/store motion, PRE, loop
transformations, scalar replacement, etc...
> I would not use a stack-based VM for most optimizations, though, and
> neither would I use a register-based VM. VMs are for direct execution
> by an interpreter, or at least as a close-to-execution representation
> of a program; for most optimizations I would use data-flow graphs, SSA
> and such, and these can be created from stack-based code at least as
> easily as from register-based code.
Again, when I say "register based", I mean in contrast to stack based...
SSA is a perfect representation for many sparse optimization (my work is
in the context of an SSA based infinite register VM, f.e.). So "register
based" is already a great representation for optimization...
-Chris
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/