[virtmach] Your VM

Anton Ertl anton@mips.complang.tuwien.ac.at
Tue, 29 May 2001 14:28:45 +0200 (MET DST)

David Rush wrote:
> It's worth noting that your most-used instructions may benefit from
> being if-dispatched depending on underlying CPU issues. In particular,
> CPUs with branch predictors can't use them on jump tables, so you can
> easily get I-cache misses and pipeline stalls.

Well, branch target buffers (BTBs), as present on Pentium, P6, Athlon,
and 21264 (known there as next line predictor) work on indirect
branches.  For an interpreter with one copy of the dispatch code per
VM instruction, they give a hit rate of about 40%-50% on large benchmarks
(more on smaller ones).  For interpreters with only one copy of the
dispatch code for the whole interpreter (e.g., typical switch-based
interpreters), they give a hit rate of 0%-20%.
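To make the two dispatch styles concrete, here is a minimal C sketch, assuming a hypothetical three-instruction toy VM (OP_PUSH, OP_ADD, OP_HALT are illustrative names). run_switch uses one central switch, so every VM instruction goes through the same indirect branch; run_threaded uses GCC/Clang labels-as-values so that each VM instruction ends with its own copy of the indirect jump, which is what gives the BTB one branch site per VM instruction.

```c
/* Hypothetical toy VM; opcodes and program layout are illustrative only. */
enum { OP_PUSH, OP_ADD, OP_HALT };

/* Central dispatch: a single indirect branch (the switch's jump table)
   is shared by all VM instructions. */
long run_switch(const long *code)
{
    long stack[16], *sp = stack;
    const long *ip = code;
    for (;;) {
        switch (*ip++) {
        case OP_PUSH: *sp++ = *ip++; break;
        case OP_ADD:  sp--; sp[-1] += sp[0]; break;
        case OP_HALT: return sp[-1];
        default:      return -1;        /* unknown opcode */
        }
    }
}

/* Replicated dispatch (GCC/Clang "labels as values" extension): the NEXT
   macro expands to a separate indirect jump at the end of every VM
   instruction, so each instruction has its own BTB entry. */
long run_threaded(const long *code)
{
    static void *labels[] = { &&do_push, &&do_add, &&do_halt };
    long stack[16], *sp = stack;
    const long *ip = code;
#define NEXT goto *labels[*ip++]
    NEXT;
do_push: *sp++ = *ip++; NEXT;
do_add:  sp--; sp[-1] += sp[0]; NEXT;
do_halt: return sp[-1];
#undef NEXT
}
```

Both functions return 5 for the program {OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT}.  GCC's documentation suggests -fno-gcse for code using computed gotos; presumably one reason is that optimization can merge the replicated jumps back into one, so it is worth checking the generated assembly.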

I don't see how a BTB miss will influence I-cache misses
significantly.  Most of the time the BTB miss should cause a hit in
the I-cache, because the BTB will predict a recently executed piece of
code.

Now, about the idea of special-casing the most frequent instructions
with ifs: How often are the most frequent instructions executed?

- For Forth, the most frequent instructions are consistently call and
return, at 13%-17% each. 

- For Java, the most frequent instruction varies a lot with the
application.  Here's the data from

instruction	mol	sea	eul	ray
dload		33.3%	<0.6%	 2.8%	 1.1%
iload		 7.0%	13.2%	19.7%	 1.8%
get_field	 4.3%	 7.3%	16.2%	26.1%

So, from this we can guess that the most frequently executed VM
instruction usually makes up less than 20% of the executed VM
instructions of a typical program.
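Shares like these are dynamic frequencies, so they need a profile of executed (not static) VM instructions. A minimal C sketch of how one might collect one, assuming a hypothetical count_op hook that the interpreter's dispatch code calls once per executed VM instruction, and a top_share helper (also hypothetical) that reports the most frequent opcode and its share:

```c
#include <stddef.h>

/* Hypothetical profiling hook: call count_op(op) once per executed
   VM instruction, e.g. at the start of the dispatch code. */
#define NUM_OPS 256
static unsigned long op_counts[NUM_OPS];

void count_op(unsigned op) { op_counts[op % NUM_OPS]++; }

/* Returns the share (0.0 to 1.0) of all executed VM instructions that
   belongs to the single most frequent opcode, and stores that opcode
   in *top_op (if top_op is not NULL). */
double top_share(const unsigned long *counts, size_t n, size_t *top_op)
{
    unsigned long total = 0, best = 0;
    size_t best_op = 0;
    for (size_t i = 0; i < n; i++) {
        total += counts[i];
        if (counts[i] > best) { best = counts[i]; best_op = i; }
    }
    if (top_op)
        *top_op = best_op;
    return total ? (double)best / (double)total : 0.0;
}
```

After a run, top_share(op_counts, NUM_OPS, &op) gives the number to compare against the 20% threshold discussed here.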

So with special-casing one instruction we get:

fraction of VM instrs.	dispatch cost
>80% (all others)	conditional branch + indirect branch
<20% (special-cased)	conditional branch
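A minimal C sketch of what special-casing one instruction looks like, assuming a hypothetical toy VM in which OP_ADD is taken to be the most frequent instruction (in practice that choice would come from a dynamic profile). The hot instruction costs one conditional branch; every other instruction pays for the conditional branch and then the switch's indirect jump, which is the cost split in the table.

```c
/* Hypothetical toy VM; OP_ADD is assumed to be the hot instruction. */
enum { OP_PUSH, OP_ADD, OP_HALT };

long run_if_dispatch(const long *code)
{
    long stack[16], *sp = stack;
    const long *ip = code;
    for (;;) {
        long op = *ip++;
        if (op == OP_ADD) {              /* conditional branch, executed for every instruction */
            sp--; sp[-1] += sp[0];
            continue;
        }
        switch (op) {                    /* indirect branch for all other instructions */
        case OP_PUSH: *sp++ = *ip++; break;
        case OP_HALT: return sp[-1];
        default:      return -1;         /* unknown opcode */
        }
    }
}
```

With only two remaining cases a compiler may turn the switch into compares instead of a jump table; with a realistic instruction set it would typically remain an indirect jump.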

Assuming a 0% BTB hit rate for the indirect branch, and the same costs
for conditional and indirect branch mispredictions, a misprediction
rate of just 20% for this conditional branch is already enough to
eliminate the misprediction advantage of doing fewer indirect branches
(special-casing saves at most 20% of the indirect-branch
mispredictions, and the conditional branch adds 20% new ones); in
addition, there is also the overhead of correctly predicted
conditional branches to consider.
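The break-even argument can be written as a small model that counts only mispredictions per dispatched VM instruction. The names f (share of the special-cased instruction), h (BTB hit rate of the indirect branch) and m (misprediction rate of the added conditional branch) are introduced here for illustration. Without special-casing the rate is 1 - h; with it, m + (1 - f)(1 - h). These are equal when m = f(1 - h), so with h = 0 and f = 20% the break-even point is m = 20%. The model ignores the cost of correctly predicted branches, so it errs in favour of special-casing.

```c
#include <assert.h>

/* Mispredictions per dispatched VM instruction without special-casing:
   every dispatch is an indirect branch with BTB hit rate h. */
double mispredicts_plain(double h)
{
    assert(h >= 0.0 && h <= 1.0);
    return 1.0 - h;
}

/* With one special-cased instruction of frequency f: every dispatch
   executes the conditional branch (misprediction rate m), and only the
   other (1 - f) instructions also execute the indirect branch. */
double mispredicts_special(double f, double h, double m)
{
    assert(f >= 0.0 && f <= 1.0 && m >= 0.0 && m <= 1.0);
    return m + (1.0 - f) * mispredicts_plain(h);
}
```

For example, mispredicts_special(0.2, 0.0, 0.2) equals mispredicts_plain(0.0), i.e. 1.0, while a 10% conditional misprediction rate gives 0.9.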

How high is the prediction accuracy of the conditional branch?  I
don't know, but I am pretty sure that you should not just assume it is
the same as for SPECint, or for simple benchmarks on the VM.
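One way to get a feel for this is to replay a recorded opcode trace through a simple predictor model. The sketch below assumes a single 2-bit saturating counter for the "op == hot" branch; the function name is hypothetical, and real CPUs use history-based predictors that can learn patterns this model cannot, so the numbers only illustrate that the rate depends on the actual instruction sequence.

```c
#include <stddef.h>

/* Hypothetical experiment: replay a trace of VM opcodes through one
   2-bit saturating counter standing in for the predictor of the
   "op == hot" branch, and return the misprediction rate. */
double if_dispatch_mispredict_rate(const int *trace, size_t n, int hot)
{
    int counter = 1;              /* 0,1 predict not taken; 2,3 predict taken */
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        int taken = (trace[i] == hot);
        int predicted = (counter >= 2);
        if (predicted != taken)
            misses++;
        if (taken && counter < 3)         /* saturate at strongly taken */
            counter++;
        else if (!taken && counter > 0)   /* saturate at strongly not taken */
            counter--;
    }
    return n ? (double)misses / (double)n : 0.0;
}
```

With this counter, a trace where the hot instruction is every fifth one mispredicts on every hot occurrence, giving 20%, right at the break-even point for a 20% instruction; a strictly alternating trace mispredicts every time.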

> Additionally, if
> branches can often take advantage of CPU delay slots.

Indirect branches have delay slots on the same architectures.  In VM
interpreters with central dispatch there is often nothing to fill
these delay slots.

- anton