Lisp and VLIW (was Re: Genera)
Mon, 6 Apr 1998 09:39:25 -0600 (MDT)
[stuff about VLIW and such cut]
>I'm not sure what you are asking about. I understand the VLIW part,
>not the predicated execution part.
The point that I was trying to get at is that if you are using a VLIW
model similar to IBM's ForestalPC you can basically execute multiple
branches of a condition in parallel before the test of the condition
is completed. When the result of the conditional test is available
then the correct path of computation is selected and the others are
discarded. (Warning.. what I just wrote was a gross simplification and
should be treated as such.) The offshoot of this is that you could
compile code which did say a fixnum add, flonum add, unbox + fixnum
add, and unbox + flonum add in parallel with checking the operands'
tags then keeping the result. I understand that this type of thing
was done on the Symbolics where tag comparisons and operations, like
addition, were done in parallel. If the tags were wrong the result was
discarded. However, my account of the Symbolics' behavior is purely
>>The issues of tag comparisons and the such being performed in parallel
>>with the operations seem to be a gimmy source of instruction level
>>parallelism which would be easy to take advantage of on such a
>>processor. Could it be that the next generation of general purpose
>>processors could be a real win for lisp? Has anyone looked into this
>The reason, at least in my mind, for having special hardware for the
>tag comparisons is that they are very stylized and instead of taking
>up hardware that could be used for another higher-level operation, it
>is more efficient to only consume hardware resources commensurate to
Conversely, is it also not beneficial to have a general mechanism that
can be put to use when possible. For instance, the compiler has
proven that the following section of code will only use fixmums so we
can use the ALU that "normally" would be doing tag checking for "real"
>That being said, it is clear that the VLM emulator gets back at least
>some of the performance lost to emulation by having a fairly high
>proportion of dual-issue instructions in the hand-crafter instruction
>dispatch loop. As Tucker Withington said, when the VLM was designed
>typical compilers wouldn't necessarily do as well.
>So, using instruction level parallelism in the hardware can get some
>of the same effect as having separate hardware.
That was my impression. What I am really interested is a sort of
empirical assessment of the situation. For instance, I would like to
compare the performance of a new (fictional) VLIW lisp compiler
"Monty" and GCC version 7.3 (equally fictional) which is really good
at optimizing C code for VLIW processors. The reason that this
comparison is interesting is that there is usually only a bounded
amount of parallelism in a given algorithm or problem, and furthermore
this amount of parallelism can vary with regard to which part of the
algorithm you are in. So lets say that when compiling a given C
program GCC is able to use 70% of the parallel resources of the
processor. If the Monty compiler was sufficiently smart to fit our
"Lisp overhead" into the remaining 30% of the processor's resources we
could have our cake and eat it too. "As fast as C performance(tm)"
with all the "Cool dynamic Lisp features(tm)".
This is all conjecture however since most of these things are simply
vaporware. There are lots of smart people trying to come up with ways
to get ILP out of C code and they may be good enough to raise C's
utilization of the processor's parallel capabilities to 95% where we
would not have a chance. However, this is not practical
wherever there are a lot of data dependencies in the calculations and in
these cases the added costs of Lisp could simply be swept under the
Then again this could all just be wishful thinking,
Note: All numbers quoted in the above are entirely fictional and
resemblance to numbers living or dead is purely coincidental.