Proposals

btanksley@hifn.com btanksley@hifn.com
Wed, 12 Jul 2000 11:21:50 -0700


From: Lynn H. Maxson [mailto:lmaxson@pacbell.net]

>Billy wrote:
>"However, I'll also note that we disagree even on 
>the size of the *finite* version of the problem; I 
>would ask that you look at the example I spent so
>long writing, and try to estimate the number of 
>potential programs such a program would generate.  
>It's not at all small!"

>Even here we are in agreement.  I find that one 
>week after IBM delivers its ASCII White system 
>which perform 13 terraflops/sec and has 13 trillion 
>bytes of RAM that we would understand that there is 
>large and there is infinite, but the large is quite 
>significant relative to our situation.  The point 
>is as a human I have to employ heuristics and shoot 
>for "close enough" because the amount of brute 
>force iterations I can do in an interval is finite.  
>While a software system on a 1GHz machine may not 
>perform an infinite amount in the same interval, it 
>certainly runs into the hundreds of millions and 
>billions whether you are talking of an interval of 
>an hour or a day.

Generating the best possible program for "3 4 + ." is undeniably possible in
software, thanks to well-chosen heuristics.  It's been done for a slightly
more complicated problem, although it took a LOT of time.

My point is not that my toy program is impossible to optimise; but rather
that the complexity of the toy problem illustrates how much more complex a
real problem would be.  You're talking about an hour or a day; I'm telling
you, with complete honesty and seriousness, that what you described will
take centuries and aeons, and that the results you originally asked for
physically can't be stored.

Now, you seem to me to be recanting (if I may use the word).  In your
orignal messages you spoke of generating ALL logically equivalent forms of
the program; however, when I illustrate the complexity of that, you claim
that you only generate a couple of forms of the program (chosen by
heuristic), and generate a couple types of code from each one.  Well...
That's great, and I have every reason to believe that it'll work.  But it
won't produce the results you're expecting.

>We are engaged in a discovery process in which I 
>can afford to do something on a computer that I 
>cannot do as a human even if I could afford it.  In 
>general it follows the principle that humans work 
>well in the small while software systems extend our 
>abilities in the large.

>I looked at your example, basically agreeing with 
>every statement you made in it.  The problem was I 
>couldn't connect it to the discovery process I had 
>in mind, which as I tried to explain had far lesser 
>scope at the component level.

It's certainly possible that we're getting our signals crossed.  I had
understood that you were trying to generate the most efficient version
possible of the program specified.  I believe now that you're merely trying
to make a program more efficient than could be generated by an assembly
language programmer.  There's no reason I can see why you can't achieve the
second goal; however, I'm still completely baffled by your insistance on
mechanisms for allegedly doing that which have no visible possibility of
working.  At the program level we both agree that there's too much scope;
however, at the component level there's not enough scope.

>In a sequence of 
><operand><operand><operator> with well-defined 
>attributes the enumerated options for each is both 
>small and finite and their cartesian product even 
>if a thousand times greater (which it is not) is 
>not significant in terms of time.  It would not 
>take even seconds.

The 'enumerated' options for each <item> in your list is always uncountably
infinite (NOT small), but reasonably reducable (by heuristic).  The problem
is that if you pay attention only to the <operand><operand><operator> you're
working on now, you'll miss the fact that a previous
<operand><operand><operator> used a subsequence that you could borrow
economically if you rearranged the program a bit.

>Moreover for a given situation of this triad and 
>any non-repeating patterns in hierarchies of such 
>triads I only have to do it once, saving the 
>patterns and results for reuse.  The fact is that I 
>don't even have to wait for a given expression to 
>occur as I can enumerate all possible supported 
>non-repeating expressions and allow the software 
>system to tackle them one-at-a-time when it has no 
>other function to perform.  In a week or a month 
>this runs into the trillions.  It all lies within 
>an available (and affordable) processing and 
>storage capacity.

Um...  No, you can't precompute all possible encodings of all possible
expressions.  If that's what you're saying.  Even if you could, you
certainly couldn't search the resulting database at trillions of
transactions per week (this works out to 1x10^8 transactions per minute --
the world record (as claimed by a vendor) is 6x10^5 transactions per
minute); more reliable sources (the TPM/C site) says 4x10^5.  Trillions per
month is a still-unfeasable 2x10^7 TPM.  The record, of course, is
established by benchmarks, which will turn in far higher figures than any
real application ;-).

>That does not mean that because it is there that I 
>want to waste it or have it behave in a less 
>intelligent manner.  That's why we impose rules 
>governing its behavior, rules which we have 
>developed over fifty years in millions of 
>applications.

This sure makes a lot of sense to me.

>I would only have rules that 
>enumerated every possible instance if that were the 
>only choice available.

So I'm guessing that your talk about "exhaustive search" was talking about
something else, then.

>I wouldn't hesitate to do 
>so when I know I can do a million iterations a 
>second in a world where I have less than 3 dozen 
>primitive operators and 2 dozen types of operands.  

I'm not sure what an "iteration" is, but there's one thing I'm sure of: it
has nothing to do with optimization.  If you're doing a million of them a
second, that gives you only 1000 clock cycles per iteration to do whatever
you need to have done.

>If I have categorized the operand data types and 
>the operator types (logical, boolean, arithmetic, 
>string) to further restrict the discovery process 
>of a particular instnnce, I exhaust the 
>possibilities within an extremely small interval.

Setting aside the "extremely small" and replacing it with a "reasonably
small", I absolutely agree.  If you apply all these restrictions you can
generate decent code.  I have a better suggestion, though: why not use the
restrictions and heuristics the compiler researchers have been working on
all these years to the problem?

>That's where what I am describing and what you 
>describe differ.  I already have rules which apply 
>to operands (variables or literals) and operators 
>and to their conversion in mixed expressions.  I 
>have them because they have been present and in use 
>in PL/I since its inception.

Perhaps this is the really important point I'm completely missing.  Even if
it's not important, I'm completely missing it.  You keep bringing up the
"rules in use in PL/I" as though they had anything to do with generating
optimal code.  I don't understand that at all.

>How many different 
>ways can I set up an operand associated with a 
>given operator?  I either move it to a register or 
>align it to some address in memory or set its 
>address for use in an instruction.

Ahem.  Or you realise that it's been computed (or partially computed) as
part of a previous operation, so you move the code which needs to use the
value together so that they can share the value in a register without having
to do a memory access -- or perhaps you shouldn't move the code, but instead
store the value in memory where nothing else will touch it in the meantime.

I could go on, but I already have in a specific example.

>Even the 
>choices for arithmetic operator instruction is 
>essentially predetermined by the operand type.  
>Because I have rules and categories I don't have to 
>perform the iterations you suggest.  I am not 
>starting from scratch each time nor from 
>inexperience.

Ah.  So you're willing to accept the 'type' limitations implied by the
language the programmer was using.  That's fine; if the programmer had
wanted an efficient program he would have used a better type for efficiency,
even though it would hurt readability.

>I look at your example and I relate it to the 
>doctoral thesis using "linear names" because 
>"non-linear name usage" is expensive.  My only 
>response is that it is expensive only if you cannot 
>afford it.  If you choose to retain an 
>object-oriented methodology, then maybe it is too 
>expensive.  However you always have the option of a 
>different methodology.

I'm guessing you're talking about Baker's "The FORTH Shall be First" paper,
which was an article in an ACM journal.  If so, you misread it.

>"Let us assume them?  In other words, you're saying 
>that you want to restrict yourself to a specific 
>language's rules and optimizations?  ALREADY I can
>promise you that our assembly language programmer 
>has you beat."

>Wrong on two counts.  One, it is not a restriction 
>to use something which works and basically is 
>language independent in terms of expression 
>evaluation.

One, it IS a restriction to permit the optimised code to contain only types
which were present in the programmer's code.

>It is a set of rules which apply in 
>the absence of explicit programmer intervention.  
>As such it falls under the minimalist approach that 
>Fare espouses.  There is nothing wrong with using 
>defaults if they achieve your purposes.

This is vacuously true -- the defaults cannot achive your purposes, so
therefore "There is nothing wrong with using defaults if they achieve your
purposes."

>Two, the assembly language programmer cannot 
>economically compete in terms of performance and in 
>terms of maintenance, the dynamics of the 
>programming environment.  If he could, then we 
>would all be assembly language programmers today 
>and HLLs a footnote in history.  As I tried to 
>point out in my last response the only reason there 
>is a performance difference is that we have 
>deliberately chosen an implementation with those 
>characteristics.  No assembly language programmer 
>having to follow the same implementation rules 
>could perform better.

Ah, yes!  At last we come to something we can both agree on.  At least
technically, the assembly language programmer can't compete with the HLL
programmer.  Except for the minor fact that many programmers I've seen write
assembly in C.  It's true that they never touch a register or write the
'register' keyword, but they do carefully consider inlining or unrolling a
loop, because they know that their compilers can't optimise worth a hill of
beans.

But that's a minor point, because we both do agree that it's possible to
build a compiler (even for C) which could optimise "worth a hill of beans"
(although I still posit that you'll never get as good as an intelligent
assembly programmer).  I suspect that our disagreement is on means, not
ends.  You seem to want a specific methodology which I'm slowly bringing to
light -- it may be that this specific method of generating code will work.
I can't tell, although I can tell you for CERTAIN that the method you
started out explaining wouldn't.

And yes, no assembly programmer _following the same rules_ could do better.
I certainly agree on that.  Your rules are completely alien to humans (which
is why the computer handles them well); the assembly programmer would
violate the rules almost as soon as he sat down to code, without really
meaning to (not that you could pay a human enough to make him follow those
rules).

>"The output is clearly fixed, yes.  I don't see why 
>the fact that "both 3 and 4 are literal values" 
>should have any impact on the generated code; after
>all, it's clearly possible that there are many ways 
>to generate them, some of which may be more 
>efficient than 32-bit integer literals."

>hehe.  3 and 4 are fixed "decimal" values not 
>"binary" integers in PL/I.  What you generate is 
>the most efficient decimal arithmetic.  It would be 
>faster to have assigned them to fixed binary 
>variables, e.g. 'dcl a fixed bin(31) init(3), b 
>fixed bin init(4); put list (a + b);'.  If I had 
>wanted them to be binary integers, then they would 
>have the literal values of 0011b (3) and 0100b (4) 
>respectively.  The PL/I data world is far greater 
>than any you have encountered in any language.<g>  
>I get upset at the ignorance that C introduced with 
>int that decimal values are necessarily binary 
>integers.  I come from a commercial programming 
>environment where binary integers are restricted 
>primarily to use as indices.<g>

That's great, but this is another example where being restricted to the
written types is a really bad idea.  The problem can be solved most
economically without using any arithmetic at all -- the code "'7' emit" I
listed is probably (short of some other vaguely possible code which didn't
need to use a literal) the most efficient implementation.

>"Again, though, you miss the point -- regardless of 
>the language, the way we carry out the instruction 
>will impact how long the operation takes, and how
>much space the program takes."

>I guess my response here is, "So what's new?".  I 
>don't have the infinite or even large enumerations 
>that occurred in your description.

I guess you mean this.  Odd.

>"PL/I isn't listed?  That's terribly stupid.  PL/I 
>is a really facinating language; one of the aspects 
>of its design was that "every apparently
>meaningful concatenation of symbols should have a 
>defined meaning.""

>Well, here I must say we are positively in 
>agreement.

I think so.

>Not only was this aspect of its design 
>different, but also was its emphasis on enabling 
>programmers to write as they saw fit regardless of 
>efficiency or optimization.  The programmer is 
>king.  The function of the implementation is to 
>take its orders from the king and not as it is in 
>other languages to dictate terms, e.g. strong 
>typing without defined conversions for mixed 
>expressions.  I have yet to see another programming 
>language that even comes close.

I'm confused.  I thought you just said that the optimiser would never change
a type the user gave it (as when you claimed that the user had given decimal
numbers); yet you admit that binary numbers would be faster.  This is a
minor case, but there are many far worse ones: if the optimiser can't choose
its own types and operations, how can you claim that the programmer is
free???  Won't the programmer HAVE to choose the most efficient way to do
what he needs, REGARDLESS of readability?

>"Whoah.  You're MAJORLY jumping the gun!  You're 
>selecting which operator to use, when you've 
>already assumed which operands you're going to use.  
>You've assumed incorrectly that the form of the 
>program is going to be "3 4 + .""

>I only 
>know internals of IBM compilers and information 
>from compiler writers.

I've only written one compiler, and that one had very limited optimization.

>I know the data types of 
>the operands and those operators which support 
>their use.  There's no way I am going to toss out 
>what I know and not incorporate them in my code 
>generator rule set.  You did and that's why you end 
>up with this infinite none-sense.  The whole 
>purpose of having rules is not to have to go 
>through that.  You choose to do it.  I choose not 
>to.

I did *not* toss information out.  I used it in every sense possible!
You're the one who's tossing information out: you're paying all this
attention to low-level types, and pretending that they're unchangable, when
actually only the results are unchangable.  The information you're ignoring
is the fact that programs contain *internal* information, that is
information not visible, and that information is "an implementation detail".

>"Perhaps it's here that we differ.  Do you believe 
>that an instruction sequence which satisfies "0 3 + 
>4 1 * +" does not also satisfy "3 4 +"?  If
>so, here's your mistake.  It does."

>Nope.  We do not differ here.  Guess again.<g>

Huh?

I'm running out of guesses.  If we agree that both satisfy the constraints,
then why would you eliminate one of them?

>"However, that sort of decision isn't easily made 
>with a computer, and the rules of logic give
>absolutely no guidance; you have to depend on what 
>humans call "experience", and algorithms call 
>"heuristics".  If you're going to accept using a
>heuristic here, then we're in agreement: this 
>problem is entirely solvable using heuristics."

>Whew.  It took awhile, but we seem now to be on the 
>same track.  I have said from the beginning that 
>the logic engines were "rule-based".  That includes 
>any encoded heuristic.  I have never said that the 
>computer had to behave less intelligently in a 
>discovery process than I would.  I have only said 
>that if I am restricted to a less intelligent, i.e. 
>brute force, approach that the computer can do 
>faster and cheaper and much more extensively than I 
>can.

This is certainly true.  However, due to the magnitude of the problem, we're
fortunate to not be limited to brute-force (since, after all, brute force
wouldn't work at all for this problem).

>"You forgot XOR and OR.  Not to mention that 0 + + 
>is equivalent to +."

>Why assume that I have forgotten anything in the 
>rule set.

Because you explicitly said that there was only a couple of ways to express
a "+", all of them having to do with addition.  Since there are actually an
infinite number of ways to do that, and only a "few" of them have to do with
addition, it's quite obvious that you've done something akin to forgetting.

>It is one thing to know a priori that 0 
>is the "only" value of an operand than it is to 
>know that it is a possible value.

This sentence seems to be missing some words -- it makes no sense and seems
ill-formed.

Assuming that you meant to write, "It is one thing to know a priori that 0
is the "only" value of an operand; it's completely different to know that it
is a possible value," I'm still puzzled.  What does this have to do with the
fact that "an instruction sequence which satisfies "0 3 + 4 1 * +" does also
satisfies "3 4 +""?

>My triad is 
><operand><operand><operator> which in APL is 
>referred to as a dyadic (two operand form).  APL 
>also supports a monadic form (one operand), 
>e.g.<operand><operator>.  In either case wherever 
>an operand can occur another monadic or dyadic 
>expression may appear.  An operand is an expression 
>and thus a sub-expression within an expression.

True.  (I know Forth far better than I know APL, so your initial RPN was
quite clear to me.)

>"Again, though, you're falsely assuming limitations 
>on 3 and 4."

>If I am falsely assuming limitations here, they 
>have been the ones I have used for the last 35 
>years with PL/I.  There not limitations, only rules 
>for converting literals into data types.

Er -- they're limitations.  They "limit" you from using a more efficient
representation, even when the more efficient representation is logically
equivalent.

>I do it 
>to avoid the "song and dance" that you go through 
>in your description.  If it makes you feel better, 
>call them "heuristics".  Whatever you call them 
>they work.<g>

You've just spent a long time telling me that current compiler technology
doesn't work.  Now you're telling me that it does.  If the current
technology isn't broke, why do you want to spend energy fixing it?

As far as I can tell, the optimiser you're describing now is utterly simple:
it takes in PL/I code, and emits machine code which impliments *exactly*
what the programmer requested, same type, same sequence of operations, same
everything.  I wouldn't call that an optimiser; my first attempt at my
Oberon compiler was far more sophisticated.

>"I'm guessing, though, that the general gist of 
>this paragraph is to say that "optimizations for 
>programs need have no impact on optimizations for
>expressions."  This is UTTERLY false; an 
>optimization for a program might, for example, 
>REMOVE an expression.  Or, alternately, might merge 
>two expressions together, making them reuse a value 
>which was not obviously being reused."

>Optimizations occur at multiple levels.  Normally 
>they occur at the most global level and work 
>progressively downward to the most local.  In that 
>sense they have a possible impact in their downward 
>movement for all things within their scope.  

This has some truth for algorithmic optimization.  It's not true for
optimization in general, because in reality you have to make only one
optimization: the one which takes the current program, and produces a better
program.  If you're going to pretend that optimization is possible in
multiple steps, you should at least admit that the steps aren't discrete:
each step affects all of the others, including the ones before it.

>However if a higher level optimization removes a 
>lower level entry, then it has no impact on its 
>evaluation, because when that level is reached no 
>evaluation is necessary.

A strange definition of "no effect".  Here's a new slogan: "Murder, the NEW
victimless crime."  Since the person killed no longer exists, the crime had
no effect on him!  ;-) :-)  I used expression elimination as an extreme
example; however, I can understand that it's not an obviously good example
:-).

Perhaps more effective: consider an optimization which causes one expression
to use an intermediate value produced by another (otherwise unrelated)
expression.  Now, you surely see that optimizations to the "unrelated"
expression affect the original expression; in fact, it might be that it
would be wiser for the overall program to optimise the unrelated expression
and discard our optimization on the original expression.

>The point is that optimization and its ordering are 
>rule-based, not arbitrary.

And my point is that any rules which imply ordering are truly arbitrary,
because optimization is NOT ordered.  Your optimiser may turn out to be the
best ever written; however, there's nothing to stop some company from
writing a better one (by using a different set of rules) and returning the
marketplace to the same state it was in before.

>Here as elsewhere we 
>have encoded the rules that our logic engine is to 
>follow.  The logic engine may exercise more 
>"reflection" on the application of the rules, but 
>it does so only within guidelines set by other 
>rules.

Excellent.  (By "reflection" you clearly mean "careful reasoning", not
"self-examining and modifying code".)

>For reasons not clear to me you continue to 
>ignore the continued references I make to a 
>rule-based system, substituting in its place one of 
>an abritrary discovery process.  I don't falsely 
>assume rules as it is not false for me to do so.

This is because you clearly and repeatedly talked about how your program
would explore all programs which are logically equivalent to the
specifications.  If you're not in fact going to do that, then it would be
highly proper of you to not say you are.

>"Where on EARTH did you hear that nonsense?  Humans 
>are currently FAR more capable of working such 
>problems out than are machines."

>That "nonsense" did appear in a response by someone 
>else relative to writing optimal code sequences for 
>a Intel Pentium processor.

True optimal sequences are theoretically possible, but practically not so.
Getting close enough is good enough.

I have a friend who's well-paid because he's so good at handwriting code for
that.

>Basically it parallels 
>that introduced much earlier with IBM RS/6000 
>family of machines, in which assembly language was 
>an option, but was discouraged in favor of using a 
>HLL with its builtin optimization processes.

This was a theory for a while, but just like the 4-minute-mile was disproved
by breaking it, so this one was disproved.  However, I will add that I like
HLLs: all I ask is that they produce code which is good enough.

>"Don't waste my time.  You HAVE to know better than 
>that -- simply because you haven't HEARD of a 
>mathematical law doesn't mean it doesn't apply to
>you.  Shoot, that's not even true for political 
>laws."

>Again I don't have to waste my time accepting 
>something that works without understanding laws 
>that say something else doesn't.  It makes no 
>difference if I have heard it or not.  If something 
>works, it obviously does not apply.  What more do I 
>need to know?

Rice's theorem applies quite well: you claim that you can generate a set of
possible programs, then choose the fastest from among them (and resolve ties
by choosing the smallest).  Rice's theorem points out that program
termination is uncomputable; there's no computable way of knowing whether an
arbitrary program will terminate, much less how long it takes to do so.
Therefore, you cannot choose the fastest of the programs in general.

You *can* choose the smaller ones, or the less complex ones.  This is what I
would choose to do.  However, that's only a heuristic, and at that a very
uncertain one.

>"You failed in your goal the moment you limited 
>yourself to PL/I rules."

>Somehow you had better make up your mind about 
>PL/I.<g>  If I had limited myself to the rules of 
>any other computer language or of all of them 
>combined, I would have more limits than I have with 
>PL/I.<g>

That's true.  I could have made a stronger statement by using any other
language.  But my statement still stands: not as an insult to PL/I, but
rather as an assesment of the problem.

>So if there is something limited in PL/I rules 
>regarding expression evaluation, it is far less a 
>limit than what you based your lengthy description 
>on.

I never cared one whit about expression evaluation.  Optimization is our
topic.  Only the inputs and outputs of the program matter.

>"We are utterly disagreed.  Software produces 
>MISERABLE results compared to even a halfhearted 
>assembly programmer.  Especially software with the
>horrible heuristics you're describing."

>Then let us continue to agree to disagree.  You and 
>I do not exist in the same compiler universe.<g>

You were, just a few messages ago, talking about how the current compilers
didn't meet your desires.  Now they suddenly seem to make you very happy.

-Billy