Proposals

btanksley@hifn.com
Fri, 14 Jul 2000 16:57:21 -0700


From: Lynn H. Maxson [mailto:lmaxson@pacbell.net]

>Billy wrote:
>"The fact that it's impossible to examine all of 
>the possibilities renders such an optimiser 
>impossible; and that should have been my clue that 
>you were talking about something else."

>Just when I think we are finally on track

One difficult part of talking with you is that you never answer yes or no to
a question, nor do you directly address a statement.  You instead discuss
something related to the statement which happened to be on your mind at the
time.  I'm going to assume that I DID guess your original meaning correctly;
and further, I'm going to guess that you also like to talk about optimisers,
and you consider the two topics different subjects.

Now, with that out of the way, let's chat about optimisers.

>it would seem that we understand optimizers differently.

True.  :-)

>I 
>understand them as "specialties", belonging to a 
>group of IT professionals focused on no other 
>activity.  That, at least, is the method that IBM 
>has chosen in translating all language source into 
>an intermediate language (closer to symbolic 
>assembler but still machine-independent) and then 
>subjecting it to an optimizing code generation 
>process.  What is included in that intermediate 
>language I think appears in what is designated as  
>SYSDATA, another optional output available to the 
>compiler user.

I don't understand how your comment about "specialties" has any relation to
the rest of your paragraph.  How does it?

Optimization is not a specialty -- it's everything.  Permit me to quote from
the Tao of Programming: "The Tao gave birth to space and time.  Therefore
Space and Time are Yin and Yang of programming.  Programmers that do not
comprehend the Tao are always running out of time and space for their
programs."  Everyone who's ever written a program is an optimiser.  Anyone
who's ever walked a new route to an old place is an optimiser.

There are many specialties within optimization, of course.  Some of the most
rigorous involve the attempt to translate some aspects of optimization to a
machine-expressible algorithm.

>The point is that those responsible for optimizing 
>code generators are in effect "machine code freaks" 
>in the class of the best assembly language 
>programmers you have in mind.

Those responsible for code generators must be, yes.  I see you're bringing
in a confusion between code generation and optimization again; the ultimate
optimiser is also a code generator, but merely making a fancy code generator
doesn't give you an optimiser.  Optimising is FAR more difficult.

>I think you will 
>agree that they have only two methods of discovery 
>of optimal code sequences, one by chance and the 
>other based on the logic of their experience 
>(heuristic).

Hmm...  Actually, finding optimal sequences is easy for people using
documentation.  The hard part is recognising when they apply in the field.
That's what we use computers for ;-).

>If by chance, they have no means in their lifetime 
>or in a week's salary of discovering what a 200MHz 
>Pentium machine can in that same week.  In fact 
>what the machine may discover is the "optimum of 
>the week" to change a previous "optimum".

The optimum solution doesn't change from week to week.  It does, of course,
change from computer to computer (which is another reason not to waste your
time looking for the true optimum).

The computer won't find the optimum for anything but the most trivial
imaginable program in a week; I disproved that claim in my last post.  In that
week, the programmer will have found not an optimum, but rather a very
efficient way to do what was needed -- and in the process, he'll have
changed the low-level design of the code just a bit to better fit the
architecture, something your computer can't do in a million years.

>If you 
>insist that a best assembly language programmer has 
>a better discovery process that is not 
>heuristically based and thus transferable to a 
>machine, be my guest.

Yes, I do.  I insist that the programmer will code NOT the best possible
code for the low-level design, but rather code which is BETTER than possible
for that low-level design.

>If it is not by chance, then it is by heuristic and 
>thus specifiable.  Whatever is the optimum 
>determined by any heuristic is a member of a 
>(finite) class of optimums determined by the set of 
>all "best" assembly language programmers.  If you 
>want to claim that this set of choices is infinite, 
>I will assert that you need an infinite number of 
>best assembly language programmers to produce it. 
>Anything less is finite and anything finite is 
>certainly well within the capacity of my 200MHz 
>machine to examine this week or next week or 
>however many weeks until the human effort is 
>exhausted.

Do you still believe any of that?  Then let me disprove it in ONE WORD:

Go.

ONE PERSON can defeat a computer at Go, even allowing the computer a huge
amount of time.

And your claim that "anything finite is well within my machine's capacity"
is pure nonsense.  2^20000 is finite, yet a problem that large will remain
forever out of the reach of ANY machine.  Heck, Go is finite, and your
machine can't handle THAT!
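
To put a number on "finite but out of reach", here's a back-of-the-envelope
sketch in C (my arithmetic, not yours; the 3^361 figure is a crude upper
bound on Go board arrangements, ignoring legality):

    /* Digits of n^k = floor(k * log10(n)) + 1.  Compile with -lm. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double digits_2_20000 = floor(20000 * log10(2.0)) + 1;  /* ~6021 */
        double digits_go      = floor(361   * log10(3.0)) + 1;  /* ~173  */

        printf("2^20000 has about %.0f decimal digits\n", digits_2_20000);
        printf("3^361   has about %.0f decimal digits\n", digits_go);
        return 0;
    }

A machine examining a billion candidates per second gets through roughly
3*10^16 of them in a year; a number with 173 digits doesn't even notice.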

You're also mistaken when you associate heuristics with optimum solutions.
Heuristics don't even try to find optimum solutions; they merely find
solutions which are 'good enough'.

You're also badly mistaken (shockingly so) when you claim that selecting
from an infinitely large set requires an infinite number of programmers.
The very act of programming IS by definition selecting a single program from
an infinitely large set (the set of all programs); so is optimization
(selecting from the set of all programs which fit the specs).  This is done
all the time by a SINGLE programmer.

>In either case the best is the best (even if only 
>temporary) and whatever is the best is specifiable 
>in machine logic which must follow the rules of 
>formal logic.  Now the machine (or the software 
>executing in the machine) once encoded is always 
>running at its best.  That is something normally 
>not possible in any human being: to always be at 
>their best.  So in an exhaustive showdown in which 
>one is always producing its best a million times 
>faster than someone who isn't, chances are (perhaps 
>a million to one) that the former will win out over 
>the latter.<g>

Sigh.  But the computer *isn't* producing its best.  The computer can't
possibly even figure out what "its best" is.  It's only able to produce
either a heuristic guess that's "good enough" (which it usually is, but which
results in only one answer), or random guesses which are mostly sheer junk
(exhaustive search).  The human also uses what we might as well call a
"heuristic" (although it's not, really), but the human, unlike the computer,
can alter the specifications in the process of searching for the solution.

>I will assert to you that the number of such 
>programmers is not infinite nor is their output.  
>All of them together cannot discover by chance what 
>a single 200MHz machine can do within the same 
>fixed interval.  If their "improved" heuristics 
>gets translated into software, they cannot 
>consistently match its ability to produce the best 
>every time.

Sure!  That's why we run software called "compilers" rather than paying
compiler consultants every time we want to compile a program.  It's good
enough.  (It's not the best, though; any of them could probably do better.)

>Thus if the software is written to pursue any 
>chance discovery process used by humans or to 
>pursue any heuristic developed by humans, there is 
>no way in hell that any human can match it in 
>either accuracy (each time production) or volume.  

That's true.  That's why new versions of compilers come out every once in a
while.

>Moreover there is no way anyone can develop a 
>"secret" heuristic and keep it that way in 
>practice.  Once it exists in machine code the 
>secret is out: it becomes another among a host of 
>heuristics.

What?  That's impossible -- you can no more recover a heuristic from
compiled machine code than you can decompile it.

>Now can certain assembly language programmers 
>produce code that executes faster than that 
>produced by a specific compiler?  Yes, but only if 
>that compiler's code generation process has not 
>taken advantage of the previous discussion.  The 
>problem lies in seeing that every compiler gets the 
>best in code generations possible.  Giving some 
>thought to the matter you will (I think) come to 
>the conclusion that it is not a technical problem 
>(for certainly such code generation exists 
>somewhere) but a political and economic one due to 
>the competition among compiler vendors.

No, it's because heuristics vary.  The same heuristic can be coded in
different ways; or the code to recognise when the heuristic fits might be
better or worse.  The compiler's optimiser might be better or worse (and
that's one area where there's DEFINITELY no one best way).

It's not a technical problem; it's a mathematical impossibility.

>Now let's get to the matter of code optimization.  
>The first level of optimization lies in "expression 
>reduction", along the same lines that we have 
>learned in our mathematical education.  The second 
>level of optimization lies in the execution order 
>of the reduced expression of the first level.

Sure, if you like.  Strictly speaking, though, that's not true: optimization
is a single process, and performing it piecemeal will always result in
something inferior.  It's generally good enough, however.
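
To make the two levels concrete, here's a small sketch of my own (not your
example) of what a compiler might do at each level:

    /* Level 1, expression reduction: algebraic identities plus
       common-subexpression elimination. */
    int level1_before(int a, int b)
    {
        return (a * b) + (a * b) + 0;  /* a*b computed twice; "+ 0" is dead */
    }
    int level1_after(int a, int b)
    {
        int t = a * b;                 /* shared subexpression computed once */
        return t + t;                  /* with the "+ 0" folded away */
    }

    /* Level 2, execution order: independent work scheduled between
       dependent operations so a pipelined machine stalls less. */
    int level2_after(int p, int q, int r, int s)
    {
        int u = p + q;
        int w = s + s;  /* independent; slotted in while u is in flight */
        int v = u + r;  /* depends on u, which is ready by now */
        return v + w;
    }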

>Now to cover everything we could allow the 
>additional control structures of iteration (loops) 
>and decision to our sequence-only directed graph.  
>All that does is increase the number of possible 
>paths.  Whatever that number it is always finite.  
>It does not make a difference how many times a path 
>is repeated, each repeated path (sequence) is 
>optimized.

Ah, but this isn't true!  Ever heard of "loop unrolling"?  It can VASTLY
speed up code!  Of course, it increases the bulk of code...
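
Here's a minimal sketch of what unrolling does (my example; it assumes n is
a multiple of 4, where a real compiler would emit a cleanup loop for the
remainder):

    /* One loop test and branch per FOUR additions instead of per one. */
    long sum_unrolled(const int *a, int n)
    {
        long s = 0;
        int i;
        for (i = 0; i < n; i += 4) {
            s += a[i];
            s += a[i + 1];
            s += a[i + 2];
            s += a[i + 3];
        }
        return s;
    }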

>This means we don't give a damn about Rice's 
>Theorem as we are not trying to determine when a 
>program will terminate.  We are only trying to 
>determine that it will terminate under an "expected 
>mix" of input in less time than it would 
>"unoptimized".

Do you hear what you're saying?  Read it again, please!  You're claiming
that you're not trying to find when a program terminates; what you're
actually trying to find is WHETHER A PROGRAM TERMINATES IN LESS TIME.  Same
thing!

Actually, it's the halting problem (which Rice's theorem generalises) that
says you can't compute _whether_ a program terminates, not merely when it
terminates; you misquoted it.  So you're wrong either way.
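
For reference, the standard diagonal argument, sketched in C.  This is an
argument, not code you can link -- halts() is exactly the function that
cannot exist:

    typedef void (*prog)(void *arg);

    int halts(prog p, void *arg);      /* hypothetical decider */

    void diag(void *self)
    {
        if (halts((prog)self, self))   /* if diag(diag) would halt... */
            for (;;) ;                 /* ...loop forever */
        /* ...otherwise halt at once */
    }

    /* diag(diag) halts if and only if it doesn't -- a contradiction,
       so no such halts() can be written. */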

>If the mix differs, i.e. outside 
>our expectations, then we will react based on the 
>number of such instances which in turn changes our 
>expectations.  In which case we repeat the 
>optimization process.

Ah, yes.  That's a decent way to handle things.  One method for handling
this is called "dynamic optimization"; in that, you include a few optimisers
with the executable, and you profile the program while it's running.  Parts
of the program which run frequently can get optimised with more
time-consuming optimization processes.

Surprisingly, it seems to work well.
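
Roughly, a system of that kind behaves like this toy sketch (the names and
the 10000-call threshold are my own invention):

    #include <stdio.h>

    static int profiled_square(int x);              /* forward declaration */

    static long calls = 0;
    static int slow_square(int x) { return x * x; } /* baseline code */
    static int fast_square(int x) { return x * x; } /* stands in for a
                                                       heavily reoptimized
                                                       version */
    static int (*square)(int) = profiled_square;    /* current dispatch */

    static int profiled_square(int x)
    {
        if (++calls == 10000)     /* the routine has proven itself hot:     */
            square = fast_square; /* install the expensive-to-build version */
        return slow_square(x);
    }

    int main(void)
    {
        long sum = 0;
        int i;
        for (i = 0; i < 100000; i++)
            sum += square(i % 100);  /* the indirect call picks up the swap */
        printf("sum = %ld\n", sum);
        return 0;
    }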

>Now you may believe that you have some human being 
>that can beat a machine at this, including making 
>the changes as the expected input mix changes, but 
>I doubt it.<g>

Hmm.

One of the problems with pitting a human against a machine heuristic is
that most people are an expert in only one field.  VERY few people who are
experts in figuring ways to optimise code are also supremely skilled in
figuring out ways to tell computers how to use their heuristics.  Therefore,
the computer is not truly competing against the human; instead, the person
who is good at coding heuristics which optimise code is competing against
the person who is good at optimising code.  The person who is good at
optimising code will win (almost) every time.  So it's not really fair.

On the bright side, most of us aren't as good at optimization as even the
heuristics expert.  So his code _can_ beat us, if it's on a level playing
field.

My point is twofold.  First of all, the playing field is seldom level;
humans can change the conditions of the problem in ways the computers can't
hope to, by redesigning the solution.  Second, even when the field is level,
the machine only embodies the skill of the one expert who wrote its
heuristics, so the best human optimisers will still beat it.

>Just to make it clear you will also note that all 
>iterative, decision, and operator (sequence) inner 
>nodes involve "expression evaluation".  Quite 
>frankly, Billy, there isn't any other kind.  It's 
>not that I had an arbitrary choice in the 
>matter.<g>

What are you talking about?  Did you change the subject?

>What does get interesting is that regardless of the 
>number of sub-expressions appearing within an 
>expression, eventually we get to the 
>end-nodes, each of which is an 
>(<operator><operand><operand>) triad, a set of three 
>members whose order (prefix, infix, postfix 
>operator) is unimportant.  On evaluating it, it 
>becomes in turn a single member in yet another 
>triad and so on until it results in a single triad 
>which when evaluated produces our result.

Essentially true (assuming some form of currying).  Okay.
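
Concretely, here's a toy postfix evaluator of my own (not anything you
proposed): each evaluated triad's value becomes an operand of the enclosing
triad until a single value remains.

    #include <stdio.h>
    #include <ctype.h>

    int eval_postfix(const char *s)   /* single digits and + - * only */
    {
        int stack[64], top = 0;
        for (; *s; s++) {
            if (isdigit((unsigned char)*s)) {
                stack[top++] = *s - '0';   /* operand: push it */
            } else {
                int b = stack[--top];      /* operator: pop the triad's */
                int a = stack[--top];      /* two operands... */
                switch (*s) {
                case '+': stack[top++] = a + b; break;
                case '-': stack[top++] = a - b; break;
                case '*': stack[top++] = a * b; break;
                }                          /* ...and push the result -- an
                                              operand of the next triad */
            }
        }
        return stack[0];                   /* the final triad's value */
    }

    int main(void)
    {
        printf("%d\n", eval_postfix("34+2*"));   /* (3+4)*2 = 14 */
        return 0;
    }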

>Regardless of the complications (the number of 
>paths) introduced by our control structures any 
>program has only a finite number of paths extending 
>from its root node to any end node.  Though 
>programmers have this tendency to copy a Pacific 
>island natives' counting system of one, two, three, 
>infinity, infinity in fact does not occur.  If it 
>did then compilers could not provide us with an 
>analysis of the internal structure of a program 
>that we can use within a performance and tuning 
>exercise to further ensure optimality against an 
>expected input mix.

>Now programs have finite, not infinite, paths.  
>Paths have finite, not infinite, nodes.  
>Expressions (triads) have finite, not infinite, 
>possibilities in terms of machine code set by the 
>application of rules.  We can start at the lowest 
>triad in any path, optimize it, then consider it as 
>a member in the triad containing it, again 
>performing the optimization for this path segment.  
>We can continue this for all sub-paths in the 
>program, reoptimizing all that we have done from 
>this point to all end nodes within this sub-path 
>until we reach the root node where we again 
>reoptimize on the basis of "new" information.

Have you ever attempted to count the number of paths in a trivial program?
You're correct that it's finite (by your definition of the word 'path');
however, it's a HUGE number.  Using your algorithm, we'd have to optimise
each path separately; you miss the fact that any two paths can overlap, and
usually do.
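
To see how huge, consider that every independent two-way branch doubles the
path count, so even a small loop-free program is astronomically "finite":

    #include <stdio.h>

    int main(void)
    {
        /* Three if/else statements in sequence already give 2*2*2 = 8
           paths; k of them give 2^k. */
        int k;
        for (k = 10; k <= 60; k += 10)
            printf("%2d branches -> %llu paths\n", k, 1ULL << k);
        /* 60 branches: over 10^18 paths, in a program with no loops. */
        return 0;
    }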

>I go through this tedious description to point out 
>that regardless of the finite additional time this 
>takes software to do I challenge you to find a 
>single assembly language programmer that would ever 
>attempt this practice on a fixed term contract.<g>  

None would, I'm sure.  Absolutely.  Mainly because the algorithm doesn't do
anything useful -- it produces a HUGE number of executables, one for every
single possible path in the program (minus loops).

You speak of "finite" additional time -- but are you sure that the time is
finite?  I certainly agree that it's not infinite, but I believe you'll
discover that the limited amount of mass in the universe will make running
the computation impossible (because we can't make enough disk drives to
store the results).

>While there may be an extremely large number of 
>nodes to evaluate that large number in no way 
>approaches infinity.  Moreover the larger the 
>number of nodes (thus the larger the program) the 
>less likely an assembly language programmer can 
>apply the same thoroughness (and keep his job<g>).

Your last point is very true.  This is where HLLs _really_ get useful :-).
Even though they don't produce the best possible code, or even anything
close to it, they're still better than never producing any code (which is
what an assembly programmer would do in that case).

It's important for me to remember that assembly programmers aren't generally
needed, any more than the optimal programs are needed.  We just need "good
enough", and for most cases, current compilers give us that, commercial or
not.

>Just to make this interesting, eliminate the 
>current compilation scope limit of a single 
>external procedure so that you can apply the same 
>optimizing process to the set of inter-related (data- 
>connected) programs comprising an application 
>system (or an operating system).  Now the software 
>tool is unconcerned about size and will perform 
>with the same level of accuracy on a thousand (or 
>million) modules that it does for one.

Er -- no, it won't.  Any programmer you ask to code that algorithm for you
will either drop to the floor laughing or will be coding until he's dead.
Solving the problem that way is impossible.

>Nowhere 
>have you ever seen a single programmer attempt such 
>an achievement, including full optimization.

I think I've seen people attempt proofs that way.  Never seen anyone
succeed, except with utterly trivial things.

>You can do things your way, creating an infinity of 
>possibilities and thus guaranteeing failure.  I am 
>not so inclined and thus prefer doing it a 
>different way.  That is the way I intended my 
>description to be interpreted, not the 
>interpretation that you have given them.  Yours is 
>certainly valid and well thought out, but it isn't 
>mine.<g>

Why are you posting to this list if you do not wish to be understood?  I
want to understand you.  My best efforts have failed.  I've been frank about
that.  All you tell me is that "my way" guarantees failure.  The thing you
call "my way" is simply my understanding of your way.  Please correct my
misunderstanding.

I'm not interested in doing impossible things.  I'm stubborn enough to keep
trying to understand you, though.

>Now if others are not included in this discussion, 
>more's the pity.  I don't set the rules for the 
>Tunes distribution list.  In other responses I have 
>deliberately kept them private at the responder's 
>request.  

I didn't mean to imply that you were at fault.  I was asking the list in
general (or the list-admin in specific) why the software was configured that
way.  No, not your fault; my fault, if anyone's.

>In the system I propose such practices are 
>impossible.  In the system I propose no need for a 
>standards effort exists.  The only requirement is 
>that it conform to the syntax.  That also says that 
>it is impossible to introduce an incompatibility.  
>In the system I propose no need for a FSF or a 
>"free" or "open" software license exists.  No 
>protections nor restrictions of any kind are 
>necessary.  Moreover full (and unrestricted) 
>ownership of all source--the tool, the language, 
>and the user applications--transfer to the user.  
>No software use licenses apply.  The user owns the 
>source and the means of converting it to an 
>executable, not the vendor.

You've said this before, and I don't understand it.  How does your system
cause this to happen?  I don't understand at all.

>I am not trying to wear you down as much as I am 
>patiently trying to explain why infinity does not 
>fit.  I don't have your hangups on this.

With all due respect, you lack my understanding as well, and until you
manage to communicate your own I can't really be impressed.  I don't want to
be disrespectful of the knowledge you do have, but you've not only failed to
communicate in spite of both of our earnest efforts and demonstrated a
complete lack of familiarity with important computer science results like
the Halting Problem and computational complexity; you've also gone on to
insult me by describing my honest attempts to communicate as "hangups".

Well, I'm not going to hang up the phone on you -- I'm going to keep trying.
You're wearing my patience very thin, though.

>Optimization, particularly of the iterative kind, 
>deals with system dynamics quite well.  Programs do 
>not have infinite paths no matter what number of 
>loops (iterations).  Certainly with the same set of 
>heuristics I would trust the machine far more in 
>the large than I would the world's best assembly 
>language programmer.<g>

I wouldn't trust either.  Give me the same language without an optimiser,
and we have the best of all worlds -- a simple, brief, and domain-specific
language, together with a compiler which is also simple and easily
debuggable by a domain expert.  This is part of what makes Forth so
powerful, even Forths without optimisers.  It's possible to understand them.

-Billy