Proposals

Lynn H. Maxson lmaxson@pacbell.net
Sat, 01 Jul 2000 01:41:02 -0700 (PDT)


Response to Billy.

"I'm still having a hard time figuring out what you 
mean when you say "an exhaustive true/false testing 
process."

If by this time I have not made the meaning clear 
in this set of responses, I have to refer you to 
any Prolog text or frankly anything which describes 
the logic engines of all AI systems.  I have read 
them.  I have used them.  I understand what "they" 
mean.  I'm only repeating here.  It is no invention 
on my part.
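
By way of a toy illustration (my own example, not 
from any particular text): a Prolog engine asked 
for all solutions tries every clause that could 
satisfy a goal, judging each candidate true or 
false in turn.

   /* Toy knowledge base. */
   connects(a, b).
   connects(b, c).
   connects(c, d).

   path(X, Y) :- connects(X, Y).
   path(X, Y) :- connects(X, Z), path(Z, Y).

   /* ?- findall(Y, path(a, Y), Ys).
      Ys = [b, c, d].
      Every clause combination is tested
      exhaustively via backtracking. */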

"You underestimate the complexity of modern 
programs -- a million lines is not even close to 
Windows NT's 50 million lines (ugh).  I forget what 
Win2K weighed in at."

First off I understand the difference between 
complex and complicated, the former dealing with 
the interconnections between units and the latter 
with their absolute number.  I also know that these 
units have "names" and that in a manufacturing 
sense they exist as "assemblies".  Internally these 
assemblies consist of (explicit) references to 
other assemblies (which also have names), control 
structures, and primitive operators.
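
A sketch of that structure in Prolog facts (the 
names are invented for illustration only):

   /* Each "named" assembly holds explicit
      references to other named assemblies,
      control structures, and primitive
      operators. */
   assembly(payroll,
            [ref(compute_tax), seq, ref(print_check)]).
   assembly(compute_tax,
            [seq, op(multiply), op(subtract)]).
   assembly(print_check,
            [iterate, op(move), op(write)]).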

I will go out on a limb and say not one of these 
units contains a million lines of code, regardless 
of the total number involved in their hierarchy of 
interconnections (complexity).  Not even in a M$ 
system.

I do not have to apply any optimization beyond the 
boundaries of any assembly.  I could, but current 
compilers limit their scope of compilation to a 
single external procedure and thus apply no 
optimization beyond its boundaries.  This is true 
even if I include the option to "inline" invoked 
procedures all the way down a hierarchical path to 
its lowest level.
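
In Prolog terms such "inlining" amounts to clause 
unfolding.  A toy sketch (my example, not any 
compiler's actual output):

   /* Before: sumsq/2 invokes sq/2. */
   sq(X, Y) :- Y is X * X.

   sumsq([], 0).
   sumsq([X|Xs], S) :-
       sq(X, Y), sumsq(Xs, S0), S is Y + S0.

   /* After unfolding ("inlining") sq/2 into its
      caller: logically equivalent, one call fewer. */
   sumsq2([], 0).
   sumsq2([X|Xs], S) :-
       Y is X * X, sumsq2(Xs, S0), S is Y + S0.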

I don't dispute your numbers (or the numbers any 
one else has placed before me).  The truth is that 
they don't apply here.  It takes M$ more than a 
year to release a new version of Windows due to its 
software methodology, which uses source files as 
input and limits compilation to a single external 
procedure (executable).  Allow only a single source 
and a scope of compilation equal to that of an 
entire operating system, write it in a 
specification language, and M$ could produce a new 
(error-free) version daily.  Of course, it could no 
longer stay in business.<g>  But that's another 
story.

Back to the issue of what it is possible to 
optimize and what is not.  If you accept that I 
don't have to optimize more than what current 
compilers attempt to do, i.e. a single external 
procedure, and you have followed my logic that no 
logical equivalence is required for explicit 
references from one unit to another, pause and take a 
breath.<g>  You must accept that all higher level 
assemblies must eventually "decompose" into control 
structures and primitive operators.  If for no 
other reason than that is what happens.<g>

That means then that I have a stream of elements 
with very definite boundaries.  That includes even 
a sequence control structure.  Not only that, I only 
have to consider a stream which lies within the 
boundaries of a "named" assembly.  The process has 
only to "optimize" each "named" assembly in turn, 
i.e. one at a time.

Now actually it doesn't have to optimize at all.  
It simply has to find the logically equivalent 
sequence in a set of machine-dependent 
specifications that matches the control structures 
and primitive operators of the machine-independent 
specifications.  If there are none, then we cannot 
generate the necessary code.  If there is only one, 
then we have no choice but to accept it as optimal.  
If there is more than one, then we can reprocess 
this list in some manner in which only one will be 
returned meeting our "rules of optimization".
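
A minimal Prolog sketch of that selection logic 
(the predicate names and the tiny machine-dependent 
table are mine, invented for illustration):

   /* Hypothetical machine-dependent specifications:
      primitive operator -> candidate instruction
      sequence, with a cost for the "rules of
      optimization". */
   md_spec(op(add),      [addl],       1).
   md_spec(op(multiply), [imull],      3).
   md_spec(op(multiply), [shll, addl], 2).

   /* None: fail (we cannot generate code).
      Exactly one: accept it as optimal. */
   select_code(Op, Code) :-
       findall(C, md_spec(Op, C, _), [Code]), !.
   /* More than one: reprocess the list so that
      only one is returned -- here, least cost. */
   select_code(Op, Code) :-
       findall(Cost-C, md_spec(Op, C, Cost), Pairs),
       Pairs \== [],
       msort(Pairs, [_-Code|_]).

   /* ?- select_code(op(add), Code).
      Code = [addl].
      ?- select_code(op(multiply), Code).
      Code = [shll, addl].   (cost 2 beats cost 3) */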

The plain fact is that the number of primitive 
operators (which are only optimized one at a time) 
is relatively small, and each needs to occur only 
once, not each time it appears in the stream.  
Moreover the number of possible control structures 
is even smaller.

Let's state what the actual problem is here, as 
hopefully by this time you will agree that I may 
have miscommunicated what you think I said.  If this 
is so "possible" as I claim, then why hasn't anyone 
done it?

The problem is one of pattern matching: of somehow 
decomposing a higher level pattern into a lower 
one.  Thus far we have chosen to use human pattern 
matching processes, to use human experience over 
time to develop heuristics, and over time to 
incorporate those heuristics as optimizing 
algorithms.

A specification is text, whether machine-dependent 
or -independent.  It is not an "appropriate" 
representation for pattern matching.  How we humans 
do it, in a manner that allows us to discover 
heuristics, is not (yet) a logical process.  That 
which we as humans can do we have found no way to 
replicate within software.  In truth we have no 
idea how it occurs in us, or why it should occur in 
some and not in others.

What I am talking about doing here cannot be done 
currently for "representation" reasons, not because 
it's some "huge" number.  The same reason accounts 
for the heuristics given in "Design Patterns" for 
determining reusable objects.

So let's take the best of the optimization methods 
we discover for which we can encode algorithms and 
do some of the "inlining" things that are possible.  
This should allow us to fall on the performance 
scale somewhere closer to assembler and farther 
from C.  When I solve the "representation" problem 
along with its decomposition rules, then we will 
have what I claim is possible.

"Correct.  Of course, this can be impossibly hard, 
so Prolog has ways of allowing the programmer to 
specify order."

Strictly speaking, no.  Prolog has implemented a 
"non-logical" means of writing a sequence as an 
"and" expression.  This irritates me as in logic an 
"and" expression is commutative throughout, i.e. 
order is unimportant.  In my specification language 
a sequence is just exactly that and thus I can 
"obey" the laws of logic.<g>

"But even with that, Prolog doesn't attempt to 
generate all of the possible executables which 
could fulfill the specifications.  There are just 
too many!  Instead it uses some heuristics to 
generate the best executable it can think of."

By this time you should realize that I do not 
either.

"Oh!  So you're saying that at some point in this 
process, we removed all of the explicit 
references."

No, what I am saying is that at this boundary point 
the lowest level machine-independent specifications 
(control structures and primitive operators) have 
"no" explicit references to the machine-dependent 
ones.  They do not exist.

We know that it must be possible to generate the 
code, but as I said earlier we do not have the 
"representative" means of decomposing the "pattern" 
of the higher level into those of the lower.  That 
pattern matching now occurs only in humans, who 
then encode their "discovery" into algorithms to 
generate code.

"Now, machine independance is still good, but your 
two steps are still wrong; the first step is 
wasting its time by pretending to find an optimal
solution."

Well, until I solve the representation problem I 
don't want to waste any more time.  Until I do, you 
are correct.  After that point even you will be 
convinced.<g>

The only reason I ever brought this up was a 
reference in the Slate documentation about 
something performing at "50% of optimized C++ 
code".  I have already described in other responses 
what accounts for that and what to change to 
achieve performance nearer that of assembly 
language. 
Steeltoe in another response verified this with 
respect to the Watcom C++ compiler with its 
inlining ability.

My basic take is that regardless of the user 
language which eases the user's task of expression, 
a goal of the Tunes HLL, nothing says that we have 
to retain its form once inside (away from the 
user's view) as long as what we keep remains 
logically equivalent.  
Relative to OO this simply means to treat it like 
beauty, only skin deep.  If we do so, we can 
achieve assembly level performance.

Beyond that I don't believe there is an OO language 
that permits the ease of expression available in a 
specification language, which comes close to 
textbook expressions.  I would certainly challenge 
any Lisp variant in terms of expressive ease.
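
As a taste of what "textbook" means here (Prolog is 
not my specification language, but it makes the 
point):

   /* Textbook: gcd(a, 0) = a
                gcd(a, b) = gcd(b, a mod b), b > 0 */
   gcd(A, 0, A).
   gcd(A, B, G) :- B > 0, R is A mod B, gcd(B, R, G).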