Proposals

Lynn H. Maxson lmaxson@pacbell.net
Tue, 11 Jul 2000 10:46:22 -0700 (PDT)


Billy wrote:
"You've said this before, and we've disagreed 
before.  I think you need to clarify the statement 
you're attempting to make."

At issue is what Prolog and SQL, as 
logic-programming-based specification languages, 
do.
What they do I do not invent.  Both employ a 
two-stage proof mechanism, one of completeness and 
the other of exhaustive true/false testing.  That 
two-stage proof mechanism lies at the heart of 
every rule-based AI expert system regardless of the 
particular language involved.
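
To make the second stage concrete, here is a 
minimal sketch in Python (the names are mine, not 
anything from Prolog or SQL) of exhaustive 
true/false testing over a finite input domain:

    # Return every instance in the (finite) domain
    # for which the rule holds true -- the "all true
    # instances" a logic engine delivers.
    def prove_exhaustively(rule, domain):
        return [x for x in domain if rule(x)]

    # Example: which pairs (a, b) drawn from 0..9
    # satisfy a + b == 7?
    domain = [(a, b) for a in range(10) for b in range(10)]
    print(prove_exhaustively(lambda p: p[0] + p[1] == 7, domain))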

I did not claim that SQL created its own data.  As 
Billy says, SQL uses existing data, in fact data 
whose "source" is specified within the SQL 
statement.  That source is not infinite, however 
large it may be.  SQL provides as a result all rows 
of the data satisfying the search criteria, an 
exhaustive true/false proof process.  It need not 
actually "touch" each input row, but "behaves" as 
if it had.
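
As an illustration only (Python, invented data), 
here is the sense in which an engine "behaves as 
if" it touched every row: a full scan and an index 
lookup return exactly the same rows; the index 
merely skips work.

    # One thousand rows of invented data.
    rows = [{"id": i, "qty": i % 5} for i in range(1000)]

    # Full scan: tests every row.
    full_scan = [r for r in rows if r["qty"] == 3]

    # Index: built once, then consulted directly.
    index = {}
    for r in rows:
        index.setdefault(r["qty"], []).append(r)
    index_lookup = index.get(3, [])

    # Same result either way.
    assert full_scan == index_lookup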

So we can hopefully set SQL aside, allowing it to 
do what it does regardless of any claim or 
disclaimer made.<g>  I say it is an exhaustive 
true/false
search process because it leaves no "true" instance 
out.  It can only do that if it "logically" 
considers all instances whatever their number.

Now that we have established that SQL does not 
generate data (and I made no claim that it did) but 
only performs an exhaustive true/false test on the 
data it is given, we can similarly lay all issues 
relative to Prolog to rest.  Here again Billy and I 
are in agreement and thus both on the right 
path.<g>  Prolog will produce all true instances 
from its input domain.
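
A sketch of that behavior (Python again, with a 
made-up fact base, standing in for what Prolog's 
findall/3 does over a finite set of facts):

    facts = [("parent", "tom", "bob"),
             ("parent", "tom", "liz")]

    # Yield every binding that makes the goal true;
    # None acts as an unbound variable.
    def solve(functor, *args):
        for f in facts:
            if f[0] == functor and all(
                    a is None or a == b
                    for a, b in zip(args, f[1:])):
                yield f[1:]

    print(list(solve("parent", "tom", None)))
    # [('tom', 'bob'), ('tom', 'liz')] -- all true
    # instances, none omitted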

At this point I would hope it clear to everyone 
that we are in basic agreement on what occurs.  Now 
where we disagree lies in whether an input domain 
is finite or infinite.  I see finite.  Billy sees 
infinite.  Clearly we must resolve the differences 
in what we see in the same thing.<g>  Here we will 
use his example of "add 3 and 4, and display the 
result".

I might want to restate this as "display (3 + 4);" 
or in PL/I as "put list (3 + 4);".  In this 
instance let's use PL/I, which has been my 
programming language of choice for the last 35 
years.<g>  As PL/I has rules, let us assume them in 
this effort.

In this instance both 3 and 4 are literal values.  
In fact both are fixed decimal (1,0), i.e. decimal 
integers.  Their result is fixed decimal (2,0) 
under the rules of PL/I.  The value is "displayed" 
as 7 instead of 07 as PL/I employs "right 
adjustment with left zero suppression".
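
For those without the manual at hand, the 
precision rule for decimal addition works out as 
follows (a Python sketch of my reading of it; the 
cap of 15 digits is the classic implementation 
maximum):

    # PL/I result precision for fixed decimal addition:
    #   q = max(q1, q2)
    #   p = min(pmax, max(p1 - q1, p2 - q2) + q + 1)
    def add_precision(p1, q1, p2, q2, pmax=15):
        q = max(q1, q2)
        p = min(pmax, max(p1 - q1, p2 - q2) + q + 1)
        return p, q

    print(add_precision(1, 0, 1, 0))   # (2, 0), as in 3 + 4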

None of the other "confusion" in Billy's 
description exists within PL/I's rules.  In fact 
PL/I will allow string literals in an arithmetic 
expression as well as a mixed-type expression, e.g. 
"put list ('3' + 4);".  In this example the '3' is 
converted from a 'char(1)' implied declaration to a 
fixed decimal (1) temporary variable so as to match 
the types of the two operands.  You can find all 
these rules regarding arithmetic operators and 
operands in the PL/I Language Reference Manual.
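
In spirit the conversion is no more than this 
(Python, purely illustrative):

    # A character operand in an arithmetic expression
    # is converted to a numeric temporary before the add.
    def coerce(operand):
        return int(operand) if isinstance(operand, str) else operand

    print(coerce('3') + coerce(4))   # 7, as in put list ('3' + 4);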

Now PL/I somehow did not make the Tunes language 
list though it still enjoys far more commercial 
success than most of the entries there.  PL/I 
employs "sensible" strong typing in that it allows 
mixed types within an arithmetic expression, 
including bit string and character string operands.  
So there is none of the "awkwardness" of pure 
strong typing languages, although PL/I also offers 
the means for the writer (programmer) to explicitly 
control the conversion from one data type into 
another.

The point is that like all programming languages 
PL/I is rule-based.  That it is probably more 
flexible in terms of mixed data types and operands 
(boolean, string, arithmetic) than any other 
language readers of this have encountered speaks to 
the resolve of the data processing community in the 
early 60's, who basically dictated the terms of 
the language.  Had the DOD (Department of Defense) 
dictated it instead of COBOL in those same 60's, we 
would be enjoying a far better world today.<g>

I regret to inform Billy that at least this part of 
his description has been laid to rest as we have 
well-defined rules for expression evaluation.  Now 
to get to the remaining parts.

I would hope that we would agree that the 
"remaining parts" deal with automated code 
generation, first with generating a list of all 
possible sequences and second with selecting the 
best from this list.  This is where I see finite 
and Billy sees infinite.  I hope that we can settle 
it here.

First off we do not generate anything except by 
following rules.  We have already employed rules 
regarding expression evaluation.  What we didn't 
discuss are the rules regarding the ordering of the 
operators and operands of the evaluation.  Let's 
assume RPN or postfix operator designation.  Thus 
the ordering of our example expression becomes (3 4 
+).
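
Evaluating such a postfix form is purely 
mechanical; a sketch in Python:

    # Postfix (RPN) evaluation: operands are pushed; an
    # operator pops its two operands, pushes the result.
    def eval_rpn(tokens):
        stack = []
        for tok in tokens:
            if tok == '+':
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            else:
                stack.append(int(tok))
        return stack.pop()

    print(eval_rpn(['3', '4', '+']))   # 7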

Now relative to any computer architecture we have 
different choices when it comes to translating this 
into machine code.  Let's understand that we have 
existing rules for generating actual code from 
symbolic code.  In fact for any given machine 
architecture we have sets of different rules for 
the same expression conversion.  Regardless of 
claims to the contrary, they are quite finite 
sets.  The issue is, for any instance, selecting 
an optimum choice: one equal to or better than any 
other.
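
To show what "selecting an optimum choice" looks 
like in the small, here is a toy Python sketch; 
the mnemonics and cycle counts are invented, not 
any real machine's:

    # Candidate code sequences for '3 4 +' with an
    # invented cost (cycles) for each.
    candidates = {
        ("MOV r1,3", "ADD r1,4"):              2,
        ("MOV r1,3", "MOV r2,4", "ADD r1,r2"): 3,
        ("MOV r1,7",):                          1,  # constant folded
    }

    # Optimum = minimum under the cost model.
    best = min(candidates, key=candidates.get)
    print(best)   # ('MOV r1,7',)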

"By your rules we're supposed to generate all 
possible programs; however, by my rules that's 
ridiculous;"

In a sense we are in agreement here.  One, by my 
rules we are not supposed to generate all possible 
programs.  The issue is all possible instruction 
sequences satisfying "3 4 +" in a target machine.  
Given the attributes of both 3 and 4 (already 
determined by our earlier rules), this sets limits 
on what + choice applies.  In any machine 
architecture, given the attributes of the 
operands, the operator is a relatively small 
finite choice, for the most part exactly one.<g>

You see, a program is larger than an expression.  
An expression is larger than a sub-expression.  A 
sub-expression is larger than its components.  It 
is for these components, after multiple levels of 
optimization have occurred, that we desire "all 
possible instruction sequences" and from them 
select an optimum.

Now Billy introduces factors other than our 
optimal instruction sequences that impact 
performance in different machine architectures, 
e.g. page faults and pipelines.  That says that 
after we have done the one, we must then consider 
revising it according to another set of rules that 
accounts for the other.  We have other evidence in 
another response that it is best to allow the 
software to implement such rules rather than have 
people (assembly language programmers) try to do 
it on their own.  That is already an admission 
that the software provides a better solution.

"Are you familiar with Rice's Theorem?  It says 
that the only way to do that is to run ALL of the 
programs with ALL possible inputs.  If our program 
had more than one possible input, the answer could 
be indeterminate -- we might discover that program 
A was faster for x=1, but program B was faster for
x=2.  Anyhow, we don't have that problem here, 
because there's only one possible input (nothing).  
So we merely have to run all of the programs
we've generated, and time them."

Fortunately I hadn't heard of Rice's Theorem, 
nor do I have to conform to it.  The goal here, if 
you can think back that far, is to produce code 
equal to or better than the best generated by an 
assembly language programmer.  Already Rice and 
others have made that task absurdly difficult for 
programmers operating on their own energy.<g>  We 
are agreed that software allows us to have more 
optimal results more consistently.  Such software 
is rule-based and as such capable of being 
embedded within a two-stage logic engine.
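
For a finite candidate set with no input, Billy's 
"run all of the programs we've generated, and time 
them" is itself mechanical.  A sketch (Python; the 
candidates are stand-ins):

    import timeit

    # Two stand-in 'programs' computing 3 + 4.
    candidates = {
        "folded":  lambda: 7,        # constant folded ahead of time
        "runtime": lambda: 3 + 4,    # evaluated when run
    }

    # Run them all, time them, keep the fastest.
    timings = {name: timeit.timeit(fn, number=100_000)
               for name, fn in candidates.items()}
    print(min(timings, key=timings.get))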

If we set all this other "confusion" aside, about 
what was and was not said, it gets down to the 
practical matter that we can use software to 
generate code as efficient as (or more efficient 
than) an assembly language programmer's.  Something 
else then must account for the performance 
difference between an HLL and an LLL.  Whatever it 
is, it is not coding efficiency.  That's the 
point!!!

If we choose then to have our HLL match the 
performance efficiency of an LLL, we can certainly 
do so.  The truth is that we have deliberately 
chosen otherwise.  If we imposed our choice, what 
makes up our inefficiency, on the assembly 
language programmer, his code would run no faster 
than ours.<g>

When we read then that something is "50% of an 
optimized C++", understand that it is the best of a 
deliberate poor choice.<g>  For whatever 
reason--esoteric, exotic, practical, or 
otherwise--we have 
chosen the lesser performance path.  In short we 
are not forced by anything other than our own 
initiative to travel it.

That means if we choose (and we have a choice) not 
to follow it, we can enjoy a performance "boost" 
not actually obtainable by a pure assembly 
language programmer, certainly nowhere near within 
the same time and cost framework.

If anyone here continues to believe that an 
"operand operand operator" postfix expression where 
the attributes of all three components are known 
will generate an infinite number of possible 
results, then your view of the universe differs 
from mine.  Even if you combine this generic form 
into all possible expressions which either have 
appeared or could appear within a program, you 
will still not 
generate an infinite number.  You cannot because 
the combinations of non-repeatable sequences will 
remain finite.  It doesn't make any difference to 
software that they number in the thousands or 
hundreds of thousands.
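
A trivial check on the arithmetic (Python): the 
orderings of a fixed set of components are 
numerous but never infinite.

    from itertools import permutations

    # Non-repeating orderings of n components number
    # n!, a finite quantity however large n grows.
    components = ['3', '4', '+']
    print(sum(1 for _ in permutations(components)))   # 6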

If such an explosion were going to appear 
anywhere, it should be in PL/I, as it allows a 
greater variety of mixed 
expressions (data types and operators) within a 
single expression than any other programming 
language.  And in thirty-five years of world-wide 
experience it has never happened.  The rules 
dictate otherwise.

So you cannot argue that something that works 
doesn't.  You can argue that when the number of 
results (or the time interval) exceeds a certain 
threshold, the process should be aborted and 
respecified iteratively until the overrun no 
longer occurs.  That's the purpose of 
rules and their expression in formal logic 
specifications.  When they don't work as desired 
you change them.  That's what we have done since 
writing the first program.