Response regarding coding efficiency

Lynn H. Maxson lmaxson@pacbell.net
Thu, 29 Jun 2000 12:31:23 -0700 (PDT)


Billy wrote:"You just finished claiming that no 
current system ever generates all of the
logically equivalent solutions to a given 
specification.  Now you're claiming that many of 
them do.  These statements appear contradictory."

It's bad enough to be caught in a lie.  It is even 
worse when it's a contradiction.<g>  Perhaps even 
greater when it's a "transmitter's" failure to 
communicate.  This one, I think, we can resolve 
easily.

The fact is that specification languages, e.g. 
Prolog, SQL, and Trilogy, include an exhaustive 
true/false test of the specifications.  In order to 
do so they must first logically organize the 
specifications into an executable sequence and then 
they must convert that sequence into executable 
code.  It is in this conversion of the executable 
sequence into executable code that they do not 
produce all possible logical equivalents.
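
To make this concrete, here is a minimal Prolog 
sketch of my own (not a description of any 
particular tool).  Three specifications go in; the 
engine organizes them into a search and tests 
every candidate for truth or falsity:

  % Three "specifications": two finite domains and a condition.
  valid_pair(X, Y) :-
      member(X, [1, 2, 3]),      % X drawn from a finite domain
      member(Y, [1, 2, 3]),      % Y drawn from a finite domain
      X + Y =:= 4.               % the condition the pair must satisfy

  % ?- findall(X-Y, valid_pair(X, Y), Pairs).
  % Pairs = [1-3, 2-2, 3-1].     % every true instance, none missed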

However well they do the former, they effectively 
shift their gears in the latter, settling in some 
manner (mostly determined externally in another 
process) on a single solution.  That's what occurs 
in code generation.  Someone has made a decision 
from the set of (his) discovered logically 
equivalent code about which shall be used here.

The issue is not that there is anything wrong with 
this or that it is unsuccessful.  It must work 
because that's the way it happens.  It's obvious 
that depending on who is doing the discovery 
process, how much time and resources they have 
available, and other factors in human software 
development, different "someones" will come up 
with different results.  They will vary in 
performance and space.

Now users don't develop application systems.  They 
hire people to do it (custom software) or buy what 
someone else has produced (packaged software).  In 
either case it is a "chancy" decision.  If we have 
then such a desire to be "user-oriented", why not 
remove the "chance" from this decision as far as 
performance is concerned?

You see there is a whole picture here.  When 
concentrating on producing a language which "eases" 
what the user must write to get a desirable result, 
why not insure that the result offers the best 
performance: the best of the best of assembly 
language coding?  Instead of offering him the best 
"here" offer it throughout the entire solution.

You may find, for example, that OO methods through 
a GUI, in which the user uses prepackaged objects 
and arranges them through a means of 
interconnection, provides the easiest way a 
particular user can express what he wants.  But if 
you really want to completely rethink computing 
systems, you have to ask whether it is necessary to 
maintain the "form" in the "result", which has led 
to a corresponding loss in performance.  Simply 
because that's the way it has been up to now, does 
not mean that it has to remain so.

That means, through the principle of logical 
equivalence, that you can use OO methods in the 
description of what you want (your specifications) 
and yet produce an executable based on the finest 
of procedural assembly language coding.  You do not 
have to take a performance hit in order to ease the 
user's effort.  

Our insistence on maintaining the "purity" of the 
OO form throughout, even though its source is no 
more than a specification, leads us into 
"regressive" software.  Instead of adopting means 
that allow software technology to keep pace 
with that of hardware, we in fact regress our 
technology in the belief (or the excuse) that 
increase in hardware technology will compensate for 
it.

I guess I'm just saying that if we "completely 
rethink" what we do, that we do not have to offer 
regressive solutions nor make excuses (nor consider 
it somehow wonderful that we execute at 50% of some 
optimized C++ compiler<g>).  The instrument is one 
of our own devising in logic: logical equivalence.

"This is utterly preposterous.  You can't toss out 
an issue just by contradicting it."

To spite you I could but I won't.<g>  You must 
understand that every time an AI expert system or a 
Prolog program executes it does so in "real time" 
even though it has (logically) performed an 
exhaustive true/false test of every enumeration of 
input variable states, whether it takes seconds, 
minutes, hours, or days.

The machine I'm writing this with is a 200MHz 
Pentium Pro which leads me to believe that it is 
200,000,000 times faster than I am at my age 68.<g>  
If you don't buy that number, then pick a lower 
value, say 200,000.  ["How do I love thee?  Let me 
count the ways."]  Consider the expression "(A + B) 
/ (C * D**E)".  We already know that we will change 
this infix operator form to a postfix notation 
(a la Forth or RPN) as part of the internal 
optimization.  Already we "ignore" what the user 
has written though we maintain its logical 
equivalence.  Pick any machine architecture and I 
will guarantee that there are not 200,000 different 
ways of expressing the optimized form.  There may 
be ten or a hundred or at least one (to insure that 
it is "true").  Have a program with a million such 
expressions of whatever complexity you like.  No 
single application, including an operating system, 
exists with this number of computations.  This is 
realizable in "real time", even if it is days.
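
The postfix conversion itself is only a few lines 
in a Prolog sketch of mine (purely illustrative); 
it walks the expression tree and emits operands 
before their operator:

  % postfix(+Expr, -RPN): flatten an arithmetic term into postfix order.
  postfix(X, [X]) :-
      atomic(X).                 % a constant or a name is already in order
  postfix(Expr, RPN) :-
      Expr =.. [Op, L, R],       % decompose, e.g. (A+B)/(C*D**E)
      postfix(L, LeftRPN),
      postfix(R, RightRPN),
      append(LeftRPN, RightRPN, Operands),
      append(Operands, [Op], RPN).

  % ?- postfix((a+b)/(c*d**e), P).
  % P = [a, b, +, c, d, e, **, *, /].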

Certainly there are "huge" numbers in theory, but 
their hugeness is not one we find often in 
practice.  We have undertaken few such problems.  
The only one which comes to my mind is 
that of calculating prime numbers which has an 
infinite source.  I cannot deny its happening, but 
our experience thus far indicates it is highly 
unlikely.  Particularly for the finite, relatively 
small number of instruction sets to enumerate 
through.

" The problem is that the human is seeking an 
optimum, and the computer is only trying to produce 
all of the solutions."

It has been my experience that the computer does 
what we tell it to do, what we specify.  If we have 
asked for all the solutions, it will provide them.  
If we have asked for the optimum in terms of 
performance, it will provide it.  If we have asked 
for the optimum in performance and the minimum in 
space, it will provide it.  If we have a multiple 
result, we can ask to see them all, only the first 
or last, somewhere in between, or at random.  It 
will provide it.
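
In Prolog terms (again a sketch of mine, with 
made-up candidates and costs) each of those 
requests is a one-liner:

  % Hypothetical logically equivalent encodings of one operation.
  candidate(seq_a, 12).          % cost in cycles (invented numbers)
  candidate(seq_b,  7).
  candidate(seq_c,  9).

  % All of them:
  % ?- findall(C, candidate(C, _), All).
  % All = [seq_a, seq_b, seq_c].
  %
  % The optimum in performance:
  % ?- findall(Cost-C, candidate(C, Cost), Pairs),
  %    keysort(Pairs, [_-Best|_]).
  % Best = seq_b.
  %
  % Only the first:
  % ?- once(candidate(C, _)).
  % C = seq_a.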

The point here is that it will provide it through 
an exhaustive true/false test of all possibilities, 
something for which we may have neither the time 
nor energy.  But nevertheless something for which 
we wish we did.  That's why it's a tool, an 
extension of our own abilities.  Together we make a 
system.  As I type this I do not think of the 
computer as something separate, but of us working 
together as a system.  It's very malleable when it 
comes to adapting to the dynamics of my needs.

I keep referring to SQL as it represents a system 
which engages in an exhaustive search (true/false) 
which meets whatever conditions we set for it, even 
if it produces zero results (no true instances).  
Considering its use worldwide and the number of 
"working" application systems it supports, all of 
which occur in real time, for choices far in excess 
of those occurring in instruction sets, you should 
have some comfort in this exhaustive search 
practice.
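
That behavior is easy to mimic in Prolog over a toy 
table (the data below is invented); a query that 
finds no true instances simply comes back with an 
empty set:

  % A toy "table" of rows.
  employee(smith, sales,   50000).
  employee(jones, support, 42000).

  % Roughly: SELECT name FROM employee WHERE dept = sales AND salary > 90000
  % ?- findall(Name, (employee(Name, sales, Pay), Pay > 90000), Rows).
  % Rows = [].                   % exhaustive search done; zero true instances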

"I have no clue what you mean by that last 
sentance.  However, the rest of this paragraph is 
misguided.  GO is a very good comparison, even 
though GO is infinitely less complex: a GO move 
does change the nature of many future moves, but it 
doesn't change the shape of the GO board."

All I was saying is that if we could "reasonably" 
handle the set of moves possible with either GO or 
chess, then application systems or operating 
systems would be a breeze in comparison no matter 
how poorly we expressed them.  Here I am just 
looking at the degree of difficulty in determining 
the solution set from the problem set.

" I don't know what you mean by "completeness", for 
example.  I spent a couple of years studying how
completeness is impossible, and you're building an 
algorithm to detect it."

No, no, I'm building nothing.  I'm simply using the 
term as it is used in logic programming.  
Completeness asks, "Have I been given enough 
information (facts, relations, conditions, etc.) to 
allow me to test the truth or falsity of an 
assertion (a goal)?"  
It is not complete in the sense that you mean, but 
only in the limited sense.  For example, I assert 
that "3 = 5/27" (3 equals 5 divided by 27).  We 
should note that this is a valid expression in 
logic programming languages, whose purpose lies in 
determining the truth or falsity of an assertion, 
while it is unacceptable (as an assignment 
statement) in a procedural language.  Given that the 
logic engine knows the "meaning" of the terms, it 
has all the information it needs to determine the 
truth or falsity of this assertion.  Thus it passes 
its "completeness" proof.

So to continue with your next remark, if the 
completeness proof is successful, then the logic 
engine can proceed with the exhaustive true/false 
proof.  It is not concerned with knowing all about 
anything, but only that within a specified range.  
Within the small you can achieve a completeness 
that you might not achieve in the large.

You're having difficulty with this exhaustive 
true/false proof.  Yet this is exactly what occurs 
with every SQL query, every AI-based expert system, 
and every time you execute a Prolog program.  If 
any of them seem to take too long, you abort them, 
determine where "your" logic caused the problem, 
correct it, and let them get on with the new 
"knowledge".  Fortunately their backtracking 
assistance eases the task for you.

"Second, I don't see how "our completeness proof" 
has anything to do with it -- according to you, 
"our completeness proof" has only one result, a 
boolean which we currently assume is "true"."

With some reservations with respect to subgroupings, 
a logic programming language like Prolog accepts an 
"unordered" set of input specifications (goals, 
facts, rules, conditions, etc.).  The key here is 
"unordered", not ordered<g>.  It does not then 
"dare" execute them in the order of their 
appearance.  It must reorder them.  It does so by 
determining the highest level of goals and then 
works backwards (hierarchically downward) until it 
has established a path from the "necessary" knowns 
through all the intermediary unknowns (sub-goals) 
to the main set of goals (possibly only one).

That says that it creates an "ordered" set from the 
"unordered" input.  That's part of the reason for 
calling it "programming in logic" (doing what is 
logically necessary) and not the "logic in 
programming" of procedural languages like C.  You 
tell it "what" it must do and it determines the 
"how".  It produces an "organized logical system" 
(the how) from the given set of specifications.  
The process of doing so is part of the completeness 
proof, as in truth it may (true) or may not (false) 
have enough given in the specifications to do so.

I emphasized the word "necessary" before to point 
out that the completeness proof only "selects" from 
the input set of specifications those necessary in 
meeting the goals, ignoring all "unnecessary" ones, 
which it "logically" determines.

Now this completeness proof quite frankly is a 
thing of beauty and a joy forever.  For a 
procedural language programmer like you in C it's 
like an emancipation proclamation, freeing you from 
the slavery imposed by your own logic.<g>  Now I 
know that you know what control structures 
(sequence, decision, iteration) are.  

Ignoring for the purposes of illustration the fact 
that one may occur within the body of the other, 
consider that each control structure is expressed 
as a single specification.  Now imagine that each 
specification is a card in a card deck.  In logic 
programming no matter how you shuffle them nor how 
often (to insure their unordered nature) the 
completeness proof will nevertheless organize them 
logically in the same manner, i.e. how you place 
them, where you put them, has no effect.
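
A small, pure Prolog example of my own shows the 
card-deck point; for a terminating program like 
this one the solution set is the same no matter how 
the clauses are shuffled:

  % Deck one: the rule dealt before the facts.
  grandparent(G, C) :- parent(G, P), parent(P, C).
  parent(ann, bob).
  parent(bob, cis).

  % Deck two: the same cards in a different order.
  parent2(bob, cis).
  grandparent2(G, C) :- parent2(G, P), parent2(P, C).
  parent2(ann, bob).

  % ?- findall(G-C, grandparent(G, C), S1),
  %    findall(G-C, grandparent2(G, C), S2).
  % S1 = S2, S2 = [ann-cis].     % the shuffle had no effect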

This means then that I can transition from a 
"programmer", one who orders source in writing, to 
a pure "writer", one who writes source, leaving the 
ordering up to a "loyal, obedient, and damn 
accurate assistant".  That leaves me to concentrate 
on the "small" of the source which experience 
indicates I can do quite accurately, i.e. 
error-free, leaving it up to the assistant to put 
it all together in the "large", in which experience 
indicates my accuracy falls off.

Moreover if I make an error, it always occurs 
within a "small".  It may occur in several of them, 
but I only correct each one in isolation of the 
others.  I may make "small" additions, deletions, 
and replacements, but the completeness proof will 
take that all into consideration in determining the 
logical order.

What does it mean?  It means that regardless of the 
size of an application system or an operating 
system that I never have to "freeze" changes in the 
course of implementation.  It means the difference 
in time between making a change and coordinating 
change among a set of programs within an 
application system has now been reduced to a matter 
of seconds and minutes from days, weeks, and 
months.  In the instance of an operating system, 
depending upon how quickly you can write the 
changes, it means producing a new version daily (if 
desired) instead of yearly.

Take my advice.  Don't fight the completeness proof 
present in all of logic programming.  It is the 
greatest labor-saving device since the invention of 
the symbolic assembler.<g>

"Roughly speaking, both paragraphs are saying, "We 
want to produce the best executable which meets the 
specifications, even if it has to be reflective."

I realise that you think you got me.  I really hate 
to disappoint anyone.  More to the point we will 
produce the best executable because we are using 
reflective means.  We always incorporate it within 
the production tool itself.  Whether we extend it 
to the production itself is a matter of choice.  
The point is not to engage in an either/or trap, 
e.g. declarative or non-declarative.  The point is 
to be able to use either as best suits the 
situation.  We want to avoid exclusionary means in 
considering a solution.

"So when you say, "Here we do not have the explicit 
invoking that we had before," what you mean is "We 
don't have a proof that the source produced the 
binaries which we ourselves produced."

No, it's nothing that complicated.  When a higher 
level function invokes a lower, e.g. "a = c - 
sqrt(b);", it does so by using its name ("sqrt").  
This is an "explicit" reference.  It is the only 
kind used in creating the "functional" hierarchy.  
Everything expressed in these eventually decomposes 
into control structures and primitive operators, 
the raw material of all software assemblies.

Now having all this logically organized we must 
convert it to an executable form.  This means we 
must determine the target machine codes for our 
control structures and primitive operators used.

A "+" operator in the expression "a + b" requires a 
sequence of "loading" both a and b (determining 
their location) and then performing an addition.  
Now the "+" works depending of the nature of "a" 
and "b", whether integer, real, decimal, binary, 
etc.  So "+" is not an explicit call in the sense 
that it is a name of a lower level function, but 
one which we must "discover" from the set of 
possibilities (the instruction specifications).  
Thus the "+" is an implicit reference, a contextual 
one actually, that we must determine.
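
A hedged sketch of how such a contextual choice 
might itself be written as specifications (the type 
names and mnemonics below are invented):

  % Instruction-selection facts: the "+" is resolved by operand types.
  add_instr(int,   int,   add_i).
  add_instr(float, float, add_f).
  add_instr(int,   float, cvt_then_add_f).

  % A toy symbol table.
  type_of(a, int).
  type_of(b, float).

  generate(plus(A, B), Instr) :-
      type_of(A, TypeA),
      type_of(B, TypeB),
      add_instr(TypeA, TypeB, Instr).

  % ?- generate(plus(a, b), I).
  % I = cvt_then_add_f.          % discovered from the possibilities, not named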

Somehow over (and during) the years we (as humans) 
have been doing this successfully, but not 
necessarily optimally.  It is not a question of 
determining the rules, because rules are what run 
code generators now.  But the known rules are 
sometimes contradictory in terms of what is 
optimal; some instances are more optimal than 
others.

The challenge, and I don't make light of it by any 
means, is entering a set of rules as 
specifications, as conditions for the selection 
choices, as goals for the overall process.  That 
means applying logic programming to code generation 
as a reflective programming means to guarantee 
consistent, optimum results regardless of source.

In a sense it is a two-step completeness proof.  
The first step produces an optimal, logically 
equivalent form of the source as a logically 
executable sequence of control structures and 
primitive operators.  The second step produces an 
optimal, logically equivalent form of a "true" 
executable sequence of control structures and 
primitive operators.

Now in a rush to judgement we might desire to make 
the two steps here into one.  Remember that the 
first step produces a machine-independent set of 
control structures and primitive operators while 
the second, machine-dependent.  The separation 
allows us not only to do one without regard for the 
other, but allows us to repeat the other for any 
target environment.  Notice that we can do this 
from any source environment.  This is 
cross-platform to a level that no C implementation 
has been able to travel.<g>
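
As a sketch (all target names and mnemonics are 
invented), the second, machine-dependent step can 
be little more than a rule set per target, applied 
to a machine-independent postfix form like the one 
shown earlier; retargeting swaps only the rules:

  % One fact per (target, operator) pair.
  target_instr(x86,  '+', addl).
  target_instr(x86,  '*', imull).
  target_instr(m68k, '+', add_l).
  target_instr(m68k, '*', muls).

  emit(_Target, [], []).
  emit(Target, [Op | Rest], [Instr | Instrs]) :-
      (   target_instr(Target, Op, Instr)
      ->  true
      ;   Instr = push(Op)       % operands become (hypothetical) pushes
      ),
      emit(Target, Rest, Instrs).

  % ?- emit(x86, [a, b, '+', c, '*'], Code).
  % Code = [push(a), push(b), addl, push(c), imull].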

"Are we on the same page?  If so, please continue.  
All of the algorithms I know of for this are 
NP-SPACE; that is, they have to search the whole 
space and store all results."

Basically, yes, we just apply different thought 
processes to our understanding.

"I don't understand -- are you implying that your 
optimiser distinguishes in some way between local 
and global optimizations?  How does it make the
distinction?  The process I've outlined above 
certainly isn't doing that."

Optimization in software occurs by rules which 
govern the order.  It occurs in every step in the 
path from source to executable.  It is a field 
which is a specialty all its own.  The most 
interesting I have seen recently is that of the 
Transmeta team with their "software morphing" on 
the Crusoe chip.  Here they would change the 
optimum designed for an Intel Pentium to its 
logical equivalent form on the Crusoe.  But they 
don't do it by hand (although they may have 
determined it that way unfortunately), they do it 
in software by rules.
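
A toy peephole pass (nothing to do with Transmeta's 
actual rules, which I do not know) is the simplest 
picture of optimization by rules, with the rules 
written as specifications:

  peephole([], []).
  peephole([push(R), pop(R) | Rest], Out) :-
      !,                         % push then pop of the same register
      peephole(Rest, Out).       % cancels out
  peephole([Instr | Rest], [Instr | Out]) :-
      peephole(Rest, Out).

  % ?- peephole([load(a), push(r1), pop(r1), add], Code).
  % Code = [load(a), add].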

It isn't "my" optimizer.  It's the recognition of 
what already exists, applying the same process of 
specification to all those rules, and from this 
producing a common optimizer that guarantees the 
best in any given instance.  If it's made available to 
every vendor (who will resist giving up any 
advantage to a competitor<g>), the benefit to the 
user is obvious: it reduces the risk of his 
decision.

">I've been through this argument 
>before, about general code versus specific code.

So what did you decide?  (I don't see how these two 
arguments are even related, much less "the same 
argument.")"

In general I am against general code which for 
"esoteric" reasons offers more than what is 
specified.  I follow the goals of "highest 
cohesion", which is no more nor no less than 
necessary.  That I borrowed from the "master" Larry 
Constantine.  If you followed the logic before on 
the unordered specifications and working in the 
small, then the "implied" effort of making a change 
which "enhances" a specification to meet an 
"enhanced" need is trivial.  Therefore I do not 
have to "protect" myself from the "ignorance" of 
the user by imposing my greater "intelligence".  I 
now have the means to track the user as he gets 
smarter.  I don't have to anticipate his needs.

"It sounds like you disagree with me, but you're 
unable or unwilling to discuss that disagreement.  
Is that accurate?"

I hate to bury something this important in the 
middle of all this, but it does relate to the 
response just before this one.  My major concern 
lies in bringing down the cost and time in software 
maintenance.  There is little I can do about that 
required to write an initial set of specifications.  
That's a human activity that proceeds at a human 
speed.

However, when it comes to making (and reflecting) 
changes in those specifications, in what we refer 
to as maintenance, then what I propose here in 
terms of reducing time and costs cannot be matched 
by any other means.  Once you realise that more 
than 90% of development is spent in maintenance, in 
changing what you have already written, then 
writing in the small, allowing unordered input, and 
having a mechanism like the completeness proof 
together minimize what you have to do.  This 
increases your productivity.

Now Fare may not have thought of exactly this as 
part of his evolving more of the division of labor 
from people to machines (automation).  But 
putting "reflective" in the tool means reducing the 
amount of it the writer has to assume, an 
assumption which requires considerable time.

"Simply unbelievable.  This system, you claim, can 
produce correct behavior based on incorrect specs?"

No.  It cannot produce anything not in the specs, 
which it produces accurately.  If you don't like 
the result, then correct "your" error by entering a 
new specification.  The specification is never in 
error.  It is never incorrect.  Don't act as if it 
had some choice in the matter.<g>  The system will 
produce exactly what you specify.  The only one who 
can make a mistake is you.

Fortunately, if you look at the entire software 
development process from input of requirements or 
changes which get translated into specifications, 
that's the only writing you do "within" the 
process.  In the development stages of 
specification, analysis, design, construction, and 
testing the only place you do any writing is in 
specification.  The rest you get automatically.  
Therefore the only thing you have to get right is 
the specification.  If you don't, you will know it 
quickly enough (as well as being led patiently 
through your own logic by the backtracking 
mechanism).  You correct it without having to 
bother with what anyone else is doing.  Understand 
that no matter how large the application system or 
operating system being treated as a single unit of 
work in this environment, no more than one person 
is necessary to initiate it regardless of the 
number concurrently maintaining specifications in 
the specification pool.  Not tens, not hundreds, 
not thousands of people, just one.

The scope of compilation is unlimited from a single 
specification statement to the largest assembly of 
application systems.  All you do is select the set 
of specifications, input them to the tool, and 
review the results.  Among those results are the 
outputs (all automatically produced) of the 
remaining stages (analysis, design, construction, 
and testing).  Logical equivalence at its 
presentation best.  No need for CASE tools, UML, 
compilers, linkers, source documenters, etc.  Just 
one tool (your assistant) and you.  Capable of 
producing a custom operating system on demand in a 
matter of minutes or hours, depending on the 
overlap between its specifications and those 
already in the specification pool.

"Determine what?"

Determine if the result is as expected with the 
time measured in seconds and minutes compared to 
days, weeks, months, and years currently.  The 
specification is the earliest writing stage in any 
methodology.  In this approach it is the only one 
as well.<g>

"Why would you need software to tell you the 
obvious?  If you're the one who modifies the spec, 
you're the one who's responsible for updating the 
rest of the documentation."

Most installations do not have more than a partial 
attempt at writing formal specifications.  They 
keep thinking that it will get fleshed out in the 
later stages of analysis, design, construction, and 
testing.  Programmers who operate in the 
construction stage seldom want anybody earlier 
doing their work for them.  They will encourage 
skipping or glossing over the earlier stages, 
because they are done by dummies who have no 
programming experience.  Thus "thinking" or 
scribbling on paper is not doing anything.  You are 
only doing something when you are coding 
(regardless of how many times you redo it<g>).

This attitude is reflected widely in the 
profession.  The most obvious evidence of it are 
the mis-named IDEs (Integrated Development 
Environments) offered by vendors which cover no 
more than the construction stage.  Within it the 
most obvious evidence is the ability to create a 
"new" project or source file.  You cannot in a 
continuous process where the output of one becomes 
the input of the next have anything new appearing 
which is not the output of something earlier.

What we have are isolated processes connected 
through people filters who may or may not translate 
correctly what is given them.  The net effect 
organizationally is to be incapable of 
incorporating changes anywhere near the rate at 
which they occur.  This results in a backlog.  This 
is what OO was supposed to correct.  Nevertheless 
the backlog continues to increase.

So I simplify this organization to communicators 
responsible to the users, in terms of translating 
requirements and changes into formal specifications 
as well as writing user documentation, and 
developers, responsible for inputting a selected 
set of specifications into the tool.  The only 
reason I have developers is because it is a "lesser 
skilled" position than one required for 
communication with users, which is far more 
important overall.

In such a system those who write specifications are 
those who change them.  It may be that the one who 
wrote it is not the one who changed it.  Having a 
means of tying documentation to specifications is 
one way of synchronizing "people activities".

"Like literate programming, only exactly the 
opposite -- in litprog the programmer shows the 
computer the relationship between the specs and the
program, and with your system it seems to be the 
other way around (although I don't believe that's 
possible or reasonable)."

Yes, fortunately you are correct.<g>  I solve the 
problem by eliminating the programmer, something 
I'm sure Knuth would be loath to do.<g>  As the 
system creates the programs (as part of the 
completeness proof) and the developer simply 
monitors an automated process, this leaves it up to 
the communicators to insure that the documentation 
(of which now the specifications are an integral 
part) remains in sync.

Is it reasonable?  To me, yes.  You are going to 
take some convincing.<g>

"Okay.  If the programmer's responsible for 
specifying things, I guess that makes sense.  Like 
in Javadoc."

Hey, no programmer nor programming.  They've been 
automated.  One grand and glorious way of reducing 
software costs.  Instead we have substituted people 
who can communicate and translate properly between 
informal (user) and formal (system) languages.  
It's a writing job reduced to two forms: user 
documentation and specifications.  They both exist 
in completely mixed fashion within a data 
repository accessed through a directory.

">To correct that the user could 
>request a "list" of existing ambiguous references.  

But everything's ambiguous if you're going to 
refuse to define a single meaning for everything."

But I'm not.  Like any good Lisp advocate I exist 
in a world where something can have 0, 1, or more 
meanings.  If it has 0, it is undefined or 
indeterminate.  If it has 1, then it is singularly 
defined.  If it has more, then it is multiply 
defined or ambiguous.
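
A hypothetical symbol table makes the three cases 
concrete (all names invented):

  means(rate,  interest_rate).
  means(rate,  sampling_rate).   % "rate" is multiply defined (ambiguous)
  means(total, grand_total).     % "total" is singularly defined

  % ?- aggregate_all(count, means(rate,  _), N).   % N = 2: ambiguous
  % ?- aggregate_all(count, means(total, _), N).   % N = 1: defined
  % ?- aggregate_all(count, means(bogus, _), N).   % N = 0: undefined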

Now getting back to correct and incorrect 
specifications (which their writers have written 
incorrectly).  0 means I have really got something 
to correct.  1 means great if that is what I expect, 
otherwise not so great.  More means great if that 
is what I expect, otherwise not so great.  Again I 
refer you to SQL which simply reflects what you 
have specified.  If it doesn't meet with your 
expectations and your specifications are correct, 
then change your expectations to match your 
logic.<g>

Ambiguity simply indicates that more than one 
choice or path is possible.  The system (again 
through its backtracking mechanism) explains the 
reason for the ambiguity.  You can ask the system 
to act on it (produce the multiple results), hold 
off for the moment, or change the specification set 
in some manner that eliminates it.

The point is that some ambiguity is acceptable and 
some is not.  The system doesn't care.  You do.  
Its job is not to dictate your judgement, but to 
reflect it.  In order to reflect it, it keeps a "list" 
handy for your reference.  You may know that you 
will need to correct an ambiguity before your 
effort is complete, but you (and the system) are 
willing to live with it until it interferes with 
something else you desire.  You make that 
determination, not your tool (not fool) assistant.

"I like this, of course.  I don't think it's 
possible, that's all."

As we near the end of today's journey clearly 
neither of us lets impatience interfere with the 
desire to understand the other.  While others in 
Tunes may want to focus primarily on HLLs 
and operating systems and secondarily on tools, I 
choose to reverse them.  I look at how it is 
possible to deliver an HLL or an operating system 
in the least time, at the least cost, and the most 
in competitive ability.  As admirable as the ends 
are, they can never exceed the abilities of the 
means.

For those who might feel that an HLL is a means, I 
would point out that it is never more than its 
implementation: the means of achieving the means.  
No implementation should have a range less than the 
domain of the language.  That means no compromise 
due to time or cost constraints anywhere along the 
path.  It is the means available that determine the 
time and cost.  Therefore I give it a higher 
priority so I don't have to skimp on the ultimately 
more important ends.

I thank you very much for your patience.  I hope 
that I am closer to communicating and not confusing 
things.  I will make the time.  I see that I have 
another response for you which I will have to 
forego until later.  Aside from its 
inappropriateness to discussions of this sort I 
certainly wouldn't want to bring "noise" to the IRC 
channel.<g>

Until next time.