Response regarding coding efficiency
Lynn H. Maxson
lmaxson@pacbell.net
Thu, 29 Jun 2000 12:31:23 -0700 (PDT)
Billy wrote: "You just finished claiming that no
current system ever generates all of the
logically equivalent solutions to a given
specification. Now you're claiming that many of
them do. These statements appear contradictory."
It's bad enough to be caught in a lie. It is even
worse when it's a contradiction.<g> Perhaps worse
still when it's a "transmitter's" failure to
communicate. This one, I think, we can resolve
easily.
The fact is that specification languages, e.g.
Prolog, SQL, and Trilogy, include an exhaustive
true/false test of the specifications. In order to
do so they must first logically organize the
specifications into an executable sequence and then
convert that sequence into executable code. It is
in this conversion of the executable sequence into
executable code that they do not produce all
possible logical equivalents.
However well they do the former, they effectively
shift gears in the latter, settling in some manner
(mostly determined externally in another process)
on a single solution. That's what occurs in code
generation. Someone has made a decision from the
set of (his) discovered logically equivalent code
about which shall be used here.
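To make that concrete, here is a minimal sketch in
Prolog (my own illustration, with invented names,
not taken from any of the products above): the
engine can enumerate every logically equivalent
candidate, while code generation as practiced
simply settles on one of them.

    % candidate/1 lists hypothetical logically
    % equivalent code forms for one operation.
    candidate(add_reg).
    candidate(add_mem).
    candidate(lea_trick).

    % The exhaustive pass: collect every form.
    all_equivalents(Forms) :-
        findall(F, candidate(F), Forms).

    % Code generation today: settle on one.
    chosen(Form) :- all_equivalents([Form|_]).

Querying all_equivalents/1 returns all three forms;
querying chosen/1 returns only the first, however
the list happened to be ordered.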
The issue is not that there is anything wrong with
this or that it is unsuccessful. It must work
because that's the way it happens. It's obvious
that, depending on who's doing the discovery
process, how much time and resources they have
available, and other factors in human software
development, different "someones" will come up
with different results. They will vary in
performance and space.
Now users don't develop application systems. They
hire people to do it (custom software) or buy what
someone else has produced (packaged software). In
either case it is a "chancy" decision. If we then
have such a desire to be "user-oriented", why not
remove the "chance" from this decision as far as
performance is concerned?
You see, there is a whole picture here. When
concentrating on producing a language which "eases"
what the user must write to get a desirable result,
why not ensure that result offers the best
performance: the best of the best of assembly
language coding? Instead of offering him the best
"here", offer it throughout the entire solution.
You may find, for example, that OO methods through
a GUI, in which the user uses prepackaged objects
and arranges them through a means of
interconnection, provides the easiest way a
particular user can express what he wants. But if
you really want to completely rethink computing
systems, you have to ask whether it is necessary to
maintain the "form" in the "result", which has led
to a corresponding loss in performance. Simply
because that's the way it has been up to now does
not mean that it has to remain so.
That means, through the principle of logical
equivalence, that you can use OO methods in the
description of what you want (your specifications)
and yet produce an executable based on the finest
of procedural assembly language coding. You do not
have to take a performance hit in order to ease the
user's effort.
Our insistence on maintaining the "purity" of the
OO form throughout, even though its source is no
more than a specification, leads us into
"regressive" software. Instead of adopting means
that allows software technology to stay on pace
with that of hardware, we in fact regress our
technology in the belief (or the excuse) that
increase in hardware technology will compensate for
it.
I guess I'm just saying that if we "completely
rethink" what we do, we do not have to offer
regressive solutions nor make excuses (nor consider
it somehow wonderful that we execute at 50% of some
optimized C++ compiler<g>). The instrument is one
of our own devising in logic: logical equivalence.
"This is utterly preposterous. You can't toss out
an issue just by contradicting it."
To spite you I could, but I won't.<g> You must
understand that every time an AI expert system or a
Prolog program executes it does so in "real time"
even though it has (logically) performed an
exhaustive true/false test of every enumeration of
input variable states, whether that takes seconds,
minutes, hours, or days.
The machine I'm writing this with is a 200MHz
Pentium Pro which leads me to believe that it is
200,000,000 times faster than I am at my age 68.<g>
If you don't buy that number, then pick a lower
value, say 200,000. ["How do I love thee? Let me
count the ways."] Consider the expression "(A + B)
/ (C * D**E). We already know that this infix
operator form we will change to a postfix notation
(a la Forth or RPN) as part of the internal
optimization. Already we "ignore" what the user
has written though we maintain its logical
equivalence. Pick any machine architecture and I
will guarantee that there are not 200,000 different
ways of expressing the optimized form. There may
be ten or a hundred or at least one (to insure that
it is "true"). Have a program with a million such
expressions of whatever complexity you like. No
single application, including an operating system,
exists with this number of computations. This is
realizable in "real time", even if it is days.
Certainly there are "huge" numbers in theory, but
that hugeness is not something we often find in
practice. There are few such problems we have
undertaken; the only one which comes to my mind is
calculating prime numbers, which has an infinite
source. I cannot deny that it could happen, but
our experience thus far indicates it is highly
unlikely, particularly for the finite, relatively
small instruction sets we have to enumerate
through.
" The problem is that the human is seeking an
optimum, and the computer is only trying to produce
all of the solutions."
It has been my experience that the computer does
what we tell it to do, what we specify. If we have
asked for all the solutions, it will provide them.
If we have asked for the optimum in terms of
performance, it will provide it. If we have asked
for the optimum in performance and the minimum in
space, it will provide it. If we have a multiple
result, we can ask to see them all, only the first
or last, somewhere in between, or at random. It
will provide it.
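A small hedged illustration of "ask and it
provides", with invented plans and cost figures:
the same exhaustive enumeration serves both the
"show me everything" request and the "show me the
optimum" request.

    % Hypothetical candidate plans and their
    % (invented) performance costs.
    plan(p1, 120).
    plan(p2, 95).
    plan(p3, 95).

    % All solutions, exhaustively enumerated.
    all_plans(Plans) :-
        findall(Plan-Cost, plan(Plan, Cost), Plans).

    % Only the optimum: the cheapest after sorting
    % the same exhaustive enumeration by cost.
    best_plan(Best) :-
        findall(Cost-Plan, plan(Plan, Cost), Pairs),
        keysort(Pairs, [_-Best|_]).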
The point here is that it will provide it through
an exhaustive true/false test of all possibilities,
something for which we may have neither the time
nor energy. But nevertheless something for which
we wish we did. That's why it's a tool, an
extension of our own abilities. Together we make a
system. As I type this I do not think of the
computer as something separate, but of us working
together as a system. It's very malleable when it
comes to adapting to the dynamics of my needs.
I keep referring to SQL as it represents a system
which engages in an exhaustive (true/false) search
for whatever meets the conditions we set for it,
even if it produces zero results (no true
instances). Considering its use worldwide and the
number of "working" application systems it
supports, all of which operate in real time, over
choices far in excess of those occurring in
instruction sets, you should take some comfort in
this exhaustive search practice.
"I have no clue what you mean by that last
sentance. However, the rest of this paragraph is
misguided. GO is a very good comparison, even
though GO is infinitely less complex: a GO move
does change the nature of many future moves, but it
doesn't change the shape of the GO board."
All I was saying is that if we could "reasonably"
handle the set of moves possible in either GO or
chess, then application systems or operating
systems would be a breeze in comparison, no matter
how poorly we expressed them. Here I am just
looking at the degree of difficulty in determining
the solution set from the problem set.
" I don't know what you mean by "completeness", for
example. I spent a couple of years studying how
completeness is impossible, and you're building an
algorithm to detect it."
No, no, I'm building nothing; I'm simply using the
term as it is used in logic programming.
Completeness asks, "Have I been given enough
information (facts, relations, conditions, etc.) to
test the truth or falsity of an assertion (a
goal)?" It is not complete in the sense that you
mean, but only in this limited sense. For example,
I assert that "3 = 5/27" (3 equals 5 divided by
27). We should note that this is a valid
expression in logic programming languages, whose
purpose lies in determining the truth or falsity of
an assertion, while it is unacceptable (as an
assignment statement) in a procedural language.
Given that the logic engine knows the "meaning" of
the terms, it has all the information it needs to
determine the truth or falsity of this assertion.
Thus it passes its "completeness" proof.
So to continue with your next remark, if the
completeness proof is successful, then the logic
engine can proceed with the exhaustive true/false
proof. It is not concerned with knowing all about
anything, but only about what lies within a
specified range. Within the small you can achieve
a completeness that you might not achieve in the
large.
You're having difficulty with this exhaustive
true/false proof. Yet this is exactly what occurs
with every SQL query, every AI-based expert system,
and every time you execute a Prolog program. If
any of them seem to take too long, you abort them,
determine where "your" logic caused the problem,
correct it, and let them get on with the new
"knowledge". Fortunately their backtracking
assistance eases the task for you.
"Second, I don't see how "our completeness proof"
has anything to do with it -- according to you,
"our completeness proof" has only one result, a
boolean which we currently assume is "true"."
With some reservations with respect to
subgroupings, a logic programming language like
Prolog accepts an
"unordered" set of input specifications (goals,
facts, rules, conditions, etc.). The key here is
"unordered", not ordered<g>. It does not then
"dare" execute them in the order of their
appearance. It must reorder them. It does so by
determining the highest level of goals and then
works backwards (hierarchically downward) until it
has established a path from the "necessary" knowns
through all the intermediary unknowns (sub-goals)
to the main set of goals (possibly only one).
That says that it creates an "ordered" set from the
"unordered" input. That's part of the reason for
calling it "programming in logic" (doing what is
logically necessary) and not the "logic in
programming" of procedural languages like C. You
tell it "what" it must do and it determines the
"how". It produces an "organized logical system"
(the how) from the given set of specifications.
The process of doing so is part of the completeness
proof, since in truth the specifications may (true)
or may not (false) give it enough to do so.
I emphasized the word "necessary" before to point
out that the completeness proof only "selects" from
the input set of specifications those necessary in
meeting the goals, ignoring all "unnecessary" ones,
which it "logically" determines.
Now this completeness proof quite frankly is a
thing of beauty and a joy forever. For a
procedural language programmer like you in C, it's
like an emancipation proclamation, freeing you from
the slavery imposed by your own logic.<g> Now I
know that you know what control structures
(sequence, decision, iteration) are.
Ignoring for the purposes of illustration the fact
that one may occur within the body of the other,
consider that each control structure is expressed
as a single specification. Now imagine that each
specification is a card in a card deck. In logic
programming, no matter how you shuffle them nor how
often (to ensure their unordered nature), the
completeness proof will nevertheless organize them
logically in the same manner, i.e. how you place
them, where you put them, has no effect.
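A hedged miniature of the card-deck point in
Prolog: for a terminating, side-effect-free program
like the one below, the answer set is the same no
matter how the three "cards" are shuffled, though
in real Prolog the order in which answers appear
(and termination for less well-behaved programs)
can still depend on clause order.

    % Shuffle these three "cards" into any order.
    parent(ann, bob).
    parent(bob, cid).
    grandparent(G, C) :- parent(G, P), parent(P, C).

    % ?- findall(G-C, grandparent(G, C), Answers).
    % Answers = [ann-cid]   for every shuffle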
This means then that I can transition from a
"programmer", one who orders source in writing, to
a pure "writer", one who writes source, leaving the
ordering up to a "loyal, obedient, and damn
accurate assistant". That leaves me to concentrate
on the "small" of the source which experience
indicates I can do quite accurately, i.e.
error-free, leaving it up to the assistant to put
it all together in the "large", in which experience
indicates my accuracy falls off.
Moreover if I make an error, it always occurs
within a "small". It may occur in several of them,
but I need only correct each one in isolation from
the others. I may make "small" additions, deletions,
and replacements, but the completeness proof will
take that all into consideration in determining the
logical order.
What does it mean? It means that, regardless of
the size of an application system or an operating
system, I never have to "freeze" changes in the
course of implementation. It means the difference
in time between making a change and coordinating
change among a set of programs within an
application system has now been reduced to a matter
of seconds and minutes from days, weeks, and
months. In the instance of an operating system,
depending upon how quickly you can write the
changes, it means producing a new version daily (if
desired) instead of yearly.
Take my advice. Don't fight the completeness proof
present in all of logic programming. It is the
greatest labor-saving device since the invention of
the symbolic assembler.<g>
"Roughly speaking, both paragraphs are saying, "We
want to produce the best executable which meets the
specifications, even if it has to be reflective."
I realise that you think you got me. I really hate
to disappoint anyone. More to the point we will
produce the best executable because we are using
reflective means. We always incorporate it within
the production tool itself. Whether we extend it
to the production itself is a matter of choice.
The point is not to engage in an either/or trap,
e.g. declarative or non-declarative. The point is
to be able to use either as best suits the
situation. We want to avoid exclusionary means in
considering a solution.
"So when you say, "Here we do not have the explicit
invoking that we had before," what you mean is "We
don't have a proof that the source produced the
binaries which we ourselves produced."
No, it's nothing that complicated. When a higher
level function invokes a lower, e.g. "a = c -
sqrt(b);" it does so by using its name ("sqrt").
This is an "explicit" reference. It is the only
kind used in creating the "functional" hierarchy.
Everything expressed in these eventually decomposes
into control structures and primitive operators,
the raw material of all software assemblies.
Now having all this logically organized we must
convert it to an executable form. This means we
must determine the target machine codes for our
control structures and primitive operators used.
A "+" operator in the expression "a + b" requires a
sequence of "loading" both a and b (determining
their location) and then performing an addition.
Now the "+" works depending of the nature of "a"
and "b", whether integer, real, decimal, binary,
etc. So "+" is not an explicit call in the sense
that it is a name of a lower level function, but
one which we must "discover" from the set of
possibilities (the instruction specifications).
Thus the "+" is an implicit reference, a contextual
one actually, that we must determine.
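A hypothetical sketch of that "discovery" in Prolog
(every name below is invented): the "+" names no
callee, so the generator selects an instruction
from the operand types it finds in context.

    % Declared (invented) operand types.
    type_of(a, integer).
    type_of(b, integer).
    type_of(x, real).
    type_of(y, real).

    % Selection rules: the choice is implied by
    % the types, not by an explicit call.
    select_add(integer, integer, iadd).
    select_add(real,    real,    fadd).

    % gen/2 resolves the implicit "+" into an
    % explicit instruction.
    gen(plus(A, B), Instr) :-
        type_of(A, TA),
        type_of(B, TB),
        select_add(TA, TB, Instr).

    % ?- gen(plus(a, b), I).   I = iadd
    % ?- gen(plus(x, y), I).   I = fadd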
Somehow over (and during) the years we (as humans)
have been doing this successfully, but not
necessarily optimally. It is not a question of
determining the rules, because rules are what run
code generators now. But the known rules are
sometimes contradictory in terms of what is
optimal; some instances are more optimal than
others.
The challenge, and I don't make light of it by any
means, is entering a set of rules as
specifications, as conditions for the selection
choices, as goals for the overall process. That
means applying logic programming to code generation
as a reflective programming means to guarantee
consistent, optimum results regardless of source.
In a sense it is a two-step completeness proof.
The first step produces an optimal, logically
equivalent form of the source as a logically
executable sequence of control structures and
primitive operators. The second step produces an
optimal, logically equivalent form of a "true"
executable sequence of control structures and
primitive operators.
Now in a rush to judgement we might desire to make
the two steps here into one. Remember that the
first step produces a machine-independent set of
control structures and primitive operators while
the second, machine-dependent. The separation
allows us not only to do one without regard for the
other, but allows us to repeat the other for any
target environment. Notice that we can do this
from any source environment. This is
cross-platform to a degree that no C implementation
has been able to reach.<g>
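To show the separation, here is a hedged, toy-sized
sketch (every operator and instruction name is
invented): step one yields a machine-independent
sequence; step two maps that same sequence onto a
particular target, and only this second table needs
replacing to retarget.

    % Step one: machine-independent sequence for
    % the assignment "x = a + b".
    step1(assign(x, plus(a, b)),
          [load(a), load(b), add, store(x)]).

    % Step two: per-target rules (a stack-style
    % target here); swap this table to retarget.
    emit(load(V),  [push(V)]).
    emit(add,      [pop(r2), pop(r1),
                    add(r1, r2), push(r1)]).
    emit(store(V), [pop(V)]).

    step2([], []).
    step2([Op|Ops], Code) :-
        emit(Op, Chunk),
        step2(Ops, Rest),
        append(Chunk, Rest, Code).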
"Are we on the same page? If so, please continue.
All of the algorithms I know of for this are
NP-SPACE; that is, they have to search the whole
space and store all results."
Basically, yes, we just apply different thought
processes to our understanding.
"I don't understand -- are you implying that your
optimiser distinguishes in some way between local
and global optimizations? How does it make the
distinction? The process I've outlined above
certainly isn't doing that."
Optimization in software occurs by rules which
govern the order. It occurs in every step in the
path from source to executable. It is a field
which is a specialty all its own. The most
interesting example I have seen recently is that of
the Transmeta team with their "software morphing"
on the Crusoe chip. Here they change the optimum
designed for an Intel Pentium to its logically
equivalent form on the Crusoe. But they don't do
it by hand (although they may have determined it
that way, unfortunately); they do it in software by
rules.
It isn't "my" optimizer. It's the recognition of
what already exists, applying the same process of
specification to all those rules, and from this
produce a common optimizer that guarantees the best
in an given instance. If it's made available to
every vendor (who will resist giving up any
advantage to a competitor<g>), the benefit to the
user is obvious: it reduces the risk of his
decision.
">I've been through this argument
>before, about general code versus specific code.
So what did you decide? (I don't see how these two
arguments are even related, much less "the same
argument.")"
In general I am against general code which for
"esoteric" reasons offers more than what is
specified. I follow the goal of "highest
cohesion", which is no more and no less than
necessary. That I borrowed from the "master" Larry
Constantine. If you followed the logic before on
the unordered specifications and working in the
small, then the "implied" effort of making a change
which "enhances" a specification to meet an
"enhanced" need is trivial. Therefore I do not
have to "protect" myself from the "ignorance" of
the user by imposing my greater "intelligence". I
now have the means to track the user as he gets
smarter. I don't have to anticipate his needs.
"It sounds like you disagree with me, but you're
unable or unwilling to discuss that disagreement.
Is that accurate?"
I hate to bury something this important in the
middle of all this, but it does relate to the
response just before this one. My major concern
lies in bringing down the cost and time in software
maintenance. There is little I can do about that
required to write an initial set of specifications.
That's a human activity that proceeds at a human
speed.
However, when it comes to making (and reflecting)
changes in those specifications, in what we refer
to as maintenance, then what I propose here in
terms of reducing time and costs cannot be matched
by any other means. Once you realise that more
than 90% of development is spent in maintenance, in
changing what you have already written, then
writing in the small, allowing unordered input, and
having a mechanism like the completeness proof
together minimize what you have to do. This
increases your productivity.
Now Fare may not have thought of exactly this as
part of his evolving more of the division of labor
from people to machines (automation). But putting
"reflective" in the tool means reducing the amount
of it the writer has to assume, an assumption which
requires considerable time.
"Simply unbelievable. This system, you claim, can
produce correct behavior based on incorrect specs?"
No. It cannot produce anything not in the specs,
which it produces accurately. If you don't like
the result, then correct "your" error by entering a
new specification. The specification is never in
error. It is never incorrect. Don't act as if it
had some choice in the matter.<g> The system will
produce exactly what you specify. The only one who
can make a mistake is you.
Fortunately, if you look at the entire software
development process from input of requirements or
changes which get translated into specifications,
that's the only writing you do "within" the
process. In the development stages of
specification, analysis, design, construction, and
testing the only place you do any writing is in
specification. The rest you get automatically.
Therefore the only thing you have to get right is
the specification. If you don't, you will know it
quickly enough (as well as being led patiently
through your own logic by the backtracking
mechanism). You correct it without having to
bother with what anyone else is doing. Understand
that no matter how large the application system or
operating system being treated as a single unit of
work in this environment, no more than one person
is necessary to initiate it regardless of the
number concurrently maintaining specifications in
the specification pool. Not tens, not hundreds,
not thousands of people, just one.
The scope of compilation is unlimited from a single
specification statement to the largest assembly of
application systems. All you do is select the set
of specifications, input them to the tool, and
review the results. Among those results are the
outputs (all automatically produced) of the
remaining stages (analysis, design, construction,
and testing). Logical equivalence at its
presentation best. No need for CASE tools, UML,
compilers, linkers, source documenters, etc. Just
one tool (your assistant) and you. Capable of
producing a custom operating system on demand in a
matter of minutes or hours, depending on the
overlap between its specifications and those
already in the specification pool.
"Determine what?"
Determine if the result is as expected with the
time measured in seconds and minutes compared to
days, weeks, months, and years currently. The
specification is the earliest writing stage in any
methodology. In this one it is the only one as
well.<g>
"Why would you need software to tell you the
obvious? If you're the one who modifies the spec,
you're the one who's responsible for updating the
rest of the documentation."
Most installations do not have more than a partial
attempt at writing formal specifications. They
keep thinking that it will get fleshed out in the
later stages of analysis, design, construction, and
testing. Programmers who operate in the
construction stage seldom want anybody earlier
doing their work for them. They will encourage
skipping or glossing over the earlier stages,
because those are done by dummies who have no
programming experience. Thus "thinking" or
scribbling on paper is not doing anything; you are
only doing something when you are coding
(regardless of how many times you redo it<g>).
This attitude is reflected widely in the
profession. The most obvious evidence of it is the
mis-named IDEs (Integrated Development
Environments) offered by vendors, which cover no
more than the construction stage. Within them the
most obvious evidence is the ability to create a
"new" project or source file. In a continuous
process, where the output of one stage becomes the
input of the next, you cannot have anything new
appearing which is not the output of something
earlier.
What we have are isolated processes connected
through people filters who may or may not translate
correctly what is given them. The net effect
organizationally is to be incapable of
incorporating changes anywhere near the rate at
which they occur. This results in a backlog. This
is what OO was supposed to correct. Nevertheless
the backlog continues to increase.
So I simplify this organization to communicators
responsible to the users, in terms of translating
requirements and changes into formal specifications
as well as writing user documentation, and
developers, responsible for inputting a selected
set of specifications into the tool. The only
reason I have developers is because it is a "lesser
skilled" position than one required for
communication with users, which is far more
important overall.
In such a system those who write specifications are
those who change them. It may be that the one who
wrote it is not the one who changed it. Having a
means of tying documentation to specifications is
one way of synchronizing "people activities".
"Like literate programming, only exactly the
opposite -- in litprog the programmer shows the
computer the relationship between the specs and the
program, and with your system it seems to be the
other way around (although I don't believe that's
possible or reasonable)."
Yes, fortunately you are correct.<g> I solve the
problem by eliminating the programmer, something
I'm sure Knuth would be loath to do.<g> As the
system creates the programs (as part of the
completeness proof) and the developer simply
monitors an automated process, this leaves it up to
the communicators to insure that the documentation
(of which now the specifications are an integral
part) remains in sync.
Is it reasonable? To me, yes. You are going to
take some convincing.<g>
"Okay. If the programmer's responsible for
specifying things, I guess that makes sense. Like
in Javadoc."
Hey, no programmer nor programming. They've been
automated. One grand and glorious way of reducing
software costs. Instead we have substituted people
who can communicate and translate properly between
informal (user) and formal (system) languages.
The writing job comes down to two forms: user
documentation and specifications. They both exist
in completely mixed fashion within a data
repository accessed through a directory.
">To correct that the user could
>request a "list" of existing ambiguous references.
But everything's ambiguous if you're going to
refuse to define a single meaning for everything."
But I'm not. Like any good Lisp advocate I exist
in a world where something can have 0, 1, or more
meanings. If it has 0, it is undefined or
indeterminate. If it has 1, then it is singularly
defined. If it has more, then it is multiply
defined or ambiguous.
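A small Prolog sketch of the 0, 1, or more cases,
with invented facts; the "list" of ambiguous
references mentioned in your quote is simply the
third case collected.

    % Invented meanings: foo has one, bar has two,
    % baz has none.
    meaning(foo, m1).
    meaning(bar, m1).
    meaning(bar, m2).

    status(Term, undefined) :-
        \+ meaning(Term, _).
    status(Term, single) :-
        findall(M, meaning(Term, M), [_]).
    status(Term, ambiguous) :-
        findall(M, meaning(Term, M), [_, _|_]).

    % ?- status(baz, S).   S = undefined
    % ?- status(foo, S).   S = single
    % ?- status(bar, S).   S = ambiguous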
Now getting back to correct and incorrect
specifications (which their writers have written
incorrectly): 0 means I have really got something
to correct. 1 means great if that is what I
expect, otherwise not so great. More means great
if that is what I expect, otherwise not so great.
Again I
refer you to SQL which simply reflects what you
have specified. If it doesn't meet with your
expectations and your specifications are correct,
then change your expectations to match your
logic.<g>
Ambiguity simply indicates that more than one
choice or path is possible. The system (again
through its backtracking mechanism) explains the
reason for the ambiguity. You can ask the system
to act on it (produce the multiple results), hold
off for the moment, or change the specification set
in some manner that eliminates it.
The point is that some ambiguity is acceptable and
some is not. The system doesn't care. You do.
Its job is not to dictate your judgement, but to
reflect it. In order to reflect it, it keeps a "list"
handy for your reference. You may know that you
will need to correct an ambiguity before your
effort is complete, but you (and the system) are
willing to live with it until it interferes with
something else you desire. You make that
determination, not your tool (not fool) assistant.
"I like this, of course. I don't think it's
possible, that's all."
As we near the end of today's journey, clearly
neither of us lets impatience interfere with the
desire to understand the other. While others in
Tunes may want to focus primarily on HLLs and
operating systems and secondarily on tools, I
choose to reverse them. I look at how it is
possible to deliver an HLL or an operating system
in the least time, at the least cost, and the most
in competitive ability. As admirable as the ends
are, they can never exceed the abilities of the
means.
For those who might feel that an HLL is a means, I
would point out that it is never more than its
implementation: the means of achieving the means.
No implementation should have a range less than the
domain of the language. That means no compromise
due to time or cost constraints anywhere along the
path. It is the means available that determine the
time and cost. Therefore I give it a higher
priority so I don't have to skimp on the ultimately
more important ends.
I thank you very much for your patience. I hope
that I am closer to communicating and not confusing
things. I will make the time. I see that I have
another response for you which I will have to
forego until later. Aside from its
inappropriateness to discussions of this sort, I
certainly wouldn't want to bring "noise" to the IRC
channel.<g>
Until next time.