Proposals
Lynn H. Maxson
lmaxson@pacbell.net
Sat, 01 Jul 2000 01:41:02 -0700 (PDT)
Response to Billy.
"I'm still having a hard time figuring out what you
mean when you say "an exhaustive true/false testing
process."
If by this time I have not made the meaning clear
in this set of responses, I have to refer you to
any Prolog text or frankly anything which describes
the logic engines of all AI systems. I have read
them. I have used them. I understand what "they"
mean. I'm only repeating here. It is no invention
on my part.
"You underestimate the complexity of modern
programs -- a million lines is not even close to
Windows NT's 50 million lines (ugh). I forget what
Win2K weighed in at."
First off, I understand the difference between
complex and complicated, the former dealing with
the interconnections between units and the latter
with their absolute number. I also know that these
units have "names" and that in a manufacturing
sense they exist as "assemblies". These assemblies
consist of (explicit) references to other
assemblies (which also have names), control
structures, and primitive operators.
I will go out on a limb and say that not one of
these units contains a million lines of code,
regardless
of the total number involved in their hierarchy of
interconnections (complexity). Not even in a M$
system.
I do not have to apply any optimization beyond the
boundaries of any assembly. I could, but current
compilers limit their scope of compilation to a
single external procedure and thus apply no
optimization beyond its boundaries. This is true
even if I include the option to "inline" invoked
procedures all the way down a hierarchical path to
its lowest level.
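To make that concrete, here is a minimal sketch
(Python, with every name invented purely for
illustration) of what inlining invoked procedures
down a hierarchical path amounts to:

  # A toy representation: a named procedure is a list of
  # primitive operators (strings) and explicit calls to
  # other named procedures.
  from dataclasses import dataclass

  @dataclass
  class Call:
      name: str    # explicit reference to another named assembly

  procedures = {
      "main":  ["load", Call("inner"), "store"],
      "inner": ["add", Call("leaf")],
      "leaf":  ["mul"],
  }

  def inline(name, table):
      # Expand every call down the hierarchical path,
      # leaving only primitive operators.
      body = []
      for item in table[name]:
          if isinstance(item, Call):
              body.extend(inline(item.name, table))
          else:
              body.append(item)
      return body

  print(inline("main", procedures))   # ['load', 'add', 'mul', 'store']
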
I don't dispute your numbers (or the numbers anyone
else has placed before me). The truth is that they
don't apply here. It takes M$ more than a year to
release a new version of Windows due to its
software methodology, which uses source files as
input and limits compilation to a single external
procedure (executable). Allow only a single source
and a scope of compilation equal to that of an
entire operating system, write it in a
specification language, and M$ could produce a new
(error-free) version daily. Of course, it could no
longer stay in business.<g> But that's another
story.
Back to the issue of what it is possible to
optimize and what is not. If you accept that I
don't have to optimize more than what current
compilers attempt to do, i.e. a single external
procedure, and you have followed my logic that no
logical equivalence is required for explicit
references from one unit to another, pause and take
a breath.<g> You must accept that all higher level
assemblies must eventually "decompose" into control
structures and primitive operators. If for no
other reason than that is what happens.<g>
That means then that I have a stream of elements
with very definite boundaries. That includes even
a sequence control structure. Not only that, I only
have to consider a stream which lies within the
boundaries of a "named" assembly. The process has
only to "optimize" each "named" assembly in turn,
i.e. one at a time.
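As a rough sketch of that decomposition (the Python
representation below is my own invention, not any
actual tool's), a "named" assembly bottoms out in
control structures and primitive operators, and
flattening it yields exactly such a bounded stream:

  from dataclasses import dataclass
  from typing import List, Union

  @dataclass
  class Prim:               # primitive operator
      op: str

  @dataclass
  class Seq:                # sequence control structure
      body: List["Node"]

  @dataclass
  class Choice:             # selection control structure
      then: List["Node"]
      orelse: List["Node"]

  Node = Union[Prim, Seq, Choice]

  def stream(node):
      # Flatten one named assembly into a stream of
      # elements with definite boundaries.
      if isinstance(node, Prim):
          yield ("prim", node.op)
      elif isinstance(node, Seq):
          yield ("seq-begin", None)
          for child in node.body:
              yield from stream(child)
          yield ("seq-end", None)
      elif isinstance(node, Choice):
          yield ("choice-begin", None)
          for child in node.then + node.orelse:
              yield from stream(child)
          yield ("choice-end", None)

  assembly = Seq([Prim("load"),
                  Choice([Prim("add")], [Prim("sub")]),
                  Prim("store")])
  print(list(stream(assembly)))
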
Now actually it doesn't have to optimize at all.
It simply has to find the logically equivalent
sequence in a set of machine-dependent
specifications that matches the control structures
and primitive operators of the machine-independent
specifications. If there are none, then we cannot
generate the necessary code. If there is only one,
then we have no choice but accept it as optimal.
If there is more than one, then we can reprocess
this list in some manner in which only one will be
returned meeting our "rules of optimization".
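A rough sketch of that selection step, assuming a
table that maps each machine-independent element to
the machine-dependent sequences logically
equivalent to it (the table, its contents, and the
"rule" are invented for illustration only):

  equivalents = {
      ("prim", "add"): [["ADD r1,r2"], ["LEA r1,[r1+r2]"]],
      ("prim", "mul"): [["IMUL r1,r2"]],
  }

  def rules_of_optimization(candidates):
      # Reprocess the list so only one is returned;
      # here the "rule" is simply fewest instructions.
      return min(candidates, key=len)

  def generate(element):
      candidates = equivalents.get(element, [])
      if not candidates:            # none: cannot generate code
          raise ValueError("no equivalent for %r" % (element,))
      if len(candidates) == 1:      # one: accept it as optimal
          return candidates[0]
      return rules_of_optimization(candidates)   # many: apply the rules

  print(generate(("prim", "add")))
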
The plain fact is that the number of primitive
operators (which are optimized one at a time) is
relatively small, and each need only be matched
once, not each time it appears in the stream.
Moreover, the number of possible control structures
is even smaller.
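Again only as a sketch with invented names, the
match for each primitive operator can be computed
once and reused for every later appearance in the
stream:

  from functools import lru_cache

  table = {"add": ["ADD r1,r2"], "mul": ["IMUL r1,r2"]}   # illustrative

  @lru_cache(maxsize=None)
  def match(op):
      # The first call does the matching work; every later
      # occurrence of the same operator reuses the result.
      return tuple(table[op])

  stream = ["add", "mul", "add", "add"]
  print([match(op) for op in stream])   # "add" is matched only once
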
Let's state what the actual problem is here, as
hopefully by this time you will agree that what you
think I said may be something I miscommunicated.
If this is as "possible" as I claim, then why
hasn't anyone done it?
The problem is one of pattern matching, of somehow
decomposing a higher level pattern into a lower
one. Thus far we have chosen to use human pattern
matching processes, to use human experience over
time to develop heuristics, and over time to
incorporate those heuristics as optimizing
algorithms.
A specification is text, whether machine-dependent
or -independent. It is not an "appropriate"
representation for pattern matching. How we humans
do it, in a way that allows us to discover
heuristics, is not (yet) a logical process. That
which we as humans
can do we have found no way to replicate within
software. In truth we have no idea how it occurs
in us, or why it should occur in some and not in
others.
What I am talking about doing here cannot be done
currently for "representation" reasons, not because
it's some "huge" number. The same reason accounts
for the heuristics given in "Design Patterns" for
determining reusable objects.
So let's take the best of the optimization methods
we discover for which we can encode algorithms and
do some of the "inlining" things that are possible.
This should allow us to fall on the performance
scale somewhere closer to assembler and farther
from C. When I solve the "representation" problem
along with its decomposition rules, then we will
have what I claim is possible.
"Correct. Of course, this can be impossibly hard,
so Prolog has ways of allowing the programmer to
specify order."
Strictly speaking, no. Prolog has implemented a
"non-logical" means of writing a sequence as an
"and" expression. This irritates me as in logic an
"and" expression is commutative throughout, i.e.
order is unimportant. In my specification language
a sequence is just exactly that and thus I can
"obey" the laws of logic.<g>
"But even with that, Prolog doesn't attempt to
generate all of the possible executables which
could fulfill the specifications. There are just
too many! Instead it uses some heuristics to
generate the best executable it can think of."
By this time you should realise that I do not
either.
"Oh! So you're saying that at some point in this
process, we removed all of the explicit
references."
No, what I am saying is that at this boundary point
between the lowest level machine-independent
specifications (control structures and primitive
operators) and the machine-dependent ones we have
"no" explicit references from one to the other.
They do not exist.
We know that it must be possible to generate the
code, but as I said earlier we do not have the
"representational" means of decomposing the
"pattern" of the higher level into those of the
lower. That pattern matching now occurs only in
humans, who then encode their "discovery" into
algorithms to generate code.
"Now, machine independance is still good, but your
two steps are still wrong; the first step is
wasting its time by pretending to find an optimal
solution."
Well, until I solve the representation problem I
don't want to waste any more time. Until I do you
are correct. After that point even you will be
convinced.<g>
The only reason I ever brought this up was a
reference in the Slate documentation about
something performing at "50% of optimized C++
code". I have already described in other responses
what accounts for that and how to change to achieve
more nearly assembly language performance.
Steeltoe in another response verified this with
respect to the Watcom C++ compiler with its
inlining ability.
My basic take is that regardless of the user
language, which eases the user's task of expression
(a goal of the Tunes HLL), nothing says that we
have to retain its form once inside (away from the
user's view) as long as what we keep remains
logically equivalent. Relative to OO this simply
means treating it like beauty: only skin deep. If
we do so, we can achieve assembly level
performance.
Beyond that, I don't believe there is an OO
language that permits the ease of expression
available in a specification language, whose
expressions come close to those of a textbook. I
would certainly challenge any Lisp variant in
terms of expressive ease.