Optimization, the gist of it
Lynn H. Maxson
lmaxson@pacbell.net
Wed, 05 Jul 2000 09:31:59 -0700 (PDT)
After a brief flurry there came a momentary pause.
Whether that meant I answered things satisfactorily
or not, the pause occurred at an appropriate time,
the Fourth of July weekend. In celebration of this
brief respite I thought I might offer a
composition, a tympany in four movements in C
major.
The first movement, entitled "SQL Revisited", deals
with optimization as it pertains to modern database
usage. Once we have laid it to rest, as it were, we
enter the second movement, entitled "The
Experiential Use of Brute Force Intelligently
Applied" or "Et tu, Brute?". Here we will lay to
rest the issue of code optimization and whether
available methods fall within the range of the
possible (and solvable) or the impossible.
This leads naturally to the third movement entitled
"Other Modern Miracles of Our Time". Here we add
to the roster of functional programming (Lisp
variants, Forth, and OO) the additional entries of
logic programming, APL2, and PL/I. Treating these
new entries as a Rule of Three we meld them into a
Rule of One.
This in turn leads us to the fourth movement
entitled "Operational Fare" or if you prefer "Fare
Revisited". Here we look at the requirements for
the Tunes HLL and transform them into
specifications, possibly adding two (reading ease
and writing ease) from the ranks of the implicit to
the explicit. This allows us to shift from a
qualitative to a quantitative evaluation of Tunes
HLL candidates.
We could, for example, allow an index in the range
of 0.0 to 1.0 for each category and from them
create a language profile using the mean, standard
deviation, min, and max. This allows a submitter
to offer a "pre-evaluation". This in turn allows a
comparison with other candidates as well as
pointing the author(s) to what needs improving,
basically wiping the slate clean, as it were, as
well as improving the reflectivity index.
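As a sketch of how such a profile might be
computed, assuming the scores were kept in a table
(every name below is purely illustrative):

   -- Hypothetical scoring table; all names are
   -- invented for illustration.
   CREATE TABLE hll_scores
      (candidate VARCHAR(30),    -- e.g. 'PL/I'
       category  VARCHAR(30),    -- e.g. 'reading ease'
       score     DECIMAL(3,2));  -- index, 0.0 to 1.0

   -- One profile row per candidate.
   SELECT candidate,
          AVG(score)    AS mean_score,
          STDDEV(score) AS std_dev,  -- nonstandard,
                                     -- widely supported
          MIN(score)    AS min_score,
          MAX(score)    AS max_score
     FROM hll_scores
    GROUP BY candidate;

A submitter could then publish the resulting row as
the "pre-evaluation" for a candidate.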
#1. SQL Revisited
Regardless of anything else SQL is a specification
language. This means that it allows a user to
specify "what" is desired and its conditions
without concern for "how". This follows the
general "tendency" of the stepwise philosophy of
"structured methods" (Walkthrus, Programming,
Analysis, Design). This means the user focuses
only on one aspect at a time in an orderly manner
until all "necessary" aspects are covered. In SQL
the user focuses on the "what", leaving it up to
the "system" to determine the "how".
Nevertheless this is a Structured Query Language
(SQL) and not an Unstructured one (UQL). This
means that expressing the "what" appears in a
specific order. In neither instance do we need to
maintain the order within this expression in
performing the "how". This is in line with what
occurs in the optimization processes used within
compilers.
A simple form of an SQL query consists of three
clauses, a SELECT clause, a FROM clause, and a
WHERE clause, e.g.

   SELECT emp_name, emp_num, salary
     FROM emp_table
    WHERE salary >= 6000 AND birthdate < 1970;

The order of the entries within the clauses is
unimportant.
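To illustrate, these two forms of the query above
(a sketch, reusing the same illustrative names)
express the same "what" and must return the same
result:

   -- Only the order of the WHERE predicates
   -- differs; the "system" may evaluate them in
   -- whichever order it determines.
   SELECT emp_name, emp_num, salary
     FROM emp_table
    WHERE salary >= 6000 AND birthdate < 1970;

   SELECT emp_name, emp_num, salary
     FROM emp_table
    WHERE birthdate < 1970 AND salary >= 6000;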
In general the only requirement on the "system", in
this instance the database manager, is to logically
organize the query (generate code) which allows its
proper execution. No need arises to seek out "all"
possible "generations" to select the "best", i.e.
no attempt at optimization.
Optimization in a database manager occurs
"external" to SQL processes (queries, updates,
additions, deletions). It is based on the
"aggregate demand" of these processes experienced
over designated intervals, i.e. actual usage. The
object is not to optimize each query, but actually
to optimize their aggregate impact by changes in
the actual database itself (distribution of
tablespaces, use of indexing, etc.) based upon
experience and forecasts over a set of intervals.
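As a sketch, if the aggregate demand observed over
such intervals showed that most transactions
filtered on salary, one might add an index (reusing
the illustrative emp_table; the index name is
invented):

   -- Optimizes the aggregate workload, not any
   -- single query.
   CREATE INDEX emp_salary_ix ON emp_table (salary);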
Conceptually this does not differ from the file
management systems the database replaced. In them
you had "master" files. In whatever order you
edited the transactions (additions, deletions,
updates) for the master file, you sorted them into
master file sequence prior to applying them. Then
once transaction processing had completed you did
an extract (creating new files) and a sort for each
desired report. This "edit-sort-update" and
"extract-sort-report" was the "mode" of file
management processing. In effect the data was
optimized for processing. In this instance you had
multiple optimized data files.
In moving to a database manager, where you now have
only a single instance of the data, you do not
optimize it for any particular use. You optimize
it for its aggregate use based on frequency
(transaction rate) and volume-per-use of each
transaction. The transactions themselves may or
may not be optimized. They need only work.
Thus optimization in database systems is determined
principally by "hindsight" and not "foresight".
While optimization of a query (or any other
transaction type) may occur relative to the given
structure of the underlying data, this does not
necessarily result in optimum performance.
On the other hand if two different transaction
forms are logically equivalent and optimization
does occur, then their run times, however seemingly
unoptimized, should be the same. If they are not,
it is a failure of the implementation, not of the
language. In an optimizing process using
"reflection" we would have to have a running
(historical) account of enumerated variable use as
well as the expectation that the future follows the
past. In effect we would assume that usage (and
users) is not (are not) dynamic.
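For instance, these two forms (illustrative,
reusing emp_table) are logically equivalent, so
under such optimization they should run in the same
time:

   -- NOT (salary < 6000) selects exactly the rows
   -- that salary >= 6000 does, even under SQL's
   -- three-valued logic for nulls.
   SELECT emp_name FROM emp_table
    WHERE salary >= 6000;

   SELECT emp_name FROM emp_table
    WHERE NOT (salary < 6000);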
The truth is that we do not employ reflection on
the particular, only on the aggregate. Thus we
change "aspects" of the underlying data, e.g.
indexing, to reflect aggregate usage. This is
global optimization. Local optimization within a
transaction depends entirely on these aspects.
Thus the organization for a particular transaction
may differ from one run to the next, reflecting the
dynamics of the underlying data. If it does not,
blame the implementation.<g>
However, optimization with respect to selection
from a list of equivalent forms can only occur on
the basis of what you know or what you can
determine at the time of selection. You can
"pretest" each one assuming that the test
conditions will prevail in the runtime environment.
The most "persistent" of these are those entirely
"local" to a selection and not dependent upon
changes in the external environment.
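Many implementations let you "pretest" a form in
just this way. The EXPLAIN statement (nonstandard,
though offered in some form by DB2, Oracle, MySQL,
and others) exposes the access path the system
would select without running the query:

   -- Syntax varies by implementation; this is a
   -- common form, again reusing the illustrative
   -- query.
   EXPLAIN
   SELECT emp_name, emp_num, salary
     FROM emp_table
    WHERE salary >= 6000 AND birthdate < 1970;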
Thus you can optimize a query based on its internal
processing, its own internal runtime. This does
not optimize its overall runtime, which must
include external dependencies. Certainly you can
have in
addition the foresight to optimize based on
"current" aspects of the underlying database as
well as that possible with hindsight (assuming the
future follows the past). That falls within the
Tunes definition of "reflective". Whether a vendor
chooses to do so in turn does not "reflect" on the
"reflective" support of a language like SQL, but
rather on the implementation.
The language and an implementation exist as a
"system". Both should support and not hinder the
other. We should determine which is at fault when
laying blame. If it is the
implementation, then we should not blame the
language.