Optimization, the gist of it

Lynn H. Maxson lmaxson@pacbell.net
Wed, 05 Jul 2000 09:31:59 -0700 (PDT)


After a brief flurry there came a momentary pause.  
Whether that meant I answered things satisfactorily 
or not, the pause occurred at an appropriate time, 
the Fourth of July weekend.  In celebration of this 
brief respite I thought I might offer a 
composition, a symphony in four movements in C 
major.

The first movement, entitled "SQL Revisited", deals 
with optimization as it pertains to modern database 
usage.  Once we have laid it to rest, as it were, 
we enter the second movement, entitled "The 
Experiential Use of Brute Force Intelligently 
Applied" or "Et tu, Brute?".  Here we will lay to 
rest the issue of code optimization and whether 
available methods fall within the range of the 
possible (and solvable) or the impossible.

This leads naturally to the third movement entitled 
"Other Modern Miracles of Our Time".  Here we add 
to the roster of functional programming (Lisp 
variants, Forth, and OO) the additional entries of 
logic programming, APL2, and PL/I.  Treating these 
new entries as a Rule of Three we meld them into a 
Rule of One.

This in turn leads us to the fourth movement, 
entitled "Operational Fare" or, if you prefer, 
"Fare Revisited".  Here we look at the requirements 
for the Tunes HLL and transform them into 
specifications, possibly adding two (reading ease 
and writing ease) from the ranks of the implicit to 
the explicit.  This allows us to shift from a 
qualitative to a quantitative evaluation of Tunes 
HLL candidates.

We could, for example, allow an index in the range 
of 0.0 to 1.0 for each category and from them 
create a language profile using the mean, standard 
deviation, min, and max.  This allows a submitter 
to offer a "pre-evaluation".  This in turn allows a 
comparison with other candidates as well as 
pointing the author(s) toward what needs improving, 
basically wiping the slate clean as it were as well 
as improving the reflectivity index.
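As a sketch of that quantitative profile (the 
category names and scores below are invented for 
illustration; only the summary statistics come 
from the text):

```python
# Hypothetical sketch: scoring a Tunes HLL candidate
# from per-category indexes in [0.0, 1.0].  The
# category names and values are made up; the point is
# the profile: mean, standard deviation, min, max.
from statistics import mean, pstdev

scores = {
    "expressiveness": 0.8,
    "reflectivity":   0.6,
    "reading ease":   0.7,
    "writing ease":   0.5,
}

values = list(scores.values())
profile = {
    "mean":    mean(values),
    "std dev": pstdev(values),
    "min":     min(values),
    "max":     max(values),
}
print(profile)
```

Two candidates' profiles could then be compared 
number by number, and a low category score tells 
the author(s) exactly where to improve.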

#1. SQL Revisited

Regardless of anything else SQL is a specification 
language.  This means that it allows a user to 
specify "what" is desired and its conditions 
without concern for "how".  This follows the 
general "tendency" of the stepwise philosophy of 
"structured methods" (Walkthrus, Programming, 
Analysis, Design).  This means the user focuses on 
only one aspect at a time in an orderly manner 
until all "necessary" aspects are covered.

In SQL the user focuses on the "what", leaving it 
up to the "system" to determine the "how".  
Nevertheless this is a Structured Query Language 
(SQL) and not an Unstructured one (UQL).  This 
means that expressing the "what" occurs in a 
specific order.  In neither instance do we need to 
maintain the order within this expression in 
performing the "how".  This is in line with what 
occurs in the optimization processes used within 
compilers.

A simple form of an SQL query consists of three 
clauses, a SELECT clause, a FROM clause, and a 
WHERE clause, e.g. SELECT emp_name, emp_num, salary 
FROM emp_table WHERE salary >= 6000 AND birthdate < 
1970.  The order of the entries within the clauses 
is unimportant.
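That order-insensitivity is easy to demonstrate.  
The sketch below uses Python's built-in sqlite3 as 
a stand-in database manager, with invented table 
contents; reordering the WHERE conditions changes 
nothing about the result:

```python
# Minimal sketch with sqlite3 standing in for the
# database manager.  Table and data are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_table "
             "(emp_name TEXT, emp_num INTEGER, "
             "salary REAL, birthdate INTEGER)")
conn.executemany(
    "INSERT INTO emp_table VALUES (?, ?, ?, ?)",
    [("Ada", 1, 7000, 1960),
     ("Grace", 2, 5000, 1956),
     ("Edsger", 3, 6500, 1975)])

# Same "what", two orders of expression
q1 = conn.execute(
    "SELECT emp_name, emp_num, salary FROM emp_table "
    "WHERE salary >= 6000 AND birthdate < 1970").fetchall()
q2 = conn.execute(
    "SELECT emp_name, emp_num, salary FROM emp_table "
    "WHERE birthdate < 1970 AND salary >= 6000").fetchall()

# The "how" is the engine's concern; the answers agree
print(q1)
print(q1 == q2)
```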

In general the only requirement on the "system", in 
this instance the database manager, is to logically 
organize the query (generate code) which allows its 
proper execution.  No need arises to seek out "all" 
possible "generations" to select the "best", i.e. 
no attempt at optimization.

Optimization in a database manager occurs 
"external" to SQL processes (queries, updates, 
additions, deletions).  It is based on the 
"aggregate demand" of these processes experienced 
over designated intervals, i.e. actual usage.  The 
object is not to optimize each query, but actually 
to optimize their aggregate impact by changes in 
the actual database itself (distribution of 
tablespaces, use of indexing, etc.) based upon 
experience and forecasts over a set of intervals.
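This "external" optimization can also be sketched 
with sqlite3 standing in for the database manager: 
the query itself never changes, but adding an index 
(a change to the underlying data, presumably 
justified by observed aggregate demand) changes how 
the system executes it:

```python
# Sketch: the query text is constant; the "how"
# changes when the underlying data is reorganized.
# sqlite3 is a stand-in; table and index invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_table "
             "(emp_num INTEGER, salary REAL)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan detail
    # in their fourth column
    return " ".join(row[3] for row in
                    conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT emp_num FROM emp_table WHERE salary >= 6000"

before = plan(query)   # full table scan
conn.execute("CREATE INDEX idx_salary "
             "ON emp_table (salary)")
after = plan(query)    # now searches via the index

print(before)
print(after)
```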

Conceptually this does not differ from the file 
management systems the database replaced.  In them 
you had "master" files.  In whatever order you 
edited the transactions (additions, deletions, 
updates) for the master file, you sorted them into 
master file sequence prior to applying them.  Then 
once transaction processing had completed you did 
an extract (create new files) and a sort for each 
desired report.  This "edit-sort-update" and 
"extract-sort-report" was the "mode" of the file 
management processing.  In effect the data was 
optimized for processing.  In this instance you had 
multiple optimized data files.
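A minimal sketch of that master-file cycle, with 
invented records and transactions:

```python
# Sketch of the old "edit-sort-update" then
# "extract-sort-report" cycle.  All data invented.
master = {1: {"name": "Ada", "salary": 7000},
          3: {"name": "Edsger", "salary": 6500}}

# Transactions arrive in arbitrary (edited) order
transactions = [(3, "update", {"salary": 6800}),
                (2, "add", {"name": "Grace", "salary": 5000})]

# edit-sort-update: sort into master file sequence
# (by key), then apply in that order
for key, op, data in sorted(transactions,
                            key=lambda t: t[0]):
    if op == "add":
        master[key] = dict(data)
    elif op == "update":
        master[key].update(data)
    elif op == "delete":
        master.pop(key, None)

# extract-sort-report: pull the fields one report
# needs and sort them the way that report wants
report = sorted(((rec["salary"], rec["name"])
                 for rec in master.values()),
                reverse=True)
print(report)  # highest-paid first
```

Each report got its own extract and its own sort 
order, i.e. its own data file optimized for that 
one use.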

In moving to a database manager where now you only 
had a single instance of the data, you do not 
optimize it for any particular use.  You optimize 
it for its aggregate use based on frequency 
(transaction rate) and volume-per-use of each 
transaction.  The transactions themselves may or 
may not be optimized.  They need only work.

Thus optimization in database systems is determined 
principally by "hindsight" and not "foresight".  
While optimization of a query (or any other 
transaction type) may occur relative to the given 
structure of the underlying data this in no way 
results necessarily in optimum performance.

On the other hand, if two different transaction 
forms are logically equivalent and optimization 
does occur, then their run times, however seemingly 
unoptimized, should be the same.  If they are not, 
it is a failure of implementation, not of the 
language.  In an optimizing process using 
"reflection" we would have to have a running 
(historical) account of enumerated variable use as 
well as the expectation that the future follows the 
past.  In effect that usage (and users) is not (are 
not) dynamic.

The truth is that we do not employ reflection on 
the particular, only on the aggregate.  Thus we 
change "aspects" of the underlying data, e.g. 
indexing, to reflect aggregate usage.  This 
is global optimization.  Local optimization within 
a transaction depends entirely on these aspects.  
Thus the organization for a particular transaction 
may differ from one run to the next reflecting the 
dynamics of the underlying data.  If it does not, 
blame the implementation.<g>

However, optimization with respect to selection 
from a list of equivalent forms can only occur on 
the basis of what you know or what you can 
determine at the time of selection.  You can 
"pretest" each one assuming that the test 
conditions will prevail in the runtime environment.  
The most "persistent" of these are those entirely 
"local" to a selection and not dependent upon 
changes in the external environment.

Thus you can optimize a query based on its internal 
processing, its own internal runtime.  This does 
not optimize its overall runtime which must include 
external dependencies.  Certainly you can have in 
addition the foresight to optimize based on 
"current" aspects of the underlying database as 
well as that possible with hindsight (assuming the 
future follows the past).  That falls within the 
Tunes definition of "reflective".  Whether a vendor 
chooses to do so in turn does not "reflect" on the 
"reflective" support of a language like SQL, but 
rather on the implementation.

The language and an implementation exist as a 
"system".  Both should support and not hinder the 
other.  We should distinguish which is at fault 
when laying blame.  If it is the 
implementation, then we should not blame the 
language.