Database Theory Links

Kyle Lahnakoski kyle@arcavia.com
Sun, 02 Apr 2000 14:47:58 -0400



Tom Novelli wrote:
> 
> Kyle Lahnakoski <kyle@arcavia.com> wrote:
> >
> > Massimo Dentico wrote:
> >
> > >
> > > - ftp://ftp.netcom.com/pub/hb/hbaker/letters/CACM-RelationalDatabases.html
> > >
> > > In my experience he is right. However, even OODBMS are not the ultimate
> > > solution, at least in the current state.
> >
> > I will agree that not all data can be effectively represented in a
> > database.  I found the theory an excellent starting point for defining
> > what I call "set operations".  There are currently 8 basic DB operators
> > (depending on who you talk to): select, filter, join, aggregation, union,
> > intersection, difference, and cartesian product.  I am trying to write a
> > document on a ninth DB operator: the "Self Join"; it will address many
> > of DB theory's shortcomings (but it is still not a holy grail).
> 
> Self Join... do you mean a simple way to nest subassemblies to an
> arbitrary level, rather than an explicit join for each level?  That would
> certainly help.

Yes, and making it part of the theory will help implementors make the self-join
efficient.  Right now many vendors provide it, but it sucks.
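
To make the idea concrete, here is a minimal Python sketch of a self-join
applied repeatedly until no new rows appear, which is what nesting
subassemblies to an arbitrary level amounts to.  The PARTS table and the
function name are made up for illustration; this is not how any particular
vendor implements it.

```python
# Hypothetical parts table: each row says "child is a direct
# subassembly of parent".  All names are invented for this sketch.
PARTS = [
    ("bike", "wheel"),
    ("bike", "frame"),
    ("wheel", "spoke"),
    ("wheel", "hub"),
    ("hub", "bearing"),
]


def self_join_closure(rows, root):
    """Return (part, depth) for every part nested under root,
    found by joining the table with itself one level at a time."""
    result = []
    frontier = {root}
    seen = {root}
    depth = 0
    while frontier:
        depth += 1
        next_frontier = set()
        # One join step: match the current frontier against the
        # parent column to get the next level of children.
        for parent, child in rows:
            if parent in frontier and child not in seen:
                seen.add(child)
                next_frontier.add(child)
                result.append((child, depth))
        frontier = next_frontier
    return sorted(result, key=lambda r: (r[1], r[0]))


print(self_join_closure(PARTS, "bike"))
```

Each pass of the loop is one ordinary join of the table with itself; the
caller never has to say how many levels there are.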

> Still, the theory is fundamentally flawed.  It's based on index cards,
> where items are referenced by some "key" (e.g., catalog number).  When the
> US military started building computers, they just used them to automate
> centuries-old logistics practices.  They accumulated piles of
> punch-cards.  Relational database theory offered a way to keep that data
> consistent... and 30 years later they adopted it.

There are two aspects to the catalog number.  The first acts as a key.
This key is useful only to the machine and should not be seen by the
user.  The other aspect, which I realized only recently, is a GUI issue.
Without catalog numbers, how do you find the object you are looking for
when presented with a large list?  The only programs I have seen that
even begin to address the issue are mail programs.

Consider a list of all email addresses, with user info, on the
internet.  The address acts much like a catalog number.  I only know a
few people, so I can remember most addresses I need.  Mailing those
people is quite easy once I know the address.  If I do not know the
address I would have to search through the giant list.  I could add
filters to reduce it to something manageable: but how many "John Doe"s
can there be?

Now consider a mailing system without the mail address.  How many
attributes would the system have to track, for each person, before each
has a unique set of attribute values?  You would not be able to simply
type your friend's name into your mail program without conflicting with
possibly hundreds of other users of the same name.  An address book would
be necessary in this scenario.
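
A small Python sketch of that question: given a directory of people, how
many attributes must be compared before nobody is ambiguous?  The
directory, the attribute names, and both function names are invented for
illustration only.

```python
from collections import Counter

# Hypothetical directory; every name and attribute here is made up.
PEOPLE = [
    {"name": "John Doe", "city": "Toronto", "employer": "Acme"},
    {"name": "John Doe", "city": "Toronto", "employer": "Initech"},
    {"name": "John Doe", "city": "Boston", "employer": "Acme"},
    {"name": "Jane Roe", "city": "Toronto", "employer": "Acme"},
]


def ambiguous_count(people, attributes):
    """Count people whose values for `attributes` are shared
    with at least one other person."""
    keys = [tuple(p[a] for a in attributes) for p in people]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] > 1)


def attributes_needed(people, ordered_attributes):
    """Return the shortest prefix of ordered_attributes that
    identifies every person uniquely, or None if none does."""
    for n in range(1, len(ordered_attributes) + 1):
        prefix = ordered_attributes[:n]
        if ambiguous_count(people, prefix) == 0:
            return prefix
    return None


print(ambiguous_count(PEOPLE, ["name"]))
print(attributes_needed(PEOPLE, ["name", "city", "employer"]))
```

Even in this tiny directory the name alone leaves three people
ambiguous, and all three attributes are needed to tell everyone apart.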

Generally, I do not know how large amounts of similar data could be
handled in a GUI fashion without human-meaningful catalog numbers.
Referring to a particular item may be too difficult otherwise.


> Meanwhile, other people were taking advantage of the computer's
> capabilities -- especially in graphics and game programs.  Many games do
> in real-time what relational databases take minutes or hours to do.
> Imagine running queries 30 times per second in a flight simulator.. :)

I kinda agree here.  DBs are mostly slow because they assume their data
is large, and on disk.  Speed gains can be had by assuming data is in
memory.  In any case, I only plan to use relational theory as the
language for set operations.  Then there are two camps: one that makes
optimizations for specific applications, and the other that does not
care, only telling the machine what to do, not how to do it.
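
As a rough illustration of using relational operators purely as a
language for set operations on in-memory data, here is a Python sketch of
three of them.  The function names, the PILOTS and FLIGHTS tables, and the
choice of a hash-based join are my own assumptions for the example, not a
description of any real DB engine.

```python
def select(rows, *columns):
    """Keep only the named columns of each row."""
    return [{c: r[c] for c in columns} for r in rows]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [r for r in rows if predicate(r)]


def join(left, right, key):
    """Equi-join two in-memory tables on a shared column.
    The right table is indexed once, so no disk access and no
    nested scan is needed."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical in-memory tables, invented for this sketch.
PILOTS = [
    {"pilot_id": 1, "name": "Ann"},
    {"pilot_id": 2, "name": "Bob"},
]
FLIGHTS = [
    {"pilot_id": 1, "altitude": 900},
    {"pilot_id": 1, "altitude": 12000},
    {"pilot_id": 2, "altitude": 4000},
]

# The caller says what is wanted, not how to compute it.
high = select(
    filter_rows(join(PILOTS, FLIGHTS, "pilot_id"),
                lambda r: r["altitude"] > 3000),
    "name", "altitude",
)
print(high)
```

The caller composes operators without caring how join is carried out; an
application-specific camp could swap in a different join without changing
the query.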

> Summary: Typical data structures have a _fixed_ relationship.  Relational
> theory generalizes to allow arbitrary joins, but in practice these are
> mainly used to recreate the fixed relationship which already exists in a
> non-relational database.

Yes, relational theory does allow arbitrary joins, but that is only an
issue of unneeded flexibility.  Many DB implementations have solved that
problem by keeping track of relationships in data.  I too have done
something similar.  Both add the meta information required to optimize
joins.



----------------------------------------------------------------------
Kyle Lahnakoski                                  Arcavia Software Ltd.
(416) 892-7784                                 http://www.arcavia.com