DataBase Debunking: clarifications

Massimo Dentico m.dentico@virgilio.it
Sun, 30 Jun 2002 17:11:12 +0200


Tom Novelli <tcn@tunes.org> wrote:

> Hmm, that's kind of interesting. Nothing says a database has
> to be a crappy monolithic implementation like Oracle.. it could
> be simple and purpose-built, work with data in RAM where
> it should, etc.

Yes, these are aspects of the physical level.

I'm now reading, off-line, a couple of old books (in Italian) about
DBMSs because, for reasons not worth mentioning here, I had somewhat
neglected the subject. In retrospect that was a *huge* mistake:
I was aware that this field is in a sorry state, but
I was unable to say what is wrong.

These books credit the DBTG-CODASYL (1), which proposed the network
data model in 1971, with introducing fundamental concepts such as
physical and logical independence (the concept of a "data model",
however, is due to Codd).

(1) Data Base Task Group - Conference On DAta SYstems Languages

The point of physical and logical independence is to establish
well-defined protocols between 3 levels (physical, logical and
conceptual, or business) in such a way that a change in
an underlying level does not affect (or barely affects) the levels
above it, and ultimately application programs and power users, which
usually should see the conceptual level (views).
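
A minimal sketch of this kind of independence, using Python's built-in
sqlite3 module as a stand-in DBMS. The table and view names (person,
person2, city, contact) and the app_query helper are hypothetical,
chosen only for illustration: the application reads a view, the tables
underneath are reorganized, and the application's query does not change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level, version 1: a single base table.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Ada', 'London')")

# The application is only ever shown this view.
conn.execute("CREATE VIEW contact AS SELECT name, city FROM person")


def app_query(c):
    """The application's query: it names the view, never a base table."""
    return c.execute("SELECT name, city FROM contact ORDER BY name").fetchall()


before = app_query(conn)

# Logical level, version 2: cities are moved into their own table.
# Only the view definition is rewritten; app_query is left untouched.
conn.executescript("""
DROP VIEW contact;
CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO city (name) SELECT DISTINCT city FROM person;
CREATE TABLE person2 (
    id INTEGER PRIMARY KEY,
    name TEXT,
    city_id INTEGER REFERENCES city(id)
);
INSERT INTO person2
    SELECT p.id, p.name, c.id FROM person p JOIN city c ON c.name = p.city;
DROP TABLE person;
CREATE VIEW contact AS
    SELECT p.name AS name, c.name AS city
    FROM person2 p JOIN city c ON c.id = p.city_id;
""")

after = app_query(conn)
```

Here the view plays the role of the protocol between levels: the
reorganization is absorbed by redefining the view, not by editing
every program that reads the data.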

So there is no reason to see a DBMS as a black box: in the old days
the concept of a DMCL (Data Media Control Language), or DSL (Data
Storage Language), was contemplated. The latter acronym now conflicts
with Domain Specific Language, so IMO the former is preferable.

The protocol (or interface) toward the upper level is the combination
of a DDL (Data Definition Language) and a DML (Data Manipulation
Language). In this respect the relational data model offers a good
foundation on which to base declarative DDLs and DMLs, because
it escapes the navigational (procedural) metaphor altogether (see
the citation below, "Access Path Dependence").

This is, all in all, the idea of Open Implementation. Citing from the
Open Implementation Home Page at Palo Alto Research Center (PARC
- was Xerox PARC), emphasis mine:

  http://www2.parc.com/csl/groups/sda/projects/oi/default.html

  Open Implementation is a software design technique that helps
  write modules that are both reusable and very efficient for a
  wide range of clients.

  *** In the Open Implementation approach, modules allow their
  clients individual control over the module's own implementation
  strategy. ***

In this light, supplying a DMCL is quite a modern approach. I'm not
sure that current DBMSs give comparable control over implementation
details through their parameter settings for performance optimization
(but I could be wrong: a book exceeding 1000 pages dissuades me
from going deeper into the subject as far as Oracle is concerned).

> We use tables all the time in non-database programs.. perhaps
> a database model would provide a simple way for programs to share
> data, by communicating the necessary type and structure information.

This is a key point of a DBMS; as Fabian Pascal explains:

  "07/21/2001 - Exchange on Data Types"
  (http://www.pgro.uk7.net/suneido2.htm)

  [..] in order to do data management, a DBMS must support some data
  model. A data model has the following components

  - data types
  - structure
  - integrity
  - manipulation

  These are /database/, /not application/ functions and /must/ be
  implemented in the DBMS.

Emphasis in the original. Giving up even one of these components means
delegating it to each specific application, with possible
redundancies, incompatibilities, conflicts, corruption and loss of data,
difficulties in data access, etc. This is exactly the current situation
with file-system-based applications (imagine the mess on your hard disk,
despite any attempt to keep your data organized), which is comparable
to the pre-DBMS era (at least 30 years ago).
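
To make the integrity component concrete, here is a small sketch,
again with Python's sqlite3 module standing in for a DBMS. The account
table, its rule (a balance may never go negative) and the withdraw
helper are assumptions made up for this example. The rule is declared
once, in the database, so no application can forget to check it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The integrity rule lives in the schema, not in any one application.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()


def withdraw(c, account_id, amount):
    """An application that performs no validation of its own."""
    try:
        with c:  # commits on success, rolls back on error
            c.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The DBMS refused the update: the CHECK constraint was violated.
        return False


ok = withdraw(conn, 1, 30)         # within the rule
rejected = withdraw(conn, 1, 500)  # would make the balance negative
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
```

With plain files, every program touching the data would have to
re-implement this check, and a single program that omitted it could
corrupt the data for all the others.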

> I could see this tying together various tables and documents I use
> on my unix box, or on a heterogenous LAN.. it could also replace the
> whole "filesystem" concept, in a new OS..

The idea of a DBMS as a fundamental system service is certainly not new
and surely it is appealing, for the reasons given above.

At least AS/400 and BeOS have something of this kind (much more
sophisticated in the case of AS/400, probably a near complete DBMS) but
underutilized in both cases for legacy reasons (at least this is my
superficial impression).

But when you consider distributed systems, the idea of a
Distributed DBMS is really attractive.

In this regard, the relational data model (and hence RDBMSs)
offers an overwhelming advantage over the hierarchical and
network data models, as pointed out by E. F. Codd in his seminal
paper "A Relational Model of Data for Large Shared Data Banks"
(http://www.acm.org/classics/nov95/):

  1.2.3. Access Path Dependence

  [...]

  In both the tree and network cases, the user (or his program)
  is required to exploit a collection of user access paths to
  the data. It does not matter whether these paths are in close
  correspondence with pointer-defined paths in the stored
  representation - in IDS the correspondence is extremely simple,
  in TDMS it is just the opposite). The consequence, regardless
  of the stored representation, is that terminal activities and
  programs become dependent on the continued existence of the user
  access paths.

  One solution to this is to adopt the policy that once a user
  access path is defined it will not be made obsolete until all
  application programs using that path have become obsolete.
  Such a policy is not practical, because the number of access
  paths in the total model for the community of users of a data
  bank would eventually become excessively large.
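
A rough illustration of the dependence Codd describes, in Python. The
data layouts, the function name and the employee table are all
hypothetical. The navigational routine hard-wires the path
"department -> employees" and breaks when the store is reorganized;
the relational query refers to data by value only, so adding an
access path (an index) leaves it untouched.

```python
import sqlite3

# Navigational store, layout A: each department owns its employees.
layout_a = {
    "sales": [{"name": "Ada", "salary": 50}],
    "ops": [{"name": "Bob", "salary": 40}],
}


def total_salary_navigational(store):
    """Walks the access path department -> employees."""
    return sum(emp["salary"] for emps in store.values() for emp in emps)


# Layout B: same facts, but each employee now points to a department.
layout_b = [
    {"name": "Ada", "salary": 50, "dept": "sales"},
    {"name": "Bob", "salary": 40, "dept": "ops"},
]

nav_a = total_salary_navigational(layout_a)
try:
    total_salary_navigational(layout_b)
    nav_b_broke = False
except AttributeError:
    # The path the program depended on no longer exists.
    nav_b_broke = True

# Relational version: the query names values, not paths.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ada", "sales", 50), ("Bob", "ops", 40)],
)
QUERY = "SELECT SUM(salary) FROM employee WHERE dept = ?"

without_index = conn.execute(QUERY, ("sales",)).fetchone()[0]
# A new physical access path; the query text does not change.
conn.execute("CREATE INDEX employee_by_dept ON employee (dept)")
with_index = conn.execute(QUERY, ("sales",)).fetchone()[0]
```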

Does it sound familiar? The web is exactly an enormous database
founded on the network data model (a graph), but there is NOT a
(Distributed) DBMS behind it by any means (the most apparent flaw:
it lacks integrity). Moreover, the structure of every single
"object" (page, document) is hierarchical.

So the W3C is (badly) reinventing the wrong wheel when it extends
the purpose of XML from an interchange format (and even for that
purpose alone XML is ugly; Lisp s-exps, for example, are better) to
data management (see for example XML Query http://www.w3.org/XML/Query
and XPath http://www.w3.org/TR/xpath).

An excerpt from "11/03/2001 - Comments on Comments at XML-DEV"
by Fabian Pascal ():

  [..] it is quite instructive that while hierarchic databases have a
  foundation in graph theory, hierarchic DBMSs do not adhere to it.
  IBM's IMS is not based on the theory and W3C, by own admission, did
  not base XML on it because it is too complex for both implementers
  and users. Indeed, relational technology has been invented precisely
  to eliminate such complexities and, as I demonstrate in Chapter 7,
  Climbing Trees in SQL in PRACTICAL ISSUES IN DATABASE MANAGEMENT,
  RDBMSs can represent and manipulate hierarchies better than
  hierarchic DBMSs. That SQL and its implemented dialects don't is only
  a demonstration of the practical consequences of flouting relational
  theory.

> Tom Novelli

Massimo Dentico (MaD70)