In defense of Prevalence

Massimo Dentico m.dentico@virgilio.it
Fri Feb 13 10:51:01 2004


> On Friday, Feb 13, 2004 3:35 AM +0100, "Francois-Rene Rideau" wrote:
>
> > > On Thursday, Feb 12, 2004 at 4:18 PM +0100, Francois-Rene Rideau wrote:
> > > http://cliki.tunes.org/Prevalence
>
> With all this heat, the stuff should be moved to a "debate" page.

It is probably better to leave only a link to this thread.


> > On Fri, Feb 13, 2004 at 01:32:59AM +0100, Massimo Dentico wrote:
> >
> > If you like we can call a "paradigm shift" the return to an old
> > *discarded* technique, but this does not change the substance of the
> > problem.
>
> It's all about a new, higher-level look, to an old technique, journalling.

The disparaging comments are *precisely* about the pretension of doing
*data management* without an adequate data model. Even if you prefix
"higher-level" to the "old technique", journalling remains an
implementation technique; it has nothing to do with a data model.


> > I thought that one of the points of Tunes is indeed that of getting
> > rid of the "application" concept as an artificial barrier
> > to the free flow of information);
>
> Rather, Tunes is about having a more meta-level view of it.
> The whole system is a generic/higher-order/reflective application;
> it has sub-applications, characterized by a mapping to the generic framework
> as defined in a programming language or another.
> The meta-level approach means that instead of having to have
> a one-size-fits-all runtime that interprets application requests,
> we can dynamically and coherently generate ad-hoc code
> that correctly and efficiently implements application semantics.

Well, I cannot cope with this fog any more: now we have "sub-applications",
so we can recursively decompose applications down to the level of a single
machine instruction? Just to save the concept of "application"? I'm sorry,
I give up; I have no time for these rhetorical tricks.


> > Answer these questions: what is more important for you *as user*, your
> > applications or your data?
>
> The whole point is that this is a false dichotomy.
> The application is what makes the data meaningful,
> and the data is what the application uses to be meaningful.

Well, I'll reveal a great secret to you: both applications and data are
strings of bits to a computer (at some level of abstraction). The only
"meaning" that a computer can extract from such bits is the set of mutual
formal relationships that we have established, according to its "axioms"
(ultimately, the hardware). And that is enough about the data/program
duality.

What is meaningful for you *as a human being* is not Photoshop, or GIMP,
or a digital  camera/web-cam/scanner/printer/display device driver,  nor
file formats like  JPEG, GIF, TIFF,  and so on,  but your collection  of
images.

We don't care which applications, which OSes or which hardware our bank
uses; we care about our bank account and the transactions on that account.
Even though, from the point of view of the system, a transaction is a
computation, the system keeps a trace of that computation as data, because
that is what is meaningful for us human clients of a bank (so much so that
they send us a periodic statement of our account).

That is to say that we, as programmers, still think too much in terms of
processes or data-flows rather than in terms of relationships (in a
procedural rather than a declarative way).


> > Is it not true that you can change
> > algorithms, applications, data formats, operating systems, hardware but
> > what *really* interests you are your data? This is true not only for a
> > single user but even more for entire organizations.
>
> What you fail to see is that in the dynamic "application" view,
> the whole system is a meta-level application,
> and that application change, schema evolution, etc.,
> are part of the state transitions of the reflective system.
> Certainly, a static view of immutable applications is doomed;
> but so is a static view of immutable data schema.
> What is doomed is the static point of view,
> and no amount of moving around what your consider "static" will save you.
> It's displacement of problems, not resolution.

How the relational data model (R/DM) prevents schema evolution, as you
seem to believe, totally eludes me. On the contrary, it favors *data
independence* and therefore schema evolution: it isolates the conceptual
model (perhaps de-normalized), exposed to users and applications, from
the (normalized) logical model (via views, which are updatable with a
True RDBMS and proper modeling), and the logical model from the physical
(implementation) model.
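
A toy illustration of the idea, in a Prolog-like notation (Prolog
returns further below in this message); all the names are invented and
this is only a sketch:

    % Base ("logical") relations, normalized:
    employee(1, alice, 10).
    employee(2, bob,   20).
    department(10, accounting).
    department(20, research).

    % A derived relation (a "view") exposed to users and applications.
    % Callers depend only on emp_dept/2; the base relations can be
    % renormalized or re-implemented without touching the callers.
    emp_dept(Name, DeptName) :-
        employee(_, Name, DeptId),
        department(DeptId, DeptName).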

Besides this, so I'm guilty of "staticism": if you need to change your
schema every week, then probably you have not done a proper requirements
analysis and data modeling, so your conceptual and logical models need
continuous "massaging" (remember that I'm not speaking about
implementations: the system is free to choose dynamically whatever
implementations it likes, according to collected statistics and other
forms of feedback, even user "hints").

Of course, with proper reflective abilities you can amortize the cost of
schema evolution even on deployed and running systems, and here I'm
TOTALLY with you (and the R/DM absolutely favors this: the so-called
"meta-data", precisely the catalog, *must* be a relational database
itself). But this is not an excuse to neglect a proper data modeling
activity.
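
For instance (again a sketch with invented names), the catalog can be
queried with exactly the same means as any other data:

    % A toy catalog: the schema itself described as a relation.
    rel_attribute(employee,   emp_id,    integer).
    rel_attribute(employee,   name,      atom).
    rel_attribute(employee,   dept_id,   integer).
    rel_attribute(department, dept_id,   integer).
    rel_attribute(department, dept_name, atom).

    % "Which relations have an attribute named dept_id?"
    % ?- rel_attribute(R, dept_id, _).
    % R = employee ;
    % R = department.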


> > As programmer you values your programs *as data*:
>
> Exactly - we must thus consider the development system
> as an application at the meta-level.
>
> > In particular declarative /integrity constraints/ are a key feature of a
> > True RDBMS that SQL-based DBMSs don't implement correctly (as I
> > understand it, the prevalence model lacks completely this concept).
>
> What if the integrity constraints change?

Do you understand what integrity constraints are? They are the only way
in which a DB designer can declare the "meaning" of data to the DBMS. I
don't know of a better method than *declarative* integrity constraints
to support schema evolution.
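
For example, a rough sketch (invented names, in the same Prolog-like
notation as above): a constraint stated declaratively as a rule that
derives a violation, which the system, and not each application, checks
on every update:

    % Referential integrity: no employee may belong to a department
    % that does not exist; and salaries must be positive.
    violation(missing_department(EmpId)) :-
        employee(EmpId, _, DeptId),
        \+ department(DeptId, _).
    violation(nonpositive_salary(EmpId)) :-
        salary(EmpId, Amount),
        Amount =< 0.

    % The DBMS rejects any update after which ?- violation(_) succeeds.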

See "CONSTRAINTS AND PREDICATES: A BRIEF TUTORIAL"

PART 1 - http://www.dbdebunk.com/page/page/622772.htm
PART 2 - http://www.dbdebunk.com/page/page/622766.htm
PART 3 - http://www.dbdebunk.com/page/page/622764.htm

It is clear that they insist on defining a DBMS as an inference engine
based on predicate logic. Now, you can surely raise some questions about
the adequacy of such a choice for modeling the "real world" (as for the
related field of logic programming); for example, what about
monotonicity?

In "The Logical Foundations of Computer Science and Mathematics"
http://cs.wwc.edu/~aabyan/Logic/

monotonicity is defined in chapter "Non-monotonic Logics" as:
http://cs.wwc.edu/~aabyan/Logic/Nonmonotonic.html

    a logic is monotonic if the  truth of a proposition does not  change
    when new information (axioms) are added to the system

or the "Closed World Assumption":
http://cs.wwc.edu/~aabyan/Logic/CWA.html

    When the  semantic mapping  between the  language and  the domain is
    incomplete or  even missing,  it may  not be  possible to  determine
    whether a sentence  is true or  not. The closed  world assumption is
    used to provide a default solution in the absence of a better solution.

    Closed  world  assumption:  if  you cannot  prove  P  or  ~P from  a
    knowledge base KB, add ~P to the knowledge base KB.

Where ~P means the negation of P.
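
Prolog itself, incidentally, adopts the closed world assumption through
"negation as failure" (written \+); a tiny illustration with invented
facts:

    flight(rome, paris).
    flight(paris, london).

    % ?- flight(rome, london).     % not provable from the facts above...
    % false.
    % ?- \+ flight(rome, london).  % ...so its negation is assumed.
    % true.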

You can raise such questions thanks to the fact that the relational data
model is based on a solid, scientific foundation.

Another outcome of this mathematical approach: the theory of functional
dependencies for relations, and the NORMAL FORMS conceived to solve
insert/update/delete anomalies caused by redundancies by eliminating
such redundancies in the first place.
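
A tiny illustration of such an anomaly and of its removal (invented
names, same Prolog-like notation as above):

    % Redundant design: the department's location is repeated in every
    % fact, so changing a location means updating many facts and risks
    % introducing contradictions (an update anomaly).
    works(alice, accounting, rome).
    works(bob,   accounting, rome).
    works(carol, research,   milan).

    % Normalized design: the functional dependency department -> city
    % is recorded exactly once; the original relation is recovered by
    % a join (a derived predicate).
    works_in(alice, accounting).
    works_in(bob,   accounting).
    works_in(carol, research).
    located(accounting, rome).
    located(research,   milan).

    works_view(Emp, Dept, City) :- works_in(Emp, Dept), located(Dept, City).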

In an OO "model" you are even unable to *perceive* these problems (which
are present, nevertheless).


> A static schema is no solution.

As explained before, a relational schema is "less static" than other
kinds of "schema" (I suspect that, for some OO "data model" proponents,
every data structure implementation counts as a schema).


> And a dynamic schema takes you at the meta-level.

Even a typical SQL-based DBMS is more metaprogrammed, dynamic and
adaptive than you think:

Feature-Oriented Programming
http://cliki.tunes.org/Feature-Oriented%20Programming

    [...]

    The  future  of  software engineering  is  in  automation. The  best
    practical example  of automated  software engineering  is relational
    query optimization. That is, database queries are expressed using  a
    declarative  specification  (SQL)  that is  mapped  to  an efficient
    program. The general problem of mapping from a declarative spec to a
    provably  efficient  program  is  very  difficult  --  it  is called
    automatic programming. The success of relational databases rests  on
    relational algebra -- a set of fundamental operators on tables --
    and that query evaluation programs are expressions -- compositions
    of relational algebra operators. Optimizers use algebraic identities
    to rewrite, and hence optimize, expressions, which in turn optimizes
    query  evaluation  programs.  This  is  how  databases  "solve"  the
    automatic programming problem for query evaluation programs.

    [...]
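
A toy rendition of one such algebraic identity, with relational algebra
expressions represented as Prolog terms (all the names are invented and
the helper predicates are only assumed; this is a sketch, not an
optimizer):

    % Push a selection below a join when the predicate P mentions only
    % attributes of R:
    %     select(P, join(R, S))  =  join(select(P, R), S)
    % attrs_of/2 and mentions_only/2 are assumed helpers.
    push_selection(select(P, join(R, S)), join(select(P, R), S)) :-
        attrs_of(R, Attrs),
        mentions_only(P, Attrs).

An optimizer applies such rewrites repeatedly and then picks the cheapest
of the resulting equivalent expressions.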


> Now, with a meta-level at hand, you can meta-use prevalence directly,
> and skip the need for a complex RDBMS.

Again the logical-physical confusion: an implementation technique cannot
be a surrogate for a conceptual/logical model.


> And no, prevalence doesn't "lack" the concept of integrity constraint;
> rather, it takes an algebra of integrity-preserving transactions
> as *a parameter* to the functor that provides persistence to the application.
> And a programmable meta-level allows you express your constraints
> and automate their enforcement in what you feed to the prevalence functor.

They may be clear in your mind. Without proper education, programmers and
even DB administrators ignore the concept of integrity constraints, even
though they are more or less available in SQL-based DBMSs (usually badly
implemented; but if you understand what a constraint is, you can cope
with the problems of these implementations).


> Of course you may misspecify your algebra and your meta-programs;
> just like you may misspecify your RDBMS constraints. Doh.

A straw man here: I have never said that the relational data model or a
True RDBMS solves every data modeling problem.


> Where Prevalence+reflection frees you is that you can now tailor
> your data representation algebra to your applicative needs,
> instead of the other way around.

Oh, that's practical: let's invent a new algebra for every application
need. Why did Codd bother us with his single, stupid algebra (or his two
equivalent calculi)? He was clearly a cultist of the Church of Staticism!


> > [...] data model:
> > · data types
> > · structure
> > · integrity
> > · manipulation
>
> Prevalence + reflection allows to decouple all these,
> where a static RDBMS gives you a one-size-fits-all combination.

How prevalence can decouple something which is alien to it is a mystery:
an implementation technique is NOT a data model, by definition.
Prevalence is an implementation technique (or do you think it is a data
model??)


> > Renouncing one only of these components
> > means to delegate such component to each specific application, with
> > possible redundancies, incompatibilities, conflicts, corruption and
> > loss of data, difficulties in data access, etc ..
>
> This implicitly supposes that there exist no other way
> but centralization through the DBMS
> to achieve factoring of such components accross applications.

Wrong. See below.


> Now, the very core belief of TUNES is that factoring is best done
> through open systems with a good meta-level than through centralization.
> Yes, there is a need for strong consistency-enforcement;
> static thinkers see that as an argument for a centralizing kernel;
> dynamic thinkers see that as an argument for declarative programming.

Declarative programming is *exactly* what the R/DM favors. See

WHAT NOT HOW - THE BUSINESS RULES APPROACH TO APPLICATION DEVELOPMENT,
by Chris Date (a book)
http://www.dbdebunk.com/page/page/623335.htm

Your  bunch  of  data  structures  linked  with  pointers  in  a typical
imperative OO programming language,  for which the prevalence  technique
was developed, is certainly NOT declarative programming.

Also, in SQL-based DBMSs stored procedures must frequently be used for
integrity, because the correct declarative solution is not supported. A
sad state of affairs.


> And indeed, in the end, the kernel will have to act upon declarations,
> but will end up interpreting them in a clumsy paranoid way at runtime.
> The open reflective system will compile these declarations before link-time,
> so they are efficiently hardwired or optimized away at runtime.

You clearly ignore everything I have written on the subject; see

"DataBase Debunking: clarifications"
http://lists.tunes.org/archives/review/2002-June/000174.html

where Tom Novelli *appropriately* replied to my initial message with this:

    > Hmm, that's kind of interesting. Nothing says a database has
    > to be a crappy monolithic implementation like Oracle.. it could
    > be simple and purpose-built, work with data in RAM where
    > it should, etc.

I replied agreeing: ".. So there are no reasons to see a DBMS as a
black-box. .." (see the message for the explanation; in brief: again the
theme of physical and logical independence) and mentioning the concept of
Open Implementation. The relational data model, being a general
mathematical theory of data, is *mute* about implementation details.


> > [...] All Requirements are Data Requirements
>
> And dually, are application requirements.
> That's just a shift in point of view.
> But the meta-level shift is about seeing the previous shift,
> and giving control over it to the developers.
> So yes, data modelling is an important of application development,
> but it's as wrong to fixate things on this point of view,
> as to fixate things on algorithms and neglect data.
>
> > This merit a comment apart: so you argue that because ".. few people
> > encode the structure of 3D objects or music samples in relational
> > tables" then the relational data model "needn't be the right one[sic]"?
> > Well, that's science! What are the better alternatives? Hierarchical or
> > network structures (graph theory)? When a data model is "the right one"?
>
> Well, sometimes, sets, streams, trees, fifos, soups, matrixes,
> hash-tables, pre-sorted binary decision trees, vertex arrays,
> or whatever, are the right data-structure.

And the R/DM does NOT prevent the use of these data structures at the
*implementation level*! They can be selected automatically, as detailed
above, according to the actual usage of the data.


> Why are strings not represented as tables in DBMSes?

First, a "table" is a *representation* of the value of a relational
variable (rel-var): a snapshot of the tuples in a relation at a
particular point in time.

Second, nothing prevents you from modeling strings of text as rel-vars
for a particular purpose (text retrieval? a text/document editor? an
e-mail client?). The R/DM does not care about implementations, so you
(or, better, your meta-program) can implement rel-vars for this purpose
with an adequate, efficient data structure.

See:

Encapsulation Is a Red Herring
by C.J.Date
http://www.dbpd.com/vault/9809date.html
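
A crude sketch (invented names): the *logical* view of a text is a
mapping (document, line number) -> line; how the lines are physically
stored (a rope, a gap buffer, one blob per file, ...) is left entirely
to the implementation:

    line(note, 1, 'Dear all,').
    line(note, 2, 'the meeting is moved to Friday.').

    % All documents (and lines) mentioning Friday:
    % ?- line(Doc, N, Text), sub_atom(Text, _, _, _, 'Friday').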

> Why a need for bobs and blobs at all?

Please, stop thinking in terms of SQL-based DBMSs.

You seem to fail to distinguish the notion of domain (type) from that of
relation: if, at some point, you are not interested in the structure of a
particular piece of data, then you create a domain (a type) with
operations on it. On the contrary, if you are interested in the
structure, then you model it as relation(s).

Of course, you can always change your mind and decompose a type into one
or more relations, ensuring the appropriate integrity constraints. And if
you don't drop the original type, this has no impact on applications,
user operations and other parts of the database.
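
A small sketch of the two options (invented names): an address first as
an opaque domain value, then decomposed into a relation of its own:

    % Option 1: the address is a value of some domain (type); the
    % database does not look inside it, only the type's operators do.
    customer(1, alice, address('Via Roma 1', milan, '20100')).

    % Option 2: the structure matters, so it is modelled as a relation,
    % with the appropriate integrity constraints on it.
    customer2(1, alice).
    customer_address(1, 'Via Roma 1', milan, '20100').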


> In typed functional languages, even numbers and characters can be defined
> in the internal type-system, and those provided by the system can be seen
> as just optimization from the standard library rather than additional
> features unexpressible in the programming algebra.

A theoretical exercise that you can do even with the R/DM.

See ON SIMPLICITY OF RELATIONAL OPERATORS
with Chris Date
http://www.dbdebunk.com/page/page/627049.htm

    [...]

    On pages 92-98 of the same  book, we show how the more  conventional
    operators of the relational algebra can all be expressed in terms of
    A. In  particular, we  show how  to deal  with *numbers and ordinary
    arithmetic in that framework*.

    [...]

Where A is a "minimal relational algebra".
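
In the same spirit, here is a toy exercise (it is NOT Date and Darwen's
actual construction, just an illustration of the flavor): natural
numbers as a successor relation, and addition as a relation derived from
it:

    succ_of(0, 1).
    succ_of(1, 2).
    succ_of(2, 3).    % facts listed only up to 3 for the example

    sum(0, Y, Y).
    sum(X, Y, Z) :- succ_of(X1, X), sum(X1, Y, Z1), succ_of(Z1, Z).

    % ?- sum(2, 1, Z).
    % Z = 3.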


> And yes M. Date, that's tree/graph/foo-theory; if you don't like it,
> why not show everyone how much simpler it is to express all these things
> (or show they are not needed) in a relational data model?

They don't say that trees or graphs are useless, but that the R/DM can
express their structures and operations in simple ways. Relations can
express one-to-many and many-to-many relationships; what's the problem?

Forget SQL and think for a moment in terms of Prolog predicates (for the
reader not familiar with the subject: relational algebra and relational
calculus are equivalent under some assumptions, and Prolog Horn clauses
are closely related). Some facts, expressed with classical predicates,
that represent a tree (a genealogical tree):

    father(john, tom).
    father(john, mary).
    father(tom, nick).
    mother(mary, marc).

And now a derived predicate that "climbs" this tree, thanks to
unification and backtracking:

    grandfather(X, Y) :- father(X, Z), father(Z, Y).
    grandfather(X, Y) :- father(X, Z), mother(Z, Y).

Some goals:

    ?- father(X, nick).            % who is the father of Nick?

    X = tom

    ?- grandfather(john, Y).       % who are the grandsons of John?

    Y = nick
    Y = marc

    ?- grandfather(john, marc).    % is John the grandfather of Marc?

    Yes

    ?- grandfather(john, _).       % is John a grandfather?

    Yes

Well, I'm not fluent in Prolog and this is from memory; I have not
tested these examples. But anyone vaguely familiar with logic programming
can understand them even if they are wrong in some detail.

The representation of a maze as a graph via predicates, and its
"solution", is left as an exercise. Checkers, chess, or the Travelling
Salesman Problem (TSP), anyone?
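
The skeleton of any such exercise is the usual transitive closure over
an edge relation (a sketch; a real maze would add coordinates, walls and
a list of visited nodes to avoid cycles):

    edge(a, b).
    edge(b, c).
    edge(b, d).

    path(X, Y) :- edge(X, Y).
    path(X, Y) :- edge(X, Z), path(Z, Y).

    % ?- path(a, Where).
    % Where = b ;
    % Where = c ;
    % Where = d.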


> > "COMPLEX" DATA TYPES: WHAT OBJECT PROPONENTS DON'T TELL YOU
> > http://www.pgro.uk7.net/fp1a.htm
>
> I fail to see how an object database like, say, allegrostore,
> fails to provide every requested feature. Can you pinpoint the problem?

There is a lack of understanding here: this was intended as a reply to
your repeated, more or less explicit assumption ("map this data type, map
that data type") that the R/DM does not have "complex" domains (types).
The usual refrain is that "the R/DM lacks these complex types", which is
wrong: it is a misunderstanding of 1NF (first normal form).


> > ***the only requirement is that whatever structures are physically stored
> > must be mapped to relations at the logical level and be hidden from
> > the user***.
>
> And what if relations are clumsy for the data at hand?
> In this frequent case, the relational model is just a PITA
> that people have to work around, and the relations ends up being
> the "physical" low-level layer to a higher-level "logical" view.

Of course, the R/DM "is just a PITA" if you misunderstand it.


> > For example, suppose the implementation in some object database [...]
> > is changed from an array to a linked list. What are the implications
> > for existing code that accesses that object [...]? It breaks.
>
> No. You define new generic functions and/or you change-class your objects,
> or you update their class.

Of course, update your program and you are OK. Do you see how stupid
that is?? Data independence is a concept invented precisely to avoid
this: to avoid updating your application programs, however many or few
they are, every time you change the implementation details of your data.

The alternative is a generic, shared protocol for *all* your  collection
classes: but that is just what a relational algebra/calculus is.


> In any case, it's a problem of static vs dynamic programming,
> that static relational data schema doesn't solve.

See above: this "*static* relational data schema" is a fantasy.


> > 9. It is better to have 100 functions operate on one data structure
> > than 10 functions on 10 data structures.
>
> It is better to have a meta-level algebra to define your arbitrary functions.
> (In functional language lingo, they call that a type-system.)
> And add generics and/or dynamics to be able to manipulate arbitrary data
> the specific type of which is unknown.

The context, which you deleted, was about *collection* types (otherwise,
the R/DM and type systems are somewhat orthogonal) and their protocols:
without a common protocol you end up adding more complexity.


> So yes, Pascal-style strongly data-structures without either generics
> or dynamics is dumb and clumsy.

For the reader: he is implicitly suggesting here that, because I
professionally use an Object Pascal variant (Delphi), my knowledge is
limited to that "little universe". There is no point in engaging with
this rhetorical trick.


> > Hugh Darwen and C.J. Date in a book, "The Third Manifesto", show another
> > approach, a complete integration between the relational data model and a
> > theory of types with a proper concept of "inheritance" (this term IMO is
> > an inappropriate choice, too confused; subtyping is better). This
> > approach is incarnate in a *pedagogical* programming language, Tutorial
> > D.
>
> Well, indeed, if they want to be taken seriously,
> proponents of relational data models will have to produce
> serious programming languages that implement these models.

You  forget  that  they  are not  language  designers and  they  are not
omniscient.

Besides, the R/DM is not something new that needs "proponents": the
scientific community took Codd very seriously. The problem is the IT
industry and its malefic effect on educational institutions. That is why
they are in the business of database debunking, not of proposing.


> AP5 comes to mind - and Prevalence could be a way to add persistency to it,
> just as it could help make persistent completely different data models.

"Data model"  is a  precise technical  term. What  are these "completely
different data models"? Hierarchical and network DBMSs don't adhere to a
formal data model, graph theory,  beacuse, they say, it is  too complex.

But you don't need to take this on faith: under our noses there are both
hierarchically and network-structured data, file systems and the Web.
They are a mess *exactly* because of their *exclusive* reliance on the
navigational paradigm. As Codd pointed out more than 30 years ago in his
seminal paper:

A Relational Model of Data for Large Shared Data Banks
http://www.acm.org/classics/nov95/

    1.2.3. Access Path Dependence

    [...]

    In both the  tree and network  cases, the user  (or his program)  is
    required to exploit a collection  of user access paths to  the data.
    It does not matter whether  these paths are in close  correspondence
    with pointer-defined paths in the stored representation (in IDS the
    correspondence is extremely simple, in TDMS it is just the
    opposite). The consequence, regardless of the stored representation,
    is that  terminal activities  and programs  become dependent  on the
    continued existence of the user access paths.

    One solution to this is to adopt the policy that once a user  access
    path is defined it will  not be made obsolete until  all application
    programs using that path have become obsolete. Such a policy is  not
    practical, because the number of access paths in the total model for
    the  community  of users  of  a data  bank  would eventually  become
    excessively large.

    [...]


Regards.

--
Massimo Dentico