In defense of Prevalence
Massimo Dentico
m.dentico@virgilio.it
Fri Feb 13 10:51:01 2004
> On Friday, Feb 13, 2004 3:35 AM +0100, "Francois-Rene Rideau" wrote:
>
> > > On Thursday, Feb 12, 2004 at 4:18 PM +0100, Francois-Rene Rideau wrote:
> > > http://cliki.tunes.org/Prevalence
>
> With all this heat, the stuff should be moved to a "debate" page.
It is probably better to leave only a link to this thread.
> > On Fri, Feb 13, 2004 at 01:32:59AM +0100, Massimo Dentico wrote:
> >
> > If you like we can call a "paradigm shift" the return to an old
> > *discarded* technique, but this does not change the substance of the
> > problem.
>
> It's all about a new, higher-level look, to an old technique, journalling.
The disparaging comments are *precisely* about the pretence of doing
*data management* without an adequate data model. Even if you prefix
"higher-level" to the "old technique", journalling remains an
implementation technique; it has nothing to do with a data model.
> > I thought that one of the points of Tunes is indeed that of getting
> > rid of the "application" concept as an artificial barrier
> > to the free flow of information);
>
> Rather, Tunes is about having a more meta-level view of it.
> The whole system is a generic/higher-order/reflective application;
> it has sub-applications, characterized by a mapping to the generic framework
> as defined in a programming language or another.
> The meta-level approach means that instead of having to have
> a one-size-fits-all runtime that interprets application requests,
> we can dynamically and coherently generate ad-hoc code
> that correctly and efficiently implements application semantics.
Well, I can no longer cope with this fog: now we have "sub-applications",
so we can recursively decompose applications down to the level of a single
machine instruction? Just to save the concept of "application"? I'm
sorry, I give up; I have no time for these rhetorical tricks.
> > Answer these questions: what is more important for you *as user*, your
> > applications or your data?
>
> The whole point is that this is a false dichotomy.
> The application is what makes the data meaningful,
> and the data is what the application uses to be meaningful.
Well, I'll reveal a great secret to you: to a computer, both applications
and data are strings of bits (at some level of abstraction). The only
"meaning" that a computer can extract from such bits is the set of mutual
formal relationships that we have established, according to its "axioms"
(ultimately, the hardware). And that is enough about the data/program
duality.
What is meaningful for you *as a human being* is not Photoshop, or GIMP,
or a digital camera/web-cam/scanner/printer/display device driver, nor
file formats like JPEG, GIF, TIFF, and so on, but your collection of
images.
We don't care which applications, OSes or hardware our bank uses; we
care about our bank account and the transactions on that account. Even
though, from the point of view of the system, a transaction is a
computation, the system holds a trace of that computation as data,
because that is what is meaningful for us, the bank's human clients (so
much so that they send us a periodic statement about our account).
That is to say that we, as programmers, still think too much in terms of
processes or data-flows rather than in terms of relationships (in a
procedural rather than a declarative way).
> > Is it not true that you can change
> > algorithms, applications, data formats, operating systems, hardware but
> > what *really* interests you are your data? This is true not only for a
> > single user but even more for entire organizations.
>
> What you fail to see is that in the dynamic "application" view,
> the whole system is a meta-level application,
> and that application change, schema evolution, etc.,
> are part of the state transitions of the reflective system.
> Certainly, a static view of immutable applications is doomed;
> but so is a static view of immutable data schema.
> What is doomed is the static point of view,
> and no amount of moving around what your consider "static" will save you.
> It's displacement of problems, not resolution.
How the relational data model (R/DM) prevents schema evolution, as you
seem to believe, totally eludes me. On the contrary, it favors *data
independence* and therefore schema evolution: it isolates the conceptual
model (perhaps de-normalized), exposed to users and applications, from
the (normalized) logical model (via views, which are updatable with a
True RDBMS and proper modeling), and the logical model from the physical
(implementation) model.
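To make this less abstract, here is a rough SQL sketch (all names are
hypothetical, and a real SQL DBMS only approximates the R/DM):

  -- Logical level: normalized base relations.
  CREATE TABLE customers (
      customer_id INTEGER PRIMARY KEY,
      name        VARCHAR(80) NOT NULL
  );

  CREATE TABLE accounts (
      account_id  INTEGER PRIMARY KEY,
      customer_id INTEGER NOT NULL REFERENCES customers,
      balance     DECIMAL(12,2) NOT NULL
  );

  -- Conceptual level: a (de-normalized) view exposed to users and
  -- applications; with a True RDBMS it would also be updatable.
  CREATE VIEW customer_accounts AS
      SELECT c.customer_id, c.name, a.account_id, a.balance
      FROM   customers c
      JOIN   accounts  a ON a.customer_id = c.customer_id;

  -- Physical level: pure implementation; adding or dropping this
  -- changes nothing for users of the view or of the base tables.
  CREATE INDEX accounts_by_customer ON accounts (customer_id);

The queries and programs written against the view survive changes at the
two lower levels; that is what data independence buys you.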
Besides this, I plead guilty to "staticism": if you need to change your
schema every week, then probably you have not done a proper requirements
analysis and data modeling, so your conceptual and logical models need
continuous "massaging" (remember that I'm not speaking about
implementations: the system is free to dynamically choose whatever
implementations it likes, according to collected statistics and other
forms of feedback, even user "hints").
Of course, with proper reflective abilities you can amortize the cost of
schema evolution even on deployed and running systems, and here I'm
TOTALLY with you (and the R/DM absolutely favors this: the so-called
"meta-data", precisely the catalog, *must* be a relational database
itself). But this is not an excuse to neglect a proper data modeling
activity.
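A small illustration, using the INFORMATION_SCHEMA of standard SQL
(actual catalog names vary from DBMS to DBMS) and the hypothetical
tables sketched above:

  -- The catalog is itself relational: the schema can be inspected
  -- (and, through it, its evolution driven) with the same relational
  -- operators used for ordinary data.
  SELECT table_name, column_name, data_type
  FROM   information_schema.columns
  WHERE  table_name = 'accounts'
  ORDER  BY ordinal_position;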
> > As programmer you values your programs *as data*:
>
> Exactly - we must thus consider the development system
> as an application at the meta-level.
>
> > In particular declarative /integrity constraints/ are a key feature of a
> > True RDBMS that SQL-based DBMSs don't implement correctly (as I
> > understand it, the prevalence model lacks completely this concept).
>
> What if the integrity constraints change?
Do you understand what integrity constraints are? They are the only
means by which a DB designer declares the "meaning" of the data to the
DBMS. I don't know of a better method than *declarative* integrity
constraints to support schema evolution.
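For instance, continuing the hypothetical schema above (and remembering
that, as I said, SQL-based DBMSs implement such declarations only
imperfectly):

  CREATE TABLE transactions (
      txn_id     INTEGER PRIMARY KEY,                  -- entity integrity
      account_id INTEGER NOT NULL REFERENCES accounts, -- referential integrity
      amount     DECIMAL(12,2) NOT NULL,
      txn_date   DATE NOT NULL,
      CHECK (amount <> 0)        -- a business rule, declared once and
  );                             -- enforced for every application

The rules are stated to the DBMS, not buried in application code; that
is what "declaring the meaning of the data" amounts to.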
See "CONSTRAINTS AND PREDICATES: A BRIEF TUTORIAL"
PART 1 - http://www.dbdebunk.com/page/page/622772.htm
PART 2 - http://www.dbdebunk.com/page/page/622766.htm
PART 3 - http://www.dbdebunk.com/page/page/622764.htm
It is clear that they insist on defining a DBMS as an inference engine
based on predicate logic. Now, you can surely raise some questions about
the adequacy of such a choice for modeling the "real world" (as for the
related field of logic programming); for example, what about
monotonicity?
In "The Logical Foundations of Computer Science and Mathematics"
http://cs.wwc.edu/~aabyan/Logic/
monotonicity is defined in chapter "Non-monotonic Logics" as:
http://cs.wwc.edu/~aabyan/Logic/Nonmonotonic.html
a logic is monotonic if the truth of a proposition does not change
when new information (axioms) are added to the system
or the "Closed World Assumption":
http://cs.wwc.edu/~aabyan/Logic/CWA.html
When the semantic mapping between the language and the domain is
incomplete or even missing, it may not be possible to determine
whether a sentence is true or not. The closed world assumption is
used to provide a default solution in the absence of a better solution.
Closed world assumption: if you cannot prove P or ~P from a
knowledge base KB, add ~P to the knowledge base KB.
Where ~P means the negation of P.
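In database terms (again with the hypothetical tables above): the
absence of a row is treated as falsity, not as "unknown", and queries
rely on exactly that.

  -- Closed World Assumption, informally: a fact not recorded in the
  -- database is assumed false. "Which customers have no account?"
  SELECT c.customer_id, c.name
  FROM   customers c
  WHERE  NOT EXISTS (SELECT 1
                     FROM   accounts a
                     WHERE  a.customer_id = c.customer_id);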
You can raise such questions thanks to the fact that the relational data
model is based on a solid, scientific foundation.
Another outcome of this mathematical approach: the theory of functional
dependencies for relations and the NORMAL FORMS, conceived to solve
insert/update/delete anomalies caused by redundancies by eliminating
such redundancies in the first place.
In an OO "model" you are even unable to *perceive* these problems (which
are present, nevertheless).
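A classic illustration (hypothetical tables): storing a supplier's city
redundantly with every shipment creates exactly the anomalies that
normalization removes.

  -- Un-normalized: supplier_city depends only on supplier_id (the
  -- functional dependency supplier_id -> supplier_city), so it is
  -- repeated in every row; changing a supplier's city means updating
  -- many rows, and missing one corrupts the data.
  CREATE TABLE shipments_bad (
      supplier_id   INTEGER,
      supplier_city VARCHAR(40),
      part_id       INTEGER,
      qty           INTEGER,
      PRIMARY KEY (supplier_id, part_id)
  );

  -- Normalized: the dependency is recorded exactly once; the update
  -- anomaly cannot occur by construction.
  CREATE TABLE suppliers (
      supplier_id   INTEGER PRIMARY KEY,
      supplier_city VARCHAR(40) NOT NULL
  );

  CREATE TABLE shipments (
      supplier_id INTEGER REFERENCES suppliers,
      part_id     INTEGER,
      qty         INTEGER,
      PRIMARY KEY (supplier_id, part_id)
  );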
> A static schema is no solution.
As explained before, a relational schema is "less static" than other
kinds of "schema" (I suspect that, for some OO "data model" proponents,
every data structure implementation counts as a "schema").
> And a dynamic schema takes you at the meta-level.
Even a typical SQL-based DBMS is more metaprogrammed, dynamic and
adaptive than you think:
Feature-Oriented Programming
http://cliki.tunes.org/Feature-Oriented%20Programming
[...]
The future of software engineering is in automation. The best
practical example of automated software engineering is relational
query optimization. That is, database queries are expressed using a
declarative specification (SQL) that is mapped to an efficient
program. The general problem of mapping from a declarative spec to a
provably efficient program is very difficult -- it is called
automatic programming. The success of relational databases rests on
relational algebra -- a set of fundamental operators on tables --
and that query evaluation programs are expressions -- compositions
of relational algebra operators. Optimizers use algebraic identities
to rewrite, and hence optimize, expressions, which in turn optimizes
query evaluation programs. This is how databases "solve" the
automatic programming problem for query evaluation programs.
[...]
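For instance (a hypothetical pair of queries over the tables sketched
earlier): the optimizer may push a restriction below a join, because the
two expressions denote the same relation and one is cheaper to evaluate.

  -- The query as written...
  SELECT c.name, a.balance
  FROM   customers c
  JOIN   accounts  a ON a.customer_id = c.customer_id
  WHERE  a.balance > 1000;

  -- ...and an algebraically equivalent form the optimizer may choose,
  -- obtained by restricting before joining.
  SELECT c.name, a.balance
  FROM   customers c
  JOIN   (SELECT customer_id, balance
          FROM   accounts
          WHERE  balance > 1000) AS a
         ON a.customer_id = c.customer_id;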
> Now, with a meta-level at hand, you can meta-use prevalence directly,
> and skip the need for a complex RDBMS.
Again the logical-physical confusion: an implementation technique cannot
be a surrogate for a conceptual/logical model.
> And no, prevalence doesn't "lack" the concept of integrity constraint;
> rather, it takes an algebra of integrity-preserving transactions
> as *a parameter* to the functor that provides persistence to the application.
> And a programmable meta-level allows you express your constraints
> and automate their enforcement in what you feed to the prevalence functor.
That may all be clear in your mind. But without proper education,
programmers and even DB administrators ignore the concept of integrity
constraints, even though constraints are more or less available in
SQL-based DBMSs (usually badly implemented; but if you understand what a
constraint is, you can cope with the shortcomings of these
implementations).
> Of course you may misspecify your algebra and your meta-programs;
> just like you may misspecify your RDBMS constraints. Doh.
A straw man here: I have never said that the relational data model or a
True RDBMS solves every data modeling problem.
> Where Prevalence+reflection frees you is that you can now tailor
> your data representation algebra to your applicative needs,
> instead of the other way around.
Oh, that's practical: let's invent a new algebra for every application's
needs. Why did Codd bother us with his single, stupid algebra (and the
two equivalent calculi)? He was clearly a cultist of the Church of
Staticism!
> > [...] data model:
> > · data types
> > · structure
> > · integrity
> > · manipulation
>
> Prevalence + reflection allows to decouple all these,
> where a static RDBMS gives you a one-size-fits-all combination.
How prevalence can decouple something which is alien to it is a mystery:
an implementation technique is NOT a data model, by definition.
Prevalence is an implementation technique (or do you think that it is a
data model?).
> > Renouncing one only of these components
> > means to delegate such component to each specific application, with
> > possible redundancies, incompatibilities, conflicts, corruption and
> > loss of data, difficulties in data access, etc ..
>
> This implicitly supposes that there exist no other way
> but centralization through the DBMS
> to achieve factoring of such components accross applications.
Wrong. See below.
> Now, the very core belief of TUNES is that factoring is best done
> through open systems with a good meta-level than through centralization.
> Yes, there is a need for strong consistency-enforcement;
> static thinkers see that as an argument for a centralizing kernel;
> dynamic thinkers see that as an argument for declarative programming.
Declarative programming is *exactly* what the R/DM favors. See
WHAT NOT HOW - THE BUSINESS RULES APPROACH TO APPLICATION DEVELOPMENT,
by Chris Date (a book)
http://www.dbdebunk.com/page/page/623335.htm
Your bunch of data structures linked with pointers in a typical
imperative OO programming language, for which the prevalence technique
was developed, is certainly NOT declarative programming.
Even in SQL-based DBMSs, stored procedures must frequently be used for
integrity because the correct declarative solution is not supported. A
sad state of affairs.
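An example of what I mean (a hypothetical rule over the schema above;
CREATE ASSERTION is in the SQL standard but almost no DBMS supports it,
hence the procedural workarounds):

  -- The declarative form: a database-wide rule, stated once.
  CREATE ASSERTION no_overdrawn_customers CHECK (
      NOT EXISTS (SELECT 1
                  FROM   accounts
                  GROUP  BY customer_id
                  HAVING SUM(balance) < 0)
  );

  -- What people write instead, for lack of support: triggers or stored
  -- procedures that re-check the rule on every INSERT/UPDATE of
  -- accounts, procedurally and error-prone.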
> And indeed, in the end, the kernel will have to act upon declarations,
> but will end up interpreting them in a clumsy paranoid way at runtime.
> The open reflective system will compile these declarations before link-time,
> so they are efficiently hardwired or optimized away at runtime.
You clearly ignore everything I have written on the subject; see
"DataBase Debunking: clarifications"
http://lists.tunes.org/archives/review/2002-June/000174.html
where Tom Novelli *appropriately* replied to my initial message with this:
> Hmm, that's kind of interesting. Nothing says a database has
> to be a crappy monolithic implementation like Oracle.. it could
> be simple and purpose-built, work with data in RAM where
> it should, etc.
I replied in agreement -- ".. So there are no reasons to see a DBMS as a
black-box. .." (see the message for the explanation; in brief: again the
theme of physical and logical independence) -- and mentioned the concept
of Open Implementation. The relational data model, being a general
mathematical theory of data, is *mute* about implementation details.
> > [...] All Requirements are Data Requirements
>
> And dually, are application requirements.
> That's just a shift in point of view.
> But the meta-level shift is about seeing the previous shift,
> and giving control over it to the developers.
> So yes, data modelling is an important part of application development,
> but it's as wrong to fixate things on this point of view,
> as to fixate things on algorithms and neglect data.
>
> > This merit a comment apart: so you argue that because ".. few people
> > encode the structure of 3D objects or music samples in relational
> > tables" then the relational data model "needn't be the right one[sic]"?
> > Well, that's science! What are the better alternatives? Hierarchical or
> > network structures (graph theory)? When a data model is "the right one"?
>
> Well, sometimes, sets, streams, trees, fifos, soups, matrixes,
> hash-tables, pre-sorted binary decision trees, vertex arrays,
> or whatever, are the right data-structure.
And the R/DM does NOT prevent the use of these data structures at the
*implementation level*! They can be selected automatically, as detailed
above, according to the actual usage of the data.
> Why are strings not represented as tables in DBMSes?
First, a "table" is a *representations* of a relational variable (rel
-var) value: a snapshot of the tuples in a relation at a particular
point in time.
Second, nothing can prevent you to model strings of text as rel-vars,
for a particular purpose (text retrieval? A text/document editor? An e
-mail client?) The R/DM does not care about implementations so you (or,
better, your meta-program) can implement rel-vars for this purpose with
an adequate, efficient data structure.
See:
Encapsulation Is a Red Herring
by C.J.Date
http://www.dbpd.com/vault/9809date.html
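A deliberately naive sketch of what "strings as rel-vars" could mean
(hypothetical; the point is the logical view, not the storage):

  -- A document's text modeled logically as character positions; the
  -- DBMS remains free to store it physically as a plain byte array,
  -- a rope, or anything else.
  CREATE TABLE document_chars (
      doc_id INTEGER,
      pos    INTEGER CHECK (pos > 0),
      ch     CHAR(1) NOT NULL,
      PRIMARY KEY (doc_id, pos)
  );

  -- Positional questions become ordinary queries instead of calls to
  -- an ad-hoc string library.
  SELECT pos
  FROM   document_chars
  WHERE  doc_id = 1 AND ch = ' '
  ORDER  BY pos;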
> Why a need for bobs and blobs at all?
Please, stop thinking in terms of SQL-based DBMSs.
You seem to fail to distinguish the notion of domain (type) from that of
relation: if, at some point, you are not interested in the structure of
a particular piece of data, then you create a domain (a type) with
operations on it. On the contrary, if you are interested in the
structure, then you model it as relation(s).
Of course, you can always change your mind and decompose a type into one
or more relations, ensuring the appropriate integrity constraints. And
if you don't drop the original type, this has no impact on applications,
user operations and other parts of the database.
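Roughly, in SQL terms (hypothetical definitions; CREATE DOMAIN is
standard SQL but unevenly supported):

  -- Not interested in the internal structure of a value? Make it a
  -- domain (type) with its own constraint and operations.
  CREATE DOMAIN temperature AS DECIMAL(5,2) CHECK (VALUE > -273.15);

  -- Interested in the structure? Model it as a relation, so that the
  -- DBMS can enforce constraints over its components.
  CREATE TABLE readings (
      sensor_id INTEGER,
      taken_at  TIMESTAMP,
      temp      temperature NOT NULL,
      PRIMARY KEY (sensor_id, taken_at)
  );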
> In typed functional languages, even numbers and characters can be defined
> in the internal type-system, and those provided by the system can be seen
> as just optimization from the standard library rather than additional
> features unexpressible in the programming algebra.
A theoretical exercise that you can do even with the R/DM.
See ON SIMPLICITY OF RELATIONAL OPERATORS
with Chris Date
http://www.dbdebunk.com/page/page/627049.htm
[...]
On pages 92-98 of the same book, we show how the more conventional
operators of the relational algebra can all be expressed in terms of
A. In particular, we show how to deal with *numbers and ordinary
arithmetic in that framework*.
[...]
Where A is a "minimal relational algebra".
> And yes M. Date, that's tree/graph/foo-theory; if you don't like it,
> why not show everyone how much simpler it is to express all these things
> (or show they are not needed) in a relational data model?
They don't say that trees or graphs are useless, but that the R/DM can
express their structures and operations in simple ways. Relations can
express one-to-many and many-to-many relationships; what's the problem?
Forget SQL and think for a moment in terms of Prolog predicates (for the
reader not familiar with the subject: the relational algebra and the
relational calculi are equivalent under some assumptions, and Prolog
Horn clauses are closely related). Some facts, expressed with classical
predicates, that represent a tree (a genealogical tree):
father(john, tom).
father(john, mary).
father(tom, nick).
mother(mary, marc).
And now a derived predicate that "climbs" this tree, thanks to
unification and backtracking:
grandfather(X, Y) :- father(X, Z), father(Z, Y).
grandfather(X, Y) :- father(X, Z), mother(Z, Y).
Some goals:
?- father(X, nick).         % who is the father of Nick?
X = tom
?- grandfather(john, Y).    % who are the grandchildren of John?
Y = nick
Y = marc
?- grandfather(john, marc). % is John the grandfather of Marc?
yes
?- grandfather(john, _).    % is John a grandfather?
yes
Well, I'm not fluent in Prolog and this is from memory; I have not
tested these examples. But anyone vaguely familiar with logic
programming can follow them even if they contain mistakes.
The representation of a maze as a graph via predicates, and its
"solution", is left as an exercise. Checkers, chess, or the Travelling
Salesman Problem (TSP), anyone?
> > "COMPLEX" DATA TYPES: WHAT OBJECT PROPONENTS DON'T TELL YOU
> > http://www.pgro.uk7.net/fp1a.htm
>
> I fail to see how an object database like, say, allegrostore,
> fails to provide every requested feature. Can you pinpoint the problem?
A misunderstanding here: this was intended as a reply to your repeated,
more or less explicit assumption ("map this data type, map that data
type") that the R/DM has no "complex" domains (types). The usual refrain
is that "the R/DM lacks these complex types", which is wrong: it is a
misunderstanding of 1NF (first normal form).
> > ***the only requirement is that whatever structures are physically stored
> > must be mapped to relations at the logical level and be hidden from
> > the user***.
>
> And what if relations are clumsy for the data at hand?
> In this frequent case, the relational model is just a PITA
> that people have to work around, and the relations ends up being
> the "physical" low-level layer to a higher-level "logical" view.
Of course, the R/DM "is just a PITA" if you misunderstand it.
> > For example, suppose the implementation in some object database [...]
> > is changed from an array to a linked list. What are the implications
> > for existing code that accesses that object [...]? It breaks.
>
> No. You define new generic functions and/or you change-class your objects,
> or you update their class.
Of course: update your program and you are OK. Do you see how stupid
that is? Data independence is a concept invented precisely to avoid
this: to avoid updating your application programs, however many or few
they are, every time you change the implementation details of your data.
The alternative is a generic, shared protocol for *all* your collection
classes: but that is exactly what a relational algebra/calculus is.
> In any case, it's a problem of static vs dynamic programming,
> that static relational data schema doesn't solve.
See above: this "*static* relational data schema" is a fantasy.
> > 9. It is better to have 100 functions operate on one data structure
> > than 10 functions on 10 data structures.
>
> It is better to have a meta-level algebra to define your arbitrary functions.
> (In functional language lingo, they call that a type-system.)
> And add generics and/or dynamics to be able to manipulate arbitrary data
> the specific type of which is unknown.
The context, which you deleted, was about *collection* types (otherwise,
the R/DM and type systems are somewhat orthogonal) and their protocols:
without a common protocol you end up adding more complexity.
> So yes, Pascal-style strongly data-structures without either generics
> or dynamics is dumb and clumsy.
For the reader: he is implicitly suggesting here that, because I
professionally use an Object Pascal variant (Delphi), my knowledge is
limited to that "little universe". There is no point in engaging with
this rhetorical trick.
> > Hugh Darwen and C.J. Date in a book, "The Third Manifesto", show another
> > approach, a complete integration between the relational data model and a
> > theory of types with a proper concept of "inheritance" (this term IMO is
> > an inappropriate choice, too confused; subtyping is better). This
> > approach is incarnate in a *pedagogical* programming language, Tutorial
> > D.
>
> Well, indeed, if they want to be taken seriously,
> proponents of relational data models will have to produce
> serious programming languages that implement these models.
You forget that they are not language designers and they are not
omniscient.
Besides, the R/DM is not something new that needs "proponents": the
scientific community took Codd very seriously. The problem is the IT
industry and its malefic effect on educational institutions. That is why
they are in the business of database debunking, not of proposing.
> AP5 comes to mind - and Prevalence could be a way to add persistency to it,
> just as it could help make persistent completely different data models.
"Data model" is a precise technical term. What are these "completely
different data models"? Hierarchical and network DBMSs don't adhere to a
formal data model, graph theory, beacuse, they say, it is too complex.
But you don't need to trust them on faith: under our noses there are
both hierarchical and network structured data, File Systems and the Web.
They are a mess *exactly* because of their *exclusively* reliance on
the navigational paradigm. As pointed out by Codd more than 30 years ago
in his seminal paper:
A Relational Model of Data for Large Shared Data Banks
http://www.acm.org/classics/nov95/
1.2.3. Access Path Dependence
[...]
In both the tree and network cases, the user (or his program) is
required to exploit a collection of user access paths to the data.
It does not matter whether these paths are in close correspondence
with pointer-defined paths in the stored representation (in IDS
the correspondence is extremely simple, in TDMS it is just the
opposite). The consequence, regardless of the stored representation,
is that terminal activities and programs become dependent on the
continued existence of the user access paths.
One solution to this is to adopt the policy that once a user access
path is defined it will not be made obsolete until all application
programs using that path have become obsolete. Such a policy is not
practical, because the number of access paths in the total model for
the community of users of a data bank would eventually become
excessively large.
[...]
Regards.
--
Massimo Dentico