[TDP] Alternative to mark-up languages

Massimo Dentico m.dentico at virgilio.it
Wed Mar 28 18:00:24 PDT 2007


Hello Tunespeople.

Tom, I will reply to only part of your e-mail; I'm already
short on sleep.


Tom Novelli wrote:

> Now, let's see if I understand your proposal:

You got the big picture or, more precisely, the small
fragment of it that I wrote about.


> 2. Deal with _meaningful_ patterns in the text
> (don't do formatting/styling)

More precisely: style is, for the most part, an annotation
applied automatically to parsed meaningful patterns (ad-hoc
annotation is also possible).

An example: LAPIS has a named pattern

  Business/Address/State

You can attach a style to this named pattern (in fact,
my idea is that you can attach whatever annotation you
want to this named pattern) so that ALL occurrences of
ALL parsed states are styled in a certain way.
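To make the idea a little more concrete, here is a minimal Python
sketch. It assumes, purely for illustration, that a named pattern
can be approximated by a regular expression (real LAPIS patterns
are much richer than that); the NamedPattern class, the regex and
the style dictionary are all hypothetical. The annotation is
attached once, to the pattern itself, and every match in the text
inherits it.

```python
import re
from dataclasses import dataclass, field


@dataclass
class NamedPattern:
    """Hypothetical stand-in for a LAPIS named pattern (a regex, for simplicity)."""
    name: str
    regex: re.Pattern
    annotations: dict = field(default_factory=dict)  # any key -> any value

    def annotate(self, key, value):
        """Attach an annotation (a style, or anything else) to the pattern itself."""
        self.annotations[key] = value

    def occurrences(self, text):
        """Yield (start, end, matched text, annotations) for EVERY match in text."""
        for m in self.regex.finditer(text):
            yield m.start(), m.end(), m.group(), self.annotations


# Hypothetical heuristic: a state is two capital letters right after ", ".
state = NamedPattern("Business/Address/State", re.compile(r"(?<=, )[A-Z]{2}\b"))
state.annotate("style", {"font-weight": "bold"})

text = "Pittsburgh, PA 15213 and Seattle, WA 98195"
spans = list(state.occurrences(text))
for start, end, word, notes in spans:
    # A Terminal is free to use notes["style"] or to ignore it.
    print(start, end, word, notes.get("style"))
```

Because the style lives on the pattern rather than on each match,
changing it once restyles every occurrence.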

A Terminal (http://tunes.org/wiki/Terminal) can choose
whether or not to apply your style(s) (think of a better CSS
that, moreover, doesn't need to refer to HTML/XML/... tags).

Concretely, an annotation is implemented as a primary key /
foreign key relationship (I cannot go into more detail here;
we need to analyze more requirements and try some alternative
schema designs).
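Since the schema is still open, the following is only one
hypothetical shape for it, sketched with Python's built-in sqlite3
because it is convenient to run; the table and column names are
invented. Each annotation row carries a foreign key to the primary
key of the pattern it annotates. Note that SQLite enforces declared
foreign keys only from version 3.6.19 on, and only after
PRAGMA foreign_keys = ON, so this does not apply to the 3.3.x
versions discussed further on.

```python
import sqlite3

# Hypothetical schema: annotation.pattern_name is a foreign key
# referencing the primary key of the pattern it annotates.
SCHEMA = """
CREATE TABLE pattern (
    name TEXT PRIMARY KEY                -- e.g. 'Business/Address/State'
);
CREATE TABLE annotation (
    pattern_name TEXT NOT NULL REFERENCES pattern(name),
    key          TEXT NOT NULL,          -- e.g. 'style'
    value        TEXT NOT NULL,
    PRIMARY KEY (pattern_name, key)
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    # Enforcement is opt-in per connection (SQLite >= 3.6.19).
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def annotations_for(conn, pattern_name):
    """Return all annotations attached to one named pattern as a dict."""
    rows = conn.execute(
        "SELECT key, value FROM annotation WHERE pattern_name = ? ORDER BY key",
        (pattern_name,),
    )
    return dict(rows.fetchall())


conn = open_db()
conn.execute("INSERT INTO pattern VALUES ('Business/Address/State')")
conn.execute(
    "INSERT INTO annotation VALUES ('Business/Address/State', 'style', 'font-weight: bold')"
)
print(annotations_for(conn, "Business/Address/State"))

# An annotation pointing at a pattern that does not exist is rejected.
try:
    conn.execute("INSERT INTO annotation VALUES ('No/Such/Pattern', 'style', 'x')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```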

Apropos of CSS, another fragment (a reminder, so that
I don't forget it):

  UW Constraint-Based Systems
    http://www.cs.washington.edu/research/constraints/index.html

  Constraints and the Web
    http://www.cs.washington.edu/research/constraints/web/

  A Constraint-Based Specification for Box Layout in CSS2
    http://www.cs.washington.edu/research/constraints/web/css2.html

  Abstract

  Cascading Style Sheets provide a flexible mechanism for
  governing the appearance of Web pages. Cascading Style
  Sheets Level 2 (CSS2) are an enhancement to the original
  CSS1 specification, giving Web page designers additional
  control over the appearance of Web pages. However, the
  CSS2 specification is written in English, leaving open the
  possibility of ambiguity or inconsistency. We present a
  formalization of a subset of the CSS2 specification using
  constraints hierarchies to help ensure that potential
  problems in the specification are caught and corrected. We
  also comment on the formalization process.
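For readers unfamiliar with constraint hierarchies, here is a toy
Python sketch; it is not the formalization from the paper. A
required constraint must hold exactly, while weaker levels are
satisfied as well as possible, and any improvement at a stronger
level beats every weaker one (here done by comparing per-level
errors lexicographically). The simplified horizontal box model
(left margin + width + right margin = containing-block width, a
"strong" preference for the specified width, a "weak" preference
for equal margins, no negative margins) is an assumption for
illustration, and the brute-force search over integer pixel values
is only for clarity.

```python
from itertools import product


def solve(container_width, specified_width):
    """Brute-force a tiny constraint hierarchy over integer pixel values."""
    best, best_cost = None, None
    for ml, w in product(range(container_width + 1), repeat=2):
        mr = container_width - ml - w      # REQUIRED: ml + w + mr == container_width
        if mr < 0:
            continue                       # toy rule: no negative margins
        # Errors per level, strongest first. Tuples compare lexicographically,
        # so a smaller strong-level error always wins over the weak level.
        cost = (abs(w - specified_width),  # STRONG: keep the specified width
                abs(ml - mr))              # WEAK: centre the box
        if best_cost is None or cost < best_cost:
            best, best_cost = (ml, w, mr), cost
    return best


print(solve(500, 200))  # (150, 200, 150)
```

When the specified width cannot fit (say, 200 in a 100-pixel
container), the strong preference is violated as little as
possible and the weak one is then satisfied within what remains.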


> 4. Use an advanced decentralized RDBMS.
> (Not SQL... Is this relational Python thing, Dee, more helpful?)

As for SQL-based DBMSs, I do not exclude them /a priori/ as a
stopgap, and I haven't looked deeply into Dee yet. These are
options we need to consider.

But I don't consider SQLite a good option, at least /as is/.

There is a list of features not implemented in SQLite here:

  http://sqlite.org/omitted.html

The first is FOREIGN KEY constraints. Not a good sign.
CHECK is supported; see CREATE TABLE here:

  http://www.sqlite.org/lang_createtable.html

so, unless it is severely limited, FOREIGN KEY can be
emulated (though with too much contortion for my taste).
Also note this (same page):

  CHECK constraints are supported as of version 3.3.0.
  Prior to version 3.3.0, CHECK constraints were parsed
  but not enforced.

A relatively new feature, then, since the current version is
3.3.13. This tells me that its author (Richard Hipp) thinks
that declarative constraint checking (data integrity) is not
essential.

I don't know whether, with care and a small wrapper around
it, it can serve us decently.


> A COUPLE OF OTHER THINGS
>
> XML, I think, is a grey area.  It's cleaner and more flexible than
> HTML, on one hand, but the markup format sucks.  Is that your only
> complaint?

You underestimate the problem. I will try again, hoping to be
more convincing this time.

The W3C is trying to push XML for EVERYTHING. See for
yourself:

  World Wide Web Consortium
  http://www.w3.org/

XML = absurd complication (which I define as absolutely
      useless complexity).

Quoting our documentation:

  The TUNES Interfaces Subproject
    http://tunes.org/Interfaces/

  ...

  Principles

  Terminal-Independance

  ...

  Later systems have promoted the idea of "object-oriented"
  (say, CORBA) or recursive name-attribute-laden media (XML,
  Schema, XSLT, DAML/OIL, Ontologies), but unfortunately
  using a very poor model of object and attribute which is
  often ***very wasteful and requires extra effort on both
  sides of the medium (not just computationally, but for the
  designers / specifiers as well***). Generally, recent
  solutions have had this flavor of reducing both sides to a
  less-efficient and less-expressive medium at the
  computational and definitional cost to both sides.

Emphasis added. I want to stress this: inefficiency for both
machines and humans.

That passage is Brian's, updating what Faré wrote on the
subject. I completely agree (I don't think he has changed his
mind on this), and in fact my collection of quotations
concurs:

  http://tunes.org/wiki/TUNES_vs_the_WWW

Even the W3C was forced to admit that XML is inefficient
(unfortunately, they only admitted to some machine
inefficiency, nothing about the human side; see the
XML Binary Characterization Working Group):

  http://www.w3.org/XML/EXI/

Let me add a quotation from Victoria Livschitz:

  The Next Move in Programming: A Conversation
  with Sun's Victoria Livschitz
    http://java.sun.com/developer/technicalArticles/Interviews/livschitz_qa.html

  The world has gone crazy with XML and then web services;
  SOAP and UDDI are getting enormous attention, and, yet,
  from a software engineering standpoint, they seem to me a
  setback rather then a step forward.

  We now have a generation of young programmers who think of
  software in terms of angle brackets. An enormous mess of
  XML documents that are now being created by enterprises at
  an alarming rate will be haunting our industry for
  decades. With all that excitement, no one seems to have
  the slightest interest in basic computer science. Still,
  there must be people out there who think differently.

This means thinking individuals and small groups like us.

For entities with a vested interest in these "standards",
complication means power & money.

Joel Spolsky calls such a strategy "fire and motion". Please
read what he wrote, at least from the passage quoted here to
the end of the article:

  Fire And Motion
  by Joel Spolsky
    http://www.joelonsoftware.com/articles/fog0000000339.html

  ...

  "When I was an Israeli paratrooper a general stopped by to
  give us a little speech about strategy. In infantry
  battles, he told us, there is only one strategy: Fire and
  Motion. You move towards the enemy while firing your
  weapon. The firing forces him to keep his head down so he
  can't fire at you."

or at least until he mentions XML / SOAP / CDE / J2EE.

I cannot quote more here. See his rules:

  Linking, Quoting, Reprinting
  http://www.joelonsoftware.com/Linking.html

What he wrote holds not only for small businesses; it is even
more true for free software projects like ours.


CONCLUSION

Obviously we cannot hope to subvert the Web overnight, so we
need a *good* migration path. Besides, we need to cite a lot
of material from the Web.

But consider: for importing or wrapping, we can use existing
libraries such as

  HTML Tidy Library Project
    http://tidy.sourceforge.net/

Specifically:

  Document Tree
    http://tidy.sourceforge.net/docs/api/group__Tree.html

  ...

  Detailed Description

  A parsed and, optionally, repaired document is represented
  by Tidy as a Tree, much like a W3C DOM. This tree may be
  traversed using these functions. ...

To export, we don't need to re-parse anything; we only need
to pretty-print the tree to whatever format we want.
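To illustrate exporting without re-parsing, here is a minimal
Python sketch. The Node class is a hypothetical in-memory tree
(not Tidy's API, whose nodes are C structures); the printer simply
walks it once and emits indented HTML with text escaped. A second
printer over the same tree could emit any other format.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Union


@dataclass
class Node:
    tag: str                                      # hypothetical element name
    attrs: dict = field(default_factory=dict)
    children: List[Union["Node", str]] = field(default_factory=list)


def to_html(node, indent=0):
    """Pretty-print a tree as indented HTML; text is escaped, nothing is parsed."""
    pad = "  " * indent
    if isinstance(node, str):
        return pad + escape(node)
    attrs = "".join(f' {k}="{escape(str(v), quote=True)}"' for k, v in node.attrs.items())
    if not node.children:
        return f"{pad}<{node.tag}{attrs}></{node.tag}>"
    inner = "\n".join(to_html(child, indent + 1) for child in node.children)
    return f"{pad}<{node.tag}{attrs}>\n{inner}\n{pad}</{node.tag}>"


doc = Node("p", {"class": "address"}, ["Pittsburgh, ", Node("b", children=["PA"])])
print(to_html(doc))
```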

And anyway... data are more secure and accessible in a decent
DBMS, with integrity, backup, replication, ODBC... as standard
features.


> I've added links to old content.  The old index.html is gone,
> but sitemap.html is the real "portal" to the old site.
> Let me know if I missed anything else.

Ok, thanks; it seems nothing is missing. This will do for the moment.


Best regards.

--
Massimo Dentico







More information about the TUNES mailing list