On Tunes Distributed Publishing (TDP) (was: Tunes)

Tue Mar 20 21:23:54 PDT 2007

> Massimo, welcome back!

Thanks, Tom.

On 3/20/07, Tom Novelli wrote:
> ...
>
> TDP hasn't gotten the attention it deserves.  Right now it would be
> really nice to have a general mechanism for version control and
> annotations.  I know what I want, but I don't have the answers, so I'm
> looking to someone with more expertise in this area.

I'm certainly NOT an "expert", but I have some proposals.
In what follows there is enough food for eager contributors.

w.r.t. version control: we can find inspiration and ideas in
literature regarding, of course, Version Control Systems (VCS)
and Software Configuration Management (SCM). A couple of
starting points:

  Streamed Lines: Branching Patterns for Parallel Software Development
    http://www.cmcrossroads.com/bradapp/acme/branching/references.html

  Configuration Management with Version Sets
    http://www.infosun.fmi.uni-passau.de/st/papers/zeller-phd/

There are relevant papers & software about synchronization here:

  Unison
    http://www.cis.upenn.edu/~bcpierce/unison/index.html

  Harmony
    http://www.seas.upenn.edu/~harmony/

  Benjamin C. Pierce's Papers
    http://www.cis.upenn.edu/~bcpierce/papers/index.shtml
    ignore any occurrence of "XML", read "tree" instead
    notably:
    - Relational Lenses: A Language for Updateable Views
    - Agreeing to Agree: Conflict Resolution for Optimistically Replicated Data
    - Exploiting Schemas in Data Synchronization
    - A Formal Investigation of Diff3
    - The Weird World of Bi-Directional Programming
    .. and so on.

Other relevant keywords for versioning are distributed DBMS & FS,
disconnected operation/update...

w.r.t. annotations: they are not special, a general mechanism for reference
suffices; a primary-key / foreign-key constraint is a good solution.
The general case (distributed/decentralized) needs more thought.
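
A minimal sketch of the primary-key / foreign-key idea for the simple
(non-distributed) case, using Python's standard sqlite3 module. The table
and column names (fragment, annotation, fragment_id) are hypothetical,
chosen only for illustration; nothing here is an existing TDP design.

```python
import sqlite3

def build_demo():
    """Create an in-memory database with fragments and annotations."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE fragment (
            id   INTEGER PRIMARY KEY,
            body TEXT NOT NULL
        );
        -- An annotation is an ordinary row that references a fragment.
        CREATE TABLE annotation (
            id          INTEGER PRIMARY KEY,
            fragment_id INTEGER NOT NULL REFERENCES fragment(id),
            note        TEXT NOT NULL
        );
    """)
    conn.execute("INSERT INTO fragment VALUES (1, 'Joins are not really a problem.')")
    conn.execute("INSERT INTO annotation VALUES (1, 1, 'See AutoJoin.')")
    return conn

def annotations_for(conn, fragment_id):
    """Return the notes attached to one fragment, oldest first."""
    rows = conn.execute(
        "SELECT note FROM annotation WHERE fragment_id = ? ORDER BY id",
        (fragment_id,),
    )
    return [note for (note,) in rows]

def dangling_annotation_rejected(conn):
    """Try to annotate a fragment that does not exist (id 99 is made up)."""
    try:
        conn.execute("INSERT INTO annotation VALUES (2, 99, 'orphan')")
    except sqlite3.IntegrityError:
        return True
    return False
```

The constraint is what keeps an annotation from pointing at nothing: the
DBMS refuses the dangling reference instead of leaving it to the application.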

Joins are not really a problem in this setting, contrary to the
usual objection. Hints: AutoJoin, various degrees
of independence from the logical schema, Conceptual Queries, ...

  Ramon Lawrence publications:
    http://people.ok.ubc.ca/rlawrenc/research/publications.html
    as above, ignore "XML"
    - AutoJoin: Providing Freedom from Specifying Joins
    - Querying Relational Databases without Explicit Joins
    - ...

  INFER: A Relational Query Language without the Complexity of SQL
    http://portal.acm.org/citation.cfm?id=1099607

  Autojoin: A Simple Rule Based Query Service for Complex Databases
    http://adsabs.harvard.edu/abs/2003ASPC..295..287G

  No-Schema SQL (NS-SQL): Querying Relational Databases Independent of Schema
    http://www.cs.uiuc.edu/class/fa06/cs511/nssql.pdf

  Conceptual Queries
    http://www.orm.net/pdf/ConceptQueries.pdf

  ...

     me: "please, say 'STOP' here"
you all: "STOP!" (loudly)
     me: "thank you".

> What would TDP look like?  Something like an XML DOM tree, without the
> stupid markup, perhaps?  "Links" would be references to objects,
> ranging in size from single letters to book-length.  But would you
> reference a certain version "frozen in time", or the current version?
> And if it's gone, how would the system try to handle it gracefully?
> And finally, would you store it all in a relational database, or what?
> ....... Just some "food for thought."

Too long to reply here in full. So, only quick shots:

> What would TDP look like?

In the long run a fully distributed/decentralized system (peer-to-peer).

Note that "publishing" is about making public anything, not only text.
It is a special case of migration:

  http://tunes.org/wiki/Migration

> Something like an XML DOM tree, without the stupid markup, perhaps?

Also, without imposing a single, specific, one-level structure
("document") on all content.

Think "fragments", "pieces", "chunks" of information that it is
possible to re-combine, mix, mash up, transclude, transform...
into multiple structures.

This is because "content" (text, for example) possesses structure
at different levels (characters, syllables, words, phrases,
paragraphs, chapters, volumes, tomes, books...). So, the distinction
between content and structure is relative, at least, to a particular level.
(I will use "content" for convenience here, even if I find the
term sloppy).

It is possible, for example, to assemble fragments of text from
different texts (citing & quoting) but it is also possible to re-assemble
fragments of the same text in new ways (different structures).
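
A minimal sketch of fragments shared by several structures, in Python.
All names here (fragments, structures, render) are hypothetical and only
illustrate the idea: a structure holds references to fragments or to other
structures, never copies of the text.

```python
# Fragments are stored once, keyed by an identifier.
fragments = {
    "f1": "Content possesses structure at different levels.",
    "f2": "Fragments can be quoted from other texts.",
    "f3": "Fragments can be re-assembled in new ways.",
}

# A structure is an ordered list of references. Each reference names
# either a fragment or another structure, so structures can nest.
structures = {
    "original": ["f1", "f2", "f3"],
    "summary": ["f1", "f3"],          # same fragments, fewer of them
    "reversed": ["f3", "f2", "f1"],   # same fragments, new order
    "digest": ["summary", "f2"],      # a structure built from a structure
}

def render(name):
    """Resolve a reference to text, following nested structures."""
    if name in fragments:
        return fragments[name]
    return " ".join(render(part) for part in structures[name])
```

Here render("summary") and render("reversed") draw on the same stored
fragments; what is "content" at one level ("summary" inside "digest") is
"structure" at another.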

Content, structure or both could be transformed (automatic
summarization, for example).

Meta-text, meta-browsing, tactics... I think these recombinations
and transformations are what Faré had in mind back when he wrote
(pure hermeneutics of Faré... shocking!):

  Tunes Distributed Publishing
    http://tunes.org/Interfaces/tunesvswww.html

Some references:

  NetBook - a data model to support knowledge exploration
    http://www.vldb.org/dblp/db/conf/vldb/Shasha85.html

    Knowledge exploration is the activity of finding out what
    other people have thought about. Normally, people explore
    knowledge by reading books or articles or by talking to other
    people. This paper discusses an alternative approach: a system
    whose knowledge is in the form of text fragments plus a
    query language to help users access appropriate fragments.
    Drawing primary inspiration from database theory, hypertext
    systems, knowledge representation, and a study of textual
    fragments called fragment theory, the paper describes and
    motivates a data model to support knowledge exploration.

  Towards a more piece-ful world
    http://portal.acm.org/citation.cfm?id=954202

    We envision a world in which we can develop, synthesize,
    adapt, integrate, and evolve software based on high-
    quality, perpetually flexible pieces. New pieces may be
    produced by generation, adaptation of existing pieces, or
    integration of pieces, and this process of "pieceware"
    engineering continues -- statically or dynamically -- until a
    piece with the desired capabilities and properties is
    synthesized. The pieces themselves may comprise fragments
    of requirements, models, architectures, patterns, designs,
    code, tests, and/or any other relevant software artifacts.

> "Links" would be references to objects,
> ranging in size from single letters to book-length.

Yes (but not really URLs, of course).

> But would you reference a certain version "frozen in time",
> or the current version?

Both. Let users choose (this is nothing new, to be honest
-- see wikis).
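
A minimal sketch of the two kinds of reference, in Python. The names
(Ref, VersionedStore, publish, resolve) are hypothetical: a reference
either pins a version number ("frozen in time") or leaves it unset and
follows the current version.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Ref:
    """Reference to an object; version=None means 'whatever is current'."""
    oid: str
    version: Optional[int] = None

class VersionedStore:
    def __init__(self):
        # object id -> list of published versions, oldest first
        self._history = {}

    def publish(self, oid, content):
        """Append a new version and return its version number."""
        versions = self._history.setdefault(oid, [])
        versions.append(content)
        return len(versions) - 1

    def resolve(self, ref):
        """Return the pinned version, or the latest one for a floating ref."""
        versions = self._history[ref.oid]
        if ref.version is None:
            return versions[-1]
        return versions[ref.version]
```

Usage, under the same assumptions: after publishing "draft" and then
"final" under one id, Ref(id, 0) keeps resolving to "draft" while Ref(id)
resolves to "final". Old versions are never overwritten, which is what
makes the pinned reference possible.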

> And if it's gone, how would the system try to handle it gracefully?

Once published, do not let it disappear:

  Resilient bulk data distribution - increasing data
  survivability in volatile peer-to-peer networks
    http://www.nada.kth.se/utbildning/grukth/exjobb/rapportlistor/2006/rapporter06/mattsson_andreas_06013.pdf

> And finally, would you store it all in a relational database, or what?

In a Relational DBMS, SQL DBMSs being a bad approximation.

Nota bene: what I propose has nothing to do with using logic
to model Natural Languages (NLs). Relational Algebra (RA) is
used only for data management purposes, to ease the creation and
manipulation of structure while preserving its integrity.
Everything else belongs to upper levels.

As I showed you in a private e-mail, RA can cope with graphs
very well. For the curious:

  Maintaining Transitive Closure of Graphs in SQL
    http://www.comp.nus.edu.sg/~wongls/projects/ies-sql/index.html
    Notable in this paper is the idea of an Incremental Evaluation System (IES)
    that increases the expressive power of Relational Algebra.

  Incremental Maintenance of Shortest Distance and Transitive
  Closure in First Order Logic and SQL
    http://e-hrc.net/pubs/abstract/RP-CP-GD-KR-incr-maint-short-dist-trans-clos-logic-sql.htm
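
A minimal sketch of the incremental idea for edge insertion only, in
Python, with a relation modelled as a set of pairs. This is not code from
the papers above, and insert_edge is a hypothetical name. The stored
closure is updated with selections, a product and a union, so no recursion
is needed; deletion is harder and is not covered here.

```python
def insert_edge(tc, a, b):
    """Return the transitive closure after adding the edge (a, b).

    tc is the closure *before* the insertion, as a set of (x, y) pairs.
    Every new path must use the new edge: x ->* a -> b ->* y.
    """
    # Nodes that already reach a, plus a itself (selection on tc).
    into_a = {x for (x, y) in tc if y == a} | {a}
    # Nodes already reachable from b, plus b itself (selection on tc).
    from_b = {y for (x, y) in tc if x == b} | {b}
    # Product of the two sets, united with the old closure.
    return tc | {(x, y) for x in into_a for y in from_b}
```

Starting from an empty closure and inserting the edges one at a time keeps
the stored relation equal to the full transitive closure after each step.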

See about trees also here:

  http://tunes.org/wiki/KnowOS#Other_papers_on_graphs_and_trees_in_RDBMSs

And if you are comfortable with Python, have a look at this
to avoid SQL:

  Dee - makes Python relational
    http://www.quicksort.co.uk/

A final plus about security:

  [cap-talk] capabilities for databases and database-like systems
    http://www.eros-os.org/pipermail/cap-talk/2005-April/thread.html#3461
  In brief: views = capabilities

This is also related to security; don't be deceived by the title:

  Solving equations in the relational algebra
    http://arxiv.org/abs/cs.LO/0106034

All this is not exactly a "quick shot". What is left?
Oh, mark-up languages. I promise, this will be the subject of
another e-mail. (you all: "Oh my....!")

> The existing TDP pages are 99% negative criticism...

Yes, of course. One needs to explain what is wrong with current
solutions before proposing something different.

> I'm really looking forward to your practical suggestions.
> I don't mean to rush you... maybe you could send me a draft
> copy to proofread... :-)

Core ideas are in such a messy state that you would not understand
much of what I have jotted down.

> Seriously, I'm busy enough without this, but I want to get
> the ball rolling.

Well, by now readers' balls are probably spinning
rather quickly. Apologies for any such inconvenience...

Regards.

--
Massimo Dentico