Data mining for Tunes doc? (was: Preliminary Review database)

Massimo Dentico
Mon, 12 Jun 2000 18:57:55 +0200

David Manifold <> wrote:
> Right now I have a minimally functional Review database.  You can browse
> it, and I can create user accounts which can only add new data.  There is
> hardly any data in it yet, but if you want to contribute seriously, send
> me an email and I can set you up an account to edit it through the web.
> [...]
> I am working closely with water and coreyr to coordinate what is happening
> in that department. 

Brian Rice <> wrote:
> [...]
> Tunes members are more than welcome to contact Corey, the DB administrator 
> for the moment, and ask for access to manipulate nodes. I'm not leaving his 
> email address here. I suggest you use the #tunes IRC channel to contact him 
> and discuss it there.
> [...]

Dear David and Brian,

I very much appreciate your (and others') effort to reorganize and
improve the Tunes project documentation. Feel free to create an
account for me and send me the details by e-mail, possibly along with
guidelines for the new document structure (sorry Brian, I'm afraid
that it's not practical for me to chat in English: I don't have
real-time performance ... well, even my batch performance is no
better :-). I think that, in this phase, I can easily help with the
migration of the existing documentation.

Just one small concern: this entire restructuring breaks every
external link. Is it absolutely impossible to keep a little
compatibility with the old static structure? For example, by turning
the old pages into indexes to the new information, with a link at
every anchor in place of the information itself?

I also want to bring an idea to your attention: the big problem with
this unstructured textual information is creating and maintaining
useful classifications and cross-links. With traditional DB techniques
this requires massive human intervention that is boring and
time-consuming.

I think it's possible to use statistical and other machine learning
methods to overcome these problems, at least partially. The field
seems quite well developed, with commercial applications already
available. In fact there is at least one start-up that has grown
greatly in recent years using these methods: Autonomy.

However, I don't know how difficult it is to implement these
techniques, or whether it's practical to explore the subject at this
moment: it is only a suggestion.
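
To give a rough idea of the kind of statistical classification meant
above, here is a minimal sketch of a Naive Bayes text classifier (the
algorithm named in the references below) in Python. It is only an
illustration under assumptions: the category names, the sample
documents, and the function names are all invented for the example and
have nothing to do with the actual Review database or with the Bow
toolkit.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Very crude tokenizer: lowercase, split on whitespace,
    # keep purely alphabetic words only.
    return [w for w in text.lower().split() if w.isalpha()]


def train(labelled_docs):
    # labelled_docs: iterable of (text, label) pairs.
    class_docs = Counter()              # number of documents per label
    word_counts = defaultdict(Counter)  # per-label word frequencies
    vocab = set()
    for text, label in labelled_docs:
        class_docs[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab


def classify(model, text):
    # Return the label with the highest log-posterior under a
    # multinomial Naive Bayes model with add-one (Laplace) smoothing.
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labelled pages (invented for this example).
EXAMPLE_DOCS = [
    ("lisp scheme macros reflection", "languages"),
    ("forth stack interpreter compiler", "languages"),
    ("kernel scheduler drivers memory", "os"),
    ("microkernel persistence drivers", "os"),
]

model = train(EXAMPLE_DOCS)
print(classify(model, "a reflective scheme compiler"))
```

With a few hand-labelled pages as training data, a classifier of this
sort could propose a category for each new or migrated page, leaving a
human only to confirm or correct the suggestion.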

Some references:

  CMU World Wide Knowledge Base (Web->KB) project

  Bow: A Toolkit for Statistical Language Modeling, Text Retrieval,
  Classification and Clustering

  Naive Bayes algorithm for learning to classify text

  Central Inductive Agency.

  Data Mining from spam

  Data Mining and CRM (Kurt Thearling)

About Autonomy:

  Wired 8.02: The Quest for Meaning

  Michael Lynch CEO Autonomy

  Autonomy - Knowledge Management and New Media Content Solutions

This product is free for personal use, but unfortunately only for

  Autonomy Kenjin

Best regards.

Massimo Dentico