A case against standards

Armin Rigo arigo@tunes.org
Fri Oct 31 04:49:02 2003


Hello Ilmari, hello Lynn,

On Thu, Oct 30, 2003 at 08:32:52PM +0200, Ilmari Heikkinen wrote:
> I think one important goal in making a meta-converter is to allow the
> use of several programming languages, tools and data structures together
> with minimum effort, gluing the current mess together. To use a network
> analogy, the current situation is like bang-path email, where you have
> to manually define all machines in the route, vs the current system of
> routers automatically figuring out the route.

Yes.  This is the operational aspect of the conversion ("horizontal" in my 
pictures).  The "vertical" aspect is what I'm describing next:

> > I'm saying is that we need to be able to express the *existence* of aspects
> > (the "context"), and that it can be useful even if we do not necessarily
> > formalize these aspects, because we can say which ones are considered relevant
> > in particular cases (it may not be the same ones from program to program), and
> > say which of these aspects conversion routines preserve.
> 
> Yes, agreed. 
> 
> Though I think that the function that uses the context actually
> formalizes it, albeit ambiguously.

Yes, the operations and morphisms (I'm saying "morphism" instead of "functor"
now) are in some way formalizing the concepts: a concept only has meaning
insofar as it is related to other concepts (which is also true in some views of
what intelligence is, as far as I understand it).

Let me try to summarize this again.  (BTW, thank you for all these e-mails;
with this and other feedback I will rewrite and complete important sections
of the draft.)

The "operations" defined between concepts correspond to the horizontal 
direction in my diagrams.  They capture what you usually express inside a 
programming language: most importantly, algorithms.

The "morphisms", or vertical arrows, relate different points of view on
concepts.  They are *not* formally defined in any language.  This is an
important point: it is possible to have powerful reflective languages in which
we think we can express the morphisms, but my current point of view is a bit
different.
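
To make this a bit more concrete, here is a rough Python sketch of how I
picture the two directions.  All the names are invented for illustration;
this is not meant as a real design, only as a picture of operations acting
within one point of view, and morphisms being registered outside either
representation, together with a note of which aspects they preserve:

```python
import math

# Operations act "horizontally", within a single point of view on a
# concept -- here, an angle seen either in degrees or in radians.
def rotate_degrees(angle_deg, delta_deg):
    return (angle_deg + delta_deg) % 360.0

def rotate_radians(angle_rad, delta_rad):
    return (angle_rad + delta_rad) % (2 * math.pi)

# Morphisms act "vertically": they relate points of view.  They live in
# an external registry, not inside either "language", and carry metadata
# saying which aspects of the concept they preserve.
MORPHISMS = {}

def register_morphism(src, dst, fn, preserves):
    MORPHISMS[(src, dst)] = (fn, frozenset(preserves))

register_morphism("degrees", "radians", math.radians,
                  preserves={"direction", "magnitude"})
register_morphism("radians", "degrees", math.degrees,
                  preserves={"direction", "magnitude"})

def convert(value, src, dst):
    # Apply the morphism and report what it claims to preserve.
    fn, preserves = MORPHISMS[(src, dst)]
    return fn(value), preserves
```

The point of the registry is that nothing in either operation knows about
the other representation; only the external table does.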

I'd like to see how far we can go by cleanly separating the algorithmic part
of a problem ("all languages are Turing-equivalent anyway") from its semantic
part ("what is this variable/argument supposed to be?").  I'm not saying that
reflectivity is bad or unnecessary.  I'm saying that it is not the ultimate
answer.  Lynn's suggestion about a "universal" specification language is
interesting, but in my humble opinion any such attempt is doomed to be only
"yet another" specification language, however good or seemingly complete it
is.  (By essence, ideas don't necessarily fit into any predefined formalism; I
could also invoke the Gödel argument here.)

I'm thinking about "concept models" as a way to capture some information and
semantics that are implicit in a language.  A good example I should develop
more is that of compilation: in a high-level language you can express a lot of
properties about the objects you are considering; when you compile it (say) to
C, you create functions and variables that still have the same meaning as some
concept in the original program, but more implicitly.  (They also have a lot
more low-level semantics, which don't correspond to anything in the
original source.)  And when the C is compiled to machine code, the meaning
becomes more implicit still.  Ultimately, the machine code is still a
refinement of the semantics of the original program, but a lot of the original
program is entirely implicit, though some constructions, or some sets of
machine words, or even some bit at some memory location, still have the same
meaning as *something* in the original program.
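
A toy illustration of what I mean by "the same meaning, but implicit" (the
layout convention here is invented, of course):

```python
# High level: the concept "point" is explicit in the representation.
points = [{"x": 1.0, "y": 2.0}, {"x": 3.0, "y": 4.0}]

# "Compiled" low level: a flat buffer of floats.  Nothing in the data
# says "point" any more; the concept survives only implicitly, in the
# access pattern (stride 2, x at offset 0, y at offset 1).
buf = []
for p in points:
    buf.extend([p["x"], p["y"]])

# The morphism back up is only possible because we *know* the layout
# convention the "compiler" chose.
def decompile(buf):
    return [{"x": buf[i], "y": buf[i + 1]}
            for i in range(0, len(buf), 2)]
```

The buffer and the list of points mean the same thing, but one of them needs
external knowledge (the morphism) to be read as points at all.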

You can probably model C and machine code in your high-level language and
express all these relationships there, but fundamentally this hierarchy is
external to your source language.  As far as I can tell it makes much more
sense to consider them as metaproperties of languages, and there is no
particular reason to encode them in precisely the same language as your source
code (though you can, of course).

Capturing these kinds of relationships is always tedious, bug-prone and
limited; see for example debugging information in executables.  I'm thinking
about a way to express it explicitly: I'm not saying that storing debugging
information in executables is not needed, I'm saying that there is no need to
store it in any predefined format as long as you can express how your data
relates to the source code.  It is often not easy to relate, so you
often cannot just say "this concept here is exactly that concept there".  
Debugging formats are data formats that try to be complete enough so that you
can express these relationships, but they are limited -- I can think of tons
of compiler optimizations that you cannot encode in stabs or COFF.  Instead
you should just be able to write algorithms (in some language, doesn't matter)
that are able to undo the optimizations.  For example, decompiling a running
machine frame to get back to source-level variables might require reversing
the weird encoding chosen by the optimizer.  This is not always possible;  
optimizations that cannot be undone correspond to the inference engine not
finding a way to decode it using the available set of operations.  This is a
useful piece of information in itself, because it can be used e.g. to prevent
the compiler from doing some optimizations when the user (or some other
program) wants the ability to inspect the result.

Again, I'm not claiming that my "concept" formalism itself is complete or the
ultimate answer to anything (nor that it is original).  I'm just thinking
that we would need something like this for almost anything you want to do
with a computer :-)


A bientot,

Armin.