Strong typed system

Armin Rigo arigo@planetquake.com
Mon, 8 May 2000 10:28:53 +0200 (METDST)


Hello Billy,

[Why doesn't the mailing list set 'Reply-To' to itself ? Sorry Billy, I
did not notice I first sent you my answer in a private e-mail...]


On Fri, 5 May 2000 btanksley@hifn.com wrote:
> I agree that the system should manage communication.  I don't understand
> what you mean by saying that the application should never ask the system;
> how will the system be able to manage something that never occurs?

What I meant is that the applications should not ask for a system service,
like they always do in all current API-based environments. Instead, they
should somehow explain the system what it can do with them. This is
developped in http://student.ulb.ac.be/~arigo/bazar/infobase/why.html

> > * the system must know exactly what it is doing at any time.

For our current purpose I mean that it must be aware of the type of data
it handles. (More generally I also mean that it should somehow be aware of
why it handles this data, as seen in the aforementionned page.)

> Strong typing serves two purposes: first, to catch category errors, and
> second, to permit conveniences such as polymorphism.  I don't see how it
> helps a system know anything.

In the above sense, typing helps the system in knowing the type of data it
is handling (well, that's obvious) but also in knowing what it can or
should do with this data. More generally I believe in strong, uniformized
data typing as a fundamental move away from today's protocol and file
formats (either text-based or non-flexible binary black boxes).

> The "problem" you're looking at here is that all the languages you mention
> (except Forth) use a context-independant parser, and you've identified that
> parser with their syntax.  That's not really correct; in reality, the syntax
> for those languages cannot be described by BNF because it's context
> dependant.

All right, I see your point.

Well, as I think you have understood, I think typing is the way towards an
efficient data representation (as it let us encode only exactly what we
need); and code is in my opinion just one type of data (I personally tend
to consider data as the central part of computing, not code). The syntax
of a programming language (or should I say its default one) is of course
not the main point; what I think however is that strongly typed
intructions are the most natural way to let the user define its own, rich
syntax.

Tell me if I'm wrong, but I believe there is no direct way, say, to
display and edit a Lisp program with a syntax richer than (x y z). Of
course it is possible to detect (+ x y) instructions and display them as
x+y, but I would call it an indirect (or artificial) solution because the
underlying representation of the instruction does not depend on which
instruction it is. On the other hand, one could encode the typed
instruction "sum of two operands of type T" as a record of two
instructions that produce a result of type T -- T being implicitely
assumed in the encoding of these two subinstructions. This encoding seems
more natural. (BTW why limit us to instruction with n linearily-ordered
arguments ? Complex instructions might need richer data structures.)

> At any rate, this has little to do with Tunes.  You're assuming a single
> language with a single representation; Tunes wants to have a single language
> with multiple representations.  Syntax is only a surface issue, although
> it's an important one.  The less syntax Tunes has, the more flexibly it'll
> be able to represent its code.

I guess it will look like I'm opposed to you (but I think we are finally
fighting for the same solution) : I believe a rich syntax is important,
although of course not a fixed one ! The syntax is the "interface" of the
language, and as with all interfaces my credo is that the type of any data
should be enough to build a nice interface for it (or rather nice
interface*s* -- there is of course no single best way to build an
interface for some given data).

So in summary I think that

* data should always have some type that a piece of code outside the
application should be able to identify;

* code is data; syntax is the interface for this data;

* the data that represents code can be pretty complex (not just a function
name followed by some number of arguments), if it is more natural in this
way;

* always encode data in the most natural way :-)

As a side note, I'd like all instructions to be type-safe, i.e. no
instruction can ever let the program access out-of-bound or silently
casted data (here, a proof checker helps more than just dummy checks each
time, say, we access an item in a array).

> This is used in Oberon's slimBinaries.  Although you DO realise that the
> type of the expression DOES encode the number of arguments; it's fundamental
> information theory that SOMEONE has to.

Certainly. The type of the expression (or of anything else) encodes quite
a lot in my mind -- much more than just a number of arguments.

> Who knows the type?  Someone must.  Personally, I believe that it's enough
> that the programmer knows and his compiler checks, and then with a secure
> binary the user's loader should check -- but Tunes seems to believe that the
> runtime system should also know the type.

So do I. Data exchange (if only this) cannot work correctly without type
information. So does garbage collection, data conversion, data display and
editing... well call it an intuition, but after quite a lot of
programming, typing (outside the boundaries of an application) is the
thing that seems the most natural to me and that I misses most. I would
even say that it's where I think the C++ language completely fails, even
more than its fixed syntax and reflexiveless nature : you can represent
quite complex data structure inside an application, but as soon as you
must communicate this data with anything outside, you are back to the
60's, programmatically speaking : write 0-0-1-0-1-...


A bientot,

Armin.