Language Syntax Suggestion

Hans-Dieter Dreier Ursula.Dreier@ruhr-uni-bochum.de
Sat, 13 Mar 1999 14:42:29 +0100



Jeremy Dunn wrote:

> Hans Dieter Dreier wrote:
> ...
>
> I think the confusion is this, the comma (as I am using it) precedes
> the
> element it designates and says "the following element is ordered, it
> must be in this position in the list". So the comma both separates
> elements and carries syntax as you say. Thus [,a,b] is a list of two
> ordered elements, [,,a,b] is a three element ordered list whose first
> element is empty, [.a.b] is a list of two unordered elements i.e.
> equivalent to both the lists [,a,b] and [,b,a]. In conventional
> languages the comma as you say is nothing more than a separator because
> it is always assumed that the list is ordered. I merely suggest that we
> allow the programmer the ability to decide for himself whether order or
> disorder is appropriate to his particular problem.

I see. Usually the comma is a separator, which means that there must be a
list element on both sides. In your approach it is a prefix, also serving
as a separator if appropriate.
As to "ordered" vs. "unordered": I think that depends on how the list is
used.
If the list that is created by that clause is just a temporary one (such as
a parameter list), I'd see no point in allowing it to be marked explicitly
unordered. Since it is written sequentially, there always is an order.
Whether that has to be preserved or not depends on the further usage of the
list. Anyhow, if that decision is anticipated by the way the programmer
writes the list, the compiler is at least restricted in its choice of
possible optimisations.
If the list is part of a declaration (such as the member list of a class),
the nature of the declared item already determines whether the list
elements are ordered or not. Also, there may be cases where the programmer
does not know (cannot know, does not want to know...) whether the list will
be ordered internally or not, since this might be implementation dependent.
If that implementation changed, the declaration would have to be rewritten.
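
To make the prefix reading concrete, here is a minimal Python sketch of how
such a list might be split into elements. It is only my reading of the rule
quoted above, not a definitive parser: the function name parse_prefix_list is
made up, lists are assumed to be flat (no nesting), and elements are assumed
not to contain "," or "." themselves. Text that appears before the first
marker (like the "if" in [if,A,B]) is reported with a marker of None, since
its status is still an open question.

```python
def parse_prefix_list(text):
    """Split a flat list such as "[,a,b]" into (marker, value) pairs.

    ',' marks the following element as ordered, '.' as unordered.
    An empty element is reported as None.
    """
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("list must be enclosed in [ ]")
    elements = []
    marker = None   # marker of the element currently being read
    value = ""      # characters collected for that element so far
    for ch in text[1:-1]:
        if ch in ",.":
            # A new marker closes the element that was being read, if any.
            if marker is not None or value.strip():
                elements.append((marker, value.strip() or None))
            marker, value = ch, ""
        else:
            value += ch
    if marker is not None or value.strip():
        elements.append((marker, value.strip() or None))
    return elements
```

With this reading, "[,,a,b]" yields three ordered elements whose first one is
empty, and "[.a.b]" yields two elements marked as unordered.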

> >True, but again the reader has to scan the context to detect what ","
> >means in a particular place. That's one reason why they invented
> >keywords - to make the token express its meaning more clearly if it is
> >a special one (like a control construct) without the need to look at
> >the surrounding text. Of course no special symbols are ever required
> >except () - just look at Lisp. But many people have big problems to
> >read a Lisp program as fluently as a more traditional one, say Pascal
> >or C, at least until they are used to it.
>
> Let us take the example "If A Then B ElseIf C Then D" again and
> consider
> it in my format as it would probably be written in an actual program:
>
> [if,A
>    ,B
>    ,C
>    ,D
> ]
>

Shouldn't there be a comma before the "if"? I'd think that the if must be
the first element (hence ordered) to make it a condition clause.

> This is more LISP like.
> ...
>
> [e
>  ,K
>  {C'b'A
>   ,s,a',s,b',s,c',s,d
>
> It should be clear that the arguments of a function are always indented
> farther to the right for each increase in the depth of the expression,
> we don't need the trailing brackets because it is always clear what
> each
> expression is operating upon. This type of code would be very easy to
> parse and at least as readable as most languages, it would eliminate
> LISP's trailing parentheses. This layout is one reason I was suggesting
> writing an expression such as s+t as {a,s,t}, it is necessary to write
> it this way to make the proposed vertical layout workable.

It certainly is easy to parse. But not easy to read, at least not for me.
Also please keep in mind that programmers make typing (or indentation)
errors. It helps if the syntax is designed such that as many of these
mistakes as possible result in illegal programs. This is one of the reasons
why many programming languages use construct-specific delimiters or even
"noise words" instead of more general delimiters, as in "if a then b else c
endif".
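
As an illustration of that point, the following Python sketch checks that
each opening keyword is closed by its own closing keyword. The keyword table
and the name check_delimiters are assumptions made for the example (tokens
are simply split on whitespace); it is not meant as a real lexer.

```python
# Hypothetical table of openers and the closer each one requires.
CLOSERS = {"if": "endif", "do": "od", "case": "esac"}


def check_delimiters(source):
    """Return None if all constructs are properly closed, else a message."""
    stack = []  # openers seen so far that are still waiting for a closer
    for position, token in enumerate(source.split()):
        if token in CLOSERS:
            stack.append(token)
        elif token in CLOSERS.values():
            if not stack:
                return f"token {position}: '{token}' without an opener"
            opener = stack.pop()
            if CLOSERS[opener] != token:
                return (f"token {position}: '{opener}' closed by '{token}', "
                        f"expected '{CLOSERS[opener]}'")
    if stack:
        return f"'{stack[-1]}' is never closed"
    return None
```

A misplaced closer such as "do if a then b od endif" is rejected at the "od",
whereas with one general closing bracket for every construct the same slip
would still be a legal program with a different meaning.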

> >How do you express a loop that has a conditional break somewhere in
> the
> >middle of the loop body? Or two of them?
>
> Other than the appearance and layout of the test condition my Loop
> function behaves pretty much like any other loop, the body is merely a
> series of steps like we would normally see. You could have something
> like:
>
> [Loop,<test>
>    ,[
>      ,step1
>      ,step2
>      ,[if,X,[Break,]]
>      ,
>      ,stepN
>     ]
> ]
>
> Break would be a function with an empty argument that does nothing but
> break the loop. There are probably several ways to do this.

Yeah, but why use the <test>? If it is needed, it too can be written as a
break clause that happens to be the first element of the body.
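
A small Python sketch of what I mean, under the assumption that a loop body
is a sequence of steps and that any step may ask to leave the loop. The names
BREAK, run_loop and demo are invented for the example.

```python
BREAK = object()  # hypothetical sentinel: a step returns it to leave the loop


def run_loop(steps):
    """Repeat the steps in order until one of them returns BREAK."""
    while True:
        for step in steps:
            if step() is BREAK:
                return


def demo(limit):
    """Collect 0 .. limit-1 using a loop whose test is just its first step."""
    state = {"i": 0, "seen": []}

    def test_as_break():
        # The former <test>, written as an ordinary conditional break.
        return BREAK if state["i"] >= limit else None

    def record():
        state["seen"].append(state["i"])

    def increment():
        state["i"] += 1

    run_loop([test_as_break, record, increment])
    return state["seen"]
```

The loop itself needs no special test slot; a break in the middle of the body
or two of them work the same way as the one at the top.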

> The meaning of the commas was explained above. The test condition of
> the
> loop is either an integer or a list. If a list the list can have only 2
> or 3 elements and they must be the form and type indicated for the Loop
> function. The expressions you wrote would be interpreted as:
>
> [,a,] an ordered two element list with the 2nd element empty
> [,,a,] an ordered three element list with the 1st and 3rd elements
> empty
> [,,,a] an ordered three element list with the 1st and 2nd elements
> missing
>
> I hope this clarifies things.

It does.

> >So do I, but I would do it differently.
>
> I'm game, how would you do it differently? This is precisely the kind
> of
> discussion I am interested in.

I'd define a loop as follows:

do <expression> od

where expression is an expression yielding some value which is the value of
the loop. The loop can thus be used anywhere an operand can be used. There
is a binary operator named ; which takes its first argument, evaluates that
and discards the result, then takes its second (right side) argument,
evaluates that and yields the result. There is an operator named , which
evaluates its left argument, then evaluates its right argument, then forms
a list from both results or if the left argument already was a list,
appends the right argument to that list. There would be a unary (prefix)
operator called "break" that breaks the current loop, yielding its argument
as the result of the loop. It would only be allowed inside loops. Operators
would be subject to precedence rules ("binding powers") as commonly used.
"(" would be a prefix op if used to change evaluation order or an infix op
if used for a function call. Some operators would need matching unary
postfix ops (namely ()[]{} do od if fi case esac). There would be a
(shorthand) prefix op named "while" which would be equivalent to a
conditional break like this:

if x then break fi  <=>  while not x

Also there would be an "until" prefix op:

if x then break fi  <=>  until x

These shorthands could only be used if the loop yielded a void result,
because at all convergences of control flow each branch must yield a value
of a type that can be automatically coerced to "the" type at that
convergence point. (There is no "no result"; instead there is a "void"
result. Anything can be converted to a void result, but the opposite need
not always be true.) You see, it is quite conventional. I think the writing
effort is not significantly greater than in your approach; error catching
may actually be better in practice. I'd claim that everyone who has ever
programmed a line in BASIC can read it right from the start without any
explanation; I'd regard that as an advantage.
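
For illustration, the semantics described above can be sketched as a tiny
evaluator in Python. This is only a sketch under my own assumptions, not the
interpreter I mention below: expressions are written as nested tuples instead
of being parsed, strings stand for variable names, and the "set" operator,
the arithmetic operators and the names evaluate and Break are added just so
that the example terminates.

```python
import operator

# Hypothetical binary operators, included only to make the examples run.
BINARY = {"+": operator.add, "<": operator.lt, ">=": operator.ge}


class Break(Exception):
    """Carries the value that "break" hands to the enclosing loop."""

    def __init__(self, value):
        super().__init__()
        self.value = value


def evaluate(node, env):
    """Evaluate a nested-tuple expression; every construct yields a value."""
    if isinstance(node, str):
        return env[node]                 # a string is a variable name
    if not isinstance(node, tuple):
        return node                      # any other leaf is a literal
    op, *args = node
    if op == ";":                        # evaluate left, discard, yield right
        evaluate(args[0], env)
        return evaluate(args[1], env)
    if op == ",":                        # build a list, or extend the left one
        left = evaluate(args[0], env)
        right = evaluate(args[1], env)
        return left + [right] if isinstance(left, list) else [left, right]
    if op == "break":                    # leave the loop, yielding the argument
        raise Break(evaluate(args[0], env))
    if op == "while":                    # shorthand: if not x then break fi
        if not evaluate(args[0], env):
            raise Break(None)
        return None
    if op == "until":                    # shorthand: if x then break fi
        if evaluate(args[0], env):
            raise Break(None)
        return None
    if op == "do":                       # the loop's value is the break value
        try:
            while True:
                evaluate(args[0], env)
        except Break as signal:
            return signal.value
    if op == "if":                       # void (None) when the condition fails
        return evaluate(args[1], env) if evaluate(args[0], env) else None
    if op == "set":                      # assignment, yields the new value
        env[args[0]] = evaluate(args[1], env)
        return env[args[0]]
    if op in BINARY:
        return BINARY[op](evaluate(args[0], env), evaluate(args[1], env))
    raise ValueError(f"unknown operator: {op!r}")


# do i := i + 1; if i >= 5 then break i + 100 fi od
counting_loop = ("do", (";", ("set", "i", ("+", "i", 1)),
                             ("if", (">=", "i", 5), ("break", ("+", "i", 100)))))
```

Since the loop is itself an expression, evaluate(counting_loop, {"i": 0}) can
be used wherever an operand is expected; the void result is represented here
by Python's None.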

A long time ago I actually designed an interpreter to use such a syntax,
and found it quite easy to implement as well as to write programs in it.
I'd call this sort of syntax "expression oriented" since everything is an
expression and yields a value. That has the advantage that the need for
variables is greatly reduced, thus making program semantics more local and
easier to understand. It also had a set of rather unusual control flow
constructs (operators, of course). I can give some examples if you like.

BTW, we have been discussing primarily the "executable" parts. I'd like to
know how you would do the declarations and module structure as well.


Regards,

Hans-Dieter Dreier