Language Syntax Suggestion

Matthew Tuck matty@box.net.au
Wed, 10 Feb 1999 10:48:57 +1030


Jeremy Dunn wrote:

> My interest in computer languages revolves around the symbolism that one
> actually has to type in to write the desired code. Every language I have
> run across suffers from syntax inconsistency. What do I mean? A language
> like C has many functions in the form func(arg1,arg2...argN) but then
> will allow you to write 2+3+.. , an expression of the form arg1 func
> arg2 func...argN. To this I say MAKE UP YOUR MIND!

LISP uses prefix notation for operations like this, so you'd write
something like (+ 2 3), or +(2,3) in the func(arg1,arg2) style.  I think
most programmers rejected it because it was harder to understand.  But
that is largely a matter of personal taste.

> Language writers seem to be pretty loose about creating mutually
> contradictory ways of writing things. It is my belief that we must pick
> a function syntax and stick with it without any exception in the
> language, this makes for simpler compilers

While it is certainly true that mixing notations makes compilers more
complex, I have two points to make against this:
(a) compilers are there to absorb complexity so that our programs don't
have to, and if extra complexity makes programming better, then so be
it.  After all, the least complex compiler is none at all.
(b) syntax is well understood and is the easiest part of the compiler to
write, if it isn't generated automatically.  Therefore syntax overhead
is not a large issue.

> and reduces the mental overhead that the programmer must keep track of
> to write code.

I wouldn't agree that this sort of thing creates mental overhead.  It
exists to make things easier rather than the other way around.  Some
initial overhead in learning always exists, but you benefit in the long
run.  Of course, claims like this are rarely backed by data comparing
the two approaches.

> Programmers use the terms set,array and list interchangeably but I will
> use the term SET in my discussion in respect to the science of
> mathematics which predated computers to begin with.

Actually the term "collection" seems to be becoming the standard for
discussing this sort of structure; "set" is reserved for unordered
collections without duplicates, as in maths, and lists are ordered
collections that allow duplicates.  Arrays are often looked upon as some
sort of collection, but I consider them to be functions from indexes to
elements.  That is to say, you can take the set of indexes (the domain)
or the set of elements (the range), but the array is not a set itself.
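To make the array-as-function view concrete, here is a small Python
sketch that models an array as a dict from index to element.  The helper
names index_set and element_set are hypothetical, chosen only for
illustration.

```python
from typing import Dict, Hashable, Set


def index_set(arr: Dict[int, Hashable]) -> Set[int]:
    """The domain: every index the array is defined on."""
    return set(arr.keys())


def element_set(arr: Dict[int, Hashable]) -> Set[Hashable]:
    """The range: the distinct elements the array holds."""
    return set(arr.values())


# The array ["a", "b", "a"] viewed as a function from index to element.
arr = dict(enumerate(["a", "b", "a"]))

print(arr[2])                    # a
print(sorted(index_set(arr)))    # [0, 1, 2]
print(sorted(element_set(arr)))  # ['a', 'b']
```

The duplicate "a" collapses when you take the range, which is one reason
the array itself is not a set.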

> I denote a general set of items of any type as [,a,b,...,n] where the
> square brackets indicate a general set. We note that the elements of the
> set are separated from each other using commas, we also note that the
> first element has a preceding comma. What is that all about? A comma
> preceding an element of a set denotes that the element which follows it
> is an ORDERED element i.e. it must be read in the order going from left
> to right that it appears in. If we precede a set element with a period
> as in [.a.b] then we are indicating that the elements are UNORDERED. So
> the set [.a.b] is equivalent to either of the sets [,a,b] or [,b,a]. It
> is clear that a set with unordered elements must have AT LEAST 2
> unordered elements, it makes no sense to write [.a].

Basically you're looking to be able to define literals for both sorts of
collections.  I'm not entirely convinced there's a reason to do this,
though.  If you create a list literal, the conversion to any other type
of collection is rather straightforward.  The only disadvantage is one
of typing: if assigning to a set, you might want to ensure no duplicates
were entered, since that might indicate a bug somewhere.
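A minimal Python sketch of that conversion, assuming a hypothetical
list_to_set helper whose strict flag treats duplicates in a would-be set
literal as a likely bug rather than silently dropping them:

```python
from collections import Counter
from typing import Hashable, List, Set


def list_to_set(items: List[Hashable], strict: bool = True) -> Set[Hashable]:
    """Convert a list literal to a set.

    With strict=True, any element that appears more than once is reported
    as an error; with strict=False, duplicates are silently merged.
    """
    if strict:
        dupes = sorted(str(x) for x, n in Counter(items).items() if n > 1)
        if dupes:
            raise ValueError("duplicate elements in set literal: " + ", ".join(dupes))
    return set(items)


print(sorted(list_to_set([3, 1, 2])))                # [1, 2, 3]
print(sorted(list_to_set([1, 2, 2], strict=False)))  # [1, 2]
```

Calling list_to_set([1, 2, 2]) with the default strict=True raises
ValueError, which is the "might be a bug" check described above.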

> A FUNCTION is a set [funcname,arg1,arg2...,argN] in which the first
> element (the function name) does not have a preceding comma. The lack of
> comma enables our parser to find the function name if one exists. We
> have two other kinds of brackets to denote two other special kinds of
> sets. We use the left and right parentheses to bound a STRING SET i.e.
> the string "house" would be written as (house). We may also write this
> as (,h,o,u,s,e) if we desire. If there are no commas in the string set
> then they are assumed by the program. Strings are a whole separate issue
> and I do not propose to get into their syntax in detail in this first
> email except to indicate that they are a set with elements of a
> particular type and are treated syntactically like any other set.

This is basically what LISP does.  Everything is done with lists,
including function applications.
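As a toy illustration of the idea, here is a Python sketch in which an
expression is a nested list whose first element names the function.  The
FUNCTIONS table and evaluate() are hypothetical; a real LISP also has
environments, quoting and special forms, all omitted here.

```python
import operator
from functools import reduce

# Hypothetical table of known function names (a real LISP looks names up
# in an environment instead).
FUNCTIONS = {
    "+": lambda *args: reduce(operator.add, args, 0),
    "*": lambda *args: reduce(operator.mul, args, 1),
    "list": lambda *args: list(args),
}


def evaluate(expr):
    """Evaluate a nested list whose first element is a function name."""
    if isinstance(expr, list):
        name, *args = expr
        # Evaluate the arguments first, then apply the named function.
        return FUNCTIONS[name](*(evaluate(a) for a in args))
    return expr  # numbers and other atoms evaluate to themselves


print(evaluate(["+", 2, 3]))            # 5
print(evaluate(["*", ["+", 1, 2], 4]))  # 12
```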

Just a question: why do you allow two different ways of writing
strings, yet not two different ways of writing expressions?

> The final type of set I call a HOLOR SET. Holor is a term introduced by
> Parry Moon and Domina Spencer in their book "Theory Of Holors".
> Basically a holor can be thought of as a nth order matrix. Integers,
> real numbers, vectors, complex numbers and matrices are all holors.

What counts as a holor is a subjective question.  Any information can be
broken down into two or more pieces of information, right down to
booleans.  It really amounts to whatever is most convenient.
Considering complex numbers as 2D vectors, for example, has the benefit
of inheriting the +, -, scalar *, etc. operations.
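A minimal Python sketch of that inheritance, assuming hypothetical Vec2
and Complex classes: Complex gets +, - and scalar multiplication from
Vec2 for free and only has to add complex multiplication itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec2:
    x: float
    y: float

    def __add__(self, other):
        # type(self) keeps the subclass, so Complex + Complex is Complex.
        return type(self)(self.x + other.x, self.y + other.y)

    def __sub__(self, other):
        return type(self)(self.x - other.x, self.y - other.y)

    def scale(self, k: float):
        """Scalar multiplication."""
        return type(self)(k * self.x, k * self.y)


@dataclass(frozen=True)
class Complex(Vec2):
    """x is the real part, y the imaginary part."""

    def times(self, other: "Complex") -> "Complex":
        # The one operation a 2D vector cannot supply.
        return Complex(self.x * other.x - self.y * other.y,
                       self.x * other.y + self.y * other.x)


a, b = Complex(1, 2), Complex(3, -1)
print(a + b)                               # Complex(x=4, y=1)
print(a.scale(2))                          # Complex(x=2, y=4)
print(Complex(0, 1).times(Complex(0, 1)))  # Complex(x=-1, y=0)
```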

I'm not convinced that integers or reals benefit from being considered
as n-dimensional matrices.  Functions maybe.  What is the rationale
behind basing functions on holors rather than vice versa?

> A holor set is bounded by the curly brackets { and }. Integers can be
> written normally as in 243 or as {243} if we wish to indicate their
> general holor nature. A real number like 2.34e24 would take the form
> {r,234,25} where "r" indicates a real number function that takes the
> integers 234, attaches a decimal point to the front to get .234 and then
> multiplies it by ten to the 25th power. The number 0.234 would simply be
> written as {r,234} where the power takes the value of 0 if it is

Well that's all well and good but would anyone want to use it?

> omitted. The brackets of a function set are of the type of the arguments
> which it takes, and all the arguments that the function acts upon must
> be of the same type. If the arguments are not strings or numbers or are
> of more than one type then square brackets must be used. A complex
> number a+bi is written as {,a,b}.

I thought that was a set?  Does that mean I can assign a complex number
to a set of numbers?
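For concreteness, the quoted {r,mantissa,power} real-number form reads
as 0.<mantissa> x 10^power.  A Python sketch of decoding it, assuming a
non-negative integer mantissa (the quote doesn't say how negatives are
written) and a hypothetical helper name holor_real:

```python
from decimal import Decimal


def holor_real(mantissa: int, power: int = 0) -> Decimal:
    """Value of {r,mantissa,power}: put a decimal point before the
    mantissa's digits, then multiply by 10**power (power defaults to 0)."""
    digits = len(str(mantissa))
    # Shifting by -digits turns 234 into .234; then apply the power.
    return Decimal(mantissa).scaleb(power - digits)


print(holor_real(234, 25))  # 2.34E+24
print(holor_real(234))      # 0.234
```

Because the mantissa is an integer, leading zeros can't be written, so a
value like 0.0234 would need a negative power: {r,234,-1}.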

> REVERSE ITERATION
> An expression such as s-t-u-v can be written as {A,s,t,u,v} where this
> is the same as writing (((s-t)-u)-v). This successive application of the

This is generally referred to as left and right folding (or reduction)
in functional languages.  (Scanning is the related operation that
returns every intermediate result.)  You generally have something like:
sumlist(list) = foldl(+,list,0)

The last argument is an initial value, usually the operation's identity,
which lets the operation work on empty lists and be defined recursively:

e.g. foldl(+, (2, 3), 0) = 0 + 2 + 3
     foldl(+, (), 0) = 0
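Strictly, the single result 0 + 2 + 3 is a left fold (foldl), while scanl
keeps every partial result along the way.  A Python sketch of both, plus
a right fold to show why the direction matters for an operation like -;
the function names mirror the functional-language ones but are defined
here for illustration.

```python
import operator
from functools import reduce
from itertools import accumulate


def foldl(f, xs, init):
    """Left fold: f(...f(f(init, x0), x1)..., xn)."""
    return reduce(f, xs, init)


def foldr(f, xs, init):
    """Right fold: f(x0, f(x1, ...f(xn, init)...))."""
    return reduce(lambda acc, x: f(x, acc), reversed(xs), init)


def scanl(f, xs, init):
    """Left scan: init followed by every intermediate result of foldl."""
    return list(accumulate(xs, f, initial=init))


print(foldl(operator.add, [2, 3], 0))     # 5
print(foldl(operator.add, [], 0))         # 0
print(scanl(operator.add, [2, 3], 0))     # [0, 2, 5]
# s-t-u-v is foldl(sub, [t, u, v], s); direction changes the answer:
print(foldl(operator.sub, [1, 2, 3], 0))  # -6, i.e. ((0-1)-2)-3
print(foldr(operator.sub, [1, 2, 3], 0))  # 2, i.e. 1-(2-(3-0))
```

itertools.accumulate's initial parameter needs Python 3.8 or later.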

> FUNCTION NESTING
> Suppose we have an expression of the form [Z,[Y,[X,p]]] where three
> functions X,Y and Z are being applied successively to the argument p, we
> are allowed to rewrite this as [Z'Y'X,p] where the apostrophe indicates

I think this is what maths calls function composition, usually written
Z ∘ Y ∘ X.
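A short Python sketch of composition, with a hypothetical compose helper
so that compose(z, y, x)(p) plays the role of the quoted [Z'Y'X,p]:

```python
from functools import reduce
from typing import Any, Callable


def compose(*fs: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """compose(z, y, x)(p) == z(y(x(p)))."""
    def composed(p: Any) -> Any:
        # Apply the rightmost function first, as in [Z,[Y,[X,p]]].
        return reduce(lambda acc, f: f(acc), reversed(fs), p)
    return composed


def inc(n: int) -> int:
    return n + 1


def double(n: int) -> int:
    return n * 2


print(compose(str, double, inc)(3))  # 8, i.e. str(double(inc(3)))
```

With no functions at all, compose() returns the identity, which is the
natural "empty composition".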

> COLLECTION OF ARGUMENTS
> If we have an expression of the form [b,[x,p][x,q][x,r]] we note that
> the same function "x" is being applied to three different arguments p,q
> and r. We can rewrite the expression as [b,[x,p'q'r]] where the

This is generally referred to as mapping in functional languages. =)

You generally have something like:
map(add_one, (1, 2, 3)) = (2, 3, 4)
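Python's built-in map does exactly this; add_one is a hypothetical helper
matching the example, standing in for the function x in [b,[x,p'q'r]]:

```python
def add_one(n: int) -> int:
    """The single function applied to every argument."""
    return n + 1


# One function applied to each of several arguments.
print(list(map(add_one, [1, 2, 3])))  # [2, 3, 4]
```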

> COLLECTION OF FUNCTIONS
> If we have an expression of the form [b,[x,p][y,p][z,p]] we now have a
> situation where the argument is the same but the functions are
> different. We may write this as [b,[x^y^z,p]] where the ^ indicates the
> collecting of functions upon an argument.

I can't think of a functional-language equivalent to this, but there
might be one.  Where might you use this?
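One way to express [x^y^z,p] is to map function application over a list
of functions (in Haskell, map ($ p) [x, y, z]).  A Python sketch, with a
hypothetical helper name apply_all:

```python
from typing import Any, Callable, List


def apply_all(fs: List[Callable[[Any], Any]], p: Any) -> List[Any]:
    """Apply several functions to the same argument, in order."""
    return [f(p) for f in fs]


print(apply_all([len, str.upper, str.title], "house boat"))
# [10, 'HOUSE BOAT', 'House Boat']
```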

> I am sure that at first glance that this notation must seem pretty alien
> but it has the virtue of being consistent in its methodology and results
> in extremely compact expressions when one gets the hang of it.

APL had those virtues too, but it hasn't been copied very often.  It's
generally considered not very readable.

> It seems logical to me to treat everything as a set of elements, this has
> the advantage that any function that we devise to operate on sets will
> operate upon a set of ANY type of item.

This is the point of object-oriented type hierarchies and generic type
systems.  The difference is that here you explicitly treat everything as
a set, whereas normally that is hidden.  I prefer the latter, as you can
then change the representation without affecting the code that uses
it.

> For instance, suppose we denote the union of set X and set Y as Jn[,X,Y].
> The expression Jn[(house)(boat)] would perform string concatenation. We
> could also write Jn[,354,68] and get 35468 as the result, we get GENERAL
> concatenation rather than a specific form of concatenation. This kind of
> generality eliminates the need for the programmer to remember several
> functions which are really performing the same fundamental operation.

I was planning to do something similar in the collection type
hierarchy.  The only interesting point to make is that you often have
operations, like concatenate and reverse concatenate (a rc b = b concat
a), which become the same operation when applied to orderless sets.
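A rough Python sketch of a generic concatenate, using
functools.singledispatch to choose an implementation from the type of
the first argument.  Treating set "concatenation" as union is an
assumption made here for illustration; it shows concat and reverse
concat coinciding for orderless sets.

```python
from functools import singledispatch


@singledispatch
def concat(a, b):
    """Ordered collections (str, list, tuple): append b to the end of a."""
    return a + b


@concat.register
def _(a: set, b):
    """Orderless sets: concatenation is just union."""
    return a | b


def rconcat(a, b):
    """Reverse concatenation: a rc b = b concat a."""
    return concat(b, a)


print(concat("house", "boat"))                      # houseboat
print(rconcat("house", "boat"))                     # boathouse
print(concat({1, 2}, {3}) == rconcat({1, 2}, {3}))  # True
```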

> I only briefly touched upon the arithmetic operators, but suffice to say
> that these operations should be overloaded to be operations upon general
> matrixes so that one has the complete complement of complex addition,
> vector addition etc all subsumed within the function. Our programmer
> should not have to be a mathematical wizard and create a matrix multiply
> so that he can do something that should have been provided for him. Many
> languages have terribly inadequate math functions.

That's true; a good general set of functions that apply to all the
types they could work over is important.  But it's not always easy.  For
example, it'd be nice to consider complex numbers as 2D vectors.  But in
object-oriented terms, the complex numbers would then inherit the 2D
vector multiplication operations, which make no sense for them.
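To see the clash, compare a componentwise "vector" product (one possible
2D-vector multiplication, assumed here for illustration) with complex
multiplication on the same pairs.  The helper names are hypothetical.

```python
from typing import Tuple

Pair = Tuple[float, float]


def vec_mul(a: Pair, b: Pair) -> Pair:
    """Componentwise product: a plausible 2D-vector operation."""
    return (a[0] * b[0], a[1] * b[1])


def complex_mul(a: Pair, b: Pair) -> Pair:
    """(a0 + a1 i)(b0 + b1 i), reading pairs as (real, imaginary)."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])


i = (0.0, 1.0)
print(vec_mul(i, i))      # (0.0, 1.0): meaningless as a complex result
print(complex_mul(i, i))  # (-1.0, 0.0): i * i == -1
```

A Complex type inheriting vec_mul as its * would silently give wrong
answers, so it has to override the inherited operation.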

Then there are people who argue that using + implies commutativity, so
it shouldn't be used for string concatenation.  Similarly, every type
with a multiplication operation and inverses has two division
operations: pre-multiplying by the multiplicative inverse and
post-multiplying by it.  Of course, multiplication is usually
commutative, so the two are the same.  But not for matrices.
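A small Python sketch of the two divisions for 2x2 matrices, using exact
fractions.  The helper names (to_matrix, matmul, inverse, right_divide,
left_divide) are hypothetical; the point is that A * B^-1 and B^-1 * A
differ.

```python
from fractions import Fraction
from typing import List

Matrix = List[List[Fraction]]


def to_matrix(rows) -> Matrix:
    """Build an exact 2x2 matrix from nested lists of numbers."""
    return [[Fraction(x) for x in row] for row in rows]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """2x2 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(m: Matrix) -> Matrix:
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def right_divide(a: Matrix, b: Matrix) -> Matrix:
    """Post-multiply by the inverse: a * b^-1."""
    return matmul(a, inverse(b))


def left_divide(a: Matrix, b: Matrix) -> Matrix:
    """Pre-multiply by the inverse: b^-1 * a."""
    return matmul(inverse(b), a)


A = to_matrix([[1, 2], [3, 4]])
B = to_matrix([[0, 1], [1, 0]])  # a swap matrix; it is its own inverse
print(right_divide(A, B) == to_matrix([[2, 1], [4, 3]]))  # True: columns swapped
print(left_divide(A, B) == to_matrix([[3, 4], [1, 2]]))   # True: rows swapped
```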

> I think I will cut off at this point, I don't wish to go into details on
> individual functions that I desire until I get some kind of sense back
> from the group as to whether they like or despise any of this. To my
> mind the only point of writing another language is really to develop a
> better syntax or function set because it is primarily those areas which
> are the point of aggravation to most people.

You'll probably be interested in the design of the mathematical
library, especially in ensuring its generality.  Identifying common
operations is useful.

As you've probably read, we're looking at some sort of
syntax-independent language.  Hence you could write your own syntax like
this and be largely independent of those who don't make the same syntax
tradeoffs as you.  There's a fair amount of work to be done in this
area, though.  I imagine you'll be interested in designing special
syntactic support for the underlying mathematical functions, along the
lines of what you've detailed in this message.

Have you looked at functional languages before?  If not, definitely do
so; there's a lot in them that you'd find interesting.

-- 
     Matthew Tuck - Software Developer & All-Round Nice Guy
             mailto:matty@box.net.au (ICQ #8125618)
       Check out the Ultra programming language project!
              http://www.box.net.au/~matty/ultra/