Programming project

Matthew Tuck matty@box.net.au
Mon, 24 Aug 1998 20:35:59 +0930


Tanton Gibbs wrote:

>    A strongly typed language is a must.  For scripting languages like
> Perl and JavaScript, weak types are ok and beneficial, but for a heavy
> duty programming language, strong types help prevent many errors.

Certainly.  I personally can't see any point in weak typing. 
Conversions just aren't common enough to warrant the errors implicit
conversions lead to.  And the related feature, implicit declaration, is
also horrible to use.

>    Another problem with many programming languages(including Perl) is
> that they try to please everybody.  I'm more in favor of a RISC like
> programming language that provides a complete yet minimal set of
> instructions.  Don't get me wrong.  I include += -= and so forth as
> complete as well as for, while, do, etc...  The minimal part comes in
> where there is 5 ways to write a do while loop, etc...  You don't need
> repeat-until if you have do-while.

A language can please everybody all it likes.  Features not many people
will use can be put in the library.  A large library is a great asset
for a language - it standardises interfaces to things that need to be
written anyway, preventing several incompatible versions of the same
library from being written, improving compatibility and reducing
programming time.

As for your "repeat-until" example, I'll have to strongly disagree. 
Syntactic sugar (or shorthand forms) is extremely important in
programming languages, and the complexity it adds to them is vastly
overrated.  Parsing is a well understood aspect of compiler
construction, and parsers are easy to write (or better yet, generate).

Where a shorthand is a simple mapping onto another language feature,
implementing it in a compiler is simple.  There is also a distinct
difference between RISC and programming languages.

With RISC, the computer is executing instructions all the time - the CPU
must be as fast as possible.  With assembly language programming close
enough to dead, an instruction set that is tougher to program by hand
matters little - the semantic gap is bridged by the compiler.

With high-level programming languages, you are providing a language to
the programmer, not the compiler.  You want to speed up the programmer. 
You might lose a little time in compiling, but the programmer saves
more than that in writing the program.  Syntactic sugar is more
readable, writable, understandable and maintainable.

Pre-tested, mid-tested and post-tested loops may all be examples of an
infinite loop with an exit statement in the body, or one may be
implementable in terms of another, but that doesn't mean we should treat
them as such.

Let's look at the options:

(a) Dump the post-tested loop, define in terms of the pre-tested loop

To have no post-tested loop, we would need to replace:

do
  A
loop while condition

with:

A
while condition
  A
loop

or something similar.

This horrible contraption that Pascal programmers are subjected to
results in the repetition of A, which could be any number of statements.
This means there are two copies of something which must be kept
consistent.  Or we could procedurise A to keep consistency, even though
a procedure may be otherwise totally unnecessary.  So we have exploded
code size and reduced readability, writability and maintainability ...
all because some compiler or language person didn't want to implement
it?  That's what compilers are there for.

(b) Same, but keep a variable to see if this is the first iteration

Replace:

do
  A
loop while condition

with:

done := false
while condition or not done
  A
  done := true
loop

Speaks for itself, I think.  This is similar to the proposal to replace
exit statements with boolean variables.  Being an unnatural way to do
things, it suffers the same disadvantages as above.  And it would take a
pretty good compiler to optimise this back to what it should have been
in the first place, whereas at least with (a) the optimisation could
reasonably be achieved.

(c) Dump the pre-tested loop, define in terms of the post-tested loop

Replace:

while condition
  A
loop

with:

do
  exit when not condition
  A
loop while condition

Instead of duplicating the code section, we've duplicated the condition
(once negated at the top, once at the bottom), and must keep the two
consistent.  Remember A could potentially be quite large, so the two
conditions could be well separated.

(d) Define in terms of labels and gotos

A nice simple solution, wouldn't you say?

(e) Define all in terms of an infinite loop

Here we define a pre-tested loop as:

infinite loop
  exit when condition
  B
end

a mid-tested loop as:

infinite loop
  A  
  exit when condition
  B
end

and a post-tested loop as:

infinite loop
  A
  exit when condition
end

There is no repetition here, and it is reasonably compact.  But I don't
know of any language that has done this, and it's a bit harder to
understand.  When A is a huge block of statements, you look at the head
of the loop and can't tell: is it post-tested, mid-tested, infinite or a
more complex control structure?
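
To make the comparison concrete, here's a rough sketch in C++ (purely
illustrative - the pseudocode above isn't from any real language) of
all three loop shapes written as an infinite loop with a break, which
is essentially option (e):

#include <iostream>

int main() {
    int i;

    // Pre-tested loop: the exit test comes before the body.
    i = 0;
    for (;;) {
        if (!(i < 3)) break;              // exit when the condition fails
        std::cout << "pre " << i << "\n";
        ++i;
    }

    // Mid-tested loop: some work, the exit test, then more work.
    i = 0;
    for (;;) {
        std::cout << "mid, before test: " << i << "\n";
        if (!(i < 3)) break;
        std::cout << "mid, after test: " << i << "\n";
        ++i;
    }

    // Post-tested loop: the body always runs at least once.
    i = 0;
    for (;;) {
        std::cout << "post " << i << "\n";
        ++i;
        if (!(i < 3)) break;
    }
    return 0;
}

The mid-tested version is the one the dedicated pre- and post-tested
forms can't express without duplication, which is the whole point.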

Structured programming zealots don't like exit for some reason.  They
forget that structured programming was invented to make programming
easier.  It's a tool, not a religion.  In some places exit is just the
best tool for the job, the one that makes the program easier to
understand.  Sure, we could use purely structured control structures,
but in some cases that makes the program harder to understand, which
defeats the purpose of structured programming.  If the tool doesn't work
you get another one, rather than trying to hammer in a screw.

Avoiding exit also takes more writing, which is another reason I'm
against it.  Shorthands should exist for common constructions.
Pre-tested and post-tested loops are both quite common, and the
shorthands save programming time.  As our languages get more high level,
they get more general, and sometimes the more general constructions are
more complex.  Shorthands are vital for getting this extra flexibility
while still keeping the old form as a shorter mapping for the cases used
most often.  Only the commonly used cases are candidates for shorthands,
of course.

For that matter, why include for when you have while?
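
In C and C++ the for loop is already very nearly such a shorthand.  A
hedged sketch of the usual equivalence (not exact in every corner -
continue and declaration scope differ - but close enough to show the
idea):

#include <cstdio>

int main() {
    // The shorthand form.
    for (int i = 0; i < 3; ++i) {
        std::printf("for   %d\n", i);
    }

    // The longhand form it abbreviates.
    {
        int i = 0;
        while (i < 3) {
            std::printf("while %d\n", i);
            ++i;
        }
    }
    return 0;
}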

Shorthands are there to make programming easier, but they shouldn't
make the language a lot harder to learn.  That's why the language is the
place for putting features almost everyone will use, whereas the library
is the place for less common features.  Of course shorthands do make the
language a little harder to learn, but since each is a simple mapping, I
will gladly accept the little bit of learning in exchange for making
things a lot easier later on.

The simplification of a programming language is a good principle, but
it is just one vector in the tradeoffs of programming language design.
Sure, some languages don't apply it enough, but these are mainly
legacy-shackled programming languages.  I look at Ada 95 and C++ in
particular here since they're the two I'm most familiar with.

Even in Ada 83 there are three different places where types can be
parameterised: generic packages, arrays and discriminated records.  The
three have totally different dynamics, follow different rules and are
useful for different things, rather than being subsumed into one feature
that could define all three.  Note the distinct difference here though -
rather than being shorthands, these are three totally different language
features with non-trivial mappings onto each other, if any exist at all.

>    Functional programing languages( like LISP ) were designed to provide
> a uniform approach to programing( just use Parenthesis!! ).  I feel that
> this was a good idea, but taken to extremes.  I think a better idea is
> that if constructs are similar, they should look similar, if different,
> then they should look different.  In that way, the looping statements of
> C++ should look more like

I haven't had as much exposure to LISP as I'd have liked.  Basically I
know that it was a very early high-level programming language, which
used bracketing for everything rather than precedence rules.  From
memory, operators were prefix rather than infix.  And it started as a
pure functional language whose later dialects became impure.

>   loop( int x = 0; x < y; x++ )  // this is a for loop
>   ...
>   endLoop;
> 
>   loop( x < y )  // this is a while loop
>   ...
>   endLoop;
> 
>   loop  // this is a do while loop
>   ...
>   endLoop( x < y );
> 
>   In this way, the looping constructs look similar and there is less for
> the programmer to have to remember.  I am not advocating this for our
> programming language, but instead just pointing out the merger of
> functional ideas with procedural ones.

I'm not sure how this relates to functional programming.  Anyway, "same
should look same and different should look different" is certainly a
good principle, which should be adhered to in the absence of other
factors.

As for integrating functional and procedural (or object oriented) ideas,
I was really referring to two things.

Firstly, the use of higher order functions to the degree found in
functional languages (both higher order functions and procedures), which
is largely possible at the language level but for some reason is not
done at the library level (how many languages ship a map function or
procedure in their standard library?).
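
Nothing stops an imperative language from offering this in its library
today.  Here's a rough C++ sketch of a map routine over an array, using
a plain procedural parameter; map_in_place and increment are invented
names, not anything standard:

#include <cstdio>

// Apply "modifier" to every element of "items", in place.
void map_in_place(int* items, int count, int (*modifier)(int)) {
    for (int i = 0; i < count; ++i) {
        items[i] = modifier(items[i]);
    }
}

int increment(int a) { return a + 1; }

int main() {
    int values[] = {1, 2, 3};
    map_in_place(values, 3, increment);
    for (int i = 0; i < 3; ++i) {
        std::printf("%d\n", values[i]);   // prints 2, 3, 4
    }
    return 0;
}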

Secondly, the introduction of pure (deterministic and referentially
transparent) functions AND procedures into languages.  Non-pure
submodules can stay, but they would have fewer rights - for example, you
couldn't use non-pure functions in, say, a := b + c.  This makes
programs easier to understand, and is a good help to the compiler in
optimisation.  I believe some languages have tried this sort of thing,
but I can't name a specific one at this stage.
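
To see why impure functions sit badly in an expression like a := b + c,
here's a small C++ illustration (the unspecified evaluation order of
operands in C++ is standing in for the general problem):

#include <cstdio>

int counter = 0;

int impure() { return ++counter; }        // depends on hidden state
int pure_square(int x) { return x * x; }  // depends only on its argument

int main() {
    // The operands of + may be evaluated in either order, so "a" can
    // legitimately be 1 + (2*2) = 5 or 2 + (1*2) = 4.
    int a = impure() + impure() * 2;

    // With pure functions the order can never matter: always 9 + 16*2 = 41.
    int b = pure_square(3) + pure_square(4) * 2;

    std::printf("a=%d b=%d\n", a, b);
    return 0;
}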

>> I despise contrived complexity. Programming languages should make simple
>> tasks simple, and more complicated tasks more complicated.
> 
> I don't believe this is quite true.  I think that more complicated tasks
> should be just as simple as simple tasks.  However, that can only be
> achieved by using libraries that other people have written.  The
> integration of libraries and user code must be an important and well
> designed feature of this language.

I think you're just talking about the same thing in two different ways
here.  The simple things should be simple because they are commonly used
operations.  The more complex things are probably either buildable from
the simpler things and hence don't warrant inclusion in the language
proper, or take some writing to implement.  If it's not used a lot,
don't provide a statement for it.  Of course useful algorithms and
classes can be put in a library.

I'm not sure what you mean by "integration of libraries and user code"
here.

> (b) Language wars (which is best) can tend to become heated, but I think
> they should be at least an issue on this list.  Which languages you know
> do you like and dislike, any why?
> 
>   I don't like Java because it forces you to program in one paradigm.
> Even if it is an object oriented one, the notion that I "have" to do
> things a particular way upsets me.  I'm a big fan of C and C++
> especially.  I think that there are a number of things that could be
> improved, but for the most part, I enjoy the languages.

But you just said above that you thought a language shouldn't try to
appeal to everyone.  What is a paradigm, if not an appeal to the writing
styles of different people (to the group of people for whom it would be
the best paradigm)?

Object oriented programming is popular because it subsumes imperative
programming.  That's not to say the procedural point of view would never
lead to a shorter program in particular cases.  But an object oriented
programming language can wipe most of that difference out with
shorthands.

Firstly, the class is a superset of the familiar imperative concept of
the "package" - that is, some collection of variables, procedures and
functions, also often called a module.  Some people try to deny this -
they advocate keeping the concepts separate, which is harmful.  A
package is a "singular object": a class which only ever has one object.
It should be NOTHING else - possibly just a shorthand for such a class,
or possibly not even that.  The mechanisms for accessing it should be
the same as for a class.
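
As a rough C++ sketch of what "a package is a singular object" means in
practice (names invented for illustration), the module below is nothing
more than a class of which exactly one object exists, and clients access
it exactly as they would any other object:

#include <cstdio>

// The "package": a small collection of state plus operations on it.
class Counters {
public:
    Counters() : count_(0) {}
    void bump()        { ++count_; }
    int  total() const { return count_; }
private:
    int count_;
};

// The single object that makes the class behave as a package/module.
Counters counters;

int main() {
    counters.bump();
    counters.bump();
    std::printf("%d\n", counters.total());   // accessed like any object
    return 0;
}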

Secondly, there is the concept of "free procedures", which C has.
These are code blocks that are not in any class.  Whether this should be
allowed is not something I want to discuss much here - but it is an
important language design decision.  Originally, methods living inside
classes was a simple enough idea, but if you can have classes at the
method level, as designers are realising more and more is useful, why
can't you have methods at the class level?  The one thing that must be
stressed - and which C++ does not do - is that this MUST be done in an
object oriented way.  Let me explain.

There is a serious problem in C++ with procedural parameters.  Most of
the time they work fine.  But in some cases they break down.

Assume you have some sort of collection (list, array, tree, whatever) of
integers, and an arbitrary map over them.

modifier: type function(a: integer) -> integer
map: procedure (coll: collection, mod: modifier) := ?

Now map basically goes and passes each member through the modifier
function and replaces it with the output value, in some arbitrary
order.  The issue is with the mod function, and there are several
possibilities.

Firstly, you could have a simple function which fits into the map
procedure just nicely, say an increment function.

Secondly, suppose there is no increment function, just an add function,
which doesn't fit in.  You can define a thunk:

increment: function(a: integer) -> integer := add(a, 1)

So far so good.  Notice how we had a static value of 1 to add.  This can
be compiled into the increment function just fine.

Thirdly, that value (1 in the previous case) might not be statically
determinable.  This is where we start running into trouble.  Say the
value we want to add to each member of the list is entered by the user
or obtained from elsewhere in the program or some such.

What to do?  We can't write:

increment: function(a: integer) -> integer := add(a, b)

where b is this value - this reads b at the call time of increment,
rather than at the create time of the function, which is what we want
(to create a new function and treat it as a first-class citizen).  If we
can be sure b won't change, it is OK.

But if not, we could write:

c: integer
increment: function(a: integer) -> integer := add(a, c)

and copy b into c at the create time.  Remember that there would be one
c for all the increment functions, so if we wanted to define several
different add functions which would coexist, we would have to write:

c, d: integer
add_c: function(a: integer) -> integer := add(a, c)
add_d: function(a: integer) -> integer := add(a, d)

Fourthly, we might want to have an unknown number of these functions. 
You're out of luck here - no values can be attached to functions.
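
Here's that dead end rendered as C++, re-using the map_in_place sketch
from earlier (an invented helper, not anything standard): every distinct
attached value needs its own global variable and its own wrapper
function, so an unknown number of them can't be minted at run time.

#include <cstdio>

void map_in_place(int* items, int count, int (*modifier)(int)) {
    for (int i = 0; i < count; ++i) items[i] = modifier(items[i]);
}

int add(int a, int b) { return a + b; }

// One global variable and one wrapper per value we want to attach...
int c = 0;
int d = 0;
int add_c(int a) { return add(a, c); }
int add_d(int a) { return add(a, d); }

int main() {
    int values[] = {1, 2, 3};

    c = 10;                            // copied in at "create time"
    map_in_place(values, 3, add_c);    // adds 10 to each element

    d = 100;
    map_in_place(values, 3, add_d);    // adds 100 to each element

    std::printf("%d %d %d\n", values[0], values[1], values[2]);  // 111 112 113
    return 0;
}

If how many attached values we need is only known at run time, there is
no way to stamp out new wrapper functions, so the approach runs out.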

To be fair, as far as I can determine, there is a way of doing it in
imperative languages: simply add a pointer to the attached data in the
implementation of procedural parameters.  In languages with nested
submodules there is already a pointer to the environment, and this could
be chained onto that, but in C there is no such thing.

The object oriented solution to this is much easier.  Every procedural
parameter becomes a class with one method.  Now, if you want to attach
data, just extend the class!  You can have as many objects as you want,
with different data values and different lifetimes.
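
A rough C++ sketch of that object oriented solution (names invented):
the procedural parameter becomes an abstract class with one method,
attaching data is just adding a field in a subclass, and any number of
differently-parameterised modifiers can coexist:

#include <cstdio>

// The "procedural parameter" as a class with a single method.
class Modifier {
public:
    virtual ~Modifier() {}
    virtual int apply(int a) const = 0;
};

// Extend the class to attach data.
class Add : public Modifier {
public:
    explicit Add(int amount) : amount_(amount) {}
    int apply(int a) const { return a + amount_; }
private:
    int amount_;                       // the attached value
};

void map_in_place(int* items, int count, const Modifier& mod) {
    for (int i = 0; i < count; ++i) items[i] = mod.apply(items[i]);
}

int main() {
    int values[] = {1, 2, 3};
    int user_value = 42;               // pretend this came from the user

    map_in_place(values, 3, Add(1));           // an "increment" modifier
    map_in_place(values, 3, Add(user_value));  // as many as we like

    std::printf("%d %d %d\n", values[0], values[1], values[2]);  // 44 45 46
    return 0;
}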

There are two objections that could be levelled at this.

Firstly, that the object oriented way has a lot of syntactic overhead.
Well, our friend the shorthand comes in here.  Notice how we have
provided a more general facility that increases code size?  Well, define
"free methods" as a shorthand for a class definition, and you get the
smaller code size back, plus retain the ability to extend the class if
necessary.

Secondly, that the implementation of objects is more expensive than
procedural parameters.  But an optimiser should be able to fix this.  If
you don't need to attach data, the optimiser could implement it as a
normal procedural pointer.  As a bonus, any classes you wrote as normal
classes might be optimised in the same way.

I have personally been confounded by this problem while programming (in
C++), and it is only rarely encountered in procedural and object
oriented programming because higher order submodules - where the problem
arises - are a general wasteland there.  This is accepted functionality
in functional languages.  I forget what this problem is called - I think
I read about it somewhere after I discovered it for myself.

>   As for string handling, I enjoy the way C does string handling( much
> to the chagrin of the other members :)  The problem with built in string
> handling is that it is too restrictive.  Lets look at two alternatives
>   String s;
> First, what does this do?  Is it like Pascal and take up 256 characters,
> or is it more like the string class in C++ and dynamic.  If it takes up
> 256 characters and the string is "HI", then there is a severe memory
> loss, enough to make the language not practical for embedded systems and
> many other memory critical areas.  On the other hand, dynamic allocation
> is much slower and will require a lot of compiler overhead and is often
> Machine dependent.  Furthermore, which memory allocation scheme should
> we use since most schemes don't work for everyone.  Either way it is a
> mess.  The C++ alternative provides a clean way to create a string that
> can be extended by classes to provide a nice solution.

You're tickling my personal language design beliefs today.  =)  There
are various kinds of strings - fixed size, bounded (maximum) size and
unbounded size.  Ada 95 manages to provide all three in a library. 
You're concerned here with the efficiency of the string implementation.

Programming languages have for too long confused the notions of
interface and implementation.  Why should a class have only one
implementation?  What languages should do is provide an interface for
strings, and several implementations, whereupon the programmer simply
chooses the best one.

The implementation for this feature is already here - inheritance and
abstract classes.  But the split shouldn't have to be hand-written at
the language level.  Why?  Among other things, programmers wouldn't
bother, restricting their classes to a single implementation.

This is one of the features I'd say languages are crying out for -
sorely needed.  It has been suggested at various times but I don't know
any languages that do it.
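
A rough C++ sketch of the idea (all names invented): one abstract
string interface, with a bounded and an unbounded implementation behind
it, so the caller picks the representation while the interface stays
put:

#include <cstdio>
#include <cstring>
#include <string>

// The interface the client programs against.
class AbstractString {
public:
    virtual ~AbstractString() {}
    virtual int  length() const = 0;
    virtual char at(int i) const = 0;
};

// Bounded implementation: fixed maximum size, no heap allocation.
class BoundedString : public AbstractString {
public:
    explicit BoundedString(const char* s) {
        std::strncpy(data_, s, sizeof(data_) - 1);
        data_[sizeof(data_) - 1] = '\0';
    }
    int  length() const { return static_cast<int>(std::strlen(data_)); }
    char at(int i) const { return data_[i]; }
private:
    char data_[32];
};

// Unbounded implementation: grows as needed, pays for allocation.
class DynamicString : public AbstractString {
public:
    explicit DynamicString(const char* s) : data_(s) {}
    int  length() const { return static_cast<int>(data_.size()); }
    char at(int i) const { return data_[i]; }
private:
    std::string data_;
};

// Client code sees only the interface.
void print(const AbstractString& s) {
    for (int i = 0; i < s.length(); ++i) std::printf("%c", s.at(i));
    std::printf("\n");
}

int main() {
    print(BoundedString("HI"));        // small fixed buffer
    print(DynamicString("HI"));        // heap-backed, unbounded
    return 0;
}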

>   C++ does not need the C preprocessor.  An import statement similar to
> Java would be nice.

Certainly.  Every single facility of the preprocessor can be done better
in other ways.

>   Languages need to have a full array of features to make them complete,
> including multiple inheritance.  A feature should not be taken out of a
> programming language just because one use could misuse it, we should
> create a language that is easy to use, yet not idiot proof.  We should
> expect competence from our programmers.

True, but we are human, and we should still try to protect against
error.  That's what strong typing is for, after all.  Some form of
multiple inheritance, definitely - and not Java interfaces, either.

>> It's a case in point in what happens when you take a language and modify
>> it over and over again - it turns out a mess, through the necessity to
>> be backwards compatible with previous versions.
> 
> I agree with this totally, and that is what is wrong with many things,
> not just programming languages( Win95 for instance ).

Code-wise this is not necessary for operating systems - they can shunt
legacy stuff off into subsystems.  But the efficiency of old programs is
a concern (though only for one or two versions, unlike languages, which
are hamstrung forever), as is the need to change the UI only slowly.

> As far as the back end goes, I believe we should create a back end for
> the JVM as well as C.  Both are good choices and even Java can be
> compiled to Machine code if you know the target platform.  Therefore, if
> we do all our optimizations on an independent intermediate code level we
> could easily write to both JVM and C at the users discretion.

Yes, but I don't believe the JVM is a good level to optimise at, at
least from what I've heard.  I think the best is an AST, which can be
converted as necessary.  Basically, all that matters is that it is at
the source code level, but with all redundancy (forward declarations),
unnecessary detail (full identifier names where no reflection is used,
keywords, import statements, etc.) and shorthands removed.  Then we
output to the JVM as necessary.
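
A toy C++ sketch of what such an AST-level form might look like (an
invented example, not any real compiler's representation): the tree
keeps the source structure, drops purely syntactic detail, and can then
be handed to whichever back end you like:

#include <cstdio>
#include <string>

// A toy AST node for expressions - the kind of source-level,
// redundancy-free form that could be optimised once and then emitted
// as either C/C++ or JVM bytecode by separate back ends.
struct Expr {
    enum Kind { Var, Const, Add } kind;
    std::string name;      // used when kind == Var
    int value;             // used when kind == Const
    Expr* lhs;             // used when kind == Add
    Expr* rhs;

    explicit Expr(const std::string& n)
        : kind(Var), name(n), value(0), lhs(0), rhs(0) {}
    explicit Expr(int v) : kind(Const), value(v), lhs(0), rhs(0) {}
    Expr(Expr* l, Expr* r) : kind(Add), value(0), lhs(l), rhs(r) {}
};

// One of possibly several back ends: print the expression as C source.
void emit_c(const Expr* e) {
    switch (e->kind) {
    case Expr::Var:   std::printf("%s", e->name.c_str()); break;
    case Expr::Const: std::printf("%d", e->value); break;
    case Expr::Add:   emit_c(e->lhs); std::printf(" + "); emit_c(e->rhs); break;
    }
}

int main() {
    // The tree for the assignment "a := b + 1".
    Expr b("b");
    Expr one(1);
    Expr sum(&b, &one);

    std::printf("a = ");
    emit_c(&sum);
    std::printf(";\n");                // prints: a = b + 1;
    return 0;
}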

> One problem with compiling to C is that it doesn't have enough features.
> We would have to extend the C language to allow for things like
> exception handling for example( which caused the extinction of the
> original C-Front C++ compiler ).

Well, C++ has exceptions, so we could compile to C++ instead.  We might
even be able to map our inheritance onto its inheritance.
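
For example, here's a rough sketch of the C++ a back end might emit for
a raise/handle construct in the source language (everything here is
invented for illustration):

#include <cstdio>
#include <stdexcept>

// What the generated code might look like for:  raise Error("no input")
void read_input(bool have_input) {
    if (!have_input) {
        throw std::runtime_error("no input");
    }
}

int main() {
    // ...and for a handler:  handle ... when Error => recover
    try {
        read_input(false);
    } catch (const std::runtime_error& e) {
        std::printf("recovered: %s\n", e.what());
    }
    return 0;
}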

> Now for Basic.  One member wrote that the 1.5 M DLL for a program is
> unsatisfactory, and for one program I agree.  However, any program
> created in VB uses only one DLL, the same one!  So all programs are in
> effect reduced by 1.5 Megs.  Delphi on the other hand can produce
> programs that don't require the DLL, but they are larger.  It is a
> quantity-space tradeoff.  If you are developing 100 apps, VB would
> probably be better.  One app, you should go with Delphi.

Static vs. dynamic linking is an issue I hadn't thought about, but this
stuff should usually be in the OS in the first place.  Not that we can
use this excuse in our project.  Allowing both might be an option.

-- 
     Matthew Tuck - Software Developer & All-Round Nice Guy
                              ***
       Check out the Ultra programming language project!
              http://www.box.net.au/~matty/ultra/