Programming project

Matthew Tuck matty@box.net.au
Sat, 29 Aug 1998 22:12:54 +0930


Are these messages getting longer and longer or is it just me?

Tanton Gibbs wrote:

> Right here...I think you misunderstood my example.  First, I agree
> with the need for a prefix loop, postfix loop, etc...  I just said
> that you did not need two nearly identical ways to do a postfix
> loop.  For example, Perl allows both
> do{ ... } while( ... ) and repeat{ ... } until( ... )

So here we're getting back to making like constructs look alike?  Does
Perl allow do until and repeat while?

I like the way Visual Basic does this.  That is, call it do/loop, and
allow attaching a condition to either end, which can be while or until,
for a total of four possibilities.  This is more orthogonal than the
usual while/do or repeat/until pairing, and the four forms look more
alike.  They're also slightly more readable, since you can write "while
not" instead of "until".
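
For anyone who hasn't seen the VB forms, here is a rough sketch of the
four combinations rendered as their C++ equivalents (the loop bounds are
arbitrary, just to make it runnable):

    #include <iostream>

    int main() {
        int i = 0;

        // VB "Do While i < 3 ... Loop": pre-test, loop while condition holds.
        while (i < 3) { ++i; }

        // VB "Do Until i >= 6 ... Loop": pre-test, loop until condition holds.
        while (!(i >= 6)) { ++i; }

        // VB "Do ... Loop While i < 9": post-test, body runs at least once.
        do { ++i; } while (i < 9);

        // VB "Do ... Loop Until i >= 12": post-test until.
        do { ++i; } while (!(i >= 12));

        std::cout << i << "\n";   // prints 12
        return 0;
    }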

> These are both so similar that it will be confusing to many
> programmers to see one when they are used to seeing the other.

To be fair, I don't think a programmer would get used to using one or
the other in this example - both forms are common across programming
languages.

> Another example is
> the Perl statements
> if( x ){ exit }
> and
> exit if( x );

Ada has the same thing, although it looks to me from your code that Perl
uses open form construction, while Ada uses closed form construction. 
That is to say,

    Open               Closed

  if X then          if X then
   Y;                 Y;
                     end if;

  if X then          if X then
   begin              Y;
    Y;                Z;
    Z;               end if;
   end;

Now for some reason most languages tend to use open form.  I suspect it
is mainly because it makes the top-left form possible, which is more
appealing than the top-right.  But on the second row the opposite is
true, and I'd say the second row is the more common case.

Closed form has many advantages over open form - it prevents adding a
statement to the top-left form in the belief that it will come under the
scope of the if, and it prevents the dangling else problem, which causes
syntactic ambiguity.
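
Both problems are easy to demonstrate in C++, which uses open form; a
quick sketch (function names and messages are just illustrative):

    #include <iostream>

    void open_form(int a, int b) {
        // Open form: grammatically ambiguous.  C++ resolves it by
        // binding the else to the NEAREST if, which can surprise a
        // reader who indented it as below.
        if (a > 0)
            if (b > 0)
                std::cout << "a and b positive\n";
        else                    // actually pairs with "if (b > 0)"!
            std::cout << "looks paired with the outer if, but isn't\n";
    }

    void closed_form(int a, int b) {
        // The braced ("closed") equivalent has no ambiguity, and a
        // statement added inside the braces is unambiguously in scope.
        if (a > 0) {
            if (b > 0) {
                std::cout << "a and b positive\n";
            }
        } else {
            std::cout << "unambiguously the outer else\n";
        }
    }

    int main() {
        open_form(1, -1);    // prints the "looks paired" line: the else
                             // ran because b <= 0, not because a <= 0
        closed_form(-1, 0);
        return 0;
    }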

In closed form languages, the exit if shorthand is a nice way of getting
the conciseness of both the top left and bottom right.  In an open form
language there's really no point however, since there is no syntactic
advantage.

My real qualm with the way this is done in Ada (and in Perl?) is that
the "exit if" form is only for the exit statement.  WHY?  I know burying
the exit statement in control logic makes a program hard to understand,
but why not just be orthogonal and allow it on all simple statements?

If this were done you'd get the advantages I discussed above on all
simple statements, eliminating the "end if" syntactic junk in many
places throughout the program.  It is a pretty common occurrence.  Add
it to procedure calls, exception raises, assignments, the whole lot.

Now you might be thinking this is all well and good, but that the two
forms shouldn't look so different.  Well, you could choose something
that looked more alike, but obviously there has to be some difference
between the two, or the compiler wouldn't know whether the second
statement it parsed was under the scope of the if.

You might not want to make them look too similar though, otherwise we're
back to the same problem of a programmer accidentally adding a statement
after the if, thinking it comes under its scope.

The reasoning behind making like statements look alike and unlike
statements look different is, as far as I can determine, to make the
language easier to learn.  It doesn't really matter how grotty the
syntax is once you've learned it - I think COBOL is evidence enough of
that.

In this case "exit if" is immediately understandable, so the problem
doesn't occur, except for the detail of the statement scope of the if,
which is still significant I have to admit.

Or is the problem that they look too much alike (both use "if") and that
they could be confused?  This could be argued too, in which case use
"when" instead for the reverse form, like Ada does.

A more revolutionary scheme would use indenting for this sort of thing -
Miranda (a functional language) uses indenting for parsing decisions.  I
wouldn't recommend this, but I do have a few ideas on it that I'll share
when they're more fully formed.

Perhaps it might be worthwhile to consider that English has both "if A
then B" and "B if A".  Although English is not the height of linguistic
clarity, it shows that both forms are useful.

> These are completely unnecessary, they do the same thing in two
> different ways when one would suffice.

It's hard to quantify this sort of thing.  Most of the time one
construct will suffice, but even when one would suffice, when is it
still better to include two?

> This is what I mean by a RISC programming language.  I agree with
> providing the programmer a large amount of freedom, but the freedom
> must not be such that it interferes with the ability to
> recognize common structures.

Recognising common structures is important in readability, but I think
"B if A" is pretty readable.  If this form was used for all simple
statements, it would probably also be more easily recognisable since it
is more common.

> >Structured programming zealots don't like exit for some reason.  They
> >forget that structured programming was invented to make programming
> >easier.  It's a tool, not a religion.  In some places exit is just the
> >best tool for the job,
> 
> I could extend this to the OO paradigm.  OO is a tool, not a religion
> and does not need to be forced on the programmer.  Structured design has
> been around for decades and it is known by almost every programmer.
> While OOAD and OOPS are just getting off the ground, SAD in structured
> languages is fairly formalized and many people are more comfortable in
> that language.  This, however, does not conflict with my minimal
> beliefs, because only allowing one paradigm does not make a programming
> language complete.

I'm not quite sure where you're coming from here.  While the pure
functional and logic paradigms are incomplete due to their lack of side
effects and temporal consideration (unless viewing the entire computer's
state, I/O streams included, as one parameter to be continuously passed
into the same function turns you on), both object-oriented and
imperative languages are quite complete given adequate primitives.  My
understanding, at least, of that oft-touted buzzword "pure object
oriented programming" is the lack of free methods (whose benefit I'm now
beginning to question).

> Both the structured and the OO paradigm have to be included for the
> language to be complete.  However, a language can be complete by
> having just a do while instead of a do while and a repeat until.

Maybe you have a different definition of pure OOP to me?

> >For that matter why include for when you have while?

> I know this is a rhetorical question, but for is justified, if only
> slightly because of the ability it gives the programmer to understand
> the code easier.  By glancing at the code, someone who is not familiar
> with the code can understand more about the code by seeing a for than if
> all fors were implemented with a while loop.  This does not conflict
> with the do/while and repeat/until example because you cannot glean any
> more information from looking at the words do while than you could
> repeat until.

Seeing repeat does tell you there's a condition at the bottom, but then
so does a "do" without a condition at the top in a VB do/loop.

> Just pointing out that functional programming was designed so that ALL
> facets of programming would look alike. My idea above was that this was
> not quite what should happen and that a more orthogonal approach should
> be implemented.

Well, orthogonality is probably my favourite concept, but I think we
might be getting lost in syntactic issues here, when the semantic
equivalent of "more than one way to do something" is more insidious, if
rarer.  By this I mean pairs of constructs where neither can be
categorised as more general than the other (if one could, a mapping
between them would exist).  Ada 95's data aggregation and
parameterisation features are what I'd call an example of this, but I'd
have to hunt around for some of my notes to elaborate.

> For example, to include a library in C++ you must #include the header
> and also add it to the make file.  I think that a more user friendly
> system should be implemented once again much like Java's import
> statement.  This way, functions that are not used do not get compiled
> and headers cannot be included more than once (thus eliminating the
> need for the #ifndef and #define).

Certainly.  C is the only language I'm familiar with that does this; all
the others have some sort of import.
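
For reference, the C/C++ boilerplate being complained about looks
something like this (the header name is hypothetical):

    // list_utils.h -- a hypothetical header, protected by the usual
    // include guard so a second #include of it expands to nothing.
    #ifndef LIST_UTILS_H
    #define LIST_UTILS_H

    int increment(int a);   // declarations only; the definitions live
                            // in a .c file named again in the makefile

    #endif  // LIST_UTILS_H

In Java the equivalent is a single import line, and the compiler locates
the compiled class itself, so neither the guard nor the makefile entry
is needed.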

I'm not sure how this prevents unused submodules from being compiled in,
though.  When you use import, it is invariably to import a whole raft of
submodules, classes or whatever (using * in Java); otherwise there would
be way too many imports.  Hence it is still the job of a good
inter-class optimiser, in either scheme, to crack down on unused
submodules.

> >>   I don't like Java because it forces you to program in one paradigm.
> >> Even if it is an object oriented one, the notion that I "have" to do
> >> things a particular way upsets me.  I'm a big fan of C and C++
> >> especially.  I think that there are a number of things that could be
> >> improved, but for the most part, I enjoy the languages.

I've talked about free methods, but are there any other ways in which
you think OOP restricts you?

> >Thirdly, that value (1 in the previous case), might not be statically
> >determinable.  This is where we start running into trouble.  Say the
> >value we want to add to each member on the list is entered by the user
> >or obtained from elsewhere in the program or some such.
> >
> >What to do?  We can't write:
> >
> >increment: procedure(a: integer) -> integer := add(a, b)
> 
> If you want to increment the node by a user defined value, then I would
> either do one of two things, either input the value in the function, or
> use a global variable (perhaps in a C++ namespace).  I think a true
> class is overkill in this situation.

The first option misses the point.  The map function is meant to be a
reusable component whose implementation can be hidden.  It should not
care that the implementation of an add_number function requires another
number not in its interface.  The writer of map can't write a map for
each possibility.  Nor should the writer have to allow one pass-through
parameter which is a record, and let the client use data aggregation if
necessary.  That would mean doing so for every reusable submodule
written, since you don't know in advance whether the client will want
it!  And of course the client shouldn't be able to define extra maps at
will, since the class/package might not want to allow access to the
submodules necessary to implement them.

The second option does not solve the problem of having multiple
instances of the add_number function, adding different numbers, where
the number of instances is unbounded or large.
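
Here's a minimal sketch of the point in C++ terms, assuming a map over a
list of integers (all the names are mine, not from any library): the
functor plays the role of the "true class", carrying the user-supplied
number in a field, so map's interface never needs to mention it:

    #include <iostream>
    #include <list>

    // A reusable map: applies any function-like object to every element.
    // It knows nothing about what state f carries around.
    template <typename F>
    void map_in_place(std::list<int>& xs, F f) {
        for (int& x : xs) x = f(x);
    }

    // The "class" standing in for add_number: the amount to add is
    // captured as a field, not exposed in map's interface.
    struct AddNumber {
        int b;                                   // e.g. read from the user
        int operator()(int a) const { return a + b; }
    };

    int main() {
        std::list<int> xs = {1, 2, 3};
        int user_value = 10;                     // pretend this was input
        map_in_place(xs, AddNumber{user_value});
        map_in_place(xs, AddNumber{5});          // a second, different instance
        for (int x : xs) std::cout << x << ' ';  // prints 16 17 18
        std::cout << '\n';
        return 0;
    }

An unbounded number of differently-parameterised instances falls out for
free, which is exactly what the global-variable option cannot give you.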

"I think a true class is overkill in this situation" reflects you are
worried about one of two things.

(a) The syntactic inefficiency of class definition.  You can use the
free method shorthand.
(b) The inefficiency of class implementation.  All of this can be
optimised out if unnecessary, and should already be happening in modern
OOP compilers!  Not just in this situation, but all situations where
classes are being used.  Pity it's not to my knowledge.

I know namespaces are a fairly recent C++ feature, but I'm not familiar
with them yet.  Would you mind briefly explaining them?  Are they any
different from singular objects?

> I think the problem is trying to simplify the function to a "thunk"-like
> state.  You should define a more complex procedure that will either
> input your value each time, or calculate it, or whatever.  The only time
> that you would want multiple values at the same time is with parallel
> processing and that is a different ball game entirely.

You generally use thunks in one of these cases:

(a) Call by name is used.  Say no more.
(b) You're hooking onto a legacy (sub)system.
(c) You've designed good reusable components.

(c) is the case here.  It is not a problem that I already have a
procedure so close to what I want that little code is required - it is a
boon.  If it is inefficient then, again, it can be optimised: inlined,
in this case.

I admit my most general case might not be exactly commonplace, but it
does not come up only with parallel processing.  It is simply a
necessary condition for treating functions as first-class objects.

> >Well, define "free methods" as a shorthand for a class definition,
> >and you can get the smaller code size, plus retain the ability to
> >extend the class if necessary.
> 
> I find this hard to swallow.  BETA tried something similar to this and
> died a horrible death in a sea of syntactic troubles.  Allowing users to
> "extend" "free methods" sounds like trouble to me.

I'm aware BETA has something like that; reading about it inspired me to
a degree.  I don't see any major problem with it.  I'd certainly be
interested in hearing about the syntactic troubles, and whether they're
your own opinion or a commonly held one.

> >You're tickling my personal language design beliefs today.  =)  There
> >are various kinds of strings - fixed size, bounded (maximum) size and
> >unbounded size.  Ada 95 manages to provide all three in a library.
> >You're concerned here with the efficiency of string implementation
> >
> >Programming languages have for too long confused the notions of
> >interface and implementation.  Why do you have a class which has one
> >implementation?  What languages should do is provide an interface for
> >strings, and several implementations, whereupon the programmer simply
> >chooses the best one.
> 
> I don't believe that I have ever seen this proposal for strings.  The
> reason being is such:  How do you do the underlying implementation of
> the string interface?  I would guess through a vector of characters or
> char* thereby we are back to the C way...my personal favorite :)

Actually, no.

The string interface doesn't have an implementation.  The interface is
implemented as an abstract class and the implementations as concrete
classes, but they are not shown that way to the programmer.  There are
two ways of viewing this sort of scheme: the one I think you're
referring to here, having a privileged representation B through which
implementation conversions go A->B->C, or going directly A->C.  The
direct route is what the "Builder" design pattern gives you.

If anyone is not familiar with design patterns, I'd suggest they
immediately go out and read "Design Patterns: Elements of Reusable
Object-Oriented Software" by Gamma et al.  It is the most enlightening
book I've ever read.  While the knowledge within is not necessarily
groundbreaking, the compendium of it certainly is, and I think anyone
would take a lot away from it.

The basic gist of the Builder pattern, as used here, is that a class has
enough interface that it can create a copy of itself even as a different
implementation.  Now, this certainly could be inefficient.  But bear
this in mind ...

You can always copy within the same implementation if you don't want the
expensive cross-implementation copy (the same-implementation copy can be
efficient).  So it's up to you.  Multiple implementations are there to
improve efficiency, so if they don't, don't use them.  Where conversions
are rare, and implementation B is better in one part of the program
while implementation D is better in another, it is great!

There would be a way to make a copy under the same implementation, and a
way to make one under a different implementation, and you could use the
same implementation without knowing which implementation it was.  Let me
say this in no uncertain terms: VARIABLES CAN ALWAYS TAKE AN OBJECT OF
ANY IMPLEMENTATION.  The only time the implementation is specified is at
creation time.
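
A minimal C++ sketch of all of this, with a deliberately toy string
interface (every name here is hypothetical): the interface exposes just
enough - length and character access - for any implementation to rebuild
itself from any other, any IString variable accepts any implementation,
and an implementation is named only at creation time:

    #include <iostream>
    #include <string>
    #include <vector>

    // The abstract class the programmer sees.  No implementation of its
    // own, but enough interface for Builder-style copying.
    struct IString {
        virtual ~IString() {}
        virtual int length() const = 0;
        virtual char at(int i) const = 0;
    };

    // Implementation 1: contiguous characters (fast indexing).
    struct ArrayString : IString {
        std::vector<char> chars;
        explicit ArrayString(const char* s)
            : chars(s, s + std::string(s).size()) {}
        explicit ArrayString(const IString& other) {   // impl-to-impl copy,
            for (int i = 0; i < other.length(); ++i)   // via the interface
                chars.push_back(other.at(i));
        }
        int length() const override { return (int)chars.size(); }
        char at(int i) const override { return chars[i]; }
    };

    // Implementation 2: a crude chunked representation (cheap concat).
    struct ChunkString : IString {
        std::vector<std::string> chunks;
        explicit ChunkString(const IString& other) {
            std::string buf;
            for (int i = 0; i < other.length(); ++i) buf += other.at(i);
            chunks.push_back(buf);
        }
        int length() const override {
            int n = 0;
            for (const std::string& c : chunks) n += (int)c.size();
            return n;
        }
        char at(int i) const override {
            for (const std::string& c : chunks) {
                if (i < (int)c.size()) return c[i];
                i -= (int)c.size();
            }
            return '\0';
        }
    };

    int main() {
        ArrayString a("hello");
        ChunkString b(a);       // conversion goes directly impl to impl,
        ArrayString c(b);       // with no privileged intermediate form
        std::cout << c.at(1) << "\n";   // prints 'e'
        return 0;
    }

Note that a variable of type IString& or IString* accepts either
implementation; only the creation expressions name one.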

Furthermore, if someone doesn't like your implementation, they are going
to write another one anyway.  Let's support them.  Give them an
incentive to stick to a common interface.  Ensure that we can convert
between the implementations.  Right now the chances of that are spotty,
and what about converting between many representations written by
different people?  Fat chance.

Strings are ordered collections of characters (note: not arrays of
characters), despite what some languages might have you believe.  As
such they are open to multiple representations, and converting between
them is not much of a chore.  To be fair, this scheme gets more
worthwhile the more complex the class, and strings are not majorly
complex.  But even the humble date would be a great candidate - look at
the raft of date implementations around the place.

This is not too attractive with integers, but it can be done in the
interface using binary decomposition and composition.  The same
arguments about expensive conversion apply as they do above.  Even the
humble boolean, the smallest piece of data possible, can be converted
using set_true and set_false methods.
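
To make the integer case concrete, a sketch of decomposition and
composition through the interface (the bit width is fixed at 32 purely
for the sketch; a boolean version would use get/set_true/set_false the
same way):

    #include <iostream>

    // Enough interface to decompose an integer into bits and rebuild it,
    // so two implementations can convert without sharing internals.
    struct IInt {
        virtual ~IInt() {}
        virtual bool bit(int i) const = 0;         // decomposition
        virtual void set_bit(int i, bool b) = 0;   // composition
        void copy_from(const IInt& other) {        // purely via interface
            for (int i = 0; i < 32; ++i) set_bit(i, other.bit(i));
        }
    };

    // One concrete implementation: a plain machine word.
    struct WordInt : IInt {
        unsigned int w = 0;
        bool bit(int i) const override { return (w >> i) & 1u; }
        void set_bit(int i, bool b) override {
            if (b) w |= (1u << i); else w &= ~(1u << i);
        }
    };

    int main() {
        WordInt a; a.w = 42;
        WordInt b; b.copy_from(a);   // conversion never touches internals
        std::cout << b.w << "\n";    // prints 42
        return 0;
    }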

So the intermediate implementation is never necessary, and no, we don't
need to give vectors a special place in the language or even the
compiler.

Except for one case: programmers aren't going to be happy if they have
to specify an implementation each time they allocate an object.
Implementations are there purely as an efficiency measure; an object
under one implementation has no externally discernible difference from
the same object under another, excepting execution time and space.  So
we shouldn't require the programmer to specify an implementation unless
they want to.  What programmers do know about an implementation is
roughly how it works and which operations it does quickly, but not the
details, and they still can't make use of internal information.

So, if they don't, which do we pick?  Well, we could make it arbitrary
(up to the compiler).  But given that one implementation is likely to be
more efficient in more circumstances than the others, we might want to
pick that one by default.

We couldn't let the implementations themselves say this - what if none
claimed to be the default, or more than one did?  In a system where you
don't know how many implementations will sit alongside yours (and it
will always be this way if you're writing reusable components) you can't
know what to do.

The alternative is to place this information in the class itself.  Now
I'll admit that naming a preferred implementation in a class (which,
remember, is abstract), before the implementation is even declared, is
not perfect, but it seems to me to be the best idea.

If the preferred implementation does not exist we emit a warning and
pick an arbitrary one, and if the class does not specify a preferred
impl (and it should be optional) then again we pick arbitrarily.
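
A sketch of that selection rule, with a runtime registry standing in for
whatever compile-time mechanism would really be used (all names are
hypothetical):

    #include <functional>
    #include <iostream>
    #include <map>
    #include <memory>
    #include <string>

    struct IString { virtual ~IString() {} };
    struct ArrayString : IString {};
    struct ChunkString : IString {};

    using Factory = std::function<std::unique_ptr<IString>()>;

    // The implementations available; in a real compiler this knowledge
    // would be static, not a runtime map.
    std::map<std::string, Factory> impls = {
        {"array", [] { return std::unique_ptr<IString>(new ArrayString); }},
        {"chunk", [] { return std::unique_ptr<IString>(new ChunkString); }},
    };

    // The class names its preferred impl; the impls themselves don't vote.
    const std::string preferred = "array";

    std::unique_ptr<IString> make_string() {
        auto it = impls.find(preferred);
        if (it == impls.end()) {
            std::cerr << "warning: preferred impl missing, picking arbitrarily\n";
            it = impls.begin();        // arbitrary fallback
        }
        return it->second();
    }

    int main() {
        auto s = make_string();        // the client never named an impl
        std::cout << "created under the preferred implementation\n";
        return 0;
    }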

Now, you might be concerned that the ability to switch implementations
this easily could change the semantics of the program.  Theoretically
the semantics of two implementations should be the same, but bugs occur,
so changing impls could be a problem.  But the same problem exists with
changing impls when you must also change the interface; there you could
create even more bugs, so generally it's not done.

You might try to perform some sort of equivalence analysis on
implementations.  I suspect this amounts to the program equivalence
problem, which is unsolvable in general, but something might be better
than nothing.  Something for the compiler people to look at, maybe.

I should reiterate a point which might have been lost in the above
discussion.  Since implementations have the same semantics, they have NO
externally realisable difference.  THIS INCLUDES I/O REPRESENTATION.
You can have multiple implementations internally, but on I/O the
situation is a whole new ball game, since there the representation is
externally realisable.

Java tried to solve these problems by standardising the implementations
of basic types, which is an improvement on the C scheme, as it solved
various conversion problems.  The multi-implementation scheme allows
several implementations while retaining those advantages.

Furthermore, the example of the Builder pattern in Gamma et al. is a set
of word-processor document classes in a document converter, for example
converting between Word and Word Pro.  These are not two implementations
of one document class!  They are two separate classes, since each has a
different representation on I/O.

Don't get me wrong, there's nothing wrong with doing that in a program;
it shows the Builder pattern is applicable to more than just multiple
implementations.  The authors simply have a different definition of
"implementation" than I do.

An example of multi-impls here would be one document class with an I/O
dump method for each document type, allowing different implementations
which each support dumping every document type.  This doesn't solve the
document type conversion problem they are referring to, however, but
that is not a multi-impl problem.

Another example would be keeping the class structure at one class per
document type, having each class support dumping just itself, but
allowing multi-impls of each document type.  If each document type class
had a parent class with the conversion interface, it could be used both
for converting between document types and for converting between
implementations of a document type.

Other examples of this include integers.  Java requires two's
complement, fixed size, etc.  Unnecessary.  Just make sure the class has
I/O methods to write and read two's complement integers (and even ones
for other representations).  It doesn't matter what it is internally.

Characters too.  ASCII or EBCDIC? (ignore Unicode for this example)
It doesn't matter - just support I/O for both ASCII and EBCDIC.  As an
added bonus, you can read data in either encoding regardless of which is
native on your platform.
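
Sketching the character case, with the conversion table truncated to
three letters purely for illustration:

    #include <iostream>

    // The character class supports I/O in BOTH encodings regardless of
    // what it stores internally; here the internal form happens to be
    // ASCII, but a client can't tell.
    struct Char {
        unsigned char internal;   // stored as ASCII in this impl

        unsigned char write_ascii() const { return internal; }

        unsigned char write_ebcdic() const {
            switch (internal) {          // tiny table: 'A'..'C' only
                case 'A': return 0xC1;
                case 'B': return 0xC2;
                case 'C': return 0xC3;
                default:  return 0x3F;   // EBCDIC substitute character
            }
        }

        static Char read_ebcdic(unsigned char e) {
            switch (e) {
                case 0xC1: return Char{'A'};
                case 0xC2: return Char{'B'};
                case 0xC3: return Char{'C'};
                default:   return Char{'?'};
            }
        }
    };

    int main() {
        Char c = Char::read_ebcdic(0xC2);       // reading foreign data
        std::cout << c.write_ascii() << "\n";   // prints B
        return 0;
    }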

The disadvantage of this, of course, is that you can't have a dump
method in the parent class for every possible representation.  If this
starts becoming a problem, there is an alternative: lay bare the
implementation of the class which you would previously have hidden, by
putting it in the class.  Then it is externally realisable, and you
basically get back to the old way of doing things (as an escape hatch)
without this problem.

This isn't a new idea, although I think I thought of it before I heard
of it (which tends to make me a big supporter of it =) ).

Phew.

> >True, but we are human, and we should still try to protect against
> >error.
> 
> We should try to protect against error, but not at the cost of
> functionality.

This is a reason we shouldn't object to multi-impls merely on the
grounds that changing an implementation might have program-wide effects
if it is in error.

> >That's what strong typing is, after all.
> 
> Strong typing is not in the same boat because weak typing does not
> increase functionality in a way that it could not be done before.
> However, MI is hard to implement effectively and easily if it is not
> included in the language.  Strong typing can use typecasts and functions
> to easily emulate weak typing.

Yep.  I think some language designers give up on MI because we have no
good solution to the problems of repeated inheritance and shadowed
methods.  From what I've seen, there could be as many systems of MI as
there are languages that support it.
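
For the curious, the repeated inheritance problem in C++ terms: in the
diamond below, D would contain two copies of A (and d.x would be
ambiguous) if the "virtual" keyword were removed:

    #include <iostream>

    struct A { int x = 0; };

    // Without "virtual", B and C would each embed their own A,
    // and d.x below would be ambiguous (B's copy or C's copy?).
    struct B : virtual A {};
    struct C : virtual A {};

    // Repeated inheritance: A is reachable through both B and C.
    struct D : B, C {};

    int main() {
        D d;
        d.x = 42;            // unambiguous only because A is virtual
        std::cout << d.x << "\n";
        return 0;
    }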

> >Yes, but I don't believe JVM is a good level for optimising at, at
> >least from what I've heard.
> 
> I don't know, I haven't looked at it, but I wouldn't imagine optimizing
> at the compiled level anyway, naturally an intermediate language should
> be formed, either a ternary or quaternary stack language to easily
> optimize similar statements and then an AST to handle data flow and
> larger structures.

Yes, but I think to a degree you have to optimise at all levels
(peephole opti on the low-level code).  The only problem is if a
low-level opti reveals another opti that could only be done at the
higher level.  For this reason the higher-level optis should be done at
the lower level as well if possible, though they might not be as easy to
do there.
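
To pin down what peephole optimisation on low-level code means, a toy
pass over a quad-like instruction list (the instruction set is made up):

    #include <iostream>
    #include <string>
    #include <vector>

    struct Quad { std::string op, a, b, dst; };

    // One classic peephole rule: x := y + 0  ==>  x := y (a plain move).
    void peephole(std::vector<Quad>& code) {
        for (Quad& q : code) {
            if (q.op == "add" && q.b == "0") {
                q.op = "mov";
                q.b.clear();
            }
        }
    }

    int main() {
        std::vector<Quad> code = {
            {"add", "t1", "0", "t2"},   // becomes: mov t1 -> t2
            {"mul", "t2", "t3", "t4"},
        };
        peephole(code);
        for (const Quad& q : code)
            std::cout << q.op << " " << q.a << " " << q.b
                      << " -> " << q.dst << "\n";
        return 0;
    }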

Having two representations of our program might make this a problem.
Or, instead of processing them in sequence, you could maintain them in
unison.  That would be interesting!

You might have realised I'm in favour of optimisation at almost any
cost.  I figure the oft-touted tradeoff of run time vs. compile time is
not really as significant as some people would have you believe.  When
developing you're not too concerned with run-time efficiency, but you
compile a lot, so you're concerned with compile-time efficiency.  When
compiling your final build, on the other hand, you're not too concerned
with compile-time efficiency, since the final build is important, but
you are extremely concerned with run-time efficiency.  So different
optimisation options are necessary at different times!  A 2% gain in run
time for a 1000% increase in compile time IS worthwhile for a final
build (wherein we'd bring in some of the rarer optimisations).

This of course means that profiling could potentially be misleading - it
must always be done with the final-build options on.  Unfortunately
that's costly, so we might have to turn the full opti options on
earlier, near the end of coding rather than waiting for testing.

I'm not too familiar with the triple/quad representations.  I know what
they are, but how do they compare to ASTs in terms of the ease of
discovering and performing optimisations?

-- 
     Matthew Tuck - Software Developer & All-Round Nice Guy
                              ***
       Check out the Ultra programming language project!
              http://www.box.net.au/~matty/ultra/