The Burning Questions

Matthew Tuck matty@box.net.au
Tue, 17 Nov 1998 21:06:11 +1030


Tanton Gibbs wrote:

>> What will be the target platforms:
> I'm afraid to say the JVM because of much of the legal battles/copyright
> up in the air quality of Java not to mention that Java is still not the
> fastest language to be executed, although it is gaining on compiled
> programs.  I think we should output either to C or to Unix.  The reason not
> Win32 is that because most people who do development and would need an
> experimental compiler do so at a university that usually uses Unix or VMS.
> Win32 is not used a lot for experimental research on language/compiler
> design

As far as the JVM is concerned, are you referring to the MS/Sun battle?
I don't really see how that will affect us too much; we can still output
to the lowest common denominator, or preferably the Sun JVM.  A decision
is due imminently, I've heard, anyhow.

As for not being the fastest language, I find that irrelevant at this
stage.  The trick is to get Ultra working first, then fast later.  If
it's high level, there's going to be plenty of optimisation to do, and
that won't all be in place overnight.  And by the time we get this done,
what do you reckon will have happened to JVMs?

Even if not, outputting to JVM is a quick way to get cross platform
capability.  That will attract more members, who want to work on other
back-ends, and help us get it done quicker.

The C on Unix idea is definitely good, due to the reasons you give. 
Win32 is the platform of choice for application development, but not
experimentation.  The first couple of years of the project will be
experimentation.  That leaves plenty of time for implementing a Win32
back-end later.  Especially if we have JVM capability available for
Win32 users.

But on the other hand, just because Unix is the platform for
experimentation, don't get the idea that that's the only place we should
look.  I want members from EVERYWHERE.  You could think of Win32 as a
largely untapped pool of open source developers.

>> What will be the uncompiled source distribution format?
> I don't quite understand what you are asking here.

I was referring to what format the source code is in.  Traditionally
that's text, but I have advocated parse trees here.

>> What will be the compiled module distribution format?
> See above

What format would a compiled module be distributed in?  Traditionally
you have something like .o under Unix, .obj under DOS or .class under
Java.  But the low level nature of these prevents optimisations.  We
should NOT go down this road in my opinion.

>> What tools shall we write?
> As few as possible.  We should rely on previously written tools like Flex
> and Bison if at all possible.

I was thinking more along the lines of programmer tools, rather than
tools for supporting our development.  It seems pretty obvious to me
that the answers would be as many as possible for the former, and only
as many as we need for the latter.  But what tools?

Most compiler generator tools are based on a particular language, e.g.
flex/bison on C, ANTLR on Java.  We might have problems if we wanted to
bootstrap our compiler, leading to us wanting to write our own.  Since
flex/bison is an open source project (I think), I don't think they'd
mind us writing an Ultra-specific implementation though, to save work.

>> Do we support static library binding, dynamic library binding, or both?
> We have to support both if this is going to be something that people are
> going to use.

I agree.  Depends on the target platform of course.

>> What optimisations can we do?
> I think we'll have to know more about the structure of the language and
> intermediate code before we start categorizing these.

This was deliberately vague to get specifics, maybe it didn't work.  =)

>> What will be the general structure of the compiler?
>   Boy, this one is vague :)  I'm sure the standard front-end -> back-end
> both using symbol-table will be the general structure.

OK, same here, but what do you think of what I have said on intelligent
editing?

>> What facilities should be in the library?  Bear in mind that since we
>> are an open source project, we will let anybody include the code in
>> their compiler.  Hence, one reason against large libraries (the need to
>> write them) is gone.

>   We have to have a standard string class, I/O classes, OS classes,
> Container classes, memory management if not garbage collected, general
> search and sort, and math...that, naturally, is minimal.

What would you say about the size of Java?  Most people think big
libraries are good.  There are various concerns you can have against
really big ones.

For example, that they slow you down if you don't use them.  We counter
this with optimisation.

That they are hard to learn entirely.  Well, if you spend the time
learning them, you'll develop quicker.

That you're writing implementations in ways which aren't useful to
everyone.  I've talked about multiple implementations before here, and
they can counter this.

That if they aren't designed properly they are harmful.  True, but I
have argued here for an experimental non-legacy-shackled language.  We
shouldn't be concerned with breaking existing programs, provided we make
it obvious to all users that that's what we will do.

Furthermore, intelligent editing can help here.  If you make some
changes to libraries you can make them automatically reflected in the
clients (e.g. the renaming of a method).  This could be done by giving
methods ids rather than names, and the client code stores the id.  The
editor merely shows the current name of that method.
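
To make the id idea concrete, here is a minimal sketch in Python.  All
of the names (Library, Call, show) are made up for illustration and say
nothing about how Ultra would actually store things; the only point is
that the client node holds an id, so a rename touches the library alone.

```python
# Hypothetical sketch: every name here is illustrative, not part of Ultra.
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Library:
    """Maps stable method ids to their current display names."""
    names: dict = field(default_factory=dict)
    _next_id: count = field(default_factory=count, repr=False)

    def define(self, name):
        # Hand out a fresh id; the name is only a label attached to it.
        method_id = next(self._next_id)
        self.names[method_id] = name
        return method_id

    def rename(self, method_id, new_name):
        # Only the label changes; the id that clients store stays the same.
        self.names[method_id] = new_name


@dataclass
class Call:
    """A call node in client code: stores the method id, never the name."""
    method_id: int
    args: tuple = ()


def show(call, library):
    """What an editor might display for a call node."""
    args = ", ".join(repr(a) for a in call.args)
    return f"{library.names[call.method_id]}({args})"


lib = Library()
length_id = lib.define("length")
client_call = Call(length_id, ("abc",))

before = show(client_call, lib)   # displayed under the old name
lib.rename(length_id, "size")     # the library author renames the method
after = show(client_call, lib)    # same client node, new name displayed
```

The client node is never rewritten; only the table the editor consults
changes.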

>> How much and what syntactic sugars should we use?
> Enough, but not too much...how's that for a vague answer!

Well, what do you think about this idea?  We start by separating the
syntax from the rest of the language.  That is, Ultra is about what to
do with a parse tree (the source format I was talking about).  The Ultra
editor allows "plug-in" editors, which can be absolutely anything. 
Non-textual, non-English grammar, whatever.  We can improve the "syntax"
part without breaking programs.
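
As a rough illustration of separating syntax from the tree, the sketch
below renders one parse tree through two interchangeable "views".  The
tuple-based node layout and both view functions are assumptions of mine
for the example, not a proposed Ultra format.

```python
# Hypothetical sketch: node shapes and view names are assumptions.
# A tiny parse tree as nested tuples: (node_kind, *children).
tree = ("assign", "total", ("add", ("var", "total"), ("num", 1)))


def pascal_view(node):
    """Render the tree in a Pascal-like infix syntax."""
    kind = node[0]
    if kind == "assign":
        return f"{node[1]} := {pascal_view(node[2])}"
    if kind == "add":
        return f"{pascal_view(node[1])} + {pascal_view(node[2])}"
    if kind in ("var", "num"):
        return str(node[1])
    raise ValueError(f"unknown node kind: {kind}")


def prefix_view(node):
    """Render the same tree in a Lisp-like prefix syntax."""
    kind = node[0]
    if kind == "assign":
        return f"(set {node[1]} {prefix_view(node[2])})"
    if kind == "add":
        return f"(+ {prefix_view(node[1])} {prefix_view(node[2])})"
    if kind in ("var", "num"):
        return str(node[1])
    raise ValueError(f"unknown node kind: {kind}")


# "Plug-in" views: adding or improving one never changes the stored tree.
views = {"pascal": pascal_view, "prefix": prefix_view}
rendered = {name: view(tree) for name, view in views.items()}
```

Both renderings come from the identical stored tree, so swapping or
fixing a view cannot break an existing program.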

Now, why can't we let programmers define their own shorthands?  I know
syntax extension was looked at at some stage in language design and put
in the "too hard" basket, but perhaps we can revisit it now in the light
of intelligent editing.

Anyway, implementation.  You wouldn't translate a typed in shorthand to
the parse tree, as next time you opened the program it'd be back to the
full form.  Not quite as useful as possible.  Translation of shorthands
would occur later in compilation.

So the alternative is to allow defining our own parse tree extensions. 
These would be stored with the parse tree.  If the program is
distributed to another programmer, they get the shorthands used in the
file, which allows them to be expanded out to the full form where a
programmer doesn't want that shorthand.  They shouldn't increase the
file size too much, in fact, they might provide a degree of natural
compression.
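
Here is one way the "definitions travel with the tree" idea might look,
as a sketch only.  The file layout, the "$" placeholder convention and
the "while" shorthand are all assumptions made up for the example.

```python
# Hypothetical sketch: the file layout and "$" placeholders are assumptions.

def substitute(template, bindings):
    """Fill the $placeholders of a full-form template with real subtrees."""
    if isinstance(template, str) and template.startswith("$"):
        return bindings[template]
    if isinstance(template, tuple):
        return tuple(substitute(part, bindings) for part in template)
    return template


def expand(node, shorthands):
    """Rewrite every shorthand node in the tree into its full form."""
    if not isinstance(node, tuple):
        return node
    kind, *children = node
    children = [expand(child, shorthands) for child in children]
    if kind in shorthands:
        params, template = shorthands[kind]
        return substitute(template, dict(zip(params, children)))
    return (kind, *children)


# The distributed file carries its shorthand definitions with the tree.
source_file = {
    "shorthands": {
        # while(cond, body) stands for: loop { if not cond: break }; body
        "while": (("$cond", "$body"),
                  ("loop", ("if", ("not", "$cond"), ("break",)), "$body")),
    },
    "tree": ("while", ("lt", "i", 10), ("call", "step")),
}

# A recipient who doesn't want the shorthand can view the full form.
full_form = expand(source_file["tree"], source_file["shorthands"])
```

Since the definition is stored once and each use is a small node, the
shorthand form is also the more compact one.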

So, you get the shorthands you want, while other programmers don't have
to learn them.  This pretty much negates the argument against them I
believe.

But what if you get code from someone else?  They won't use your
shorthands.  In groups of programmers, the shorthand idea could become
useless unless they agree to standardise them.  Not necessarily so.  You
can set your editor to scan for known shorthands and convert them.  Most
shorthands would be patterns in the parse tree, some might be more
complicated.
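
For the simple case where a shorthand is just a pattern in the parse
tree, the scan could be sketched like this.  The tuple node layout, the
"$" placeholders and the "while" shorthand are assumptions for the
example; more complicated shorthands would need more than this.

```python
# Hypothetical sketch: node layout and "$" placeholders are assumptions.

def match(pattern, node, bindings):
    """Return True if node fits pattern, recording $placeholders."""
    if isinstance(pattern, str) and pattern.startswith("$"):
        if pattern in bindings:
            # A repeated placeholder must match the same subtree again.
            return bindings[pattern] == node
        bindings[pattern] = node
        return True
    if isinstance(pattern, tuple):
        return (isinstance(node, tuple) and len(node) == len(pattern)
                and all(match(p, n, bindings) for p, n in zip(pattern, node)))
    return pattern == node


def collapse(node, shorthands):
    """Scan bottom-up, replacing full forms with the shorthands I know."""
    if not isinstance(node, tuple):
        return node
    node = tuple(collapse(child, shorthands) for child in node)
    for name, (params, pattern) in shorthands.items():
        bindings = {}
        if match(pattern, node, bindings):
            return (name, *(bindings[p] for p in params))
    return node


# My own shorthand: while(cond, body) for a loop with a leading exit test.
my_shorthands = {
    "while": (("$cond", "$body"),
              ("loop", ("if", ("not", "$cond"), ("break",)), "$body")),
}

# Code received from someone who writes the full form everywhere.
received = ("seq",
            ("loop", ("if", ("not", ("lt", "i", 10)), ("break",)),
             ("call", "step")),
            ("call", "done"))

converted = collapse(received, my_shorthands)
```

Nodes that match no known pattern are left exactly as they were.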

Say you're a database programmer.  You could define an embedded SQL
statement.  Or any number of possible shorthands for control constructs
(pre-tested loop, post-tested loop etc.) or library calls.  Don't want
that shorthand anymore?  Delete it.  Want a new one?  Add it and scan
the code for instances of it.  I believe it CAN be done.

>> What sort of mechanisms will be used with code trees, and how will they
>> be regulated?
> We need to know more about what features will be intrinsic in the trees
> and which will be broken down into smaller sub-features.

OK, but what tools will be used?  I've heard of a lot of projects using
CVS.  I think that costs money though, and we would still need a
computer to host it.

>> Will there be any standard distribution?
> There should be a standard distribution in order to control "feature
> creep".  We should have control over any new feature that makes its way into
> our compiler.

Given that it will be open source, I figure having a standard
distribution won't really be a problem.  Anyone is free to set up an
alternative distribution.

>> Will there be voting on issues as to the direction of the project?  If
>> so, will they be binding on the standard distribution, and who can vote?
>   Voting members should have the final decision on the standard distribution
> and should be a group of N picked from the project members where N >= 10.

This doesn't really say much about who is a "project member" and how the
voting is done.  Do you know how other projects are organised?

>> What license should we place our code under?
> I don't like the GNU license because it is often too prohibitive about
> paying for such and such and so forth.  I believe that we should postpone
> making any obligations until we have a final product and can evaluate its
> worth.

What do you mean by prohibitive about paying?  Maybe I've
misinterpreted, but we won't be charging people for it, and the source
code will be open and free.  I don't know what you'd describe as a
"final product" here.  I think it would be very ongoing, although I
guess we would have "clean compiles" for release.

Waiting for this would be a bad idea I think - as soon as the code is
public we have to say what rights people have and don't have.

-- 
     Matthew Tuck - Software Developer & All-Round Nice Guy
                              ***
       Check out the Ultra programming language project!
              http://www.box.net.au/~matty/ultra/