Steps

Matthew Tuck matty@box.net.au
Sat, 06 Mar 1999 23:10:38 +1030


Hans-Dieter Dreier wrote:

>> Well we would be using the Ultra GUI capability if we wrote it in Ultra.
> Looks like a lot of work.

Yes, but it's a way off, so we can evaluate it when we know how many
people we have.

> Yes. If an equivalent GUI existed, I would prefer that.

OK, we could wrap our language libraries around an existing library. 
The only problem with this is that if it is in C, we will need to use a
conservative collector at first.

>> Yeah it's going to be there, but should be able to be optimised out.
> Sure, it can be optimised out. But that requires extra effort, and the
> *source* still is cluttered with stuff that is not needed. IMO the problem
> is not so much that the resulting program could be big or slow, but rather
> that the development process is slowed down and the source is bloated. You
> see this very clearly in MS VC++: Because you need KLOCs of header files,
> they had to invent "precompiled headers" simply to reduce compile time to
> acceptable values.

They invented it because it allowed large software to be written.  You
could say the interfaces were not needed, but the fact is that some
software is that big purely due to inherent complexity.  If we structure
our ASTs well (maybe not as a purely physical tree), lookup should be
naturally quick like that of precompiled headers.  This way you get to
define rich interfaces which make programming faster, less prone to bugs
and more readable, without necessarily losing compile time.

Source bloat problems can be eased in various ways.  Such things as
collapsing a program tree or section of text, only seeing the
specification, etc.  A bigger problem as I see it is if you can't adapt
the library to what you want to do because it isn't flexible enough.

> This optimisation (inevitably) adds complexity, because now one has to
> watch out when to recompile this precompiled header (which can be as big
> as 5 MB, BTW).  This is no extreme example, but everyday practice.

If we make fast lookup a natural part of the representation you don't
need to recompile.  Part of the advantage of ditching text.
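
To sketch what I mean (in Python, and every name here is my invention,
not project code): if each scope node in the AST carries a table of its
declarations, interface lookup is a couple of hash probes, which is the
effect precompiled headers approximate.

```python
# Hypothetical sketch (names are mine, not project code): each scope node
# in the AST carries a dictionary of its declarations, so looking up an
# interface item is a couple of hash probes -- no header text to reparse.

class Scope:
    def __init__(self, parent=None):
        self.parent = parent      # enclosing scope, or None at the top
        self.decls = {}           # name -> declaration node

    def declare(self, name, node):
        self.decls[name] = node

    def lookup(self, name):
        # Walk outward through enclosing scopes; each step is one probe.
        scope = self
        while scope is not None:
            if name in scope.decls:
                return scope.decls[name]
            scope = scope.parent
        return None

library = Scope()
library.declare("PrintLine", "proc PrintLine(text)")
module = Scope(parent=library)
print(module.lookup("PrintLine"))   # found without re-reading any source
```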

> Well it's modular if you get only what you need, and do not have to take
> what you don't need. Example: Microsoft Foundation Classes. It's all in a
> few DLLs and you have the choice to either link it statically (then you
> can avoid a lot of unnecessary stuff, but by no means all, and there is
> no more sharing if you have multiple executables) or dynamically (then you
> get all but it may be shared if other applications currently in memory use
> the same version). Example: Netscape (can be viewed as a graphics library
> for HTML / Java): Is pretty big and slow, and you got to take it all.

Well you really haven't put anything specific here.  We all know the
static/dynamic link tradeoff, and other than that, what are some
examples of programming that would be modular or not?  What is included
that is useless?  Netscape is being rewritten now too, if you didn't
know.

>> They're the same thing.  If you have C (or Java) code you can make
>> executable code on all platforms.  That's the quickest way for us to do
>> that.  The point is we have a reasonably fast implementation to base
>> bootstrapping on.  It's not a permanent solution.
> Don't you underestimate the effort to produce C (or Java) source code?
> ...

No, it's not insignificant, but it's not impossible either.  Many
compilers are doing this nowadays.

> IMO to directly produce threaded code (as shown in my example) is so much
> less effort than to produce C (or Java) source. Maybe sometime later
> producing C may be a reasonable platform to produce new MCs directly from
> Ultra source rather than use the C compiler directly to produce new MCs
> (which is as portable as your proposal).

All I'm concerned about here is getting a portable and reasonably
efficient implementation working.  I don't really see another way to do
it.  For example, there are many UNIX OSs and many platforms they run
on.  We probably wouldn't have to produce C code directly - the GCC
interface will produce for UNIX and I think DOS/WIN too.  I'm not fluent
with the details of GCC however.

> Producing C is somewhat like a detour: You already got a representation
> at a level that might be an intermediate level of a C compiler, but
> re-convert it back to a higher level, only to get it compiled once more by
> the C compiler. Of course this cannot lead to good performance.

An AST will be higher level than the C.  If we go into GCC, I think it
would be pretty much the same efficiency since it's AST -> GCC interface
-> code.

> And, BTW, are you so sure that
> C(++) code really is so platform independent? My impression is that a lot
> of people get into trouble if they use a back-end C compiler other than
> the (few) ones specified to work. (Well that's just my impression).

Yes I know, but it has been done.  GCC is available for pretty much
everything and doesn't cost anything so we could require that.

> COM objects are self-contained entities which reside in dynamic libraries
> and/or ...

What would you say the advantage of COM or similar systems would be?

>> 1b Write simple AST VM.
> OK - will be finished before 1a because it is much simpler

Well probably, but interfacing with the library is much harder.  The VM
will have to have embedding support for I/O, etc.

>> 2a Write AST to C generator.
> I would delay that.

How would you delay it?  Start writing the editor in C?  Develop the
language more?

>> 3b Rewrite VM in Ultra.
> IMO not necessary because VM is so simple. Will automatically be done by
> your step 4b

I think we should keep the VM.  We need this both for when you want
quick compile times, and especially for debugging.  It doesn't
necessarily become redundant.

> Sure, so do I. But the bigger the code base is, the bigger the effort
> to port it from one language version to another.

Well automatic translators can do this.  Since they're ASTs it's a lot
easier than text.

> Or the bigger the temptation to stay
> compatible by all means, even if that means to use only the second best
> approach.

I will never do this.  Everyone can decide they want to work on whatever
they want, but I'll always be looking to improve.  That's my promise.

> IMO design decisions are easier if the code base remains as small as
> possible for as long as possible. Having a language that is sufficiently
> fully designed to allow writing a compiler in it requires a pretty large
> code base, however. If we start from an assembler (like object loader),
> the code base is significantly smaller.

Well, for a start, I will make it clear to the user base that the main
thread of Ultra puts being a good language ahead of being widely used.
"Being used" has been done.  If you want to use it, and the automatic
translation isn't working, you can create your own language based on
Ultra.  I won't mind.

> It boils down to the question: How big should the steps be that need to
> be completed before you get the next working version. Here I can only
> speak for myself: I like these steps to be as small as possible, even if
> there are more of them. It simply boosts motivation if you see that you
> get somewhere. If I have to write a lot of stuff and test it all at once
> to get my next version running, chances that I lose motivation are much
> bigger.

Yes, I agree.

>> 10 lines?  Are you serious?
> Perfectly serious. Look at the parser skeleton. You could almost take the
> same code as the "parser". VM (as I see it) consists just of a little loop
> where the next threaded code is fetched, checked whether it is NULL, and
> pushed (or executed). Everything else is done by the MCs called. Half the
> code is used for the "catch" clause that is used to break out of the
> execution loop (and handle errors).

OK, so you're essentially moving the code elsewhere.
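
For concreteness, here is roughly the loop as I understand your
description, sketched in Python (the names and the MC calling convention
are my assumptions, not your actual code):

```python
# Sketch of the threaded-code loop: fetch the next cell; NULL ends the
# stream; a callable (an MC) is executed against the stack; anything
# else is pushed as a literal.  The except clause plays the role of the
# "catch" that breaks out of the execution loop.

def run(code):
    stack = []
    pc = 0
    try:
        while True:
            cell = code[pc]
            pc += 1
            if cell is None:          # NULL terminates execution
                break
            if callable(cell):        # an MC: execute it on the stack
                cell(stack)
            else:                     # a literal: push it
                stack.append(cell)
    except IndexError:
        pass                          # ran off the end of the code
    return stack

def add(stack):                       # an example MC
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

print(run([2, 3, add, None]))         # -> [5]
```

Everything interesting really does live in the MCs; the loop itself
stays tiny.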

> At first we should use a text editor off the shelf. Writing our own only
> makes sense as soon as the object infrastructure is present (e.g. class
> layout, symbol tables and so on).

I agree.  What I was saying was that you seemed to want a minimal editor
at first and I wanted a minimal VM at first.

> It's commercial and available as a demo. You have to register to get it.

Will take a look.

>> Maybe I misread your message, but I was referring to your "object
>> assembly code".
> I'm not quite sure what the original question was here. So I'll try to
> ...

I went back and reread the thread and it seems I was operating on the
assumption we would have a compiler available.  So it's academic now
since I understand.

Why do you feel the implementation using a stack VM would be better than
an AST VM?  I feel we should get a VM up as fast as possible.  You would
have to design an entirely new utility and language with your proposal.

> The parser I sketched should already produce postfix code ready to be fed
> into VM. Producing an AST may actually be slightly more complicated. Both
> are equivalent, however, and easily translated into each other.

Well the compiler would be producing an AST.  I want to keep the middle
representation as an AST so we can keep the code if we change either
component.  Hence the AST VM is not difficult.
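
A tree-walking evaluator over that middle representation is only a few
lines itself; here is a hypothetical Python sketch (the node shapes are
invented for illustration):

```python
# Minimal sketch of the AST-walking VM idea (all names hypothetical):
# the compiler hands over a tree, and the VM evaluates it by recursive
# descent.  Postfix code is just this same tree flattened.

def eval_ast(node):
    if isinstance(node, tuple):                  # ("op", left, right)
        op, left, right = node
        a, b = eval_ast(left), eval_ast(right)
        if op == "+":
            return a + b
        if op == "*":
            return a * b
        raise ValueError("unknown operator: " + op)
    return node                                  # a leaf: a literal

# (1 + 2) * 4, straight from a parse tree, no postfix step needed:
tree = ("*", ("+", 1, 2), 4)
print(eval_ast(tree))                            # -> 12
```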

> Isn't there a lot of mutual dependencies: You need the compiler to test
> the VM, and you need the VM to test the compiler. You got to test a lot of
> stuff in parallel. I would prefer to be able to test them in little bits,
> function by function; this would require much less communication between
> development teams.

There is always going to be this.  My VM relies on the compiler, yours
on your assembler.  I would prefer a compiler since it would be easier
to program valid test cases for the VM.  Also I would have an AST dumper
to test the compiler.
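
Something like this is all I have in mind for the dumper (the output
format is invented, just a sketch):

```python
# Sketch of an "AST dumper" test aid (hypothetical format): print the
# tree with indentation so compiler output can be eyeballed or diffed.

def dump(node, indent=0):
    pad = "  " * indent
    if isinstance(node, tuple):
        lines = [pad + node[0]]                  # operator / node kind
        for child in node[1:]:
            lines.append(dump(child, indent + 1))
        return "\n".join(lines)
    return pad + repr(node)                      # a leaf

print(dump(("*", ("+", 1, 2), 4)))
# *
#   +
#     1
#     2
#   4
```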

>> Perhaps you're misinterpreting me here.  Parser generators generate
>> parsers, and hence are not a part of the compiler.  They're tools so
>> you can change the parser easily.
>
> The task the parser generator has in your approach is done by the syntax
> table generator in mine. In my approach the MCs making up the bits of the
> parser need not change often (hopefully) since they are general building
> blocks; it's the table generator's input that changes, leading to other
> object loader code. No C compilation or linking involved. The executable
> (MM, VM, MCs, ...) remains unchanged and can handle several language
> versions.

OK, they are essentially the same.  If you generate code fully you get
efficiency, while parameterised code is quicker to generate, and would
possibly support multiple languages better.  In the end there are
probably many variations between the two.
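
To illustrate the parameterised side (the grammar and the names are
invented - this is only a sketch of the shape, not a real parser): the
engine stays fixed, and only the table changes when the language does,
so no recompilation of the executable is needed.

```python
# Table-driven recognition sketch: the loop below never changes; a new
# language version is just a new table, supplied as data.

table = {
    # state -> { token_kind: (action, next_state) }
    "start":     {"number": ("shift", "after_num")},
    "after_num": {"plus": ("shift", "start"), "end": ("accept", None)},
}

def parse(tokens):
    state = "start"
    for kind in tokens:
        action, nxt = table[state].get(kind, ("error", None))
        if action == "error":
            return False
        if action == "accept":
            return True
        state = nxt
    return False

print(parse(["number", "plus", "number", "end"]))   # -> True
```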

> You replace the whole parser (by a newly generated one), which you got to
> C-compile and link into the runtime environment. IMO that is not as
> flexible - the changes that are needed are more fundamental, leading to
> another executable each time the language is changed.

OK, that's fair enough.

> Generally, I'm biased towards embedded languages. Using these always
> worked well for me. In contrast, I always had problems using wizards when
> I was not able to influence their meta-language (which of course would
> have turned them more towards an embedded language). Maybe this can be an
> explanation why I don't like parser generators and C code as intermediate
> representation.

What I think you're getting at here is being forced to use a wizard is
bad if it can't do what you want it to.  That's one of the advantages a
translational hierarchy would have, since the wizard translates to the
level below, and you can still program to that level if you want.

Normally if you didn't like the generated code you couldn't change it,
you could only see it, for obvious synchronisation reasons.  But you
could generate the code and then detach the generator, so there is no
longer a logical link.  Then you would be free to modify it.

> You seem to favour the code generator approach; at least more than I do.

Not at all.  I haven't had a large amount of experience with a code
generator.  I am just familiar with scanner and parser generators that
work this way.  Having a domain-specific language there should help the
situation, though - if it doesn't work in all the situations you want it
to, it should be improved.

An advantage of DSLs is taking the semantic challenge away - since
you're programming for a specific task rather than generally.
Frameworks like you would prefer do this too, however, so the only real
advantage is syntactical (or better, ASTical).

Ideally programs would provide a framework and the DSL would generate
code to go into that framework, allowing you to program at either level.

> A propos debugging: How do you show the correct location inside the Ultra
> source if there is an intermediate C code level? You got C code on one
> side and machine code at the other - how do you match locations in C code
> to machine code?
>
> If all goes through the C compiler, the debugger must be capable to
> handle machine code, which makes it platform dependent and forces it to
> deal with things like software interrupts and the machine stack. If you
> use the debugger that comes with the C compiler, then there is no
> integrated environment any more because that debugger is not Ultra-aware.

I think I remember some papers about the implementation of the SELF
language - they had an interesting way of handling debugging compiled
code.  They used the higher-level code where necessary, and compiled
code where possible, for speed.  So they got efficiency and the
flexibility to change things like methods at runtime.

Basically you would need some sort of cross-referencing between the
source and binary.  Debugging generally prohibits a large number of
optimisations, unfortunately.  But without optimisations mapping isn't
insurmountable.
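
The cross-reference could be as simple as a table recorded at code
generation time; a Python sketch with invented addresses:

```python
# Sketch of the cross-reference idea (the structure is my assumption):
# the code generator records, for each emitted instruction, the source
# line it came from; the debugger maps a halted address back to it.

import bisect

# (code_address, source_line) pairs, sorted by address, from codegen:
line_map = [(0, 1), (4, 2), (9, 2), (12, 5)]

def source_line_for(address):
    # Find the last map entry at or before the halted address.
    addresses = [a for a, _ in line_map]
    i = bisect.bisect_right(addresses, address) - 1
    return line_map[i][1] if i >= 0 else None

print(source_line_for(10))   # halted at address 10 -> source line 2
```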

> In contrast, using threaded code, it's easy (as long as you don't attempt
> to trace into a MC), since you are in control of the (few and simple)
> representations and translation stages involved. Which, BTW, is an
> argument in favour of small MCs performing just one function each.

You can pretty easily do this with any interpreter.  The disadvantage
is speed.

> I would call it "library help" because it is used to get information on
> libary items most often. It would show parameters, return values,
> constraints, comments and so on. If language features are objects, the
> mechanism can provide help on them too. Since the information is directly
> fetched from the program representation, it does not need an intermediate
> tool; it rather is a specialized view into source code and hence
> up-to-date all the time.

OK, so it's dumb.  Essentially you need good linking to help.  You could
place help around the AST.  Help for an identifier, help for a
statement, etc.  But would help be view-specific, language-specific,
both, what?

Also it would be handy here to have good navigation utilities, for
example, "Go To Declaration" on an identifier.
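
Sketching the idea in Python (names and format are mine): if
declaration nodes carry their signature and comment, both "library
help" and "Go To Declaration" reduce to following an identifier back to
its declaration node, so the help is read from the source and is
up-to-date by construction.

```python
# Hypothetical sketch: help text lives on the declaration node itself,
# so the "library help" view just fetches it from the program.

class Decl:
    def __init__(self, name, signature, comment):
        self.name, self.signature, self.comment = name, signature, comment

symbols = {d.name: d for d in [
    Decl("Open", "Open(path) -> File", "Opens path for reading."),
]}

def help_for(identifier):
    d = symbols.get(identifier)
    if d is None:
        return "no declaration found"
    return d.signature + "  -- " + d.comment

print(help_for("Open"))
```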

-- 
     Matthew Tuck - Software Developer & All-Round Nice Guy
             mailto:matty@box.net.au (ICQ #8125618)
       Check out the Ultra programming language project!
              http://www.box.net.au/~matty/ultra/