Steps

Matthew Tuck matty@box.net.au
Thu, 11 Mar 1999 20:13:14 +1030


Hans-Dieter Dreier wrote:

>> No it's not insignificant but it's not impossible either.  Many
>> compilers are doing this nowadays.
> I know. But even if everyone is doing it this way, it need not
> necessarily be optimal. Once upon a time they frowned at high level
> languages in general, because of poor performance.

I never said it was optimal - I said it was a quick cross-platform
solution.

>> An AST will be higher level than the C.  If we go into GCC, I think it
>> would be pretty much the same efficiency since it's AST -> GCC interface
>> -> code.
>  I don't use GCC, so I'm not familiar with it. Do you mean that you
> intend to produce info for some intermediate representation that GCC
> uses?

I haven't used GCC either, but that's my understanding of it, yes.
 
> They allow easy integration of components from different sources,
> written in different languages. Components may range from general and
> small to highly specialised and big. Don't get me wrong; I'm not
> saying we should use COM as a standard for Ultra; it was just meant to
> show the difference between a component approach and a monolith.

I never said we should - I was just interested.

> I may change my mind concerning the AST representation to be used for VM
> execution. At first I thought interpreting a true tree structure would
> have to be slower and more complicated than interpreting a flattened
> tree. After thinking it through, I'm not so sure anymore. I'll try to
> dream up some variants and discuss them in a separate posting.

It should be pretty simple.  For example, a compound block would need to
support stepping into a new namespace, and then would need to execute
each statement in turn.

An "if" is just a matter of evaluating the expression and then executing
a statement node.  If you did this using OOP the code would be fairly
simple in each object, although possibly inefficient.  A bit of inlining
wouldn't go astray.
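
The OOP approach above can be sketched roughly as follows - a minimal,
illustrative fragment, not actual Ultra code; all the class names are
made up for the example:

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Env { int x = 0; };            // stand-in for a namespace/scope

struct Node {                         // every AST node knows how to execute itself
    virtual ~Node() = default;
    virtual int execute(Env& env) = 0;
};

struct Literal : Node {               // expression: a constant
    int value;
    explicit Literal(int v) : value(v) {}
    int execute(Env&) override { return value; }
};

struct Assign : Node {                // statement: x = <expr>
    std::unique_ptr<Node> expr;
    explicit Assign(std::unique_ptr<Node> e) : expr(std::move(e)) {}
    int execute(Env& env) override { return env.x = expr->execute(env); }
};

struct Block : Node {                 // compound block: run each statement in turn
    std::vector<std::unique_ptr<Node>> stmts;
    int execute(Env& env) override {
        int last = 0;
        for (auto& s : stmts) last = s->execute(env);
        return last;
    }
};

struct If : Node {                    // evaluate the expression, then one branch
    std::unique_ptr<Node> cond, thenStmt, elseStmt;
    int execute(Env& env) override {
        if (cond->execute(env)) return thenStmt->execute(env);
        return elseStmt ? elseStmt->execute(env) : 0;
    }
};
```

A fuller version would have Block push a new namespace on entry and pop
it on exit; the per-object code stays small either way, which is the
point - though each dispatch is a virtual call, hence the inlining
remark.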

>> How would you delay it?  Start writing the editor in C?  Develop the
>> language more?
> At first, write MCs in C and use them as components (or VM instructions,
> or functions, if you like), which are called by a VM that executes a
> "script" (an object containing VM-interpretable contents) loaded by
> object loader. In this phase it is mostly thought of as a means to test
> the MM, the VM, the MCs, to play around with alternatives and to whet our
> appetite for more. Similar to the Forth approach AFAIK.

Yes, I think the stack implementation is basically like Forth.
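
To make the comparison concrete, here is a toy sketch of that style -
the "MCs" are ordinary functions and the VM just walks a flat script
calling each in turn.  This is purely illustrative; the names and
representation are assumptions, not a proposal:

```cpp
#include <cassert>
#include <functional>
#include <vector>

using Stack = std::vector<int>;
using MC = std::function<void(Stack&)>;   // a "machine component"

// A few primitive MCs operating on the data stack.
inline MC push(int v) { return [v](Stack& s) { s.push_back(v); }; }
inline void add(Stack& s) {
    int b = s.back(); s.pop_back();
    s.back() += b;                        // pop two, push their sum
}
inline void dup(Stack& s) { s.push_back(s.back()); }

// The VM itself is trivial: call each MC of the script in turn.
inline void run(const std::vector<MC>& script, Stack& s) {
    for (const auto& mc : script) mc(s);
}
```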

Where you say the VM is simple and the logic goes in the MCs, I would
say this is pretty much the same as my approach, except that I would
have some VM code and all the logic in the node classes, which would be
statically linked.

This could be complicated by exceptions, which might force the AST VM to
have some sort of explicit stack structure.

>> I think we should keep the VM.  We need this both for when you want
>> quick compile times, and especially for debugging.  It doesn't
>> necessarily become redundant.
> I'm not so sure anymore whether we need a VM at all. I was surprised
> when I tried to figure out what a VM executing a tree (instead of linear
> code) would look like. I simply found nothing that a VM (defined as a
> "main" program which controls execution) could do what a function to be
> called from the operators could not do better.

Things like executing statements, computing expressions and routine
calls are really easy - things like exceptions and declarations could be
a little harder.

>> Why do you feel the implementation using a stack VM would be better
>> than an AST VM?  I feel we should get a VM up as fast as possible.  You
>> would have to design an entirely new utility and language with your
>> proposal.
> Utility = VM? Well, as I mentioned, a simple VM could be done in a lazy

Utility = Object Assembler, i.e. a way of generating input.

> True. I didn't say that the compiler shouldn't produce an AST. But if we
> find a representation that can be executed more effectively and can be
> easily generated from the AST, why not have that one as a subsequent
> stage? Similar to JVM byte codes. Since this translation can be fully
> automatic, the user would not even notice.

Yes, it can be, but I think keeping it as an AST for a while is a good
idea.  Optimisation and linking are pretty easy, and it allows
inter-module optimisation (sorry for using "inter-class" and
"program-wide" a lot; I'll try to use this term from now on) using the
existing optimiser that did the first stage.

> Well firstly, if we provide a lisp like syntax for object loader, it can
> produce an AST as easily as flat code. It's no big deal to parse a
> parenthesized list. Secondly, I would prefer an assembler, since the
> runtime library, for example, will be tested using simple examples
> anyway.
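
The "no big deal" claim is fair - a parenthesized-list reader really is
tiny.  As a sketch (illustrative only, no claim about the actual object
loader syntax), a recursive s-expression parser fits in one function:

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

struct Sexp {
    std::string atom;                 // non-empty for a leaf
    std::vector<Sexp> items;          // children for a list
};

// Parse one expression starting at src[i]; i is advanced past it.
inline Sexp parse(const std::string& src, size_t& i) {
    while (i < src.size() && std::isspace((unsigned char)src[i])) ++i;
    Sexp node;
    if (src[i] == '(') {
        ++i;                          // consume '('
        while (i < src.size() && src[i] != ')') {
            node.items.push_back(parse(src, i));
            while (i < src.size() && std::isspace((unsigned char)src[i])) ++i;
        }
        ++i;                          // consume ')'
    } else {                          // an atom: read up to a delimiter
        while (i < src.size() && !std::isspace((unsigned char)src[i]) &&
               src[i] != '(' && src[i] != ')')
            node.atom += src[i++];
    }
    return node;
}
```

The result is already a tree, so producing an AST from this syntax is
indeed no harder than producing flat code.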

I wouldn't necessarily say this.  In my regression testing I use such
things as large random arrays, which are often tricky to set up but
work great for ferreting out bugs.
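
The style I mean looks something like this - check a routine under test
against a trusted reference on many random inputs with a fixed seed so
failures are reproducible.  The routine here (an insertion sort) is just
a stand-in for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Routine under test (stand-in example).
inline void insertionSort(std::vector<int>& a) {
    for (size_t i = 1; i < a.size(); ++i)
        for (size_t j = i; j > 0 && a[j - 1] > a[j]; --j)
            std::swap(a[j - 1], a[j]);
}

// Run many trials on random arrays; compare against std::sort.
inline bool randomRegression(int trials, int maxLen) {
    std::srand(12345);                            // fixed seed: reproducible
    for (int t = 0; t < trials; ++t) {
        std::vector<int> a(std::rand() % maxLen);
        for (int& x : a) x = std::rand() % 1000;
        std::vector<int> expect = a;
        std::sort(expect.begin(), expect.end());  // trusted reference
        insertionSort(a);
        if (a != expect) return false;            // bug ferreted out
    }
    return true;
}
```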

I understand that a compiler might be a little harder than an object
assembler to write, but I would use an AST dumper instead to test my
compiler.  And AST output is pretty easy.  Also, the scanner of a
compiler is usually tested before the parser is written.

> These can easily be produced by an assembler-like object
> loader. Which can be written and changed fairly rapidly. How do you test
> the components you need for the compiler if you have no compiler? You
> would have to rely on untested tools to test your other untested tools,
> and test them all at once. I always found it better to do my testing
> step by little step.

Untested tools relying on untested tools?  Like the stack VM relying on
the object assembler for instance?  =)

>>> You replace the whole parser (by a newly generated one), which you got
>>> to C-compile and link into the runtime environment. IMO that is not as
>>> flexible - the changes that are needed are more fundamental, leading
>>> to another executable each time the language is changed.
>> OK, that's fair enough.
> What do you mean by that?

Well, the main thing I was thinking through was that even a structural
editor has to have some sort of parser - even if only for decimal
numbers.  Maybe you could split it into two halves, but asking a
programmer to handle this would, I think, be a bit much.  Anyway, it
would be useful to have the ability to have several different parsers
loaded simultaneously.

This dictates having a parameterisable parser.  I don't necessarily mind
doing a recompile, so I might like a "parser framework" rather than a
table-driven parser, but it certainly needs to be flexible, which
dictates taking the parser code away from the syntactical details.  A
parser generator might still be able to do this though.
 
> Sure, but then it's a one-shot. If you later decide that you want to do
> changes that might be done with less effort by changing the wizard's
> input, all your modifications you did to the wizard's output are lost.
> How annoying! IMO a wizard is really useful only if:

Essentially you would use wizards or DSLs because they speed you up. 
Sure, occasionally you might have to rewrite without it because it won't
support what you want, but does this amount of time outweigh the time
gained?

> a) Its output is perfect. Most likely because it is simple. But then, why
> use it at all.

Because it performs a common task quickly.

> b) It can reverse engineer its output and retain your modifications.
> That's much better, but usually devilishly complicated because your
> modifications assume some wizard code - they depend on each other.

Much easier just to make it work better.

> c) It has plenty of hooks where you can specify your own code. Means a
> lot of work on the wizard's side, a complicated wizard interface and
> careful thinking about future needs.

Essentially this isn't really hard.  It's just a matter of generating a
type/impl which gets inherited, hence allowing filling in abstract
methods on the level below.  You can't change the code - but you can
override it.
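
In OOP terms the hook style is just this - the wizard emits a base class
whose abstract methods are the hooks, and the user inherits one level
below, never touching the generated file.  All the names here are
invented for the example:

```cpp
#include <cassert>
#include <string>

// Wizard output - regenerated freely, never edited by hand.
class GeneratedDialog {
public:
    virtual ~GeneratedDialog() = default;
    std::string show() { return header() + "|" + body(); }
    virtual std::string header() { return "default-header"; }  // overridable
    virtual std::string body() = 0;                            // hook to fill in
};

// User code, one level below: fills in the hook, may override defaults.
class MyDialog : public GeneratedDialog {
public:
    std::string body() override { return "my-body"; }
};
```

Regenerating GeneratedDialog then leaves MyDialog's additions intact,
which is exactly the reverse-engineering problem (b) sidestepped.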

> What is a DSL?

Domain-specific language.  Essentially written to do certain things
well.  They're often specificational in nature rather than imperative or
even functional or logic.

> Maybe the advantage in handling as well. The more steps we have in the
> pipeline to get the finished product, the more possibilities for
> problems. The build process tends to get more and more complicated, so
> we need a make utility. That saves a lot of work, but also introduces
> its own complexity. I prefer short pipelines, using small, self-written
> tools.

Of course fewer steps are better, but the question is, is there a
better way?  If the answer is yes, we want to change it, but we can't
necessarily do it right away.

> If I could avoid having to pipe the stuff through the C compiler
> and the linker, I'd feel better.

So would I, make no mistake about it.  But after the inter-module
optimisation stage we can essentially say "do what you want with it from
now on - GCC, JVM, interpreted AST, native, whatever".  We can move from
one to the other pretty smoothly since they just take an AST.  We
currently have limited programming time.
 
>>> A propos debugging: How do you show the correct location inside the
>>> Ultra source if there is an intermediate C code level? You got C code
>>> on one side and machine code at the other - how do you match locations
>>> in C code to machine code?

Probably with difficulty.  We could possibly generate some code to
delimit statements.  I don't see a full-on debugger for a while though,
so hopefully we'll have someone who knows a bit more about one by then.

>>> If all goes through the C compiler, the debugger must be capable to
>>> handle machine code, which makes it platform dependent and forces it
>>> to deal with things like software interrupts and the machine stack. If
>>> you use the debugger that comes with the C compiler, then there is no
>>> integrated environment any more because that debugger is not
>>> Ultra-aware.

We could initially implement an AST-interpreting VM to do debugging.
Plus, I think copious use of assertions could greatly reduce the need
for a debugger, although it certainly does not eliminate it.
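
By "copious assertions" I mean having the VM check its own invariants at
every primitive, so many bugs announce themselves at the faulty
operation instead of needing a debugger.  A trivial illustrative
example (the names are made up):

```cpp
#include <cassert>
#include <vector>

// Every stack primitive validates its precondition before acting,
// so a miscompiled script fails loudly at the exact bad operation.
inline int popChecked(std::vector<int>& stack) {
    assert(!stack.empty() && "VM stack underflow");
    int v = stack.back();
    stack.pop_back();
    return v;
}
```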

A debugger with reasonably efficient behaviour over large blocks of code
might need to come later.

> That sounds rather blurry to me. I can't think of a way how to map
> machine code back to C source other than relying on the compiler or use
> the debugger that comes with it. That really ties us to a specific brand
> if we want any degree of integration, since they all do it differently.

You're probably right there.  I think I wasn't really reading those
paragraphs clearly; it must have been late.

>>> In contrast, using threaded code, it's easy (as long as you don't
>> ...
Hmm, I should have asked this earlier - by "threaded" here, are you
referring to multithreading?  If so, how does this relate to the stack
machine?

>> OK, so it's dumb.  Essentially you need good linking to help.  You
>> could place help around the AST.  Help for an identifier, help for a
>> statement, etc.  But would help be view-specific, language-specific,
>> both, what?
> I really haven't thought it through to that extent. Help on the language
> would normally not be contained inside each source file. Help on the
> library, however, could (and IMO should) be included in the AST of the
> library.

Maybe we've missed each other here.  I was referring to help for the
language as you might bring up in another window.

Definitely library documentation could be stored inline.  It should be
fairly simple to collapse and expand both the code and the
documentation.  Auto-generated documentation is better, of course.

>> Also it would be handy here to have good navigation utilities, for
>> example,  "Go To Declaration" on an identifier.
> Certainly. Since there need to be links between source and symbol table
> and generated code, the editor would have full access to that. Another
> advantage of having it all in memory at the same time.

I think the juxtaposition of these paragraphs which have diverged has
confused you as to what I was saying.  I was referring to developing in
the editor.

You seem to be talking about debugging although I'm not exactly sure, so
I may as well explore the situation.  If you had generated code in the
translational hierarchy, you could debug at that level rather than the
source level, or you might debug at both at the same time, provided the
relevant source and generated languages have a view that supports
debugging.

In fact, I originally formulated the translational hierarchy system while
trying to find a way to view generated code within the editor framework,
since it's another language, rather than just a view.  And then putting
languages on top of Ultra is a simple step.

Further, you might want good linking between the levels, so you can see
where code from one level goes in the next, or was in the previous. 
This might not be simple though, since a small amount of code could be
distributed throughout the program.  Also, there would be some code that
would have no higher-level equivalent, such as utility functions used to
implement standard features in the higher level language.

This is bringing back memories of hearing about CASE (computer aided
software engineering) tools.  Basically they can support the translation
from requirements to design to code, etc. and it's useful to be able to
trace something in one level down to the next.  From memory, I think
this is called traceability.  This would seem to be a good analogy.

-- 
     Matthew Tuck - Software Developer & All-Round Nice Guy
             mailto:matty@box.net.au (ICQ #8125618)
       Check out the Ultra programming language project!
              http://www.box.net.au/~matty/ultra/