Steps

Hans-Dieter Dreier Ursula.Dreier@ruhr-uni-bochum.de
Wed, 10 Mar 1999 23:04:52 +0100



Matthew Tuck wrote:

> Hans-Dieter Dreier wrote:
>
> > Don't you underestimate the effort to produce C (or Java) source code?
> > ...
>
> No it's not insignificant but it's not impossible either.  Many
> compilers are doing this nowadays.

I know. But even if everyone is doing it this way, it need not necessarily be
optimal. Once upon a time people frowned upon high-level languages in general
because of poor performance.

> > IMO to directly produce threaded code (as shown in my example) is so much
> > less effort than to produce C (or Java) source. Maybe sometime later
> > producing C may be a reasonable platform to produce new MCs directly from
> > Ultra source rather than use the C compiler directly to produce new MCs
> > (which is as portable as your proposal).
>
> All I'm concerned about here is getting a portable and reasonably
> efficient implementation working.  I don't really see another way to do
> it.  For example, there are many UNIX OSs and many platforms they run
> on.  We probably wouldn't have to produce C code directly - the GCC
> interface will produce for UNIX and I think DOS/WIN too.  I'm not fluent
> with the details of GCC however.
>
> > Producing C is somewhat like a detour: You already got a representation
> > at a level that might be an intermediate level of a C compiler, but
> > re-convert it back to a higher level, only to get it compiled once more by
> > the C compiler. Of course this cannot lead to good performance.
>
> An AST will be higher level than the C.  If we go into GCC, I think it
> would be pretty much the same efficiency since it's AST -> GCC interface
> -> code.

I don't use GCC, so I'm not familiar with it. Do you mean that you intend to
produce input for some intermediate representation that GCC uses?

> > COM objects are self-contained entities which reside in dynamic libraries
> > and/or ...
>
> What would you say the advantage of COM or similar systems would be?

They allow easy integration of components from different sources, written in
different languages. Components may range from small and general to big and
highly specialised. Don't get me wrong; I'm not saying we should use COM as a
standard for Ultra; it was just meant to show the difference between a
component approach and a monolith.

> >> 1b Write simple AST VM.

I may change my mind concerning the AST representation to be used for VM
execution. At first I thought interpreting a true tree structure would have to
be slower and more complicated than interpreting a flattened tree. After
thinking it through, I'm not so sure anymore. I'll try to dream up some variants
and discuss them in a separate posting.

> >> 2a Write AST to C generator.
> > I would delay that.
>
> How would you delay it?  Start writing the editor in C?  Develop the
> language more?

At first, write MCs in C and use them as components (or VM instructions, or
functions, if you like), which are called by a VM that executes a "script" (an
object containing VM-interpretable contents) loaded by the object loader. In
this phase it is mostly meant as a way to test the MM, the VM and the MCs, to
play around with alternatives and to whet our appetite for more. Similar to the
Forth approach, AFAIK. During that phase the foundation of the runtime library
MCs will be created, especially those needed to write a proper parser, which
might then be used to supplement or even replace the object loader. Remember,
all MCs will be coded in C. As soon as we get tired of writing MCs in C (and
have the necessary MC base of the runtime), we may start to write an MC which
translates ASTs to some compiler input language, preferably C(++). If that works
and the Ultra definition has "settled" enough, we may start to rewrite (or
automatically translate, who knows) the MC source (which is in C up to that
time) in Ultra. Concerning the editor, we might find it feasible to compose it
from MCs. Or we delay it and use a traditional editor for a while.

> >> 3b Rewrite VM in Ultra.
> > IMO not necessary because VM is so simple. Will automatically be done by
> > your step 4b
>
> I think we should keep the VM.  We need this both for when you want
> quick compile times, and especially for debugging.  It doesn't
> necessarily become redundant.

I'm not so sure anymore whether we need a VM at all. I was surprised when I
tried to figure out what a VM executing a tree (instead of linear code) would
look like. I simply found nothing that a VM (defined as a "main" program which
controls execution) could do that a function called from the operators could
not do better.

> >> 10 lines?  Are you serious?
> > Perfectly serious. Look at the parser skeleton. You could almost take the
> > same code as the "parser". VM (as I see it) consists just of a little loop
> > where the next threaded code is fetched, checked whether it is NULL, and
> > pushed (or executed). Everything else is done by the MCs called. Half the
> > code is used for the "catch" clause that is used to break out of the
> > execution loop (and handle errors).
>
> OK, so you're essentially moving the code elsewhere.

Into the MCs. Same as with the parser skeleton: it does almost nothing itself.
Everything is done by the recognisers and executors.

>
>
> >> Maybe I misread your message, but I was referring to your "object
> >> assembly code".
> > I'm not quite sure what the original question was here. So I'll try to
> > ...
>
> I went back and reread the thread and it seems I was operating on the
> assumption we would have a compiler available.  So it's academic now
> since I understand.
>
> Why do you feel the implementation using a stack VM would be better than
> an AST VM?  I feel we should get a VM up as fast as possible.  You would
> have to design an entirely new utility and language with your proposal.

Utility = VM? Well, as I mentioned, a simple VM could be done in a lazy
afternoon. But you may be right. I thought it over a little, and methinks an
AST "VM" might actually be better. Since both approaches may imply different
tasks on the MC side, and thus might be somewhat incompatible, we should
discuss that in detail. I'll soon make a proposal.

> > The parser I sketched should already produce postfix code ready to be fed
> > into VM. Producing an AST may actually be slightly more complicated. Both
> > are equivalent, however, and easily translated into each other.
>
> Well the compiler would be producing an AST.  I want to keep the middle
> representation as an AST so we can keep the code if we change either
> component.  Hence the AST VM is not difficult.

True. I didn't say the compiler shouldn't produce an AST. But if we find a
representation that can be executed more efficiently and is easily generated
from the AST, why not have that as a subsequent stage? Similar to JVM byte
codes. Since this translation can be fully automatic, the user would not even
notice.

> > Isn't there a lot of mutual dependencies: You need the compiler to test
> > the VM, and you need the VM to test the compiler. You got to test a lot of
> > stuff in parallel. I would prefer to be able to test them in little bits,
> > function by function; this would require much less communication between
> > development teams.
>
> There is always going to be this.  My VM relies on the compiler, yours
> on your assembler.  I would prefer a compiler since it would be easier
> to program valid test cases for the VM.  Also I would have an AST dumper
> to
> test the compiler.

Well, firstly, if we provide a lisp-like syntax for the object loader, it can
produce an AST as easily as flat code. It's no big deal to parse a
parenthesized list. Secondly, I would prefer an assembler, since the runtime
library, for example, will be tested using simple examples anyway. These can
easily be produced by an assembler-like object loader, which can be written and
changed fairly rapidly. How do you test the components you need for the
compiler if you have no compiler? You would have to rely on untested tools to
test your other untested tools, and test them all at once. I always found it
better to do my testing step by little step.

> > You replace the whole parser (by a newly generated one), which you got to
> > C-compile and link into the runtime environment. IMO that is not as
> > flexible - the changes that are needed are more fundamental, leading to
> > another executable each time the language is changed.
>
> OK, that's fair enough.

What do you mean by that?

> > Generally, I'm biased towards embedded languages. Using these always
> > worked well for me. In contrast, I always had problems using wizards when
> > I was not able to influence their meta-language (which of course would
> > have turned them more towards an embedded language). Maybe this can be an
> > explanation why I don't like parser generators and C code as intermediate
> > representation.
>
> What I think you're getting at here is being forced to use a wizard is
> bad if it can't do what you want it to.  That's one of the advantages a
> translational hierarchy would have, since the wizard translates to the
> level below, and you can still program to that level if you want.
>
> Normally if you didn't like the generated code you couldn't change it,
> you could only see it, for obvious synchronisation reasons.  But you
> could generate the code and then detach the generator, so there is no
> longer a logical link.  Then you would be free to modify it.

Sure, but then it's a one-shot. If you later decide that you want to make
changes that might be done with less effort by changing the wizard's input, all
the modifications you made to the wizard's output are lost. How annoying! IMO a
wizard is really useful only if:
a) Its output is perfect. Most likely because it is simple. But then, why use it
at all?
b) It can reverse engineer its output and retain your modifications. That's much
better, but usually devilishly complicated, because your modifications assume
some wizard code - they depend on each other.
c) It has plenty of hooks where you can specify your own code. That means a lot
of work on the wizard's side, a complicated wizard interface and careful
thinking about future needs.
d) You adapt it whenever you need a change. But then the advantage over writing
the generated program yourself is diminished.

> > You seem to favour the code generator approach; at least more than I do.
>
> Not at all.  I haven't had a large amount of experience with a code
> generator.  I am just familiar with scanner and parser generators that
> work this way.  Having a domain specific language there should help the
> situation though - if it doesn't work in all situations you want it to
> it should be improved.

To be honest, I'm not familiar with scanner and parser generators, but I have a
lot of experience with general program generators (my own and others').
Certainly there are cases where a generator saves a lot of work.

> An advantage of DSLs is taking the semantic challenge away - since you're
> programming for a specific task rather than generally.

What is a DSL?

> Frameworks like you would prefer do this too, however, so the only real
> advantage is syntactical (or better, ASTical).

Maybe an advantage in handling as well. The more steps we have in the pipeline
to the finished product, the more possibilities for problems. The build
process tends to get more and more complicated, so we need a make utility. That
saves a lot of work, but also introduces its own complexity. I prefer short
pipelines, using small, self-written tools. If I could avoid having to pipe the
stuff through the C compiler and the linker, I'd feel better.

> > A propos debugging: How do you show the correct location inside the Ultra
> > source if there is an intermediate C code level? You got C code on one
> > side and machine code at the other - how do you match locations in C code
> > to machine code?
> >
> > If all goes through the C compiler, the debugger must be capable to
> > handle machine code, which makes it platform dependent and forces it to
> > deal with things like software interrupts and the machine stack. If you
> > use the debugger that comes with the C compiler, then there is no
> > integrated environment any more because that debugger is not Ultra-aware.
>
> I think I remember some papers about the implementation of the SELF
> language - they had an interesting way of handling debugging compiled
> code.  Basically they used the higher level code where necessary, and
> compiled code where possible, for speed.  So basically they got
> efficiency and flexibility to change stuff like methods at runtime.
>
> Basically you would need some sort of cross-referencing between the
> source and binary.  Debugging generally prohibits a large number of
> optimisations, unfortunately.  But without optimisations mapping isn't
> insurmountable.

That sounds rather blurry to me. I can't think of a way to map machine code
back to C source other than relying on the compiler or using the debugger that
comes with it. That really ties us to a specific brand if we want any degree of
integration, since they all do it differently. And, as you pointed out, you
usually have to turn the compiler's optimisation off (or nearly off) to be able
to do decent debugging. And that's only half of the story, since there would be
another mapping (Ultra <-> C) on top of the C compiler's. As I said before, if
we do not want to debug MC internals, we don't *need* to inspect C code or
machine code. But as I see it, we have no other choice if we feed everything
through the C compiler.

If we layer the task, however (MCs and objects containing ASTs), those layers
may be tested fairly independently of each other, and once the MCs are reliable
enough, most testing would be high-level and completely under our control.

> > In contrast, using threaded code, it's easy (as long as you don't attempt
> > to trace into a MC), since you are in control of the (few and simple)
> > representations and translation stages involved. Which, BTW, is an
> > argument in favour of small MCs performing just one function
> > each.
>
> You can pretty easily do this with any interpreter.  This disadvantage
> is speed.

We are free to build larger MCs if we find that certain combinations of MC calls
are occurring frequently. As soon as we are able to write MCs in Ultra, we are
free to form a single MC for each method _and_ use inlining and other tricks.
Plenty of playground for an optimizer fan ;=)

> > I would call it "library help" because it is used to get information on
> > libary items most often. It would show parameters, return values,
> > constraints, comments and so on. If language features are objects, the
> > mechanism can provide help on them too. Since the information is directly
> > fetched from the program representation, it does not need an intermediate
> > tool; it rather is a specialized view into source code and hence
> > up-to-date all the time.
>
> OK, so it's dumb.  Essentially you need good linking to help.  You could
> place help around the AST.  Help for an identifier, help for a
> statement, etc.  But would help be view-specific, language-specific,
> both, what?

I really haven't thought it through to that extent. Help on the language would
normally not be contained inside each source file. Help on the library, however,
could (and IMO should) be included in the AST of the library.

> Also it would be handy here to have good navigation utilities, for
> example,  "Go To Declaration" on an identifier.

Certainly. Since there need to be links between source and symbol table and
generated code, the editor would have full access to that. Another advantage of
having it all in memory at the same time.

Regards

Hans-Dieter Dreier