Steps

Hans-Dieter.Dreier@materna.de
Thu, 11 Mar 1999 15:20:56 +0100


Matthew Tuck wrote:

>> At first, write MCs in C and use them as components (or VM instructions,
>> or functions, if you like), which are called by VM that executes a
>> "script" (an object containing VM-interpretable contents) loaded by
>> object loader. In this phase it is mostly thought of as a means to test
>> the MM, the VM, the MCs, to play around with alternatives and to whet our
>> appetite for more. Similar to the Forth approach AFAIK.

>Yes I think the stack implementation is basically like Forth.

That's true, but I rather meant the component approach - AFAIK every
keyword is implemented by some function. There is hardly any syntax. The
language is extended by adding more functions, thus more keywords.
Lisp is similar in this respect, too.
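
A minimal Python sketch of that component approach, where every keyword is
just a function and the "VM" is little more than a loop. The names WORDS,
word and run, and the whitespace-separated script format, are assumptions
made for illustration only; they are not part of any Ultra design.

```python
# Hypothetical sketch: every keyword ("word") is an ordinary function that
# operates on a shared data stack, in the spirit of Forth.
WORDS = {}

def word(name):
    """Register a function as the implementation of a keyword."""
    def register(fn):
        WORDS[name] = fn
        return fn
    return register

@word("+")
def add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

@word("*")
def mul(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)

@word("dup")
def dup(stack):
    stack.append(stack[-1])

def run(script, stack=None):
    """Execute a whitespace-separated script; unknown tokens must be integers."""
    stack = [] if stack is None else stack
    for token in script.split():
        if token in WORDS:
            WORDS[token](stack)       # a keyword: call its function
        else:
            stack.append(int(token))  # a literal: push it
    return stack
```

With these definitions, run("2 3 + dup *") leaves [25] on the stack.
Extending the language means registering one more function; the loop in
run does not change.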

>Where you say the VM is simple and you put the logic in the MCs I would
>say this is pretty much the same as my approach except that I would have
>some VM code, and all the logic in the node classes, which would be
>statically linked.
>
>This could be complicated by exceptions, which might force the AST VM to
>have some sort of explicit stack structure.

If the stack is implemented by an Ultra object (subject to GC), which IMO
is a must, then exceptions certainly have to be caught by the VM to adjust
that stack.

>>> I think we should keep the VM.  We need this both for when you want
>>> quick compile times, and especially for debugging.  It doesn't
>>> necessarily become redundant.
>> I'm not so sure anymore whether we need a VM at all. I was surprised
>> when I tried to figure out how a VM executing a tree (instead of linear
>> code) would look like. I simply found nothing that a VM (defined as a
>> "main" program which controls execution) could do what a function to be
>> called from the operators could not do better.
>
>Things like executing statements, computing expressions and routine
>calls are really easy - things like exceptions and declarations could be
>a little harder.

At least in C exceptions (meaning catch & throw) are pretty easy.
Apparently in C++ it's much more complicated because that has to take care
of destructor calls as well - but if we simply use the try-catch feature
of C++, it's all done for us. We just have to store the info necessary to
unwind the stack in case of an exception and do that when an exception
occurs.
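
A rough sketch of that unwinding step, written in Python rather than C or
C++ to keep it short. It assumes the VM keeps its own stack as an ordinary
object and that remembering the stack depth is all the info needed to
unwind; VM, VMError, protected and the two helper functions are
hypothetical names.

```python
class VMError(Exception):
    """Hypothetical exception raised by an operator."""

class VM:
    def __init__(self):
        self.stack = []          # the VM's own stack, an ordinary heap object

    def protected(self, body, handler):
        """Run body(vm); on VMError restore the stack depth and run handler."""
        mark = len(self.stack)   # the information needed to unwind
        try:
            body(self)
        except VMError as exc:
            del self.stack[mark:]    # unwind: drop everything pushed since mark
            handler(self, exc)

def failing_body(vm):
    """Push two temporaries, then fail before they are consumed."""
    vm.stack.append(1)
    vm.stack.append(2)
    raise VMError("division by zero")

def push_error_code(vm, exc):
    """Leave an error code on the (already unwound) stack."""
    vm.stack.append(-1)
```

Starting from a stack holding [42], vm.protected(failing_body,
push_error_code) ends with [42, -1]: the two temporaries are gone and only
the handler's result has been added.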

Declarations are not dealt with by the VM.

>>> Why do you feel the implementation using a stack VM would be better
>>> than an AST VM?  I feel we should get a VM up as fast as possible.  You
>>> would have to design an entirely new utility and language with your
>>> proposal.
>> Utility = VM? Well, as I mentioned, a simple VM could be done in a lazy

>Utility = Object Assembler, ie way of generating input.

A compiler is a new utility as well and much more complicated. It would
take much longer to implement than an assembler. Just look at the syntax:
An assembler like the one I imagine really has a tiny syntax that can
easily be built in without using parser generators.
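
To give an idea of how small such a hand-written reader can be, here is a
Python sketch for a parenthesized, Lisp-like syntax. The syntax itself
(atoms are integers or symbols, lists are in parentheses) and the names
tokenize, read and parse are assumptions for illustration; the real OL
syntax is not defined here.

```python
def tokenize(text):
    """Split source text into '(', ')' and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    """Consume one expression from the token list and return it as a nested list."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens and tokens[0] != ")":
            node.append(read(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)            # discard the closing ')'
        return node
    if token == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(token)        # integer literal
    except ValueError:
        return token             # anything else is a symbol

def parse(text):
    """Parse exactly one expression and reject trailing input."""
    tokens = tokenize(text)
    tree = read(tokens)
    if tokens:
        raise SyntaxError("trailing input")
    return tree
```

For example, parse("(add 1 (mul 2 3))") yields the nested list
['add', 1, ['mul', 2, 3]], which can serve as an AST just as well as it
can be flattened into linear code.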

>> Well firstly, if we provide a lisp like syntax for object loader, it can
>> produce an AST as easily as flat code. It's no big deal to parse a
>> parenthesized list. Secondly, I would prefer an assembler, since the
>> runtime library, for example, will be tested using simple examples
>> anyway.
>
>I wouldn't necessarily say this.  In my regression testing I use such
>thing as large random arrays that are often tricky to set up but work
>great for ferreting out bugs.

One could write an OL program to generate objects for test cases. Or write
a C program or shell script to generate a large OL program that *is* the
test case. These could be kept for a while (and adapted if necessary) for
regression testing. IMO OL won't change as often as a compiler would, so
this option is more realistic for OL than for a compiler.
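
As a sketch of the second option, a small generator script could look like
this (in Python here instead of C or shell). The "(array ...)" form is a
made-up stand-in for whatever OL will really look like, and
generate_ol_test is a hypothetical name.

```python
import random

def generate_ol_test(size, seed=0):
    """Return the text of a hypothetical OL program that builds a random array."""
    rng = random.Random(seed)            # fixed seed keeps the test case reproducible
    values = [rng.randint(-1000, 1000) for _ in range(size)]
    lines = ["(array"]
    # Emit ten values per line so that the generated file stays readable.
    for start in range(0, size, 10):
        chunk = values[start:start + 10]
        lines.append("  " + " ".join(str(v) for v in chunk))
    lines.append(")")
    return "\n".join(lines) + "\n"
```

Because the seed is fixed, the same file is produced on every run, so the
generated program can be kept and replayed for regression testing.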

>I understand that a compiler might be a little harder than an object
>assembler to write, but I would use an AST dumper instead to test my
>compiler.  And AST output is pretty easy.  Also, the scanner of a
>compiler is usually tested before the parser is written.

We would need a general object dumper as well. That would be used to test
the OL. It could be refined to produce a nice dump for an AST. Later it
could be part of the editor.
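
A first cut of such a dumper can be very small. The Python sketch below
assumes objects are either nested lists or atoms, which is a
simplification for illustration; dump is a hypothetical name and the
output format is arbitrary.

```python
def dump(obj, indent=0):
    """Return an indented, line-per-node text dump of nested lists and atoms."""
    pad = "  " * indent
    if isinstance(obj, list):
        lines = [pad + "list[%d]" % len(obj)]
        for item in obj:
            lines.append(dump(item, indent + 1))
        return "\n".join(lines)
    # Atoms are shown with their type name and their repr.
    return pad + "%s %r" % (type(obj).__name__, obj)
```

Applied to ["add", 1], it returns the three lines "list[2]",
"  str 'add'" and "  int 1". Refining it for ASTs would mostly mean
printing node kinds instead of raw type names.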

>> These can easily be produced by an assembler-like object
>> loader. Which can be written and changed fairly rapidly. How do you test
>> the components you need for the compiler if you have no compiler? You
>> would have to rely on untested tools to test your other untested tools,
>> and test them all at once. I always found it better to do my testing
>> step by little step
>
>Untested tools relying on untested tools?  Like the stack VM relying on
>the object assembler for instance?  =)

The stack VM you mention is not a good example since it is so simple that
it is barely visible. But in principle, you are right, of course.
Certainly the minimum starting set must be debugged all together. I'm just
pointing out that this will be easier if this set is smaller. IMO this is
the case for a simple assembler rather than for a compiler.

>>>> You replace the whole parser (by a newly generated one), which you got
>>>> to C-compile and link into the runtime environment. IMO that is not as
>>>> flexible - the changes that are needed are more fundamental, leading
>>>> to another executable each time the language is changed.
>>> OK, that's fair enough.
>> What do you mean by that?
>
>Well the main thing I was thinking through was that even a structural
>editor has to have some sort of parser - even if only for decimal
>numbers.  Well, maybe you could split it into two halves, but asking a
>programmer to handle this I think would be a bit much.  Anyway, it would
>be useful to have the ability to have several different parsers loaded
>simultaneously.

Split what into two? The parser? I'm afraid I can't follow you here.
Having several parsers "loaded" simultaneously would mean having them
linked into the executable, right?
You'd need to relink even if only your test setup changed, as far as I can
see. I wouldn't like that much hassle. Instead I'd prefer just to change a
simple text file (containing OL code) and be able to reuse the same
executable fairly often. But maybe I missed the point you were trying to
make here.

>This dictates having a parameterisable parser.  I don't necessarily mind
>doing a recompile, so I might like a "parser framework" rather than a
>table-driven parser, but it certainly needs to be flexible, which
>dictates taking the parser code away from the syntactical details.  A
>parser generator might still be able to do this though.

Well I do mind doing recompiles if I can avoid them. I really like small
and fast test cycles.

>> Sure, but then it's a one-shot. If you later decide that you want to do
>> changes that might be done with less effort by changing the wizard's
>> input, all your modifications you did to the wizard's output are lost.
>> How annoying! IMO a wizard is really useful only if:
>
>Essentially you would use wizards or DSLs because they speed you up.
>Sure, occasionally you might have to rewrite without it because it won't
>support what you want, but does this amount of time outweigh the time
>gained?

That might be different in each case.

>> a) Its output is perfect. Most likely because it is simple. But then, why
>> use it at all.
>
>Because it performs a common task quickly.

Yes, but then maybe the way you have to perform the task without the
wizard is less than optimal and needs reengineering.

I'll give an example:
In VC++ there is a class wizard which allows you to add/remove a member
to/from a class. This saves you work because usually C++ requires you to
do it twice: in .h and .cpp. If C++ were designed sensibly, you would just
have to write that declaration once. Using a wizard would not save any
time, hence no wizard would be required.
Some wizards mend insufficiencies which should not have happened in the
first place. The lesson to be learned from this is: If you see that you
might need a wizard, first check your architecture critically and make
sure that you don't try to cure the symptoms rather than the sickness
itself.

>> c) It has plenty of hooks where you can specify your own code. Means a
>> lot of work on the wizard's side, a complicated wizard interface and
>> careful thinking about future needs.
>
>Essentially this isn't really hard.  It's just a matter of generating a
>type/impl which gets inherited, hence allowing filling in abstract
>methods on the level below.  You can't change the code - but you can
>override it.

I'd prefer another approach for the task you mentioned:
Inside the class (impl, sorry) that inherits from the interface, supply a
view into the interface class. Mark the items that this view displays by a
different colour so that the user can distinguish them from items that are
really present in the impl class. Allow him to add function bodies while
keeping the type signature inherited (i.e. noneditable). Every time the
interface changes, the impl will be recompiled anyway. Both parts (intf
and impl) can be seen simultaneously. No wizard is needed. The user never
needs to edit the inherited part from within the impl. In fact, he can't.

>> What is a DSL?
>
>Domain-specific language.  Essentially written to do certain things
>well.  They're often specificational in nature rather than imperative or
>even functional or logic.

I see. Input to a parser generator (or syntax table generator) might be an
example, right?

>> Maybe the advantage in handling as well. The more steps we have in the
>> pipeline to get the finished product, the more possibilities for
>> problems. The build process tends to get more and more complicated, so
>> we need a make utility. That saves a lot of work, but also introduces
>> its own complexity. I prefer short pipelines, using small, self-written
>> tools.
>
>Of course fewer steps are better, but the question is, is there a
>better way?  If the answer is yes, we want to change it, but we can't
>necessarily do it right away.

True. But we should always give it a second thought; that might save us a
lot of work.

>
>> If I could avoid having to pipe the stuff through the C compiler
>> and the linker, I'd feel better.
>
>So would I, make no mistake about it.  But after the inter-module
>optimisation stage we can essentially say "do what you want from it from
>now on - GCC, JVM, interpreted AST, native, whatever".  We can move from
>one to the other pretty smoothly since they just take an AST.  We
>currently have limited programming time.

Of course. Well, most of the suggestions I'm making now concerning the way
to do it are intended for the near future. What comes later is another
issue.

>>>> A propos debugging: How do you show the correct location inside the
>>>> Ultra source if there is an intermediate C code level? You got C code
>>>> on one side and machine code at the other - how do you match locations
>>>> in C code to machine code?
>
>Probably with difficulty.  We could possibly generate some code to
>delimit statements.  I don't see a full-on debugger for a while though,
>so hopefully we'll have someone who knows a bit more about one by then.

Yes, but why use such a kludge as instrumented code at all? I say: Debug C
using the C debugger, and debug objects (VM calls, the stack, such things)
using an object debugger as soon as it is available. Try to minimize C
debugging by keeping the units written in C (even in generated C) as
simple as possible.

>>>> If all goes through the C compiler, the debugger must be capable to
>>>> handle machine code, which makes it platform dependent and forces it
>>>> to deal with things like software interrupts and the machine stack. If
>>>> you use the debugger that comes with the C compiler, then there is no
>>>> integrated environment any more because that debugger is not
>>>> Ultra-aware.
>
>We could initially implement an AST-interpreting VM to do debugging.
>Plus, I think copious assertioning could greatly reduce the need for a
>debugger, although it certainly does not eliminate it.

I agree.

>>>> In contrast, using threaded code, it's easy (as long as you don't
>>> ...
>Hmm, should have asked this earlier, by threaded here are you referring
>to multithreading?  If so, how does this relate to the stack machine?

No. I can't remember where I read that term, it must have been a long time
ago. Basically, it means code that consists of a stream of references
(pointers) to operands and / or operators. All "instructions" have the
same length since they all are pointers. The "instruction space" is as big
as the address space and very thinly populated, thus carrying little
information and wasting a lot of space. This is different from byte codes
or machine instructions, which have to be interpreted and may have
variable length. It is exactly what that flattened AST code for the VM is,
except for the NULL which was interpreted by the VM.
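
A small Python sketch of threaded code in this sense. The code is a flat
list of references, each pointing either at an operator (a function) or at
an operand (any other object), so there are no opcodes to decode. The
names op_add, op_mul, execute and program are hypothetical, and the NULL
marker mentioned above is deliberately left out.

```python
def op_add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def op_mul(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)

def execute(code):
    """Run threaded code: each element is a reference to an operator or an operand."""
    stack = []
    for ref in code:
        if callable(ref):
            ref(stack)           # operator: call through the reference
        else:
            stack.append(ref)    # operand: push the referenced object
    return stack

# (1 + 2) * 4 flattened into postfix order; no opcode decoding is involved.
program = [1, 2, op_add, 4, op_mul]
```

Here execute(program) returns [12]. Every element of program is a
reference of the same size, which corresponds to the "all instructions
have the same length" property.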

>> I really haven't thought it through to that extent. Help on the language
>> ...

>Maybe, we've missed each other here.  I was referring to help for the
>language as you might bring up in another window.

I agree.

>Definitely library documentation could be stored inline.  It should be
>fairly simple to collapse and expand both the code and the
>documentation.  Auto-generated documentation is better of course.

Yep.

>I think the juxtaposition of these paragraphs which have diverged has
>confused you as to what I was saying.  I was referring to developing in
>the editor.
>
>You seem to be talking about debugging although I'm not exactly sure, so
>I may as well explore the situation.  If you had generated code in the
>translational hierarchy, you could debug at that level rather than the
>source level, or you might debug at both at the same time, provided the
>relevant source and generated languages have a view that supports
>debugging.

I'd like decent help inside the editor as well as inside the debugger. In
fact, I see the (Ultra) debugger as an extension to the editor rather than
as a standalone tool. If the editor can display general objects, half of
the work is already done, since then it is able to inspect ASTs as well as
the VM's stack and the VM's current state (which would be stored in an
object). And change their values... and maybe even trigger a
compile-on-the-fly while the program is still running... (VC++ can do
compile-on-the-fly in some cases. I was really surprised that it is
possible even in C++).

>In fact, I originally formulated the translational hierarchy system while
>trying to find a way to view generated code within the editor framework,
>since it's another language, rather than just a view.  And then putting
>languages on top of Ultra is a simple step.
>
>Further, you might want good linking between the levels, so you can see
>where code from one level goes in the next, or was in the previous.
>This might not be simple though, since a small amount of code could be
>distributed throughout the program.  Also, there would be some code that
>would have no higher-level equivalent, such as utility functions used to
>implement standard features in the higher level language.

I agree. An example of code that has no higher-level equivalent might be
a type conversion call that has been inserted automatically. In this case
the user *might* have written it explicitly. So it has a representation,
but it does not appear in the source. The editor might still display it as
if the user had written it explicitly (but use another colour to mark it
as compiler generated), so a breakpoint can be set, the stack can be
examined and single stepping be performed on it. As an additional benefit,
this view might be accessible even when not debugging, to show the user
(and the programmer who is debugging the compiler) what the compiler
actually generated.

--

Regards,

Hans-Dieter Dreier
(Hans-Dieter.Dreier@materna.de)
