Steps

Matthew Tuck matty@box.net.au
Sun, 14 Mar 1999 11:36:32 +1030


Hans-Dieter.Dreier@materna.de wrote:

> That's true, but I rather meant the component approach - AFAIK every
> keyword is implemented by some function. There is hardly any syntax.
> They extend by adding more functions, thus more keywords.
> Lisp is similar in this respect, too.

Yes, the tree would operate in a similar sort of way, with code for the
node put in the node itself.
 
>> This could be complicated by exceptions, which might force the AST
>> VM to have some sort of explicit stack structure.
> If the stack is implemented by an Ultra object (subject to GC), which
> IMO is a must, then exceptions certainly have to be caught by VM to
> adjust that stack.

What I meant was that we could have implemented something like:

class plus extends expression
   function calculate : integer = left.calculate + right.calculate
   ...

Where the stack structure is implicit, since it uses the existing stack.
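
As a rough illustration of the same idea in Python (not Ultra; the
Literal node and the sample tree are assumptions added for the example),
each node carries its own evaluation code and the recursion borrows the
host language's call stack:

```python
from dataclasses import dataclass


class Expression:
    """Base class for AST nodes; each node carries its own evaluation code."""

    def calculate(self) -> int:
        raise NotImplementedError


@dataclass
class Literal(Expression):
    # Hypothetical leaf node, added so the example can run.
    value: int

    def calculate(self) -> int:
        return self.value


@dataclass
class Plus(Expression):
    left: Expression
    right: Expression

    def calculate(self) -> int:
        # The recursive calls use the host language's call stack, so the
        # interpreter keeps no explicit stack object of its own.
        return self.left.calculate() + self.right.calculate()


tree = Plus(Literal(1), Plus(Literal(2), Literal(3)))
result = tree.calculate()
```

The trade-off is that the evaluation state lives in the host stack, so
it is not visible as an object to the language being interpreted.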

Why do you want to make the stack an object?  Reflection reasons? 
Something else?

> At least in C exceptions (meaning catch & throw) are pretty easy.
> Apparently in C++ it's much more complicated because that has to take
> care of destructor calls as well - but if we simply use the try-catch
> feature of C++, it's all done for us. We just have to store the info
> necessary to unwind the stack in case of an exception and do that
> when an exception occurs.

You can't really have destructors in a GC language which uses objects
consistently.  Ideally, as a matter of correctness, you would not close
files etc. in a finaliser if possible, but have a close method: when an
exception is raised, the method using the file catches it, closes the
file and propagates the exception up the call stack.
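
A minimal sketch of that pattern in Python, assuming a hypothetical
Resource class standing in for a file and a hypothetical process
function standing in for the method that uses it:

```python
class Resource:
    """Hypothetical stand-in for a file-like object with an explicit close."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def process(resource: Resource, fail: bool) -> str:
    """Use the resource and close it explicitly on every path."""
    try:
        if fail:
            raise ValueError("simulated failure while using the resource")
        result = "ok"
    except Exception:
        # Catch, release the resource, then re-raise so callers further
        # up the call stack still see the original exception.
        resource.close()
        raise
    resource.close()
    return result
```

Nothing here relies on a finaliser running; the resource is released at
a known point whether or not an exception occurs.  In Python itself a
try/finally block would express the same thing more compactly.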

> Declarations are not dealt with by VM.

In one sense, but we have to handle allocations and deallocations of
local variables on the stack, although they could probably be done using
an implicit stack as well.

>> Utility = Object Assembler, ie way of generating input.

> A compiler is a new utility as well and much more complicated. It
> would take much longer to implement than an assembler. Just look at
> the syntax: An assembler like the one I imagine really has a tiny
> syntax that can easily be built in without using parser generators.

The compiler can probably have a fairly minimal syntax at first too. 
Then it can be expanded later.  Essentially the same as what your
assembler is, except existing work can be built upon.
 
> One could write a OL program to generate objects for test cases. Or
> write a C program or shell script to generate a large OL program that
> *is* the test case. These could be kept for a while (and adapted if
> necessary) for regression testing. IMO OL won't change as often as a
> compiler would, so this option is more realistic for OL than for a
> compiler.

But isn't this work better spent starting with a minimal compiler and
working your way up?
 
>> Untested tools relying on untested tools?  Like the stack VM relying
>> on the object assembler for instance?  =)
> The stack VM you mention is no good example since it is so simple that
> it is barely visible.

But it relies on the MC code which is the VM in the operational sense.

> But in principle, you are right, of course.
> Certainly the minimum starting set must be debugged all together. I'm
> just pointing out that this will be easier if this set is smaller. IMO
> this is the case for a simple assembler rather than for a compiler.

Even a compiler generally starts with a small working set.  But
basically, you'll need some sort of scanner.  Writing a finite state
automaton is very easy: it is a general algorithm parameterised by a
table, so it may as well be written right away.  Then the parser can be
implemented a keyword at a time.
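
To give a feel for how small such a scanner can be, here is a sketch in
Python.  The character classes, state names, token names and the toy
transition table are all assumptions made up for the example; only the
driver loop is the general part.

```python
def char_class(ch: str) -> str:
    """Collapse characters into the few classes the table knows about."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha() or ch == "_":
        return "letter"
    return "other"


# (state, character class) -> next state.  A missing entry means "stop".
TRANSITIONS = {
    ("start", "digit"): "number",
    ("start", "letter"): "ident",
    ("number", "digit"): "number",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
}

# States in which the automaton may stop, and the token kind they yield.
ACCEPTING = {"number": "NUMBER", "ident": "IDENT"}


def scan(text, transitions=TRANSITIONS, accepting=ACCEPTING):
    """Generic driver: all language-specific knowledge lives in the tables."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, end = "start", pos
        # Follow transitions for as long as the table allows (longest match).
        while end < len(text):
            nxt = transitions.get((state, char_class(text[end])))
            if nxt is None:
                break
            state, end = nxt, end + 1
        if state in accepting:
            tokens.append((accepting[state], text[pos:end]))
            pos = end
        else:
            # Simplification: anything the table cannot accept becomes a
            # one-character SYMBOL token (no backtracking is attempted).
            tokens.append(("SYMBOL", text[pos]))
            pos += 1
    return tokens


tokens = scan("x1 + 42")
```

Passing different tables to scan changes the token set without touching
the driver.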

>> Well the main thing I was thinking through was that even a structural
>> editor has to have some sort of parser - even if only for decimal
>> numbers.  Well, maybe you could split it into two halves, but asking a
>> programmer to handle this I think would be a bit much.  Anyway, it
>> would be useful to have the ability to have several different parsers
>> loaded simultaneously.
> Split what into two? The parser? I'm afraid I can't follow you here.

I meant split the number into two, ie 3.2 -> int_part = 3., decimal part
= .2.  So you would type each in a separate field in a structural view.

> Having several parsers "loaded" simultaneously would mean to have them
> linked into the executable, right?

By having two parsers loaded at the same time I meant the ability to
parse two different languages, ie either a parser framework or by
parameterisation by table.

Even if you wanted to parameterise by table, how are you going to change
the table?  Since direct parse table manipulation is difficult, you'd
want to set up a parser table generator anyway.  It'd be quicker than a
normal compiler, but we'd have to spend time writing it.

>> This dictates having a parameterisable parser.  I don't necessarily mind
>> doing a recompile, so I might like a "parser framework" rather than a
>> table-driven parser, but it certainly needs to be flexible, which
>> dictates taking the parser code away from the syntactical details.  A
>> parser generator might still be able to do this though.
> Well I do mind doing recompiles if I can avoid them. I really like
> small and fast test cycles.

But how much time would it take to write the parser this way over a
hardcoded one?

> Yes, but then maybe the way you have to perform the task without the
> wizard is less-than-optimal and needs reengineering.

In a sense being able to change the generated code is useful for this,
but there's no universal solution.

> I'll give an example:
> In VC++ there is a class wizard which allows you to add/remove a
> member to/from a class. This saves you work because usually C++
> requires you to do it twice: in .h and .cpp. If C++ were designed
> sensibly, you would just have to write that declaration once. Using a
> wizard would not save any time, hence no wizard would be required.
> Some wizards mend insufficiencies which should not have happened in
> the first place. The lesson to be learned from this is: If you see
> that you might need a wizard, first check your architecture critically
> and make sure that you don't try to cure the symptoms rather than the
> sickness itself.

Point taken. I would not call this a wizard in the sense I'm using it
here though.

Essentially in the end, a general language is general, and if you want
to do the same specifics often you have to do something about it.  Now
handling the semantics via libraries and syntax via shorthands is a
viable option, and better than a new language for modification reasons. 
But the process might not work in all cases.
 
>> Essentially this isn't really hard.  It's just a matter of generating
>> a type/impl which gets inherited, hence allowing filling in abstract
>> methods on the level below.  You can't change the code - but you can
>> override it.
> I'd prefer another approach for the task you mentioned:
> Inside the class (impl, sorry) that inherits from the interface,
> supply a view into the interface class.

I had intended that the impl would show whatever information about the
type the user desired, embedded into the impl.

> Mark the items that this view
> displays by a different colour so that the user can distinguish them
> from items that are really present in the impl class.

I'm not sure I understand this.  Items that the view displays?

> Allow him to add
> function bodies while keeping the type signature inherited (i.e.
> noneditable). Every time the interface changes, the impl will be
> recompiled anyway. Both parts (intf and impl) can be seen
> simultaneously. No wizard is needed. The user never needs to edit the
> inherited part from within the impl. In fact, he can't.

Well, I never said code-generating wizards are needed.  They're there
to make things easier.  Whether we need them is another question.
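
For reference, the "generate a type/impl which gets inherited" scheme
from my earlier message might look roughly like this in Python.  All
class and method names are hypothetical; GeneratedReport stands in for
whatever a wizard would emit.

```python
from abc import ABC, abstractmethod


class GeneratedReport(ABC):
    """Stands in for tool-generated code that is never edited by hand."""

    def render(self) -> str:
        # Generated logic calls down into the hooks the user fills in.
        return f"{self.title()}: {self.body()}"

    @abstractmethod
    def title(self) -> str:
        """Hook left abstract by the generator."""

    @abstractmethod
    def body(self) -> str:
        """Hook left abstract by the generator."""


class SalesReport(GeneratedReport):
    """Hand-written level below: fills in the abstract methods."""

    def title(self) -> str:
        return "Sales"

    def body(self) -> str:
        return "42 units"


class LoudSalesReport(SalesReport):
    def render(self) -> str:
        # The generated code is overridden rather than changed.
        return super().render().upper()
```

Regenerating the base class leaves the hand-written subclasses alone,
which is the point of overriding instead of editing.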
 
> I see. Input to a parser generator (or syntax table generator) might be
> an example, right?

Yes, although something like Bison allows you to insert pass-through C
code, I wouldn't call it a DSL in the same sense.
 
> Yes, but why use such a kludge as instrumented code at all? I say:
> Debug C using the C debugger, and debug objects (VM calls, the stack,
> such things) using an object debugger as soon as it is available. Try
> to minimize C debugging by keeping the units written in C (even in
> generated C) as simple as possible.

Ideally you'd want to convert into native and have a direct mapping into
the machine code so you can easily map source to machine code and run as
much machine code as you can and interpret as little as necessary.

>> If you had generated code in the
>> translational hierarchy, you could debug at that level rather than the
>> source level, or you might debug at both at the same time, provided the 
>> relevant source and generated languages have a view that supports
>> debugging.

Debugging at a lower level would be useful for things like testing
generated optimised code, since there's no way to map it to a higher
level in general.  Hence it would be useful to debug the optimiser as we
get more ambitious.

> I'd like decent help inside the editor as well as inside the debugger.
> In fact, I see the (Ultra) debugger as an extension to the editor
> rather than as a standalone tool.

Yes, it should have that support, hence "view that supports debugging".

> I agree. An example for code that has no higher-level equivalent might
> be a type conversion call that has been inserted automatically. In this
> case the user *might* have written it explicitly.

In this case you probably know what statement the implicit conversion
came from - and that is really the only grain of linking you really
need.  1 statement to M statements is fairly easily handled (usually
each stage has more code).
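
A small sketch of that 1-to-M linking in Python.  The toy translator,
its "load/add/store" output and all the names are assumptions invented
for the example; the only point is that each generated element records
the source statement it came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generated:
    text: str
    source_stmt: int  # index of the source statement this came from


def translate(source):
    """Toy translation: each 'target = left + right' becomes 4 elements."""
    generated = []
    for index, (target, left, right) in enumerate(source):
        for text in (f"load {left}", f"load {right}", "add", f"store {target}"):
            generated.append(Generated(text, index))
    return generated


def statement_for(generated, position):
    """Map a position in the generated code back to its source statement."""
    return generated[position].source_stmt


def span_for(generated, stmt_index):
    """Return the (first, last) generated positions for a source statement."""
    positions = [i for i, g in enumerate(generated) if g.source_stmt == stmt_index]
    return positions[0], positions[-1]


source = [("x", "a", "b"), ("y", "x", "c")]
code = translate(source)
```

Because each statement maps to one adjacent run of generated elements, a
debugger can move between the two levels with nothing more than this
index.  An optimiser that reorders or merges elements would break that
property.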

> So it has a representation, but it does not appear in the source. The
> editor might still display it as if the user had written it explicitly
> (but use another color to mark it as compiler generated),

If the code was written by a translation, none of the code is user
generated, and all of it really carries the same status.

> so a breakpoint can be set, the stack can be examined and single
> stepping be performed on it. As an additional benefit, this view might
> be accessible even when not debugging, to show the user (and the
> programmer who is debugging the compiler) what the compiler actually
> generated.

Yes, you could do a step debug at a lower level provided that there was
no optimisation.  If you could not map an element directly to a sequence
of adjacent elements at the next level you would be required to always
debug at the lower level.

-- 
     Matthew Tuck - Software Developer & All-Round Nice Guy
             mailto:matty@box.net.au (ICQ #8125618)
       Check out the Ultra programming language project!
              http://www.box.net.au/~matty/ultra/