Steps

Hans-Dieter.Dreier@materna.de Hans-Dieter.Dreier@materna.de
Tue, 16 Mar 1999 17:20:07 +0100



Matthew Tuck wrote:

>Hans-Dieter.Dreier@materna.de wrote:
>
>> That's true, but I rather meant the component approach - AFAIK every
>> keyword is implemented by some function. There is hardly any syntax.
>> They extend by adding more functions, thus more keywords.
>> Lisp is similar in this respect, too.
>
>Yes, the tree would operate in a similar sort of way, with code for the
>node put in the node itself.

You mean that you would include information on how to generate code
within the definition of each node, right? If so, there's a little
problem: code generation may depend on the compiler as well as on the
node. Different compilers may produce different code. You get a
two-dimensional array of code generator infos.
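
To make the "two-dimensional array" concrete, here is a rough sketch in
Python (used only as neutral notation). The node classes, the backend
names "stack_vm" and "c_source" and the emitted text are all invented
for illustration; nothing like this exists yet.

```python
# Hypothetical sketch: code generators indexed by (backend, node kind).
# Backend names and emitted text are assumptions made for this example.

class Const:
    def __init__(self, value):
        self.value = value

class Plus:
    def __init__(self, left, right):
        self.left, self.right = left, right

def gen_const_stack(gen, node):
    return [f"push {node.value}"]

def gen_plus_stack(gen, node):
    return gen(node.left) + gen(node.right) + ["add"]

def gen_const_c(gen, node):
    return str(node.value)

def gen_plus_c(gen, node):
    return f"({gen(node.left)} + {gen(node.right)})"

# One axis is the backend (the "compiler"), the other the node kind.
GENERATORS = {
    ("stack_vm", Const): gen_const_stack,
    ("stack_vm", Plus): gen_plus_stack,
    ("c_source", Const): gen_const_c,
    ("c_source", Plus): gen_plus_c,
}

def generate(backend, node):
    """Look up the generator for this backend/node pair and run it."""
    def recurse(child):
        return generate(backend, child)
    return GENERATORS[(backend, type(node))](recurse, node)

tree = Plus(Const(1), Plus(Const(2), Const(3)))
```

Keeping the table outside the node classes means a new backend adds a
column of entries without touching the nodes themselves.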

>>> This could be complicated by exceptions, which might force the AST
>>> VM to have some sort of explicit stack structure.
>> If the stack is implemented by a Ultra object (subject to GC), which
>> IMO is a must, then exceptions certainly have to be caught by VM to
>> adjust that stack.
>
>What I meant was that we could have implemented something like:
>
>class plus extends expression
>   function calculate : integer = left.calculate + right.calculate
>   ...

What would "calculate" do? I'm asking because there seem to be recursive
calls.
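
One possible reading of the quoted fragment, sketched in Python rather
than in Ultra: each node evaluates itself by asking its children to
evaluate themselves, so the recursion runs on the host language's call
stack. The Literal class is an assumption; it is not in the original
fragment.

```python
class Expression:
    def calculate(self):
        raise NotImplementedError

class Literal(Expression):
    """Leaf node; assumed here so that the recursion has somewhere to stop."""
    def __init__(self, value):
        self.value = value

    def calculate(self):
        return self.value

class Plus(Expression):
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def calculate(self):
        # Recursive calls: intermediate results live on the host call stack.
        return self.left.calculate() + self.right.calculate()

result = Plus(Literal(1), Plus(Literal(2), Literal(3))).calculate()
```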

>Where the stack structure is implicit, since it uses the existing stack.
>
>Why do you want to make the stack an object?  Reflection reasons?
>Something else?

- Reflection (especially high-level debugging)
- GC. Objects for intermediate results need to be reachable or they
  might get collected. That would not be good... GC would have to
  examine the C stack otherwise. But the C stack isn't GC friendly. It
  would involve a lot of extra work.
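
A minimal sketch of both points, in Python and with invented names
(VMStack, roots, the two-instruction program format): the operand stack
is an ordinary object, so a debugger can inspect it and a collector can
ask it which objects are still live.

```python
class VMStack:
    """Operand stack kept as a normal heap object (hypothetical design)."""
    def __init__(self):
        self.slots = []

    def push(self, obj):
        self.slots.append(obj)

    def pop(self):
        return self.slots.pop()

    def roots(self):
        """Everything a collector would have to treat as reachable."""
        return list(self.slots)

def run(program, stack):
    """Execute a list of (opcode, argument) pairs on the given stack."""
    for op, arg in program:
        if op == "push":
            stack.push(arg)
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.push(left + right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack
```

Between any two instructions the intermediate results are visible via
roots(), which is exactly what scanning the C stack would not give us
cheaply.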

>> At least in C exceptions (meaning catch & throw) are pretty easy.
>> Apparently in C++ it's much more complicated because that has to take
>> care of destructor calls as well - but if we simply use the try-catch
>> feature of C++, it's all done for us. We just have to store the info
>> necessary to unwind the stack in case of an exception and do that
>> when an exception occurs.
>
>You can't really have destructors in a GC language which uses objects
>consistently.  Ideally as a matter of correctness you would not close
>files etc. in a finaliser if possible but have a close method which when
>an exception got raised, the method caught it, closed the file and
>propagated the exception up the call stack.

I referred to the treatment of exceptions in the (initial)
implementation of the runtime system, which will (most likely) be in
C(++). Ultra won't have a delete facility; no way to corrupt pointers.
Maybe we need finalisers. Maybe we even include a function that finds
all the references to a given object and changes them to NULL.
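
To illustrate what "store the info necessary to unwind the stack" could
mean for an explicit VM stack, here is a small Python sketch. The names
VM, VMError and call_protected are made up; the real runtime would do
the equivalent in C(++).

```python
class VMError(Exception):
    """Stands in for whatever exception the runtime raises."""

class VM:
    def __init__(self):
        self.stack = []  # explicit operand stack, an ordinary object

    def call_protected(self, body):
        """Run body(self); on VMError, cut the stack back to its old depth."""
        saved_depth = len(self.stack)  # the info needed for unwinding
        try:
            body(self)
            return True
        except VMError:
            del self.stack[saved_depth:]  # drop everything body pushed
            return False

def failing_body(vm):
    vm.stack.append("temp 1")
    vm.stack.append("temp 2")
    raise VMError("something went wrong")

vm = VM()
vm.stack.append("caller value")
ok = vm.call_protected(failing_body)
```

After the failed call only the caller's value is left on the stack; the
temporaries become unreachable and can be collected.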


>> Declarations are not dealt with by VM.
>
>In one sense, but we have to handle allocations and deallocations of
>local variables on the stack, although they could probably be done using
>an implicit stack as well.

Methods of the stack class will take care of that. In my memory layout
proposal there's an example of a stack class (to be exact: a stack frame
class). Ideally, each stack frame should only be allocated once, with
sufficient space. The compiler would calculate this. Normally no
reallocation should be necessary. If we really find that VM is too
slow, we can still inline the stack functionality.
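
Roughly what I have in mind, as a Python sketch. This is not the class
from the memory layout proposal; StackFrame, frame_size and the split
into locals and temporaries are assumptions made for the example.

```python
class StackFrame:
    """Frame whose storage is allocated exactly once, at creation."""
    def __init__(self, size):
        self.slots = [None] * size  # the single allocation
        self.top = 0

    def push(self, value):
        if self.top >= len(self.slots):
            raise OverflowError("frame size was computed too small")
        self.slots[self.top] = value
        self.top += 1

    def pop(self):
        if self.top == 0:
            raise IndexError("pop from empty frame")
        self.top -= 1
        value = self.slots[self.top]
        self.slots[self.top] = None  # drop the reference so it can be collected
        return value

def frame_size(num_locals, max_temporaries):
    """What the compiler would work out for each function (assumed formula)."""
    return num_locals + max_temporaries

frame = StackFrame(frame_size(num_locals=1, max_temporaries=2))
```

If the compiler's calculation is right, push never has to grow the
frame; the overflow check is only there to catch a compiler bug.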

>>> Utility = Object Assembler, ie way of generating input.
>
>> A compiler is a new utility as well and much more complicated. It
>> would take much longer to implement than an assembler. Just look at
>> the syntax: An assembler like the one I imagine really has a tiny
>> syntax that can easily be built in without using parser generators.
>
>The compiler can probably have a fairly minimal syntax at first too.

Certainly it will have to. But that still will be a lot larger than a
simple assembler.

>Then it can be expanded later.  Essentially the same as what your
>assembler is, except existing work can be built upon.

Most of the assembler's ingredients can be reused if we keep that in mind.

>> One could write an OL program to generate objects for test cases. Or
>> write a C program or shell script to generate a large OL program that
>> *is* the test case. These could be kept for a while (and adapted if
>> necessary) for regression testing. IMO OL won't change as often as a
>> compiler would, so this option is more realistic for OL than for a
>> compiler.
>
>But isn't this work better spent starting with a minimal compiler and
>working your way up?

I don't think so. Maybe you want to make an estimate: just sketch your
minimal syntax and see what it takes. If you post it here, we can talk
it over. I wouldn't be surprised if what you call a minimal compiler
turns out to be very similar to what I call a simple assembler.


>>> Untested tools relying on untested tools?  Like the stack VM relying
>>> on the object assembler for instance?  =)
>> The stack VM you mention is no good example since it is so simple that
>> it is barely visible.
>
>But it relies on the MC code which is the VM in the operational sense.

Yes of course. But a test environment for VM is way smaller than one for
a compiler.

>> But in principle, you are right, of course.
>> Certainly the minimum starting set must be debugged all together. I'm
>> just pointing out that this will be easier if this set is smaller. IMO
>> this is the case for a simple assembler rather than for a compiler.
>
>Even a compiler generally starts with a small working set.  But
>basically, you'll need some sort of scanner.  Writing finite state
>automata is very easy, general algorithm parameterised by a table - it
>may as well be written right away.  Then the parser can be implemented a
>keyword at a time.

OK OK. I'll sketch a syntax for OL - you sketch one for a compiler. Then
let's compare.
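
For reference while we compare, this is the kind of table-parameterised
scanner the quoted text describes, sketched in Python. The token kinds,
character classes and the table itself are invented for illustration;
they are neither OL syntax nor Ultra syntax.

```python
def char_class(ch):
    """Map a character to the column used in the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# (state, character class) -> next state; missing entries end the token.
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("start", "digit"): "number",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
    ("number", "digit"): "number",
}
ACCEPTING = {"ident", "number"}

def scan(text):
    """Split text into (kind, lexeme) pairs using the table above."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, start = "start", pos
        while pos < len(text) and (state, char_class(text[pos])) in TRANSITIONS:
            state = TRANSITIONS[(state, char_class(text[pos]))]
            pos += 1
        if state in ACCEPTING:
            tokens.append((state, text[start:pos]))
        else:
            # No transition was possible: emit a one-character symbol token.
            tokens.append(("symbol", text[pos]))
            pos += 1
    return tokens
```

The loop never changes; a different lexical syntax only needs a
different TRANSITIONS table.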

>> Split what in to two? The parser? I'm afraid I can't follow you here.
>
>I meant split the number into two, ie 3.2 -> int_part = 3., decimal part
>= .2.  So you would type each in a separate field in a structural view.

If by "field" you mean "edit control", oh no.
I didn't mean the structural view to be *so* fine-grained. No more than
one editable field in each line. And that starts at some fixed column
position and extends to the end of the line, wrapping to the next lines
if necessary. Otherwise it will be a nightmare both to program it and
to use it.

But even if you mean logical fields, I think it's not a good idea. A
number should be an atom; everything else is overkill. You'd have to
handle cases like 1.a or .5 or a.5 or 1..0 - not me! I'd even include
the minus sign.
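
Treating the number as one atom keeps the check trivial. A Python
sketch, assuming a number syntax of optional minus sign, digits, and an
optional fraction (whether something like .5 should be allowed is an
open question; this sketch simply rejects it):

```python
import re

# Assumed syntax: optional minus sign, digits, optional ".digits".
NUMBER = re.compile(r"-?\d+(\.\d+)?")

def is_number_atom(text):
    """True if the whole field is one well-formed number."""
    return NUMBER.fullmatch(text) is not None
```

With the sign and the fraction inside the atom, is_number_atom("-3.2")
holds, while "1.a", "a.5" and "1..0" are rejected as a whole, with no
per-field bookkeeping.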

>> Having several parsers "loaded" simultaneously would mean to have them
>> linked into the executable, right?
>
>By having two parsers loaded at the same time I meant the ability to
>parse two different languages, ie either a parser framework or by
>parameterisation by table.
>
>Even if you wanted to parameterise by table, how are you going to change
>the table.  Since direct parse table manipulation is difficult, you'd
>want to set up a parser table generator anyway.  It'd be quicker than a
>normal compiler, but we'd have to spend time writing it.

We'd write a parser table generator in due time (rather sooner than
later). That's a bit of effort, right, but it keeps us flexible and
independent from the C compile chain. I see manual table entry only in
the beginning.
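
To show what "manual table entry" could look like at the very
beginning, here is a small table-driven (LL(1)) recogniser in Python.
This is not the parser bones I submitted; the grammar, the token names
and the table layout are made up for the example.

```python
# Hand-entered parse table: (nonterminal, lookahead) -> right-hand side.
# Grammar (assumed): expr -> term rest ; rest -> "+" term rest | empty ;
#                    term -> "num" | "(" expr ")"
EXPR_TABLE = {
    ("expr", "num"): ["term", "rest"],
    ("expr", "("): ["term", "rest"],
    ("rest", "+"): ["+", "term", "rest"],
    ("rest", ")"): [],
    ("rest", "$"): [],
    ("term", "num"): ["num"],
    ("term", "("): ["(", "expr", ")"],
}
EXPR_NONTERMINALS = {"expr", "rest", "term"}

# A second, unrelated language handled by the same engine.
LIST_TABLE = {
    ("list", "item"): ["item", "list"],
    ("list", "$"): [],
}

def parse(tokens, table, nonterminals, start):
    """Return True if tokens match the grammar encoded in table."""
    tokens = list(tokens) + ["$"]  # "$" marks end of input
    stack = ["$", start]
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in nonterminals:
            rule = table.get((top, lookahead))
            if rule is None:
                return False  # no table entry: syntax error
            stack.extend(reversed(rule))
        elif top == lookahead:
            pos += 1  # terminal matched, consume it
        else:
            return False
    return pos == len(tokens)
```

Both tables are plain data, so switching languages (or fixing the
syntax) means loading another table, not recompiling the engine.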

>>> This dictates having a parameterisable parser.  I don't necessarily mind
>>> doing a recompile, so I might like a "parser framework" rather than a
>>> table-driven parser, but it certainly needs to be flexible, which
>>> dictates taking the parser code away from the syntactical details.  A
>>> parser generator might still be able to do this though.
>> Well I do mind doing recompiles if I can avoid them. I really like
>> small and fast test cycles.

Would you please explain what you mean by "parser framework"?

>But how much time would it take to write the parser this way over a
>hardcoded one?

Do you want an estimate? I don't know, honestly. But you had a look at
the parser bones I submitted. Maybe that was 5 or 10% of what would be
needed, plus support MM + VM + runtime support (stack, hash table,
I/O...). Just to show that the table engine works. After that, you can
expand it in increments as small as you want.

We need to do most of that for a hardcoded one as well or it will remain
an isolated affair, not integrated into the rest of the environment,
and we will need to ship a C compiler and yacc or whatever it takes
along with our code.

>>> Essentially this isn't really hard.  It's just a matter of generating
>>> a type/impl which gets inherited, hence allowing filling in abstract
>>> methods on the level below.  You can't change the code - but you can
>>> override it.
>> I'd prefer another approach for the task you mentioned:
>> Inside the class (impl, sorry) that inherits from the interface,
>> supply a view into the interface class.
>
>I had intended that the impl would show whatever information about the
>type that the user desired, embedding into the impl.
>
>> Mark the items that this view
>> displays by a different colour so that the user can distinguish them
>> from items that are really present in the impl class.
>
>I'm not sure I understand this.  Items that the view displays?

In the impl, it looks like this if it's not inherited from the interface:

(Adapted from the GUI builders thread):

<Function: x>
 Parameters
  <Number: i>
  <String: s>
 Returns
  <String: s>             // the name is just a comment
 Local items
  <Number: myLocalVar>
 Program Text
  <MakeSomethingWithThisString (s);>
  <return s.Left (i);>

... and like this if it is:

Function: x
 Parameters
  Number: i
  String: s
 Returns
  String: s             // the name is just a comment
 Local items
  <Number: myLocalVar>
 Program Text
  <MakeSomethingWithThisString (s);>
  <return s.Left (i);>

The parts that can be entered by the user are enclosed in <> just in
this example to distinguish editor generated (non-editable) parts from
user input.

Only local items and program text contain editable sections in the
inherited version. If you omit program text, this means that you didn't
give an implementation, thus child classes need to do that.

To avoid confusion on the user's side, noneditable parts of the outline
should be displayed in a different colour.
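
The distinction between editable and noneditable outline items could be
modelled with a single flag per item. A Python sketch with invented
names (Item, render, inherited_function); the <> markers stand in for
the colour difference, as in the example above:

```python
class Item:
    """One line of the outline; editable marks user-entered parts."""
    def __init__(self, text, editable, children=()):
        self.text = text
        self.editable = editable
        self.children = list(children)

def render(item, depth=0):
    """Return the outline as lines; editable items are wrapped in <>."""
    shown = f"<{item.text}>" if item.editable else item.text
    lines = [" " * depth + shown]
    for child in item.children:
        lines.extend(render(child, depth + 1))
    return lines

def inherited_function(name, params, local_items):
    """Signature parts come from the interface, so they are read-only."""
    return Item(f"Function: {name}", False, [
        Item("Parameters", False, [Item(p, False) for p in params]),
        Item("Local items", False, [Item(v, True) for v in local_items]),
    ])

outline = inherited_function("x", ["Number: i", "String: s"],
                             ["Number: myLocalVar"])
```

Printing "\n".join(render(outline)) gives the top of the inherited view
shown earlier, with only the local item in <>.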

>> Yes, but why use such a kludge as instrumented code at all? I say:
>> Debug C using the C debugger, and debug objects (VM calls, the stack,
>> such things) using an object debugger as soon as it is available. Try
>> to minimize C debugging by keeping the units written in C (even in
>> generated C) as simple as possible.
>
>Ideally you'd want to convert into native and have a direct mapping into
>the machine code so you can easily map source to machine code and run as
>much machine code as you can and interpret as little as necessary.

The mapping isn't all. You also have to find variables (on stack, inside
objects, in static memory), interpret them (as numbers, pointers, text,
arrays, structures), set breakpoints, look at registers (eg return
values are always kept in EAX on '86 machines), interpret the stack for
a call history. There may be Windows callbacks interspersed. That's a
helluva lot of work, and mostly lowest level and VERY machine and
compiler-dependent. Once upon a time I did a debugger for the good old
(8-bit) 6800, which was much much less complicated than today's CPUs,
and it took me really long. It doesn't pay off if we need to do it
ourselves. If we can find extensible source code for that maybe it's a
different matter, but I won't count on that...

>>> If you had generated code in the
>>> translational hierarchy, you could debug at that level rather than the
>>> source level, or you might debug at both at the same time, provided the
>>> relevant source and generated languages have a view that supports
>>> debugging.
>
>Debugging at a lower level would be useful for things like testing
>generated optimised code, since there's no way to map it to a higher
>level in general.  Hence it would be useful to debug the optimiser as we
>get more ambitious.

If it's C-generated assembly code, you'd be in for a really hard time.
See above and take that to the power of 2. But we are a long way from
that...

>> I agree. An example for code that has no higher-level equivalent might
>> be a type conversion call that has been inserted automatically. In this
>> case the user *might* have written it explicitly.
>
>In this case you probably know what statement the implicit conversion
>came from - and that is really the only grain of linking you really
>need.  1 statement to M statements is fairly easily handled (usually
>each stage has more code).

Yes. If the next step has less code, it's no problem either.

>> So it has a representation, but it does not appear in the source. The
>> editor might still display it as if the user had written it explicitly
>> (but use another color to mark it as compiler generated),
>
>If the code was written by a translation, none of the code is user
>generated, and all of it really carries the same status.

I'm referring to the generated code. That may contain conversions that
could have been written as well, so there's a text representation for
them.

I'll show it in an example (bad example, I know, but never mind). Let's
assume that there is an automatic conversion from number to text. So if
the user writes

 "there are " + n + " pigs."

he actually gets

 "there are " + n.tostring () + " pigs."

Now the editor could decompile the AST generated by the compiler and
display

 "there are " + n.tostring () + " pigs."
 ----------------++++++++++++-----------

and show the compiler generated parts (+++) in grey. It would do so
only if the user explicitly wishes, and the source text is not changed
by the editor. That way the user can see what he *got* in case he
wonders.
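
A tiny Python sketch of the display side. A flat list of fragments
stands in for a real walk over the AST, and the function name
render_with_markers is invented; the only point is that each fragment
carries a "compiler generated" flag that the editor can turn into a
colour.

```python
def render_with_markers(parts):
    """parts is a list of (text, generated) pairs in source order.

    Returns the decompiled text and a marker line of equal length:
    '-' under user-written characters, '+' under generated ones.
    """
    text = "".join(piece for piece, _ in parts)
    markers = "".join(("+" if generated else "-") * len(piece)
                      for piece, generated in parts)
    return text, markers

parts = [
    ('"there are " + n', False),
    (".tostring ()", True),      # inserted by the compiler
    (' + " pigs."', False),
]
text, markers = render_with_markers(parts)
```

Printing text and markers on consecutive lines reproduces the two-line
display above.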

Well, it's just an idea. Of course there are lots of other more
important things to do. Let's just keep in mind that we might wish to
implement such a feature some time so we don't screw up the interfaces
and make it impossible.


--

Regards,

Hans-Dieter Dreier
(Hans-Dieter.Dreier@materna.de)
