Steps

Hans-Dieter.Dreier@materna.de Hans-Dieter.Dreier@materna.de
Fri, 26 Feb 1999 13:53:28 +0100



>Hans-Dieter Dreier wrote:
>
>> I think we should start the editor as simple as possible, using a
>> character based approach. I know it's old-fashioned, but it's the simplest
...
>Regardless of this, we will still have to do cross-platform widgets to get
>a graphical intelligent editor at some point in the future.

Of course. Rather sooner than later.

>Is a CUI a useful stepping stone though?   What would the extra effort
>in
>writing and then throwing out the CUI get us?   It does not unblock
>other
>development effort like writing a non-intelligent version which we throw
>out to later use an intelligent version does.  There are already plenty
>of
>text editors out there we can use to program.  You can't do a very
>intelligent editor with a CUI.

Maybe someone should look for the GUI and CUI packages that are simplest to
use and compare them; then we will see. As to how intelligent an editor you
can write using a CUI: what does "intelligence" have to do with the UI
chosen? The main difference IMO is that the designer is considerably more
constrained by a CUI screen layout, and that he probably has to write code
to support some UI elements that are not present in a CUI. OTOH, if we use a
GUI, we might be constrained by its API (callback routines and memory
objects that do not fit well into our environment, to name a few).

>Then again, maybe I'm visualising wrong.  After all, a lot of the early
>effort might be infrastructure like defining languages, translations,
>framework, generators etc.  If you take this view the UI effort would be
>small, in which case, going straight to GUI is not such a huge effort
>anyway.
>
>I know if we apply my argument above universally all those little bits
>of
>extra effort would add up, but I see the editor as a really important
>thing
> - important enough to program the way we want it first time.  If you
>were
>designing a graphical browser, would you create a text-based browser
>first?

IMO the editor should not be the first step, since in the beginning most
programming will be in C++ anyway. I do not think it is a good idea to
center development around the editor: that might bias a lot of design
decisions away from general solutions and towards GUI support (which is
quite important, to be sure, but not for _general language design_).

>  I wouldn't, maybe others feel differently.

I wouldn't either, but probably for different reasons: I would (at least
try to) use a Windows TreeControl, re-feature it so that it supports items
of variable height, use Windows RichEdit controls for the items, and that's
it. But alas, then it's tied to Micro$oft. I wouldn't like that.

>I'm also worried that the interfaces will turn out to be workable yet
>non-optimal in a GUI world.  Now I've said I'm not worried about
>backward
>compatibility, but translation is still, as far as I know, an untested
>alternative, so I'd like to avoid it as much as possible unless we need
>it.

By "translation", do you mean porting a CUI application to a GUI
environment? That has been tried, and I always found the results not really
convincing. If the CUI were designed with that translation already in mind,
that might make a difference, however.

>> The more third party software we can use, the better. Hopefully it isn't
>> too BIG and complicated.
>
>If you don't want a feature, you don't use it.

But you have it in your header files and libraries, and maybe have to link
it as well. GUI stuff IMO tends not to be as modular as one could wish.

>> If by "bootstrapping" you mean using the system to compile itself, I think
>> that will come in a later stage. If you mean the startup process, I'll
>> come to that later in this posting.
>
>I meant the former.  I think it should happen before the editor.  If we
>integrate cross-platform GUI capability into the language like Java,
>then
>everything we need to go intelligent would be there.

Sorry, I didn't express myself clearly. By "bootstrapping" I meant producing
*executable* code, not generated C source code. IOW, the Ultra source code
of the whole system goes in, and a new executable of the very system that is
doing the compiling comes out, without C compiler or linker. Maybe I'm
biased by my Centura experience - they do it exactly like this, and I loved
it. No makefile! No linker! You just hit "Make executable" and a few seconds
later it is done.

Some days ago I got a binary copy of SmallEiffel from the net. It needs the
GNU C compiler to process its output. That has to be installed separately.
Sucks!

But I think we have quite different approaches. I'll try to make that clear
using an example: different ways to write a business application.

You may write it "monolithic" (that word smacks of dinosaurs and Microsoft
Word, I know, which is quite intended here) and add "features" to that
monolith afterwards. IMO that is what you end up with if you produce C code
(no offense intended :)

Or you might compose it from componentware (COM objects, for example): small
building blocks that nevertheless fit together because they were written to
common standards. They are written and refined in small steps, and you get a
working environment almost instantly - even before the language design has
been done. The top language level ("glue") would be some scripting language.

The building blocks are VMs, as you might have guessed, and the "glue"
language is Ultra (or object loader, for a start). The MCs (and VM and MM)
are the only parts that are written in C++ (some day in Ultra, hopefully).
The compiler will produce ASTs or threaded code right from the start, not C
code. This is much easier to do, and the result can be executed immediately,
with no intervening steps.

I'm not saying we should use COM, beware! The analogy ends here. But we
might consider using dynamic libraries containing MCs...

Another example that comes to mind is tk (of tcl/tk). It is quite similar,
except that tk is just a library, not the language itself.

>> The runtime's structure should consist of a small "core": MM and VM, and
>> a generator for the very minimum of object infrastructure for VM to start
>> up and call object loader, as well as "plug-ins" that are formed from C++
>> routines that form the library and can be reached from the object level
>> ...
>
>I would really like to be able to bootstrap before writing the
>intelligent
>editor.  This way we can write it straight in Ultra.  I think to-c would
>be
> the quickest way to do this.

The quickest way: perhaps in the short run (though I wouldn't take that for
granted), but we would be tied to to-c for an awfully long time after that.

It also means that the language must be (almost) fully designed and the
compiler must be ready. This will take its time.

>I think we're coming from different backgrounds here.  You're proposing
>a minimal editor, while I'm proposing a minimal interpreter.

I don't agree here! If you take a look at the VM input code sketched in the
original posting, you will see that it is *designed* to be interpreted by a
*minimalistic* VM. I'd be surprised if it has more than 10 lines of code,
including comments.

The editor is much, much bigger - and it is not intended to be minimal (at
least not as far as functionality is concerned). If I wanted a minimal
editor, I'd rather use a text editor off the shelf.

> It's not that I
> think the interpreter is not important, it's just that I think the
>editor
>is where the new stuff will happen, so I want to get there as soon as
>possible.

I agree here - the editor (as I see it) will be a great achievement in
making programming easier. (I really wish you could check out Centura -
that's my guiding star as far as editing is concerned.)
But it need not necessarily be *the* new stuff - I've got some more ideas,
and so do others, certainly...

>> I would like to allocate large chunks from whatever is the next allocator
>> beyond our control - that may be malloc for a start, or the OS later.
>> Reasons are the following: 1. I don't know anything about their
>> performance, so I'd like to limit their impact by using them as seldom as
>> possible - but that's just a gut feeling.
>
>Well you might be right - I've already stated I'd like plug-in
>allocators/collectors for the interpreter - but is performance a good
>reason to do this in first release?

Maybe *my* main motivation is to stay in control of memory layout - I admit
that. It's just a feeling: the whole thing might get screwed up if we cannot
properly control how memory layout is done.

>> 2. Since we will have our own GC, we need to be able to traverse all
>> objects, hence they need to be linked somehow. The next allocator beyond
>> our control does the same thing, of course.
>
>First-generation might not even need GC.  First-and-a-half probably
>would
>though.  =)

Maybe, but that would mean very limited system run time if there were no way
to reclaim significant amounts of memory other than GC.
BTW, in an environment that can be designed to be as GC-friendly as one
might wish, writing a GC should be really easy.
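
To illustrate why linked objects make this easy, here is a minimal
mark-and-sweep sketch. It is only an assumption of how such a collector
could look, not a design for Ultra: the names Obj, Heap, refs and collect
are made up for this example, and objects come from plain new/delete
instead of our own memory manager.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical object layout: every object carries a link to the next
// allocated object, so the collector can visit all of them.
struct Obj {
    Obj* next_allocated = nullptr;  // intrusive "all objects" list
    bool marked = false;            // set during the mark phase
    std::vector<Obj*> refs;         // outgoing references to other objects
};

class Heap {
public:
    Heap() = default;
    Heap(const Heap&) = delete;
    Heap& operator=(const Heap&) = delete;
    ~Heap() {
        while (all_ != nullptr) {
            Obj* next = all_->next_allocated;
            delete all_;
            all_ = next;
        }
    }

    Obj* create() {
        Obj* o = new Obj;
        o->next_allocated = all_;
        all_ = o;
        return o;
    }

    // Mark everything reachable from the roots, then free the rest.
    // Returns the number of objects reclaimed.
    std::size_t collect(const std::vector<Obj*>& roots) {
        std::vector<Obj*> work(roots);
        while (!work.empty()) {
            Obj* o = work.back();
            work.pop_back();
            if (o == nullptr || o->marked) continue;
            o->marked = true;
            for (Obj* r : o->refs) work.push_back(r);
        }
        std::size_t freed = 0;
        Obj** link = &all_;
        while (*link != nullptr) {
            Obj* o = *link;
            if (o->marked) {
                o->marked = false;          // reset for the next cycle
                link = &o->next_allocated;
            } else {
                *link = o->next_allocated;  // unlink and reclaim
                delete o;
                ++freed;
            }
        }
        return freed;
    }

private:
    Obj* all_ = nullptr;  // head of the list of all allocated objects
};
```

Because every reference lives in a known place (refs), the collector never
has to guess what is a pointer, which is the part a conservative C++ GC
cannot rely on.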

>> Duplicating that would be a waste of memory
>> and time; using their structures would be not portable. Using someone
>> else's GC and not having to roll our own would be OK with me, _but_ since
>> we have the chance to create a GC-friendly environment, an off-the-shelf
>> GC would probably be overkill in terms of code size, complicatedness and
>> time consumption.
>
>I would not bother working too much on the memory system until we're
>ready
>to bootstrap.  Once we have an interpreter, people can start writing a
>compiler in Ultra, followed by a bootstrap.

I don't agree here. We need a memory system to get the interpreter (VM)
running. As soon as there is MM, VM and object loader, people can start
writing MCs and play with them. One of the first tasks might not even be a
compiler, but a runtime system (some classes needed by the compiler, such as
hash tables), and *then* a parser (the code generator part is negligible
when a VM is used). The MCs might be as little or as big as is deemed
appropriate. One could write a single MC that contains a complete parser
with syntax attached (but I would not advocate this) or break it down to
tiny parts.

Since in the beginning the "glue" language will be object loader source
(which is no fun to program in, like all assemblers), MCs might tend to be
bigger, shifting programming towards C++. Later, the reverse might be true
(I hope so, at least).

> A conservative GC would make our programming a lot easier.  The problem
>would disappear after bootstrapping.

True. It would also make the outcome more unpredictable. It *is* possible to
break a C++ GC if you do the wrong thing. I'm not convinced, but I also
haven't worked with GC in C++ yet. It would be interesting to hear some
opinions from people who have experience in this area.

Another point is that we would be stuck with that C++ GC even after
bootstrapping (if I understand that right), since it is C++ code that gets
compiled to machine code. I'd think that a GC built for C++ cannot be as
fast as one that has been designed right into the memory system.

>What I suggested was the optimiser converting several
>language objects into one VM object.  As far as the VM is concerned
>there is one object per memory allocation block.
>
>By "memory allocation block" I'm assuming you mean a block of memory
>suballocated from a large block allocated from C, or under my proposal,
>something actually from C.

Sorry, I'll try to make things clearer:

- A memory chunk is a block obtained from the next allocator beyond our
  control. That might be malloc or the OS itself.
- A memory object is an allocation unit obtained by Ultra's memory
  allocator. Memory objects are the smallest unit that is collectible by
  GC.
- A (logical) object is an object in the language sense, i.e. the smallest
  independently creatable object at the language level.

Several logical objects may form a memory object.
Several memory objects may form a memory chunk.
No logical object is spread over more than one memory object.
No memory object is spread over more than one memory chunk.
Maybe the compiler chooses to decompose objects into smaller pieces (see our
discussion about MI), but then these _are_ logical objects in their own
right as far as memory management is concerned, although the user may only
be able to create them together.

In the memory layout which I suggested first, logical objects and memory
objects were identical.
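
The chunk / memory object relationship can be sketched roughly as follows.
This is only an illustration under assumptions of mine: ChunkAllocator,
MemoryObject, footprint and the simple bump allocation are invented for the
example, chunks come from malloc, and nothing is ever freed individually.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Header in front of every memory object. The payload that follows may hold
// one logical object or several of them.
struct alignas(std::max_align_t) MemoryObject {
    MemoryObject* next;        // links all memory objects for GC traversal
    std::size_t payload_size;  // bytes of payload following this header
};

class ChunkAllocator {
public:
    explicit ChunkAllocator(std::size_t chunk_size) : chunk_size_(chunk_size) {}
    ChunkAllocator(const ChunkAllocator&) = delete;
    ChunkAllocator& operator=(const ChunkAllocator&) = delete;
    ~ChunkAllocator() {
        for (void* chunk : chunks_) std::free(chunk);
    }

    // Bytes one memory object occupies inside a chunk (header + padding).
    static std::size_t footprint(std::size_t payload_size) {
        const std::size_t a = alignof(std::max_align_t);
        return (sizeof(MemoryObject) + payload_size + a - 1) / a * a;
    }

    // Returns the payload address, or nullptr if no chunk could be obtained.
    void* allocate(std::size_t payload_size) {
        const std::size_t need = footprint(payload_size);
        assert(need <= chunk_size_);  // a memory object never spans chunks
        if (chunks_.empty() || used_ + need > chunk_size_) {
            void* chunk = std::malloc(chunk_size_);  // allocator beyond our control
            if (chunk == nullptr) return nullptr;
            chunks_.push_back(chunk);
            used_ = 0;
        }
        char* base = static_cast<char*>(chunks_.back()) + used_;
        used_ += need;
        MemoryObject* header = new (base) MemoryObject{all_, payload_size};
        all_ = header;
        return header + 1;  // payload starts right behind the header
    }

    std::size_t chunk_count() const { return chunks_.size(); }

    std::size_t object_count() const {
        std::size_t n = 0;
        for (const MemoryObject* m = all_; m != nullptr; m = m->next) ++n;
        return n;
    }

private:
    std::size_t chunk_size_;
    std::size_t used_ = 0;         // bytes used in the newest chunk
    std::vector<void*> chunks_;    // memory chunks obtained from malloc
    MemoryObject* all_ = nullptr;  // most recently allocated memory object
};
```

With a chunk sized for exactly two memory objects, a third allocation forces
a second chunk, and the header links still reach all three objects.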

>> What would that object creation code look like? Could you do that without
>> engaging the C++ compiler? I would not like to have to compile and link
>> every time I want to change something in the (test) setup. I also would
>> appreciate if the distribution could do without a C compiler.
>
>Yes, it would be done with C code.  So the point is not to have to
>recompile code to change the initial value of the objects.  What sort of
>things would you want to change?  Are you just looking for a sort of INI
>file?  If so, that would be better than implementing a whole persistence
>system to be thrown away.

Maybe you got the intention of a persistence system wrong: it is designed
for efficiency, to get the highest possible throughput and (secondarily)
small files. Therefore it stores its data in binary form, preferably as a
memory image.
Since it is not editable, it cannot be used to change a test setup.

What you called INI file would rather be object loader format, I think.
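
A rough sketch of what loading such an editable text format could involve,
so that a test setup can be changed without recompiling. The concrete syntax
(one name per line, `//` comments, NULL as a literal) and the names load,
Ref and the symbol table are assumptions made for this example only; the
real object loader format is not specified here.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

using Ref = const void*;  // stand-in for a reference to some object

// Translate each line of loader source into the address registered for
// the name found on that line.
std::vector<Ref> load(const std::string& text,
                      const std::map<std::string, Ref>& symbols) {
    std::vector<Ref> code;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        line = line.substr(0, line.find("//"));  // strip trailing comment
        std::istringstream words(line);
        std::string name;
        if (!(words >> name)) continue;          // blank or comment-only line
        if (name == "NULL") {
            code.push_back(nullptr);
            continue;
        }
        auto it = symbols.find(name);
        if (it == symbols.end()) {
            throw std::runtime_error("unknown name: " + name);
        }
        code.push_back(it->second);
    }
    return code;
}
```

Editing the text and loading it again is all it takes to try a different
setup; no C++ compiler is involved at that point.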

>> What do you mean by "state"?
>
>I mean the state of the object, i.e. any data attached to it, as opposed
>to
> "static" class-wide data and code.

I see. Well, since classes should be objects, there is no distinction
between them: they are all stored in objects and can therefore all be
manipulated using the same devices. Handling different *contents* may demand
different tools, however.

>> Firstly, if we want to execute code directly (I would prefer that)
>> instead of having to use the C compiler, there has to be a machine. The
>> simplest one IMO is a stack machine. Code for a stack machine (i.e. postfix
>> notation) actually *is* a (flattened) tree or at least most easily
>> convertible (see below).
>
>Why bother flattening it?  I personally think executing an AST would be an
>interesting way of doing things.

Pure execution speed. If it is laid out so that it can be executed linearly,
that is fastest. It is also very compact, since fewer pointers are involved.
But *how* the tree is *implemented* should make no difference to a
higher-level tool where speed is not of prime importance, since tree classes
will hide all those nasty little details from their clients.
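
As a small illustration of "flattened tree": a post-order walk turns an
expression tree into the linear postfix form. The Node type and the helper
names operand, binary and flatten are assumptions for this sketch, and
labels are plain strings here rather than object references.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node {
    std::string label;                  // operand or operator name
    std::unique_ptr<Node> left, right;  // both empty for an operand
};

std::unique_ptr<Node> operand(std::string label) {
    auto n = std::make_unique<Node>();
    n->label = std::move(label);
    return n;
}

std::unique_ptr<Node> binary(std::string label, std::unique_ptr<Node> l,
                             std::unique_ptr<Node> r) {
    auto n = operand(std::move(label));
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Post-order walk: operands first, then the operator, then the NULL marker
// that tells the stack machine to execute what it just pushed.
void flatten(const Node& n, std::vector<std::string>& out) {
    const bool is_operator = n.left || n.right;
    if (n.left) flatten(*n.left, out);
    if (n.right) flatten(*n.right, out);
    out.push_back(n.label);
    if (is_operator) out.push_back("NULL");
}
```

The tree needs two child pointers per operator node; the flattened form
needs none, which is where the compactness comes from.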

>> Secondly, this is not meant to be the last word. Remember, my general
>> principle for a start is "Keep it as simple as possible without
>> sacrificing too much flexibility", to get us off the ground as quickly as
>> possible.
>> Since initially there will only be toy "applications", code compactness is
>> last priority (IMHO). BTW, execution will be quite fast however.
>
>But the AST is closer to the language, as it is only a slightly modified
>parse tree.  In fact we could use parse trees.
>
>>> What do you mean by execute object here?
>> I'll give an example: To calculate 1 * 2 + 3 * 4:
>
>How is this executing one object?

If by "one object" you mean the "*" operator, for instance, it would be done
by the code that recognises the NULL behind the Multiply. That would fetch
the topmost stack item, cast it to an MC and call the code pointed to by
some member variable (which is actually payload of the referenced MC
object).

If by "one object" you mean a piece of code that is given in the form you
see below: well, each line will have been translated to an address by object
loader.
VM only fetches the contents found under one address and checks whether it
is NULL. If that's the case, the stacktop is executed, see above. If not, it
is pushed onto the stack (*not* the C stack, of course; a stack object that
is subject to GC is needed here).

>>...
>>     NumberThree      // Push third operand
>>     NumberFour
>>     Multiply               // Push multiply op
>>     NULL                  // Execute multiply op
>>     Add                    // Push Add op
>>     NULL                  // Execute
>
>Interesting way of doing a stack machine, actually pushing the
>operators.
>I guess this would allow you to not know what operation you're pushing.

Exactly. It need not know anything at all.
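
For illustration, the execution loop could be sketched like this. It is a
sketch under assumptions, not the real VM: the types Object, Number, MC and
VM are invented, the stack and heap are std::vector stand-ins for GC-managed
objects, and NumberOne/NumberTwo are assumed for the part of the listing
that is elided above.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Object {
    virtual ~Object() = default;
};

struct Number : Object {  // payload: the value
    explicit Number(long v) : value(v) {}
    long value;
};

struct VM {
    std::vector<Object*> stack;                 // *not* the C stack
    std::vector<std::unique_ptr<Object>> heap;  // stand-in for MM
    Object* make_number(long v) {
        heap.push_back(std::make_unique<Number>(v));
        return heap.back().get();
    }
};

struct MC : Object {  // payload: pointer to C++ code
    using Code = void (*)(VM&);
    explicit MC(Code c) : code(c) {}
    Code code;
};

long pop_number(VM& vm) {
    assert(!vm.stack.empty());
    Object* top = vm.stack.back();
    vm.stack.pop_back();
    return static_cast<Number*>(top)->value;  // unchecked, as discussed
}

void multiply(VM& vm) {
    long b = pop_number(vm), a = pop_number(vm);
    vm.stack.push_back(vm.make_number(a * b));
}

void add(VM& vm) {
    long b = pop_number(vm), a = pop_number(vm);
    vm.stack.push_back(vm.make_number(a + b));
}

// The whole machine: push every reference; on NULL, execute the stacktop.
void run(VM& vm, const std::vector<Object*>& code) {
    for (Object* ref : code) {
        if (ref != nullptr) {
            vm.stack.push_back(ref);
            continue;
        }
        assert(!vm.stack.empty());
        Object* top = vm.stack.back();
        vm.stack.pop_back();
        static_cast<MC*>(top)->code(vm);  // VM knows nothing about the op
    }
}

// 1 * 2 + 3 * 4 as ten references, following the listing above.
long evaluate_example() {
    VM vm;
    Number one(1), two(2), three(3), four(4);
    MC mul(multiply), plus(add);
    std::vector<Object*> code = {&one,  &two,  &mul,    nullptr, &three,
                                 &four, &mul,  nullptr, &plus,   nullptr};
    run(vm, code);
    return static_cast<Number*>(vm.stack.back())->value;
}
```

The loop in run never inspects which operation it executes; adding a new
operation means adding an MC, not touching the VM.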

>> Each line stands for a fullblown reference, each 4 bytes. 40 Bytes total.
>> You see: This is an awful waste of memory, but IMO it doesn't matter for
>> now. Execution is extremely simple, so it is very fast. There is NO type
>> checking ...
>
>I would say early development is exactly the time to be checking
>this
>sort of thing.

Of course we can have a VM that does some checking. In fact, not having such
a device would be rather annoying as long as hand-coding in the "object
loader" "language" (let's call that OL) prevails, since this would be quite
unforgiving: each mistake results in an access violation...

But later on, as OL code grows more reliable since it is generated by some
trusted tool, these checks could and should be dropped.
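
A sketch of what a checking "execute" step might look like next to the
unchecked one. The types and function names are hypothetical, and using
dynamic_cast plus exceptions is just one convenient way to show the idea in
C++; the real VM might tag objects differently.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

struct Object {
    virtual ~Object() = default;
};

struct Number : Object {
    explicit Number(long v) : value(v) {}
    long value;
};

struct MC : Object {
    using Code = void (*)(std::vector<Object*>&);
    explicit MC(Code c) : code(c) {}
    Code code;
};

// Checking variant: malformed OL code is reported instead of crashing.
void execute_top_checked(std::vector<Object*>& stack) {
    if (stack.empty()) throw std::runtime_error("execute on empty stack");
    MC* mc = dynamic_cast<MC*>(stack.back());
    if (mc == nullptr) throw std::runtime_error("stacktop is not an MC");
    stack.pop_back();
    mc->code(stack);
}

// Unchecked variant for trusted, generated code: a mistake here is
// undefined behaviour, typically an access violation.
void execute_top_unchecked(std::vector<Object*>& stack) {
    MC* mc = static_cast<MC*>(stack.back());
    stack.pop_back();
    mc->code(stack);
}

// Example MC body: discards everything on the stack.
void drop_all(std::vector<Object*>& stack) { stack.clear(); }

// Helper for trying it out: true if a Number on the stacktop is rejected.
bool rejects_non_mc() {
    Number n(42);
    std::vector<Object*> stack = {&n};
    try {
        execute_top_checked(stack);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

Both variants share the same MC interface, so switching from one to the
other would not affect any MC.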


>It's interesting we are taking different tacks on this ... I'm just
>thinking ... are you proceeding from a point of view that the VM would
>exist before the compiler?

You got it! If you look at the ordering I gave in the original posting, you
can see that the compiler comes much later.

>I was thinking more about parallel
>development,
> and hence the AST is directly available to you.

Where do you want to store it?
How do you want to test & try?
Of course design is parallel, but implementation is another issue.

>> The parser table generator should be sufficiently versatile to accept
>> quite a range of syntaxes; it could be upgraded to handle syntax features
>> that do not fit into its original design, or if we decide to use a more
>> sophisticated syntax table format. It would produce output for object
>> loader, which is text and sufficiently readable at least for debugging
>> purposes.
>
>I think I was reading you wrong - parser table generators could be
>ported
>OK, I was thinking more of parser generators, a la generating code.

No, I meant "tables". If the notion "input to object loader" did confuse
you: of course object loader can load tables or trees or the like (since the
table entries are linked, they actually form a cyclic graph), which are
objects like anything else.

I wouldn't like a parser generator. It would involve yet another step in the
pipeline from input to executable code, yet another tool, a more complicated
build process. And what for? If you take a look at the code I submitted as a
parser skeleton, it should become clear that it ought to be pretty fast -
like VM - so there is no need for a hard-coded parser here, at least not
yet. And since it relies on components (MCs), my comparison concerning
"monoliths" applies here as well...
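
To show what "linked table entries forming a cyclic graph" can mean, here is
a tiny table-driven recognizer. It is not the parser skeleton mentioned
above; the Entry layout, the field names and the toy grammar (a list of
single digits separated by commas) are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// One syntax table entry. Entries link to each other, so a table for a
// repetition is a cycle.
struct Entry {
    bool (*matches)(char);  // does the current input character fit?
    const Entry* next;      // entry expected after a match
    const Entry* alt;       // alternative to try on mismatch, or nullptr
    bool may_end;           // may the input end before this entry?
};

bool is_digit(char c) { return c >= '0' && c <= '9'; }
bool is_comma(char c) { return c == ','; }

// list := digit { ',' digit }
extern const Entry comma_entry;
const Entry digit_entry = {is_digit, &comma_entry, nullptr, false};
const Entry comma_entry = {is_comma, &digit_entry, nullptr, true};

// Generic driver: it only follows links and knows nothing about the
// grammar that the table encodes.
bool parse(const Entry* start, const std::string& input) {
    const Entry* e = start;
    for (std::size_t pos = 0; pos < input.size(); ++pos) {
        while (e != nullptr && !e->matches(input[pos])) e = e->alt;
        if (e == nullptr) return false;  // no alternative fits
        e = e->next;
    }
    return e != nullptr && e->may_end;
}
```

Changing the accepted syntax means changing table data only; the driver
stays as it is, which is why no generated parser code is needed.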

>> At IBM they have an interesting approach to graphical programming (I
>>...

>You might be referring to visual dataflow here.  That is, joining boxes
>(operators) and have data flow along the edges between them.  I'm not an
>expert on dataflow programming, but I know there are also text-based
>dataflow languages (SISAL is one I think).  This might be another
>application from the translation hierarchy, but using a different
>underlying
> AST.

Visual Age, that was it (strange how (my human) memory works: as soon as I
read the keyword "visual", I remembered!). They had a setup similar to the
one you describe. I seem to recall that the boxes were objects and the edges
some relationships (message sends, calls, is-a, has-a); well, doesn't
matter.

>This brings me to talking about my proposed hierarchy.  The editor would
>be designed to support any language.

Agreed.

>This way you could say write your GUI
>using a GUI builder/language, generate a parser via a parser language,
>and
>write some hooks using Ultra.  At the end, they would all be translated
>to
>Ultra and compiled.  Each of these languages could have different views
>as
>well.

I'm not sure whether I understand this right. Do you propose that different
languages be used in parallel, to support the different tasks that are to be
done in one project?

I think I wouldn't do it that way. Although I'm a fan of using components
for functionality, that doesn't apply to the user interface. The user
interface must be as clearly laid out and uniform (yeah, monolithic) as
possible. Having to use different tools that have different interfaces and
that inevitably do not work together well makes me sick.

>Making the editor like this could spawn all manner of domain-specific
>languages which simplify programming in Ultra.  For example, one
>application I would like to write would be a role-playing game engine.
>You would have all sorts of "languages".  One might be a monster definition
>language, map definition language, etc.  They would all be linked in
>with
> prepared code and you have an instant game.
>
>Then there is no reason to force Ultra to be the language used.  It
>could be a framework for any language translation.

Nice idea. I didn't see it that way. Hey, it would be possible to write an
extendable, programmable video game for the kids, not just point-and-shoot.
That is what I always looked for but never found.

>> Yes. A powerful debugger also needs complete introspection facilities: It
>> must be able to access all living objects including the call stack.
>> That's one of the reasons why I so strongly advocate mixing the binary
>> with the code (and documentation) in a development environment.
>
>Well this is really a no-brainer, existing systems manage this.  What
>you
>mean by "mixing of code and data" could either mean logically or
>physically.

Both. While developing, you have it all in memory, as objects. You do not
start "manpages" or "winhelp" to get at your docu. It's already present as
outline items. It's searchable as well, and all the links are there
(provided by the structure of the source). It may be true that many existing
systems have complete integration of the programmer's reference, but
certainly not all of them.

Look at VC++, which does a nice job as far as documentation is concerned.
Even they have problems: you mark the keyword "IUnknown" and hit F1. Up pops
not the section you wanted to get to, but rather a dialog which prompts you
to select among a lot of alternatives, one for each class that has such a
member. That is because it doesn't know a thing about the *context*. You
even get Java stuff when you are actually using C++. If you try a keyword
that you defined yourself, you get no hits at all. If help were really
integrated, you would get to the right place instead. If there is any docu
for that item, of course.

Such a thing can be done using a plain text input format, but it needs a
sophisticated tool that is language specific. That tool needs to be run
explicitly, so the docs are never really up-to-date. If an outline format is
used, where the items are objects and have some type, it suddenly becomes
rather easy and language independent.
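
A sketch of how context-aware lookup could fall out of such an outline. The
Item structure, its fields and the resolve function are hypothetical; the
only point is that a name is resolved by walking outward from the item the
cursor is in, so the nearest declaration wins.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// One outline item: typed, named, with its documentation attached.
struct Item {
    std::string kind;  // e.g. "class" or "function"
    std::string name;
    std::string doc;
    Item* parent = nullptr;
    std::vector<std::unique_ptr<Item>> children;

    Item* add(std::string k, std::string n, std::string d) {
        auto child = std::make_unique<Item>();
        child->kind = std::move(k);
        child->name = std::move(n);
        child->doc = std::move(d);
        child->parent = this;
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Look the name up in the enclosing scopes of `context`, innermost first.
const Item* resolve(const Item* context, const std::string& name) {
    for (const Item* scope = context; scope != nullptr; scope = scope->parent) {
        for (const auto& child : scope->children) {
            if (child->name == name) return child.get();
        }
    }
    return nullptr;
}
```

Two classes may both have a member called Release; asking from inside one of
them yields that class's documentation, with no dialog listing every
candidate. Nothing in resolve depends on the programming language.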

>Oh yes, certainly you could link to code on the fly.  This is necessary
>for
> plugins like views and languages into the editor (the languages,
>translations and views are open ended in number, you shouldn't have to
>compile them in).  It's pretty much like dlls.  But further, you could
>sandbox the code like Java allows.

Linking on the fly is possible, of course. But I didn't mean *link*, I said
*compile*. That makes a difference. You can only link pre-packaged things,
but you compile *source code* (i.e. plain text), hence you have a lot
greater expressive power.

>Then you could generate it yourself.  A parser is not the way I'd do
>this
>however as it is view-specific.  I'd just create the AST directly,
>possibly
> using a helper API.  Then it's a simple matter of running the AST,
>which
>might involve compiling-running, interpreting, or semi-interpreting.

Ok. One would need "compile-on-the-fly" primarily for simple clauses,
without declarations, although even that could be done.

Where the border is to be drawn between the part of the syntax that is
handled by the editor and the part that is handled by the parser depends on
the circumstances.
Normally, I would say, top-down to the statement level is done by the
editor, and function bodies down to tokens are done by the parser.
Even for the same language, there may be different implementations where the
border is at a different level of language constructs. If one is using a
traditional text editor, the parser has to do it all, for example.

--

Regards,

Hans-Dieter Dreier
(Hans-Dieter.Dreier@materna.de)
