Steps

Hans-Dieter Dreier Ursula.Dreier@ruhr-uni-bochum.de
Thu, 04 Mar 1999 02:49:23 +0100



Matthew Tuck wrote:

> Hans-Dieter.Dreier@materna.de wrote:
>
> > Maybe someone should look out for GUI and CUI packages that are simplest
> > to use, compare them and then we will see. As to how intelligent an editor
> > you can write using a CUI, what does "intelligence" have to do with the UI
> > chosen?
>
> A lot of intelligence has to do with user feedback, and hence interface.

But certainly there can be a CUI equivalent to everything you need in a text
editor?

> Well we would be using the Ultra GUI capability if we wrote it in Ultra.

Looks like a lot of work.

> Is the reason you want to go to CUI because of the extra effort of GUI?

Yes.

> If so, would the necessity go away if we could initially do the Ultra
> GUI capabilities with an existing GUI toolkit?

Yes. If an equivalent GUI existed, I would prefer that.

> >> If you don't want a feature, you don't use it.
> > But you have it in your header files and libraries, and maybe have to link
> > it as well.
>
> Yeah it's going to be there, but should be able to be optimised out.

Sure, it can be optimised out. But that requires extra effort, and the *source*
is still cluttered with stuff that is not needed. IMO the problem is not so much
that the resulting program could be big or slow, but rather that the development
process is slowed down and the source is bloated. You can see this very well in
MS VC++: Because you need KLOCs of header files, they had to invent "precompiled
headers" simply to reduce compile time to acceptable values. This optimisation
(inevitably) adds complexity, because now one has to watch out for when the
precompiled header (which can be as big as 5 MB, BTW) needs to be recompiled.
This is no extreme example, but everyday practice.

> > GUI stuff IMO tends not to be as modular as one could wish.
>
> How do you mean not modular?  What examples are you familiar with?

Well, it's modular if you get only what you need, and do not have to take what
you don't need. Example: Microsoft Foundation Classes. It's all in a few DLLs
and you have the choice to link it either statically (then you can avoid a lot
of unnecessary stuff, but by no means all, and there is no more sharing if you
have multiple executables) or dynamically (then you get it all, but it may be
shared if other applications currently in memory use the same version).
Example: Netscape (can be viewed as a graphics library for HTML / Java): It is
pretty big and slow, and you have to take it all.

> > Sorry, I didn't express myself clearly. By "bootstrapping" I meant to
> > produce *executable* code - not generated C source code.
>
> They're the same thing.  If you have C (or Java) code you can make
> executable code on all platforms.  That's the quickest way for us to do
> that.  The point is we have a reasonably fast implementation to base
> bootstrapping on.  It's not a permanent solution.

Don't you underestimate the effort needed to produce C (or Java) source code?
IMO directly producing threaded code (as shown in my example) is so much less
effort than producing C (or Java) source. Maybe some time later, generating C
may be a reasonable way to produce new MCs directly from Ultra source, rather
than using the C compiler directly to produce new MCs (which is as portable as
your proposal).

Producing C is somewhat of a detour: You already have a representation at a
level that might be an intermediate level of a C compiler, but you convert it
back to a higher level, only to get it compiled once more by the C compiler. Of
course this cannot lead to good performance. And, BTW, are you so sure that
C(++) code really is so platform independent? My impression is that a lot of
people get into trouble if they use a back-end C compiler other than the (few)
ones specified to work. (Well, that's just my impression.)

> > IOW, the Ultra source code of the whole system goes in, and the executable
> > that is just compiling itself comes out, without C compiler or linker.
> > Maybe I'm biased  by my Centura experience - they do it exactly like this,
> > and I loved it.
>
> Sure, it is best if we can translate to native code, but this has to be
> done on many platforms.  We don't have the people or the knowledge to do
> this for a while.  Although I imagine someone here would know a few
> things about machine code for some processor and the OS interfaces.

I agree. That is the reason why I said that I don't see real bootstrapping for a
while.

> > Or you might compose it from componentware (COM objects, for example),
> > small building blocks that fit together nevertheless, because they were
> > written to common standards. They are written and refined in small steps
> > and you get a working environment almost instantly - even before the
> > language design has been done. The top language level ("glue") would be
> > some scripting language.
>
> I have to admit to not really understanding how COM and stuff works,
> would you be able to explain how it works?

COM objects are self-contained entities which reside in dynamic libraries and/or
executables. They communicate with the outside world through "interfaces"
(roughly equivalent to v-tables) which provide services. There is no direct
access to state from the outside. Interfaces are required to never change; if
you want to add features to an interface, you are required to define a new one
(you can keep the old one, of course). All interfaces inherit from a minimal
one, known as "IUnknown", which provides the ability to request a pointer to
another interface that the component supports ("exposes"), if the ID of that
interface is specified. Once you have the interface pointer, you can call
functions of that interface the same way you would call a virtual function. This
is very fast. You have to maintain a reference count (release the interface
pointer when it is no longer needed) so the library/executable can unload when
it is no longer needed. There can be multiple instances of an interface,
produced by a so-called class factory (IMO a misleading term). Then each
instance gets its own, distinct interface pointer, which is used by the COM
object to determine the correct instance (internally).
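
To make these rules a bit more tangible, here is a rough toy model in Python.
It is not the real COM API (COM is a binary standard, normally used from C or
C++); the names Component, InterfacePointer, class_factory, IEditV1 and IEditV2
are all made up for illustration. It only shows the ideas: state hidden behind
function tables, interfaces requested by ID, a new interface instead of a
changed one, and reference counting that decides when the component can go
away.

```python
# Toy model of the COM rules described above. Not the real COM API;
# every name in this sketch is hypothetical.

class Component:
    """A component: private state plus a table of exposed interfaces."""

    def __init__(self, on_unload):
        self._text = []              # state, never exposed directly
        self._refs = 0               # outstanding interface pointers
        self._on_unload = on_unload  # stands in for unloading the library
        self._tables = {
            "IUnknown": {},
            # IEditV1 never changes once it has been published ...
            "IEditV1": {"append": self._append, "text": self._get_text},
            # ... so a new feature goes into a new interface.
            "IEditV2": {"append": self._append, "text": self._get_text,
                        "clear": self._clear},
        }

    def _append(self, s):
        self._text.append(s)

    def _get_text(self):
        return "".join(self._text)

    def _clear(self):
        self._text.clear()

    def query(self, iid):
        if iid not in self._tables:
            raise LookupError(f"interface {iid!r} not exposed")
        self._refs += 1              # every pointer handed out is counted
        return InterfacePointer(self, self._tables[iid])

    def release(self):
        self._refs -= 1
        if self._refs == 0:
            self._on_unload()


class InterfacePointer:
    """What a client holds: a function table plus the IUnknown services."""

    def __init__(self, component, table):
        self._component = component
        self._table = table

    def query_interface(self, iid):
        return self._component.query(iid)

    def call(self, name, *args):
        return self._table[name](*args)

    def release(self):
        self._component.release()


def class_factory(on_unload):
    """Create a fresh instance and hand back its IUnknown pointer."""
    return Component(on_unload).query("IUnknown")


events = []
unknown = class_factory(lambda: events.append("unloaded"))
edit = unknown.query_interface("IEditV2")
edit.call("append", "hello")
unknown.release()   # component stays alive: `edit` still refers to it
edit.release()      # last reference gone, the component may unload
```

A client that only knows IEditV1 keeps working unchanged, because that table is
never altered; it simply cannot see "clear".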

> > The building blocks are VMs, as you might have guessed, and the "glue"
> > language is Ultra (or object loader, for a start). The MCs (and VM and MM)
> > are the only parts that are written in C++ (Some day in Ultra, hopefully).
> > The compiler will produce ASTs or threaded code right from the start, not
> > C code. This is much easier to do and the result can be executed
> > immediately, with no intervening steps.
>
> Yes, I agree.  Perhaps I should illustrate my own steps.
>
> 1a Write text to AST compiler.

OK

> 1b Write simple AST VM.

OK - will be finished before 1a because it is much simpler

> 2a Write AST to C generator.

I would delay that.

> 2b Include necessary library features for bootstrap.
> 3a Rewrite compiler in Ultra.

OK

> 3b Rewrite VM in Ultra.

IMO not necessary because the VM is so simple. It will automatically be done by
your step 4b.

> 3c Include necessary library features for GUI.

Should be 3 or even 2

> 4a Write editor.

Should be 3

> 4b? Write AST to native generators.
>
> > The quickest way: Perhaps in the short run (though I wouldn't take that
> > for granted), but we would be tied to to-c for an awfully long time after
> > that.
>
> I don't see it like that.  Portable code ensures we can have people
> writing stuff on all platforms.  Nuisances like having to have a C
> compiler will only encourage us to do better.

People will have to code in C (or let it be C++) anyway. I'd like to avoid (or
at least reduce) nuisances right from the start. Remember, this is one of my
main motivations - to avoid nuisances.

> > It also means that the language must be (almost) fully designed, and the
> > compiler be ready. This will take its time.
>
> I don't intend to stop language design anytime in the next few years.
> See my legacy shackling posts.

Sure, neither do I. But the bigger the code base is, the bigger the effort to
port it from one language version to another, or the bigger the temptation to
stay compatible by all means, even if that means using only the second-best
approach. IMO design decisions are easier if the code base remains as small as
possible for as long as possible. Having a language that is sufficiently fully
designed to allow writing a compiler in it requires a pretty large code base,
however. If we start from an assembler (like object loader), the code base is
significantly smaller.

It boils down to the question: How big should the steps be that need to be
completed before you get the next working version? Here I can only speak for
myself: I like these steps to be as small as possible, even if there are more of
them. It simply boosts motivation if you see that you are getting somewhere. If
I have to write a lot of stuff and test it all at once to get my next version
running, the chances that I lose motivation are much bigger.

> > >I think we're coming from different backgrounds here.  You're proposing
> > >a minimal editor, while I'm proposing a minimal interpreter.
> >
> > I don't agree here! If you take a look at the VM input code sketched in
> > the original posting, you will see that it is *designed* to be interpreted
> > by a *minimalistic* VM. I'd be surprised if it has more than 10 lines of
> > code, including comments.
>
> 10 lines?  Are you serious?

Perfectly serious. Look at the parser skeleton. You could almost take the same
code as the "parser". The VM (as I see it) consists of just a little loop in
which the next threaded code item is fetched, checked for NULL, and pushed (or
executed). Everything else is done by the MCs called. Half the code is used for
the "catch" clause that is used to break out of the execution loop (and handle
errors).
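
As a rough illustration of how small such a loop can be, here is a sketch in
Python. It is only my reading of the description above, under these
assumptions: threaded code is a list, an item that is callable is an MC and
gets executed, any other item is data and gets pushed, and NULL (None here)
ends execution. The names run, EndOfCode, mc_add and mc_mul are invented for
this example.

```python
class EndOfCode(Exception):
    """Raised to break out of the execution loop."""


def mc_add(stack):
    """Example MC: replace the two topmost values by their sum."""
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def mc_mul(stack):
    """Example MC: replace the two topmost values by their product."""
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)


def run(code):
    """Minimal VM loop: fetch, check for NULL, then push or execute."""
    stack = []
    pc = 0
    try:
        while True:
            item = code[pc]
            pc += 1
            if item is None:          # NULL: leave the loop
                raise EndOfCode
            if callable(item):
                item(stack)           # the MC does all the real work
            else:
                stack.append(item)    # plain data is pushed
    except EndOfCode:
        return stack
    except IndexError as err:
        # Stack underflow in an MC, or code without a NULL terminator.
        raise RuntimeError(f"VM error: {err}") from err


result = run([2, 3, mc_add, 4, mc_mul, None])   # (2 + 3) * 4 in postfix
```

The loop itself stays the same however many MCs are added; about half of it is
indeed the part that breaks out of the loop and reports errors.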

> > The editor is much, much bigger - and it is not intended to be minimal (at
> > least not as far as functionality is concerned). If I'd want a minimal
> > editor, I'd rather use a text editor off the shelf.
>
> I meant at first.

At first we should use a text editor off the shelf. Writing our own only makes
sense as soon as the object infrastructure is present (e.g. class layout, symbol
tables and so on).

> > (I really wish you could check out Centura -
> > that's my guiding star as far as editing is concerned).
>
> Is Centura commercial?  Available as a demo?  I'd certainly look at it
> if I could.

It's commercial and available as a demo. You have to register to get it.

> > But it need not necessarily be *the* new stuff - I got some more ideas,
> > and others too, certainly...
>
> I intend to start up an intelligent editing catalogue shortly.

Please let me know when you have a draft. At the moment I'm looking at Design By
Contract - we simply need something like that. In particular, I'm thinking about
which parts of DBC can be done *statically* (not much, I'm afraid), and how DBC
and intelligent editing can be combined to get additional benefits.

> >> Yes, it would be done with C code.  So the point is not to have to
> >> recompile code to change the initial value of the objects.  What sort of
> >> things would you want to change?  Are you just looking for a sort of INI
> >> file?  If so, that would be better than implement a whole persistence
> >> system to be thrown away.
> > Maybe you got the intention of a persistence system wrong: It is designed
> > for efficiency, to get the highest possible throughput and (secondarily)
> > small files. Therefore it stores its data in binary form, preferably as a
> > memory image.
> > Since it is not editable, it cannot be used to change a test setup.
>
> Maybe I misread your message, but I was referring to your "object
> assembly code".

I'm not quite sure what the original question was here. So I'll try to give some
general explanation and apologize in advance if I miss your point (if so, please
let me know): Since "object assembly code" is text, it can easily be changed.
That is one of its main purposes. Because everything is an object, you can
change every aspect by supplying a different "object assembly code" setup.
However, you can only use those MCs ("instructions" if you see it as an assembly
language) that have been written (in C). There will be a "main" object which
will be executed by VM after object loader has finished loading. That object may
of course invoke object loader to load some other objects, since object loader
itself is an MC. If we choose to make VM another MC, something similar to a
subroutine call (without parameters) is possible. The available set of MCs is
likely to change (grow) pretty fast in the beginning, since MCs may be small
things that are easily written.
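
A small sketch may help to show what "change the setup without recompiling"
means. The text format below is purely hypothetical (one object per line,
"name: token token ..."), as are the names load_objects, run, MCS and SETUP; the
real object loader syntax is not defined in this message. The point is only
that the setup is plain text, that tokens must name MCs that already exist, and
that the "main" object is executed once loading has finished.

```python
def mc_add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def mc_mul(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)


# The MCs that "have been written"; nothing else can be used from the text.
MCS = {"add": mc_add, "mul": mc_mul}


def load_objects(text, mcs):
    """Hypothetical object loader: turn setup text into named code objects."""
    objects = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                       # skip blank lines and comments
        name, _, body = line.partition(":")
        code = []
        for token in body.split():
            if token in mcs:
                code.append(mcs[token])    # reference to an existing MC
            elif token.lstrip("-").isdigit():
                code.append(int(token))    # integer literal
            else:
                raise ValueError(f"unknown MC {token!r}")
        objects[name.strip()] = code
    return objects


def run(code):
    """Very small VM: execute MCs, push everything else."""
    stack = []
    for item in code:
        if callable(item):
            item(stack)
        else:
            stack.append(item)
    return stack


SETUP = """
# test setup: edit this text, no C compilation or linking involved
main: 2 3 add 4 mul
"""

objects = load_objects(SETUP, MCS)
result = run(objects["main"])   # the "main" object runs after loading
```

Changing the behaviour means editing SETUP; adding a new instruction means
writing one more small MC and registering it.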

> > I see. Well, since classes should be objects, there is no distinction
> > between them: They are all stored in objects and can therefore all be
> > manipulated using the same devices. Handling different *contents* may
> > demand different tools, however.
>
> Ok, so you want your assembly to allow code entry.

Yes. To be more precise: Threaded code, not directly executable machine code or
C source. Maybe we will choose to enhance object loader's simple syntax as a
stepping stone towards Ultra, much like an assembler can be enhanced by a macro
facility - we will see.

> >> Why bother flattening it?  I personally think executing an AST would be an
> >> interesting way of doing things.
> > Pure execution speed. If it is laid out so that it can be executed
> > linearly, that is fastest. It is also very compact, since less pointers
> > are involved. But *how* the tree is *implemented* should make no
> > difference to a higher-level tool where speed is not of premier
> > importance, since tree classes will hide all those nasty little details
> > from their clients.
>
> That may be true, but my primary concern was really that an AST VM would
> be quicker to implement, since there would be no conversion from the AST
> coming out of the compiler.

The parser I sketched should already produce postfix code ready to be fed into
VM. Producing an AST may actually be slightly more complicated. Both are
equivalent, however, and easily translated into each other.
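
The equivalence can be shown in a few lines of Python. This is only a sketch
under my own assumptions: an AST node is a tuple (operator, child, ...), a leaf
is any non-tuple value, and an operator in postfix form is written as
(operator, arity) so the tree can be rebuilt. The function names to_postfix and
to_ast are invented here.

```python
def to_postfix(node):
    """Post-order walk of the tree: operands first, then the operator."""
    if not isinstance(node, tuple):
        return [node]                      # leaf: just the value
    op, *children = node
    out = []
    for child in children:
        out.extend(to_postfix(child))
    out.append((op, len(children)))        # operator carries its arity
    return out


def to_ast(postfix):
    """Rebuild the tree from postfix code using a stack."""
    stack = []
    for item in postfix:
        if isinstance(item, tuple):
            op, arity = item
            split = len(stack) - arity
            args = stack[split:]           # the operator's operands
            del stack[split:]
            stack.append((op, *args))
        else:
            stack.append(item)
    (root,) = stack                        # exactly one tree must remain
    return root


ast = ("*", ("+", 2, 3), 4)                # (2 + 3) * 4
postfix = to_postfix(ast)
roundtrip = to_ast(postfix)
```

Both directions are a single linear pass, which is why neither form locks us
into one kind of VM.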

> > You got it! If you look at the ordering I gave in the original posting,
> > you can see that the compiler comes much later.
>
> To tell you the truth I can't see the compiler in there at all, other
> than maybe "parser skeleton".

Well it is just a skeleton (or maybe just a couple of bones ;=). I admit that
one needs a lot of imagination to see the whole picture.

> > Where do you want to store it in ?
>
> Well it would just dump an AST wherever you told it to.
>
> > How do you want to test & try ?
>
> Well I figured implementing them in parallel, the testing of compiler
> could be done with an AST dumper and the VM checking, the VM testing
> could be done with the ASTs from the compiler.

Aren't there a lot of mutual dependencies? You need the compiler to test the VM,
and you need the VM to test the compiler. You have to test a lot of stuff in
parallel. I would prefer to be able to test them in little bits, function by
function; this would require much less communication between development teams.

> > Of course design is parallel, but implementation is another issue.
>
> I can't see any reason why my way wouldn't work.  The compiler is
> probably logically completed before the VM.  Earlier stages are usually
> completed first in a compiler.  Since you need it right for the input to
> test the next stage.
>
> > I wouldn't like a parser generator. It would involve yet another step in
> > the pipeline from input to executable code, yet another tool, a more
> > complicated build process. And what for? If you take a look at the code I
> > submitted as a parser skeleton, it should become clear that it ought to be
> > pretty fast - like VM - no need for a hard-coded parser here - at least
> > not yet. And since it relies on components (MCs), my comparison concerning
> > "monoliths" applies here as well...
>
> Perhaps you're misinterpreting me here.  Parser generators generate
> parsers, and hence are not a part of the compiler.  They're tools so you
> can change the parser easily.

The task the parser generator has in your approach is done by the syntax table
generator in mine. In my approach the MCs making up the bits of the parser need
not change often (hopefully) since they are general building blocks; it's the
table generator's input that changes, leading to other object loader code. No C
compilation or linking involved. The executable (MM, VM, MCs, ...) remains
unchanged and can handle several language versions.

You replace the whole parser (with a newly generated one), which you have to
C-compile and link into the runtime environment. IMO that is not as flexible -
the changes that are needed are more fundamental, leading to another executable
each time the language is changed.
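
To illustrate the difference, here is a sketch of a parser whose code never
changes while the language does. It is not the syntax table format I have in
mind for Ultra (that is not defined in this message); as a stand-in it uses a
plain operator precedence table and the well-known precedence climbing
technique. The names parse_to_postfix, TABLE_A and TABLE_B are hypothetical.

```python
def parse_to_postfix(tokens, table):
    """Generic parser: all language knowledge lives in `table`.

    `table` maps each binary operator to a positive precedence; every
    other token must be an integer literal. Output is postfix code.
    """
    pos = 0
    out = []

    def parse_expr(min_prec):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("operand expected at end of input")
        out.append(int(tokens[pos]))             # operand
        pos += 1
        while (pos < len(tokens) and tokens[pos] in table
               and table[tokens[pos]] >= min_prec):
            op = tokens[pos]
            pos += 1
            parse_expr(table[op] + 1)            # left-associative operators
            out.append(op)                       # operator follows operands

    parse_expr(0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return out


# Two "language versions": same parser code, different table.
TABLE_A = {"+": 1, "*": 2}      # usual arithmetic precedence
TABLE_B = {"+": 2, "*": 1}      # a version where + binds tighter

tokens = "2 + 3 * 4".split()
version_a = parse_to_postfix(tokens, TABLE_A)
version_b = parse_to_postfix(tokens, TABLE_B)
```

Switching from TABLE_A to TABLE_B changes how the same input is parsed without
touching, recompiling or relinking the parser itself.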

When I recall some of our discussions, I seem to recognise a pattern that
distinguishes the ways we look at problem solutions. When a problem gets
complex, programmers/designers tend to "layer" it to reduce complexity. For
instance, if the problem is too complex to use assembly language, we use a
high-level language. If there is too much to write using a high-level language,
there are two other approaches:
- Use a code generator (wizard)
- Use an (embedded) higher level language

There is a fundamental difference between these two approaches:
- The code generator acts like a compiler: It produces (lots of) lower-level
code
- The higher level language acts like an interpreter: It utilises a tailor-made
representation that can be interpreted.

Code generators use a longer "pipe" of subsequent representations and
translation stages (symbolic syntax, generated C code, linkable objects,
executable) as opposed to interpreters. Because code generators use more
different representations, especially low-level ones (like executable code),
their impact upon the low-level parts of the system is much bigger and the
representation levels involved cannot be as cleanly separated as they could be
using an interpreter.

Generally, I'm biased towards embedded languages. Using these always worked well
for me. In contrast, I always had problems using wizards when I was not able to
influence their meta-language (which of course would have turned them more
towards an embedded language). Maybe this can be an explanation why I don't like
parser generators and C code as intermediate representation.

You seem to favour the code generator approach; at least more than I do.

> > I'm not sure whether I understand right. Do you propose different
> > languages to be used in parallel, to support the different tasks that are
> > to be done in one project?
>
> Yes.  It's a tree structure.  Check out my "Translational Hierachy
> Framework" message.  You must have missed it.  =)

I looked it up. I didn't miss it, but somehow didn't make the connection to what
you were saying here, sorry.

> > Both. While developing, you got all in memory, as objects. You do not
> > start "manpages" or "winhelp" to get at your docu. It's already present as
> > outline items. It's searchable as well, and all the links are there
> > (provided by the structure of the source). It may be true that many
> > existing systems have complete integration of the programmer's reference,
> > but certainly not all of them.
>
> Any system that claims to have a decent debugger can show the source.

Apropos debugging: How do you show the correct location inside the Ultra source
if there is an intermediate C code level? You have C code on one side and
machine code on the other - how do you match locations in C code to machine
code?

If everything goes through the C compiler, the debugger must be capable of
handling machine code, which makes it platform dependent and forces it to deal
with things like software interrupts and the machine stack. If you use the
debugger that comes with the C compiler, then there is no integrated environment
any more because that debugger is not Ultra-aware.

In contrast, using threaded code, it's easy (as long as you don't attempt to
trace into an MC), since you are in control of the (few and simple)
representations and translation stages involved. Which, BTW, is an argument in
favour of small MCs performing just one function each.
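
Here is a sketch of why this is easy. I assume (this is not specified anywhere
yet) that the compiler emits a side table giving the source line for each slot
of threaded code; the debugger then only has to look at the current slot index.
The names run_traced, CODE and LINES are invented for this example.

```python
def mc_add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def mc_mul(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)


# Threaded code for a hypothetical two-line source:
#   line 1:  2 + 3
#   line 2:  (previous result) * 4
CODE = [2, 3, mc_add, 4, mc_mul]
LINES = [1, 1, 1, 2, 2]     # assumed side table: source line of each slot


def run_traced(code, lines, breakpoints):
    """Run threaded code and record a stop whenever a marked line is entered."""
    stack, stops = [], []
    previous_line = None
    for pc, item in enumerate(code):
        line = lines[pc]
        if line != previous_line and line in breakpoints:
            # A real debugger would pause here; the sketch records
            # the line, the slot index and a copy of the stack.
            stops.append((line, pc, list(stack)))
        previous_line = line
        if callable(item):
            item(stack)     # MCs are stepped over, never traced into
        else:
            stack.append(item)
    return stack, stops


final_stack, stops = run_traced(CODE, LINES, breakpoints={2})
```

No machine stack, no software interrupts and nothing platform dependent is
involved; the mapping from execution position to source line is a plain table
lookup.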

> > Look at VC++ which does a nice job as far as documentation is concerned.
> > Even they have problems: You mark the keyword "IUnknown" and hit F1. Up
> > pops not the section you wanted to get to, but rather a dialog which
> > prompts you to select among a lot of alternatives, for each class that has
> > such a member. Because it doesn't know a thing about the *context*. You
> > even get Java stuff when you are actually using C++. If you try a keyword
> > that you defined yourself, you get no hits at all. If help were really
> > integrated, you would get to the right place instead. If there is any docu
> > for that item, of course.
>
> Can you explain a bit more what you mean by help for an item here?  Are
> you referring to language help or program help?

I would call it "library help" because it is most often used to get information
on library items. It would show parameters, return values, constraints, comments
and so on. If language features are objects, the mechanism can provide help on
them too. Since the information is fetched directly from the program
representation, it does not need an intermediate tool; rather, it is a
specialized view into the source code and hence up to date all the time.
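
Python happens to offer something close to this, which makes for a handy
illustration: its standard inspect module reads the signature and the
documentation straight from the live object. The sketch below is only an
analogy, not a proposal for Ultra; help_for and clamp are made-up names, and the
"context" is simply the namespace in which the name is looked up.

```python
import inspect


def clamp(value: int, low: int, high: int) -> int:
    """Return value limited to the range low..high (requires low <= high)."""
    return max(low, min(value, high))


def help_for(name, namespace):
    """Build help text for `name` as it is defined in the given context."""
    obj = namespace[name]                  # KeyError if unknown in this context
    signature = inspect.signature(obj)     # parameters and return annotation
    doc = inspect.getdoc(obj) or "(no documentation)"
    return f"{name}{signature}\n{doc}"


text = help_for("clamp", {"clamp": clamp})
print(text)
```

Because the text is produced from the definition itself, a self-defined item
gets help just like a library item, and the help cannot go stale.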

Regards

Hans-Dieter Dreier