Steps

Matthew Tuck matty@box.net.au
Fri, 26 Feb 1999 18:53:26 +1030


Hans-Dieter Dreier wrote:

> I think we should start the editor as simple as possible, using a
> character based approach. I know it's old-fashioned, but it's the simplest
> thing we can do. I imagine that switching to a GUI should not be _that_
> difficult if we use appropriate display classes. See note on intelligent
> editor.

Regardless of this, we will still have to do cross-platform widgets to
get a graphical intelligent editor at some point in the future.

Is a CUI a useful stepping stone though?  What would the extra effort in
writing and then throwing out the CUI get us?  It does not unblock other
development effort the way a throwaway non-intelligent version, later
replaced by an intelligent one, would.  There are already plenty of text
editors out there we can use to program.  You can't do a very
intelligent editor with a CUI.

Then again, maybe I'm visualising wrong.  After all, a lot of the early
effort might be infrastructure like defining languages, translations,
framework, generators etc.  If you take this view the UI effort would be
small, in which case, going straight to GUI is not such a huge effort
anyway.

I know if we apply my argument above universally all those little bits
of extra effort would add up, but I see the editor as a really important
thing - important enough to program the way we want it first time.  If
you were designing a graphical browser, would you create a text-based
browser first?  I wouldn't, maybe others feel differently.

I'm also worried that the interfaces will turn out to be workable yet
non-optimal in a GUI world.  Now I've said I'm not worried about
backward compatibility, but translation is still, as far as I know, an
untested alternative, so I'd like to avoid it as much as possible unless
we need it.

> The more third party software we can use, the better. Hopefully it isn't
> too BIG and complicated.

If you don't want a feature, you don't use it.

> If by "bootstrapping" you mean using the system to compile itself, I think
> that will come in a later stage. If you mean the startup process, I'll
> come to that later in this posting.

I meant the former.  I think it should happen before the editor.  If we
integrate cross-platform GUI capability into the language like Java,
then everything we need to go intelligent would be there.

> The runtime's structure should consist of a small "core": MM and VM, and
> a generator for the very minimum of object infrastructure for VM to start
> up and call object loader, as well as "plug-ins" that are formed from C++
> routines that form the library and can be reached from the object level
> ...

I would really like to be able to bootstrap before writing the
intelligent editor.  This way we can write it straight in Ultra.  I
think to-c would be the quickest way to do this.

What I was thinking of for a first-generation interpreter was something
that interpreted an AST like this:  Look at the root statement, then
interpret that, using the children of the root and so on.  Not
necessarily doing any translation.  Memory allocation would be a simple
matter of using C's allocator, with each block containing a header and
the data.
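
A minimal sketch of the tree-walking idea in C follows.  The node
layout, the names (Node, eval, eval_example) and the choice of operators
are assumptions made for illustration; the thread does not define an AST
format.  Each node is evaluated by evaluating its children first, with
no translation step.

```c
#include <stddef.h>

/* Hypothetical node kinds; not taken from any Ultra design document. */
typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    long value;                 /* used only by NODE_NUM */
    const struct Node *left;    /* used by NODE_ADD and NODE_MUL */
    const struct Node *right;
} Node;

/* Interpret a node by interpreting its children, recursively. */
long eval(const Node *n)
{
    switch (n->kind) {
    case NODE_NUM: return n->value;
    case NODE_ADD: return eval(n->left) + eval(n->right);
    case NODE_MUL: return eval(n->left) * eval(n->right);
    }
    return 0;   /* not reached for well-formed trees */
}

/* Builds the tree for 1 * 2 + 3 * 4 by hand and evaluates it. */
long eval_example(void)
{
    const Node one   = { NODE_NUM, 1, NULL, NULL };
    const Node two   = { NODE_NUM, 2, NULL, NULL };
    const Node three = { NODE_NUM, 3, NULL, NULL };
    const Node four  = { NODE_NUM, 4, NULL, NULL };
    const Node left  = { NODE_MUL, 0, &one, &two };
    const Node right = { NODE_MUL, 0, &three, &four };
    const Node root  = { NODE_ADD, 0, &left, &right };
    return eval(&root);
}
```

Calling eval_example() walks the tree from the root down; a real
interpreter would have statement nodes as well, but the recursion would
have the same shape.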

I think we're coming from different backgrounds here.  You're proposing
a minimal editor, while I'm proposing a minimal interpreter.  It's not
that I think the interpreter is not important, it's just that I think
the editor is where the new stuff will happen, so I want to get there as
soon as possible.

> I would like to allocate large chunks from whatever is the next allocator
> beyond our control - that may be malloc for a start, or the OS later.
> Reasons are the following: 1. I don't know anything about their
> performance, so I'd like to limit their impact by using them as seldom as
> possible - but that's just a gut feeling.

Well you might be right - I've already stated I'd like plug-in
allocators/collectors for the interpreter - but is performance a good
reason to do this in the first release?

> 2. Since we will have our own GC, we need to be able to traverse all
> objects, hence they need to be linked somehow. The next allocator beyond
> our control does the same thing, of course.

First-generation might not even need GC.  First-and-a-half probably
would though.  =)

> Duplicating that would be a waste of memory
> and time; using their structures would be not portable. Using someone
> else's GC and not having to roll our own would be OK with me, _but_ since
> we have the chance to create a GC-friendly environment, an off-the-shelf
> GC would probably be overkill in terms of code size, complicatedness and
> time consumption.

I would not bother working too much on the memory system until we're
ready to bootstrap.  Once we have an interpreter, people can start
writing a compiler in Ultra, followed by a bootstrap.

> I read some alarming things about GCs that are designed to be
> used with C(++) - they have to *guess* <shudder> what might be a pointer,
> for example.

Not really alarming.  GCs that have been designed to work with
GC-hostile memory systems (read: most C(++) compilers) have to make
conservative assumptions about what is a pointer to avoid early
deallocation.  The disadvantage is that other data could look like a
pointer, so memory that is safe to deallocate is kept alive, but this
is supposedly only a small problem.  For a first implementation of the
VM in C, do you really care?  A conservative GC would make our
programming a lot easier.  The problem would disappear after
bootstrapping.
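
To make the "guessing" concrete, here is a rough sketch in C of the
test a conservative collector might apply to each word it scans.  The
HeapRange type and both function names are hypothetical, and a real
collector would check against its actual block table rather than a
single address range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the collected heap as one address range. */
typedef struct {
    uintptr_t start;   /* address of the first heap byte */
    uintptr_t end;     /* one past the last heap byte */
} HeapRange;

/* Conservative test: any word that falls inside the heap at a
 * pointer-sized offset is treated as if it were a pointer. */
int might_be_pointer(const HeapRange *heap, uintptr_t word)
{
    return word >= heap->start
        && word < heap->end
        && (word - heap->start) % sizeof(void *) == 0;
}

/* Count the words in a root area (for example a slice of the C stack)
 * that would keep heap memory alive. */
size_t count_candidates(const HeapRange *heap,
                        const uintptr_t *roots, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (might_be_pointer(heap, roots[i]))
            hits++;
    return hits;
}
```

An integer that happens to equal a valid heap address passes the test
just like a real pointer does, which is exactly the false retention
described above.  Nothing is ever freed too early, only sometimes too
late.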

> The memory layout I suggested certainly is debatable. I gave considerable
> thought to your idea of separating memory allocation blocks and objects,
> allowing to pack several of the latter into one of the former. Finally I
> came upon a solution that I would like to present to discussion (in a
> separate posting, to follow soon).

No, I didn't suggest this, although I'd certainly be interested in
looking at your proposal.  What I suggested was the optimiser converting
several language objects into one VM object.  As far as the VM is
concerned there is one object per memory allocation block.

By "memory allocation block" I'm assuming you mean a block of memory
suballocated from a large block allocated from C, or under my proposal,
something actually allocated from C.

> I hope we will be able to minimise using "C's pointer tricks" by
> employing a custom-made operator new and form our objects from C++
> structures (without v table). Pointer tricks will only be used by MM and
> GC and will be confined as much as possible.

That's basically what I meant.  If you allocated a large chunk as a
(void *) you'd need to do pointer arithmetic to return an object pointer
in there.
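
As an illustration of that pointer arithmetic, the sketch below carves
objects out of one large malloc'd chunk, each preceded by a small
header.  Everything here (Chunk, ObjHeader, chunk_alloc and so on) is a
made-up name for the purpose of the example, not a proposed design, and
there is no freeing of individual objects and no overflow checking.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-object header stored just before the payload. */
typedef struct { size_t size; } ObjHeader;

/* Used only to pick a safe alignment for headers and payloads. */
typedef union { long long ll; long double ld; void *p; } MaxAlign;

typedef struct {
    unsigned char *base;   /* start of the chunk obtained from malloc */
    size_t capacity;       /* chunk size in bytes */
    size_t used;           /* bytes handed out so far */
} Chunk;

static size_t align_up(size_t n)
{
    size_t a = sizeof(MaxAlign);
    return (n + a - 1) / a * a;
}

/* Returns nonzero on success. */
int chunk_init(Chunk *c, size_t capacity)
{
    c->base = malloc(capacity);
    c->capacity = c->base ? capacity : 0;
    c->used = 0;
    return c->base != NULL;
}

/* Bump allocation: write a header, then return the address after it. */
void *chunk_alloc(Chunk *c, size_t size)
{
    size_t header = align_up(sizeof(ObjHeader));
    size_t total = header + align_up(size);
    if (total > c->capacity - c->used)
        return NULL;                          /* chunk exhausted */
    ObjHeader *h = (ObjHeader *)(c->base + c->used);
    h->size = size;
    c->used += total;
    return (unsigned char *)h + header;       /* step past the header */
}

/* Step back from an object pointer to its header. */
size_t obj_size(const void *obj)
{
    const unsigned char *p = obj;
    return ((const ObjHeader *)(p - align_up(sizeof(ObjHeader))))->size;
}

void chunk_free(Chunk *c)
{
    free(c->base);
    c->base = NULL;
    c->capacity = c->used = 0;
}
```

Because every object sits at a known offset from the previous one, a
collector could traverse the whole chunk by starting at base and
hopping from header to header.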

> Not exactly familiar, but I read a bit about them. I think I can guess
> how a bitmap would work and I know how the buddy system works. I did hole
> list. The tail of each object could be unused space, so there were not too
> many "hole objects". Access to objects was exclusively through an object
> table (indirect), which allowed easy moving of objects (no fixups
> required), but I wouldn't design it exactly the same way again because of
> that indirection (performance hit) and the need to closely monitor the use
> of actual pointers inside MCs, which discourages heavy use of movements.

Bitmap is the standard allocation on disk file systems - I have to admit
ignorance of which system is more common in memory systems.
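
For anyone who hasn't seen one, a bitmap allocator can be sketched in a
few lines of C.  One bit per fixed-size block records whether it is in
use, and allocation is a first-fit search for a run of clear bits.  The
block count and the names are arbitrary choices for this example.

```c
#include <stdint.h>

#define BLOCK_COUNT 64   /* hypothetical number of fixed-size blocks */

typedef struct {
    uint64_t used;       /* bit i set means block i is allocated */
} Bitmap;

/* First fit: find n consecutive free blocks, mark them used and
 * return the index of the first one, or -1 if no run is long enough. */
int bitmap_alloc(Bitmap *bm, int n)
{
    if (n <= 0 || n > BLOCK_COUNT)
        return -1;
    int run = 0;
    for (int i = 0; i < BLOCK_COUNT; i++) {
        if (bm->used & ((uint64_t)1 << i)) {
            run = 0;                         /* run broken by a used block */
        } else if (++run == n) {
            int first = i - n + 1;
            for (int j = first; j <= i; j++)
                bm->used |= (uint64_t)1 << j;
            return first;
        }
    }
    return -1;
}

/* Release n blocks starting at first; the caller must pass values
 * previously obtained from bitmap_alloc. */
void bitmap_free(Bitmap *bm, int first, int n)
{
    for (int j = first; j < first + n; j++)
        bm->used &= ~((uint64_t)1 << j);
}
```

The block index would be multiplied by the block size to get an offset
into the managed memory.  Unlike a hole list, the bookkeeping lives
outside the blocks themselves.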

> Text is easiest to start with. It can be prepared with the simplest of
> tools and is a nice exchange format. Loading a (binary) memory image is
> another option (see "persistent storage"). See, this is just sort of an
> assembler.
> We'll abandon that thing some time later, but we got to have a test
> driver and a loader at first - we simply can't wait until we have a full
> blown language.

OK, if we need this, it can be text.

> Sure, why not? It is meant as a means to enter "binary" data (i.e.
> non-references) without too much "semantics" attached.

That's what I meant really - references (including NULL), binary data,
and arrays and records of these.

> What would that object creation code look like? Could you do that without
> engaging the C++ compiler? I would not like to have to compile and link
> every time I want to change something in the (test) setup. I also would
> appreciate if the distribution could do without a C compiler.

Yes, it would be done with C code.  So the point is not to have to
recompile code to change the initial value of the objects.  What sort of
things would you want to change?  Are you just looking for a sort of INI
file?  If so, that would be better than implementing a whole persistence
system to be thrown away.

> What do you mean by "state"?

I mean the state of the object, i.e. any data attached to it, as opposed
to "static" class-wide data and code.

> Firstly, if we want to execute code directly (I would prefer that)
> instead of having to use the C compiler, there has to be a machine. The
> simplest one IMO is a stack machine. Code for a stack machine (i.e. postfix
> notation) actually *is* a (flattened) tree or at least most easily
> convertible (see below).

Why bother flattening it?  I personally think executing an AST would be an
interesting way of doing things.

> Secondly, this is not meant to be the last word. Remember, my general
> principle for a start is "Keep it as simple as possible without
> sacrificing too much flexibility", to get us off the ground as quickly as
> possible.
> Since initially there will only be toy "applications", code compactness is
> last priority (IMHO). BTW, execution will be quite fast however.

But the AST is closer to the language, as it is only a slightly modified
parse tree.  In fact we could use parse trees.

>> What do you mean by execute object here?
> I'll give an example: To calculate 1 * 2 + 3 * 4:

How is this executing one object?

>...
>     NumberThree      // Push third operand
>     NumberFour
>     Multiply         // Push multiply op
>     NULL             // Execute multiply op
>     Add              // Push Add op
>     NULL             // Execute

Interesting way of doing a stack machine, actually pushing the
operators.  I guess this would allow you to not know what operation
you're pushing.
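
Here is how I read that scheme, as a rough C sketch.  Every code cell is
a reference: a non-NULL cell is pushed whatever it refers to, and a NULL
cell pops the top object and executes it.  The Object and Machine
layouts and all the names are my own assumptions, and so are the first
four code cells, which were elided in the quoted listing.  The assert is
an addition too, since the quoted design does no checking.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Machine Machine;
typedef void (*OpFn)(Machine *m);

/* Hypothetical object: numbers carry a value, operators carry code. */
typedef struct { long value; OpFn op; } Object;

struct Machine {
    const Object *stack[32];
    int top;
    Object temps[32];    /* storage for intermediate results */
    int ntemps;
};

static void push(Machine *m, const Object *o) { m->stack[m->top++] = o; }
static const Object *pop(Machine *m) { return m->stack[--m->top]; }

static void push_number(Machine *m, long v)
{
    Object *t = &m->temps[m->ntemps++];
    t->value = v;
    t->op = NULL;
    push(m, t);
}

static void op_add(Machine *m)
{
    long b = pop(m)->value;
    long a = pop(m)->value;
    push_number(m, a + b);
}

static void op_mul(Machine *m)
{
    long b = pop(m)->value;
    long a = pop(m)->value;
    push_number(m, a * b);
}

/* The length is explicit because NULL is an instruction, not a terminator. */
long run(Machine *m, const Object *const *code, size_t n)
{
    m->top = 0;
    m->ntemps = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i] != NULL) {
            push(m, code[i]);         /* operand or operator alike */
        } else {
            const Object *o = pop(m);
            assert(o->op != NULL);    /* check not in the quoted design */
            o->op(m);
        }
    }
    return pop(m)->value;
}

/* 1 * 2 + 3 * 4 as ten references. */
long run_example(void)
{
    static const Object one = { 1, NULL }, two = { 2, NULL };
    static const Object three = { 3, NULL }, four = { 4, NULL };
    static const Object multiply = { 0, op_mul }, add = { 0, op_add };
    const Object *const code[] = {
        &one, &two, &multiply, NULL,      /* assumed; elided in the post */
        &three, &four, &multiply, NULL,
        &add, NULL,
    };
    Machine m;
    return run(&m, code, sizeof code / sizeof code[0]);
}
```

Note that the loop in run() never inspects what it pushes, which is the
property mentioned above: the pusher need not know which operation it
is handling until the NULL arrives.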

> Each line stands for a fullblown reference, each 4 bytes. 40 Bytes total.
> You see: This is an awful waste of memory, but IMO it doesn't matter for
> now. Execution is extremely simple, so it is very fast. There is NO type
> checking ...

I would say early development is exactly the time to be checking this
sort of thing.

It's interesting we are taking different tacks on this ... I'm just
thinking ... are you proceeding from a point of view that the VM would
exist before the compiler?  I was thinking more about parallel
development, and hence the AST is directly available to you.

> The parser table generator should be sufficiently versatile to accept
> quite a range of syntaxes; it could be upgraded to handle syntax features
> that do not fit into its original design, or if we decide to use a more
> sophisticated syntax table format. It would produce output for object
> loader, which is text and sufficiently readable at least for debugging
> purposes.

I think I was reading you wrong - parser table generators could be
ported OK; I was thinking more of parser generators, a la generating
code.

> At IBM they have an interesting approach to graphical programming (I
> don't remember the name right now, but you certainly have heard of). But
> when I looked at that confusion of edges and objects, messed up with
> interspersed text, I decided to refrain from real graphics (where you
> place your objects anywhere in a plane). IMO, a tree has exactly the right
> balance of navigational ease and the capacity to hold *large* amounts of
> data while providing a nice structuring and clean layout.

You might be referring to visual dataflow here.  That is, joining boxes
(operators) and having data flow along the edges between them.  I'm not
an expert on dataflow programming, but I know there are also text-based
dataflow languages (SISAL is one I think).  This might be another
application of the translation hierarchy, but using a different
underlying AST.

This brings me to talking about my proposed hierarchy.  The editor would
be designed to support any language.  This way you could say write your
GUI using a GUI builder/language, generate a parser via a parser
language, and write some hooks using Ultra.  At the end, they would all
be translated to Ultra and compiled.  Each of these languages could have
different views as well.

Making the editor like this could spawn all manner of domain-specific
languages which simplify programming in Ultra.  For example, one
application I would like to write would be a role-playing game engine.
You would have all sorts of "languages".  One might be a monster
definition language, another a map definition language, etc.  They would
all be linked in with prepared code and you have an instant game.

Then there is no reason to require that Ultra be used.  It could be a
framework for any language translation.

> Yes. A powerful debugger also needs complete introspection facilities: It
> must be able to access all living objects including the call stack.
> That's one of the reasons why I so strongly advocate mixing the binary
> with the code (and documentation) in a development environment.

Well this is really a no-brainer, existing systems manage this.  What
you mean by "mixing of code and data" could be meant either logically or
physically.

> For a release, one would
> strip off all the source stuff and supply a leaner release runtime
> environment, *but* if need be one could still retain some parser to
> generate code "on the fly". I found this extremely valuable to customize
> program behaviour in places where I was sure that users would have demands
> I would not think of in the first place, e.g. print layout. I could simply
> provide some code, to be stored in an INI file, which would extract all
> the data they wanted and even issue some SQL statement if necessary.

Oh yes, certainly you could link to code on the fly.  This is necessary
for plugins like views and languages into the editor (the languages,
translations and views are open ended in number, you shouldn't have to
compile them in).  It's pretty much like DLLs.  But further, you could
sandbox the code like Java allows.

Then you could generate it yourself.  A parser is not the way I'd do
this however as it is view-specific.  I'd just create the AST directly,
possibly using a helper API.  Then it's a simple matter of running the
AST, which might involve compiling-running, interpreting, or
semi-interpreting.

-- 
     Matthew Tuck - Software Developer & All-Round Nice Guy
             mailto:matty@box.net.au (ICQ #8125618)
       Check out the Ultra programming language project!
              http://www.box.net.au/~matty/ultra/