Steps

Hans-Dieter Dreier Ursula.Dreier@ruhr-uni-bochum.de
Tue, 23 Feb 1999 02:15:40 +0100



Matthew Tuck schrieb:

> Hans-Dieter Dreier wrote:
>
> > 1 Write a memory manager (shorthand: MM).
> > ...
>
> A couple of things that you don't mention - we need to write a
> cross-platform widget capability before the intelligent editor.  We may

I think we should start the editor as simple as possible, using a character-based
approach. I know it's old-fashioned, but it's the simplest thing we can
do. I imagine that switching to a GUI should not be _that_ difficult if we
use appropriate display classes. See the note on the intelligent editor below.

> be able to use the Mozilla XPToolkit - I'm not sure about what
> license-compatibility issues there would be.

The more third party software we can use, the better. Hopefully it isn't too
BIG and complicated.

> Also, bootstrapping is not mentioned.

If by "bootstrapping" you mean using the system to compile itself, I think
that will come at a later stage. If you mean the startup process, I'll come
to that later in this posting.

The runtime's structure should consist of a small "core": MM and VM, plus a
generator for the very minimum of object infrastructure VM needs to start up
and call the object loader, as well as "plug-ins" formed from C++
routines that make up the library and can be reached from the object level in
the form of references to "machine call" objects (MCs). As soon as we have a
code generator that can actually produce directly executable machine code, we
are really able to bootstrap - but methinks that will not be
anytime soon. Till then we will have to form programs by combining MCs (which
is quite OK for me) and will need the C++ compiler whenever we intend to change
things at the MC level or add new MCs. Or we incorporate the Java VM (I'm
rather reluctant to consider this right now; it looks like a lot of work and is
likely to impose design restrictions OTOH).

> >  // the handle of the object at the next lower memory location
> >  MMBlockHandle mpPrev;
>
> I'm not sure why you'd want to do this here - since we're talking about
> a VM over an existing memory space, it's really just a collection of
> blocks.  The way we want to access that memory may not become apparent
> until later - and the access specification to use would then become
> apparent.
>
> Or am I misinterpreting what you're referring to here, and you want to
> allocate large chunks of memory at one time and use C's pointer tricks
> to allocate objects in there?

I would like to allocate large chunks from whatever is the next allocator
beyond our control - that may be malloc for a start, or the OS later. Reasons
are the following:

1. I don't know anything about their performance, so I'd like to limit their
impact by calling them as seldom as possible - but that's just a gut feeling.

2. Since we will have our own GC, we need to be able to traverse all objects,
hence they need to be linked somehow. The next allocator beyond our control
does the same thing, of course. Duplicating that would be a waste of memory
and time; using its structures would not be portable. Using someone else's
GC and not having to roll our own would be OK with me, _but_ since we have
the chance to create a GC-friendly environment, an off-the-shelf GC would
probably be overkill in terms of code size, complexity and time
consumption. I read some alarming things about GCs that are designed to be
used with C(++) - they have to *guess* <shudder> what might be a pointer, for
example.

The memory layout I suggested is certainly debatable. I gave considerable
thought to your idea of separating memory allocation blocks and objects,
allowing several of the latter to be packed into one of the former. Finally I
arrived at a solution that I would like to put up for discussion (in a
separate posting, to follow soon).

I hope we will be able to minimise the use of "C's pointer tricks" by
employing a custom-made operator new and forming our objects from C++
structures (without vtable). Pointer tricks will only be used by MM and GC
and will be confined as much as possible.

> > BTW, I started to write such a thing several years ago (in C, not C++),
> > but at some point I got into trouble with the implementation
> > (automatic type conversions if I recall right).
> > It had MM, VM, object loader and a simple parser.
> > My source backup has been damaged, so I don't know whether
> > it would be worth trying to revive this thing and port it to C++.
>
> Are you familiar with the different memory allocation systems like
> bit-map, hole list and buddy?   If so, which would you say you did?

Not exactly familiar, but I have read a bit about them. I think I can guess
how a bitmap would work, and I know how the buddy system works. I did a hole
list. The tail of each object could be unused space, so there were not too
many "hole objects". Access to objects was exclusively through an object
table (indirect), which allowed objects to be moved easily (no fixups
required), but I wouldn't design it exactly the same way again because of
that indirection (performance hit) and the need to closely monitor the use of
actual pointers inside MCs, which discourages heavy use of movement.

> > Object loader:
> > It should accept a text format and produce memory objects.
>
> Why text?

Text is easiest to start with. It can be prepared with the simplest of tools
and is a nice exchange format. Loading a (binary) memory image is another
option (see "persistent storage"). See, this is just a sort of assembler.
We'll abandon that thing some time later, but we have got to have a test
driver and a loader at first - we simply can't wait until we have a full-blown
language.

> > It should have a simple, single-namespace symbol table
> > to provide symbolic references to other objects.
> > It should be able to process these data types:
> > - References (to other objects that are already known)
> > - Integers
> > - Strings
> > - NULL (to provide NULL pointers)
> > - Maybe some other constants that are used to represent special object
> > handles
>
> You should probably include characters and arrays instead of strings.
> In fact I don't think the object loader or memory manager would need to
> be concerned with integers and characters at all.

Sure, why not? It is meant as a means to enter "binary" data (i.e.
non-references) without too much "semantics" attached.

> > Example how a piece of "Ultra assembler" might look like:
> > ...
>
> What you propose seems to be a sort of persistent dump description
> language.  I'm not too sure how this is used at startup.  Why exactly do
> you want to load a set of objects.  I'm assuming we're not talking about
> loading any code here, e.g. class objects, but just state.  So why would
> persistence be better than object creation code?

What would that object creation code look like? Could you do that without
engaging the C++ compiler? I would not like to have to compile and link every
time I want to change something in the (test) setup. I would also appreciate
it if the distribution could do without a C compiler.

The object loader could load class objects as well as VM executable code (no
big deal, since it consists almost entirely of references). It would even be
used to set up stacks and the parser's symbol table. In a word: *all* objects
that are under MM's control, maybe with a few exceptions, such as the stuff
VM needs to run (i.e. its stack) and the startup code (which executes the
object loader MC).

What do you mean by "state"?

> > VM:
> > This should be a simple stack machine with just a few "instructions".
>
> Why stack machine?  I believe we've talked before about distributing
> programs as ASTs - they seem to have benefits - therefore it would
> involve a runtime conversion.  For a native implementation, this would
> be OK, but a conversion and then an intepretation would have to be
> inefficient.

Firstly, if we want to execute code directly (I would prefer that) instead of
having to use the C compiler, there has to be a machine. The simplest one IMO
is a stack machine. Code for a stack machine (i.e. postfix notation) actually
*is* a (flattened) tree, or is at least most easily convertible to one (see
below).

Secondly, this is not meant to be the last word. Remember, my general
principle for a start is "keep it as simple as possible without sacrificing
too much flexibility", to get us off the ground as quickly as possible. Since
initially there will only be toy "applications", code compactness is the
lowest priority (IMHO). BTW, execution will be quite fast nevertheless.

> > NULL: Execute the object on top of the stack
>
> What do you mean by execute object here?
>

I'll give an example: To calculate 1 * 2 + 3 * 4:

    NumberOne        // Push first operand (reference to a number object containing a 1)
    NumberTwo        // Push second operand
    Multiply         // Push reference to the "Multiply" MC
    NULL             // Execute whatever is on top of the stack (here: Multiply)
    NumberThree      // Push third operand
    NumberFour
    Multiply         // Push multiply op
    NULL             // Execute multiply op
    Add              // Push Add op
    NULL             // Execute

Each line stands for a full-blown reference, 4 bytes each - 40 bytes total.
You see: this is an awful waste of memory, but IMO it doesn't matter for now.
Execution is extremely simple, so it is very fast. There is NO type checking
in VM or MCs; this is supposed to have already been done by the compiler (for
now, it must be done by hand). (The debug version might at least check that
what it tries to execute *really* is an MC, to avoid the exception that would
result from trying to execute a non-MC.) Remember, this is a VM executing
assembly language. IIRC this type of code is called "threaded code".

One might omit the NULLs and check the object's class instead. Then there
would have to be a "do-not-execute" prefix code, however; otherwise an MC
could never be used for anything other than executing it. At least the code
generator needs to handle MCs as data when assembling code.

To get the tree, start at the beginning. As long as there is no NULL, keep
adding leaves. When you find a NULL, build a node from these leaves; the last
leaf added is the operator. Use that node as the first leaf of the next node
to be built.

Or simply scan it backwards. Start a new node each time you encounter a NULL,
otherwise add a leaf to the current node. The entry after a NULL is the
operator, followed by its operands.

In either approach, the operator must know how many operands it has. If the
number is not known, there either has to be a DELIMITER operand pushed onto
the stack to mark the start of a stack frame, or the operation must be
decomposed into a list of concatenated operations, each with a known operand
count.

BTW, code resides in a separate object for each routine, of course. The
Subroutine class will have additional instance items (e.g. a reference to the
source code, a description of the parameter list and so on).

As soon as we introduce the concept of subroutines into the library, there
will be an MC which handles this:

    Parameter1
    Parameter2
    ReferenceToRoutine
    MCToHandleASubroutineCall
    NULL

The task of MCToHandleASubroutineCall would be to identify the parameter list
(maybe using the DELIMITER, see above, or looking at the routine's
description), and to save VM's state and set up its new state so that it can
execute the subroutine. VM's state would be saved in an object on a call
stack (itself an object), under MM's control, because it is subject to GC.

You see, I'm following an evolutionary approach here, starting with almost
nothing and proceeding to ever higher order constructs.

All this may need discussion and refinement, of course. Please comment!

> > Refine parser skeleton, write Runtime Library:
> > This certainly is the main effort, and it never ends.
>
> Well in the traditional sense, the parser should end with syntax
> independence.  =)

Of course, but since it might be a nice exercise to compose the compiler from
MCs as soon as possible (you might as well call that "bootstrapping" in some
sense), first there have to be MCs to compose it from. I'm thinking of
methods to handle lists, hash tables, ... whatever the basic building blocks
of a compiler are.

> > Parser Table Generator:
>
> If we go with intelligent editors then it's probably not worthwhile
> looking at these unless we do it first.  They are probably not going to
> fit well with the new editing paradigm.  A different sort of generator
> will be needed, and it's not worthwhile writing one that will be
> discarded.

I don't see it that way:

The parser table generator should be sufficiently versatile to accept quite a
range of syntaxes; it could be upgraded to handle syntax features that do not
fit its original design, or a more sophisticated syntax table format if we
decide to use one. It would produce output for the object loader, which is
text and sufficiently readable, at least for debugging purposes.

*As soon as* there is an intelligent editor, the syntax *may* eventually grow
simpler because the editor provides the "forms" to fill out. But IMO there
will always be a need to parse a plain text file, skipping the editor - at
least for some form of non-binary code exchange. Also, some people are
incredibly conservative where their favourite text editor is concerned. If
they want less language awareness and more typing effort, let them have
their way!

If it is properly designed, the text editor can draw the parts of the syntax
it handles from the class descriptions of, say, subroutine calls or class
definitions. I already sketched my ideas on that in a rather superficial way
(key word: properties). If you wish, I'll elaborate on that.

> > Intelligent Editor:
>
> > A *text* editor has little need for graphics, after all
> > (apart from color, bold, underline).
>
> It's an interesting point, but one of the main aims of an intelligent
> editor is to use GUIs in new ways rather than pretend we're still in the
> text file world - which - let's face it - is as old as punched cards.
> It could still be a stepping stone of course, although it might not be
> easy to move code just like that.

Of course it's just a stepping stone. I'm not *that* old-fashioned. Actually,
I'd like to write most of my programs by drag'n'drop and point'n'click. Using
Centura builder, you could *almost* do that, at least compared to the clumsy
way you have to write C++ code, despite IntelliSense.

At IBM they have an interesting approach to graphical programming (I don't
remember the name right now, but you have certainly heard of it). But when I
looked at that confusion of edges and objects, mixed up with interspersed
text, I decided to refrain from real graphics (where you place your objects
anywhere in a plane). IMO, a tree has exactly the right balance of
navigational ease and the capacity to hold *large* amounts of data while
providing nice structuring and a clean layout.

> > Debugger:
> > Should be tightly coupled with editor.
> > Need not include hardware level.
> > Needs some support by VM
> > (a "debug VM" as opposed to a "release VM" ?).
>
> A debugger is essentially an interpreter with a front-end.  It shouldn't
> be difficult to define the code with two different modes.  Allowing the
> editor to run code would be useful whether debugging or not.

Yes. A powerful debugger also needs complete introspection facilities: it
must be able to access all living objects, including the call stack. That's
one of the reasons why I so strongly advocate mixing the binary with the
source code (and documentation) in a development environment. For a release,
one would strip off all the source stuff and supply a leaner release runtime
environment, *but* if need be one could still retain some parser to generate
code "on the fly". I found this extremely valuable for customizing program
behaviour in places where I was sure users would have demands I would not
think of in the first place, e.g. print layout. I could simply provide some
code, to be stored in an INI file, which would extract all the data they
wanted and even issue some SQL statements if necessary.

Regards

Hans-Dieter Dreier