CORE CODE!

Dr. J. Van Sckalkwyk (external) SCHALKW@odie.ee.wits.ac.za
Sun, 18 Dec 1994 16:59:49 SAT


PROPOSED CORE CODE FOR IMPLEMENTATION OF LLL

This letter has 3 parts:

1. Symbolic assembler (J) conventions

2. Core code, with explanation

3. Recapitulation.


I would have liked to have explored various conventions in more
detail prior to this, but the time seems right to "come clean" 
about my ideas!



1. Symbolic assembler conventions
=================================

a. Syntax
---------

All references are to DWORDs (32 bits)
All numbers are hexadecimal
Naming and operand conventions are 80X86 flavoured
 (for now, but no bias implied).


b. Registers, Stacks, memory
----------------------------

The core code assumes that the following exist:
2 General purpose registers AA, BB
3 Pointer registers pp, qq and rr that point to three blocks of memory
(already set up) termed MEMpp, MEMqq and MEMrr

    MEMpp contains "program" to be interpreted
    MEMqq contains "meanings" of each item
    MEMrr is a table of addresses of "primitive interpreters"

A stack, called yY.




c. Instructions
---------------

The symbolic instructions are:-
GET     (into general purpose register, from pointer location)
ADD     (to a register, another register)
TM      (test under mask)
        --> Ze means that result of test of all masked bits was zero
        --> Nz means that one or more bits were nonzero
PUSH        (register to a stack)
MOVE        (one register into another)
BRANCH      (conditional or not)
SHR     (right shift by a certain number of bits)
CALL        (ie push return address to local stack and load
        address called into instruction pointer)



2. Core code, with explanation
==============================

2a. Core code
-------------
Evaluation starts at the entry point "Entry:"

    ...
    BRANCH  Entry       ;Begin..

                                                          ;LINE
Store:  PUSH yY,AA          ;store item on stack            ;1
Entry:  GET qq,pp           ;get next item to interpret **  ;2
        ADD pp,00000001     ;move source pointer right      ;3
                            ; (spare                        ;4
                            ; lines)                        ;5
        GET     AA,qq       ;get meaning                    ;6
        TM      AA,80000000 ;IS IT AN ACTIVE ITEM?          ;7
        BRANCH  Nz,Store    ;  NO: branch up and store..    ;8
        MOVE    rr,AA       ;  YES: move AA into rr pointer ;9
        SHR     rr,24       ;shift 24 bits right            ;10
        GET     BB,rr       ;get address of interpreter     ;11
        CALL    BB          ;call interpreter               ;12
        BRANCH  Entry       ;continue                       ;13


2b. Basic Explanation
---------------------

1. This assumes that the format of the "program" that is being 
    interpreted
    is in Reverse Polish Notation (although I have other cute ideas,
    let's assume this for the time being). This is implicit in the
    ADD pp,00000001 .. and what follows.


2. We obtain the first item to be interpreted. (GET qq,pp). The item
    we obtain is actually a 32 bit number, which points to a memory
    block containing all the "values" of all the items being used!
    We get the value of the first item (GET AA,qq). If the highest
    bit is SET, then we simply put this value on the stack.

3. If however the value has its highest bit clear (as
     determined by the TM instruction) then it is handled 
    differently. I have at present allowed three options:-

    a. "verbs"
    b. "friends"
    c. "objects"

    So the table in MEMrr would have the following structure:

    Offset      Address
    ------      -------
    +0          DoVerb
    +1          DoFriend
    +2          DoObject
    +3..on      ErrorPatch

Before I go into detail about the differences between, and need for,
the three, let's try a concrete example.


Example 1.
----------

We need to evaluate the expression:

    fred mary TIMES jane PLUS



from left to right. The basic scenario is:
    1. find the value of fred, put it on stack;
    2. find value of mary, put it on stack;
    3. pass control to TIMES, who gets mary and then fred off stack,
        evaluates them, and puts answer on stack;
    4. find value of jane, put on stack;
    5. pass control to PLUS, who gets jane and value of (fred*mary)
        off stack, adds them together, and puts result on stack.


Now, it is easy to see how this fits into our core code:

    A1. in evaluating the example from left to right, we first
    encounter a _pointer to_ the value of fred, obtaining this
    by GET qq,pp.

    2. We next get the value of fred by GET AA,qq.

    3. We check to see that this is not an "active item" by TM etc.

    4. It isn't so we store it on the stack (PUSH yY etc), 
    and continue..

    B1. repeating the cycle, we evaluate mary..

    C1. we evaluate TIMES: here, the leftmost bit indicates
    an active item (here the verb TIMES). We transfer control to
    TIMES (the details of which will be explored below), 
    and continue..

    D1 etc.

 

2c. Verb, Friend, Object
------------------------

1. Verbs
--------
    These are primitive constructs such as TIMES, PLUS (or MUL, ADD)..
    "DoVerb" might be as follows:

    Assume:
    1. A memory block MEMss, pointed at by pointer ss;
    2. The symbolic instruction AND

                                                    ;LINE
DoVerb: MOVE    ss,AA       ;point to verb list     ;14
        AND     ss,00FFFFFF ;isolate low 24 bits    ;15
        GET     AA,ss       ;get verb address       ;16
        BRANCH  AA          ;& branch to it         ;17



2. Friends
----------
    These are simply expressions (like "fred mary TIMES jane PLUS")
    so the simplest possible scenario for passing control to them
    would be just:

    (Assume symbolic instruction RETURN i.e. pop instruction
    pointer from local stack)

                                    ;LINE
DoFriend:   MOVE    pp,AA           ;18
            RETURN                  ;19

    This will need a certain amount of salad dressing
    to make it palatable, however, in the form of a "call/return"
    structure. We also need to look at parameter passing..


3. Object
---------
This is where we come to the crunch. My proposal would be to have all
of the above encapsulated as an object ie. the various memory spaces
we have discussed, together with items and their meanings, etc. Such
an object could be invoked in a similar fashion to a verb.

This does NOT imply huge objects, as, depending on the implementation
and number of variables etc, the objects may be quite small. Also,
the "core verb definitions" (including address table of verbs etc)
_could_ be shared by several objects, in fact, this is a logical
(although not mandatory) approach.

Once this has been done, there is no reason why parts of, or the
whole of, an object should not be compiled for a particular machine.
In fact, the above structure lends itself to particularly easy
compilation. (I have done it before).



3. Recapitulation
=================

a. A simple, symbolic assembly language called "J" is illustrated.
It is not machine specific.

b. It is used to create a primitive interpreter.

c. Interpreted code can then be encapsulated as an object.

Please note that the above is only a sketch. A ton of modification
will be needed to make it fully functional. I merely wanted to toss
this into the pool and see how people felt about it! I have also
_left out_ considerable detail about the different types of
"PASSIVE ITEM" and how they could be handled by various verbs, error
checking etc etc etc. Note there is also a bit of cheating with the
use of "General purpose register BB" in the statement "CALL BB" and
likewise with "BRANCH AA". We have not talked about conditional
instructions and how they fit (or don't) etc etc etc. 

Am I being too simplistic?


Any comments?


Bye for now, JVS.