CORE CODE!
Dr. J. Van Sckalkwyk (external)
SCHALKW@odie.ee.wits.ac.za
Sun, 18 Dec 1994 16:59:49 SAT
PROPOSED CORE CODE FOR IMPLEMENTATION OF LLL
This letter has 3 parts:
1. Symbolic assembler (J) conventions
2. Core code, with explanation
3. Recapitulation.
I would have liked to have explored various conventions in more
detail prior to this, but the time seems right to "come clean"
about my ideas!
1. Symbolic assembler conventions
=================================
a. Syntax
---------
All references are to DWORDs (32 bits)
All numbers are hexadecimal
Naming and operand conventions are 80X86 flavoured
(for now, but no bias implied).
b. Registers, Stacks, memory
----------------------------
The core code assumes that the following exist:
2 General purpose registers AA, BB
3 Pointer registers pp, qq and rr that point to three blocks of memory
(already set up) termed MEMpp, MEMqq and MEMrr
MEMpp contains "program" to be interpreted
MEMqq contains "meanings" of each item
MEMrr is a table of addresses of "primitive interpreters"
A stack, called yY.
c. Instructions
---------------
The symbolic instructions are:-
GET (into general purpose register, from pointer location)
ADD (to a register, another register)
TM (test under mask)
--> Ze means that result of test of all masked bits was zero
--> Nz means that one or more bits were nonzero
PUSH (register to a stack)
MOVE (one register into another)
BRANCH (conditional or not)
SHR (right shift by a certain number of bits)
CALL (ie push return address to local stack and load
address called into instruction pointer)
2. Core code, with explanation
==============================
2a. Core code
-------------
Evaluation starts at the entry point "Entry:"
...
BRANCH Entry ;Begin..
;LINE
Store: PUSH yY,AA ;store item on stack ;1
Entry: GET qq,pp ;get next item to interpret ** ;2
ADD pp,00000001 ;move source pointer right ;3
; (spare ;4
; lines) ;5
GET AA,qq ;get meaning ;6
TM AA,80000000 ;IS IT AN ACTIVE ITEM? ;7
BRANCH Nz,Store ; NO: branch up and store.. ;8
MOVE rr,AA ; YES: move AA into rr pointer ;9
SHR rr,24 ;shift 24 bits right ;10
GET BB,rr ;get address of interpreter ;11
CALL BB ;call interpreter ;12
BRANCH Entry ;continue ;13
2b. Basic Explanation
---------------------
1. This assumes that the format of the "program" that is being
interpreted
is in Reverse Polish Notation (although I have other cute ideas,
let's assume this for the time being). This is implicit in the
ADD pp,00000001 .. and what follows.
2. We obtain the first item to be interpreted. (GET qq,pp). The item
we obtain is actually a 32 bit number, which points to a memory
block containing all the "values" of all the items being used!
We get the value of the first item (GET AA,qq). If the highest
bit is SET, then we simply put this value on the stack.
3. If however the value has its highest bit clear (as
determined by the TM instruction) then it is handled
differently. I have at present allowed three options:-
a. "verbs"
b. "friends"
c. "objects"
So the table in MEMrr would have the following structure:
Offset Address
------ -------
+0 DoVerb
+1 DoFriend
+2 DoObject
+3..on ErrorPatch
Before I go into detail about the differences between, and need for,
the three, let's try a concrete example.
Example 1.
----------
We need to evaluate the expression:
fred mary TIMES jane PLUS
from left to right. The basic scenario is:
1. find the value of fred, put it on stack;
2. find value of mary, put it on stack;
3. pass control to TIMES, who gets mary and then fred off stack,
evaluates them, and puts answer on stack;
4. find value of jane, put on stack;
5. pass control to PLUS, who gets jane and value of (fred*mary)
off stack, adds them together, and puts result on stack.
Now, it is easy to see how this fits into our core code:
A1. in evaluating the example from left to right, we first
encounter a _pointer to_ the value of fred, obtaining this
by GET qq,pp.
2. We next get the value of fred by GET AA,qq.
3. We check to see that this is not an "active item" by TM etc.
4. It isn't so we store it on the stack (PUSH yY etc),
and continue..
B1. repeating the cycle, we evaluate mary..
C1. we evaluate TIMES: here, the leftmost bit indicates
an active item (here the verb TIMES). We transfer control to
TIMES (the details of which will be explored below),
and continue..
D1 etc.
2c. Verb, Friend, Object
------------------------
1. Verbs
--------
These are primitive constructs such as TIMES, PLUS (or MUL, ADD)..
"DoVerb" might be as follows:
Assume:
1. A memory block MEMss, pointed at by pointer ss;
2. The symbolic instruction AND
;LINE
DoVerb: MOVE ss,AA ;point to verb list ;14
AND ss,00FFFFFF ;isolate low 24 bits ;15
GET AA,ss ;get verb address ;16
BRANCH AA ;& branch to it ;17
2. Friends
----------
These are simply expressions (like "fred mary TIMES jane PLUS")
so the simplest possible scenario for passing control to them
would be just:
(Assume symbolic instruction RETURN i.e. pop instruction
pointer from local stack)
;LINE
DoFriend: MOVE pp,AA ;18
RETURN ;19
This will need a certain amount of salad dressing
to make it palatable, however, in the form of a "call/return"
structure. We also need to look at parameter passing..
3. Object
---------
This is where we come to the crunch. My proposal would be to have all
of the above encapsulated as an object ie. the various memory spaces
we have discussed, together with items and their meanings, etc. Such
an object could be invoked in a similar fashion to a verb.
This does NOT imply huge objects, as, depending on the implementation
and number of variables etc, the objects may be quite small. Also,
the "core verb definitions" (including address table of verbs etc)
_could_ be shared by several objects, in fact, this is a logical
(although not mandatory) approach.
Once this has been done, there is no reason why parts of, or the
whole of, an object should not be compiled for a particular machine.
In fact, the above structure lends itself to particularly easy
compilation. (I have done it before).
3. Recapitulation
=================
a. A simple, symbolic assembly language called "J" is illustrated.
It is not machine specific.
b. It is used to create a primitive interpreter.
c. Interpreted code can then be encapsulated as an object.
Please note that the above is only a sketch. A ton of modification
will be needed to make it fully functional. I merely wanted to toss
this into the pool and see how people felt about it! I have also
_left out_ considerable detail about the different types of
"PASSIVE ITEM" and how they could be handled by various verbs, error
checking etc etc etc. Note there is also a bit of cheating with the
use of "General purpose register BB" in the statement "CALL BB" and
likewise with "BRANCH AA". We have not talked about conditional
instructions and how they fit (or don't) etc etc etc.
Am I being too simplistic?
Any comments?
Bye for now, JVS.