LLL how Self does it

Jecel Mattos de Assumpcao Jr. jecel@lsi.usp.br
Tue, 20 Dec 94 17:47:34 EDT


This is a very interesting discussion, and I found Dr. J. Van Sckalkwyk's
proposal a very good start. It looks a lot like the token-threaded
interpreter Kyle Hayes used in TSOL, or like the low-level implementation
of the Actor language.

I thought it might be a good idea if I showed how the low level works in
the Self language ( even though it would not work for Tunes ), as it has
a lot to do with what Fare has been talking about.

There is a very simple object model that I won't go into here - the main
thing is that everything is an object. The source code is translated into
objects that have two vectors of interest:

     - literal vector: has all of the immediate values used by this
                       piece of code
     - bytecode vector: has the low level instructions for this code

Each bytecode instruction has a three-bit opcode and a five-bit operand
field. The operand is always an index into the literal vector. Some
instructions ignore their operands. The opcodes are:

     - extend         : simply extends the operand of the next instruction by
                        five more bits
     - push           : pushes the indexed literal on the stack
     - pushSelf       : pushes the value of self on the stack
     - send           : uses the literal as the selector of a message that has
                        the top of stack as receiver, the next few elements as
                        arguments ( pops receiver and arguments )
     - selfSend       : uses the literal as the selector of a message and self
                        as the receiver. The top elements of the stack ( which
                        are popped ) are the arguments
     - resend         : like send but selector lookup is different
     - setParent      : further modifies the lookup done by the next resend instruction
     - nonLocalReturn : pops the top of the stack and returns it to the caller
                        of this method

The pushSelf bytecode is not actually needed, so I only use seven. The resend and
setParent bytecodes are complications due to the occasional need to override the
usual message lookup mechanism. The selfSend looks a bit redundant, but its semantics
differ from the normal send in ways that are not worth getting into here.

A pop bytecode would be nice, as the current ones slowly fill up a method's stack
( though it is all cleared when the method returns ). The Self group didn't bother
with it because they don't interpret the code but compile it, so this "memory leak"
is fixed by the data flow analysis.
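The "memory leak" can be seen with a little stack-depth bookkeeping of the kind a compiler's data flow pass would do. This Python sketch assumes the usual Self/Smalltalk selector syntax for argument counts (one argument per colon in a keyword selector, one for a binary operator, none for a unary selector) and tracks only depth, not values.

```python
def arity(selector):
    """Arguments implied by a selector (Self/Smalltalk convention)."""
    if ":" in selector:
        return selector.count(":")  # keyword selector, e.g. "at:Put:"
    if all(not (ch.isalnum() or ch == "_") for ch in selector):
        return 1  # binary operator, e.g. "+"
    return 0  # unary selector, e.g. "size"


def stack_depths(instructions, literals):
    """Return the operand stack depth after each (opcode, literal index) pair."""
    depth, history = 0, []
    for op, index in instructions:
        if op in ("push", "pushSelf"):
            depth += 1
        elif op == "send":
            # pops the receiver and its arguments, pushes the result
            depth += 1 - (1 + arity(literals[index]))
        elif op == "selfSend":
            # receiver is the implicit self, so only the arguments are popped
            depth += 1 - arity(literals[index])
        elif op == "nonLocalReturn":
            depth -= 1
        history.append(depth)
    return history


# Three statements "foo. bar. baz" become three selfSends and no pops,
# so every statement's result is left behind on the stack.
literals = ["foo", "bar", "baz"]
body = [("selfSend", 0), ("selfSend", 1), ("selfSend", 2)]
print(stack_depths(body, literals))  # [1, 2, 3]
```

An interpreter would have to carry those dead values until the method returns; a compiler can simply notice that nothing ever reads them.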

Some more hidden details: when the literal is a code block object, then the push
instruction actually does some fancy scope binding. And if the selector name
starts with an underscore, then the send or resend bytecodes will call a machine
language primitive rather than working as usual.
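The underscore rule can be sketched as a single check at the top of the send routine. In this Python toy, objects are just dictionaries of slots, and the primitive table and the name `_IntAdd:` are illustrative assumptions; only the leading-underscore convention comes from the description above.

```python
class SelfObject:
    """Toy object: slots map selectors to Python functions taking (receiver, *args)."""

    def __init__(self, **slots):
        self.slots = slots


# Hypothetical primitive table; primitive receivers here are plain Python ints.
PRIMITIVES = {
    "_IntAdd:": lambda receiver, arg: receiver + arg,
}


def send(receiver, selector, *args):
    """Send a message; underscore selectors skip lookup and call a primitive."""
    if selector.startswith("_"):
        return PRIMITIVES[selector](receiver, *args)
    method = receiver.slots.get(selector)
    if method is None:
        raise LookupError(f"message not understood: {selector}")
    return method(receiver, *args)


# Ordinary slots are found by lookup; the addition itself bottoms out in a primitive.
point = SelfObject(
    x=lambda self: 3,
    double=lambda self: send(send(self, "x"), "_IntAdd:", send(self, "x")),
)
print(send(point, "double"))  # 6
```

Because the check is purely syntactic, the bytecode set needs no separate "call primitive" opcode.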

It might be better if I gave some examples, but as I don't have any with me
right now, I can send them later to anyone who is interested.

Now for the good part: where are the branches, conditional or not? There are
none. Everything is defined in terms of objects and messages. If you send a
message to the "true" object, it does one thing, but if you send it to "false"
it does another. Looping control structures could be defined using recursion,
but the Self people hate any optimizations ( like tail recursion elimination )
that might be visible at the user level. So there is a "_Restart" primitive that
is at the heart of all looping in the language. Of course, this means that a
simple for-loop might involve messages nested 12 ( or more ) deep before you get
to the primitives, but the compiler inlines all this overhead away, resulting in
machine language code that is even closer to what C generates than C++'s output is.
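Here is a Python sketch of branch-free control flow: the two boolean objects answer the same message differently, and a loop is just a block that restarts itself. It assumes `_Restart` re-enters the current block from its beginning; the names `if_true_if_false`, `while_true` and `restart` are hypothetical stand-ins loosely modelled on Self's `ifTrue:False:` and `whileTrue:`, not Self's actual definitions.

```python
class Restart(Exception):
    """Signal used by the hypothetical restart() primitive below."""


def restart():
    raise Restart


class TrueObject:
    def if_true_if_false(self, then_block, else_block):
        return then_block()  # true always picks the first block


class FalseObject:
    def if_true_if_false(self, then_block, else_block):
        return else_block()  # false always picks the second block


TRUE, FALSE = TrueObject(), FalseObject()


def as_bool(flag):
    """Stand-in for a comparison primitive that answers one of the two boolean objects."""
    return TRUE if flag else FALSE


def run_restartable(block):
    """Run a block; restart() inside it starts the block over from the top.

    This host-level loop stands in for the one real backward jump the
    primitive needs; no recursion is involved, so the stack does not grow.
    """
    while True:
        try:
            return block()
        except Restart:
            pass


def while_true(condition, body):
    """A whileTrue-style loop built from a message to a boolean plus restart()."""
    def loop_block():
        return condition().if_true_if_false(
            lambda: (body(), restart()),  # run the body, then start over
            lambda: None,                  # condition false: leave the block
        )
    return run_restartable(loop_block)


def count_to(limit):
    total = 0

    def condition():
        return as_bool(total < limit)

    def body():
        nonlocal total
        total += 1

    while_true(condition, body)
    return total


print(count_to(10))  # 10
```

Each level here is an ordinary call, which is exactly the overhead the Self compiler inlines away until only a compare and a backward jump remain.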

I would never think of using a low level code like this as the target of a
C compiler, of course. But the ideas here are well worth knowing.

-- Jecel