Bootstrap notes

Brian T Rice water at tunes.org
Tue Jan 6 14:36:41 PST 2004


To keep everyone up to date, I'll give some overview of what (the hell) is
in src/mobius/vm/, since eventually I'd like other people to be able to
dig into it.

First, the files and their meaning:

generator.slate: This is the byte-code compiler, written in straight
Slate.

interpreter.slate: This contains a "toy" simulator of the bytecode
interpreter. The general flow of it is correct, but I will eventually be
writing the whole thing in "pidgin", the simplified Slate usage that
translates to C directly.

memory.slate: This is a worked-over port of Squeak's ObjectMemory, which
is a managed heap of objects with tagged pointers. Nicely, I have been
able to separate a lot of logic from the actual details of the formats of
various Slate data structures. So at some point in the future, you could
look at writing "plain" GC'd applications in C (or via binary compiler in
the far-flung future?) using Slate. Right now, though, there's a decent
amount of code missing, mostly the mark-sweep core machinery (which uses a
state-machine tied to the heap; not sure I want to port this as it is),
but you can get the gist of it.

object.slate: This contains the definitions of Slate core data structures
from the perspective of the VM. Basically what Pidgin does is to translate
a prototype into a "movable struct" definition. Or to put it another way,
it defines a struct type in C that pointers can be cast to. Also, it
defines essential low-level methods for working with these types that
aren't primitives (exported into userland).

primitives.slate: These are the "native methods" in Slate: those which
must be defined by low-level VM code, but aren't bytecodes. The bytecodes
are "primitive"; these are "native". This is an explicit design decision
I'm making to keep the terminology clear. Anyway, the interpretation of
the Slate in this particular file is nuanced right now. Basically the idea
is that the method bodies are pidgin code, but the signatures bind in a
way that makes them available to the image on startup. The "marshalling"
mechanism for this is the SpecialObjectsArray, an array of pointers to
objects which must have native methods defined on them. This is provided
in the image header, and the VM reads this on startup and binds as
necessary so that code works.

*** End of the files list.

Well, that covers the VM code per se. It doesn't cover pidgin, my idea of
a (much nicer) counterpart to Squeak's Slang. Most of that is covered as
an extension to the C support in src/mobius/c/. The actual code-translator
is in generator.slate, but it helps to also cover the cr.slate files,
which model C program domains, types, and syntax. Basically what happens
is that the SimpleGenerator takes some body of code (SyntaxNodes in a
stream as read from a file, is what I'm assuming so far), and processes it
into C code, decorating a "program" variable which holds a C Module (which
is a C file plus directly-relevant header code). The "program" is just a
huge syntax structure for the output C program, and it gets processed a
bit before being emitted into strings (onto a File Stream).

What pidgin does is to "flatten" Slate code to a minor extent. All
dispatch is static, control structures are mapped into C primitive looping
constructs where possible, and method names and signatures are translated
to be sensible. I previously mentioned what happens to prototypes; they
become C struct types.
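
To make the flattening concrete, here is a hand-written sketch of the
kind of C one might expect for a small prototype and a method on it. This
is not actual generator output: the Point prototype, its slot layout, and
the names Point_sum and Point_sumAll are all made up for illustration.

```c
#include <stddef.h>

/* Hypothetical prototype with two integer slots; the layout is assumed. */
struct Point {
  long x;
  long y;
};

/* Assumed translation of a method defined on Point: dispatch is resolved
   at translation time, so the call target is a plain C function. */
long Point_sum(struct Point *p)
{
  return p->x + p->y;
}

/* A counting loop in the source is assumed to map onto a C for loop. */
long Point_sumAll(struct Point *points, size_t count)
{
  long total = 0;
  for (size_t i = 0; i < count; i++)
    total += Point_sum(&points[i]);
  return total;
}
```

The point is only that nothing dynamic survives: no lookup tables, no
message sends, just structs, functions, and loops.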

When pidgin source is read, what happens is that a database is decorated
with information: what source selector maps to what C function, and what
Slate type names map to what C struct type names. This database is local
to the invocation of a C SimpleGenerator, so you have to keep track of the
output to a C module or (someday, not now) try to mix modules together. If
there's some ambiguity in the database, it should ideally throw an error,
but that check is not written yet, so figuring out all the error cases
might take longer than I plan.
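
As a rough sketch of what such an ambiguity check could look like, here
is a tiny name database in C. This is not how the generator stores
things (that code is Slate); the NameDb type, the fixed-size table, and
the DB_* result codes are assumptions made up for this example.

```c
#include <string.h>

#define MAX_ENTRIES 64

/* One record: a source-level signature and the C name it maps to. */
struct Mapping {
  const char *source_sig;
  const char *c_name;
};

struct NameDb {
  struct Mapping entries[MAX_ENTRIES];
  int count;
};

enum { DB_OK = 0, DB_AMBIGUOUS = -1, DB_FULL = -2 };

/* Register a mapping. Re-adding an identical pair is harmless; a
   signature with two C names, or two signatures sharing one C name, is
   reported as ambiguous and the table is left unchanged. */
int NameDb_add(struct NameDb *db, const char *source_sig, const char *c_name)
{
  for (int i = 0; i < db->count; i++) {
    int same_sig = strcmp(db->entries[i].source_sig, source_sig) == 0;
    int same_c = strcmp(db->entries[i].c_name, c_name) == 0;
    if (same_sig && same_c)
      return DB_OK;
    if (same_sig || same_c)
      return DB_AMBIGUOUS;
  }
  if (db->count == MAX_ENTRIES)
    return DB_FULL;
  db->entries[db->count].source_sig = source_sig;
  db->entries[db->count].c_name = c_name;
  db->count++;
  return DB_OK;
}
```

The same shape would work for the Slate type name to C struct name
table; only the keys change.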

About signatures: a method definition has its selector munged to work in
C, which has no multidispatch or Smalltalk syntax. The selectors get the
names of the types their arguments dispatch on mixed into their names at
the right points, e.g.:

"d@(Dog traits) bite: m@(Man traits)" becomes "Dog_bite_Man"

The example works the same way if you define on a specific prototype, but
that prototype must either have a different name from its traits' name, or
the programmer gets to hope that they didn't define any conflicting method
names. Anyway, the function's signature is adjusted to work with those
types if there is anything relevant. The functions all must be defined on
integers or prototypes defined within the same source, and the resulting
types will all be single-word types: either integers and such, or pointers
to structs.
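
For illustration, the name munging can be sketched as a small C
function. The real translator is written in Slate, so this is only a
model of the rule: mangle_selector and append_part are hypothetical
names, and the treatment of multi-keyword selectors and undispatched
arguments (passed as NULL and skipped) is my guess at a sensible
generalization of the one-keyword example above.

```c
#include <string.h>

/* Append src to the string in out (capacity cap), dropping ':' characters.
   Returns 0 on success, -1 if the result would not fit. */
static int append_part(char *out, size_t cap, const char *src)
{
  size_t len = strlen(out);
  for (; *src != '\0'; src++) {
    if (*src == ':')
      continue;
    if (len + 1 >= cap) {
      out[len] = '\0';
      return -1;
    }
    out[len++] = *src;
  }
  out[len] = '\0';
  return 0;
}

/* Build a C function name from the receiver's type name and each keyword
   with its argument's type name, joined by underscores. */
int mangle_selector(char *out, size_t cap, const char *receiver_type,
                    const char *const keywords[],
                    const char *const arg_types[], size_t nkeywords)
{
  if (cap == 0)
    return -1;
  out[0] = '\0';
  if (append_part(out, cap, receiver_type) != 0)
    return -1;
  for (size_t i = 0; i < nkeywords; i++) {
    if (append_part(out, cap, "_") != 0 ||
        append_part(out, cap, keywords[i]) != 0)
      return -1;
    if (arg_types[i] != NULL &&
        (append_part(out, cap, "_") != 0 ||
         append_part(out, cap, arg_types[i]) != 0))
      return -1;
  }
  return 0;
}
```

Called with receiver type "Dog", keyword "bite:", and argument type
"Man", this yields "Dog_bite_Man", matching the example above.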

As a result of this, static typing is necessary: each function/method
call has to include casts to the right type for the type-checker to work.
This is nasty, but from what I can tell, the programmer should not have to
annotate to do this. I've been considering abusing the ! notation to help
users add casts manually without too much annoyance; I'll wait to see how
much can be done with inference (or best-guesses) before then.
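
Here is a sketch of what the casts at a call site might look like in the
emitted C, assuming every value travels as a single machine word. The
word typedef, the slots on Dog and Man, and the bite_from_words wrapper
are invented for the example; only the Dog_bite_Man name comes from the
text above.

```c
#include <stdint.h>

/* Assumed: one machine word holds either an integer or a pointer. */
typedef intptr_t word;

/* Hypothetical slot layouts for the two prototypes. */
struct Dog { word teeth; };
struct Man { word wounds; };

/* The translated method, typed with the struct pointers it works on. */
word Dog_bite_Man(struct Dog *d, struct Man *m)
{
  m->wounds += d->teeth;
  return m->wounds;
}

/* A call site as the generator might emit it: operands arrive as untyped
   words and are cast to the pointer types the callee expects, so the C
   type-checker accepts the call. */
word bite_from_words(word dog_ref, word man_ref)
{
  return Dog_bite_Man((struct Dog *)dog_ref, (struct Man *)man_ref);
}
```

If inference can recover the struct types at each call, casts like these
can be inserted mechanically and the programmer never sees them.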

Wow, I see I've written a whole diatribe. A lot of this has been collected
into doc/mobius.lyx and will continue to make it in there. A few days ago,
I put up some output from it into http://slate.tunes.org/doc/mobius/ ...
there's a mobius.ps and mobius.pdf there, but the html output shadows it,
which is a shame since the html output clobbers all the tables. I'll try
to fix this up soon.

-- 
Brian T. Rice
LOGOS Research and Development
http://tunes.org/~water/


