types and operators

Eric W. Biederman eric@flinx.home
Thu, 25 Apr 1996 19:02:26 -0500


   On Tue, 23 Apr 1996, Eric W. Biederman wrote:
   > Now all we have to do is figure out is what kinds of data types and
   > which opeations on them, our lll will support.

   I. Data types we will _need_
      A. integers
	 I'd prefer integers to be based on 32-bit or greater. (16-bit isn't 
   worth messing with, IMHO.) We need arithmatic, logic, comparison, etc. 
   All of which will have to work with whatever munging we do to the 
   integers to distinguish them from pointers, so I'd like to be able to do 
   that in 1 or 2 instructions, if possible.

See the end but I don't think any munging is required.

      B. pointers
	 Pointers will need some basic operations, and some design decisions 
   will have to go into dealing with pointers into objects vs. pointer+offset. 
   We'll need operators to access all the supported types through pointers, 
   with indexing.

Operations for pointers seperate from those for integers, will
probably save us in the long run, as pointers sometimes take on the
stranges formats.  A lot of the pointer + index work can be relaxed
because a tidy house only needs to be presented at function call time.
In addition for large objects that pointer would ususally step through
recording where the object starts and have the garbage collector look
it up, shouldn't be a problem.

      C. strings
	 I favour using counted strings, and providing a good set of 
   operators to work on them. We will need to mark strings so that the 
   garbage collector knows to leave them alone.

I'm not sure why the gabage collector needs to leave these allone.
Except that it needn't look for pointers in them.  But certainly if
it can be proved that the string can't be used the GC should reclaim it?

   II. Optional types
       A. buffers
	  Buffers are very close to strings, but might concievably need 
   different treatment. Mostly, there will need to mark memory objects which 
   the GC should not access, and operators that can access them.

I'm confused like I am for strings.  What we probably need to support
is something to generate arrays, and a character data type.

       B. floating-point
	  Floats would be a mess to do, especially when it comes to 
   distinguishing them from integers, pointers and strings. This can 
   probably be added later.

With my current design (see below) easy as pie.

       C. double-length integers
	  I'd like to have these from the start, if possible. Variations of 
   the integer operators, and some conversion operators.

Sounds good.

   There will need to be some thought put into stack operations. I was 
   planning on keeping the standard Forth ones. Any comments on this?

Sounds good.  Except the words like pick and roll that don't have a
fixed number of arguments in the usual sense.  The rest swap, drop,
dup etc, I can do.

   Also, I'd like to have a some ideas about how to do pointer/integer 
   distinction with a minimal instruction count for operators. Since + - * / 
   etc. will all have to convert LLL integer --> machine integer 
   conversion and back, I'd like it to be kept to something we can do with a 
   minimal instruction count.

See below.  Count is an amorized constant.

   Finally, a question of operator behaviour: should operators be type-safe, 
   or should they just produce undefined results? For instance: what happens 
   when you add a pointer to an integer with the integer addition operator?
   Also, if operators are type-safe, what should they do when given the 
   wrong parameters? (Exception-handling?)

I have figured out how to do type-safe operators (compile-time
type-safe) and identification of all pointers at run-time with no run
time cost.  What it does require though is a restriction on how words
can manipulate the data stack.   That is words must have a fixed
number of arguments and a fixed number of results. 

The reason I make the above restriction is as follows.  If words are
to use a single stack as C does they can be implemented as allocating
frames upon the stack.  The frame holding the funcions arguments, it's
return locate, the address of the last words frame, the local
variables, a buffer to receive results from functions, and most
important a tag that tells what everthing is in the frame.  

As well as allowing garbage collection, and definite identification of
things, it will also allows, some of the trickier things with lambda
functions where they same their return stack.  I'm really not certain
what kind of discipline is required of functions that get returned
through multiple times to keep it all consistent. (We can cross that
bridge later).  Also this enables all kinds of traditional
optimsations that C can use, and possibly allows using a hacked
version of GCC as our code generator.

Things only need to be orderly to present to the garbage collector
when a function calls allocates memory from the heap or calls some
function that might.  So in general at function call time.

I have implemented a small test interpreter in C that makes most of
this work.  What it uses to make this work is it's own data stack (as
nothing else has one.)   All items placed upon this data stack are
tagged with their type.  This allows the interpreter to check and make
certain that the correct types are passed to a function.  After the
sanity check the intrpreter gathers all of the arguments off of the
data stack, passes them to the word, then when the word returns
gathers the results back up (looks up their types) and pushes them
back upon the data stack.  A compiler should be able to avoid a lot of
this work, and generate really good code.

I hope this is a framework you can live with, as better eludes me.

Eric