types and operators

Eric W. Biederman ebiederm@cse.unl.edu
13 Jun 1996 13:53:55 -0500


Nathan Hawkins <utsl@one.net> writes:

> 
> The following was written just before I left... Sorry about the long 
> delay, but maybe we can finish this subject, and get some code going.
> I did do a little research into Lisp implementations, and I'm starting to 
> favour option 2 (see below), but as I recall, at the time I wrote this  
> (about a month ago) I was for method 2.
> 
> ----------
> 
> This is a summary of the possible designs for implementing pointer/int
> distinctions, along with pros and cons. If I've left out anything important
> out, please let me know.
> 
> 
> 1. a small tag within a 32-bit int/pointer
>    Pros:
>      * probably comes closest to normal Forth
>      * a cell on the stack is the same as a cell on the heap
>    Cons:
>      * loses bits from every pointer and int :-(
>      * requires overhead in operators
> 

<commentary>
Actually I think I favor this one.  At least for a simple start up.

If we let the tag be the least significant bit, and have it _set_ for
pointers we lose no efficiency.  For a pointer load we can simply load
the address one byte before a pointer.  This also had advantages like
2+ and CELL+ are equal as in traditional forth.  Since anything less
then 1+ on a pointer would greivously confuse the garbage collector I
guess that means native unicode support, since we have 16 bit
characters <grin>

When we needed to manipulate all 31 bits we would simply need to be
sure that the garabage collector doesn't get called inbetween times.

I would further implement the stacks(RETURN, DATA) so that they grow
towards each other, and allocated on the heap.  This would allow
multiple threads, and threads that are asynchronous could simply
allocate their own subheap on the heap.  This has all the
possibilities for recursive heaps someone else mentioned.

This doesn't do a lot for native compilation, or ultimate efficiency
but it is workable for a boot strap implementation.

For real efficiency on non-forth hardware the only real possibility is
native compilation.  Anton Ertl ( I think that's his name) on
comp.lang.forth has done some very good research into native
compilation of forth.  Combinding his work with the work on compiling
lisp, scheme, ML, would really be the best approach.  But not
immediately :)  

</commentary>

> 2. Lisp-style cells, a pointer w/tag, and a 32-bit data element
>    Pros:
>      * hl things would be easy to implement
>      * doesn't lose any bits from pointers and ints :-)
>      * don't need to worry much about fragmenting the heap
>      * the stack could be implemented as a list on the heap
>    Cons:
>      * costs twice as much memory for each cell
>      * probably quite inefficient
> 

<commentary>
I think those extra 31 unused bits would get very tempting to use for
something.  I can imagine all kinds of hacks we could regret later.
Also I don't quite see how that would bye us much but one bit at the
expense of 31 for implemtation purposes.
</commentary>

> 3. traditional Forth, with a conservative collector
>    Pros:
>      * would be _very_ easy to implement
>    Cons:
>      * conserative collector - need I say more?
> 
> 4. stack frames, with types tagged in the frame
>    Pros:
>      * supposedly would allow easier identifying of types on the stack
>    Cons:
>      * needs type tags copied into the heap separately
>      * breaks Forth concepts entirely
> 
> 

5. Native code compilation with a type smart compiler.
   Pros:
     * Very efficient.
     * Runs fast.
   Cons:
     * hard to implement
     * compiles slow ( likely faster then C though)

<commentary>
We really need to get to some version of 5 in the future.

What would be ideal would be to save enough information in word
headers so we can native code compile words after we have Forth
compiled them.
</commentary>

------------------------------
The real challenge to get right will be the implementation of the
data space dictionary manipulating words so they work alright in a
garbage collected setting.  The fact from ANS forth that dictionaries
can be broken into many pieces data, code, etc should help a lot.  The
challenge is to tell where garbage collected blocks start and end.

One bullet that shouldn't be too hard to bite is pointers into the
middle of objects.  I would simply keep a lookaside table with the
length and location of objects, sorted by location (for binary
search), and stack allocate the things.


> Of these, I think 1 and 2 are our best bets.
> 
> *utsl*