types and operators
Nathan Hawkins
utsl@one.net
Thu, 25 Apr 1996 21:14:04 -0400 (EDT)
On Thu, 25 Apr 1996, Eric W. Biederman wrote:
> C. strings
> I favour using counted strings, and providing a good set of
> operators to work on them. We will need to mark strings so that the
> garbage collector knows to leave them alone.
>
> I'm not sure why the gabage collector needs to leave these allone.
> Except that it needn't look for pointers in them. But certainly if
> it can be proved that the string can't be used the GC should reclaim it?
I'm sorry, what I meant was that the GC shouldn't bother search strings
for pointers. Therefore, strings must be marked so the GC knows what they
are. QED.
> II. Optional types
> A. buffers
> Buffers are very close to strings, but might concievably need
> different treatment. Mostly, there will need to mark memory objects which
> the GC should not access, and operators that can access them.
>
> I'm confused like I am for strings. What we probably need to support
> is something to generate arrays, and a character data type.
I knew I was forgetting something... A character data type and a string
are not quite the same, although closely related.
> There will need to be some thought put into stack operations. I was
> planning on keeping the standard Forth ones. Any comments on this?
>
> Sounds good. Except the words like pick and roll that don't have a
> fixed number of arguments in the usual sense. The rest swap, drop,
> dup etc, I can do.
>
> I have figured out how to do type-safe operators (compile-time
> type-safe) and identification of all pointers at run-time with no run
> time cost. What it does require though is a restriction on how words
> can manipulate the data stack. That is words must have a fixed
> number of arguments and a fixed number of results.
This restriction is would get obnoxious in almost no time. We're talking
about writing a very extensible language which can be expanded to handle
doing a lot of things dynamically. In your design, I couldn't even write
a function like printf! Not to mention breaking the separation between
the data and return stack. I'd rather stay closer to Forth.
> The reason I make the above restriction is as follows. If words are
> to use a single stack as C does they can be implemented as allocating
> frames upon the stack. The frame holding the funcions arguments, it's
> return locate, the address of the last words frame, the local
> variables, a buffer to receive results from functions, and most
> important a tag that tells what everthing is in the frame.
Well, in Forth, we don't use local variables, anyway. The restriction
that all types of arguments and return values would force us to emulate
C++ templates when we want to create functions that accept and return
variable types: create a function for each permutation that might be used.
Sorry, but this design is much better suited to a traditional procedural
language with multiple return values.
> As well as allowing garbage collection, and definite identification of
> things, it will also allows, some of the trickier things with lambda
> functions where they same their return stack. I'm really not certain
> what kind of discipline is required of functions that get returned
> through multiple times to keep it all consistent. (We can cross that
> bridge later). Also this enables all kinds of traditional
> optimsations that C can use, and possibly allows using a hacked
> version of GCC as our code generator.
Not really, C would not be very friendly about having multiple return
values, especially since C on the x86 returns values in EAX. As for
hacking GCC for a code generator: you're out of your mind. Have you taken
a close look at the source to GCC? It'd be better (and probably almost as
fast :) to write the code generator in a combination of Awk, Sed, Perl and
Icon, with a big shell script piping things all over the place....
While you're at it, you might as well build a front end for ADA as well.
> Things only need to be orderly to present to the garbage collector
> when a function calls allocates memory from the heap or calls some
> function that might. So in general at function call time.
Maybe not even that often. We could probably get by with having the GC
run only at programmed points, which would allow us to simply make sure
that nothing significant is on the stack at those points. This is
consistant with the cooperative multitasking model Fare has spoken of.
> I have implemented a small test interpreter in C that makes most of
> this work. What it uses to make this work is it's own data stack (as
> nothing else has one.) All items placed upon this data stack are
> tagged with their type. This allows the interpreter to check and make
> certain that the correct types are passed to a function. After the
> sanity check the intrpreter gathers all of the arguments off of the
> data stack, passes them to the word, then when the word returns
> gathers the results back up (looks up their types) and pushes them
> back upon the data stack. A compiler should be able to avoid a lot of
> this work, and generate really good code.
>
> I hope this is a framework you can live with, as better eludes me.
I think I'd rather put up with type munging. :( :( :( You see, I'd like
to remain as close to Forth in design as is possible, but with the
needed extentions. (As well as dumping some minor stupidity in the Forth
I/O words.) But I do want it be close enough that most ANS Forth would
run, provided it obeys the rules of ANS Forth.
Your design, while neatly avoiding the problem of finding pointers,
breaks an unacceptably large number of things. (A good solution, all the
same, just more suited to building a C over Forth.)
In general, I'd like to stay away from stack frames. I'd prefer to not
have those at all, or let the hll use them if it wants. But not in the
lll, or its calling convention. Those are just my preferences, though.
We'll have to see what other people think.
*utsl*