types and operators

Francois-Rene Rideau rideau@ens.fr
Mon, 17 Jun 1996 10:59:27 +0200 (MET DST)


>>>>:utsl
>>>:Eric
>>:utsl
>:Eric

>> Sorry, I meant to say a favoured option one when I wrote this stuff.
>>
I also favour it as a first implementation.
Again, I favour a modular design that would allow us to change such
a setting as easily as possible...


[a summary of the possible designs for implementing pointer/int distinctions]

>>>> 1. a small tag within a 32-bit int/pointer
>>>>    Pros:
>>>>      * probably comes closest to normal Forth
>>>>      * a cell on the stack is the same as a cell on the heap
>>>>    Cons:
>>>>      * loses bits from every pointer and int :-(
>>>>      * requires overhead in operators
>              does not require overhead in operators.
>            One bit lost from pointers is trivial, 2 bits can be billed
>            to alignment
>            One bit lost for integers is also almost trivial.
>
Yeah. As Eric demonstrates (excerpt cut),
using an odd offset for all pointers is cheap,
and memory access into the heap most often involves an offset anyway.
Plus having ints as shifted values without an offset
allows int+int and/or pointer+int addition to be trivially implemented.
There still can be longints and far pointers (pointer+offset) as boxed values,
when needed.
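To make the arithmetic concrete, here is a minimal C sketch of such a one-bit tagging scheme. This is my own illustration, not actual LLL code: ints are stored shifted left once (tag bit 0), so int+int is a plain machine add, and int*int costs a single corrective shift.

```c
#include <stdint.h>

/* Hypothetical option-1 encoding (an illustration, not actual LLL code):
 * integers are stored shifted left once, so their low tag bit is 0;
 * pointers would keep an odd low bit.                                   */
typedef uintptr_t cell;

static cell     tag_int(intptr_t n) { return (cell)n << 1; }
static intptr_t untag_int(cell c)   { return (intptr_t)c / 2; } /* exact: c is even */
static int      is_int(cell c)      { return (c & 1u) == 0; }

/* int + int needs no untagging: (a<<1) + (b<<1) == (a+b)<<1. */
static cell add_int_int(cell a, cell b) { return a + b; }

/* int * int needs one untag: (a/2) * b == (A*B)<<1, still tagged. */
static cell mul_int_int(cell a, cell b)
{
    return (cell)(((intptr_t)a / 2) * (intptr_t)b);
}
```

So the common fast paths really carry no per-operation overhead; only multiplication and division pay a shift.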

> Essentially I propose option 3 except we represent the objects
> conservatively and let the collector be more aggressive.
> 
Uh? You just proved that option 1 can be efficient, and now you say
that you propose option 3 (see below) ???


>>> I would further implement the stacks(RETURN, DATA) so that they grow
>>> towards each other, and allocated on the heap.
There are many problems with that:
* firstly, the arbitrary limits it imposes on stack size,
 which you must preallocate. Either you're optimistic and you
 (or at least the thread) crash someday,
 unless you go to a lot of trouble implementing fall-back solutions,
 or you're pessimistic, and you waste a lot of space, without any guarantee.
 And of course, more space wasted means more GC passes, and/or more swapping,
 hence an important loss of time.
* Secondly, such an implementation means that all continuations are of
 the "call me once" type, and that sharing frames is not possible among threads;
 this means we lose the Scheme first-class continuation capability.
 Surely, many Common LISP people argue against it,
 but I think it's a Good Thing to have *both* kinds of continuations,
 using Clean-like uniqueness annotations on types (=linear logic)
 to allow both shared and unshared continuations.
 Because call-multiple continuations can trivially emulate call-once
 continuations, but not the converse, I propose we first implement
 call-multiple continuations; plus it'll make it easier to implement
 a Scheme interpreter on top of the LLL so we can test it, debug it,
 and benchmark it.
* Surely real-time threads should reserve fixed space,
 but this would be done off the heap, unless we have a (costly) real-time GC.
 Perhaps real-time threads could have their own shared real-time heap;
 still, my idea is that guaranteed hard real-time is needlessly costly
 for most normal operations, while necessary for some interrupt-based programs.

>>> This would allow
>>> multiple threads, and threads that are asynchronous could simply
>>> allocate their own subheap on the heap.  This has all the
>>> possibilities for recursive heaps someone else mentioned.
>>>
Sub-heaps are a good idea. But then comes the problem of wasting a register
as a universal index to all accesses (see Win32Forth). A way to avoid that
would be to make most GCs non-moving, except for a daily GC scheduled
when no one is using the computer interactively.


>>> This doesn't do a lot for native compilation, or ultimate efficiency
>>> but it is workable for a boot strap implementation.
>>>
When we don't care at all about efficiency, we should favor simplicity;
here, simplicity would be emulating a stack with the heap,
as we already have the heap anyway.
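A minimal sketch of that simple option, the data stack as a linked list on the heap; malloc/free stand in for heap allocation here (a real version would allocate GC-managed cells):

```c
#include <stdint.h>
#include <stdlib.h>

/* The data stack emulated as a linked list on the heap; malloc/free are
 * placeholders for GC-managed cell allocation.                          */
typedef struct node { uintptr_t value; struct node *next; } node;

static node *push(node *top, uintptr_t v)
{
    node *n = malloc(sizeof *n);
    n->value = v;
    n->next = top;
    return n;
}

static node *pop(node *top, uintptr_t *out)   /* returns the new top */
{
    node *next = top->next;
    *out = top->value;
    free(top);
    return next;
}

/* Push a then b, pop once: a LIFO stack yields b. */
static uintptr_t push_push_pop(uintptr_t a, uintptr_t b)
{
    node *s = push(push(NULL, a), b);
    uintptr_t v, result;
    s = pop(s, &result);
    while (s != NULL) s = pop(s, &v);  /* drain remaining cells */
    return result;
}
```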


>>> For real efficiency on non-forth hardware the only real possibility is
>>> native compilation.  Anton Ertl ( I think that's his name) on
>>> comp.lang.forth has done some very good research into native
>>> compilation of forth.  Combining his work with the work on compiling
>>> lisp, scheme, ML, would really be the best approach.  But not
>>> immediately :)
>>>
Yeah. I have collected pointers to articles and texts about
optimizing compiler techniques for languages like our LLL and HLL,
for use when we're bootstrapped...


>>>> 2. Lisp-style cells, a pointer w/tag, and a 32-bit data element
>>>>    Pros:
>>>>      * hl things would be easy to implement
>>>>      * doesn't lose any bits from pointers and ints :-)
>>>>      * don't need to worry much about fragmenting the heap
>>>>      * the stack could be implemented as a list on the heap
>>>>    Cons:
>>>>      * costs twice as much memory for each cell
>>>>      * probably quite inefficient
>>>> 
>>>
Note:
* whether the stack is implemented as a list on the heap
 is an independent issue.
* Elk (an embedded Scheme interpreter,
 that I chose as the reference platform for Scheme->Tunes development)
 migrated from option 1 to option 2 between release 2.2 and 3.0;
 I don't have benchmarks, but they would be interesting;
 Surely the ability to have full 32-bit values without using bignums
 when talking to C programs was considered important.
* If we're using 64-bit values for cells, we might as well use the
 extra field in pointers as an integer offset for far pointers.
* If we're going down such paths, perhaps we should make everything boxed,
 without a privileged 2-word structure (see option 4 below).

Subnote: when I say "word", I mean the native word size,
 which is 32-bit on the i386, though 16-bit on the 8088;
 16-bit is a halfword to me on the i386,
 much like 32-bit is a halfword on the Alpha, etc.
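For concreteness, an option-2 double-word cell might look like the following C sketch. The names and tag values are my own illustration, not Elk's actual layout; the point is that the extra tag word buys a full-width datum.

```c
#include <stdint.h>

/* Sketch of an option-2 double-word cell: one word of tag, one word of
 * datum, so full 32-bit integers fit without bignums. The names and
 * tag values are illustrative, not Elk's actual representation.        */
enum tag2 { TAG2_INT, TAG2_PTR };

typedef struct cell2 {
    uintptr_t tag;    /* type tag; could also carry GC bits           */
    uintptr_t datum;  /* full-width integer, or an untagged pointer   */
} cell2;

static cell2 make_int2(uintptr_t n) { cell2 c = { TAG2_INT, n }; return c; }
static cell2 make_ptr2(void *p)     { cell2 c = { TAG2_PTR, (uintptr_t)p }; return c; }
```

Note how `make_int2(0xFFFFFFFF)` round-trips without boxing, which is exactly what option 1 cannot offer.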


[About using a double-word implementation]
> That's the problem.  You will use those.  The code we write, or the
> techniques we chose will not resemble those of Forth.
If we choose the GC-formatted double-word as our Forth cell, that's ok.

> If our guide for the LLL is to be Forth it should be essentially Forth.
A double-word cell seems the most straightforward way to implement ANS Forth
on top of a GC anyway (true for option 1, too); fully supporting
ANS Forth with a GC seems difficult to me.
I see how we can have a single-word-celled non-ANS Forth
that efficiently takes advantage of GC,
and a full double-word-celled ANS Forth that runs slowly
on top of the above.
Or an ANS Forth that is required to cooperate so much with the GC
that it can't really be used both as a usual Forth and a GC'ed Forth
(that is, an existing program would run by disabling the GC and
system multithreading, while the GC primitives would allow writing
completely different programs).




> Under
> Option 1 we can write a GCing ANS Forth as fast as any other.  Under
> this plan who knows what we will write but it won't be Forth.  Further
> the compiled code will be implemented quite differently.
>
No no. Again, if we use double-word objects as the basic unit (see Elk 3.0),
they will also be the basic Forth cell, so that's no problem.
Still, I prefer option 1...

> My point is we'll get lost in this implementation and never get
> anywhere as Fare was in M4 and assembly.
>
:( :( :( :(
 #####             ##            #####             ##
#     #           #             #     #           #
#     #          #              #     #          #
 #####   #####   #               #####   #####   #
#     #          #              #     #          #
#     #           #             #     #           #
 #####             ##            #####             ##




>>>> 3. traditional Forth, with a conservative collector
>>>>    Pros:
>>>>      * would be _very_ easy to implement
>>>>    Cons:
>>>>      * conserative collector - need I say more?
>>>>
I have strong feelings against a pointer-conservative GC
***in a persistent environment***, like Tunes will be.
When you're persistent, any uncontrollable memory leak is deadly,
and that kind of conservatism is uncontrollable.
Note that minor GCs could be pointer-conservative and non-moving,
as long as major GCs are fully tracing.


>>>> 4. stack frames, with types tagged in the frame
>>>>    Pros:
>>>>      * supposedly would allow easier identifying of types on the stack
>>>>    Cons:
>>>>      * needs type tags copied into the heap separately
>>>>      * breaks Forth concepts entirely
>
That's the standard way of doing it with C/C++, whether conservative or not.
Hence, the only basic object is the pointer,
and the GC needn't make any distinction among values,
as it will use structure descriptors that hand it
all the pointers, and only pointers.
   This surely breaks some Forth concepts,
but descriptors for the stack layout could be pushed on the stack,
or on yet another stack...
This seems a valid approach when using mid-grained to coarse-grained objects
on the heap, as C/C++ programmers are encouraged to do anyway.
If we're doing Scheme and Forth, though, this surely becomes
an unusual programming style.
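As a sketch of how such frame descriptors could work (hypothetical interface, assuming one bitmap per frame with bit i set iff slot i holds a pointer):

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of option 4: each frame carries a descriptor bitmap; the GC
 * visits exactly the slots whose bit is set, so it sees all pointers
 * and only pointers. Hypothetical interface, not actual TUNES code.   */
typedef void (*visit_fn)(uintptr_t *slot);

static size_t scan_frame(uintptr_t *slots, uint32_t descriptor, visit_fn visit)
{
    size_t visited = 0;
    for (int i = 0; descriptor != 0; i++, descriptor >>= 1)
        if (descriptor & 1u) {
            visit(&slots[i]);   /* this slot holds a pointer */
            visited++;
        }
    return visited;
}

static size_t visit_count;
static void counting_visit(uintptr_t *slot) { (void)slot; visit_count++; }
```

With descriptor 0x5, only slots 0 and 2 are visited; non-pointer slots never reach the collector at all.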


>>> 5. Native code compilation with a type smart compiler.
>>> [...] We really need to get to some version of 5 in the future. [...]
>>[use threaded code as target]
Yeah, you're all right. Looks like we're redoing OCAML.
Let's see if we can do better than them...


>>> ------------------------------
>>> The real challenge to get right will be the implementation of the
>>> data space dictionary manipulating words so they work alright in a
>>> garbage collected setting.  The fact from ANS forth that dictionaries
>>> can be broken into many pieces data, code, etc should help a lot.  The
>>> challenge is to tell where garbage collected blocks start and end.
>> 
>> That's one of the reasons I'm favouring using cons cells. That problem 
>> neatly goes away. The garbage collector would only need go through the 
>> heap, and would not need to be aware of the dictionary.
> 
> The Forth heap, is the Forth dictionary, but it's not as general.
> Let's keep it that way.
>
I'm not sure what you mean. Having the GC apply to code as well as to everything
else (perhaps not in all passes of the GC, at least not in a moving way)
seems to me a great way to get rid of dead code
(like boot code and unused drivers after initialization)
without any extra magic. They say a lot of the Linux footprint
is due to such dead code with associated messages and probing data...
   Anyway, this topic seems fairly independent to me of whether the heap
is made only of cons cells or not. Surely, LISP machines have
dictionaries made only of cons cells, so we could do the same.
I'm not sure we should, but why not?


> As far as , C, ALLOT et al. the dictionary data space words.  They
> simply allow incremental allocation of space of the Forth dictionary.
>
Note that incremental allocation must either be atomic wrt the GC,
or be done in a GNU obstack fashion;
as with GNU obstack, we could have the best of both worlds by
falling back to obstack-like behavior whenever a GC intervenes
in the middle of an allocation.
Note that initialization of arbitrarily large areas (incremental or not)
involves some care not to break real-time requirements from the GC (if any).
This could be implicit in normal "user" wordsets,
and explicit in "system" wordsets.
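A tiny sketch of the obstack-like alternative (hypothetical API, loosely modeled on GNU obstack): the region grows with no GC interaction, and only the finishing step, which is atomic, makes the object visible to the collector.

```c
#include <stddef.h>
#include <string.h>

/* Obstack-style incremental allocation: bytes accumulate in a growing
 * region the collector ignores; finish() is the single atomic commit
 * point after which the GC may scan the object. Illustrative only.     */
typedef struct {
    char   buf[256];   /* fixed backing store, enough for the sketch   */
    size_t used;       /* bytes appended so far (invisible to the GC)  */
    size_t committed;  /* bytes the GC is allowed to scan              */
} growbuf;

static void grow(growbuf *g, const void *data, size_t n)
{
    memcpy(g->buf + g->used, data, n);  /* no GC interaction here */
    g->used += n;
}

static size_t finish(growbuf *g)        /* atomic wrt the GC */
{
    g->committed = g->used;
    return g->committed;
}

/* Build "abc" incrementally, then commit it as one 3-byte object. */
static size_t demo_incremental(void)
{
    growbuf g = { { 0 }, 0, 0 };
    grow(&g, "ab", 2);
    grow(&g, "c", 1);
    return finish(&g);
}
```

A `,` or `ALLOT` could then map onto grow steps, with the commit happening at a well-defined word boundary.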


>>> One bullet that shouldn't be too hard to bite is pointers into the
>>> middle of objects.  I would simply keep a lookaside table with the
>>> length and location of objects, sorted by location (for binary
>>> search), and stack allocate the things.
>> 
>> Again, this also would be irrelevant where cells are individually
>> equivalent.
>> :-) Only "strings" would need this sort of thing. Probably we'd partition 
>> the heap into two sections, one for variable length objects without 
>> pointers, and the other for cons cells. No lookaside table needed.
>> 
> 
> Let's say only vectors need this sort of thing.  And Vectors turn up
> frequently in normal Forth.  Anyhow I meant it's easy, so we shouldn't
> worry about it.
>
I'd rather say that we should use only aligned pointers,
and no infix pointers. This would simplify almost everything.
As for allowing infix pointers in a later implementation,
I'd suggest we organize objects so that a given page holds
similarly sized objects (at least, similarly aligned, rounding sizes up),
so that locating the beginning of an object is just one or two memory
lookups plus a divide. Remember: what's expensive is the memory lookup.
Though nominally efficient, binary trees are slow for what
ought to be atomic operations.
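A sketch of that page-grouping trick in C (the layout and names are hypothetical): each page holds same-sized objects, its header records that size, and recovering an object's base from an infix pointer is a mask, a lookup, and a divide.

```c
#include <stdint.h>

/* Sketch of the page-grouping idea: each page holds same-sized objects
 * and its header records that size, so recovering an object's base from
 * an infix pointer costs one or two lookups plus a divide. The layout
 * and names here are hypothetical.                                     */
#define PAGE_SIZE 4096u

static uintptr_t page_of(uintptr_t p)
{
    return p & ~(uintptr_t)(PAGE_SIZE - 1);
}

/* object_size and first_offset would normally be read from the page
 * header (one memory lookup); they are passed in to keep this small.   */
static uintptr_t object_base(uintptr_t p, uintptr_t object_size,
                             uintptr_t first_offset)
{
    uintptr_t page  = page_of(p);
    uintptr_t delta = p - page - first_offset;  /* offset into object area */
    return page + first_offset + (delta - delta % object_size);
}
```

No search structure is consulted, so the cost stays flat no matter how many objects exist, unlike the binary-tree approach.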



> -----------------------------
> Finally, a New Thought
> 
> I think that the connection between LLL and HLL should be that they
> share a common internal native code compiler Language.  And can call
> each others routines Natively.  The LLL should be for exposing the
> machine as much as possible while making it easy to implement.  The
> HLLs should be for writing cool aps etc. 
> 
I mostly agree, though global optimization of a HLL program
involves HL techniques completely foreign to the LLL.

> If we can tack a GC onto Forth, while remaining ANS compliant (at least
> after we load the ANS compatibility wordsets) we will have achieved
> something I think no other Language has done, and shown how truly
> reflexive forth really is.
> 
My guess is that ANS Forth requires infix pointers,
so that either we have a GC that manages infix pointers (ouch),
or we use far pointers (ouch ouch).

---------------------------
Enough TALK about GC.
jql and I have adapted hbaker's real-time read-barriered GC in C,
but I don't like it after all, because it's either much too costly,
or not so real-time.
   I'm writing a series of GCs in Scheme,
to be translated automatically or manually into Forth and then C/Asm,
which is essentially a matter of transforming prefix syntax into
postfix syntax with proper primitives.
   A first GC (lame stupid flat mark&sweep) is almost finished
[Note to jql: cleanup is longer than I thought]
but needs some cleanup before I am satisfied enough to post.
Further GCs should be easier to write upon this skeleton.

This first series of GCs is such that
* integers == 0 [modulo 2] (shifted once)
  pointers == 1 [modulo 4]
  reserved == 3 [modulo 4] (for forwarding pointers? weak pointers?)
* heap objects are arbitrarily sized (but aligned), with a fixed header
 (1 to 3 words depending on the GC).
* no infix pointers are allowed
* the GC uses a queue of objects to inspect,
 hence every object implies reserving some space in its header
* the very first GC is a dumb mark&sweep
* others may include stop&copy, mark&don't-sweep, or the treadmill
* subsequent GCs would be incremental with a read or write barrier.
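The tag layout above can be written out as C predicates for checking the encoding (just a sketch; the real GCs are in Scheme, to be translated later):

```c
#include <stdint.h>

/* The tag layout described above:
 *   integers == 0 [modulo 2]   (value shifted left once)
 *   pointers == 1 [modulo 4]
 *   reserved == 3 [modulo 4]
 * enc_ptr_word assumes the raw pointer is at least 4-byte aligned.     */
typedef uintptr_t word;

static int is_int_word(word w)      { return (w & 1u) == 0; }
static int is_ptr_word(word w)      { return (w & 3u) == 1; }
static int is_reserved_word(word w) { return (w & 3u) == 3; }

static word enc_int_word(intptr_t n) { return (word)n << 1; }
static word enc_ptr_word(void *p)    { return (word)(uintptr_t)p | 1u; }
```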

A second series would add page-grouping of similarly sized/attributed objects.
A first-generation would be implemented over the page-grouped GC;
the first-generation could be either the

The raw heap could be first generation before such an organized multi-heap,
and a stack would be used on top of that.

As for GC in general, see the GC FAQ:
	http://www.centerline.com/people/chase/GC/GC-faq.html
Particularly interesting links there are Paul Wilson's report and FTP site,
and Henry G. Baker's page.


== Fare' -- rideau@ens.fr -- Franc,ois-Rene' Rideau -- DDa(.ng-Vu~ Ba^n ==
Join the TUNES project for a computing system based on computing freedom !
                TUNES is a Useful, Not Expedient System
URL: "http://www.eleves.ens.fr:8080/home/rideau/Tunes/"