types and operators

Francois-Rene Rideau rideau@ens.fr
Sat, 27 Apr 1996 22:05:35 +0200 (MET DST)


> On Fri, 26 Apr 1996, Francois-Rene Rideau wrote:
>
>> Let's sum up things from my last post:
>> 
>>    We agree that the GC should primarily know about
>> * one kind of pointer, perhaps used to encode small integers, too.
>> * two kinds of memory blocks: those made of pointers,
>>  and those made of raw data, each block having a known size.
>> 
>> We can add a lot of refinements later, but these are the most important.
>> Strings, buffers, long integers, whatever, will be raw data to the GC.
>>    This may mean that mixing pointers and raw data requires an additional
>> indirection, unless we have mixed blocks; but then, it means additional
>> GC code (not too important, but superfluous for an initial release),
>> and an additional size tag (much more inconvenient) for these blocks;
>> we could have several kinds of blocks, too.
>>
>>    I think we agree on these points. Where there are choices left,
>> our rule has always been "implementor chooses". So if you want your
>> choice against someone else, you'd better implement it first :)
>
> Ok. Well, I've been fishing for ideas to familiarize myself with how to
> implement this stuff. I'm slowly getting some idea how I'd like to do it.
I can help when I'm finished with this WNO article...


> Would someone define stack frame for me?
I see LLL stack frames as heap-allocated records
of a continuation's state.
In a multithreaded system, non-running threads would be stored
as such continuations, which the system preempts or activates at safe points.


> I think of stack frames as what C does to the stack. You only need that if
> you have a single stack.
We need that because we have one "stack" per thread.
Each "frame" is actually a closure describing the current continuation:
what procedure to run with what parameters/context, among which
most probably one continuation (sometimes several, to manage exceptions
or multiple cases).

> But I was thinking we'd have two stacks per
> thread.
This is not incompatible.

> The GC will only need to scan through these stacks for pointers.
Well, the "stacks" could be roots for the GC.
But in a

> It will need to run on a special stack of its own I'd imagine, but that's
> no problem. (That's assuming we can do tell integers and pointers apart
> on the stack.)
Which means either framing, or a uniform standard encoding
for "stack" elements.


>>>> Things only need to be orderly to present to the garbage collector
>>>> when a function allocates memory from the heap or calls some
>>>> function that might.  So in general at function call time.
>>>
>>> Maybe not even that often. We could probably get by with having the GC
>>> run only at programmed points, which would allow us to simply make sure
>>> that nothing significant is on the stack at those points. This is
>>> consistent with the cooperative multitasking model Fare has spoken of.
>> Yes. Any function that allocates a predictable amount of stack space
>> could have this amount preallocated by the calling function,
>> so that calling functions could be required to do so
>> (remember: we're talking about LLL stuff; most non-optimized HLL stuff
>> would indeed be completely framed to allow GC).
>> Anyway, if we require the stack to be made of proper GC-able objects,
>> framing is just a matter of "closing" the stack with its current length,
>> and can be done automatically by the allocation routine that detects the
>> need for a GC.
> 
> "closing" meaning storing the address and current length, then switching
> to a safe stack? That's what I was assuming we'd do. GC-able objects
> means it can detect the difference between a pointer and an integer, right?
> (The terminology you use confuses me sometimes.)
>
That's it.
I know no proper word for "GC-able object".


>>>> I hope this is a framework you can live with, as better eludes me.
>>> I think I'd rather put up with type munging. :( :( :(
>> I think we *must* give types when there are,
>> but not reject code when there isn't a simple one.
>> After all, that's the LLL,
>> and there will be more than enough types in HLL.
>>    Surely, if we can have both
>> universally and existentially quantified types
>> that depend on integers, and accept cases,
>> all FORTH primitives can eventually be well-typed.
>> But that's not a priority.
Don't be afraid of this, Nathan: it's all HLL stuff!

>>    Now for utsl, remember that any frame that might be GC'ed must be clearly
>> tagged to let the GC know things (notably size, etc). Because frames are
>> already tagged with their execution address (well, at least the frame that
>> pointed to them, and from which the GC reached them, had such information),
>> it might be an "optimization" to derive size information from the execution
>> address; but this also means having a way to distinguish frames from
>> normal blocks. Then again, there is no absolute requirement for the derived
>> size to depend *only* on the execution address; the size-deriving code could
>> very well pick size info from the frame. Again, this really seems
>> overly complicated. I'm all for having execution frames be mostly standard
>> frames.
>>    Please also remember that this complication does not apply to
>> code between GCs, whose results are transient and will not last
>> long enough for a safepoint to be required by our conventions.
>> Only calls to "PAUSE" should be specially
>> framed/framable; but then, this precludes growing the frame down afterwards,
>> unless we first copy the frame to the end of the heap,
>> which can be all the more interesting for continuations etc.
>> [Remember that I'd like all non-interrupt-driven tasks to share the same
>> stack/heap/whatever space.]
>
> I think I understood that last paragraph, could you restate the other
> parts, defining some of your terms? I'm particularly confused by what you
> mean by a stack frame.
>
I think Eric would not have kept the size of stack frames in the frame itself,
but associated it with the code pointer that accompanies any execution frame,
which would add at least another indirection, and a lot of hassle
if the code pointer is infix (plus lots and lots of difficulties
when there are variable-size arguments to the framing function).
If there's a precise term you'd like me to explain, please point it out...

> Re: sharing stack/heap/whatever space between all [...] tasks. Having a
> common address space is no problem if that's what you mean. Neither is
> sharing a heap, assuming that allocations are not interruptable. But how
> can you share a stack? This is impossible, as I understand it. Please
> enlighten me. (Unless you mean something other than using the native
> PUSH/POP instructions, and do something more complicated.)
>


>>    As a side note, we still haven't decided how "return addresses"
>> would be encoded. Will we require block-aligned pointers, and have
>> "base+offset" encoding ? Will we allow "infix" code, and go through
>> the hassle of identifying the beginning of a block each time ?
>> The first may be the simplest to implement, and though it is inefficient
>> at run-time, we don't care for the initial implementation. Let's do it
>> and wrap it with enough macros so that the user doesn't see it.
>
> Well, I would answer that with the following question: how hard would it
> be to identify the beginning of a block each time?  Especially since the
> allocation routines could be coded to keep track of blocks. For example, we
> might set up an ordered array of pointers, which can then be searched for
> the largest pointer less than the one we're looking for. This should
> identify the block, and could be done with a bsearch.
>
See below about another infix pointer remark...

> Infix pointers would be somewhat more trouble for the GC, but drastically
> more efficient and convenient to use. Particularly when you consider that 
> we'd have them on the return stack, and would have to convert the IP into 
> a base+offset.
>
   More efficient: that's a matter of compromise between GC time and
non-GC time. If GC takes 2% of the time, but is slowed down five times
by a trick that saves you 5% of non-GC time,
then you'd better keep non-infix pointers.
   Unhappily, there are no figures for such comparisons,
because up to now,
I've never seen any system that was implemented alternatively
with both methods;
moreover, each method involves its own optimizations in user code,
so that a fair comparison should compare HLL code
compiled and properly optimized for each method.
   Personally, I favor having all pointers be block-aligned,
except those pointing to code space, which would be an interval in
the address space. We could even require block-alignment without
base+offset, by framing code so that any return/continuation address
follows a frame; but I believe infix code pointers are better
if they do not overcomplicate the code.

>>    For a longer-term implementation, I propose the latter trick
>> (allow infix pointers), with some page-based hashing:
>> Objects on a same "page" will be required to be of same "GC-type",
>> so by doing simple stuff with the high-bits of a pointer
>> we can determine the action to take to see the beginning of a block
>> by checking a page directory.
> 
> Hmm. Might as well call that a multiple heap implementation.
Yes. Again, that's for a longer-term implementation on mmap()ed systems...

> We could also just put at the beginning of the object, and have the type
> and size of the object stored there.
>
This costs much more in space, particularly for small objects.
Note that it means a one-word tag even for objects
1, 2, or 3 words in length, which are *very* frequent.
I agree that this would simplify many things
in an initial implementation, though.


>> The exact implementation should privilege usual cases,
>> testing them first, and add levels of indirection for unusual cases.
>>    Objects of large size (several pages) would have their own entry.
>> Small objects would be grouped on pages where all objects
>> have the same size.
>
> Buddy system? Never really liked that one much...
>
Why not? It allows much better performance,
because quantitatively different things can be treated differently.
Of course, this should not be exaggerated either,
or the code bloat will explode the caches...


[Infix pointers or not]
> A better method would be to say that for each infix pointer, once we find
> the object referred to, we can determine the type of object from a flag
> stored at the beginning of the object itself. This would be fairly
> trivial to do, if we can do a decent algorithm for quickly discovering
> the block from an infix pointer. How fast can we do this?
>
Very slowly.
   If objects are not grouped by size, with no page-alignment guarantee,
this means (using a balanced btree) ten to twenty cache misses per pointer,
and hundreds of instructions, just to locate the beginning of a block!
   A page-based method would involve basically one to two cache misses
(due to random memory references) per pointer to locate, with perhaps
thirty instructions. Caching frequent cases with early hard-wired tests
could reduce the overhead below one average cache miss per pointer,
and a few average instructions.
   Of course, requiring block-aligned pointers reduces all this
overhead to zero instructions and zero cache misses!
   Remember that even minor GCs need this,
and that they affect performance a lot.

   Again, this should be modular, so we can try several implementations
and let the user choose which behavior to compile into his system.

   Let me recall that eventually, GC will be separated into a hierarchy
of more or less frequent GCs, the more frequent being faster
but less thorough; hopefully, major GCs would be needed only
once per day, and full cross-address-space GCs once a week,
so we can arrange to do them while all human users are off sleeping.


[FORTH style]
> Well, I was hoping to keep the general nature of the language, for 
> instance the RPN notation, dual stacks, : style definitions, some kind of 
> threading, etc. I'd like to keep the basic operations as much as 
> possible, particularly the stack operators, and stack behaviour.
>
> So ANS Forth compatibility could just be a small vocabulary, mapping ANS
> Forth words into LLL words.
>
I think we all agree to that.


> The real issue is the language model: I like the Forth virtual machine, 
> especially the data/return stack part. This allows drasticly more freedom
> for variable arguments and return values. Functions like printf, which
> are a real pain in C, become trivial, and we can even do similar things
> with return values. So the equivalent of scanf might be able to return
> multiple values, instead of needing to have pointers passed to it.
>
Sure, sure.

> Stack frames break this model, AFAIK.
Again, "stack frames" will just be a way to store continuations by
some kind of stack copying; they need exist only at PAUSEs,
and can be created silently by the PAUSE mechanism when no manual action
was taken. Most LLL words will know nothing of them.


>>> In general, I'd like to stay away from stack frames.
>> Unhappily, they are just *required*, not everywhere, but at checkpoints.
>> Maybe the easiest implementation is to require all data on stack to be
>> of valid "pointer/integer" type at checkpoints, so that framing is only
>> knowing the stack length. But even if we do that,
>> I prefer we insert code to allow more constraining implementations,
>> if we want to try that later.
> 
> Please explain. I was assuming that all that would be needed to work on a
> stack would be to have that annoying type distinction and a pointer/size
> to a frozen (i.e. not in use) stack.
>
Yes; now, I would have liked to remove the need for separate physical
address spaces for stacks, and use SML's "stack on heap" thingy.


> What do you mean by inserting code for more constraining implementations?
> Could you give me an example of what would be inserted where?
>
Well, some LLL routines that do some kind of call/cc thing
(to implement exceptions) would be required to call framing macros
to create a continuation (which would basically be a no-op in a
single-threaded implementation without escaping continuations),
and a chaining macro to continue.

Every externally-callable point (for multithreading, callbacks,
exceptions, etc.) should be FRAMEd into a valid continuation.
A common implementation is that during such FRAMEing,
a new stack is created, and the old one is copied into heap space.
When a continuation is resumed, the heap-stored stack is copied
back to stack space, and the program resumes (which can happen
for both stacks).

Basically, you call the primitive FRAME to create a frame,
with perhaps a primitive to copy arguments and reserve local space together;
then you can access data within the FRAME thanks to some primitives,
and you can resume execution from a frame with a primitive named
CONTINUE (because that's how calling continuations is implemented).
Frames would be copied back to the stack when resumed,
but double framing allows the user to choose to use the frame on the heap
(but then, be careful when calling a continuation multiple times).
If this is useful, particularly on a two-stack system,
primitives could be made available to separately frame either stack,
so as to give better control to the LLL programmer.


>>> Those are just my preferences, though.
>>> We'll have to see what other people think.
>> My idea seems to satisfy everyone,
>> but still impose hard constraints on the LLL programmer.
>
> The main constraint I see is the integer/pointer distinction being a
> nuisance. This is mainly a constraint on the primitives, and a major
> headache for the programmer who writes them.
>
Not that much of a headache if the encoding is well chosen.


>> BTW, what grain should we require for PAUSEs ?
>
> I would leave that an open question, or variable, since we don't know how
> often we will need it. Also, it might be nice to be able to let the HLL
> decide this for its code, and let LLL programmers hand-code this in. :-)
>
I agree. What I meant was as an initial informal convention.


>> We agreed that LLL code should PAUSE at least every 2^-n seconds,
>> where n should preferably be one more than for the hardware timer.
>> What should n be? I think 11 to 16 is good.
>> What can the PC RTC do already ?
> 
> Well, I can't answer the rest, but as for the PC RTC, it can generate 2^16 
> interrupts per second, at ruinous cost on a protected mode system.
>
I propose we use a 1024 Hz timer on the PC, much like Linux I believe...


> I'd like to have a model that uses a combination of preemptive and 
> cooperative tasking, in multiple heaps. So threads running in heap 1 can
> be preemptively multitasked with threads in heap 2, but cooperatively
> with respect to each other.
>
I think we all agreed on such a scheme.
Now, whether logically separate "heaps" should be physically separated
as different address spaces, or as user-space vs. system-space address
spaces, is a question left to the implementor.
What I'd do is have interrupt-driven code go in a common system pool,
and give every "user" or "session" its own address space.
The problem is how to achieve cross-address-space code & data sharing,
if it can be done in a way that satisfies the GC,
and how to achieve synchronization when it's needed.


> Then, let GC be done only locally to a given heap.
> This would allow GC to be done with considerably less urgency,
> since we don't halt all threads when we do it.
>
That's also what Patrick proposed earlier.
But as I pointed out, while this may be the "usual" way to do things,
a global, synchronized GC is eventually needed, though it can hopefully
be done when no user is engaged in any interaction it would throttle.


> The only headache this imposes is with cross-heap pointers. The best
> solution to this is to have those be indirected through a pointer array,
> and require that these be used safely, in such a way that the GC may move
> objects and update the indirected cross-heap pointer, without changing
> the other heap.
>
Yes, but this doesn't remove the need for a global GC
to eliminate unrooted cross-address-space cycles,
which could sometimes involve non-migratable objects...


> Please excuse the redundancy and rambling in this message. This whole 
> topic is getting very confusing to me.
>
   Maybe someone (me?) should write an article on how to do the GC,
but I must first finish my WNO article.
   If you have any problem, please begin writing the system
without any actual GC,
but with a few GC constraints (like the pointer/integer stuff):
we can add the GC afterwards, and your code will be a good base
for further collective work...


--    ,                                         ,           _ v    ~  ^  --
-- Fare -- rideau@clipper.ens.fr -- Francois-Rene Rideau -- +)ang-Vu Ban --
--                                      '                   / .          --
Join the TUNES project for a computing system based on computing freedom !
                 TUNES is a Useful, Not Expedient System
WWW page at URL: "http://www.eleves.ens.fr:8080/home/rideau/Tunes/"