Nathan Hawkins
Tue, 2 Apr 1996 20:51:40 -0500 (EST)

On Tue, 2 Apr 1996, Francois-Rene Rideau wrote:
> I have nothing to substract to utsl's mail.
> I'll just add some remarks.

>    These look like very portable conventions,
> that'd allow the code to run OTOP (on top of POSIX).
Yeah, that's why I created them in the first place. They (conveniently) 
work well with gcc. If you look very closly, you'll notice that all 5 of 
those registers are more or less preserved across function calls by gcc. 
So I got much better behaviour calling C, which would obviously help 

>    I think we should try them, though I have many ideas
> about i386 specific optimizations that would make it incompatible with OTOP,
> like putting the environment (your edi stuff)
> in a fixed segment or page whose descriptor would be modified,

Well, I was planning to have only a handful of these fields, probably 
allocated on the heap, but make them thread-unique. These would be the 
minumum variables needed to track the context of the Forth 
compiler/interpreter. I think I had 5 cells in there, which is 20 bytes. 
Shouldn't be a big problem. But yes, I wish I didn't need the register. 
Maybe we can use a pointer at a global address, which contains a value 
equivalent to what I have in edi, and change it with every context 
switch. This would acheive the same essential results, but without using edi.
Good enough?

> which means access to those usually priviledged fields is cheap.
> This would also save that one general purpose register !
>    Also, swapping esi and esp as stack and code pointers
> could be an interesting experience, etc.

I mentioned swapping ebp with some other reg. Why swap esi and esp? I 
can't see what this would do.

>    Finally, because high-level languages may rely on a blazing fast heap,
> some stack may be merged with the heap, with a requirement that at
> PAUSEs, it follow some restrictive layout.
>    You can find some more tricks in the tunes files,
> particularly the doc/draft document and i386 oriented files.

You mean something like alloca? (C function allocating data on stack, 
automatically disappears on exiting function.) This is certainly 
possible. So is allocating stacks inside heaps, but the advantages of 
using guard pages would go away, so this would work best if only used by 
stacks we are sure won't increase beyond a fixed size.

>    Because people often like allocating consecutive pages,
> both physical or linear memory,
> some more elaborate concept might be needed to manage page
> chunks by size.

Well, consecutive physical pages would be needed by DMA, and for video 
buffers, but I didn't want to drasticly complicate the discussion this 
early. Other than that, I don't see much reason for consecutive physical 
Consecutive pages in linear space would be accomplished by some of the 
other functions which would map them somewhere. Naturally consecutive 
linear addresses need not have anything to do with physical addressing.

> > I need to have some kind of heap management,
> > and I'm leaning toward using
> > something more like malloc than the Forth setup.
> > I want to be able to
> > FORGET individual words, as well as whole vocabularies. Normal Forth can't
> > really do either, which I think is stupid.
>    Because the HLL need it altogether,
> I propose there be a garbage collected heap,
> and that such be the standard interface for memory allocation;
> also, several kinds of allocators functions shall be interfaced,
> according to particularities of allocated object:
> are they constant, pointer-free, do they require special GC measures, etc.
> Even though particular implementations
> be not required to provide its full semantics,
> moving collection and all the bells and whistles.
Hmm. That will be complicated. We'll have to work together on that, 
because my experience doesn't cover that. It sounds like you're talking 
about memory objects with various attributes, in addition to size. A 
flags cell might take of this?

> > Less than 32 bit pointers would be a nuisence:
> > either it would limit address space, or it would make accessing
> > characters awkward. Both are very difficult to accept.
>    Well, you need word-alignment for efficiency anyway.

Right, I was planning on dword aligning all heap allocations. :)

> Isolated instring character access is rare enough to suffer some additional
> cost (base+offset),
> while string access won't suffer much from it.
>    Also, only parts of the memory marked as possible pointer suffers
> this coding restriction. And only at PAUSEs and safe-points for GC
> and multithreading. Otherwise, all the usual hacks can be used
> inside a computation, plus many new ones...
>    Sure, I felt uneasy myself with such a restriction.
> But now it looks quite natural to me:
> the advantages of a moving GC far outweight the disadvantages...

Well, I'll try and implement without GC, to get things working. Then, 
once some of this stuff is up and running, we'll have to add it, since I 
don't know enough about how it works, and would rather debug it in a 
working environment, anyway.

> > I could also use some info on modern techniques for implementing GC, but I
> > need examples in a C or similar level. Alternatively, give me a better idea
> > what you'll need from the heap manager functions.
>    For an example in C, perhaps you could see how the various Scheme
> implementations do it.
> See the Tunes page for the address of the Scheme repository.
> There's a GC FAQ being written (sorry, forgot the address;
> ask a web searcher).

Ok. I've downloaded a copy of MIT-Scheme (5M!) but haven't yet had a 
chance to mess with it. I need to take a closer look at that and see how 
that thing works, if you're going to be using Scheme.

> > Stacks:
> > 
[guard pages and multiple stacks]
>    Guard pages are good.
> Again, they show how allocating small or large stuff is different.
> They can also be used for dynamically-grown stacks,
> for flush-on overflow buffers, etc.
Ok. So we use them for large items, but not for small ones? Problem is, 
with stacks you can't easily tell how large they possibly will get, 
making allocation on a heap risky. Stacks implemented with a 4k 
granularity, is admittedly, probably wasteful, but safer. There are, of 
course, times where a 256 byte stack will be enough, but those aren't 
common enough, or as predictable.

>    As for multiple stacks, I suggest that it be *possible* to do everything
> using shared heaps/stacks as a *transient* space between safe points.
> People who like to keep more information are free to allocate their own
> heaps and stacks.

Right, that just means being able to access the lower level memory 
functions which the heaps and stacks sit on top of. Not a problem.
But Forth or Forth-like code, will still need both a data-stack and an 
r-stack to work properly. How those stacks are allocated, Forth won't 
care about.

> > Also, I think it is probably a good idea to require that stacks be protected
> > from other threads, or assume that other threads will not modify them.
>    Yeah of course, but for the above.
Meaning only on page-granular stacks?

> > Level 5: Modules and Vocabularies
> > 
> > Forth-style vocabularies, or any other sort of modular structure should be
> > implemented over heaps.
> > Some questions about modules:
> > How closely linked do we want the modules to be?
> > [problems of efficiency vs possibility of migration]
>    That's a very difficult question, all the more when migration is wanted.
> As I see things, all would be kept abstract by the HLL,
> and an interpreter would manage composition of objects at first.
> That is, objects, including "code" would be structures on the heap,
> and contain little code, but for some optimized routines.

Are you talking about source code, or something like intermediate code?
And this would be HLL standard behaviour?

> Then, as they are composed, objects get optimized,
> and compiled versions are cached, that are "closely linked".
> But because a "loosely linked" version still exists,
> there is no problem with persistency, as it's just a matter
> of discarding cached versions.

What you're talking about sounds like compiled objects are disposable, while
source/intermediate objects would always be retained. This seems to be 
saying that the lowest migratable object is LLL source. Am I correct?

[what level of resources will be available to a thread]
>    Well, some real-time threads would need to "own" such resources anyway.
> There's no need to develop a superduper interface for it, either.

I just wondered about how abstracted the thing will be... I'm getting the 
picture. Apparently, variable levels of access to low level objects. Good.

> > 
> > I'm thinking in terms of using a thread-specific vector to the parser,
> > allowing easier replacement. Hence, a really easy transition to the HLL.
>    I'm not sure about implementing thread-specific data as a structure
> of fixed type. Each thread should have its environment specifically
> adapted to its activity, and common fields should be only the essential
> ones. Perhaps one field could be a link to a dictionary for self-extension.
> But for instance a variable like BASE sure doesn't have its place among
> common thread-specific variables. Instead, procedures to print integers
> with parameter base would take the parameter from the stack,
> and the HLL (perhaps self-extension to the LLL) could perhaps
> eventually implicitly push from a thread-specific variable.

Right. I don't see BASE as being important. (In fact, I was thinking in 
terms of throwing away the complicated multiple base support, and support 
both hex and decimal through separate words.) But I wasn't exactly 
stating I wanted a fixed structure. :) Just that there should be at least 
one, a convenient way to get at it, and a handful of really 
useful/necessary things in it. Of course, careful consideration must go 
into what actually would be included. A dictionary pointer would be 
needed, just to get the whole thing to work. A path (see below) of some 
kind would be nice. Stack base pointers would be helpful. But no, I 
wouldn't include things like >IN. Obsolete Forth things wouldn't be in 
LLL, anyway.

[vocabularies structured like directories]
>    Yes. Path access is so useful it must be possible
> *and* have a simple syntax. Something both FORTH and Scheme lack.

Good. I'm glad you like it.

> > An alternative would be a design I've been told about, involving nestable
> > word definitions, like this [...]
>    Nested definitions require scoping meta definitions require quoting.
> Scheme does it quite right (well, there could be progress),
> and any metasystem should get inspiration from there.

Well, I've looked at it, and it doesn't look like something I want to do 
in LLL. Very complicated to do there. In HLL, it could be "simply" mapped 
to the appropriate words in the LLL, with no serious need to actual nest
words, since scoping would have been resolved already. Am I right?

> > Another language detail, I've been toying with, is compiling words in a
> > buffer, then copying the buffer into a heap object.
>   We sure should make it modular where code is produced by compiling words.

Well, compiling words is needed somewhere, and the standard Forth HERE 
contraption is totally incompatible with a standard heap. Doing it in a 
buffer allows the copied output to be allocated exactly the right size 
object in the heap.

> > Of course,
> > this buffer would need to be separate for each thread.
>    Not at all, it could be done on heap objects that be freed when we're
> done, or on transient space between safe points.

Right, that's just the location of it. I should have said separate for 
each compilation thread, so that if more than one compile is being done, 
nothing gets eaten. Better?

> > require that word definitions be relocatable/PIC.
>    This is a possibility, but other ones should not be discarded
> that easily: function calls, memory references need relocation
> anyway !

True, relocation is a factor, but in threaded code, memory access can be 
relative to heap or some word pointer, anyway. Doing this would 
drastically reduce headaches in copying from a fixed size buffer (large 
enough) to a n-sized buffer containing the finished, dictionary linked 
word. Doing relocation here would be something I'd like to avoid.
 Making compile output PIC, would do this, and I think that may be possible, 
given the esi=word base pointer convention I had stated. Then, the 
compiler can use this pointer for relative addressing within the word, 
and for references outside, we could do an absolute reference. The 
finished word would be movable, but anything it links to would not. 
That's the problem.

Now one way around this, would be to use a table of pointers to words, 
and require that those be used to allow compiled words to be moved within 
the heap. This could be done at some cost, and it might even help the GC, 
since all pointers to words would be either pointers to pointer, or on 
the stack. Doing this in threaded code would be efficient enough to not 
be a problem, since something similar is done already. In compiled x86 
binary, it would be about as elegant as an elephant playing piano, but 
that's normal for x86.

> > This would allow IF to be entered from the command line,
> > which many Forth's don't allow.
>    Yup. Let's make a higher-level FORTH.

Something like that...

> > One weak point: large ALLOT's would be broken by a small buffer,
> > while a large buffer would generally be wasteful.
>    It is a well-known solution to provide different solutions to
> qualitatively different problems: large objects generally have completely
> different GC behavior than small objects, thus must be treated differently.
> Specialized words will be provided for either, as well as perhaps
> a generic multiplexing instance.

As long as Forth compatibility in LLL isn't a major issue, no problem. 
I'm just saying that ALLOT wouldn't be able to be used quite as freely as 
in Forth, being only usable for small amounts of data. 1000 ALLOT would 
probably bomb, unless the compile buffer is capable of dynamic expansion, 
which seems like unnecessary complication.

> > A side issue to consider: could the HLL be reasonably implemented as a
> > close superset of the LLL, such that rather than having a clear two level
> > system, we could allow programs to be written at arbitrary level.
>    That was my original idea.
> However, for cleaner semantics, and not to have to cope with
> early implementation bugs,
> we are currently bootstrapping the HLL from Scheme.

Fine, if this is still the eventual goal.

> > Why build a dichotomy of between the two languages?
> Because, in a first time, it allows independent development of them.
> When things are settled down and stable, they can hopefully be merged,
> unless good reasons appear against it.

We'll just have to try to prevent those "good reasons" from appearing. :-)

> > 
> > Obviously, we must deal with execution threads. But there will need to be
> > some accounting of resource ownership, and a design decision, as to what
> > level this is handled at. The HLL could do this, but I'd like to program in
> > the LLL, and that would screw things up for me. (Personal gripe.)
>    Until the HLL eventually does it, some LLL code must be provided,
> and can be used. Of course, the HLL will hopefully make it far easier,
> while allowing people who really want to to have the same expressivity
> as LLL allows, by having more or less elaborate hooks and plugs
> into the LLL.

Allow calling back up into HLL from LLL, you mean? Good.

[resource ownership by what sort of objects?]
>    I'd rather make things orthogonal:
> there are objects than have access to other objects,
> among which are unique resources.

I was only suggesting that to describe the problem... I assumed you had 
some other idea.

[using segmentation/protection]
>    I've been wondering about the feasability and expediency
> of writing everything in Ring 0.

My experience has been that it doesn't work nicely. The x86 just doesn't 
seem to be designed for it. Everything works, except exception handlers, 
which break badly, and if there is every a stack fault, it will probably 
shut the machine down. So guard pages, and a lot of other VM tricks would 
have to go away. Better idea would be to keep a minimal ring 0 section, 
with very precisely defined tasks, controlled by the rest of the system 
running in ring 3.
Btw, I've tried to do it that way, it just won't work for a featureful 
system. :( :( :( They _do_ call it protected mode.

> > Mixing segments with non-segment pointers would be even worse. And no matter
> > how dynamic the language, dynamically modifying DT's and code to adjust to
> > which type of pointer a given object requires would be a nightmare. At best
> > it's way too optimistic about stupid behaviour in the x86.
>    We could use segments only for objects without moving allocation inside,
> or objects in general where the overhead for GC is small before the
> advantages of the segment stuff.

Still remains the hassle of generating code to deal with it. Best if 
these sorts of objects aren't common. However, code to manipulate the 
DT's is probably necessary.

> > At the same time, it has its own complications. A flat model could 
> > be done with single or multiple address spaces, and with or without page
> > protection.
>    Not sure what you mean.

I'm talking about the difference between a completely linear model, with 
flat segments and a virtual address model. Both use flat segments, but 
not necessarily flat addressing. I favour writing and debugging code to 
handle it, but leaving the choice of whether to use it up to the HLL.

> > Also, on the x86, address spaces can be done either through segments or
> > through page directories. I like page directories much better, but they can
> > be wasteful of memory.
>    And they have larger grain. Perhaps both ways may be provided for
> different purposes.

Probably a good idea, I just have this vendetta against segments. You'll 
get used to that.

> > All of these are things that some part of the system will need to deal with,
> > even though higher level sections may remain ignorant and work fine
> > regardless.
>    As I said on the other mailing list,
> the low-level part of the OS should provide as expedient means as possible
> to achieve operations of well-known semantics.
> It is then up to the high-level part to coordinate them.
> The low-level part should make as little scheduling decisions,
> contain as little heuristics as possible,
> and not overspecify any part of the OS.

True, but some things obviously shouldn't be contained within the view of 
every object. Large portions of the system should neither know nor care 
about segments vs. pages, just memory that they use. Right? So I would 
think, given that you have also the OTOP subproject, that there will be 
at least a few things that can be diffentiated only in LLL, and ignored 
by the HLL, simply because they're different, but not important. This 
would be things genuinely unimportant to the HLL.
Important differences would need to be dealt with, such as the fact that 
OTOP won't be able to remap pages easily like raw x86 code can. But will 
everything fall into that catagory?

>    I have stated many possible optimizations in this message.
> However, the highest priority is to have something simple working,
> even unoptimized (well, not pessimized either).
> It is always time to optimize later,
> when we can add high-level support for low-level tricks at the same time.
> Optimizing at once only makes it more
> difficult for both implementor and user...

True, but it can save time. Also, I'm lazy, and like to do work only 
once, when possible. And although many issues I've asked about aren't 
short term worries, it'll help me to know the needs of the HLL sitting on 
top of LLL, even though I don't need to know exactly how the HLL will work.
If I just write a Forth, it wouldn't help you much, would it? Knowing 
more about the middle of the system (where HLL and LLL meet) will help me 
to know what I need to do in the LLL.