Memory Management, Inlining, and POINTERS (was: Joy, Dolphin, ...)
Thu, 30 Mar 2000 15:24:53 -0800

From: []

>> >Anyhow, I try to avoid "uncons"; to me, it seems similar to "undo".

>> 'first' has the same problem as 'uncons', and it's clearly necessary.

>I try to avoid "first" and "rest"; to me, they seem similar to "undo"
>(moreover; they have a popping effect, which is not good).

>"first" and "rest" are not really needed if lists are kept in unwrapped
>form (which is what I would do in many cases). To get the effect of
>"rest", one could just use "pop" to pop the top item off;
>to get the effect of "rest", one would use a bunch of dipped pops.
>Of course, one would not usually do this... if one is popping things
>off, then why were they put there in the first place?

Um...  Who says they were put there?  You have a reference to a list; you
want to get the first thing in the list.  Now, that list still may exist
(someone else may have it); but now you have the first thing as in
independantly considerable item.

As for using 'pop': that only works if lists are not 'items' which can be
manipulated by themselves, OR are item which require special manipulation.
I wouldn't really enjoy that, but it's up to you.  I do, however, have to
point out that if this is where you go, then you're not actually avoiding
'first' and 'rest'; on the contrary, you're avoiding 'cons'.

>> >Ahh... looking at Dolphin, I guess you're writing this in C and
>> >maybe some assembly.

>> Nope.  We started in Pygmy Forth, and we're moving to gForth; we'll
>> eventually metacompile it.  There is some assembly, of 
>> course, but we plan on keeping that to a minimum.

>There must be new unreleased versions of Dolphin floating around
>I guess ...

There's one version, 0.5, which is nearly complete but which few have seen;
0.6 is currently being coded (by a different coder), and 0.7 is in
development, and is planned to incorporate most, if not all, of the features
involved in 0.7, but take advantage of Forth-based construction for:

 - faster task switching (there's less state in the virtual processor)
 - dynamic optimization

...and perhaps other things.  Who knows.

>> >Interesting... One question... how are you planning on 
>> >implementing lists?

>> I'm not going to -- this Forth doesn't have dynamic memory 
>> management.  You
>> might say that the purpose of this is to prove whether or 
>> not Forth can be
>> adapted to work like Joy.

>This is wacko... no lists? Or maybe I should have said "array"; you
>are going to have arrays, right? How are you going to represent
>strings of characters in the system? Ahh, I see; you're just going
>to keep the characters in unwrapped form, directly on the stack, right?

Grin.  Nope, no lists, and in a general sense no arrays either.  Actually, I
am supporting _buffers_; if you can grab a chunk of memory you can use it to
hold anything except the system stack (you aren't allowed to play with
that).  You could use it to hold an array, and store xts in it; a quotation
will be an ordered collection of xts.  (xt == Execution Token.)

>> >I'm guessing you're going to supply special primitives for lists,
>> >and distinguish them from quoted programs (unlike Joy)...

>> You might say that.  On the other hand, my quotations are going to be
>> treated the same as an array...

>Once again, I should have said "array" instead of "list"...
>So, like Joy, you're identifying quoted programs with arrays
>(except that you're going to keep the whole array together instead
>of using a pathetic pointer-chain technique), right?

Generally correct.  As I said, I'll also provide no array-manipulation
primitives; you're free to access these 'arrays' any way you wish, they're
just buffers in memory.  The only primitive I have which actually cares
about that is 'i', and it's just about certain that 'i' will not be
primitive.  It's trivial to write either way (array or list).

>> >But then,
>> >the question is: are you going to keep the lists all in one piece,
>> >or are you going to split them up, and link the pieces with 
>> >pointers?
>> >This seems like quite an important thing, as lists I'm guessing
>> >will play quite an important role in the system, and their 
>> >efficiency
>> >will largely determine the efficiency of the whole system...

>> Nope.  They won't exist.

>Aack... Again, I should have said "array"...

Even there, I don't expect them to matter quite that much.  Not at first,
that is.

My current plan is to have three layers of code:

1. low-level Forth: written in native x86, implements a simple virtual
2. quotation system: written in low-level Forth.
3. high-level Forth: "outer" interpreter (text), block support (512 byte per
block, direct hard drive access), optimiser subsystem.

If all goes well, other languages may be implementable on top of those --
for example, a Joy-style system.

>> >Another question is: will lists be kept directly on the stack, or
>> >will only pointers to them (off in the heap somewhere) be kept?

>> Eventually, everything will be kept in memory.

>Hrmm... this tells me nothing.
>Of course everything will be kept in memory.
>Where else would it be kept?

On the stack.  You're assuming that the stack is in memory, a false
assumption even on the x86.  I could keep six whole stack items on the x86,
which is a virtual infinity of stack space for machine Forth applications.
Or I could kepe the stack in a seperate segment, completely inaccesable to
the data operations of the VM.

Or I could be using a stack coprocessor.

Actually, though, I'm sure I'll be keeping the stack in memory.

>My question, rephrased:

>  Will arrays and quoted programs be kept directly on the stack?
>It looks like the answer is "no", I'm guessing...

Not unless the user wishes to do so.  I have no objection to them keeping
stuff whereever they want; I'll have no primitive which cares.  The L1 cache
will probably throw a bolt if you keep a quotation on the stack, though --
your 'i' primitive would have to be TERRIBLY sophisticated to handle all of
the swapping it would require.

>> >The more I think about it, the more I like the idea of just keeping
>> >everything directly on the stack; of course, a problem that arises
>> >if one does this literally is that it becomes very expensive to
>> >shuffle things around on the stack... I'm not sure how serious this
>> >might be.

>> Also, you can't cache the top of the stack in a register.  

>Not so. Of course you can still cache the top few stack items 
>in registers.

Not true if the top stack item is a list.

>> I have really no idea why you like the idea; it doesn't even
>> make sense to me.

>Here is why I like the idea: it looks like it could be FAST.

You have proven yourself, in my opinion, to be extremely intelligent, a very
capable debater and mathematically skilled beyond my training.  Thus,
statements like this which appear to me to be contrary to obvious fact,
rather than eliciting my traditional, "you've gotta be full of it," will
instead recieve careful study, possibly followed by the response, "Could you
clarify that?"

Could you clarify that?

>For example, when one needs 4KB of scratch space for a sound buffer, 
>one could just subtract 4K from ESP, and you have
>allocated memory in a very speedy way (a segfault will be generated
>and caught if the stack has overflowed). Actually, one would rarely
>(if ever) just subtract ESP like that; rather, one would just use
>the program that fills up the buffer, and it would
>decrement the ESP as it goes along. Anyhow, the point
>is that one can avoid the overhead of memory allocation (for such
>a large buffer, this overhead might not be significant; but if
>one is working with many small lists, then it could be significant).

Of course.  But one doesn't even have to use the stack for such a thing;
that's what the Forth dictionary is for.  If you use the stack, you've going
to spend your time trying to defeat the stack optimiser, as it attempts to
register-cache the top stack items (which are supposed to be the most
commonly used), but you wind up using effectively random items.

In the Forth dictionary, with a single HERE SWAP ALLOT command sequence,
you've taken an integer stating the size of the buffer you need and returned
a pointer to the buffer.  The buffer isn't going anywhere; it's not memory
managed in any reasonable sense.

>Another reason why I like it is that it avoids the need for pointers
>(well, probably not entirely)... pointers are bad; I don't like them
>at all. Whenever you use a pointer, you can't move the thing without
>updating all the pointers pointing to it. That's the problem with
>traditional memory management systems (well, my C system, at least):
>you always end up with holes all over the place because things can't
>move around. 

This is, unfortunately, illusory; everything has to work with a pointer
eventually, and that pointer has to be updated when you move the things it's
talking about.  If you're restricting yourself to "linear logic" you don't
need pointers (in fact, that's the only way to do without pointers in the
manner you describe); however, if you think about it, you'll realize that if
you're using linear logic with pointers, you also don't need to worry about
moving data around, because you KNOW there's only one pointer, and you can
simply find it and change it.

In your system, you have no real way of knowing that there really wasn't a
pointer, because the nature of computation is such that processors really
_do_ need at least one pointer to data in order to work with it.  Thus, your
data moving code has to include a special case which will refuse to move the
data if there's a live procedure working on it.  That's far too much

BTW, did you know that in Henry Baker's definition of Linear Logic, he
requires "exactly one" pointer per data item?

>Let's face it; pointers are lame. They are slow and must be avoided
>at all costs.

You say this, but your evidence is FAR from decisive.

>Well, not quite at all costs... They will probably need
>to be used in a system. But, just a little bit. Not a pointer
>for every single list in the system. That's too much. Waste.

List node, you mean.

>Think of all the space one would waste. That's four bytes for
>every list. Think of all the time wasted traversing the pointers.
>Especially if you have enough data that you're keeping it on
>hard disks; time wasted seeking way off to never-never-land because
>the memory management was too lame to keep the data in the right
>place and decided to use a pointer instead.

Of course.

>> >Also, it is interesting that you think of using "CALL"... 
>> >I'm inclined
>> >to always inline everything, but I guess this might not be practical
>> >in some cases...

>> It's impossible (recursive programs) 

>Surprise. It is not impossible. It is not uncomputable. 

>  foo == [[dup i] cons (foo\ Z)] dup i

Before we continue, I have to ask whether you realize what the definition of
'i' is.  CALL, right?  Right.  You haven't eliminated recursive CALLs.

>Now, you ask, what is the point of doing this? The point of doing
>this is that it avoid using "CALL". "CALL" should be avoided because
>it takes a POINTER as an immediate!

You keep saying that kind of thing.  It doesn't make sense to me.  I'm going
to delete more of your text below, because it sounds like you're just saying
the same thing over and over again, and I don't understand it.

>Usually, copying-by-reference causes slowness, for these reasons:

>  1) the slowness of indirection
>  2) the slowness of reference-counting
>  3) the slowness of seeking all the way to the other end of the
>     hard drive (or to another computer, in a distributed system)
>     to get something because a local copy was not made,
>     but copying-by-reference was used instead.

>But in some situations I can see how copying-by-reference would still
>be appropriate, in situations where one is dealing with many copies
>of very large things; but these are fairly rare situations, I suspect.

Then try to avoid using references when you actually need copies.  It seems
pretty simple to me -- use references when you can, copy when you must.  The
entire purpose of linear logic is to provide a framework for deciding
exactly that.  The goal is to _avoid_ as many copies as possible.

>> >If there are some large programs that are used so many times that
>> >for storage (and caching) purposes one does not want to inline them,
>> >then this seems like a special case of a more general problem
>> >of things being copied so much that one wants to use 
>> >copying-by-reference
>> >rather than physical copying... if possible, it should be handled
>> >by the same mechanism...

>> Why?

>Surely you like the idea of handling similar problems with one
>general solution, rather than inventing similar but artificially
>separate solutions for each problem.

Yes.  This is why I like references; they work for everything.

>> >Now, the point I wanted to get at is that it would be silly if the
>> >system implemented this by physically making a copy of the pixels;
>> >this is because "refresh" probably does not really need to 
>> >write over
>> >the pixels; it probably just reads them, incrementing the 
>> >stack pointer
>> >as it goes along. The pixels could probably be restored at the end
>> >just by backing up the stack pointer. However, some care might need
>> >to be taken to make this work; "refresh" might naturally use some of
>> >the stack as a scratch space, effectively destroying 
>> >previous pixels...
>> >To make the technique work, one would have to make "refresh" use
>> >somewhere else as scratch space (probably it would just use 
>> >the original
>> >top of the stack, behind the pixel data)...

>> Or you could just depend on the fact that the list/array of 
>> pixels is in
>> memory, so a 'dup' only duplicates an address.

>No! That would involve a pointer! Slow slow. 
>There's no need to bring pointers into this.

But surely you see that in this case, the pointer-based solution is NOT slow
at all, but on the contrary, is FAST?

>> At some point it seems that we want to do what the 
>> programmer asks us to do,
>> instead of something radically different.

>I don't see it as a problem to optimize away a "dup" even if the
>programmer asked for the "dup"; it certainly doesn't seem any better
>to use a pointer; the programmer surely didn't order that.

Not in any logic system I've ever heard of.  It's considered essential to
reasoning to be able to discuss an item without having to have a true copy
of it.

>> >One other thing... it seems like you are thinking of 
>> >keeping quotations
>> >in uncompiled form at runtime (with a compiled form attatched to it,
>> >perhaps)... It seems possible to me that one could always keep 
>> >quotations
>> >in compiled form, if one is willing to give up "uncons" and 
>> >the other
>> >introspecting primitives...

>> This is true, but what would that gain me?

>It could gain you speed in many situations, to keep programs in
>compiled form, compiling them incrementally as you "cons",
>rather than compiling them in a slower way at the last second right
>before you execute. Note that I'm not arguing against keeping
>uncompiled forms; I'm only suggesting that you might want
>to always keep compiled forms also.

This is not a low-level concern, though; it's strictly a matter of caching.
Yes, I might do an initial compilation as part of the loading process; or
then again, I might not.  It's entirely possible that the code will never be
needed, or that it'll be only needed once.

>> >The essential primitive then for forming
>> >new programs at runtime is "cons", which takes a base 
>> >program "p", and
>> >another thing off the stack "x", and yields a modified 
>> >program that is
>> >like the base program except that when run, the "x" is pushed before
>> >"p" is run... It seems conceivable that "cons" could be implemented
>> >dealing only with compiled programs, although if one was going to
>> >use such a constructed program many many times, then it would pay
>> >off to keep an uncompiled version available so one could 
>> >optimize it;
>> >however, if one was going to use it, say, one time, then it would
>> >pay off not to waste time optimizing it.

>> That's the whole purpose of keeping quotations in "array of 
>> pointers to
>> function structs" form.  An array of function structs is 
>> less efficient to
>> execute than a stream of native code (although not 
>> disasterously less so,
>> since the appropriate interpreter is simply a loop 
>> containing an indirect
>> call), but it contains (or points to) information which can 
>> be used to count
>> executions, profile execution times, and so on.

>Whoa... a whole array of POINTERS? indirect calls? slow slow slow.
>seriously, keeping the code in uncompiled form and interpreting
>from that, as you say, may not be a disasterously bad choice.

Utter and obvious prejudice.  You clearly have NO clue what you're talking
about; there have been many benchmarks studying this stuff.

Plus, and this is far more serious, you're completely wasting your time by
optimizing prematurely.  My code is flexible; I haven't rejected or accepted
pointers.  Your code will be bigger than mine, AND less flexible.  It _may_
be faster, since you're claiming that it is, although I doubt it.  It'll
require so much more space, though, than all advantage of your speed will be
utterly lost.

>> So the first time you 'unquote' a program (that is, run it) 
>> you can just
>> 'interpret' the list; the second time, you can compile it and run the
>> result, and the third and fourth time you just run the 
>> compiled result.  Any
>> more runs trigger a bigger, fancier optimiser...  And so on.

>Sounds like a good idea... 

It's the classical dynamic optimization scheme.

>I finally see that Pointers are the foul things I've
>been trying to avoid in the back of my mind all along.


>But, some pointers will need to be used in the system.
>If nothing else, there's ESP and EIP. But, other pointers
>will be needed too, if we want to be practical. But,
>it must be controlled. No more pointers than we need.

Of course.  And no more copies than we need; references are cheaper by far
than copies, so use references whenever possible.

>All pointers that are used must be carefully kept track of,
>so that things can still move around. 

This is true.  Each item should have a single 'owner' slot, which refers
back to its (single) owner; nobody else may EVER have a pointer to it.  When
the item is moved, the owner is updated.  (Note that the owner slot is
actually a pointer, so this isn't strictly true at a low level -- but it's
true anyhow, because the owner slot is different.)

>- "iepos" (Brent Kerby)