Memory Management, Inlining, and POINTERS (was: Joy, Dolphin, ...)
Tue, 28 Mar 2000 06:22:03 -0800 (PST)

> >Anyhow, I try to avoid "uncons"; to me, it seems similar to "undo".
> No it's not -- 
>   uncons == dup first swap rest.
> 'first' has the same problem as 'uncons', and it's clearly necessary.

I try to avoid "first" and "rest"; to me, they seem similar to "undo"
(moreover; they have a popping effect, which is not good).

"first" and "rest" are not really needed if lists are kept in unwrapped
form (which is what I would do in many cases). To get the effect of
"rest", one could just use "pop" to pop the top item off;
to get the effect of "rest", one would use a bunch of dipped pops.
Of course, one would not usually do this... if one is popping things
off, then why were they put there in the first place?

> >Ahh... looking at Dolphin, I guess you're writing this in C and
> >maybe some assembly.
> Nope.  We started in Pygmy Forth, and we're moving to gForth; we'll
> eventually metacompile it.  There is some assembly, of course, but we plan
> on keeping that to a minimum.

There must be new unreleased versions of Dolphin floating around
I guess ...

> >Interesting... One question... how are you planning on 
> >implementing lists?
> I'm not going to -- this Forth doesn't have dynamic memory management.  You
> might say that the purpose of this is to prove whether or not Forth can be
> adapted to work like Joy.

This is wacko... no lists? Or maybe I should have said "array"; you
are going to have arrays, right? How are you going to represent
strings of characters in the system? Ahh, I see; you're just going
to keep the characters in unwrapped form, directly on the stack, right?

> >I'm guessing you're going to supply special primitives for lists,
> >and distinguish them from quoted programs (unlike Joy)...
> You might say that.  On the other hand, my quotations are going to be
> treated the same as an array...

Once again, I should have said "array" instead of "list"...
So, like Joy, you're identifying quoted programs with arrays
(except that you're going to keep the whole array together instead
of using a pathetic pointer-chain technique), right?
> >But then,
> >the question is: are you going to keep the lists all in one piece,
> >or are you going to split them up, and link the pieces with pointers?
> >This seems like quite an important thing, as lists I'm guessing
> >will play quite an important role in the system, and their efficiency
> >will largely determine the efficiency of the whole system...
> Nope.  They won't exist.

Aack... Again, I should have said "array"...

> >Another question is: will lists be kept directly on the stack, or
> >will only pointers to them (off in the heap somewhere) be kept?
> Eventually, everything will be kept in memory.

Hrmm... this tells me nothing.
Of course everything will be kept in memory.
Where else would it be kept?

My question, rephrased:

  Will arrays and quoted programs be kept directly on the stack?

It looks like the answer is "no", I'm guessing...

> >The more I think about it, the more I like the idea of just keeping
> >everything directly on the stack; of course, a problem that arises
> >if one does this literally is that it becomes very expensive to
> >shuffle things around on the stack... I'm not sure how serious this
> >might be.
> Also, you can't cache the top of the stack in a register.  

Not so. Of course you can still cache the top few stack items in registers.

> I have really no idea why you like the idea; it doesn't even
> make sense to me.

Here is why I like the idea: it looks like it could be FAST.
For example, when one needs 4KB of scratch space for a sound buffer, 
one could just subtract 4K from ESP, and you have
allocated memory in a very speedy way (a segfault will be generated
and caught if the stack has overflowed). Actually, one would rarely
(if ever) just subtract ESP like that; rather, one would just use
the program that fills up the buffer, and it would
decrement the ESP as it goes along. Anyhow, the point
is that one can avoid the overhead of memory allocation (for such
a large buffer, this overhead might not be significant; but if
one is working with many small lists, then it could be significant).

Another reason why I like it is that it avoids the need for pointers
(well, probably not entirely)... pointers are bad; I don't like them
at all. Whenever you use a pointer, you can't move the thing without
updating all the pointers pointing to it. That's the problem with
traditional memory management systems (well, my C system, at least):
you always end up with holes all over the place because things can't
move around. 

In my system, I would be much less hesitant to move something;
Moving things prevents useless holes from exhausting all your memory.
Keeping things in order can also make caching work better; 
I hate hearing my hard drive seek all over the place because of
a lame memory management system that was afraid to move things.

Let's face it; pointers are lame. They are slow and must be avoided
at all costs. Well, not quite at all costs... They will probably need
to be used in a system. But, just a little bit. Not a pointer
for every single list in the system. That's too much. Waste.
Think of all the space one would waste. That's four bytes for
every list. Think of all the time wasted traversing the pointers.
Especially if you have enough data that you're keeping it on
hard disks; time wasted seeking way off to never-never-land because
the memory management was too lame to keep the data in the right
place and decided to use a pointer instead.

Of course, I haven't implemented a system that avoids use of pointers
as I describe, so I can't seriously claim that it would work.

> >Also, it is interesting that you think of using "CALL"... I'm inclined
> >to always inline everything, but I guess this might not be practical
> >in some cases...
> It's impossible (recursive programs) 

Surprise. It is not impossible. It is not uncomputable. Non-trivial yes,
but not even really too hard. At worst it would require a
lambda abstraction. But actually, the recursion will probably
already be explicitly written by the programmer in a non-recursive
way, so the system will not need to do anything at all.
But for curiousity, rather than leave you hanging, I'll give
you a specific algorithm whereby you could easily eliminate
explicit recursion; if you have a "foo" defined recursively as

  foo == Z

(where "Z" may contain "foo"), then you can rewrite the definition as

  foo == [[dup i] cons (foo\ Z)] dup i

Where "(foo\ Z)" is the abstraction of "foo" out of the expression "Z";
this could be done perhaps using this cheap algorithm:

  foo\ foo   =>   i
  foo\ C     =>   pop C

  foo\ F C   =>   (foo\ F) C
  foo\ C F   =>   [C] dip (foo\ F)
  foo\ F G   =>   [foo\ F] sip (foo\ G)
  foo\ [F]   =>   [foo\ F] cons

I won't give you a real proof that this works, but I can give you
example; the recursive definition of "spin":

  spin == "hello" put spin

The first step would be to abstract out "spin" from the right-hand
side; if you use my cheap algorithm, this will give

  ["hello" put] dip i

but this is not the final answer of course; we now must plug this
into the formula, which gives:

  [[dup i] cons ["hello" put] dip i] dup i

It is not the best way, in this case, but it works. Once again,
in a real system, one would not need to do this, because the
programmer will give the program explicitly in a non-recursive way
to begin with.

Now, you ask, what is the point of doing this? The point of doing
this is that it avoid using "CALL". "CALL" should be avoided because
it takes a POINTER as an immediate!

> and wasteful (cache locality inside loops).

This may be a valid point. If you use the same program over and over
again so many times at different parts of a loop that a significant
amount of space is wasted inlining the program, then a loss may
be suffered because the whole loop cannot fit in the machine's
code cache (thus requiring it to be fetched again and again each
time through the loop). In this situation, it may be a good time
to resort to a pointer and "CALL".

A similar situation could happen if one particular thing is duplicated a
whole lot on the data stack, causing cache problems because of the
space taken by physical copying; a similar solution again would be to use
copying-by-reference to save space. However, a system should use
this kind of solution (a solution involving pointers) sparingly.

It should be noted again that using "CALL" instead of inlining is a
form of copying-by-reference, and will need to supplemented with
reference counting or some other form of garbage-collection to
reclaim the code once everybody is through with it, unless
maybe you're dealing with static kernel code, which you know
will be needed forever.

Usually, copying-by-reference causes slowness, for these reasons:

  1) the slowness of indirection
  2) the slowness of reference-counting
  3) the slowness of seeking all the way to the other end of the
     hard drive (or to another computer, in a distributed system)
     to get something because a local copy was not made,
     but copying-by-reference was used instead.

But in some situations I can see how copying-by-reference would still
be appropriate, in situations where one is dealing with many copies
of very large things; but these are fairly rare situations, I suspect.

> >If there are some large programs that are used so many times that
> >for storage (and caching) purposes one does not want to inline them,
> >then this seems like a special case of a more general problem
> >of things being copied so much that one wants to use 
> >copying-by-reference
> >rather than physical copying... if possible, it should be handled
> >by the same mechanism...
> Why?

Surely you like the idea of handling similar problems with one
general solution, rather than inventing similar but artificially
separate solutions for each problem.

> >Now, the point I wanted to get at is that it would be silly if the
> >system implemented this by physically making a copy of the pixels;
> >this is because "refresh" probably does not really need to write over
> >the pixels; it probably just reads them, incrementing the stack pointer
> >as it goes along. The pixels could probably be restored at the end
> >just by backing up the stack pointer. However, some care might need
> >to be taken to make this work; "refresh" might naturally use some of
> >the stack as a scratch space, effectively destroying previous pixels...
> >To make the technique work, one would have to make "refresh" use
> >somewhere else as scratch space (probably it would just use 
> >the original
> >top of the stack, behind the pixel data)...
> Or you could just depend on the fact that the list/array of pixels is in
> memory, so a 'dup' only duplicates an address.

No! That would involve a pointer! Slow slow. 
There's no need to bring pointers into this.

> At some point it seems that we want to do what the programmer asks us to do,
> instead of something radically different.

I don't see it as a problem to optimize away a "dup" even if the
programmer asked for the "dup"; it certainly doesn't seem any better
to use a pointer; the programmer surely didn't order that.

> >One other thing... it seems like you are thinking of keeping quotations
> >in uncompiled form at runtime (with a compiled form attatched to it,
> >perhaps)... It seems possible to me that one could always keep 
> >quotations
> >in compiled form, if one is willing to give up "uncons" and the other
> >introspecting primitives...
> This is true, but what would that gain me?

It could gain you speed in many situations, to keep programs in
compiled form, compiling them incrementally as you "cons",
rather than compiling them in a slower way at the last second right
before you execute. Note that I'm not arguing against keeping
uncompiled forms; I'm only suggesting that you might want
to always keep compiled forms also.

> >The essential primitive then for forming
> >new programs at runtime is "cons", which takes a base program "p", and
> >another thing off the stack "x", and yields a modified program that is
> >like the base program except that when run, the "x" is pushed before
> >"p" is run... It seems conceivable that "cons" could be implemented
> >dealing only with compiled programs, although if one was going to
> >use such a constructed program many many times, then it would pay
> >off to keep an uncompiled version available so one could optimize it;
> >however, if one was going to use it, say, one time, then it would
> >pay off not to waste time optimizing it.
> That's the whole purpose of keeping quotations in "array of pointers to
> function structs" form.  An array of function structs is less efficient to
> execute than a stream of native code (although not disasterously less so,
> since the appropriate interpreter is simply a loop containing an indirect
> call), but it contains (or points to) information which can be used to count
> executions, profile execution times, and so on.

Whoa... a whole array of POINTERS? indirect calls? slow slow slow.
seriously, keeping the code in uncompiled form and interpreting
from that, as you say, may not be a disasterously bad choice.

> So the first time you 'unquote' a program (that is, run it) you can just
> 'interpret' the list; the second time, you can compile it and run the
> result, and the third and fourth time you just run the compiled result.  Any
> more runs trigger a bigger, fancier optimiser...  And so on.

Sounds like a good idea... 

> >- "iepos" (Brent Kerby)
> -Billy

I finally see that Pointers are the foul things I've
been trying to avoid in the back of my mind all along.

But, some pointers will need to be used in the system.
If nothing else, there's ESP and EIP. But, other pointers
will be needed too, if we want to be practical. But,
it must be controlled. No more pointers than we need.
All pointers that are used must be carefully kept track of,
so that things can still move around. 

- "iepos" (Brent Kerby)