Memory Management, Inlining, and POINTER

btanksley@hifn.com
Thu, 6 Apr 2000 14:25:11 -0700


From: iepos@tunes.org [mailto:iepos@tunes.org]

>But, the technique you're speaking of now, iterating over a whole list,
>one item at a time, seems to involve "uncons". My above objection about
>leaving large lists on the stack does not apply to using "uncons".
>In fact, I would definitely rather use "uncons" to iterate over a
>list than use a pair of "first" and "rest".

But that's all uncons _is_, a pair of 'first' and 'rest'.  uncons == dup
first swap rest.  You want to use the form which best expresses what you're
doing, not the form which expresses some ideology.
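
To make that concrete, here's a throwaway sketch in C (nothing from Joy or
Forth; the cons-cell layout and the names cell/first/rest/uncons are just
illustrative): uncons hands back exactly the two values that 'first' and
'rest' would have produced separately.

  #include <stdio.h>
  #include <stdlib.h>

  /* A throwaway cons-cell list, only to show that "uncons" is nothing
     more than "first" and "rest" packaged into one operation. */
  struct cell { int head; struct cell *tail; };

  struct cell *cons(int head, struct cell *tail) {
      struct cell *c = malloc(sizeof *c);
      c->head = head;
      c->tail = tail;
      return c;
  }

  int          first(struct cell *c) { return c->head; }
  struct cell *rest(struct cell *c)  { return c->tail; }

  /* uncons == dup first swap rest: both halves at once. */
  int uncons(struct cell *c, struct cell **tail_out) {
      *tail_out = rest(c);        /* the "rest" half  */
      return first(c);            /* the "first" half */
  }

  int main(void) {
      struct cell *list = cons(1, cons(2, cons(3, NULL)));
      struct cell *tail;
      int head = uncons(list, &tail);
      printf("first = %d, next = %d\n", head, first(tail));
      return 0;
  }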

>> >Hmm... I don't quite understand. I've never considered avoiding
>> >"cons"; it's one of the more fundamental primitives.

>> Okay, let me explain.  If POP is equivalent to REST, then 
>> the following equality is true:

>> 2 2 pop == 2 2 rest == 2 2 cons rest

>> This means cons is the identity.

>Hmm... I don't follow your logic.
>How did you get this? :

>  2 2 rest == 2 2 cons rest

>I'm quite certain that this is not valid.

You told me that: pop == rest.  Given that assumption, what I wrote is
correct.  Let me rewrite it:

pop == rest;
[2 2] == 2 2;
2 2 pop == [2 2] rest == 2 [2] cons rest.

>There may be problems with my technique of keeping things unwrapped,
>but I don't see this one.

I didn't claim it was a problem; I stated it as a consequence.  You're not
avoiding 'rest' and 'first', as you claim; you're simply using a different
implementation of them.  What you're actually avoiding is cons.

>> It's especially notable that 'rest' is NOT the identity; 
>> you're not avoiding
>> it, but rather changing its name to 'pop'.  This is a 
>> terrible exposure of implementation details.

>Hmm... I suspect that I would rarely use "pop" or "rest"... 

Forth never uses 'rest', since it doesn't use lists.  It uses @ (fetch) and
DROP (pop) almost constantly, though.

>Also, I should clarify one thing... "pop" would not exactly play
>the role of "rest" in all cases. If one was just using a bunch of
>unwrapped things on the stack, then yes, "pop" would play the role of
>"rest"; it would pop the first item off the "list". 

>However, if one was using a bunch of unwrapped things on the stack, 
>followed by a length integer (telling how many there are), 
>then "pop" would
>not cut it, as it would only remove the length. Rather, one
>would probably use "[pop] dip dec" (where "dec" decrements an integer).

That word is called "nip 1-" in Forth, and "ab--b dec" in my little dialect
:-).  But the specific way you implement 'first' isn't what concerns me;
the important part is that using unwrapped stack items isn't avoiding first.
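
For what it's worth, here's a small sketch in C of that counted-run
convention (the array stack and the names nip/dec/drop_head are mine, just
mirroring the words above): the head of the "list" sits directly under the
count, so dropping it really is a nip followed by a decrement.

  #include <stdio.h>

  /* A toy data stack holding a run of unwrapped items topped by a count,
     e.g.  10 20 30 3  for a three-item "list" whose head is 30. */
  int stack[64];
  int sp = 0;

  void push(int x) { stack[sp++] = x; }
  void nip(void)   { stack[sp - 2] = stack[sp - 1]; sp--; }  /* drop 2nd item */
  void dec(void)   { stack[sp - 1]--; }                      /* decrement top */

  /* "[pop] dip dec" / "nip 1-": remove the head of the counted run. */
  void drop_head(void) { nip(); dec(); }

  int main(void) {
      push(10); push(20); push(30); push(3);   /* the "list", then its length   */
      drop_head();                             /* leaves 10 20 and a count of 2 */
      printf("count = %d, head = %d\n", stack[sp - 1], stack[sp - 2]);
      return 0;
  }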

>In practice, I would probably sometimes use a length and sometimes not.

Or, in other words, sometimes you'd use lists and sometimes you'd just use
the stack.  This is very wise; lists can save code, but using them when
they're not appropriate will add code, not save it.

>> >> In the Forth dictionary, with a single HERE SWAP ALLOT 
>> >> command sequence,
>> >> you've taken an integer stating the size of the buffer you 
>> >> need and returned
>> >> a pointer to the buffer.  The buffer isn't going anywhere; 
>> >> it's not memory managed in any reasonable sense.

>> >Okay, I think I see how that works... It looks basically the same
>> >as how one would do it in C... 

>> No, not at all.  I thought I had explained that...  What 
>> happens is that
>> there's a single dictionary pointer, called HERE.  All ALLOT 
>> does is add the amount of space you want to HERE.

>> That's it.  Anything below HERE is not available for scratch 
>> space; anything
>> above is dangerous to play with unless you're some kind of 
>> memory manager.

>Ahh... I think I see now. So a FORTH program has access to one huge
>clump of memory, "HERE", which can be grown (or shrunk?) as needed...

Yes.  HERE isn't the name of the memory, though; it's a variable (often a
register) which points to the _end_ of the memory.  The pool can't be shrunk
arbitrarily, since as I mentioned it's not managed; you should only shrink
it if you're sure that you're the last person who grew it.  This means that
it's most commonly used at compile-time; and something that's done at
compile-time is one less thing to do at runtime.
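
If it helps, here is roughly what that looks like in C terms (the pool size
and the names pool/here_ptr/allot are mine; only the behaviour is Forth's):

  #include <stdio.h>
  #include <stddef.h>

  /* A sketch of the HERE/ALLOT scheme: one pointer to the end of used
     dictionary space, and "allocation" is just bumping it. */
  static unsigned char pool[64 * 1024];     /* the dictionary's data space  */
  static unsigned char *here_ptr = pool;    /* HERE: first unused byte      */

  /* The "HERE SWAP ALLOT" sequence: hand out the current end of the pool,
     then grow the pool by n bytes.  Nothing is tracked, so nothing below
     the new HERE can safely be handed back out of order. */
  void *allot(size_t n) {
      void *buffer = here_ptr;
      here_ptr += n;                        /* all ALLOT does is add to HERE */
      return buffer;
  }

  int main(void) {
      char *scratch = allot(256);           /* like: 256 HERE SWAP ALLOT     */
      snprintf(scratch, 256, "pool used: %ld bytes", (long)(here_ptr - pool));
      printf("%s\n", scratch);
      return 0;
  }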

>Now, this might not be considered bad, but this approach seems to leave
>a lot of the nitty-gritty memory management details to the programmer.

Yes, it does.  This can be a pain, which is why I would certainly want to
provide fully managed memory (later) in addition to full access to
unmanaged memory (now).

As far as leaving stuff to the programmer goes, let me sketch for you a
little "food chain":

1. Concept time
2. Design time
3. Coding time
4. Compile time
5. Run time

I suggest to you that if you have a problem, the best place to solve it is
as high on this list as possible.

1. Solving a problem at concept time means that you don't have to put as
much effort into designing it.  The ideal solution here is "that's not a
problem;" this results in no work at all, and therefore no bugs.  Of course,
let me emphasise that this isn't always appropriate, nor is it always so
simple; usually, solving a problem at this level means you restate the
problem in a different, simpler way.

2. Solving a problem at design time means that your coding will be much
simpler.  Some of the most common examples here are using different
algorithms, laying out the functions into which your code will be broken so
that each function can be used as many times as possible, and structuring
your design so that magic numbers aren't needed.

3. Solving a problem at coding time means less time spent running your
program, as well as fewer bugs which can only be found by running your code.
For example, when you know the result of a computation, just type it in
directly; don't code the computation.

4. Solving a problem at compile time means that you'll be able to directly
and consistently test the results; all the execution happens on your own
machine.  For example, macro expansion is better than runtime evaluation,
and static typechecking is better than runtime typechecking.

5. Finally, the whole purpose of computer software is to solve problems at
runtime.  I won't try to kid you -- you still have to do that.  However, the
less you have to do, the fewer bugs you'll have to find.

>For instance, suppose a programmer really needs two separate buffers
>that may need to grow or shrink over time in unpredictable ways;
>he could do what he needs to by using "HERE", by keeping one buffer
>in the first half of "HERE" and the other buffer in the second half,
>but then he'd probably need to leave some padding in between the two
>buffers, so that if the first one grows the other will not need
>to be repeatedly moved out of the way. But still, if the first
>buffer grows too much, then the second one will have to be bumped
>out of the way (or else the first one will need to be relocated
>after the second), which would mean the programmer would have to
>update his pointers. If a programmer is dealing with too many
>of these kinds of buffers, it could become very difficult to deal with
>(one would probably be compelled to set up a memory-management
>layer separating the program from "HERE").

This is a good example of a problem for which HERE would be inappropriate.
In modern ANSI Forth, you'd probably use the ALLOCATE wordset, which is like
C's malloc, for this purpose; in traditional Forth, and modern Machine
Forth, you'd use the BLOCK buffers, which are static structures used to
buffer the disk (and therefore implement virtual memory).
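
For the two-buffer scenario above, here's a sketch of the malloc-style
answer (in C rather than the ALLOCATE wordset; the struct and the names
buffer/buffer_append are mine): each buffer grows on its own, and no
padding or hand-laid-out pool is involved.

  #include <stdlib.h>
  #include <string.h>
  #include <stdio.h>

  /* Two independently growable buffers, each backed by its own heap
     allocation, which is the role ALLOCATE plays in ANSI Forth. */
  struct buffer { unsigned char *data; size_t len, cap; };

  static void buffer_append(struct buffer *b, const void *src, size_t n) {
      if (b->len + n > b->cap) {
          size_t cap = b->cap ? b->cap * 2 : 64;
          while (cap < b->len + n) cap *= 2;
          b->data = realloc(b->data, cap);   /* error handling omitted in sketch */
          b->cap = cap;
      }
      memcpy(b->data + b->len, src, n);
      b->len += n;
  }

  int main(void) {
      struct buffer a = {0}, b = {0};
      buffer_append(&a, "grows", 5);
      buffer_append(&b, "independently", 13);
      buffer_append(&a, " some more", 10);
      printf("a: %zu bytes, b: %zu bytes\n", a.len, b.len);
      free(a.data);
      free(b.data);
      return 0;
  }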

But the best solution is to restate the problem so that those buffers are
not needed.

As you say, pointers are bad.  I'll agree that they're worse, far worse,
than not using any code at all -- that's the best solution.

>> >I'm going to go out on a limb and use an analogy of questionable
>> >relevance: the human body, in which every cell has a copy of a huge
>> >data structure, the DNA. As well as this setup works, perhaps there
>> >actually is something good about really making copies of 
>> >things instead of just copying references.

>> The cells are very wide-area distributed processors; 
>> communication between
>> them is slow and risky.  Internal communication within the cell is a
>> different story, though; there, RNA is used (short for 
>> Reference Nucleic
>> Acid ;-).  Seriously, RNA is also a copy of DNA, so your 
>> analogy is okay.

>> I like your analogy a lot, but I don't think it's really 
>> appropriate -- you
>> could also consider DNA to be a reference. The original 
>> structure being
>> referenced is the proteins and enzymes encoded by the DNA. 

>This statement doesn't make sense to me; "references" to proteins don't
>really seem like references at all, in the sense that we've 
>been talking
>about them. They are not like the 4-byte (or 8-byte) pointers of our
>processors, but seem more like 20K image or sound files.

>Anyhow, I'm hardly a biology specialist... As far as I know, portions
>of DNA might be copied-by-reference internally within a cell.

They're not; there's no way to reference an object without a shared
reference space.

>But, whether or not DNA contains references or is referenced,
>my point is that it _is_ copied-by-value millions of times when
>cell division happens, suggesting that copying-by-value may be
>of use, even with very large structures.

>But, it doesn't really take an analogy to see this. If one considers
>a real distributed computing system where inter-communication is
>relatively expensive but storage is relatively cheap, then it
>is clear that it is best for individual units to make their own copies.

EXACTLY.  And it's all the more clear that it's unwise to do that when
inter-communication is staggeringly cheap, as it is on a single
task-switching processor.

>> Right.  Linear Logic is a very strict approach, probably the 
>> strictest
>> possible one; reference counting is more flexible but adds more
>> restrictions, and full GC is most flexible but adds time penalties.

>I like reference counting. Linear logic carried to an extreme
>(meaning that no copying-by-reference is allowed) does not seem
>practical, at least not with current machines. 

Probably.
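
As a point of comparison, this is about all that reference counting amounts
to; a minimal sketch in C (the names rc_obj/retain/release are mine): a
count per object, a little work on every copy and every drop, and
reclamation the moment the count hits zero.

  #include <stdlib.h>
  #include <stdio.h>

  /* Compared to the linear, single-pointer discipline, sharing is allowed,
     at the cost of a count per object and work on every retain/release. */
  struct rc_obj {
      int refcount;
      int payload;
  };

  struct rc_obj *rc_new(int payload) {
      struct rc_obj *o = malloc(sizeof *o);
      o->refcount = 1;
      o->payload = payload;
      return o;
  }

  struct rc_obj *retain(struct rc_obj *o) { o->refcount++; return o; }

  void release(struct rc_obj *o) {
      if (--o->refcount == 0)
          free(o);                    /* last reference gone: reclaim now */
  }

  int main(void) {
      struct rc_obj *a = rc_new(42);
      struct rc_obj *b = retain(a);   /* "dup" as a shared reference      */
      release(a);                     /* object survives: b still owns it */
      printf("still alive: %d\n", b->payload);
      release(b);                     /* now it is freed                  */
      return 0;
  }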

>> It's almost certain that the huge move involved in the initial copy
>> would take enough time to dwarf any gain brought about by 
>> having a single additional register to play with.

>Whoa... where did this copying come from? The 'dup' was optimized away,
>remember?

No, I don't remember.  Quite simply, I like doing what the programmer asked;
I think the programmer's smarter than any code I can write.  The programmer
should never have to rely on huge optimizations like that -- especially ones
like that, which require some smarts from the compiler.

>> >  2) Use a data and instruction stack, and keep everything on them,
>> >     and in order, and without copying-by-reference, but keep the
>> >     stacks padded in places where insertions are likely to 
>> >     take place
>> >     or where deletions have taken place. Padding may be removed,
>> >     by moving things around, after a region has cooled off.

>> >     Doing it this way, with padding, makes addressing things 
>> >     on the stack a
>> >     little bit tricky; one would probably want to have an 
>> >     addressing
>> >     table somewhere (probably in the form of a binary tree)
>> >     that tells where things are; there would probably need to be
>> >     a pointer for each padded region. Entries in such a table could
>> >     be cached (in registers) in many situations...

>> >     This approach I think may be quite decent for many purposes.
>> >     But, one prominent shortcoming is that it would go very slow if
>> >     one tried to move around the same region over and over again.

>> I can't see it ever being practical; it involves all of the 
>> shortcomings of
>> a huge nest of pointers and no benefits which I can see.  
>> Same problem with
>> #3.

>Hmm... Well, #2 allows insertion and deletion on the stack to be
>achieved with decent speed (while this was not the case with #1);
>thus one could keep multiple growable buffers on the stack with #2
>in a straightforward way.

You're assuming that the bookkeeping takes zero time and space.  Bad
assumption, I believe.

>As for the problem of the "huge nest of pointers", hopefully the
>nest would not usually grow too large. Padding would only be
>needed at areas where insertions were likely to occur; the system
>could have the policy of combining adjacent regions (removing the
>padding between them) if activity was unlikely between them, thus
>simplifying the "nest".

Lots of smarts in the software there.  Lots of bugs to work out, and lots
more smarts always needed.

>But, if you can think of a better approach, I'm all ears...

Sure.  Don't do that.  Provide a number of simpler mechanisms your
programmer can use to achieve a similar effect -- for example, use a MEMGET
and MEMDROP word pair which allocate and free memory.
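
One way such a MEMGET/MEMDROP pair could be backed is a pool of fixed-size
blocks threaded onto a free list; here's a sketch in C (the fixed block
size and every name here are my simplification of the word pair suggested
above, not an existing wordset):

  #include <stdio.h>

  #define BLOCK_SIZE  64
  #define BLOCK_COUNT 128

  union block {
      union block *next;                    /* link when the block is free   */
      unsigned char bytes[BLOCK_SIZE];      /* payload when it is handed out */
  };

  static union block pool[BLOCK_COUNT];
  static union block *free_list = NULL;
  static int initialized = 0;

  static void init_pool(void) {
      for (int i = 0; i < BLOCK_COUNT; i++) {   /* thread every block */
          pool[i].next = free_list;
          free_list = &pool[i];
      }
      initialized = 1;
  }

  void *memget(void) {                      /* MEMGET: take a block, or NULL */
      if (!initialized) init_pool();
      if (!free_list) return NULL;
      union block *b = free_list;
      free_list = b->next;
      return b->bytes;
  }

  void memdrop(void *p) {                   /* MEMDROP: give the block back  */
      union block *b = (union block *)p;
      b->next = free_list;
      free_list = b;
  }

  int main(void) {
      char *scratch = memget();
      snprintf(scratch, BLOCK_SIZE, "a %d-byte scratch block", BLOCK_SIZE);
      printf("%s\n", scratch);
      memdrop(scratch);
      return 0;
  }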

>> >  4) Use a similar approach to #3, except that when copying a large
>> >     region, do not physically copy it, but instead only modify
>> >     the addressing table so that it appears to have been copied.

>> Doesn't work -- copying implies separate regions.  What if 
>> someone modifies
>> their region?

>Ooops. Duh. That's true.

>Okay... But, there are probably some ways one could adapt the technique
>to make it work. For instance, the system could have the policy of
>putting regions that had been copied by reference in a read-only page,
>such that if it was written to, a fault would occur and trigger an
>actual copy of the region to be modified. If I understand correctly,
>this sort of technique is used to implement fork() in many 
>UNIX systems.

Yes.  That's actually rather obvious; it's an optimization to _any_ copying
solution.
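
The same lazy-copy idea can be sketched at user level, without the page
tables and fault handler: a "copy" just shares the original storage plus a
count, and the real copy happens on the first write.  Everything here
(cow_buf, cow_copy, cow_write) is my naming, and cleanup is omitted.

  #include <stdlib.h>
  #include <string.h>
  #include <stdio.h>

  struct cow_buf {
      unsigned char *data;
      size_t len;
      int *shared;              /* how many cow_bufs point at data */
  };

  struct cow_buf cow_new(size_t len) {
      struct cow_buf b;
      b.data = calloc(len, 1);
      b.len = len;
      b.shared = malloc(sizeof *b.shared);
      *b.shared = 1;
      return b;
  }

  struct cow_buf cow_copy(struct cow_buf *src) {   /* O(1): no bytes move yet */
      (*src->shared)++;
      return *src;
  }

  void cow_write(struct cow_buf *b, size_t i, unsigned char v) {
      if (*b->shared > 1) {                        /* first write to a shared copy */
          unsigned char *fresh = malloc(b->len);
          memcpy(fresh, b->data, b->len);          /* the real copy happens here   */
          (*b->shared)--;
          b->shared = malloc(sizeof *b->shared);
          *b->shared = 1;
          b->data = fresh;
      }
      b->data[i] = v;
  }

  int main(void) {
      struct cow_buf a = cow_new(1024);
      struct cow_buf b = cow_copy(&a);   /* "copied", but still one region   */
      cow_write(&b, 0, 42);              /* now b really gets its own bytes  */
      printf("a[0]=%d b[0]=%d\n", a.data[0], b.data[0]);
      return 0;
  }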

>> >Anyhow, there are surely many variations one could apply to these.
>> >Also, I'm currently looking for a really good way to implement
>> >an addressing table of the kind that these approaches would require.

>> Try linear logic -- just have exactly one pointer for each 
>> item, and a
>> single back-pointer in the item to indicate its owner. 

>This sounds like too much overhead to me if every list is considered
>an "item". However, if the system can clump lots of data as one item,
>then it seems reasonable.

By "list", I'm sure you mean "node".  Yes, I agree.

>There is one interesting thought I had recently of how the
>memory management could be implemented; here is the idea:
>
>  Whenever the system encounters a combinatory instruction such as
>  "dup", "pop", "swap", do not physically perform the operation;
>  instead, just keep a history of such operations that were imagined
>  to have been performed. When an instruction for side-effect purposes
>  needs to physically access some part of the stack, consult 
>  the history to see where it might be found.

>This is a rather raw idea... It would probably need to be adapted if
>it were to be efficient; for instance, a lot of the work could be
>done at compile-time to see where things will be found.

In general, doing this at runtime is trying to be too smart; however, in
specific, it's VERY easy to build an optimiser to do this statically.  And
since nobody ever uses dynamic combinators, it's easy.

>What do you think about it?

It's the core of the optimiser that my friend is building.  Once you have
this information, register allocation and instruction scheduling are
relatively simple.
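
Done statically, the "history" is just a symbolic stack of slot names.
Here's a sketch in C (the three-op instruction set and all names are mine):
shuffle small integers instead of values, then read off where everything
ended up and let the code generator place values accordingly.

  #include <stdio.h>

  enum op { OP_DUP, OP_POP, OP_SWAP };

  int slots[64];   /* symbolic stack: slots[i] names an original value */
  int depth = 0;

  void apply(enum op op) {
      switch (op) {
      case OP_DUP:  slots[depth] = slots[depth - 1]; depth++; break;
      case OP_POP:  depth--; break;
      case OP_SWAP: {
          int t = slots[depth - 1];
          slots[depth - 1] = slots[depth - 2];
          slots[depth - 2] = t;
          break;
      }
      }
  }

  int main(void) {
      /* Pretend three values a b c (named 0 1 2) are on the stack,
         and the program is: swap dup pop. */
      enum op program[] = { OP_SWAP, OP_DUP, OP_POP };
      for (int i = 0; i < 3; i++) slots[depth++] = i;
      for (size_t i = 0; i < sizeof program / sizeof program[0]; i++)
          apply(program[i]);

      /* The "history" is now just this mapping; no runtime shuffling is
         needed if the code generator places values accordingly. */
      for (int i = 0; i < depth; i++)
          printf("final slot %d holds original value %d\n", i, slots[i]);
      return 0;
  }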

>- "iepos" (Brent Kerby)

-Billy