Memory Management, Inlining, and POINTER
Sat, 1 Apr 2000 11:02:56 -0800 (PST)

> >> >If one is popping things
> >> >off, then why were they put there in the first place?
> >> Um...  Who says they were put there?  You have a reference 
> >> to a list; you
> >> want to get the first thing in the list.  Now, that list 
> >> still may exist
> >> (someone else may have it); but now you have the first thing as in
> >> independantly considerable item.
> >This situation could happen, in a system that used references,
> >but it seems like very poor style to me, to leave large lists on
> >the stack (even if you really only leave a reference), when you know
> >you are only going to use a small piece of the list. It seems like it
> >would be wiser to arrange things such that instead of copying 
> >a reference
> >to the whole list earlier, one had just copied the small piece that
> >one really needed a copy of.
> But you really _are_ using the whole list, one item at a time.  This is a
> pretty standard way to iterate over a linked list. 

I'm a bit confused about what we are talking about. I was talking
about "first" and "rest"; when you use these primitives, you drop
a portion of the list (in the case of "first", you drop the whole
list except the first element, and in the case of "rest", you just
drop the first item).

But, the technique you're speaking of now, iterating over a whole list,
one item at a time, seems to involve "uncons". My above objection about
leaving large lists on the stack does not apply to using "uncons".
In fact, I would definitely rather use "uncons" to iterate over a
list than use a pair of "first" and "rest".

But, I recognize that earlier I did express some dislike for "uncons"
also; but, that is something I don't really want to dig up again
(it is closely related to the opacity-of-quotation issue and the
idea of using more than seven things on the stack :-)).

> If you'd wanted just a
> small piece of the list, you would have started with a reference to the
> list, and use that reference to carve the small chunk you wanted.  

I'm a little confused. I've forgotten if the list has other references
to it. If it does not, then the question is: if you are only carving out
a piece of the list, then why was the whole list constructed in the
first place? But, there are surely some situations where this really
is necessary; in particular, it seems reasonable if you didn't know
at compile-time (or didn't know at the time that the whole list was
pushed) which piece exactly would ultimately be needed.

Anyhow, but what if the list does have other references, or if
one is saving the original list as well as carving a chunk... Then,
carving out a piece of the list by just adding to the original
list reference (or by following the links in the list a certain
number of times, if you are using the abominable pointer-chain form),
as you describe, would probably be a good way to do it.

> No need to copy the whole list just to let you carve off a tiny
> chunk of the copy!

Right. I don't think I ever suggested that.

However, it might be reasonable in some cases to make a copy
of the tiny piece you need, rather than getting a reference to it.
In particular, this might be a good idea if there was a possibility
that the remaining references to the big list were going to be
dropped soon, and one did not want to go through the hassle of
ensuring that the little piece be saved from the destruction.

> >Another benefit is that if one only copies the small
> >piece it may become reasonable to copy by value instead of by 
> >reference.
> Circular reasoning -- yes, avoiding pointers does avoid pointers, but that
> doesn't justify avoiding pointers.  :-)

Okay, I'm mixed up now... You are supposing that my statement essentially
ran "avoiding pointers allows us to avoid pointers" (with the
unstated conclusion: thus, avoiding pointers is good). Let's see...
Now, is that what I really said? Hmm... Well, maybe it would help
if I quote in context what I originally said:

  It seems like very poor style to me, to leave large lists on
  the stack (even if you really only leave a reference), when you know
  you are only going to use a small piece of the list. It seems like it
  would be wiser to arrange things such that instead of copying a reference
  to the whole list earlier, one had just copied the small piece that
  one really needed a copy of.

Okay, no circular reasoning so far... In fact, there really hasn't been
any reasoning so far, although I'm guessing that you'd agree with me
that this is a good policy: not leaving data on the stack that you know
you are never going to need. In fact, I honestly cannot remember why
I brought this up, since it seems to be something obvious that you'd
just agree with. Now, with that in mind, here is the rest of what I said:

  The rationale for doing it this way is that it may prevent storage
  from being wasted; if one copied the whole list (possibly by reference) when
  one really only needed a small piece, then one is preventing the
  rest of the list from being garbage-collected even though it is not
  really needed. Another benefit is that if one only copies the small
  piece it may become reasonable to copy by value instead of by reference.

Thus, my last statement should not be interpreted as

  "Avoiding pointers sometimes allows us to avoid pointers"

but instead as

  "Not keeping junk on the stack sometimes allows us to avoid pointers"

Much more sane. The problem is that I did not spell out what I was
referring to by "Another benefit"; I was talking about the benefits
of not keeping things on the stack that you know you no longer need
(in particular, of refraining from making copies of large structures
when you know you only need a small piece of it), but you supposed
that I was talking about another benefit of avoiding pointers.

> >Hmm... I don't quite understand. I've never considered avoiding
> >"cons"; it's one of the more fundamental primitives.
> Okay, let me explain.  If POP is equivalent to REST, then the following
> equality is true:
> 2 2 pop == 2 2 rest == 2 2 cons rest
> This means cons is the identity.

Hmm... I don't follow your logic.
How did you get this? :

  2 2 rest == 2 2 cons rest

I'm quite certain that this is not valid.

There may be problems with my technique of keeping things unwrapped,
but I don't see this one.

> It's especially notable that 'rest' is NOT the identity; you're not avoiding
> it, but rather changing its name to 'pop'.  This is a terrible exposure of
> implementation details.

Hmm... I suspect that I would rarely use "pop" or "rest"... 

Also, I should clarify one thing... "pop" would not exactly play
the role of "rest" in all cases. If one was just using a bunch of
unwrapped things on the stack, then yes, "pop" would play the role of
"rest"; it would pop the first item off the "list". 

However, if one was using a bunch of unwrapped things on the stack, 
followed by a length integer (telling how many there are), then "pop" would
not cut it, as it would only remove the length. Rather, one
would probably use "[pop] dip dec" (where "dec" decrements an integer).

In practice, I would probably sometimes use a length and sometimes not.

> Registers, however, have no resemblance to memory.  They're not indirectly
> addressable or computable, for instance.

Okay, so I'll be sure from now on to interpret "memory" as excluding

> >> >> Also, you can't cache the top of the stack in a register.  
> >> >Not so. Of course you can still cache the top few stack items 
> >> >in registers.
> >> Not true if the top stack item is a list.
> >Ah... but it is still true if, say, the list only has two items in it.
> >In that case, the two items could be kept in two registers, and perhaps
> >the size (2) would be kept in a third register... If the list was
> >too big to fit in the registers, then probably only the first 
> >part of the list would be kept in registers.
> So you can't cache the top item anymore -- you can only cache the first few
> cells of the stack.

Ah.. I see your point. In some cases, you wouldn't be able to cache
the "top few stack items", only the first few pieces of one item.

> >> In the Forth dictionary, with a single HERE SWAP ALLOT 
> >> command sequence,
> >> you've taken an integer stating the size of the buffer you 
> >> need and returned
> >> a pointer to the buffer.  The buffer isn't going anywhere; 
> >> it's not memory managed in any reasonable sense.
> >Okay, I think I see how that works... It looks basically the same
> >as how one would do it in C... 
> No, not at all.  I thought I had explained that...  What happens is that
> there's a single dictionary pointer, called HERE.  All ALLOT does is add the
> amount of space you want to HERE.
> That's it.  Anything below HERE is not available for scratch space; anything
> above is dangerous to play with unless you're some kind of memory manager.

Ahh... I think I see now. So a FORTH program has access to one huge
clump of memory, "HERE", which can be grown (or shrunk?) as needed...

Now, this might not be considered bad, but this approach seems to leave
a lot of the nitty-gritty memory management details to the programmer.
For instance, suppose a programmer really needs two separate buffers
that may need grow or shrink over time in unpredictable ways;
he could do what he needs to by using "HERE", by keeping one buffer
in the first half of "HERE" and the other buffer in the second half,
but then he'd probably need to leave some padding in between the two
buffers, so that if the first one grows the other will not need
to be repeatly moved out of the way. But still, if the first
buffer grows too much, then the second one will have to be bumped
out of the way (or else the first one will need to be relocated
after the second), which would mean the programmer would have to
update his pointers. If a programmer is dealing with too many
of these kind of buffers, it could become very difficult to deal with
(one would probably be compelled to set up a memory-management
layer separating the program from "HERE").
> >Wait. 'i' need not be implemented with CALL. One could move the code
> >from the data stack to the top of the instruction stack.
> Okay, but...
> >If done
> >carelessly, this could be very slow (the straightforward way 
> >to implement
> >it would be to JMP somewhere safe, memcpy from the data stack to the
> >code stack in the just the right place, and then JMP back).
> JMP back to where?  You don't know until runtime -- so at runtime, you need
> to play with a pointer.

True, you probably wouldn't know where to jump back to until runtime.
Actually, you would probably base the information on the current EIP;
so, you would probably want to use CALL to get to somewhere safe
rather than JMP. But, one would subtract a certain amount from the
pushed EIP before returning.

Anyhow; it may sound silly that I've suggested using CALL. But, it
is not CALL specifically that I'm objecting do. My point is that
manipulating code using lots of pointers is just as hazardous as
manipulating any other kind of data using lots of pointers; if you
overuse CALL so that the same code is referenced in many different
places, then you're going to have to have some garbage-collection
mechanism to collect code that becomes unneeded.

My point here is that it is theoretically possible (although
probably not practical) to take a "linear logic" approach to the code
as well as the data; this could be done by ensuring that each piece
of code is referenced only one time, and is destroyed when it is

> >I'm going to go out on a limb and use an analogy of questionable
> >relevance: the human body, in which every cell has a copy of a huge
> >data structure, the DNA. As well as this setup works, perhaps there
> >actually is something good about really making copies of things instead
> >of just copying references.
> The cells are very wide-area distributed processors; communications between
> them is slow and risky.  Internal communications within the cell is a
> different story, though; there, RNA is used (short for Reference Nucleic
> Acid ;-).  Seriously, RNA is also a copy of DNA, so your analogy is okay.
> I like your analogy a lot, but I don't think it's really appropriate -- you
> could also consider DNA to be a reference. The original structure being
> referenced is the proteins and enzymes encoded by the DNA. 

This statement doesn't make sense to me; "references" to proteins don't
really seem like references at all, in the sense that we've been talking
about them. They are not like the 4-byte (or 8-byte) pointers of our
processors, but seem more like 20K image or sound files.

Anyhow, I'm hardly a biology specialist... As far as I know, portions
of DNA might be copied-by-reference internally within a cell.
But, whether or not DNA contains references or is referenced,
my point is that it _is_ copied-by-value millions of times when
cell division happens, suggesting that copying-by-value may be
of use, even with very large structures.

But, it doesn't really take an analogy to see this. If one considers
a real distributed computing system where inter-communication is
relatively expensive but storage is relatively cheap, then it
is clear that it is best for individual units to make their own copies.

> Right.  Linear Logic is a very strict approach, probably the strictest
> possible one; reference counting is more flexible but adds more
> restrictions, and full GC is most flexible but adds time penalties.

I like reference counting. Linear logic carried to an extreme
(meaning that no copying-by-reference is allowed) does not seem
practical, at least not with current machines. 

> >> >> Or you could just depend on the fact that the list/array of 
> >> >> pixels is in
> >> >> memory, so a 'dup' only duplicates an address.
> >> >No! That would involve a pointer! Slow slow. 
> >> >There's no need to bring pointers into this.
> >> But surely you see that in this case, the pointer-based 
> >> solution is NOT slow
> >> at all, but on the contrary, is FAST?
> >Not really. If the pixel buffer is stored directly on the stack, then
> >an extra register may be available, since there is no longer a need
> >for a pointer to be stored to it (or rather, ESP plays the role of
> >the pointer). This could result in a speed increase.
> A single less register does not make anything "slow slow" (your words, not
> mine).  

True. I was exagerrating a little bit :-)

> It's almost certain that the huge move involved in the initial copy
> would take enough time to dwarf any gain brought about by having a single
> additional register to play with.

Whoa... where did this copying come from? The 'dup' was optimized away,
remember? If you were really going to copy, then you're right, it had
better be by reference. Similarly, if you are going to move the
structure around a bunch then it had better be by reference.

> WHOAH!  There's a "not" in there that I've missed every single time I've
> read it!!!!!!  

Glad that's straightened out now...

> >But, I could see how a system could use references extensively while
> >never using copying-by-reference.
> I think that's a good goal.

An interesting goal... but, one which is impractical when carried to
an extreme, I think...

For example, if you have some lists of MP3s, and you have the same
MP3 in several lists, do you really want the system to make physical
copies of it?

In cases of really big structures, copying-by-reference is still
a good thing, I believe.

> >  2) Use a data and instruction stack, and keep everything on them,
> >     and in order, and without copying-by-reference, but keep the
> >     stacks padded in places where insertions are likely to take place
> >     or where deletions have taken place. Padding may be removed,
> >     by moving things around, after a region has cooled off.
> >     Doing it this way, with padding, makes addressing things 
> >     on the stack a
> >     little bit tricky; one would probably want to have an addressing
> >     table somewhere (probably in the form of a binary tree)
> >     that tells where things are; there would probably need to be
> >     a pointer for each padded region. Entries in such a table could
> >     be cached (in registers) in many situations...
> >     This approach I think may be quite decent for many purposes.
> >     But, one prominent shortcoming is that it would go very slow if
> >     one tried to move around the same region over and over again.
> I can't see it ever being practical; it involves all of the shortcomings of
> a huge nest of pointers and no benefits which I can see.  Same problem with
> #3.

Hmm... Well, #2 allows insertion and deletion on the stack to be
achieved with decent speed (while this was not the case with #1);
thus one could keep multiple growable buffers on the stack with #2
in a straightforward way.

As for the problem of the "huge nest of pointers", hopefully the
nest would not usually grow too large. Padding would only be
needed at areas where insertions was likely to occur; the system
could have the policy of combining adjacent regions (removing the
padding between them) if activity was unlikely between them, thus
simplifying the "nest".

But, if you can think of a better approach, I'm all ears...

> >  4) Use a similar approach to #3, except that when copying a large
> >     region, do not physically copy it, but instead only modify
> >     the addressing table so that it appears to have been copied.
> Doesn't work -- copying implies seperate regions.  What if someone modifies
> their region?

Ooops. Duh. That's true.

Okay... But, there are probably some ways one could adapt the technique
to make it work. For instance, the system could have the policy of
putting regions that had been copied by reference in a read-only page,
such that if it was written to, a fault would occur and trigger an
actual copy of the region to be modified. If I understand correctly,
this sort of technique is used to implement fork() in many UNIX systems.

In some cases, the system might be able to foresee that a copy was
not going to be used destructively, and thus could skip over this
> >Anyhow, there are surely many variations one could apply to these.
> >Also, I'm currently looking for a really good way to implement
> >an addressing table of the kind that these approaches would require.
> Try linear logic -- just have exactly one pointer for each item, and a
> single back-pointer in the item to indicate its owner. 

This sounds like too much overhead to me if every list is considered
an "item". However, if the system can clump lots of data as one item,
then it seems reasonable.

> (Of course, not
> everything in the system is an 'item'; integers on the stack, for example;
> and also, data may be moved into a scratch area, such as the Forth
> dictionary, where it can be temporarily handled without having to worry
> about the number of pointers.
> >- "iepos" (Brent Kerby)
> -Billy

There is one interesting thought I had recently of how the
memory management could be implemented; here is the idea:

  Whenever the system encounters a combinatory instruction such as
  "dup", "pop", "swap", do not physically perform the operation;
  instead, just keep a history of such operations that were imagined
  to have been performed. When an instruction for side-effect purposes
  needs to physically access some part of the stack, consult the history
  to see where it might be found.

This is a rather raw idea... It would probably need to be adapted if
it were to be efficient; for instance, a lot of the work could be
done at compile-time to see where things will be found.

>From time to time, the system might physically carry out some combinatory
instructions, after the dust has settled. But, first it might try
to simplify them; one particularly important example of simplification
is that "dup pop" might be eliminated, so that the system, by delaying the
duplication, can in some cases get around having to do it at all.
Another simplification might be to collapse two moves (variations of
bury and dig) into one, if they involve the same region... such
simplifications would likely be done as the system goes along.

Hmm... I'll have to think about this some more... maybe experiment
with it some... It looks very promising though, and is causing me
to start thinking again about the Combinatory Axioms, which would
play an essential role in such a setup.

What do you think about it?

- "iepos" (Brent Kerby)