Memory Management, Inlining, and POINTER

btanksley@hifn.com
Fri, 31 Mar 2000 15:29:58 -0800


From: iepos@tunes.org [mailto:iepos@tunes.org]

>> >If one is popping things
>> >off, then why were they put there in the first place?

>> Um...  Who says they were put there?  You have a reference to a list;
>> you want to get the first thing in the list.  Now, that list may still
>> exist (someone else may have it); but now you have the first thing as
>> an independently considerable item.

>This situation could happen, in a system that used references,
>but it seems like very poor style to me, to leave large lists on
>the stack (even if you really only leave a reference), when you know
>you are only going to use a small piece of the list. It seems like it
>would be wiser to arrange things such that instead of copying 
>a reference
>to the whole list earlier, one had just copied the small piece that
>one really needed a copy of.

But you really _are_ using the whole list, one item at a time.  This is a
pretty standard way to iterate over a linked list.  If you'd wanted just a
small piece of the list, you would have started with a reference to the
list and used that reference to carve off the small chunk you wanted.  No
need to copy the whole list just to let you carve off a tiny chunk of the
copy!
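
To make that concrete, here's a minimal sketch in C (the struct and the
names are mine, purely for illustration): iterating takes 'first' and
follows 'rest', and the list itself is never copied.

    #include <stdio.h>

    struct cell { int first; struct cell *rest; };

    static void iterate(const struct cell *list) {
        while (list != NULL) {
            int item = list->first;  /* 'first': the item, considered
                                        independently of the list */
            printf("%d\n", item);
            list = list->rest;       /* 'rest': just follow the reference */
        }
    }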

>Another benefit is that if one only copies the small
>piece it may become reasonable to copy by value instead of by 
>reference.

Circular reasoning -- yes, avoiding pointers does avoid pointers, but that
doesn't justify avoiding pointers.  :-)

>> As for using 'pop': that only works if lists are not 'items' which can
>> be manipulated by themselves, OR are items which require special
>> manipulation.  I wouldn't really enjoy that, but it's up to you.  I do,
>> however, have to point out that if this is where you go, then you're
>> not actually avoiding 'first' and 'rest'; on the contrary, you're
>> avoiding 'cons'.

>Hmm... I don't quite understand. I've never considered avoiding
>"cons"; it's one of the more fundamental primitives.

Okay, let me explain.  If POP is equivalent to REST, then the following
equality is true:

2 2 pop == 2 2 rest == 2 2 cons rest

The first step is just your assumption (pop == rest); for the second to
hold, 2 2 must equal 2 2 cons.  In other words, cons is the identity --
which is what happens when pushing an item onto the stack already counts
as consing it onto the list.

It's especially notable that 'rest' is NOT the identity; you're not avoiding
it, but rather renaming it to 'pop'.  That's a terrible exposure of
implementation details.

>This is quite an interesting project...
>Hopefully something cool will come out of it.

Dolphin is a lot of fun.  Thanks for the good wishes.

>> >This is wacko... no lists? Or maybe I should have said "array"; you
>> >are going to have arrays, right? How are you going to represent
>> >strings of characters in the system? Ahh, I see; you're just going
>> >to keep the characters in unwrapped form, directly on the 
>> >stack, right?  (:-))

>> Grin.  Nope, no lists, and in a general sense no arrays either.
>> Actually, I am supporting _buffers_; if you can grab a chunk of memory
>> you can use it to hold anything except the system stack (you aren't
>> allowed to play with that).  You could use it to hold an array, and
>> store xts in it; a quotation will be an ordered collection of xts.
>> (xt == Execution Token.)

>Okay... so you are going to have a program sort of like malloc() which
>can be used to ask the system for a pointer to an available buffer of
>some size, right?

Yes and no.  See my comments on ALLOT, below.

>One question... is the system going to be 
>designed in such
>a way that such allocated memory could be physically moved around
>(for defragmentation or caching purposes) by the system 
>without disturbing the program?

Not the memory allocated by ALLOT, no.  In fact, not in general.

>> My current plan is to have three layers of code:

>> 1. low-level Forth: written in native x86, implements a simple virtual
>>    machine.
>> 2. quotation system: written in low-level Forth.
>> 3. high-level Forth: "outer" interpreter (text), block support (512
>>    bytes per block, direct hard drive access), optimiser subsystem.

>> If all goes well, other languages may be implementable on top of those
>> -- for example, a Joy-style system.

>Sounds like a good plan...

I hope so.

>> >> Eventually, everything will be kept in memory.

>> >Hrmm... this tells me nothing.
>> >Of course everything will be kept in memory.
>> >Where else would it be kept?

>> On the stack.  You're assuming that the stack is in memory, a false
>> assumption even on the x86.  I could keep six whole stack items on the
>> x86, which is a virtual infinity of stack space for machine Forth
>> applications.  Or I could keep the stack in a separate segment,
>> completely inaccessible to the data operations of the VM.

>> Actually, though, I'm sure I'll be keeping the stack in memory.

>My misunderstanding comes from the word "memory"; I was thinking
>of registers and hard disks as forms of memory... but, you answer
>my question below...

The hard drive is a reasonable source of confusion, although my system won't
use implicit virtual memory; it'll be built to avoid that.  (I don't know
whether I'll be able to avoid it forever, but the low-level system will
certainly not have a clue about it.)

Registers, however, bear no resemblance to memory: they're not indirectly
addressable, for instance, and their addresses can't be computed.

>> >> >The more I think about it, the more I like the idea of 
>> >> >just keeping
>> >> >everything directly on the stack; of course, a problem 
>> >> >that arises
>> >> >if one does this literally is that it becomes very expensive to
>> >> >shuffle things around on the stack... I'm not sure how 
>> >> >serious this might be.

>> >> Also, you can't cache the top of the stack in a register.  

>> >Not so. Of course you can still cache the top few stack items 
>> >in registers.

>> Not true if the top stack item is a list.

>Ah... but it is still true if, say, the list only has two items in it.
>In that case, the two items could be kept in two registers, and perhaps
>the size (2) would be kept in a third register... If the list was
>too big to fit in the registers, then probably only the first 
>part of the list would be kept in registers.

So you can't cache the top item anymore -- you can only cache the first few
cells of the stack.

>> >For example, when one needs 4KB of scratch space for a sound buffer,
>> >one could just subtract 4K from ESP, and you have allocated memory in
>> >a very speedy way (a segfault will be generated and caught if the
>> >stack has overflowed). Actually, one would rarely (if ever) just
>> >subtract from ESP like that; rather, one would just use the program
>> >that fills up the buffer, and it would decrement ESP as it goes
>> >along. Anyhow, the point is that one can avoid the overhead of memory
>> >allocation (for such a large buffer, this overhead might not be
>> >significant; but if one is working with many small lists, then it
>> >could be significant).

>> Of course.  But one doesn't even have to use the stack for such a
>> thing; that's what the Forth dictionary is for.  If you use the stack,
>> you're going to spend your time trying to defeat the stack optimiser,
>> as it attempts to register-cache the top stack items (which are
>> supposed to be the most commonly used), but you wind up using
>> effectively random items.

>> In the Forth dictionary, with a single HERE SWAP ALLOT command
>> sequence, you've taken an integer stating the size of the buffer you
>> need and returned a pointer to the buffer.  The buffer isn't going
>> anywhere; it's not memory-managed in any reasonable sense.

>Okay, I think I see how that works... It looks basically the same
>as how one would do it in C... 

No, not at all.  I thought I had explained that...  What happens is that
there's a single dictionary pointer, called HERE.  All ALLOT does is add the
amount of space you want to HERE.

That's it.  Anything below HERE is not available for scratch space; anything
above is dangerous to play with unless you're some kind of memory manager.
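
For comparison with malloc(), here's a minimal sketch in C of what that
amounts to (the names dict, here, and allot are mine, purely for
illustration): allocation is one pointer bump, with no headers, no free
list, and no bookkeeping.

    #include <stddef.h>

    static char  dict[65536];   /* the dictionary space */
    static char *here = dict;   /* HERE: the first free byte */

    /* HERE SWAP ALLOT: hand back the current HERE, then advance it. */
    static void *allot(size_t n) {
        void *buffer = here;    /* the pointer left on the stack */
        here += n;              /* ALLOT: just add n to HERE */
        return buffer;
    }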

>> If you're restricting yourself to "linear logic" you don't need
>> pointers (in fact, that's the only way to do without pointers in the
>> manner you describe); however, if you think about it, you'll realize
>> that if you're using linear logic with pointers, you also don't need
>> to worry about moving data around, because you KNOW there's only one
>> pointer, and you can simply find it and change it.

>Yes. If you know you only have one pointer, then it's easier to
>move things.

>> BTW, did you know that in Henry Baker's definition of Linear 
>> Logic, he requires "exactly one" pointer per data item?

>Hmm.. well I guess we can hope that there aren't too many data 
>items, then.

I suppose so.  Too many items would be a waste -- better to have just enough
items.

>> >> >Also, it is interesting that you think of using "CALL"... 
>> >> >I'm inclined
>> >> >to always inline everything, but I guess this might not 
>> >> >be practical
>> >> >in some cases...

>> >> It's impossible (recursive programs) 

>> >Surprise. It is not impossible. It is not uncomputable. 
>> [...]

>> >  foo == [[dup i] cons (foo\ Z)] dup i

>> Before we continue, I have to ask whether you realize what 
>> the definition of
>> 'i' is.  CALL, right?  Right.  You haven't eliminated 
>> recursive CALLs.

>Wait. 'i' need not be implemented with CALL. One could move the code
>from the data stack to the top of the instruction stack.

Okay, but...

>If done carelessly, this could be very slow (the straightforward way to
>implement it would be to JMP somewhere safe, memcpy from the data stack
>to the code stack in just the right place, and then JMP back).

JMP back to where?  You don't know until runtime -- so at runtime, you need
to play with a pointer.

>> >Usually, copying-by-reference causes slowness, for these reasons:

>> >  1) the slowness of indirection
>> >  2) the slowness of reference-counting
>> >  3) the slowness of seeking all the way to the other end of the
>> >     hard drive (or to another computer, in a distributed system)
>> >     to get something because a local copy was not made,
>> >     but copying-by-reference was used instead.

>> >But in some situations I can see how copying-by-reference 
>> >would still
>> >be appropriate, in situations where one is dealing with many copies
>> >of very large things; but these are fairly rare situations, 
>> >I suspect.

>> Then try to avoid using references when you actually need copies.  It
>> seems pretty simple to me -- use references when you can, copy when
>> you must.  The entire purpose of linear logic is to provide a
>> framework for deciding exactly that.  The goal is to _avoid_ as many
>> copies as possible.

>I'm going to go out on a limb and use an analogy of questionable
>relevance: the human body, in which every cell has a copy of a huge
>data structure, the DNA. As well as this setup works, perhaps there
>actually is something good about really making copies of things instead
>of just copying references.

The cells are very wide-area distributed processors; communication between
them is slow and risky.  Communication within the cell is a different
story, though; there, RNA is used (short for Reference Nucleic Acid ;-).
Seriously, RNA is also a copy of DNA, so your analogy is okay.

I like your analogy a lot, but I don't think it's really appropriate -- you
could also consider DNA itself to be a reference, the original structures
being referenced the proteins and enzymes the DNA encodes.  That's not
enormously accurate, but since nobody has yet suggested HOW to take a
reference to DNA, I don't think it's realistic to ask whether the body
would be more efficient if it used references.

>Back to reality... the issue of when to copy by reference and when
>to copy by value does seem like quite a tricky one.

Right.  Linear Logic is a very strict approach, probably the strictest
possible one; reference counting is more flexible but adds bookkeeping
overhead, and full GC is the most flexible but adds time penalties.
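
As a sketch of that middle option in C (hypothetical names, purely for
illustration): copying a reference is one increment, dropping one is a
decrement, and the storage is reclaimed when the count reaches zero.

    #include <stdlib.h>

    struct object { int refs; /* ...payload... */ };

    static struct object *copy_ref(struct object *o) {
        o->refs++;              /* copying a reference: one increment */
        return o;
    }

    static void drop_ref(struct object *o) {
        if (--o->refs == 0)     /* last reference gone: reclaim */
            free(o);
    }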

>An advantage
>of copy-by-value that I think may often be overlooked is "locality",
>that it is then possible to keep the copy in order, where it needs
>to be, whereas this may not be possible for copy-by-reference.

On the contrary -- with copy by reference you can _always_ move the
structure around.  Nobody's stopping you.  The trick is that you don't
_have_ to.

>> >> Or you could just depend on the fact that the list/array of 
>> >> pixels is in
>> >> memory, so a 'dup' only duplicates an address.

>> >No! That would involve a pointer! Slow slow. 
>> >There's no need to bring pointers into this.

>> But surely you see that in this case, the pointer-based 
>> solution is NOT slow
>> at all, but on the contrary, is FAST?

>Not really. If the pixel buffer is stored directly on the stack, then
>an extra register may be available, since there is no longer a need
>for a pointer to be stored to it (or rather, ESP plays the role of
>the pointer). This could result in a speed increase.

Having one register fewer does not make anything "slow slow" (your words,
not mine).  It's almost certain that the huge move involved in the initial
copy would take enough time to dwarf any gain brought about by having a
single additional register to play with.

>> >> That's the whole purpose of keeping quotations in "array of pointers
>> >> to function structs" form.  An array of function structs is less
>> >> efficient to execute than a stream of native code (although not
>> >> disastrously less so, since the appropriate interpreter is simply a
>> >> loop containing an indirect call), but it contains (or points to)
>> >> information which can be used to count executions, profile execution
>> >> times, and so on.

>> >Whoa... a whole array of POINTERS? indirect calls? slow slow slow.
>> >seriously, keeping the code in uncompiled form and interpreting
>> >from that, as you say, may not be a disastrously bad choice.

>> Utter and obvious prejudice.  You clearly have NO clue what you're
>> talking about; there have been many benchmarks studying this stuff.
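
For the record, the technique is just this (a minimal sketch in C; the
struct and field names are mine, purely for illustration): a quotation is
an array of pointers to function structs, and the interpreter is one loop
around an indirect call, with room next to the code pointer for the
profiling data that a raw stream of native code can't carry.

    struct word {
        void (*code)(void);    /* the xt's behaviour */
        unsigned long count;   /* e.g. an execution counter */
    };

    static void run(struct word **quotation, int len) {
        for (int i = 0; i < len; i++) {
            quotation[i]->count++;   /* count executions... */
            quotation[i]->code();    /* ...then one indirect call */
        }
    }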

>Heh heh... Okay, okay. Guilty as charged.
>I'm really not an expert at memory management.

>But, I'm a bit confused, as I was mainly agreeing with you that your
>technique of interpreting (the first time) would not be that bad.

WHOAH!  There's a "not" in there that I've missed every single time I've
read it!!!!!!  Holy mackerel, no wonder we're getting confused.  After I
saw you saying "slow slow slow" I naturally assumed that you were
justifying such a strong topic sentence.  Wow, was I ever wrong!  I thought
you were saying that it would be disastrously bad.  Wow.

Whew.  Sorry about that -- it was a pretty big misunderstanding.

>> >But, some pointers will need to be used in the system.
>> >If nothing else, there's ESP and EIP. But, other pointers
>> >will be needed too, if we want to be practical. But,
>> >it must be controlled. No more pointers than we need.

>> Of course.  And no more copies than we need; references are 
>> cheaper by far
>> than copies, so use references whenever possible.

>Hmm... References are probably usually cheaper, yes. Some notable
>exceptions are:

>  1) if you're dealing with really small pieces of data

...which are independent (i.e. small pieces in arrays are best handled by
pointers).

>  2) if your two copies are being used in quite unrelated 
>parts of the system.

100% true.

>An extreme case of #2 can happen with distributed systems. In 
>this case,
>unless a data structure is _really_ big, it would be ridiculous
>for two machines to share a copy. But, this can happen on a smaller
>scale on disks with significant seek time, as mentioned earlier.
>A system might be mostly working on one area of the hard disk and it
>suddenly needs to zoom way off to another end because a 
>copy-by-reference
>was used in two unrelated parts of the system. 

Even more true.  :)

>> >All pointers that are used must be carefully kept track of,
>> >so that things can still move around. 

>> This is true.  Each item should have a single 'owner' slot, which
>> refers back to its (single) owner; nobody else may EVER have a pointer
>> to it.  When the item is moved, the owner is updated.

>This would not support copying-by-reference, but I guess it 
>could easily be extended to, by allowing multiple owners.

*Could* be, but by design it doesn't.  It's just an implementation of true
linear logic.

>But, I could see how a system could use references extensively while
>never using copying-by-reference.

I think that's a good goal.

>In particular, single-owner references
>might be used so that swapping around large structures becomes
>decent, although copying them would remain expensive.

Right.
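
That's exactly the win.  A trivial sketch in C (hypothetical struct,
purely for illustration): swapping two single-owner structures exchanges
two pointers, no matter how big the payloads are, while a genuine copy
would still have to move every byte.

    struct big { char payload[1 << 20]; };   /* a megabyte of data */

    static void swap_refs(struct big **a, struct big **b) {
        struct big *t = *a;   /* O(1) regardless of payload size */
        *a = *b;
        *b = t;
    }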

>There are several approaches to implementing a system that is careful
>with its pointers that I've been thinking about; here are some,
>listed in order of increasing complexity and dependence on pointers...

>  1) Use mainly just two pointers in the system, the top of 
>     the data stack,
>     and the top of the instruction stack. Keep everything on these
>     stacks, and keep them exactly in order without 
>     copying-by-reference.
>     This is quite a ridiculous approach; it has been recently 
>     implemented.
>     See http://www.tunes.org/~iepos/prog.c and the "remarks" file
>     in the same directory :-)

Wow.  Cool.

>  2) Use a data and instruction stack, and keep everything on them,
>     and in order, and without copying-by-reference, but keep the
>     stacks padded in places where insertions are likely to take place
>     or where deletions have taken place. Padding may be removed,
>     by moving things around, after a region has cooled off.

>     Doing it this way, with padding, makes addressing things 
>     on the stack a
>     little bit tricky; one would probably want to have an addressing
>     table somewhere (probably in the form of a binary tree)
>     that tells where things are; there would probably need to be
>     a pointer for each padded region. Entries in such a table could
>     be cached (in registers) in many situations...

>     This approach I think may be quite decent for many purposes.
>     But, one prominent shortcoming is that it would go very slow if
>     one tried to move around the same region over and over again.

I can't see it ever being practical; it involves all the shortcomings of a
huge nest of pointers, with no benefits that I can see.  Same problem with
#3.

>  4) Use a similar approach to #3, except that when copying a large
>     region, do not physically copy it, but instead only modify
>     the addressing table so that it appears to have been copied.

Doesn't work -- copying implies separate regions.  What if someone modifies
their region?

>Anyhow, there are surely many variations one could apply to these.
>Also, I'm currently looking for a really good way to implement
>an addressing table of the kind that these approaches would require.

Try linear logic -- just have exactly one pointer for each item, and a
single back-pointer in the item to indicate its owner.  (Of course, not
everything in the system is an 'item' -- integers on the stack, for
example; and also, data may be moved into a scratch area, such as the Forth
dictionary, where it can be temporarily handled without having to worry
about the number of pointers.)
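
A minimal sketch of that arrangement in C (the names are mine, purely for
illustration): each item carries one back-pointer to the single slot that
owns it, so relocating the item means updating exactly one pointer -- and
the item itself knows where that pointer is.

    #include <string.h>

    struct item {
        struct item **owner;   /* back-pointer to the ONE slot holding us */
        /* ...payload... */
    };

    /* Relocate an item (e.g. for defragmentation): copy the bytes,
       then fix the single pointer that refers to it. */
    static void move_item(struct item *dst, struct item *src) {
        memcpy(dst, src, sizeof *dst);
        *dst->owner = dst;     /* find the one pointer and change it */
    }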

>- "iepos" (Brent Kerby)

-Billy