types and operators

Nathan Hawkins utsl@one.net
Mon, 29 Apr 1996 15:49:15 -0400 (EDT)


On Mon, 29 Apr 1996, Captain Napalm wrote:

> A long long time ago on a network far far away, Nathan Hawkins thus said:
> > 
> > A revised version of this:
> > 
> > Types:
> > 
> > 1. integer
> > 2. character
> > 3. pointer
> > 4. double-integer
> > 5. float (optional)
> > 
> > I propose to distinguish pointers and integers (all three kinds), on the 
> > stack in in memory by the top or bottom two bits. We can simply shift these 
> > bits away, and mask them off for pointers.
> > Pointers should probably always be dword aligned, so two bits will not 
> > lose any address space for us. Strings will need special handling, 
> > anyway, so we shouldn't worry too much about them.
> > 
>   I would not recomend this.  First, because playing such games with
> pointers/integers/etc leads to problems down the road (just ask the people
> that wrote EMACS.  Once system started REALLY addressing more than 16M of
> memory, they were screwed BIG TIME because of playing around with bits in
> the pointers).

There is no actual loss for pointers. Two bits are already unused with 
dword alignment of everything. (Except for strings, but that'll have to 
be dealt with anyway.)

> > I would define the following distinguished types (which may be on the stack):
> > 
> > 0 - pointer to integer
> > 1 - pointer to character/string
> > 2 - integer
> > 3 - undefined (poss. pointer to pointer)
> > 
>   And if you were folly enough 8-) to go along with this, I would (well, I
> wouldn't, but if forced to, I would) define the following:
> 
> 	0 - pointer to char/string
> 	1 - pointer to integer
> 	2 - pointer to typed object (other object)
> 	3 - integer
> 
> > (Note that these numbers aren't important, it's just to give you the 
> > general idea.)
> > 
>   So noted.

I'll wait for more comments about this...

> > By having a separate pointer type for strings, the GC would be able to 
> > tell which memory objects it needs to search for pointers and which ones 
> > it doesn't. The only problem with this is that it would forbid embedding 
> > strings within other structures, but that isn't that great of a hardship.
> > 
> > Double ints would lose a little, but could still be handled as two 
> > integers treated together. (On x86, we could use SHRD and SHLD to fix it...)
> > 
> > I don't think we can deal with floats very well. Therefore, to implement 
> > floats, I would forbid them on the stack, but permit a "string" to be a 
> > float. This would be type-safe enough for the GC, I think.
> > 
>   *boggle*  Geeze, already you're running into complications.  At the point
> where you start doing this (well, we can always frob the grommizt into the
> quibutz ... ) you're going down the wrong path.

I can't help but agree. However, using stack frames is a lot more 
obnoxious for Forth. And floats won't work on the stack very well on the 
x86, anyway. Frankly, I'm inclined to ignore them entirely, if no one 
else minds, since I never use them. On the x86, there are other problems 
that can produce migraines for an OS designer, like context switching 
the FPU.

As for the complications, everything we do about this will have them. 
There's no help for that, and some type will probably lose slightly no 
matter what we do. For example, I predict that floats will be a stumbling 
block for any design we can think up. The real issue is: does this 
adequately solve the problem of detecting whether a stack element (or on 
the heap) is a pointer, an integer, or something else? A good solution to 
this seems to be the main thing we need to write some code.

>   To begin with, you already have the overhead of checking for the type.  By
> playing with bits like this, you have the additional overhead (for certain
> types) of having to shift them around a bit, as well as loosing two orders
> (2^2) of magnitude on integers, meaning you can then only represent values
> between -500,000,000 to +500,000,000, not the -2,000,000,000 to
> +2,000,000,000 (approximately 8-) that most 32-bit systems support, and you
> may have porting problems (may - depends upon the package).

Yes, this is my main problem with the idea. I don't really like losing 
bits in my integers. But since Fare and others have convinced me that 
typing is necessary for the GC, it seems that this way of doing things 
would waste less memory and time than anything else I can think of. That 
doesn't mean I think it's the only solution or the best one.

>   You might as well separate out the type information, and oh, what the
> hell, give yourself four bytes to tag on type information.  You could do
> something like:
> 
> struct tag
> {
>   unsigned short size;	/* two bytes	*/
>   unsigned char  type;	/* one byte	*/
>   unsigned char  other; /* one byte	*/
> };
> 
> union mem
> {
>   unsighed char  c[4];  /* assuming char is one byte   */
>   unsigned short s[2];  /* assuming short is two bytes */
>   unsigned int   i[1];	/* assuming this is four bytes */
>   void          *p[1];	/* assuming this is four bytes */
> };
> 
> struct tagmem
> {
>   struct tag t;
>   union mem  m;
> };
> 
>   Okay, the smallest object now takes 8 bytes, but you have allocation
> overhead, and now you have a better base for GC.
> 
>   -spc (simplify simplify simplify ... but not TOO simple)

You're describing a Lisp atom, right? This would work, and I would be 
inclined to put a <=8 bit type in the tag and use the rest for a relative 
addressed pointer. This would allow us to have threading of memory 
objects and stacks. So fragmentation would be something that the GC fixes 
up, and we don't worry about it. Also, I would require that all atoms be 
of a fixed size. (8 bytes would be good.) This would allow memory to be 
allocated by that unit. For strings, we would have to allocate a contiguous 
block, and then mark the beginning with a magic # in the type field.

This idea is really the only decent alternative I can see, but it would 
have some slight loss in performance, while simplifying several things.

Pros:
1. It wouldn't break 32-bit integers
2. Would carry more type information
3. Would keep type information with objects at all times
4. Could be a solution to mixing the stack and the heap

Cons:
1. Takes twice as much memory for everything
2. Obviously must be a little slower for something or other

As usual all comments are welcome.

*utsl*