types and operators

Captain Napalm spc@armigeron.com
Mon, 29 Apr 1996 06:48:27 -0400 (EDT)


A long long time ago on a network far far away, Nathan Hawkins thus said:
> 
> A revised version of this:
> 
> Types:
> 
> 1. integer
> 2. character
> 3. pointer
> 4. double-integer
> 5. float (optional)
> 
> I propose to distinguish pointers and integers (all three kinds), on the 
> stack and in memory by the top or bottom two bits. We can simply shift these 
> bits away, and mask them off for pointers.
> Pointers should probably always be dword aligned, so two bits will not 
> lose any address space for us. Strings will need special handling, 
> anyway, so we shouldn't worry too much about them.
> 
  I would not recommend this.  First, because playing such games with
pointers/integers/etc. leads to problems down the road (just ask the people
who wrote EMACS.  Once systems started REALLY addressing more than 16M of
memory, they were screwed BIG TIME because of playing around with bits in
the pointers).

> I would define the following distinguished types (which may be on the stack):
> 
> 0 - pointer to integer
> 1 - pointer to character/string
> 2 - integer
> 3 - undefined (poss. pointer to pointer)
> 
  And if you were foolish enough 8-) to go along with this, I would (well, I
wouldn't, but if forced to, I would) define the following:

	0 - pointer to char/string
	1 - pointer to integer
	2 - pointer to typed object (other object)
	3 - integer
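
  To make the cost concrete, here is a rough sketch of what every access
ends up looking like with the tag in the bottom two bits.  The tag values are
the ones from my list above; the names (cell, box_int, unbox_ptr and so on)
are invented for illustration and are not from anyone's actual code.  It
assumes pointers are dword aligned and integers are 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Tag values follow the list above; every name here is made up. */
enum { TAG_STR = 0, TAG_INTP = 1, TAG_OBJ = 2, TAG_INT = 3, TAG_MASK = 3 };

typedef uintptr_t cell;		/* one stack/memory cell (hypothetical) */

/* Which of the four kinds is this cell? */
static unsigned tag_of(cell c)
{
  return (unsigned)(c & TAG_MASK);
}

/* Integers give up their top two bits to make room for the tag. */
static cell box_int(int32_t n)
{
  return (cell)(((uint32_t)n << 2) | TAG_INT);
}

/* Undo box_int(): drop the tag, then sign-extend from 30 bits. */
static int32_t unbox_int(cell c)
{
  uint32_t u = (uint32_t)c >> 2;

  return (u & UINT32_C(0x20000000))
       ? (int32_t)u - INT32_C(0x40000000)
       : (int32_t)u;
}

/* Pointers keep their value; the tag sits in the two alignment bits. */
static cell box_ptr(void *p, unsigned tag)
{
  assert(((cell)p & TAG_MASK) == 0);	/* must be dword aligned */
  return (cell)p | tag;
}

/* Mask the tag off again before the pointer can be used. */
static void *unbox_ptr(cell c)
{
  return (void *)(c & ~(cell)TAG_MASK);
}
```

  Note that an integer like 536870912 (2^29) does not survive the trip
through box_int() and unbox_int(); it comes back negative.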

> (Note that these numbers aren't important, it's just to give you the 
> general idea.)
> 
  So noted.

> By having a separate pointer type for strings, the GC would be able to 
> tell which memory objects it needs to search for pointers and which ones 
> it doesn't. The only problem with this is that it would forbid embedding 
> strings within other structures, but that isn't that great of a hardship.
> 
> Double ints would lose a little, but could still be handled as two 
> integers treated together. (On x86, we could use SHRD and SHLD to fix it...)
> 
> I don't think we can deal with floats very well. Therefore, to implement 
> floats, I would forbid them on the stack, but permit a "string" to be a 
> float. This would be type-safe enough for the GC, I think.
> 
  *boggle*  Geeze, already you're running into complications.  At the point
where you start doing this (well, we can always frob the grommizt into the
quibutz ... ) you're going down the wrong path.

  To begin with, you already have the overhead of checking for the type.  By
playing with bits like this, you have the additional overhead (for certain
types) of having to shift them around a bit, as well as losing two bits (a
factor of 2^2) of range on integers, meaning you can then only represent
values between -500,000,000 and +500,000,000, not the -2,000,000,000 to
+2,000,000,000 (approximately 8-) that most 32-bit systems support, and you
may have porting problems (may - depends upon the package).
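
  For anyone who wants the exact figures behind those round numbers, a
small sketch, assuming two's-complement integers (the helper names are made
up for this example):

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of a two's-complement integer `bits` wide (2..32). */
static int32_t int_max_for_bits(unsigned bits)
{
  assert(bits >= 2 && bits <= 32);
  return (int32_t)((UINT32_C(1) << (bits - 1)) - 1);
}

/* Smallest value of the same integer. */
static int32_t int_min_for_bits(unsigned bits)
{
  return -int_max_for_bits(bits) - 1;
}
```

  With all 32 bits you get -2,147,483,648 to 2,147,483,647; with the 30 bits
left over after tagging you get -536,870,912 to 536,870,911.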

  You might as well separate out the type information, and oh, what the
hell, give yourself four bytes to tag on type information.  You could do
something like:

struct tag
{
  unsigned short size;	/* two bytes	*/
  unsigned char  type;	/* one byte	*/
  unsigned char  other; /* one byte	*/
};

union mem
{
  unsigned char  c[4];  /* assuming char is one byte   */
  unsigned short s[2];  /* assuming short is two bytes */
  unsigned int   i[1];	/* assuming this is four bytes */
  void          *p[1];	/* assuming this is four bytes */
};

struct tagmem
{
  struct tag t;
  union mem  m;
};

  Okay, the smallest object now takes 8 bytes, but you have allocation
overhead anyway, and now you have a better base for GC.
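
  A rough sketch of how the separate tag might get used.  The type codes
(T_INT, T_CHAR, T_PTR) and the helper functions are hypothetical; nothing
here is settled.  I've swapped in <stdint.h> types so the sizes don't depend
on the compiler, and note that the pointer member (and so the whole object)
grows on a machine with pointers wider than four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type codes are invented for this sketch. */
enum { T_INT = 0, T_CHAR = 1, T_PTR = 2 };

struct tag
{
  uint16_t size;	/* payload size in bytes */
  uint8_t  type;	/* one of the T_ codes   */
  uint8_t  other;	/* spare                 */
};

union mem
{
  uint8_t   c[4];
  uint16_t  s[2];
  uint32_t  i[1];
  void     *p[1];	/* four bytes only on a 32-bit target */
};

struct tagmem
{
  struct tag t;
  union mem  m;
};

/* Store a full 32-bit integer; no bits are given up to the tag. */
static struct tagmem make_int(uint32_t n)
{
  struct tagmem o = { { 0, 0, 0 }, { { 0 } } };

  o.t.size = (uint16_t)sizeof o.m.i;
  o.t.type = T_INT;
  o.m.i[0] = n;
  return o;
}

/* Store a pointer untouched; no masking is needed to use it later. */
static struct tagmem make_ptr(void *p)
{
  struct tagmem o = { { 0, 0, 0 }, { { 0 } } };

  o.t.size = (uint16_t)sizeof o.m.p;
  o.t.type = T_PTR;
  o.m.p[0] = p;
  return o;
}

/* What a GC would ask: does this object need to be followed? */
static int holds_pointer(const struct tagmem *o)
{
  assert(o != NULL);
  return o->t.type == T_PTR;
}
```

  The GC only ever looks at t.type, and the integer and pointer payloads
keep every one of their bits.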

  -spc (simplify simplify simplify ... but not TOO simple)