[gclist] Memory cheats and VM (was Counting on one thumb)

Richard A. O'Keefe ok@cs.rmit.edu.au
Mon, 20 Jan 1997 14:31:05 +1100 (EST)


	>  _I_ find it hard to believe that brute force copying would usually
	> be cheaper than 2 instructions.
	Ok, maybe you have convinced me. Actually 3 instructions, you forgot
	to inc the refcount.

I forgot nothing.  In the unique reference case, the reference count
does not change.

	As I remember it from one the Dec marketing brochures Bliss was
	designed to be highly memory efficient. VMS showed scars of this. Most
	SYS$ calls had horrible parameters that required very tedious, (in C),
	packing of structs and bit strings. 

There is nothing in the BLISS language that *requires* this.
BLISS makes access to bit fields very easy, so BLISS *enables* that
style of OS interface (which struck me as pretty much a disaster in
TOPS-10).  But that's a different matter.

	> Come off it!
	> There is *never* any need to copy *immutable* 'objects' of any kind.
	> Integers and floating point numbers are intrinsically immutable.
	> Copying (and copy optimisation) only apply to *mutable* objects,
	> where there is some operation that can change some *part* of an object.

	Huh? I'm confuzzled. Speaking C++ because it so easy to illustrate
	messes in...

Exactly so.  C has confuzzled you.  In Smalltalk, SETL, Lisp, Prolog, Dylan,
NewtonScript, you name it, numbers are VALUES.

		int i, j, * ip, & k = j;
		j = 3;
		ip = &j; // j is aliased.
		i = j; // j is copied.
		*ip &= 1; j is mutated.
		k &= 2; // j is zero.

The thing that has confuzxzled you (lovely word) is that I was talking
about INTEGERS and you were talking about INTEGER VARIABLES.  Yes, you
can change an integer variable, but you can't change an integer.
Let's translate your example into something that makes this a bit
more explicit.

	let i : int ref = box(0)
	and j : int ref = box(3)
	and ip: int ref ref = box(j)
	and k : int ref = j
	    # j _is_ the location of a box holding the OID for 3
	    # k is the same location as j
	    # ip is the location of a box holding the location that is j
	    # the expressions j, k, cont(ip) all refer the the same box.
	in {
	    i := cont(j)
		# i did not change, the _contents_ of i changed.
		# j was not copied, the _contents_ of j were copied.
	    cont(ip) := bitand(cont(cont(ip)), 1)
	        # j has not changed at all, only the _contents_ of j
	    k := bitand(cont(k), 2)
		# j has not changed at all, but now cont(j) is 0
	}

In none of this did any *integers* changed.
The *contents* of some *boxes* changed, but that is another matter entirely.
After the command
	i := cont(j)
i and j are distinct _boxes_ containing the same _value_.
We might say that i and j both contain the object identifier for the
object 3, and the object 3 has no user-accessible parts that could
possibly change.  Any number of boxes can safely contain the object
identifier of the object 3 without any need ever to copy 3, because
3 cannot change.

This is one area of confusion that Algol 68 cleared up.

Even in C++, the number 2147483647 remains the number 2147483647 no
matter what you do.  A box that contains that number at one time may
contain a different number at another time, but the _number_ does not
change.  In Smalltalk, Lisp, &c, we can deal with numbers of any
reasonable size, e.g.

	(do ((i 2 (+ i 1))
	     (f 1 (* f i))
	    )
	    ((> i 40)
	     f))

computes 40! = 815915283247897734345611269596115894272000000000
which at 48 digits is too big for the registers of most machines.  So it
is in practice represented as a pointer to some sort of data structure,
BUT that data structure is immutable, so pointers to it can be passed
out to anyone that wants one, without ever bothering to make a copy
of that particular number object.

It's also true in statistical languages like S and R that numbers
cannot be changed.  If I do
	y <- c(1,2,3)	# make a vector with 3 elements
	x <- y
	x[[2]] <- 4	# change one element
	x
=>	1 4 3
I have changed a component of a mutable object, so that if I want to
preserve the illusion that arrays are values, I must make sure that
x referes to a _different_ mutable object from y.  But if I do
	y <- 2
	x <- y
	x <- 4
then I haven't changed any components of any mutable objects, I have
merely rebound x to refer to (some) 4 object.