[gclist] Reference counting on the fingers of one thumb.

John Carter john@dwaf-hri.pwv.gov.za
Mon, 13 Jan 1997 09:33:33 +0200 (SAT)


Greetings Collectors,

I have been contemplating designing a new programming language.

One of the ideas I have been having results in a unusual form of
garbage collection.

I thought I would bounce this idea of the esteemed members of the list
and see if I can learn anything from the hisses and boos.

My idea is that the programming language should enforce a "one object,
one place" world view. (I'm in my office now, I can't be in the office
and in the bathroom at the same time.) The reason for trying to do
this, is the "one-object one place" paradigm is something our minds are
trained on since toddlerhood. (The block is on the floor, it is no
longer in my hand, WAAAH!) Thus I argue, if the computer world
mimicked our spatial world more closely we would tend to create fewer bugs.

I propose that assignment ALWAYS means deep copy. In other words, if
we say a:= b, a becomes a complete copy of b, including copying any
sub-objects of b or pointed to by b. I also propose a new
assignment-like operator which I will call "handover". Instead of
copying the value of 'b' to a, it will handover the object in 'b' to
'a'. Thus 'b' loses its binding and the 'a' takes it over. Thus the
swap becomes
  b handover to temp;
  a handover to b; 
  temp handover to a;

At the end of this sequence, 'temp' is unbound and it is illegal to
refer to it.

Now please be patient and see where this takes us. It takes us into a
world were memory errors disappear and garbage collection comes for
free.


Pointers rethought.
===================

Consider pointers. Pointers are integer labels/names of objects that
exist in a global namespace. Just as 'temp' is a string
of characters providing a means of referring to an object, a pointer
is an integer that provides a means of referring to an object.

A "C" style pointer specifies the mechanism of a direct memory
address. I say we should be concerned with the semantics. The
semantics is we wish to be able to manipulate an integer name for an
object.

Pascal attempts to safeguard this be removing the "integer"
semantics. In C you can perform very useful, but unsafe integer
arithmetic on pointers. Pascal bans this. 

Nonetheless both Pascal and C pointers are labels in a GLOBAL
namespace. A pointer created anywhere in the program is valid
everywhere else.

Now any programmer worth his salt keeps the number of global variables
to a minimum, because they are :-
a) Dangerous. They can create highly non-local bugs.
b) The hard to manage.
c) They pollute/clog the name space.

Yet practically all languages force us to use pointers which are
global integer names. 

In addition to this, pointers create aliases to objects. In the
presence of pointers, objects can have a myriad of different
names. Thus it is highly unsafe to dispose of a object as it is highly
non-trivial to establish whether it is still in use or not. Some
languages provide sophisticated garbage collection mechanisms to cope
with this.

Cleaning up the semantics of assignment allows resolution of the 
================================================================
problems of pointers.
====================

I propose to resolve these problems by imposing the well-known
database rule. "One fact in one place." I propose that all objects
should be atomic in the sense that they can only be in one place, accessed
via one name at a time.

This will be made possible by the "assignment is deepcopy" rule and the
"handover" operator. There will be no address-of operator. There will
be no way of creating an alias. 

Now a lot of highly useful algorithms make sophisticated use of
pointers, and at first sight it would seem as if I have lost a vast
amount by imposing this restriction. I will now show that I have lost
nothing and in fact gained garbage collection for free.

Clearly the language should provide some form of indexed
container. (An array being the archetype). Now the "one object one
place" rule will insist that if we place an object into a container,
we "handover" the object to the container. Thus the object is now in
the container and is no where else. The handover operation ensures that
the variable holding the object previously, is now unbound, and the
object is no longer accessible via that route. The only way of
operating on that object is via the container. 

Now the integer index of the object in the container provides all the
functionality of a traditional pointer, but in a local, not global
namespace. It is also just a plain old integer, and hence is amenable
to all the fun things one can do with integers.

However, and this is the cute bit, we need to know the name of the
container AND the integer index to access our object. Thus as soon as
we move out of scope for that container, the container AND ALL THAT IT
CONTAINS, becomes garbage and can be collected.

To make this a bit more concrete. Think of a C pointer dereference
operator. It is the unary operator *. In my scheme it becomes a binary
operator operating on the pointer and some container. In C the
container is implicitly the global memory container.

Ok, so there are some holes in this scheme...
=============================================

At first sight you may say, "Oh no! He has come up with the old
reference counting idea, and we all know that doesn't work because of
circular references." 

To a certain extent yes, but in fact I have come up with something
more draconian that that. I insist that the reference count MUST be
forced to be 0 or 1. This creates a different realm for the
programmer. It some senses it is more limiting, but in fact it can be
"godelized" to be as flexible as any other language. ("godelized",
after Kurt Godel, who mapped symbolic logic to the natural numbers to
create meta-proofs of mind-boggling implications. Indexed containers
permit "godelization" of objects. Yes, and hence all the bizzaro
memory bugs are also possible to recreate, but the difference is we
would have to create them with a fair degree of concious effort
instead of accidentally in passing.)

What this rule does do, is create an object world that is far more like
the familiar spatial universe we ourselves exist in. I can be in my
office or I can be at home, I can never be in both. Thus when writing
programs under this draconian constraint I am operating in an
environment similar to the every day one.

The only class of memory leak that still remains is if the programmer
is stupid enough to put the container in it self... (Ye olde circular
reference, but with the extra spin that the container must disappear
into itself!)

Hmm. This gives rise to a new name for this bug, the Klein bottle bug!
(The Klein bottle is a bizarrity borrowed from topology, it is a
bottle whose inside is the same side as its outside. You get one by
taping the edges of two moebius strips together in a 4-dimensional
space.)

Still, the compiler should be able to pick up most obvious occurences
of Klein bottle formation, but I suspect in general it would still be
possible.

Also the "this is in that" graph should be a nice tree it should be
quite efficient to check for such an event. Maybe one can do some form
of garbage collection sweep on program exit and find if you have Klein
bottles lying around and if you do have Klein bottles, you can switch
on checks for bottle formation next time you run the program.

There are also situations where some form of garbage collected
container could be useful, but at least this scheme can allow the
garbage collection process to work in a much smaller domain.

The implications of this scheme in the presence of multi-threading
have not been thought out properly. I suspect that it will result in
far fewer race conditions, but the I still need to work on the meaning
of 'one object one name' in the presence of multithreading.

Parameter passing revisited.
============================

The 'one object, one place' rule would suggest that parameter passing
be a 'handover' operation. In fact it must be one of the following
options :-
'hand into procedure', 'hand in and hand out', 'hand out of
procedure'.

For procedures, parameters can be one of :-
1) Hand in.
2) Input only. (Hand in and forbid changes and hand back.)
3) Modify. (Hand in and hand back)
4) Hand out. (Output only.)

The operation of placing an object into a container would require a "hand
in" parameter. For example :-

myStack.push( a);
a.doSomething; // This is illegal as a is now unbound.

Note the difference between Input only and call-by-value. Call by
value, is equivalent to shallow copy and hand-in. I object to this on
two grounds :-
 1) It allows the modification of the input parameter, creating a
    false expectation that the modification will be carried back to the
    calling routine.
 2) Shallow copies allow modication of objects pointed to by the object
    passed. Thus a false expectation is created that the called
    routine cannot change the object passed.

I could get around objection 2 by making call-by-value deepcopy and hand-in,
but objection 1 would still remain.





John Carter                    EMail: ece@dwaf-hri.pwv.gov.za
Telephone : 27-12-808-0374x194 Fax:- 27-12-808-0338

Founder of the Council for Unnatural Scientists.