[gclist] object identity & copying collection

Mark Tillotson markt@harlequin.co.uk
Wed, 10 Dec 1997 16:45:27 GMT

Charles Fiterman <cef@geodesic.com> wrote:

| There is a bad tradition in collector technology of 
| improving the efficiency of the collector by assigning
| its work elsewhere. If the collector eats half your time
| and you find a way to move half the collector's work
| to the mutator the collector goes from taking 50% of
| application time to 25% of application and you look
| like a genus while doing absolutly nothing of value.
But this is called increased incrementality isn't it?!!

| A moving collector assigns extra work to hash functions.
| There has to be extra work when an object can legitamately
| be moved across the web but those objects are the exception.
| Similarly there has to be extra work if a language allows
| an object to create a new version of itself because the
| new version will be elsewhere but have the old identity.

It's more that a moving collector prevents you from using the
hack/optimization that "identity => address" is a constant mapping.
The concept of identity doesn't have an integer value, and hashing
wants an integer value.  So if you want to implement hashing on
identity, then somewhere there has to be a 1-1 mapping from objects to
integers.  Hashing doesn't even require that this be a constant
mapping, since if it changes you just have to notice this and rehash.
If most of the time you don't rehash, then you still get the nice
statistical behaviour from your hash tables - O(1).  Hashing wants a
mapping that rarely changes compared to the amount of hash-probing
going on.

So a moving collector is really moving the onus of maintaining the
mapping back to where it belongs, the hashtable code (and thus
ultimately to a slot in every object you want to be able to hash
on...).  Heaps with static objects clearly allow the "identity == 
address" optimization at the expense of fragmentation.  (Incidentally
the cost of fragmentation is surely more than one word per object?)

There are halfway solutions where hash table code attempts to be
precise about changes to the mapping, and hence only rehash when
required, and which delay rehashing on read-only probes (you default
to linear search for only those keys that have moved).  In general
though, copying collectors move things a lot (although being
generational helps).

Also I'll point out that it's not just hash tables that can make use
of an explicitly represented mapping from object->integer, so there is
scope for it being a piece of behaviour for more general use.  (One
example is dead-lock avoidance, where you use the arbitrary total
ordering defined by the mapping to acquire a number of locks in a
given order to avoid spurious mutual-embrace).

You could also, by assigning integers sequetially on allocation,
provide a mutator-visible primitive for comparing the ages of objects
- though offhand I can't think of a compelling use for this outside
the GC.

[ markt@harlequin.co.uk | http://www.harlequin.co.uk/ | +44(0)1954 785433 ]