[gclist] object identity & copying collection

David Chase chase@world.std.com
Wed, 10 Dec 1997 11:29:18 -0500

At 10:03 AM 12/10/97 -0600, Charles Fiterman wrote:
>Here is the deep problem.

>There is a bad tradition in collector technology of 
>improving the efficiency of the collector by assigning
>its work elsewhere.

I agree (and agreed ten years ago).  Copying collectors
impede certain compiler optimizations, for instance.
Generational collectors require card-marking, which
has some costs as well (but stores are not the common
case; note that hardware is designed with the assumption
that stores are relatively rare).

>A moving collector assigns extra work to hash functions.

This is true, but it does not seem like a substantial amount
of extra work.  First of all, object addresses are unique,
but they are not necessarily good hash functions.  Many
objects (many that I have worked with) also require a different
notion of equality, so I end up processing the data within
an object when I hash it.  Address-as-hash is not an
overwhelmingly common case.

Second, even when you are using something like address-as-hash,
the costs of the two-bit solution are unlikely to dominate the
system performance.  Simple use of address has to be replaced
by a load, a test, and either no more loads, or two more loads.
Three loads and a test looks like a big overhead, but in a
hashing context you will be probing a table, which requires
additional loads, some sort of remainder operation, and an
equality test.

In the case of object identity (not hashing) the expensive
case occurs when the ID resides in an address-indexed table.
A generational collector (having worked with a straight copying-
compacting-non-generational system, I would not recommend it)
will not have many of these objects, and the small size of
the first generation allows some relatively sleazy encoding
tricks for that table if its performance is a problem.

Remember that the goal is a usable system.  Generational
compacting collectors appear to provide a good solution
for many uses.  Pauses are small, efficiency is high.
There's some runtime penalty imposed on the mutator, but
not a large one.  They're not necessarily the only solution,
but I don't see the deep flaws that you do.

>Similarly there has to be extra work if a language allows
>an object to create a new version of itself because the
>new version will be elsewhere but have the old identity.

What language does this?

David Chase