[gclist] object identity & copying collection

P. T. Withington ptw@harlequin.com
Wed, 10 Dec 1997 11:06:28 -0500


At 21:50 +1100 12/10/97, Fergus Henderson wrote:
>Hi,
>
>In Appel's "Modern compiler implementation in ML", exercise 13.4 says
>"... Explain the problem in implementing hashCode() ... in a Java system
>with copying garbage collection, and propose a solution."
>
>Explaining the problem is easy enough (you can't just use the
>object address, because it will change), and proposing a solution
>is also easy, but proposing a *good* solution is a bit harder.
>What is the "textbook" solution for this problem?
>
>Some less-than-ideal solutions:
>	- hash values can be computed lazily in hashCode()
>	  using a random number generator; they can then
>	  be stored by either
>	  - having a hash value field in each object (ugh!)
>	  - keeping a global table (binary tree) mapping
>	    from objects to hash values.  This requires
>	    that the collector preserve relative ordering
>	    (otherwise you'd have to use a linked list, double-ugh!).
>	    You also want the table keys to be weak pointers.

The solution used by many Lisp implementations is to use the object address
but rehash the table on a miss if objects may have moved.  The simplest
implementation of a motion-detector is a GC count, the table stores the GC
count when it was last hashed and if a lookup misses and the current GC
count is higher than the table GC count, rehash the table and try again.

This has the deleterious effect that your tables get slow after a GC.
There are several clever ways (some proprietary) for more accurately
determining if the object you are looking for may have moved.  One can also
aggressively update stale tables in idle time.

---
P. Tucker Withington, Harlequin Inc., One Cambridge Center, Cambridge MA 02142
Phone: +1 617 374 2557       Fax: +1 617 252 6505      "Honk if you love GC's"
The Memory Management Reference:  <URL:http://www.harlequin.com/mm/reference/>