[gclist] object identity & copying collection

Henry G. Baker hbaker@netcom.com
Wed, 10 Dec 1997 06:53:52 -0800 (PST)


> In Appel's "Modern compiler implementation in ML", exercise 13.4 says
> "... Explain the problem in implementing hashCode() ... in a Java system
> with copying garbage collection, and propose a solution."
> 
> Explaining the problem is easy enough (you can't just use the
> object address, because it will change), and proposing a solution
> is also easy, but proposing a *good* solution is a bit harder.
> What is the "textbook" solution for this problem?
> 
> Some less-than-ideal solutions:
> 	- hash values can be computed lazily in hashCode()
> 	  using a random number generator; they can then
> 	  be stored by either
> 	  - having a hash value field in each object (ugh!)
> 	  - keeping a global table (binary tree) mapping
> 	    from objects to hash values.  This requires
> 	    that the collector preserve relative ordering
> 	    (otherwise you'd have to use a linked list, double-ugh!).
> 	    You also want the table keys to be weak pointers.
> Fergus Henderson <fjh@cs.mu.oz.au>   |  "I have always known that the pursuit

There are some analogies between this situation and that of 'print-names'
for gensym'ed symbols.  I have previously proposed the use of a 'lazy'
print-name generator for gensym'ed symbols.  The problem is the following:
Gensym's want to be fast, but interning them often takes a long time.
Maclisp therefore provided the (bad) expedient of not interning them at
all.  This makes debugging nearly impossible, as the pname which types
out at you can't be typed _in_!  The right thing, IMHO, is to assign
(and then immediately intern) pnames for gensym'ed symbols _lazily_, 
_as they are printed out_.  Since printing presumably costs more than
interning, this cost is hidden, and yet it provides for the ability
to debug programs that use gensym'ed symbols.

-- 
Henry Baker
www/ftp directory URL:
ftp://ftp.netcom.com/pub/hb/hbaker/home.html