[gclist] Java vs. ML, particularly GC

Krishnaswami, Neel neelk@cswcasa.com
Fri, 22 Dec 2000 14:50:44 -0500


Dave Mason [mailto:dmason@sarg.Ryerson.CA] wrote:
> 
> I have 10-20K lines of coding experience in SML, Caml, Scheme (, and
> C) and am now building a system in Java (about 5-10K so far).  Java
> (using the Sun VM (not HotSpot)) seems much more sluggish than the
> functional languages (particularly Ocaml), and the information I've
> seen on Java performance suggests that garbage production and
> resulting collection are *major* performance bottlenecks and a
> programmer caring anything about performance should give serious
> thought to minimizing the use of dynamic variables.
> 
> 1) opinions on the accuracy of this perception?

True, but keep in mind that ML and Ocaml have highly-optimizing
native code compilers, and you are comparing it to a bytecode
system. Of course, if native MLs exists on all the platforms you 
care about then Java still loses.

> 2) are there aspects of Java that will make its GC significantly
> slower?

Yes. The hashcode() method on the Object class in Java makes it 
very hard to write a high-quality garbage collector for the JVM,
since it makes it very hard to write a copying or generational
collector. 

The problem is that every object must return a meaningful hash
code, *and* the Java spec requires that the hash code of an 
object cannot change during the run of a program. The simplest
implementation of hashcode() is to cast the address of the object 
to an integer and return that, and with this implementation the 
constant-hashcode requirement will be violated by any moving
garbage collector. 

Scheme, SML and Ocaml (as well as other OO languages like 
Smalltalk and Dylan) all have weaker constraints on object
locations, so efficient generational scavengers can be used
in their implementation. For example, here are bits from 
the Dylan object-hash documentation and the Java hashcode
documentation:

Java:

  Whenever it is invoked on the same object more than once during 
  an execution of a Java application, the hashCode method must
  consistently return the same integer, provided no information used 
  in equals comparisons on the object is modified. This integer need 
  not remain consistent from one execution of an application to 
  another execution of the same application. 

Dylan:

  It returns a hash id (an integer) and associated hash state for the
  object, computed in some implementation dependent manner. The values
  returned by object-hash when called repeatedly on the same object 
  might not be the same for each call. If the hash-id value changes 
  then the hash-state value will also change.

I'm sure there's trickery you can use to get around this 
restriction in Java, but it's a serious enough problem that 
most Java implementations simply punt and use a simple mark-and-
sweep garbage collecter. This kills gc performance compared
to the collectors in other advanced functional and OO languages.

[Incidentally, is there an online archive for this list?]

--
Neel Krishnaswami
neelk@cswcasa.com