[gclist] Java vs. ML, particularly GC
Krishnaswami, Neel
neelk@cswcasa.com
Fri, 22 Dec 2000 14:50:44 -0500
Dave Mason [mailto:dmason@sarg.Ryerson.CA] wrote:
>
> I have 10-20K lines of coding experience in SML, Caml, Scheme (, and
> C) and am now building a system in Java (about 5-10K so far). Java
> (using the Sun VM (not HotSpot)) seems much more sluggish than the
> functional languages (particularly Ocaml), and the information I've
> seen on Java performance suggests that garbage production and
> resulting collection are *major* performance bottlenecks and a
> programmer caring anything about performance should give serious
> thought to minimizing the use of dynamic variables.
>
> 1) opinions on the accuracy of this perception?
True, but keep in mind that ML and Ocaml have highly-optimizing
native code compilers, and you are comparing it to a bytecode
system. Of course, if native MLs exists on all the platforms you
care about then Java still loses.
> 2) are there aspects of Java that will make its GC significantly
> slower?
Yes. The hashcode() method on the Object class in Java makes it
very hard to write a high-quality garbage collector for the JVM,
since it makes it very hard to write a copying or generational
collector.
The problem is that every object must return a meaningful hash
code, *and* the Java spec requires that the hash code of an
object cannot change during the run of a program. The simplest
implementation of hashcode() is to cast the address of the object
to an integer and return that, and with this implementation the
constant-hashcode requirement will be violated by any moving
garbage collector.
Scheme, SML and Ocaml (as well as other OO languages like
Smalltalk and Dylan) all have weaker constraints on object
locations, so efficient generational scavengers can be used
in their implementation. For example, here are bits from
the Dylan object-hash documentation and the Java hashcode
documentation:
Java:
Whenever it is invoked on the same object more than once during
an execution of a Java application, the hashCode method must
consistently return the same integer, provided no information used
in equals comparisons on the object is modified. This integer need
not remain consistent from one execution of an application to
another execution of the same application.
Dylan:
It returns a hash id (an integer) and associated hash state for the
object, computed in some implementation dependent manner. The values
returned by object-hash when called repeatedly on the same object
might not be the same for each call. If the hash-id value changes
then the hash-state value will also change.
I'm sure there's trickery you can use to get around this
restriction in Java, but it's a serious enough problem that
most Java implementations simply punt and use a simple mark-and-
sweep garbage collecter. This kills gc performance compared
to the collectors in other advanced functional and OO languages.
[Incidentally, is there an online archive for this list?]
--
Neel Krishnaswami
neelk@cswcasa.com