[gclist] Garbage collection and XML

Boehm, Hans hans_boehm@hp.com
Wed, 7 Mar 2001 09:30:50 -0800


> -----Original Message-----
> From: David Chase [mailto:chase@world.std.com]
> The best you're likely to get out of most Java implementations
> for any type is 2 words of header, plus one or two for data,
> depending on how they deal with possible alignment of doubles
> and longs.
> 
> Java strings are also not necessarily quite as costly
> as you make them out to be.  The basic object is
> header + array pointer + offset + count (5 or 6 words, depending
> on padding) but it is entirely possible to share the array
> portion of equal strings. ...

A lot of this clearly varies greatly with the implementation.  I believe
that gcj (with a patch that hasn't yet made it into the official tree) will
in the best case represent a String as a single chunk of memory containing:

1 word object header (vtable pointer only; objects are not moved, and
  synchronization is handled with a separate table)
1 word pointer to array (in the best case points to the string object
  itself)
1 "int" byte offset to the start of the string
1 "int" length
Sequence of 16-bit characters

Thus strings of up to 4 characters take 4 words on a 64-bit machine, and 6
on a 32-bit machine.  (Object sizes are rounded up to an even number of
words for alignment reasons.)
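
The arithmetic behind those figures can be checked with a small sketch. This
is not gcj code. The class and method names are hypothetical, and it assumes
that the character data follows the length field directly with no extra
padding, that an "int" is 4 bytes, and that the total is rounded up to a
multiple of two words.

```java
// Hypothetical helper, not taken from gcj: models the best-case String
// layout described above and computes its size.
final class StringLayout {
    /** Size in bytes of a string of `chars` characters; wordBytes is 4 or 8. */
    public static int sizeInBytes(int chars, int wordBytes) {
        int header = wordBytes;       // vtable pointer only
        int arrayPtr = wordBytes;     // pointer to the character data
        int offset = 4;               // "int" byte offset
        int length = 4;               // "int" length
        int data = 2 * chars;         // 16-bit characters stored inline (assumed)
        int raw = header + arrayPtr + offset + length + data;
        int granule = 2 * wordBytes;  // assumed: sizes round up to an even word count
        return (raw + granule - 1) / granule * granule;
    }

    /** Same size expressed in machine words. */
    public static int sizeInWords(int chars, int wordBytes) {
        return sizeInBytes(chars, wordBytes) / wordBytes;
    }

    public static void main(String[] args) {
        System.out.println(sizeInWords(4, 8)); // 64-bit machine: prints 4
        System.out.println(sizeInWords(4, 4)); // 32-bit machine: prints 6
    }
}
```

On a 64-bit machine the fixed fields take 8 + 8 + 4 + 4 = 24 bytes, and four
characters add 8 more, giving 32 bytes or 4 words. On a 32-bit machine the
fixed fields take 16 bytes, and the same 8 bytes of characters bring the total
to 24 bytes or 6 words.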

Disclaimer:  I didn't write the String implementation.  This is based on my
reading of the code.

Hans