[gclist] Garbage collection and XML

David Chase chase@world.std.com
Tue, 06 Mar 2001 23:57:06 -0500

At 05:07 PM 3/6/2001 -0800, Bryan O'Sullivan wrote:

actually, Richard O'Keefe wrote:
>r> What I *wanted* was a UniqueString class, with a less space-hungry
>r> representation than Java's String class.

The best you're likely to get out of most Java implementations
for any type is 2 words of header, plus one or two for data,
depending on how they deal with possible alignment of doubles
and longs.

Java strings are also not necessarily quite as costly
as you make them out to be.  The basic object is
header + array pointer + offset + count (5 or 6 words, depending
on padding) but it is entirely possible to share the array
portion of equal strings.  You could, for instance, say

  new String(s.intern())

to ensure that you get a string that is mostly shared,
yet not equal to any other string.  That's perhaps only
5 words per string after the first one is created, versus
maybe 3 words per whatever object you might come up with
for your unique-string type.
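
A minimal sketch of what that expression buys you, for anyone who wants to try it. The class name SharedStringDemo is made up for illustration. Whether the two copies really share their character storage is up to the implementation and cannot be observed from Java code, so the sketch only checks identity and equality.

```java
final class SharedStringDemo {
    public static void main(String[] args) {
        String s = "entity";

        // Two fresh String objects built from the same interned string.
        String a = new String(s.intern());
        String b = new String(s.intern());

        System.out.println(a.equals(b));              // true: same characters
        System.out.println(a == b);                   // false: distinct objects
        System.out.println(a.intern() == b.intern()); // true: one canonical copy
    }
}
```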

However, you could also play the game of indexing your
entities, and indexing instances of entities.  That
is, map objects to integers.

That way, you can store any object in 2 words, one identifying
the value, the other identifying the instance ID, with full
sharing under the covers.  ("Under the covers" here means a sort
of hash table in which the "value" associated with each stored
object is the integer for that object.  When the instance ID
wraps, you grab another slot for the same object in the table.)
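
Roughly, the indexing game might look like the sketch below. Everything here is hypothetical (the ValueTable class, its method names, the use of String as the value type), and it ignores the wrap case entirely, since a 32-bit instance ID takes a while to wrap. Note also that the int[] it returns is itself a Java object with a header; to actually get down to 2 words you would keep the two ints in fields or in parallel int arrays.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ValueTable {
    // value -> index, plus the reverse lookup and a per-value instance counter
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> values = new ArrayList<>();
    private final List<Integer> nextInstance = new ArrayList<>();

    /** Returns the index for this value, adding it to the table on first sight. */
    int indexFor(String value) {
        Integer idx = indexOf.get(value);
        if (idx == null) {
            idx = values.size();
            values.add(value);
            nextInstance.add(0);
            indexOf.put(value, idx);
        }
        return idx;
    }

    /** Hands out a fresh {valueIndex, instanceId} pair for the given value. */
    int[] newInstance(String value) {
        int idx = indexFor(value);
        int id = nextInstance.get(idx);
        nextInstance.set(idx, id + 1); // wrap-around is not handled in this sketch
        return new int[] { idx, id };
    }

    /** Recovers the shared value from its index. */
    String valueOf(int index) {
        return values.get(index);
    }
}
```

Two calls to newInstance("x") give the same value index and different instance IDs, which is the "equal contents, distinct identity" property in integer form.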

This is probably too horrible to contemplate for most
people, given that you've got untyped integers
instead of typed objects, and no garbage collection at
all under the covers.  A loon might even push it to the
bit level, and reserve 8 bits for the instance ID, and
24 bits for the value index.  (If Fortran is outlawed,
only outlaws will use Fortran.)
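
For the loons, a sketch of the bit-level version. The PackedRef name and the choice to put the instance ID in the high 8 bits are assumptions of mine; the only thing taken from the above is the 8/24 split.

```java
final class PackedRef {
    static final int INDEX_BITS = 24;
    static final int INDEX_MASK = (1 << INDEX_BITS) - 1; // low 24 bits
    static final int ID_MASK = 0xFF;                     // 8-bit instance ID

    /** Packs an 8-bit instance ID and a 24-bit value index into one int. */
    static int pack(int instanceId, int valueIndex) {
        if ((instanceId & ~ID_MASK) != 0 || (valueIndex & ~INDEX_MASK) != 0) {
            throw new IllegalArgumentException("field out of range");
        }
        return (instanceId << INDEX_BITS) | valueIndex;
    }

    /** Extracts the instance ID; the unsigned shift keeps it in 0..255. */
    static int instanceId(int packed) {
        return packed >>> INDEX_BITS;
    }

    /** Extracts the 24-bit value index. */
    static int valueIndex(int packed) {
        return packed & INDEX_MASK;
    }
}
```

With this layout each value gets at most 256 instances per table slot before the ID wraps, and the table tops out at about 16 million distinct values.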

>Since java.lang.String can't be subclassed and Java's notion of type
>equivalence is based on name, not structure, I fear that a
>UniqueString would be something of an annoyance to use in practice.

public final class U {
   public static String u(String s) { return new String(s.intern()); }
}

.... U.u("a string") ...

Less typing than those clunky old Modula-3 keywords :-).

>If only Cardelli and company had glommed C-like syntax over Modula-3's
>semantics, we might inhabit a slightly happier world.

I'm not sure, but I think some of us in the "peanut gallery"
raised the issue at the time.  I may still have my old email
from back then; maybe someday I'll see what I
can find.

Java's got one other thing that Modula-3 didn't, which is an
answer to the multiple inheritance question.  The problem, for
the M-3 definers, was that most of the people who wanted MI were
unable or unwilling to explain what it was that they wanted in any
sort of a sound semantic framework (William Cook was an exception,
I think), and the M-3 people, given their attitude toward inheritance
in general, would probably have come up with something different.
Not clear if better or worse, but different.  Unsurprisingly,
Java's type system is at its flakiest where it deals with
multiple inheritance, but to an engineering approximation
nobody cares, and nobody I know has figured out how to turn
the theoretical glitch into a security hole.

David Chase