[gclist] Garbage collection and XML

David Chase chase@world.std.com
Wed, 07 Mar 2001 20:23:16 -0500


At 11:54 AM 3/8/2001 +1300, Richard A. O'Keefe wrote:
>3.  The rather large storage costs of the DOM compared with the DVM
>    (even *with* shared strings) can be traced to its requirements
>    combined with the decision to use mutation as a primary programming
>    tool.

How does it go if you play the simulated mutation game?

For instance, supposing you work with applicative data
structures (e.g., splay trees, or red-black trees) so that
you never modify anything, instead only reallocating
along the spine?  Yes, it does generate garbage (so there
is some fractional relevance to gc-list :-) but only
expected-O(log N) garbage per update operation.  One
advantage of applicative data structures is that if the
assignment of the root is atomic, then you only have to
lock for modification.

It is more frustrating than amusing to watch people learn
lessons already well-understood over a decade ago.  As soon
as you start working on a big project in a garbage-collected
language, any data that "escapes" your little sandbox really
has to be regarded as immutable, and (in the case of Java)
unlockable.  In a multi-threaded world, mutability also has
the annoying overhead of synchronization (for a competently
designed memory allocator, synchronization on a multiprocessor
can be 10-20 times as expensive as allocation (*)).  Sure,
mutable data structures are great for programming in the
small, but build something big, and they become a major
pain, and given the overheads they're not necessarily any
faster.

(*) non-recursive synchronization costs two bus locks,
or (my machine, with cpu clocked 6x memory bus) 120 cycles.
Heap memory allocation is load, add, compare, conditional
branch (predicted not taken), store, store, followed by
field initialization.

It's also possible, again if you are a loon, and not in
Java, to create applicative speculatively updated data
structures -- there, you reallocate a spine to "modify"
(say for a red-black tree) and attempt to compare-and-swap
in the new root.  If you fail, simply retry the entire
operation (you can, optionally, attempt to reuse some of
the previous operation if you saved the old spine and
compare with the new as you recompute -- as long as threads
are modifying in different places, this should cut the cost
of a retry).  If contention is low, the synchronization
costs are only half what you pay in the conventional
lock-while-modifying data structure (and if contention
isn't low, then your synchronization costs get grim anyway).

David Chase