[gclist] Garbage collection and XML

Ken MacLeod ken@bitsko.slc.ut.us
01 Mar 2001 09:09:36 -0600

"Ji-Yong D. Chung" <virtualcyber@erols.com> writes:

>     I am trying to modify a c++ XML parser library, so that it uses
> a GC.
>     I have just begun my effort, and I am curious what garbage
> collector/memory management forum had to say about XML DOM and SAX
> specification and/or parser implementations.
>     In particular, I was wondering how well XML parsers (DOM and
> SAX) might get along with today's garbage collection/memory
> management techniques.  (Perhaps my question is ridiculously too
> general).  Does the fact that DOM interface is a tree structure
> manipulation tool make any difference?

I specifically selected garbage collection (and Boehm GC in
particular) for implementing the Orchard/Mostly-C XML library (which
has a fledgling C++ interface, by the way).

The performance is excellent (which most GCers will likely have
expected :-).  We've stress tested and performance tested it with
large documents and with thousands of small documents per second.
Using the Expat C XML parser, we run at about 1/3 the speed of the
raw parser, and that is due mostly to creating an object for each
parse event and copying the XML text into new strings (Expat may have
a memory management hook that would avoid the copies; I haven't
checked deeper yet).
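
To make the two costs above concrete, here is a rough C++ sketch of a
handler that only inspects the parser's buffer next to one that builds an
object and copies the text on every callback.  The names TextEvent,
count_raw and build_event are made up for illustration and are not
Orchard's API; the (s, len) arguments only mirror the shape of Expat's
character data callback, which hands over a pointer and a length rather
than a NUL-terminated string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical event object; Orchard's real types may look quite different.
struct TextEvent {
    std::string text;  // owns a private copy of the character data
};

// Raw-style handler: only counts bytes in the parser's buffer and
// allocates nothing.
void count_raw(std::size_t& total, const char* s, int len) {
    (void)s;  // the text itself is never touched or copied
    assert(len >= 0);
    total += static_cast<std::size_t>(len);
}

// Object-building handler: one new event object plus one string copy
// per callback.  This is the kind of per-event work described above.
void build_event(std::vector<TextEvent>& out, const char* s, int len) {
    assert(len >= 0);
    out.push_back(TextEvent{std::string(s, static_cast<std::size_t>(len))});
}
```

Feeding both handlers the same chunks shows where the extra work goes:
the second one pays for an allocation and a copy each time, whichever
memory manager sits underneath.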

One of the particular reasons for using GC is that most people want
their XML trees with "parent" references in them, which creates cycles
that plain reference counting cannot reclaim.
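
A small C++ sketch of why parent references are awkward without a tracing
collector.  It uses std::shared_ptr as a stand-in for reference counting;
the node types and function names are hypothetical and have nothing to do
with Orchard's actual structures.  With a strong parent pointer the
root/child pair keeps itself alive after every outside reference is gone.
A weak parent pointer avoids that, but only because the programmer broke
the cycle by hand.

```cpp
#include <cassert>
#include <memory>
#include <vector>

static int live_nodes = 0;  // nodes constructed and not yet destroyed

// Tree node whose parent link is a strong (counted) reference.
struct StrongNode {
    std::shared_ptr<StrongNode> parent;
    std::vector<std::shared_ptr<StrongNode>> children;
    StrongNode() { ++live_nodes; }
    ~StrongNode() { --live_nodes; }
};

// Same tree, but the parent link does not contribute to the count.
struct WeakNode {
    std::weak_ptr<WeakNode> parent;
    std::vector<std::shared_ptr<WeakNode>> children;
    WeakNode() { ++live_nodes; }
    ~WeakNode() { --live_nodes; }
};

// Builds root + child, drops all outside references, and returns how
// many of those nodes are still alive afterwards.
template <typename NodeT>
int leaked_after_drop() {
    const int before = live_nodes;
    {
        auto root = std::make_shared<NodeT>();
        auto child = std::make_shared<NodeT>();
        root->children.push_back(child);
        child->parent = root;  // back-reference: root -> child -> root
        assert(root->children.size() == 1);
    }  // root and child go out of scope here
    return live_nodes - before;
}

int leaked_with_strong_parent() { return leaked_after_drop<StrongNode>(); }
int leaked_with_weak_parent() { return leaked_after_drop<WeakNode>(); }
```

In the strong case both nodes are left behind; in the weak case none are.
A tracing collector such as Boehm GC reclaims the unreachable cycle either
way, so the tree code can keep ordinary parent pointers.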

From your earlier postings, you may be interested in creating a Scheme
binding to Orchard[1] (rather than using the C++ interface I mentioned
above).  Orchard implements a very lightweight SAX and DOM that would
work well in Scheme.

  -- Ken

[1] <http://casbah.org/~kmacleod/orchard/>