[gclist] Garbage collection and XML

Ji-Yong D. Chung virtualcyber@erols.com
Mon, 5 Mar 2001 03:55:27 -0500


    Hi,

> [ your results were ...]
>
> EXAMS: a collection of CS examination papers (my DTD)
>
>                     strings +     attrs +  elements =     total
> Without sharing:  3,755,464 +   312,936 + 1,288,364 = 5,356,764
> With sharing:        65,216 +     1,328 +   751,248 =   817,792
> DOM estimate:     5,778,556 + 1,043,120 + 2,460,680 = 9,282,356
> Source exams.xml:                                   = 2,634,021
> DOM/shared = 11.3
>
> SIGMOD: the SigMod Record catalogue
>
>                    strings +   attrs + elements =     total
> Without sharing:   648,824 +  82,548 +  221,100 =   952,472
> With sharing:      131,784 +  16,752 +  217,748 =   366,284
> DOM estimate:    1,189,308 + 275,160 +  332,680 = 1,797,148
> Source SigmodRecord.xml:                        =   360,329
> DOM/shared = 4.9
>
> PLAYS: collected plays of Shakespeare (as found on net)
>
>                     strings + attrs +  elements =      total
> Without sharing: 10,677,460 +     0 + 4,305,996 = 14,983,456
> With sharing:     4,901,656 +     0 + 3,751,320 =  8,652,976
> DOM estimate:    19,214,762 +     0 + 7,187,600 = 26,402,362
> Source plays.xml:                               =  7,648,502
> DOM/shared = 3.1
>
> % DOCBOOK: source of DocBook book, parsed by nsgmls
>
>                     strings +     attrs +  elements =      total
> Without sharing: 30,266,564 +   497,196 + 1,458,096 = 32,221,856
> With sharing:       641,168 +    40,272 + 1,108,608 =  1,790,048
> DOM estimate:    60,857,232 + 1,657,320 + 2,665,160 = 65,179,712
> Source docbook.sgm:                                 =  2,896,326
> DOM/shared = 36.4
>

    Really interesting results.  What you have shown here, though, does not
seem to be that the DOM API is inherently bad for memory use.  Rather, it
seems to show that DOM implementations should use shared strings and
attributes.

    My observations are as follows:

    (1) If you look at the first result and assume that the DOM used shared
strings, it would save nearly 1 meg.  In fact, if the Java DOM used shared
strings, it would be far more memory efficient than the non-shared case.
This holds for the rest of the examples as well.

    (2) I noticed that the DOM's memory use for attributes is bad.  But it
always uses about 3 times as much as the unshared model.  Again, this seems
to show that it is sharing that has the larger effect, rather than the fact
that one is following the DOM API spec.
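
    To make the comparison concrete, here is a rough sketch (mine, not part
of the original measurements) that splits each DOM/shared ratio into two
factors: DOM/unshared, the overhead of the DOM estimate itself, and
unshared/shared, the gain from sharing.  The totals are copied from the
quoted tables; the names TOTALS and split_ratio are made up for illustration.

```python
# Totals in bytes, copied from the quoted tables above.
TOTALS = {
    # name:     (unshared,   shared,    DOM estimate)
    "EXAMS":   (5_356_764, 817_792, 9_282_356),
    "SIGMOD":  (952_472, 366_284, 1_797_148),
    "PLAYS":   (14_983_456, 8_652_976, 26_402_362),
    "DOCBOOK": (32_221_856, 1_790_048, 65_179_712),
}


def split_ratio(unshared, shared, dom):
    """Split DOM/shared into the two factors (DOM/unshared, unshared/shared)."""
    return dom / unshared, unshared / shared


for name, (unshared, shared, dom) in TOTALS.items():
    overhead, sharing = split_ratio(unshared, shared, dom)
    # The product of the two factors is the DOM/shared figure from the tables.
    print(f"{name:8} DOM/unshared = {overhead:5.2f}   "
          f"unshared/shared = {sharing:5.2f}   "
          f"DOM/shared = {overhead * sharing:5.2f}")
```

On these figures the sharing factor is the larger of the two for EXAMS,
SIGMOD and DOCBOOK; for PLAYS the two factors are about the same size.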

    These observations lead to the following conclusion:
While the DOM API is not designed for writing memory-efficient apps,
the biggest problems in the Java DOM stem not from the
API spec, but from the underlying implementation.
Your parser saves memory because it is just well implemented.
The main theme here is: use shared objects as much as possible.
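
    As a minimal sketch of what "shared objects" could mean for strings
(an illustration only, not how your parser or any particular DOM
implementation actually does it): keep one canonical copy of each distinct
string and hand out references to it.  StringPool, share and build_names
are hypothetical names.

```python
class StringPool:
    """Hands back one canonical object for each distinct string value."""

    def __init__(self):
        self._canonical = {}

    def share(self, text):
        # setdefault stores `text` the first time a value is seen and
        # returns the stored object on every later call with an equal value.
        return self._canonical.setdefault(text, text)

    def __len__(self):
        return len(self._canonical)


def build_names(pool, count):
    # Build the name at run time, much as a parser would while reading
    # input, then route every copy through the pool.
    return [pool.share("".join(["SPE", "ECH"])) for _ in range(count)]


pool = StringPool()
names = build_names(pool, 1000)
print(len(names), "references,", len(pool), "stored string")
print(all(n is names[0] for n in names))
```

The same idea is available ready-made as sys.intern in Python and
String.intern() in Java.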

    You might ask me, how do you know that the Java DOM does not use
shared strings?  I think your numbers clearly indicate this -- the DOM's
string memory consumption is always roughly 1.5 to 2 times the amount
for the non-shared case.  That they are proportionate suggests the DOM
is probably using a similar algorithm to the unshared case.  A similar
observation holds for attributes.
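
    The proportions can be checked directly from the quoted figures.  The
sketch below just divides the DOM estimate by the unshared figure for
strings and for attributes; STRINGS, ATTRS, ratio and show are names I made
up for this check.

```python
# (unshared, DOM estimate) in bytes, copied from the quoted tables.
STRINGS = {
    "EXAMS":   (3_755_464, 5_778_556),
    "SIGMOD":  (648_824, 1_189_308),
    "PLAYS":   (10_677_460, 19_214_762),
    "DOCBOOK": (30_266_564, 60_857_232),
}
ATTRS = {
    "EXAMS":   (312_936, 1_043_120),
    "SIGMOD":  (82_548, 275_160),
    "PLAYS":   (0, 0),  # the plays use no attributes
    "DOCBOOK": (497_196, 1_657_320),
}


def ratio(unshared, dom):
    """DOM estimate divided by the unshared figure; None when there is no data."""
    return dom / unshared if unshared else None


def show(value):
    return "  n/a" if value is None else f"{value:5.2f}"


for name in STRINGS:
    print(f"{name:8} strings DOM/unshared = {show(ratio(*STRINGS[name]))}   "
          f"attrs DOM/unshared = {show(ratio(*ATTRS[name]))}")
```

The string ratio runs from about 1.5 (EXAMS) to about 2.0 (DOCBOOK), and the
attribute ratio is about 3.33 in all three data sets that have attributes.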