[gclist] Garbage collection and XML

Ji-Yong D. Chung virtualcyber@erols.com
Thu, 1 Mar 2001 23:45:05 -0500


Hi, you wrote:

> The program puts all of these things in a hash table, so all strings are
> unique.  When I wrote this I was concerned with the effectiveness of the
> hash table, and whether it would be a good idea to do my own mallocking
> out of blocks instead of allocating each string separately. (Yes it was.)

    This is a strong argument in favor of hashtables indeed -- but how
do you determine the dimension of the hashtable?  My guess here would be
that you chose the size based on the XML file size.  I'd think that a
static hashtable with a predetermined size would not work, because file
sizes can vary so much.
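
    Just to make the question concrete, what I have in mind is a table
that simply doubles and rehashes whenever its load factor gets too high,
so the initial dimension hardly matters.  Here is a rough sketch in C --
the names and sizes are mine, not taken from your parser:

/* Rough sketch of a growing string-intern table (illustrative only). */
#include <stdlib.h>
#include <string.h>

typedef struct entry { char *str; struct entry *next; } entry;

typedef struct {
    entry **buckets;
    size_t  nbuckets;     /* current dimension, always a power of two */
    size_t  count;        /* number of distinct strings interned so far */
} intern_table;

void intern_init(intern_table *t)
{
    t->nbuckets = 64;     /* the initial guess hardly matters */
    t->count    = 0;
    t->buckets  = calloc(t->nbuckets, sizeof *t->buckets);
}

static size_t hash_str(const char *s)
{
    size_t h = 5381;                        /* djb2 */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

static void grow(intern_table *t)
{
    size_t newsize = t->nbuckets * 2, i;
    entry **nb = calloc(newsize, sizeof *nb);
    for (i = 0; i < t->nbuckets; i++) {
        entry *e = t->buckets[i], *next;
        for (; e != NULL; e = next) {       /* rehash into the new array */
            size_t b = hash_str(e->str) & (newsize - 1);
            next = e->next;
            e->next = nb[b];
            nb[b] = e;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->nbuckets = newsize;
}

/* Return the unique shared copy of s, inserting it if necessary. */
const char *intern(intern_table *t, const char *s)
{
    size_t b = hash_str(s) & (t->nbuckets - 1);
    entry *e;
    for (e = t->buckets[b]; e != NULL; e = e->next)
        if (strcmp(e->str, s) == 0)
            return e->str;                  /* already present: share it */
    if (t->count + 1 > t->nbuckets) {       /* load factor > 1: double */
        grow(t);
        b = hash_str(s) & (t->nbuckets - 1);
    }
    e = malloc(sizeof *e);
    e->str  = strdup(s);
    e->next = t->buckets[b];
    t->buckets[b] = e;
    t->count++;
    return e->str;
}

    With something like that the "dimension" question mostly goes away:
the table just tracks however many distinct strings the document happens
to contain, whatever its size.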

>     1,496,996 bytes for tree structure
>        75,944 bytes for strings
> -------------------
>     1,572,940 bytes total storage.
> BUT
>     2,634,021 bytes for the XML sources.

    This is what I call efficient use of memory!  Actually, this makes
sense, because XML contains many built-in inefficiencies (such as the
matching tags).  The figures you provide give a pretty good idea of what
to shoot for in designing a good parser.
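
    To make that concrete: every <paragraph>...</paragraph> pair spells
the element name out twice in the source, plus angle brackets and
whitespace, while in memory a node only needs a pointer to the single
interned copy of the name.  A hypothetical node layout (my field names,
not yours) might look like:

/* Hypothetical in-memory element node.  The tag name is not copied,
 * only pointed to, so "paragraph" is stored exactly once no matter how
 * many thousands of times it occurs in the document. */
typedef struct xml_node {
    const char      *name;          /* interned tag name, shared */
    const char      *text;          /* character data, if any */
    struct xml_node *first_child;
    struct xml_node *next_sibling;
} xml_node;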

    One last detail -- from looking at your previous email messages,
I'd guess that you used reference counting, right?  Also, from what
Ken Anderson said, I'd guess that one could obtain similar results
(provided the code is of similar quality) from a parser that uses a
garbage collector.
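
    By reference counting I mean the usual scheme, roughly like the
sketch below (illustrative only, my names).  Since a parse tree is
acyclic, plain counts never leak, and I'd expect a tracing collector to
keep the same nodes alive in comparable space, give or take the
collector's own overhead:

#include <stdlib.h>

/* Minimal reference-counting sketch: a node drops the references it
 * holds (first child, next sibling) when the last reference to it
 * goes away.  No cycles in a parse tree, so counting is enough. */
typedef struct node {
    int          refcount;
    const char  *name;              /* interned tag name, not owned here */
    struct node *first_child;
    struct node *next_sibling;
} node;

node *node_retain(node *n)
{
    if (n) n->refcount++;
    return n;
}

void node_release(node *n)
{
    if (n && --n->refcount == 0) {
        node_release(n->first_child);
        node_release(n->next_sibling);
        free(n);
    }
}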


Take Care,
Ji-Yong D. Chung