[gclist] Garbage collection and XML
Jerrold Leichter
jerrold.leichter@smarts.com
Fri, 2 Mar 2001 18:15:04 -0500 (EST)
| Dynamically resizable hash tables exist in other places, too.
|
| They're discussed in "The Design and Analaysis of Computer Algorithms", Aho,
| Hopcroft, and Ullman, 1974.
|
| They're used in SGI's STL implementation. (See
| http://www.sgi.com/tech/stl/stl_hashtable.h for the implementation, which is
| unfortunately uglified to be namespace-correct.)
|
| They're also used to keep track of things like finalizable objects in our
| collector. (See
| http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/boehm-gc/finalize.c?rev=1.5&conten
| t-type=text/x-cvsweb-markup for the gory details.)
The design in AHU - also used in your GC - is the basic "when the table gets
too big, make a table of double the size and copy everything". (I've looked
at the STL implementation, but I can't find the appropriate code. However, it
looks as if it uses another classical variant: There's a table of primes, so
I'm guessing that the hash value is taken modulo some prime, and when the
total number of elements exceeds some threshold, we move to the next prime,
make that many buckets, and rehash.)
While this approach works in some appropriate senses, it has many problems.
Most immediately noticeable is the cost of doubling when the size gets large.
Growing something like a vector by copying everything is relatively cheap.
Here, you have to rehash everything. That can be quite expensive. Linked-
list buckets significantly increase the space overhead for tables of small
objects. Splitting a very large hash table will require you to walk through
all the linked lists - which will have poor memory locality.
The neat thing about linear hashing - and similar algorithms - is that they
can grow in small increments. (Linear hashing works by splitting one bucket
at a time. When all buckets have been split, you double the number of buckets,
split one, and leave the rest empty for later.) This gives much more uniform
behavior.
-- Jerry