[gclist] GC topics
Kelvin Nilsen
kelvin@iastate.edu
Tue, 20 Feb 1996 11:57:42 -0600
>
> -- does anyone have any practical numbers on how much more or less storage
> GC'ed environments use than non-GC'ed. On the one hand, there's always
> some dead storage not collected yet, but on the other hand there's no
> storage leaks
Though I'm a big proponent of garbage collection, I'll play devil's
advocate for a moment:
To start with, I've done some simulations that compare manual real-time
dynamic memory management heap sizes with automatically garbage collected
heap sizes. These simulations show that, if you spend enough time doing
garbage collection (e.g. more than 50% of total execution time), then it is
possible for an automatic real-time garbage collector to support a given
workload using less memory than would be required by typical manual
techniques for real-time memory management. It is important to emphasize,
however, that I was comparing with real-time manual techniques, which are
much less efficient in terms of memory utilization than most general purpose
memory allocators that are in common use today. These results were discussed
in a recently published IWMM paper, available in your local library as
"Springer-Verlag Lectures Notes in Computer Science" no. 986, or you can
download postscript from my web page, mentioned in the signature below
(follow the real-time garbage collection link).
I do lots of stuff on my Macintosh PowerBook (12 Mbytes of RAM, plus RAM
Doubler). The other day, I purchased a product to convert my FrameMaker
documents to HTML. You wouldn't think this would be a very difficult filter
to write ... But unfortunately, the tool was written in Lisp and required
garbage collection. I had to kill all other activities on my Mac (Normally,
I have FrameMaker, Eudora email, Netscape, Telnet, and Claris Organizer
running simultaneously) and give this program all of the available memory,
and it still ran horribly slowly, crashing repeatedly without even making
its way through the tutorial samples. Apparently, I needed to give it
even more memory! I keep hearing from garbage collection experts that the
days of slow unresponsive garbage collectors are behind us. They continue
to say, for example, that stock hardware is perfectly capable of providing
high-performance real-time garbage collection. Why don't we see these
improved technologies in commercial products? Certainly, you would think
that a company like Harlequin would know how to build an efficient Lisp
run-time system for Macintosh computers...
In "Memory Allocation Costs in Large C and C++ Programs" (SPE 24(6)),
Detlefs, Dosser, and Zorn observe that conservative garbage collection
required factors of 4.6 to 11.1 more CPU time than manual memory
management techniques. For these same workloads, they found that the
garbage collected heaps were 1.44 to 2.06 times as large as the
manually managed heaps. Perhaps the authors are on this list. Have I
properly interpreted your reported results? Does anyone have results that
contradict these?
So, why is garbage collection less efficient than manual techniques?
1. Storage leaks from conservative pointer scanning?
I don't think this is a major source of overhead in typical programs,
though on rare occasions it can have very ugly consequences.
2. Conservative scanning of data that could be shown not to contain pointers?
Some people argue that conservative garbage collection does not fairly
represent the performance potential of garbage collection because it
must waste time scanning memory that an accurate garbage collector
would know contains no pointers. I don't think this is a significant
factor. First, the Boehm garbage collector lets programmers mark
particular objects as pointer-free, so it is really only partially
conservative (see the short sketch following this list). Second,
accurate garbage collectors often pay a much higher tag-maintenance
overhead. The ability to ignore run-time type tags is a big win for
the conservative collectors.
3. Algorithmic complexity?
Face it: having to periodically march through all of memory in search
of dead objects is a time-consuming process. Setting aside software
engineering issues, and setting aside the difficulty of reliably
recognizing which objects are dead in the absence of an automatic
garbage collector, manual reclamation of memory is orders of magnitude
more efficient than automatic garbage collection. Sure, we've seen
the complexity arguments in support of garbage collection, but these
arguments rely on unrealistic economies (see the rough arithmetic
following this list). Who is willing to buy a gigabyte of memory in
order to "efficiently" support an application that requires only
32 Mbytes of live memory at a time?
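To make the "partially conservative" remark under point 2 concrete, here
is a minimal sketch, assuming the Boehm collector's usual gc.h interface
(GC_MALLOC for blocks that may contain pointers, GC_MALLOC_ATOMIC for
blocks the collector should never scan):

    #include <gc.h>

    int main(void)
    {
        GC_INIT();

        /* may contain pointers: the collector scans this block */
        void **table = GC_MALLOC(1024 * sizeof(void *));

        /* known to be pointer-free: allocated "atomically", never scanned */
        double *samples = GC_MALLOC_ATOMIC(100000 * sizeof(double));

        (void) table;
        (void) samples;
        return 0;   /* no free() anywhere; the collector reclaims both */
    }

Large pointer-free buffers (strings, bitmaps, numeric arrays) allocated
this way cost the collector nothing to scan, which is where most of the
supposed conservative-scanning penalty would otherwise come from.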
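And to put some rough numbers behind point 3 (a back-of-the-envelope for
a simple copying collector, not a claim about any particular
implementation): each collection does work proportional to the amount of
live data L and recovers about H - L bytes, where H is the heap size, so
the amortized collection cost per allocated byte is on the order of
L / (H - L). With L = 32 Mbytes of live data, a 64 Mbyte heap costs
32/32 = 1 unit of work per byte allocated, while a 1 Gbyte heap costs
32/992, roughly thirty times less. The asymptotic arguments hold, but
only when the heap is many times larger than the live data.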
That said (and I'd be glad to be proven wrong), I think we need to accept
the fact that straight efficiency comparisons between manual and automatic
memory management techniques on the same exact workload are not the best
way to promote the benefits of garbage collection. Such comparisons ignore
considerations such as:
1. A program that uses garbage collection can be developed and maintained
with far less human resource overhead.
2. A program that does not rely on garbage collection is typically written
in a style that imposes considerable run-time overhead in
programmer-written code, all for the sake of determining which objects
need to be reclaimed and when. This overhead typically consists
of run-time checks (extra code) and excessive object copying (extra data).
When such a program is simply rerun on top of a garbage collector, that
hand-written bookkeeping is still there, so a fair comparison between a
garbage collected system and a manually managed system should somehow
factor these overhead costs out of the resource needs measured for the
garbage collected system.
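As a small, hypothetical illustration of point 2, consider a routine that
hands a string to its caller. Written for manual memory management it
typically returns a private copy so that ownership stays unambiguous;
written for a garbage collector it can simply share the data:

    #include <stdlib.h>
    #include <string.h>

    /* Manual style: the callee returns a private copy so that ownership
     * is unambiguous; the caller must remember to free() it later. */
    char *get_name_manual(const char *shared)
    {
        char *copy = malloc(strlen(shared) + 1);
        if (copy != NULL)
            strcpy(copy, shared);
        return copy;
    }

    /* Garbage-collected style: just share the data; no copy, no free(),
     * no ownership bookkeeping in the caller. */
    const char *get_name_gc(const char *shared)
    {
        return shared;
    }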
__
Kelvin Nilsen, Research Scientist voice: (515) 294-5143
151 ASC II, CATD fax: (515) 294-9519
Iowa State University internet: kelvin@iastate.edu
Ames, IA 50011 http://kickapoo.catd.iastate.edu/index.html