[gclist] why malloc/free instead of GC?

David F. Bacon dfb@watson.ibm.com
Tue, 18 Feb 2003 13:19:46 -0500

> By "roundtrip" I meant "malloc+free" or "malloc+<my share of GC>".


> I should really have said "tracing GC" in the following, but that was the
> topic of discussion, I think.  Fundamentally, a tracing collector needs to
> do the same amount of tracing work whether a client allocates, say, 10
> objects containing 100 bytes each, or a single 1000-byte object.  Large
> objects cost proportionately more tracing work.  This is not true for
> malloc/free allocation, where the 1000-byte allocation+deallocation often
> doesn't cost more than a single 100-byte allocation.  (A pure reference
> count collector without cycle detection behaves more or less like
> malloc/free here.)

you're assuming the collector has to scan the whole object to find the
pointers, right?  for a type-accurate collector, the tracing work is
proportional to the pointer density of the program times the memory size,
which is usually much smaller.

> Of course, this almost never results in an asymptotic difference in the
> running time of the client program, since initializing the object will cost
> time proportional to its size anyway.  And most programs tend to initialize
> at least a constant fraction of the objects they allocate.  However, the
> posted "unrealistic and simplistic" test program which started this
> discussion did not.  And in my experience, with normal collector tuning, the
> initialization time is usually a small constant factor (e.g. 2-10?) less
> than the object round trip time.  Thus it does seem to matter in real life.
> This becomes even more true for a fully conservative collector for C,
> which really has to initialize objects itself, in order to avoid preserving
> stale pointers.  In that case the allocation time includes initialization
> time.  (In real life, I doubt this makes a huge difference, since the
> initialization time tends to be dominated by cache miss time.  If the client
> initializes the object later, as it normally would, it thus avoids the cache
> miss time.  But the time cost has effectively been moved from the client
> into the allocator.)

i keep thinking that we should be able to fix this problem, at least for
objects larger than a cache line, by using the "cache line clear" operations
that now exist in many cpus.  has anyone explored this?
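a portable sketch of the structure such an allocator-side clear might have; the per-line memset is a stand-in for an instruction like PowerPC's dcbz (or ARMv8's DC ZVA), which zeroes a cache line without first fetching it from memory, and the 64-byte line size is an assumption:

```c
/* Sketch only: clear an object line by line.  memset per line is a
   portable stand-in; on PowerPC the whole-line case could instead use
   dcbz, which zeroes a line without reading it, avoiding the miss. */
#include <string.h>
#include <stdint.h>

#define LINE 64u  /* assumed cache-line size */

void clear_object(void *p, size_t n) {
    char *c = p;
    uintptr_t head = (uintptr_t)c % LINE;
    if (head) {                       /* partial leading line */
        size_t k = LINE - head;
        if (k > n) k = n;
        memset(c, 0, k);
        c += k; n -= k;
    }
    while (n >= LINE) {               /* whole lines: dcbz would go here */
        memset(c, 0, LINE);
        c += LINE; n -= LINE;
    }
    if (n) memset(c, 0, n);           /* partial trailing line */
}
```

only the aligned whole-line middle section can use the cache-line-clear operation; the unaligned head and tail still need ordinary stores, which is why the trick mainly helps objects larger than a line.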

> As a result of both of these effects, I usually recommend that with our
> collector and C code, users at least consider explicitly managing very large
> objects.  Fortunately, in most cases I've heard about, this tends to be
> fairly easy.  Often the large objects tend to be things like IO buffers with
> well-defined lifetimes that are in fact easy to manage explicitly.  GC pays
> off for complex linked structures which tend to be composed of small
> objects.
>
> I think the same advice applies to Java, although it's no doubt
> politically incorrect there.  Keeping explicit pools for large,
> easily-managed objects will mostly get the GC out of the picture once the
> pool is sufficiently large.  You pay a bit of space for type-safety, in that
> you can't reuse a given large object for a different type.  (If the large
> objects contain pointers, the GC also still needs to trace them.  But large
> objects seem to often be pointer-free, e.g. bitmaps.)  But I would guess
> that so long as you use this technique only in the few cases where it's
> really needed, that's not a large cost.

object pooling is common in java systems.  the problem is that it brings
back all of the headaches of malloc/free.  i don't know about it being
"politically incorrect".  there is absolutely no reason why large objects
should be inefficient under gc in java.  but if you create some state
information, and the creation operation is expensive, then object pooling is
an attractive and useful feature regardless of the physical size of the
object.
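the kind of explicit pool the quoted advice describes, sketched in C for large fixed-size objects such as IO buffers (the names and the 64 KB size are hypothetical):

```c
/* Sketch of an explicit free-list pool for large fixed-size buffers. */
#include <stdlib.h>

#define BUF_SIZE (64 * 1024)

typedef union Buf {
    union Buf *next;        /* link while on the free list */
    char data[BUF_SIZE];    /* payload while in use */
} Buf;

static Buf *free_list = NULL;

Buf *buf_get(void) {
    if (free_list) {            /* reuse: no allocator, no GC work */
        Buf *b = free_list;
        free_list = b->next;
        return b;
    }
    return malloc(sizeof(Buf)); /* pool grows on demand */
}

void buf_put(Buf *b) {          /* caller promises the buffer is dead */
    b->next = free_list;
    free_list = b;
}
```

once the pool is warm, recycled buffers never reach malloc or the collector on the hot path — but buf_put is exactly the explicit-free obligation that brings back the malloc/free headaches mentioned above, and a Buf can't be reused as a different type.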