[gclist] why malloc/free instead of GC?

Boehm, Hans hans_boehm@hp.com
Tue, 18 Feb 2003 11:33:14 -0800

> From: David F. Bacon [mailto:dfb@watson.ibm.com]
>
> > I should really have said "tracing GC" in the following, but that was
> > the topic of discussion, I think.  Fundamentally, a tracing collector
> > needs to do the same amount of tracing work whether a client
> > allocates, say, 10 objects containing 100 bytes each, or a single
> > 1000 byte object.  Large objects cost proportionately more tracing
> > work.  This is not true for malloc/free allocation, where the 1000
> > byte allocation+deallocation often doesn't cost more than a single
> > 100 byte allocation.  (A pure reference count collector without cycle
> > detection behaves more or less like malloc/free here.)
>
> you're assuming the collector has to scan the whole object to find the
> pointers, right?  for a type-accurate collector, the tracing work is
> proportional to the pointer density of the program times the memory
> size, which is usually much smaller.
Even for the gcj collector, that's pretty much true.

But I think it matters only in that it makes the precise argument harder.  Assume you have a nongenerational collector, 20 MB of static 20% pointer-density live data in a 40MB heap, and you start repeatedly allocating and immediately dropping 1MB pointer-free objects.  You will still have to trace 4MB of pointers every 20 allocations.  Thus they will still be far more expensive than allocating cons cells.  (It's not hard to doctor this example to deal with a generational collector, though you probably need to allocate some pointers in that case.)

> > This becomes even more true for a fully conservative collector for C,
> > which really has to initialize objects itself, in order to avoid
> > preserving stale pointers.  In that case the allocation time includes
> > initialization time.  (In real life, I doubt this makes a huge
> > difference, since the initialization time tends to be dominated by
> > cache miss time.  If the client initializes the object later, as it
> > normally would, it thus avoids the cache miss time.  But the time
> > cost has effectively been moved from the client into the allocator.)
>
> i keep thinking that we should be able to fix this problem, at least
> for objects larger than a cache line, by using the "cache line clear"
> operations that now exist in many cpus.  has anyone explored this?
In the case of our collector, it would clearly help in the not infrequent case of building a free list in an empty page.  The initial write in that case is with bzero or memset, which probably already take advantage of any such possibility.  In other cases, the limiting factor seems to be the fact that you don't want to introduce model dependencies by hard-coding the line size, etc.  I haven't experimented with it much, since I think none of the Intel architectures currently provide something along these lines.
> object pooling is common in java systems.  the problem is that it
> brings back all of the headaches of malloc/free.
The question in my mind is whether you can confine it to one or two large object types.  My guess is that usually you can (and should).

I agree that widespread object pooling is generally a bad idea.  Its external fragmentation cost is usually too high in large systems, in addition to the malloc/free problems.