[gclist] why malloc/free instead of GC?

Boehm, Hans hans_boehm@hp.com
Tue, 18 Feb 2003 09:49:12 -0800


By "roundtrip" I meant "malloc+free" or "malloc+<my share of GC>".

I should really have said "tracing GC" in the following, but that was the topic of discussion, I think.  Fundamentally, a tracing collector needs to do the same amount of tracing work whether a client allocates, say, 10 objects containing 100 bytes each, or a single 1000 byte objects.  Large objects cost proportionately more tracing work.  This is not true for malloc/free allocation, where the 1000 byte allocation+deallocation often doesn't cost more than a single 100 byte allocation.  (A pure reference count collector without cycle detection behaves more or less like malloc/free here.)

Of course, this almost never results in an asymptotic difference in the running time of the client program, since initializing the object will cost time proportional to its size anyway.  And most programs tend to initialize at least a constant fraction of the objects they allocate.  However, the posted "unrealistic and simplistic" test program which started this discussion did not.  And in my experience, with normal collector tuning, the initialization time is usually a small constant factor (e.g. 2-10?) less than the object round trip time.  Thus it does seem to matter in real life.

This becomes even more true for a fully conservative collector for C, which really has to initialize objects itself, in order to avoid preserving stale pointers.  In that case the allocation time includes initialization time.  (In real life, I doubt this makes a huge difference, since the initialization time tends to be dominated by cache miss time.  If the client initializes the object later, as it normally would, it thus avoids the cache miss time.  But the time cost has effectively been moved from the client into the allocator.)

A conservative GC for C usually worsens matters as follows:

- If it needs to accommodate existing libraries or compilers, it will probably have to recognize "interior pointers", at least of they're stored on the stack or in registers.  (This could be avoided with some compiler cooperation, which is really needed anyway, but is rarely implemented, since its induced failure rate tends to be less than that of other compiler bugs.)

- This means that for any known non-pointer N on the stack, we can't safely allocate a large array A such that N is an address within A.

- As the number of such nonpointers and/or the size of A increases, eventually we get to the point at which we can't find room in the right section of the address space to safely allocate A.

Empirically, this is generally a non-issue on 64-bit hardware.  On 32-bit hardware, with interior pointers recognized everywhere (the default for our collector with C code), and an otherwise favorable application, allocations larger than about 100K seem to be problematic.  With interior pointer recognition only on the stack (default for gcj, for example), the threshold seems to be about a MB.

As a result of both of these effects, I usually recommend that with our collector and C code, users at least consider explicitly managing very large objects.  Fortunately, in most cases I've heard about, this tends to be fairly easy.  Often the large objects tend to be things like IO buffers with well-defined lifetimes that are in fact easy to manage explicitly.  GC pays off for complex linked structures which tend to be composed of small objects.

I think the same advice applies to Java, although it's no doubt politically incorrect there.  Keeping explicit pools for large, easily-managed objects will mostly get the GC out of the picture once the pool is sufficiently large.  You pay a bit of space for type-safety, in that you can't reuse a given large object for a different type.  (If the large objects contain pointers, the GC also still needs to trace them.  But large objects seem to often be pointer-free, e.g. bitmaps.)  But I would guess that so long as you use this technique only in the few cases where it's really needed, that's not a large cost.

Hans

> -----Original Message-----
> From: David F. Bacon [mailto:dfb@watson.ibm.com]
> Sent: Tuesday, February 18, 2003 5:57 AM
> To: Boehm, Hans; 'Greg Hudson '; 'Basile STARYNKEVITCH '
> Cc: gclist@iecc.com
> Subject: Re: [gclist] why malloc/free instead of GC?
> 
> 
> hans,
> 
> by "roundtrip" do you mean malloc+free?  i don't understand 
> your statement
> about proportionality to object size in GC.  also, why do conservative
> collectors dislike large objects?  is it because a floating 
> point number
> could cause a dead large object to be retained?
> 
> david
> ----- Original Message -----
> From: "Boehm, Hans" <hans_boehm@hp.com>
> To: "'Greg Hudson '" <ghudson@MIT.EDU>; "'Basile STARYNKEVITCH '"
> <basile@starynkevitch.net>
> Cc: <gclist@iecc.com>
> Sent: Monday, February 17, 2003 11:24 PM
> Subject: Re: [gclist] why malloc/free instead of GC?
> 
> 
> > I would add:
> >
> > GC object roundtrip times are pretty much unavoidably 
> proportional to the
> object size, where malloc + free times can be nearly constant.  If you
> allocate primarily large objects, malloc+free will be cheaper.  (For
> sufficiently small objects, it usually isn't, at least based on my
> measurements.  Conservative collectors like large objects even less.)
> 
> 
>