GC performance - was Re: [gclist] GC topics

Darius Blasband darius@phidani.be
Fri, 23 Feb 1996 09:43:02 +0100 (MET)


> > Well, I guess I must be dumb, or misinformed, so please give me advice.
> 
> Misinformed is my guess :-).
> 
Well, obviously, reading what came next, I was not that misinformed.
The basic idea seems to be "let's look for whatever looks like a pointer"
and consider it as a pointer. 

> 
> > It maybe is easy to write, but how easy would it be to optimize ?  What 
> > happens with portability ? If I am wrong regarding what such GC systems 
> > are, please let me know.
> 
> Depends on what you mean by "optimize" and "portability".  Remember 
> that I'm also worried about the performance of the entire system, 
> and conservative GC is the most friendly GC for a very-optimizing 
> compiler (there's still possible problems, but they're most reduced 
> for a conservative GC).  Or, to be more concrete, let's suppose that 
> I have an application, using precise GC, and GC takes 5% of the total 
> time.  By switching to conservative GC, let's say I double the time 
> spent in GC, but I get to use an off-the-shelf compiler/optimizer 
> for the rest of the code that is 10% faster than what I was using.  
> Elapsed time goes from 95+5 = 100 seconds to 85.5 + 10 = 95.5 seconds, 
> or a 4.5% speedup.  For throughput-constrained applications, the 
> conservative GC is a clear winner. 
As I told you earlier, I do not have this limitation: I can use the most
optimizing C compiler with YAFL, and then again, I believe this is not the
proper way of answering the question: apparently, even you seem to admit
that conservative GC *is* slower than precise GC, and given the liberty 
to implement any of thoses systems in a brand new compiler, language
or environment (so you optimizing compiler issue does not apply),
precise GC has clear advantages.

> I tried to group the operations into independent clumps, and singled 
> out branches and loads as potentially expensive. Obviously, this 
> is the sort of thing that you'd like to inline, but I wrote it as 
> a subroutine for ease of presentation. My guess is that it could 
> dismiss an obvious non-pointer in 3 cycles.  Actual pointers, to 
> freed or marked objects, would be dismised in about 8 cycles.  Pointers 
> that were marked but not scanned, 9 cycles, and something like 12 
> cycles for a pointer to an allocated, unmarked, scannable object. 
> 
> The code could be further improved, though substantially complicated, 
> by scanning two or four pointers at a time. The idea is to generate 
> latency between load operations and uses of the operands, so that 
> the CPU has something to do while waiting for the load to complete 
> (modern chips do this to some extent, and they'll only do it more 
> in the future). A cache miss loading the "info" or the mark byte 
> could easily double the cost of this code, so it is not a bad idea 
> to attempt to cover the cost of a cache miss.
> 
I was very impressed by the code you provided, but it is nothing close
to what I would even remotely consider as portable, not that it would not
work, but on some architectures, the impact on performance would be
negligible, if not negative.

Really, I thought there was something essential about conservative GC
which I did not get, but it does not seem to be the case after what
I read. I do believe it is not a generally good solution, except
for cases where no other scheme can be used (C and C++). I think it
is a mistake to confuse between "appropriate solution because C and C++
are poorly designed language from the memory management point of view",
and "appropriate solution because of superior behaviours, performance
and predictability".

Darius