GC performance - was Re: [gclist] GC topics
Darius Blasband
darius@phidani.be
Fri, 23 Feb 1996 09:43:02 +0100 (MET)
> > Well, I guess I must be dumb, or misinformed, so please give me advice.
>
> Misinformed is my guess :-).
>
Well, obviously, reading what came next, I was not that misinformed.
The basic idea seems to be "let's look for whatever looks like a pointer"
and consider it as a pointer.
>
> > It maybe is easy to write, but how easy would it be to optimize ? What
> > happens with portability ? If I am wrong regarding what such GC systems
> > are, please let me know.
>
> Depends on what you mean by "optimize" and "portability". Remember
> that I'm also worried about the performance of the entire system,
> and conservative GC is the most friendly GC for a very-optimizing
> compiler (there's still possible problems, but they're most reduced
> for a conservative GC). Or, to be more concrete, let's suppose that
> I have an application, using precise GC, and GC takes 5% of the total
> time. By switching to conservative GC, let's say I double the time
> spent in GC, but I get to use an off-the-shelf compiler/optimizer
> for the rest of the code that is 10% faster than what I was using.
> Elapsed time goes from 95+5 = 100 seconds to 85.5 + 10 = 95.5 seconds,
> or a 4.5% speedup. For throughput-constrained applications, the
> conservative GC is a clear winner.
As I told you earlier, I do not have this limitation: I can use the most
optimizing C compiler with YAFL, and then again, I believe this is not the
proper way of answering the question: apparently, even you seem to admit
that conservative GC *is* slower than precise GC, and given the liberty
to implement any of thoses systems in a brand new compiler, language
or environment (so you optimizing compiler issue does not apply),
precise GC has clear advantages.
> I tried to group the operations into independent clumps, and singled
> out branches and loads as potentially expensive. Obviously, this
> is the sort of thing that you'd like to inline, but I wrote it as
> a subroutine for ease of presentation. My guess is that it could
> dismiss an obvious non-pointer in 3 cycles. Actual pointers, to
> freed or marked objects, would be dismised in about 8 cycles. Pointers
> that were marked but not scanned, 9 cycles, and something like 12
> cycles for a pointer to an allocated, unmarked, scannable object.
>
> The code could be further improved, though substantially complicated,
> by scanning two or four pointers at a time. The idea is to generate
> latency between load operations and uses of the operands, so that
> the CPU has something to do while waiting for the load to complete
> (modern chips do this to some extent, and they'll only do it more
> in the future). A cache miss loading the "info" or the mark byte
> could easily double the cost of this code, so it is not a bad idea
> to attempt to cover the cost of a cache miss.
>
I was very impressed by the code you provided, but it is nothing close
to what I would even remotely consider as portable, not that it would not
work, but on some architectures, the impact on performance would be
negligible, if not negative.
Really, I thought there was something essential about conservative GC
which I did not get, but it does not seem to be the case after what
I read. I do believe it is not a generally good solution, except
for cases where no other scheme can be used (C and C++). I think it
is a mistake to confuse between "appropriate solution because C and C++
are poorly designed language from the memory management point of view",
and "appropriate solution because of superior behaviours, performance
and predictability".
Darius