[gclist] GC & profiling

Nick Barnes Nick.Barnes@pobox.com
Sun, 22 Apr 2001 20:00:27 +0000


At 2001-04-22 14:11:54+0000, Fergus Henderson writes:
> On 19-Apr-2001, eliot@parcplace.com <eliot@parcplace.com> wrote:
> > Fergus Henderson <fjh@cs.mu.oz.au>:
> > | For accurate profiling of execution time, GC can cause some problems,
> > | because GCs can occur at semi-random times, but the blame for time spent
> > | in a GC should really be spread among all the routines that allocated
> > | the memory, not on the routine that happened to do the final memory
> > | allocation which triggered that GC.  If you don't treat GC specially,
> > | then some innocuous routine can be unfairly blamed for a large proportion
> > | of the execution time simply because that routine happened to be the
> > | one that triggered a collection.
> > 
> > | On the other hand, costs of ordinary memory allocation should indeed be
> > | credited to the calling routine.
> > 
> > I'm not sure if this is the best way iof solving it, but it is an alternative
> > approach.  The VisualWorks Smalltalk profiling facilities provide both a time
> > profiler and an allocation profiler.  The time profiler doesn't include
> > explicit GC time.  Instead it is simply a statistical sampler of Smalltalk
> > methods, so GC time is hidden.
> 
> I'm not sure what you mean by that.
> Do you mean that
>       - if a profiling timer tick occurs during GC, that tick is discarded?
>       - if a profiling timer tick occurs during GC, that tick is credited to
>         the currently active Smalltalk method?
>         (This is what happens in the current Mercury profiler.)
>       - profiling ticks can never occur during GC (e.g. because the
>         profiler actually counts number of bytecode instructions executed,
>         rather than time)?
>       - something else?

Another data point, with some discussion:

The only time I've implemented a profiler was for MLWorks; my remarks
are a few years out of date but I don't imagine that this aspect has
changed much.

There was a time profiler and a heap profiler; they could be turned on
and off independently.  Neither required recompilation.  The time
profiler was statistical, and took stack histories to a configurable
depth.  The heap profiler measured all allocation and also survival
amounts to each generation.  The time overhead of the heap profiler
was about a factor of 2 on total execution time.  The GC was a
generational copying collector, triggered by the nursery filling, and
was not incremental; GC nursery pauses were mostly around 1ms.  GC
overhead for real programs varied between 5 and 25%, depending on the
code.  Allocation was exceedingly cheap.

If a time profiler tick happened during collection of generation N, it
was recorded as a GC tick for generation N and otherwise ignored (the
time profile report included the number of GCs and the number of GC
ticks for each generation).  There was no attempt to attribute this
time to the allocating functions, but if one saw a large number of GC
ticks in a time profile, one could easily take a heap profile and
approximate this by hand ("wow, we're doing loads of collection;
what's allocating?").

The reason we avoided doing this automatically is that we didn't have
a sound theory of how this time should be assigned.  For instance,
suppose the mutator consists of two functions, A and B.  A allocates a
large amount of very short-lived data.  B allocates a small amount of
very long-lived data.  All the GC time is spent copying B's data, but
the GC is triggered by A's allocation.  To which function should the
GC time be attributed?  It may be possible to reduce the GC time by
reducing:

(a) A's allocation rate (reducing the frequency of collection, and
therefore the number of times that the overhead of performing a
collection, e.g. root scanning, is incurred); or

(b) B's allocation rate (reducing the amount of data which survives
collection, and therefore the proportional part of the total GC time);
or

(c) the survival of B's allocated data (see (b)).

Better for the profiler to report the facts which it can readily
measure and allow the user to apply some intelligence than the
profiler to second-guess this kind of complexity.  I suppose the time
profiler could have done a little better by reporting ticks during the
to-space scan separately (distinguishing somewhat between (a) and
(b)).

Nick Barnes
Ravenbrook Limited