[gclist] Program behaviour

Marc Shapiro shapiro@prof.inria.fr
Wed, 2 Oct 1996 11:06:20 +0200


A student of mine, Cedric Adjih, has done an analysis of program behaviour.
He studied a number of pretty standard benchmark programs (most of them from
Zorn's studies) to look at allocation and garbage patterns, and possible
clustering strategies.  The method used is to log all allocations in a trace
file (containing the address and size of allocations) and to periodically
inspect the program's memory using a modified version of Boehm's collector,
which places a checkpoint of the object graph (including garbage objects) in
the log file.  An analysis program inspects the log post-mortem.

Some general conclusions:

 - Applications allocate sizable amounts of cycles, including cycles of
   garbage

 - Many programs have constantly increasing memory requirements

A detailed analysis of references yields the following information:

 - A pointer tends to point to an object allocated near the same time (if A
   points to B, then A and B have been allocated at nearby dates).

 - Some applications tend to have forward pointers (to objects allocated later
   in time), and others tend to have backward ones.

 - The length of a pointer (the difference between the address of object
   containing the pointer and the address of the object pointed to) tends to
   be close to 0.

 - In-degrees are very close to 1 on average.  Out-degrees are typically
   either 0 (the object is pointed directly from a root) or 1.

A more detailed analysis of cycles yields the following information:

 - Cycles are generally small in size (considering either the number of
   objects or the number of bytes in a cycle).

 - However many programs allocate a small number of large cycles (>1000
   objects).

 - Most cycles only contain objects allocated about the same time (allocated
   within ~150 allocations of each other); however Boehm's allocator breaks
   this locality by dispersing objects in memory (up to 500K bytes distance
   between objects in a cycle).

There are a few other results that I can't really explain.  For instance every
program seems to tend change allocation behaviour at or near the magic number
of 2**20 bytes.  I suspect this is linked to some internal constant of Boehm's
allocator or collector.