[gclist] Representation of pointer set

Andrew Shi-hwa Chen chenandr@cse.msu.edu
Thu, 06 Jul 2000 10:23:01 -0400


Thank you for your insightful post.

Can anyone direct me to a serious study that shows that "most" 
programs won't require a large pointer set? Ideally one that spans a 
broad range of languages and types of programs.

Actually, while I'm at it, the same goes for other things - that most 
reference counts are one, that most objects are small, that cycles 
are rare, and so on. I've only seen language specific treatment of 
these issues, and usually only on a handful of programs.

Ideally it would be nice to have a statistical clarification of what 
"most" means - it would be nice to be able to speak about the 
pathological cases as being so-many sigmas (standard deviations) away 
from the mean and stuff like that. This way would could aim for six 
sigma garbage collection and stuff like that.

Thanks,
Andrew Chen

> During the mark phase, a GC will typically keep a set of pointers to
> objects that have been marked but their children have not yet been
> marked.  In the worst case, this set can grow very large, but that seems
> unlikely.
> 
> Representing the set as a linked list seems inefficient.  An array seems
> like the right choice.
> 
> The literature is full of techniques for dealing with overflow of this
> set.  They seem annoyingly complicated to me.
> 
> My question is: Why is this necessary, on a machine with a large virtual
> address space?  Why not just allocate an enormous array in *virtual*
> memory (1/4 of the address space?)?  Then the array can't overflow.  In
> most cases, the array will use a reasonably small amount of *physical*
> memory, so it will be efficient.  If somebody creates an extremely long
> linked list, then the array might actually eat up large amounts of
> physical memory, and therefore cause slowness due to paging.  But it
> seems tolerable to have slowness in unlikely cases.  Pretty much any
> modern GC has *that* property -- show me an incremental or generational
> GC, and I can construct a pathological case that will kill its
> performance.
> 
> This all assumes a reasonable operating system, which commits physical
> memory pages only when needed.
> 
> - Bob