[gclist] Representation of pointer set
Thu, 06 Jul 2000 09:46:57 -0500
At 10:23 AM 7/6/00 -0400, you wrote:
>Thank you for your insightful post.
>Can anyone direct me to a serious study that shows that "most"
>programs won't require a large pointer set? Ideally one that spans a
>broad range of languages and types of programs.
What he is saying is that if you keep a mark stack it won't grow very deep.
Since most collectors keep a mark stack and don't crash there should be
easy statistics on that.
>Actually, while I'm at it, the same goes for other things - that most
>reference counts are one, that most objects are small, that cycles
>are rare, and so on. I've only seen language specific treatment of
>these issues, and usually only on a handful of programs.
How common cycles are depends on the data structures represented. Double
linked lists are common except in pure functional languages where you can't
>Ideally it would be nice to have a statistical clarification of what
>"most" means - it would be nice to be able to speak about the
>pathological cases as being so-many sigmas (standard deviations) away
>from the mean and stuff like that. This way would could aim for six
>sigma garbage collection and stuff like that.
>> During the mark phase, a GC will typically keep a set of pointers to
>> objects that have been marked but their children have not yet been
>> marked. In the worst case, this set can grow very large, but that seems
>> Representing the set as a linked list seems inefficient. An array seems
>> like the right choice.
>> The literature is full of techniques for dealing with overflow of this
>> set. They seem annoyingly complicated to me.
>> My question is: Why is this necessary, on a machine with a large virtual
>> address space? Why not just allocate an enormous array in *virtual*
>> memory (1/4 of the address space?)? Then the array can't overflow. In
>> most cases, the array will use a reasonably small amount of *physical*
>> memory, so it will be efficient. If somebody creates an extremely long
>> linked list, then the array might actually eat up large amounts of
>> physical memory, and therefore cause slowness due to paging. But it
>> seems tolerable to have slowness in unlikely cases. Pretty much any
>> modern GC has *that* property -- show me an incremental or generational
>> GC, and I can construct a pathological case that will kill its
>> This all assumes a reasonable operating system, which commits physical
>> memory pages only when needed.
>> - Bob