[gclist] gc interface

Geert Bosch geert@sun3.iaf.nl
Wed, 03 Apr 96 01:17:06 GMT


On Tue, 2 Apr 1996 11:08:29 PST you wrote:

>Could you clarify how you handle cycles between storage pools?  

There's a dispatching Mark procedure, that takes an address as
parameter, looks up the corresponding storage pool and invokes
the Mark method of that storage pool. It's up to that storage
pool to decide how to implement the marking. The storage pool
can recursively mark all referenced pointers if the object was
not already marked, or it can just set the mark bit or do 
something else.

When the mark phase starts, first all storage pools are notified
of that (so they can do some bookkeeping if necessary) and then
all pointers in the root set are being marked. Then all storage
pool receive a scan message, so they can do additional marking
or, for a distributed storage pool for example, take into account external
pointers. Or, when they did not recursively process
the mark messages, they can complete that. 

>Can you guarantee mark time linear in the size of the heap?

Yes, the time to mark a new object is constant, although in the 
purely dispatching model as implemented now, the constant is
much higher than when using a single storage pool. My guess is
that in practise most pointers will point from a ``small object
pool'' to another ``small object pool'' (most likely the same),
so optimizing those cases where dispatching is not necessary
might improve performance a lot.
  
>It seems to me that it's hard to avoid traversing the reference 
>counted storage pool repeatedly, unless you also maintain mark 
>bits for reference counted objects.

I do not have implemented reference counted storage pools yet,
but I think it's possible to do. 

Of course it is not possible to allow programs to safely 
take the address of reference-counted objects: that would
not be detectable and when the reference count would be 0
the object would be removed while there where still pointers
to it.

So the reference counted objects will have ``pointers'' which 
are of something called a controlled type in Ada-terminology. 
This means that any object of the type has three procedures 
Initialize, Adjust and Finalize. Initialize and Finalize are 
invoked on instantiation and descruction of the object, and 
Adjust after the object has been overwritten by an assignment.

This is enough to do precise reference counting, even if 
``pointers'' are stored outside a reference-counted pool.
Having mark bits for reference-counted objects would still
be a good idea, to collect cyclic garbage. When an object 
in the reference-counted pool was not reached by a mark 
message, it is cyclic garbage. 

>Minor point: It sounds like you need to access the 
>global pool's type tags every time you mark an object, 
>and the tags are at a fixed offset within a page.  
>That's likely to significantly increase the number of 
>cache misses [...] (I think I can claim we were among
>the first to make this mistake.)

There were several other reasons why I didn't like to do that.
Most importantly, during the mark phase the type tags are
accessed frequently and reading one type tag of a few bits
will mean that the entire VM page of 4 kB or even bigger
needs to be in memory. This could easily cause trashing.
Another reason is that I don't want to waste a few bits 
in each VM page. It would be hard to allocate large objects
in that case and in 4 KB you can put 4 1024 byte objects, but
only in 4 kB - 4 byte, only 3 1024 byte objects fit.

So, there is a static table with 32-bits type info per 
VM page. Of these 32 bits, 6 bits are for the pool-ID and
26 bits can be used by the pool itself (to point to some
meta data for example).

Although the table might be quite large (4MB) for 4 GB 
of memory this is not such a problem: the table is 
sparsely allocated. On writes the VM page is committed 
and initialized if necessary and on reads a default value 
is returned.

For systems that don't support the necessary VM tricks,
there is also an implementation that uses an extra bitmap
of 1 kB that has bits set for pages of the page-attribute 
table which have been committed instead of relying on 
traps on writes.

Because it's late already, it is possible that my English
writing style is even worse than during the day. ;-)