[gclist] gc interface

Geert Bosch geert@sun3.iaf.nl
Tue, 02 Apr 96 11:29:09 GMT


Bob Duff wrote:
`` Ideally, one would have a well-defined interface 
   between gc, mutator, and compiler.  And one could 
   plug in different gc algorithms, different allocators, 
   etc.  And everything would be plug compatible.  Sounds
   impossible, since gc is inherently a global issue. ''

Francois-Rene Rideau wrote:
`` It *is* possible, but not given current 
   state of language technology. ''

Both of these statements might be a little too strong. With
just a few limitations it is possible to make a simple framework
in which arbitrary garbage-collecting allocators can be 
integrated. For the allocator that I wrote for Ada-95 I used
the following structure:


                    ___________________
                   /                   \
System            /       Global        \
Specific          \        Pool         /
                   \___________________/  
                     /       |           \
 - - - - - - - - - -/- - - - | - - - - - -\- - - - -
                   /         |             \
             ________   ___________    _____________
   User     /Ref.Cnt.\ / 8-byte obj\  / Pointer-free\
Defined     | Pool   | |  Pool     |  | object pool |
            \________/ \___________/  \_____________/
                \            |            /
                 \           |           / 
                  \          |          /
                  _\_________|_________/__
                 |                        |
                 | A p p l i c a t i o n  |
                 |________________________|



1. Global pool

- There is one global pool of memory.
- This global pool only allows allocation of integer multiples
  of the VM page size (eg. 4 kB), aligned on a page-boundary
- All memory allocated from this pool has a tag denoting it's type,
  and a few extra bits that contain type-dependent information
- Because of this limitation the global-pool can be really simple
  and straigforward.
- By default the global pool conservatively scans all allocated 
  memory for pointers and marks all destination objects


2. User-defined allocators 

- use the global pool for allocating the memory which the user-defined
  allocator will manage, either explicitly or implicitly using `new'
- The allocator defines it's own Initialize_GC, Finalize_GC, 
  Scan, Mark and Sweep methods. 
- Set the type-tag in the global pool, so the user-defined allocator
  will be responsible for its own memory.

When a user-defined reference-counting storage-pool is forced to
scan itself, it only has to scan for external references. If these
are known to be absent (because of type-information, for example)
the Scan can just be an empty method, or it can scan for cyclic
garbage. 

When references from other `un-cooperative' storage-pools are
allowed, call's to Mark (between Initialize_GC and Finalize_GC) 
can be used to update an `external-refence-count' field.

The downside of this inter-operability of different storage-pools
is that calls to 'Mark' become expensive dispatching calls. For
references within the same (type of) pool, the pool's Scan method
could use short-cuts of course.

In practise I think these effects can be optimized away for most
part. If all storage pools for small objects share the same code
(with just another Object_Size at initalization) and 95% of all
pointers point from a small object pool to a small object pool, 
only the remaining 5% would have to use a dispatching call. For
the large part of pointers one highly optimized loop should be
used.

If these assumptions are true in practise (I think/hope they are),
performance should not be really bad. 

3. Changes to compiler

These can be minimal. In principle, only the run-time library
has to be changed. Replacing 'malloc' and 'free' by functions
that displatch to the various registered pool-types based on
requested object size, would be enough: the allocated storage
will be managed by the global pool and by default the storage
will end up in a worst-case (conservative) pool.

One step more advanced is to have the compiler directly make
a call to the appropriate storage-pool based on the object type.
Note that Ada-95 already has a notion of user-defined storage-pools, 
so that the user has control over which pool to use for which type 
of objects.



Greetings,
   Geert