[gclist] Fast-allocating non-copying GC's

Geert Bosch geert@sun3.iaf.nl
Mon, 18 Mar 96 00:12:07 GMT


Dear ``garbage collectors'',

After some time of just reading this list, I think it's time to fire
off my share of stupid questions. Currently I'm working on implementing
a GC-friendly allocator for Ada-95 (a language in the ALGOL family).
Below I'll give a basic outline of how I want to implement the
allocator. This posting is rather long; I'm sorry.
   If anybody would like to comment on this, I'd really 
welcome those comments.

Assumptions:
   - Changes to the compiler are hard or even impossible
   - The implementation will be a 'proof of concept' of the
     algorithms used, so
   - Portability is not an issue: the allocator should
     work for a 32-bit computer with VM pages of 4 kB 
     running the GNAT Ada compiler
   - The algorithm should allow GC to proceed concurrently
     with the mutator, in the presence of OS support for
     efficient read/write barriers (using VM techniques)

The choices I had to make, with my arguments:

(1) Reference counting versus Mark and Sweep

    Reference counting only works for certain types of objects
    without requiring changes to the compiler. Besides that,
    handling cyclic garbage is too hard, so no reference
    counting for me.

(2) Precise versus Conservative GC

    Since changing the compiler (GNAT) in a way that makes 
    precise GC possible is way too hard and time-consuming,
    I have chosen a conservative GC.

    Consequences:
    
    (2.1) => There must be a way to quickly determine if a 
             value can possibly be a pointer.

    (2.2) => Given some pointer (which can possibly point into 
             an object) it must be possible to determine the 
             bounds of the object.
 
         (?) In Ada only a very small subset of objects can
             contain pointers that point into the middle of
             objects. The compiler knows about these cases.
             Would it be worth the effort to have the compiler
             pass this information to the allocator at
             allocation time?

(3) Copying/moving versus non-copying/non-moving

    Using conservative GC implies using a non-copying 
    and non-moving allocator. If this is not true, I'd 
    like to find a counter-example. 
    (There must be some exception ;-)

(4) Because of (2.1) and (2.2) I've chosen to have a
    table in which I can look up any possible pointer
    and get the size of the object it points into, plus
    some other attributes which I'll describe in the next
    section. (A sketch of this lookup follows below.)

    The table has one entry per 4 kB VM-page, so the
    overhead for this table is 4 bytes per 4 kB of memory.
    (The table itself is allocated sparsely. For a typical
    application using 10 MB of memory, only 12 or 16 kB of
    RAM is needed for the table.)

    (4.1) => A result of this is that objects in the
             same VM-page must have the same size.
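
To make this concrete, here is a rough Ada 95 sketch of the lookup
I have in mind. All names (Page_Map, Page_Info, Object_Base) are
made up for illustration only; the real entry would be packed into
32 bits, the table would be allocated sparsely, and objects spanning
more than one VM-page are ignored here.

    with System;                  use System;
    with System.Storage_Elements; use System.Storage_Elements;

    package Page_Map is

       Page_Size : constant := 4096;

       type Pool_Kind is (Unused, Conservative, Pointer_Free);

       --  One entry per VM-page.  The real entry would be packed
       --  into 32 bits; multi-page objects are left out here.
       type Page_Info is record
          Kind        : Pool_Kind     := Unused;
          Object_Size : Storage_Count := 0;
       end record;

       type Page_Table is array (Integer_Address range <>) of Page_Info;

       --  Given a value that might be a pointer, return the start of
       --  the object it would point into, or Null_Address when the
       --  page is not part of any pool (so the value cannot be a
       --  pointer to a heap object).
       function Object_Base (T : Page_Table; Value : Address) return Address;

    end Page_Map;

    package body Page_Map is

       function Object_Base (T : Page_Table; Value : Address) return Address is
          Page   : constant Integer_Address := To_Integer (Value) / Page_Size;
          Offset : constant Storage_Count   :=
            Storage_Count (To_Integer (Value) mod Page_Size);
       begin
          if Page not in T'Range or else T (Page).Kind = Unused then
             return Null_Address;
          end if;
          --  All objects in a page have the same size (4.1), so rounding
          --  the offset down to a multiple of Object_Size gives the base.
          return Value - Offset
                   + (Offset / T (Page).Object_Size) * T (Page).Object_Size;
       end Object_Base;

    end Page_Map;

With this, both (2.1) and (2.2) come down to one lookup in a sparse
table plus a division by the object size.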

Implementation
~~~~~~~~~~~~~~
Every object is associated with a storage pool. Every
storage pool consists of 1 or more VM pages. There are
different types of storage pools, which are used to
store different kinds of objects. 

Any storage pool is identified by its type and the 
number of the first VM-page it occupies. The table 
with page-info (described above) associates each 
VM-page with a storage pool.

For each type of storage pool there is a data structure that
stores meta-data, such as the head of the free-list of objects
of that type. By default there is no per-pool meta-data, only
per-type meta-data.
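
In Ada 95 terms I picture the pool types roughly like this; the
names are made up and the exact use of Root_Storage_Pool is not
settled, so take it as a sketch only:

    with System;                  use System;
    with System.Storage_Elements; use System.Storage_Elements;
    with System.Storage_Pools;    use System.Storage_Pools;

    package GC_Pools is

       --  One object of a type derived from Pool_Type stands for a
       --  whole class of pools.  Allocate, Deallocate and Storage_Size
       --  are inherited abstract from Root_Storage_Pool and overridden
       --  by each concrete pool type, so allocation is a dispatching
       --  call on a Pool_Type'Class value.  The components below are
       --  per-type; the only per-pool state is the 32-bit entry in the
       --  page-info table.
       type Pool_Type is abstract new Root_Storage_Pool with record
          Object_Size : Storage_Count := 0;            --  size served by this type
          Free_List   : Address       := Null_Address; --  head of per-type free-list
       end record;

    end GC_Pools;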

Main storage-pool
~~~~~~~~~~~~~~~~~
This storage pool requests storage from the operating
system and manages the table with page-info. Allocations
always consist of an integral number of VM-pages, so this
simple storage pool trivially satisfies (4.1).
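
Building on the Page_Map sketch above, the bookkeeping could look
roughly like this. The call that actually maps pages from the OS is
left out; Register just records a freshly mapped, page-aligned run
of pages in the page-info table:

    with System;                  use System;
    with System.Storage_Elements; use System.Storage_Elements;
    with Page_Map;                use Page_Map;

    package Main_Pool is

       --  Number of whole VM-pages needed to hold Size storage
       --  elements; the main pool never hands out less than a page.
       function Pages_For (Size : Storage_Count) return Storage_Count;

       --  Record a freshly mapped run of pages in the page-info table,
       --  so the collector can later identify pointers into it.  Base
       --  must be page-aligned.
       procedure Register
         (T           : in out Page_Table;
          Base        : Address;
          Pages       : Storage_Count;
          Kind        : Pool_Kind;
          Object_Size : Storage_Count);

    end Main_Pool;

    package body Main_Pool is

       function Pages_For (Size : Storage_Count) return Storage_Count is
       begin
          return (Size + Page_Size - 1) / Page_Size;
       end Pages_For;

       procedure Register
         (T           : in out Page_Table;
          Base        : Address;
          Pages       : Storage_Count;
          Kind        : Pool_Kind;
          Object_Size : Storage_Count)
       is
          First : constant Integer_Address := To_Integer (Base) / Page_Size;
       begin
          for N in 1 .. Pages loop
             T (First + Integer_Address (N) - 1) :=
               (Kind => Kind, Object_Size => Object_Size);
          end loop;
       end Register;

    end Main_Pool;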

Small storage-pools
~~~~~~~~~~~~~~~~~~~
The main storage pool is not suitable for small objects:
because object sizes are rounded up to an integral number
of VM-pages, there would be a lot of internal fragmentation.

Objects which would have more than 25% internal fragmentation when
allocated directly from the main storage pool are allocated from
small storage pools instead. Small storage-pool types are
descendants of the main storage-pool type. Note that only the pool
types are represented by data structures (Ada tagged records); the
only per-pool data is the 32 bits of information in the page-info
table.
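
Allocation from a small pool then amounts to popping the head of
the per-type free-list. Ignoring the tagged-type plumbing from the
sketch above, the mechanics could look like this (Add_Page gets
called with a fresh page obtained from the main pool; the names are
made up again):

    with System;                  use System;
    with System.Storage_Elements; use System.Storage_Elements;

    package Small_Pools is

       type Small_Pool is record
          Object_Size : Storage_Count;                  --  fixed slot size (>= 4)
          Free_List   : Address := Null_Address;        --  per-type free-list head
       end record;

       --  Carve a freshly obtained, page-aligned VM-page into slots of
       --  Object_Size bytes and thread them onto the free-list.
       procedure Add_Page (Pool : in out Small_Pool; Page : Address);

       --  Pop the head of the free-list; Null_Address means the caller
       --  must first get a fresh page from the main pool via Add_Page.
       procedure Allocate (Pool : in out Small_Pool; Result : out Address);

    end Small_Pools;

    with System.Address_To_Access_Conversions;

    package body Small_Pools is

       Page_Size : constant := 4096;

       --  Each free slot stores the address of the next free slot in
       --  its first word.
       package Link is new System.Address_To_Access_Conversions (Address);

       procedure Add_Page (Pool : in out Small_Pool; Page : Address) is
          Slot : Address := Page;
       begin
          while Slot + Pool.Object_Size <= Page + Page_Size loop
             Link.To_Pointer (Slot).all := Pool.Free_List;
             Pool.Free_List := Slot;
             Slot := Slot + Pool.Object_Size;
          end loop;
       end Add_Page;

       procedure Allocate (Pool : in out Small_Pool; Result : out Address) is
       begin
          if Pool.Free_List = Null_Address then
             Result := Null_Address;
          else
             Result         := Pool.Free_List;
             Pool.Free_List := Link.To_Pointer (Result).all;   --  pop the head
          end if;
       end Allocate;

    end Small_Pools;

So the common case is just a few loads and stores, and objects need
no header: their size is implied by the page they live in.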

Pool-selection
~~~~~~~~~~~~~~
Objects larger than 3 * VM_Page_Size are always allocated
directly from the Main storage-pool. For smaller objects 
a table is used to determine the storage-pool type. The
actual allocation is then done with a dispatching call on
the pool-type object.
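
A sketch of that selection, with made-up size classes (a real table
would be denser, so that no class wastes more than 25%):

    with System.Storage_Elements; use System.Storage_Elements;

    package Pool_Select is

       Page_Size : constant := 4096;

       --  Illustrative size classes only.
       type Class_Index is range 0 .. 12;
       Class_Size : constant array (Class_Index) of Storage_Count :=
         (4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 6144, 12288);

       --  True when the request bypasses the small pools entirely and
       --  is served directly by the main storage pool.
       function Use_Main_Pool (Size : Storage_Count) return Boolean;

       --  Map a small request to its size class.  The real allocator
       --  would index a precomputed size-to-class table; the loop here
       --  is just the simple equivalent.
       function Class_Of (Size : Storage_Count) return Class_Index;

    end Pool_Select;

    package body Pool_Select is

       function Use_Main_Pool (Size : Storage_Count) return Boolean is
       begin
          return Size > 3 * Page_Size;
       end Use_Main_Pool;

       function Class_Of (Size : Storage_Count) return Class_Index is
       begin
          for C in Class_Index loop
             if Size <= Class_Size (C) then
                return C;
             end if;
          end loop;
          raise Program_Error;    --  caller filtered out sizes above 3 pages
       end Class_Of;

    end Pool_Select;

(Note that the cutoff and the 25% rule agree: above 3 * VM_Page_Size
the round-up waste of the main pool is at most one page out of four,
i.e. at most 25%.)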

Consequences
~~~~~~~~~~~~
Positive
~~~~~~~~
+ Easy to define custom storage pools for
  storing pointer-free data, data which can
  be scanned precisely, data referenced by
  external pointers, etc.
+ Objects can essentially be 'header-less'
+ A per-object-size free-list can be used
  for fast allocation
+ Implementation is easy

Negative
~~~~~~~~
- Locality of reference: objects allocated at about the same
  time might end up in different storage pools.
- I can see really horrible fragmentation scenarios:
     (Imagine this on a machine with 8 MB of free 
      physical RAM for the application, and a 256 kB
      cache)

    1. Allocate 7 MB worth of objects
       with a size of 64 bytes.
    2. Free most of them, keeping only objects number N
       where (N * Object_Size) mod VM_Page_Size = 0,
       i.e. one object per VM-page.
    3. Repeat (1) and (2), dividing the object size by 2 each
       time, until it is 4 bytes.
    4. Now we have 222208 bytes of live objects left,
       taking up 35 MB of RAM.
    5. Now the application starts accessing these remaining
       objects randomly.

Result: the system is thrashing like ****, because the working
        set is 35 MB and the available RAM is 8 MB; also, all
        objects are aligned on 4096-byte boundaries, which defeats
        some caching schemes.

        With an optimal allocator, the result would be one block
        of about 220 kB containing all live objects, and most of
        that would fit in the cache.
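
For the record, here is where the 222208 bytes and the 35 MB come
from:

        7 MB of 64-byte objects    = 1792 pages of 4 kB
        keep 1 object per page     : 1792 x 64 bytes stay live,
                                     7 MB stays resident
        repeat at 32, 16, 8, 4     : another 4 x 1792 pages

        live data : 1792 x (64 + 32 + 16 + 8 + 4) = 222208 bytes
        resident  : 5 x 1792 pages x 4 kB         = 35 MB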
       
Solutions
~~~~~~~~~
I did not find any good alternative to my approach of grouping
objects by size, so I'll have to try to reduce the worst
fragmentation problems:

1.  It may be possible to convert a pool of a certain
    type to another pool-type, if that type becomes more
    popular (see the sketch after this list).

    For example, a 4 kB storage-pool containing
    just 20 4-byte objects may be converted to a 4 kB
    storage-pool containing at most 20 8-byte objects.

    Also, a pool containing pointer-free data can easily
    be converted to a conservatively scanned pool if it
    is nearly empty.
    
2.  It may be possible to store objects in a pool of a type
    other than the preferred pool-type.

3.  I might use fewer storage-pool types.

4.  The allocation strategy should be such that new objects are
    not allocated in pages that might otherwise be released.
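
To illustrate solution 1: building on the Page_Map sketch from
above, converting a nearly empty pointer-free pool into a
conservatively scanned one only touches the page-info entries. No
object moves, so pointers that have already been found stay valid.
(The free-list bookkeeping that has to go with it is not shown.)

    with System.Storage_Elements; use System.Storage_Elements;
    with Page_Map;                use Page_Map;

    package Pool_Conversion is

       --  Re-tag the pages of one pool as conservatively scanned.
       procedure Make_Conservative
         (T     : in out Page_Table;
          First : Integer_Address;
          Last  : Integer_Address);

    end Pool_Conversion;

    package body Pool_Conversion is

       procedure Make_Conservative
         (T     : in out Page_Table;
          First : Integer_Address;
          Last  : Integer_Address) is
       begin
          for Page in First .. Last loop
             T (Page).Kind := Conservative;
          end loop;
       end Make_Conservative;

    end Pool_Conversion;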

Paul R. Wilson writes:
`` FIFO reuse is a locality disaster if the interval of 
   reuse is bigger than the cache (or main memory, viewed 
   as a VM cache) size.  It systematically defeats LRUish 
   caching strategies because the *oldest* free memory
   (probably not touched for a long time) is reused first. ''
   
Allocating new objects on 'old' pages first would help
prevent fragmentation problems like the one described above.
Old pages will live long anyway, so the gaps in them are
essentially ``free''. A young page might die quickly if new
objects are allocated on old pages first.
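
One way to get that behaviour: keep (or derive from the page-info
table) an age and a free-slot count per page, and refill the
per-type free-list from the oldest page that still has room before
touching anything younger. A rough sketch with made-up record
fields:

    package Old_First is

       type Page_Record is record
          Age        : Natural := 0;   --  GC cycles this page has survived
          Free_Slots : Natural := 0;   --  free slots left on the page
       end record;

       type Page_List is array (Positive range <>) of Page_Record;

       --  Index of the oldest page that still has a free slot, or 0
       --  if a completely fresh page must be requested from the main
       --  storage pool.
       function Refill_Candidate (Pages : Page_List) return Natural;

    end Old_First;

    package body Old_First is

       function Refill_Candidate (Pages : Page_List) return Natural is
          Best : Natural := 0;
       begin
          for I in Pages'Range loop
             if Pages (I).Free_Slots > 0
               and then (Best = 0 or else Pages (I).Age > Pages (Best).Age)
             then
                Best := I;
             end if;
          end loop;
          return Best;
       end Refill_Candidate;

    end Old_First;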

The point is that I don't know how bad the fragmentation problems
will be in practice with an allocator that groups objects of the
same size. If somebody has information about the trade-offs between
the different kinds of fragmentation and locality of reference, I'd
like to know.

Thanks for reading, anyway!

   Geert Bosch