[gclist] Fast-allocating non-copying GC's
Geert Bosch
geert@sun3.iaf.nl
Mon, 18 Mar 96 00:12:07 GMT
Dear ``garbage collectors'',
After some time of just reading this list, I think it's time to fire
my share of stupid questions. Currently I'm working on implementing
a GC-friendly allocator for Ada-95 (a language in the ALGOL family).
Below I'll give a basic outline of how I want to implement the
allocator. This posting is rather long, I'm sorry.
If anybody would like to comment on this, I'd really
welcome those comments.
Assumptions:
- Changes to the compiler are hard or even impossible
- Implementation will be 'proof of concept' of the
used algorithms, so
- Portability is not an issue: the allocator should
work for a 32-bit computer with VM pages of 4 kB
running the GNAT Ada compiler
- The algorithm should allow GC to proceed concurrently
with the mutator, in the presence of OS support for
efficient read/write barriers (using VM techniques)
The choices I had to make, with my arguments:
(1) Reference counting versus Mark and Sweep
Reference counting only works for certain types of objects
without requiring changes to the compiler. Besides that,
handling cyclic garbage is too hard, so no reference
counting for me.
(2) Precise versus Conservative GC
Since changing the compiler (GNAT) in a way that makes
precise GC possible is way too hard and time-consuming,
I have chosen a conservative GC.
Consequences:
(2.1) => There must be a way to quickly determine if a
value can possibly be a pointer.
(2.2) => Given some pointer (which can possibly point into
an object) it must be possible to determine the
bounds of the object.
(?) In Ada only a very small subset of the objects
can contain pointers that point inside objects.
The compiler knows about these cases. Would
it be worth the effort to have the compiler
pass this info to the allocator at allocation
time?
(3) Copying/moving versus non-copying/non-moving
Using conservative GC implies using a non-copying
and non-moving allocator. If this is not true, I'd
like to find a counter-example.
(There must be some exception ;-)
(4) Because of (2.1) and (2.2) I've chosen to have a
table in which I can lookup any possible pointer
and get its size and some other attributes which
I'll describe in the next section.
The table has one entry per 4 kB VM-page, so the
overhead for this table is 4 bytes per VM-page.
(Sparse allocation is used for this table.
For a typical application using 10 MB of memory, only
12 or 16 kB RAM is needed for the table.)
(4.1) => A result of this is that objects in the
same VM-page must have the same size.
Implementation
~~~~~~~~~~~~~~
Every object is associated with a storage pool. Every
storage pool consists of 1 or more VM pages. There are
different types of storage pools, which are used to
store different kinds of objects.
Any storage pool is identified by its type and the
number of the first VM-page it occupies. The table
with page-info (described above) associates each
VM-page with a storage pool.
For any type of storage pool there is a data structure
which can be used to store meta-data, like the start
of the free-list of objects of the type. By default
there is no per-pool meta-data, only per-type.
Main storage-pool
~~~~~~~~~~~~~~~~~
This storage pool requests storage from the operating
system and manages the table with page-info. Allocations
always consist of an integral number of VM-pages. This
simple storage-pool of course satisfies (4.1).
Small storage-pools
~~~~~~~~~~~~~~~~~~~
The main storage pool is not suitable for small objects.
Because object sizes are rounded up to the nearest integral
number of VM-pages, there is much internal fragmentation.
Objects which have more than 25% internal fragmentation when
directly allocated using the main storage pool are allocated
from small storage pools. Small storage-pool types are
descendants of the main storage-pool type. Note that only
the pool types are represented by data-structures (Ada tagged
records). The only per-pool data is the 32 bits of information
in the page-info table.
Pool-selection
~~~~~~~~~~~~~~
Objects larger than 3 * VM_Page_Size are always allocated
directly from the Main storage-pool. For smaller objects
a table is used to determine the storage-pool type. The
actual allocation is done using a dispatching call to the
pool-type.
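The selection rule from the last two sections can be sketched in C. The 25% threshold and the 3-page cutoff are from the text; the power-of-two size classes for the small pools are my own assumption, since the post does not specify the actual classes:

```c
#include <stddef.h>

#define PAGE_SIZE 4096u

/* Internal fragmentation if the object were allocated directly
   from the main pool, i.e. rounded up to whole VM-pages. */
static double main_pool_waste(size_t size)
{
    size_t rounded = (size + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
    return (double)(rounded - size) / (double)rounded;
}

/* Objects larger than 3 pages, or with at most 25% waste, go
   straight to the main pool; everything else goes to a small
   storage pool. */
static int use_main_pool(size_t size)
{
    return size > 3 * PAGE_SIZE || main_pool_waste(size) <= 0.25;
}

/* Hypothetical size classes for the small pools: round the
   request up to a power of two, with 4 bytes as the minimum. */
static size_t small_pool_obj_size(size_t size)
{
    size_t cls = 4;
    while (cls < size)
        cls <<= 1;
    return cls;
}
```

In the real allocator `use_main_pool` would be a table lookup rather than a computation, and the result of the lookup would be the pool type used for the dispatching call.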
Consequences
~~~~~~~~~~~~
Positive
~~~~~~~~
+ Easy to define custom storage pools for
storing pointer-free data, data which can
be scanned precisely, data referenced
by external pointers, etc.
+ Objects can essentially be 'header-less'
+ A per object-size freelist can be used,
for fast allocation
+ Implementation is easy
Negative
~~~~~~~~
- Locality of reference: different objects allocated
at about the same time might end up in different storage pools.
- I can see really horrible fragmentation scenarios:
(Imagine this on a machine with 8 MB of free
physical RAM for the application, and a 256 kB
cache)
1. Allocate 7 MB worth of objects
with a size of 64 bytes.
2. Free most of them, keeping only those objects nr. N
where (N * Object_Size) mod VM_Page_Size = 0
(one survivor per VM-page).
3. Repeat (1) and (2) dividing the object size by 2 until
it's 4 bytes
4. Now we have 222208 bytes of live objects left,
taking up 35 MB of RAM.
5. Now the application starts accessing these remaining
objects randomly.
Result: the system is thrashing like ****, because the working
set is 35 MB and available RAM is 8 MB; also, all objects
are aligned at 4096-byte boundaries, which defeats some
caching schemes.
When using an optimal allocator, the result will be one
block of 220 kB containing all objects and most of that
will fit in the cache.
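The arithmetic behind the scenario can be checked with a few lines of C, using only the numbers given above (7 MB rounds, 4 kB pages, object sizes halving from 64 down to 4 bytes, one survivor per page):

```c
/* Live bytes left after the scenario: five rounds of 7 MB
   allocations (object size 64, 32, 16, 8, 4 bytes), keeping
   exactly one object per 4 kB VM-page in each round. */
static unsigned live_bytes(void)
{
    unsigned total = 0;
    for (unsigned size = 64; size >= 4; size /= 2)
        total += (7u * 1024 * 1024 / 4096) * size;  /* 1792 pages/round */
    return total;
}

/* Pages pinned: every page keeps one survivor, so none can be
   released: 5 rounds * 1792 pages = 8960 pages. */
static unsigned pinned_pages(void)
{
    return 5u * (7u * 1024 * 1024 / 4096);
}
```

`live_bytes()` comes out to 222208 and `pinned_pages()` to 8960 pages, i.e. exactly 35 MB, matching the figures in the scenario.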
Solutions
~~~~~~~~~
I did not find any good alternative to my approach of grouping
objects by size, so I'll have to try reducing the worst fragmentation
problems:
1. It may be possible to convert a pool of a certain
type to another pool-type, if that type becomes more
popular.
For example, a 4 kB storage-pool containing
just 20 4-byte objects may be converted to a 4 kB
storage-pool containing at most 20 8-byte objects.
Also, a pool containing pointer-free data can easily
be converted to a conservatively scanned pool if it
is nearly empty.
2. It may be possible to store objects in a pool of another
type than the preferred pool-type.
3. I might use fewer storage-pool types.
4. The allocation strategy should be such that new objects
are not allocated on pages that might otherwise get released.
Paul R. Wilson writes:
`` FIFO reuse is a locality disaster if the interval of
reuse is bigger than the cache (or main memory, viewed
as a VM cache) size. It systematically defeats LRUish
caching strategies because the *oldest* free memory
(probably not touched for a long time) is reused first. ''
Allocating new objects on 'old' pages first would help
prevent fragmentation problems as described above. Old
pages will live long anyway, so the gaps in them are essentially
``free''. A young page might die quickly when new objects are
allocated on old pages first.
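One cheap way to get this old-pages-first policy is to keep each per-size page list ordered oldest-first and take the first page with a free slot. A sketch in C, with hypothetical structure and names (the post does not describe a concrete mechanism):

```c
#include <stddef.h>

/* Per-size-class list of pages, kept in oldest-first order
   (ages could come from an allocation-epoch counter). */
typedef struct page {
    struct page *next;       /* next-younger page                */
    unsigned     free_slots; /* unused object slots on this page */
} page;

/* Prefer gaps on the oldest page: old pages will live long
   anyway, so filling their gaps costs nothing, while young
   pages keep a chance to empty out and be released whole. */
static page *page_for_allocation(page *oldest_first)
{
    for (page *p = oldest_first; p != NULL; p = p->next)
        if (p->free_slots > 0)
            return p;
    return NULL;  /* no gaps anywhere: grab a fresh page instead */
}
```

This is first fit over an oldest-first list, which is exactly the opposite of the FIFO reuse Wilson warns about.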
The point is that I don't know how bad the fragmentation problems
will be in practice with an allocator that groups objects of the
same size. If somebody has information about the trade-offs between
the different kinds of fragmentation and locality of reference, I'd
like to know.
Thanks for reading, anyway!
Geert Bosch