[gclist] Fast-allocating non-copying GC's

boehm.PARC@xerox.com
Tue, 19 Mar 1996 10:58:18 PST


" It will be fairly easy to change to 8 kB
pages, but some tables might need to get changed and things like that.
In any case I won't be able to do any testing on other platforms than
the one I described."

Testing is of course an issue.  But otherwise I think this is less of a problem
than what you describe.  You can leave the logical page size fixed at 4K.  If
the hardware only supports an 8K page size, you should be able to treat a page
fault as 2 logical faults and process 2 of your pages at that point.  It's not
really acceptable to produce, say, Sun executables that only work on 4K pages
(though we've done it by accident on a few occasions).  There's a general
expectation that a Solaris executable will run on all Solaris machines.
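
To make that concrete, here's a rough sketch (not our collector's code) of what
the dirty-page handling might look like; the names handle_write_fault and
mark_logical_page_dirty are made up:

    /*
     * Sketch: treat one hardware write fault on an 8K page as two logical
     * 4K page faults.
     */
    #include <stddef.h>
    #include <stdint.h>

    #define LOGICAL_PAGE_SIZE 4096          /* GC page size stays fixed at 4K */

    static size_t hw_page_size = 8192;      /* would be queried at startup    */

    /* Hypothetical per-logical-page dirty bit maintained by the collector. */
    extern void mark_logical_page_dirty(uintptr_t logical_page_index);

    static void handle_write_fault(void *fault_addr)
    {
        uintptr_t hw_page_start =
            (uintptr_t)fault_addr & ~((uintptr_t)hw_page_size - 1);
        size_t logical_per_hw = hw_page_size / LOGICAL_PAGE_SIZE;

        /* One hardware fault dirties every logical page it covers. */
        for (size_t i = 0; i < logical_per_hw; i++) {
            uintptr_t addr = hw_page_start + i * LOGICAL_PAGE_SIZE;
            mark_logical_page_dirty(addr / LOGICAL_PAGE_SIZE);
        }
        /* ...then unprotect the hardware page and resume as usual. */
    }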

"[You can avoid black listing and stack cleaning by creating 'pointer-free'
storage pools]"

I think we're talking about different things here.  If in fact you do any
significant amount of conservative scanning of somewhat permanent data areas
(e.g. static data for third-party libraries, the main program activation
record, etc.), it's likely you will see some integers that could eventually
point into the heap, as it grows.  It's desirable not to allocate large data
structures rooted at those addresses.  Similarly, on most RISC machines, dead
pointer values will be left on the stack, and not overwritten.  You would like
to clear those.  See ftp://parcftp.xerox.com/pub/gc/papers/pldi93.ps.Z for
details.
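
A black list can be as simple as a bitmap over page numbers.  The following is
only a sketch with invented names; might_become_heap() stands in for whatever
test decides that a root word could point into space the heap may grow into:

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SIZE 4096
    #define MAX_PAGES (1u << 20)                /* covers a 4 GB address space */

    static uint8_t black_listed[MAX_PAGES / 8]; /* one bit per page            */

    static void black_list(uintptr_t false_pointer)
    {
        uintptr_t page = false_pointer / PAGE_SIZE;   /* range checks omitted */
        black_listed[page / 8] |= (uint8_t)(1u << (page % 8));
    }

    /* The allocator consults this and refuses to place large objects on
     * black-listed pages.                                                */
    static bool page_is_black_listed(uintptr_t addr)
    {
        uintptr_t page = addr / PAGE_SIZE;
        return (black_listed[page / 8] >> (page % 8)) & 1u;
    }

    extern bool might_become_heap(uintptr_t word);   /* hypothetical predicate */

    /* Called for each word seen while conservatively scanning roots. */
    static void note_root_word(uintptr_t word)
    {
        if (might_become_heap(word))
            black_list(word);
    }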

"What would be the reason for [worrying about main-pool allcation strategy]?
Actually I though of using first-fit. I wouldn't expect problems with
fragmentation of virtual memory, since unused VM pages can be released and the
total VM range is normally rather large."

In my experience, this is tricky, though possible.  Our collector does not
release unused VM pages, though I've considered it.  The reasons are:

1) It's expensive to release and remap a page.  At a minimum the kernel is
required to clear it.  There's likely to be appreciably more overhead than
that.

2) The mechanisms for unmapping and remapping a page are not completely
portable.  They're also less heavily exercised than brk and sbrk.  I've seen a
reasonable number of both performance and correctness problems in this area,
even for otherwise very stable and mature operating systems.  (I haven't
retested this recently.  But, as one example, mmapped memory under older
versions of SunOS4 appeared to have very different paging behavior from memory
obtained by other means.  We sometimes saw large slowdowns with mmapped memory
as a result.  As another example, I believe general mmap isn't very efficiently
implementable on the original RS/6000 MMU.  I think it took IBM years to get
any sort of credible implementation.)
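
For what it's worth, the basic release-and-remap mechanics would look roughly
like this under POSIX mmap/munmap; the flag spellings (MAP_ANON vs.
MAP_ANONYMOUS, or mapping /dev/zero on older systems) are exactly the sort of
thing that varies:

    #include <stddef.h>
    #include <sys/mman.h>

    #ifndef MAP_ANON
    #define MAP_ANON MAP_ANONYMOUS
    #endif

    static int release_page(void *page, size_t page_size)
    {
        /* Discard the contents; the kernel must hand back a zero-filled
         * page (and take a fault) when this address is touched again.   */
        return munmap(page, page_size);
    }

    static void *remap_page(void *page, size_t page_size)
    {
        /* MAP_FIXED re-establishes the mapping at the same address so the
         * heap bookkeeping stays valid; the page comes back zeroed, which
         * is the minimum cost mentioned in point 1.                       */
        return mmap(page, page_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);
    }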

"On the other hand, I don't see any drawbacks to best-fit in this
case and it might prevent problems with applications allocating
many large memory blocks."

Agreed.
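
Best fit over a plain free list is also easy to write down; a sketch, with the
block layout invented for illustration:

    #include <stddef.h>

    struct free_block {
        size_t             size;
        struct free_block *next;
    };

    /* Return the smallest free block that still satisfies the request;
     * that's the property that keeps large blocks intact for the large
     * allocations mentioned above.                                      */
    static struct free_block *best_fit(struct free_block *list, size_t need)
    {
        struct free_block *best = NULL;
        for (struct free_block *p = list; p != NULL; p = p->next) {
            if (p->size >= need && (best == NULL || p->size < best->size))
                best = p;
        }
        return best;
    }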

Re: running out of memory.

The usual problem with implementations that ignore the problem is that they
almost always run out of memory in the middle of a garbage collection, e.g.
while marking.  At that point your world is in an inconsistent state
(especially if the marker is recursive), and it may be hard to get it
consistent enough to run a client exception handler.  (This is probably even
more true for a copying collector.) I agree that it's often hard to handle this
100% correctly even without GC.  But 99% solutions often seem to fly in
practice, whereas 10% solutions get people upset.  And under many UNIX-like
operating systems, 99% solutions are feasible.  I think a production system
needs to address this issue.
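
One shape a 99% solution can take (all names here are made up) is to hold back
a small reserve, so that if expansion fails in the middle of marking the
collection can still be finished and the world made consistent before a client
handler runs:

    #include <stddef.h>
    #include <stdlib.h>

    #define RESERVE_BYTES (64 * 1024)

    static void *emergency_reserve;

    void gc_init_reserve(void)
    {
        emergency_reserve = malloc(RESERVE_BYTES);
    }

    /* Called when the collector can't get more memory from the OS. */
    void *gc_expansion_failed(size_t need)
    {
        if (emergency_reserve != NULL && need <= RESERVE_BYTES) {
            void *p = emergency_reserve;   /* burn the reserve to finish GC */
            emergency_reserve = NULL;
            return p;
        }
        return NULL;    /* truly out of memory: report it to the client */
    }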

Re: snapshot at beginning algorithm.

That's probably not the easiest algorithm to implement.  It also has potential
problems with space growth during GC, and requires fairly expensive fault
handling.  It should have lower latency than the incremental update algorithm
we use, however.  It would be interesting to compare performance.  If you
haven't seen it, you might also want to look at the paper entitled
"Complementary Garbage Collection", by Matsui et al, in the IWMM 95
Proceedings.
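
The snapshot-at-beginning invariant amounts to a write barrier like the sketch
below (names invented); the fault-handling version enforces the same thing per
page via VM protection rather than per store:

    #include <stddef.h>
    #include <stdbool.h>

    extern bool gc_in_progress;           /* set while a collection runs         */
    extern void gc_remember(void *old);   /* queue the old referent for marking  */

    static void write_pointer(void **field, void *new_value)
    {
        if (gc_in_progress && *field != NULL)
            gc_remember(*field);          /* snapshot the value being overwritten */
        *field = new_value;
    }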

You seem to assume that incremental collection is only important in
multithreaded environments.  I view those as almost completely orthogonal.

Re: detecting empty pools.

In the GC case you can often make that cheap by allocating all the mark bits
together.  Thus you just have to check whether a small region of memory is
completely zero.  This seems to be a bit harder in the absence of GC.
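
Concretely, with all of a pool's mark bits stored contiguously, the test is
just a scan of that region for a nonzero word; a sketch:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    static bool pool_is_empty(const uintptr_t *mark_bits, size_t n_words)
    {
        for (size_t i = 0; i < n_words; i++)
            if (mark_bits[i] != 0)
                return false;   /* at least one object in the pool survived   */
        return true;            /* nothing marked: the whole pool is reclaimable */
    }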

Hans