[gclist] Advice requested

Hans Boehm boehm@hoh.engr.sgi.com
Tue, 27 Apr 1999 10:39:50 -0700

On Apr 27,  2:47pm, Stephen Biggs wrote:
> We've come to considering a Mark-Sweep collector, now, and it has occured to
> me (at least) that this approach may not be workable. Why? Well:
I'm not sure I understand all the constraints correctly.  I'll give my reaction
to some of them, and perhaps you can clarify:

> o Our product works on several platforms (Windows plus Unix flavours,
> largely common-coded)
That makes life harder, but it's not a fundamental problem.
> o COBOL allows object pointers to be unaligned
Could you clarify?  I assume you mean that on a 32-bit architecture, pointers
may start at a byte address that is not a multiple of 4?  Why?  Presumably you
have control over object layout.  All modern architectures of which I'm aware
impose substantial penalties on misaligned pointers.  You can handle this in a
garbage collector, but it's a large performance drain all the way around.
> o We have to support multi-threading AND:
Not a fundamental problem, though it complicates the implementation.  It does
make garbage collection much more important, since it make it harder to manage
memory  explicitly.
> o Existing procedural programs that have some OO extensions
> o Existing OO programs that have no knowledge of garbage collection
If you can't recompile, that may mean that you need at least a partially
conservative collector.  Otherwise, it shouldn't be much of a constraint unless
existing programs count on being able to deallocate reachable objects (unlikely
for Cobol programs, I would expect).
> o Programs that communicate with other languages either within process or
> (using COM or CORBA) outside the (easily) garbage collected environment.
I suspect this isn't a terrible constraint either, but I don't know enough
about COM or CORBA to verify that.  It is common to build distributed memory
management on top of fairly conventional local collectors.
> Is a Mark-Sweep garbage collector, given the restrictions above, even
> possible? Can anyone point me to a successful language or product that does
> such a thing?
Sounds feasible, though nontrivial.
> How expensive, generally, is true "conservative" garbage collection under
> these conditions?
If you're comparing to manual reclamation or reference counting, it depends on
the average object size.  For sufficently small objects, it will probably
perform at least as well.

If you're comparing to a collector that copies young objects, a conservative
mark/sweep collector can be appreciably more expensive for short-lived objects.

> I'm also aware that garbage collection algorithms exist for some that have
> nearly all the restrictions. C/C++ is the nearest analogous language I can
> see for OO COBOL, but the algorithms that I've seen for that don't include
> binding to other languages or distributed garbage collection. Do any exist?
With a conservative collector, mixed languages are typically not much of an
issue.  Many such systems exist.

There are others on this mailing list who have more experience with distributed
GC issues.
> Also, the only C++ garbage collectors I've seen don't cope well with a
> multi-threaded environment and aren't guaranteed to be 100% correct.
I would claim that ours copes reasonably with some multithreaded environments.
There are two possible kinds of correctness issues (aside from GC
implementation correctness) to worry about:

1) C/C++ may or may not include some constructs that make even conservative GC
hard.  (The legality of these generally inspires debate among language
lawyers.) This seems to be a nonissue in practice.  Certainly I can't believe
that this would affect Cobol.

2) Compilers may optimize in ways that make it impossible to garbage collect
the resulting code.  This is very rarely a problem for C/C++, but it can be an
issue with suficiently aggressive and uncooperative compilers.  Sinc you have
control over the compiler, I don't see why it's an issue.

There are also potential issues with extra memory retention.  But they're much
less serious than reference count cycle problems.  And I suspect you have a lot
of type information that you can communicate to the collector, so you only need
a mildly conservative collector anyway (unless you link in C code).
> I'm
> told by our Unix programmers here that thread suspension, for a start, is
> out (so for Unix the option would be to ask the programmer to call garbage
> collection methods manually, somehow!).
Our collector does suspend threads on a number of Unix platforms (Irix,
Solaris, Linux).  It can generally be done, but not portably.
> My Own Opinion:
> Is that, if garbage collection is required, reference counting is the best
> bet, given our restrictions (and the fact that we have control over our
> compiler!).
Reference counting is rather expensive on multithreaded platforms, since you
need reference count updates need to be atomic.  It slows done nonallocating
code  substantially.  Leaked cycles are also a serious problem.  I would
generally recommend against it unless average object sizes are very large (e.g.
file systems) or possibly if you're dealing with embedded applications.
 Neither sounds likely for your environment.
> Second would be to not implement any collector at all.
A good strategy for some applications.  Probably not for long running ones.
> Third
> would be some form of Mark-Sweep collector, which would imply severe
> restrictions on the use of OO COBOL.
How so?

I have a web site at http://reality.sgi.com/boehm/gc.html that points off to a
number of other references.  There is also a collector there that you might be
able to plug in, at least for some experiments.

Hans-J. Boehm