[gclist] gcfaq cleanup
Mon, 22 Jul 2002 10:24:32 -0700
The FAQ currently doesn't say anything about incremental, concurrent (mutator and collector run in parallel), or parallel (multiple collector threads running in parallel) collection. It would be nice to at least explain the terminology. My impression is that parallel collectors are rapidly becoming the norm for Java, so they should certainly be mentioned.
My own bias is that concurrent collection tends to be a bit more problematic, because the scheduling issues are hard to handle in a way that's really robust. The collection rate really needs to be adjusted to the allocation rate, and that's hard to do if the two are running in independent threads. But concurrent algorithms can all be tweaked into incremental ones by making the collector state explicit and running the collection incrementally as part of the mutator, thus reducing the possibility of the collector falling behind. And incremental collection is certainly interesting for a number of applications.
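To make the "tweak a concurrent algorithm into an incremental one" point concrete, here is a minimal sketch: the collector's state (a gray stack of objects still to be scanned) is made explicit, and the mutator performs a bounded amount of marking work on every allocation, so the collection rate is automatically paced by the allocation rate. All names here (IncrementalCollector, WORK_PER_ALLOC, scanOne) are illustrative, not from any real VM.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class IncrementalCollector {
    // Explicit collector state: the gray set of objects still to be scanned.
    // In a real collector, roots would be pushed here at the start of a cycle.
    private final Deque<Object> grayStack = new ArrayDeque<>();

    // Marking steps performed per allocation; tuning this trades pause time
    // against the risk of the collector falling behind the allocator.
    private static final int WORK_PER_ALLOC = 16;

    // The mutator calls this on every allocation, so collection work is
    // interleaved with mutator execution rather than run in a separate thread.
    Object allocate(int size) {
        for (int i = 0; i < WORK_PER_ALLOC && !grayStack.isEmpty(); i++) {
            scanOne(grayStack.pop());
        }
        return new byte[size];
    }

    private void scanOne(Object obj) {
        // Trace obj's fields, pushing any unmarked children onto grayStack.
        // Omitted: this sketch only shows the pacing structure.
    }
}
```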
Having just submitted a paper on finalization, but trying to avoid the same old topics:
1) Are there really languages in which most objects are finalizable as claimed in the FAQ? I can believe that there are Java programs that behave that way. But most of their authors should be reeducated. (It's conceivable to me that this might happen with a Java program that handles mostly wrapped small native objects. But I have never seen such a thing. And I've found an introductory Java book that recommended clearing pointers in finalizers. Hopefully few people have followed that advice.) Note that C++ destructors don't count here, since they're something completely different.
2) Perhaps the implications of guaranteeing finalization before program termination should be mentioned? (These include not being able to invoke any function/method from a finalizer that might rely on an object with a finalizer. Since the presence of a finalizer is not usually part of the interface spec, this basically means you can't rely on any other object except the one you're finalizing. And that one's about to be reclaimed by the OS anyway. And then you have the problems the Sun people discovered when they deprecated runFinalizersOnExit for a somewhat different reason.) I think C# went down this route, again. (You CAN get guaranteed cleanup with finalizers, and no other language features. It's even fairly clean. But it requires user "atexit" code to handle the ordering correctly. runFinalizersOnExit() is just plain wrong.)
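A sketch of the user "atexit" approach mentioned above: cleanups are registered explicitly and run in reverse registration order from the application's own exit path, so ordering is under user control rather than the collector's, and runFinalizersOnExit() is never needed. The Cleanups class and its method names are illustrative, not part of any library.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class Cleanups {
    // LIFO stack of cleanup actions: the last resource acquired is the
    // first one released, which handles the common dependency ordering.
    private static final Deque<Runnable> actions = new ArrayDeque<>();

    static synchronized void register(Runnable action) {
        actions.push(action);
    }

    // Called once from the application's own exit path (or a shutdown hook),
    // so the user -- not the collector -- decides when and in what order
    // cleanup happens.
    static synchronized void runAll() {
        while (!actions.isEmpty()) {
            actions.pop().run();
        }
    }
}
```

The point is that ordering is an application-level property: only the code that acquired the resources knows which cleanups depend on which, so it should drive them, rather than having the runtime run all pending finalizers at exit in some arbitrary order.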
3) Since I'm not convinced the current "problems" in the FAQ really are problems, here's one that I think is:
How do you define the earliest point at which an object may become unreachable? If the last accesses to an object consist of repeatedly reading the same loop-invariant integer field a million times within a loop, can I load the field once before the loop, and thus have the object itself become unreachable early? I claim that if the answer is "yes", then I need a mechanism for telling the compiler "no". (Consider the case in which that integer is really a handle for an OS resource.) I think the answer for Java was intended to be "no", but that wasn't clear to me from reading the spec, nor am I having much luck reconciling this with the memory-model parts of the spec. Nor do I know how most implementations handle this.
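The hazard described above looks like this in Java. NativeResource and its members are hypothetical; the point is that if the compiler hoists the field read out of the loop, the object can become finalizable while the loop is still using the raw handle. (For what it's worth, much later Java versions added java.lang.ref.Reference.reachabilityFence as exactly the kind of "tell the compiler no" mechanism asked for here.)

```java
class NativeResource {
    // Pretend this is an OS descriptor obtained from a native call.
    final int handle = openOsHandle();

    static int openOsHandle() { return 42; } // stand-in for the native call

    @Override
    protected void finalize() {
        // Would close this.handle -- disastrous if it runs while the loop
        // below is still using a hoisted copy of the handle.
    }

    static long useRepeatedly(NativeResource res) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            // The compiler may load res.handle once before the loop and then
            // drop res, making it unreachable (and finalizable) early.
            sum += res.handle;
        }
        return sum;
    }
}
```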
Java (and C#) also have a related problem: even if I have a volatile reference from p to q, it is not guaranteed that q remains reachable as long as p is. (q's reachability is guaranteed only if q is actually accessed after p. To be fair, I suspect other languages avoid this problem mostly by being less precise in specifying any of this.) This technically breaks a number of common idioms involving finalization, including the only practical way I know to enforce ordering between finalizers.
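The idiom in question looks like this: Outer keeps a field pointing at Inner, in the hope that Inner can only be finalized after Outer has been. Since the spec does not actually guarantee that Inner stays reachable merely because Outer does (only an actual later access of Inner would guarantee it), the ordering can fail. Class names here are illustrative.

```java
class Inner {
    @Override
    protected void finalize() {
        // Releases some low-level resource that Outer's finalizer still needs.
    }
}

class Outer {
    // Intended to delay inner's finalization until after Outer's own --
    // but holding the reference alone does not guarantee that.
    private final Inner inner;

    Outer(Inner inner) { this.inner = inner; }

    @Override
    protected void finalize() {
        // Uses this.inner, assuming it has not been finalized yet. Without
        // a real reachability guarantee, that assumption can be violated.
        Inner i = this.inner;
    }
}
```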
I think this problem is solvable, but it needs to be addressed.
A succinct summary of this for the FAQ:
It is tricky to define object reachability in a way that preserves the utility of finalization by guarding against running the finalizer too early, but still allows the compiler to optimize by removing object references it knows will not be followed.
> -----Original Message-----
> From: David Chase [mailto:firstname.lastname@example.org]
> Sent: Monday, July 22, 2002 8:56 AM
> To: email@example.com
> Subject: [gclist] gcfaq cleanup
> Hello all,
> I am on vacation, and feeling guilty for my lackluster maintenance
> and editing of the faq. I won't be able to update the official copy
> until John Levine returns from vacation, but I can make the edits.
> In particular, I am behind on correcting addresses for contributors,
> and there have been some (published) advances in GC algorithms.
> Some that I know of are:
> 1) concurrent card mark processing (someone at Sun does this)
> 2) elision of redundant card marks between safe points (Sun and
> NaturalBridge do this, at least).
> 3) Bacon's (?) reference counting improvements (per-thread
> cache of inc/dec operations, periodically processed)
> 4) a way of implementing copy-on-write where the old contents
> of a cell are stored in a per-thread buffer and are thus the
> "copy" for purposes of a snapshot. I don't know the reference,
> but I think it was someone in France.
> How much else is worthwhile?
> David Chase