[gclist] GC & profiling

Fergus Henderson fjh@cs.mu.oz.au
Thu, 19 Apr 2001 16:17:34 +1000

For accurate profiling of execution time, GC can cause some problems,
because GCs can occur at semi-random times, but the blame for time spent
in a GC should really be spread among all the routines that allocated
the memory, not on the routine that happened to do the final memory
allocation which triggered that GC.  If you don't treat GC specially,
then some innocuous routine can be unfairly blamed for a large proportion
of the execution time simply because that routine happened to be the
one that triggered a collection.

On the other hand, costs of ordinary memory allocation should indeed be
credited to the calling routine.

The remainder of this mail discusses some particular details of this
problem with regard to the Mercury profiler and the Boehm (et al)
collector.  I'm hoping, perhaps optimistically ;-), that someone will
find this stuff interesting enough to read it and give some feedback on
which of the different approaches is the best way of solving this problem.


For the Mercury implementation we use the Boehm collector,
and we have our own profiler, which produces output similar to gprof.
The Mercury profiler works by setting a global variable
at each procedure entry/exit point to record what the currently
active procedure is, and using the standard profiling interrupt
technique (setitimer()) to sample the values of this variable
during program execution.  (Since we compile to C, using a global
variable like this is a lot easier than trying to map the PC values
back to Mercury procedures.)  We also record information
about the number of times each procedure is called from each caller,
so that we can build a call graph profile.

So, one possible way of handling this GC issue is for the collector to set
a flag when it starts doing GC, and to clear the flag when it finishes
doing GC, and for the profiling timer interrupt handler to check that
flag and record the tick in a different counter if the flag is set.
The profiler can then report times spent in GC separately and can use
a different algorithm for propagating such times up the call graph.

I had a long look at the Boehm GC sources, trying to decide how and where
to set/unset this flag.  But it turned out to be a bit more complicated
than I had first anticipated.  There are a lot of entry points.
And deciding exactly when you're "in" the garbage collector is a
bit difficult, e.g. do we have to worry about finalization functions,
and if so, should they be counted as part of the collection?  It is also
complicated a lot by the possibility of incremental collection.

I tried several different approaches, and wasn't sure which one was
best.  Enclosed below is a description of the different alternatives
I've considered.  Comments?


There's macros ENTER_GC() and EXIT_GC(), which [in some configurations]
set/unset a volatile global variable GC_collecting -- exactly what
we want.  Changing things so that they do that unconditionally would be
easy enough (see diff 2).  Unfortunately, the purpose of GC_collecting
is slightly different -- it is documented as "A hint that we're in the
collector and holding the allocation lock for an extended period.", and
ENTER_GC() doesn't seem to be called in all the places that we want it
to be.  So I guess that is not an ideal approach.

There's also a function GC_collection_in_progress() that looked a bit
more promising.  This looks at the GC_mark_state variable, which is set
at various stages of the marking process.  The comments for that function
mention that it does the wrong thing if a collection has been abandoned,

 | /* Is a collection in progress?  Note that this can return true in */
 | /* the nonincremental case, if a collection has been abandoned and */
 | /* the mark state is now MS_INVALID.				      */
 | GC_bool GC_collection_in_progress(void);

but I think that can only happen if you explicitly call GC_try_to_collect(),
which currently we don't do.  So it would be sufficient to just modify the
documentation of GC_try_to_collect() to mention that using that function will
break the Mercury profiler (see diff 2).

To use this function, the profiler would contain code like so:

	#include "gc.h" /* for GC_bool */
	extern GC_bool GC_collection_in_progress(void);
	handle_profiling_interrupt() {
		if (GC_collection_in_progress()) {
			... credit this timer tick to GC ...
		} else {
			... credit this timer tick to the application ...

On the other hand, GC_collection_in_progress() still doesn't count 100% of the
time spent in the collector, e.g. it misses out on calls to finalization
routines, and I think also on some other parts of the collection.

There's also a callback function GC_start_call_back() which looked quite
promising.  However, that is only called at the start of full collections;
it doesn't get called for partial collections, which can happen either when
doing incremental collection or when explicitly calling GC_collect_a_little().
And there's no corresponding GC_finish_call_back().
On the other hand, adding one is not hard (see diff 1).
I could also add a GC_start_partial_call_back() function,
which would be called at the start of partial collections.

diff 1: add GC_finish_call_back

	Add declaration/definition for GC_finish_call_back().

	Call GC_finish_call_back() from all the appropriate points.

cvs diff: Diffing .
Index: alloc.c
RCS file: /home/mercury1/repository/mercury/boehm_gc/alloc.c,v
retrieving revision 1.10
diff -u -d -u -r1.10 alloc.c
--- alloc.c	2000/09/18 11:08:49	1.10
+++ alloc.c	2001/03/09 10:22:36
@@ -225,6 +225,13 @@
+void GC_notify_gc_finish()
+    if (GC_finish_call_back != (void (*)())0) {
+	(*GC_finish_call_back)();
+    }
 GC_bool GC_is_full_gc = FALSE;
@@ -242,6 +249,7 @@
             n_partial_gcs = 0;
+            GC_notify_gc_finished();
         } else if (GC_need_full_gc || n_partial_gcs >= GC_full_freq) {
 #   	    ifdef PRINTSTATS
@@ -274,6 +282,7 @@
+	GC_notify_gc_finished();
@@ -631,6 +640,7 @@
+    GC_notify_gc_finished();
 word GC_n_heap_sects = 0;	/* Number of sections currently in heap. */
@@ -826,6 +836,7 @@
 	(GC_dont_expand && GC_words_allocd > 0 || GC_should_collect())) {
+      GC_notify_gc_finished();
     } else {
       word blocks_to_get = GC_heapsize/(HBLKSIZE*GC_free_space_divisor)
       			   + needed_blocks;
@@ -851,6 +862,7 @@
       	    WARN("Out of Memory!  Trying to continue ...\n", 0);
+	    GC_notify_gc_finished();
 	} else {
 	    WARN("Out of Memory!  Returning NIL!\n", 0);
Index: gc_priv.h
RCS file: /home/mercury1/repository/mercury/boehm_gc/gc_priv.h,v
retrieving revision 1.12
diff -u -d -u -r1.12 gc_priv.h
--- gc_priv.h	2000/09/18 11:08:51	1.12
+++ gc_priv.h	2001/03/09 10:26:26
@@ -1539,6 +1548,8 @@
 			/* Not called if 0.  Called with allocation 	*/
 			/* lock held.					*/
 			/* 0 by default.				*/
+extern void (*GC_finish_call_back)(/* void */);
+			/* Called at end of full or partial collections.*/
 void GC_push_regs();	/* Push register contents onto mark stack.	*/
 			/* If NURSERY is defined, the default push	*/
 			/* action can be overridden with GC_push_proc	*/
Index: misc.c
RCS file: /home/mercury1/repository/mercury/boehm_gc/misc.c,v
retrieving revision 1.11
diff -u -d -u -r1.11 misc.c
--- misc.c	2000/09/18 11:08:53	1.11
+++ misc.c	2001/03/09 10:33:12
@@ -70,6 +70,7 @@
 void (*GC_check_heap)() = (void (*)())0;
 void (*GC_start_call_back)() = (void (*)())0;
+void (*GC_finish_call_back)() = (void (*)())0;
 ptr_t GC_stackbottom = 0;
diff 2: use ENTER_GC/EXIT_GC

	If PROFILE_TIME is defined, define ENTER_GC()
	and EXIT_GC() to always set/unset GC_collecting.

Index: gc_priv.h
RCS file: /home/mercury1/repository/mercury/boehm_gc/gc_priv.h,v
retrieving revision 1.12
diff -u -d -u -r1.12 gc_priv.h
--- gc_priv.h	2000/09/18 11:08:51	1.12
+++ gc_priv.h	2001/03/09 10:26:26
@@ -667,8 +667,17 @@
 		/* so it doesn't matter if we lie.			*/
 # endif
 # ifndef ENTER_GC
-#   define ENTER_GC()
-#   define EXIT_GC()
+#   ifdef PROFILE_TIME
+      extern VOLATILE GC_bool GC_collecting;
+#     define ENTER_GC() \
+		{ \
+		    GC_collecting = 1; \
+		}
+#     define EXIT_GC() GC_collecting = 0;
+#   else
+#     define ENTER_GC()
+#     define EXIT_GC()
+#   endif
 # endif
 # ifndef DCL_LOCK_STATE

diff 3: use GC_collection_in_progress and document issues for GC_try_to_collect

	Document that GC_try_to_collect() causes problems for
	time profiling, since Mercury's time profiling relies
	(XXX or should rely) on the GC_collection_in_progress()
	function, which does the wrong thing if GC_try_to_collect()
	is called and the collection is aborted.

Index: gc.h
RCS file: /home/mercury1/repository/mercury/boehm_gc/gc.h,v
retrieving revision 1.29
diff -u -d -u -r1.29 gc.h
--- gc.h	2000/10/04 04:51:02	1.29
+++ gc.h	2001/03/09 10:57:16
@@ -322,6 +322,9 @@
 /* aborted collections do no useful work; the next collection needs	*/
 /* to start from the beginning.						*/
 /* Return 0 if the collection was aborted, 1 if it succeeded.		*/
+/* WARNING: calling this function may break Mercury's time profiling;   */
+/* after an aborted collection, the profiler may think that the         */
+/* garbage collection is still running.                                 */
 typedef int (* GC_stop_func) GC_PROTO((void));
 GC_API int GC_try_to_collect GC_PROTO((GC_stop_func stop_func));

Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.