[gclist] bugs in Boehm collector

Fergus Henderson fjh@cs.mu.oz.au
Thu, 29 May 1997 03:53:42 +1000 (EST)


Hi,

[This mail is probably only of interest only to those using the
Boehm collector.]

The program below demonstrates some bugs in the Boehm collector.
I constructed it after observing the same symptoms in a much larger program.

	#include <stdlib.h>
	#include <stdio.h>
	#include "gc.h"

	#define DATA_SIZE 5000000	/* 20M */
	#define RAND_SIZE 1000000	/* 4M */

	int data[DATA_SIZE];

	int main() {
		int i;
		for (i = 0; i < RAND_SIZE; i++) {
			data[i] = rand();
		}
		while(GC_malloc(42))
			{}
	}

The program has zero bytes of live data, with a 20M root set, 4M of which is
initialized to random data.  It repeatedly allocates garbage.
When the program is run on a Linux x86 box, it runs for 30 seconds or so
and then dies with the message

	Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS
	IOT trap/Abort

Problem #1 is that the heuristic for deciding when to invoke a
collection fails. GC_malloc() continues to allocate memory from the OS
instead of actually doing garbage collection to recover memory. 
Problem #2 is that even when it runs out of heap sections, it gives up,
rather than trying to do a garbage collection.

The reason for problem #1 is a little bit complicated.
The heuristic is to invoke a collection after allocating
min_alloc = (heap_size + root_size) / free_space_divisor bytes
of memory, where free_space_divisor is 4 by default.
The 4M of random data causes some pages to be blacklisted,
and then to reduce time spent scanning the free lists, some
of these blacklisted pages are just dropped.
The bug is that these dropped pages are not counted towards
the space allocated for the purposes of deciding when to
schedule the next collection.  If too many pages are black-listed,
then the heap size increases, which causes min_alloc to increase,
which causes the heap size to increase... eventually it blows up.

Problem #3 is that the bug still occurs even if the random data is
in the .rodata (initialized constants) section rather than the .data
or .bss sections.  The collector ought not to put the .rodata section
in the root set.  Unfortunately, there is no elegant solution to this one,
because i386 Linux doesn't define a `__data_start' symbol.  However,
the same kludge that is used for NS32K, namely using `environ',
will work here.

Problem #4 is that the debugging output says that there are zero
bytes of blacklisted heap memory, which is false.

Appended below are patches that address problems #1, #3, and #4.

Index: allchblk.c
===================================================================
RCS file: /home/staff/zs/imp/mercury/boehm_gc/allchblk.c,v
retrieving revision 1.7
diff -u -r1.7 allchblk.c
--- allchblk.c	1996/12/06 11:48:16	1.7
+++ allchblk.c	1997/05/28 14:00:37
@@ -225,6 +225,8 @@
 	              struct hblk * limit = hbp + (hhdr->hb_sz/HBLKSIZE);
 	              struct hblk * h;
 	              
+		      GC_words_wasted += hhdr->hb_sz;
+	              
 	              phdr -> hb_next = hhdr -> hb_next;
 	              for (h = hbp; h < limit; h++) {
 	                if (h == hbp || GC_install_header(h)) {
Index: gc_priv.h
===================================================================
RCS file: /home/staff/zs/imp/mercury/boehm_gc/gc_priv.h,v
retrieving revision 1.5
diff -u -r1.5 gc_priv.h
--- gc_priv.h	1996/12/06 11:48:23	1.5
+++ gc_priv.h	1997/05/28 14:13:38
@@ -798,8 +798,9 @@
   word _words_allocd;
   	/* Number of words allocated during this collection cycle */
   word _words_wasted;
-  	/* Number of words wasted due to internal fragmentation	 */
-  	/* in large objects allocated since last gc. Approximate.*/
+  	/* Number of words wasted due to internal fragmentation	*/
+  	/* in large objects, or due to dropping blacklisted     */
+	/* blocks, since last gc.  Approximate.                 */
   word _words_finalized;
   	/* Approximate number of words in objects (and headers)	*/
   	/* That became ready for finalization in the last 	*/
Index: blacklst.c
===================================================================
RCS file: /home/staff/zs/imp/mercury/boehm_gc/blacklst.c,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -r1.3 -r1.4
--- blacklst.c	1996/03/05 11:06:17	1.3
+++ blacklst.c	1997/05/28 16:10:52	1.4
@@ -108,7 +108,7 @@
     GC_incomplete_stack_bl = very_old_stack_bl;
     GC_total_black_listed = total_black_listed();
 #   ifdef PRINTSTATS
-  	GC_printf1("%ld blacklisted bytes in heap\n",
+  	GC_printf1("%ld stack-blacklisted bytes in heap\n",
   		   (unsigned long)GC_total_black_listed);
 #   endif
     if (GC_total_black_listed != 0) {

--- config.h.old	Thu May 29 02:17:59 1997
+++ config.h	Thu May 29 02:18:01 1997
@@ -514,8 +514,16 @@
 #	define MPROTECT_VDB
 #       ifdef __ELF__
 #            define DYNAMIC_LOADING
-	     extern int _etext;
-#            define DATASTART ((ptr_t)((((word) (&_etext)) + 0xfff) & ~0xfff))
+    	     extern char **__environ;
+#            define DATASTART ((ptr_t)(&__environ))
+			      /* hideous kludge: __environ is the first */
+			      /* word in crt0.o, and delimits the start */
+			      /* of the data segment, no matter which   */
+			      /* ld options were passed through.        */
+			      /* We could use _etext instead, but that  */
+			      /* would include .rodata, which may       */
+			      /* contain large read-only data tables    */
+			      /* that we'd rather not scan.		*/
 	     extern int _end;
 #	     define DATAEND (&_end)
 #	else

-- 
Fergus Henderson <fjh@cs.mu.oz.au>   |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>   |  of excellence is a lethal habit"
PGP: finger fjh@128.250.37.3         |     -- the last words of T. S. Garp.