[gclist] a puzzle, Boehm gc compatibility with freebsd?

Rob Rodgers knave@acm.org
Fri, 24 Aug 2001 16:19:37 -0700


Hello, fellow GCers

This is my first question to the list, so please excuse me if I have
left out any details.

I have recently completed work on a Unix daemon.  Needless to say,
I am seeking to ensure that the program does not leak allocated memory,
and so far it does not appear to do so.

Anyway, I decided to see what the Boehm collector had to say about this,
because many of the structures I allocate are small and are handed off to
asynchronously scheduled events -- just as a sanity check, really.

But before I got anywhere with this, I ran into a subtle bug that I spent all
day tracking down only to find that it should have been obvious all along.

Simply put, the collector mistakenly detects an allocated structure as
leaked even though there exists a pointer to that structure in a global
linked list (no XOR tricks) that is also reachable from the stack (in main()).
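
For readers less familiar with how a tracing leak detector decides what is
"leaked", here is a toy sketch of the general idea. It is not the Boehm
collector's code and not a diagnosis of this report. All names in it
(scan_region, count_reported_leaks, registered_root, unregistered_node) are
hypothetical. The model scans one level only: an object counts as reachable
if its address appears in a region that the detector actually scans, so a
pointer stored only in memory the detector never looks at does not keep the
object alive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_OBJECTS 2

/* Toy "collector heap": two tracked objects (56 bytes, as in the report). */
static char heap[HEAP_OBJECTS][56];
static int marked[HEAP_OBJECTS];

/* Mark every tracked object whose start address appears as a word in the
   given region.  This is a one-level scan, not a full transitive trace. */
static void scan_region(const uintptr_t *words, size_t count) {
    assert(words != NULL);
    for (size_t i = 0; i < count; i++)
        for (int j = 0; j < HEAP_OBJECTS; j++)
            if (words[i] == (uintptr_t)heap[j])
                marked[j] = 1;
}

/* Returns how many tracked objects the toy detector would report as leaked.
   If scan_unregistered_too is 0, the second region is never scanned. */
static int count_reported_leaks(int scan_unregistered_too) {
    uintptr_t registered_root[1];   /* hypothetical: memory the detector scans */
    uintptr_t unregistered_node[1]; /* hypothetical: memory it never scans */
    int leaks = 0;

    registered_root[0] = (uintptr_t)heap[0];
    unregistered_node[0] = (uintptr_t)heap[1];

    marked[0] = marked[1] = 0;
    scan_region(registered_root, 1);
    if (scan_unregistered_too)
        scan_region(unregistered_node, 1);

    for (int j = 0; j < HEAP_OBJECTS; j++)
        if (!marked[j])
            leaks++;
    return leaks;
}
```

In this model, count_reported_leaks(0) reports the second object as leaked
even though a pointer to it exists, while count_reported_leaks(1) reports
nothing. Whether anything like this applies to the case described here is
an open question.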

For starters, gctest appears to work:


""
 > ./gctest
Switched to incremental mode
Emulating dirty bits with mprotect/signals
Completed 1 tests
Finalized 2206/2206 objects - finalization is probably ok
Total number of bytes allocated is 60648832
Final heap size is 3944448 bytes
Collector appears to work
""


and in fact my own ad hoc tests, for which I added some code to ping the
collector at various times in various asynchronous events, show that it
does detect leaks very well ...

	.
	malloc (10000); /* just let this leak */
  	.

... which it correctly detects and frees.

However, I have encountered a very strange problem.  In addition to detecting
real leaks, the collector appears to report as leaks things that are not
leaks.  This has been very hard to reproduce except in one case: I allocate
a particular structure and the collector *always* insists that it leaks:


int main (int argc, char* argv[]) {
     START_LEAK_TRACKING();
     start (argc, argv);
     return 0;
}


.... (start calls the following multiple times):


static int
somefunc (const char* path, const char* file_path, const int name,
          const int delay_secs) {
     proc_info* pinfo = 0;
     /* just mallocs and fills out structs */
     pinfo = pinfo_create (path, file_path, name, delay_secs);
     if (!pinfo) return 0;
	.	
	.
     if (!list_add_end (global__plist, pinfo)) {
         return 0;
     }
     return 1;
}
<- reports leak of the object malloced inside pinfo_create,
         but list_end.data == pinfo!

Leaked composite object at 0x8094fb0 (sniper.c:367, sz=56)


However, checking the list itself reveals that list_end.data == 0x8094fb0
and so the object has not been leaked.

This introduces a rather nasty bug to the program.  Because the object was
mistakenly detected as leaked, the memory is freed and the next call to
malloc() that allocates the same number of bytes recycles the memory --
resulting in the list containing two pointers of identical value when in
fact they are meant to be different.  Worse, the initialization of the
second structure overwrites the memory that was initialized for the first.
To illustrate:


LIST ENTRY 0   SEQ = 1
list 0: 2 - pcd  TEST_1 foo
list 0: 2 - 0 (of 0/0)
list 0: 2 - SNST_START


After creating the "new" structure and adding it to the end, this becomes:


LIST ENTRY 0   SEQ = 2
list 0: 2 - pcd  TEST_0 foo
list 0: 2 - 0 (of 0/0)
list 0: 2 - SNST_START
LIST ENTRY 1   SEQ = 2
list 1: 2 - pcd  TEST_0 foo
list 1: 2 - 0 (of 0/0)
list 1: 2 - SNST_START
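
The aliasing shown in the dump can be reproduced in miniature. The sketch
below is only an illustration under stated assumptions, not the daemon's or
the collector's code: toy_pinfo, toy_alloc and toy_release are hypothetical
stand-ins, and a tiny fixed-size pool replaces malloc() so that address
reuse is deterministic. It reclaims an object while a list still points at
it, then allocates again and shows both list entries referring to the same
memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_COUNT 4
#define NAME_LEN 16

/* Hypothetical stand-in for proc_info; only a name field is modelled. */
typedef struct { char name[NAME_LEN]; } toy_pinfo;

static toy_pinfo slots[SLOT_COUNT];
static int slot_used[SLOT_COUNT];

/* Hand out the lowest-numbered free slot, or NULL if the pool is full. */
static toy_pinfo *toy_alloc(void) {
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (!slot_used[i]) {
            slot_used[i] = 1;
            return &slots[i];
        }
    }
    return NULL;
}

/* Return a slot to the pool so the next toy_alloc() may reuse it. */
static void toy_release(toy_pinfo *p) {
    slot_used[p - slots] = 0;
}

/* Returns 1 if both list entries end up aliasing the same object. */
static int demo_premature_reuse(void) {
    toy_pinfo *list[2] = { NULL, NULL };

    list[0] = toy_alloc();
    assert(list[0] != NULL);
    strcpy(list[0]->name, "TEST_1");

    /* The object is wrongly judged to be garbage and reclaimed,
       although list[0] still points at it. */
    toy_release(list[0]);

    /* The next allocation of the same size recycles the same memory. */
    list[1] = toy_alloc();
    assert(list[1] != NULL);
    strcpy(list[1]->name, "TEST_0");

    /* Both entries now share one address, and the first name is gone. */
    return list[0] == list[1] && strcmp(list[0]->name, "TEST_0") == 0;
}
```

Calling demo_premature_reuse() returns 1 in this model: the second
initialization overwrites the first entry's name, matching the TEST_1 to
TEST_0 change seen in the dump.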


Enabling the tracing was not particularly helpful:


""
Initiating full world-stop collection 3 after 80 allocd bytes
0 bytes in heap blacklisted for interior pointers
Disposing of reclaim lists took 0 msecs
--> Marking for collection 3 after 80 allocd bytes + 0 wasted bytes
Collection 2 reclaimed 0 bytes ---> heapsize = 65536 bytes
World-stopped marking took 0 msecs
Leaked composite object at 0x8095fb0 (sniper.c:367, sz=56)

Bytes recovered before sweep - f.l. count = -12072
Immediately reclaimed -7976 bytes in heap of size 65536 bytes
48 (atomic) + 40 (composite) collectable bytes in use
Finalize + initiate sweep took 0 + 0 msecs
""


The application is single threaded (driven by select()) and runs on a patched
(network code) version of FreeBSD 4.2 on x86.  I have read through
README.debugging without much luck.  The gcc version is 2.95.2 19991024
(release) and we are using the FreeBSD libc.

RSR