[gclist] why malloc/free instead of GC?

Basile STARYNKEVITCH basile@starynkevitch.net
Mon, 17 Feb 2003 21:44:55 +0100


>>>>> "David" == David F Bacon <dfb@watson.ibm.com> writes:

Quoting me (Basile):

    Basile>> FWIW, I did code a small (unrealistic & simplistic) C example,
    Basile>> and found that malloc & free (on a 1.2 Athlon or 2 GHz P4)
    Basile>> under Linux is significantly faster than Boehm's GC. (IIRC, a
    Basile>> typical malloc is < 1 microsecond, while a GC_malloc is < 40
    Basile>> microseconds).

    David> umm... 1 microsecond on a 2 GHz machine makes 2000 cycles,
    David> yes?  let's conservatively say that you achieve only 0.1
    David> instructions per cycle.  a tuned allocation sequence,
    David> inlined by the compiler, is between 10 and 20 instructions
    David> for the Jikes RVM (Java VM from IBM Research).  so let's
    David> call it 200 cycles in the absolute worst case.  how is
    David> GC_malloc spending 40,000 cycles per alloc?
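
For reference, the kind of tuned, inlined allocation sequence David
describes is essentially a bump-pointer fast path. Here is a minimal
sketch in C -- this is not the Jikes RVM code, just an illustration of
why the common case can cost only a handful of instructions:

// file bump.c -- illustrative only; names and chunk size are made up
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE (1 << 20)   /* grab memory in 1 MB chunks (arbitrary) */

static char *alloc_cursor;     /* next free byte in the current chunk   */
static char *alloc_limit;      /* one past the end of the current chunk */

/* Slow path: get a fresh chunk from the system allocator.  A real
   collector would collect or grow its heap here instead. */
static void *slow_alloc (size_t size)
{
  size_t chunk = size > CHUNK_SIZE ? size : CHUNK_SIZE;
  char *base = malloc (chunk);
  if (!base) abort ();
  alloc_cursor = base + size;
  alloc_limit = base + chunk;
  return base;
}

/* Fast path: in the common case this is a compare, an add and a store
   (a real allocator would also round size up for alignment). */
static inline void *fast_alloc (size_t size)
{
  char *p = alloc_cursor;
  if ((size_t)(alloc_limit - p) >= size) {
    alloc_cursor = p + size;
    return p;
  }
  return slow_alloc (size);
}

int main (void)
{
  int *x = fast_alloc (sizeof *x);
  *x = 42;
  printf ("%d\n", *x);
  return 0;
}

The point of such a sketch is only that the fast path itself can be
tiny; whatever GC_malloc does per call is what the program below
measures.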

Here is my test program:

################################################################
// file essm.c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>   /* for memset */
#include <unistd.h>
#include <sys/times.h>

#ifdef USEGC
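/* malloc is redirected to the Boehm collector below.  By default free
   then becomes a no-op, so the collector does all the reclamation;
   with -DUSEGCFREE blocks are instead released explicitly via GC_free. */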
#include <gc.h>
#define malloc(S) GC_malloc(S)
#ifdef USEGCFREE
#define free(P) GC_free(P)
#else
#define free(P) {}
#endif
#endif

void* tabptr[16];


int
main (int argc, char **argv)
{
  long long maxcnt = 1000000;
  long long i = 0;
  int r=0;
  int s=0;
  double usert, syst;
  struct tms t;
  if (argc > 1)
    maxcnt = atol (argv[1])*1000;
  memset(&t, 0, sizeof(t));
  if (maxcnt<100000) maxcnt=100000;
  printf ("begin maxcnt=%lld=%e\n", maxcnt, (double)maxcnt);
  for (i = 0; i < maxcnt; i++) {
    if ((i & 0x1fffff) == 0)
      printf ("i=%lld\n", i);
    r = lrand48() & 0xf;
    if (tabptr[r]) free(tabptr[r]); 
    s = (lrand48() % 100000) + 100;
    tabptr[r] = malloc(s);
  };
  times(&t);
  usert = ((double)t.tms_utime) / sysconf(_SC_CLK_TCK);
  syst = ((double)t.tms_stime) / sysconf(_SC_CLK_TCK);
  printf ("end maxcnt=%lld=%e\n", maxcnt, (double)maxcnt);
  printf ("cputime user=%g system=%g total == per iteration  user=%g system=%g\n",
	  usert, syst, usert/(double)maxcnt, syst/(double)maxcnt);
  return 0;
}
################################################################
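
As an aside, to see how much of the time goes into actual collections,
one could add something like this just before the final return 0 (I
have not tried it; GC_gc_no and GC_get_heap_size() are declared in
gc.h, assuming the Debian libgc exports them):

#ifdef USEGC
  /* report how many collections ran and how big the GC heap grew */
  printf ("GC collections=%lu heap=%lu bytes\n",
          (unsigned long) GC_gc_no,
          (unsigned long) GC_get_heap_size ());
#endif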

My machine runs Debian/Sid (the unstable distribution; I ran apt-get
update & dist-upgrade today), glibc is 2.3.1, gcc is 3.2.3 20030210
(Debian prerelease), the processor is an AthlonXP 2000+ (a tiny bit
overclocked), and RAM is 512 Mbytes (two 256 MB DDR PC2700 modules):

% cat /proc/cpuinfo
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 6
model		: 8
model name	: AMD Athlon(TM) XP 2000+
stepping	: 0
cpu MHz		: 1733.438
cache size	: 256 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips	: 3460.30

% free
             total       used       free     shared    buffers     cached
Mem:        514988     466708      48280          0      91364     209880
-/+ buffers/cache:     165464     349524
Swap:      1025000      29716     995284

================ compilation with malloc & free from glibc 2.3.1
gcc -O3  essm.c -o essm

./essm gives:
begin maxcnt=1000000=1.000000e+06
i=0
end maxcnt=1000000=1.000000e+06
cputime user=0.61 system=0.8 total == per iteration  user=6.1e-07 system=8e-07

================ compilation with Boehm's GC 
gcc -O3 -DUSEGC essm.c -o essm_gc -lgc

(the -lgc is  /usr/lib/libgc.so.6 from Debian)

./essm_gc gives:
begin maxcnt=1000000=1.000000e+06
i=0
end maxcnt=1000000=1.000000e+06
cputime user=310.64 system=0.42 total == per iteration  user=0.00031064 system=4.2e-07

================ compilation with Boehm's GC using explicit free
gcc -O3 -DUSEGC -DUSEGCFREE essm.c -o essm_gcfr -lgc
./essm_gcfr
begin maxcnt=1000000=1.000000e+06
i=0
end maxcnt=1000000=1.000000e+06
cputime user=116.45 system=0.15 total == per iteration  user=0.00011645 system=1.5e-07

################################################################

Actually, I am surprised by the resulting times. I suppose that lrand48
happens to produce values that look like valid pointers at
inappropriate times....

So it seems that on this example a glibc malloc+free pair takes about
0.6 microseconds, while a Boehm GC_malloc+GC_free pair takes about 116
microseconds.
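
One refinement I have not tried: the blocks allocated here never
contain pointers, so the conservative collector scans them for nothing.
Allocating them with GC_malloc_atomic (pointer-free objects that the
collector does not scan, and does not clear) might be a fairer
comparison, e.g. with a variant of the macros above:

#ifdef USEGC
#include <gc.h>
/* untested variant: the test blocks hold no pointers, so allocate them
   as "atomic" objects that the collector never scans */
#define malloc(S) GC_malloc_atomic(S)
#define free(P) GC_free(P)
#endif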

I would be interested if someone could reproduce the test and confirm
or refute the (approximate) timings.

Regards
-- 

Basile STARYNKEVITCH         http://starynkevitch.net/Basile/ 
email: basile<at>starynkevitch<dot>net 
alias: basile<at>tunes<dot>org 
8, rue de la Faïencerie, 92340 Bourg La Reine, France