[gclist] why malloc/free instead of GC?
Basile STARYNKEVITCH
basile@starynkevitch.net
Mon, 17 Feb 2003 21:44:55 +0100
>>>>> "David" == David F Bacon <dfb@watson.ibm.com> writes:
Citing me, Basile:
Basile>> FWIW, I did coded a small (unrealistic & simplistic) C example,
Basile>> and found that malloc & free (on a 1.2 Athlon or 2 GHz P4)
Basile>> under Linux is significantly faster than Boehm's GC. (IIRC, a
Basile>> typical malloc is < 1 microsecond, while a GC_malloc is < 40
Basile>> microseconds).
David> umm... 1 microsecond on a 2 GHz machine makes 2000 cycles,
David> yes? let's conservatively say that you achieve only 0.1
David> instructions per cycle. a tuned allocation sequence,
David> inlined by the compiler, is between 10 and 20 instructions
David> for the Jikes RVM (Java VM from IBM Research). so let's
David> call it 200 cycles in the absolute worst case. how is
David> GC_malloc spending 40,000 cycles per alloc?
Here is my test program:
################################################################
// file essm.c
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/times.h>
#ifdef USEGC
#include <gc.h>
#define malloc(S) GC_malloc(S)
#ifdef USEGCFREE
#define free(P) GC_free(P)
#else
#define free(P) {}
#endif
#endif
void* tabptr[16];
int
main (int argc, char **argv)
{
long long maxcnt = 1000000;
long long i = 0;
int r=0;
int s=0;
double usert, syst;
struct tms t;
struct st *p = 0;
if (argc > 1)
maxcnt = atol (argv[1])*1000;
memset(&t, 0, sizeof(t));
if (maxcnt<100000) maxcnt=100000;
printf ("begin maxcnt=%lld=%e\n", maxcnt, (double)maxcnt);
for (i = 0; i < maxcnt; i++) {
if ((i & 0x1fffff) == 0)
printf ("i=%lld\n", i);
r = lrand48() & 0xf;
if (tabptr[r]) free(tabptr[r]);
s = (lrand48() % 100000) + 100;
tabptr[r] = malloc(s);
};
times(&t);
usert = ((double)t.tms_utime) / sysconf(_SC_CLK_TCK);
syst = ((double)t.tms_stime) / sysconf(_SC_CLK_TCK);
printf ("end maxcnt=%lld=%e\n", maxcnt, (double)maxcnt);
printf ("cputime user=%g system=%g total == per iteration user=%g system=%g\n",
usert, syst, usert/(double)maxcnt, syst/(double)maxcnt);
return 0;
}
################################################################
My machine is a Debian/Sid (the unstable, I made apt-get update &
dist-upgrade today), glibc is 2.3.1, gcc is 3.2.3 20030210 (Debian
prerelease), processor is an AthlonXP2000 (a tinybit overclocked), RAM
is 512Mbytes (as 2*256 DDRAM 2700 memory banks):
% cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 8
model name : AMD Athlon(TM) XP 2000+
stepping : 0
cpu MHz : 1733.438
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 3460.30
% free
total used free shared buffers cached
Mem: 514988 466708 48280 0 91364 209880
-/+ buffers/cache: 165464 349524
Swap: 1025000 29716 995284
================ compilation with malloc&free from glibc2.3.1
gcc -O3 essm.c -o essm
./essm gives:
begin maxcnt=1000000=1.000000e+06
i=0
end maxcnt=1000000=1.000000e+06
cputime user=0.61 system=0.8 total == per iteration user=6.1e-07 system=8e-07
================ compilation with Boehm's GC
gcc -O3 -DUSEGC essm.c -o essm_gc -lgc
(the -lgc is /usr/lib/libgc.so.6 from Debian)
./essm_gc gives:
begin maxcnt=1000000=1.000000e+06
i=0
end maxcnt=1000000=1.000000e+06
cputime user=310.64 system=0.42 total == per iteration user=0.00031064 system=4.2e-07
================ compilation with Boehm's GC using explicit free
gcc -O3 -DUSEGC -DUSEGCFREE essm.c -o essm_gcfr -lgc
./essm_gcfr
begin maxcnt=1000000=1.000000e+06
i=0
end maxcnt=1000000=1.000000e+06
cputime user=116.45 system=0.15 total == per iteration user=0.00011645 system=1.5e-07
################################################################
actually I am surprised by the resulting times. I suppose than lrand48
happens to produce a valid pointer at inappropriate times ....
So it seems that on this example a glibc malloc+free last about
0.6microsecond, while a Boehm GC_malloc+GC_free last about 116
microseconds.
I'm interested if someone could reproduce the test and confirm or
infirm the (approximate) timing.
Regards
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net
alias: basile<at>tunes<dot>org
8, rue de la Faïencerie, 92340 Bourg La Reine, France