From ogoh@asu.edu  Fri Feb  7 16:33:45 2003
From: ogoh@asu.edu (Okehee Goh)
Date: Fri, 07 Feb 2003 09:33:45 -0700
Subject: [gclist] GC points and GC map
Message-ID: <CPEELCODKANPCLBCAGOFGEDMCFAA.ogoh@asu.edu>

 Hello,
 I'm not sure whether this list is active enough to post questions to. If not,
please accept my apology.
 I have questions about GC maps and GC points for implementing exact GC (not
conservative GC).

 When I read the paper [2], there seem to be GC points at which GC can be
tolerated. Even setting exact GC aside, are there still some points at which
GC cannot be tolerated on a multi-threaded system?
 According to the paper [1] (or [2]), implementing exact GC seems to require
placing GC points at certain instructions, for which a GC map is generated
(also called a stack map: the set of registers and stack locations that refer
to live objects in the heap).
 So are GC points and GC maps needed only to support exact GC, or are they
also relevant to implementing incremental GC?
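To make the idea of a GC map concrete, here is a minimal C sketch, assuming a simple layout invented for illustration (GcMapEntry, gc_map_lookup and gc_map_roots are hypothetical names, not taken from [1], [2] or [3]). The compiler records, for each GC point, a bitmask of the frame slots that hold live references; the collector can enumerate a frame's roots exactly only when that frame is stopped at one of those points.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical GC-map entry: one per GC point in a compiled method. */
typedef struct {
    uint32_t pc_offset;     /* code offset of the GC point */
    uint32_t pointer_slots; /* bit i set => frame slot i holds a live heap reference */
} GcMapEntry;

/* Find the map for a code offset.  NULL means "not a GC point": the
   collector cannot enumerate this frame's roots exactly there. */
static const GcMapEntry *gc_map_lookup(const GcMapEntry *maps, size_t n,
                                       uint32_t pc)
{
    for (size_t i = 0; i < n; i++)
        if (maps[i].pc_offset == pc)
            return &maps[i];
    return NULL;
}

/* Collect the addresses of the live reference slots in one frame.
   Recording slot addresses (not values) lets a moving collector update them. */
static size_t gc_map_roots(const GcMapEntry *e, void **frame, size_t nslots,
                           void ***roots_out)
{
    size_t count = 0;
    for (size_t i = 0; i < nslots && i < 32; i++)
        if (e->pointer_slots & (UINT32_C(1) << i))
            roots_out[count++] = &frame[i];
    return count;
}
```

Real stack maps also describe registers, derived pointers and callee-saved slots; this sketch covers plain stack slots only.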

 Actually, I'm trying to design an incremental GC which shows deterministic
behavior on a multi-threaded system with a single processor, meaning that only
one thread runs at a time. Are there still points that can't tolerate GC
when the GC thread tries to run by preempting other worker threads?
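One common way to make such points harmless, even on a uniprocessor, is cooperative polling at GC points (safepoints). The sketch below assumes a hypothetical flag gc_requested set by the scheduler or GC thread; the mutator checks it only at GC points, so a preemption request that arrives in between is deferred until the next poll. It is a single-threaded simulation under those assumptions, not a complete thread implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request flag, set by the scheduler/GC thread. */
static volatile bool gc_requested = false;
static unsigned long collections = 0;

/* Runs only when the mutator is at a GC point, where its frame's
   GC map is valid and roots can be enumerated exactly. */
static void collect_at_safepoint(void)
{
    collections++;
    gc_requested = false;
}

/* Poll emitted at GC points (e.g. loop back-edges, calls, allocations). */
static inline void gc_poll(void)
{
    if (gc_requested)
        collect_at_safepoint();
}

/* A mutator loop.  Between polls the code may keep untracked or derived
   pointers in registers, so a request raised mid-iteration is only
   honored at the next poll. */
static long mutator_sum(const long *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
        if (i == n / 2)
            gc_requested = true;   /* simulate a timer/preemption request */
        gc_poll();                 /* GC point on the loop back-edge */
    }
    return sum;
}
```

The alternative Agesen's report [1] contrasts with polling is to let threads stop anywhere and then advance them to a GC point; polling trades a small per-iteration cost for simplicity.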

 I would appreciate any opinions. (Forgive me if this question is too basic.)

 Regards,

 Okehee

[1] Ole Agesen. GC Points in a Threaded Environment. SMLI TR-98-70, Sun
Microsystems, Palo Alto, CA, December 1998.
[2] http://wwws.sun.com/software/communitysource/j2me/cdc/
[3] J. M. Stichnoth, G.-Y. Lueh, and M. Cierniak. Support for Garbage
Collection at Every Instruction in a Java Compiler. Proceedings of the ACM
Conference on Programming Language Design and Implementation, May 1999,
pp. 118-127.

 ---------------------------------------
Real-Time System lab of CSE of ASU
CSE Dept, College of EAS, ASU P.O.Box 875406
  Tempe AZ 85287
480-727-7765

From mwh@cs.umd.edu  Mon Feb 17 18:46:51 2003
From: mwh@cs.umd.edu (Michael Hicks)
Date: Mon, 17 Feb 2003 13:46:51 -0500
Subject: [gclist] why malloc/free instead of GC?
Message-ID: <1045507612.1760.58.camel@mwhlaptop>

A number of performance studies (starting with Zorn in '92, but perhaps
before?) and anecdotal evidence now suggest that there is little reason
to use malloc/free over GC.  Zorn states that the main shot against
conservative GC is that it requires a larger memory footprint (and
actually uses much of it).  Another shot might be the unexpected latency
resulting from GC, foiling soft real-time guarantees.

My question: are there any studies that indicate under what conditions
a program would benefit from avoiding GC for these reasons (putting aside
the safety benefits of GC)?  For example, what program characteristics would
imply that using BDW would require significantly more memory than would
using malloc/free?  What sorts of programs would incur excessively long
latencies during collection?  I can certainly speculate about the
answers to these questions (and would welcome list readers to do so),
but I am curious if any published or informal studies have been done. 
While Zorn's study points out which benchmark programs require more
memory than others when using BDW, it doesn't go into why that is the
case (as far as I could see on skimming it).

Thanks,
Mike

From lassi.tuura@cern.ch  Mon Feb 17 19:12:08 2003
From: lassi.tuura@cern.ch (Lassi A. Tuura)
Date: Mon, 17 Feb 2003 20:12:08 +0100
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <1045507612.1760.58.camel@mwhlaptop>
References: <1045507612.1760.58.camel@mwhlaptop>
Message-ID: <3E513408.9080105@cern.ch>

You might want to refer to some of the recent discussions on the GCC
(GNU compiler collection) mailing list.  One hotly discussed topic is
the changes in the memory access patterns and other subtle hidden costs.

Memory access pattern costs are difficult to measure.  Some applications
greatly benefit from reusing freed memory quickly because of CPU cache
issues; GC may work against that.  Yet many allocation patterns are well
suited to GC.  Some apps are very sensitive to clustering of the objects
due to the access patterns, and depending on how you allocate objects
you can do well or horribly.  Depending on your memory allocator logic,
it may have little or a lot to do with GC costs.
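The "reuse freed memory quickly" behavior Lassi mentions falls out naturally from a LIFO free list. A minimal fixed-size-block sketch, with hypothetical names and sizes (BLOCK_SIZE, NBLOCKS, pool_alloc, pool_free) chosen only for illustration; it is not glibc's allocator:

```c
#include <stddef.h>

#define BLOCK_SIZE 64
#define NBLOCKS    128

/* A free block stores the link to the next free block in its own bytes. */
typedef union Block {
    union Block  *next;
    unsigned char bytes[BLOCK_SIZE];
} Block;

static Block  pool[NBLOCKS];
static Block *free_list   = NULL;
static size_t next_unused = 0;

static void *pool_alloc(void)
{
    if (free_list) {               /* most recently freed block comes back first */
        Block *b = free_list;
        free_list = b->next;
        return b;
    }
    if (next_unused < NBLOCKS)     /* otherwise carve a fresh block */
        return &pool[next_unused++];
    return NULL;                   /* pool exhausted */
}

static void pool_free(void *p)
{
    Block *b = p;
    b->next = free_list;           /* push onto the LIFO free list */
    free_list = b;
}
```

Because the block freed last is handed out next, it is likely still in cache. A tracing collector typically cannot hand a dead object's memory back out until after the next collection has found it dead.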

On the other hand, memory allocation assumptions get built into designs,
which makes it hard to compare a system with and without GC.  For an
unbiased comparison you might have to rewrite the whole system.

BTW, GCC doesn't use the BDW collector but its own scheme, which is
another factor to fold into the impact calculations.

//lat
-- 
prototype, n.:
  First stage in the life cycle of a computer product, followed
  by pre-alpha, alpha, beta, release version, corrected release
  version, upgrade, corrected upgrade, etc.  Unlike its
  successors, the prototype is not expected to work.

From basile@starynkevitch.net  Mon Feb 17 19:49:45 2003
From: basile@starynkevitch.net (Basile STARYNKEVITCH)
Date: Mon, 17 Feb 2003 20:49:45 +0100
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <1045507612.1760.58.camel@mwhlaptop>
References: <1045507612.1760.58.camel@mwhlaptop>
Message-ID: <15953.15577.512872.84744@hector.lesours>

>>>>> "Michael" == Michael Hicks <mwh@cs.umd.edu> writes:

    Michael> A number of performance studies (starting with Zorn in
    Michael> `92, but perhaps before?) and anecdotal evidence now
    Michael> suggests that there is little reason to use malloc/free
    Michael> over GC.  Zorn states that the main shot against
    Michael> conservative GC is that it requires a larger memory
    Michael> footprint (and actually uses much of it).  Another shot
    Michael> might be the unexpected latency resulting from GC,
    Michael> foiling soft real-time guarantees.

I tend to believe that people prefer malloc&free to Boehm's GC not for
real technical reasons, but mostly for social reasons. [Outside the
embedded market, almost the only soft real-time guarantee people want
is compatibility with the timing requirements of graphical user interfaces.]

Most proficient C coders I know had not even heard of GC techniques
(in particular Boehm's GC) before I told them about it.

FWIW, I coded a small (unrealistic & simplistic) C example, and
found that malloc & free (on a 1.2 GHz Athlon or 2 GHz P4) under Linux is
significantly faster than Boehm's GC. (IIRC, a typical malloc takes < 1
microsecond, while a GC_malloc takes < 40 microseconds).

Apparently, simple GC techniques are no longer taught in CS
classes... (this was not the case 20 years ago, at least not in France - I
learnt about GC in a lecture on Lisp during my Licence, roughly the
equivalent of a Bachelor's degree).

I find it funny that Java rehabilitated the whole GC idea, while Java's
GC (because of the synchrony & finalization properties of the language
specification) is necessarily complex & slow.

Old (even technical) managers perceive GC as it was on Lisp machines
or systems at the time when RAM was extremely expensive [so garbage
collection used the disk swap, and you could literally hear it, because
the disk made a lot of noise]. This is no longer the case.

Actually, I'm surprised that today's major open-source projects (like
Apache, GNOME, KDE...) don't use GC [with the exception of Emacs,
which used to explicitly show GC periods to the user - this was a wrong
decision, because it made users complain about GC].

On a related note: people usually don't believe that modern ML (e.g.
OCaml) or Lisp (e.g. CMUCL) implementations can perform about as fast
as C (i.e. less than 2 times slower than C).


-- 

Basile STARYNKEVITCH         http://starynkevitch.net/Basile/ 
email: basile<at>starynkevitch<dot>net 
alias: basile<at>tunes<dot>org 
8, rue de la Faïencerie, 92340 Bourg La Reine, France

From basile@starynkevitch.net  Mon Feb 17 20:44:55 2003
From: basile@starynkevitch.net (Basile STARYNKEVITCH)
Date: Mon, 17 Feb 2003 21:44:55 +0100
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <011f01c2d6c0$14e63010$1c02a8c0@watson.ibm.com>
References: <1045507612.1760.58.camel@mwhlaptop>
 <15953.15577.512872.84744@hector.lesours>
 <011f01c2d6c0$14e63010$1c02a8c0@watson.ibm.com>
Message-ID: <15953.18887.922393.270023@hector.lesours>

>>>>> "David" == David F Bacon <dfb@watson.ibm.com> writes:

Citing me,  Basile:

    Basile>> FWIW, I did coded a small (unrealistic & simplistic) C example,
    Basile>> and found that malloc & free (on a 1.2 Athlon or 2 GHz P4)
    Basile>> under Linux is significantly faster than Boehm's GC. (IIRC, a
    Basile>> typical malloc is < 1 microsecond, while a GC_malloc is < 40
    Basile>> microseconds).

    David> umm... 1 microsecond on a 2 GHz machine makes 2000 cycles,
    David> yes?  let's conservatively say that you achieve only 0.1
    David> instructions per cycle.  a tuned allocation sequence,
    David> inlined by the compiler, is between 10 and 20 instructions
    David> for the Jikes RVM (Java VM from IBM Research).  so let's
    David> call it 200 cycles in the absolute worst case.  how is
    David> GC_malloc spending 40,000 cycles per alloc?
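For reference, the kind of inlined fast path David describes is usually a bump-pointer sequence: round the size, compare against a limit, advance a cursor. A generic sketch with hypothetical names (AllocRegion, alloc_fast, alloc_slow); it is not Jikes RVM's actual code, and the slow path is only a stub:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread allocation region [cursor, limit). */
typedef struct {
    char *cursor;
    char *limit;
} AllocRegion;

/* Stub slow path: a real collector would refill the region or collect. */
static void *alloc_slow(AllocRegion *r, size_t size)
{
    (void)r;
    (void)size;
    return NULL;
}

static inline void *alloc_fast(AllocRegion *r, size_t size)
{
    size = (size + 7) & ~(size_t)7;         /* round up to 8-byte alignment */
    char *p = r->cursor;
    if ((size_t)(r->limit - p) < size)      /* not enough room left */
        return alloc_slow(r, size);
    r->cursor = p + size;                   /* bump the cursor */
    return p;
}
```

Most allocations execute only the add, compare and store; everything expensive lives behind alloc_slow.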

Here is my test program:

################################################################
// file essm.c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>     /* memset */
#include <unistd.h>     /* sysconf */
#include <sys/times.h>  /* times, struct tms */

#ifdef USEGC
#include <gc.h>
#define malloc(S) GC_malloc(S)
#ifdef USEGCFREE
#define free(P) GC_free(P)
#else
#define free(P) ((void)0)  /* leave reclamation to the collector */
#endif
#endif

/* 16 slots of live objects; each iteration replaces one at random */
void* tabptr[16];


int
main (int argc, char **argv)
{
  long long maxcnt = 1000000;
  long long i = 0;
  int r=0;
  int s=0;
  double usert, syst;
  struct tms t;
#ifdef USEGC
  GC_INIT();  /* recommended before the first GC_malloc */
#endif
  if (argc > 1)
    maxcnt = atol (argv[1])*1000;
  memset(&t, 0, sizeof(t));
  if (maxcnt<100000) maxcnt=100000;
  printf ("begin maxcnt=%lld=%e\n", maxcnt, (double)maxcnt);
  for (i = 0; i < maxcnt; i++) {
    if ((i & 0x1fffff) == 0)
      printf ("i=%lld\n", i);
    r = lrand48() & 0xf;                /* pick one of the 16 slots */
    if (tabptr[r]) free(tabptr[r]);
    s = (lrand48() % 100000) + 100;     /* object size: 100..100099 bytes */
    tabptr[r] = malloc(s);              /* the memory is never touched */
  }
  times(&t);
  usert = ((double)t.tms_utime) / sysconf(_SC_CLK_TCK);
  syst = ((double)t.tms_stime) / sysconf(_SC_CLK_TCK);
  printf ("end maxcnt=%lld=%e\n", maxcnt, (double)maxcnt);
  printf ("cputime user=%g system=%g total == per iteration  user=%g system=%g\n",
	  usert, syst, usert/(double)maxcnt, syst/(double)maxcnt);
  return 0;
}
################################################################

My machine runs Debian/Sid (unstable; I did apt-get update &
dist-upgrade today), glibc is 2.3.1, gcc is 3.2.3 20030210 (Debian
prerelease), the processor is an AthlonXP2000 (slightly overclocked), RAM
is 512 MB (as 2*256 MB DDRAM 2700 memory banks):

% cat /proc/cpuinfo
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 6
model		: 8
model name	: AMD Athlon(TM) XP 2000+
stepping	: 0
cpu MHz		: 1733.438
cache size	: 256 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips	: 3460.30

% free
             total       used       free     shared    buffers     cached
Mem:        514988     466708      48280          0      91364     209880
-/+ buffers/cache:     165464     349524
Swap:      1025000      29716     995284

================ compilation with malloc&free from glibc2.3.1
gcc -O3  essm.c -o essm

./essm gives:
begin maxcnt=1000000=1.000000e+06
i=0
end maxcnt=1000000=1.000000e+06
cputime user=0.61 system=0.8 total == per iteration  user=6.1e-07 system=8e-07

================ compilation with Boehm's GC 
gcc -O3 -DUSEGC essm.c -o essm_gc -lgc

(-lgc is /usr/lib/libgc.so.6 from Debian)

./essm_gc gives:
begin maxcnt=1000000=1.000000e+06
i=0
end maxcnt=1000000=1.000000e+06
cputime user=310.64 system=0.42 total == per iteration  user=0.00031064 system=4.2e-07

================ compilation with Boehm's GC using explicit free
gcc -O3 -DUSEGC -DUSEGCFREE essm.c -o essm_gcfr -lgc
./essm_gcfr
begin maxcnt=1000000=1.000000e+06
i=0
end maxcnt=1000000=1.000000e+06
cputime user=116.45 system=0.15 total == per iteration  user=0.00011645 system=1.5e-07

################################################################

Actually, I am surprised by the resulting times. I suppose that lrand48
happens to produce a valid pointer at inappropriate times...

So it seems that in this example a glibc malloc+free takes about
0.6 microseconds, while a Boehm GC_malloc+GC_free takes about 116
microseconds.

I'd be interested if someone could reproduce the test and confirm or
refute the (approximate) timing.

Regards
-- 

Basile STARYNKEVITCH         http://starynkevitch.net/Basile/ 
email: basile<at>starynkevitch<dot>net 
alias: basile<at>tunes<dot>org 
8, rue de la Faïencerie, 92340 Bourg La Reine, France

From ghudson@MIT.EDU  Mon Feb 17 23:47:11 2003
From: ghudson@MIT.EDU (Greg Hudson)
Date: 17 Feb 2003 18:47:11 -0500
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <15953.15577.512872.84744@hector.lesours>
References: <1045507612.1760.58.camel@mwhlaptop>
 <15953.15577.512872.84744@hector.lesours>
Message-ID: <1045525631.1303.65.camel@error-messages.mit.edu>

Here are some reasons not to use GC in a new project written in C:

  * It's a very deep requirement which is not provided by the native
system.  The Boehm conservative GC may be very good and portable, but
it's unlikely to be 100% perfect.  And if anything goes wrong with
memory, now you have to suspect this piece of arcane magic in addition
to your own code.

  * There are network effects.  If you're writing a library, you may not
want to require everyone who uses your library to use GC.  If you're
writing an application which uses libraries, you'll have to
memory-manage those libraries' data objects anyway.

  * There's this finalization problem.  Finalizers aren't run on a
guaranteed schedule, so expensive objects like file descriptors can't be
trusted to finalizers.  But if you're not explicitly managing your
memory, you may lose track of when to destroy objects containing
expensive resources (e.g. if there is a "file-as-string" type which acts
like a string type, now you have to explicitly manage all strings in
order to explicitly manage your file descriptors).

The first two arguments don't apply to a language like Java with
built-in GC services.  The third argument does, but maybe it doesn't
come up very often in practice.

Of course, even if these are good arguments in opposition to GC, they
don't necessarily have much to do with the reasons programmers tend not
to use GC in the real world.  Most likely they just don't know much
about it.

From hans_boehm@hp.com  Tue Feb 18 04:24:27 2003
From: hans_boehm@hp.com (Boehm, Hans)
Date: Mon, 17 Feb 2003 20:24:27 -0800
Subject: [gclist] why malloc/free instead of GC?
Message-ID: <75A9FEBA25015040A761C1F74975667DA136A4@hplex4.hpl.hp.com>

I would add:

GC object roundtrip times are pretty much unavoidably proportional to the object size, where malloc + free times can be nearly constant.  If you allocate primarily large objects, malloc+free will be cheaper.  (For sufficiently small objects, it usually isn't, at least based on my measurements.  Conservative collectors like large objects even less.)
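A rough amortized model of why the GC round trip scales with object size, under simplifying assumptions introduced here for illustration: a collection runs after every $H$ bytes allocated and costs about $T$ (tracing the live data), each new object is cleared at about $c$ per byte, and a size-class malloc/free pair costs roughly a constant $k$ when the memory is not touched.

```latex
C_{\mathrm{GC}}(s) \approx c\,s + \frac{s}{H}\,T,
\qquad
C_{\mathrm{malloc+free}}(s) \approx k
```

Both GC terms grow linearly in the object size $s$, because an object of size $s$ "uses up" a fraction $s/H$ of the allocation budget between collections.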

On the other hand:

I think finalization isn't an argument against tracing GCs.  You run into fundamentally the same issues with, say, user-implemented reference counting in C++.  The problems are inherent in abstracting away or hiding precise deallocation times.  And the problems aren't anywhere near unsolvable. See my 2003 POPL paper (also at http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html) for details.

Finalization also usually provides an easy mechanism for dealing with libraries requiring explicit deallocation calls.  Thus I don't think that's a major problem.
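Hans's comparison with user-implemented reference counting can be illustrated with a small C sketch (the FileRef type and its functions are hypothetical): the file is closed when the last reference is dropped, so, as with a finalizer, the release point depends on which holder happens to let go last rather than on where the file was last used.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reference-counted handle around an expensive resource. */
typedef struct {
    int   refcount;
    FILE *fp;
} FileRef;

static FileRef *fileref_new(FILE *fp)
{
    FileRef *r = malloc(sizeof *r);
    if (!r)
        return NULL;
    r->refcount = 1;
    r->fp = fp;
    return r;
}

static FileRef *fileref_retain(FileRef *r)
{
    r->refcount++;
    return r;
}

/* Drops one reference; returns 1 if this call released the resource. */
static int fileref_release(FileRef *r)
{
    if (--r->refcount > 0)
        return 0;
    if (r->fp)
        fclose(r->fp);   /* resource freed only when the last holder lets go */
    free(r);
    return 1;
}
```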

Hans

From hans_boehm@hp.com  Tue Feb 18 04:37:00 2003
From: hans_boehm@hp.com (Boehm, Hans)
Date: Mon, 17 Feb 2003 20:37:00 -0800
Subject: [gclist] why malloc/free instead of GC?
Message-ID: <75A9FEBA25015040A761C1F74975667DA136A5@hplex4.hpl.hp.com>

Note that this gives you an average object size of a little over 50KB, and that the malloc/free version never touches the allocated memory.  Any garbage collector will lose against malloc/free under those conditions, due to both the huge average object size, and the fact that collectors generally like to initialize at least possible pointer fields within objects.

I don't think any real applications behave quite like this.  There are some that are close enough that they shouldn't use a GC.
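The "a little over 50KB" figure follows from the size expression s = (lrand48() % 100000) + 100 in essm.c. Treating lrand48() % 100000 as uniform on 0..99999 (an approximation; the modulo bias is negligible since lrand48 returns values in [0, 2^31)), with 16 live slots and the default 10^6 iterations:

```latex
\begin{aligned}
\mathbb{E}[s] &\approx 100 + \frac{0 + 99\,999}{2} = 50\,099.5\ \text{bytes},\\
\text{live data} &\approx 16 \times 50\,099.5 \approx 8.0 \times 10^{5}\ \text{bytes},\\
\text{total allocated} &\approx 10^{6} \times 50\,099.5 \approx 5.0 \times 10^{10}\ \text{bytes}.
\end{aligned}
```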

Hans

-----Original Message-----
From: Basile STARYNKEVITCH
To: David F. Bacon
Cc: gclist@iecc.com
Sent: 2/17/03 12:44 PM
Subject: Re: [gclist] why malloc/free instead of GC?

>>>>> "David" == David F Bacon <dfb@watson.ibm.com> writes:

Citing me,  Basile:

    Basile>> FWIW, I did coded a small (unrealistic & simplistic) C example,
    Basile>> and found that malloc & free (on a 1.2 Athlon or 2 GHz P4)
    Basile>> under Linux is significantly faster than Boehm's GC. (IIRC, a
    Basile>> typical malloc is < 1 microsecond, while a GC_malloc is < 40
    Basile>> microseconds).

    David> umm... 1 microsecond on a 2 GHz machine makes 2000 cycles,
    David> yes?  let's conservatively say that you achieve only 0.1
    David> instructions per cycle.  a tuned allocation sequence,
    David> inlined by the compiler, is between 10 and 20 instructions
    David> for the Jikes RVM (Java VM from IBM Research).  so let's
    David> call it 200 cycles in the absolute worst case.  how is
    David> GC_malloc spending 40,000 cycles per alloc?

[etc]

From johnsson@crt.se  Tue Feb 18 08:42:47 2003
From: johnsson@crt.se (Thomas Johnsson)
Date: Tue, 18 Feb 2003 09:42:47 +0100
Subject: [gclist] Games in C++ with GC? (why malloc/free instead of GC?)
In-Reply-To: <15953.15577.512872.84744@hector.lesours>
References: <1045507612.1760.58.camel@mwhlaptop>
 <15953.15577.512872.84744@hector.lesours>
Message-ID: <15953.61959.985000.886450@gargle.gargle.HOWL>

Further to this discussion about the use of GC:

Games, especially 3D games, are often written in C++, using DirectX (or OpenGL).
Is there any experience in using GC (conservative or not) in such applications?
Since these are a sort of real-time program, I can imagine GC latencies are a potential problem...
???

-- Thomas Johnsson


Basile STARYNKEVITCH writes:
 > >>>>> "Michael" == Michael Hicks <mwh@cs.umd.edu> writes:
 > 
 >     Michael> A number of performance studies (starting with Zorn in
 >     Michael> `92, but perhaps before?) and anecdotal evidence now
 >     Michael> suggests that there is little reason to use malloc/free
 >     Michael> over GC.  Zorn states that the main shot against
 >     Michael> conservative GC is that it requires a larger memory
 >     Michael> footprint (and actually uses much of it).  Another shot
 >     Michael> might be the unexpected latency resulting from GC,
 >     Michael> foiling soft real-time guarantees.
 > 
 > I tend to believe that people prefer malloc&free to Boehm's GC without
 > real technical reasons, but mostly for social reasons. [The almost
 > only soft realtime guarantee people want -outside the embedded market-
 > is compatibility with graphical user interfaces time requirements]
 > 
 > Most people I know that are proficient C coders did not even heard of
 > GC techniques (in particular Boehm's GC) before I talked them about
 > it.
 > [etc]

From Nick.Barnes@pobox.com  Tue Feb 18 10:57:09 2003
From: Nick.Barnes@pobox.com (Nick Barnes)
Date: Tue, 18 Feb 2003 10:57:09 +0000
Subject: [gclist] Games in C++ with GC? (why malloc/free instead of
In-Reply-To: Message from Thomas Johnsson <johnsson@crt.se>  of "Tue, 18
 Feb 2003 09:42:47 +0100."
 <15953.61959.985000.886450@gargle.gargle.HOWL>
Message-ID: <22350.1045565829@thrush.ravenbrook.com>

At 2003-02-18 08:42:47+0000, Thomas Johnsson writes:

> Games, espcially 3D games, are often written in C++, using DirectX
> (or OpenGL).  Is there any experience in using GC ()conservative or
> not) in such applications?  Being a sort of real time program, I can
> imagine GC latencies is a potential problem ...  ???

Many game companies these days are using Lua (a very simple embedded
language), which has a very simple stop-and-copy collector. Some games
use other GCed languages or sub-systems.  Some of these games have
latency problems, and some do not.  Some of the latency problems may
be due to the GC, and some may not.  Crash Bandicoot, a PS1 platformer
a few years back, was written in a Lisp dialect.

<http://www.franz.com/success/customer_apps/animation_graphics/naughtydog.lhtml>

However, this is all somewhat theoretical because most games do not
allocate during game play.

The X-Box snowboarding game "Amped" has very bad frame rate problems,
especially in the GUI.  It also uses Lua, especially in the GUI.
Whether these facts are related is unclear.  Other games apparently
use Lua more than Amped does, and yet do not have frame rate problems.

Nick Barnes
Ravenbrook Limited

From weigelt@metux.de  Tue Feb 18 13:04:21 2003
From: weigelt@metux.de (Enrico Weigelt)
Date: Tue, 18 Feb 2003 14:04:21 +0100
Subject: [gclist] glib/gtk w/ GC
Message-ID: <20030218130421.GA30555@metux.de>

hi folks,

I'm going to start working on a GC-based derivative of glib/gtk.
Is anyone interested in helping?

cu
-- 
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux ITS 
 Web hosting from 5 EUR/month.       UUCP, raw IP and much more.

 phone:     +49 36207 519931         www:       http://www.metux.de/     
 fax:       +49 36207 519932         email:     contact@metux.de
 cellphone: +49 174 7066481	     smsgate:   sms.weigelt@metux.de
---------------------------------------------------------------------
 This mail was sent via UUCP.        http://www.metux.de/uucp/

From weigelt@metux.de  Tue Feb 18 13:31:50 2003
From: weigelt@metux.de (Enrico Weigelt)
Date: Tue, 18 Feb 2003 14:31:50 +0100
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <75A9FEBA25015040A761C1F74975667DA136A5@hplex4.hpl.hp.com>
References: <75A9FEBA25015040A761C1F74975667DA136A5@hplex4.hpl.hp.com>
Message-ID: <20030218133150.GC30555@metux.de>

On Mon, Feb 17, 2003 at 08:37:00PM -0800, Boehm, Hans wrote:
>
> Note that this gives you an average object size of a little over 50KB, 
> and that the malloc/free version never touches the allocated memory.  
> Any garbage collector will lose against malloc/free under those conditions, 
> due to both the huge average object size, and the fact that collectors 
> generally like to initialize at least possible pointer fields within objects.

GCs can cause problems if you use really huge memory. If it has no type
information, it must scan through all memory chunks and look for pointers.
With type info (e.g. in Oberon) it could become much faster.
But there's still another problem: if your application holds many pages,
which aren't accessed for quite a long time, they're possibly swapped out, 
but each time the GC runs over the heap, they have to be swapped in again.

hmm, is there any way to avoid scanning over the whole heap each time ?

cu

From dfb@watson.ibm.com  Tue Feb 18 13:56:32 2003
From: dfb@watson.ibm.com (David F. Bacon)
Date: Tue, 18 Feb 2003 08:56:32 -0500
Subject: [gclist] why malloc/free instead of GC?
References: <75A9FEBA25015040A761C1F74975667DA136A4@hplex4.hpl.hp.com>
Message-ID: <01e301c2d755$8d1b6510$1c02a8c0@watson.ibm.com>

hans,

by "roundtrip" do you mean malloc+free?  i don't understand your statement
about proportionality to object size in GC.  also, why do conservative
collectors dislike large objects?  is it because a floating point number
could cause a dead large object to be retained?

david
----- Original Message -----
From: "Boehm, Hans" <hans_boehm@hp.com>
To: "'Greg Hudson '" <ghudson@MIT.EDU>; "'Basile STARYNKEVITCH '"
<basile@starynkevitch.net>
Cc: <gclist@iecc.com>
Sent: Monday, February 17, 2003 11:24 PM
Subject: Re: [gclist] why malloc/free instead of GC?


> I would add:
>
> GC object roundtrip times are pretty much unavoidably proportional to the
> object size, where malloc + free times can be nearly constant.  If you
> allocate primarily large objects, malloc+free will be cheaper.  (For
> sufficiently small objects, it usually isn't, at least based on my
> measurements.  Conservative collectors like large objects even less.)

From jc@port25.com  Tue Feb 18 14:11:07 2003
From: jc@port25.com (Juergen Christoffel)
Date: Tue, 18 Feb 2003 15:11:07 +0100
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <20030218133150.GC30555@metux.de>
References: <75A9FEBA25015040A761C1F74975667DA136A5@hplex4.hpl.hp.com>
 <20030218133150.GC30555@metux.de>
Message-ID: <20030218141107.GB14818@port25.com>

On Tue, Feb 18, 2003 at 02:31:50PM +0100, Enrico Weigelt wrote:

> hmm, is there any way to avoid scanning over the whole heap each time ?

Yes, generational GC for example. 

Back in the eighties and early nineties, when Symbolics introduced
generational GC on their Lisp Machines, large programs did run faster with
GC turned on than without GC when the "Ephemeral GC" (that was their name
for Generational GC, IIRC) was turned on because this reduced working sets.
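The core trick that avoids a full-heap scan can be sketched in a few lines of C. A generational collector collects the young generation (the nursery) often and the old generation rarely. So that a minor collection need not scan old objects, a write barrier records every old object that comes to point at a young one, and those records serve as extra roots. This is a minimal sketch under those assumptions, not a description of the Symbolics implementation; `struct obj`, `write_field`, and the fixed-size `remembered` array are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object header: YOUNG = nursery, OLD = tenured. */
enum { YOUNG = 0, OLD = 1 };
#define NFIELDS 2
#define REMSET_MAX 64

struct obj {
    int generation;
    struct obj *field[NFIELDS];
};

/* Remembered set: old objects that may point into the nursery.
   A minor collection treats these as extra roots instead of
   scanning the whole old generation. */
static struct obj *remembered[REMSET_MAX];
static size_t remembered_count = 0;

static int in_remembered(const struct obj *o) {
    for (size_t i = 0; i < remembered_count; i++)
        if (remembered[i] == o) return 1;
    return 0;
}

/* Write barrier: every pointer store made by the program goes through here. */
void write_field(struct obj *holder, size_t i, struct obj *value) {
    assert(i < NFIELDS);
    holder->field[i] = value;
    if (holder->generation == OLD && value != NULL &&
        value->generation == YOUNG && !in_remembered(holder)) {
        assert(remembered_count < REMSET_MAX);
        remembered[remembered_count++] = holder;
    }
}
```

In this model a minor collection would trace from the ordinary roots plus `remembered[0..remembered_count)` and visit only nursery objects, so old pages that hold no such pointers never need to be touched (or paged back in).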

	--jc

-- 
  Non cogitant. Ergo non sunt. -- Georg Christoph Lichtenberg

From cef@geodesic.com  Tue Feb 18 14:27:55 2003
From: cef@geodesic.com (Charles Fiterman)
Date: Tue, 18 Feb 2003 08:27:55 -0600
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <01e301c2d755$8d1b6510$1c02a8c0@watson.ibm.com>
References: <75A9FEBA25015040A761C1F74975667DA136A4@hplex4.hpl.hp.com>
Message-ID: <5.1.1.6.0.20030218080757.03080618@pop3.geodesic.com>

Consider a large online application with the following common requirement. 
90% of all requests will be filled in one second. All requests will be 
filled in ten seconds.

If you don't want to crash you must have type safety and that implies 
garbage collection of some sort. Large applications are written by pools of 
programmers some of whom are very bad.

If you have mark and sweep or moving collection at some point your 
application will become so large that collection time causes you to violate 
it no matter how many CPU's you add. You must have a way to distribute free 
operations and not run them all at once.

If you make some restrictions on class structures and control collections 
centrally you can have reference counting and related methods that 
distribute frees. These are inefficient but we are supposing you can always 
add more CPU's. The great advantage of reference counting is that it is 
scaleable to very large sizes.

Reference counting also has the advantage that the destruction of objects 
can have rational finalizers. Finalizers must be safe, general, sure, 
prompt and ordered. Safe means they don't violate the type system. General 
means finalizers can run any code in the language and have that code 
produce normal results, for example exceptions can't just get discarded. 
Sure means if you build an object it gets to destroy itself. Prompt means 
finalizers aren't indefinitely postponed. Ordered means finalizers run in a 
determined order, if you ship the application you don't change the order 
creating portability bugs.
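As an illustration of the "prompt" and "ordered" properties, here is a minimal C sketch of reference counting with a finalization hook. The finalizer runs inside the very `rc_release` call that drops the count to zero, and it runs before the object's children are released, so a parent is always finalized before children it alone owns. The names `struct rc_obj`, `rc_new`, `rc_retain`, `rc_release`, and the `finalized` log are hypothetical, and the sketch does not address cycles or error handling inside finalizers.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_LOG 16

/* Hypothetical reference-counted object with up to two children. */
struct rc_obj {
    int id;
    long count;
    struct rc_obj *child[2];
};

/* Records the order in which objects were finalized. */
static int finalized[MAX_LOG];
static int finalized_count = 0;

struct rc_obj *rc_new(int id) {
    struct rc_obj *o = calloc(1, sizeof *o);
    if (o == NULL) abort();
    o->id = id;
    o->count = 1;                 /* the creator holds the first reference */
    return o;
}

struct rc_obj *rc_retain(struct rc_obj *o) {
    o->count++;
    return o;
}

void rc_release(struct rc_obj *o) {
    if (o == NULL) return;
    assert(o->count > 0);
    if (--o->count > 0) return;
    /* Count reached zero: finalize right now (prompt)... */
    assert(finalized_count < MAX_LOG);
    finalized[finalized_count++] = o->id;
    /* ...then drop references to children, which may cascade (ordered). */
    for (int i = 0; i < 2; i++) rc_release(o->child[i]);
    free(o);
}
```

Releasing a parent that owns child 2 exclusively and shares child 3 finalizes 1 then 2 immediately; 3 is finalized only when its last other reference is released.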

From basile@starynkevitch.net  Tue Feb 18 14:49:47 2003
From: basile@starynkevitch.net (Basile STARYNKEVITCH)
Date: Tue, 18 Feb 2003 15:49:47 +0100
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <5.1.1.6.0.20030218080757.03080618@pop3.geodesic.com>
References: <75A9FEBA25015040A761C1F74975667DA136A4@hplex4.hpl.hp.com>
 <5.1.1.6.0.20030218080757.03080618@pop3.geodesic.com>
Message-ID: <15954.18443.77701.827155@hector.lesours>

>>>>> "Charles" == Charles Fiterman <cef@geodesic.com> writes:

    Charles> Consider a large online application with the following
    Charles> common requirement.  90% of all requests will be filled
    Charles> in one second. All requests will be filled in ten
    Charles> seconds.

There are two meanings of large here : 

    1. application with a big memory requirement at runtime

    2. application with a big amount of code

These 2 meanings are not related. Some applications (e.g. numerical
engineering) have a small amount of code but require a huge amount of
memory for data. And some programs have a huge amount of code
(e.g. lots of special case processing) but need only a small amount of
data memory to run.

    Charles> If you don't want to crash you must have type safety and
    Charles> that implies garbage collection of some sort. Large
    Charles> applications are written by pools of programmers some of
    Charles> whom are very bad.

Yes. The programmer's time is an increasingly expensive resource.

    Charles> If you have mark and sweep or moving collection at some
    Charles> point your application will become so large that
    Charles> collection time causes you to violate it no matter how
    Charles> many CPU's you add. You must have a way to distribute
    Charles> free operations and not run them all at once.

It seems to me that large (at least in meaning 2) applications exist
which 

  are coded in a GC-ed language (like Lisp, Smalltalk, Java, Ocaml, ....)

  never spend more than a few consecutive seconds in garbage
  collection (just because a few seconds in todays machine is a lot of
  CPU time).

Of course I would suppose that the largest software is still coded in
(decades old) Cobol (or perhaps Fortran). I'm not sure it is easy to
maintain.


Since copying a hundred megabytes per second is realistic on today's
machines, I would believe that a full major garbage collection of a
gigabyte heap (which for me is a big heap) should require less than 10
seconds. 
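Spelled out, the estimate divides the data a copying collector must move by the copy bandwidth. Writing L for live data and B for bandwidth, and assuming, as above, at most a 1 GB (about 1000 MB) heap copied at about 100 MB/s:

```latex
t_{\mathrm{GC}} \approx \frac{L}{B} \le \frac{1000\ \mathrm{MB}}{100\ \mathrm{MB/s}} = 10\ \mathrm{s}
```

The 10-second figure is thus the worst case in which the entire heap is live; the time falls in proportion to the fraction of the heap that is garbage.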


-- 

Basile STARYNKEVITCH         http://starynkevitch.net/Basile/ 
email: basile<at>starynkevitch<dot>net 
alias: basile<at>tunes<dot>org 
8, rue de la Faïencerie, 92340 Bourg La Reine, France

From cef@geodesic.com  Tue Feb 18 15:13:36 2003
From: cef@geodesic.com (Charles Fiterman)
Date: Tue, 18 Feb 2003 09:13:36 -0600
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <15954.18443.77701.827155@hector.lesours>
References: <5.1.1.6.0.20030218080757.03080618@pop3.geodesic.com>
 <75A9FEBA25015040A761C1F74975667DA136A4@hplex4.hpl.hp.com>
 <5.1.1.6.0.20030218080757.03080618@pop3.geodesic.com>
Message-ID: <5.1.1.6.0.20030218090549.02faf830@pop3.geodesic.com>

At 03:49 PM 2/18/2003 +0100, Basile STARYNKEVITCH wrote:
> >>>>> "Charles" == Charles Fiterman <cef@geodesic.com> writes:
>
>     Charles> Consider a large online application with the following
>     Charles> common requirement.  90% of all requests will be filled
>     Charles> in one second. All requests will be filled in ten
>     Charles> seconds.
>
>There are two meanings of large here :
>
>     1. application with a big memory requirement at runtime
>
>     2. application with a big amount of code

Both.


>Since copying a hundred megabytes per second is realistic on today's
>machines, I would believe that a full major garbage collection of a
>gigabyte heap (which for me is a big heap) should require less than 10
>seconds.

The commercial world is approaching 10 gigabyte heaps. This means trouble. 
Programmers in such environments are starting to manage their own heaps to 
avoid garbage collection. This only makes storage requirements expand even 
faster.

Languages gain power more from their restrictions than their capabilities. 
Functional languages gain referential transparency and composition from
the loss of side effects. Type safe languages can be used in places where 
people fear viruses. Giving up circular data structures buys finalizers and 
large applications.

From kanderson@bbn.com  Tue Feb 18 16:19:04 2003
From: kanderson@bbn.com (Ken Anderson)
Date: Tue, 18 Feb 2003 11:19:04 -0500
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <15953.15577.512872.84744@hector.lesours>
References: <1045507612.1760.58.camel@mwhlaptop>
 <1045507612.1760.58.camel@mwhlaptop>
Message-ID: <5.0.2.1.2.20030218111756.01f69278@zima.bbn.com>

At 02:49 PM 2/17/2003, Basile STARYNKEVITCH wrote:
>On a related note: people usually don't believe that modern ML (eg
>Ocaml) or Lisp (eg CMUCL) implementations can perform about as quickly
>as C (ie less than 2 times slower than C).

They can even be faster
http://www.ai.mit.edu/~gregs/ll1-discuss-archive-html/msg01817.html

From pechtcha@cs.nyu.edu  Tue Feb 18 17:28:51 2003
From: pechtcha@cs.nyu.edu (Igor Pechtchanski)
Date: Tue, 18 Feb 2003 12:28:51 -0500 (EST)
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <5.1.1.6.0.20030218090549.02faf830@pop3.geodesic.com>
Message-ID: <Pine.GSO.4.44.0302181018050.7773-100000@slinky.cs.nyu.edu>

On Tue, 18 Feb 2003, Charles Fiterman wrote:

> At 03:49 PM 2/18/2003 +0100, Basile STARYNKEVITCH wrote:
> > >>>>> "Charles" == Charles Fiterman <cef@geodesic.com> writes:
> >
> >     Charles> Consider a large online application with the following
> >     Charles> common requirement.  90% of all requests will be filled
> >     Charles> in one second. All requests will be filled in ten
> >     Charles> seconds.
> >
> >There are two meanings of large here :
> >
> >     1. application with a big memory requirement at runtime
> >
> >     2. application with a big amount of code
>
> Both.
>
> >Since copying a hundred megabytes per second is realistic on today's
> >machines, I would believe that a full major garbage collection of a
> >gigabyte heap (which for me is a big heap) should require less than 10
> >seconds.
>
> The commercial world is approaching 10 gigabyte heaps. This means trouble.
> Programmers in such environments are starting to manage their own heaps to
> avoid garbage collection. This only makes storage requirements expand even
> faster.

I would think that there are three major classes of applications with huge
heap sizes:
1) those where the large heap size comes not from the amount of data
handled in any one transaction, but rather from the number of concurrent
transactions.  Since each transaction is (usually) a separate entity,
approaches like region-based GC or transaction-specific heaps might work
well there.
2) those actually handling massive amounts of data for each transaction
(such as search engines).  Such applications mostly do not have a lot of
simultaneous live data, just a large data stream.  Since the performance
of, say, copying GC is proportional to the amount of live data, this
shouldn't affect performance.
3) those with a lot of shared data.  This data would probably be
long-lived, and, once promoted to the oldest generation, rarely collected
anyway.  And there are techniques, like pretenuring, that allow the
relevant data to be promoted sooner.
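For the first class, a transaction-specific heap can be as simple as a bump-pointer region that is thrown away wholesale when the transaction ends, so no object in it is ever traced or freed individually. This is a minimal C11 sketch, assuming one region per transaction and that nothing allocated in it outlives the transaction. `struct region` and its functions are hypothetical names, and a real region-based scheme would also have to keep pointers from escaping the region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-transaction region: one contiguous block, bump allocated. */
struct region {
    unsigned char *base;
    size_t size;
    size_t used;
};

int region_init(struct region *r, size_t size) {
    r->base = malloc(size);
    r->size = (r->base != NULL) ? size : 0;
    r->used = 0;
    return r->base != NULL;
}

/* Allocate n bytes aligned for any type; returns NULL when the region is full. */
void *region_alloc(struct region *r, size_t n) {
    const size_t align = _Alignof(max_align_t);
    size_t start = (r->used + align - 1) / align * align;
    if (start > r->size || n > r->size - start) return NULL;
    r->used = start + n;
    return r->base + start;
}

/* End of transaction: everything allocated in it dies at once, in O(1). */
void region_reset(struct region *r) { r->used = 0; }

void region_destroy(struct region *r) {
    free(r->base);
    r->base = NULL;
    r->size = r->used = 0;
}
```

The design choice is that lifetime is tied to the transaction rather than to individual objects, so the cost of reclaiming a transaction's data does not depend on how much of it there is.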

The above are all susceptible to existing non-reference-counting GC
techniques, albeit with some tuning.  It'd be interesting to know if there
is a fourth class of applications that actually maintain large amounts of
simultaneous short-lived *live* data throughout the execution.

> Languages gain power more from their restrictions than their capabilities.
> Functional languages gain referential transparency and composition from
> the loss of side effects. Type safe languages can be used in places where
> people fear viruses. Giving up circular data structures buys finalizers and
> large applications.

AFAICS, the only thing non-RC GC doesn't scale to is applications with a
large dynamic (i.e., fluid) working set (the fourth class above).  I'm not
at all sure any real applications fall into that category, although it
would be interesting to be proven wrong.
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_		pechtcha@cs.nyu.edu
ZZZzz /,`.-'`'    -.  ;-;;,_		igor@watson.ibm.com
     |,4-  ) )-,_. ,\ (  `'-'		Igor Pechtchanski
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

Oh, boy, virtual memory! Now I'm gonna make myself a really *big* RAMdisk!
  -- /usr/games/fortune

From arlie@sublinear.org  Tue Feb 18 17:32:03 2003
From: arlie@sublinear.org (Arlie Davis)
Date: Tue, 18 Feb 2003 12:32:03 -0500
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <20030218133150.GC30555@metux.de>
Message-ID: <000201c2d773$a6f685a0$5bd1dc0c@sulaco>

Microsoft's CLR (.Net Framework) does a good job on "large" objects.
Objects above a certain threshold (20k, I believe) are allocated in a
traditional malloc/free heap, but their lifetime is still tracked
through GC.  Also, since the CLR has access to all type information, it
only scans memory locations that are known to be pointers.

This is an elegant solution, because it shows that object lifetime
(explicit free vs. GC) can be separated from allocation mechanism
(contiguous heap w/compaction vs. fragmentable heap).

The same approach could easily be adopted by other GC implementations.

Also, consider the example (brought up here) of an application which
must process a high volume of transactions, with a high degree of
consistency of time required per transaction.  Just because you use a
GC, doesn't mean you *always* allocate fresh objects for every
transaction.  You can still -- selectively -- use object pooling.  Many
large applications that are based on explicit-free gain performance by
pooling instances of objects.

The same can be applied to environments that use GCs.  Actually, since a
single reference to a small connected graph of objects will retain that
entire graph, it's easy to pool entire graphs, by just holding a single
reference.

The only risk you run is that some piece of code retains a reference to
one of the objects you build.  Of course, you'll have to design your
application around this.  But doing so may be much easier and safer than
abandoning the use of managed/GC heaps.

-- arlie


From weigelt@metux.de  Tue Feb 18 17:20:56 2003
From: weigelt@metux.de (Enrico Weigelt)
Date: Tue, 18 Feb 2003 18:20:56 +0100
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <20030218141107.GB14818@port25.com>
References: <75A9FEBA25015040A761C1F74975667DA136A5@hplex4.hpl.hp.com>
 <20030218133150.GC30555@metux.de> <20030218141107.GB14818@port25.com>
Message-ID: <20030218172056.GA29805@metux.de>

On Tue, Feb 18, 2003 at 03:11:07PM +0100, Juergen Christoffel wrote:

> Yes, generational GC for example. 
> 
> Back in the eighties and early nineties, when Symbolics introduced
> generational GC on their Lisp Machines, large programs did run faster with
> GC turned on than without GC when the "Ephemeral GC" (that was their name
> for Generational GC, IIRC) was turned on because this reduced working sets.
How does this one work ? 

cu

From hans_boehm@hp.com  Tue Feb 18 17:49:12 2003
From: hans_boehm@hp.com (Boehm, Hans)
Date: Tue, 18 Feb 2003 09:49:12 -0800
Subject: [gclist] why malloc/free instead of GC?
Message-ID: <75A9FEBA25015040A761C1F74975667DA136A7@hplex4.hpl.hp.com>

By "roundtrip" I meant "malloc+free" or "malloc+<my share of GC>".

I should really have said "tracing GC" in the following, but that was the topic of discussion, I think.  Fundamentally, a tracing collector needs to do the same amount of tracing work whether a client allocates, say, 10 objects containing 100 bytes each, or a single 1000-byte object.  Large objects cost proportionately more tracing work.  This is not true for malloc/free allocation, where the 1000-byte allocation+deallocation often doesn't cost more than a single 100-byte allocation.  (A pure reference count collector without cycle detection behaves more or less like malloc/free here.)

Of course, this almost never results in an asymptotic difference in the running time of the client program, since initializing the object will cost time proportional to its size anyway.  And most programs tend to initialize at least a constant fraction of the objects they allocate.  However, the posted "unrealistic and simplistic" test program which started this discussion did not.  And in my experience, with normal collector tuning, the initialization time is usually a small constant factor (e.g. 2-10?) less than the object round trip time.  Thus it does seem to matter in real life.

This becomes even more true for a fully conservative collector for C, which really has to initialize objects itself, in order to avoid preserving stale pointers.  In that case the allocation time includes initialization time.  (In real life, I doubt this makes a huge difference, since the initialization time tends to be dominated by cache miss time.  If the client initializes the object later, as it normally would, it thus avoids the cache miss time.  But the time cost has effectively been moved from the client into the allocator.)

A conservative GC for C usually worsens matters as follows:

- If it needs to accommodate existing libraries or compilers, it will probably have to recognize "interior pointers", at least if they're stored on the stack or in registers.  (This could be avoided with some compiler cooperation, which is really needed anyway, but is rarely implemented, since its induced failure rate tends to be less than that of other compiler bugs.)

- This means that for any known non-pointer N on the stack, we can't safely allocate a large array A such that N is an address within A.

- As the number of such nonpointers and/or the size of A increases, eventually we get to the point at which we can't find room in the right section of the address space to safely allocate A.
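The constraint in the last two points can be modeled as a check an allocator makes before handing out a large block: reject any candidate address range that contains a known non-pointer value, since with interior pointers recognized that value would keep the whole block alive. This is a simplified C sketch that treats addresses as plain integers and assumes the false pointers have already been collected from the stack; `range_is_safe` and `find_safe_start` are hypothetical and not the collector's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if no known false pointer falls inside [start, start + size).
   With interior pointers recognized, a false pointer anywhere inside the
   block would keep the whole block alive. */
int range_is_safe(uintptr_t start, size_t size,
                  const uintptr_t *false_ptrs, size_t nfalse) {
    for (size_t i = 0; i < nfalse; i++)
        if (false_ptrs[i] >= start && false_ptrs[i] - start < size)
            return 0;
    return 1;
}

/* Try candidate start addresses in [lo, hi) every `step` bytes.  On success,
   store the first safe start that leaves room for `size` bytes in *out and
   return 1; return 0 if no candidate works. */
int find_safe_start(uintptr_t lo, uintptr_t hi, size_t step, size_t size,
                    const uintptr_t *false_ptrs, size_t nfalse,
                    uintptr_t *out) {
    assert(step > 0);
    for (uintptr_t s = lo; s < hi && hi - s >= size; s += step) {
        if (range_is_safe(s, size, false_ptrs, nfalse)) {
            *out = s;
            return 1;
        }
    }
    return 0;
}
```

With false pointers at 0x1500, 0x3500, and 0x5500 and candidates every 0x1000 in [0x1000, 0x9000), a 0x800-byte block fits at 0x2000, a 0x3000-byte block only at 0x6000, and a 0x4000-byte block nowhere: the same effect in miniature.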

Empirically, this is generally a non-issue on 64-bit hardware.  On 32-bit hardware, with interior pointers recognized everywhere (the default for our collector with C code), and an otherwise favorable application, allocations larger than about 100K seem to be problematic.  With interior pointer recognition only on the stack (default for gcj, for example), the threshold seems to be about a MB.

As a result of both of these effects, I usually recommend that with our collector and C code, users at least consider explicitly managing very large objects.  Fortunately, in most cases I've heard about, this tends to be fairly easy.  Often the large objects tend to be things like IO buffers with well-defined lifetimes that are in fact easy to manage explicitly.  GC pays off for complex linked structures which tend to be composed of small objects.

I think the same advice applies to Java, although it's no doubt politically incorrect there.  Keeping explicit pools for large, easily-managed objects will mostly get the GC out of the picture once the pool is sufficiently large.  You pay a bit of space for type-safety, in that you can't reuse a given large object for a different type.  (If the large objects contain pointers, the GC also still needs to trace them.  But large objects seem to often be pointer-free, e.g. bitmaps.)  But I would guess that so long as you use this technique only in the few cases where it's really needed, that's not a large cost.
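A minimal C sketch of the kind of explicit pool meant here, for large pointer-free buffers of one fixed size. Returned buffers are chained onto a free list and handed out again, so steady-state traffic allocates nothing new (and, under a collector, adds no allocation pressure). `struct buf_pool`, `pool_get`, `pool_put`, and the 64 KB `BUF_SIZE` are hypothetical, and plain `malloc` stands in for whatever allocator the program otherwise uses.

```c
#include <assert.h>
#include <stdlib.h>

#define BUF_SIZE (64 * 1024)   /* hypothetical fixed size for large I/O buffers */

/* Free buffers are chained through their own first bytes. */
struct free_buf { struct free_buf *next; };

struct buf_pool {
    struct free_buf *free_list;
    size_t fresh_allocations;  /* how often the pool had to call malloc */
};

void *pool_get(struct buf_pool *p) {
    if (p->free_list != NULL) {
        struct free_buf *b = p->free_list;
        p->free_list = b->next;
        return b;
    }
    p->fresh_allocations++;
    return malloc(BUF_SIZE);
}

/* Return a buffer to the pool; the caller must not touch it afterwards. */
void pool_put(struct buf_pool *p, void *buf) {
    assert(buf != NULL);
    struct free_buf *b = buf;
    b->next = p->free_list;
    p->free_list = b;
}

/* Release every pooled buffer back to the underlying allocator. */
void pool_destroy(struct buf_pool *p) {
    while (p->free_list != NULL) {
        struct free_buf *b = p->free_list;
        p->free_list = b->next;
        free(b);
    }
}
```

The `fresh_allocations` counter only shows that reuse happens. The price is the usual malloc/free discipline: a buffer must not be used after `pool_put`.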

Hans

From weigelt@metux.de  Tue Feb 18 17:40:18 2003
From: weigelt@metux.de (Enrico Weigelt)
Date: Tue, 18 Feb 2003 18:40:18 +0100
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <5.1.1.6.0.20030218080757.03080618@pop3.geodesic.com>
References: <75A9FEBA25015040A761C1F74975667DA136A4@hplex4.hpl.hp.com>
 <5.1.1.6.0.20030218080757.03080618@pop3.geodesic.com>
Message-ID: <20030218174018.GC29805@metux.de>

On Tue, Feb 18, 2003 at 08:27:55AM -0600, Charles Fiterman wrote:

<snip>
> Reference counting also has the advantage that the destruction of objects 
> can have rational finalizers. Finalizers must be safe, general, sure, 
> prompt and ordered. Safe means they don't violate the type system. General 
> means finalizers can run any code in the language and have that code 
> produce normal results, for example exceptions can't just get discarded. 
> Sure means if you build an object it gets to destroy itself. Prompt means 
> finalizers aren't indefinitely postponed. Ordered means finalizers run in a 
> determined order, if you ship the application you don't change the order 
> creating portability bugs.

reference counting is tricky if you're using ring structures.
if you use a 'clean' refcounting (_each_ time you're referencing, increase
the counter, and always decrease on dereference), you'll get dead chunks
which will never be freed when using a ring structure.
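A small C sketch of the problem. With naive counting, two nodes that point at each other keep each other's count at 1 after the last outside reference goes away, so neither is ever freed. `struct node`, `node_link`, `node_release`, and the `live_nodes` counter are hypothetical names; real systems add a backup tracing collector, cycle detection, or weak back-pointers, none of which is shown.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    long count;
    struct node *next;
};

static long live_nodes = 0;   /* number of nodes not yet freed */

struct node *node_new(void) {
    struct node *n = calloc(1, sizeof *n);
    if (n == NULL) abort();
    n->count = 1;             /* the creator holds the first reference */
    live_nodes++;
    return n;
}

void node_retain(struct node *n) { n->count++; }

/* Drop one reference; free the node (and cascade) when the count hits zero. */
void node_release(struct node *n) {
    while (n != NULL) {
        assert(n->count > 0);
        if (--n->count > 0) return;
        struct node *next = n->next;   /* our reference to the successor */
        free(n);
        live_nodes--;
        n = next;
    }
}

/* Store a counted reference from `from` to `to` (retain before release). */
void node_link(struct node *from, struct node *to) {
    node_retain(to);
    node_release(from->next);
    from->next = to;
}
```

Releasing the outside references to an acyclic chain a -> b frees both nodes; releasing the outside references to a two-node ring x <-> y leaves `live_nodes` at 2.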

cu

From hans_boehm@hp.com  Tue Feb 18 18:11:42 2003
From: hans_boehm@hp.com (Boehm, Hans)
Date: Tue, 18 Feb 2003 10:11:42 -0800
Subject: [gclist] why malloc/free instead of GC?
Message-ID: <75A9FEBA25015040A761C1F74975667DA136A8@hplex4.hpl.hp.com>

> -----Original Message-----
> From: Arlie Davis [mailto:arlie@sublinear.org]
> 
> Microsoft's CLR (.Net Framework) does a good job on "large" objects.
> Objects above a certain threshold (20k, I believe) are allocated in a
> traditional malloc/free heap, but their lifetime is still tracked
> through GC.  Also, since the CLR has access to all type information, it
> only scans memory locations that are known to be pointers.
> 
> This is an elegant solution, because it shows that object lifetime
> (explicit free vs. GC) can be separated from allocation mechanism
> (contiguous heap w/compaction vs. fragmentable heap).
Agreed.  But it doesn't solve the fundamental (and unsolvable) problem.  If you allocate a 1MB object, that will still need to be considered by the GC triggering heuristic, and thus move you much closer to the next GC.  If it didn't, allocating many such objects in a row would cause unacceptable heap growth.
> 
> The same approach could easily be adopted by other GC implementations.
Since ours doesn't move objects, there's no real performance distinction between allocating something in the GC heap and the malloc/free heap, and it doesn't matter.  The real benefit of such a technique is that you avoid copying/moving large objects, if the collector otherwise does so.  I think many copying collectors use a similar technique.
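A simplified C model of the two points in this exchange: objects above a size threshold are routed to a separate, non-moving large-object space, yet every byte allocated, large or small, is charged against the same budget that decides when the next collection runs. The 20 KB threshold is the figure recalled above for the CLR; `gc_alloc`, `GC_BUDGET`, and `collections` are hypothetical, and `collect` here only counts instead of tracing anything.

```c
#include <assert.h>
#include <stddef.h>

#define LARGE_THRESHOLD (20 * 1024)        /* hypothetical, per the CLR figure above */
#define GC_BUDGET       (4 * 1024 * 1024)  /* bytes allocated between collections */

enum space { SMALL_SPACE, LARGE_SPACE };

static size_t bytes_since_gc = 0;
static unsigned collections = 0;

static void collect(void) {
    /* A real collector would trace here; this model only counts. */
    collections++;
    bytes_since_gc = 0;
}

/* Decide where an object of n bytes goes, charging it to the shared budget.
   Large objects avoid being copied, but they still bring the next GC closer. */
enum space gc_alloc(size_t n) {
    if (bytes_since_gc + n > GC_BUDGET)
        collect();
    bytes_since_gc += n;
    return n > LARGE_THRESHOLD ? LARGE_SPACE : SMALL_SPACE;
}
```

With a 4 MB budget, a handful of 1 MB objects forces a collection no matter which space they live in; routing them to the large-object space only saves the cost of moving them.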

Hans

From pekka.p.pirinen@globalgraphics.com  Tue Feb 18 18:17:42 2003
From: pekka.p.pirinen@globalgraphics.com (Pekka P. Pirinen)
Date: Tue, 18 Feb 2003 18:17:42 GMT
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <20030218172056.GA29805@metux.de> (message from Enrico
 Weigelt on Tue, 18 Feb 2003 18:20:56 +0100)
Message-ID: <200302181817.h1IIHgt01139@anor.cam.harlequin.co.uk>

[just you]
> On Tue, Feb 18, 2003 at 03:11:07PM +0100, Juergen Christoffel wrote:
>> Yes, generational GC for example.  [...]
>
> How does this one work ? 

<http://www.iecc.com/gclist/GC-algorithms.html#Generational%20collection>
Or did you want to know about Symbolics Ephemeral GC specifically?
-- 
Pekka P. Pirinen

From dfb@watson.ibm.com  Tue Feb 18 18:19:46 2003
From: dfb@watson.ibm.com (David F. Bacon)
Date: Tue, 18 Feb 2003 13:19:46 -0500
Subject: [gclist] why malloc/free instead of GC?
References: <75A9FEBA25015040A761C1F74975667DA136A7@hplex4.hpl.hp.com>
Message-ID: <00e801c2d77a$511dbac0$a2590209@watson.ibm.com>

> By "roundtrip" I meant "malloc+free" or "malloc+<my share of GC>".

ok.

> I should really have said "tracing GC" in the following, but that was the
> topic of discussion, I think.  Fundamentally, a tracing collector needs to
> do the same amount of tracing work whether a client allocates, say, 10
> objects containing 100 bytes each, or a single 1000-byte object.  Large
> objects cost proportionately more tracing work.  This is not true for
> malloc/free allocation, where the 1000-byte allocation+deallocation often
> doesn't cost more than a single 100-byte allocation.  (A pure reference
> count collector without cycle detection behaves more or less like
> malloc/free here.)

you're assuming the collector has to scan the whole object to find the
pointers, right?  for a type-accurate collector, the tracing work is
proportional to the pointer density of the program times the memory size,
which is usually much smaller.
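To make the contrast concrete, the C sketch below counts the words a scan examines in one object under each policy. A conservative scan without type information must look at every word and treats anything in the heap's address range as a reference; a type-accurate scan looks only at the fields a layout descriptor marks as pointers. The `struct layout` format and the scan functions are hypothetical, not any particular collector's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-type layout descriptor for a type-accurate collector. */
struct layout {
    size_t npointers;
    const size_t *pointer_offsets;   /* word offsets of pointer fields */
};

/* What a scan of one object did: words examined, and words treated
   as references into the heap [lo, hi). */
struct scan_stats { size_t examined; size_t refs; };

static int in_heap(uintptr_t w, uintptr_t lo, uintptr_t hi) {
    return w >= lo && w < hi;
}

/* Conservative: no type info, so every word is a candidate pointer. */
struct scan_stats scan_conservative(const uintptr_t *obj, size_t nwords,
                                    uintptr_t lo, uintptr_t hi) {
    struct scan_stats s = { 0, 0 };
    for (size_t i = 0; i < nwords; i++) {
        s.examined++;
        if (in_heap(obj[i], lo, hi)) s.refs++;
    }
    return s;
}

/* Type-accurate: only the declared pointer fields are examined. */
struct scan_stats scan_precise(const uintptr_t *obj, const struct layout *t,
                               uintptr_t lo, uintptr_t hi) {
    struct scan_stats s = { 0, 0 };
    for (size_t i = 0; i < t->npointers; i++) {
        s.examined++;
        if (in_heap(obj[t->pointer_offsets[i]], lo, hi)) s.refs++;
    }
    return s;
}
```

For an 8-word object with one real pointer and one data word that happens to look like a heap address, the conservative scan examines 8 words and finds 2 apparent references, while the precise scan examines 1 word and finds 1. The extra apparent reference is the kind of false retention asked about earlier in the thread.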

> Of course, this almost never results in an asymptotic difference in the
> running time of the client program, since initializing the object will cost
> time proportional to its size anyway.  And most programs tend to initialize
> at least a constant fraction of the objects they allocate.  However, the
> posted "unrealistic and simplistic" test program which started this
> discussion did not.  And in my experience, with normal collector tuning, the
> initialization time is usually a small constant factor (e.g. 2-10?) less
> than the object round trip time.  Thus it does seem to matter in real life.
>
> This becomes even more true for a fully conservative collector for C,
> which really has to initialize objects itself, in order to avoid preserving
> stale pointers.  In that case the allocation time includes initialization
> time.  (In real life, I doubt this makes a huge difference, since the
> initialization time tends to be dominated by cache miss time.  If the client
> initializes the object later, as it normally would, it thus avoids the cache
> miss time.  But the time cost has effectively been moved from the client
> into the allocator.)

i keep thinking that we should be able to fix this problem, at least for
objects larger than a cache line, by using the "cache line clear" operations
that now exist in many cpus.  has anyone explored this?

> As a result of both of these effects, I usually recommend that with our
> collector and C code, users at least consider explicitly managing very large
> objects.  Fortunately, in most cases I've heard about, this tends to be
> fairly easy.  Often the large objects tend to be things like IO buffers with
> well-defined lifetimes that are in fact easy to manage explicitly.  GC pays
> off for complex linked structures which tend to be composed of small
> objects.
>
> I think the same advice applies to Java, although it's no doubt
> politically incorrect there.  Keeping explicit pools for large,
> easily-managed objects will mostly get the GC out of the picture once the
> pool is sufficiently large.  You pay a bit of space for type-safety, in that
> you can't reuse a given large object for a different type.  (If the large
> objects contain pointers, the GC also still needs to trace them.  But large
> objects seem to often be pointer-free, e.g. bitmaps.)  But I would guess
> that so long as you use this technique only in the few cases where it's
> really needed, that's not a large cost.

object pooling is common in java systems.  the problem is that it brings
back all of the headaches of malloc/free.  i don't know about it being
"politically incorrect".  there is absolutely no reason why large objects
should be inefficient under gc in java.  but if you create some state
information, and the creation operation is expensive, then object pooling is
an attractive and useful feature regardless of the physical size of the
object.

david

From hans_boehm@hp.com  Tue Feb 18 18:45:13 2003
From: hans_boehm@hp.com (Boehm, Hans)
Date: Tue, 18 Feb 2003 10:45:13 -0800
Subject: [gclist] why malloc/free instead of GC?
Message-ID: <75A9FEBA25015040A761C1F74975667DA136A9@hplex4.hpl.hp.com>

> From: Charles Fiterman [mailto:cef@geodesic.com]
> If you have mark and sweep or moving collection at some point your
> application will become so large that collection time causes you to
> violate it no matter how many CPU's you add. You must have a way to
> distribute free operations and not run them all at once.
Why?  All evidence I've seen suggests that

a) Heap sizes grow roughly with memory access speed.  The amount of time it takes to touch or trace a "large" heap on a "fast" machine seems to stay roughly constant over the years, even though the meanings of "large" and "fast" change.

b) Tracing collection scales quite well with processor count.  I haven't done the measurement, but I strongly suspect that you can buy a machine that will collect a 10GB heap with 50% pointer-full-object-occupancy in under a second.  (It won't be a cheap machine, but ...)

> Reference counting also has the advantage that the destruction of objects
> can have rational finalizers. Finalizers must be safe, general, sure,
> prompt and ordered. Safe means they don't violate the type system. General
> means finalizers can run any code in the language and have that code
> produce normal results, for example exceptions can't just get discarded.
> Sure means if you build an object it gets to destroy itself. Prompt means
> finalizers aren't indefinitely postponed. Ordered means finalizers run in a
> determined order, if you ship the application you don't change the order
> creating portability bugs.
> 

I don't know how to get "prompt"ness and "sure"ness guarantees without running finalizers synchronously in the thread dropping the reference.  That's "safe" in your sense, but it's unsafe in that it can result in spurious deadlocks and/or similar problems.  Hence it's highly undesirable.  For details, see my 2003 POPL paper (http://portal.acm.org/citation.cfm?doid=604131.604153 or http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html ).

Hans

From weigelt@metux.de  Tue Feb 18 18:25:17 2003
From: weigelt@metux.de (Enrico Weigelt)
Date: Tue, 18 Feb 2003 19:25:17 +0100
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <Pine.GSO.4.44.0302181018050.7773-100000@slinky.cs.nyu.edu>
References: <5.1.1.6.0.20030218090549.02faf830@pop3.geodesic.com>
 <Pine.GSO.4.44.0302181018050.7773-100000@slinky.cs.nyu.edu>
Message-ID: <20030218182516.GD29805@metux.de>

On Tue, Feb 18, 2003 at 12:28:51PM -0500, Igor Pechtchanski wrote:

<snip>
> 1) those where the large heap size comes not from the amount of data
> handled in any one transaction, but rather from the number of concurrent
> transactions.  Since each transaction is (usually) a separate entity,
> approaches like region-based GC or transaction-specific heaps might work
> well there.
This is tricky if there are references from one region to another.
How do you cope with that?

> 2) those actually handling massive amounts of data for each transaction
> (such as search engines).  Such applications mostly do not have a lot of
> simultaneous live data, just a large data stream.  Since the performance
> of, say, copying GC is proportionate to the amount of live data, this
> shouldn't affect performance.
For applications where entities have limited lifetimes (e.g. requests),
there's another quite interesting model. Look at Apache 2: it uses
different pools for entities with different lifetimes
(e.g. request vs. thread vs. global). This is a little bit like the
Unix process model, where all of a process's VM memory is freed when the
process dies. The problem with this model is making sure that the
pools don't get mixed up.

-- 
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux ITS 
 Web hosting from 5 EUR/month.       UUCP, rawIP and much more.

 phone:     +49 36207 519931         www:       http://www.metux.de/     
 fax:       +49 36207 519932         email:     contact@metux.de
 cellphone: +49 174 7066481	     smsgate:   sms.weigelt@metux.de
---------------------------------------------------------------------
 This mail was sent via UUCP.             http://www.metux.de/uucp/

From pechtcha@cs.nyu.edu  Tue Feb 18 18:58:07 2003
From: pechtcha@cs.nyu.edu (Igor Pechtchanski)
Date: Tue, 18 Feb 2003 13:58:07 -0500 (EST)
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <20030218182516.GD29805@metux.de>
Message-ID: <Pine.GSO.4.44.0302181355170.7773-100000@slinky.cs.nyu.edu>

On Tue, 18 Feb 2003, Enrico Weigelt wrote:

> On Tue, Feb 18, 2003 at 12:28:51PM -0500, Igor Pechtchanski wrote:
>
> <snip>
> > 1) those where the large heap size comes not from the amount of data
> > handled in any one transaction, but rather from the number of concurrent
> > transactions.  Since each transaction is (usually) a separate entity,
> > approaches like region-based GC or transaction-specific heaps might work
> > well there.
> This is tricky if there are references from one region to another.
> How to cope with this ?

Same way generational GC copes with it, i.e., through write barriers.
However, when I said "separate entities", I meant mostly "independent
heaps", in which case there won't be any cross-references.
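To illustrate the write-barrier approach, here is a minimal sketch of a remembered set, assuming every object records the number of the region it lives in. `struct obj`, `write_ref` and the fixed-size `remset` are hypothetical names; real generational collectors often use card marking or other barrier designs instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object layout: each object records its region number. */
struct obj {
    int region;
    struct obj *field;      /* a single pointer field, for simplicity */
};

#define REMSET_MAX 1024     /* fixed capacity, for illustration only */

/* Remembered set: addresses of slots holding cross-region pointers. */
static struct obj **remset[REMSET_MAX];
static size_t remset_len;

/* Write barrier: every pointer store into a heap object goes through here.
   Stores that create a pointer from one region into another are recorded,
   so the target region can be collected without scanning the others. */
void write_ref(struct obj *holder, struct obj **slot, struct obj *value) {
    *slot = value;
    if (value != NULL && value->region != holder->region) {
        assert(remset_len < REMSET_MAX);
        remset[remset_len++] = slot;
    }
}
```

When a region is collected, the recorded slots that point into it act as extra roots.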

> > 2) those actually handling massive amounts of data for each transaction
> > (such as search engines).  Such applications mostly do not have a lot of
> > simultaneous live data, just a large data stream.  Since the performance
> > of, say, copying GC is proportionate to the amount of live data, this
> > shouldn't affect performance.
> For applications with limited lifetimes of several entities (i.e. request),
> there's another quite interesting model. Look at the apache-2, they're
> using different pools for entities with different lifetimes
> (i.e. request vs. thread vs. global). This is a little bit like the
> unix process model, where the whole vm memory gets freed when the process
> has died. The problem with this model is to take care of not mixing up
> several pools.

That's largely the idea behind region-based allocation/GC.  I believe it
was introduced in Trishul Chilimbi's paper, but I'm sure people here will
correct me if I'm wrong.
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_		pechtcha@cs.nyu.edu
ZZZzz /,`.-'`'    -.  ;-;;,_		igor@watson.ibm.com
     |,4-  ) )-,_. ,\ (  `'-'		Igor Pechtchanski
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

Oh, boy, virtual memory! Now I'm gonna make myself a really *big* RAMdisk!
  -- /usr/games/fortune

From cef@geodesic.com  Tue Feb 18 19:07:09 2003
From: cef@geodesic.com (Charles Fiterman)
Date: Tue, 18 Feb 2003 13:07:09 -0600
Subject: [gclist] Replies.
Message-ID: <5.1.1.6.0.20030218130550.030f7fc0@pop3.geodesic.com>

I prefer not to have replies go to me and gclist. Since I'm obviously on 
gclist that means I get two copies. I can't imagine anyone who wouldn't 
have this preference.

From hans_boehm@hp.com  Tue Feb 18 19:33:14 2003
From: hans_boehm@hp.com (Boehm, Hans)
Date: Tue, 18 Feb 2003 11:33:14 -0800
Subject: [gclist] why malloc/free instead of GC?
Message-ID: <75A9FEBA25015040A761C1F74975667DA136AA@hplex4.hpl.hp.com>

> From: David F. Bacon [mailto:dfb@watson.ibm.com]
> 
> > I should really have said "tracing GC" in the following, but that was the
> > topic of discussion, I think.  Fundamentally, a tracing collector needs to
> > do the same amount of tracing work whether a client allocates, say, 10
> > objects containing 100 bytes each, or a single 1000 byte object.  Large
> > objects cost proportionately more tracing work.  This is not true for
> > malloc/free allocation, where the 1000 byte allocation+deallocation often
> > doesn't cost more than a single 100 byte allocation.  (A pure reference
> > count collector without cycle detection behaves more or less like
> > malloc/free here.)
> 
> you're assuming the collector has to scan the whole object to find the
> pointers, right?  for a type-accurate collector, the tracing work is
> proportional to the pointer density of the program times the memory size,
> which is usually much smaller.
Even for the gcj collector, that's pretty much true.

But I think it matters only in that it makes the precise argument harder.  Assume you have a nongenerational collector, 20 MB of static 20% pointer-density live data in a 40MB heap, and you start repeatedly allocating and immediately dropping 1MB pointer-free objects.  You will still have to trace 4MB of pointers every 20 allocations.  Thus they will still be far more expensive than allocating cons cells.  (It's not hard to doctor this example to deal with a generational collector, though you probably need to allocate some pointers in that case.)
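Hans's arithmetic can be reproduced with a toy cost model. The struct and function names are hypothetical, and the model deliberately ignores everything except tracing the live pointers in a non-generational collector.

```c
#include <assert.h>

/* Toy model: a collection is triggered whenever the free part of the heap
   is used up, and each collection traces every pointer in the live data. */
struct gc_model {
    double heap_mb;        /* total heap size */
    double live_mb;        /* live data that survives every collection */
    double ptr_density;    /* fraction of live data that is pointers */
    double obj_mb;         /* size of each short-lived allocation */
};

/* Number of allocations that fit in the free space between collections. */
double allocs_per_gc(struct gc_model m) {
    return (m.heap_mb - m.live_mb) / m.obj_mb;
}

/* Megabytes of pointers traced at each collection. */
double traced_mb_per_gc(struct gc_model m) {
    return m.live_mb * m.ptr_density;
}

/* Tracing work charged to each allocation, in MB of pointers. */
double traced_mb_per_alloc(struct gc_model m) {
    return traced_mb_per_gc(m) / allocs_per_gc(m);
}
```

With the figures above (40MB heap, 20MB live, 20% pointers, 1MB objects) a collection happens every 20 allocations and traces 4MB, so each pointer-free 1MB allocation is charged about 0.2MB of tracing. Halving the object size halves that charge, which is the sense in which large objects cost proportionately more.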

> > This becomes even more true for a fully conservative collector for C,
> > which really has to initialize objects itself, in order to avoid preserving
> > stale pointers.  In that case the allocation time includes initialization
> > time.  (In real life, I doubt this makes a huge difference, since the
> > initialization time tends to be dominated by cache miss time.  If the client
> > initializes the object later, as it normally would, it thus avoids the cache
> > miss time.  But the time cost has effectively been moved from the client
> > into the allocator.)
> 
> i keep thinking that we should be able to fix this problem, at least for
> objects larger than a cache line, by using the "cache line clear" operations
> that now exist in many cpus.  has anyone explored this?
In the case of our collector, it would clearly help in the not infrequent case of building a free list in an empty page.  The initial write in that case is with bzero or memset, which probably already take advantage of any such possibility.  In other cases, the limiting factor seems to be the fact that you don't want to introduce model dependencies by hard-coding the line size, etc.  I haven't experimented with it much, since I think none of the Intel architectures currently provide something along these lines.
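A minimal sketch of the case Hans mentions, building a free list in an empty page: one bulk clear of the page, then linking fixed-size objects through their first word. The 4096-byte page size and the name `build_free_list` are assumptions for illustration, not the Boehm collector's actual code; a portable `memset` stands in for any cache-line-clear instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096  /* assumed page size */

/* Clear a fresh page, then carve it into fixed-size objects linked
   through their first word. Returns the head of the free list, or NULL
   if not even one object fits. */
void *build_free_list(void *page, size_t obj_bytes) {
    assert(obj_bytes >= sizeof(void *) && obj_bytes % sizeof(void *) == 0);
    size_t n = PAGE_BYTES / obj_bytes;
    char *base = page;

    memset(page, 0, PAGE_BYTES);   /* the one bulk write to the page */
    if (n == 0)
        return NULL;
    for (size_t i = 0; i + 1 < n; i++)
        *(void **) (base + i * obj_bytes) = base + (i + 1) * obj_bytes;
    *(void **) (base + (n - 1) * obj_bytes) = NULL;  /* end of list */
    return page;
}
```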
> object pooling is common in java systems.  the problem is that it brings
> back all of the headaches of malloc/free.
The question in my mind is whether you can confine it to one or two large object types.  My guess is that usually you can (and should).

I agree that widespread object pooling is generally a bad idea.  Its external fragmentation cost is usually too high in large systems, in addition to the malloc/free problems.

Hans

From weigelt@metux.de  Tue Feb 18 19:11:18 2003
From: weigelt@metux.de (Enrico Weigelt)
Date: Tue, 18 Feb 2003 20:11:18 +0100
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <000201c2d773$a6f685a0$5bd1dc0c@sulaco>
References: <20030218133150.GC30555@metux.de>
 <000201c2d773$a6f685a0$5bd1dc0c@sulaco>
Message-ID: <20030218191118.GA5030@metux.de>

On Tue, Feb 18, 2003 at 12:32:03PM -0500, Arlie Davis wrote:

> Microsoft's CLR (.Net Framework) does a good job on "large" objects.
> Objects above a certain threshold (20k, I believe) are allocated in a
> traditional malloc/free heap, but their lifetime is still tracked
> through GC.  Also, since the CLR has access to all type information, it
> only scans memory locations that are known to be pointers.
This is just what Oberon does with all objects.

If we've got some type information, we could use three heaps for
different kinds of objects:

a) strings (and other objects which contain no pointers)
b) typed objects (we only have to know where the pointers lie)
c) untyped objects (must be treated as an array of pointers)

For these three kinds we use different methods for finding pointers:

a) there are no pointers, so there is nothing to do
b) we have to look at each pointer location defined by the pointer map
c) we simply scan the whole chunk as a pointer array (assuming aligned pointers?)

On Unix we can map memory almost anywhere we want it,
so we could split up our address space into several huge ranges
that hold our pools. Then we can decide very quickly which kind of
object a pointer points to.

Now it is the task of upper-level functions to decide where to allocate
a new object from. By default we use c) if we don't know more about
the object (as a malloc() replacement).
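The address-range dispatch described above might look roughly like the following sketch. It assumes each kind of object lives in its own contiguous range; here the ranges are just registered buffers rather than mmap'd regions, and all names (`register_heap`, `classify`, `scan_object`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object kinds, one heap (address range) per kind. */
enum obj_kind { KIND_NOPTR, KIND_TYPED, KIND_UNTYPED, KIND_UNKNOWN };

struct heap_range { uintptr_t lo, hi; };

/* heaps[k] is the address range reserved for objects of kind k. */
static struct heap_range heaps[KIND_UNKNOWN];

void register_heap(enum obj_kind kind, void *base, size_t bytes) {
    assert(kind < KIND_UNKNOWN);
    heaps[kind].lo = (uintptr_t) base;
    heaps[kind].hi = (uintptr_t) base + bytes;
}

/* Decide how to scan an object from its address alone. */
enum obj_kind classify(const void *p) {
    uintptr_t a = (uintptr_t) p;
    for (int k = KIND_NOPTR; k < KIND_UNKNOWN; k++)
        if (a >= heaps[k].lo && a < heaps[k].hi)
            return (enum obj_kind) k;
    return KIND_UNKNOWN;
}

/* Visit the candidate pointers in an object of `words` words.
   ptr_map lists the word offsets of pointer fields for typed objects.
   Returns the number of slots visited; mark may be NULL. */
size_t scan_object(void **obj, size_t words,
                   const size_t *ptr_map, size_t map_len,
                   void (*mark)(void *candidate)) {
    size_t visited = 0;
    switch (classify(obj)) {
    case KIND_NOPTR:              /* a) no pointers: nothing to do */
        break;
    case KIND_TYPED:              /* b) only the slots named by the map */
        for (size_t i = 0; i < map_len; i++, visited++)
            if (mark != NULL)
                mark(obj[ptr_map[i]]);
        break;
    case KIND_UNTYPED:            /* c) every aligned word is a candidate */
        for (size_t i = 0; i < words; i++, visited++)
            if (mark != NULL)
                mark(obj[i]);
        break;
    default:                      /* not in any GC heap */
        break;
    }
    return visited;
}
```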

> Also, consider the example (brought up here) of an application which
> must process a high volume of transactions, with a high degree of
> consistency of time required per transaction.  Just because you use a
> GC, doesn't mean you *always* allocate fresh objects for every
> transaction.  You can still -- selectively -- use object pooling.  Many
> large applications that are based on explicit-free gain performance by
> pooling instances of objects.
Yes. For example, if you have a very large number of objects of the same
type, you could simply allocate a big array and use a simple allocation map.
You should try not to use pointers to the elements of this array,
because the GC may not like it (in Oberon this is forbidden
by the quite strict type system); address them by a simple index instead.
(This can also save memory.)
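A minimal sketch of such an index-addressed pool, assuming a fixed capacity and a bitmap as the allocation map. `struct item`, `pool_alloc` and `pool_free` are hypothetical names. Because clients hold `int` indices rather than pointers, no references into the array are created; `pool_free` still has to be called explicitly, which is the malloc/free burden discussed earlier in the thread.

```c
#include <assert.h>
#include <limits.h>

#define POOL_SIZE 256       /* hypothetical fixed capacity */

struct item { double x, y; };   /* hypothetical element type */

/* One big array of elements plus a bitmap recording which slots are in use.
   Clients keep int indices, never pointers into the array. */
static struct item pool[POOL_SIZE];
static unsigned char used[(POOL_SIZE + CHAR_BIT - 1) / CHAR_BIT];

/* Return the index of a free slot, or -1 if the pool is full. */
int pool_alloc(void) {
    for (int i = 0; i < POOL_SIZE; i++) {
        unsigned bit = 1u << (i % CHAR_BIT);
        if (!(used[i / CHAR_BIT] & bit)) {
            used[i / CHAR_BIT] |= bit;
            return i;
        }
    }
    return -1;
}

/* Explicitly release a slot that is currently in use. */
void pool_free(int i) {
    assert(i >= 0 && i < POOL_SIZE);
    assert(used[i / CHAR_BIT] & (1u << (i % CHAR_BIT)));
    used[i / CHAR_BIT] &= (unsigned char) ~(1u << (i % CHAR_BIT));
}
```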

hmm... i'm currently thinking about using page fault information 
for optimizing the GC process (branch cutting?). is there any chance ?

cu

From rog@vitanuova.com  Tue Feb 18 19:44:31 2003
From: rog@vitanuova.com (rog@vitanuova.com)
Date: Tue, 18 Feb 2003 19:44:31 +0000
Subject: [gclist] why malloc/free instead of GC?
Message-ID: <1136864263c71d58f046b54884bda3fd@vitanuova.com>

> a) strings (and other objects which contain no pointers)
> b) typed objects (we only have to know where pointers lay around)
> c) untyped objects (must be treatened as an array of pointers)

pointers between b) and c) are gonna cause problems.

From weigelt@metux.de  Tue Feb 18 19:33:21 2003
From: weigelt@metux.de (Enrico Weigelt)
Date: Tue, 18 Feb 2003 20:33:21 +0100
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <1136864263c71d58f046b54884bda3fd@vitanuova.com>
References: <1136864263c71d58f046b54884bda3fd@vitanuova.com>
Message-ID: <20030218193321.GC5030@metux.de>

On Tue, Feb 18, 2003 at 07:44:31PM +0000, rog@vitanuova.com wrote:
> > a) strings (and other objects which contain no pointers)
> > b) typed objects (we only have to know where pointers lay around)
> > c) untyped objects (must be treatened as an array of pointers)
> 
> pointers between b) and c) are gonna cause problems.
Why should they?

I don't want to build an allocator with strictly separated heaps.
The different heaps are just for easily detecting the best scan method:
you can simply tell from the address whether the object is typed,
untyped, or pointer-free.

cu

From weigelt@metux.de  Tue Feb 18 19:28:03 2003
From: weigelt@metux.de (Enrico Weigelt)
Date: Tue, 18 Feb 2003 20:28:03 +0100
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <Pine.GSO.4.44.0302181355170.7773-100000@slinky.cs.nyu.edu>
References: <20030218182516.GD29805@metux.de>
 <Pine.GSO.4.44.0302181355170.7773-100000@slinky.cs.nyu.edu>
Message-ID: <20030218192803.GB5030@metux.de>

On Tue, Feb 18, 2003 at 01:58:07PM -0500, Igor Pechtchanski wrote:

<snip>
> Same way generational GC copes with it, i.e., through write barriers.
> However, when I said "separate entities", I meant mostly "independent
> heaps", in which case there won't be any cross-references.
OK, so the programmer has to make sure that these two pools are
strictly separate. Hmm.
This is almost the same as the pooled allocator in Apache 2.

cu

From basile@starynkevitch.net  Tue Feb 18 22:13:49 2003
From: basile@starynkevitch.net (Basile STARYNKEVITCH)
Date: Tue, 18 Feb 2003 23:13:49 +0100
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <01e301c2d755$8d1b6510$1c02a8c0@watson.ibm.com>
References: <75A9FEBA25015040A761C1F74975667DA136A4@hplex4.hpl.hp.com>
 <01e301c2d755$8d1b6510$1c02a8c0@watson.ibm.com>
Message-ID: <15954.45085.866899.37782@hector.lesours>

For completeness, I changed my tiny test a bit to allocate smaller
objects, to take into account Hans Boehm's remark on typical object
size


################################
// file essm.c

// compile for malloc: gcc -O essm.c -o essm
// compile for Boehm's GC: gcc -O -DUSEGC essm.c -o essm_gc -lgc
// compile for Qish GC: 
/// gcc -O -I../Qish/include -DUSEQISH essm.c -o essm_qish -L../Qish/lib -lqish -ldl

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/times.h>
#include <assert.h>
#include <string.h>

#ifdef USEGC
#include <gc.h>
#define malloc(S) GC_malloc(S)
#ifdef USEGCFREE
#define free(P) GC_free(P)
#else
#define free(P) ((void) (P))	// objects are reclaimed by the collector
#endif
#endif

void *tabptr[16];

#ifdef USEQISH
#include "qish.h"
#define tabptr qish_roots
#define HEADEROF(Ad) (*(unsigned*)(Ad))

struct obj_st
{
  unsigned header;
  void *tab[0];
};

void *
essm_qish_gc_copy (void **padr, void *dst, const void *src)
{
  int i = 0;
  static int cnt;
  struct obj_st *odst = dst;
  const struct obj_st *osrc = src;
  unsigned header = osrc->header;
  cnt++;
  odst->header = osrc->header;
  for (i = 0; i < header; i++)
    odst->tab[i] = osrc->tab[i];
  *padr = odst;
  return (void *) (odst->tab + header);
}

void *
essm_qish_minor_scan (void *ptr)
{
  int i = 0;
  static int cnt;
  struct obj_st *ob = ptr;
  unsigned header = ob->header;
  cnt++;
  for (i = 0; i < header; i++)
    QISHGC_MINOR_UPDATE (ob->tab[i]);
  return (void *) (ob->tab + header);
}

void *
essm_qish_full_scan (void *ptr)
{
  int i = 0;
  static int cnt;
  struct obj_st *ob = ptr;
  unsigned header = ob->header;
  cnt++;
  for (i = 0; i < header; i++)	// tab[] has header slots: 0 .. header-1
    QISHGC_FULL_UPDATE (ob->tab[i]);
  return (void *) (ob->tab + header);
}

void
essm_qish_fixed_scan (void *ptr, int sz)
{
  qish_panic ("fixed_scan should not be called ptr=%p sz=%d", ptr, sz);
}

#endif


int
main (int argc, char **argv)
{
  long long maxcnt = 1000000;
#define MAXALLOC 20
  long long taballoc[MAXALLOC + 1];
  long long cumalloc = 0;
  long long i = 0;
  int r = 0, s = 0, n = 0;
  double usert, syst, tick;
  struct tms t;
  if (argc > 1)
    maxcnt = atol (argv[1]) * 1000;
#ifdef USEGC
  GC_INIT ();			// initialize Boehm's GC before the first allocation
#endif
#ifdef USEQISH
  qishgc_init ();
  qish_gc_copy_p = essm_qish_gc_copy;
  qish_minor_scan_p = essm_qish_minor_scan;
  qish_fixed_scan_p = essm_qish_fixed_scan;
  qish_full_scan_p = essm_qish_full_scan;
#endif //USEQISH
  memset (&t, 0, sizeof (t));
  memset (taballoc, 0, sizeof (taballoc));
  if (maxcnt < 100000)
    maxcnt = 100000;
  printf ("begin maxcnt=%lld=%e\n", maxcnt, (double) maxcnt);
  for (i = 0; i < maxcnt; i++)
    {
      if ((i & 0x1fffff) == 0)
	printf ("i=%lld [=%.3g %%]\n", i, 100.0*(double)i/maxcnt);
      r = lrand48 () & 0xf;
#ifndef USEQISH
      if (tabptr[r])
	free (tabptr[r]);
#endif
      n = lrand48 () % 131072 + 4;
      // approximate s = integer square root(n)
      s = n / 256 + 2;
      s = (s + n / s) / 2;
      s = (s + n / s) / 2;
      s = (s + n / s) / 2;
      s = (s + n / s) / 2;
      s = (s + n / s) / 2;
      s = (s + n / s) / 2;
      s = (s + n / s) / 2;
      s = (s + n / s) / 2;
      s = (s + 8) & (~3);
      cumalloc += s;
      if (s / 16 < MAXALLOC)
	taballoc[s / 16]++;
#ifndef USEQISH
      tabptr[r] = malloc (s);
      if (tabptr[r] && (size_t) s > 4 * sizeof (void *) && lrand48 () % 8 < 3)
	((void **) (tabptr[r]))[1] = tabptr[lrand48 () % 8];
#else
      n = s / 4 + 1;
      {
	volatile struct	{
	  struct obj_st *ptr;
	}	_locals_ =	{0};
#define l_ptr _locals_.ptr
	BEGIN_LOCAL_FRAME_WITHOUT_ARGS ();
	assert(n>0 && n<1000);
	QISH_ALLOCATE (l_ptr, sizeof (struct obj_st) + n * sizeof (void *));
	l_ptr->header = n;
	if (n > 3 && lrand48 () % 8 < 3)  {
	    l_ptr->tab[0] = (void *) (tabptr[lrand48 () % 8]);
	    QISH_WRITE_NOTIFY (l_ptr);
	}
	tabptr[r] = l_ptr;
	dbgprintf("r=%d n=%d l_ptr=%p", r, n, l_ptr);
	EXIT_FRAME ();
      }
#endif
      if (!tabptr[r])
	fprintf (stderr, "malloc(%d) failed i=%lld\n", s, i);
    };
  times (&t);
  tick = (double) sysconf (_SC_CLK_TCK);
  usert = ((double) t.tms_utime) / tick;
  syst = ((double) t.tms_stime) / tick;
  printf
    ("end maxcnt=%lld=%e cumulated alloc=%lld=%.3g bytes, mean %.3g bytes\n",
     maxcnt, (double) maxcnt, cumalloc, (double) cumalloc,
     (double) cumalloc / (double) maxcnt);
#if 0
  // doesn't work as intended...
  for (i = 0; i < MAXALLOC; i++)
    {
      long long ta = taballoc[i];
      printf ("alloc<%d: %lld=%.3g i.e. %.3g %%\n",
	      16 + i * 16, ta, (double) ta,
	      100.0 * ((double) ta / ((double) maxcnt)));
    };
#endif
#ifdef USEQISH
  printf("done %d minor & %d full garbage collections\n", 
	 qish_nb_minor_collections, qish_nb_full_collections);
#endif
  printf
    ("%s cputime user=%g system=%g tick=%g total == per iteration  user=%g system=%g\n",
     argv[0], usert, syst, tick, usert / (double) maxcnt,
     syst / (double) maxcnt);
  return 0;
}
// eof essm.c 


################################

with gcc -DUSEGC -O essm.c -o essm_gc -lgc
the program ./essm_gc gives (using Boehm's GC)

begin maxcnt=1000000=1.000000e+06
i=0 [=0 %]
end maxcnt=1000000=1.000000e+06 cumulated alloc=248908533=2.49e+08 bytes, mean 249 bytes
 ./essm_gc cputime user=2.13 system=0.02 tick=100 total == per iteration  user=2.13e-06 system=2e-08

with gcc -O -g essm.c -o essm 
the program ./essm gives (using malloc/free)
begin maxcnt=1000000=1.000000e+06
i=0 [=0 %]
end maxcnt=1000000=1.000000e+06 cumulated alloc=247413992=2.47e+08 bytes, mean 247 bytes
 ./essm cputime user=0.72 system=0 tick=100 total == per iteration  user=7.2e-07 system=0

For completeness, and as a shameless plug, I also adapted the same
(useless) program to use my Qish generational copying GC - see
http://freshmeat.net/projects/qish for details on Qish. I even added a
pair of BEGIN_LOCAL_FRAME_WITHOUT_ARGS + EXIT_FRAME macros, even though
they are useless in this particular example (since the result of each
allocation goes into a global root).

 ./essm_qish
begin maxcnt=1000000=1.000000e+06
i=0 [=0 %]
end maxcnt=1000000=1.000000e+06 cumulated alloc=247407768=2.47e+08 bytes, mean 247 bytes
done 29 minor & 0 full garbage collections
 ./essm_qish cputime user=0.72 system=0.28 tick=100 total == per iteration  user=7.2e-07 system=2.8e-07

Now I run it longer, with more allocations, so that the full GC is
triggered several times:
$PWD/essm_qish 54321
begin maxcnt=54321000=5.432100e+07
i=0 [=0 %]
i=2097152 [=3.86 %]
i=4194304 [=7.72 %]
i=6291456 [=11.6 %]
i=8388608 [=15.4 %]
i=10485760 [=19.3 %]
i=12582912 [=23.2 %]
i=14680064 [=27 %]
i=16777216 [=30.9 %]
i=18874368 [=34.7 %]
i=20971520 [=38.6 %]
i=23068672 [=42.5 %]
i=25165824 [=46.3 %]
i=27262976 [=50.2 %]
i=29360128 [=54 %]
i=31457280 [=57.9 %]
i=33554432 [=61.8 %]
i=35651584 [=65.6 %]
i=37748736 [=69.5 %]
i=39845888 [=73.4 %]
i=41943040 [=77.2 %]
i=44040192 [=81.1 %]
i=46137344 [=84.9 %]
i=48234496 [=88.8 %]
i=50331648 [=92.7 %]
i=52428800 [=96.5 %]
end maxcnt=54321000=5.432100e+07 cumulated alloc=13437122740=1.34e+10 bytes, mean 247 bytes
done 1645 minor & 6 full garbage collections
/home/basile/Misc/essm_qish cputime user=41.45 system=13.37 tick=100 total == per iteration  user=7.63057e-07 system=2.46129e-07

With the same allocation count, the Boehm GC test ends with:

i=52428800 [=96.5 %]
end maxcnt=54321000=5.432100e+07 cumulated alloc=13437153848=1.34e+10 bytes, mean 247 bytes
/home/basile/Misc/essm_gc cputime user=116.91 system=0.19 tick=100 total == per iteration  user=2.15221e-06 system=3.49773e-09

So a malloc/free is about 0.6 microseconds, while a GC_malloc is about
2 (or 2.2) microseconds, and a Qish allocation is about 1.1
microseconds (on average) *including garbage collection time* (of
course Qish has more overhead in practice, because of the mandatory
registration of local roots and the write barrier; and Qish is much
less comfortable to code with, since it requires a particular coding
style).


Sorry to Hans Boehm for having provided an unrealistic benchmark.

(If any reader of this list has tried Qish, I would be
delighted to get feedback; Qish is open source, under the LGPL.)

Regards.
-- 

Basile STARYNKEVITCH         http://starynkevitch.net/Basile/ 
email: basile<at>starynkevitch<dot>net 
alias: basile<at>tunes<dot>org 
8, rue de la Faïencerie, 92340 Bourg La Reine, France

From hans_boehm@hp.com  Tue Feb 18 23:52:50 2003
From: hans_boehm@hp.com (Boehm, Hans)
Date: Tue, 18 Feb 2003 15:52:50 -0800
Subject: [gclist] why malloc/free instead of GC?
Message-ID: <75A9FEBA25015040A761C1F74975667DA136B2@hplex4.hpl.hp.com>

It looks to me like much of this difference can still be explained by the fact that GC_malloc initializes the resulting objects, and hence takes the cache misses that a real client would otherwise take later.  To make the measurements more comparable, you should initialize the objects after you allocate them.  (I still wouldn't expect GC_malloc to win.  I've normally seen that only for cons-cell sized or slightly larger objects.)
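One way to apply Hans's suggestion to essm.c is a small wrapper that writes to every object right after allocating it, so that both builds pay the cache misses a real client would take. `alloc_initialized` is a hypothetical helper, not part of the original benchmark; if it is placed after the `#define malloc(S) GC_malloc(S)` line, the USEGC build routes it to `GC_malloc` as well.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate, then initialize the whole object as a real client would,
   so the cost of touching the memory is charged in every build. */
void *alloc_initialized(size_t size) {
    void *p = malloc(size);
    if (p != NULL)
        memset(p, 0, size);
    return p;
}
```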

Are there also differences in the amount of thread support that's included in the measurements?  E.g. the system malloc usually tests a global to determine at runtime whether it needs to lock.

Hans

> -----Original Message-----
> From: Basile STARYNKEVITCH [mailto:basile@starynkevitch.net]
> Sent: Tuesday, February 18, 2003 2:14 PM
> To: gclist@iecc.com
> Subject: Re: [gclist] why malloc/free instead of GC?
> 
> 
> For completeness, I changed my tiny test a bit to allocate smaller
> objects, to take into account Hans Boehm's remark on typical object
> size
> 
> ...
> 

From fjh@cs.mu.OZ.AU  Wed Feb 19 04:10:46 2003
From: fjh@cs.mu.OZ.AU (Fergus Henderson)
Date: Wed, 19 Feb 2003 15:10:46 +1100
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <15954.45085.866899.37782@hector.lesours>
References: <75A9FEBA25015040A761C1F74975667DA136A4@hplex4.hpl.hp.com>
 <01e301c2d755$8d1b6510$1c02a8c0@watson.ibm.com>
 <15954.45085.866899.37782@hector.lesours>
Message-ID: <20030219041046.GA27015@ceres.cs.mu.oz.au>

On 18-Feb-2003, Basile STARYNKEVITCH <basile@starynkevitch.net> wrote:
> For completeness and shameless plug I also hacked the same (useless)
> program to use my Qish generational copying GC - see
> http://freshmeat.net/projects/qish for details on Qish.

IIRC, qish depends on GCC's `-fvolatile' and `-fvolatile-globals' options,
right?

Firstly, because of this, it's not really fair to compare just GC
times, since qish will have a significant overhead on code which
does not do any allocation at all.  So benchmarks which do allocation
but have little or no computation (referencing global variables,
dereferencing pointers, etc.) will unfairly advantage qish.

Secondly, you may be interested to know that these options (or at least
`-fvolatile-globals' -- I'm not 100% sure about `-fvolatile') have been
removed from the CVS sources for GCC, because they were broken in GCC
versions 3.0 and beyond.  So this may cause trouble for Qish.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.

From basile@starynkevitch.net  Wed Feb 19 04:44:30 2003
From: basile@starynkevitch.net (Basile STARYNKEVITCH)
Date: Wed, 19 Feb 2003 05:44:30 +0100
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <20030219041046.GA27015@ceres.cs.mu.oz.au>
References: <75A9FEBA25015040A761C1F74975667DA136A4@hplex4.hpl.hp.com>
 <01e301c2d755$8d1b6510$1c02a8c0@watson.ibm.com>
 <15954.45085.866899.37782@hector.lesours>
 <20030219041046.GA27015@ceres.cs.mu.oz.au>
Message-ID: <15955.2990.301487.95133@hector.lesours>

>>>>> "Fergus" == Fergus Henderson <fjh@cs.mu.OZ.AU> writes:

    Fergus> On 18-Feb-2003, Basile STARYNKEVITCH
    Fergus> <basile@starynkevitch.net> wrote:
    >> For completeness and shameless plug I also hacked the same
    >> (useless) program to use my Qish generational copying GC - see
    >> http://freshmeat.net/projects/qish for details on Qish.

    Fergus> IIRC, qish depends on GCC's `-fvolatile' and
    Fergus> `-fvolatile-globals' options, right?

Not exactly (see below). The posted code was compiled with (assuming
Qish is in ../Qish):

gcc -O -I../Qish/include -DUSEQISH essm.c -o essm_qish -L../Qish/lib \
 -lqish -ldl

For information, gcc -O3 also works and gives user=7.352e-07
system=2.272e-07 seconds per iteration, while the binary compiled with
-O gives user=7.525e-07 system=2.384e-07 seconds per iteration and the
binary compiled with -O0 [no optimisation at all] gives
user=8.243e-07 system=2.249e-07 seconds per iteration.


    Fergus> Firstly, because of this, it's not really fair to compare
    Fergus> just GC times, since qish will have a significant overhead
    Fergus> on code which does not do any allocation at all. 

I agree with the comment, but Qish does not actually require
-fvolatile or -fvolatile-globals (even though I wrote that in the
documentation; I have since checked the ISO C99 spec on
volatile). It does require that pointer arguments be declared
volatile, and that local pointer variables be placed (as in the example)
in a volatile structure initialized to 0:

	volatile struct	{
	  struct obj_st *ptr;
	}	_locals_ =	{0};

    Fergus>  So
    Fergus> benchmarks which do allocation but have little or no
    Fergus> computation (referencing global variables, dereferencing
    Fergus> pointers, etc.) will unfairly advantage qish.

I agree with the remark above. But since -fvolatile is not required,
there is no advantage to Qish here, and even a disadvantage to Qish
(because it requires some careful coding conventions, and because the
mandatory BEGIN_LOCAL_FRAME*/EXIT_FRAME macros cost a few machine
instructions each in every call involving pointers).

    Fergus> Secondly, you may be interested to know that these options
    Fergus> (or at least `-fvolatile-globals' -- I'm not 100% sure
    Fergus> about `-fvolatile') have been removed from the CVS sources
    Fergus> for GCC, because they were broken in GCC versions 3.0 and
    Fergus> beyond.  So this may cause trouble for Qish.

I don't need them. I just need a compiler that respects the volatile
keyword, and a programmer who uses it carefully:

A.  in pointer arguments:

    void foo(struct yourstruct_st *volatile p);

B.  in local pointers, like above.
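A minimal sketch of why points A and B matter for a copying collector. Everything in it is hypothetical and is not Qish's code: the object layout, the toy two-semispace heap, alloc_obj(), collect(), and the frame record that stands in for whatever BEGIN_LOCAL_FRAME/EXIT_FRAME really do. The collector rewrites p and _locals_.ptr through their addresses when it moves objects; declaring them volatile makes the compiler re-read them from memory rather than reuse a stale copy kept in a register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical GC-ed object; forward is only used while collecting. */
struct obj_st {
    struct obj_st *forward;
    int value;
};

#define SPACE_OBJS 64
static struct obj_st space_a[SPACE_OBJS], space_b[SPACE_OBJS];
static struct obj_st *from_space = space_a, *to_space = space_b;
static size_t alloc_next = 0;

/* Address of a volatile pointer variable: a root the collector may rewrite. */
typedef struct obj_st *volatile *root_slot;

/* One record per active function holding GC-ed pointers (hypothetical
   stand-in for BEGIN_LOCAL_FRAME / EXIT_FRAME, not Qish's real layout). */
struct frame {
    struct frame *prev;
    root_slot *slots;
    size_t nslots;
};
static struct frame *top_frame = NULL;

/* Copy every object reachable from a registered root into the other
   semispace and rewrite the roots in place (objects hold no pointers here). */
static void collect(void)
{
    size_t next = 0;
    for (struct frame *f = top_frame; f != NULL; f = f->prev) {
        for (size_t i = 0; i < f->nslots; i++) {
            struct obj_st *old = *f->slots[i];
            if (old == NULL)
                continue;
            if (old->forward == NULL) {          /* first visit: copy it */
                struct obj_st *copy = &to_space[next++];
                copy->forward = NULL;
                copy->value = old->value;
                old->forward = copy;
            }
            *f->slots[i] = old->forward;         /* root now points to the copy */
        }
    }
    struct obj_st *tmp = from_space;             /* flip the semispaces */
    from_space = to_space;
    to_space = tmp;
    alloc_next = next;
}

static struct obj_st *alloc_obj(int value)
{
    if (alloc_next == SPACE_OBJS)
        collect();
    assert(alloc_next < SPACE_OBJS);             /* toy heap: no growth */
    struct obj_st *o = &from_space[alloc_next++];
    o->forward = NULL;
    o->value = value;
    return o;
}

/* Point A: volatile pointer argument. Point B: locals in a zeroed volatile struct. */
static int h(struct obj_st *volatile p)
{
    volatile struct {
        struct obj_st *ptr;
    } _locals_ = {0};
    root_slot slots[] = { &p, &_locals_.ptr };
    struct frame fr = { top_frame, slots, 2 };
    top_frame = &fr;                             /* "enter frame" */

    _locals_.ptr = alloc_obj(p->value + 1);
    collect();                                   /* as if the allocation triggered a GC */
    int sum = p->value + _locals_.ptr->value;    /* both re-read after the move */

    top_frame = fr.prev;                         /* "exit frame" */
    return sum;
}
```

Calling h(alloc_obj(10)) returns 21, and afterwards both live objects sit in the other semispace.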
 
BTW I actually tested some qish code with TinyCC (see www.tinycc.org).

Actually, I wrote that Qish needed -fvolatile before understanding
exactly what the volatile keyword means in C99. This was my
mistake. Qish does not need -fvolatile, but it does need careful use
of the volatile keyword (see points A and B above) and a specific
coding style (notably frame entering and exiting macros, and write
barrier macros).


And yes, Qish does have an overhead, because even functions which only
pass GC-ed pointers [on to allocating functions] need to follow the
coding conventions (in particular the BEGIN_LOCAL_FRAME*/EXIT_FRAME
macros) even if they don't allocate themselves. So if f(p) calls
g(p,q), which calls h(p,r), which allocates [where p, q, r are GC-ed
pointer arguments declared volatile], all of f, g, and h need the
BEGIN_LOCAL_FRAME*/EXIT_FRAME macro pairs even though only h
allocates.
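The transitive cost can be seen with a tiny shadow stack of root addresses. This is a hypothetical sketch, not Qish's BEGIN_LOCAL_FRAME/EXIT_FRAME: push_root(), pop_roots() and allocate_gc_ed() are made-up names. Only h() allocates, yet f() and g() each pay for registering their pointers, because a collection triggered inside h() must be able to find (and, for a copying collector, update) every live GC-ed pointer on the call chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shadow stack of root addresses; not Qish's actual macros. */
#define MAX_ROOTS 32
static void *volatile *root_stack[MAX_ROOTS];
static size_t root_top = 0;

static void push_root(void *volatile *slot)
{
    assert(root_top < MAX_ROOTS);
    root_stack[root_top++] = slot;
}

static void pop_roots(size_t n)
{
    assert(n <= root_top);
    root_top -= n;
}

/* Stand-in for an allocation that triggers a collection: record how many
   roots a collector could find at this moment. */
static size_t roots_seen_by_gc = 0;
static void allocate_gc_ed(void)
{
    roots_seen_by_gc = root_top;
}

static void h(void *volatile p, void *volatile r)
{
    push_root(&p);
    push_root(&r);
    allocate_gc_ed();          /* only h allocates... */
    pop_roots(2);
}

static void g(void *volatile p, void *volatile q)
{
    push_root(&p);             /* ...but g must still register its pointers, */
    push_root(&q);             /* or a moving GC could not update them */
    h(p, q);
    pop_roots(2);
}

static void f(void *volatile p)
{
    push_root(&p);             /* same for f */
    g(p, p);
    pop_roots(1);
}
```

With all three functions registering, the collector sees five roots; dropping the registration from g() would silently hide two of them.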

Above all, Hans is right to point out that Qish is not multithreaded and
won't work in a multithreaded application.
-- 

Basile STARYNKEVITCH         http://starynkevitch.net/Basile/ 
email: basile<at>starynkevitch<dot>net 
alias: basile<at>tunes<dot>org 
8, rue de la Faïencerie, 92340 Bourg La Reine, France

From basile@starynkevitch.net  Wed Feb 19 05:17:51 2003
From: basile@starynkevitch.net (Basile STARYNKEVITCH)
Date: Wed, 19 Feb 2003 06:17:51 +0100
Subject: [gclist] glib/gtk w/ GC
In-Reply-To: <20030218130421.GA30555@metux.de>
References: <20030218130421.GA30555@metux.de>
Message-ID: <15955.4991.443947.347765@hector.lesours>

>>>>> "Enrico" == Enrico Weigelt <weigelt@metux.de> writes:

    Enrico> hi folks, i'm gonna start working on an gc based derivate
    Enrico> of the glib/gtk.  anyone interested in helping ?

It is a huge work.

The main problem is that the memory-management mechanism is deeply
rooted in GTK2. Object reference counting goes all the way down into
glib/gobject, and widgets are finalized.

What kind of GC do you want to use? Your own, or Boehm's?

Actually, I thought of doing this, and concluded that writing a
toolkit which borrows pieces of code from GTK2 is easier than porting
GTK2 to a GC.

There used to be some (open-source, but not very popular) toolkits
built on top of Boehm's GC.

You need some finalization for widgets, because they use system
resources (eg X11 windows).


From Arun_Singla@infosys.com  Wed Feb 19 05:02:44 2003
From: Arun_Singla@infosys.com (Arun Singla)
Date: Wed, 19 Feb 2003 10:32:44 +0530
Subject: [gclist] unsubscribe
Message-ID: <FDA2F718F151D411893000B0D0226B4502CEED10@mysmsg01.ad.infosys.com>

Arun Singla

Software  Engineer
EISAA
Infosys Technologies Limited
Hootagalli
Mysore

Phone -91-821-404101
Fax -91-821-404200


http://www.infy.com
mailto: arun_singla@infosys.com


-----Original Message-----
From: Fergus Henderson [mailto:fjh@cs.mu.OZ.AU]
Sent: Wednesday, February 19, 2003 9:41 AM
To: Basile STARYNKEVITCH
Cc: gclist@iecc.com
Subject: Re: [gclist] why malloc/free instead of GC?

On 18-Feb-2003, Basile STARYNKEVITCH <basile@starynkevitch.net> wrote:
> For completeness and shameless plug I also hacked the same (useless)
> program to use my Qish generational copying GC - see
> http://freshmeat.net/projects/qish for details on Qish.

IIRC, qish depends on GCC's `-fvolatile' and `-fvolatile-globals'
options, right?

Firstly, because of this, it's not really fair to compare just GC
times, since qish will have a significant overhead on code which
does not do any allocation at all.  So benchmarks which do allocation
but have little or no computation (referencing global variables,
dereferencing pointers, etc.) will unfairly advantage qish.

Secondly, you may be interested to know that these options (or at least
`-fvolatile-globals' -- I'm not 100% sure about `-fvolatile') have been
removed from the CVS sources for GCC, because they were broken in GCC
versions 3.0 and beyond.  So this may cause trouble for Qish.

--
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.

From arlie@sublinear.org  Wed Feb 19 22:18:53 2003
From: arlie@sublinear.org (Arlie Davis)
Date: Wed, 19 Feb 2003 17:18:53 -0500
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <75A9FEBA25015040A761C1F74975667DA136B2@hplex4.hpl.hp.com>
Message-ID: <000d01c2d864$e2ef54a0$5bd1dc0c@sulaco>

Also, note that most apps that use malloc/free for typical "class"
objects (small-to-medium size, with significant pointer density) perform
some sort of object initialization.  It may be a field-by-field
initialization of pointers, or (more often) it is a bulk zero fill
(modulo vtable setup).  The time to do this, and the cache misses, will
not show up in traces of malloc/free cost, but do show up in GC
allocations.

So, there is yet another reason that direct, API-level comparisons of GC
vs. malloc are inaccurate, or at least incomplete.  A better (though
still incomplete) comparison would set the total time spent in, say, C++
new/delete against the total time spent in GC allocation and collection.

Also, in environments that mix reference counting with unmanaged heaps,
such as COM development on Win32, you must also account for the time
spent in AddRef and Release.  Most thread-safe implementations use
interlocked integer primitives, which are quite costly on SMP machines.

I've done a fair amount of profiling of real-world server apps on Win32,
and in many implementations, SMP scalability is severely hindered by the
very high frequency of interlocked operations.  In services that make
heavy use of COM interfaces, reference counting is often one of the
biggest users of interlocked access.
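Roughly what a thread-safe AddRef/Release pair boils down to, as a minimal sketch in C11 atomics standing in for Win32's InterlockedIncrement/InterlockedDecrement. The refobj type and function names are hypothetical and this is not the IUnknown ABI. Every call is an atomic read-modify-write on the object's count, so threads on different processors that touch the same object keep pulling that cache line away from each other, which is the scalability cost described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* COM-style reference-counted object (illustrative, not the IUnknown ABI). */
struct refobj {
    atomic_long refs;
    int payload;
};

static struct refobj *refobj_new(int payload)
{
    struct refobj *o = malloc(sizeof *o);
    if (o != NULL) {
        atomic_init(&o->refs, 1);  /* the creator holds the first reference */
        o->payload = payload;
    }
    return o;
}

/* AddRef: one interlocked read-modify-write on a shared cache line. */
static long refobj_addref(struct refobj *o)
{
    return atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed) + 1;
}

/* Release: another interlocked operation; acq_rel so the thread that frees
   the object sees all writes made by threads that released it earlier. */
static long refobj_release(struct refobj *o)
{
    long left = atomic_fetch_sub_explicit(&o->refs, 1, memory_order_acq_rel) - 1;
    if (left == 0)
        free(o);
    return left;
}
```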

All of this must be taken into account when considering the behavior of
real-world, complex applications & services, and how they use memory.

-- arlie


-----Original Message-----
From: owner-gclist@lists.iecc.com [mailto:owner-gclist@lists.iecc.com]
On Behalf Of Boehm, Hans
Sent: Tuesday, February 18, 2003 6:53 PM
To: 'Basile STARYNKEVITCH'
Cc: gclist@iecc.com
Subject: Re: [gclist] why malloc/free instead of GC?


It looks to me like much of this difference can still be explained by
the fact that GC_malloc initializes the resulting objects, and hence
takes the cache misses that a real client would otherwise take later.
To make the measurements more comparable, you should initialize the
objects after you allocate them.  (I still wouldn't expect GC_malloc to
win.  I've normally seen that only for cons-cell sized or slightly
larger objects.)

Are there also differences in the amount of thread support that's
included in the measurements?  E.g. the system malloc usually tests a
global to determine at runtime whether it needs to lock.
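The runtime test Hans mentions can be sketched as an allocator that reads a global and only takes its lock once the process has become multithreaded, so single-threaded measurements skip the locking cost. This is a hypothetical sketch: threads_started, heap_lock, locked_calls and checked_malloc() are made-up names, a C11 spinlock stands in for whatever lock a real malloc uses, and the underlying malloc() may do its own locking regardless. The point for benchmarking is that both allocators should be measured with the same kind of thread support.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flag, set (before the second thread is created) once the
   program becomes multithreaded. */
static bool threads_started = false;
static atomic_flag heap_lock = ATOMIC_FLAG_INIT;
static unsigned long locked_calls = 0;   /* for illustration only */

static void *checked_malloc(size_t n)
{
    bool need_lock = threads_started;    /* the one global test per call */
    if (need_lock) {
        while (atomic_flag_test_and_set_explicit(&heap_lock, memory_order_acquire))
            ;                            /* spin until the lock is free */
        locked_calls++;
    }
    void *p = malloc(n);                 /* the actual allocation work */
    if (need_lock)
        atomic_flag_clear_explicit(&heap_lock, memory_order_release);
    return p;
}
```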

Hans

> -----Original Message-----
> From: Basile STARYNKEVITCH [mailto:basile@starynkevitch.net]
> Sent: Tuesday, February 18, 2003 2:14 PM
> To: gclist@iecc.com
> Subject: Re: [gclist] why malloc/free instead of GC?
> 
> 
> For completeness, I changed my tiny test a bit to allocate smaller 
> objects, to take into account Hans Boehm's remark on typical object 
> size
> 
> ...
> 

From jcampbell3@prodigy.net  Thu Feb 20 03:09:14 2003
From: jcampbell3@prodigy.net (Larry Evans)
Date: Wed, 19 Feb 2003 21:09:14 -0600
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <000d01c2d864$e2ef54a0$5bd1dc0c@sulaco>
References: <000d01c2d864$e2ef54a0$5bd1dc0c@sulaco>
Message-ID: <3E5446DA.1080903@prodigy.net>

Arlie Davis wrote:
[snip]
> 
> Also, in environments that mix reference counting with unmanaged heaps,
> such as COM development on Win32, you must also account for the time
> spent in AddRef and Release.  Most thread-safe implementations use
> interlocked integer primitives, which are quite costly on SMP machines.
> 
> I've done a fair amount of profiling of real-world server apps on Win32,
> and in many implementations, SMP scalability is severely hindered by the
> very high frequency of interlocked operations.  In services that make
> heavy use of COM interfaces, reference counting is often one of the
> biggest users of interlocked access.
If smart pointers were used, wouldn't weighted reference counting
[ Richard E. Jones and Rafael D. Lins. _Cyclic weighted reference counting without delay_
   Technical Report 28-92, Computing Laboratory, The University of Kent at Canterbury, December 1992
]
alleviate this at the cost of more memory being used by the smart pointers?
> 
[snip]

From arlie@sublinear.org  Thu Feb 20 05:12:46 2003
From: arlie@sublinear.org (Arlie Davis)
Date: Thu, 20 Feb 2003 00:12:46 -0500
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <3E5446DA.1080903@prodigy.net>
Message-ID: <001f01c2d89e$b4b490c0$5bd1dc0c@sulaco>

I believe the short answer is "no".  The paper you refer to deals with
discovering cyclic reference loops and dealing with them, especially in
distributed environments.

What I'm referring to is reference counting under Microsoft's COM (using
the IUnknown interface), and the nearly mandatory use of interlocked
integer operations to implement it.  The environment I'm referring to is a
common one -- a single process hosting multiple threads, executing on
multiple processors, in which all threads may discover, use, and release
reference-counted interfaces.

Also note that implementations of the "weighted reference counting"
described in the paper would suffer the same performance problem, if you
allow for multiple threads to alter the same weighted reference.  The
threads will necessarily need to synchronize access to the weighted
reference field.  On most current SMP x86 systems, this can only be
accomplished using some form of interlocked access, or techniques that
boil down to the same.
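To make the distinction concrete, here is a minimal, single-threaded sketch of plain (acyclic) weighted reference counting, not the cyclic algorithm of the Jones and Lins report; all names are hypothetical. Copying a reference does not touch the object's total, but it still writes the weight stored in the reference being copied, and dropping a reference writes the object's total. References shared between threads would therefore still need interlocked updates, as described above.

```c
#include <assert.h>
#include <stdlib.h>

#define INITIAL_WEIGHT 64L   /* assumption: a power of two, so weights halve cleanly */

struct wobj {
    long total_weight;       /* sum of the weights of all live references */
    int payload;
};

struct wref {
    struct wobj *obj;
    long weight;             /* this reference's share of total_weight */
};

static struct wref wref_new(int payload)
{
    struct wref r = { NULL, 0 };
    struct wobj *o = malloc(sizeof *o);
    if (o != NULL) {
        o->total_weight = INITIAL_WEIGHT;
        o->payload = payload;
        r.obj = o;
        r.weight = INITIAL_WEIGHT;
    }
    return r;
}

/* Copying splits the source reference's weight and normally leaves the object
   alone. It writes *src, so concurrent copies of one shared reference would
   still have to synchronize on src->weight. */
static struct wref wref_copy(struct wref *src)
{
    if (src->weight == 1) {
        /* Weight exhausted: top it up by touching the object (a simplification;
           one remedy in the literature is an indirection cell instead). */
        src->obj->total_weight += INITIAL_WEIGHT;
        src->weight += INITIAL_WEIGHT;
    }
    struct wref r = { src->obj, src->weight / 2 };
    src->weight -= r.weight;
    return r;
}

/* Dropping returns the weight to the object: a write to shared state. */
static int wref_drop(struct wref *r)
{
    struct wobj *o = r->obj;
    o->total_weight -= r->weight;
    r->obj = NULL;
    r->weight = 0;
    if (o->total_weight == 0) {
        free(o);
        return 1;            /* object reclaimed */
    }
    return 0;
}
```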

Basically, they are totally different problems.

-- arlie



From weigelt@metux.de  Thu Feb 20 14:54:56 2003
From: weigelt@metux.de (Enrico Weigelt)
Date: Thu, 20 Feb 2003 15:54:56 +0100
Subject: [gclist] glib/gtk w/ GC
In-Reply-To: <029d01c2d8e3$7c705e70$34128aca@z>
References: <20030218130421.GA30555@metux.de>
 <003d01c2d757$03d56b00$d46a86cb@z> <20030218171500.GA28255@metux.de>
 <000901c2d805$a3f677b0$34128aca@z> <20030219211516.GA25569@metux.de>
 <029d01c2d8e3$7c705e70$34128aca@z>
Message-ID: <20030220145455.GC1764@metux.de>

On Thu, Feb 20, 2003 at 11:25:06PM +1000, Steven Shaw wrote:

<snip>
> You wish to convince people to use gc?
> 
> What I'm trying to say is that some people who use glib wouldn't want gc
> (perhaps because they couldn't live with the downside).
No, I simply want a lib like glib, but GC-based.
I'll then start porting some applications to it.

<snip>
> > > You might find some resistance to what you propose because of that.
> > > I guess you are proposing a fork anyways?
> > Well, I don't care about them. A fork will be necessary, because
> > this new lib _will_ break the existing interfaces.
> 
> Sure. I guess everyone using original-glib can continue to use that if they
> want. Others can adopt the new gc-glib you propose.
Yes, but that's not the whole point.
IMHO it is very important that a library which is meant to be production
stable _must_ provide at least the same interface as its earlier version
(or a derived one), so that it can _always_ be used as a drop-in
replacement for older versions.

> > btw: at this point we also should start defining _strict_ interfaces,
> > which must be 100% reliable: if a version A supports some interface,
> > the following versions _must_ continue providing it.
> 
> Tell me more about _strict_ interfaces. Why are you so concerned over it?
> Are you proposing something like MS-COM?
No, I'm speaking of library/module interfaces from several points of view.
Let's take some examples:

* glib-1.2-binary-i386:
    + derived from glib-1.1-binary-i386
    + runs on systems which provide an i386 processor environment
    + links clients against glib.so.1.2-i386
    + exported functions (w/ function signatures, ...)

* glib-1.2-binary-i686:
    + derived from glib-1.2-binary-i686
    + runs on systems which provide an i686 processor environment
    + links clients against glib.so.1.2-i686
    + exported functions ...

* glib-1.2-C-include:
    + derived from glib-1.1-C-include
    + provides functions, types, variables, defines
    + provides rules for interface translation on compile time
    + specifies paths, etc.

So consider translation in general (loading a binary into a VM, or dynamic
linking, is as much a translation process as compiling sources to binaries).

Now, when we compile an application against glib, we import the interface
glib-1.2-C-include. The translator then knows everything about glib's
C binding that is needed to build a glib-based application. The product of
this translation is a package which needs glib as a dynamic library
in some particular binary format (e.g. i686), so it requires the appropriate
interface (glib-1.2-binary-i686).

> I wish there was a programming system where it was easy to have constant
> (inevitable) evolution of the interfaces; where old libraries can be used
> side-by-side with new ones. 
Yes, I want to enforce this. It's a kind of design-by-contract.
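The drop-in rule stated above (a later version must keep providing every interface of an earlier one) can be checked mechanically once interfaces are written down. A minimal sketch under stated assumptions: an interface is reduced to a list of exported names with signature strings, and struct export_entry, is_drop_in_replacement() and the lib_* symbols used in the check are hypothetical, not part of glib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface entry: an exported name and its signature. */
struct export_entry {
    const char *name;
    const char *signature;
};

/* True if new_if still provides every entry of old_if with an identical
   signature; entries may be added, but none may change or disappear. */
static bool is_drop_in_replacement(const struct export_entry *old_if, size_t n_old,
                                   const struct export_entry *new_if, size_t n_new)
{
    for (size_t i = 0; i < n_old; i++) {
        bool found = false;
        for (size_t j = 0; j < n_new && !found; j++)
            found = strcmp(old_if[i].name, new_if[j].name) == 0 &&
                    strcmp(old_if[i].signature, new_if[j].signature) == 0;
        if (!found)
            return false;
    }
    return true;
}
```

A version that only adds symbols passes; one that changes an existing signature, or drops a symbol, fails.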

cu
-- 
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux ITS 
 Web hosting from 5 EUR/month.       UUCP, rawIP and much more.

 phone:     +49 36207 519931         www:       http://www.metux.de/     
 fax:       +49 36207 519932         email:     contact@metux.de
 cellphone: +49 174 7066481	     smsgate:   sms.weigelt@metux.de
---------------------------------------------------------------------
 This mail was sent via UUCP.             http://www.metux.de/uucp/

From weigelt@metux.de  Thu Feb 20 15:04:36 2003
From: weigelt@metux.de (Enrico Weigelt)
Date: Thu, 20 Feb 2003 16:04:36 +0100
Subject: [gclist] glib/gtk w/ GC
In-Reply-To: <15955.4991.443947.347765@hector.lesours>
References: <20030218130421.GA30555@metux.de>
 <15955.4991.443947.347765@hector.lesours>
Message-ID: <20030220150436.GE1764@metux.de>

On Wed, Feb 19, 2003 at 06:17:51AM +0100, Basile STARYNKEVITCH wrote:

<snip>

> What kind of GC do you want to use? Your own, or Boehm's?
I'd start with Boehm's, but then try some optimizations,
e.g. several pools for different object classes (strings, etc.)

<snip>
> There used to be some (opensource, but not very popular) toolkits
> above Boehm's GC.
Examples?

<snip>
> You need some finalization for widgets, because they use system
> resources (eg X11 windows).
Yes, but finalization should not be such a problem.

cu

From mhamburg@adobe.com  Thu Feb 20 18:40:21 2003
From: mhamburg@adobe.com (Mark Hamburg)
Date: Thu, 20 Feb 2003 10:40:21 -0800
Subject: [gclist] Daily gclist MIME digest V4 #28
In-Reply-To: <200302201020.h1KAKbkn014113@smtp-relay-1.adobe.com>
Message-ID: <BA7A6115.21257%mhamburg@adobe.com>

on 2/20/03 2:20 AM, gclist-owner@lists.iecc.com at
gclist-owner@lists.iecc.com wrote:

> Also, in environments that mix reference counting with unmanaged heaps,
> such as COM development on Win32, you must also account for the time
> spent in AddRef and Release.  Most thread-safe implementations use
> interlocked integer primitives, which are quite costly on SMP machines.
>
> I've done a fair amount of profiling of real-world server apps on Win32,
> and in many implementations, SMP scalability is severely hindered by the
> very high frequency of interlocked operations.  In services that make
> heavy use of COM interfaces, reference counting is often one of the
> biggest users of interlocked access.

I can believe that naïve reference counting is expensive. I've worked on
projects that go through a fair number of moderately unsafe contortions to
avoid needless increments and decrements.

The best scheme I've seen so far for dealing with this -- particularly in a
non-GC-friendly environment -- has probably been what Apple (NeXT) did in
Cocoa (NeXTStep) with autorelease pools.  These allow most references on
the stack to be passed around with no need to increment or decrement
reference counts. I suspect that most naïve implementations of COMPtr or
RCPtr templates, in contrast, increment and decrement the count on each
construction, destruction, and assignment. If you pass through a lot of
subroutines, that gets very expensive very quickly.
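A toy illustration of the difference, loosely modeled on the autorelease idea rather than on Cocoa's actual API; all names are hypothetical and the code is single-threaded. An object owned by the pool can be passed down a chain of calls with no count traffic, while the naive smart-pointer style retains and releases at every level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; single-threaded toy. */
struct rc {
    long refs;
    int payload;
};

static long rc_ops = 0;   /* number of count increments + decrements */

static void retain(struct rc *o)
{
    o->refs++;
    rc_ops++;
}

static void release(struct rc *o)
{
    rc_ops++;
    if (--o->refs == 0)
        free(o);
}

#define POOL_CAP 16
struct pool {
    struct rc *items[POOL_CAP];
    size_t n;
};

/* Transfer the caller's reference to the pool; it is released on drain. */
static struct rc *autorelease(struct pool *p, struct rc *o)
{
    assert(p->n < POOL_CAP);
    p->items[p->n++] = o;
    return o;
}

static void pool_drain(struct pool *p)
{
    while (p->n > 0)
        release(p->items[--p->n]);
}

/* Create an object whose only reference is owned by the pool. */
static struct rc *make(struct pool *p, int payload)
{
    struct rc *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->refs = 1;
    o->payload = payload;
    return autorelease(p, o);
}

/* Borrowing: the pool keeps the object alive, so no count traffic. */
static int sum_borrowed(const struct rc *o, int depth)
{
    return o->payload + (depth == 0 ? 0 : sum_borrowed(o, depth - 1));
}

/* Naive smart-pointer style: each level retains on entry, releases on exit. */
static int sum_naive(struct rc *o, int depth)
{
    retain(o);
    int s = o->payload + (depth == 0 ? 0 : sum_naive(o, depth - 1));
    release(o);
    return s;
}
```

Passing the object through ten levels costs no count operations when borrowed and twenty with the naive style; draining the pool performs the single final release.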

Mark

From jmunsin@iki.fi  Thu Feb 20 19:15:31 2003
From: jmunsin@iki.fi (Jonas Munsin)
Date: Thu, 20 Feb 2003 21:15:31 +0200
Subject: [gclist] why malloc/free instead of GC?
In-Reply-To: <15953.15577.512872.84744@hector.lesours>
References: <1045507612.1760.58.camel@mwhlaptop>
 <15953.15577.512872.84744@hector.lesours>
Message-ID: <20030220191531.GA17288@nemo.sby.abo.fi>

On Mon, Feb 17, 2003 at 08:49:45PM +0100, Basile STARYNKEVITCH wrote:
> Actually, I'm surprised that today's major opensource projects (like
> Apache, GNOME, KDE...) don't use GC [with the exception of Emacs,
> which used to explicitly show GC periods to user - this was a wrong
> decision, because it made users complain against GC].

There are a few open-source C projects which use GC; w3m is one that
comes to mind.

From mwh@cs.umd.edu  Wed Feb 26 19:50:28 2003
From: mwh@cs.umd.edu (Michael Hicks)
Date: Wed, 26 Feb 2003 14:50:28 -0500
Subject: [gclist] controlling heapsize in BDW collector
Message-ID: <1046289029.1519.64.camel@mwhlaptop>

Hi all.

I wonder if anyone can provide some input on how to correctly set the
heapsize for the BDW collector.  I'm trying to do some performance
comparisons between GC and non-GC'ed apps, and in particular I want to
examine the tradeoff between memory footprint and latency in a GC'ed
setting.  The idea is that the more memory you're willing to allow, the
less latency impact there will be with GC, since you'll collect less
often.  And the converse is also true.

So, I have an application that has about a 128K footprint when using
GC_malloc and GC_free, and about a 348K footprint when removing the
GC_free's so that the collector is used.  What I'd like to do is force
the heapsize to be somewhere between 128K and 348K (as close to 128K as
possible) while still using the collector, so that garbage collections
occur more often.  Then I can assess the latency impact.  However, when
I do this by calling GC_set_max_heap_size(max_heap_size), GC_malloc
returns NULL in basically every case unless I set max_heap_size to be
roughly 348K.  I also set the GC_use_entire_heap flag to be true, with
the same result.

Why would this be happening?  When using GC_free, the heap usage never
rises above 100K, so it's not that I'm allocating a lot of batched
objects and then freeing them all at once.  By the same token, I'd be
really surprised if this was some kind of fragmentation overhead (2/3 of
the heap is fragmentation!!!???).  The objects being allocated are
relatively large, ranging from 2K to 15K.  Finally, spurious retention
also seems unlikely: to be safe I set all pointers to the allocated
objects to NULL (these are packets being forwarded by a proxy), and the
results are the same.

If this is not some kind of limitation with the collector, can anyone
suggest how I would go about debugging this behavior?  Turning off
-DSILENT has not been too helpful.  Has anyone had success setting the
maximum heapsize to something below what the collector would naturally
come to?

Thanks in advance,
Mike

From hans_boehm@hp.com  Wed Feb 26 20:57:44 2003
From: hans_boehm@hp.com (Boehm, Hans)
Date: Wed, 26 Feb 2003 12:57:44 -0800
Subject: [gclist] controlling heapsize in BDW collector
Message-ID: <75A9FEBA25015040A761C1F74975667DA136E1@hplex4.hpl.hp.com>

[I recently set up the gc@linux.hpl.hp.com mailing list for discussions specific to this collector.  I'm not sure that this question is completely collector specific, but if it were, that would be an alternative place to ask.]

Do you clear pointers to objects at the same point at which you would have explicitly deallocated them?  Otherwise, I would expect that the maximum amount of reachable memory is larger than the maximum amount of malloc/free allocated memory.  A factor of 3 seems unlikely, but not impossible.

You are really operating the collector at a point it wasn't designed for.  In particular, it sounds like you only have on the order of 10 live objects around.  The collector will perform suboptimally here for a variety of reasons:

1) Garbage collectors are inherently not terribly efficient with an average object size of 10K or so.  See the previous discussion on this list.

2) A conservative collector, or one with otherwise incomplete liveness information, will typically follow some small number of pointers on the stack that were used as compiler temporaries, but are really dead.  I would normally expect this number to be on the order of at most a dozen, and it usually doesn't matter.  But with only a dozen live objects ...

3) The collector needs to scan some amount of static data, e.g. owned by libc, during each collection cycle.  Even a 300K heap is too small to amortize that cost.  (It will try to grow the heap to compensate, though GC_set_max_heap_size or the GC_MAXIMUM_HEAP_SIZE environment variable should inhibit that.)

4) The collector's data structures aren't tuned for heaps this small.  The heap expansion increment and some temporary space areas are too large by default.

If you want to debug this, try building a debuggable collector and placing a breakpoint in GC_expand_heap_inner().  Looking at the stack at the last heap expansion generally gives you a good idea why it decided it needed to grow the heap.  Calling GC_dump() at that point should tell you something about what the heap looks like.  (And with a 340K heap, the size of the dump will be manageable.)

Hans


From tkb@tkb.mpl.com  Wed Feb 26 21:09:27 2003
From: tkb@tkb.mpl.com (tkb@tkb.mpl.com)
Date: Wed, 26 Feb 2003 16:09:27 -0500
Subject: [gclist] controlling heapsize in BDW collector
In-Reply-To: <75A9FEBA25015040A761C1F74975667DA136E1@hplex4.hpl.hp.com>
References: <75A9FEBA25015040A761C1F74975667DA136E1@hplex4.hpl.hp.com>
Message-ID: <15965.11527.89770.911730@erekose.mpl.com>

Boehm, Hans writes:
> [I recently set up the gc@linux.hpl.hp.com mailing list for
> discussions specific to this collector.  I'm not sure that this
> question is completely collector specific, but if it were, that
> would be an alternative place to ask.]

I'll repeat the information from Hans Boehm's web site

    http://www.hpl.hp.com/personal/Hans_Boehm/gc/ 

about subscribing to that mailing list for quick reference.

    We have recently set up two mailing lists for collector announcements
    and discussions:

       * gc-announce@linux.hpl.hp.com is used for announcements of new
         versions. Postings are restricted. We expect this to always remain
         a very low volume list.

       * gc@linux.hpl.hp.com is used for discussions, bug reports, and
         the like. Subscribers may post.

    To subscribe to these lists, send a mail message containing the word
    "subscribe" to gc-announce-request@linux.hpl.hp.com or to
    gc-request@linux.hpl.hp.com. (Please ignore the instructions about
    web-based subscription. The listed web site is behind the HP firewall.)
-- 
T. Kurt Bond, tkb@tkb.mpl.com