[gclist] When to collect.

Giuliano Carlini GCARLINI@us.oracle.com
14 Nov 97 14:43:22 -0800



Neat idea, Charles. You do have to be careful, though, not to
optimize locally while deoptimizing globally. Implemented
naively, your approach would speed up a single process, but if
many processes used it, the host as a whole would often slow
down.

How much memory the process should use clearly depends on the
physical memory size, in addition to the OS's limit on the
amount of physical memory the process is allowed to consume.

Using your stats, going from 256K to 1M roughly doubles the
speed, and going from 1M to 16M roughly doubles it again. On
a machine with 64M of physical memory, using 1M probably makes
sense, but 16M probably doesn't. With 512M physical, though,
using 16M would probably be fine.
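That sizing rule might be sketched like this (the function name and the fraction-of-RAM threshold are assumptions for illustration, not anything from Great Circle; on POSIX systems the physical memory size can be read with `sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGE_SIZE)`):

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical heuristic: cap the collector's heap at a small
// fraction of physical memory, so a heap size that doubles speed
// on a 512M machine isn't allowed to trigger paging on a 64M one.
// The 1/32 default is an assumption chosen to match the numbers
// above (2M cap on 64M, 16M cap on 512M).
std::size_t chooseHeapLimit(std::size_t physBytes,
                            std::size_t requested,
                            double fraction = 1.0 / 32.0)
{
    std::size_t cap = static_cast<std::size_t>(physBytes * fraction);
    return std::min(requested, cap);
}
```

On a 64M machine a 16M request would be clipped to 2M; on a 512M machine it would be granted in full.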

g


Date: 14 Nov 97 09:41:27
From: Charles Fiterman <cef@geodesic.com>
To: gclist@iecc.com
Subject: [gclist] When to collect.

The problem is what I call the collect-or-expand
decision. Given a request that can't be filled
without expanding the heap or collecting, which do I do?

I don't like any of the current algorithms and
am trying to rethink the problem from scratch.

First, the idea is not to get by with as little
memory as you can, but to use as much as you can
without triggering paging. This simple program
shows what I mean.

#include <strstrea.h>	// pre-standard iostream headers
#include <iomanip.h>
#include <duration.h>	// Great Circle timing helper
#include <gct.h>	// Great Circle collector interface

int main()
{
	duration t;
	unsigned long e = 32 * 1024, col = 0, count = 100000;

	cout.setf(ios::fixed);
	cout.precision(2);

	gcPrintStats = 1;

	cout << "Show how Great Circle trades space for speed." 
		 << endl
		 << endl;

	for (;;) {
		t.start();

		// Allocate and drop; the collector reclaims the garbage.
		for (unsigned long i = 0; i < count; i++)
			new char[1024];

		cout << count 
			 << " allocations took "
			 << setw(6) << t.end()
			 << " seconds and "
			 << setw(4) << (gcCollections - col)
			 << " collections"
			 << endl;
		col = gcCollections;

		// Double the heap each pass, stopping past 32 MB.
		if ((e <<= 1) > (32 * 1024 * 1024))
			break;

		cout << (gcExpandHeap(e) ?
				 "Expanded heap by " : "Failed to expand heap by ")
			 << setw(7) << e
			 << endl;
	}
	return 0;
}

It runs faster and faster until it runs into paging, and then
it slows down very violently.

	Initial heap size	Elapsed time	Number of collections

	 64 kB			  8.6 sec	366
	128 kB			  6.7 sec	295
	256 kB			  5.2 sec	213
	512 kB			  4.8 sec	138
	  1 MB			  3.7 sec	 81
	  2 MB			  3.1 sec	 44
	  4 MB			  2.8 sec	 22
	  8 MB			  2.4 sec	 12
	 16 MB			  2.3 sec	  6
	 32 MB			129.8 sec	  3

This suggests that the system administrator specify a top size.
Collect-or-expand would prefer to expand until that size was
reached. Past it, timing would decide: if we are spending more
than X% of our time in collection, expand; otherwise collect.
X should be substantial, as exceeding the limit triggers paging,
which will be worse.
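A minimal sketch of that policy (the names and the 25% default for X are illustrative assumptions, not anything from Great Circle):

```cpp
#include <cstddef>

// Sketch of the collect-or-expand decision described above.
// Expand freely up to the administrator's top size; past that,
// expand only if more than fraction x of recent run time went
// to collection (x substantial, since exceeding the limit risks
// paging, which is worse than extra collections).
bool shouldExpand(std::size_t heapSize, std::size_t topSize,
                  double collectTime, double totalTime,
                  double x = 0.25)
{
    if (heapSize < topSize)
        return true;                  // below the cap: prefer expand
    return collectTime > x * totalTime; // past the cap: expand only
                                        // when collection dominates
}
```

Below the cap the answer is always expand; above it, the heap grows only when collection is eating a substantial slice of run time.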

This has another advantage. Overhead in a conservative collector
divides into three parts: scanning, zeroing, and allocation. All
of these tend to be tightly optimized. Keeping statistics on what
was allocated since the last collection slows down allocation;
a timing-based policy leaves the allocator alone.

			-  
Charles Fiterman		Geodesic Systems
414 North Orleans Suite 410	Phone 312 832 1221 x223
Chicago IL 60610-4418		FAX   312 832 1230
				http://www.geodesic.com

A young man saw a product advertised in a magazine that said
"Guaranteed 100% effective against bugs when properly used."
He sent away and got two blocks of wood with the instructions
"Place bug on block A, strike sharply with block B." Debuggers
are like that, and you can get stung while using them. Great
Circle tries to be something better than that.

