[gclist] My copying collector or Boehm's?
Ji-Yong D. Chung
virtualcyber@erols.com
Fri, 16 Mar 2001 18:20:59 -0500
Hi,
There is always a good chance, given my relative inexperience,
that I really could be screwing up with my use of your collector. At least
it is not crashing :)
> [I wrote]
> > With Boehm's collector, my interpreter runs about 2.5
> > times slower than before. This is not a knock on Boehm's
> > collector.
> [you replied] I still find that surprising, since it suggests that your
interpreter is
> spending >60% of its time allocating/collecting with our collector,
> something I don't normally see. A profile and GC log would be interesting
> to me. In particular, it would be nice to know:
> a) How much time is spent in the marker?
> b) How much time is spent locking around allocation? (The win32 threads
> port uses EnterCriticalSection() and LeaveCriticalSection(). On many
other
> platforms a custom locking scheme is used instead, since the standard one
> exhibited serious performance problems, some form of convoying being the
> most common.
Only 1 thread was really active in my test runs, so I'd guess that
convoy did not have a chance to form.
In any case, I did two profile runs on VC++6.0. One for my Scheme
interpreter's
main evaluation loop, and another one for GC_malloc. (While it would have
been
preferable to profile them in the same run, I had slight difficulty with
getting my profiler
to run correctly).
Both profiles were taken, with my interpreter running the same scheme source
code.
=====================================================================
|| This profile run is on the Evaluation loop of my Scheme interpreter.
|| This is dificult to read, but there as supposed to be 6 columns.
|| They are:
|| (1) function time in milliseconds,
|| (2) function time as percent of overall measured time
|| (3) function+child time,
|| (4) function+child time as percent of overall measured time
|| (5) hit count
|| (6) name of the function.
Profile: Function timing, sorted by time
Date: Fri Mar 16 17:37:57 2001
Program Statistics
------------------
Command line at 2001 Mar 16 17:34: tpscript.exe
Start function: _GC_malloc
Total time: 18652.854 millisecond
Time outside of functions: 7747.360 millisecond
Call depth: 13
Total functions: 3854
Total hits: 5216935
Function coverage: 2.9%
Overhead Calculated 6
Overhead Average 6
Module Statistics for tpscript.exe
----------------------------------
Time in module: 6488.312 millisecond
Percent of time in module: 59.5%
Functions in module: 3589
Hits in module: 2337349
Module function coverage: 0.1%
Func Func+Child Hit
Time % Time % Count
Function
---------------------------------------------------------
6475.395 59.4 10726.211 98.4 2336757
_GC_malloc (gc.dll)
11.505 0.1 28.241 0.3 148
_GC_default_push_other_roots (gc.dll)
0.838 0.0 1.082 0.0 148
_GC_push_all_stack (gc.dll)
0.573 0.0 1.086 0.0 148
_GC_push_finalizer_structures (gc.dll)
0.000 0.0 0.000 0.0 148
Script::SchemeVM::PushRoots(void)
Note* -- As you indicated in your email, my program spends most
the time in _GC_malloc. I think this included the time spent on
garbage collection as well.
============================================================
|| Here is the measurement on your GC.dll
Module Statistics for gc.dll
----------------------------
Time in module: 4417.182 millisecond
Percent of time in module: 40.5%
Functions in module: 265
Hits in module: 2879586
Module function coverage: 39.6%
Func Func+Child Hit
Time % Time % Count Function
---------------------------------------------------------
2097.812 19.2 2112.566 19.4 10188 _GC_mark_from (mark.obj)
559.504 5.1 559.504 5.1 13777 _GC_reclaim_clear
(reclaim.obj)
125.837 1.2 171.310 1.6 2380112 _GC_malloc (malloc.obj)
111.131 1.0 133.912 1.2 2093 _GC_build_fl (new_hblk.obj)
105.330 1.0 384.709 3.5 296 _GC_apply_to_all_blocks
(headers.obj)
99.517 0.9 4245.413 38.9 18596 _GC_generic_malloc
(malloc.obj)
75.057 0.7 863.804 7.9 18744 _GC_continue_reclaim
(reclaim.obj)
74.471 0.7 74.471 0.7 1794 _GC_reclaim_clear2
(reclaim.obj)
66.135 0.6 66.135 0.6 25577 _GC_clear_hdr_marks
(mark.obj)
65.022 0.6 4058.541 37.2 18596 _GC_allocobj (alloc.obj)
56.553 0.5 188.229 1.7 24667 _GC_reclaim_block
(reclaim.obj)
53.679 0.5 750.716 6.9 16503 _GC_reclaim_generic
(reclaim.obj)
49.125 0.5 4107.725 37.7 18596 _GC_generic_malloc_inner
(malloc.obj)
48.739 0.4 73.931 0.7 22284 _GC_block_nearly_full
(reclaim.obj)
45.451 0.4 300.788 2.8 148 _GC_finish_collection
(alloc.obj)
44.964 0.4 111.610 1.0 14328 _GC_allochblk_nth
(allchblk.obj)
42.626 0.4 2286.872 21.0 11816 _GC_mark_some (mark.obj)
38.236 0.4 38.236 0.4 24815 _GC_next_used_block
(headers.obj)
38.181 0.4 38.181 0.4 18596 _GC_write_hint (os_dep.obj)
38.171 0.4 38.171 0.4 18596 _GC_invoke_finalizers
(finalize.obj)
38.030 0.3 788.746 7.2 16503
_GC_reclaim_small_nonempty_block (reclaim.obj)
36.689 0.3 36.689 0.3 148 _GC_read_changed
(stubborn.obj)
35.890 0.3 91.150 0.8 24667 _clear_marks_for_block
(mark.obj)
32.599 0.3 2382.769 21.8 148 _GC_stopped_mark (alloc.obj)
31.552 0.3 50.877 0.5 18596 _GC_clear_stack (misc.obj)
26.944 0.2 26.944 0.2 780 _GC_reclaim_clear4
(reclaim.obj)
26.059 0.2 81.437 0.7 1332
_GC_push_next_marked_uncollectable (mark.obj)
25.969 0.2 25.969 0.2 25406 _GC_block_empty (reclaim.obj)
21.913 0.2 21.913 0.2 445 _GC_build_fl_clear4
(new_hblk.obj)
21.005 0.2 135.505 1.2 2241 _GC_allochblk (allchblk.obj)
19.420 0.2 19.420 0.2 18744 _GC_approx_sp (mark_rts.obj)
17.333 0.2 17.333 0.2 9792 _GC_add_to_black_list_stack
(blacklst.obj)
16.263 0.1 17.142 0.2 1184 _GC_push_marked (mark.obj)
15.211 0.1 252.143 2.3 148 _GC_start_reclaim
(reclaim.obj)
14.620 0.1 14.620 0.1 296 _GC_get_lo_stack_addr
(win32_threads.obj)
13.995 0.1 13.995 0.1 148 _GC_clear_bl (blacklst.obj)
12.799 0.1 12.799 0.1 11073 _GC_block_nearly_full3
(reclaim.obj)
12.394 0.1 12.394 0.1 10472 _GC_block_nearly_full1
(reclaim.obj)
12.031 0.1 15.668 0.1 2118 _GC_add_to_fl (allchblk.obj)
11.344 0.1 11.344 0.1 148 _GC_stop_world
(win32_threads.obj)
10.281 0.1 16.618 0.2 2093 _GC_get_first_part
(allchblk.obj)
8.093 0.1 32.655 0.3 1939 _GC_freehblk (allchblk.obj)
7.453 0.1 7.453 0.1 11816 _GC_default_oom_fn (misc.obj)
7.453 0.1 7.453 0.1 11816 _GC_never_stop_func
(misc.obj)
7.368 0.1 276.784 2.5 2241 _GC_new_hblk (new_hblk.obj)
7.141 0.1 7.141 0.1 148 _GC_start_world
(win32_threads.obj)
6.898 0.1 22.951 0.2 2094 _setup_header (allchblk.obj)
6.653 0.1 6.653 0.1 2102 _GC_is_black_listed
(blacklst.obj)
6.282 0.1 11.214 0.1 2093 _GC_install_counts
(headers.obj)
6.064 0.1 6.981 0.1 1939 _GC_free_block_ending_at
(allchblk.obj)
============================================================
> c) What fraction of the time is spent context switching (see (b)).
I have just two threads, one of which is taking a nice nap while
the other is running. (So there is not much overhead on that).
> d) Does the amount of live data in the GC log look right?
I have not yet looked into this -- I have to dig into your package to
find how to use GC_log.
> > The drop off in performance was expected,
> > because (1) my original collector is turned on and off
> > at precise points in my C++ code to minimize
> > collection
>
> That's potentially a big win, clearly. You can
> also do that with our collector, though it may be much
> less effective with a global heap than per-thread heaps.
Do you mean, provided that I can make your collector to use per-thread
heap, I can disallow GC_malloc from checking whether
it should garbage collect?
> > (2) my collector uses type information all the time,
> My experience has been that for small objects this matters in the expected
> case only to the extent that it reduces the overhead of checking a real
> pointer, i.e. only if you can actually reduce checking on a "pointer"
field
> to a comparison against null. [snipped, as the rest is a bit over my
> head]
And indeed, there is no need for pointer checking, as it does
use the type information.
> > (4) heap residency
> > was low for the test cases. -- which favors copying
> > collector over mark-sweep.
> Somewhat. But it also helps our collector a lot, provided you similarly
> increase the heap size. (This should work better in 6.0 than in the 5.x
> releases.) If it doesn't help, I would definitely suspect the locking
code.
I have increased the starting heap size for your collector to about
1 Meg, but I have not yet seen substantial speedup yet. I will try
2 Megs.