[gclist] Name that hypothesis

Mark Tillotson markt@harlequin.co.uk
Thu, 5 Dec 1996 23:34:05 GMT


> Nick Barnes wrote:
> 
> > Many generational collectors currently out there will still win for
> > the radioactive decay model, because collecting an old generation
> > requires scanning all younger generations but collecting a young
> > generation only requires scanning the remembered set in the older
> > generations.
> 

William D Clinger <will@ccs.neu.edu> replied:
> This is not a win, but a loss.  With the radioactive decay
> model, and a conventional generational collector, the youngest
> generation(s) will usually contain the lowest percentage of
> garbage.  This is the opposite of the situation that conventional
> generational collectors are designed to exploit.  By concentrating
> its effort on the generations that contain the least garbage, the
> collector reclaims the least storage for the most effort, resulting
> in a higher amortized gc overhead than would be obtained by a
> simple non-generational collector.


The point is not how much garbage is in each generation at any one instant
(I'm ignoring promptness of finalization here of course).  What is
important is the relative _rates_ at which garbage is "created" in
each generation.

The following is a pretty basic line of argument, and very idealized,
but it gets the idea across - something like this should belong in a
FAQ perhaps??  (Nick?)


A very hypothetical example:
        Total live heap:            30MB
        Spare memory:                5MB
        youngest generation size:  0.1MB
        allocation-rate:             1MB/s
        promotional-rate from youngest gen: 0.005MB/s
             (I'm positing a 0.5% survival rate after 100KB of allocation)

non-generational non-incremental collector:
        o has to scan the whole 30MB every 5 seconds to keep up with the
          allocator, so 6MB is scanned for every 1MB reclaimed.
        o this is with a memory overhead of 17% (more spare memory allows
          greater efficiency, but we're talking many MB)
        o let us suppose that this requires 50% of the CPU (ie. we can scan
          at 12MB/s nominal)
        o pauses are 2.5s (obtrusive) _if paged in_
        o GC obliterates primary and secondary cache-locality
        o if real memory is short, the GC will thrash virtual memory
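
(To make the arithmetic explicit, here's a rough back-of-envelope sketch
in Python; the 12MB/s scan rate is just the assumption made above, not a
measured figure:)

    live_heap  = 30.0   # MB of live data
    spare      = 5.0    # MB of headroom
    alloc_rate = 1.0    # MB/s allocated
    scan_rate  = 12.0   # MB/s we assume the collector can scan

    gc_interval = spare / alloc_rate       # 5.0 s between full collections
    scan_per_mb = live_heap / spare        # 6.0 MB scanned per MB reclaimed
    overhead    = spare / live_heap        # ~0.17, i.e. 17% memory overhead
    cpu_used    = (live_heap / scan_rate) / gc_interval  # 0.5, i.e. 50% of CPU
    pause       = live_heap / scan_rate    # 2.5 s pause per full collection

    print(gc_interval, scan_per_mb, overhead, cpu_used, pause)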

M/S generational collector:
        o has to scan 100KB ~10 times a second, reclaiming ~99.5KB and
          tenuring 500 bytes each time.
        o Spare memory is exhausted (and a full GC needed) only every ~17
          minutes (having a lot less than 5MB spare is no problem)
        o uses 6% of CPU time
        o pauses of 8ms, unobtrusive to humans
        o good secondary cache performance.
        o no problem with disk paging
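
(Same sort of sketch for the mark/sweep case; the 0.5% survival rate is
the figure posited above, and the time to exhaust the spare memory falls
straight out of the promotion rate:)

    nursery    = 0.1    # MB, youngest generation
    alloc_rate = 1.0    # MB/s
    survival   = 0.005  # 0.5% of allocation survives the nursery
    spare      = 5.0    # MB of headroom for tenured data

    minor_per_sec = alloc_rate / nursery        # 10 nursery collections per second
    tenured_per   = nursery * survival          # 0.0005 MB = 500 bytes tenured per GC
    reclaimed_per = nursery - tenured_per       # ~0.0995 MB = 99.5KB reclaimed per GC
    tenure_rate   = alloc_rate * survival       # 0.005 MB/s promoted to the old generation
    full_gc_every = spare / tenure_rate / 60.0  # ~16.7 minutes until spare is exhausted

    print(minor_per_sec, tenured_per, reclaimed_per, full_gc_every)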

copying generational collector:
        o has to copy 500 bytes of live data 10 times a second, reclaiming
          the other 99.5KB without even touching it.
        o requires 100KB extra for to-space, but this is trivial...
        o requires maybe 0.??% of CPU (in reality most of the time is spent
          scanning roots & housekeeping)
        o pauses on the 1 or 2 ms level, perhaps (ie. the allocator will
          dominate memory management overheads)
        o good secondary cache performance.
        o no problem with disk paging
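
(And for the copying case; the 12MB/s copy rate is my own assumption,
chosen to be the same order as the scan rate above, just to show how
small the copying cost comes out:)

    nursery    = 0.1    # MB, youngest generation (to-space needs another 0.1MB)
    alloc_rate = 1.0    # MB/s
    survival   = 0.005  # 0.5% of allocation survives
    copy_rate  = 12.0   # MB/s, assumed copying speed

    minor_per_sec = alloc_rate / nursery    # 10 nursery collections per second
    copied_per_gc = nursery * survival      # 0.0005 MB = 500 bytes copied per GC
    cpu_used      = copied_per_gc * minor_per_sec / copy_rate
    # ~0.0004, i.e. about 0.04% of the CPU for copying alone; root
    # scanning and housekeeping would dominate, as noted above.

    print(minor_per_sec, copied_per_gc, cpu_used)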


[ To back up a hypothetical situation with a real one, I will point
  out that I regularly use an interactive language system image of
  some 80MB total heap, and don't get any noticeable pauses for 
  GCs (it's generational, non-incremental).  In the last 20 minutes or so
  I ran some code that allocated over 300MB of short-lived objects, and
  1.2% of the CPU time went on GC, in about 1200 pauses of 10ms each.
]

__Mark
[ markt@harlequin.co.uk | http://www.harlequin.co.uk/ | +44(0)1954 785433 ]