[gclist] Language behaviour wrt GC (Was: Name that hypothesis)

Hans Boehm boehm@hoh.mti.sgi.com
Tue, 10 Dec 1996 11:06:24 -0800


On Dec 10,  6:28am, Paul R. Wilson wrote:
> (Warning: the following is a "big picture" ramble that many of you
> have heard most of before.  I thought it might be worth reiterating
> if we're talking about basic effects of languages on style and
> lifetimes.)
>
> We're comparing a system with some kind of persistent object store
> (even if it's just a simple system image save/reload) to one without.
> If you don't have persistence in the language, you do it manually
> through the file system, and long-lived objects "escape" the GC'd heap
> and "come back" next time you start up and reload the data.
> This has weird implications for generational GC.
>
> We too have seen programs where a significant percentage of the data
> survive until the end of the run.  (Ramps and big plateaus, as we
> call them in our allocator survey).  But if you have a persistent
> object store that's GC'd, those are data that typically become
> garbage at the end of a program run, and can be collected any time
> after the end of the run.

>-- End of excerpt from Paul R. Wilson

A lot of the problem is clearly that we don't know what programming model we
will eventually converge on.  I agree that a garbage-collected persistent store
will change matters a bit.  I'm not convinced it's that fundamental a change,
though.

It seems to me that there will continue to be many reasons to use something
resembling conventional files:

1) Flat, unlinked representations are generally more compact.  Compact
representations are very useful for communication through low bandwidth
channels like phone lines, CDROMs, or whatever.  (I think for present purposes
any compressed representation is equivalent to a flat representation.)
They also tend to be more portable.  They're less susceptible to security leaks
(think of "erased" text lingering in saved document files, a problem with some
well-known word processors).  A sketch contrasting the linked and flat forms
follows this list.

2) Some people write programs to analyze large amounts of data that are
imported from external sources.  Even we sometimes import source code from
elsewhere.  Importing such data will inherently look like a file read to the
garbage collector.
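
To make the compactness point concrete, here is a minimal sketch in C (the
struct and function names are mine, invented purely for illustration).  In the
heap each element carries a link pointer; in the flat form the pointers vanish
and adjacency encodes the structure, so the on-disk record is roughly half the
size of the in-heap one:

    #include <stdio.h>

    struct node {               /* linked, in-heap form: value plus link */
        long value;
        struct node *next;
    };

    /* Write the list as a flat record: a count followed by the raw
       values.  The links disappear; adjacency encodes the structure. */
    static void write_flat(FILE *f, struct node *head)
    {
        struct node *p;
        long count = 0;

        for (p = head; p != NULL; p = p->next)
            count++;
        fwrite(&count, sizeof count, 1, f);
        for (p = head; p != NULL; p = p->next)
            fwrite(&p->value, sizeof p->value, 1, f);
    }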

Thus I think we will always have to deal with program phases that build large
internal data structures based on externally read data.  It may be that the
resulting structures are amenable to generational collection on a very long
time scale (though I'm not convinced), but the initial generation(s) still need
to deal with periods of very high survival rates.
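
To illustrate the kind of phase I mean, here is a sketch (the record layout
and function name are hypothetical; GC_MALLOC is the allocation entry point
from our collector's gc.h):

    #include <stdio.h>
    #include <gc.h>

    struct record {             /* hypothetical imported record */
        long key;
        struct record *next;
    };

    /* Build an in-heap structure from externally read data.  Every
       node allocated in the loop is still reachable from `head' when
       the function returns, so a young generation collected mid-import
       reclaims essentially nothing. */
    struct record *import(FILE *f)
    {
        struct record *head = NULL;
        long key;

        while (fscanf(f, "%ld", &key) == 1) {
            struct record *r = GC_MALLOC(sizeof *r);
            r->key = key;
            r->next = head;
            head = r;
        }
        return head;            /* the whole structure escapes the phase */
    }

A generational collector that runs inside that loop scans the young generation
and promotes nearly all of it, which is exactly the case the usual generational
hypothesis assumes away.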

Hans

-- 
Hans-Juergen Boehm
boehm@mti.sgi.com