[gclist] When to collect.

Jerry Leichter leichter@smarts.com
Thu, 20 Nov 1997 09:20:44 -0500 (EST)


The sometimes unfortunate interaction between larger virtual sizes and garbage
collection has been known for many years.  I remember seeing a note about it
in reference to SNOBOL4 on the first 370's with virtual memory:  Give it more
"memory" and it runs more slowly.

One interesting question is whether a bit of OS support would help here.
Currently, feedback from the VM system to the user is very limited.  In
general, requests succeed or fail, and that's it.  There are very system-
specific mechanisms for determining things like page fault rates, working set
size estimate (essentially, resident set size), and so on; but it's a
hodgepodge of system-specific information whose exact semantics may be very
difficult to determine.  Is it possible to define a GC-friendly interface in a
system-independent way?  A trivial example might be a request to allocate as
much new VM (up to some limit) as the OS is currently willing to add to the
resident set.  Even simpler, and usable for both growth and shrinkage, would
be a call returning the maximum amount of VM the system would currently be
willing to keep resident.
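A minimal sketch of how a collector might use the two calls proposed above.  No portable OS call of this kind exists, so the query is modeled as a hypothetical callable, `budget()`, returning the bytes the system would currently keep resident for the process; the function names are illustrative only.

```python
from typing import Callable

# Hypothetical OS query (not a real system call): bytes the system would
# currently be willing to keep resident for this process.
ResidentBudget = Callable[[], int]


def growth_allowance(budget: ResidentBudget, heap_size: int, limit: int) -> int:
    """First proposed call: how much new VM (up to `limit` bytes) the OS
    would add to the resident set right now.  Zero if none."""
    return max(0, min(limit, budget() - heap_size))


def heap_target(budget: ResidentBudget, live_bytes: int, floor: int = 1 << 20) -> int:
    """Second proposed call, usable for growth and shrinkage: size the heap
    to what the OS will keep resident, but never below the live data or a
    fixed floor."""
    return max(floor, live_bytes, budget())


def should_collect(budget: ResidentBudget, heap_used: int, request: int) -> bool:
    """Collect instead of growing once the next allocation would push the
    heap past what the OS will keep resident; beyond that point extra
    "memory" is likely to be paged, making the program slower."""
    return heap_used + request > budget()
```

For testing, the OS query can be stubbed, e.g. `should_collect(lambda: 64 << 20, heap_used, n)`; a real collector would re-query at each potential heap expansion, so the answer can track changing system load.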

These are rough measures, of course, because the virtual size can reasonably
be much larger than the resident set size, with tons of unused code or data
paged out.
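As an illustration of both the system-specific nature of these numbers and the gap between virtual and resident size, here is a Linux-only sketch (an assumption: it relies on the Linux `/proc/<pid>/statm` file, whose first two fields are total program size and resident set size, in pages).  Other systems expose this differently, if at all.

```python
import os
from typing import Tuple


def virtual_and_resident_bytes(pid: str = "self") -> Tuple[int, int]:
    """Return (virtual size, resident set size) in bytes for a process.

    Linux-specific: /proc/<pid>/statm reports sizes in pages, total program
    size first and resident set size second.
    """
    with open(f"/proc/{pid}/statm") as f:
        size_pages, resident_pages = (int(x) for x in f.read().split()[:2])
    page = os.sysconf("SC_PAGE_SIZE")  # page size in bytes
    return size_pages * page, resident_pages * page


if __name__ == "__main__":
    vsz, rss = virtual_and_resident_bytes()
    print(f"virtual {vsz >> 20} MiB, resident {rss >> 20} MiB")
```

Even this tells a collector only what is resident now, not what the system would be willing to keep resident, which is the number it actually needs.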

OS's are nominally "application neutral", but in practice they provide
services based on a view of what applications typically do.  Adjusting memory
demands to available resources is very difficult to do in typical
applications.  About the only applications that do it are fancy sort packages.
Since applications don't do this, OS's don't provide any decent support.
A GC is potentially in a position to make such adjustments for any application
that runs over it.  However, you get a vicious circle:  Since it's hard to
get good feedback from OS's, GC's don't try very hard, so OS's see no demand
for better feedback (not that GC'ed languages loom large on the list of
things most OS designers worry about....)

This would be the third class of interface I know of that OS's could provide
to help GC.  (The first two are "stop all threads in this process other
than this one", which makes concurrent GC's much simpler; and cheap,
reasonable ways for user-level code to handle VM exceptions, allowing things
like using read-only pages for write barriers.  Perhaps a general, efficient
way for user code to deal with VM events could be used to build an "optimal
size" mechanism, but I doubt it:  Good feedback depends on global information
about the system, and such information isn't, and probably shouldn't be,
handed to user page fault management code.)
							-- Jerry