[gclist] Sharing GC resources between applications

David F. Bacon dfb at watson.ibm.com
Wed Mar 23 05:01:54 PST 2005


Isolates are designed so that they can be implemented within a single JVM, I
believe, largely because fork() may not exist on embedded devices.  So
modulo the reliability problems, I wouldn't write them off.

david
----- Original Message ----- 
From: "Robin Boerdijk" <robin_boerdijk at yahoo.com>
To: "David F. Bacon" <dfb at watson.ibm.com>
Sent: Wednesday, March 23, 2005 2:45 AM
Subject: Re: [gclist] Sharing GC resources between applications


> Hi David,
>
> Thanks for all the info. Interestingly, I also ran into the multi-JVM
> resource problem in a transaction processing environment. In my case
> this involved the Tuxedo TP monitor from BEA Systems. I have developed
> a software package that allows Tuxedo service routines to be written
> in Java by running an embedded JVM in Tuxedo server processes. For
> maximum stability, a number of these server processes must run in
> parallel, but this quickly exhausts system resources. It looks like
> the Java Isolates solution will not help here, as it does not seem to
> be designed for embedded JVM applications. Hopefully some other
> mechanism for sharing JVM resources will become widely accepted in the
> next couple of years.
>
> Regards,
>
> Robin Boerdijk
>
> --- "David F. Bacon" <dfb at watson.ibm.com> wrote:
>
> > Those are all good questions.  The answer is that some work has been
> > done, but the state of both research and practice is still lacking.
> > You've alluded to the two major resource problems: code and data.
> > I'll (somewhat long-windedly) address both from my own (biased)
> > perspective.
> >
> > For all of Java's virtues, with respect to code sharing, it threw
> > onto the trash heap lessons that have been learned repeatedly over
> > the course of 30 years of operating system and language design.  To
> > wit, that direct disk-to-memory mapping and read-only sharing of code
> > (and constant data) are absolutely critical to performance,
> > scalability (both up and down), reliability, maintainability, etc etc
> > (see http://www.opost.com/dlm/tenex/hbook.html).  This is what
> > allowed a 1981-era DEC-20 to support 60 simultaneous users on a
> > machine with 18MB of RAM.
> >
> > The DEC-20 also ran a bytecode-interpreted, garbage collected
> > language: PDP-10 TECO, which was the implementation language for the
> > original Emacs (rather than Lisp).  It had a facility for loading a
> > whole host of TECO libraries and then "dumping" a binary image.  This
> > meant that a system with a large number of loaded libraries could be
> > saved to disk and mapped directly into memory, so the startup time
> > for the first user was the time to page in the application from disk,
> > and for subsequent users the time to memory map those pages in shared
> > mode.  Startup times were vastly faster than for systems like
> > Eclipse, which must load, re-format, compile, and link every class
> > individually -- despite the fact that the CPU was roughly 1000 times
> > slower.
> >
> > The unfortunate implications of Java's completely dynamic
> > architecture, which did not allow sharing, became obvious as soon as
> > attempts were made to use Java for significant applications running
> > outside of web browsers.  Java's class loading semantics require that
> > a set of classes (which logically comprise a library, package, or
> > application) be loaded in the dynamic order in which they are
> > referenced.  This severely inhibits the creation of a sharable
> > compiled version of a set of classes, a problem that is only
> > exacerbated by the nature of JIT compilation.
> >
> > At IBM this problem hit most severely in mainframe transaction
> > processing environments, where the transaction semantics required
> > that individual transactions not interfere with each other -- which
> > meant that logically each transaction should run in its own JVM, but
> > of course throughput requirements precluded such a horribly
> > inefficient approach.  The response was a facility that essentially
> > provided two heaps, one for per-transaction objects and one for
> > shared classes and data (see
> > http://www.research.ibm.com/journal/sj/391/dillenberger.html under "A
> > Scalable Java Virtual Machine Implementation").  In essence, this was
> > quite similar to the "dumped Emacs" approach I alluded to above,
> > although the shared image was initialized dynamically rather than
> > being memory mapped.  I was involved in the early design of this
> > system -- it was an expedient solution to a very severe problem, but
> > isn't nearly general enough.
> >
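> > To make that two-heap split concrete, here is a loose sketch in plain
> > Java -- not the actual interface of the IBM facility, and the class
> > and method names are invented for illustration -- of per-transaction
> > allocations that are discarded wholesale at transaction end, next to
> > long-lived shared state:
> >
> >     import java.util.ArrayList;
> >     import java.util.List;
> >     import java.util.Map;
> >     import java.util.concurrent.ConcurrentHashMap;
> >
> >     // Illustrative sketch only; the real facility lives inside the JVM.
> >     class TransactionScope {
> >         // Long-lived shared data (classes, caches), analogous to the
> >         // shared heap that survives across transactions.
> >         static final Map<String, Object> SHARED = new ConcurrentHashMap<>();
> >
> >         // Per-transaction allocations, analogous to the transaction heap.
> >         private final List<Object> transactionLocal = new ArrayList<>();
> >
> >         <T> T allocate(T obj) {
> >             transactionLocal.add(obj);
> >             return obj;
> >         }
> >
> >         // At transaction end the whole region is dropped in one step,
> >         // so nothing leaks from one transaction into the next.
> >         void reset() {
> >             transactionLocal.clear();
> >         }
> >     }
> >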
> > Isolates (JSR 121), as Dave Detlefs mentions, are an attempt at a
> > more general solution to this problem.  It's been some time since I
> > looked at it in detail, but I believe they suffer from a lack of pure
> > semantics due to a desire to allow multiple implementation styles at
> > the whim of the underlying system (either process forking, or
> > multiple isolates per operating system process, etc).  When isolates
> > are implemented within a single process, they will probably use fewer
> > resources, but will sacrifice another kind of isolation: fault
> > isolation.  If one isolate does something that causes the JVM to
> > fail, all other isolates will crash as well, which is a serious
> > drawback.
> >
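> > For reference, a minimal sketch of launching an isolate, assuming the
> > javax.isolate.Isolate class roughly as drafted in JSR 121 -- the
> > signatures are approximate and the application class name is
> > invented, since the API never shipped in mainstream JDKs:
> >
> >     import javax.isolate.Isolate;
> >
> >     public class LaunchIsolate {
> >         public static void main(String[] args) throws Exception {
> >             // Hypothetical main class; constructor/start() signatures
> >             // as sketched in the JSR 121 drafts.  Whether this becomes
> >             // a new OS process or a second "logical VM" inside this
> >             // process is up to the implementation -- which is exactly
> >             // the fault-isolation concern raised above.
> >             Isolate worker = new Isolate("com.example.AppMain", "arg");
> >             worker.start();
> >         }
> >     }
> >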
> > John Corwin, Dave Grove, Chet Murthy, and I worked on a more
> > fundamental solution to the problem with MJ
> > (http://portal.acm.org/citation.cfm?doid=949305.949326), a module
> > system for Java that provided clearly delimited loadable units (note
> > that neither packages nor jar files do this) and strong isolation
> > properties between them.  In addition to creating modules that could
> > be pre-compiled and memory-mapped, it solved a bunch of problems:
> > "classpath hell", multiple instances of different versions of a
> > library within a single JVM instance, etc.  MJ was designed to fit
> > into the existing Java language, so in some cases it sacrificed
> > cleanliness of design for compatibility -- but overall it worked
> > pretty well.  Unfortunately MJ has yet to see the light of day as
> > either a product or an open source release.
> >
> > The assemblies of .NET are a similar approach to the same problem,
> > and essentially yield DLL's.  However, you need only fire up Adobe
> > Acrobat Reader (and get yourself some coffee) to get a sense that
> > this is still far from solving the problem.
> >
> >
> > In terms of heap data, there are a number of issues that arise.
> > First of all, garbage collected heaps are inherently
> > over-provisioned.  An application will typically run in twice its
> > theoretical minimum heap size.  When that over-provisioning gets
> > multiplied across many simultaneous JVMs, it adds up fast.  Phase
> > behavior in which heap requirements go up and down amplifies this
> > effect -- essentially a kind of macro-fragmentation across JVMs.  One
> > approach to this problem would be to have a "rotating to-space",
> > where garbage collection is coordinated so that it is never done by
> > more than one JVM at a time (or some fixed number).  The memory is
> > collected into the to-space, and then the old heap is freed and
> > unmapped and becomes the to-space of the next JVM to be collected.  I
> > thought someone had tried that, but can't recall now -- perhaps
> > another gclist'er can comment.  At all events, the engineering will
> > be a major undertaking.
> >
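> > As a very rough sketch of the coordination half of that idea -- the
> > lock file path and class names are invented, and the actual unmapping
> > and hand-off of the freed pages would need VM and OS support that
> > plain Java cannot express -- cooperating JVMs could take turns via an
> > OS-level file lock, so that at most one of them holds a to-space at
> > any moment:
> >
> >     import java.io.RandomAccessFile;
> >     import java.nio.channels.FileChannel;
> >     import java.nio.channels.FileLock;
> >
> >     public class GcTurnstile {
> >         private final FileChannel channel;
> >
> >         public GcTurnstile(String lockFile) throws Exception {
> >             // Invented lock-file convention shared by the cooperating JVMs.
> >             channel = new RandomAccessFile(lockFile, "rw").getChannel();
> >         }
> >
> >         // Called by a JVM that has decided it needs to collect.
> >         public void collectInTurn(Runnable collect) throws Exception {
> >             try (FileLock lock = channel.lock()) {  // blocks until our turn
> >                 // While we hold the lock, no other cooperating JVM is
> >                 // evacuating into a to-space, so at most one extra
> >                 // semi-space exists machine-wide.
> >                 collect.run();
> >                 // On release, the pages just evacuated could in principle
> >                 // be unmapped and handed to the next JVM as its to-space.
> >             }
> >         }
> >
> >         public static void main(String[] args) throws Exception {
> >             // System.gc() stands in for a real collection cycle.
> >             new GcTurnstile("/tmp/gc.lock").collectInTurn(System::gc);
> >         }
> >     }
> >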
> > The work that Emery Berger mentioned on getting the JVM and the
> > virtual memory manager to cooperate is another important piece of the
> > overall solution.  Right now the real-world "solution" to the problem
> > is to run on machines with huge physical memories.  This is only
> > tolerated because RAM is relatively cheap.  The code space problem is
> > in some sense more urgent because it manifests itself in both space
> > *and* time (loading).
> >
> > In dealing with systems with thousands of processes, such as James
> > Hague describes for Erlang, you sometimes confront the classical
> > problem of functional language implementation: to make it efficient
> > you have to "re-discover" the imperative nature of the underlying
> > program.  In particular, Erlang implementations that use private
> > heaps copy parameters by value.  So you can wind up with thousands of
> > copies of the exact same object across many processes, rather than
> > sharing a single instance.  If those objects are large, and there are
> > a lot of them, then it's a big problem.  A significant amount of
> > memory is also lost due to fragmentation caused by having so many
> > small heaps.  That being said, Erlang is an extremely elegant
> > language and the approach has much to recommend it, precisely because
> > resource properties are localized to a process.
> >
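> > To put a rough number on that copying cost, here is a small Java
> > illustration (names invented; it only models the two strategies, not
> > Erlang itself): delivering the same 1 MB message to 200 "processes"
> > by copying costs about 200 MB, while sharing one immutable reference
> > stays near 1 MB -- at the price of that data being reachable from
> > every process at once:
> >
> >     import java.util.Arrays;
> >
> >     public class CopyVsShare {
> >         static final int WORKERS = 200;
> >         static final byte[] BIG_MESSAGE = new byte[1 << 20];  // 1 MB payload
> >
> >         public static void main(String[] args) {
> >             // Private-heap style: each recipient gets its own copy, so
> >             // the footprint is roughly WORKERS * 1 MB.
> >             byte[][] privateCopies = new byte[WORKERS][];
> >             for (int i = 0; i < WORKERS; i++) {
> >                 privateCopies[i] = Arrays.copyOf(BIG_MESSAGE, BIG_MESSAGE.length);
> >             }
> >
> >             // Shared-heap style: each recipient holds a reference to one
> >             // immutable object, so the footprint stays near 1 MB -- but
> >             // the data is now reachable from every "process" at once.
> >             byte[][] sharedRefs = new byte[WORKERS][];
> >             for (int i = 0; i < WORKERS; i++) {
> >                 sharedRefs[i] = BIG_MESSAGE;
> >             }
> >
> >             System.out.println(privateCopies.length + " copies vs "
> >                     + sharedRefs.length + " shared references");
> >         }
> >     }
> >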
> > The alternative is to have a shared heap and an incremental or
> > real-time garbage collector.  This is the approach that Forsyth
> > mentions for Inferno, and that was taken by Sagonas and Wilhelmsson
> > for Erlang (http://portal.acm.org/citation.cfm?doid=1029873.1029875).
> > I believe that neither of those systems defragments the global heap,
> > though.  This isn't inherent, but once you bite of the fruit of the
> > shared heap, it's something that has to be done for a truly complete
> > solution.  The Metronome real-time
> >
> === message truncated ===
>


