[gclist] memory protections and system calls

Chris Reedy creedy@mitre.org
Mon, 1 Jul 1996 15:22:28 -0400
Previous message: [gclist] memory protections and system calls
Next message: [gclist] Distributed Garbage Collection Revisited 1
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
This is a little hard to answer quickly, so please bear with me while I try.

At 10:41 AM 7/1/96 -0500, Paul R. Wilson wrote:
>>From creedy@mitre.org Mon Jul  1 08:52:47 1996
>>Subject: Re: [gclist] memory protections and system calls
>>Cc: wilson@cs.utexas.edu
>>
>>At  3:04 PM 6/28/96 -0500, Paul R. Wilson wrote:
>>>We view this as a bug in the OS, but we still have to deal with it.  In
>>>our view, a protected page should be treated the same way by the kernel
>>>as by user code---the kernel should reflect the access violation back
>>>to the user process in the form of a user-level signal, so that the
>>>app can deal with it---typically by unprotecting the page and doing
>>>some bookkeeping before returning.  Then the kernel should resume and
>>>do whatever it's been asked to do.  (Like writing data from a buffer.)
>>
>>Having been something of an OS hack in the past:
>>
>>It is almost impossible to do this in _current_ operating systems without
>>making the OS dependent on user code. 
>
>In what sense(s)?

First, when I say "current operating systems" I am referring to things such
as the traditional (i.e. old) versions of UNIX which were developed before
much of the current research in how operating systems should be built. 
Versions of UNIX based on CMU's Mach are better.  (Someone who really knows
Mach, which I don't, could tell us how much better.)

Having said that -- the key question is: Does correct operation of the OS
depend on correct operation of the user code?  If it does, there is a
dependency.  Since I don't want a dependency, I have to make sure that the
OS is protected from the user code failing to operate correctly.

So ... what can go wrong with the user code?  Well, it could abort (say
with a segmentation violation), it could never return, it could corrupt any
memory it can access, it could create another access violation in its
access violation handler, etc.  Thus, in order for the OS to protect
itself from these kinds of conditions I probably need to:

a.  Provide mechanisms that allow me to release any resources (locks,
kernel memory, frozen virtual memory pages, etc.) that are being held by
the operating system on behalf of this user if the user code aborts.  This
probably needs to be done as the stack is being unwound, since the user
could provide a signal handler that would attempt to recover from the
failure.

b.  Maintain separate stack and temporary storage for the kernel, since the
effects of user code corrupting kernel working storage are guaranteed to be
unfortunate (as well as a source of mischief for hackers).

c.  Set watch timers to keep the process from holding kernel resources forever.

And remember that I don't necessarily know which instruction will create an
access violation.  The user code may have created structures that span
pages (I don't know that it didn't).  So I may get an access violation any
time I make reference to data in the user address space.
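
To make the situation concrete, here is a minimal user-level sketch,
assuming Linux (other systems may differ, for example by raising SIGBUS).
It protects one page, passes it to write(), and then touches it from user
code.  The names probe_protected_page, struct probe_result and on_segv are
made up for illustration.  Calling mprotect() from a signal handler is
common practice in collectors but is not formally async-signal-safe.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *g_page;                    /* the one page we protect */
static long g_page_size;
static volatile sig_atomic_t g_faults;  /* faults seen by the handler */

/* User-level handler: unprotect the page, do the bookkeeping, return.
 * On return the faulting instruction is restarted. */
static void on_segv(int sig, siginfo_t *si, void *uctx) {
    char *addr = (char *)si->si_addr;
    (void)sig;
    (void)uctx;
    if (addr < g_page || addr >= g_page + g_page_size)
        _exit(2);                       /* not our page: a real bug */
    g_faults++;
    if (mprotect(g_page, (size_t)g_page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(3);
}

struct probe_result {
    int syscall_errno;   /* errno from write() on the protected buffer */
    int faults_in_call;  /* handler runs during the system call */
    int faults_in_user;  /* handler runs during the user-mode store */
};

/* Returns 0 on success, -1 if the setup failed. */
int probe_protected_page(struct probe_result *out) {
    struct sigaction sa, old;
    int fds[2];
    size_t size;
    ssize_t n;

    g_page_size = sysconf(_SC_PAGESIZE);
    assert(g_page_size > 0);
    size = (size_t)g_page_size;

    g_page = mmap(NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (g_page == MAP_FAILED)
        return -1;
    memset(g_page, 'x', size);
    if (pipe(fds) != 0) {
        munmap(g_page, size);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0 ||
        mprotect(g_page, size, PROT_NONE) != 0) {
        close(fds[0]);
        close(fds[1]);
        munmap(g_page, size);
        return -1;
    }
    g_faults = 0;

    /* 1. The kernel touches the protected page on our behalf. */
    errno = 0;
    n = write(fds[1], g_page, 16);
    out->syscall_errno = (n < 0) ? errno : 0;
    out->faults_in_call = (int)g_faults;

    /* 2. User code touches the same page directly. */
    *(volatile char *)g_page = 'y';
    out->faults_in_user = (int)g_faults - out->faults_in_call;

    sigaction(SIGSEGV, &old, NULL);
    close(fds[0]);
    close(fds[1]);
    munmap(g_page, size);
    return 0;
}
```

On Linux the expected result is that write() fails with EFAULT without
the handler ever running, while the direct store runs the handler exactly
once and then completes.  That is the kernel defending itself: it gives up
on the call rather than waiting on user code to repair the mapping.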

>>This is something that no sane OS
>>developer will ever do.  (The problems associated with user code causing
>>the OS to crash and/or behave in strange and unusual ways are too much to
>>deal with.)  Your typical OS developer would tell you that this is not a
>>bug in the OS, but the OS defending itself from the user.
>
>I'm not sure I understand your point.  What in particular is it about the
>structure of current operating systems that makes this dangerous?  Is it
>that they're not multithreaded, hence can't stop in the middle of a system
>call and let a user process fix a problem, because the OS will lock up?
>Or is it something else?  Why do you need capabilities (or whatever) to
>fix it?
>
Continuing ...

Basically current (i.e. old) operating systems were not designed to
support the kind of mechanism discussed above.  It's not _just_ that
they're not multithreaded.  It's also that they don't keep careful track of
resources allocated on behalf of a user process (so that I can unwind the
stack and recover resources that were temporarily allocated just for this
system call), they often make use of the user stack and, in general, may
make any number of assumptions (such as pointers in argument lists not
changing) because they know that the user process can't operate while a
system call is in progress.
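
The resource tracking mentioned above can be sketched as a per-call
cleanup stack: every lock or buffer taken during a system call is recorded
with a release function, so that everything can be given back in reverse
order if the user-level handler never returns properly.  This is only an
illustration in plain C, not how any particular kernel does it; the names
call_context, ctx_hold, ctx_unwind and clear_flag are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HELD 16

typedef void (*release_fn)(void *resource);

/* One resource held on behalf of the current system call. */
struct held {
    release_fn release;
    void *resource;
};

/* Everything the "kernel" holds for one in-progress call. */
struct call_context {
    struct held held[MAX_HELD];
    size_t count;
};

/* Record a resource so it can be released later.
 * Returns 0 on success, -1 if the table is full. */
int ctx_hold(struct call_context *c, release_fn fn, void *resource) {
    assert(c != NULL && fn != NULL);
    if (c->count == MAX_HELD)
        return -1;
    c->held[c->count].release = fn;
    c->held[c->count].resource = resource;
    c->count++;
    return 0;
}

/* Release everything in reverse order of acquisition.
 * Returns the number of resources released. */
size_t ctx_unwind(struct call_context *c) {
    size_t released = 0;
    while (c->count > 0) {
        struct held *h = &c->held[--c->count];
        h->release(h->resource);
        released++;
    }
    return released;
}

/* Example release function: a "lock" modelled as an int flag. */
void clear_flag(void *flag) {
    *(int *)flag = 0;
}
```

A call that takes two locks would register both with ctx_hold() and, on an
abort or a watchdog timeout, call ctx_unwind() once.  The hard part in a
real kernel is not this table but making sure every one of the thousands
of acquisition sites actually uses it.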

Going through an older generation operating system and making the changes
required to support this (remembering that I have to be prepared for this
to occur on almost any instruction) would entail a complete rewrite of the
OS kernel.  This is why you need to restructure the approach to the
OS/user code interface.  Once you recognize that you need to do this, you
can restructure things to support the needed mechanisms.

>Conceptually, I don't see why it matters whether a user-level protection trap
>actually occurs in user code, or in kernel code on behalf of the user process.
>I'd think that the user process owns the memory mappings that must be
>manipulated, hence it's okay for it to hold locks in the kernel on whatever
>must be locked.  (Any dependent process that inherits those mappings is
>vulnerable anyway.)  We don't have to allow the user process to do
>arbitrary operations inside the kernel.

Conceptually, it's not a problem.  But ... "The devil is in the details." 
Once I allow the OS to make calls on user level code, it can be extremely
difficult for the OS to protect itself from failures in that code.  The
research OSs that provide this capability do it in a controlled way that
allows execution threads to pass between multiple user and system processes
in such a way that each individual process can protect itself in case of
failure.  Without this, things are probably hopeless.  (If I have to put
additional code, or even check to see whether I need to, into my OS at each
of the thousands of places where this kind of effect can occur, I can
guarantee that it is hopeless.)

I hope this helps.  If you want to see one example of an OS that attempts
to deal with this problem, you might want to take a look at:
http://www.sun.com/tech/projects/spring/index.html where you can find a lot
of good material about Sun's Spring OS (it's still a research project,
unfortunately).

I hope this answers the question.  If it doesn't, let me know and I'll try
again.

  Chris

This is an informal message and not an official Mitretek Systems position.
Dr. Christopher L. Reedy, Mail Stop Z667
Mitretek Systems, 7525 Colshire Drive, McLean, VA 22102-7400
Email: creedy@mitretek.org  Phone: (703) 610-1615  FAX: (703) 610-1603