[gclist] Priority inversion stuff

Henry G. Baker hbaker@netcom.com
Thu, 22 Jan 1998 17:46:00 -0800 (PST)


[hbaker:  I hope that people are still interested in this stuff.
This thread in comp.risks is really good.]

Date: Sun, 18 Jan 98 08:49:01 EST
From: Jerry Leichter <leichter@lrw.com>
Subject: Priority Inversion and early Unix (Rose, RISKS-19.54)

Greg Rose mentions the implementation of critical sections using process
priority on a PDP-11 in early Unix.  Actually, no.  Priority inversion was
not an issue here; the correctness of the code was.

In multilevel interrupt priority systems, one standard programming
technique is to associate a priority level with any particular piece of data
that may potentially be shared between interrupt and non-interrupt code.  In
order to access that piece of data, code must ensure that the processor is at
that interrupt level.  (The processor will only accept interrupts whose
priority is strictly higher than the current processor level, and will raise
the processor's level to match the interrupt's level before executing the
appropriate handler.)
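
To make that concrete, here is a minimal sketch in C (the device, the level-5
assignment, and all the names are illustrative assumptions, not code from any
particular system).  An interrupt handler is dispatched with the processor
already at the interrupt's priority, so it can touch data assigned to that
level without taking any explicit lock:

    /* Sketch only: a hypothetical device whose state is, by convention,
     * "owned" by priority level 5.  The handler is entered with the
     * processor already running at level 5. */

    struct dz_state {              /* hypothetical shared object, level 5 */
        int  chars_received;
        char last_char;
    };

    static struct dz_state dz;     /* shared with lower-priority readers  */

    /* Called by the interrupt dispatch mechanism at processor priority 5. */
    void dz_intr(char c)
    {
        /* Safe: nothing at priority <= 5 can run until we return, and
         * whatever we interrupted was below 5, hence not touching 'dz'. */
        dz.last_char = c;
        dz.chars_received++;
    }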

This protocol is safe, but requires a bit of reasoning, and an additional
assumption.

Let's start by assuming processes may not change the processor's priority;
only the interrupt mechanism does so.  Consider code accessing an object
whose minimum access priority is Pa.  The processor is thus currently
running at priority Pa as well.  It can only lose the processor to a process
with priority Pn > Pa, which can't access objects at priority Pa; so this
code's actions can't be interrupted.  Further, when this code began
execution, it did so by hypothesis as the result of an interrupt at priority
Pa.  Thus, the processor must previously have been running at priority 
Po < Pa.  The code at priority Po couldn't have been accessing the object
either.  So access to the object is safe.

Not allowing processes to change their priority is safe, but too
restrictive, as it makes it impossible to communicate between priority
levels!  Note that it's always safe to *raise* the processor priority level:
This can't affect what other processes the code might have interrupted, and
can only exclude more possible interrupts.  Further, after raising the
priority, it's certainly safe to lower it back to where it was, since that
was known to be safe.
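
In code, the raise-then-restore pattern looks like this (a sketch only;
spl5() and splx() are modeled on the traditional Unix spl interface, and the
shared counter and its level-5 assignment are assumptions for illustration):

    /* Sketch: code whose entry level is at or below 5 reading an object
     * "owned" by level 5.  It raises to the object's level, does its work,
     * and restores exactly the level it saved -- never anything lower. */

    extern int  spl5(void);          /* raise to priority 5, return old level */
    extern void splx(int s);         /* restore a previously saved level      */

    extern volatile long tick_count; /* hypothetical object at level 5        */

    long read_ticks(void)
    {
        int  s = spl5();       /* raise: no level-5 interrupt can now occur   */
        long t = tick_count;   /* safe: no one else can be touching it        */
        splx(s);               /* lower back to exactly the saved level       */
        return t;
    }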

On the other hand, lowering the priority level *below the entry level* is
deadly.  Consider the following: P3 is running at priority 3, accessing an
object designated as level 3.  P4 gains control at priority level 4, and
drops the priority level to 2.  A new priority 3 interrupt comes in,
starting a new process at priority level 3 -- which accesses the same object
as P3.  P3's critical section has been violated.  (Dropping the priority
level by exactly 1 below your entry level avoids this, but doesn't buy you
anything, since you must not touch any object "owned" by that level: *You*
might have interrupted someone running at that level who was accessing the
object.)
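
Written out, the broken pattern is easy to see (again a sketch, with the same
hypothetical spl-style names; the levels match the scenario above):

    /* Sketch of the BROKEN pattern just described -- don't do this.       */

    extern int  spl2(void);       /* set the processor priority to 2       */
    extern void splx(int s);      /* restore a previously saved level      */

    extern volatile long shared3; /* hypothetical object owned by level 3  */

    /* Entered via an interrupt at priority 4, so the entry level Pe = 4.
     * We may have interrupted P3, which was mid-update of 'shared3'. */
    void p4_handler(void)
    {
        int s = spl2();   /* WRONG: drops below the entry level.  A new    */
                          /* priority-3 interrupt can now be accepted and  */
                          /* will access 'shared3' inside P3's critical    */
                          /* section.                                      */
        /* ... */
        splx(s);
    }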

Finally, it's clear that if you raise your priority level from Po to Pn, you
can't possibly be interrupting any code of priority Po+1 ... Pn, and you are
already in a logical critical section for objects of priority Po; so you may
safely access any object with a level between Po and Pn inclusive.

Putting this all together, a safe set of rules is: Every object has an
associated priority level.  Every process has an associated "entry priority
level" Pe, set by the interrupt that started it.  (User-level code has
Pe=0.)  A process may raise its current priority level Pc, and may lower it, but
not below Pe.  While at Pc, it may access any object with associated level
Po as long as Pe <= Po <= Pc.
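
The rule can be stated as a simple check (a sketch with hypothetical names --
current_level(), entry_level(), and the per-object level field are not from
any real system -- just the Pe <= Po <= Pc condition in executable form):

    #include <assert.h>

    struct object {
        int level;                  /* Po: the level associated with the object */
        /* ... the data itself ... */
    };

    extern int current_level(void); /* Pc: the processor's current priority     */
    extern int entry_level(void);   /* Pe: the level of the interrupt that
                                       started this process (0 for user code)   */

    /* A process may access 'obj' only while Pe <= Po <= Pc. */
    void assert_may_access(const struct object *obj)
    {
        assert(entry_level() <= obj->level);     /* Pe <= Po */
        assert(obj->level <= current_level());   /* Po <= Pc */
    }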

Priority inversion is impossible here: There is no waiting, except for the
processor priority level to drop low enough for an interrupt to be
delivered.  (Alternatively, you can think of Po as the priority of a monitor
associated with the object.  "Priority inheritance" is enforced implicitly,
since you have to "inherit" the priority before you are even allowed into
the monitor.  Curiously, in all the years I taught about this algorithm in OS
classes, I never thought of this analogy before.)

The degenerate case has only two levels, "user" and interrupt.  You get
exactly the usual rules and behavior for simple critical sections.

DEC operating systems for the PDP-11 (at least RSX; probably others) used this
style of locking before Unix was developed.  (The hardware was designed the
way it was exactly to allow this style of synchronization.)  Unix, for the
most part, deliberately used a degenerate form, in which "raise to priority
level" was almost always taken to mean "raise it to the maximum".  Most Unix
device drivers raise the level on entry.  In effect, this gives you one big
mutex around the entire kernel, rather than monitors around various objects.
This approach is simpler, works fine for time-sharing systems (the advantage
of doing things at a finer grain is decreased interrupt latency, which is
more of an issue for real-time kinds of systems) -- and, most important, is
the only approach that was portable to the wide variety of hardware
available in the late '70's and '80's that didn't necessarily support
multilevel interrupts.
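
In code, the degenerate Unix form is the familiar idiom below (a sketch;
spl6()/splx() follow the classic spl naming, and the driver fragment is
illustrative rather than taken from any actual Unix source):

    /* Sketch of the traditional Unix idiom: raise to (nearly) the maximum
     * level on entry, so the section is effectively under one big kernel
     * mutex rather than a per-object monitor. */

    extern int  spl6(void);        /* raise to the highest device level    */
    extern void splx(int s);       /* restore the saved level              */

    extern volatile int tty_busy;  /* hypothetical driver state            */

    void tty_start(void)
    {
        int s = spl6();    /* blocks essentially all device interrupts,    */
                           /* not just the ones that touch this driver     */
        if (!tty_busy) {
            tty_busy = 1;
            /* ... start the transfer ... */
        }
        splx(s);
    }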

VMS also uses the same synchronization scheme, though it was long ago
extended to support multiprocessors.  (Since priority levels don't work for
synchronization across processors, in-memory mutexes have to be used.
However, a priority level is associated with each such mutex.  To attempt to
lock a mutex, you must first be sure your own processor is at the mutex's
priority level, or higher.  This limits the contention on the locks to the
highest priority thread from each processor.)
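
A rough sketch of that shape in C11 (the names, levels, and spinlock details
are assumptions for illustration, not VMS code):

    #include <stdatomic.h>

    /* Each in-memory lock carries a priority level; a CPU first raises its
     * own priority to that level, then spins on the lock itself. */
    struct plock {
        int         level;       /* the priority level paired with the lock */
        atomic_flag taken;       /* the in-memory spinlock itself           */
    };

    extern int  raise_to(int level);   /* raise (never lower) this CPU's
                                          priority; return the old level    */
    extern void restore(int oldlevel); /* restore this CPU's priority       */

    int plock_acquire(struct plock *l)
    {
        int s = raise_to(l->level);    /* excludes contenders on this CPU   */
        while (atomic_flag_test_and_set_explicit(&l->taken,
                                                 memory_order_acquire))
            ;                          /* spin: at most one contender per   */
                                       /* other CPU can even reach this loop */
        return s;                      /* caller hands this back to release */
    }

    void plock_release(struct plock *l, int s)
    {
        atomic_flag_clear_explicit(&l->taken, memory_order_release);
        restore(s);
    }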

BTW, you may wonder: Since a process may never lower its priority below its
entry level, how can it communicate "downward" -- e.g., how can a driver's
interrupt code write into a status variable accessible from user level?
That's what the software interrupt mechanism is for: If code at priority Pc
needs to do something at priority Pn < Pc, it requests a software interrupt
at Pn.  That interrupt will be queued until the code at Pc, and any other
higher-level interrupts, complete.  Then an interrupt at Pn takes place,
and the interrupt routine -- known as a "fork routine" in DEC parlance --
will be able to access the appropriate objects.
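
A sketch of the shape of that mechanism (request_soft_interrupt(), the
spl-style calls, and the specific levels 6 and 3 are all hypothetical, chosen
only to fit the rules above; the real DEC mechanism differs in its details):

    extern int  spl6(void);                        /* raise to device level 6 */
    extern void splx(int s);                       /* restore a saved level   */
    extern void request_soft_interrupt(int level); /* delivered once the
                                                      processor priority drops
                                                      below 'level'           */

    static volatile int done_count;   /* owned by device level 6              */
    static volatile int status_word;  /* owned by soft-interrupt level 3;
                                         lower-level readers raise to 3       */

    /* Device interrupt handler: entered at level 6 (Pe = Pc = 6).
     * It may touch level-6 objects, but not the level-3 status word. */
    void device_intr(void)
    {
        done_count++;                 /* level-6 object: allowed              */
        request_soft_interrupt(3);    /* ask for the downward work later      */
    }

    /* Fork routine: entered by the software interrupt at level 3 (Pe = 3). */
    void device_fork(void)
    {
        int s = spl6();               /* raise: 3 <= 6 <= 6, so the level-6   */
        int n = done_count;           /* counter may be read safely           */
        done_count = 0;
        splx(s);                      /* back to the entry level, 3           */

        status_word = n;              /* level-3 object: allowed while Pe = 3 */
    }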

Jerry