Build OS features in Software, not Hardware (was Re: New OS idea)

Eric W. Biederman ebiederm@cse.unl.edu
Tue, 3 Dec 1996 16:10:53 -0600


>>>>> "Alaric" == Alaric B Williams <alaric@abwillms.demon.co.uk> writes:

Alaric> Francois-Rene Rideau <rideau@ens.fr> wrote:
>> I see absolutely NO REASON
>> why this would have to be done in hardware,
>> without any software cooperation,
>> through MMU-based memory protection
>> and interrupt-driven task preemption.

Alaric> I'd phrase it like so: MMUs check software for reliability the
Alaric> /empirical/ way; let it out to play, on the end of a rope; if it falls
Alaric> down a hole, we can haul it out.

The real usefulness of MMUs is that they allow paging and memory
reorganization to be done cheaply; that is what justifies their
existence.  The fact that they can also check software reliability is
a nice side effect.

[snip]
Alaric> And there's nothing wrong with interrupt driven preemption IMHO; it's
Alaric> an implementation choice. I'm experimenting with inserting frequent
Alaric> yields at compile time purely because it avoids some of the more
Alaric> awkward side effects of preemption, such as writing hardware interrupt
Alaric> handlers! 

This is done in Standard ML, I believe.  What they do is reduce the
code to continuation-passing style (which effectively changes a
subroutine return into a subroutine call, the call to the
continuation).  Then before every subroutine call they check whether
a yield is needed (in the general case this is the same as their
check for stack/heap overflow).  If so, they pass the current
continuation to the exception handler and let it handle the
situation.

; This is roughly your addition loop below coded in a lambda-calculus style
(lambda ()
  ((lambda (fn1 count1 addr1 amount1)	
     (fn1 fn1 count1 addr1 amount1))

   (lambda (fn2 count2 addr2 amount2)  ; fn1
     (set! addr2 (+ addr2 amount2))
     (let ((new-count (- count2 1)))
       (if (not (= new-count 0))
	   (fn2 fn2 new-count (+ addr2 4) amount2))))

        10000000 some_addr 0))        ; count1 addr1 amount1

; Recoded in true Continuation passing style
(lambda (continuation)
  ((lambda (fn1 count1 addr1 amount1 cont1)
     (fn1 fn1 count1 addr1 amount1 cont1))

   (lambda (fn2 count2 addr2 amount2 cont2)       ; fn1
     (set! addr2 (+ addr2 amount2))
     (let ((new-count (- count2 1)))
       (if (not (= new-count 0))
	   (fn2 fn2 new-count (+ addr2 4) amount2 cont2)
	 (cont2 '()))))

        10000000 some_addr 0 continuation))        ; count1 addr1 amount1 cont1

Which of course still expands into:
Alaric> mov ecx,10000000 or so
Alaric> mov esi,some_address
Alaric> xor eax,eax

Alaric> loop_top:

Alaric> add eax,[esi]
Alaric> add esi,4
Alaric> dec ecx
Alaric> jnz loop_top

Grumble, my rendering of the assembly was awfully low level; it's bigger than the assembly itself :(
If I wanted to check for interrupts I would do something like:

exception_flag dd 0xffffffff  ; An interrupt handler would set this to zero

mov ecx,10000000 or so
mov esi,some_address
xor eax,eax

loop_top:

add eax,[esi]
add esi,4
dec ecx
test ecx, [exception_flag]  ; I don't want to dedicate a register to this purpose
jnz loop_top                ; there are so few :(  And the flag would be in the L1 cache

test dword [exception_flag], 0xffffffff 
jnz   no_exception

; create the current continuation upon the stack
sub esp, 12        ; allocate space for the continuation
                   ; except for the code pointer portion
                   ; Note I have simplified this case by omitting a type tag
mov [esp+0], ecx
mov [esp+4], esi
mov [esp+8], eax
call do_exception ; the call pushes the address to continue at (the next instruction)

loop_restart:
; assume the continuation is kept on the stack
; and I was returned to with ret 
mov ecx, [esp+0]
mov esi, [esp+4]
mov eax, [esp+8]
test ecx, ecx
jnz loop_top

no_exception:
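
At the Scheme level the check is simply inserted before each recursive
call.  In the sketch below, yield-needed? and run-handler are made-up
names standing in for the flag test and for whatever code deals with
the pending interrupt; the call (fn2 fn2 new-count (+ addr2 4) amount2
cont2) from the CPS version above would become something like:

(if (yield-needed?)
    ; package the whole current state and hand it to the handler
    (run-handler (lambda (resume-value)
                   (fn2 fn2 new-count (+ addr2 4) amount2 cont2)))
    (fn2 fn2 new-count (+ addr2 4) amount2 cont2))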

Alaric> In the case of a tight loop, certainly, interrupt driven tasking is
Alaric> faster and smaller.

Possibly a little faster (one or two instructions), and a little
smaller.  However, the above technique lets the majority of interrupt
handling happen in sync with the code, which allows much simpler
exception handling, or simpler signal handling.

The reason for having the routine save its own state is that, with a
little care, a type tag can be written that specifies where the
pointers and such things are, allowing garbage collection, real state
saving, and all kinds of sophisticated features.
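
For illustration only (the names and layout here are invented), such a
tagged save of the loop state might look like this in Scheme, with the
tag telling the collector which slots hold pointers:

; 'sum-loop-frame is an invented tag; a garbage collector would use it
; to look up a layout saying slot 2 (the address) is a pointer while
; slots 1 and 3 (the count and the running sum) are plain integers.
(define (save-loop-state count addr sum)
  (vector 'sum-loop-frame count addr sum))

(define (restore-loop-state frame)
  (values (vector-ref frame 1)     ; count
          (vector-ref frame 2)     ; addr
          (vector-ref frame 3)))   ; sum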

>> Hardware implementation only slows down the machines

Alaric> Not necessarily! The 486 can, I seem to remember, perform a push in
Alaric> one clock cycle. Are you saying that if it didn't check the stack
Alaric> wasn't overflowing, it could do it in 0? :-)

Ah, but so could a machine that executes multiple instructions in
parallel: say an integer subtract that causes an exception on
overflow, a trap barrier (to make sure the arithmetic happens before
the memory move), and a memory move.

The simpler you can make the hardware the better, but please do
remember that hardware can execute things in parallel better than
software can.

>> and makes software more complicated
>> to follow hardware bloatedness,

Alaric> Trap handlers in the OS don't take up much space, and I don't expect a
Alaric> runtime compiler to be a small piece of software.

Tunes will definitely need a regular compilation facility.  However,
FORTHs regularly include a very simple runtime compiler (not
especially optimizing), and they are routinely included in embedded
environments.

Alaric> A QNX kernel with hardware memory protection is 8k. An ARGON kernel
Alaric> with a bytecode compiler in it will probably be more like 200k.

Even 200k isn't a major issue; it fits in my L2 cache, and even my
standard libc is over 1 MB.  Remember to measure code size as total
system size, not per individual program.  The MMU and dynamic linking
can greatly reduce excess code duplication.

[snip]
Alaric> It could probably execute those four or so instructions in less time
Alaric> than my 486 can do an AAM, but it still bloats it.

Ah, but the compiler on the 486 isn't usually bright enough to use
AAM, so it's hardly an issue.

>> Actually I'm even confident that those zillion dollars will eventually
>> have been more than what would have been needed to resorb
>> hunger in the world, if spent more intelligently.
>> Well, the world's not perfect.

Alaric> You've noticed that too? :-)

>> *My* project is to prove, following the trail shown by Turing himself,
>> that the key of better computer systems lies
>> not in ever-more bloated hardware,
>> but in better software design.
>> I'd like the whole system to be built that way,
>> but I first need build the development tools.

Alaric> Hmmm. I would beg to disagree; I interpret that as the RISC-style
Alaric> argument that we should have real simple hardware being driven by
Alaric> software. The software has a small set of operations from which it
Alaric> builds everything else up.

Note: if compilers and distribution formats were good enough that
there was no need to standardize on an instruction set, then the
optimal solution would be VLIW, with the instruction set specific to
each chip.  In general, with VLIW the inner loop of a calculation is
reduced to one clock cycle, without the need for lots of scheduling
hardware and the like.  However, binary compatibility is very
important, and software is generally harder to produce and longer
lived than hardware.

[snip]

>> Plus no one says that space/speed critical code (e.g. device drivers)
>> should forcibly be written in those languages,
>> or that no proof-capable-and-requiring low-level language
>> could be provided to the user to write the 1% optimized code/metacode
>> when the compiler doesn't get something right.

Alaric> Hmmm. Modern compilers (such as gcc) can write better assembler than
Alaric> most humans - except for such chestnuts as C's semantics being too
Alaric> lightweight; there's no rotate instruction in C, and the compiler
Alaric> cannot isolate pieces of code that are equivalent to rotates well
Alaric> enough to use hardware ROT instructions.

Well, that's easy to prove: most humans can't program at all, much
less in assembly.  It's more the case that compilers generate more
consistent code quality than most programmers, while a programmer can
usually optimize any single spot better, though much more slowly.

The trouble with trusting compilers too much, especially with
high-level languages, is that sometimes they can be so stupid :(  Give
me a compiler that can optimize bubble sort into mergesort or
quicksort (without specific code for that case) and I'll change my
mind.

What is needed in a language is a combination of what compilers do
well (constant propagation and, in general, expression simplification
and optimization) and what people do well: writing algorithms.  And
especially a mechanism that allows those algorithms to be applied to
multiple types.
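
As a small illustration of that last point (a sketch, nothing more):
write the algorithm once over a comparison procedure, and the same
code applies to several types.

; Insertion sort written once, parameterized by the comparison.
(define (insert x lst less?)
  (cond ((null? lst) (list x))
        ((less? x (car lst)) (cons x lst))
        (else (cons (car lst) (insert x (cdr lst) less?)))))

(define (isort lst less?)
  (if (null? lst)
      '()
      (insert (car lst) (isort (cdr lst) less?) less?)))

(isort '(3 1 2) <)                 ; => (1 2 3)
(isort '("b" "a" "c") string<?)    ; => ("a" "b" "c")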

As far as semantics go, my ideal is a language with the primitive
control constructs of Scheme, not distinguishing a procedure call from
a goto :), the data structure description abilities of a language like
C, and a safe macro system similar to what Scheme has to offer, but
built on top of vectors rather than lists.

How proving, and type safety or assertion proving in general, can be
put into the above framework is a little bit tricky.  However, I
believe a clever use of macros could handle that case.  Also, it might
be wise to have a language with very simple types at the bottom and
use the macros to build up the frame system.
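
As a toy example of what I mean (the names are invented, and this is
only a sketch): a macro can wrap a definition with a check, so the
check is written in the language rather than wired into it.

; define-checked defines a one-argument procedure that verifies its
; argument is an integer before running the body.
(define-syntax define-checked
  (syntax-rules ()
    ((_ (name arg) body)
     (define (name arg)
       (if (integer? arg)
           body
           (error "type check failed" 'name arg))))))

(define-checked (double n) (+ n n))
(double 4)       ; => 8
(double "oops")  ; signals an error before the body runs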

Languages always seem better to me when they can say: this feature is
an intrinsic part of the language, and this other feature can be
written in the language.

Nicer syntax should be an added feature that simply compiles down to
a trivial syntax, instead of being built into the language.
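
For instance (again just a sketch), a binding form with the nicer
syntax can be a macro that expands into nothing but lambda and
application, which is all the core language needs to provide:

; my-let is the familiar let syntax defined as sugar over lambda.
(define-syntax my-let
  (syntax-rules ()
    ((_ ((name val) ...) body ...)
     ((lambda (name ...) body ...) val ...))))

(my-let ((x 1) (y 2))
  (+ x y))   ; expands to ((lambda (x y) (+ x y)) 1 2)  => 3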

[snip]
>> I could also talk about other things that traditional designs do ill,
>> like building a wall between users and programmers;

Alaric> I'd prefer a soft squishy layer to a wall, but I don't think we want
Alaric> to completely remove the distinction until either everyone is educated
Alaric> enough to program, or programming becomes so easy (due to natural
Alaric> language interfaces, "Make me a program that can factorise large
Alaric> primes"), that there need be no distinction; no need to put in
Alaric> restraints to stop four year old children playing games on Daddy's
Alaric> computer from wandering into the living heart of the operating system
Alaric> and dragging the hard disk into the wastebasket...

There are still access control issues.  And the administrator/user
distinction, while it shouldn't be as hard as it is now, should
definitely be there.  But I suspect in the long run it will be like
electronics: everyone I know can handle basic circuits, plug in
devices, change light bulbs, and do a few other basic things, but
when the project gets big it is easier to hire an electrician than to
do it yourself.

>> like requiring the user-programmer to explicitly
>> implement object persistence when this could be managed orthogonally;

Alaric> I'd agree with that one, though!
[snip]

Alaric> "Simply drag your mother in law's cellphone number from the
Alaric> Address Book to the Laser Satellite icon, and the Targeting
Alaric> Wizard will locate her. Then follow the onscreen prompts for
Alaric> gigawattage and dispersion pattern..."

Alaric> (Windows for Early Warning and Defence User's manual P385)

And she will get so upset when you burn a hole in the ceiling right
next to her and let the rain in....

And your computer just crashed with the error message:
Windows error: $#&*@!@! No error

Eric