Don't hardwire OS features (was Re: Build OS features in Software, not Hardware)

Francois-Rene Rideau rideau@nef.ens.fr
02 Dec 1996 09:29:51 +0100


I named my previous reply to the vague "New OS idea" thread
"Build OS features in Software, not Hardware".
   But it is more general than that:
it is about not hardwiring any feature at any low-level place.
After all, it is the hardware+OS, not the mere hardware,
that constitutes what the user-programmer sees!
And this also applies to applications with hardwired interfaces, etc.

Objects should bring meaningful information without
this information being tied to noise.
   Current OSes and development systems
have a very low signal/noise ratio.
   This can change.
I imagine a system where the signal/noise ratio
tends asymptotically to +oo.

>>: Francois-Rene Rideau <rideau@ens.fr>
>: alaric@abwillms.demon.co.uk (Alaric B. Williams)

>>   I see absolutely NO REASON
>> why this would have to be done in hardware,
>> without any software cooperation,
>> through MMU-based memory protection
>> and interrupt-driven task preemption.
>
> I'd phrase it like so: MMUs check software for reliability the
> /emperical/ way; let it out to play, on the end of a rope; if it falls
> down a hole, we can haul it out.
>
Yes, and that would be fine if tasks did not have to communicate
and rely on each other.
But no task can be useful if it is thus isolated!
The utility of a task is proportional to the knowledge
it can exchange with the rest of the system;
hence an MMU is not very useful, because it only
manages gross approximations to security concerns,
which can be handled better in software,
given proper compiler support
(of course, MMUs are a good fit for current unsafe compiler technology).

> It's pretty obviously much nicer to analyse it statically before
> executing it, so we can say "It will not crash unless my checker is
> broken". This approach will detect any crashability, whereas the MMU
> approach detects crashability only when it crashes.
>
In the case of objects that rightfully interact,
the MMU just CANNOT detect anything wrong in the interaction.
Program proofs CAN.

Imagine that you have a routine
that can display objects of format XXX very fast,
but that would do nasty things if given bad input.
Imagine that fully checking the format would be dead slow
(imagine an NP-complete checking problem).
   Well, on current unsafe systems,
you'd either check and be **slow**
or not check and have a security hole.
   With Tunes, you could add a system-enforced prerequisite
that input to the routine be of the right format,
and the system would ensure that only objects of the right format
are ever given to the routine
(which could involve the full slow check when no other proof is found).
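
To make this concrete, here is a rough sketch in OCaml of how such a
prerequisite can be carried by the type system: the fast routine only
accepts values whose very type certifies the format, and that type can
only be produced by the slow checker or by a producer already proved
correct. All the names below are mine, for illustration only.

(* Sketch only: a hypothetical "validated XXX input" whose type can only
   be produced by the slow checker or by a producer proved correct.
   The fast display routine then cannot be handed unchecked data. *)
module Xxx : sig
  type raw = string                  (* arbitrary incoming bytes *)
  type valid                         (* abstract: "the format holds" *)
  val check : raw -> valid option    (* the slow, possibly NP-hard check *)
  val trusted : raw -> valid         (* for producers proved to emit valid XXX *)
  val display_fast : valid -> unit   (* safe: precondition carried by the type *)
end = struct
  type raw = string
  type valid = string
  let check s = if String.length s > 0 then Some s else None  (* stand-in check *)
  let trusted s = s
  let display_fast s = print_endline s
end

The check then happens once, at the boundary, instead of being either
repeated everywhere or skipped altogether.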

Imagine that some operation on an object must be atomic wrt some other.
   In current unsafe systems, you have to rely on a conventional lock system,
which cannot exclude deadlocks.
   In Tunes, some resources could have a prerequisite
that they cannot be locked by objects
not proved to release the lock in time.
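
I can't show the timing proof itself in a few lines, but here is a hedged
OCaml sketch of the structural half of such a prerequisite: a lock that can
only be taken through a combinator that guarantees its release, even on
exceptions (Mutex is the one from OCaml's threads library; Tunes would
additionally demand a bound on how long the critical section may run):

(* Sketch: the lock is only reachable through with_lock, which always
   releases it, whether the critical section returns or raises. *)
let with_lock (m : Mutex.t) (critical : unit -> 'a) : 'a =
  Mutex.lock m;
  (try
     let result = critical () in
     Mutex.unlock m;
     result
   with e -> Mutex.unlock m; raise e)

Usage: with_lock m (fun () -> ...). There is then simply no way to acquire
the lock without also promising, structurally, to release it.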

> And there's nothing wrong with interrupt driven preemption IMHO;
>
Mind that I have nothing against interrupts, which are good,
but against uncooperative preemption,
where the preempted task can't restore any invariant
needed for graceful intertask synchronization,
so that no task can trust any invariant to be implicitly preserved,
and all synchronization must therefore be expensively achieved
through explicit locking.
This makes, say, concurrent GC something very nasty and inefficient,
whereas it could be simple and efficient.
   In Tunes, the interrupted task would basically be told
"now do your best to yield execution ASAP,
after putting the system in a rightful state".
The task would exit the current loop/iteration,
restore invariants needed for concurrent GC/whatever to take place,
then call the scheduler for next task.
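
As a rough sketch of that discipline (in OCaml, with purely illustrative
names; a real handler would of course set the flag atomically):

(* Sketch of cooperative, interrupt-toggled scheduling: the interrupt
   handler merely raises a flag; the running task polls it at points
   where its invariants hold, and only then hands over execution. *)
let yield_requested = ref false

let on_timer_interrupt () =      (* installed as the interrupt handler *)
  yield_requested := true

let safe_point ~(restore_invariants : unit -> unit) ~(yield : unit -> unit) =
  if !yield_requested then begin
    yield_requested := false;
    restore_invariants ();       (* e.g. make the heap scannable for GC *)
    yield ()                     (* cooperatively call the scheduler *)
  end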

> It's an implementation choice. I'm experimenting with inserting frequent
> yields at compile time purely because it avoids some of the more
> awkward side effects of preemption, such as writing hardware interrupt
> handlers! It will bloat the object code with all those possible
> yields; they'd better be inlined in the usual NOP case, rather than a
> procedure call instruction every time we go through the inner loop!
>
Inserting yields shouldn't bloat the object code,
since they will be compressed out of disk-stored versions of the object code,
and only inserted at execution time, when they can indeed be inlined!
Plus all the bad-case checking made unnecessary by cooperation
will help win back the space lost to those yield()s.

> In the case of a tight loop, certainly, interrupt driven tasking is
> faster and smaller.
>
Sure. That's why interrupt-based scheduling will be used,
with an "interrupt recovery" routine to restore a rightful
state from the interrupted state.
[Just a side note:
Linus Torvalds independently discovered
a particular case of this technique,
which the Linux kernel uses
to recover from page faults during user-memory access!].
   This is not preemptive interrupt-driven scheduling,
but cooperative interrupt-toggled scheduling.


>>Obvious reason: the hardware can only check
>>a statically bounded small number of cases (finite expressivity)
>>while bugs/attacks come in unboundedly many cases (infinite expressivity).
>
> One way of looking at it...
>
This argument just cannot be broken.
It can even be refined,
in that the bound to hardware checking expressivity
is *severely* restricted by the hardware size,
whereas the expressivity needed to deal with bugs/attacks
grows hyperexponentially with software complexity.
Hence the argument does not give just a theoretical limit,
but a practical one.

>>   Hardware implementation only slows down the machines
>
> Not necessarily! The 486 can, I seem to remember, perform a push in
> one clock cycle. Are you saying that if it didn't check the stack
> wasn't overflowing, it could do it in 0? :-)
>
No. I'm saying that if you built, with the same technology,
a CPU that didn't check for overflow, that CPU would run twice as fast.
So by using the 486,
you pay the price for hardware bloat that slows down computation
[but you gain the famous "compatibility"].
   Of course, if you already have a 486,
then you pay the price for checking anyway,
so let's try to take advantage of checking,
even though what you win back from it
will be less than what it cost you in the first place.


>>and makes software more complicated
>>to follow hardware bloatedness,
>
> Trap handlers in the OS don't take up much space, and I don't expect a
> runtime compiler to be a small piece of software.
>
Trap handlers in the OS do take up space.
More than the total size of an embedded application.
Then, the problem is not just the price of memory,
but also the fact that the more bloated the software/hardware,
the more difficult it is to prove correct for reliability.
Whereas it is possible to prove that a cleanly-written
sort routine actually works as documented,
program proof is just not feasible when software is bloated.

> A QNX kernel with hardware memory protection is 8k. An ARGON kernel
> with a bytecode compiler in it will probably be more like 200k.
>
The whole of an embedded system could fit in the same 8K, not just the kernel.
And of course, there is no need for any compiler to fit into a kernel,
or indeed for a kernel at all.

>>   I'm confident that using clusters of cheapo MISC chips in parallel,
>>programming them with a proof-friendly high-level language
>>(like Clean or OCAML, unlike C/C++),
>>will appear as the obvious solution,
>>all the more when MISC can go farther than anything else in raw speed.
>
> Really? How do they do that? If that stands for Minimal Instruction
> Set, then prepare for even more object code bloat if we have to
> produce four or so 32-bit instructions for something a 486 could do in
> a couple of bytes (AAM comes to mind...)
>
> It could probably execute those four or so instructions in less time
> than my 486 can do an AAM, but it still bloats it.
>
Minimal Instruction Set, indeed!
As for bloat, I see no problem with the size of speed-critical sections,
as long as they run blazingly fast!
For other sections, where code density counts,
compression should be achieved by threading methods,
and MISC computers allow *very* compact and efficient threaded code,
with calls that fit in 15 bits!
You just can't beat that on an Intel or a RISC!!!
   And you just can't say that MISC is slower than CISC
just because it doesn't emulate CISC as fast as CISC runs!!!
You should compare the size/speed of whole chunks of equivalent code,
between systems of comparable technology/price.
Hence, for the price of a single PPro system,
you should compare it against a cluster of tens of MISC units in parallel!
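
To show what I mean by threaded code, here is a toy token-threaded
interpreter in OCaml (the opcodes and stack discipline are illustrative):
a "call" is just a small index into a table of routines, which is exactly
what makes such code so dense on a MISC stack machine:

(* Sketch of token threading: a program is an array of small tokens,
   each an index into a table of primitives operating on a stack. *)
let primitives : (int list -> int list) array = [|
  (fun st -> match st with a :: b :: r -> (a + b) :: r | _ -> st);  (* 0: add *)
  (fun st -> match st with a :: r -> (a * a) :: r | _ -> st);       (* 1: square *)
  (fun st -> match st with a :: r -> print_int a; print_newline (); r
           | _ -> st)                                               (* 2: print *)
|]

let run (program : int array) (stack : int list) : int list =
  Array.fold_left (fun st token -> primitives.(token) st) stack program

(* Example: square the top of the stack, add, print (prints 13): *)
let () = ignore (run [| 1; 0; 2 |] [3; 4])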


>>*My* project is to prove, following the trail shown by Turing himself,
>>that the key of better computer systems lies
>>not in ever-more bloated hardware,
>>but in better software design.
>>I'd like the whole system to be built that way,
>>but I first need build the development tools.
>
> Hmmm. I would beg to disagree; I interpret that as the RISC-style
> argument that we should have real simple hardware being driven by
> software. The software has a small set of operations from which it
> builds everything else up.
>
Exactly! Don't have bloated hardware that costs a lot,
can't adapt, and does things slowly,
but have minimal hardware that does just what you need
at just the right price, unimpeded speedwise by bloat,
then do everything in software!
Actually, even before RISC was born,
this was the argument for software against hardware,
for digital computers against analog computers, etc.!
Now it demands that we use clusters of MISC chips in parallel
instead of single RISC computers emulating a Von Neumann architecture
with ever more bloat.

> Well, I think that's a fine place to start at. Sure, use a RISC CPU.
Nope. Use a *MISC* CPU. Clusters of them if you need power,
though one is enough as a cost-effective solution.

> But there's a wide range of hardware devices that can take the load
> off of software. There are chips that can do FFTs in the blink of a
> clock tick. If we wrote an FFT application for this RISC chip, it
> would contain a loop to do the FFT in terms of teeny operations.
> How
> can the compiler see that an FFT is being done, and make it use the
> hardware instead? With Difficulty! If we had FFT instructions in our
> virtual instruction set, then on normal CPUs, those would be replaced
> with calls to an FFT library. On our megazap FFT-in-hardware system,
> they would use the chip!
>
Surely *if you have an FFT-capable chip*,
then you should use it, and if you don't, then you can't!
   This only proves
that efficient code generation should be machine-specific,
and that a really efficient portable language
should not be low-level, but high-level,
so that it transports enough information for such
high-level optimizations to take place!
   Again, if you provide any finite virtual machine with zillions
of specific instructions like FFT,
then you're putting a virtual BISC (Bloated Instruction Set Computer)
as an abstraction layer over your computer!
I already proved that this can't be a solution:
hardware, however bloated,
can only solve problems simpler than itself,
while actual problems need unbounded expressivity.
   FFT should NOT be in any instruction set.
Instead, it should be in compiler libraries, where it belongs.
Of course, the version of the library written for the FFT-capable
hardware *will* take advantage of the hardware!
And don't get me wrong: when I say library,
I don't mean ground-level libraries
as in current closed compiler technology,
but meta(n)-level libraries that reflectively include
compilation/optimization tactics, etc.
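
A hedged sketch of what "FFT in the library, not in the instruction set"
could look like, in OCaml; fft_hw_available and fft_on_chip are purely
hypothetical names standing for the machine-specific part:

(* Sketch: FFT lives in a library.  A portable reference implementation
   is installed by default; a machine-specific build (or a compilation
   tactic) may overwrite the hook with one that drives an FFT chip. *)
let fft_reference (signal : Complex.t array) : Complex.t array =
  (* placeholder for a plain software FFT *)
  Array.copy signal

let fft : (Complex.t array -> Complex.t array) ref = ref fft_reference

(* On a machine with the accelerator, initialization code would do:
     if fft_hw_available () then fft := fft_on_chip
   and every caller of !fft transparently gets the fast version. *)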

> So, yes, there should be a minimal operation list available. If a
> piece of software is being prepared for a system with no stunning
> hardware accelerators, then those special operations it uses can be
> described in terms of basic ones. A very simple instruction set can be
> used to talk to the CPU, but a more expressive format is needed to
> represent semantics above the kernel level. The conversion from small,
> expressive bytecode to bloated, simple, object code will be nearly
> irreversible!
>
Yes yes yes. Object code should be irreversibly produced for a particular
target, while a *high-level* language of maximal possible expressivity
should be used to represent user-level and/or portable objects.

>>MISC CPUs are so small that you could put one per memory chip,
>>instead of using levels of cache memory between one active chip
>>and many passive ones.
> 
> Reminds me of a design for the "perfect" ARGON workstation - one CPU
> per object, each with the RAM that object uses! :-)
>
Nope. Because a good system's software objects
will dynamically adapt to user interaction at a blazingly fast rate,
whereas the hardware evolves only very slowly.
Hence it is not feasible to buy and install a new CPU
for every single object being allocated.


>>Plus no one says that space/speed critical code (e.g. device drivers)
>>should forcibly be written in those languages,
>>or that no proof-capable-and-requiring low-level language
>>could be provided to the user to write the 1% optimized code/metacode
>>when the compiler doesn't get something right.
> 
> Hmmm. Modern compilers (such as gcc) can write better assembler than
> most humans - except for such chestnuts as C's semantics being too
> lightweight; there's no rotate instruction in C, and the compiler
> cannot isolate pieces of code that are equivelant to rotates well
> enough to use hardware ROT instructions.
>
This is no problem. See how linux/include/asm*/*.h does it. Neat!
Actually, the whole of a compiler could be written like that,
if given a more expressive tactic language
than plain non-recursive cccp macros.

> A language like Eiffel or maybe Ada is well suited for high and low
> level coding.
>
Eiffel and Ada *suck*. Use functional languages instead:
CLOS, OCAML, Clean, etc.

>>people end up with using lots of special-purpose languages
>>that communicate unsafely through unstructured text/binary streams.
>
> Yeah :-(
>

>>   This means the end of the need to explicitly generate/parse,
>>check/blindly-trust and convert/ignore file formats,
>>which eats most CPU time,
> 
> But there need to be agreed formats!
>
Yes, but agreement should be found within the system
without the casual user having to know about it!
Besides not requiring the user to care about such format noise,
the system could consistently manage distributed objects,
instead of just unsafely providing local representations for them
without any consistency guarantee.

> Say we have a picture object on our system (A), that stores a picture
> as an array of Yu'v' colour triplets. An interface-compatible picture
> on system B would use RGB colour. This is because A has a Yu'v'
> graphics card, and it wants to be able to directly blit pictures to
> it's display buffer. B uses a more standard 24-bit RGB VGA card.
> 
> How do we transfer a copy of A's picture to B?
>
*We* don't. The computer system does it silently thanks
to its RGB->YUV tactic.

> If A and B used the same internal representation of pictures, then we
> could transparently transfer that array of integers. But they don't. 
>
> Converting Yu'v' to RGB will induce a loss of information, for a 
> start!
>
Only if the converted version is meant as a replacement for the RGB version!
On a Good System (TM), the YUV version will only be a cache for displaying
the original RGB object on YUV systems. The system knows that
the original object, from which the information is taken, is the RGB thingy.
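
A minimal sketch of that cache discipline, in OCaml, with a placeholder
conversion standing in for the real RGB->YUV tactic:

(* Sketch: the RGB original is the object; the Yu'v' form is derived
   lazily for display and can always be thrown away and recomputed. *)
type rgb = { r : float; g : float; b : float }
type yuv = { y : float; u : float; v : float }

let rgb_to_yuv (_ : rgb array) : yuv array = [||]   (* placeholder conversion *)

type picture = {
  original : rgb array;                 (* authoritative data *)
  mutable yuv_cache : yuv array option  (* derived, lossy, rebuildable *)
}

let for_yuv_display (p : picture) : yuv array =
  match p.yuv_cache with
  | Some c -> c
  | None -> let c = rgb_to_yuv p.original in p.yuv_cache <- Some c; c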

>>   This means that objects can freely communicate,
>>that any service is made universally available
>>and programmable/controllable, etc.
>
> Locationally transparent objects, yes.
>
Note that I did not only say universally available,
which is your locational transparency:
I also said universally programmable,
which is much more. This means that there should
never be the need to develop a new (crippled) language
to control any object, but that the system language
can reflectively integrate all the needed object-specific features,
which are not just ground-level driving routines,
but also control structures, optimization tactics,
and all that makes C++ a poor choice to use as a native
language to generically manipulate optimized matrices.

>>Interface note:
>
>>   Instead of generating meaningless code stubs from meaningless interfaces,
>>we'd rather generate full interfaces (not stubs) from meaningful code.
>>Because properly annotated code is meaningful, this is possible,
>>while the inverse isn't.
>
> I take it you're saying that the interface shouldn't be written into
> the meat of the code, right? As in we should be dealing with:
> 
> class Complex {
>    void add(Complex &c);
>    void mul(Complex &c);
> };
>
> rather than
>
> class Complex {
>    void process_command_string(char *text);
>    char *render();
> }
>
Right.

> The latter class can be communicated to by the user at runtime,
> whereas the former class can be communicated to by programmers! The
> first case defines no user interface at all, whereas the second case
> would force the use of code like:
> 
> Complex a,b;
> a.process_command_string("Please add to thyself the complex number " +
> b.render());
>

Now you also know why /bin/*sh, m4, TCL, some C/C++ packages
and all other such stuff suck:
they use dirty backdoor evaluation as a hack to compensate
for their lack of recursive or reflective expressiveness.


>>   Different Interface generators could automatically generate
>>interfaces for the MacOS, Windows, X/Motif, X/Xt, X/Athena,
>>raw VGA, text graphic or raw terminal lovers/users.
>>And why restrict to vision-based interfaces?
>>sound-based, touch-based, (smell-based?) interfaces
>>could be automatically generated, too.
>
> How do you intend to generalise the definition of interfaces like
> that?
>
Do you want a formal definition for an interface?
* a channel is a structure with which one object can send data to another;
 it need not be a stream of raw binary data.
 For the object sending the data, the channel is output;
 for the object receiving data, the channel is input.
* a *display* function for type T on output O
 is the data of a function f from T to O,
 and a reconstruction function g from O to T,
 such that someone observing the output
 of f applied to an object of type T
 can reconstruct the object thanks to g
 (see the small sketch after these definitions).
 Example: 123896 is an integer being displayed;
 you can reconstruct the meaning of the integer from this string output.
 Example: outputting a 1000-line text on a tty without scrolling
 is displaying the text. Because of the time needed to read things,
 one may consider that an acceptable text display should
 pause before scrolling forward.
* for any input I and output O connected to a same external object,
 the couple (I,O) is named terminal I/O.
 Examples: I=keyboard, O=text array display;
 or I=keyboard&mouse, O=graphic display;
 or I=a few buttons, O=loud-speakers;
 or I=braille input, O=braille output;
 or I=voice analyzer, O=voice synthesizer.
* a *reversible interaction* on a terminal I/O
 is a function from I to O with internal state S,
 such that for any two states s1,s2,
 there exist input sequences t1 and t2 to go from
 state s1 to s2 and from state s2 to state s1 respectively.
 Note: an I/O-interface is *not* a particular O-display.
* An *interface* function for type T on terminal I/O
 is a display function from type T
 to reversibly interacting objects on terminal I/O.
 Example: piping a 1000-line text into /usr/bin/less, which allows one
 to scroll up and down, is interfacing the text;
 piping it to /bin/more isn't.
 Example: broadcasting a song on radio is displaying the song,
 but not interfacing it;
 putting it on a tape that one can rewind is interfacing it.

You can add criteria on the above objects
for the computability/acceptability/complexity/whatever
of displays and interfaces.
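
As promised above, here is the display pair written down as a tiny OCaml
datatype; the names and the one-sample test are mine, and a real system
would of course demand a proof of the round-trip law, not a test:

(* Sketch: a display for type 't on output 'o is a pair (f, g)
   with g (f x) = x. *)
type ('t, 'o) display = {
  f : 't -> 'o;        (* show the object on the output *)
  g : 'o -> 't;        (* reconstruct the object from what was shown *)
}

let faithful (d : ('t, 'o) display) (x : 't) : bool = d.g (d.f x) = x

let int_display : (int, string) display =
  { f = string_of_int; g = int_of_string }
let () = assert (faithful int_display 123896)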

> In the case of the first Complex class, how can we tell the interface
> generator that humans like complex numbers written in "a+ib" format?
>
By giving a display tactic for complex numbers,
based on a reversible definition of a representation of complex
numbers in the form a+ib, which could also be used to
generate an input tactic...
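
For instance, here is a hedged OCaml sketch of such a reversible a+ib
textual representation (the exact format details are mine):

(* Sketch of a reversible "a+ib" display: an (f, g) pair in the sense
   defined earlier, here rendered as print/parse on strings. *)
let show_complex (c : Complex.t) : string =
  Printf.sprintf "%g+i%g" c.Complex.re c.Complex.im

let read_complex (s : string) : Complex.t =
  Scanf.sscanf s "%g+i%g" (fun re im -> { Complex.re = re; im = im })

(* read_complex (show_complex { Complex.re = 1.5; im = -2.0 }) recovers
   the number; the same reversible definition could drive an input tactic
   on other terminals (voice, picking a point on an Argand diagram, ...). *)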

> We have to think of the case of textual representation to do that. We
> would also need to think of voice-interface users too; they may prefer
> a shorthand format to saying "Plus Eye" between the two numbers. And
> how could the system know that complex numbers can be represented as
> cartesian points on an Argand diagram without actually telling it that
> (which is, surely, defining a 2D user interface)?
>
Surely, any knowledge in the computer must have been put inside by humans!
This is not what I pretend to change. No one can.
What I'd want to change is the fact that currently specific interfaces
are hardwired into object code,
whereas I want generic interfaces to be defined apart from it,
by a system of annotations to code and tactics to dynamically
generate interfaces.

>>   Of course, interface generators could be parametrized,
>>tweaked, etc, to most suit the users' needs, tastes, and habits.
>>   It's easy to give an outer shape once you have the inner meaning;
>>it's impossible to guess the inner meaning from the outer shape.
> 
> Hmmm.
>
>>   The Tunes project aims at showing that such a system is possible,
>>by implementing one as a proof-of-concept.
> 
> How will you implement this interface generation thingy?
>
By reflective epsilon-calculus:
* Code can contain epsilon-constructs
 that are "holes" with respect to constructive foundation of the objects;
* Those epsilons behave much like the hilbert epsilon-symbol:
 they denote an object that fulfills a given property,
 or bust if no such object exists.
* Of course, the system cannot magically replace epsilons
 by rightful objects when the property constraint is too expressive
 to be solved automatically. So it will use tactics to try to fill epsilons
 (see the sketch after this list).
* Tactics are reflectively programmable by programmers-users,
 which means anyone can modify them for use on one's own objects.
* The standard fall-back tactic when trying to fill a hole
 is to ask the user for a value to give.
 Another kind of readily available tactic to use instead as a fallback
 is to throw an exception.
* Tactics to fill epsilons may themselves recursively include epsilons.
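
Here is a rough OCaml sketch of that fall-back discipline (everything
below is illustrative vocabulary, not the actual Tunes calculus):

(* Sketch: an "epsilon" is a hole filled by trying tactics in order;
   the standard fall-backs are to ask the user, or to raise. *)
exception Unfilled_epsilon of string

type 'a tactic = unit -> 'a option

let ask_user (prompt : string) (parse : string -> 'a option) : 'a tactic =
  fun () -> print_string (prompt ^ " "); flush stdout; parse (read_line ())

let epsilon ?(name = "epsilon")
            (satisfies : 'a -> bool) (tactics : 'a tactic list) : 'a =
  let try_one acc t = match acc with
    | Some _ -> acc
    | None -> (match t () with Some v when satisfies v -> Some v | _ -> None)
  in
  match List.fold_left try_one None tactics with
  | Some v -> v
  | None -> raise (Unfilled_epsilon name)

(* e.g. epsilon (fun n -> n mod 2 = 0)
          [ (fun () -> Some 4); ask_user "even number?" int_of_string_opt ] *)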


>>I could also talk about other things that traditional designs do ill,
>>like building a wall between users and programmers;
>
> I'd prefer a soft squishy layer to a wall, but I don't think we want
> to completely remove the distinction until either everyone is educated
> enough to program, or programming becomes so easy (due to natural
> language interfaces, "Make me a program that can factorise large
> primes"), that there need be no distinction; no need to put in
> restraints to stop four year old children playing games on Daddy's
> computer from wandering into the living heart of the operating system
> and dragging the hard disk into the wastebasket...
>
The wall I'm talking about is that which forces me
to wait for Monkeysoft(R) to integrate the feature I need
into Worse(TM) version 354.3226,
or to rewrite everything from scratch myself with just my feature added.
Only very proficient and rich people can afford serious software
development on a closed system. An open system allows anyone to contribute.
   At a smaller scale,
the same holds in open systems where a static language is used:
in a C program, contributions involve recompilation of the whole program,
which is slow, which makes contributions very dependent on each other,
which prevents easy sharing of data across programs, etc.
   Finally, the fact that different languages are given to
"programmers" and "end-users", each being crippled in many ways,
constitutes a barrier between them.
   All these are barriers that prevent people from contributing
significant improvements unless they have lots of time/money/proficiency
to invest.
   As for security concerns, crippling interaction languages ain't
a reliable way to protect anything: on a Macintosh, icons will never stay
in place, because there will always be someone who will, voluntarily or not,
drag an icon onto another one, possibly the garbage can. The solution
is not in crippling, but in extending the language so as to allow
explicit restrictions, such as "the icon must stay where I put it",
or "only people with the right password can modify the heart of the system".

== Fare' -- rideau@ens.fr -- Franc,ois-Rene' Rideau -- DDa(.ng-Vu~ Ba^n ==
Join the TUNES project for a computing system based on computing freedom !
                TUNES is a Useful, Not Expedient System
URL: "http://www.eleves.ens.fr:8080/home/rideau/Tunes/"