"no kernel" operating system design
Emmanuel Marty
core@mirus.fr
Mon, 30 Mar 1998 02:22:29 +0200
<bigger>"No Kernel" operating system design
Emmanuel Marty <core@mirus.fr>
With major contributions from Faré Rideau <rideau@clipper.ens.fr>
Sunday, March 29, 1998
The main purpose of an operating system is to arbitrate and abstract the
usage of system resources between all applications. Resources include,
but aren't restricted to, CPU, memory, disks, display, keyboard, and
other storage, input and output devices.
The operating system puts the machine into a state suitable for
applications, and provides them with function calls that allow them to
use the underlying hardware without worrying about how it works
in-depth.
There are two widely accepted ways to achieve this:
1) The "monolithic" kernel approach
All code responsible for booting the system, accessing the hardware
(device drivers) and easing access to it both for the programmer and user
(like file systems) is stuck together in a big kernel. This is the design
used by Linux for example.
The kernel image is easy to boot (just load the binary image and jump
into it) and calls between two kernel components (the file system and the
underlying disk device driver, for example) are fast. It is easy to
program too, since all of your code is part of the same program.
However, such an approach "freezes" the operating system design: it is
designed for the present hardware situation, not for the future.
Adding or modifying a device driver requires recompiling the whole
kernel and rebooting the system into the new one, possibly resulting in a
machine lockup if the new kernel isn't working. This makes writing new
device drivers much harder. In addition, you need the kernel source code
to be able to recompile it; if it isn't available, you just lose memory
to unused drivers.
You can later add support for kernel modules that are dynamically
loaded and inserted into the kernel, but you can't compile everything
this way, since you still require a monolithic kernel with enough drivers
to boot the system, allocate memory, read the module off the filesystem
on the disk, and link it dynamically.
In short, this approach is conceptually bad, isn't scalable, and freezes
the design into the past. Adding a brand new class of hardware requires
writing management support for it in the kernel.
2) The "microkernel" approach
A microkernel only manages vital low-level functions such as I/O,
interrupt and DMA channel allocation, and CPU management, and leaves the
management of the remaining resources to satellite "servers".
While this approach is conceptually much better than the monolithic one,
it still isn't a global design: CPUs, DMA channels, etc. are still
treated as special kinds of resources, with separate function calls to
manage them.
Communication between the microkernel components is done via function
calls while communication between the microkernel and the servers is done
via message passing.
Adding a brand new class of hardware peripherals still requires writing
management code for it inside the microkernel, unless you deal with a
"miscellaneous" class of devices where you stick all devices that
appeared on the market after the operating system was designed and that
cannot really fit in the existing frozen design.
A new approach: the "no-kernel" idea.
It has the following goals:
Implement an open, future-oriented resource abstraction
subsystem, to which different operating systems can be connected,
allowing native, Linux or Java applications, for example, to run on a
clean, fast, open environment.
Globally designed: making it possible to add new types of resources when
they appear, without the system layers above needing to be aware of them.
All system components communicate with each other the same way.
No 'miscellaneous' type of devices. New types are added and handled the
same way the others are.
Hotswappable device drivers: drivers can provide minimal functionality
and be replaced by a larger one on the fly, later in the boot process, if
needed.
Inheritance: a resource can base itself on another and extend its
functionality, just like deriving a C++ class.
Small memory footprint.
Speed. Both the algorithm and implementation are designed to be fast and
to impose little overhead on system resources. "core" device drivers for
CPUs, memory, buses, DMA, etc. have a portable, fast implementation in a
compiled language, and an architecture-specific, optimised assembler
version. Compilers are not (yet) very good at keeping the number of RAM
accesses low, and such accesses are very costly on modern architectures.
Scalable: non-vital features can be removed. This system can be burnt
into the 128 KB ROM of a PDA, boot a multiprocessor workstation, or drive
a cluster of servers.
Distributed: with a standard connection protocol between all system
components, it makes no difference whether they are local to the machine
or connected across a network.
The design and implementation really focus on a uniform, logical, and
globally designed way of handling resources and functionality, saving
code, memory, and programmer time, reducing overhead and increasing
speed.
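To make the "hotswappable device drivers" goal more concrete, here is a minimal sketch in C, assuming that clients hold a pointer to a connection slot rather than to the driver itself. The names `display_ops`, `display_slot` and `slot_swap`, and the two width values, are hypothetical and only serve the illustration; the sketch ignores concurrency and the transfer of driver state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "display" functionality, expressed as an operations table. */
struct display_ops {
    const char *driver;
    int (*max_width)(void);
};

/* Illustrative values only. */
static int vga_width(void)      { return 640; }
static int mystique_width(void) { return 1600; }

const struct display_ops vga_ops      = { "vga", vga_width };
const struct display_ops mystique_ops = { "matrox mystique", mystique_width };

/* A connection slot owned by the dependency manager. Clients keep a pointer
 * to the slot, never to the driver, so the driver can be replaced. */
struct display_slot {
    const struct display_ops *ops;
};

/* Replace the driver behind a slot and return the previous one. */
const struct display_ops *slot_swap(struct display_slot *s,
                                    const struct display_ops *next)
{
    const struct display_ops *prev;

    assert(s != NULL && next != NULL);
    prev = s->ops;
    s->ops = next;
    return prev;
}

/* Client-side call: always goes through the slot. */
int display_max_width(const struct display_slot *s)
{
    return s->ops->max_width();
}
```

A console connected to a slot that starts with `vga_ops` keeps calling `display_max_width` unchanged after `slot_swap` installs `mystique_ops`.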
Design:
The whole system is built as independent modules. Every module describes
the external functionality it needs in order to execute. These are not
to be thought of as Linux-like "kernel modules". They are separate binary
entities, without symbolic linking between them, that request and provide
functionality through an interface common to all.
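As an illustration of a module that "describes the functionality it needs", the following C sketch shows one possible descriptor. The layout and the names `module_desc`, `module_provides` and `module_requires` are assumptions; this text does not define a concrete binary interface. The sample data is taken from the PCI bus entry in appendix 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: NULL-terminated lists of functionality names. */
struct module_desc {
    const char *name;
    const char *const *provides;
    const char *const *requires;
};

/* Return 1 if `name` appears in the NULL-terminated `list`, else 0. */
static int list_contains(const char *const *list, const char *name)
{
    assert(list != NULL && name != NULL);
    for (; *list != NULL; list++)
        if (strcmp(*list, name) == 0)
            return 1;
    return 0;
}

int module_provides(const struct module_desc *m, const char *what)
{
    return list_contains(m->provides, what);
}

int module_requires(const struct module_desc *m, const char *what)
{
    return list_contains(m->requires, what);
}

/* Example entry modelled on the "pci bus" module of appendix 1. */
static const char *const pci_provides[] = { "bus", NULL };
static const char *const pci_requires[] = { "i/o", "irq", "memory", NULL };

const struct module_desc pci_bus = { "pci bus", pci_provides, pci_requires };
```

Functionality is matched by name only, so nothing in the descriptor ties a module to a particular provider.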
The system boots into the "dependency manager" ("hub"), which will
try to load the high-level "operating system" module (which eventually
will come up with the user interface); before it is able to do that, it
will have to load the modules that provide all functionality required by
it: filesystem, which needs disk, which needs bus, DMA, memory and IRQ
management; console, which needs display board, which needs bus, DMA,
memory and IRQ, and keyboard, which needs the same; etc. When all those
driver modules are connected and have initialised the resources they
manage, the system will be up and running with the minimal resources
needed. Additional resources wanted by the user can be inserted as part of
the system boot sequence, and might in turn load others on which they
depend.
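The resolution described above can be sketched as a depth-first walk over the requirements. This is only an illustration under simplifying assumptions: each module provides a single functionality, the first provider found is always chosen, and the dependency graph has no cycles. The names `hub_require`, `load_log` and `example_mods` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REQ 4

/* Hypothetical table entry: one provided functionality and up to MAX_REQ
 * required ones (NULL-terminated). */
struct module {
    const char *name;
    const char *provides;
    const char *requires[MAX_REQ + 1];
    int loaded;
};

/* Records the order in which modules finished loading. */
struct load_log {
    const char *names[16];
    size_t count;
};

/* Load the first module providing `what`, after loading everything it
 * requires. Returns 0 on success, -1 if no provider exists. */
int hub_require(struct module *mods, size_t n, const char *what,
                struct load_log *log)
{
    assert(mods != NULL && what != NULL && log != NULL);
    for (size_t i = 0; i < n; i++) {
        struct module *m = &mods[i];

        if (strcmp(m->provides, what) != 0)
            continue;
        if (m->loaded)
            return 0;                       /* already connected */
        for (size_t r = 0; m->requires[r] != NULL; r++)
            if (hub_require(mods, n, m->requires[r], log) != 0)
                return -1;
        m->loaded = 1;
        if (log->count < sizeof log->names / sizeof log->names[0])
            log->names[log->count++] = m->name;
        return 0;
    }
    return -1;
}

/* Sample table, with names borrowed from the appendices. */
struct module example_mods[] = {
    { "ext2",               "filesystem", { "disk", NULL },          0 },
    { "ide disk",           "disk",       { "bus", "memory", NULL }, 0 },
    { "motherboard bus",    "bus",        { "memory", NULL },        0 },
    { "motherboard memory", "memory",     { NULL },                  0 },
};
```

Requesting "filesystem" on this table initialises the memory module first and "ext2" last; a real hub would also have to try alternative providers when the first one fails to initialise.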
The generic and global driver interface allows a console module driver to
use interchangeably a keyboard on the motherboard bus, one on the USB bus,
or even one on another machine over the internet, locally relayed by a
module collecting scan codes over the network but exporting the
"keyboard" functionality just like a local keyboard driver module. One
can write a driver module for equipment for disabled people and export
the "keyboard" functionality as well, allowing it to be used with the
regular console.
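A possible shape for such a "keyboard" functionality in C is a small table of function pointers, so that the console never sees which provider is behind it. Everything named here (`struct keyboard`, `local_keyboard`, `relay_keyboard`, `console_drain`) is hypothetical, and the network transfer of the relay is replaced by a single-slot buffer to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "keyboard" functionality: one entry point returning the
 * next scan code, or -1 when none is pending. */
struct keyboard {
    int (*read_scancode)(void *state);
    void *state;
};

/* Provider 1: a local keyboard, modelled as a fixed queue of scan codes. */
struct local_kbd {
    const unsigned char *codes;
    size_t len, pos;
};

static int local_read(void *state)
{
    struct local_kbd *k = state;
    return k->pos < k->len ? k->codes[k->pos++] : -1;
}

/* Provider 2: a relay fed from another machine; the network side is out
 * of scope and replaced here by a single pending code. */
struct relay_kbd {
    int pending;
};

static int relay_read(void *state)
{
    struct relay_kbd *k = state;
    int code = k->pending;
    k->pending = -1;
    return code;
}

struct keyboard local_keyboard(struct local_kbd *k)
{
    struct keyboard kb = { local_read, k };
    return kb;
}

struct keyboard relay_keyboard(struct relay_kbd *k)
{
    struct keyboard kb = { relay_read, k };
    return kb;
}

/* The console only sees `struct keyboard`; it cannot tell providers apart.
 * Copies up to `max` pending scan codes into `out`, returns the count. */
size_t console_drain(const struct keyboard *kb, int *out, size_t max)
{
    size_t n = 0;
    int code;

    assert(kb != NULL && kb->read_scancode != NULL);
    while (n < max && (code = kb->read_scancode(kb->state)) >= 0)
        out[n++] = code;
    return n;
}
```

A driver for special input equipment would only need to supply its own `read_scancode` function to be usable by the same console.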
When the console needs a "display" functionality, the dependency
manager will find none at that point of the boot process. It will try to
locate an unloaded module that exports this functionality and will find one
for a PCI display adapter, which will in turn trigger bus initialisation
and probing of the PCI bus to find the board; the dependency manager
will then connect the console to the display driver module.
The dependency manager itself provides the "locate module"
functionality for the initial bootup, being able to locate module binary
images stored next to itself, for example; when a filesystem module has
been initialised, it can provide that functionality as well, allowing
modules to be loaded from files on the disk. When a network protocol
stack is up, it can provide that functionality too, allowing driver
modules to be downloaded from the network, in the case of a diskless
workstation for example. In all cases, it never makes a difference to the
dependency manager where the modules come from, just as it makes no
difference to the console what device is providing the keyboard scan
codes.
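One way to picture several providers of the "locate module" functionality is a list that the dependency manager walks in registration order. The sketch below assumes such a list; `struct locator`, `locator_register`, `hub_locate` and the two stand-in providers are hypothetical, and module images are represented by placeholder strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "locate module" provider: maps a functionality name to a
 * module image, or returns NULL when it has none. */
struct locator {
    const char *source;                        /* e.g. "boot image" */
    const void *(*locate)(const char *functionality);
    struct locator *next;                      /* next registered provider */
};

static struct locator *locators;               /* head of the provider list */

/* Append a provider, so that earlier registrations are asked first. */
void locator_register(struct locator *l)
{
    struct locator **p = &locators;

    assert(l != NULL && l->locate != NULL);
    while (*p != NULL)
        p = &(*p)->next;
    l->next = NULL;
    *p = l;
}

/* Ask every provider in turn; report which one answered via `source`. */
const void *hub_locate(const char *functionality, const char **source)
{
    for (struct locator *l = locators; l != NULL; l = l->next) {
        const void *image = l->locate(functionality);
        if (image != NULL) {
            if (source != NULL)
                *source = l->source;
            return image;
        }
    }
    return NULL;
}

/* Two stand-in providers with placeholder images. */
static const char vga_image[] = "vga image";
static const char mouse_image[] = "serial mouse image";

static const void *boot_locate(const char *f)
{
    return strcmp(f, "display") == 0 ? vga_image : NULL;
}

static const void *disk_locate(const char *f)
{
    return strcmp(f, "mouse") == 0 ? mouse_image : NULL;
}

struct locator boot_locator = { "boot image", boot_locate, NULL };
struct locator disk_locator = { "filesystem", disk_locate, NULL };
```

The caller of `hub_locate` gets an image either way; the `source` output exists only so that the example can show which provider answered.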
The dependency manager provides the "connect to module" functionality
for modules local to the system. However, a network protocol stack can
provide this functionality along with the "locate module" one, allowing
a local module to use functionality present in another system across the
network! This makes distributed computing trivial, and part of the
operating system.
It is for example trivial to design a module providing the "3D
rendering" functionality and depending on the "network stack"
functionality, actually connecting to a CPU farm across the network,
using another faster architecture but sharing the same module connection
protocol, and providing fast rendering to the local system. Such a device
can be hotswapped for local rendering without the other modules
that make use of that functionality needing to be aware of it.
A PDA can have a local module that provides the "connect to module"
functionality across a modem, connecting to a distant server that
provides the "filesystem" functionality.
When a memory shortage happens or if the operating system does garbage
collection, the dependency manager can unload all modules on which no
other depends anymore. It can for example get rid of all modules that are
only necessary during the boot process, once the system is up, freeing
memory which is usually lost in the monolithic approach.
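The unloading rule above amounts to reference counting. A minimal sketch, assuming each loaded module records how many other modules use it and which modules it is connected to; the `pinned` flag, the field names and `hub_collect` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 4

/* Hypothetical bookkeeping kept by the dependency manager. */
struct module {
    const char *name;
    int loaded;
    int pinned;                     /* e.g. the "operating system" module */
    int users;                      /* loaded modules that depend on it */
    struct module *deps[MAX_DEPS];  /* NULL-terminated if shorter */
};

/* Unload every unpinned module that nobody uses. Releasing a module may
 * leave its own dependencies unused, so repeat until nothing changes.
 * Returns the number of modules unloaded. */
size_t hub_collect(struct module **mods, size_t n)
{
    size_t freed = 0;
    int changed;

    assert(mods != NULL);
    do {
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            struct module *m = mods[i];

            if (!m->loaded || m->pinned || m->users > 0)
                continue;
            for (size_t d = 0; d < MAX_DEPS && m->deps[d] != NULL; d++)
                m->deps[d]->users--;
            m->loaded = 0;          /* memory would be released here */
            freed++;
            changed = 1;
        }
    } while (changed);
    return freed;
}
```

A boot-only module whose user count has dropped to zero is released on the first pass, while modules still used by the pinned operating system module stay loaded.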
This flexibility allows as many [anything] as the system supports:
several disks are common, but several keyboards, mice, sound devices,
displays, etc. too. And there is no need to look for a device
differently, depending on the bus it is on - a keyboard driver can be
opened by a console driver, without bothering to know if the keyboard is
connected to the motherboard, to the Universal Serial Bus, or to another
machine over a network, a thousand miles away.
Ideally, boot and core module code, and the dependency manager, are
written both in assembler, or the language with best performance on the
given architecture, and in a portable language, for easy porting to new
platforms, where that code can later be replaced by assembler.
Architecture-specific modules are written in assembler where feasible.
Most modules and the rest of the system are written in the most efficient
portable language for which a compiler is available on every supported
platform, namely C.
Ideally too, all driver messages would be grouped outside of them so that
they can easily be translated.
Modules do not mix data and code, so that they can be burnt in ROM. This
is just a question of using a "data" section for all variables. The
dependency manager must be able to relocate the modules anywhere in
memory.
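For modules loaded into RAM, relocation could be as simple as patching a list of address words with the load address. The sketch below assumes a hypothetical fixup table of byte offsets to 32-bit words; no module format is specified in this text, so the table layout and the name `module_relocate` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation step: `fixups` holds byte offsets of 32-bit
 * words in the image that contain addresses relative to the start of the
 * image. Adding the load address `base` makes them absolute. */
void module_relocate(uint8_t *image, size_t image_len,
                     const uint32_t *fixups, size_t nfixups, uint32_t base)
{
    assert(image != NULL && (fixups != NULL || nfixups == 0));
    for (size_t i = 0; i < nfixups; i++) {
        uint32_t word;

        /* Each fixup must lie entirely inside the image. */
        assert(image_len >= sizeof word &&
               fixups[i] <= image_len - sizeof word);
        memcpy(&word, image + fixups[i], sizeof word);  /* unaligned-safe */
        word += base;
        memcpy(image + fixups[i], &word, sizeof word);
    }
}
```

Keeping all such patched words in the data section is what would let the code section itself stay untouched.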
Their binary format does not vary whether they are stored next to the
dependency manager because they are required to start the system, loaded
from disk or across the network; they of course have to use the same
processor architecture though. If the system has a "bytecode
translator" function, we can imagine modules written in bytecode that
can be recompiled on the fly and be written once for all platforms.
The modules that make up the system manage hardware
(memory, network boards, disks...) and software (file systems, network
protocols...) resources alike, through a uniform interface.
Hardware resources that have to be managed in current computer systems
include:
CPUs (those that are identical to the bootstrap processor)
Central memory (DMA-able, processor-only, DSP memory...)
I/O regions
Interrupts
DMA channels
Buses (motherboard, PCI, VLB, ISA, USB, Zorro, NuBus, SBus, SCSI,
IDE...)
Ports (serial, parallel, keyboard, mouse, joystick...)
Inputs (keyboards, mice, joysticks, infrared, sound, MIDI...)
Outputs (displays, infrared, sound, MIDI...)
Transports (Ethernet, PLIP...)
Coprocessors (3D accelerators, timers, DSPs, micro-controllers...)
Mass memory (floppy drives, disks, CDROMs, CD-Rs, ZIPs, DATs...)
...
Whole new types of hardware can be added easily and elegantly, when
they appear on the market.
Central memory initialisation means that this module sets up a list of
memory regions to be allocated and used by the CPUs, and provides the
"memory" functionality. It does not mean that no region will be added
after that. For example, a PCI board with a DSP and onboard memory,
detected later, can list its memory regions and provide the "memory"
functionality as well, so that the motherboard CPUs can write to it.
This works the same for all types of modules.
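The idea that any module may contribute memory regions at any time can be sketched as a shared region table. The table size, the names `memory_add_region` and `memory_total`, and the two region types are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum region_type { REGION_DMA, REGION_NON_DMA };

struct mem_region {
    const char *owner;          /* module that contributed the region */
    uint32_t start, end;        /* inclusive bounds */
    enum region_type type;
};

#define MAX_REGIONS 16

static struct mem_region regions[MAX_REGIONS];
static size_t region_count;

/* Any module providing "memory" may add regions at any time.
 * Returns 0 on success, -1 if the table is full or the range is empty. */
int memory_add_region(const char *owner, uint32_t start, uint32_t end,
                      enum region_type type)
{
    assert(owner != NULL);
    if (region_count == MAX_REGIONS || end < start)
        return -1;
    regions[region_count++] = (struct mem_region){ owner, start, end, type };
    return 0;
}

/* Total number of bytes of a given type across all contributing modules. */
uint64_t memory_total(enum region_type type)
{
    uint64_t total = 0;

    for (size_t i = 0; i < region_count; i++)
        if (regions[i].type == type)
            total += (uint64_t)regions[i].end - regions[i].start + 1;
    return total;
}
```

A board detected late in the boot process would call `memory_add_region` exactly as the motherboard memory module does, and allocators would see its region from then on.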
Software resources include:
File systems
Disk cache
Null device
Network protocol stacks (TCP, ICMP, IP, IPX, PPP, HDLC...)
Advanced power management
Coprocessor emulation
Byte code translators
Linux executable runner
Serial mouse (provides software translation, connects to serial port
module)
Modem (same)
..
Parts of the system such as the actual user interface (console, windowing
system...) and useful libraries (compression, image viewing...) are
left to the operating system running on top of the management modules, if
the goal is, for example, to design a module that provides Linux syscall
compatibility and allows Linux software to run on top of that flexible
"no kernel" approach.
Extending this concept to a full operating system with native
applications, the actual user interface (console, windowing system...),
useful shared libraries (compression, image viewing...), process
management and multitasking, users, protection, security policies, and
the actual applications can be "fine-grained" as well, built as
modules that require and provide functionality.
Each module keeps track of the resources it manages locally, so
that listing them will show resources that are both local to the system
and available transparently through the network. See appendix 1 for a
fictional example of how these resources are listed locally by every
module.
The initialisation sequence is decided by the "dependency manager",
which loads the device driver modules in the logical order dictated
by the dependencies. A fictional example can be seen in appendix 2.
Ideally, as many of the modules as possible will run in "userspace",
leaving only the modules that access the hardware in "kernelspace",
provided that security can be enforced without resorting to slower
methods. We can imagine that modules all run in the same "userspace"
memory space, along with "trusted" (native; verified, or forced as
trusted) processes.
"Emulated" processes (such as Linux ones) and "untrusted" ones run in
their own memory space with paranoid security enforcement.
Comments are welcome.
Appendix 1: Example of resource listing local to every module
motherboard memory, provides "memory"
Base memory 00001000-00A00000 (Type DMAable)
DMA memory 00100000-00FFFFFF (Type DMAable)
System memory 01000000-04FFFFFF (Type NonDMAable)
motherboard irq, provides "irq", requires "memory"
IRQ 0 (Type Fixed)
IRQ 1 (Type Fixed)
IRQ 3 (Type Fixed)
IRQ 4 (Type Fixed)
IRQ 5 (Type Fixed)
IRQ 6 (Type Fixed)
IRQ 7 (Type Fixed)
IRQ 8 (Type Fixed)
IRQ 9 (Type Dynamic)
IRQ 10 (Type Dynamic)
IRQ 11 (Type Dynamic)
IRQ 12 (Type Dynamic)
IRQ 13 (Type Fixed)
IRQ 14 (Type Fixed)
IRQ 15 (Type Dynamic)
motherboard i/o, provides "i/o", requires "memory"
ports 0000-FFFF
motherboard bus, provides "bus", requires "memory"
Motherboard 0 (Type Motherboard, 1 unit)
pci bus, provides "bus", requires "i/o", "irq", "memory"
Intel PCI Bus 0 (Type PCI, Motherboard bus 0, 255 units)
agp bus, provides "bus", requires "i/o", "irq", "memory"
Intel AGP Bus 0 (Type AGP, Motherboard bus 0, 1 unit)
isa bus, provides "bus", requires "i/o", "irq", "memory"
ISA bus 0 (Type ISA, Motherboard bus 0)
usb bus, provides "bus", requires "i/o", "irq", "memory"
Universal Serial Bus 0 (Type USB, Motherboard bus 0, 127 units)
Universal Serial Bus 1 (Type USB, Motherboard bus 0, 127 units)
ide, provides "ide", requires "bus", "i/o", "irq", "memory"
Intel 82371AB PIIX4 IDE controller 0 (Type IDE, Motherboard bus 0, 2 units)
Intel 82371AB PIIX4 IDE controller 1 (Type IDE, Motherboard bus 0, 2 units)
cpu, provides "CPU", requires "bus"
Intel i686 stepping 4 Pentium II (Type CPU, Motherboard bus 0, 2 units)
serial, provides "serial", requires "bus", "i/o", "irq"
Serial 0 (motherboard bus 0, 2 units)
parallel, provides "parallel", requires "bus", "i/o", "irq"
Parallel 0 (motherboard bus 0, 1 unit)
ps/2 keyboard, provides "keyboard", requires "bus", "i/o", "irq"
PS/2 keyboard 0 (motherboard bus 0, 1 unit)
ps/2 mouse, provides "mouse", requires "bus", "i/o", "irq"
PS/2 mouse 0 (motherboard bus 0, 1 unit)
serial mouse, provides "mouse", requires "serial"
Serial mouse (serial port 0 unit 0, 1 unit)
modem, provides "modem", requires "serial"
Generic modem 0 (serial port unit 1, 1 unit)
vga, provides "display", requires "bus", "i/o", "memory"
VGA display
matrox mystique, provides "display", requires "bus", "i/o", "irq", "memory"
Matrox Mystique (PCI bus 0 unit 0, 1 unit)
s3 trio64v+, provides "display", requires "bus", "i/o", "irq", "memory"
S3 Trio64v+ (PCI bus 0 unit 2, 1 unit)
sb awe32, provides "wave output", "FM output", "General Midi output",
"midi output", "wave input", "midi input", requires "bus", "i/o", "irq",
"memory"
SB AWE32 Wave output (ISA bus 0, 1 unit)
SB AWE32 FM (ISA bus 0, 32 units)
SB AWE32 GM (ISA bus 0, 32 units)
SB AWE32 midi output (ISA bus 0, 16 units)
SB AWE32 Wave input (ISA bus 0, 2 units)
SB AWE32 midi input (ISA bus 0, 16 units)
floppy, provides "disk", requires "bus", "i/o", "irq"
PC 3.5" Floppy (Floppy, Floppy bus unit 0, 1 unit)
aha78xx, provides "bus", requires "bus", "i/o", "irq", "memory"
Adaptec 2940UW SCSI controller 0 (Type SCSI, PCI bus 0 unit 1, 16 units)
scsi disk, provides "disk", requires "bus", "memory"
Quantum Fireball 2.1 GB Harddisk (Harddisk, SCSI bus 0)
Quantum Fireball 1.2 GB Harddisk (Harddisk, SCSI bus 0)
Generic SCSI CDROM (CDROM, SCSI bus 0)
Yamaha CD-R 104 (WORM, SCSI bus 1)
ide disk, provides "disk", requires "bus", "memory"
Seagate AV 1.2 GB Harddisk (Harddisk, IDE bus 0)
Seagate AV 1.2 GB Harddisk (Harddisk, IDE bus 0)
Quantum Trailblazer 3.4 GB Harddisk (Harddisk, IDE bus 1)
ne2000, provides "transport", requires "bus", "i/o", "irq", "memory"
NE2000 compatible (Ethernet, ISA bus 0)
plip, provides "transport", requires "parallel", "memory"
PLIP (Point to point, motherboard bus 0)
slip, provides "transport", requires "serial", "memory"
SLIP (Point to point, serial port, unit 1)
tcp/ip, provides "network protocol", requires "transport", "memory"
TCP/IP
network memory, provides "memory", requires "network protocol"
Memory on jen.mirus.fr, 00000000-07FFFFFFF (Type NonDMAable)
Memory on hope.mirus.fr, 00000000-03FFFFFFF (Type NonDMAable)
Appendix 2: Example of boot sequence
Dependency manager - wishes to load a module with "operating system"
functionality, doing multitasking, console, application launch etc,
loading first modules with "memory" functionality (required by the
dependency manager) and "display" (for boot console).
Probes modules providing "memory" functionality
Loads module "motherboard memory"
Probes modules providing "display" functionality
Loads module "VGA"
Probes modules providing "operating system" functionality
Finds module "os"
Loads module "os"
Os needs "memory"
Probes modules providing "memory" functionality
Finds module "motherboard memory", already loaded
Os needs "filesystem"
Probes modules providing "filesystem" functionality
Finds modules "ext2" and "fat"
Loads module "ext2"
Ext2 needs "disk"
Probes modules providing "disk" functionality
Finds modules "scsi disk" and "ide disk"
Loads module "scsi disk"
Scsi disk needs "bus" and "memory"
Probes modules providing "bus" functionality
Finds modules "agp bus", "isa bus", "usb bus", "ide bus", "aha78xx"
Loads module "motherboard bus"
Loads module "pci bus"
Pci bus needs "i/o"
Probes modules providing "i/o" functionality
Finds module "motherboard i/o"
Loads module "motherboard i/o"
Pci bus needs "irq"
Probes modules providing "irq" functionality
Finds module "motherboard irq"
Loads module "motherboard irq"
Loads module "agp bus"
Loads module "isa bus"
Loads module "usb bus"
Loads module "ide bus"
Loads module "ide disk"
Loads module "fat"
Os needs "display"
Probes modules providing "display" functionality
Finds modules "matrox mystique" and "s3 trio64v+" (on ext2 disk)
Loads module "matrox mystique"
Loads module "s3 trio64v+"
Os needs "keyboard"
Probes modules providing "keyboard" functionality
Finds module "ps/2 keyboard"
Loads module "ps/2 keyboard"
Done. "os" is able to boot. Later on, network initialisation will spawn
"network protocol" and "transport" modules, etc.
</bigger>