[unios] m-kernel

David Jeske jeske@home.chat.net
Thu, 14 Jan 1999 14:56:53 -0800


On Thu, Jan 14, 1999 at 01:17:07PM -0800, OJ Hickman wrote:
>>> [OJ Hickman]:
>>> I think that shared memory based systems show the most potential for
>>> competitive performance. Whenever possible, services should be
>>> implemented in shared class objects. These are both data and
>>> code encapsulated in a shared object and so can serve as a form of IPC
>>> or hardware driver. [Some process blocking may be needed]
>> 
>> [David Jeske]:
>> Show the most potential for competitive performance? Let's just be
>> clear: shared memory is faster than stream-based IPC.
> 
> [OJ Hickman]:
> So you answered your own question?

Absolutely not. My issue with microkernel-style designs is not the
speed or types of the communication channels. My problem is that they
put static code into compiled programs, making those programs dependent
on _using_ a given communication channel to convey a 'logical' message.

Here is an example. Many people have been compiling programs with some
kind of RPC stubber; the Mach people made quite a bit of use of RPC
stubbers. Then the Utah people come along and write Flick, an RPC
stubber which generates significantly faster code. However, all the
existing software out there has static RPC code stuck in it. Then
someday people decide to use shared memory for IPC instead of streams,
but all the programs out there are using streams, so we have to recode
or restub for shared memory. Then, when we come up with a better way to
exchange data over shared memory, we again have to restub or recompile
all the software out there.
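
To make the 'static RPC code' problem concrete, here is a rough C sketch
of the kind of generated stub that ends up compiled into a client. The
message layout, the function names, and the wire format are all made up
for illustration; the point is only that the transport and the
marshaling are frozen into the application binary:

#include <string.h>
#include <unistd.h>

/* Hypothetical message the application wants to send. */
struct mail_msg { char to[64]; char body[1024]; };

/* Hypothetical stubber output: the transport (a stream file descriptor)
 * and the marshaling format are baked into the client.  Moving to
 * shared-memory IPC, or to a faster stubber like Flick, means
 * regenerating this function and recompiling every program linking it. */
int mail_send_stub(int stream_fd, const struct mail_msg *m)
{
    char buf[sizeof *m];
    memcpy(buf, m, sizeof buf);                 /* naive marshaling */
    if (write(stream_fd, buf, sizeof buf) != (ssize_t)sizeof buf)
        return -1;
    return 0;
}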

_that_ is my problem with microkernels. They relegate the
responsibility for turning a 'logical' communication channel into a
physical one to the end-user software.

>> [David Jeske]:
>> My problem is that the kinds of things I like to do involve plugging
>> things together based on their semantic interfaces, not based on
>> whether or not they were put in the same process, or are using shared
>> memory based IPC or stream based IPC.
> 
> [OJ Hickman]: 
> So your main concern is JUST design philosophy?

Absolutely not. I liken the problem to the reasons we invented shared
libraries. In the early days, there were no shared libraries, you
linked all the code you needed into every program. This meant if you
made a fix or change to a library, you had to recompile all
targets. If you improved the speed of the code in a library, you had
to recompile all targets. Systems where programs are responsible for
including their own code to map logical interfaces into some form of
IPC are making the same mistake. We need to factor that work out and
allow the system to create the code which binds together two logical
interfaces.
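
As a rough sketch of what that factoring could look like (every name
here is hypothetical), the program would be written against a purely
logical interface, and the system would supply the binding behind it,
much like a dynamic linker supplies library code today:

/* The logical interface the application is written against. */
struct mail_iface {
    int  (*send)(void *impl, const char *to, const char *body);
    void  *impl;                /* binding-private state */
};

/* The application never contains marshaling or transport code... */
int app_send_report(struct mail_iface *mail)
{
    return mail->send(mail->impl, "ops@example.org", "nightly report");
}

/* ...because the system decides what 'send' points at: a direct
 * in-process call, a shared-memory stub, or a stream RPC stub.  It can
 * change that decision later without recompiling the application. */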

>> [David Jeske]:
>> If you are asking me why I don't want to do this.... The answer is:
>> Because I don't like the idea of having to rigidly decide, and
>> compile, a block of code as a 'shared object' or a 'server'. I want to
>> compile a block of code which exports an interface. I want other
>> blocks of code to talk to that interface. I want the system to decide
>> whether to inline the two blocks of code together (i.e. for maximum
>> speed) or to run the two blocks of code on two different machines via
>> network IPC.
>
> [OJ Hickman]: 
> You imply taking control out of human hands and forcing us to
> write to generic object-interaction interfaces unable to take
> advantage of the strengths of either shared objects or IPC.

I don't see it that way. You imply that you are going to have global
interfaces for "[IPC, time, scheduling, memory]". I don't want
programs to always have to communicate through the 'least common
denominator'. If it makes sense for two programs to communicate a
stream of bytes, they can use a stream interface, but if it makes
sense for two programs to communicate with another interface (e.g. a
mail send interface, or a drawing interface), the system should be
aware of that level of communication.

I guarantee compilers like Flick can do a better job of writing fast
RPC code than you can. 

I'm trying to put the power to compose software out of smaller
components into human hands.

> [OJ Hickman]:
> More than that, you imply giving control to an all-wise
> [and hypercomplex] 'reflector' or some other sort of optimizer.

You are using loaded words. However, you are correct. I'd prefer to
have a highly optimized program convert my logical interface into
actual IPC, both because (1) it's going to do a heck of a lot better
than I can in most cases, and (2) I want to leverage the work of
another group of programmers by allowing improvements to RPC
technology to improve the speed of my program.

However, the optimizer itself is just software too. Sure, you will have
generic optimizers which convert interface-based communication into
actual code. You already do. I'm just advocating moving this step into
the run-time software system instead of having a developer do it and
losing the ability to optimize/improve it later. However, there is
nothing to prevent you from writing an optimizer for a single case
which just has a bunch of custom code to handle IPC. The distinction
is that I should be able to throw out your custom IPC code and that
software should still connect based on its logical interface.

I'm just arguing that "application-specific solutions are always faster."

For example, distributed shared memory based communication is inferior
to stream RPC, because when you are writing a DSM system, you do not
know where the communication boundaries lie. You are trying to make a
stream-based communication channel (like a network) make two pieces of
memory look like they are connected together. So if you base a system
on shared memory communication because it's faster on the local
machine, you will forever have sub-optimal performance for remote
communication. However, if you have components communicating based on
logical interfaces, then the system can put in a shared-memory-based
stub when the communication is local, and a stream-based stub when the
communication is remote.
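
Here is a sketch of the kind of binding decision I mean, made by the
system rather than compiled into the application. The binder functions
and the 'is_local' test are hypothetical, not any real API:

struct mail_iface;                  /* the logical interface, as above */

extern int  is_local(const char *endpoint);
extern void bind_shm_stub(struct mail_iface *i, const char *endpoint);
extern void bind_stream_stub(struct mail_iface *i, const char *endpoint);

void bind_mail(struct mail_iface *iface, const char *endpoint)
{
    if (is_local(endpoint))
        bind_shm_stub(iface, endpoint);     /* same machine: fast path */
    else
        bind_stream_stub(iface, endpoint);  /* remote: stream RPC */
}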

>>> [OJ Hickman]:
>>> I think a lot of IPC is avoidable without 'macro-ifying' the overall
>>> service.
>> 
>> [David Jeske]:
>> You propose making 1 and 2 shared objects and 3 a storage
>> server. Which means 1 and 2 do not get the same level of 'safe'
>> protection that a separate process server does.
> 
> [OJ Hickman]:
> Safer than a macro-subsystem. Shared objects are fairly isolated
> by the class interface, and the debugger should be able to locate
> errors to within the problem object. Yes, this is a trade-off -
> one that should be made by the implementor.

Yes, safer than a macro-subsystem. However, we can do better than
that. We can get safety for all components and speed too.

>> [David Jeske]:
>> However, in response, I think you are correct, and for me, the answer
>> is, stop compiling the implementation details of the communication
>> channel into the executable. I'd rather use a run-time
>> binding/compiling solution to make target specific code which handles
>> the specifics of IPC.
>
> Thank you.
> 
> But, sorry, I think [most] humans are smarter than any operating
> system will ever be. Control needs to be kept in human hands.

I would like to answer this in two ways. (1) control WILL be in human
hands, and (2) I don't think it's about _smarts_.

(1) control WILL be in human hands

Humans will write all the code in the system. You are arguing the
age-old C vs. assembly argument, and I think it's been settled
already. People will use the tool which gives them the control and
speed they want for a given job. I'm not advocating removing control
from human hands, I'm advocating _adding more control_. Instead of
hard coding RPC code into a program, I'm advocating drawing a line
between the 'logical' API of the program and the RPC code. If the
programmer wants to write his own custom shared memory RPC code,
that's fine. However, I should also be able to let the system use
Flick-compiled stream RPC code instead when that program needs to talk
over a network. If you want to write a custom RPC stub for talking
over a network too, that's fine, I don't have any problem with
that. However, the system should be able to understand how to talk to
your program at a level higher than 'shared memory' or 'stream'.
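
As a sketch of how both could coexist (all of these names are made up),
the custom stub and the generated one would simply be registered against
the same logical operation, and the system would pick per connection:

enum bind_kind { BIND_LOCAL, BIND_REMOTE };

typedef int (*send_fn)(void *impl, const char *to, const char *body);

/* Hypothetical system service: associate a stub with a logical op. */
extern void register_binding(const char *op, enum bind_kind k, send_fn fn);

/* Hand-written shared-memory stub, supplied by the programmer. */
extern int my_custom_shm_send(void *impl, const char *to, const char *body);

/* Flick-style generated stream stub, supplied by the system's stubber. */
extern int generated_stream_send(void *impl, const char *to, const char *body);

void register_mail_bindings(void)
{
    /* Both bindings target the same logical operation; the client is
     * never recompiled when one of them improves or is replaced. */
    register_binding("mail.send", BIND_LOCAL,  my_custom_shm_send);
    register_binding("mail.send", BIND_REMOTE, generated_stream_send);
}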

(2) It's not about smarts.

Consider the Flick RPC compiler, or just about any modern C compiler
for a modern processor. The average programmer will not be able to beat
these compilers, and even if they could, they would not spend the time
to beat them for most software.

At work here, I was in a talk about optimizing for a particular
microprocessor. During this talk they discussed various techniques for
optimizing assembly. One of the optimization techniques is a perfect
example of the way in which 'making humans do it by hand' actually
makes all software slower. It turns out some operations in the
processor are hard-coded RISC ops, but some are microcoded. Because of
how the processor does its instruction decoding, if one of these
microcoded instructions falls across a cacheline boundary, it will take
longer to execute than if it had remained completely within a single
cacheline.

How much software do you think goes through the steps necessary to
make sure these instructions don't cross a cacheline boundary? I'm
betting less than 1% of all code. Imagine how much code is out there
which is not optimized for this issue. If we delivered target code in
a form where the system could perform this optimization, then TONS of
code would run faster.
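
For the record, the check itself is trivial; the hard part is being in a
position to run it over every binary. Here is a tiny sketch, assuming a
64-byte cacheline (the line size and the pass that would use this are
assumptions, not any particular processor's real numbers):

#include <stdint.h>

#define CACHELINE 64u

/* Does an instruction of 'len' bytes starting at 'addr' spill across a
 * cacheline boundary? */
int crosses_cacheline(uint32_t addr, uint32_t len)
{
    return (addr / CACHELINE) != ((addr + len - 1) / CACHELINE);
}

/* A system-level optimizer could walk the delivered code, and pad or
 * reschedule whenever a microcoded instruction would cross a boundary,
 * fixing every program instead of the <1% whose authors did it by hand. */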

To summarize... can a human writing hand-optimized code _for a given
target_ beat the compiler? Of course he can. However, my points are
these:

1) Code runs on targets it wasn't optimized for. Whether that is
   assembly optimized for a different processor in a processor family,
   or RPC code optimized for local-machine shared memory instead of
   stream-based IPC, is irrelevant.

2) Most code is not performance critical. If we _require_ the programmer
   to optimize code, then most code will _not_ be optimized. If we _allow_
   the programmer to optimize code, but also make the system capable
   of optimizing code, then _most_ or _all_ code will be optimized.

>> [David Jeske]:
>> As long as we limit ourselves to producing static binaries at compile
>> time, the tradeoffs of existing kernels and software systems will
>> always exist. Namely, some kind of 'safety vs. run-time speed'
>> tradeoff.
> 
> [OJ Hickman]:
> You seem to say that my ideas are good for real-world
> problems but don't fit into some philosophical framework.

Absolutely not. I'm saying that your ideas are going to limit the
performance and flexibility that can be achieved, because they are not
as well suited to leveraging the practical work of different groups of
programmers as they could be.

-- 
David Jeske (N9LCA) + http://www.chat.net/~jeske/ + jeske@chat.net