discussion: Goals Round III

Mike Prince mprince@crl.com
Sat, 26 Nov 1994 12:35:57 -0800 (PST)


On Wed, 23 Nov 1994, Gary D. Duzan wrote:

> Ideally, I would like to see every function and procedure generate
> an explicit error value along with the results which could be
> handled by the code or by a default mechanism, preferably without
> too much pain.

Here's my spin on that.  Every function has an error handling routine.  
When an error occurs, execution is redirected to that routine instead of 
returning an error code to the caller.  That routine can either handle 
the error, ignore the problem, or pass an error return code back to the 
caller.
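A minimal sketch of the per-function handler idea follows. It assumes Python exceptions stand in for "an error occurs". The names `with_handler`, `Action` and `PassedUp` are hypothetical and not part of any existing system. Each function gets its own handler, and on error that handler decides to handle the error, ignore it, or pass an error code back to the caller.

```python
from enum import Enum, auto
from functools import wraps


class Action(Enum):
    HANDLED = auto()  # handler supplies a replacement result
    IGNORED = auto()  # handler swallows the error; the call returns None
    PASS_UP = auto()  # handler passes an error code back to the caller


class PassedUp(Exception):
    """Raised to hand an error code from a function's handler back to its caller."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def with_handler(handler):
    """Attach an error-handling routine to a function (hypothetical API).

    handler(err) must return an (Action, value) pair.
    """
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception as err:
                # Execution is redirected to this function's handler
                # before anything reaches the caller.
                action, value = handler(err)
                if action is Action.HANDLED:
                    return value
                if action is Action.IGNORED:
                    return None
                raise PassedUp(value) from err
        return wrapper
    return decorate


def div_handler(err):
    # Handle division by zero locally; pass anything else back as an error code.
    if isinstance(err, ZeroDivisionError):
        return Action.HANDLED, float("inf")
    return Action.PASS_UP, "EINVAL"


@with_handler(div_handler)
def safe_div(a, b):
    return a / b


@with_handler(lambda err: (Action.IGNORED, None))
def parse_int(text):
    return int(text)
```

Here `safe_div(1, 0)` is handled locally and returns infinity. `parse_int("x")` quietly returns `None`. `safe_div(1, "x")` reaches the caller only as the error code `"EINVAL"`.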

>    This reminds me of the old RPC vs. DSM argument. DSM can have
> a major performance impact on code, but allows distribution without
> programmer knowledge (and gives a warm fuzzy to the multiprocessor
> folks). On the other hand, RPC's can be quite fast but require
> explicit programmer knowledge. Now, if we are working with our own
> language, we can say that normal procedure calls may or may not be
> distributed and work network failures into the usual error handling
> mechanism, which hands much of the work to the compiler/operating
> system.
>    As for recovering a distributed application, that can get really
> hairy. Say an algorithm is distributed over three nodes, and one
> fails. How do you restart the part that failed? You have to start
> thinking about making all network activity transaction-based,
> logging everything, etc., and even then you may not cover every
> case. Some applications may need something like this, but it is
> going to pile on plenty of overhead if we try to make it the
> standard. IMHO, it is better to return an error code and let the
> programmer deal with it. A library of transaction code would be
> nice, of course.

I'd use a combination of the above to control and recover distributed 
processing.  First, have the programmer explicitly set the bounds on where 
code or execution may go.  For instance, this thread/agent may not pass to 
remote systems, or this code may migrate within our domain but not out of 
it.  I'll say more about that if you're interested.
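The declared bounds could be checked roughly as follows. This is a sketch only: it assumes each agent has a home host, each host belongs to a named domain, and the runtime asks before moving anything. `Bound`, `Host`, `Agent` and `may_migrate` are hypothetical names, and the three levels are just the two examples above plus "no restriction".

```python
from dataclasses import dataclass
from enum import Enum


class Bound(Enum):
    LOCAL = "local"        # may not pass to remote systems
    DOMAIN = "domain"      # may migrate within its home domain, but not out of it
    ANYWHERE = "anywhere"  # no restriction


@dataclass(frozen=True)
class Host:
    name: str
    domain: str


@dataclass(frozen=True)
class Agent:
    name: str
    home: Host
    bound: Bound


def may_migrate(agent: Agent, target: Host) -> bool:
    """Check a proposed move against the bound the programmer declared."""
    if target == agent.home:
        return True  # staying put is always allowed
    if agent.bound is Bound.LOCAL:
        return False
    if agent.bound is Bound.DOMAIN:
        return target.domain == agent.home.domain
    return True
```

A refused migration could then be reported through the ordinary error-handling path rather than as a special case.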

Second, I agree with returning error codes and letting the programmer 
deal with them, although the easiest and most popular approach will 
probably be to pass the error through to "common" recovery modules 
(perhaps supplied with the higher OS or language).
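One way to picture "return an error code, or pass it through to a common recovery module" is sketched below. It assumes a result object that carries either a value or an error code. It also assumes recovery modules are looked up in a registry keyed by that code. `Result`, `COMMON_RECOVERY`, `recovery`, `pass_through` and the `"ENETDOWN"` policy are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Result:
    value: object = None
    error: Optional[str] = None  # None means success


# Hypothetical table of "common" recovery modules, keyed by error code.
COMMON_RECOVERY: Dict[str, Callable[[Result], Result]] = {}


def recovery(code: str):
    """Register a common recovery module for one error code."""
    def register(fn: Callable[[Result], Result]):
        COMMON_RECOVERY[code] = fn
        return fn
    return register


@recovery("ENETDOWN")
def queue_for_retry(res: Result) -> Result:
    # Placeholder policy: report that the request was queued rather than failing.
    return Result(value="queued for retry")


def pass_through(res: Result) -> Result:
    """Default path: hand an unhandled error to a common recovery module, if one exists."""
    if res.error is None or res.error not in COMMON_RECOVERY:
        return res
    return COMMON_RECOVERY[res.error](res)


def remote_fetch(reachable: bool) -> Result:
    # Stand-in for a distributed call that reports failure as an error code.
    return Result(value="data") if reachable else Result(error="ENETDOWN")
```

A programmer who cares can inspect `result.error` directly. Everyone else calls `pass_through(result)`, and codes with no registered module come back unchanged.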

>    Speaking of databases, since there is a lot of data floating
> around an operating system, it makes sense to organize it into a
> database format. And if it is going to be done, it may as well be
> done right and given as a tool for programmers to build their own
> databases. Do it right, and distributing it is the same as
> distributing anything else.

I believe the low OS will have some database-like primitives for managing 
objects, but it would be unfair to force them to compete with full 
database primitives.  What we could do instead is build a more powerful 
set of database primitives into our mid/higher OS, which would then be 
considered "standard" for a particular flavor of our OS.
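The split between low-OS primitives and a richer mid/higher-OS layer might look something like this sketch. It assumes the low OS offers only create/get/delete/scan on objects keyed by ID. It also assumes the upper layer builds attribute queries purely on top of that interface. `ObjectTable` and `QueryLayer` are hypothetical names.

```python
from typing import Any, Dict, Iterator, List, Tuple


class ObjectTable:
    """Low-OS primitive (hypothetical): objects stored as attribute dicts keyed by ID."""

    def __init__(self) -> None:
        self._objs: Dict[int, Dict[str, Any]] = {}
        self._next_id = 1

    def create(self, attrs: Dict[str, Any]) -> int:
        oid = self._next_id
        self._next_id += 1
        self._objs[oid] = dict(attrs)
        return oid

    def get(self, oid: int) -> Dict[str, Any]:
        return self._objs[oid]

    def delete(self, oid: int) -> None:
        del self._objs[oid]

    def scan(self) -> Iterator[Tuple[int, Dict[str, Any]]]:
        # Snapshot so callers may delete while iterating.
        return iter(list(self._objs.items()))


class QueryLayer:
    """Mid/higher-OS layer (hypothetical): richer primitives built only on ObjectTable."""

    def __init__(self, table: ObjectTable) -> None:
        self.table = table

    def select(self, **criteria: Any) -> List[int]:
        """Return IDs of objects whose attributes match every criterion (linear scan)."""
        return [oid for oid, attrs in self.table.scan()
                if all(attrs.get(k) == v for k, v in criteria.items())]
```

Because the query layer touches the low OS only through `scan`, a given OS flavor could swap in indexes or a different query language without changing the low-level primitives.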

Mike