discussion:Goals Round III
Gary D. Duzan
gary@wheel.tiac.net
Wed, 23 Nov 1994 22:54:42 -0500
In Message <Pine.SUN.3.91.941123164211.25144A-100000@crl5.crl.com> ,
Mike Prince <mprince@crl.com> wrote:
=>On Wed, 23 Nov 1994, Francois-Rene Rideau wrote:
=>
=>> Most users don't write programs that need error recovery; or error recovery
=>> is straight-forward (i.e. if the computation is interrupted, go on when
=>> possible; if it eats all memory, then the algorithm is definitely wrong).
=>
=>Every program needs error recovery. Your program completes correctly or
=>there is an error. We've just been taught that if a program doesn't work
=>it's our fault and we should try new parameters or somewhat. Instead the
=>program should guide us through with its error recovery routine and get
=>the correct parameters to run.
Error status is a pet gripe of mine. Errors are BAD. And there
are hosts of things that can cause them, some of which we may not
think of or know about. IMHO, everything should generate an error
status. Even simple arithmetic operations can have overflows, etc.
Ideally, I would like to see every function and procedure generate
an explicit error value along with the results which could be
handled by the code or by a default mechanism, preferably without
too much pain. Of course, there are tradeoffs: C doesn't do bounds
checking (generally) for performance reasons. It has been a while
since I looked at it, but I think Ada might have some ideas here.
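
A minimal sketch of the "explicit error value along with the results" idea, written in Python purely for illustration. The names checked_add and add_or_default, and the choice of a 32-bit signed range, are assumptions made for this example and are not part of any proposed design:

```python
from typing import Optional, Tuple

# Assumed word size for the illustration: 32-bit signed integers.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def checked_add(a: int, b: int) -> Tuple[Optional[int], Optional[str]]:
    """Add two values as if they were 32-bit signed integers.

    Always returns a (result, error) pair; exactly one of the two is None.
    """
    total = a + b
    if total < INT32_MIN or total > INT32_MAX:
        return None, "overflow"
    return total, None


def add_or_default(a: int, b: int, default: int = 0) -> int:
    """A 'default mechanism': callers who don't care get a fallback value."""
    value, err = checked_add(a, b)
    if err is not None:
        return default
    return value
```

Code that cares about the overflow inspects the error half of the pair; code that does not can go through the default wrapper, which is one way to keep the pain low.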
=>> Programmers shouldn't mangle with failures that are not related to the
=>> algorithm they use. If network security is not good enough to ensure the
=>> integrity of the computation, then let the system not distribute the
=>> computation on the net; but do *NOT* introduce failure recovery where there is
=>> no need for one; let it be transparent.
=>
=>We introduce one of my "new" directions. I believe the programmer
=>should be aware that the code is going to be distributed, and be aware of
=>possible pitfalls. Including that of creating a "thread" and having it
=>never join. From what I've read from Fare, he seems to favor very
=>deterministic behavior from programs. These are at odds, but not
=>exclusionary.
This reminds me of the old RPC vs. DSM argument. DSM can have
a major performance impact on code, but allows distribution without
programmer knowledge (and gives a warm fuzzy to the multiprocessor
folks). On the other hand, RPCs can be quite fast but require
explicit programmer knowledge. Now, if we are working with our own
language, we can say that normal procedure calls may or may not be
distributed and work network failures into the usual error handling
mechanism, which hands much of the work to the compiler/operating
system.
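
One way to picture folding network failures into the usual error handling, as a rough Python sketch. The helper call and the two divide procedures are hypothetical, and the "remote" procedure only simulates an unreachable node; no real network layer is involved:

```python
def call(proc, *args):
    """Invoke proc and return (result, error).

    The caller cannot tell whether proc ran locally or on another node;
    network failures come back through the same channel as any other error.
    """
    try:
        return proc(*args), None
    except (ConnectionError, TimeoutError) as exc:
        return None, f"network: {exc}"
    except ArithmeticError as exc:
        return None, f"arithmetic: {exc}"


def local_divide(a, b):
    # Ordinary local procedure.
    return a / b


def remote_divide(a, b):
    # Stand-in for a procedure shipped to another node. In this sketch the
    # node is always unreachable, so the call always fails.
    raise ConnectionError("node unreachable")
```

Here call(local_divide, 6, 3) and call(remote_divide, 6, 3) are written the same way; only the error value differs. In a language of our own, the wrapper would be generated by the compiler rather than written by hand.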
As for recovering a distributed application, that can get really
hairy. Say an algorithm is distributed over three nodes, and one
fails. How do you restart the part that failed? You have to start
thinking about making all network activity transaction-based,
logging everything, etc., and even then you may not cover every
case. Some applications may need something like this, but it is
going to pile on plenty of overhead if we try to make it the
standard. IMHO, it is better to return an error code and let the
programmer deal with it. A library of transaction code would be
nice, of course.
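
The three-node case can be sketched the same way. This is an illustration under heavy assumptions: nodes are modelled as plain Python functions, failed_node always fails, and run_distributed is a made-up helper, not a proposed system call. It returns an error per failed part and leaves the recovery policy to the caller:

```python
def run_distributed(parts, nodes):
    """Run parts[i] on nodes[i]; return (results, errors) keyed by part index."""
    results, errors = {}, {}
    for i, (part, node) in enumerate(zip(parts, nodes)):
        try:
            results[i] = node(part)
        except ConnectionError as exc:
            # No automatic restart: just report which part failed and why.
            errors[i] = str(exc)
    return results, errors


def healthy_node(part):
    return sum(part)


def failed_node(part):
    raise ConnectionError("node down")


parts = [[1, 2], [3, 4], [5, 6]]
results, errors = run_distributed(parts, [healthy_node, failed_node, healthy_node])

# The programmer decides what to do. Here: rerun failed parts on a healthy node.
for i in list(errors):
    results[i] = healthy_node(parts[i])
    del errors[i]

total = sum(results.values())
```

This only works so simply because each part is independent and has no side effects. Anything with shared state is where the transaction library would have to come in.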
=>> As for programming distributed databases (which is quite different from
=>> math problems)
=>
=>Up till now it's been different because computation has been limited to
=>the same machine for most applications. But what happens when most
=>programs are broken into pieces and farmed out?
Speaking of databases, since there is a lot of data floating
around an operating system, it makes sense to organize it into a
database format. And if it is going to be done, it may as well be
done right and given as a tool for programmers to build their own
databases. Do it right, and distributing it is the same as
distributing anything else.
Gary D. Duzan
Humble Practitioner of the Computer Arts