discussion:lazy, goals etc.
Sat, 26 Nov 1994 22:00:29 +0800
Phew. Um, just take a step back and look carefully at the things that
are being discussed. I think we are making this _far too_ complex at
this stage. Complexity that is the result of a clear design is one
thing. You can understand every part of it. Complexity now, before
there is even a hint of a design, is going to burgeon into something
that will make NT look like CP/M.
One of the primary ideas of MOOSE TUNES (note that we still don't really
have a name yet) is to be able to use a distributed network of processors.
When you distribute something, you change a lot of basic assumptions that
are made in uniprocessor systems.
1) coherency - is what I see what you see, at the same time?
A) uniprocessor - no problem, it is assumed. Since there is really only
one thread of control in the _entire_ system, simple things like
semaphores will take care of the few conflicts.
B) multinode - a problem. A big problem. It is hard enough to tell
when you and I are talking about the same data object let alone
telling if we have done some sort of interleaved modification. A
shared object, A, might reside on a node other than the programs trying
to modify it. The host node of A may decide to rearrange queued
accesses in a different order than temporal, or transit times may
make accesses come out of order. Even with time stamps, things get
really ugly. Add local caching and the game gets even more interesting.
2) reliability - how certain am I that I am not getting garbage?
A) uniprocessor - simple hardware makes sure that what you read is
what is really there. If something fails, it is easy to tell, your
computer dies or dumps core or just goes away to silicon heaven.
B) multinode - problematic. The only way I can be sure of information
from another node is to totally redo it. If my simulation matches
what I read, then everything is fine. Unfortunately, this tends to
mean that I have to do everything locally. Alternatively, I can have
ten nodes do exactly the same thing. If nine agree on the results,
then I can be fairly sure that those results are correct. I can use
all kinds of protocols like adding CRC checks to help catch some low
level errors. I can add higher level protocol checks to catch other
kinds of loss. Question is, if I have a value that I am fairly sure
of, but not absolutely certain of, how to I communicate that up the
data food chain? How do I communicate failure in general?
3) time - is my time your time?
A) uniprocessor - definitely. There is only one place in time and space
where things can be changed. That serializes things nicely.
B) multinode - it's all relative. Unlike physics, the speed of
information in a computer is very dependent on where the information
is coming from and where it is going. Suppose I have a multinode
system set up like this:
W --- X --- Y --- Z
Suppose that W sends something to Y at time t1. Each hop from node
to node costs 10ms. Thus, the message from W will get to Y at about
t1+20ms. Suppose that at t1+5ms, Z sends a message to Y. Which
message gets there first: the one that was sent last. Imagine what
this can do to coherency. It could be worse. Perhaps the link
between Y and Z took 100ms for a message. Then, even though the
number of hops from W to Y was higher than the number of hops from
Z to Y, W would be "closer" to Y than Z. Suppose that the delay is
not fixed, but is dependent on the amount of traffic over the link
plus a few fixed delays (i.e. nearly any type of widely used computer
network particularly Ethernet-based ones). Get the picture?
(also see Data Age etc.)
When working with a multiprocessor system, particularly multinode ones,
a lot of the assumptions made in single processor systems have to be
tossed out the window. You can't be sure that the data you get is not
garbled. You can't be sure that the data you sent is going to get to
It is all very well to talk about recovery, but you have to be able to
tell just what you are recovering in the first place. Consider the total
system state. If we want to talk about recovering from an error, we need
to be able to wind some part of the system state back just before the error,
and fix things and then redo some processing. Sometimes it is easy to
know what that part of the state was. If my FPU generates a divide-by-zero
error, I am fairly certain that the last divide performed had a zero
divisor. Of course, in a totally distributed application, it may be a bit
hard to tell just which divide was the last one...
Talking about system state at a given time is like talking about the state
of the universe at a particular time. Einstein and Heisenberg (sp?) have
a lot to say about that. The analogy to relativity was made a few weeks
back. Fortunately, in the "real" universe, the speed of light remains
a nice constant (at least it certainly appears to). In a networked system,
the speed of information (analogous to the speed of light) is both very
slow (compared to light) and it changes! Causality goes out bar hopping
at this point. At light speed, information less than one second old can
come from about the moon. On a bad day, across a network, information one
second old might come from one hop away.
Information space does _not_ behave like the visible universe. We cannot
rely on our intuitive understanding of concepts like causality and
simultaneity. They just don't work in the same way.
All this boils down to one thing. Uncertainty. Few of the things we take
for granted on a uniprocessor system are certain on a distributed system.
Handling uncertainty is the most important single problem that must be
solved in a distributed system. Most solutions aim for "good enough".
Transactions are nice, but what happens if some bozo just unplugged
the net connection to the machine you were transacting with? Will
you hang for a few days waiting? Will you timeout? If you do decide
that the system has a problem, how do you report that to the next
program in line for your data? How do you do this so that "Hello World"
is not 600MB?
Transactions, error codes etc. These are possibilities, but they make
a lot of assumptions about reliability. Suppose someone starts a
transaction with me and then _they_ fail. Now what? An error occured,
but the entity I really want to report this to is the one that failed.
Maybe I can attach an error code to every single communication in the
system. If the message does not get through, I lose the error code
as well. What if the system has failed so much that I get a totally
different message from the one I am expecting (lack of causality), but
that the message is perfectly good?
There are ways to work around these problems. The best ways are those
that unify error detection _and_ correction and allow graceful failure.
Slapping on a "solution" for each individual problem will result in
something like the mess of Windows networking. Let's approach these
problems holistically. We have three subproblems: reliability,
causality and efficiency. We want our data certain, on time and with
a minimal overhead. Perhaps different levels of certainty, timeliness
and overhead are acceptable for different applications. What are
possible solutions and what guarantees do we get from each? Try
to look at each solution with the three subproblems in mind.