[gclist] Re: gclist-digest V2 #110

Carl Bruggeman bruggema@ranger.uta.edu
Thu, 22 Jan 1998 11:18:18 -0600

> The "1 in 1,000" raises a question: what is the real probability,
> and if known, is it acceptable?  On one project, I was faced with
> implementing something either using a simple technique with a 
> random 1 in 10^n chance of failure, or a complicated technique
> with no (obvious) chance of failure.  I had control of n.
> I decided on the simple technique because it would result in a more
> reliable system.  Why:
> 	1) Extra time spent on the complicated technique could otherwise
> 	   be spent more profitably on fixing known bugs that occur
> 	   much more frequently than 1 in 10^n.
> 	2) Given my track record on writing perfect code, it was likely
> 	   that the complicated technique was going to end up failing
> 	   more than 1 in 10^n.
> It is true that the number of users and runs per users makes a difference.
> And the probabilities are often difficult to figure.  But if the true 
> probability is actually quite small relative to other problems in the
> system, then worrying about that is almost surely misplacing one's attention.

> Worrying about absolute perfection is the province of academics.
> In the industrial world, we have to evaluate relative risks.

This is the second time that someone has hung the "academic
perfectionist" sign on me when my message explicitly discussed risks
for production systems, and in fact made precisely the point that you
are making above:  all choices have tradeoffs and associated costs. 
One large cost of _production_ systems is support -- answering
customer complaints about anomalous behavior.  This cost can be
ignored in academic and research systems.  My first message called for
people to report on experience with large systems that might give us
an idea what the probability P is and what the average failure rate
might be.  My second message suggests a way to determine what P might
be for emacs.

The question that must be answered for all systems is what is "a
relatively small value for P".  This is not a constant, but depends on
the cost of failure.  For code controlling a reactor or space shuttle,
you clearly would have chosen the second option (the complicated
technique) because of the high cost of failure.  For prototype and
research systems the cost is clearly low. 
The cost of failure for using an imprecise collector in emacs is
difficult to determine.  One cost will be increased support for
answering email, etc.  This cost is proportional to your installed base
and the failure rate.  There are more intangible costs as well.  Some
people may have a bad experience and switch, etc.  Emacs could develop
a tarnished image as "leaky", etc.  The emacs team has to decide what
these more intangible things are worth to them.
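
To make the proportionality concrete, here is a rough sketch of the
tangible part of that cost.  Everything in it is an assumption for
illustration: the function name, the parameters, and the sample numbers
are hypothetical, not measurements from emacs or any other system, and
the intangible costs are deliberately left out.

```python
def expected_support_cost(installed_base, sessions_per_user,
                          p_failure, cost_per_report, report_rate=1.0):
    """Rough expected support cost over some period (hypothetical model).

    installed_base    -- number of users
    sessions_per_user -- runs per user in the period
    p_failure         -- probability P that a single run fails
    cost_per_report   -- cost of handling one complaint
    report_rate       -- fraction of failures that are actually reported
    """
    expected_failures = installed_base * sessions_per_user * p_failure
    return expected_failures * report_rate * cost_per_report


if __name__ == "__main__":
    # Made-up numbers: 100,000 users, 200 runs each, P = 1 in 10^6,
    # and 5 cost units to answer each complaint.
    print(expected_support_cost(100_000, 200, 1e-6, 5.0))
```

The same P that is negligible for a handful of research users can turn
into a steady stream of complaints once the installed base is large,
which is why "relatively small" cannot be a constant.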


Carl Bruggeman -- bruggema@cse.uta.edu -- Phone: (817) 272-3600 Fax: 272-3784
Computer Science and Engineering Department -- University of Texas at Arlington