Pipeline Publishing, Volume 3, Issue 11
This Month's Issue:
The Long Arm of Telecommunications Law
Carrier Grade: The Myth and the Reality of Five Nines

So competing redundancy requires a carrier and its vendors to have strong architectural oversight groups that look at both the big and the small picture to catch these redundant efforts and the unexpected system interactions of multi-layered redundancy. Such a group is economically justifiable because redundancy costs a lot. Redundancy is certainly responsible for most of the difference in cost between carrier-grade and consumer-grade equipment. Optimizing multi-layer and intersystem redundancy will pay for itself by streamlining these otherwise competing or redundant redundancies.

Measuring System-Level Availability & Reliability

Measuring system availability is different from measuring the availability of an individual element. If a string of network elements is directly dependent on one another, say a common circuit that passes through each one, then the availability of the system follows the law of multiplying probabilities. In a four-node system in which each node is five-nines reliable, the availability would be .99999 * .99999 * .99999 * .99999 = .99996. This partially explains why five-nines was chosen: it is so close to unity that a chain of directly dependent elements is still very reliable, so the network effect is reduced. (Choosing 0.95 reliability as a standard, for example, would mean that a string of 12 dependent nodes would be in failure mode about half of the time.) But with everything built to five-nines, if just one bad apple exists (say with a stated four-nines reliability), then the string of four nodes as a group becomes .99987, very close to the reliability of the lowest-performing element. In fact, in this case the close-to-unity availability of the other devices nearly removes them from the equation; the dependent string will always be very near the value of the bad apple, which can be very bad if the apple is actually rotten. In this situation all of the careful design and investment in carrier-grade devices of five-nines reliability becomes economically worthless.
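The multiplication above is easy to check. The following is a minimal sketch in Python, assuming (as the text does) that every element in the chain must be up for the circuit to work; the function name `series_availability` is purely illustrative.

```python
from math import prod

def series_availability(availabilities):
    """Availability of a chain in which every element must be up."""
    return prod(availabilities)

five_nines = 0.99999
four_nines = 0.9999

# Four five-nines nodes in a directly dependent string.
all_good = series_availability([five_nines] * 4)

# The same string with one four-nines "bad apple".
one_bad = series_availability([four_nines] + [five_nines] * 3)

# Twelve dependent nodes built to a 0.95 standard.
weak_chain = series_availability([0.95] * 12)

print(f"{all_good:.5f}")    # 0.99996
print(f"{one_bad:.5f}")     # 0.99987
print(f"{weak_chain:.2f}")  # 0.54
```

The last figure means the 12-node string is up only about 54 percent of the time, which is the "about half of the time" failure case mentioned in the text.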

[Pull quote: The US government has provided a realistic benchmark with its Networx procurement specifications.]

But actual networks are not simply directly dependent one on another. We have seen that redundancy is used to shield the service running in a network from the effects of a single failure in the components of the network. In reality, forms of Markov computations are the best tool for computing availability/reliability in multi-path redundant systems. “Markov Analysis (MA) is a powerful modeling and analysis technique with strong applications in the time-based reliability and availability analysis. The reliability behavior of a system is represented using a state-transition diagram, which consists of a set of discrete states that the system can be in, and defines the speed at which transitions between those states take place. As such, Markov models consist of comprehensive representations of possible chains of events, i.e., transitions, within systems, which in the case of reliability and availability analysis correspond to sequences of failures and repair…. The Markov model is analyzed in order to determine such measures as the probability of being in a given state at a given point in time, the amount of time a system is expected to spend in a given state, as well as the expected number of transitions between states, for instance representing the number of failures and repairs.”
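To make the state-transition idea concrete, here is a small sketch in Python of the simplest kind of Markov availability model, a chain whose states count the number of failed units. It is only an illustration under stated assumptions: the failure and repair rates are hypothetical figures, failures and repairs are assumed to occur at constant rates, and the redundant pair is assumed to share a single repair crew. None of these numbers comes from the article.

```python
def birth_death_steady_state(failure_rates, repair_rates):
    """Steady-state probabilities for a chain of states 0, 1, ..., n.

    failure_rates[i] is the transition rate from state i to state i + 1;
    repair_rates[i] is the transition rate from state i + 1 back to state i.
    """
    weights = [1.0]
    for fail, repair in zip(failure_rates, repair_rates):
        # Balance equation: flow from i to i+1 equals flow from i+1 to i.
        weights.append(weights[-1] * fail / repair)
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical figures: each unit fails once per 1000 hours on average
# and takes 4 hours on average to repair.
lam = 1 / 1000.0   # failure rate per hour
mu = 1 / 4.0       # repair rate per hour

# One unit: state 0 = up, state 1 = down.
single = birth_death_steady_state([lam], [mu])
single_availability = single[0]

# Redundant pair, one repair crew: state = number of failed units.
# Service is up unless both units are down (state 2).
pair = birth_death_steady_state([2 * lam, lam], [mu, mu])
pair_availability = pair[0] + pair[1]

print(f"single unit:    {single_availability:.6f}")
print(f"redundant pair: {pair_availability:.6f}")
```

With these assumed rates the single unit is available roughly 0.996 of the time and the redundant pair roughly 0.99997. The same state list also gives the probability of being in each state, which is one of the measures the quoted description mentions.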

But the true value of Markov computations comes in applying them at the design stage. The Markov mixing process in complex systems reveals little depressions of stability. Think of marbles on a rubber sheet: stable depressions in the sheet form dry lake beds where the marbles will collect. Even if the sheet is slightly shaken, the marbles will bounce and roll around, but most will stay inside their depressions, which are local zones of stability. This can allow richly interacting networks of only moderately high component reliability to be more reliable working together than each component device is alone. In fact, this is the real world of good network design. While few network designers actually use Markov mathematics explicitly, their decisions, taken from a perspective of the complex network as a whole, with all of its interacting and redundant systems, frequently create Markov spaces.
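The claim that moderately reliable components can be more reliable together than alone can be illustrated with the simplest redundancy calculation. This is a minimal sketch in Python, not the Markov treatment itself: it assumes the redundant paths fail independently and that any one working path is enough to carry the service. The 0.99 component figure and the function name are illustrative assumptions.

```python
from math import prod

def parallel_availability(availabilities):
    """Availability of redundant paths where any one working path suffices.

    Assumes the paths fail independently of one another.
    """
    return 1.0 - prod(1.0 - a for a in availabilities)

component = 0.99  # a "moderately high" two-nines component (assumed value)

for paths in (1, 2, 3):
    combined = parallel_availability([component] * paths)
    print(f"{paths} path(s): {combined:.6f}")
```

Under the independence assumption, each added path multiplies the unavailability by 0.01, so two paths reach about four nines and three paths about six. Real networks rarely meet that assumption fully, since shared power, conduits, or software can fail several paths at once, which is one reason the whole-network view described above matters.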

Other approaches should be in the designer’s tool-kit. Decision Analysis can help determine the consequences of any specific design choice on the system and on its ability to meet utilitarian goals. There are also graphical ways of representing this and simplifying design. Perhaps more frequently used would be a Reliability Block Diagram (RBD), Event Tree Analysis, or a Fault Tree diagram.

It All Works Together

“Downtime usually translates into significant productivity and revenue losses for many enterprises. Maximizing network uptime requires the use of operational best practices and redundant network designs in conjunction with high-availability technologies within network elements.” [Cisco]

Availability is of course only one part of the complex concept of Carrier-grade. Several other “ilities”: Reliability/Dependability,


© 2006, All information contained herein is the sole property of Pipeline Publishing, LLC. Pipeline Publishing LLC reserves all rights and privileges regarding
the use of this information. Any unauthorized use, such as copying, modifying, or reprinting, will be prosecuted under the fullest extent under the governing law.