
Assessing The Scale Of Telephone Network Outages
John Foster & Peter Cochrane
Summary
A logarithmic disaster scale for telecommunication network outages is presented. The definition follows the approach previously developed by Richter for earthquakes. Previously reported network failures are cited as example points on the scale together with their stated cause.
Introduction
Over the past few years there have been a number of spectacular telecommunication network failures on a scale not previously experienced or recognised [1-5]. Quantifying the impact of these and future failures is now both of interest and essential if measures and future network design are to be correctly focused. The key difficulty is the diversity of the failure types, causes, mechanisms, correction, avoidance and customer impact in each case. We had a requirement to specify a simple means of ranking network failures by severity, such that it could be readily understood by the non-specialist as well as by the trained engineer. We found upon investigation that the Richter scale for ranking the severity of earthquakes could be modified to meet that need, bringing the advantage of an established familiarity across a broad range of potential users.
Above all, our need was for a simple, intuitive measurement requiring the minimum of input data. For any simple measure, network outages vary over many orders of magnitude, so a logarithmic scale was sought from the outset. Absolute accuracy, while important to those investigating outages or seeking to protect networks from them in future, was not a strong requirement. We were also concerned that our measure should reflect the intrinsic severity of an outage, as opposed to its impact or the level of attention it received. An outage in the early hours of the morning has arguably far less impact on customers than the identical event in the peak of the working day, and the level of public attention given to an outage will diminish with distance from the affected area. Our measure needed to be independent of these effects.
The Scale
Numerous approaches were examined and discarded on the basis of their inherent complexity and focus. Ultimately it was decided to base the measure on the time integral of the number of people affected by the loss of service. It is convenient to choose the hour as the basic time unit, which also makes the units compatible with the familiar Erlang. Whilst it is possible to take the logarithm of this quantity to any base, the simplest and most convenient choice turns out to be base 10. Following the approach of the Richter scale, we therefore defined the total network information capacity outage, or loss of traffic in customer-affected time, as follows:
$$D = \log_{10}(NT) \qquad (1)$$

where N = number of customer circuits affected
      T = total down time
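As a minimal sketch of equation (1), assuming T is expressed in hours as chosen above, the calculation might be written in Python as follows. The function names and the example figures are illustrative and do not come from the original work. The second function assumes the outage is described as a list of (customers affected, hours) stages, which is one simple way of approximating the time integral when service is restored in steps.

```python
import math


def outage_magnitude(customers_affected, downtime_hours):
    """Return D = log10(N * T) for N customers out of service for T hours."""
    customer_hours = customers_affected * downtime_hours
    if customer_hours <= 0:
        raise ValueError("N * T must be positive to take a logarithm")
    return math.log10(customer_hours)


def outage_magnitude_from_profile(stages):
    """Approximate the time integral for a staged restoration.

    `stages` is a hypothetical input format: (customers_affected, hours)
    pairs, each covering a period in which the number affected was constant.
    """
    customer_hours = sum(n * hours for n, hours in stages)
    if customer_hours <= 0:
        raise ValueError("total customer-hours must be positive")
    return math.log10(customer_hours)


# Illustrative figures only: 20,000 customers without service for 5 hours.
single_stage = outage_magnitude(20_000, 5)

# Illustrative staged restoration: 50,000 customers for 2 hours,
# then 10,000 still affected for a further 6 hours.
staged = outage_magnitude_from_profile([(50_000, 2), (10_000, 6)])

print(round(single_stage, 2), round(staged, 2))
```

Summing customer-hours before taking the logarithm keeps the staged case consistent with the single-stage case: a profile with one stage gives exactly the value of equation (1).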
This metric is shown in Fig 1 together with a spread of reported network failures [1-5] to give a frame of reference in terms of perceived disaster size and the position on our scale.
Calibration
The base 10 logarithm of a quantity which gives an indication of intrinsic severity is the root not only of our measure but also of the Richter scale of earthquake magnitude. We seek to exploit that commonality still further by comparing the scale levels reached by some typical and extreme events. For example:
- On the earthquake scale, a magnitude of around 6.0 has special significance, because it marks the (fuzzy) boundary between minor and dangerous events. A magnitude of 6.0 on our outages scale would represent an event in which, say, 100,000 people lost service for an average of 10 hours - say a large town being cut off for the whole of a working day.
- Beyond this, earthquakes in excess of 7.0 magnitude are definitely considered major events. In telecommunications networks, outages above 7.0 are mercifully rare, but the series of related outages in the US [3,4] in the summer of 1991 would rank at such a level. Globally there appears to have been only one outage that exceeded level 8 on our scale [1,2]. Estimated readings near 10.0 have so far occurred only for earthquakes [6].
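The calibration points above can be checked with a short Python sketch. The band names and thresholds below are taken from the earthquake analogy in the text; treating them as hard cut-offs is an assumption made purely for illustration, since the 6.0 boundary is described as fuzzy. The function names are hypothetical.

```python
import math

# Thresholds borrowed from the earthquake analogy in the text. The text calls
# the 6.0 boundary "fuzzy", so using crisp cut-offs here is an assumption.
DANGEROUS_THRESHOLD = 6.0
MAJOR_THRESHOLD = 7.0


def magnitude(customers_affected, downtime_hours):
    """D = log10(N * T), with T in hours."""
    return math.log10(customers_affected * downtime_hours)


def band(d):
    """Map a magnitude to an illustrative severity band."""
    if d > MAJOR_THRESHOLD:
        return "major"
    if d >= DANGEROUS_THRESHOLD:
        return "dangerous"
    return "minor"


# The large-town example: 100,000 people without service for 10 hours.
town = magnitude(100_000, 10)

# Ten times as many people for the same duration moves the reading up by
# one scale point, as expected of a base 10 logarithmic scale.
region = magnitude(1_000_000, 10)

print(round(town, 2), round(region - town, 2))
```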
In Fig 1 we have included qualitative ranges of media, regulatory and governmental reaction relative to our disaster scale.
Partial Outages
Where an event occurs at a local level, there is complete loss of service to the customers involved and the metric is simple to calculate. If the problem involves loss only of wider-scale services, such as a trunk network outage or a severing of international links, then due allowance should be made for the partial nature of the loss. An exact calculation would depend on local traffic patterns, but in the general case the loss of service can be weighted according to the traffic concentration as nNT, where n is the proportion of people affected. So if we reckon on 30% of calls occupying trunk circuits, and about 3% crossing international boundaries, then a denial of trunk and international service will be measured at 0.30 NT and 0.03 NT respectively. In terms of our scale points, these correspond to lowering the basic NT figures by approximately 0.5 and 1.5, since log10 0.30 is about -0.52 and log10 0.03 is about -1.52. Our scale is thus relatively insensitive to traffic patterns and localisation effects.
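The weighting for partial outages can be sketched in Python as below, assuming the 30% trunk and 3% international proportions quoted above. The function names are illustrative only; the point of the sketch is that the weighting n shifts the reading by log10 n regardless of N and T.

```python
import math


def partial_outage_magnitude(customers, downtime_hours, fraction_affected):
    """Return log10(n * N * T), where n is the proportion of traffic lost."""
    if not 0 < fraction_affected <= 1:
        raise ValueError("fraction_affected must be in (0, 1]")
    if customers <= 0 or downtime_hours <= 0:
        raise ValueError("customers and downtime_hours must be positive")
    return math.log10(fraction_affected * customers * downtime_hours)


def magnitude_shift(fraction_affected):
    """Change in scale points caused by weighting N * T by the fraction n."""
    return math.log10(fraction_affected)


trunk_shift = round(magnitude_shift(0.30), 2)          # -0.52
international_shift = round(magnitude_shift(0.03), 2)  # -1.52

print(trunk_shift, international_shift)
```

Because the shift depends only on n, an uncertainty of a factor of two in the assumed traffic proportion changes the reading by only about 0.3 scale points.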
Conclusions
We have provided a new means of assessing and rating the significance of telecommunication network failures from a customer-affecting perspective. The measure can be rapidly assimilated by virtue of its relationship to the well-known Richter scale, and the ranges given have a similar correspondence with practical experience and social disruption.
References
[1] Mason, C. "Software problem cripples AT&T long-distance network", Telephony, vol 218/4, 22 Jan 1990, p 10.
[2] Neumann, Peter G. "Some reflections on a telephone switching problem", CACM, vol 33 no 7, Jul 1990, p 154.
[3] "SS7 errors torpedo networks in DC, LA", Telephony, vol 221/1, 1 Jul 1991.
[4] "DSC admits software bug led to outages", Telephony, vol 221/3, 15 Jul 1991, pp 8-9.
[5] Davenport, P. "Scarborough returns to electronic dark ages", The Times, issue 63864, 15 Nov 1990.
[6] "50 killed in Kirghizia earthquake", The Times, 21 Aug 1992, p 6.