

Failure type forecasting via burst errors
Richard Butler, Peter Cochrane

A transmission system performance monitor that forecasts the probable cause of failure by error event analysis is described, with results for an HDB3 line code.

Introduction
Modern digital systems are inherently capable of providing error free operation, so a performance monitoring strategy that regards error activity as a portent of failure is feasible [1]. This basic approach has previously been described and practically demonstrated for an unstructured (random) bit stream. Here we present results from laboratory trials and simulations for coded data on a 2 Mbit/s optical fibre system using an HDB3 line code [2].

Present Performance Monitoring Strategy
The in-service performance of digital transmission systems is most commonly assessed from line code or frame alignment errors. Network operators currently employ Bit Error Ratio (BER), Error Free Seconds (EFS), or CCITT Recommendation G821 [3] metrics to assess and monitor performance. These involve detection thresholds set between acceptable and unacceptable error performance.
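As a minimal illustration of threshold-based metrics of this kind, the following Python sketch computes a BER and the percentage of error free seconds from per-second error counts. The function names, the sample counts, the nominal 2.048 Mbit/s rate and the alarm threshold are all hypothetical and chosen only for illustration; the G821 classification itself is not reproduced.

```python
def bit_error_ratio(errors_per_second, bit_rate):
    """Total errored bits divided by total transmitted bits."""
    total_bits = bit_rate * len(errors_per_second)
    return sum(errors_per_second) / total_bits


def error_free_seconds_percent(errors_per_second):
    """Percentage of one-second intervals containing no errors."""
    clean = sum(1 for e in errors_per_second if e == 0)
    return 100.0 * clean / len(errors_per_second)


# Hypothetical error counts for ten one-second intervals.
counts = [0, 0, 3, 0, 0, 0, 1, 0, 0, 0]
ber = bit_error_ratio(counts, 2_048_000)
efs = error_free_seconds_percent(counts)
alarm = ber > 1e-6  # illustrative threshold, not taken from any standard
```

Both figures summarise how many errors occurred, not how they were distributed in time, which is the information the burst metrics described later try to exploit.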

The advantage of this strategy is that it provides a simple and quantifiable quality control mechanism. The disadvantage is that it is a "just too late" strategy: alarms are raised and action taken only when systems fail or thresholds are violated. The nature of the fault or its cause may not be obvious, or even known, and maintenance cannot be planned in advance.

Proposed Performance Monitoring Strategy
Well designed optical fibre transmission systems now operate free from errors, so a non-traditional view is possible in which error activity is reclassified as a symptom of an ailment and a precursor to failure. Ideally, this should allow diagnosis and facilitate preventative action before the condition becomes critical. In our monitoring strategy [1,4] error activity is continually recorded and analysed, and metrics are calculated. These are used as precursors and predictors of likely failure. By comparing temporal metrics of error activity it is then possible to forecast the cause of system failure for at least a limited class of events.

Metrics
The metrics employed are based on the analysis of the power spectral density P(ω) of error bursts [4]. For an error burst e(t) of duration n bits of epoch T seconds:

[Equation (1), the expression for the power spectral density, is missing from this copy.]
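One standard way to arrive at cosine terms in a burst spectrum, offered here as an assumption and not as the authors' equation (1), is to model the burst as a binary sequence e_0, ..., e_{n-1} (1 marks an errored bit) with an assumed bit period T_b. The squared magnitude of its spectrum then expands into a constant plus a sum of cosines whose amplitudes R_m are the autocorrelation values of the sequence:

```latex
P(\omega) = \left| \sum_{k=0}^{n-1} e_k \, e^{-j \omega k T_b} \right|^2
          = R_0 + 2 \sum_{m=1}^{n-1} R_m \cos(m \omega T_b),
\qquad
R_m = \sum_{k=0}^{n-1-m} e_k \, e_{k+m}
```

Under this reading, R_m simply counts the pairs of errors in the burst that are separated by m bits.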

To characterise a burst of duration 'd' bits, only the distribution of amplitudes of these cosine terms need be considered. This distribution is treated as a probability density function, for which moments are calculated. The mean (μ) is used as a principal metric, where

[Equation (2), defining the mean, is missing from this copy.]
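The following Python sketch shows one possible reading of the mean metric, under stated assumptions. The amplitude of the cosine term at lag m is taken to be R_m, the number of error pairs in the burst separated by m bits; these amplitudes are normalised into a distribution over lag, and its mean is returned. This interpretation and the function names are assumptions, not the authors' definition.

```python
def cosine_amplitudes(errors):
    """Autocorrelation R_m of a 0/1 error sequence for lags m = 1 .. n-1."""
    n = len(errors)
    return [
        sum(errors[k] * errors[k + m] for k in range(n - m))
        for m in range(1, n)
    ]


def mean_metric(errors):
    """Mean lag, weighting each lag m by its normalised amplitude R_m."""
    amps = cosine_amplitudes(errors)
    total = sum(amps)
    if total == 0:
        return 0.0  # fewer than two errors: no cosine terms to characterise
    return sum(m * a for m, a in enumerate(amps, start=1)) / total


# Hypothetical 8-bit burst with errors at bit positions 0, 1 and 4.
burst = [1, 1, 0, 0, 1, 0, 0, 0]
amps = cosine_amplitudes(burst)
metric = mean_metric(burst)
```

For this burst the error pairs lie 1, 3 and 4 bits apart, so the sketch returns a mean of 8/3 bits. A metric of this kind is expressed in bits and grows as errors spread further apart within a burst.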

Metric Variation
Measurements have been performed with known error generators introduced into a 2Mbit/s transmission system using an HDB3 line code. A simulation of the process has been developed to compute the error probability for each transmitted bit. The metrics of the error burst power spectral density have been calculated and compared with the measured results. This demonstrates that the metrics can detect changes in the transmission degradation type and intensity introduced.
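For readers unfamiliar with the line code used in these measurements, the sketch below encodes a bit stream with the standard HDB3 substitution rule and counts one kind of code error. In HDB3, marks alternate in polarity, and each run of four zeros is replaced by 000V or B00V so that successive violations V alternate in polarity. The initial encoder state and the function names are assumptions, and the checker is deliberately simplified: it only flags violations that fail to alternate, and is not the detector used in the trials.

```python
def hdb3_encode(bits):
    """Encode 0/1 bits as HDB3 symbols (+1, 0, -1)."""
    out = []
    last_pulse = -1     # assumed start state, so the first mark is positive
    pulses_since_v = 0  # marks sent since the last violation (assumed start)
    zeros = 0
    for bit in bits:
        if bit:
            last_pulse = -last_pulse  # ordinary marks alternate in polarity
            out.append(last_pulse)
            pulses_since_v += 1
            zeros = 0
        else:
            out.append(0)
            zeros += 1
            if zeros == 4:
                if pulses_since_v % 2 == 0:
                    # B00V: balancing pulse B alternates, V repeats B
                    last_pulse = -last_pulse
                    out[-4] = last_pulse
                # otherwise 000V: V repeats the previous pulse polarity
                out[-1] = last_pulse
                pulses_since_v = 0
                zeros = 0
    return out


def violation_rule_breaches(symbols):
    """Count bipolar violations that fail to alternate in polarity."""
    breaches = 0
    last_pulse = 0
    last_violation = 0
    for s in symbols:
        if s == 0:
            continue
        if s == last_pulse:          # same polarity twice: a violation
            if s == last_violation:  # violations must alternate in HDB3
                breaches += 1
            last_violation = s
        last_pulse = s
    return breaches


line = hdb3_encode([1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0])
corrupted = list(line)
corrupted[5] = 0  # hypothetical line error: one pulse lost in transmission
```

A correctly encoded stream produces no breaches, whereas the single lost pulse in the corrupted copy produces one, which is how code errors can be observed in service without knowledge of the transmitted data.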

Decay Time Constant: The family of interferers is described as follows, where v(t) is relative to the normalised data amplitude and period:

[Equation (3), the expression for v(t), is missing from this copy; it applies for t = 0, 1, 2, ..., 320.]

Measured and simulated results for the mean metric for various decay time constants (τ) are given in Fig 1, where they can be seen to be in close agreement.
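To make the idea of a per-bit error probability concrete, the following Python sketch evaluates it for a decaying sine wave interferer. Everything here is an assumption made for illustration: the interferer form (amplitude times an exponential decay times a sine), the simplified binary model with equiprobable levels of +1 and -1 and a decision threshold at zero (not the three-level HDB3 signal), the Gaussian noise, and all parameter values and names.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def interferer(t, amplitude, tau, period):
    """Assumed decaying sine wave; t, tau and period are in bit periods."""
    return amplitude * math.exp(-t / tau) * math.sin(2.0 * math.pi * t / period)


def bit_error_probabilities(amplitude, tau, period, sigma, n_bits=321):
    """Per-bit error probability for equiprobable +/-1 data, threshold at 0."""
    probs = []
    for t in range(n_bits):
        v = interferer(t, amplitude, tau, period)
        # The interferer moves one level towards the threshold and the
        # other level away from it, so the two cases are averaged.
        p = 0.5 * (q_function((1.0 - v) / sigma) + q_function((1.0 + v) / sigma))
        probs.append(p)
    return probs


# Hypothetical parameters: interferer peak 1.5 times the data amplitude.
probs = bit_error_probabilities(amplitude=1.5, tau=80.0, period=40.0, sigma=0.1)
```

In this model the error probability is negligible where the interferer is small and approaches one half near its early peaks, so a longer decay time constant keeps errors likely over a longer span of bits.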

Duration: The family of interferers employed is described as follows:

[Equation (4), the expression for the interferer, is missing from this copy; it applies for t = 0, 1, 2, ..., d.]

Measured and simulated results for the mean metric for various durations (d) are given in Fig 2. Again, the simulated and measured values are in close agreement. Note the subtle difference between the positive and negative first peak interferers. The simulation in this case used a zero DC offset on the decision thresholds. In the measured results slight DC offsets are present and give rise to the noticeable differences between the positive and negative cases.

Component Degradations
In the course of the measurement programme other system defects and degradations were also identified. Referring to Fig 3, the LED in the optical fibre transmission system was driven in a linear region, whilst in Fig 4 it was driven in a non-linear region. In the linear case the mean metric increases as each positive and negative half cycle is added. However, with non-linear operation the mean metric increases during negative half cycles and shows a negligible increase during positive half cycles of the interferer.

Monte Carlo Simulation
The results from 1000 simulation runs for the decaying sine wave interferer (1) are shown in Fig 5. These illustrate the mean metric variation, with a clustering in the region of 70 bits. The results are expanded in Fig 6, with decay time constant included as an additional axis. The metric trend is similar to that of Fig 1, with a near linear increase between 200 and 500 bits. Beyond this range little change is seen in the mean metric.
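A Monte Carlo experiment of this general kind can be sketched in Python as follows. Each run draws a random error pattern from a set of per-bit error probabilities and computes a mean metric for it. The probability profile, the seed and the metric definition (the mean separation in bits between pairs of errors in a burst) are assumptions for illustration and are not taken from the simulation reported here.

```python
import random


def mean_metric(errors):
    """Mean separation in bits between error pairs (assumed metric)."""
    positions = [i for i, e in enumerate(errors) if e]
    lags = [b - a for i, a in enumerate(positions) for b in positions[i + 1:]]
    return sum(lags) / len(lags) if lags else 0.0


def monte_carlo(probs, runs, seed=0):
    """Draw `runs` random error patterns and return the metric of each."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    results = []
    for _ in range(runs):
        errors = [1 if rng.random() < p else 0 for p in probs]
        results.append(mean_metric(errors))
    return results


# Hypothetical per-bit error probabilities decaying over 321 bits.
probs = [0.5 * 0.98 ** t for t in range(321)]
metrics = monte_carlo(probs, runs=1000)
average = sum(metrics) / len(metrics)
```

A histogram of `metrics` shows how tightly the metric clusters for a given interferer, and repeating the experiment over a range of decay rates gives the second dimension of a surface plot.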

Conclusion
A new performance monitoring strategy, based on forecasting and postulating failure, has been demonstrated with HDB3 line coding. Metrics enabling a practical implementation have been described, and their ability to detect changes in external interference and system degradations has also been demonstrated. The measured results indicate that the metrics for bit and code errors vary in a similar, almost identical, manner for the given parameters. Using these metrics it would appear feasible to predict some system failure types and perhaps prompt pre-emptive maintenance action.

References
1 BUTLER R A and COCHRANE P: 'Correlation of Interference and Bit Error Activity in a Digital Transmission System', IEE Electronics Letters, Vol 26, No.6, March 90, pp 363-364.
2 COCHRANE P, HALL R D, MOSS J P, BETTS R A and BICKERS L: 'Local-Line Single Mode Optics - Viable Options for Today and Tomorrow', IEEE Jnl SAC-4/9, Special Issue on Fibre Optic Systems for Terrestrial Applications, Dec 86, pp 1396-1403.
3 CCITT Recommendation G821: 'Error Performance of an International Digital Connection Forming Part of an Integrated Services Digital Network', Blue Book, 1988.
4 BUTLER R A and COCHRANE P: 'The Use of Correlation of Error Activity as a Diagnostic Tool', IEE 3rd Bangor Symposium on Communication, UCNW, Bangor, North Wales, May 91, pp 214-218.