Sensor Fault Detection via Multiscale Analysis of ... - ACS Publications

Jun 13, 2003 - frequency, and (5) diagnosis of the existence of faults via tests on the residuals. The proposed strategy is able to eliminate the effe...
0 downloads 0 Views 236KB Size
3372

Ind. Eng. Chem. Res. 2003, 42, 3372-3380

Sensor Fault Detection via Multiscale Analysis of Prediction Model Residuals Rongfu Luo, Manish Misra, Tyler Soderstrom, and David M. Himmelblau* Chemical Engineering Department, The University of Texas at Austin, Austin, Texas 78712

In this paper, a new approach to sensor validation in real time is described that is based on (1) representation of the sensor signal by wavelets, (2) decomposition of the signal for different frequency ranges, (3) formation of a prediction model using information at an appropriate frequency (level of decomposition), (4) calculation of residuals, namely, the difference between the predictions of the prediction model and the values of the signal decomposed at the same frequency, and (5) diagnosis of the existence of faults via tests on the residuals. The proposed strategy is able to eliminate the effect of noise and process changes from the effects of physical changes in the sensor itself. To demonstrate the effectiveness of the proposed strategy, a simulated noisy signal from a thermocouple in a continuous stirred tank reactor and a temperature signal from an operating pilot distillation column were analyzed. The results of the diagnosis indicated that the proposed strategy had a low type I (false alarm) rate and a satisfactory type II (failure to detect faults) rate. 1. Introduction Sensor validation refers to determination of whether a sensor is providing a correct signal. This topic has long been a subject of interest, and over the past few years with the advent of faster data sampling methods and computer-based data analysis, a vast literature has evolved describing sensor validation techniques. Research into sensor validation alone without considering fault isolation has been the subject of over 3000 articles in the last 2 decades. Effective sensor validation should be able to discriminate between cases in which (a) a process change occurs that causes a sensor reading to change (with a valid sensor), (b) a sensor reading changes, but the process does not (a faulty sensor), and (c) a sensor fault and a temporal process change occur simultaneously in a dynamic process. In this work, our focus is on category c, namely, identifying the occurrence of a sensor fault in which the effect of the fault is confounded with the process dynamics. The goals in analyzing a sensor signal are (1) to minimize type I errors, i.e., false alarms, and type II errors, i.e., inability to detect an invalid signal when one occurs; (2) to provide a rapid decision as to the existence of a faulty condition when it occurs; and, possibly, (3) to be able to predict deterioration in the sensor in advance of a significant change.1 The recent success of wavelets and multiscale analysis in monitoring and control has also catalyzed interest in the investigation of wavelet-based methods for sensor validation. Wavelet transforms are a class of mathematical transforms similar to Fourier transforms but are distinguished from other transforms in that they not only dissect a signal into its component frequencies but they also vary the scale at which the component frequencies are analyzed. Furthermore, they have finite lengths; they do not repeat as do sines and cosines. * To whom correspondence should be addressed. Tel.: (512) 471-7445. Fax: (512) 471-7060. E-mail: himmeblau@ che.utexas.edu.

An introduction to the theory and applications of wavelets can be found in work by Daubechies (1992),2 Strang and Nguyen (1996),3 and Percival and Walden (2000).4 The theory of wavelet transforms is based on the concept that a signal can be decomposed into components in nested closed subspaces at multiple resolutions. A signal in space Vj or Wj is broken down into two orthogonal components. One component lies in a subspace Vj-1 called the coarse approximation. The other component lies in a subspace Wj-1 called the fine detail of the signal. The union of Vj-1 and Wj-1 gives the original signal in Vj. The subspace Vj-1 can be further decomposed into Vj-2 and Wj-2 and so on, as indicated in Figure 1, until a satisfactory coarse approximation is obtained. Furthermore, by recombination of the approximations and details of the signal components, the exact original signal can be reconstituted. The resolution in subspace Vj is specified by defining the basis functions in Vj and Wj in terms of a scaling function (father wavelet) φ(t) and a wavelet function (mother wavelet) φ(t)

φ(t) ) x2

∑n anφ(2t - n)

φ(t) ) x2

∑n bnφ(2t - n)

where an and bn are the filter coefficients for low-pass and high-pass filters for the approximations and details, respectively. We chose Haar wavelets as the basis functions for our work because they are simple and require only linear operations in making calculations. For the Haar wavelets, the filter coefficients are a0 ) a1 ) b0 ) -b1 ) 1/x2. The scaling function φ(t) is a box function (Figure 2a), and the wavelet function φ(t) is a square wave (Figure 2b), both in the interval 0-1. The procedure to calculate the coefficients aj-1 and bj-1 from the respec-

10.1021/ie010644v CCC: $25.00 © 2003 American Chemical Society Published on Web 06/13/2003

Ind. Eng. Chem. Res., Vol. 42, No. 14, 2003 3373

Figure 1. Multiresolution analysis. (a) The space VJ can be decomposed into two orthogonal subspaces, VJ-1 and WJ-1. VJ-1 can further be decomposed into VJ-2 and WJ-2. This process can be continued until sufficiently coarse subspaces are obtained. (b) The subspace VJ-1 is contained in the subspace VJ. The remaining part of VJ apart from VJ-1 is the orthogonal part WJ-1. Thus, VJ ) VJ-1 + WJ-1. This relation is true for all J.

tive coefficients aj and bj for the Haar wavelets is

aj-1,n )

1 (aj,2n + aj,2n+1) x2

bj-1,n )

1 (aj,2n - aj,2n+1) x2

Figure 3 is an information flow diagram that explains how a signal is decomposed into approximations and

details using Haar wavelets for four successive data points. The tree shows how different levels of decomposition are related. The approximations are designated by A and the details by B. The highest resolution for the approximations occurs at the top of the tree, and the resolution decreases the lower you go. The procedure can continue as long as fresh data points are collected until the resolution of the approximations reaches a satisfactory level. The use of wavelets for fault detection started about a decade ago when software for PCs became widely available. In chemical process monitoring and control, Bakshi and Stephanopoulos (1994)5 developed a methodology for pattern-based fault diagnosis based on multiscale extraction of trends from the wavelet decomposition of process data. Dai et al. (1994)6 proposed a wavelet-based method for process signal feature analysis and for using the process “fingerprints” for fault detection. In the work of Vedam and Venkatasubramanian (1997),7 a wavelet theory based on a nonlinear adaptive algorithm was developed for identification of trends from sensor data. To perform diagnosis using the identification trends, they developed a framework for knowledge-based fault detection. Bakshi (1998)8 used multiscale principal component analysis (PCA) for process monitoring of multivariate data. In our earlier work, we showed (1) that wavelet-based nonparametric statistical inference1 and (2) dynamic PCA9 were both able to identify sensor faults and keep the type I and II errors to a minimum. In this work, a different technique was used but was still based on multiscale decomposition of the data rather than analysis of the data itself. Residual analysis was applied to detect the sensor faults using information from the lowest appropriate frequency of the wavelet decomposition of a signal. Although residual analysis is a popular method of fault analysis that uses discrepancies between measurements and the predictions of a model, not much research

Figure 2. Haar wavelets: (A) the scaling function (father wavelet); (B) the wavelet function (mother wavelet) for the Haar wavelet.

3374

Ind. Eng. Chem. Res., Vol. 42, No. 14, 2003

Third, SPC uses noisy data. Our approximations were denoised. Finally, our procedure focused only on rapid detection of abnormal operation of a sensor and was not concerned with subsequent data after an abnormality was detected. In what follows, we first describe the specific technique used to generate and analyze the residuals. Next, we explain why we used the approximations to detect the sensor faults. Finally, to demonstrate the validity of the strategy, we discuss two examples: one a simulated thermocouple measuring the temperature in a simulated nonisothermal continuous stirred tank reactor (CSTR)1 and the other an actual thermocouple measuring the temperature in a tray in a pilot-plant distillation column. 2. Developing the Multiscale Prediction Model

Figure 3. Tree representing the decomposition of the sensor signal by using Haar wavelets: (a) arrival of the first point (A0,1); (b) arrival of the second point (A0,2) allows calculation of the approximation (A1,1) and the detail (B1,1) on level 1; (c) arrival of the third point (A0,3); (d) arrival of the fourth point (A0,4) allows calculation of the approximation (A2,1) and the detail (B2,1) on level 2. The first subscript is the index of level, and the second subscript is the index of the time steps on the same level. The solid lines represent the calculation of the approximations, and the dashed lines represent the calculation of the details.

has focused on modeling the low-frequency components of a signal and calculating residuals at this frequency. We employed the approximations of the original data with the hope that a model developed using the approximations will be less affected by noise and erratic disturbances than would be the original signal itself. High-frequency information was tested, classified as noise, and removed from the data in the signal. Lower frequency information was presumed to represent the effect of process dynamics on the sensor and contain the information useful to predict sensor operation. To detect faulty operation of a sensor, residuals between predictions generated from a dynamic model of the approximations and the approximations themselves in a future window were calculated. Although superficially somewhat similar to ordinary statistical process control (SPC), our proposed procedure differs in several essential ways. First, SPC uses data from the signal itself. Our procedure uses approximations from the decomposed signal as data. Second, SPC assumes that a steady-state process model applies to the data. We use an unsteady-state model to predict values of the future approximations and generate residuals for approximations based on that model.

Figure 4. Information flow to calculate the residuals.

If a prediction model represents the signal being modeled from a sensor with reasonable accuracy, any discrepancies between observations and predictions (the residuals) can give some information about faults in the sensor. The steps we used in developing a multiscale prediction model were as follows. Figure 4a shows the information flow. 1. Decompose the signal to be used for model identification into its wavelet approximations and details, as illustrated in Figure 3. (For our work using Haar wavelets, we chose level 4 as the lowest level of decomposition.) 2. Construct an autoregression (AR) model to model the wavelet approximations from information collected during t1 to tk-1 at the selected frequency. 3. Predict subsequent values of the approximations in a window tk to tn from the AR model. 4. Decompose the sensor signal during the period tk to tn using the same mother wavelet and the level of decomposition as used in developing the AR model. 5. Compute the residuals as the differences between the predicted approximations at the selected level of decomposition and the approximations of the signal itself at the same level in the window tk to tn at each measurement time. After some investigation for this work, we chose the wavelet approximation at level 1 in the Matlab Wavelet Toolbox10 (corresponding to 0.65-1.0 Hz) as the most effective to model the decomposed signal. Other levels of decomposition may be more favorable for other sensors in other processes. Several kinds of models might have been used for prediction, such as empirical models, neural nets, state space models, etc. We chose a simple one, namely, an AR model, in which the parameters appeared linearly. The attractive features of an AR model include the ease of use, faithfulness in representing a process, and linearity with respect to the coefficients.

Ind. Eng. Chem. Res., Vol. 42, No. 14, 2003 3375

Figure 6. Autocorrelation plot for the residuals of the approximations at level 1.

Figure 5. Autocorrelation plot for the original data.

For a scalar signal y(t), the AR model was

y(t) + a1y(t-T) + a2y(t-2T) + ... + any(t-nT) ) e(t) (1) where ai is a model parameter, T is the sampling interval, n is the model order (in this work an order of 3 proved to be best), and e(t) is the measurement noise. The objective function used in estimating the coefficients in the model was the mean squared error, and the number of points used was 100. A residual was defined as

r(t) ) |y(t) - yˆ (t)|

(2)

where yˆ (t) is the value of the wavelet approximation at level 1 predicted for the interval tk to tn from eq 1 after identification and y(t) is the wavelet approximation of the signal for the same period. 3. Fault Detection by Analysis of Residuals In our previous work,1 we showed that details of denoised sensor signals in intermediate frequencies were better represented by a normal distribution than the data from the original signal itself. To apply normal confidence limits to detect faults by using the residuals, we need to show that the residuals in the two examples discussed below in section 4 were (a) independent and (b) approximately normally distributed. 3.1. Independence of the Residuals of the Approximations. A primary concern with using residuals of the approximations for fault detection is whether they are independent. We analyzed the data collected from the original sensor signal and determined, as expected, that the data were autocorrelated as shown in Figure 5. We also analyzed the residuals of the approximations and found that they were essentially not autocorrelated as shown in Figure 6. In general, the residuals of the approximations are more likely to be normally distributed than the sensor signal measurements themselves or residuals of sensor signals. As a fallback, if the residuals are not normally distributed, then nonparametric tests can be used. Tests can be applied to a single value of a wavelet approximation or to a window of approximations. 3.2. Quantile-Quantile Plot. Quantile plots11 are general-purpose displays that portray distribution fea-

Figure 7. Quantile-quantile plot for the normality test of residuals of the sensor signal itself from the pilot-plant distillation column tray.

tures of a data set. They are not only useful to illustrate the distribution, but they can also be used to assess the fidelity of a set of data to a hypothesized probability distribution. A quantile, denoted Q{f}, is a number that divides a sample or population into two groups so that a specified fraction f of the data values is less than or equal to the value of the quantile. Let x(1) e x(2) e ... e x(n) denote the ordered observations for a data set; the ith-ordered observation corresponds to Q{f} with f ) i/n. Quantile-quantile plots can be used to compare a sample distribution with a theoretical reference distribution such as the normal probability distribution. If the plotted points closely approximate a straight line, the sample is similar in shape to the reference distribution. The quantiles for the standard normal distribution were calculated from11

QSN{ fi} ) 4.91[fi0.14 - (1 - fi)0.14]

(3)

Figures 7-10 show the quantile-quantile plots for the wavelet approximations and the original data at level 1 for the two cases discussed in section 4, respectively. These figures indicated that the assumption of normality for the approximations at level 1 appears to be

3376

Ind. Eng. Chem. Res., Vol. 42, No. 14, 2003

Figure 8. Quantile-quantile plot for the normality test of the residuals of the wavelet approximation of the sensor signal from the pilot-plant distillation column tray.

Figure 10. Normal quantile-quantile plot for residuals of the wavelet approximations of the sensor signal from the simulated CSTR.

3. Compute the Shapiro-Wilk statistic

W ) b2/s2

Figure 9. Quantile-quantile plot for the normality test of the residuals of the original sensor signal from the simulated CSTR.

reasonable. Therefore, we could use tests based on the normal distribution to detect sensor faults. 3.3. Shapiro-Wilk Test. Because graphical evaluations are somewhat subjective, it is useful to supplement the plots with appropriate tests of statistical hypotheses. The Shapiro-Wilk test11 is generally regarded as the most powerful statistical test for normality. Note that the assumptions underlying the use of the Shapiro-Wilk statistic include independence of the observations. Let x(i) denote the ordered observations x(1) e x(2) e ... e x(n). 1. Compute the numerator of the sample variance of the observations: 2

s )

∑(x(i) - xj)

2

(4)

where xj is the mean of the observations. 2. Using the values of ai found in the coefficients table for the Shapiro-Wilk test, calculate

b)

∑aix(i)

(5)

(6)

4. Reject the assumption of normality if W is less than the value in the critical value table of the Shapiro-Wilk test. For the Shapiro-Wilk test, the two-sided region of rejection for P ) 0.95 (R ) 0.05) is W < 0.940 or W > 0.990; for P ) 0.99 (R ) 0.01), the region is W < 0.930 or W > 0.991. Table 1 shows the results of the ShapiroWilk test for the residuals of the sensor signal itself and for the residuals of the approximations for two cases: (a) a simulated CSTR and (b) a tray in a pilot-plant distillation column. For the approximations, the values of W are not statistically significant; hence, the hypothesis of normality is not rejected. For the sensor signal and the simulated CSTR data, the hypothesis of normality is rejected because W is significant for P ) 0.95 as well as P ) 0.99. The Shapiro-Wilk test confirms the decision made by examining the quantitile-quantitile plots. The residuals of the approximations at level 1 are essentially normally distributed; hence, confidence limits based on the normal distribution can be used in the detection of abnormal signals. 4. Two Examples of the Application of the Proposed Technique We now illustrate the proposed technique for sensor fault detection via two examples. 4.1. Analysis of a Simulated Temperature Signal from a CSTR. The CSTR used in this example is described in the authors’ previous publication.1 The state equations are

( ) [ ]

-EA dC(t) q ) [C0 - C(t)] - kC(t) exp dt V Tf(t)

(7)

dTf(t) q -EA ∆H ) [T0 - Tf(t)] kC(t) exp dt V FCp Tf(t) UAR [T (t) - Tc] (8) FCpV f

Ind. Eng. Chem. Res., Vol. 42, No. 14, 2003 3377 Table 1. Results of the Shapiro-Wilk Normality Testa Shapiro-Wilk statistic W

signal residuals of the sensor signal from the pilot-plant distillation column residuals of the wavelet approximations of the sensor signal from the pilot-plant distillation column residuals of the sensor signal from the simulated CSTR residuals of the wavelet approximations of the original sensor signal from the simulated CSTR a

0.994 0.987

rejected not rejected

0.990 0.971

marginally rejected not rejected

The hypothesis is that the residuals are normally distributed.

where the parameters are defined in the Nomenclature section. The following parameter values were used: k ) 7.86 × 1012 s-1; Cp ) 4.186 × 103 J/kg‚K; EA ) 14 090 K; F ) 1.0 × 103 kg/m3; AR ) 1.0 × 10-3 m2; V ) 1.0 × 10-3 m3; T0 ) 350 K; Tc ) 340 K; C0 ) 6.5 × 10-3 kg‚ mol/m3; ∆H ) -1.13 × 10-5 kJ/kg‚mol; q ) 1.0 × 10-5 m3/s. The steady state was as follows: Cd ) 1.537 × 10-4 kg‚mol/m3; Td ) 460.91 K; Ud ) 2.092 × 10-2 kJ/s‚m2‚ K. The initial state was at C(0) ) 3.531 × 10-4 kg‚mol/ m3 and T(0) ) 500 K. The thermocouple wire was copper-constantin of 1.0 mm diameter and 30 cm length. The well was stainless steel of 2.5 mm i.d., 5.0 mm o.d., and also 30 cm length. In developing the thermocouple model for the simulations, we assumed that a film resistance existed both inside and outside the thermowell and that there existed negligible heat-transfer resistance in the wall of the thermowell. An energy balance on a system consisting of the thermocouple wires was

MtCpt

dT ) hiAi(Tw - T) dt

(9)

and a balance on the thermowell wall was

MwCpw

dTw ) hoAo(Tf - Tw) - hiAi(Tw - T) dt

(10)

From eq 9,

Tw )

MtCpt dT +T hiAi dt

(11)

Substitution of eq 11 into eq 10 yielded a second-order differential equation that related the tank temperature Tf to the thermocouple temperature T:

[

conclusion of the test

] [

]

MtCptMwCpw MtCpt MtCpt MwCpw T+ + + T′+ hoAohiAi hiAi hoAo hoAo T ) Tf (12)

where Mt ) 4.27 × 10-3 kg is the mass of the thermocouple; Cpt ) 397.4 J/kg‚K is the heat capacity of the thermocouple; Mw ) 35.5 × 10-3 kg is the mass of the thermowell wall; Cpw ) 448 J/kg‚K is the heat capacity of the thermowell wall; Ao ) 4.8 × 10-3 m2 is the outside area of heat transfer of the thermowell; and Ai ) 2.4 m2 is the inside area of the thermowell. The outside heat-transfer coefficient was ho ) 2.59 × 103 kJ/s‚m2‚K calculated from a correlation for heating and cooling liquids flowing perpendicular to a single cylinder, and the inside heat-transfer coefficient was hi ) 2.59 × 103 kJ/s‚m2‚K (before the fault occurred). The fault represented air replacing fluid in the thermowell so that hi ) 4.43 × 10-2 kJ/s‚m2‚K after the fault occurred. The latter value of hi was calculated from a

Figure 11. Information flow diagram for the CSTR simulation: T0 is the input temperature to the CSTR, Tf is the CSTR output temperature and becomes the input to the thermocouple (sensor), and T is the simulated sensor output.

relation in Perry and Chilton.12 Upon substitution of these constants for the nonfaulty case, eq 12 becomes

2.04T + 16.1T ′ + T ) Tf

(13)

The information flow diagram for the calculations is shown in Figure 11. The input temperature to the CSTR is T0, and white Gaussian noise was added to it. Tf is the CSTR output temperature and became the input to the thermocouple (sensor). The simulated sensor output temperature had normally distributed measurement noise added to it and is denoted by T. The temperature was sampled by the thermocouple every second. Each moving fault detection window had 10 samples in it; the total length of record was 500 points; i.e., we used a series of 50 moving windows. The multiscale analysis of the temperature signal was carried out by decomposing the signal into four frequency levels using the Haar wavelet. The frequency range for level 1 is 0.65-1.0 Hz, and that for level 4 is