Ind. Eng. Chem. Res. 1998, 37, 1024-1032
Sensor Fault Detection via Multiscale Analysis and Nonparametric Statistical Inference

Rongfu Luo, Manish Misra, S. Joe Qin, Randall Barton, and David M. Himmelblau*

Department of Chemical Engineering, The University of Texas at Austin, Austin, Texas 78712

* To whom correspondence should be addressed. E-mail: [email protected]. Telephone: (512) 471-7445. Fax: (512) 471-7060.
Sensor validation is a topic of widespread importance. A new approach to sensor validation in real time is described that is based on (1) representation of the sensor signal by wavelets, (2) decomposition of the signal into different frequency ranges, (3) calculation of useful features of the signal at different frequencies, and (4) diagnosis of faulty operation via nonparametric statistical tests. The proposed strategy is able to isolate the effects of noise and process changes from the effects of physical changes in the sensor itself. To clarify the circumstances under which the above strategy could be used, a noisy signal from a simulated thermocouple in a dynamic, nonlinear, unsteady-state continuous stirred tank reactor (CSTR) was analyzed. Faults were introduced into the thermocouple, and the diagnosis was carried out. The results of the diagnosis indicated that the proposed strategy had low type I (false alarm) and type II (failure to detect faults) errors and was distinctly better than a standard test for changes in a nonstationary signal of unknown characteristics.

1. Introduction

Sensor validation in the broadest sense is related to reliability, namely, determination that a sensor is or is not providing a correct signal. We would especially like to be able to distinguish between instances in which a faulty sensor affects the transmitted signal and instances in which a process change affects the signal at the same time. Validating a sensor signal means proving, and documenting the proof, that the sensor (or instrument) consistently does what it purports to do. That is, the sensor must be shown to consistently provide the correct temperature, pressure, and so forth. Analysis by the validation hardware or software should provide an alarm if the sensor signal deviates from the correct value. A human can then decide whether to take the sensor off line, adjust the signal, or take some other action.

The goals in analyzing a sensor signal are as follows: (1) to minimize type I errors, that is, false alarms, and type II errors, that is, inability to detect an invalid signal when one occurs; (2) to provide a rapid decision as to the existence of a faulty condition when it occurs; and possibly (3) to be able to predict deterioration in the sensor in advance of significant change.

How can the sensor signal from a dynamic process be used to validate the sensor? Extensive literature and several textbooks exist in which suggestions are offered as to how faulty sensors can be detected. By faulty we mean degradation from normal operating performance rather than failure (100% degradation). Over 3000 articles have been published in the last 20 years that focus solely on sensor validation (exclusive of those pertaining to fault detection). Various classes of strategy exist: (1) knowledge-based methods; (2) reasonableness checks; (3) redundant instrumentation; (4) cluster analysis; (5) modeling sensors; (6) modeling sensors plus the process to obtain soft sensor values; (7) pattern recognition and signature analysis; (8) qualitative reasoning; (9) statistical analysis; (10) parametric and nonparametric modeling of an individual sensor; (11) parametric and nonparametric modeling of a process including the sensor. Our proposed strategy is a combination of nonparametric empirical modeling and nonparametric statistical analysis of a single sensor signal. We focus on fault detection and not on fault isolation, that is, determining the cause of the fault.

The specific approach taken in this work was to represent a nonergodic, nonstationary sensor signal in real time by wavelet basis functions. Then the signal is decomposed (as a function of time) into successively lower frequency ranges. To detect faulty operation of the sensor, various features of the representation of the signal for each frequency range were selected, including zero crossings and local energy. The latter proved to be the most useful for the example described in section 7 below. High-frequency information was tested and classified as noise. Low-frequency information was presumed to represent the effect of process dynamics on the sensor, an effect that could be confounded with sensor malfunctions and consequently was disregarded. For our example we assumed the duration of a process change was much longer than that of a sensor change. What was left to analyze were the middle frequency ranges. No assumptions were made about the probability distribution of the noisy sensor signal prior to decomposition. Because the probability distribution of the feature selected for analysis was unknown, nonparametric statistical tests were used to determine whether or not a fault existed in the operation of the sensor.

In what follows we first describe the specific technique used to represent a signal and decompose it. Next we explain the tests employed to apply the validation strategy. Finally, the application of the validation strategy to an example, namely, a thermocouple in a nonisothermal CSTR, demonstrates the effectiveness of the proposed sensor validation method. In contrast, the well-regarded BDS statistic, which was proposed by Brock and Dechert (1988) and Scheinkman
S0888-5885(97)00465-X CCC: $15.00 © 1998 American Chemical Society Published on Web 01/23/1998
(1990) and used to detect a change in a nonstationary time series of unknown characteristics, failed to detect the sensor fault.

2. Wavelet Representation

The fundamental idea behind wavelets is to analyze according to scale. From an intuitive point of view, the wavelet decomposition consists of calculating a "similarity" between the signal and the wavelet. If the similarity index (the coefficients) is large, the resemblance is strong; otherwise it is slight. The wavelets used in this work provided a multiscale analysis of the signal as a sum of orthogonal signals corresponding to different time scales, allowing a time-scale analysis. Graps (1995), for example, provided an introduction to wavelet transforms. A wavelet is a function ψ(x) ∈ L² such that
∫_{−∞}^{+∞} ψ(x) dx = 0    (1)
Let us denote by ψ_s(x) the dilation of ψ(x) by a factor s:

ψ_s(x) = (1/s) ψ(x/s)    (2)
The wavelet transform of a function f(x) at the scale s and time x is given by the convolution product:
W_s f(x) = f * ψ_s(x)    (3)
The scale parameter can be sampled along the dyadic sequence (2^j), j ∈ Z, without modifying the overall properties of the transform. The wavelet transform at the scale 2^j is given by
W_{2^j} f(x) = f * ψ_{2^j}(x)    (4)
In this work, we used two types of Daubechies wavelets, db1 and db3 (Misiti et al., 1995); in dbi, i denotes the order. This family includes the Haar wavelet (the same as db1), one of the simplest wavelets. The main properties of the Haar wavelet are as follows:
ψ(x) = { 1 if 0 ≤ x < 1/2;  −1 if 1/2 ≤ x < 1;  0 if x ∉ [0, 1] }

and the corresponding scaling function

φ(x) = { 1 if x ∈ [0, 1];  0 if x ∉ [0, 1] }
The scaling function φ defines the low-frequency components, that is, the approximations, while ψ defines the high-frequency components, that is, the details. The other dbi have no explicit expression; however, the square modulus of the transfer function of the scaling function is explicit and fairly simple: (1) The support length of ψ and φ is 2i − 1, and the number of vanishing moments of ψ is i. (2) Most dbi are not symmetrical; for some, the asymmetry is very pronounced. (3) The regularity, which is related to the differentiability, increases with the order. (4) The decomposition analysis is orthogonal. (5) The scaling filter coefficients for db3 are 0.2352, 0.5706, 0.3252, −0.0955, −0.0604, and 0.0249.
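To make the multiscale decomposition concrete, the following is a minimal sketch in Python using the PyWavelets package (an assumption of convenience; the authors worked with the MATLAB Wavelet Toolbox of Misiti et al., 1995). The test signal is an arbitrary stand-in, not the CSTR thermocouple signal of the example:

```python
import numpy as np
import pywt

# Arbitrary stand-in signal: a slow oscillation plus white noise.
n = 1024
t = np.linspace(0.0, 1.0, n)
s = np.sin(2 * np.pi * 3 * t) + 0.1 * np.random.randn(n)

# Orthogonal decomposition with db3 over five dyadic scales.
# wavedec returns [a5, d5, d4, d3, d2, d1]: one low-frequency
# approximation and one detail (high-frequency) band per level.
coeffs = pywt.wavedec(s, 'db3', level=5)
approximation, details = coeffs[0], coeffs[1:]
for level, d in zip(range(5, 0, -1), details):
    print(f"detail level {level}: {d.size} coefficients")
```

The detail bands d1 (finest) through d5 (coarsest) correspond to the successively lower frequency ranges analyzed in the sections that follow.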
3. Denoising

Tests using zero crossings showed that the frequency distributions of the zero crossings at the highest frequency under faulty and nonfaulty sensor conditions were essentially the same. Consequently, we deemed the highest frequency components to be noise and examined how to remove the noise from the sensor signal prior to decomposition.

It is generally impossible to filter out all of the noise in a signal without affecting the information in the signal. The conventional method for denoising is filtering. Typically, a filter removes information over a range of frequencies that is determined by the filter parameter that characterizes the time-frequency localization of the filter. However, a single filter parameter is inadequate to bring out all of the features constituting a signal. Most filters have an increasingly global effect on the features with increasing scale, causing distortion of features through smoothing and hence confusion in interpreting the underlying phenomenon (Stephanopoulos, 1991; Bakshi and Stephanopoulos, 1994). In this work, wavelet transforms were used to denoise signals with the result that the distortion was minimal (i.e., most interesting features remained). The underlying model for the noisy signal is
s(n) = f(n) + σe(n)    (5)
where n is the equally spaced time index, s is the measured signal, f is the deterministic signal, e is the noise, and σ is the noise magnitude. The simplest model assumes that e(n) is Gaussian white noise N(0,1). The denoising objective is to suppress the noise part of the signal s and to recover f.

3.1. Soft and Hard Thresholding. Denoising was performed through thresholding. Hard thresholding is the simplest method: the elements whose absolute values are lower than the threshold are set to zero. If the original signal is f = x, the resulting estimate of f from hard thresholding is
f̂ = { x if |x| > ts;  0 if |x| ≤ ts }    (6)
where ts is the threshold and f̂ is the estimate of f. Soft thresholding is a modification of hard thresholding: the elements whose absolute values are lower than the threshold are first set to zero, and the remaining coefficients are then shrunk toward 0. For example, if the original signal is f = x, the thresholded signal is
f̂ = { sign(x)(|x| − ts) if |x| > ts;  0 if |x| ≤ ts }    (7)
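A small numerical illustration of the two rules in eqs 6 and 7, applied to the ramp f = x of Figure 1 (Python; pywt.threshold implements both modes, so the explicit formulas above need not be coded by hand):

```python
import numpy as np
import pywt

x = np.linspace(-1.0, 1.0, 9)    # the signal f = x from Figure 1
ts = 0.4                         # threshold, as in Figure 1

hard = pywt.threshold(x, ts, mode='hard')   # eq 6
soft = pywt.threshold(x, ts, mode='soft')   # eq 7
for xi, h, sft in zip(x, hard, soft):
    print(f"{xi:+.2f}  hard: {h:+.2f}  soft: {sft:+.2f}")
# Hard thresholding jumps from 0 to values just beyond +/-ts at |x| = ts;
# soft thresholding rises continuously from 0.
```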
As can be seen from Figure 1, the hard thresholding procedure creates jumps at x = ±ts, while the soft procedure does not, so the latter had better mathematical properties for our work, and we used it.

Figure 1. Hard and soft thresholding of the signal f = x (the threshold is ts = 0.4).

3.2. Denoising Procedure. It has been shown that shrinking noisy wavelet coefficients via thresholding offers a very attractive alternative to existing methods of recovering signals from noisy data (Donoho, 1993). The procedure comprises three steps (Misiti et al., 1995); a sketch of the steps in code is given at the end of this section.
(1) Decompose. Choose a wavelet and a number N. Compute the wavelet decomposition of the signal s at levels 1 through N (corresponding to the highest through the lowest frequency ranges).
(2) Threshold the detail coefficients. For each level from 1 to N, select a threshold (cutoff limit) and apply soft thresholding to the detail coefficients.
(3) Reconstruct. Compute the wavelet reconstruction on the basis of the original approximation coefficients of level N and the modified detail coefficients of levels 1 to N.
We adopted Donoho's method based on the smallest risk for threshold selection (Donoho, 1993; Donoho and Johnstone, 1992):

ts = σ √(2 log(n)) / √n    (8)

where ts is the threshold, σ is the noise level, and n is the number of data points in the signal. If we assume that the detail coefficients at the finest scale are essentially noise coefficients with standard deviation equal to σ, the median absolute deviation of these coefficients is a robust estimate of σ. If nonwhite noise is expected, the threshold must be rescaled by a level-dependent estimate of the noise; that is, σ should be estimated level by level (Donoho, 1993; Misiti et al., 1995). This work used level-dependent thresholds to deal with nonwhite noise.

The denoising procedure described above has two useful properties (Donoho, 1993): (1) the noise can be almost entirely suppressed, and (2) features that are sharp in the original signal remain sharp in the reconstruction.
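The sketch below assembles the three steps with soft thresholding and a level-by-level MAD estimate of σ (Python with PyWavelets, again an assumption of convenience). Note that the code uses the common unnormalized universal threshold σ√(2 log n); the additional 1/√n factor in eq 8 depends on how the signal is scaled:

```python
import numpy as np
import pywt

def wavelet_denoise(s, wavelet='db3', level=5):
    """Denoise s: decompose, soft-threshold the details, reconstruct."""
    n = len(s)
    # (1) Decompose into approximation (level N) and details (levels 1..N).
    coeffs = pywt.wavedec(s, wavelet, level=level)
    approximation, details = coeffs[0], coeffs[1:]
    thresholded = [approximation]          # approximation is kept as-is
    # (2) Soft-threshold each detail band, estimating sigma level by level
    #     (the level-dependent scheme for possibly nonwhite noise).
    for d in details:
        sigma = np.median(np.abs(d)) / 0.6745    # robust MAD estimate
        ts = sigma * np.sqrt(2.0 * np.log(n))    # universal threshold
        thresholded.append(pywt.threshold(d, ts, mode='soft'))
    # (3) Reconstruct from the original approximation and modified details.
    return pywt.waverec(thresholded, wavelet)
```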
4. Features Used in Sensor Signal Validation

To determine whether or not a sensor is malfunctioning, either the sensor signal itself or features extracted from the decomposition of the signal can be used. For example, a change in the distribution of the noise inherent in the signal might be a useful feature for discriminating between faulty and nonfaulty signals. However, in our example the change was so small that the power for discrimination was too low to be effective. Zero crossings and level crossings at high frequencies of the denoised signal have been considered useful for detecting changes in a signal (Mallat, 1991; Kedem, 1994) and have been reported to be effective in detecting sensor faults such as constant readings (Luo et al., 1996). However, these features were not adequate for this work because they reflected only a change in the number of oscillations, not a change in the magnitude of the denoised signal. To detect the change in amplitude at the higher frequencies of a signal, we employed the local energy, which reflects the difference in the values of the peak power spectrum density. The local energy (the energy at each time point) of a signal f(x) can be calculated as

E(x) = f²(x)    (9)

The rationale underlying this choice is as follows. It is known that the output of a wide-sense stationary (WSS) process through a linear time-invariant (LTI) system is still wide-sense stationary. The high frequencies of a signal obtained through wavelet transformation are presumed to be zero-mean. Therefore, the following statements are true (Chen and Kuo, 1996):

(1) The autocorrelation R[m] and the power spectrum density (PSD) S(ω) are a Fourier transform pair:

S(ω) = Σ_{m=−∞}^{∞} R[m] e^{−jmω}    (10)
and
R[m] = (1/2π) ∫_{−π}^{π} S(ω) e^{jmω} dω    (11)
(2) The autocorrelation functions of the input and output are connected by
R_y[m] = R_x[m] * F[m]    (12)
where ∞
F[m] )
h[m + k]h[k] ∑ -∞
(13)
is the mth lag correlation of the deterministic filter h. (3) The power spectrum densities of the input and output are related via
S_y(ω) = S_x(ω) |H(e^{jω})|²    (14)
where H(e^{jω}) is the Fourier transform of the filter. By setting m = 0, we have
R[0] = (1/2π) ∫_{−π}^{π} S(ω) dω    (15)
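As a numerical sanity check of eqs 14 and 15 (a sketch with an arbitrary FIR filter and sequence length, not anything from the paper), one can pass white noise, whose PSD is flat at S_x(ω) = 1, through a filter h and compare the time-average output power R_y[0] with (1/2π)∫S_y(ω) dω:

```python
import numpy as np
from scipy.signal import lfilter, freqz

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)       # WSS input: white noise, Sx(w) = 1
h = np.array([0.25, 0.5, 0.25])        # arbitrary LTI (FIR) filter
y = lfilter(h, 1.0, x)                 # filter output

Ry0_time = np.mean(y ** 2)             # time-average estimate of Ry[0]

w, H = freqz(h, worN=8192, whole=True) # H(e^{jw}) sampled on [0, 2pi)
Sy = np.abs(H) ** 2                    # eq 14 with Sx(w) = 1
Ry0_freq = np.mean(Sy)                 # (1/2pi) * integral over one period

print(Ry0_time, Ry0_freq)              # both close to sum(h**2) = 0.375
```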
Since f²(x) can be viewed as an approximation of R[0] at each time point, the local energy can be viewed as an approximation, in a local region, of the total power, which is the area under the PSD function. The local energy was used by Chen and Kuo (1996) to detect changes by examining the low-frequency part of a signal. In this work, we used the local energy at the high frequencies of the denoised signal as the feature to be tested for a change in the performance of the sensor.

5. Nonparametric Statistical Tests

Because the probability distribution of the local energy was not known, two nonparametric tests were used to detect changes of the local energy in time.

5.1. Wilcoxon Rank Sum Test. The Wilcoxon rank sum test is a standard order-based test used to detect a change in a statistic (Mendenhall and Sincich, 1992). A summary of the Wilcoxon rank sum test for independent random samples is shown in Table 1. Note that the form of the distribution functions is not specified; the hypothesis simply specifies that the two distributions are identical.
Table 1. Wilcoxon Rank Sum Test for a Shift in Population Locations of Independent Random Samples^a

Let D1 and D2 represent the relative frequency distributions for populations 1 and 2, with sizes n1 and n2 and rank sums T1 and T2, respectively.

One-tailed test
  H0: D1 and D2 are identical
  Ha: D1 is shifted to the right (or left) of D2
  Test statistic: T1 if n1 < n2; T2 if n2 < n1 (either rank sum can be used if n1 = n2)
  Rejection region: T1 ≥ TU for a shift to the right, or T1 ≤ TL for a shift to the left (using T2, the inequalities reverse: T2 ≤ TL or T2 ≥ TU)

Two-tailed test
  H0: D1 and D2 are identical
  Ha: D1 is shifted either to the left or to the right of D2
  Test statistic: T1 if n1 < n2; T2 if n2 < n1 (if n1 = n2, either rank sum can be used; denote it T)
  Rejection region: T ≥ TU or T ≤ TL

^a Tied observations (if they occur) are assigned ranks equal to the average of the ranks of the tied observations. For example, if the second- and third-ranked observations were tied, each would be assigned the rank 2.5.

The lower-tail value TL and the upper-tail value TU of the rank sum distribution were taken from Mendenhall and Sincich (1992).
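To show how the pieces combine, here is a hedged end-to-end sketch: the local energy (eq 9) of a detail band is computed for a reference window and a test window, and the two samples are compared with a rank sum test. SciPy's ranksums uses the large-sample normal approximation rather than the TL/TU tables cited above, and the injected fault is synthetic, not the paper's CSTR example:

```python
import numpy as np
import pywt
from scipy.stats import ranksums

rng = np.random.default_rng(1)
n = 2048
s = np.sin(2 * np.pi * np.arange(n) / 512) + 0.05 * rng.standard_normal(n)
s[n // 2:] += 0.2 * rng.standard_normal(n // 2)  # synthetic sensor fault:
                                                 # added high-frequency power

coeffs = pywt.wavedec(s, 'db3', level=5)
d2 = coeffs[-2]            # a mid-frequency detail band (level 2)
E = d2 ** 2                # local energy, eq 9

ref, test = E[: E.size // 2], E[E.size // 2:]    # before vs after windows
stat, p = ranksums(ref, test)
print(f"rank sum z = {stat:.2f}, p-value = {p:.3g}")  # small p flags a shift
```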
5.2. Trend Test. A test for a difference in trend, or rate of change, can be used to detect faulty operation of sensors, particularly spike changes of variables, that is, outliers (Tham and Parr, 1994). Suppose that we have collected a sequence of data points {x_k, k = 1, ..., n}, where n is even. If there is a trend, then we would expect the median of the first half of the original sequence, X1 = {x_1, x_2, ..., x_{n/2}}, to be different from that of the second half, X2 = {x_{n/2+1}, x_{n/2+2}, ..., x_n}. The following test is made to assess whether x_k is an outlier (Tham and Parr, 1994): if
|x_k − x_k^m| > δΔ_k^m,  δ > 0    (16)

then x_k is declared an outlier, where x_k^m is the median of the signal sequence x_{k−i}, i = 1, 2, ..., n.

The BDS statistic mentioned in the introduction is built on the correlation integral

C_M(l, L) = [1/(L_M(L_M − 1))] Σ_{t≠s} I_l(x_t^M, x_s^M)    (17)

where x_t^M = (x_t, x_{t+1}, ..., x_{t+M−1}) and x_s^M = (x_s, x_{s+1}, ..., x_{s+M−1}) are called "M-histories", I_l(x_t^M, x_s^M) is an indicator that equals 1 if ||x_t^M − x_s^M|| < l and zero otherwise, ||·|| being the sup-norm, and L_M = L − M + 1. The correlation integral is an estimate of the probability that any two M-histories, x_t^M and x_s^M, are within l of each other. If {x_k: k = 1, ..., L} are iid with a nondegenerate density F, then

C_M(l, L) → C_1(l, L)^M  with probability one, as L → ∞    (18)
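For reference, a brute-force sketch of the correlation integral of eq 17 (Python; O(L²) in time and memory, so suitable only for short series; the embedding dimension M and tolerance l below are illustrative choices):

```python
import numpy as np

def correlation_integral(x, M, l):
    """C_M(l, L): fraction of ordered pairs of M-histories within l (sup-norm)."""
    LM = len(x) - M + 1
    # All M-histories x_t^M = (x_t, ..., x_{t+M-1}) as rows of a matrix.
    hist = np.lib.stride_tricks.sliding_window_view(x, M)
    # Sup-norm distance between every pair of histories.
    dist = np.max(np.abs(hist[:, None, :] - hist[None, :, :]), axis=-1)
    close = np.count_nonzero(dist < l) - LM   # drop the t == s diagonal
    return close / (LM * (LM - 1))

x = np.random.default_rng(2).standard_normal(500)
for M in (1, 2, 3):
    print(M, correlation_integral(x, M, l=0.5))
# For iid data, C_M should be close to C_1 ** M, as eq 18 states.
```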