Ind. Eng. Chem. Res. 2000, 39, 396-407
Sensor Fault Detection Using Noise Analysis

Chao-Ming Ying and Babu Joseph*

Department of Chemical Engineering, Washington University, Campus Box 1198, St. Louis, Missouri 63130
The feasibility of sensor fault detection using noise analysis is evaluated. The noise powers in various frequency bands present in the sensor output are calculated using power spectrum density estimation and compared with historically established noise patterns to identify any abnormalities. The method is applicable to systems for which the noise is stationary under normal operating conditions. Principal component analysis (PCA) is used to reduce the space of secondary variables derived from the power spectrum. T² statistics are used to detect deviations from the norm. We take advantage of the low-pass filtering characteristics exhibited by most process plants and closed-loop control systems, which allows the noise power in higher frequency bands to be used in the fault detection. The algorithm does not require a process model because it focuses on characterization of each individual sensor and the measurement it generates. Experimental studies with two kinds of garden-variety sensors (off-the-shelf temperature and pressure sensors) are used to validate the feasibility of the proposed approach.

Introduction

Sensors form the crucial link in providing controllers and operators with a view of the process status. Sensor failures can cause process disturbances, loss of control, profit loss, or catastrophic accidents. In process control, many (reportedly up to 60%) of the perceived malfunctions in a plant are found to stem from the lack of credibility of sensor data.1 While hard (catastrophic) failure of a sensor is easy to detect, the degradation of a sensor, such as noncatastrophic damage, error, a change in calibration, and/or physical deterioration of the sensor itself, can be hard to detect during normal operation. Reliable on-line detection mechanisms would be useful for the operators and the control systems that must rely on the information provided by the sensor for decision making.
Sensor fault, as used in this paper, is defined as a change in the sensor operation that causes it to send erroneous information regarding the measured variable. It could be a catastrophic fault such as a break in the connection, or it could be a more subtle change such as the signal conditioner going out of calibration. Because the former type of fault is easy to detect, we focus on the latter type in this paper. Generally, sensor fault detection and isolation (FDI) techniques can be classified as follows,2 with the assumption that the process is in normal condition:

Analytical Redundancy (AR). This approach exploits the implicit redundancy in the static and dynamic relationships between measurements and process inputs using a mathematical process model.3-5 Different models of residual generation have been proposed:5 the parity space approach,6-8 the dedicated observer approach,9,10 the fault detection filter,11 the parameter identification method,12 and artificial neural networks.13-16 Implementation of AR fault detection algorithms is usually expensive because of the time and effort needed to develop, test, and identify a good process model. The method is plant-dependent; that is, for each process plant, a unique model must be constructed. An alternative to the development of an accurate dynamic process model is to use past historical data to construct data-driven models. Using the identified relationships between different sensor measurements, principal component analysis (PCA) and partial least squares (PLS) can identify sensor faults.17-26 Although dynamic PCA is currently available in the literature,27,28 most of the application studies to date have been limited to steady-state PCA. Good steady-state detection algorithms are needed to prevent missed or false alarms. Wise and Gallagher22 give an extensive review of the process chemometrics approach to process monitoring and fault detection.

Knowledge-Based Methods. This approach uses qualitative models of the plant combined with experience-based heuristics to infer a fault. The models might include knowledge concerning patterns of signal behavior, operating conditions, or historical fault statistics.29,30 The method is also plant-dependent. It has found applications in the process industry for the diagnosis of commonly encountered (repeatedly occurring) faults. The method requires the construction and maintenance of an accurate knowledge base, and extensive knowledge of the process is required to build a comprehensive one. The method primarily focuses on catastrophic failures of sensors and/or equipment.

Measurement Aberration Detection (MAD). The two approaches discussed above are usually based on the centralized control computer or distributed control system (DCS) host computer. MAD tries to identify faults locally at the sensor level. Yung and Clarke31 proposed the so-called local sensor validation method. However, the analysis is based on steady state. Henry and Clarke2 proposed a self-validating (SEVA) scheme. They suggest introducing some additional signals to monitor the health of a sensor as well as to identify a fault. Yang and Clarke1 describe a self-validating thermocouple according to the SEVA rationale.

* To whom correspondence should be addressed. Phone: 314-935-6076. Fax: 314-935-7211. E-mail: [email protected].
This approach requires additional signals complementing the sensor data. Additional circuits are attached to enable the self-validation capability of the sensor.1 The concept of using noise to detect faults is not new. Recently, Luo et al.32 proposed using multiscale analysis to detect faults in a sensor. The high-frequency noise is
10.1021/ie9903341 CCC: $19.00 © 2000 American Chemical Society Published on Web 12/28/1999
Figure 1. Overview of the proposed FDI system. The signal conditioner should use a wide-bandwidth filter. Otherwise, FDI should be performed before the signal is filtered.
removed first (using wavelet-based filters), and wavelet decomposition is then applied to the filtered signal. Various statistical methods are then applied to the wavelet-decomposed signal. Their recent paper28 uses dynamic PCA on the wavelet decomposition, where the dynamics of the noise power in some specific frequency band are modeled by PCA. While they focus on modeling the dynamics of the noise component in the mid-frequency band to detect faults, we propose the use of several high-frequency bands, sampling the signal at a high rate and monitoring their variations from normal conditions. This method does not require a dynamic process model. Also, no knowledge database is required.

Most distributed control systems (DCSs) and programmable logic controllers (PLCs) employed in the process industry use fairly low sampling rates (on the order of 1 Hz). However, many sensors are capable of sampling and processing at much faster rates with the availability of low-cost A/D converters. A major impetus to explore this concept is the introduction (and expected widespread usage soon) of intelligent (microprocessor-based) sensor/transmitter devices. With the current industry trend toward elimination of field sensor wiring through the introduction of local area sensor networks such as Fieldbus and Hartbus, it is now possible to require that a sensor supply not just data but also information regarding its validity (such as accuracy, level of confidence, expected error, possibility of total or partial failure, etc.). The methodology proposed here would not be possible without the availability of embedded microprocessors and signal processing in the sensor/transmitter unit.

The objective then is to explore an FDI approach using signal-processing techniques. A high sampling rate is used to capture the frequency spectrum of the measurement over a wide frequency band. Usually, the process variations are of low frequencies below the bandwidth of the process and the control system.
Hence, there is reason to suspect that the high-frequency band variations are caused by noise. By isolation and analysis of the measurement noise (which includes contributions from the process noise, the instrument noise from the signal conditioning unit and electromagnetic interference), the method attempts to detect variations caused by defects in the sensor or signal-conditioning unit. The method isolates and monitors the high-frequency bands, in a region away from the frequency bands normally excited by process disturbances and control system
actions. A major assumption used is that the noise is stationary. Under this assumption, the measurement noise can be characterized by its past history. The proposed method can be implemented locally in a signal-processing module associated with the sensor. This allows application of the algorithm to existing sensors currently in use by industry. Another advantage of this approach is that it adopts the idea of distributed computing and intelligence; that is, fault detection is performed at each local sensor, and the risk of possible system failure is distributed. Figure 1 shows the basic structure of the proposed FDI system.

Briefly, the methodology can be summarized as follows. First, a sample window of small size (small in comparison to the time constants of the process) is chosen. The measurement is sampled at a very high frequency in this window, allowing us to capture the noise present in the measurement. Next, we remove any outliers in the measurement using the criteria described below. A multiresolution of the noise spectrum using the short-time Fourier transform (STFT) is used to break up the noise into various frequency scales so that a change in the noise pattern can be detected more easily. The power spectrum densities (PSDs) at various frequency bands are calculated. The PSD represents the average noise power over all frequencies within each of the bands. This PSD is then compared against the PSD of the noise obtained when the sensor was operating normally (no-fault condition), which was computed and stored in memory. The multiresolution allows us to capture even small changes in the noise pattern that cannot be detected by looking at the change in the mean of the power spectral density averaged over all the high-frequency bands. Instead, we look at changes in the mean of the power spectral density within each band. This results in a multivariate-monitoring problem. Therefore, we will need to use multivariate statistical monitoring tools to detect changes.
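The windowing, outlier-removal, and band-power steps just summarized can be sketched in a few lines. The following Python sketch is our own illustration, not code from the paper; the simulated signal, the periodogram normalization, and the equal-width band edges are assumptions made for the example.

```python
import numpy as np

def remove_outliers(x, threshold=3.0):
    """Discard samples beyond +/- threshold standard deviations of the
    window mean, per the 3-sigma rule discussed later in the paper."""
    r = x - x.mean()
    keep = np.abs(r) <= threshold * r.std()
    return x[keep]

def band_powers(x, fs, n_bands):
    """Average noise power in n_bands equal-width frequency bands,
    computed from a simple periodogram of one sample window."""
    x = x - x.mean()                     # remove the slow process level
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * len(x))
    edges = np.linspace(0, len(psd), n_bands + 1, dtype=int)
    return np.array([psd[a:b].mean() for a, b in zip(edges[:-1], edges[1:])])

# One 20-s window sampled at 250 Hz (the settings used in the paper's example)
fs, window = 250.0, 5000
rng = np.random.default_rng(0)
x = 50.0 + 0.05 * rng.standard_normal(window)   # simulated sensor signal
clean = remove_outliers(x)
powers = band_powers(clean, fs, n_bands=16)     # one averaged power per band
```

Each window thus yields one vector of band powers; a subset of the higher-frequency entries becomes the multivariate sample monitored by the statistics described next.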
Because the noise powers in different frequency bands are likely to be correlated with each other, PCA must also be used in the monitoring scheme. Finally, a Hotelling T² chart allows us to graph and hence detect the change introduced by a sensor fault or sensor degradation.

Sensors are an integral part of a control system. The closed-loop system will have its own set of natural harmonics, which will always be present in the signal. Consider a local control system as shown in Figure 2. Usually, the process-transfer function acts as a low-pass
Figure 2. A local control system.
filter; that is, the time constants in Gp are large compared with the response time of the sensor itself. By closure of the control loop, a well-tuned controller may increase the bandwidth of the system by a factor of 4 or more. However, for most chemical systems, the closed-loop system bandwidth (less than 1 Hz in most cases) is generally small compared with the noise frequency. In this case the dynamics of the control system will not significantly alter the PSD in the higher frequency bands, thus enabling correct sensor FDI even under closed-loop regulation in the presence of disturbances or under setpoint transfer.

The outline of the paper is as follows. First, the power spectrum density estimation used is discussed. Second, the advantages and disadvantages of the Hotelling T² chart are discussed. Third, the idea of using PCA to improve the Hotelling T² chart is presented. Fourth, the fault detection and isolation algorithm is outlined. Finally, to verify the methodology, experimental studies on a temperature sensor and a pressure sensor are presented.

1. Power Spectrum Density (PSD) Estimation

PSD is the power of the noise as a function of frequency (or frequency bands). There are many ways to estimate the PSD from a finite length of data.33,34 In this paper we use the Welch method35 because its confidence limits are easily computed. Welch's algorithm for PSD estimation is as follows. Suppose we have sampled a window with N data points: y(0), y(1), ..., y(N − 1). Welch divides the N data points into K = N/Q segments, possibly overlapping, of length Q:

y^(i)(n) = y(n + iQ − Q),  0 ≤ n ≤ Q − 1,  1 ≤ i ≤ K    (1)
A window, w(n), is applied directly to the data segments before computation of the periodogram. K modified periodograms are then computed:
J^(i)_Q(ω) = (1/(QU)) |Σ_{n=0}^{Q−1} y^(i)(n) w(n) e^{−jωn}|²,  i = 1, 2, ..., K    (2)
where
U = (1/Q) Σ_{n=0}^{Q−1} w²(n)    (3)
PSD is defined as
B^w_xx(ω) = (1/K) Σ_{i=1}^{K} J^(i)_Q(ω)    (4)

which is the power of the noise at frequency ω (the average noise power in the frequency band centered around ω). This method of PSD estimation is asymptotically unbiased. Welch showed that if the segments of y(n) are nonoverlapping, then

var(B^w_xx(ω)) = (1/K) p²_xx(ω)    (5)

where p_xx(ω) is the true spectral density. If y(n) is sampled from a Gaussian process, the equivalent degree of freedom of the approximating chi-square distribution of B^w_xx(ω) is35

EDF{B^w_xx(ω)} = 2K    (6)

This is used to calculate the 100α% confidence limit on B^w_xx(ω). B^w_xx(ω) may be estimated from old data and updated with new data to improve accuracy. We can approximate

var(B^w_xx(ω)) ≈ (1/2)[var(B^w_xx(ω)|old) + var(B^w_xx(ω)|new)]    (7)

where var(B^w_xx(ω)|old) and var(B^w_xx(ω)|new) are the variance estimates based on old and new data, respectively. As can be seen from eq 5, the variance of B^w_xx(ω) is inversely proportional to K; that is, keeping the same spectral frequency resolution (Q), if more data are available by increasing the window size N, a more accurate estimate of the PSD with a tighter confidence limit is obtained, enabling the detection of faults that may not be detectable by short-length PSD analysis. However, a larger window size increases the delay in fault detection. A longer data length also requires more computation time. Because all the calculations must be done on-line, PSD analysis over very long data records is not feasible. We used 5000 data points to estimate the PSD in the example problems studied. Figure 3 shows a typical PSD curve (the middle one) obtained along with its 99.7% confidence limits. The noise is assumed to be Gaussian.

Figure 3. A typical noise PSD pattern in thermocouple measurement data collected in a window of 20 s. The circles are the PSD of the frequency bands centered around those frequencies.
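As a concrete illustration of eqs 1-4, the Welch estimator with nonoverlapping windowed segments can be implemented directly in numpy. This is our own illustrative sketch, not the paper's implementation; the Hann window and the white-noise test signal are assumptions of the example.

```python
import numpy as np

def welch_psd(y, Q, fs=1.0):
    """Welch PSD over K = N // Q nonoverlapping segments (eqs 1-4).
    Returns the frequency grid and the averaged modified periodogram."""
    N = len(y)
    K = N // Q
    w = np.hanning(Q)                    # data window w(n)
    U = np.sum(w ** 2) / Q               # eq 3: window power normalization
    J = np.zeros(Q // 2 + 1)
    for i in range(K):                   # eq 1: y_i(n) = y(n + iQ - Q)
        seg = y[i * Q:(i + 1) * Q] * w
        J += np.abs(np.fft.rfft(seg)) ** 2 / (Q * U)   # eq 2, accumulated
    Bxx = J / K                          # eq 4: average of K periodograms
    freqs = np.fft.rfftfreq(Q, d=1.0 / fs)
    return freqs, Bxx

rng = np.random.default_rng(1)
y = rng.standard_normal(5000)            # stand-in for detrended sensor noise
freqs, Bxx = welch_psd(y, Q=250, fs=250.0)   # K = 20 segments
```

With nonoverlapping segments, eq 6 gives EDF = 2K = 40 here, so chi-square confidence limits on each B^w_xx(ω) follow directly from the estimate.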
2. Hotelling T² Control Chart

Consider p frequency bands, centered around ω_i, i = 1, 2, ..., p. At each time interval, p PSD estimates are
available, namely, B^w_xx(ω_i), i = 1, 2, ..., p. If each PSD is plotted vs time, for p frequency bands we will get p PSD plots. The PSDs in various frequency bands are usually correlated with each other, so it is better to consolidate them to determine the sensor status. One simple way to do this is the Hotelling T² control chart.20,36 Suppose the PSD data we obtain at time k are expressed in vector form as

B_k = [B^w_xx(ω_1,k), B^w_xx(ω_2,k), ..., B^w_xx(ω_p,k)]^T    (8)
The purpose is to test whether B_k deviates significantly from the mean µ, where µ = [µ_1, µ_2, ..., µ_p]^T is the vector of means for each frequency band under normal conditions. The test statistic plotted on the control chart for each sample is
T² = (B_k − µ)^T Σ^{−1} (B_k − µ)    (9)
where Σ is the covariance matrix. Both µ and Σ are estimated from samples of data collected in the learning stage:

µ = (1/m) Σ_{k=1}^{m} B_k,  Σ = (m − 1)^{−1} Σ_{k=1}^{m} (B_k − µ)(B_k − µ)^T    (10)

The test applied is

T² > [(m − 1)(m + 1)p / (m(m − p))] F_α(p, m − p)    (11)

where F_α(p, m − p) is the upper 100α% critical point of the F distribution with p and m − p degrees of freedom and m is the number of samples used to estimate µ and Σ. If eq 11 holds, a fault is detected; otherwise, the sensor is judged to be in normal condition. The T² test is applied on a periodic basis to see whether the sensor noise characteristics have changed. Because we are focusing on frequencies much higher than the bandwidth of the process and control system, a change in T² implies that the sensor characteristics have changed. If this change is persistent, the T² chart will reflect it over subsequent samples. In practice, the percent level of confidence (α) must be selected with care to balance Type I and Type II errors. Rather than use this limit as an absolute yardstick of the existence of a fault in a sensor, we suggest that it be used to determine a degree of confidence in the measurement itself. Hence, the further the T² chart moves from its expected value, the higher the probability that a fault has occurred. This is verified in the experimental tests described below.

3. Principal Component Analysis (PCA)

The Hotelling T² control chart works well when the correlation between variables is not very high. If the correlation is very high, then Σ will be close to singular. In this case, the Hotelling T² control chart will not work well because a small estimation error in Σ can change the T² value greatly, causing false alarms. One remedy is to extract the uncorrelated part of the data using principal component analysis (PCA). PCA finds combinations of variables that describe major trends in the data, thus reducing the actual number of variables to be monitored. Suppose the PSD data obtained at time k are denoted by

B_k = [B^w_xx(ω_1,k), B^w_xx(ω_2,k), ..., B^w_xx(ω_p,k)]^T    (12)

and m samples are available in the learning stage to build a PCA model. These m samples form an m × p matrix X, with each variable a column and each sample a row:

X = [ B^w_xx(1,ω_1) ··· B^w_xx(1,ω_p)
      ⋮                ⋮
      B^w_xx(m,ω_1) ··· B^w_xx(m,ω_p) ]    (13)

PCA decomposes the data matrix X into the sum of the outer products of the score vectors t_i and load vectors p_i plus a residual matrix E:22

X = TP^T + E = Σ_{i=1}^{l} t_i p_i^T + E    (14)

Principal component projection reduces the original set of variables to l principal components (PCs). In practice, 2 or 3 PCs are often sufficient to explain most of the predictable variation in the sensor performance.20 The Hotelling T² statistic can now be applied to the l PCs (note that the covariance matrix of the t_i's is diagonal because of their orthogonality):

T_l² = Σ_{i=1}^{l} t_i t_i^T / λ_i    (15)

where t_i in this instance refers to the ith row of T. Statistical confidence limits for T² can be calculated by means of the F distribution as follows:20

T²_{l,m,α} = [l(m − 1)(m + 1) / ((m − l)m)] F_α(l, m − 1)    (16)

where m is the number of samples used to develop the PCA model in the learning stage. It is important to note that the T² statistic makes some assumptions regarding the distribution of the data, specifically, that it is multivariate normal.22 Clearly, the PSD data are often not normally distributed (they are chi-square distributed if the noise is Gaussian). However, the central limit theorem states that sums over several different groups tend to be normally distributed, regardless of the probability distributions of the individual groups.22 This suggests that the PCA-based method, in which the scores are weighted sums of the individual variables, will tend to produce measures that are more normally distributed than the original data. This is verified by the experimental studies presented later.

4. Sensor Fault Detection and Isolation Algorithm

The proposed sensor FDI algorithm can be divided into several stages: a learning stage, a fault detection stage, and a fault isolation stage.

4.1. Learning Stage. The learning stage is triggered when the FDI module is first installed in the sensor or when the process changes its operating conditions significantly, so that the observed noise PSD pattern also
changes. The purpose of the learning stage is to build the PCA model and the T² statistics.

(1) Determine the data length of each batch, the sampling rate, the confidence percentage α, the number of learning samples, and the frequency bands to monitor. The lowest frequency band should be well above the control system bandwidth.

(2) Monitor the process for a specified time under normal sensor conditions to obtain the learning samples. Calculate the PSD distribution as well as the 100α% confidence limit. The confidence limit can be calculated directly if the noise distribution is known; otherwise, an estimation method such as a histogram should be used.

(3) Build a PCA model from the sampled PSD data set and use 2 or 3 principal components to form the Hotelling T² statistic.

4.2. Fault Detection Stage. This is the working stage of the proposed method. It reads some length (window size) of data and tests the health of the sensor. Fatal failures such as an open circuit (zero reading) or hard over (full-scale reading) are easily detectable using conventional methods and therefore are not considered here.

(1) Read a batch of data at each time interval; perform PSD estimation to form a new PSD sample.

(2) Using the PCA model developed in the learning stage, monitor the T² value of the new sample. If T² is above the confidence limit (learned in the learning stage), give a warning. If T² continues to stay over the limit, a sensor fault alarm should be activated.

(3) If there is a sensor fault, trigger the sensor fault isolation stage to determine possible faults.

Setpoint transfers and process disturbances should not trigger a fault alarm because they affect only the low-frequency bands (see the experimental results below); middle- to high-frequency bands are monitored to detect sensor faults. If the process operating conditions have changed significantly, then there is a possibility that the assumption of noise stationarity is no longer valid.
In this case, it may be necessary to invoke the learning stage again. Field tests are necessary to determine the range of validity of the stationarity assumption.

4.3. Fault Isolation Stage. Fault isolation is not as critical as fault detection. However, information on the possible fault type may help the operator remove the fault quickly. A simple way is to read the PSD plots with confidence limits based on Welch's method. If different faults cause the PSD plots to change in different ways, then we may be able to identify the fault from this pattern. We do not investigate this possibility in detail at this time.

5. Parameters Used in the Algorithm

The algorithm set forth above requires a number of design parameters to be specified. The important ones are discussed below.

(1) Sampling Frequency: A high sampling frequency (≥1000 Hz) is generally preferred to get sufficient information on the noise power distribution. For most chemical processes, sampling rates from several hundred to several thousand hertz may be used. The minimum sampling frequency should be higher than the closed-loop process stopband cutoff frequency (≤10 Hz). This minimum requirement is easily met with current A/D and microchip technology.
(2) Data Window Size: This is the amount of data sampled during one fault detection interval. Usually, it should be large enough (∼1000 points) to ensure sufficiently accurate estimation of the PSD. A longer window is preferred but is limited by the available computation power and the promptness requirement of fault detection.

(3) Frequency Band Selection: The minimum frequency band used for fault detection should be higher than the closed-loop process stopband cutoff frequency. (For the examples we studied in our laboratory, we used frequencies higher than 10 Hz.) However, the minimum frequency band should also be close to the process stopband cutoff frequency to capture most of the noise not affected by the process dynamics. The number of bands cannot be too small, as that would obscure variations; a very large number of bands is also undesirable because the accuracy of the PSD estimation is reduced. In our examples, between 8 and 32 frequency bands proved satisfactory.

(4) Number of Principal Components: The number of PCs that provide an adequate description of the data can be assessed using a number of methods.37 Some general rules of thumb based on inspecting the eigenvalue versus PC number plot are available.38 Principal components with large variance are usually retained. However, exceptions do exist where some PCs with small variances may prove to be more physically significant than those with large variances.39-41 For many cases, however, 2-3 principal components are enough to describe the relationships existing in the data.

(5) Percent Level of Confidence α: The higher the percent level of confidence, the lower the probability of a Type I error (false alarm) and the higher the probability of a Type II error (missed alarm). The percent level of confidence is usually determined by the importance of a sensor. The most crucial sensors should use the lowest percent level of confidence in fault detection because one cannot afford a Type II error.
However, too low a percent level of confidence will result in small average run lengths (ARLs). In this regard, we suggest that rather than using the confidence limit as an absolute criterion for determining the existence of a fault, one should use the T² variation as a measure of the confidence we have in a change in the sensor characteristics. Thus, large variations in T² values should be taken as an indication that the noise pattern has changed. The engineer monitoring the process can then decide whether this change justifies further field tests or calibration of the sensor.

(6) Outlier Detection and Removal: With the assumption that the noise is Gaussian, the probability of a measurement falling outside 3σ is 0.3%. If a measurement falls outside that range, it may be discarded as an outlier. For non-Gaussian signals, higher limits may be used. Because the measurement is assumed to change slowly compared with the sampling frequency, outlier detection and removal can be done in a piecewise manner, that is, window by window.

6. Experimental Studies Using a Temperature Sensor

Experiments were conducted using an off-the-shelf temperature sensor and a pressure sensor to test the validity of the proposed scheme. The temperature sensor used is a type K thermocouple. The experimental setup shown in Figure 4 was used to test fault detection in the temperature sensor. Note the closed-loop nature of
Figure 4. Air temperature control setup.
Figure 6. PSD data used for training the PCA model. The vertical axis is the PSD estimate in the frequency band centered at f, and the horizontal axis is the sample number. The dashed lines are the estimated upper and lower 99.7% confidence limits on the PSD estimate, based on eq 7, at the specified frequency band.
Figure 5. Frequency response of the temperature control system (closed-loop, response to setpoint changes).
the process. PLS_Toolbox 2.0.1 (ref 38) was used to build the PCA model. The temperature of air exiting from a hot air blower was measured using a thermocouple. A well-tuned controller was commissioned to read the sensor measurement and manipulate the heater power to control the temperature. A discrete PID controller with a sample time of 1 s was used in the control system. This introduced discrete step changes in the manipulated variable because of the zero-order hold present. To verify that the control system affects only the low-frequency range, the frequency response of the closed-loop transfer function is plotted in Figure 5. This plot was generated using a first-order plus dead-time (FOPDT) model of the process. The process time constant was estimated to be 55 s and the process delay to be 1 s. (These numbers are relatively small in comparison with the time constants encountered in actual industrial processes.) Despite the discrete nature of the PID controller used, the high-frequency bands were well-suppressed in the closed-loop system. For example, at a frequency of 125/16 × 4 = 31.25 Hz, the amplitude of the process output will be less than 0.001 of the original amplitude at the setpoint. A unit step function has an amplitude of 0.0051 at a frequency of 31.25 Hz. Thus, a unit step setpoint change will cause a process output amplitude change of less than 5.1 × 10^-6, that is, a power change of about 2.6 × 10^-11 at 31.25 Hz. It was determined experimentally that the noise power is mostly on the order of 1 × 10^-6 (see Figure 6). Thus, the setpoint change will have virtually no effect on the noise component of the measurement.
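The order-of-magnitude argument above can be checked with a few lines of arithmetic. The snippet below is purely illustrative; it takes as inputs the Fourier magnitude 1/ω of a unit step and the ~0.001 closed-loop attenuation read from Figure 5, and reproduces the quoted numbers.

```python
import math

f = 31.25                      # Hz, lowest monitored frequency band
omega = 2 * math.pi * f
step_amp = 1 / omega           # Fourier magnitude of a unit step at omega
atten = 1e-3                   # closed-loop gain at 31.25 Hz (from Figure 5)
out_amp = step_amp * atten     # amplitude reaching the measurement
power = out_amp ** 2           # step_amp ≈ 0.0051, power ≈ 2.6e-11
```

Since the measured noise power is on the order of 10^-6, the setpoint-change contribution of ~10^-11 is indeed negligible at this frequency.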
The following design parameters were selected:

(1) Sampling Frequency: 250 Hz. This is high enough to capture most of the noise, and it is well above the closed-loop process stopband cutoff frequency, which is around 10 Hz.

(2) Data Window Size: 5000 data points, corresponding to 20 s. This amount of data is enough to provide a good estimate of the noise power. Also, a 20-s delay in fault detection was considered acceptable.

(3) Frequency Band Selection: The whole frequency range is divided into 16 bands, of which the 10 frequency bands centered around [125/16 × 4, 125/16 × 5, ..., 125/16 × 13] Hz are used. The lowest frequency band, centered at 31.25 Hz, is higher than the process stopband cutoff frequency and is low enough to capture a wide range of the noise.

(4) Outlier Detection and Removal: The output from the sensor was seen to have occasional random spikes (probably introduced by the signal conditioning unit or the A/D unit), which might be outliers and not part of the inherent noise coming from the process and instrument (see Figure 11). These spikes were detected and removed using a simple 3σ threshold check: every 2 s (500 data points), the variance of the data is estimated, and measurements above 3σ or below -3σ are discarded.

(5) Percent Level of Confidence: 95%. The noise is assumed to be Gaussian.

First, in the learning stage, the PCA model and T² statistics are built from 20 samples (corresponding to 400 s). Figure 6 shows the PSD data used for training from the 10 frequency bands we selected. Table 1 shows the percent variance captured by the PCA model. It is clear that 2 PCs are enough to express the data well. The PCA model and T² statistics built in the learning stage are subsequently tested in the fault detection stage.
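The learning-stage computation (fitting the PCA model and forming the T² limit of eq 16) can be sketched as follows. This is our own illustrative Python, not the paper's PLS_Toolbox code; the simulated band-power matrix and the SVD-based PCA are assumptions of the sketch.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(2)
m, p, l = 20, 10, 2                 # learning samples, frequency bands, PCs kept

# Simulated learning-stage band powers: m samples (rows) x p bands (columns),
# correlated across bands as real PSD data would be (stand-in, not real data).
X = rng.standard_normal((m, 3)) @ rng.standard_normal((3, p))
X += 0.1 * rng.standard_normal((m, p))

mu = X.mean(axis=0)
# PCA via SVD of the mean-centered data; columns of P are the loadings p_i
U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
P = Vt[:l].T                        # p x l loading matrix
lam = s[:l] ** 2 / (m - 1)          # variances of the retained scores

def t2(b):
    """Hotelling T^2 of a new PSD vector in the l-PC subspace (eq 15)."""
    t = (b - mu) @ P
    return float(np.sum(t ** 2 / lam))

# Eq 16: 100(1 - alpha)% control limit from the F distribution
alpha = 0.05
limit = l * (m - 1) * (m + 1) / ((m - l) * m) * f.ppf(1 - alpha, l, m - 1)

print(t2(mu), limit)                # the mean itself scores T^2 = 0.0
```

In the detection stage, each new 20-s window yields one band-power vector b; t2(b) is plotted against limit exactly as in the T² charts of Figures 7-14.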
To verify that the noise is stationary at different times and different temperature levels with normal sensor health, the experiment was repeated 5 days later, once at the same operating temperature and once at a different operating temperature. The T² values are plotted using the PCA model from the learning stage.
Table 1. Percent Variance Captured by the PCA Model

PC no.   eigenvalue of Cov(X) (× 10^12)   % variance, this PC   % variance, total
1        6.13 × 10^-1                     72.78                 72.78
2        1.89 × 10^-1                     22.43                 95.21
3        1.88 × 10^-2                      2.23                 97.44
4        1.25 × 10^-2                      1.49                 98.93
5        4.72 × 10^-3                      0.56                 99.49
6        1.63 × 10^-3                      0.19                 99.68
7        1.48 × 10^-3                      0.18                 99.86
8        7.04 × 10^-4                      0.08                 99.94
9        3.70 × 10^-4                      0.04                 99.98
10       1.35 × 10^-4                      0.02                100.00
Figure 9. T² chart when the process undergoes a step setpoint change (from 50 to 70 °C). The step change occurred at time 0.
Figure 7. Noise stationarity verification: the testing data are sampled 5 days later.

Figure 10. T² chart under a coating fault (fault occurs at sample no. 20).
Figure 8. Noise stationarity verification: the testing data are sampled at a higher temperature (70 °C) than that at the learning stage (50 °C).
Figures 7 and 8 show that the noise is indeed time- and temperature-independent over this range of operation. For a setpoint change under normal conditions, the PSD in the middle- to high-frequency bands is not affected; thus, T² should remain below the confidence limit. Figure 9 shows the result when the controller setpoint changes suddenly from 50 to 70 °C. T² is still below the confidence limit (with only one exception, which can be attributed to the random nature of the noise). The setpoint change did raise some values of T² for a short period immediately after the change occurred, and some points are close to the confidence limit. A possible cause is violation of the noise stationarity assumption during the transition. Thus, the confidence percentage level should be set with care. On the whole, however, this change in T² is still small compared with that for a sensor fault.

Next, we deliberately introduced some faults into the sensor. Again, the PCA model from the learning stage is used to calculate the T² values. Three types of faults are considered here: a coating fault, a dislocation fault, and a span error.

(1) Coating Fault: Sensor performance degradation caused by scaling deposits on the sensor is a frequent occurrence. With coating, the time constant of the thermocouple increases, filtering out some of the high-frequency components. Figure 10 plots the T² chart when the thermocouple is coated with an inert material instantaneously. The T² value goes over the confidence limit and stays there persistently (even after sample number 70), indicating the presence of a fault. The coating acts as a filter for the noise and hence manifests itself in the PSD. Scale buildup usually occurs gradually over a period of time. This will be manifested by an upward trend in the T² plot, and when the buildup reaches steady state, T² will be above the confidence limit. Here, we show the T² plot before and after the fault in Figure 10. This figure also shows that the fault has led to a persistent excursion of the T² plot beyond its normal range. In this case, the deviation will depend on the extent of coating present. We suggest that the T² plot be used very much like a control chart, showing possible deviations in the sensor behavior. Although we have not shown the result here, the noise pattern remained stationary after the coating fault was applied. If the coating
Figure 13. T2 chart under -20% span change (fault occurs at sample no. = 20).
Figure 11. Time domain signal when there is a coating fault in the temperature sensor. (10 000 data points are plotted. The spikes show the possible existence of outliers.)
Figure 14. T2 chart under +20% span change (fault occurs at sample no. = 20).
Figure 12. T2 chart under dislocation fault (fault occurs at sample no. = 20).
occurred gradually over a period of time, we would have observed a gradual shift in the T2 chart. Hence, the method is independent of the dynamics of the fault itself. Figure 11 shows the sensor reading before and after the coating fault is introduced; the fault does not manifest itself clearly in the time domain. (2) Sensor Dislocation: Sometimes the sensor may be dislocated by mechanical damage or by impact from the medium in which it is located. In such a case, the measurement will not reflect the true temperature at the original location. Figure 12 shows the T2 chart when the sensor is dislocated by 129 mm. The noise characterization has changed, and the T2 chart shows a fault. (3) Span (Gain) Error: The span of the signal-conditioning unit usually drifts with time, requiring periodic recalibration of the instrument. A change in span will magnify or reduce the signal power, causing measurement error. The noise PSD will also be magnified or reduced as a result of any change in the gain of the transmitter. T2 charts are plotted in Figures 13 and 14 for different span changes. It can be seen that the method detects both +20% and -20% span changes in the sensor. The above tests show the feasibility of this fault detection method. In case one needs to analyze the fault type, PSD plots versus time may be helpful. Figure 15 is an example of the PSD plots when the sensor is
Figure 15. PSD plots versus time (10 frequency bands are plotted). The vertical axis is the PSD estimate at the frequency band centered at f, and the horizontal axis is the sample number. The dashed lines are the estimated upper and lower 99.7% confidence limits on the PSD estimate based on eq 7 at the specified frequency band.
covered by cotton. After the coating, the PSD drops below the lower confidence limit. Given such a graph, the possible faults could be a coating fault, a stuck fault, or a span error. The method cannot distinguish the kind
Figure 16. Pressure sensor experiment setup.
Figure 17. T2 charts after 7 days (at pressure 5.85 psi).
of fault that causes the observed variations in the PSD plot.

7. Experimental Studies Using a Pressure Sensor

We have also tested the proposed FDI method on a pressure sensor (Omega PX26-015GV). The experimental setup is shown in Figure 16. The sensor measures the pressure at the bottom of the tank, and a well-tuned controller maintains the pressure inside the tank. The following design parameters are selected: (1) Sampling Frequency: 1000 Hz. This is high enough to capture most of the noise and lies well above the closed-loop process stopband cutoff frequency, which is around 50 Hz. (2) Data Window Size: 5000 data points, corresponding to 5 s. This amount of data is enough to provide a good estimate of the noise power, and a 5-s delay in fault detection is not a major concern. (3) Frequency Band Selection: The whole frequency range is divided into 16 bands, of which the 10 bands centered at [500/16 × 4, 500/16 × 5, ..., 500/16 × 13] Hz are used. The lowest band, centered at 125 Hz, is above the process stopband cutoff frequency yet low enough to capture a wide range of noise. (4) Outlier Detection and Removal: The same outlier detection algorithm as for the temperature sensor is used. Every 0.5 s (500 data points), the variance of the data is estimated, and measurements above +3σ or below -3σ are discarded. (5) Percent Level of Confidence: 95%. The noise is assumed to be Gaussian. (6) Number of Principal Components: 2 PCs are kept, which capture 80% of the total variance.
Noise Stationarity Verification. The sensed pressure is maintained at 5.85 psi (or 6.67 psi) for a long time, and the noise stationarity is checked. Figures 17 and 18 show that the noise characteristics do not change with time or pressure. Also, a setpoint change does not affect fault detection much, as shown in Figure 19. As discussed in the previous example, the noise may not be stationary during the transition.
But the T2 change is small compared with that of a fault. Although one point in Figure 17 lies above the limit, this is to be expected occasionally because of the random nature of the noise. Several kinds of faults were introduced deliberately. The PCA model from the learning stage is used to
Figure 18. T2 chart at a higher pressure (6.67 psi).
Figure 19. T2 chart under setpoint change (from 5.85 to 6.67 psi).
calculate the T2 values. Figures 20 and 21 show the T2 charts when a span error is introduced; the proposed method detects this change quite well. Figures 22 and 23 show the effect of an air bubble deliberately introduced into a 30-cm-long tube connecting the tank and the sensor. A small air bubble (5-cm length) does not activate the alarm, while a large air bubble (15-cm length) does. When the separate Fluke calibration unit (see Figure 16) was used, it was seen that the small bubble did not appreciably affect the pressure reading, whereas the larger air bubble caused a measurement error of around 4% (at steady state). Figure 24 plots the T2 chart when the sensor has been slightly damaged as a result of overpressurization. The full scale of the sensor is 15 psi and the burst pressure
Figure 20. T2 chart under -20% span change (fault occurs at sample no. = 20).
Figure 21. T2 chart under +20% span change (fault occurs at sample no. = 20).
Figure 23. T2 chart when there is an air bubble (length = 15 cm) in the tube connecting the sensor to the tank (fault occurs after sample no. = 20).
Figure 24. T2 chart after the sensor has been slightly damaged due to overpressurization.
Figure 22. T2 chart when there is an air bubble (length = 5 cm) in the tube connecting the sensor to the tank (“fault” occurs after sample no. = 20).
is 50 psi. The actual pressure was kept at 25-30 psi for 30 min, causing slight damage to the sensor. The sensor no longer functions normally and has an actual error of about 0.3-0.7 psi, as measured independently with the Fluke calibration unit; see Figure 25 for the calibration before and after the damage. The proposed method detects this fault well. At the operating conditions used, the damage caused an error of 0.35 psi in the measurement (verified using the Fluke pressure calibration unit).
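Across both experiments, the detection stage reduces each incoming window to band powers and scores it against the PCA model learned from healthy data. A hedged sketch of that monitoring step (our own names and structure; a plain periodogram is used here in place of the paper's Welch estimator, and mu, P, lam, t2_limit denote the quantities produced by the learning stage):

```python
import numpy as np

def band_powers(x, fs=1000.0, n_bands=16, used=range(4, 14)):
    """Periodogram-based power in each retained band (bands 4..13 of 16)."""
    n = len(x)
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    pxx = np.abs(np.fft.rfft(x - x.mean())) ** 2 / (fs * n)
    width = (fs / 2) / n_bands
    return np.array([pxx[(f >= k * width - width / 2) &
                         (f < k * width + width / 2)].mean()
                     for k in used])

def monitor_window(x, mu, P, lam, t2_limit, fs=1000.0):
    """Score one raw data window against the trained noise model.

    x        : raw samples from one window (e.g., 5000 points at 1000 Hz)
    mu,P,lam : feature mean, PCA loadings, and PC variances from training
    t2_limit : T^2 confidence limit (95% level in the paper)
    Returns (t2_value, fault_flag).
    """
    t = (band_powers(x, fs) - mu) @ P      # project onto retained PCs
    t2 = float(np.sum(t ** 2 / lam))       # Hotelling T^2
    return t2, t2 > t2_limit
```

For the pressure sensor, a new window arrives every 5 s, so monitor_window would be called once per window and the resulting T2 values plotted as the control charts shown in Figures 17-24.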
Figure 25. Pressure sensor calibration before and after slight damage caused by overpressurization of the sensor.
8. Conclusions

In this paper, we proposed an approach to detect sensor faults by analyzing the high-frequency noise present in the measurement. Experimental studies with a temperature sensor and a pressure sensor show the feasibility of the method. The method takes advantage of the significant local computing power expected to be embedded in future generations of intelligent sensors. It requires sampling the signal at a high frequency, isolating the noise content in the high-frequency bands through standard signal-processing techniques, and then monitoring these signals using multivariable statistical control charts. Principal component analysis is used to remove correlation of the noise among the frequency bands. One key assumption is that the noise is stationary, so that any change in the noise characteristics can be attributed to a sensor fault. If the noise is nonstationary, the method is not applicable. For the two examples studied, the assumption was verified experimentally. Significant changes in the process operation will require relearning the noise characteristics at the new operating conditions. Only certain types of faults can be detected using this method, and it may not be possible to uniquely identify the detectable faults, although some classification is possible. While we have established the feasibility of the approach, some questions remain unanswered. Because our experiments were carried out under laboratory conditions, there is a need to verify the validity of the assumptions in industrial environments. The most important of these is the assumption of noise stationarity. Although numerous authors have used this assumption in the past, it must be verified before the proposed technique can be used. We have also addressed some issues concerning the choice of parameters in the algorithm: the sampling frequency, the data window size, the frequency band selection, and the percent level of confidence. While many of these parameters can be selected on the basis of an analysis of the application at hand, definite guidelines and a thorough analysis of the consequences of each selection remain to be developed.
On the whole, the method looks promising, especially because of its inherent advantages: it localizes the self-diagnosing process, the scheme can be embedded in the signal-conditioning module hardware, the fault detection algorithm is easy to train, and it can complement more powerful fault detection algorithms based on multiple sensors and a process model.

Acknowledgment

Support provided by NSF grant CTS-95-29578 is gratefully acknowledged.

Notation

Bxx = averaged PSD estimate from K segments
E = residual matrix
J = PSD estimate with one segment
K = number of segments
l = number of principal components
m = number of samples
N = number of data points in one window (window size)
p = number of frequency bands
pi = loading
pxx = true PSD
ti = score vector
T2 = statistic defined in eq 9
Q = number of data points in each segment
U = defined in eq 3
w(n) = window function
X = data matrix
y(i) = the ith data point
µ = estimated mean
Σ = estimated covariance matrix
λi = the ith eigenvalue
Literature Cited

(1) Yang, Janice C.-Y.; Clarke, D. W. A Self-Validating Thermocouple. IEEE Trans. Control Syst. Technol. 1997, 5 (2), 239.
(2) Henry, M. P.; Clarke, D. W. The Self-Validating Sensor: Rationale, Definitions and Examples. Control Eng. Practice 1993, 1 (4), 585.
(3) Isermann, R. Process Fault Detection Based on Modeling and Estimation Methods - A Survey. Automatica 1984, 20 (4), 387-404.
(4) Patton, R.; Frank, P.; Clark, R. Fault Diagnosis in Dynamic Systems: Theory and Application; Prentice Hall International (UK) Ltd.: Hertfordshire, UK, 1989.
(5) Frank, P. M. Fault Diagnosis in Dynamic Systems Using Analytical and Knowledge-Based Redundancy - A Survey and Some New Results. Automatica 1990, 26 (3), 459-474.
(6) Gertler, J.; Singer, D. A New Structural Framework for Parity Equation-Based Failure Detection and Isolation. Automatica 1990, 26, 381-388.
(7) Tsai, T. M.; Chou, H. P. Sensor Fault Detection with the Single Sensor Parity Relation. Nucl. Sci. Eng. 1993, 114, 141-148.
(8) Patton, R. J.; Chen, J. Review of Parity Space Approaches to Fault Diagnosis for Aerospace Systems. J. Guidance Control Dynam. 1994, 17 (2), 278-284.
(9) Keller, J. Y.; Summerer, L.; Boutayeb, M.; Darouach, M. Generalized Likelihood Ratio Approach for Fault Detection in Linear Dynamic Stochastic Systems with Unknown Inputs. Int. J. Syst. Sci. 1996, 27 (12), 1237-1241.
(10) Chang, C. T.; Hwang, J. I. Simplification Techniques for EKF Computations in Fault Diagnosis: Model Decomposition. AIChE J. 1998, 44 (6), 1392-1403.
(11) Wilbers, D. N.; Speyer, J. L. Detection Filters for Aircraft Sensor and Actuator Faults. In ICCON '89: Proceedings of the IEEE International Conference on Control and Applications, Jerusalem, April 1989; IEEE: New York, 1989.
(12) Kage, K.; Joseph, B. Measurement Selection and Detection of Measurement Bias in the Context of Model-Based Control and Optimization. Ind. Eng. Chem. Res. 1990, 29 (10), 2037-2044.
(13) Watanabe, K.; Hirota, S.; Hou, L.; Himmelblau, D. M.
Diagnosis of Multiple Simultaneous Fault via Hierarchical Artificial Neural Networks. AIChE J. 1994, 40 (5), 839-848.
(14) Tsai, C. S.; Chang, C. T. Dynamic Process Diagnosis via Integrated Neural Networks. Comput. Chem. Eng. 1995, 19 (Suppl.), S747-S752.
(15) Tzafestas, S. G.; Dalianis, P. J. Artificial Neural Networks in the Fault Diagnosis of Technological Systems: A Case Study in Chemical Engineering Process. Eng. Simul. 1996, 13, 939-954.
(16) Brydon, D. A.; Cilliers, J. J.; Willis, M. J. Classifying Pilot-Plant Distillation Column Faults Using Neural Networks. Control Eng. Practice 1997, 5 (10), 1373-1384.
(17) Leonard, J. A.; Kramer, M. A. Radial Basis Function Networks for Classifying Process Faults. IEEE Control Syst. Magazine 1991, 11 (3), 31-38.
(18) Kramer, M. A. Nonlinear Principal Component Analysis Using Autoassociative Neural Networks. AIChE J. 1991, 37 (2), 233-243.
(19) MacGregor, J. F.; Jaeckle, C.; Kiparissides, C.; Koutodi, M. Process Monitoring and Diagnosis by Multiblock PLS Methods. AIChE J. 1994, 40, 826.
(20) MacGregor, J. F.; Kourti, T. Statistical Process Control of Multivariate Processes. Control Eng. Practice 1995, 3 (3), 403-414.
(21) Dunia, R.; Qin, S. J.; Edgar, T. F.; McAvoy, T. J. Identification of Faulty Sensors Using Principal Component Analysis. AIChE J. 1996, 42, 2797-2812.
(22) Wise, B. M.; Gallagher, N. B. The Process Chemometrics Approach to Process Monitoring and Fault Detection. J. Process Control 1996, 6 (6), 329-348.
(23) Raich, A.; Cinar, A. Statistical Process Monitoring and Disturbance Diagnosis in Multivariable Continuous Processes. AIChE J. 1996, 42 (4), 995-1009.
(24) Maryak, J. L.; Hunter, L. W.; Favin, S. Automated System Monitoring and Diagnosis via Singular Value Decomposition. Automatica 1997, 33 (11), 2059-2063.
(25) Dunia, R.; Qin, S. J. A Unified Geometric Approach to Process and Sensor Fault Identification and Reconstruction: The Unidimensional Fault Case. Comput. Chem. Eng. 1998, 22 (7-8), 927-943.
(26) Qin, S. J.; Yue, H.; Dunia, R. Self-Validating Inferential Sensors with Application to Air Emission Monitoring. Ind. Eng. Chem. Res. 1997, 36, 1675-1685.
(27) Bakshi, B. R. Multiscale PCA with Application to Multivariate Statistical Process Monitoring. AIChE J. 1998, 44 (7), 1596-1610.
(28) Luo, R.; Misra, M.; Himmelblau, D. Sensor Fault Detection via Multiscale Analysis and Dynamic PCA. Ind. Eng. Chem. Res. 1999, 38 (4), 1489-1495.
(29) Nam, D. S.; Jeong, C. W.; Choe, Y. J.; Yoon, E. S. Operation-Aided System for Fault Diagnosis of Continuous and Semi-Continuous Processes. Comput. Chem. Eng. 1996, 20 (6-7), 793-803.
(30) Guan, J.; Graham, J. H. An Integrated Approach for Fault Diagnosis with Learning. Comput. Ind. 1996, 32, 33-51.
(31) Yung, S. K.; Clarke, D. W. Sensor Validation. Meas. Control 1989, 22, 132-150.
(32) Luo, R.; Misra, M.; Qin, S. J.; Barton, R.; Himmelblau, D. Sensor Fault Detection via Multiscale Analysis and Nonparametric Statistical Inference. Ind. Eng. Chem. Res. 1998, 37 (3), 1024-1032.
(33) Rabiner, L. R.; Gold, B. Theory and Application of Digital Signal Processing; Prentice-Hall: Englewood Cliffs, NJ, 1975.
(34) Oppenheim, A. V.; Schafer, R. W. Digital Signal Processing; Prentice-Hall: Englewood Cliffs, NJ, 1975.
(35) Welch, P. D. The Use of Fast Fourier Transform for the Estimation of Power Spectra: A Method Based on Time Averaging Over Short, Modified Periodograms. IEEE Trans. Audio Electroacoust. 1967, AU-15 (2), 70-73.
(36) Montgomery, D. C. Introduction to Statistical Quality Control, 2nd ed.; John Wiley & Sons: New York, 1991.
(37) Jackson, J. E. A User's Guide to Principal Components; John Wiley & Sons: New York, 1991.
(38) Wise, B. M.; Gallagher, N. B. PLS_Toolbox 2.0 for Use with MATLAB; Eigenvector Research, Inc.: Manson, WA, 1998.
(39) Mandel, J. Principal Components, Analysis of Variance, and Data Structure. Stat. Neerland. 1972, 26 (3), 119-129.
(40) Jolliffe, I. T. A Note on the Use of Principal Components in Regression. Appl. Stat. 1982, 31 (3), 300-303.
(41) Thomas, M. M. Quality Control of Batch Chemical Processes with Application to Autoclave Curing of Composite Laminate Materials. D.Sc. Dissertation, Washington University, St. Louis, MO, Dec 1995.
(42) Bakshi, B. R.; Locher, G.; Stephanopoulos, G.; Stephanopoulos, G. Analysis of Operating Data for Evaluation, Diagnosis and Control of Batch Operations. J. Process Control 1994, 4 (4), 179-194.
(43) Garcia, E. A.; Frank, P. M. Analysis of a Class of Dedicated Observer Schemes to Sensor Fault Isolation. UKACC International Conference on Control, Sept 2-5, 1996.
(44) Gross, K. C.; Hoyer, K. K.; Humenik, K. E. System for Monitoring an Industrial Process and Determining Sensor Status. U.S. Patent 5,459,675, 1995.
(45) Henry, M. Automatic Sensor Validation. Control Instrum. 1995, 27, 60.
(46) Himmelblau, D. M.; Mohan, B. On-line Sensor Validation of Single Sensors Using Artificial Neural Networks. American Control Conference, Seattle, WA, 1995; IEEE: New York, 1995; pp 766-770.
(47) Hsiung, J. T.; Himmelblau, D. M. Detection of Leaks in a Liquid-Liquid Heat Exchanger Using Passive Acoustic Noise. Comput. Chem. Eng. 1996, 20 (9), 1101-1111.
(48) Jenkins, G. M.; Watts, D. G. Spectral Analysis and Its Applications; Holden-Day Inc.: San Francisco, CA, 1968.
(49) Karjala, T. W.; Himmelblau, D. M. Dynamic Data Rectification by Recurrent Neural Networks vs. Traditional Methods. AIChE J. 1994, 40 (11), 1865-1875.
(50) Rengaswamy, R.; Venkatasubramanian, V. A Syntactic Pattern-Recognition Approach for Process Monitoring and Fault Diagnosis. Eng. Appl. Artif. Intell. 1995, 8 (1), 35-51.
(51) Stockdale, R. B. New Standards Squeeze Temperature Measurements and Control Specifications. Control Eng. 1992, 39, 89-92.
(52) Watanabe, K.; Koyama, H.; Tanoguchi, H.; Ohma, T.; Himmelblau, D. M. Location of Pinholes in a Pipeline. Comput. Chem. Eng. 1993, 17 (1), 61-70.
(53) Ying, C.-M. Issues in Process Monitoring and Control: Identification, Model Predictive Control, Optimization and Fault Detection. D.Sc. Thesis, Washington University, St. Louis, MO, 1999.
(54) Ying, C.-M.; Joseph, B. Sensor Fault Detection Using Power Spectrum Density Analysis. AIChE Annual Meeting, Miami, FL, 1998.
Received for review May 13, 1999 Revised manuscript received November 5, 1999 Accepted November 9, 1999 IE9903341