Self-Validating Inferential Sensors with Application to Air Emission

Ind. Eng. Chem. Res. 1997, 36, 1675-1685


Self-Validating Inferential Sensors with Application to Air Emission Monitoring

S. Joe Qin,* Hongyu Yue, and Ricardo Dunia

Department of Chemical Engineering, The University of Texas at Austin, Austin, Texas 78712

Inferential sensors, or soft sensors, refer to a modeling approach to estimating hard-to-measure process variables from other easy-to-measure, on-line sensors. Since many sensors are used as input variables to estimate the output, the probability that one of the sensors fails increases significantly. In this paper, we propose a self-validating inferential sensor approach based on principal component analysis (PCA). The input sensors are validated using a fault identification and reconstruction approach proposed in Dunia et al. AIChE J. 1996, 42, 2797-2812. A principal component model is built for the input sensors for sensor validation, and the validated principal components are used to predict output variables using linear regression or neural networks. If a sensor fails, the sensor is identified and reconstructed with the best estimate from its correlation to other sensors. The principal components are also reconstructed accordingly for prediction. The numbers of principal components used in sensor validation and in prediction are chosen differently, based on different criteria. The input correlation, or collinearity, that is typical of process data is exploited for sensor validation and removed when predicting the output to avoid ill-conditioning. The self-validating soft sensor approach is applied to air emission monitoring, where continuous monitoring of air pollutants is required by environmental regulations.

1. Introduction

Inferential sensors, also known as soft sensors, refer to a software approach to inferring hard-to-measure variables from other on-line measurable process variables. These hard-to-measure variables are usually quality variables or are directly related to economic interest. Traditionally, these quality variables are observed by conducting laboratory tests. Laboratory testing normally incurs a significant delay, in the range of half an hour to a few hours.
The problem is that, although the product quality can be determined after this delay, it may be too late to make timely adjustments. An inferential sensor can predict the primary output variables from process inputs and other measured variables without incurring the measurement delay. The prediction can then be used for inferential control. The fundamental principle behind inferential sensors is that there is a functional relationship between the variables to be inferred and the process operating conditions. To build inferential sensors, early work (Joseph and Brosilow, 1978) suggested using the Kalman filter approach to estimate the product quality if a state space model is available. This approach usually uses secondary measurements as feedback to reduce the variance of the estimate for the primary variable. Jutan et al. (1977) combined first principles models with multivariate statistical analysis to estimate the transport properties and catalyst activity for a packed-bed reactor. For processes where analytical sensors are available at a slower rate than the secondary variables, multirate state estimation has been proposed by Lee and Morari (1992) and Lee et al. (1992). In cases where the process mechanism is not well understood, the needed input variables are not measurable, or the physical model parameters are not known, it is desirable to build empirical models to infer the primary variable (Mejdell and Skogestad, 1991; Qin and McAvoy, 1992).

* Author to whom all correspondence should be addressed. Tel: (512)471-4417. Fax: (512)471-7060. E-mail: qin@che.utexas.edu.

S0888-5885(96)00615-X

A recent application of inferential sensors is to estimate air emissions from industrial processes (Keeler and Ferguson, 1996). The Environmental Protection Agency (EPA, 1991) requires that all industrial boilers and furnaces be equipped with continuous emission monitoring systems (CEMS) for NOx, SO2, and CO emissions. The conventional approach to continuous emission monitoring is to use analytical sensors, which are expensive and difficult to maintain (Mandel, 1996). Inferential approaches based on neural networks, known as predictive emission monitoring systems (PEMS), have been used as an alternative to CEMS (Keeler and Ferguson, 1996; Dong et al., 1995). Typically, a mobile analytical sensor is used to collect emission data for a period of time. Then an inferential model is built to predict the air emission from other on-line process variables. In this approach a process plant can use one analytical sensor to build inferential sensors for multiple processes. Two important issues need to be addressed for a successful application of inferential sensors. The first issue is that EPA requires the monitoring systems to be on-line for at least 95% of the time, which calls for highly reliable inferential sensors. However, the validity of an inferential sensor relies on the validity of its input sensors: if one of the input sensors fails, the inferential sensor estimate will deteriorate. Since an inferential sensor uses multiple input sensors, the probability that at least one of them fails increases dramatically. The second issue is that empirical models, such as neural networks, do not extrapolate well, so it is imperative to detect whether a neural network is extrapolating. In this paper, we propose an integrated framework, known as self-validating inferential sensors, that will do the following:

(1) Validate the input sensors before making a prediction. If an input sensor is faulty, it will be detected, identified, and reconstructed with a best estimate.
A principal component analysis (PCA) model is built to achieve this goal using the sensor validation method proposed by Dunia et al. (1996).

© 1997 American Chemical Society


(2) Detect whether the inferential model is extrapolating. A χ² test on the PCA scores is used as an extrapolation index.

(3) Build a prediction model for the output variables based on the validated principal components instead of the raw input variables. If one sensor is faulty, the principal components are reconstructed based on the PCA model, and the reconstructed principal components are then used for prediction. If there is no fault at all, the PCA model serves as a dimension reduction step that eliminates collinearity, thus making the prediction model well conditioned.

Only one PCA model and one prediction model are needed in this framework. The PCA model extracts a principal component subspace (PCS), in which most normal variability occurs, and a residual subspace, which contains the residuals. The residual subspace is used to detect a faulty sensor, since a sensor fault breaks the normal correlation among the sensors. Fault identification and reconstruction algorithms are then used to replace faulty sensors so that the inputs to the soft sensor are valid even in the presence of faults. The principal components, which are uncorrelated and computed from the validated (possibly reconstructed) sensor values, are used for predicting the output.

The remainder of the paper is organized as follows. Section 2 presents the integrated framework of the self-validating inferential sensors, including the tests for extrapolation and for detection of a faulty sensor; a modified Q-statistic for the filtered squared prediction error is also proposed there. Section 3 discusses three different methods for reconstructing a faulty sensor, which are shown to be equivalent. Section 4 addresses the identification of a faulty sensor and introduces a sensor validity index for identification. In section 5, a method for determining the number of principal components for best reconstruction is proposed. Section 6 discusses the prediction models for inferential sensors from reconstructed principal component scores.
An application of the proposed method to continuous emission monitoring is presented in section 7. The last section gives conclusions.

2. Self-Validating Inferential Sensors

2.1. An Integrated Framework. Let $x \in \mathbb{R}^m$ represent a vector of input variables that will be used to predict the output variables in the vector $y \in \mathbb{R}^p$. Assuming $n$ samples are collected for the output and input variables, two matrices, $X \in \mathbb{R}^{n \times m}$ and $Y \in \mathbb{R}^{n \times p}$, are formed in which each row represents a sample. If the input variables have time delays with respect to the output, the input matrix $X$ should be aligned according to these time delays. Typically, an inferential sensor is built to predict $y$ directly from the inputs $x$. However, if one of the input sensors is faulty, the model prediction is not valid. In this section, we propose a two-step method in which the input variables are first modeled with principal component analysis, and the principal components are then used to predict the outputs. The PCA model has three roles to play: (i) to validate the input sensors using the sensor validation method of Dunia et al. (1996) (if one of the input sensors is faulty, it is identified and reconstructed from other correlated sensors, and the validated principal components are then used for prediction); (ii) to eliminate input collinearity, which may inflate the variance of the model prediction; and (iii) to indicate whether the model is extrapolating. Since no empirical model extrapolates reliably, this is a necessary measure to maintain the validity of the prediction. This framework is referred to as self-validating inferential sensors (SVIS); it validates the input sensors, tests for extrapolation, and makes the prediction. A schematic diagram of the framework is depicted in Figure 1.

Figure 1. Integrated soft sensor with a sensor validation module. A PCA model is built to extract sensor correlation for sensor validation. The same PCA model is used to extract independent principal components for soft sensor prediction.

The PCA model decomposes the input data as a bilinear product of scores and loadings,

$$X = TP^T + E \qquad (1)$$

where $E$ is the residual matrix, which mainly contains noise under normal conditions. $T \in \mathbb{R}^{n \times l}$ and $P \in \mathbb{R}^{m \times l}$ are the score and loading matrices, respectively. Given a new sample vector $x$, the PCA score, prediction, and residual vectors are given, respectively, as follows:

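score:
$$t = P^T x \in \mathbb{R}^l \qquad (2)$$

prediction:
$$\hat{x} = Cx \in S_p \subset \mathbb{R}^m; \quad \dim(S_p) = l \qquad (3)$$

residual:
$$\tilde{x} = (I - C)x \in S_r \subset \mathbb{R}^m; \quad \dim(S_r) = m - l \qquad (4)$$

where $C = PP^T$ is idempotent, and so is $I - C$. The sample vector $x$ is thus split into its projections onto the principal component subspace ($S_p$) and the residual subspace ($S_r$), respectively:

$$x = \hat{x} + \tilde{x} \qquad (5)$$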

The squared prediction error (SPE) for $x$ is

$$\mathrm{SPE}(x) = \|\tilde{x}\|^2 = x^T (I - C)\, x \qquad (6)$$
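The decomposition in eqs 1-6 can be illustrated with a short NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the training data are only mean-centered (scaling to unit variance, common in practice, is omitted), the data are synthetic, and the helper names `fit_pca` and `decompose` are hypothetical.

```python
import numpy as np


def fit_pca(X, l):
    """Fit a PCA model to training data X (n x m) with l retained components.

    Returns the column means, the m x l loading matrix P, and all m
    eigenvalues of the sample covariance matrix in descending order.
    """
    mu = X.mean(axis=0)
    Xc = X - mu                                   # mean-centre (assumed preprocessing)
    lam, V = np.linalg.eigh(Xc.T @ Xc / (len(X) - 1))
    order = np.argsort(lam)[::-1]                 # eigh sorts ascending; flip it
    return mu, V[:, order[:l]], lam[order]


def decompose(x, mu, P):
    """Score (eq 2), prediction (eq 3), residual (eq 4) and SPE (eq 6) of one sample."""
    xc = x - mu
    C = P @ P.T                                   # projector onto the PC subspace
    t = P.T @ xc
    x_hat = C @ xc
    x_tilde = xc - x_hat                          # equals (I - C) xc
    spe = float(x_tilde @ x_tilde)
    return t, x_hat, x_tilde, spe


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 5 sensors driven by 2 latent variables plus small noise.
    X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))
    mu, P, lam = fit_pca(X, l=2)
    t, x_hat, x_tilde, spe = decompose(X[0], mu, P)
    print("scores:", t, "SPE:", spe)
```

Because $C$ is idempotent, $\hat{x}$ and $\tilde{x}$ always add back to the centered sample, and the SPE computed from the residual equals the quadratic form of eq 6.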

Under normal conditions, the residual portion of the sample is small. Jackson and Mudholkar (1979) proposed the calculation of a confidence limit for the SPE. If one of the sensors is faulty, the SPE will increase, so the SPE can be used to detect sensor faults. When a sensor fault is detected, it is identified and reconstructed using


Figure 2. Use of principal components and SPE for input sensor validation and detection of extrapolation. The points marked with “O” have large SPE’s which indicate an abnormal condition. The points marked with “×” are normal but are beyond the training region defined by the principal components.

the sensor validation algorithm, which will be discussed in detail later in this paper. A reconstructed $t$, which eliminates possible faults, is used for prediction. A prediction model is built to predict $y$ from $t$ by minimizing

$$\sum_{k=1}^{n} \|y_k - f(t_k, w)\|^2 \qquad (7)$$

where $w$ is the model parameter vector and $f: \mathbb{R}^l \to \mathbb{R}^p$ is the prediction model, which can be a linear model or a neural network. Note that the prediction model is based on the principal components instead of the original input variables. Since the principal components are orthogonal, building such a model involves no collinearity or ill-conditioning problem. Furthermore, it is convenient to determine the region occupied by the principal component scores of the training data, and hence to determine whether a new sample is extrapolating. When the relationship between the scores and the outputs is linear, a linear model is sufficient, and the resulting inferential sensor is a principal component regression (PCR) model with self-validation of the input. If a neural network is used, it can be trained with a conjugate gradient algorithm, and the number of hidden neurons can be determined by train/test validation. A detailed treatment of neural network training can be found in Piovoso and Owens (1991) and Qin (1996). Figure 2 illustrates how a PCA model is used for validating input sensors, detecting extrapolation, and removing collinearity for prediction. If the SPE increases significantly, as represented by the circles in the figure, these points break the correlation in the PCA model, and there is possibly a sensor fault. In this case further testing is conducted to tell whether it is a sensor fault or a process fault. If it is due to a sensor fault, an identification and reconstruction algorithm is executed to identify and reconstruct the failed sensor, and the reconstructed sample vector should have a much lower SPE. If the SPE is normal, but the PC scores are

outside the ellipse defined by the training data (as represented by the crosses in the figure), these data points are consistent with the PCA model but lie beyond the training data region and will cause extrapolation in the prediction model. If the data points are within the cylinder (inside the ellipse and with a normal SPE), a valid prediction can be generated from the (validated) PCA scores. It is worth noting that similar tests are used in process monitoring (MacGregor et al., 1991b). In this paper they are used for detecting extrapolation and faulty sensors, which are important considerations for empirical modeling.

2.2. Test for Extrapolation. A preliminary test for extrapolation is to check each variable individually against the lower and upper bounds of the training data used in modeling. However, it is well-known that this lower-upper bound test cannot detect extrapolation points that are within the bounds but lie outside the ellipse in Figure 2. Since the prediction model of the integrated framework is based on the principal components, testing for extrapolation is equivalent to determining the region occupied by the training data in the principal component subspace. Therefore, methods for testing the confidence region of the principal components are directly applicable (Tong and Crowe, 1995). Assuming the samples follow a normal distribution, the Hotelling $T^2$ statistic (Hotelling, 1947)

$$T^2 = t^T \Lambda^{-1} t \qquad (8)$$

follows a $\chi^2$ distribution with $l$ degrees of freedom, where $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_l\}$ is a diagonal matrix containing the $l$ largest eigenvalues of the sample covariance matrix $X^T X/(n-1)$. A multivariate $\chi^2$ control chart can be obtained by plotting $T^2$ versus time with the control limit $\chi^2_\beta(l)$, where $\beta$ is the confidence level of the test. Tong and Crowe (1995) discussed several similar tests that may be used alternatively.

2.3. Sensor Fault Detection with Filtering. When all sensors are normal, the residual $\tilde{x}$ will mainly contain noise. If one sensor fails, breaking the normal correlation, the residual will increase significantly. Jackson and Mudholkar (1979) developed a test for residuals known as the Q-statistic. This test indicates an abnormal condition when

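$$\mathrm{SPE} > \delta_\alpha^2 \qquad (9)$$

where

$$\delta_\alpha^2 = \theta_1 \left[ \frac{c_\alpha \sqrt{2\theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} \right]^{1/h_0} \qquad (10)$$

and

$$\theta_i = \sum_{j=l+1}^{m} \lambda_j^{\,i}, \quad i = 1, 2, 3 \qquad (11)$$

$$h_0 = 1 - \frac{2\theta_1 \theta_3}{3\theta_2^2} \qquad (12)$$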

$c_\alpha$ is the standard normal deviate corresponding to the upper $(1 - \alpha)$ percentile. Note that this result is derived under the following conditions: (1) the sample vector $x$ follows a multivariate normal distribution; (2) the result holds regardless of how many principal components are retained in the model; and (3) an approximation of the distribution is made in deriving the confidence limit.
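A minimal sketch of the two tests follows, assuming the eigenvalues come from the sample covariance matrix of normal training data: `hotelling_t2` evaluates eq 8, `q_limit` evaluates eqs 10-12, and `classify` applies the decision logic of Figure 2 (SPE first, then $T^2$). All three names are hypothetical. To depend only on NumPy and the Python standard library, the limit $\chi^2_\beta(l)$ is passed in as a number read from a table (about 5.991 for $l = 2$ at 95% confidence) rather than computed.

```python
import numpy as np
from statistics import NormalDist


def hotelling_t2(t, lam_l):
    """T^2 = t^T Lambda^{-1} t (eq 8); lam_l holds the l retained eigenvalues."""
    t = np.asarray(t, dtype=float)
    return float(np.sum(t**2 / np.asarray(lam_l, dtype=float)))


def q_limit(lam, l, alpha=0.05):
    """Jackson-Mudholkar limit delta_alpha^2 (eqs 10-12).

    lam: all m covariance eigenvalues in descending order; only the
    m - l residual eigenvalues enter theta_1, theta_2, theta_3.
    """
    resid = np.asarray(lam, dtype=float)[l:]
    th1, th2, th3 = (float(np.sum(resid**i)) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2**2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)   # upper-tail normal deviate
    bracket = (c_alpha * np.sqrt(2.0 * th2 * h0**2) / th1
               + 1.0 + th2 * h0 * (h0 - 1.0) / th1**2)
    return float(th1 * bracket ** (1.0 / h0))


def classify(spe, t2, spe_limit, t2_limit):
    """Decision logic of Figure 2: abnormal SPE first, then extrapolation."""
    if spe > spe_limit:
        return "abnormal"          # correlation broken: possible sensor fault
    if t2 > t2_limit:
        return "extrapolating"     # consistent with the model, outside training region
    return "valid"
```

As a sanity check, when the residual eigenvalues are all equal the SPE is exactly a scaled χ² variable: for three residual eigenvalues equal to 1, `q_limit([10, 4, 1, 1, 1], l=2)` lands within about 0.05 of the exact 95% point of χ²(3), 7.815.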


Therefore, one should be cautious in situations where the data appear to deviate from a normal distribution. When the data are not normally distributed, false alarms can occur, which is undesirable. To reduce false alarms, exponentially weighted moving average (EWMA) filters can be applied to the residuals. Since an EWMA filter acts roughly like a moving window over a group of data samples, the filtered residuals are closer to a normal distribution than the unfiltered residuals. The general EWMA expression for residuals is

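$$\bar{e}_k = (I - \Gamma)\, \bar{e}_{k-1} + \Gamma\, \tilde{x}_k \qquad (13)$$

$$\overline{\mathrm{SPE}}_k = \|\bar{e}_k\|^2 \qquad (14)$$

where $\bar{e}_k$ and $\overline{\mathrm{SPE}}_k$ are the filtered residuals and SPE, respectively. $\Gamma$ denotes a diagonal matrix whose diagonal elements are forgetting factors for the residuals. $\Gamma$ can be adjusted to detect a particular type of fault. In general, $\Gamma$ close to the identity matrix tends to favor the detection of variance changes in the data, while $\Gamma$ close to 0 is more sensitive to mean changes. Different diagonal entries can be defined in $\Gamma$ depending on the type of fault to detect in each sensor. A few examples for choosing the forgetting factors can be found in Dunia et al. (1996).

2.4. The Q-Statistic for Filtered SPE. Although the filtered SPE can reduce false alarms and provide more flexibility in detecting different types of faults, the original Q-statistic developed by Jackson and Mudholkar (1979) is no longer applicable to $\overline{\mathrm{SPE}}$. In this section we develop an adequate statistical test for the filtered SPE. Assuming for simplicity that $\Gamma = \gamma I$, eq 13 is equivalent to filtering the sample vector $x$ first and then calculating the residuals:

$$\bar{x}_k = (1 - \gamma)\, \bar{x}_{k-1} + \gamma\, x_k \qquad (15)$$

$$\bar{e}_k = (I - C)\, \bar{x}_k \qquad (16)$$

Assuming that $x_k$ is an independent random process, the following covariance relation can be derived from eq 15:

$$\bar{R} = \frac{\gamma}{2 - \gamma}\, R \qquad (17)$$

where $\bar{R}$ is the covariance matrix for $\bar{x}_k$ and $R$ the covariance matrix for $x_k$. As a consequence, the eigenvalues of $\bar{R}$ and $R$ are related as follows:

$$\bar{\lambda}_i = \frac{\gamma}{2 - \gamma}\, \lambda_i, \quad i = 1, 2, \ldots, m \qquad (18)$$

Therefore, following the same reasoning as Jackson and Mudholkar (1979),

$$\left[ \frac{\|\bar{e}_k\|^2}{\bar{\theta}_1} \right]^{\bar{h}_0} \sim N\!\left( 1 + \frac{\bar{\theta}_2 \bar{h}_0 (\bar{h}_0 - 1)}{\bar{\theta}_1^2},\; \frac{2\bar{\theta}_2 \bar{h}_0^2}{\bar{\theta}_1^2} \right) \qquad (19)$$

where

$$\bar{\theta}_i = \sum_{j=l+1}^{m} \bar{\lambda}_j^{\,i} = \left( \frac{\gamma}{2 - \gamma} \right)^{i} \sum_{j=l+1}^{m} \lambda_j^{\,i} = \left( \frac{\gamma}{2 - \gamma} \right)^{i} \theta_i, \quad i = 1, 2, 3 \qquad (20)$$

$$\bar{h}_0 = 1 - \frac{2\bar{\theta}_1 \bar{\theta}_3}{3\bar{\theta}_2^2} \qquad (21)$$

Since

$$\bar{h}_0 = 1 - \frac{2\bar{\theta}_1 \bar{\theta}_3}{3\bar{\theta}_2^2} = 1 - \frac{2\theta_1 \theta_3}{3\theta_2^2} = h_0$$

and, by eq 20, $\bar{\theta}_2/\bar{\theta}_1^2 = \theta_2/\theta_1^2$, the distribution in eq 19 can be further simplified as

$$\left[ \frac{\|\bar{e}_k\|^2}{\bar{\theta}_1} \right]^{h_0} \sim N\!\left( 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2},\; \frac{2\theta_2 h_0^2}{\theta_1^2} \right) \qquad (22)$$

As a consequence, the Q-statistic for the filtered SPE is

$$\overline{\mathrm{SPE}}_k > \bar{\delta}_\alpha^2 \qquad (23)$$

for an abnormal situation, where

$$\bar{\delta}_\alpha^2 = \bar{\theta}_1 \left[ \frac{c_\alpha \sqrt{2\theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} \right]^{1/h_0} \qquad (24)$$

Using the relation in eq 20 and comparing eqs 10 and 24, we obtain

$$\bar{\delta}_\alpha^2 = \frac{\gamma}{2 - \gamma}\, \delta_\alpha^2 \qquad (25)$$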

(25)

which relates the confidence limit for the filtered SPE to that for the unfiltered SPE through the filter constant. In other words, since $\gamma/(2 - \gamma) < 1$ for $0 < \gamma < 1$, the filtered SPE defines a tighter confidence region than the unfiltered SPE.

Example 1. The process variables are related by the following steady state relation:

$$x = Pt + e \qquad (26)$$

where

$$P^T = \begin{bmatrix} -0.143 & 0.143 & 0 & -0.143 & 0.143 & 0 & -0.143 & 0.143 & 0 & -0.143 \\ 0.408 & 0.408 & 0 & -0.408 & -0.408 & 0 & 0.408 & 0.408 & 0 & 0 \\ 0.5 & 0.5 & 0 & 0.5 & 0.5 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \qquad (27)$$

$t = [t_1 \; t_2 \; t_3]^T$ contains three independent random variables and $e$ contains random noise. Five hundred data points are generated from this process. Three principal components are retained in the PCA model to calculate the SPE. The resulting SPE plot and the 95% control limit ($\delta_\alpha^2 = 16$) are shown in Figure 3 (top). Even though there are no faults in the data, the SPE exceeds the control limit 15 times. To reduce the number of false alarms, Figure 3 (bottom) depicts the filtered SPE (with $\gamma = 0.1$) and the filtered control limit ($\bar{\delta}_\alpha^2 = 0.84$). After filtering, there are only two occasions where the filtered SPE exceeds its control limit. Furthermore, the control limit after filtering is significantly smaller than the control limit without filtering, consistent with eq 25: $16 \times 0.1/1.9 \approx 0.84$. Therefore, when the SPE is filtered, the control limit must be calculated using eq 24.

3. Reconstruction of a Faulty Sensor

In this section we discuss three methods for reconstructing a faulty sensor. The faults considered include drifting, bias, precision degradation, and complete failure. We assume that the ith sensor is faulty and try to reconstruct it from the other, normal sensors.


Identification of a faulty sensor will be discussed in the next section.

Figure 3. Filtered SPE and the filtered control limit reduce the occasions of false alarm.

3.1. Reconstruction via Iteration. One may estimate the ith variable from $x$ using eq 3, where the prediction $\hat{x}_i$ is used as a reconstruction of $x_i$. The drawback of this approach is that the faulty sensor contained in $x$ is used in the estimate, so the estimate is contaminated by the fault. To eliminate the effect of the faulty sensor, we feed the prediction of the ith variable ($\hat{x}_i$) back to the input to replace $x_i$ and iterate until it converges to a value $z_i$, as shown in Figure 4a. Every iteration through the PCA model is an orthogonal projection onto the principal component subspace, as shown in Figure 4b. The iteration can be represented by the following expression:

$$z_i^{\mathrm{new}} = [x_{-i}^T \;\; z_i^{\mathrm{old}} \;\; x_{+i}^T]\, c_i = c_{ii}\, z_i^{\mathrm{old}} + [c_{-i}^T \;\; 0 \;\; c_{+i}^T]\, x \qquad (28)$$

where

$$C = PP^T = [c_1 \;\; c_2 \;\; \cdots \;\; c_m] \qquad (29)$$

$$c_i^T = [c_{1i} \;\; c_{2i} \;\; \cdots \;\; c_{mi}] = [c_{-i}^T \;\; c_{ii} \;\; c_{+i}^T] \qquad (30)$$

The subscripts $-i$ and $+i$ denote a vector formed by the first $i - 1$ and the last $m - i$ elements of the original vector, respectively. Dunia et al. (1996) showed that the iteration always converges. For $c_{ii} < 1$, the converged value $z_i$, regardless of the initial condition, is

$$z_i = \frac{[c_{-i}^T \;\; 0 \;\; c_{+i}^T]\, x}{1 - c_{ii}} \qquad (31)$$

and $z_i = x_i$ for $c_{ii} = 1$. In the latter case,

$$c_i = \xi_i = [0 \;\; 0 \;\; \cdots \;\; 1 \;\; \cdots \;\; 0]^T \qquad (32)$$

is the ith column of the identity matrix. This is the case where $x_i$ is correlated only with itself and not with the other sensors. The sensor then cannot be reconstructed from the others, and a univariate SPC approach may be applied.

3.2. A Missing Value Approach. If we treat an identified faulty sensor as having a missing value, the missing data replacement approaches available in the literature can be used to reconstruct the faulty sensor. Martin and Naes (1989) described a missing value replacement method using PCA. If the ith sensor is faulty (or missing), it is omitted from the sample vector. The modified sample vector becomes

$$\mathbf{x}_i = [x_{-i}^T \;\; x_{+i}^T]^T \in \mathbb{R}^{m-1} \qquad (33)$$

The reconstructed score vector is

$$\hat{t} = (P_i^T P_i)^{-1} P_i^T \mathbf{x}_i \qquad (34)$$

where

$$P_i = \begin{bmatrix} P_{-i} \\ P_{+i} \end{bmatrix} \in \mathbb{R}^{(m-1) \times l} \qquad (35)$$

In other words, $P_i$ is $P$ with the ith row crossed out. The reconstructed value for $x_i$ is

$$z_i = \xi_i^T P \hat{t} = \xi_i^T P (P_i^T P_i)^{-1} P_i^T \mathbf{x}_i \qquad (36)$$

Note that the reconstructed score vector in eq 34 can be interpreted as

$$\hat{t} = \arg\min_t \|\mathbf{x}_i - P_i t\|^2 \qquad (37)$$

which is the score vector that minimizes the SPE over the available sensors. Clearly, we must have $m - 1 \ge l$ to uniquely determine the modified score vector; this is related to the issue of reconstructability. Dunia et al. (1996) showed that the reconstruction in eq 31 via the iterative approach is equivalent to the one in eq 36 via the missing value approach. The only difference is that eq 31 does not require matrix inversion: the converged value is calculated in one step.

3.3. Reconstruction via Optimization. This approach was implemented by Wise and Ricker (1991) for a set of reconstructed variables. To reconstruct the ith variable, the optimization procedure finds the reconstructed value that minimizes the SPE,

$$z_i = \arg\min_{z_i} \|(I - C)[x_{-i}^T \;\; z_i \;\; x_{+i}^T]^T\|^2 \qquad (38)$$

Since $I - C$ is positive semidefinite, the minimum is achieved by setting the derivative with respect to $z_i$ to zero,

$$[x_{-i}^T \;\; z_i \;\; x_{+i}^T](\xi_i - c_i) = 0 \qquad (39)$$

Rearranging the above equation gives exactly the same solution as eq 31 via the iteration method. As a result, the three reconstruction methods give identical reconstruction results.

4. Identifying the Faulty Sensor

This section concentrates on the identification of a single faulty sensor, but the algorithm can be applied serially to identify multiple faults that do not occur simultaneously. Simultaneous sensor faults are unusual except in events such as a power failure, which is usually taken care of by redundant power supplies. This fact helps distinguish a sensor problem from an abnormal process operating condition, in which a group of variables breaks the normal correlation established by the principal component model.


Figure 4. Reconstruction of a faulty sensor via an iteration approach.
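The iteration of Figure 4 (eq 28), its closed-form limit (eq 31), and the missing value reconstruction (eqs 34-36) can be compared numerically. The sketch below is a minimal illustration under stated assumptions: the loadings are an arbitrary orthonormal matrix rather than a fitted model, the sample is taken as already mean-centered, and the function names are hypothetical.

```python
import numpy as np


def reconstruct_closed_form(x, C, i):
    """Eq 31: z_i = [c_-i^T 0 c_+i^T] x / (1 - c_ii), defined for c_ii < 1."""
    c = C[:, i].copy()
    c_ii = c[i]
    if c_ii >= 1.0 - 1e-12:
        raise ValueError("c_ii = 1: sensor cannot be reconstructed from the others")
    c[i] = 0.0
    return float(c @ x / (1.0 - c_ii))


def reconstruct_iterative(x, C, i, tol=1e-12, max_iter=10_000):
    """Eq 28: feed the PCA prediction of x_i back as input until it stops changing."""
    x = np.array(x, dtype=float)       # work on a copy
    for _ in range(max_iter):
        z_new = float(C[i] @ x)        # i-th element of C x (C is symmetric)
        converged = abs(z_new - x[i]) < tol
        x[i] = z_new
        if converged:
            break
    return float(x[i])


def reconstruct_missing_value(x, P, i):
    """Eqs 34-36: least-squares scores from the other m - 1 sensors, then z_i = xi_i^T P t_hat."""
    P_i = np.delete(P, i, axis=0)      # P with its i-th row crossed out
    x_i = np.delete(x, i)              # sample with the i-th sensor omitted
    t_hat, *_ = np.linalg.lstsq(P_i, x_i, rcond=None)
    return float(P[i] @ t_hat)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    P, _ = np.linalg.qr(rng.normal(size=(6, 2)))   # hypothetical orthonormal loadings
    C = P @ P.T
    x = rng.normal(size=6)
    for i in range(6):
        print(i, reconstruct_closed_form(x, C, i),
              reconstruct_iterative(x, C, i), reconstruct_missing_value(x, P, i))
```

Using `np.linalg.lstsq` instead of forming $(P_i^T P_i)^{-1}$ explicitly solves the same least-squares problem (eq 37) and is the more robust choice when $P_i$ is nearly rank deficient. The iteration converges geometrically at rate $c_{ii}$, which is why the closed form of eq 31 is preferable in practice.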

When a sensor fault has occurred, the sample vector can be represented as follows:

$$x = x^* + f \xi_i \qquad (40)$$

where $x^*$ denotes the normal portion of the data, $\xi_i$ is the unit-length direction vector of the faulty sensor, and $f$ is the fault magnitude, which can be positive or negative. All possible sensor faults are denoted by the set $\{\xi_j;\ j = 1, 2, \ldots, m\}$. To identify the true fault among all possible faults, we propose an identification approach that reconstructs $x^*$ from $x$ along every possible fault direction $\xi_j$. For each assumed fault, a fault magnitude is estimated by reconstruction from the other sensors using the methods discussed in the previous section. If the true faulty sensor is assumed, the largest reduction in SPE is expected, since all the reconstruction methods minimize the SPE. To illustrate this method, we denote the reconstructed sample as

$$\mathbf{x}_j^T = [x_{-j}^T \;\; z_j \;\; x_{+j}^T] = x^T + (z_j - x_j)\, \xi_j^T \qquad (41)$$

The reconstructed vector can be projected onto the model and residual subspaces, and an SPE can be calculated in the same way:

$$\mathrm{SPE}(\mathbf{x}_j) = \|\tilde{\mathbf{x}}_j\|^2 = \mathbf{x}_j^T (I - C)\, \mathbf{x}_j \qquad (42)$$

Minimizing the above equation with respect to $z_j$ gives

$$\mathbf{x}_j^T (I - C)\, \xi_j = 0 \qquad (43)$$

The reconstructed value is obtained by substituting eq 41 into the above equation:

$$z_j = x_j - \frac{x^T (I - C)\, \xi_j}{\xi_j^T (I - C)\, \xi_j} \qquad (44)$$

which is an alternative expression for eq 31. If the ith sensor is faulty and $j \ne i$, there will not be much decrease in $\mathrm{SPE}(\mathbf{x}_j)$. However, if the true sensor is chosen, a large reduction in $\mathrm{SPE}(\mathbf{x}_j)$ is expected. Furthermore, the SPE itself will increase significantly due to the fault. Therefore, the ratio of $\mathrm{SPE}(\mathbf{x}_j)$ to SPE is more sensitive to the fault; it is defined as the sensor validity index (SVI),

$$\eta_j = \frac{\mathrm{SPE}(\mathbf{x}_j)}{\mathrm{SPE}(x)} \qquad (45)$$
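Eqs 41-45 can be combined into a single identification pass. The sketch below assumes a known projector $C$ and a mean-centered sample, simulates a bias fault on one sensor with made-up magnitudes, and points to the sensor with the smallest index. It omits the filtering of the validity index that the paper recommends, so it is only an illustration; `sensor_validity_indices` is a hypothetical name.

```python
import numpy as np


def sensor_validity_indices(x, C):
    """Return eta_j = SPE(x_j) / SPE(x) for every sensor j (eqs 41-45)."""
    m = len(x)
    A = np.eye(m) - C                      # I - C, symmetric and idempotent
    spe = float(x @ A @ x)                 # eq 6
    eta = np.empty(m)
    for j in range(m):
        a_j = A[:, j]                      # (I - C) xi_j
        z_j = x[j] - (x @ a_j) / a_j[j]    # eq 44; a_j[j] = xi_j^T (I - C) xi_j
        x_j = x.copy()
        x_j[j] = z_j                       # reconstructed sample, eq 41
        eta[j] = float(x_j @ A @ x_j) / spe  # eqs 42 and 45
    return eta


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    P, _ = np.linalg.qr(rng.normal(size=(8, 3)))   # hypothetical loadings, m = 8, l = 3
    C = P @ P.T
    x = P @ rng.normal(size=3) + 0.01 * rng.normal(size=8)  # normal portion x*
    x[5] += 2.0                                    # bias fault f = 2 on sensor 5 (eq 40)
    eta = sensor_validity_indices(x, C)
    print("SVI:", np.round(eta, 3), "-> suspected sensor", int(np.argmin(eta)))
```

Because each $z_j$ minimizes the SPE over the jth coordinate, every $\eta_j$ stays in $[0, 1]$; only reconstructing the faulty sensor removes the fault and drives its index toward zero.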

Note that $0 \le \eta_j \le 1$, since $\mathrm{SPE}(\mathbf{x}_j)$ is the minimized SPE. A validity index close to 1 indicates that the sensor variations follow the variations experienced by the remaining sensors. When the ith sensor is faulty, $\eta_i$ is close to zero. Dunia et al. (1996) give a detailed discussion of the validity index. It should be noted that the validity index typically exhibits oscillations when no sensor fault is present in the system. These oscillations occur because the ratio always penalizes the variables with the largest error. A filter for the validity index is therefore necessary to eliminate oscillations and possible erroneous identification (Dunia et al., 1996). A contribution plot approach based on the residual error has recently been used in the literature (MacGregor et al., 1994; Tong and Crowe, 1995); it compares the contribution of each variable to the SPE when a fault has been detected. After a faulty sensor has been identified, its reconstructed value may be used to replace the faulty reading. However, this is not always a good replacement: as we will show in the next section, the reconstructed value can be worse than the mean of that sensor's measurements. This happens when the sensor is not correlated with the others; the mean of the variable should then be used as the reconstruction, which reduces to a univariate SPC approach.

5. Determining the Number of Principal Components

In the general practice of PCA, the number of PC’s retained in the model is a rather subjective matter. However, in applying PCA for sensor validation as


described in the previous sections, the number of PC’s has a significant impact on each step of the sensor validation procedure:

(1) The number of PC’s affects the confidence limit of the Q-statistic and hence the ability to detect small faults. If fewer PC’s are used, the confidence limit for SPE is larger, which compromises the ability to detect small faults. If too many PC’s are used, some faults can stay in the PC subspace and not show any effect in the residual subspace, making these faults impossible to detect.

(2) The number of PC’s affects how many degrees of freedom are available for fault identification. Using more PC’s reduces the number of degrees of freedom. If fewer PC’s are used, small to moderate faults cannot be identified.

(3) The number of PC’s affects the accuracy of reconstruction. If too many PC’s are used, each variable tends to rely too much on itself, i.e., $c_{ii} \to 1$, which means its relation to the other variables is weakened. If too few PC’s are used, the model cannot represent all the normal variations of the data, resulting in poor reconstruction.

Therefore, it is important to determine the number of PC’s that provides an optimal tradeoff. Since the identification procedure relies on the reconstruction algorithm, we propose to choose the number of PC’s that generates the best reconstruction. In other words, under normal conditions, the reconstructed value $z_j$ should be as close as possible to the data $x_j$; i.e.,

$$u_j \equiv \mathrm{var}\{x_j - z_j\} = E\{(x_j - z_j)^2\} \qquad (46)$$

should be minimized. $u_j$ is defined as the unreconstructed variance. To choose the $l$ that best reconstructs each sensor from the others, we need to solve

$$\min_l \sum_{j=1}^{m} q_j E\{(x_j - z_j)^2\} \qquad (47)$$

where $q_j$ is an assignable weight for the jth sensor. If equal weights are chosen, the reconstruction errors tend to be equalized. From eq 44 we obtain

$$u_j = \frac{\xi_j^T (I - C) R (I - C)\, \xi_j}{[\xi_j^T (I - C)\, \xi_j]^2} \qquad (48)$$

where $R = E\{xx^T\}$ is the covariance matrix of the (zero-mean) normal data. This covariance matrix can be approximated from the data:

$$R \approx \frac{1}{n - 1} X^T X \qquad (49)$$

Therefore, one can calculate the unreconstructed variance versus the number of PC’s retained in the model and choose the number that produces the minimum. Note that the unreconstructed variance can be larger than the variance of the data itself, i.e.,

$$\mathrm{var}(x_j - z_j) \ge \mathrm{var}(x_j - \bar{x}_j) = \mathrm{var}(x_j) \qquad (50)$$

In this case, the best reconstruction is the mean, $\bar{x}_j$, instead of $z_j$ from the model. This is the case where variable $x_j$ is only weakly correlated with the others, and hence the best reconstruction is its own mean. The following example illustrates when a variable is best replaced with its mean.

Table 1. Unreconstructed Variance versus the Number of Principal Components

no. of PC’s    u1       u2       u3       u4       ∑ui
1              1.067    1.067    1.067    1.067    4.268
2              1.088    1.657    11.88    1.119    15.74
3              1.094    7.900    31.48    5.619    46.09
4              ∞        ∞        ∞        ∞        ∞

Example 2. Consider four process variables with the following correlation matrix:

$$R = \begin{bmatrix} 1.0 & 0.2 & 0.2 & 0.2 \\ 0.2 & 1.0 & 0.2 & 0.2 \\ 0.2 & 0.2 & 1.0 & 0.2 \\ 0.2 & 0.2 & 0.2 & 1.0 \end{bmatrix} \qquad (51)$$

Each variable has a variance of 1. Therefore, if a sensor is faulty and is replaced with its mean, the variance of the reconstruction error is 1. If the unreconstructed variance of the best PCA model is larger than 1, then the best reconstruction is the mean. Since the eigenvectors of the correlation matrix form the P matrix, we obtain

[

0.5 0.5 P) 0.5 0.5

0.1213 0.5170 -0.8262 0.1879

0.0658 -0.6176 -0.2047 0.7565

0.8550 -0.3182 -0.1594 -0.3773

]

(52)
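Eq 48 can be evaluated directly from eqs 51 and 52 to check Table 1. The NumPy sketch below does this; its one added step is an assumption of ours, a QR re-orthonormalization of the four-decimal loadings, which leaves the span of every leading set of columns unchanged. Its output should agree with Table 1 to within the rounding of the printed loadings.

```python
import numpy as np

# Example 2: four sensors with unit variance and pairwise correlation 0.2 (eq 51).
R = 0.8 * np.eye(4) + 0.2 * np.ones((4, 4))

# Loadings of eq 52, one eigenvector per column (eigenvalues 1.6, 0.8, 0.8, 0.8).
P = np.array([
    [0.5,  0.1213,  0.0658,  0.8550],
    [0.5,  0.5170, -0.6176, -0.3182],
    [0.5, -0.8262, -0.2047, -0.1594],
    [0.5,  0.1879,  0.7565, -0.3773],
])
# The printed loadings are rounded to four decimals; QR re-orthonormalizes them
# without changing the span of any leading set of columns.
P, _ = np.linalg.qr(P)


def unreconstructed_variance(R, P, l, tol=1e-10):
    """Eq 48 for each sensor, keeping the first l columns of P."""
    C_tilde = np.eye(len(R)) - P[:, :l] @ P[:, :l].T   # I - C
    denom = np.diag(C_tilde)                            # xi_j^T (I - C) xi_j
    num = np.diag(C_tilde @ R @ C_tilde)                # xi_j^T (I - C) R (I - C) xi_j
    # When all PC's are kept, I - C vanishes and no reconstruction is possible.
    return np.where(denom > tol, num / np.maximum(denom, tol) ** 2, np.inf)


table = np.array([unreconstructed_variance(R, P, l) for l in range(1, 5)])
for l, row in enumerate(table, start=1):
    print(l, np.round(row, 3), np.round(row.sum(), 2))
```

Because every discarded eigenvalue equals 0.8 here, eq 48 collapses to $u_j = 0.8 / [\xi_j^T (I - C) \xi_j]$; with one PC this is 0.8/0.75 ≈ 1.067 for every sensor, which is the first row of Table 1.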

The corresponding eigenvalues are 1.6, 0.8, 0.8, and 0.8. Table 1 lists the unreconstructed variance versus the number of principal components according to eq 48. From this table we observe that the best reconstruction is obtained with one principal component, which gives an unreconstructed variance of 1.067 for each sensor. However, even this smallest unreconstructed variance is larger than the variance of the variables. Therefore, in this example, the mean of each sensor is the best reconstruction if one of them fails.

It should be noted that the number of PC’s determined from the unreconstructed variance gives the best reconstruction for a PCA model. However, the number of PC’s used in the prediction model of the inferential sensor can be different. The number of PC’s for best prediction should be determined by cross-validation, as discussed in the next section.

6. Prediction Models

In the previous sections we presented sensor validation, which guarantees that the input sensors are valid. The next issue is to build a prediction model to infer the output variables. There are two fundamental issues at this stage: (i) whether to use a linear or a nonlinear regression model and (ii) whether to use the validated sensor data (x) or the validated principal components (t). The answer to the first question depends on the nonlinearity of the process data. If the data exhibit severe nonlinearity, a nonlinear model, such as a neural network, should be used; otherwise, linear regression is sufficient. To answer the second question, we need to examine the collinearity of the data. If the input data are highly collinear, which is typical in chemical processes, using the original data for linear regression results in an ill-conditioned problem. In this case, the principal components are preferred since they are orthogonal. The resulting architecture is principal component regression (PCR) with sensor validation.

If a neural network is used as the prediction model, the effect of input collinearity is less obvious. Since neural networks are usually trained with gradient-based algorithms such as backpropagation, no ill-conditioned matrix inversion is involved. However, such gradient-based algorithms can result in a large prediction variance if the inputs are collinear. We use the following example to illustrate this point and conclude that it is better to use the principal components than the original variables for prediction, regardless of the algorithm.

Example 3. To illustrate how collinearity affects prediction variance, the example in MacGregor et al. (1991a) is used here. This example considers an idealized process with one output variable and five input variables that are exactly collinear. The true process relation is assumed to be

$$y = 1.0\,x_2 + e \qquad (53)$$

The objective is to build a linear model relating the output variable to the five input variables. Since the input variables are exactly collinear, ordinary least squares clearly yields an ill-conditioned (in fact singular) problem. When a linear network model without hidden layers is trained, three different models result from different initial conditions:

$$\hat{y}_{NN1} = 0.63x_1 + 0.36x_2 + 0.09x_3 + 0.22x_4 - 0.30x_5$$
$$\hat{y}_{NN2} = -0.43x_1 - 0.05x_2 + 0.92x_3 - 0.02x_4 + 0.58x_5$$
$$\hat{y}_{NN3} = 0.23x_1 + 0.35x_2 - 0.26x_3 - 0.22x_4 + 0.91x_5$$

All three models are adequate as long as their coefficients sum to 1.0. When the PCR method is used, one principal component gives the following model:

$$\hat{y}_{PCR} = 0.2x_1 + 0.2x_2 + 0.2x_3 + 0.2x_4 + 0.2x_5 \qquad (54)$$
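The prediction variances of these models follow from a simple rule: for a linear predictor $\hat{y} = b^T x$ and i.i.d. input noise of variance σ², the noise contribution to $\hat{y}$ is σ²‖b‖². The NumPy sketch below checks the four models and also shows that a one-component PCR fitted to exactly collinear inputs returns the minimum-norm coefficients of 0.2 each, the smallest ‖b‖ among all coefficient vectors summing to 1. The noise-free data `t`, `X`, and `y` are illustrative assumptions, with the five inputs taken to be identical.

```python
import numpy as np

# Coefficients of the three linear networks and of the one-PC PCR model (Example 3).
models = {
    "NN1": [0.63, 0.36, 0.09, 0.22, -0.30],
    "NN2": [-0.43, -0.05, 0.92, -0.02, 0.58],
    "NN3": [0.23, 0.35, -0.26, -0.22, 0.91],
    "PCR": [0.2, 0.2, 0.2, 0.2, 0.2],
}


def prediction_variance(b, sigma2=1.0):
    """Variance of b^T x caused by i.i.d. input noise of variance sigma2."""
    b = np.asarray(b, dtype=float)
    return sigma2 * float(b @ b)


for name, b in models.items():
    print(f"{name}: sum = {sum(b):.2f}, variance = {prediction_variance(b):.2f} sigma^2")

# One-PC PCR on exactly collinear (here identical) inputs.
t = np.linspace(-1.0, 1.0, 21)
X = np.outer(t, np.ones(5))            # five identical inputs
y = 1.0 * X[:, 1]                      # noise-free version of eq 53
Xc, yc = X - X.mean(axis=0), y - y.mean()
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
p1 = Vt[0]                             # first loading vector (its sign is arbitrary)
scores = Xc @ p1
gamma = (scores @ yc) / (scores @ scores)
b_pcr = p1 * gamma                     # back to coefficients on x; the sign cancels
print(np.round(b_pcr, 3))
```

The NN3 coefficients sum to 1.01 only because the printed values are rounded. The sign ambiguity of the SVD loading does not matter, since it appears in both `p1` and `gamma`.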

Assuming that new data for the five inputs carry independent, identically distributed measurement noise with zero mean and variance σ², the prediction variances of the three neural network models and the PCR model are

$$\mathrm{var}(\hat{y}_{NN1}) = 0.67\sigma^2, \quad \mathrm{var}(\hat{y}_{NN2}) = 1.37\sigma^2, \quad \mathrm{var}(\hat{y}_{NN3}) = 1.12\sigma^2, \quad \mathrm{var}(\hat{y}_{PCR}) = 0.20\sigma^2$$

All the neural network models give much larger prediction variances than the PCR model. Although the first neural network model gives a variance below σ² (the value obtained with the true relation, eq 53), the other two models actually amplify it. This demonstrates that backpropagation is sensitive to collinearity and can result in a large prediction variance. The preferred solution is to build the prediction model on the principal components, which are orthogonal. If a neural network is used as the prediction model, the resulting model is referred to as neural net PCR (NNPCR) with sensor validation (Qin, 1996).

Whether a PCR or an NNPCR model is used, a remaining issue is to determine the number of principal components for best prediction. We have proposed a method to determine the number of PC’s for best reconstruction, but that number may not give the best prediction. We therefore use cross-validation to determine the number of PC’s for best prediction. Note that the number of PC’s for best prediction and that for best reconstruction may be different. However, this adds little complexity, since the entire PCA model (P) can be calculated with a singular value decomposition algorithm; using different numbers of PC’s simply means choosing different numbers of columns of P.

7. Application to Boiler Emission Monitoring

Industrial emission sources such as boiler furnaces are required to be equipped with continuous emission monitoring systems (EPA, 1991). Traditionally, this requirement is met with hardware analyzers, known as CEMS (Mandel, 1996). A hardware CEMS is typically used in the following manner: (i) a probe is installed near the top of the stack to take samples; (ii) the samples are transported to the analyzing equipment on the ground through a heated cable that keeps them at the temperature at which they were sampled; and (iii) the CEMS must be calibrated a few times a day with inert gases. Because of these requirements, hardware CEMS are usually costly and prone to malfunctions. With the help of inferential sensors, industry is able to build software CEMS with inferential models such as neural networks (Keeler and Ferguson, 1996). Among the EPA requirements for CEMS, three are important here: (i) EPA allows software means to be used for air emission monitoring; (ii) the inferential model must have a correlation coefficient better than a specified value, say 80%; and (iii) the CEMS must be on-line at least 95% of the time. The last requirement mandates a sensor validation mechanism in the inferential sensors.

Table 2. Process Variables Used To Infer the NOx Emission Level

sensor no.   variable name      minimum   maximum   mean     unit
1            air flow           215.88    415.74    341.98   kpb/h
2            fuel flow          10.480    20.130    16.430   %
3            steam flow         150.98    300.62    244.12   kpb/h
4            economizer temp    622.66    737.81    699.53   °F
5            stack pressure     2.020     10.790    7.105    H2O
6            windbox pressure   2.690     11.300    7.520    H2O
7            feedwater flow     172.97    308.18    253.58   kpb/h
In this section, we apply the self-validating inferential sensor approach to (i) detect sensor faults, (ii) identify and reconstruct faulty sensors, (iii) build inferential models, and (iv) detect extrapolation. Four hundred data points were collected from an industrial boiler process at a 5-min sampling interval. The data were collected during a period of significant change in boiler throughput so as to cover a wide range of process behavior. The variable to be predicted is the NOx content sampled from the boiler stack. Eight input variables are considered to influence the NOx emission level; they are listed in Table 2 together with their minimum, maximum, and mean values. To test the SVIS approach described in this paper, 250 data points are used for building the PCA and prediction models, and 150 points are used for testing various cases of sensor faults, fault identification and reconstruction, prediction accuracy, and extrapolation.

7.1. Model Prediction: PCR vs NNPCR. In this experiment, we compare different prediction models for estimating the NOx emission of the boiler process. A PCR model, which builds a linear regression model on the principal components, is derived first. As discussed earlier in the paper, cross-validation is used to determine the number of principal components for best prediction. Figure 5 shows the predicted error sum of squares (PRESS) versus the number of principal components used in the PCR model. Two PC’s give the lowest PRESS, so two principal components are retained in the prediction model. An NNPCR model, which uses the principal components as network inputs, is then built. The neural network is a typical feedforward network with one sigmoidal hidden layer; the output neurons have no sigmoidal functions. A conjugate gradient training algorithm, which is much faster than regular backpropagation, is used. To determine the best number of hidden neurons, a train/test scheme is used that chooses the number of hidden neurons giving the best test error; two hidden neurons give the best error. The resulting mean-squared test errors for the PCR and NNPCR models are 0.271 and 0.254, respectively. Since the neural network yields only a marginal improvement in accuracy, we conclude that the process nonlinearity in this operating region is not severe.

Figure 5. Cross-validation result for the best number of principal components to use in the prediction model.

Figure 6. Total unreconstructed variance versus the number of PC’s kept in the model. The optimal number of PC’s is 2 for best reconstruction.

7.2. Validation of a Single Sensor Fault. The first step in sensor validation is to determine the number of principal components for best reconstruction. We calculate the unreconstructed variance by reconstructing each sensor from the others. The total unreconstructed variance is shown in Figure 6, from which we determine that the optimal number of principal components is two.
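In contrast to that reconstruction criterion, the prediction model of section 7.1 was sized with a cross-validated PRESS curve (Figure 5). A NumPy sketch of such a curve is given below under assumptions of our own: a linear PCR model on mean-centered inputs, and contiguous blocks of samples as cross-validation folds (the fold scheme is not stated in the text). The names `pcr_fit` and `press_curve` are hypothetical. Because the scores are orthogonal, each regression coefficient on a score is a single ratio, which is why PCR stays well conditioned on collinear inputs.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Principal component regression with k components; returns a predictor."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                   # m x k loadings
    T = Xc @ P                                     # orthogonal score vectors
    # Orthogonal scores decouple least squares: one ratio per component.
    gamma = (T.T @ (y - y_mean)) / np.sum(T ** 2, axis=0)
    b = P @ gamma                                  # coefficients on the centered inputs
    return lambda X_new: (X_new - x_mean) @ b + y_mean


def press_curve(X, y, max_k, n_folds=5):
    """PRESS for k = 1..max_k, using contiguous blocks as cross-validation folds."""
    n = len(y)
    folds = np.array_split(np.arange(n), n_folds)
    press = np.zeros(max_k)
    for k in range(1, max_k + 1):
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            predict = pcr_fit(X[train_idx], y[train_idx], k)
            press[k - 1] += np.sum((y[test_idx] - predict(X[test_idx])) ** 2)
    return press
```

The number of PC’s for best prediction is then `int(np.argmin(press)) + 1`; `max_k` should not exceed the rank of the training inputs, since a zero score vector would make the ratio undefined.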
Accordingly, two PC’s are used for sensor reconstruction. Based on the resulting PCA model, the indices for detecting extrapolation, detecting sensor faults, and identifying faulty sensors can be calculated on-line. Figure 7 shows the three indices for the 150-sample test data set, to which no fault was added.
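The three on-line indices of Figure 7 can be sketched for a single sample as follows. The assumptions are ours: the PCA model is built on autoscaled training data; $T^2$ is computed on the retained scores normalized by their eigenvalues; and the SVI is taken as the square root of the ratio of the SPE after reconstructing sensor j to the unreconstructed SPE, which is our reading of the index of Dunia et al. (1996). Control limits and the filtered SPE are omitted, and `fit_pca` and `validation_indices` are hypothetical names.

```python
import numpy as np


def fit_pca(X, l):
    """Autoscale the training data and keep l principal components."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s ** 2 / (len(X) - 1)                 # eigenvalues of the correlation matrix
    return mean, std, Vt[:l].T, eigvals[:l]


def validation_indices(x, model):
    """Return (T2, SPE, SVI per sensor) for one raw sample x."""
    mean, std, P, eigvals = model
    z = (x - mean) / std
    m = len(z)
    C_tilde = np.eye(m) - P @ P.T                   # projector onto the residual space
    t = P.T @ z
    t2 = float(np.sum(t ** 2 / eigvals))            # Hotelling T^2 on the retained scores
    r = C_tilde @ z
    spe = float(r @ r)                              # squared prediction error
    svi = np.empty(m)
    for j in range(m):
        c = C_tilde[:, j]                           # (I - C) xi_j
        f = (c @ z) / C_tilde[j, j]                 # SPE-minimizing correction of sensor j
        r_j = r - f * c                             # residual after reconstructing sensor j
        svi[j] = np.sqrt((r_j @ r_j) / spe)         # near 1: consistent; near 0: suspect
    return t2, spe, svi
```

An SVI near 1 means that reconstructing sensor j barely changes the SPE, so sensor j agrees with the others; an SVI near 0 means that sensor j alone explains the SPE, which is the behavior of sensor 1 in Figure 8c.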

Figure 7. Indices for extrapolation, fault detection, and faulty sensor identification for the test data set: (a) the $T^2$ test for extrapolation; (b) the SPE test for fault detection; and (c) the SVI for the first sensor.

Since no fault was added, the SPE is within the control limit, and consequently the SVI in the figure is close to 1. The $T^2$ index, which indicates extrapolation, is also within its control limit, showing that the test data lie within the region of the training data and involve no extrapolation. Had this index violated the control limit, the model would be extrapolating and its predictions would be unreliable.

To test the sensor validation method, a bias fault is introduced into the air flow sensor, as depicted in Figure 8a with a dotted line. The SPE in Figure 8b detects the fault almost immediately. To identify which sensor is faulty, we examine the SVI for all sensors. Figure 8c shows that the SVI for sensor 1 (air flow) drops below the control limit after a few samples, indicating that the air flow sensor is faulty. After the faulty sensor is identified, its values are replaced with the reconstructed values, shown as a dashed line in Figure 8a; the normal values of this sensor, drawn as a solid line, show that the sensor is accurately reconstructed. The prediction model uses the reconstructed values as inputs. After the faulty values are replaced with the reconstructed values, the SPE and SVI, shown as solid lines in parts b and c of Figure 8, both return to their normal ranges.

Figure 8. Detection, identification, and reconstruction of a sensor fault in the air flow.

The sensor validation continues to check the validity of the sensor after it has been detected as faulty. If the sensor returns to normal after a period of time, the sensor data should be used instead of the reconstructed data. Figure 9 depicts the situation where a sensor fault exists for a period of time and then disappears. The SPE and SVI detect and identify the sensor fault, and their values based on reconstruction return to normal. Meanwhile, the SPE and SVI computed from the actual sensor data are still monitored and are plotted as dotted lines in the figure. At about the middle of the figure the sensor returns to normal, which is detected by the dotted SPE and SVI curves. As a consequence, the actual sensor data are again used for model prediction instead of the reconstructed values.

Figure 9. SPE and SVI for a sensor fault that occurs and disappears after a period of time. The SPE and SVI are able to detect the disappearance of a fault.

7.3. Prediction Accuracy after a Sensor Fault. With sensor 1 faulty as indicated in Figure 8a, we compare the degradation of the NOx prediction accuracy using a PCR model. The PCR model performs PCA on the inputs and then uses the principal components to predict the NOx of the boiler process. When sensor 1 is normal, the mean-squared prediction error is 0.271. When sensor 1 is faulty (with a bias), the mean-squared error increases to 0.285. After sensor 1 is reconstructed, the mean-squared prediction error drops to 0.274, which is very close to the fault-free case. This experiment demonstrates the effectiveness of the SVI’s in the presence of sensor faults.

7.4. Sequentially Occurring Multiple Sensor Faults. Although it is rare for several sensors to fail at exactly the same time, multiple sensor faults may well occur sequentially. In this section, we demonstrate how the sensor validation is used to identify and reconstruct multiple sensor faults. We introduce two independent faults, first in sensor 1 and then in sensor 2. Figure 10 shows the SPE and SVI for the two sequentially occurring faults. At the beginning of the plots sensor 1 is faulty and is detected and identified with the SPE and SVI. At this point, sensor 1 is reconstructed from the remaining sensors, including sensor 2. When sensor 2 becomes faulty after a period of time, the SPE still indicates an abnormality, but the SVI’s for sensors 1 and 2 both remain in the normal range. The reason is that the reconstruction of sensor 1 uses sensor 2, which is faulty, and vice versa. To eliminate this problem, once sensor 1 is identified as faulty we reconstruct it from all other sensors, but sensor 1 is then excluded when the other sensors are reconstructed for identification. Likewise, any additional sensors identified as faulty are not used in reconstructing other sensors.

Figure 10. Detection and identification of two sequentially occurring sensor faults.

The result of this modified procedure is depicted in Figure 11. In this figure, the SVI for sensor 1 identifies sensor 1 as faulty at the 14th sample. After this point sensor 1 is reconstructed for model prediction, while all other sensors are reconstructed without using sensor 1 to identify further faults. At the 74th sample sensor 2 is identified as faulty. From this point, sensors 1 and 2 are both reconstructed without using either of them, and the SVI’s for both sensors indicate that they are faulty.

Figure 11. When the two faulty sensors are not used in reconstruction, the SVI’s for both sensors indicate them faulty.
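The exclusion rule of section 7.4 can be sketched as follows. We assume that the sensors already identified as faulty (the set F) are reconstructed jointly, by least squares along their fault directions, so that they are never used to reconstruct one another or the sensors still under test. This joint reconstruction is our own reading of how the exclusion can be implemented, the function names are hypothetical, and `C_tilde` denotes $I - C$ in scaled units.

```python
import numpy as np


def spe_after_reconstruction(z, C_tilde, F):
    """SPE after jointly reconstructing the sensors in set F from all other sensors."""
    r = C_tilde @ z
    if not F:
        return float(r @ r)
    Xi = np.eye(len(z))[:, sorted(F)]             # fault directions of the sensors in F
    A = C_tilde @ Xi                              # (I - C) Xi
    f, *_ = np.linalg.lstsq(A, r, rcond=None)     # least-squares fault magnitudes
    res = r - A @ f                               # residual of the reconstructed sample
    return float(res @ res)


def svi_excluding(z, C_tilde, F):
    """SVI of every sensor not in F, never using the sensors already in F."""
    base = spe_after_reconstruction(z, C_tilde, F)
    return {j: np.sqrt(spe_after_reconstruction(z, C_tilde, F | {j}) / base)
            for j in range(len(z)) if j not in F}
```

For example, with sensor 0 already in F and a second bias in sensor j, `svi_excluding(z, C_tilde, {0})[j]` falls close to 0 because reconstructing both sensors removes the whole SPE, while the healthy sensors keep noticeably larger values.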


8. Conclusions

A self-validating inferential sensor approach is proposed that integrates sensor fault identification, reconstruction, and detection of potential extrapolation in the inferential models. The integrated framework is successfully applied to a boiler process to predict NOx emissions. A formula for calculating the confidence limit for the filtered SPE is derived and shown to be effective in reducing false alarms. Identification and reconstruction of single sensor faults and of multiple sequential faults are successfully tested on the boiler process. The extrapolation test on the PCA score space defines a tighter region than a univariate test. With sensor validation and extrapolation testing integrated into the inferential sensors, the resulting prediction framework may be used to produce estimates more frequently than laboratory tests, to provide redundancy for analytical sensors that are prone to malfunctions, and to analyze process sensitivity for process control. Because the prediction is not reliable when extrapolation occurs, it is recommended that the inferential sensors be used with backup analyzers. The economic benefit would be that one analyzer can back up several on-line inferential sensors and that the need for analyzer maintenance is largely reduced.

Acknowledgment

R.D. gratefully acknowledges the financial support from Fisher-Rosemount Systems, Inc.

Literature Cited

Dong, D.; McAvoy, T. J.; Chang, L. J. Emission monitoring using multivariable soft sensors. Proc. Am. Control Conf. 1995, 761-765.
Dunia, R.; Qin, J.; Edgar, T. F.; McAvoy, T. J. Identification of faulty sensors using principal component analysis. AIChE J. 1996, 42, 2797-2812.
EPA. 40 CFR Part 75: Continuous emission monitoring. Federal Register, 1991.
Hotelling, H. Multivariate quality control. In Techniques of Statistical Analysis; Eisenhart, C., Hastay, M., Wallis, W., Eds.; McGraw-Hill: New York, 1991; pp 111-184.
Jackson, J. E.; Mudholkar, G. Control procedures for residuals associated with principal component analysis. Technometrics 1979, 21, 341-349.
Joseph, B.; Brosilow, C. Inferential control of process. AIChE J. 1978, 24, 485-509.
Jutan, A.; MacGregor, J.; Wright, J. Multivariable computer control of a butane hydrogenolysis reactor, Part II: data collection, parameter estimation, and stochastic disturbance identification. AIChE J. 1977, 23, 742-750.
Keeler, J.; Ferguson, B. Commercial applications of soft sensors: the virtual on-line analyzer and the software CEM. Proc. IFPAC Conf. 1996.
Lee, J. H.; Morari, M. Robust inferential control of multi-rate sampled data systems. Chem. Eng. Sci. 1992, 47, 865-885.
Lee, J. H.; Gelormino, M.; Morari, M. Model predictive control of multi-rate sampled data systems. Int. J. Control 1992, 55, 153-191.
MacGregor, J.; Marlin, T.; Kresta, J. Some comments on neural networks and other empirical modeling methods. In Chemical Process Control: CPC IV; Arkun, Y., Ray, W. H., Eds.; Elsevier: Amsterdam, The Netherlands, 1991a; pp 665-672.
MacGregor, J.; Marlin, T.; Kresta, J.; Skagerberg, B. Multivariate statistical methods in process analysis and control. In Chemical Process Control: CPC IV; Arkun, Y., Ray, W. H., Eds.; Elsevier: Amsterdam, The Netherlands, 1991b; pp 79-100.
MacGregor, J. F.; Jaeckle, C.; Kiparissides, C.; Koutodi, M. Process monitoring and diagnosis by multiblock PLS methods. AIChE J. 1994, 40, 826-828.
Mandel, S. Continuous emission monitoring systems: an overview. Control Eng. 1996, April, 47-48.
Martens, H.; Naes, T. Multivariate Calibration; John Wiley and Sons: New York, 1989.
Mejdell, T.; Skogestad, S. Estimation of distillation compositions from multiple temperature measurements using partial least squares regression. Ind. Eng. Chem. Res. 1991, 30, 2543-2555.
Piovoso, M.; Owens, A. J. Sensor data analysis using artificial neural networks. In Chemical Process Control: CPC IV; Arkun, Y., Ray, W. H., Eds.; Elsevier: Amsterdam, The Netherlands, 1991; pp 101-118.
Qin, S. J. Neural networks for intelligent sensors and control: practical issues and some solutions. In Neural Systems for Control; Elliott, D., Ed.; Academic Press: New York, 1996; Chapter 8.
Qin, S. J.; McAvoy, T. Nonlinear PLS modeling using neural networks. Comput. Chem. Eng. 1992, 16, 379-391.
Tong, H.; Crowe, C. M. Detection of gross errors in data reconciliation by principal component analysis. AIChE J. 1995, 41, 1712-1722.
Wise, B. M.; Ricker, N. L. Recent advances in multivariate statistical process control: improving robustness and sensitivity. In Proc. IFAC ADCHEM Symp. 1991, 125-130.

Received for review October 3, 1996. Revised manuscript received January 3, 1997. Accepted January 5, 1997.⊗

IE960615Y

⊗ Abstract published in Advance ACS Abstracts, February 15, 1997.