458
Ind. Eng. Chem. Res. 2004, 43, 458-474
Fault Detection Using Canonical Variate Analysis Ben C. Juricek,† Dale E. Seborg,*,‡ and Wallace E. Larimore§ Toyon Research Corporation, 75 Aero Camino, Suite A, Goleta, California 93106, Department of Chemical Engineering, University of California, Santa Barbara, California 93106, and Adaptics, Inc., 1717 Briar Ridge Road, McLean, Virginia 22101
The system identification method canonical variate analysis (CVA) has attracted much attention from researchers for its ability to identify multivariable state-space models using experimental data. A model identified using CVA can use several methods for fault detection. Two standard methods are investigated in this paper: the first is based on Kalman filter residuals for the CVA model, the second on canonical variable residuals. In addition, a third method is proposed that uses the local approach for detecting changes in the canonical variable coefficients. The detection methods are evaluated using three simulation examples; the examples consider the effects of feedback control; process nonlinearities; and multivariable, serially correlated data. The simulations consider several types of common process faults, including sensor faults, load disturbances, and process changes. The simulation results indicate that the local approach provides a very sensitive method for detecting process changes that are difficult to detect using either the Kalman filter or canonical variable residuals. 1. Fault Detection Using Canonical Variates Fault detection is a very active field for research and applications in both academia and industry. From an industrial perspective, undetected faults can result in lost revenue due to poor product quality, equipment failures, and so on. Consequently, process control researchers have developed several fault detection methods that are suitable for real-time data from industrial processes.1 Such data are typically multivariable, serially correlated, and corrupted by noisy measurements. The subspace method canonical variate analysis (CVA) can identify accurate state-space models for processes with large numbers of inputs and outputs and large state dimensions. One of the key steps for subspace models is a weighted singular value decomposition, and the weighting used for CVA is particularly noteworthy. The CVA weighting comes from the multivariable statistical method, canonical correlation analysis (CCA). Canonical correlation analysis is a maximum likelihood (ML) procedure for describing the correlation between two data sets as pairs of linear combinations, called canonical variables. For this research, process faults such as sensor faults, process disturbances, and gradual changes in process equipment are detected using a state-space model identified by CVA. A novel method is proposed that is based on the local statistic for canonical variables. The proposed method is compared with two existing fault detection methods that also use a CVA state-space model. The major goals for this research are (1) to investigate the utility of the CVA-based methods for detecting faults that are common in industrial processes and (2) to propose a novel method using the local statistic. This method is specifically derived to detect a particular type of fault, the so-called parametric changes. * To whom correspondence should be addressed. Tel.: 805 893 4731. Fax: 805 893 2252. E-mail: seborg@ engineering.ucsb.edu. † Toyon Research Corporation. ‡ University of California, Santa Barbara. § Adaptics, Inc.
2. Fault Detection Using CVA State-Space Models In this section, three fault detection methods are considered. The first detection method uses the Kalman filter residuals of the identified state-space model. The second detection method uses canonical variable residuals, which are the product of the canonical correlation analysis (CCA) from the CVA identification procedure. Finally, a new monitoring statistic is proposed that is based on the local statistic, a statistic specifically designed to detect parametric changes. Before these methods are discussed, section 2.1 describes the T 2 statistic control charts that are used to implement each monitoring method. 2.1. Control Charts for T 2 Statistics. Quality control charts are widely used in the process industries to distinguish between normal and abnormal process operation. Two commonly used control charts are the Shewhart and the cumulative sum (CUSUM) control charts,2 which are traditionally used to monitor univariate measurements. When more than one measurement is being monitored, there is considerable motivation to use multivariable control charts rather than several univariate control charts. Several researchers have shown that multivariable control charts can detect smaller faults with greater precision and summarize the data more clearly than multiple univariate control charts.3-6 Most multivariable control charts are based on a T 2 statistic7
T 2(k) ) [x(k) - xj]TS-1[x(k) - xj]
(1)
where x(k) is the vector of variables at sample k that is normally distributed with mean µ and covariance matrix ∑ for k ) 1, ..., N. Usually, x is the vector of measurements to be monitored, but x might also be a linear combination of the measurements; most multivariable statistical methods (e.g., principal component analysis and CCA7) make use of the T 2 statistic. Given N
10.1021/ie0301684 CCC: $27.50 © 2004 American Chemical Society Published on Web 12/19/2003
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004 459
samples, an estimate of the mean µ is the sample mean, xj, given by
xj )
s(k) )
N
1
∑
Nk)1
x(k)
(2)
An estimate for Σ is the sample covariance matrix, S, which is given by
S)
1
∑
[x(k) - xj][x(k) - xj]T
(3)
Multivariable versions of the Shewhart and CUSUM charts3 based on the T 2 statistic can be used to detect faults. For the multivariable Shewhart chart, the T 2 statistic is plotted for each sample and compared with an upper limit. The upper limit TR2 can be calculated by specifying the type I error probability, R, of the current T 2 value exceeding the upper limit TR2. Assuming that x is distributed as a multivariable, Gaussian random variable, the upper limit is given as7
TR2 )
{
if κ(k) e γ 0 [s(k-1) + (k)][1 - γ/κ(k)] if κ(k) > γ
p(N - 1) F (R) N - p p,N-p
(4)
where N is the number of samples used to calculate S, p is the dimension of x, and Fp,N-p(R) is the F-distribution critical value for p and N - p degrees of freedom with significance level R. An alarm is signaled when the limit is exceeded more frequently than would be expected for a given value of R. The Western Electric rules,8 a statistically based set of heuristics, can also be used for interpreting the Shewhart chart. The T 2 chart is useful for quickly detecting abrupt, large changes, but the CUSUM control chart is better suited for detecting small or slowly developing changes.3,5 The univariate CUSUM chart plots the cumulative sum of past observations
(7)
Initially, at k ) 0, s(0) ) 0. The threshold for the minimum size change to be detected is denoted by γ. The MCUSUM chart signals an alarm when
∑-1s(k) > h
sT(k)
N
N - 1k)1
where
(8)
where h is the alarm limit. The MCUSUM chart requires the specification of two parameters: the threshold, γ, and the alarm limit, h. There is a fundamental tradeoff between rapid detection and the number of false alarms: smaller thresholds and alarm limits detect smaller changes faster but increase the false alarm rate. For CUSUM charts, this sensitivity is normally characterized by the average run length (ARL), the expected number of samples between alarms during normal operation. Harris and Ross11 reviewed and evaluated several empirical and theoretical methods that relate CUSUM chart parameters to the ARL. 2.2. Process Monitoring with Kalman Filter Residuals. A traditional method for monitoring processes using state-space models is based on the statistical properties of the Kalman filter residuals.12 The theory for detecting changes using Kalman filter residuals is well understood and has been thoroughly reviewed by Basseville and Nikiforov.13 Consider a stochastic state-space model given by
x(k+1) ) Ax(k) + Bu(k) + w(k) y(k) ) Cx(k) + v(k)
(9)
where w and v are uncorrelated, normally distributed white noise processes with covariance matrices Q and R, respectively, and S is the covariance between w and v. The Kalman filter can be written as
xˆ (k+1|k) ) Axˆ (k|k) + Bu(k)
k
s(k) )
[x(i) - µ] ∑ i)0
(5)
where µ is the population mean of x. Different methods exist for plotting and interpreting CUSUM charts,8 and the theory for CUSUM charts can be complex.9 However, the key benefit of CUSUM charts is that small, persistent changes can be detected quickly (relative to a Shewhart chart) because the cumulative effect of a small, persistent change can be large. Using the central limit theorem and the assumed independence of x(i), the sum s(k) is approximately normally distributed even if x is from a nonnormal distribution. Hence, CUSUM charts can be very useful for handling nonnormal distributions or unknown distributions when independence can be assumed. Several extensions to the univariate CUSUM chart in eq 5 have been proposed for monitoring a T 2 statistic. One such method is the multivariable CUSUM chart proposed by Crosier;10 this chart is denoted as MCUSUM. Suppose the statistic to be monitored, (k), is a normally distributed random vector with covariance ∑. The MCUSUM algorithm is given by
∑-1[s(k-1) + (k)]}1/2
κ(k) ) {[s(k-1) + (k)]T
(6)
xˆ (k|k) ) xˆ (k|k-1) + KKF(k) KF(k) ) y(k) - Cxˆ (k|k-1)
(10)
where K denotes the Kalman gain.13 The one-step-ahead prediction of the state vector is denoted by xˆ (k+1|k): it is the prediction of x at sample k + 1 using all of the available measurements at sample k. The “filtered” model prediction is denoted by xˆ (k|k). It is the previous one-step-ahead prediction modified by the current measurement. Theoretically, the Kalman filter residual, KF(k), is independently and normally distributed with covariance matrix, ∑KF
∑KF ) CPCT + R
(11)
where P is the covariance matrix of the state estimation error
P ) E{[x(k) - xˆ (k)][x(k) - xˆ (k)]T}
(12)
P can be calculated analytically from a Riccati equation
P ) A(P + P
∑-1P)AT + Q
(13)
460
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004
A T 2 statistic based on the Kalman filter residuals can be used to detect faults
∑KF-1KF(k)
TKF2(k) ) KFT(k)
(14)
This statistic is often reported as a χ2 statistic,12 which supposes that ∑KF is known exactly rather than estimated as is the case in most process monitoring applications. The T 2 and the MCUSUM control charts can be used to monitor the Kalman filter residuals. The critical limit for the TKF2 chart can be calculated using eq 4
TKF,a2 )
pN F (R) N - p + 1 p,N-p+1
(15)
where N is the number of samples used to identify the state-space model and p is the dimension of KF. 2.3. Process Monitoring Using TCV2. Subspace methods identify a state-space model by first estimating the vector of state variables, x(k), and then estimating the state-space matrices. For CVA, x(k) is calculated by performing a canonical correlation analysis (CCA) between the past, p(k), and the “conditional future”, ˜f(k), which are given by
p(k) ) [uT(k) ‚‚‚ uT(k-Nl) yT(k) ‚‚‚ yT(k-Nl)] T
T
T
T
f(k) ) [y (k+1) ‚‚‚ y (k+Nl)]
(16) (17)
q(k) ) [u (k+1) ‚‚‚ u (k+Nl)]
(18)
˜f(k) ) f(k) - Mq(k)
(19)
The number of lags used is denoted by Nl, and M is a block Toeplitz matrix of the impulse response coefficient matrices given by
[
D CB M) l CAN-1B
0 D l CAN-2B
... 0 l ...
... ... l D
]
(20)
Note that the conditional future removes the effects of future inputs. The CCA between p(k) and ˜f(k) is given by the generalized singular value decomposition (GSVD) satisfying the equations26
J
∑p,pJT ) I, L∑f,fLT ) I, J∑p,fLT ) Ω
(21)
where ∑p,p, ∑f,f, and ∑p,f denote the covariance matrices for p and f and the cross-covariance matrix between p and f, respectively. The transformations J and L of past, p(k), and future, ˜f(k), respectively, produce two corresponding sets of canonical variables
c(k) ) Jp(k)
(22)
d(k) ) Lf˜(k)
(23)
The covariance matrix between c and d is ∑c,d ) Ω, a diagonal matrix of the canonical correlation coefficients in descending order. Because of noise in the measurements, the sample covariance matrices used in the GSVD eq 21 are typically full rank. Thus, the dimension of c(k) and d(k) is typically the minimum of the dimensions of p(k) and ˜f(k).
By construction, one canonical variable, d(k) ) Lf˜(k), is most correlated with the corresponding canonical variable, c(k) ) Jp(k). Because ∑d,c ) Ω and ∑c,c ) I, the best predictor of d(k) is Ωc(k). The canonical variate residual, CV(k), is the difference between d(k) and Ωc(k)
CV(k) ) d(k) - Ωc(k) ) Lf˜(k) - ΩJp(k)
(24)
Assuming that p and ˜f are normally distributed, the canonical variate residual should also have a multivariable normal distribution with covariance matrix given by ∑CV,CV ) I - Ω. Thus, a T 2 statistic can be used to monitor for large changes in CV
TCV2(k) ) CVT(k)(I - Ω)-1CV(k)
(25)
Because CVA is a near-ML procedure, ML-based statistical selection methods, e.g., likelihood ratio tests or the Akaike information criterion (AIC),14 can be used to determine the state order, nCV; eqivalently, this selection can be regarded as the number of statistically significant canonical variables. The selected system state for the model is given by
x(k) ) J1:nCVp(k)
(26)
where the subscripts on J denote the range of rows of J included, so that x(k) consists of the first nCV past canonical variable, i.e., elements of c(k). An orthogonal subspace is then defined by the remaining Nl - nCV rows in J
xo(k) ) JnCV+1:Nlp(k)
(27)
Recall that Nl denotes the number of lagged variables in eqs 17-19. Larimore15 proposed several approaches to the use of the TCV2 statistic that involved partitioning the residuals into the first nCV and the last Nl - nCV terms and denoting the corresponding T 2 statistics based on them as TCV12 and TCV22, respectively. The TCV12 statistic monitors the process behavior using the subspace of the state-space model defined in eq 26, and TCV22 monitors the behavior orthogonal to this subspace defined in eq 27. In principle, TCV12 can be used to monitor for abnormal behavior based on the usual correlation between c and d, and TCV22 can be used to detect a departure from the expected state-space subspace. In practice, nCV is typically much smaller than Nl, and considerable noise is eliminated by using TCV12. The critical value for either TCV2 statistic can be calculated using eq 4, where N is the number of samples used to identify the CVA model and p is the number of canonical variables used, which for TCV12 is nCV and for TCV22 is Nl - nCV. Negiz and Cinar16 used T 2 to detect sensor faults for a pasteurization process and compared this approach with dynamic principal component analysis. Russell et al.17 used TCV22 to monitor the Tennessee Eastman Challenge Process. In this paper, TCV12 is used and is denoted as TCV2. 2.4. Local Statistic. In fault detection, faults are often classified into two categories: additive faults and multiplicative faults.13,18 The categories describe how
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004 461
these faults are modeled. An additive fault is modeled by adding terms, such as Γfl(k) or Θfs(k), to the statespace model
x(k+1) ) Ax(k) + Bu(k) + Γfl(k)+ Ke(k) y(k)) Cx(k) + Θfs(k) + e(k)
(28)
Load or process disturbances are modeled using fl(k); sensor faults or other disturbances occurring at the process output are modeled using fs(k). Multiplicative faults correspond to changing the state-space model matrices, in particular the transition matrix
x(k+1) ) (A + ∆A)x(k) + Bu(k) + Ke(k) y(k) ) Cx(k) + e(k)
(29)
Sensor biases and process disturbances are examples of additive faults. A multiplicative fault could be due to changes in the physical process parameters, e.g., fouling in heat transfer (i.e., changes in the heat-transfer coefficient) or fatigue in mechanical structures (i.e., changes in the spring constants of a vibrating system). Multiplicative faults are also called parametric or spectral faults. Statistics based on the Kalman fillter residuals such as TKF2 often fail to detect parametric faults. For Kalman filter residuals, a parametric change corresponds to a change in the coefficients of the state-space matrices, in particular, changes in A, because these change the modes or eigenvalues of the process. Canonical variable residuals can also fail to detect parametric changes. For canonical variables, a parametric change corresponds to changes in the canonical loadings and canonical correlations. Such changes correspond to changes in the underlying ARMAX process, i.e., the “true” ARMAX coefficients. There are some pathological cases (e.g., when there are repeated canonical correlation coefficients) where the canonical loadings can change but the ARMAX coefficients remain constant. For practical applications, these situations are unlikely to occur. Basseville and Nikiforov13 proposed using the local approach to develop statistics for detecting process changes that can be modeled as changes in the model parameters. They discussed statistics for a number of possible parameter changes for ARMA and state-space models. A summary of the local approach is described in Appendix A. Malhotra et al.19 used the local approach to monitor for changes in an ARX model of a distillation column at a Syncrude Canada production facility. Kumar et al.20 used the local approach to monitor for changes in pricipal component loadings. In this section, the local approach is used to develop statistics that detect changes in the canonical loadings. Compared with the statistics designed for detecting parameter changes of a state-space model, these results are much simpler to derive and calculate. Given a nominal parameter, θ0, and data Y(N) ) {y(1), ..., y(N)}, a hypothesis test for detecting a θ0 can be written as
H0: θ ) θ0 Ha: θ ∈ θ1 ) θ0 +
(30)
Let Sθ0,θ1 denote the log likelihood ratio test
Sθ0,θ1(N) ) log
pθ1[Y(N)] pθ0[Y(N)]
(31)
where log pθ[Y(N)] is the log likelihood function for the model with parameter θ, calculated from the data Y(N). Unless the change is specified a priori, i.e., the test is for a specific change of interest, eq 31 cannot be used directly to calculate Sθ0,θ1. Assuming that the parametric change is bounded, |||| e ν, an approximation to eq 31 using measurable quantities can be derived and is shown in Appendix A. The resulting “local statistic” approximates eq 31 when the parameter change is bounded within a local region. As shown in Appendix A, the local statistic is based on the efficient score13 at sample k, Z(k), that is given by
Z(k) )
∂ log pθ[y(k)] |θ)θ0 ∂θ
(32)
Let Z ˜ (N) denote the mean efficient score
1
Z(N) )
N
∑ Z(k)
(33)
Nk)1
The covariance matrix for Z equals the Fisher information matrix, which is given by
[
I(θ0) ) E
[
|]
∂ log pθ ∂θ
)E -
2
θ0
|]
∂2 log pθ ∂θ2
θ0
(34)
The log likelihood ratio in eq 31 can be approximated as13
S(N) ≈
N T Z h (N) I-1(θ0) Z h (N) 2
(35)
where the (θ0, θ1) subscript has been dropped to simplify the notation. Under the null hypothesis of no change, S(N) in eq 35 is described by a random walk process as N f ∞.13 Thus, parametric changes can be monitored by a T∆S2 statistic or a MCUSUM∆S statistic based on the differenced values
∆S(k) ) S(k) - S(k-1)
(36)
A violation of either T∆S2 or MCUSUM∆S rejects the null hypothesis and indicates that the canonical loadings have changed. 2.5. Local Statistic for Canonical Variables. In this section, the local statistic methodology is used for detecting changes in the canonical loading matrices discussed in section 2.3. As described above, the local statistic provides a general method for detecting parametric changes. This methodology can be applied to any parametric model; Basseville and Nikiforov13 discussed and derived the equations for time series (AR and ARMA) and dynamic input-output models (ARX, statespace models, etc.). The local statistic methodology implicitly assumes that the parametric changes are caused by process changes and not other faults such as
462
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004
a sensor fault or load disturbance. That is, a process change occurs, and the current parametric model is no longer accurate. The “process model” for the proposed statistic is the result of the canonical correlation analysis (CCA) between the future and past
CV(k) ) Lf˜(k) - ΩJp(k)
(37)
Recall that ˜f is the conditional future in eq 19, p is composed of past inputs and outputs (eq 17), L and J are the canonical loading matrices, and Ω is the matrix of canonical correlation coefficients. The CCA between ˜f and p is used to identify the state variables for the state-space model in the CVA system identification procedure. That is, the CCA can be used to estimate the state variables. Therefore, process changes that affect the state variables of the state-space model also affect the CCA results. Changes in the CCA are reflected as changes in L, J, and Ω. The relationship among J, L, and Ω can be quite complex, particularly when y and u result in autocorrelated, multivariable data. Therefore, it is difficult to specify how a particular process change affects J, L, or Ω. However, it is unlikely that a process change affects only a single coefficient in J, L, or Ω. Instead, the process change affects all matrices simultaneously. Therefore, the proposed method considers only changes in L (i.e., θ ) L in eq 30, and θ0 is the nominal value for L) and assumes that this is sufficient for detecting process changes. Because of the selection to monitor changes in L only, the resulting local statistic expressions are greatly simplified. However, using L to parametrize the state-space variables does not capture all of the detailed relationships. This is an area for future research. To derive the local statistics, it is necessary to have an expression for the likelihood function in terms of the canonical variable residuals CV. Larimore15 gives a fundamental expression for the log likelihood function in terms of the CV as
-2 log[p(Y1N:θ)] ) -N log|I - Ω2| + N
∑1
CVT(k)(I
2 -1
- Ω ) CV(k) (38)
where Li and Ji are the canonical loadings for the ith canonical variable and ωi is the corresponding canonical correlation. In this paper, only the functional dependence in eq 39 on Li is considered for simplicity. This introduces some uncertainty in interpretation of the results. If all of the variables Li, Ji, and ωi were considered along with the constraints introduced in the CVA of eq 21, then this would be a complete parametrization of the ith state. Ignoring the constraints increases the number of degrees of freedom, i.e. estimated parameters, but leaving out Ji and ωi decreases this number. We will roughly count the dimension of Li, which is the number of lags Nl, as the number of estimated parameters. Assuming that the input-output data used in the CCA, y and u, are normal random variables, CV,i is also a normal random variable with variance 1 - ωi2
CV,i ∼ N(0, 1 - ωi2)
(40)
Therefore, the log likelihood function for CV,i is given by
log p[CV,i(k)] ) -
N
1
∑ N k)1 l
CV,i2(k)
-
2
2(1 - ωi )
N 2Nl
log 2π -
N 2Nl
log(1 - ωi2) (41)
Using θ ) Li, the efficient score for the local statistic is given by
zi(k) )
∂ log p ˜f (k) CV,i(k) ) ∂Li 1-ω2
(42)
[
(43)
i
Ii ) E
]
∑
∂2 log p ˜f,f˜ ) T ∂LiLi 1 - ωi2
The covariance matrix for the canonical future, ∑˜f,f˜, can be calculated analytically from the Kalman filter covariance matrices
[
∑
)
]
This expression is exact except for finite-length end effects that become small with increased sample size. As a result, the canonical variable estimates of the system parameters are approximate maximum likelihood estimates, and monitoring procedures based on eq 38 can be shown to have some optimal statistical properties. A remarkable aspect of this expression for the likelihood function is that the residuals are often highly serially correlated but the likelihood is expressed as a sum of squared residuals as if they were independent. The expression in eq 38 justifies the use of this sum of squared residuals that otherwise would lack rigorous justification. The local statistic for canonical variables can be derived as follows. By construction, the components of the canonical variable residuals for sample k are independently distributed, so expressions can be developed for each canonical variable separately. The residual of the ith canonical variable at sample k is given by
where P, Q, R, and S are the Kalman filter matrices used in section 2.2. The mean of the efficient score is given by the usual formula
CV,i(k) ) Li˜f(k) - ωiJip(k)
si(k) ) zjiT(k)Ii-1zji(k)
(39)
˜f,f˜
CA(APCT + S) ... CANl-1(APCT + S) l l CA(APCT + S) CACT + R l l l l ... CACT + R CANl-1(APCT + S) ...
CACT + R
(44)
zi(k) )
1
k
∑zi(j) kj)1
(45)
and the local statistic for the ith canonical variables can be calculated by
(46)
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004 463
Performing this procedure for each of the nCV canonical variables in the state-space model yields
[ ]
s1(k) S(k) ) l snCV(k)
compared with the detection of other parameteric changes. In fact, ∆S is proportional to the product of the conditional future and CV
∆S(k) ∝ ˜f(k)CV
(47)
Control charts such as the T 2 and MCUSUM charts can be used to detect changes. As noted previously, Basseville and Nikiforov13 derived several statistics that detect parametric changes in ARMAX or state-space models based on KF. Deriving and evaluating the efficient score in eq 32 and the covariance matrix in eq 34 is much simpler for CV than the statistics based on KF. The simplicity occurs because CV and f(k) are linearly related by L, and the relationship between KF and the state-space model parameters is much more complicated. 2.6. Discussion of Detection Methods. The differences between the canonical variable and Kalman filter detection methods are generally due to the type of model that is used. The Kalman filter residuals use a statespace model, a dynamic, input-output model. For the canonical variable residuals, the “model” is the result of the CCA between past and future inputs and outputs. That is, the CCA result is not explicitly a dynamic input-output model; rather, CCA produces a statistical relationship of the past and future, in this sense, similar to the common practice of lagging serially correlated variables for principal component analysis.21 The constructed canonical variables are then used simply to estimate the state equations by multivariate linear regression with the same maximum likelihood accuracy. However, because the past p(k) and future ˜f(k) are vectors of lagged variables, there is typically strong autocorrelation. Nevertheless, as discussed above, the likelihood function is still expressible as a sum of T 2 statistics notwithstanding the autocorrelation. This difference has very important effects on the resulting detection methods. For example, the Kalman filter residuals will be independent as long as the model is accurate. The significance of these effects is more closely examined in simulation studies in the next section. The statistics based on the Kalman filter residuals are appealing because they are intuitive and simple to implement. However, with this simplicity comes the “oblivious filter” problem:12 the filter was designed to estimate the true state from noisy measurements, but it does not distinguish noise from faults. If a fault occurs, the filter will attempt to correct the model estimates as if the fault were a random shock. This correction tends to dampen the effect of the fault, and thus “mask” the fault, making the fault more difficult to detect. Statistics based on the canonical variable residuals are not complicated by the oblivious filter problem and use standard control charts for monitoring. The local statistic for canonical variables is proposed to detect parametric changes, which are not necessarily detectable by the other methods. The key assumption is that such process changes can be modeled as changes in the canonical loadings in the CVA modeling procedure. The local approach was used to develop the ∆S statistics for detecting parametric changes in the canonical loadings. Because the canonical variable residual is a linear combination of measured inputs and outputs, the calculation of ∆S is relatively simple
(48)
An interesting ramification of eq 48 is the effect of feedback control on the local statistic control charts. For the Kalman filter residuals, TKF2 is given by
∑
TKF2(k) ) KFT(k)
-1 KF
KF(k)
(49)
Because KF is calculated from an input-ouput model, KF (and therefore TKF2) will not be affected by feedback control as long as the dynamic model is accurate. Similarly, for TCV2, the canonical variable residual will also be insensitive to feedback if the CCA remains accurate. The local statistic, however, will be affected by feedback. To see this, note that ∆S depends on the crosscorrelation between measured f and u and CV in eq 48. Thus, the monitoring statistics T∆S2 and MCUSUM∆S use
∑-1∆S(k) ∝ f(∑˜f,f˜, ∑CV,CV)
T∆S2(k) ) ∆ST(k)
(50) (51)
The variance and correlation properties of f depend on whether a feedback controller is present. The significance of this issue is that the results and behavior of T∆S2 or MCUSUM∆S change if a control system is switched from manual to automatic, or vice versa. The impact of this effect will be evaluated in greater detail in the following simulation studies. 3. Simulation Examples Three Monte Carlo simulations are used to evaluate the detection methods for different conditions. The first example considers detecting changes in the autoregressive parameters of an ARMA model. The next example uses a SISO process to evaluate the detection methods for sensor faults, process disturbances, and parametric changes. For the SISO process, the effect of feedback is examined. In the final example, a continuous stirredtank reactor (CSTR), the monitoring methods are evaluated for a multivariable, nonlinear system in closed-loop operation. Example 1. An ARMA Example. Consider the time series models given by
AR(1): y(k) )
1 e(k) 1 - 0.951q-1
(52)
AR(2): y(k) )
1 e(k) (1 - 0.95q )(1 - 0.5q-1)
(53)
-1
where e(k) is normally distributed with µ ) 0 and σ2 ) 1. For each model, 50 Monte Carlo trials with 6000 data points each were generated. For each trial, the first 3000 samples were generated from the first-order autoregressive process in eq 52, and the next 3000 samples used the second-order autoregressive process in eq 53. For each trial, a CVA state-space model was identified using the first 1000 data points. These data were also used to calculate the CCA matrices J, L, and Ω as part
464
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004
Table 1. Control Chart Parameters for Example 1 T2 statistic
limit
CUSUM statistic
CUSUM parameter γ
limit h
TCV2 TKF2 T∆S(CV)2 T∆S(AR)2
13.1 6.8 15.1 14.5
CUSUMCV CUSUMKF CUSUM∆S(CV) CUSUM∆S(AR)
2.6 2.5 1.9 1.1
1.1 0.75 4.4 6.2
of the CVA identification procedure. Using the 6000sample data set, the T 2 and CUSUM control charts were examined for the following methods: (1) Kalman filter residuals using the CVA state-space model (denoted as TKF2 and CUSUMKF), (2) canonical variable residuals using the CCA (TCV2 and CUSUMCV), (3) local statistic for canonical variables (T∆S(CV)2 and CUSUM∆S(CV)), and (4) local statistic for the autoregressive parameter (T∆S(AR)2 and CUSUM∆S(AR)). Note that a CUSUM rather than an MCUSUM control chart is being used. Although the number of canonical variables (which equals the number of state variables) is estimated as part of the CVA identification procedure, the state dimension was fixed at nCV ) 1, so that the detection methods monitor a single variable even though there are Nl ) 3 estimated parameters as discussed in section 2.5; this is similar to a single-state system being specified by pole, zero, and gain parameters. For the ARX model, only the AR coefficient changes, and there is only one estimated parameter. Thus, the MCUSUM control chart monitors one variable and will be denoted as the CUSUM control chart for this example. Initially, the alarm limit for the T 2 charts was calculated using the theoretical limits from eq 4. However, it was observed that the T∆S2 limit resulted in several more chart violations than would be expected theoretically. It appeared that the AS statistics were not normally distributed, which might motivate use of the CUSUM charts for future applications. To compare the sensitivity of all charts with a consistent baseline (i.e., charts that resulted in approximately the same number of violations during normal operation), alarm limits for the T 2 charts were selected to be the empirical 99% confidence limits based on the period of normal operation in the first Monte Carlo trial. The parameters for the CUSUM chart, i.e., γ in eq 7 and h in eq 8, were selected to yield an average run length (ARL) approximately equal to 100 samples (an ARL that is comparable to a Shewhart chart for R ) 0.01) for the data from normal operation, the first 3000 samples. The critical values for the T 2 statistics and the tuning parameters for the CUSUM charts are listed in Table 1. Control charts for the T 2 and CUSUM statistics from a single Monte Carlo trial are shown in Figure 1. Table 2 shows the number of chart violations averaged over the 50 Monte Carlo trials for the first 3000 samples (i.e., normal operation) and for the next 3000 samples, which were generated from the second model. For R ) 0.01, approximately 30 violations are expected to occur for 3000 samples of normal operation. The number of chart violations for the next 3000 samples, corresponding to the process change, indicates the sensitivity of the different monitoring methods, i.e., how well a method detects the process change. The Kalman filter residual charts, TKF2 and CUSUMKF, are clearly the least sensitive. The number of violations for change data increases by less than a factor of 3, the smallest difference for all of the charts. The relatively
Figure 1. T 2 and CUSUM control charts for the first Monte Carlo trial, example 1. The alarm limits are indicated by the horizontal lines. Table 2. Control Chart Violations for Example 1a statistic TCV2 TKF2 T∆S(CV)2 T∆S(AR)2 CUSUMCV CUSUMKF CUSUM∆S(CV) CUSUM∆S(AR) a
violations normal data change data 30.6 29.0 29.2 29.8 32.7 38.5 29.8 30.1
188.2 74.4 637.5 177.3 245.1 109.3 1525.7 220.6
ratio (change/normal) 6.1 2.6 22 5.9 7.5 2.8 51 7.3
Averaged over 50 Monte Carlo trials.
poor sensitivity for the Kalman filter charts is likely caused by the oblivious filter phenomenon described previously. Charts for the canonical variable residuals, TCV2 and CUSUMCV, and the local statistic for the autoregressive parameter, T∆S(AR)2 and CUSUM∆S(AR), are more sensitive: for the T 2 charts, the number of violations for the change data approximately increases by a factor of 6; for the CUSUM charts, by a factor of 7. However, the local statistic for the canonical variable charts, T∆S(CV)2 and CUSUM∆S(CV), are the most sensitive. For all of the statistics, the CUSUM charts resulted in more violations than the T 2 charts. As discussed in section 2.1, CUSUM charts should be more sensitive than T 2 charts for detecting small, persistent changes because of the integrative nature of the CUSUM chart. The difference between the local statistic results, ∆S(CV) and ∆S(AR), is intriguing. The ∆S(CV) statistic
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004 465
assumes that changes in the process can be detected as changes in the canonical loadings. This assumption appears not only to be valid but also to result in more sensitivity than the ∆S(AR) statistic, which monitors the true process parameter. One possible reason for this improved sensitivity is that more parameters are being monitored with the ∆S(CV) statistic. That is, the dimension of L1 equals Nl, the number of lags in eq 18, which was always greater than 1 (typically, N ) 3). Example 2. SISO Example. For this simulation the process model is given by the second-order process
Gp(s) )
0.5(s - 1) 0.5s2 + 1s + 1
(54)
The time constant is τ ) x2/2, and the damping coefficient is ζ ) x2/2. The continuous-time model was discretized with a sampling period of Ts ) 0.1 s to yield a discrete model denoted by gp(q-1)
gp(q-1) )
0.0857q-1 - 0.0947q-2 1 - 1.8006q-1 + 08187q-2
(55)
Stochastic disturbances were added to produce the following model
y(k) ) gp(q-1) u(k) + gw(q-1) w(k) + v(k)
(56)
-1
gw(q-1) )
gp(q ) 2
(57)
where the process noise, w(k), and the measurement noise, v(k), are uncorrelated and normally distributed with variances σ2(w) ) 0.25 and σ2(v) ) 1, respectively. A Monte Carlo simulation was used to study the proposed detection methods. Twenty-five Monte Carlo trials were used. For each trial, a state-space model was identified using CVA. The model was identified by exciting the open-loop process with u specified as a pseudo-random binary sequence (PRBS) with an amplitude of 10 and switching time of 4 s; a different PRBS was used for each trial. A second-order state-space model was identified using ADAPTX, version 3.5.22 Several faults were considered for the process in openloop and closed-loop operation. The closed-loop process used a PI controller tuned by the Ziegler-Nichols rules23 (Kp ) -0.9, τI ) 1.31 s). To evaluate the sensitivity of the monitoring methods to the size of the fault, a fault parameter for each disturbance was studied. For example, a sensor bias is described by the size of the bias, fs, and several values for fs were considered in the study. For each value of fs, 25 Monte Carlo trials were simulated. The following operating conditions and parameter ranges were used for each Monte Carlo trial: Case 1. Normal, open-loop operation Case 2. Normal, closed-loop operation Case 3. Step (or bias) sensor fault (0 < fs < 2) during open-loop operation
y(k) ) gw(q-1) w(k) + v(k) + fs(k)
(58)
Case 4. Step disturbance (0 < fd < 5) during openloop operation
y(k) ) gw(q-1) w(k) + v(k) + gp(q-1) fd(k)
(59)
Case 5. Step sensor fault (0 < fs < 2) during closedloop operation
y(k) ) gp(q-1) u(k) + gw(q-1) w(k) + v(k) + fs(k) (60) Case 6. Step disturbance (0 < fd < 5) during closedloop operation
y(k) ) gp(q-1) u(k) + gw(q-1) w(k) + v(k) + gp(q-1) fd(k) (61) Case 7. Low-frequency stochastic disturbance during closed-loop operation (0 < β < 1)
y(k) ) gp(q-1) u(k) + g(q-1) w(k) + v(k) + gp(q-1) fd(k) (62) fd(k) ) βfd(k-1) + e(k)
(63)
Note that, for β ) 1, this disturbance becomes a random walk. Because fd(k) is modeled as an autoregressive time series process, this fault will referred as an “AR disturbance”. Case 8. Process changes from Gp(s) to G/p(s) during closed-loop operation (0.75 < η < 1)
1 (s - 1) 2η2 Gp*(s) ) 1 2 1 s + s+1 η 2η2
(64)
The process change moves the poles from (-1 ( i) to -η(1 ( i). The damping coefficient remains at ζ ) x2/2, but the time constant increases from τ ) x2/2 to τ ) x2/2η, and the gain changes from k ) -1/2 to k ) -1/(2η2). If η ≈ 0.7, the closed-loop process is unstable. The open-loop step responses and the closed-loop setpoint responses for Gp and G/p are shown in Figure 2 for η ) 0.8. For each simulation, 2500 samples were generated, and the fault occurred at sample 1250. The open-loop normal operation results for all 25 Monte Carlo trials were used to calculate the control chart parameters empirically. That is, these results were used to select the alarm limits for the Shewhart charts and the MCUSUM parameters, γ in eq 7 and h in eq 8. The γ and h values were chosen so that R ≈ 0.01, that is, so that each chart would indicate approximately 25 violations for 2500 samples of normal operation. The threshold and alarm limits are reported in Table 3. The control charts for normal operation are shown in Figures 3 and 4 for open-loop and closed-loop operation, respectively. The effect of feedback will be discussed more thoroughly below, but note that the control chart behaviors are qualitatively similar but not identical for open-loop and closed-loop conditions. The autocorrelation functions for the T 2 statistics are shown in Figure 5 for normal open-loop operation (the results are similar for closed-loop operation). There is clearly a degree of autocorrelation for the TCV2 and T∆S2 statistics; by contrast, the TKF2 results are uncorrelated. The degree of autocorrelation for T∆S2 and TCV2 indicates that the CCA model residuals are not independent; however, the Kalman filter residuals are independent, as discussed in section 2.6. The number of violations
466
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004
Figure 3. Control charts for example 2, case 1 (normal, openloop operation).
Figure 2. (a) Open-loop responses to a +1 step in u Gp(s), indicated by the solid line, and G/p(s), indicated by-the dashed line. (b) Closed-loop set-point responses. Table 3. Control Chart Alarm Limits and Parameters for Example 2 T 2 statistic
alarm limit
MCUSUM statistic
γ
h
TCV2 TKF2 T∆S2
10.4 6.7 22.5
MCUSUMCV MCUSUMKF MCUSUM∆S
2 0.25 0.3
2.875 2.25 4.8
for TCV2 and T∆S2 approximately equals the number of violations for TKF2, as shown in Table 4. However, because of the autocorrelation for TCV2 and T∆S2, the chart violations tend to be sustained, occurring for periods of a few samples; for TKF2, the violations occur more frequently and for shorter periods (often, only one sample violates the alarm limit). The autocorrelation in the TCV2 and T∆S2 statistics also affects the corresponding MCUSUM statistics, and for normal operation, the violations for MCUSUMCV and MCUSUM∆S also occur for small, sustained periods. For cases 5-8, the control charts for the first Monte Carlo trial of the largest (i.e., the most easily detectable) faults are shown in Figures 6-9. (For Figures 6-9, the y-axis limits are constant, so that values greater than the upper limit appear to be equal to the upper limit.) For the sensor fault and process disturbances, all statistics successfully detect the fault for both open and closed-loop operation, and the control charts for cases 3 and 4 (not shown) are qualitatively similar to those for cases 5 and 6. For case 8, the local statistic charts (MCUSUM∆S and T∆S2) and perhaps the canonical variable charts (TCV2 and MCUSUMCV) detect the process change. The Kalman filter charts do not clearly indicate that a change has occurred, although there are more violations after the process change than before the change occurred. In Table 4, the average numbers of violations are reported for all 25 Monte Carlo trials. The same pattern is apparent: the canonical variable and
Figure 4. Control charts for example 2, case 2 (normal, closedloop operation).
local statistic methods detect all of the faults, and the Kalman filter residuals do not detect the process change. Additionally, there are fewer violations for the TKF2 and MCUSUMKF than for all of the other methods. Thus, the Kalman filter methods are less sensitive than the canonical variable or local statistic methods. The sensitivity of each method to the fault size was investigated more thoroughly by varying the parameter for each disturbance. For each parameter value, 25 Monte Carlo trials were simulated. The mean number of violations for all 25 Monte Carlo trials is plotted as a function of the fault size in Figure 10 for the T 2 charts and in Figure 11 for the MCUSUM charts. The following observations are apparent:
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004 467 Table 4. Control Chart Violations for Example 2a case
conditions
TCV2
TKF2
T∆S2
MCUSUMCV
MCUSUMKF
MCUSUM∆S
1 2 3 4 5 6 7 8
open-loop, normal operation open-loop, sensor fault (fs ) 0.57) open-loop, load disturbance (fd ) 1.4) closed-loop, normal operation closed-loop, sensor fault (fs ) 0.57) closed-loop, load disturance (fd ) 1.4) closed-loop, process change (β ) 0.29) closed-loop, AR disturbance (η ) 0.086)
18.7 327.8 637.6 24.8 386.7 625.7 31.6 58.9
24.7 30.7 40.2 21.2 30.2 39.3 23.9 23.1
24.0 587.7 917.9 39.3 536.9 800.1 67.5 105.3
17.7 652.1 1080.5 21.9 741.0 1069.7 31.7 65.0
31.1 39.7 51.8 27.3 39.7 50.5 30.7 29.3
26.3 927.4 1192.3 47.7 848.8 1126.1 87.5 140.4
a
Each simulation consists of 2500 samples; the disturbance occurs at sample 1251.
Figure 5. Autocorrelation function for example 2, case 1 (normal, open-loop operation).
Figure 7. Control charts for example 2, case 6 (load disturbance, closed-loop operation, fd ) 2.4).
Figure 6. Control charts for example 2, case 5 (sensor fault, closed-loop operation, fs ) 0.97).
(1) The curves increase monotonically, confirming the intuitive notion that larger faults are easier to detect than smaller faults. (2) The MCUSUM charts are more sensitive than the T 2 charts. As shown in Figure 12, MCUSUMCV can detect smaller faults than TCV2; the MCUSUMCV curve is slightly to the left of the TCV2 curve, except for the process change when the two charts are essentially
identical. Note that the degree to which MCUSUM is more sensitive than T 2 charts also depends on the MCUSUM tuning parameters. (3) For each disturbance, the Kalman filter clearly is the least sensitive of the metrics. (4) The results for MCUSUMCV and MCUSUM∆S (or TCV2 and T∆S2) are approximately the same for the sensor and load disturbances. 95) The ∆S statistics are noticeably more sensitive for the process change and AR disturbance. Note that, for the local statistic and the sensor fault or process disturbance, the results change slightly for the open-loop and closed-loop simulations. In Figure 11, the MCUSUM∆S results appear to be slightly more sensitive than the MCUSUMCV results for open-loop simulations, but the MCUSUM∆S and MCUSUMCV results are virtually identical in the closed-loop simulations. The effect of feedback can be examined by considering the open-loop and closed-loop results for the sensor fault and process disturbance simulations. In Figure 13, the MCUSUM results are shown for each method (the T 2 results are similar). The results for the canonical variable residual charts and Kalman filter residual charts appear to be nearly identical. For the Kalman filter residuals, this result is not surprising because an input-output model is being used. As discussed in section 2.6, the canonical variable residuals might be
468
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004
Figure 8. Control charts for example 2, case 7 (AR disturbance, closed-loop operation, β ) 0.49).
Figure 10. Number of violations for different fault sizes (T 2 charts, example 2).
Figure 9. Control charts for example 2, case 8 (process change, closed-loop operation, η ) 0.15).
sensitive to feedback depending on the accuracy of the CCA results. Because the canonical variable results in Figure 13 appear to be nearly identical, the CCA model accounts for the input-output relationship reasonably well. For the local statistic, the results are clearly different. This result is also not surprising based on the discussion in section 2.6. However, the results for the local statistic are not dramatically different for openloop or closed-loop simulations. Thus, none of the methods appears to be highly sensitive to feedback. Observations. Before the next simulation example is discussed, the following observations can be made: (1) Of course, larger faults were easier to detect. (2) None of the detection methods was greatly affected
Figure 11. Number of violations for different fault sizes (MCUSUM charts, example 2).
by feedback, although the local statistic was clearly sensititive to feedback. (3) MCUSUM charts were more sensitive than T 2 charts. (4) Detecting the process change was more difficult than detecting sensor faults or process disturbances. (5) Of the methods considered, the local statistic was the most sensitive for detecting the process change and AR disturbance. Example 3. Nonlinear Continuous Stirred-Tank Reactor. A nonlinear continuous stirred-tank reactor
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004 469 Table 5. CSTR Parameters variable
value
variable
value
F CF TF V σ Cp T
100 L/min 1 mol/L 350 K 100 L 1 kg/L 239 J/kg‚K 383.7 K
E/R k0 UA TJ -∆H CA
8750 K 7.2 × 1010 min-1 5 × 104 J/min‚K 309.9 K 5 × 104 J/mol 0.1 mol/L
Table 6. PI Controller Parameters CA-F K2 τ1,2
0.75 K/(mol/L) 1.2 min T-TJ
K1 τI,1
Figure 12. Number of violations for different fault sizes (TCV2 and MCUSUMCV charts, example 2).
Figure 13. Effect of feedback, example 2.
(CSTR) with the first-order reaction A f B is described by23
( ) ( )
EA dCA F C ) (C0 - CA) - k0 exp dt V RT A
(65)
EA dT F -∆H ) (T0 - T) + k exp + dt V FCp 0 RT UA (T - T) (66) FCpV J Nominal values for the simulation are given in Table 5. Temperature, T, and concentration, CA, are controlled
68.6 1/K 4.1 min
variables; jacket temperature, TJ, and flow rate, F, are manipulated variables; and feed concentration, C0, and feed temperature, T0, are unmeasured disturbances. Guassian noise is added to each output measurement with variances given by σ2(CA) ) 1 × 10-6 and σ2(T) ) 0.0411. A decentralized control system was implemented that controls CA with F and T with TJ; the controller parameters are Ulisted in Table 6. As in example 2, a Monte Carlo simulation of 15 trials was used to study the detection methods. For each trial, a state-space model was identified using CVA and ADAPTX, version 3.5.22 The excitation data for identifying the state-space were generated by specifying F and TJ to be PRBS sequences (switching time ) 10 s for each input; amplitude ) 3 L/s for F, 3 K for TJ). The excitation data set comprised 1200 samples. The Monte Carlo simulation consisted of normal operation and several disturbances for the closed-loop system. Each disturbance can be described by a single parameter, e.g., the sensor fault bias is described by a step size denoted as fs,CA. The Monte Carlo study employed several values for the disturbance parameters to evaluate the sensitivity of the detection methods. The following closed-loop simulations were considered: Case 1. Normal, closed-loop operation Case 2. Sensor Fault. A bias (step) was added to the CA measurement, denoted by fs,CA. The range of values used for the parametric study was 0 < fs,CA < 0.01. Case 3. Process Disturbance. A step change was added to C0, denoted by fd,C0. The range of values used for the parametric study was -0.05 < fd,C0 < 0. Case 4. Catalyst Deactivation. Deactivation by sintering was assumed.24 The preexponential factor, k0, was multiplied by a time-dependent catalytic activity coefficient, a(t)
k0(t) ) k0(0) a(t)
(67)
Ed 2 da ) -kd exp a dt RT
(68)
( )
where the activation energy was constant at Ed ) 2.9 × 103 kJ/mol and the range of the sintering rate constant, denoted by kd, was 0 < kd < 0.001. Case 5. Fouling. The heat transfer coefficient, U, decayed as a ramp
U(t) ) U(0)(1 - kft)
(69)
The range for the decay rate, denoted by kf, was 0 < kf < 0.001.
470
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004
Table 7. Control Chart Parameters for Example 3 T 2 statistic 2
TCV TKF2 T∆S2
alarm limit
MCUSUM statistic
h
γ
9.4 8.9 21.6
MCUSUMCV MCUSUMKF MCUSUM∆S
12 6 19.5
1.5 0.5 2
Table 8. Average Control Chart Violations for Example 3 TCV2 TKF2 T∆S2 MCUSUMCV MCUSUMKF MCUMSUM∆S a
case 1
case 4a
12 13 13 13 9 10
60 29 424 245 600 749
kd ) 1 × 10-3 min-1.
Each fault or change occurred at sample koc ) 101. Cases 2 and 3 are “abrupt faults” because a parameter changes suddenly; cases 4 and 5 are “incipient faults” because the parameters change gradually. Because the process is under feedback control, the effect of the disturbances is more noticeable in the manipulated variables than in the controlled variables. That is, from the perspective of detecting faults, feedback tends to “hide” the appearance of the disturbances in the controlled variables, (Arguably, this is the raison d’etre for feedback control.25) Furthermore, for the disturbance magnitudes considered, the effect of the disturbance is not catastrophic. For example, the incipient faults result in noticeable, but rather subtle changes from normal operation. Using the data from case 1 (normal, closed-loop operation), empirical chart parameters were calculated for the T 2 and MCUSUM charts. The T 2 alarm limits and the MCUSUM parameters were chosen such that the type I error level was R ) 0.01, which would result in an expected value of 12 violations using the 1200 samples from normal operation. The alarm limits and MCUSUM parameters are reported in Table 7, and the average numbers of violations for normal operation (case 1) and catalyst deactivation (case 4) are listed in Table 8. Figure 14 shows the results for the proposed monitoring metrics using 1200 samples from case 1 (normal, closed-loop operation). These charts are worth studying in some detail. As in the T 2 charts for example 2, there is a degree of autocorrelation for the TCV2 and T∆S2 statistics; by contrast, the TKF2 results are independent. The autocorrelation for T∆S2 and TCV2 likely is due to the CCA model, but the Kalman filter residual is independent, as discussed in section 2.6. As in the SISO example, the autocorrelation in the TCV2 and T∆S2 statistics causes the violations for normal operation to occur for small, sustained periods. As shown in Table 8, the average number of violations for each chart approximately equals 10. The control charts for each case are shown in Figures 15-17. For Figures 15-17, the y-axis limits are constant, so that values greater than the upper limit appear to equal the upper limit. For this set of disturbance parameters, each fault is detected by all of the methods. For the sensor fault with fs,CA ) 0.002 in Figure 15, all of the charts detect the small (2%) bias. It is not clear whether any chart is more suitable than any other chart for signaling this change, but note that the TKF2 chart does not indicate the sustained disturbance as clearly as the other charts. Although the frequency of violations
Figure 14. Control charts for case 1, example 3 (normal operation).
Figure 15. Control charts for case 2, example 3: small sensor bias, fs,CA ) 0.002 mol/L occurs at k ) 101.
is greater for these data than for normal operation, this change might go unnoticed. The results for the step disturbance in C0 with fd ) -0.005 in Figure 16 are similar, and the TKF2 chart arguably does not detect this change at all. Recall that cases 4 and 5 are incipient faults, which develop slowly over time. For the incipient faults, detecting the precise time of occurrence is more difficult than for the abrupt faults because of the slow rate of change. However, detection of the change soon after the fault occurs is still desirable in order that the effect of the disturbance be mitigated and more time for taking corrective action be available. For case 4 (catalyst deactivation), the control charts from one Monte Carlo
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004 471
Figure 16. Control charts for case 3, example 3: C0 step disturbance, fd,C0 ) -0.005 mol/L, occurs at k ) 101.
Figure 18. Control charts for case 5, example 3: heat exchanger fouling, kf ) 3 × 10-4 min-1, begins at k ) 101.
Figure 19. Sensitivity of methods to fault size for the T 2 charts. Figure 17. Control charts for case 4, example 3: catalyst deactivation, kd ) 1 × 10-3 min-1, begins at k ) 101.
trial are shown in Figure 17 for kd ) 1 × 10-3 min-1, and the average violations for all 25 Monte Carlo trials are listed in Table 8. The results indicate that the T∆S2 chart and all of the MCUSUM charts clearly detect the change. The TCV2 and TKF2 charts do not clearly indicate a change. Furthermore, typically, the MCUSUMCV chart does not indicate a change until approximately sample 1000smuch later than when the MCUSUMKF and MCUSUM∆S charts signal a change. Thus, the canonical variable residual statistics do not perform as well as the other methods for this disturbance. For the case 5 (fouling) data in Figure 18 for kf ) 3-4 min-1, all of the charts except TKF2 detect the change. In this case, the MCUSUMKF chart detects the change later than both
the MCUSUM∆S and MCUSUMCV charts, and it appears that the Kalman filter residual does not perform as well as the other methods for this disturbance. The sensitivity of the charts to the fault or disturbance magnitude was considered in greater detail. In Figure 19, the number of violations for the T 2 statistic is shown as a function of the fault size. The more sensitive charts result in more violations for a given fault size. Ranked according to sensitivity, T∆S2 is the most sensitive of the T 2 methods, followed by TCV2 and TKF2. For the MCUSUM charts in Figure 20, the results are practically indistinguishable for the abrupt sensor and process disturbances. For the incipient changes, MCUSUM∆S is the most sensitive for both fouling and catalyst deactivation. Curiously, MCUSUMKF is less sensitive than MCUSUMCV for fouling, but the opposite
472
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004
(4) As discussed in section 2.6, the T∆S2 and MCUSUM∆S charts might be sensitive to feedback. For the SISO example, the effects of feedback were apparent but not severe. The CVA-based detection methods were examined using an experimental pH neutralization process.1 The conclusions for the experimental results are quite similar to these simulation results, although the sensitivity of the methods was not as critical a factor for the experimental results (i.e., all of the detection methods performed similarly for the range of faults considered in the experiments). However, the experiments validated that all of these methods worked using real, measured data. 4. Conclusions
Figure 20. Sensitivity of methods to fault size for the MCUSUM charts.
is true for catalyst deactivation. It is difficult to explain this rather odd result, but it should be noted that tuning the MCUSUM charts is nontrivial and can lead to different results. That is, several combinations of γ and h for the MCUSUMCV or MCUSUMKF charts resulted in the same number of violations for normal operation but changed the relative performance of MCUSUMKF and MCUSUMCV for the incipient disturbances depending on the selection of γ and h. Although it is beyond the scope of this research to address the nuances of the MCUSUM chart in great detail, in all cases, MCUSUM∆S was the most sensitive method for detecting the incipient disturbances. 3.1. Summary of Results. Many of the relevant issues for the detection methods discussed in section 2.6 were apparent in the simulations: (1) The difference between the T 2 and MCUSUM control charts was clearly apparent: the MCUSUM chart detected and indicated smaller changes more clearly than the T 2 charts. However, note that, for these simulations, only persistent changes were considered. If sporadic or intermittent changes were considered, the T 2 charts likely would perform better than the MCUSUM charts. (2) In general, all of the metrics were capable of detecting rather small faults. Ranked by sensitivity, the Kalman filter residual metrics, TKF2 and MCUSUMKF, were the least sensitive of the methods considered; the TCV2 (MCUSUMCV) charts were much more sensitive than the TKF2 (MCUSUMKF) charts; and the T∆S2 (MCUSUM∆S) charts were the most sensitive of the three detection methods considered. The poor performance of the Kalman filter charts is not suprising according to the discussion in section 2.6 for the oblivious filter phenomenon. (3) The benefit of the T∆S2 and MCUSUM∆S charts was most significant for the process change in the SISO example and the incipient, parametric changes in the CSTR example. For the abrupt additive faults (i.e., the sensor fault and process disturbance), the differences between the charts were relatively minor.
Using the statistical properties of canonical correlation analysis and a state-space model, a novel method for monitoring continuous processes has been proposed that is based on the local approach for detecting parametric changes. The proposed methods use multivariable versions of the classical Shewhart and CUSUM control charts. Thus, although the statistics are nontraditional, they still use standard SPC charts and “SPC thinking”; i.e., alarm limits and limit violations have their usual meanings. The major motivation for developing the proposed method is sensitivity. That is, slow-developing process changes can go undetected using standard, residualbased detection methods. The “local approach” allows for the detection of changes in parameters, and the proposed method makes use of the parametric changes for detecting process changes. In several simulation studies, the proposed method was compared with two other methods: one using Kalman filter residuals and the other using canonical variable residuals. The results of the simulation study indicate that the proposed method is very sensitive and can detect very small faults compared to the other methods studied. Appendix A. Detecting Process Changes with a Local Statistic The material in this section summarizes the key points for detecting process changes with the local approach. A more complete description is given by Basseville and Nikiforov13 and the references contained therein. The objective is to detect changes from an initial parameter vector θ0 given data YN 1 ) y(1), ..., y(N)
H0: θ ) θ0
(A-1)
H1: θ ∈ Θ1
(A-2)
For example, for the sake of discussion, consider the onedimensional subspace Θ1 containing elements θ1 of the form
θ1 ) θ0 +
ν Υ xN
(A-3)
where ν is the magnitude of the change and Υ is the change direction, scaled to unit length (i.e., ||Υ|| ) 1).
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004 473
The key statistic for detecting changes is the log likelihood ratoi
SN 1 (θ0,θ1)
)
lθ1(YN 1)
(A-4)
lθ0(YN 1)
N where lθ(YN 1 ) ) log pθ(Y1 ) is the logarithm of the probability density function of the measurements for parameter vector θ. In general, the log likelihood function can be difficult to compute exactly. A Taylor series expansion around θ0 can be used to approximate up to the second-order effects
∂lθ S(θ0,θ1) ≈ |θ)θ0(θ1 - θ0) + ∂θ ∂2l 1 (θ1 - θ0)T 2|θ)θ0(θ1 - θ0) (A-5) 2 ∂θ Following the hypothesis of local parameter change, the term θ1 - θ0 can be simplified as
ν Υ θ1 - θ0 ) xN
(A-6)
The log likelihood function can be further simplified using following definitions of the efficient score and Fisher information matrix
∆N(θ0) )
1 ∂l |θ)θ0 xN ∂θ
IN(θ0) )
1 I(θ ) N 0
ZN(θ0) }
I(θ0) }
∂l | (A-7) ∂θ θ)θ0
∂2 l |θ)θ0 ∂θ2
(A-8)
where ZN(θ0) is the efficient score and I(θ0) is the Fisher information matrix about parameter θ0. Equation A-5 can be written as T SN 1 (θ0,θ1) ) νΥ ∆N(θ0) -
ν2 T Υ IN(θ0)Υ + Ο(ν3) (A-9) 2
Asymptotically, as N f ∞
∆N(θ) f N[0, I(θ)]
(A-10)
and the higher-order terms approach zero. Basseville and Nikiforov generated several statistics for cases when some knowledge of v and Υ exists. In the general case when Υ can be anything and ||θ0 - θ1|| < ν2, the log likelihood ratio is approximated by inserting the maximum likelihood estimate for θ1. As shown in ref 13, for sample Ykj , eq A-5 can be written as k
sup Sj (θ0,θ) ) θ
k-j+1 k2 (χj ) 2
(A-11)
where
h kj )TI-1(θ0)(Z h kj ) (χkj )2 ) (Z Z h kj )
1
(A-12)
k
∑Zi(θ0) k - j + 1 i)j
(A-13)
where Zi is the efficient score for the ith observation, y(i).
For canonical variables, the efficient score is particularly simple. In this case, the parameter vectors of interest are the canonical loading coefficients for the future and the canonical correlations, θ ) L (in principle, Ω and J could also be included, but the resulting expressions are much more complicated). The probability density function for the canonical variables is given by
lθ() ) -
tT(1 - Ω)-1t 1 - log I - Ω + constant 2 2 (A-14) t ) Lft - Ωxt
(A-15)
The efficient score is thus
[]
∂l ∂L Zi(θ0) ) ∂l ) [ft(I - Ω)-1t] ∂Ω
[
]
(A-16)
and the Fisher information matrix is given by
∂2l ∂2l T ∂L∂ΩT ) f (I - Ω)-1f T I(θ0) ) ∂LL2 t t ∂l ∂2l T T ∂Ω∂L ∂ΩΩ
(A-17)
Literature Cited (1) Juricek, B. Multivariable Statistical Methods for System Identification and Process Monitoring. Ph.D. Dissertation, Department of Chemical Engineering, University of California, Santa Barbara, CA, 2002. (2) Montgomery, D. Introduction to Statistical Quality Control, 4th ed.; John Wiley: New York, 2000. (3) Lowry, C.; Montgomery, D. A Review of Multivariate Control Charts. IIE Trans. 1995, 27, 800-810. (4) MacGregor, J. Using On-Line Process Data to Improve Quality: Challenges for Statisticians. Int. Stat. Rev. 1997, 65, 309-323. (5) MacGregor, J.; Kourti, T. Statistical Process Control of Multivariate Processes. Control Eng. Pract. 1995, 3, 403-414. (6) Wise, B.; Gallagher, N.; MacGregor, J. The Process Chemometrics Approach to Process Monitoring and Fault Detection. In IFAC Workshop on On-line Fault Detection and Supervision in the Chemical Process Industries; Morris, A. J., Martin, E. B., Eds.; Pergamon Press: New York, 1995. (7) Anderson, T. An Introduction to Multivariate Statistical Analysis; John Wiley: New York, 1984. (8) Box, G.; Lucen˜o, A. Statistical Control by Monitoring and Feedback Adjustment; John Wiley: New York, 1997. (9) van Dobben De Bruyn, D. Cumulative Sum Tests: Theory and Practice; Hafner Publishing Co.: New York, 1968. (10) Crosier, R. Multivariate Generalizations of Cumulative Sum Quality-Control Schemes. Technometrics 1988, 30, 291-303. (11) Harris, T.; Ross, W. Statistical Process Control Procedures for Correlated Observations. Can. J. Chem. Eng. 1991, 69, 4857. (12) Willsky, A. A Survey of Design Methods for Failure Detection in Dynamic Systems. Automatica 1976, 12, 601-611. (13) Basseville, M.;. Nikiforov, I. Detection of Abrupt Changes; Prentice-Hall: Englewood Cliffs, NJ, 1993. (14) Burnham, K.; Anderson, D. Model Selection and Inference; Springer: New York, 1998. (15) Larimore, W. Optimal Reduced Rank Modeling, Prediction, Monitoring, and Control Using Canonical Variate Analysis. In IFAC 1997 International Symposium on Advanced Control of Chemical Processes; Elsevier Science: New York, 1997; pp 6166. (16) Negiz, A.; Cinar, A. Statistical Monitoring of Multivariable Dynamic Processes with State-Space Models. AIChE J. 1997, 8, 2002-2020.
474
Ind. Eng. Chem. Res., Vol. 43, No. 2, 2004
(17) Russell, E.; Chiang, L.; Braatz, R. Fault Detection in Industrial Processes Using Canonical Variate Analysis and Dynamic Principal Component Analysis. Chemom. Intell. Lab. Syst. 2000, 51, 81-93. (18) Gertler, J. Fault Detection and Diagnosis in Engineering Systems; Marcel Dekker: New York, 1998. (19) Malhotra, A.; Hanafi, A.; Tamayo, E.; Huang, B. Industrial Applications of Process Fault Detection Approaches. ADCHEM 2000; Elsevier Science: New York, 2000; pp 647-652. (20) Kumar, S.; Martin, E.; Morris, J. Detection of Process Model Changes in PCA Based Performance Montoring. In Proceedings of the American Control Conference; IEEE Press: Piscataway, NJ, 2002; pp 2719-2724. (21) Ku, W.; Storer, R.; Georgakis, C. Disturbance Detection and Isolation by Dynamic Principal Component Analysis. Chemom. Intell. Lab. Syst. 1995, 30, 179-196. (22) Larimore, W. ADAPTX Automated System Identification Software Users Manual; Adaptics, Inc.: McLean, VA, 1996.
(23) Seborg, D.; Edgar, T.; Mellichamp, D. Process Dynamics and Control, 2nd ed.; John Wiley: New York, 2004. (24) Fogler H. Elements of Chemical Reaction Engineering, 2nd ed.; Prentice-Hall: Englewood Cliffs, NJ, 1992. (25) Box, G.; Kramer, T. Statistical Process Monitoring and Feedback AdjustmentsA Discussion. Technometrics 1992, 34, 251-285. (26) Larimore, W. E. System Identification, Reduced-Order Filtering and Modeling via Canonical Variate Analysis. In Proceedings of the American Control Conference; IEEE Press: Piscataway, NJ, 1983; pp 445-451.
Received for review February 24, 2003 Revised manuscript received October 30, 2003 Accepted October 30, 2003 IE0301684