332
Ind. Eng. Chem. Res. 2008, 47, 332-343
Detection of Unmeasured Disturbances in Model Predictive Control (MPC) Plant Test Data Hongbin Dong† and James B. Riggs* Department of Chemical Engineering, Texas Tech UniVersity, Lubbock, Texas 79409-3121
The fidelity of MPC models can be significantly compromised if the effect of unmeasured disturbances is contained in the data set used to identify the process models. This paper addresses an asymptotical detection method (ADM) based on the normalized primary residual and the χ2 statistic. An iteratively procedure, based on ADM, is proposed for removing data containing the effect of unmeasured disturbances until all the data passes the ADM χ2 test. A detailed simulation of a refining FCC unit is used to evaluate the performance of the proposed method for identifying unmeasured disturbances. A series of random walk unmeasured disturbances with varying amplitudes was applied during a simulated testing period. The proposed algorithm was applied to iteratively identify and remove the corrupted data. After the corrupted data were removed, the identified MPC models showed significant improvement. The sensitivity of the ADM to measurement noise is also evaluated. 1. Introduction Experience with model predictive control (MPC) has shown that using plant data from abnormal operating periods to train an MPC model can significantly undermine the control performance when these models are used for closed-loop control. Corrupted process data can result from process faults and sensor faults.1 Corrupted process data caused by process faults, which correspond to changes in process behavior, e.g., functioning mode changes of a gas turbine or an abnormal operating region of a catalytic converter, have been addressed.2-4 Sensor faults include sensor failures and sensors which are out of calibration. The detection of process data corrupted by sensor malfunction corresponds to detecting gross error in a process. The methods of gross error detection have been investigated previously.5-7 Process variations and unmeasured disturbances during testing periods and process nonlinearity can each affect the fidelity of MPC models developed from process training data. Process variations are the variations in the controlled variables due to colored noise or the coupling of regulatory PID-type controllers. Unmeasured disturbances include input changes (e.g., feed temperature and composition changes) and process changes (e.g., tuning parameter changes of the regulatory controllers and changing the lineup of a process). Industrial processes have varying degrees of nonlinearity, while MPC models are usually linear models. Therefore, process nonlinearity can undermine the performance of MPC models, and various transformations are used to reduce this effect. When a set of plant test data contains abnormal operating periods due to unmeasured disturbances, the fidelity of the identified model based on that test data can be degraded, resulting in a significant mismatch between the identified model and the process. A number of studies8-11 on system identification in the presence of process disturbances have focused on the modeling process disturbances rather than detecting them and removing the affected training data. For instance, MacGregor8 has investigated four classes of stochastic disturbance models: a Gaussian white noise model, a low order stationary autoregres* To whom correspondence should be addressed. Tel.: (806) 742 1765. Fax: (806) 742 1765. E-mail:
[email protected]. † Current address: Pavilion Technologies, Austin, TX.
sive moving average model, a random walk model, and an integrated moving average model. The approach is based on assuming that the disturbances result from filtered white noise. The filter model can be regarded as the disturbance model. It was further pointed out that when a disturbance is nonwhite and test data are finite, to regress accurate FIR models, the disturbance model should be identified simultaneously with the process model.11,12 However, in most cases when process disturbances dominates model error, modeling process disturbances will not significantly improve the accuracy of the identified process model because the asymptotical model variance is proportional to the disturbance spectrum at any frequency.10 Therefore, if the process disturbances in a training data set can be detected and deleted in advance, then it should result in more accurate process models. For linear model identification, the output error (residual) method is used to remove corrupted data resulting from unmeasured disturbances from plant test data.13 This method is able to detect corrupted process data when the corrupted data is a small percentage of the whole training data. Moreover, the model residual has been shown to be insufficient for detecting corrupted data if the model has been corrupted.14 The contribution of this work is to develop an approach that is able to identify abnormal process data from plant test data set even if the MPC models are corrupted by the abnormal operating conditions. After removing the identified corrupted plant test data, the fidelity of identified model should improve. The scope of this paper is to address the detection of unmeasured disturbances in MPC test data. 2. The Asymptotical Detection Method 2.1. Preliminary Concepts. 2.1.1. Primary Residual. Given a finite dimensional residual vector, K(G0,Yk), where the process is accurately characterized by the model, G0, if EGˆ [K(G0,Yk)] ) 0 if G ˆ ) G0
(1)
ˆ * G0 EGˆ [K(G0,Yk)] * 0 if G
(2)
where EGˆ is the expectation value when the process is represented by the model G ˆ , and Yk is the process measurement
10.1021/ie061623+ CCC: $40.75 © 2008 American Chemical Society Published on Web 12/12/2007
Ind. Eng. Chem. Res., Vol. 47, No. 2, 2008 333
at time unit k. K(G0,Yk) is then referred to as the primary residual.2 For a finite sample of data, N, eq 1 can be empirically rewritten as 1
ˆ ) G0 no corrupted process data H0: G
N
∑ K(G ,Y ) ) 0 0
N k)1
k
ˆ * G0 corrupted process data exist H 1: G
This expression can be regarded as the estimation function of G0, which is also the loss function of process identification. The dimension of the primary residual should be greater than or equal to the order of the system parameter G0. A typical primary residual of a primary vector-valued function K(G0,Yk) is the gradient with respect to the model coefficient, g, of the summation of prediction error square. For the loss function V(g) )
1
N
∑e N
2
k
2.2. The Asymptotical Detection Method (ADM). To sufficiently detect abnormal process training data, which can corrupt the model, we can propose hypothesis testing, i.e.
where G0 and G ˆ are the true process model and the estimated model, respectively. Based upon eqs 1 and 2, we will have EGˆ (K(G0,Yk)) ) 0 under H0
(5)
EGˆ (K(G0,Yk)) * 0 under H1
(6)
Theorem 1. Given expressions 5 and 6, if we define the normalized residual
(g) ζN(G0) )
k)1
We can obtain the following expression ∂V(g) ∂g
)
2
∂ek(g)
N
∑e
N k)1
k
N
1
∑ K(G ,Y ) 0
xN k)1
k
The following will hold )0
by minimizing the loss function V(g) to solve for the unknown parameter g of the model. Then, if we let the K(G0,Yk) equal to ek(∂ek(g))/(∂g), i.e., ∂ek(g) K(G0,Yk) ) ek ∂g
∑(G )] under H ˆ ) ∼ N[* 0, ∑(G )] under H ζ (G ζN(G ˆ ) ∼ N[0,
∂g
(3)
0
(7)
0
N
0
1
(8)
where ∑(G0) is the covariance matrix, and N[,] represents a normal distribution. The proof is shown in Appendix A. The covariance matrix of the primary residual, ∑(G0), theoretically is defined as
∑(G ) ) lim ζ (G ) ζ (G ) T
the K(G0,Yk) will obey the condition given by eq 1 if the new sample data set is collected from a process which is not corrupted by unknown process factors. Meanwhile, eq 2 will hold when the model is corrupted by unknown process factors. Therefore, ek(∂ek(g))/(∂g) can be considered as the primary residual in this case. 2.1.2. Normalized Residual. Although we can derive the primary residual for a large sample of training data, the statistical distribution of the primary residual is rarely known. To circumvent this weakness, we may generate a known distribution of a new kind of residual from the primary residual by using the central limit theorem (CLT). Given a primary residual K(G0,Yk) is characterized by the model G0 and the sample number of N. The normalized residual is defined as ζN(G0) )
1
N
∑ K(G ,Y )
xN k)1
0
k
(4)
Compared with the primary residual, one can find that this normalized residual has the same mean value and covariance/ variance as the primary residual. However, the most important characteristic of the normalized residual is that it will follow a Gaussian distribution, which simplifies the statistical analysis. Thus far, it is necessary to distinguish between the primary residual and the normalized residual. Primary residuals are the general residual which are vector-valued functions linking the process measurements and model parameters, while normalized residuals are derived from the primary residual using the CLT and will follow a Gaussian distribution.
0
N
Nf∞
0
N
0
For the sake of calculation convenience, the whole data sequence is divided into the M segments of data length L, and in each segment the normalized residual is calculated as3 ζM,L(G0) )
1
L
∑K
xN l)1
l+(M-1)L (G0,Yl+(M-1)L)
(9)
Then, with finite a sample size, the covariance matrix is 1
M
∑(G ) ) M ∑ ζ 0
T m,L(G0)ζm,L(G0)
(10)
m)1
Now, the problem of detecting corrupted process data is converted into determining if the mean value of ζN(G ˆ ) has changed. It is well known that for a Gaussian distributed vector the generalized likelihood ratio test (GLR) can detect unknown changes in the mean of a Gaussian distributed vector, H0 against H1 is a χ2 test:
∑
χ2global ) ζN(G ˆ )T
-1
(G0)ζN(G ˆ)
(11)
The degree of χ2 is equal to the order of the model. A threshold value of χ2 can be selected from a χ2 table4 or calculated with a specified false alarm rate by the user. Several comments with regard to the asymptotical detection method should be made at this point: (1) For some identification methods, such as the instrumental variable (IV) method and the subspace identification method, parameter estimations are not applied using the form of eq 3, but the form of the model is not
334
Ind. Eng. Chem. Res., Vol. 47, No. 2, 2008
restricted in this development. (2) The requirement of the detection model is that it can allow us to discriminate between normal and abnormal status of process data. Furthermore, we are interested in abnormal process data that will eventually result in the corruption of the models of interest. For an example, if an available model has an FIR form, this method should determine if the FIR model has been corrupted by abnormal process data. (3) It has been shown by Zhang et. al3 that for a detection model it is not necessary for G0 to be the ‘true’ model when the primary residual is utilized for process monitoring purposes. Their results further indicate that using a biased process model can isolate such abnormal process data that will potentially result from the unknown faults of a process. On the other hand, it has been demonstrated16,17 that in order to improve monitoring efficiency, the given model should be close to the ‘true’ one as possible. Therefore, we can conclude that as long as a given process model is better than the one that was identified from the new process data, the asymptotical detection method should apply.
∂V(G)
) -2
1 N
∂gi
N
∑ X(t - 1)(y(t) - GX (t - 1)) ) 0 T
t)1
Therefore, the model parameter, G, should satisfy 1 N
N
∑ X(t - 1)(y(t) - GX (t - 1)) ) 0 T
t)1
Recall that y(t) - GXT(t - 1) ) (t,G) where (t,G) is the regularly scalar residual of the MPC model. Now, we can derive the primary residual for an MPC model K[G,y(t)] ) X(t - 1)[y(t) - GXT(t - 1)] ) X(t - 1)(t,G)
3. Identifying Unmeasured Disturbances in MPC Plant Test 3.1. Primary Residuals and Normalized Residual of an MPC Model. For the sake of simplicity, we have only considered a SISO process because its extension to the MIMO process is straightforward. That is, a MIMO system can be represented as an array of SISO models. Considering the variables of observation for the output y(k) and observations for the inputs [(x(1), x(2), ‚‚‚ x(k - 1))], the model can be expressed in an MPC form y(k) ) g1x(k - 1) + g2x(k - 2) + ‚‚‚ + g(k-1)x(1) +
If we denote the estimated parameters as G, then we may calculate G by solving the following equation
due to 1 N
∑ [X(t - 1)(t,G)] ) 0
1 N
N
∑ [X(t - 1)(t,G)] * 0
If we denote ζN(G) )
1
N
∑ X(t - 1)[y(t) - GX (t - 1)] ) T
xN i)1
1
y(k) ) GXT(k - 1) where X(k - 1) is X(k - 1) ) [x(1), x(2), ‚‚‚ x(k - 1)] A popular method to determine the coefficients of G is a least-squares regression analysis. Given the measured inputoutput sequence [y(1) X(0), y(2) X(1), ‚‚‚, y(N) X(N - 1)] recall that the objective function of linear equation is minimized by the following: N
∑ (y(t) - g x(t - 1) - g x(t - 2) - ‚‚‚ N 1
2
t)1
gk-1x(t - k + 1))2 )
1 N
N
∑ (y(t) - GX (t - 1)) T
t)1
N
∑ X(t - 1)(t,G)
xN t)1
then the linear regression model is written as
1
if G * G0
t)1
G ) [g1, g2, ‚‚‚ gk-1]
V(G) )
if G ) G0
t)1
and
(12)
where gi(i ) 1, 2, ‚‚‚ k - 1) represents the coefficients of an MPC model between variables of y and x, and is a scalar of the model prediction error. Now, define a model parameter vector for eq 12
N
2
from the definition of ζN(G), it is thus the normalized residual for the MPC model. Here, ζN(G) and K[G,y(t)] are both vectors, while as a scalar residual, (t,G) contains less information relevant to the data than the vectors. Actually, if instead of the normalized residual ζN(G), the model error (t,G) is used as the detection residual, an asymptotically sufficient statistic will not result. Define
[ ][
y(k) x(k - 1) y(k + 1) x(k) Y) ,X) l l y(n + k - 1) x(n + k - 2)
x(k - 2) ‚‚‚ x(0) x(k - 1) ‚‚‚ x(1) l l l x(n + k - 3) ‚‚‚ x(n - 1)
]
and let Yˆ represent the prediction vector of the MPC model. Then, the primary residual matrix K
[ ]
K1(G0,Y1) K (G ,Y ) K) 2 0 2 l Kn(G0,Yn)
[
Ind. Eng. Chem. Res., Vol. 47, No. 2, 2008 335
(yˆ (k) - y(k))x(k - 1) (yˆ (k + 1) - y(k + 1))x(k) ) l (yˆ (n + k + 1) - y(n + k + 1))x(n + k - 2)
(yˆ (k) - y(k))x(k - 2) (yˆ (k + 1) - y(k + 1))x(k - 1) l (yˆ (n + k + 1) - y(n + k + 1))x(n + k - 3)
‚‚‚ (yˆ (k) - y(k))x(0) ‚‚‚ (yˆ (k + 1) - y(k + 1))x(1) l l ‚‚‚ (yˆ (n + k + 1) - y(n + k + 1))x(n - 1)
]
and the normalized residual can be written as ζN(G) )
1 (Yˆ - Y)TX xN
(13)
Clearly, if one wants to detect abnormal data in a training data set using the asymptotical detection method, a ‘true’ process model should be known in advance. On the other hand, because the ‘true’ process model is the objective of model identification by the given training data, the ‘true’ process model is not generally available. However, as stated in the previous section, a ‘true’ process model is not necessarily required if there exists a model that is better than the one regressed from the corrupted training data set. Piecewise linearization results in improved models by reducing the affect of process nonlinearity and, therefore, can be used as the ‘true’ process model for identifying abnormal plant test data. The description of piecewise linearization is shown in Appendix B. 3.2. Modification of the Threshold. In the framework of MPC model identification, the primary residual and the normalized residual were derived in the previous section. One can apply eq 11
∑
χ2global ) ζN(G ˆ )T
-1
(G0)ζN(G ˆ)
to calculate χ2global. Note, G0 is the MPC model identified from the application of piecewise linearization to the training data. Because the number of segments of piecewise linearization will affect the degree of linearization of the linearized data (DL) (see Appendix B, eq B2) and can be chosen arbitrarily in principle, the degree of linearization can affect the detection method. As a result, the use of eq 11 to identify corrupted data requires a threshold that is dependent on the degree of linearization used. 3.2.1. The Impact of Piecewise Linearization on the Asymptotical Detection Method. It has been shown in Appendix B that the nonlinear content of a training data set can be reduced by the use of piecewise linearization. The number of segments will affect the linearity content of the transformed data, which can be quantified by the quantity, DL. When more segments are used for the piecewise linearization, the transformed data will asymptotically become more linear. That is, DL will asymptotically approach unity as the number of segments is increased. Consequently, based on the linearization, the MPC model quality should improve. Therefore, if the MPC model error is dominated by process nonlinearity rather than by unmeasured disturbances, the detection algorithm will be more sensitive to process nonlinearity. As a result, the detection algorithm can issue a large number of false alarms. Hence, we need to revise the original χ2global threshold value to circumvent false alarms due to the nonlinearity of a training data set. We still want the new threshold value of χ2global to be as small as possible so that the detection performance is lossless. In addition, the smaller threshold of the new χ2global will make the detection test more sensitive not only to unmeasured process disturbances but also to other factors, such as process nonlinearity. The issue of an improved threshold can be outlined as follows: (1) Because the proposed detection algorithm is based on the piecewise linearization, it is naturally assumed that the improved threshold χ2global should be a function of the degree of linearization (DL), i.e. χ2improved ) f(DL) (2) The improved threshold χ2improved value should keep at least at the same false alarm rate as the standard χ2standard value with respect to the unmeasured process disturbances. In other words, if there are no unmeasured process disturbances, we can have χ2standard < χ2global < χ2improved if there exist data corrupted by unmeasured disturbances, we of course should have χ2global > χ2improved > χ2standard (3) To maintain the sensitivity of the detection method to unmeasured process disturbances, the improved threshold χ2global should be chosen as small as possible. 3.3. Improved Threshold. The objective of improving the threshold is to find a new threshold for χ2global for which the false rate P(χ2global > χ2improved) is less than the false alarm rate of P(χ2global > χ2standard). Then, we can have χ2standard e χ2improved And the following will hold 1 - P(χ2global > χ2improved) > 1 - P(χ2global > χ2standard) ) 1 - R where R is a predefined false alarm rate. Now, the issue of improving the threshold is converted into the problem of how to set up a new threshold based on DL and other available information while keeping it as small as possible.
336
Ind. Eng. Chem. Res., Vol. 47, No. 2, 2008
Figure 1. A schematic diagram of a typical FCC process: reactor (RX); regenerator (RG); main fractionator (MF); wet gas compressor (A); overhead drum (B); disengager (D); reactor riser (E); feed preheater (G); catalyst cooler (H); stack gas expander (I); air blower (J); motor (K); CO boiler (or waster heat boiler) (L).
Theorem 2. Given a training set (Xk,Yk), and its linearized training set (Xk,(YL)k), with a certain degree of linearization, DL φ2 )
[
1 DL xN
∑X [Y T k
k
- (YP)k]T ×
∑
-1
(G0)
(
1 DL xN
∑X (Y T k
k
)]
- (YP)k
models and calculate the new threshold, χ2improved using eq C10. Calculate χ2global using eq 11. (8) If χ2global is larger than the predefined value of χ2improved, then we mark it as abnormal data, delete it, and go back to step 1 until all the χ 2global values are lower than χ 2improved.
(14)
where YP is the prediction of Y based on the models identified from the training data set (Xk,Yk), then the new threshold of the asymptotical detection method is χ2improved ) (xχ2standard + xφ2)2 The proof of this theorem is shown in Appendix C. Algorithm. Consider a training data set that is used to identify MPC models. The corrupted data in the training data set can be detected using the following procedures: (1) Identify MPC models based on the original data set by the least-squares method. (2) Apply piecewise linearization to the original data set and obtain the piecewise linearized training data set using eq B1. (3) Calculate DL using eq B2. (4) Reidentify the MPC model based on the linearized training data set using the leastsquares method. (5) Calculate the covariance matrix based on the piecewise linearized training data and the piecewise linear MPC models using eq 10. (6) Calculate the normalized residuals based on the original data set and the piecewise MPC models using eq 13. (7) Select the threshold value from the χ2standard table with respect to a predefined false rate and the degrees freedom equal to the number of coefficients of step response
4. Case Study 4.1. Process Description. In this work, a detailed fluidized catalytic cracking (FCC) model18 is used to represent the real process. An FCC process is an important unit in a modern refinery because it converts low value gas oil to high value gasoline and light gases. A typical FCC process is shown in Figure 1. The hydrocarbon feedstock is preheated in the feed preheater (G) before entering the riser (E) where it is first mixed with and vaporized by the hot regenerated catalyst from the regenerator (RG). Endothermic catalytic cracking reactions take place as the hydrocarbon rises along the length of the riser. Lighter hydrocarbon products are produced along with the byproduct coke which is deposited on the surface of the catalyst and decreases the catalytic activity. The main product vapor goes into the mainfractionator unit (MF) where different products are separated. The deactivated catalyst is separated in the cyclone (F) and falls into the stripping section (D) where the residual products on the catalyst are stripped by stripping stream. The spent catalyst from the stripping section goes into the regenerator (RG) through a transport line. In the regenerator, the coke is burned off by the air from the air blower (J). After the removal of coke from the surface of the catalyst, the activity of the catalyst is renewed while releasing heat due to burning
Ind. Eng. Chem. Res., Vol. 47, No. 2, 2008 337
Figure 3. Threshold comparisons of χ2 value estimate with DL ≈ 0.3 for the case with no unmeasured disturbance.
Figure 2. Plant test signal for MVs. Table 1. Process Variables for MPC Models manipulated variables (MVs)
controlled variables (CVs)
symbol
description
symbol
description
u1 u2
valve position (RG to RX) valve position (blower to RG)
y1 y2
riser outlet temp O2 concn of stack gas
the coke, which heats the catalyst to a temperature sufficient to vaporize the feed and to maintain the temperature in the riser. The regenerated catalyst is brought into the riser by another transport line. Thus, the generated catalyst continuously enters the riser, and the spent catalyst comes into the regenerator providing a cyclic continuous reaction process. The CVs and MVs of MPC models for the FCC process are listed in Table 1. The manipulated variables (MVs) are the valve position for the slide valve on the line from the regenerator to the riser and the valve position on the valve for the air flow to the regenerator, where the controlled variables (CVs) are the riser temperature and the oxygen concentration in the stack gas. For FCC processes, feed composition variation is a typical source of unmeasured process disturbance because it will change irregularly with time and affect the quality of product and the performance of the process. For this work, the ratio of heavy oil to light oil in the fresh oil feed is used as the unmeasured disturbance. Significant changes in the ratio result in a significant change in process behavior. The initial base case-value of heavy oil to light oil ratio is equal to 1. When conducting industrial plant tests, plant test signals are designed for each manipulated variable, the holding time for the step change are 1τp, 2τp, 3τp, 4τp, 5τp, respectively,19 where τp is a process time constant. Figure 2 shows the step changes in the MVs used to develop the MPC models in this work. In this work, it has been further assumed that the process noise level is limited to a certain range that will not significantly corrupt the MPC models. That is, the noise levels used were based on assuming that properly functioning sensors were used to collect the plant test data. The sensor noise is modeled as Gaussian colored noise.20,21 The repeatabilities for the temperature sensor and the O2 analyzer are (1.1 K and (0.002, respectively. The time unit of this work is based on the sampling interval for each simulation case and it varies somewhat from case to case. 4.2. Identifying Unmeasured Disturbances in MPC Plant Test Data. The asymptotical detection algorithm was used to evaluate the efficiency of detecting unmeasured process disturbances in an MPC plant test data set generated using the FCC simulation. The plant test signal applied is shown in Figure 2. Considering a training data set for an MPC model between the riser outlet temperature (y1) and the slide valve position from RG to RX (u1), we first investigated a set of process data based on a step test without unmeasured disturbances to determine
the best available (‘true’) process model using no process noise and applying piecewise linearization to the data to reduce the effect of process nonlinearity. Then, unmeasured disturbances in the feed composition were investigated. Finally, an example is used to demonstrate that the quality of the MPC model is improved by removing the data corrupted by unmeasured disturbances. Even though ADM will be applied and tested using FIR-type MPC models, ADM is not limited to FIR models (see section 2). 4.2.1. Disturbance Free Case. The process training data for the effect of u1 on y1 was collected for the case without an unmeasured disturbance. The least-squares method was used to identify the MPC model for y1 - u1. Applying piecewise linearization to the raw training data to generate the linearized data set, the degree of linearization, DL, was calculated equal to approximately 0.3. The linearized data was used to reidentify a new MPC model between y1 - u1. The covariance matrix ∑(G0) was estimated using this linearized training data set and its MPC model. To reduce the computational load and guarantee a positive definite covariance matrix, the covariance estimate was sequentially based on a sliding window with the sample size N (in eq 13) equal to 100. The χ2global of the raw training data was calculated from the sliding window with the sample size N (in eq 13) equal to 350. The original threshold was 152.21 obtained from the standard χ2 table by selecting false alarm R ) 0.025.19 Following the definition of φ2 and substituting DL, Y, Yˆ , and ∑(G0) into eq 12, φ2 is equal to 72.41. The improved threshold is calculated as follows: χ2improved ) (x152.21 + x72.41)2 ) 434.59 A plot of χ2 values is shown in Figure 3. These results indicate that the improved threshold can significantly decrease the rate of false alarms. Similarly, for the same case, we increased the DL to 0.7 by adjusting the number of segments applied for piecewise linearization. Applying the same procedures as described in the above case, we calculated φ2 equal to 453.69. Then the improved threshold is χ2improved ) (x152.21 + x453.69)2 ) 1128.7 The results are shown in Figure 4. From these two figures, one can see that using the improved threshold can improve the reliability of the asymptotical detection method. As a result, the detection algorithm should be less sensitive to the process nonlinearity contained in the training data set even though the model error primarily results from the process nonlinearity. Moreover, results clearly indicate
338
Ind. Eng. Chem. Res., Vol. 47, No. 2, 2008
Figure 4. Threshold comparisons of χ2 value estimate with DL ≈ 0.7 for the case with no disturbance.
Figure 6. Detecting unmeasured disturbances with a step change from 1 to 1.42.
Figure 5. Model comparison between the true model and the corrupted model for a step change in the unmeasured disturbances (1.42): sold line, true model; dashed line, corrupted model.
that the detection method is independent of the value of DL when the standard threshold is replaced with the improved threshold. 4.2.2. Unmeasured Disturbance Cases. I. Generation of Unmeasured Disturbances. For the FCC process simulation, variations in the feed composition are assumed to be the source of unmeasured process disturbance because it will affect the MPC model quality. The initial value of the heavy to light feed ratio is equal to 1. Two kinds of unmeasured disturbances are considered in this work: one is a step change and another one is random walk variation. The details of the procedures used to generate random walk unmeasured disturbances are shown in Appendix D. II. Step Change in the Unmeasured Disturbances. For a step change in the unmeasured disturbance, the single step change in the unmeasured disturbance followed by a step change return to its initial value in each plant test is first considered; then, the case with multiple levels of step changes in the unmeasured disturbances in one plant test data set is considered. 1. Single Step Change in the Unmeasured Disturbance. During the plant test, the feed composition (ratio of heavy oil to light oil) is changed from 1 to 1.42 at time unit 500 and returned to the original value (1.0) at time unit 1400. Based on including the abnormal training data, the MPC model y1 - u1 is identified using DMCplus from AspenTech and is shown in Figure 5 along with the ‘true’ model. From this figure, compared to the true model, it is shown that the estimated model is changed by the unmeasured disturbance. Applying the proposed detection algorithm to this training data, the results are shown in Figure 6, where the DL is 0.7. From Figure 6, for a feed composition change at time unit 500, the detection method was able to detect the abnormal data. Therefore, the asymptotical detection method was able to identify the step change in the unmeasured disturbance. Even though the unmeasured disturbance was not terminated until time unit 1500, the ADM did not identify corrupted data after the time unit 1400.
Figure 7. Detecting multiple-amplitude step changes in the unmeasured disturbances, DL ≈ 0.7.
Figure 8. Detecting multiple-amplitude step changes in the unmeasured disturbances after first slicing out some of the corrupted data, DL ≈ 0.7.
It should be pointed out that considering other pairs of MPC models (i.e., y1 - u2, y2 - u1, y2 - u2), the ADM was able to effectively identity the unmeasured disturbance although in each case a different value of the improved threshold values will result. 2. Multiple Levels of Step Change in the Unmeasured Disturbances. In industrial processes, unmeasured process disturbances like feed composition will not occur at a simple upset level during the MPC plant test. In other words, the scale of unmeasured disturbances can vary with time. Therefore, a series of step changes in the unmeasured was considered. Step
Ind. Eng. Chem. Res., Vol. 47, No. 2, 2008 339
Figure 10. Model comparisons between the true model and the model corrupted by two levels (0.005 and 0.03) of the random walk changes in unmeasured disturbances: solid line, true model; dashed line, corrupted model.
Figure 9. Detecting relatively large magnitude random walk changes in an unmeasured disturbance (low level, scaled 0.005; high level, scaled 0.03), DL ≈ 0.7.
changes in the ratio of heavy to light feed, with step magnitudes 1.14, 1.42, 2.8, and 7.0, are introduced sequentially during one plant test, which is shown at the top of Figure 7. The asymptotical detection algorithm was applied to this training data set, where the DL was adjusted to approximately 0.7. The detection results are shown in the bottom graph of Figure 7. From Figure 7, one can see that, when the training data are corrupted by multilevels of step changes in the unmeasured disturbance, the detection method will first identify the data corrupted by the largest unmeasured disturbances, i.e., step magnitudes 2.8 and 7.0 simultaneously. After removing these corrupted data from the training data set, the corrupted data in the rest of training data can be identified and removed until all significant disturbances are removed. Figure 8 shows the second cycle of testing. This approach will be further demonstrated in the next section. III. Random Walk Change in the Unmeasured Disturbances. Similar to the case of a step change in the unmeasured disturbance, two groups of random walk unmeasured disturbances are considered: (1) a single magnitude random walk unmeasured disturbance and (2) multiple random walk changes with different magnitudes in the unmeasured disturbance. For the single magnitude random walk disturbance case, random walk changes for the unmeasured disturbance were scaled by 0.03. For a series of random walk changes in the unmeasured disturbance of different magnitudes, this case will be considered simultaneously in one plant test. 1. Single Random Walk Change in the Unmeasured Disturbances. A background random walk disturbance with a magnitude of 0.005 was applied throughout the plant test except where a large magnitude random walk disturbance was applied. The top plot of Figure 9 shows a 0.03 scaled magnitude random walk disturbance applied between the time unit 500 and 1300. The complete set of data was used to identify an MPC model, and this corrupted model is plotted in Figure 10 along with the “true” model. The detection method is applied to this plant test data, where the DL is 0.7. Therefore, from this case, ADM was able to identify most of the corrupted training data resulting from a single level random walk unmeasured disturbance. 2. Multilevels of the Unmeasured Disturbances. Similar to the case of the multiple step changes in the unmeasured disturbance, an MPC plant test was designed (as shown in Figure 2) that was corrupted by the multiple levels of the random walk unmeasured disturbances. These levels of unmeasured distur-
Figure 11. First application of ADM for a set of unmeasured random walk changes in disturbances scaled from 0.01 to 0.05, DL ≈ 0.7.
bances are scaled by 0.01, 0.02, 0.03, and 0.05. The profile of this multidegree unmeasured disturbance test is shown as the top graph in Figure 11. ADM was applied to this training data set, where the DL is approximately equal to 0.7. The detection results are shown in the bottom plot of Figure 11. From this figure, it was found that the detection method was able to identify all the data corrupted by the 0.05 scaled random walk unmeasured disturbance, which is the largest magnitude disturbance used in this case. After the identified corrupted data are removed from the training data set, the detection method is applied to the remaining plant test data set. The data corrupted by the 0.03 scaled and 0.02 scaled random walk unmeasured disturbances were identified at the same time from the remaining test data, as shown in Figure 12. Thus, this procedure can be implemented continuously to identify all the corrupted data, as shown in Figures 11-14. In Figure 14, it is shown that most of the χ2global values are lower than the threshold, which indicates that the implementation of the detection algorithm can be terminated. Therefore, for the training data set which is corrupted by the multiple degrees of unmeasured disturbances, the detection algorithm can be applied iteratively to the training data to identify the abnormal training data that will potentially undermine the MPC model until almost all the χ2global values are lower than the improved threshold. Figure 15 shows the true model and the various models identified from the iterative removal of abnormal data from the data set. Table 2 summarizes the improvement in the model during the iterative application of ADM for this case. Note that in this case, over 30% of the data was corrupted, and yet ADM was able to ferret out the corrupted data. If the majority of the data were corrupted or if the uncorrupted data did not contain adequate information about the response of the
340
Ind. Eng. Chem. Res., Vol. 47, No. 2, 2008
Figure 15. Model comparison of the true model and the multi-improved model after slicing bad data: solid line, true model; dotted line, first improved model; dashed-dotted line, third improved model; dashed line, corrupted model based on all the test data. Figure 12. Second application of ADM for a set of random changes in unmeasured disturbances scaled from 0.01 to 0.05, DL ≈ 0.7.
Table 2. Comparison of the Corrupted Model and the Multi-Improved Model
corrupted model first-improved model second-improved model third-improved model
MSE_STEPa
gain difference (%)
915.92 298.51 16.88 61.75
16.58 9.47 4.84 3.79
a MSE_ STEP is the mean square error of step response model from the true model.
Figure 13. Third application of ADM for a set of random walk changes in unmeasured disturbances from 0.01 to 0.05, DL ≈ 0.7.
Figure 16. Application of ADM for a set of random walk changes in unmeasured disturbances scaled from 0.01 to 0.05 for a high measurement noise case, DL ≈ 0.7.
Figure 14. Fourth application of ADM for a set of random walk changes in unmeasured disturbances from 0.01 to 0.05, DL ≈ 0.7.
process, it is not realistic to expect ADM to accurately detect the corrupted test data. 4.3.2. Sensitivity of ADM to Measurement Noise. To evaluate the sensitivity of AMD to measurement noise, the repeatability of the colored sensor noise for the temperature reading and the composition analyzer was increased to (1.5 K (40% increase compared to the base case level) and (0.0004 (100% increase compared to the base case level) [moderate increase in sensor noise levels]. It was found that the ADM
was able to iteratively identify the multilevels of the unmeasured disturbances in a fashion similar to the results shown in Figures 11-14. The MPC model was accordingly improved after removing the unmeasured disturbances. When the repeatability of the colored sensor noise was increased to (3.3 K (a 120% increase compared to the base case level) and (0.0008 (200% increase compared to the base case level) [large increase in the sensor noise levels], ADM was able to identify the multilevels of the unmeasured disturbances, as shown in Figure 16. In this case, the unmeasured disturbance levels scaled by 0.02, 0.03, and 0.05 were all identified in a single χ2 test. The MPC model was improved as well, as shown in Figure 17. Note that even after the corrupted data were removed from the test results, the regressed MPC model showed a significant mismatch when compared to the ‘true model’. This results because the training data are corrupted by a large
Ind. Eng. Chem. Res., Vol. 47, No. 2, 2008 341
the variables, a series of pairs of data (a measurement and its prediction) is selected to be the boundary points for the segments for the number of segments of interest. In each segment, apiecewise linear equation is regressed using its onset point and ending point, as shown by yl(k) )
Figure 17. Model comparison of the true model and the improved model after slicing bad data for the high measurement noise case: solid line, true model; dashed line, improved model; dashed-dotted line, corrupted model based on all the test data.
magnitude colored noise. The results for a moderate and a largeincrease in the sensor noise indicate that the ADM does not appear to be adversely affected by sensor noise levels. 5. Conclusion Based on the improved MPC model using piecewise linearization, an asymptotical detection method (ADM) was proposed to identify MPC training data corrupted by unmeasured disturbances. Different cases of unmeasured disturbances based on a simulation of a FCC unit were used to evaluate the efficiency of this detection method. It was found the asymptotical detection method can iteratively identify most of the corrupted training data that can undermine the fidelity of MPC models. After removing the corrupted data, the correspondence between the regressed MPC model and the “true” process model was shown to improve significantly. Finally, the ADM was shown not to be particularly sensitive to the level of colored sensor noise. Appendix A: The Proof of Theorem 1 Let G0 represent the true process model, which is not corrupted by unmeasured disturbances. G ˆ represents a model corrupted by unmeasured disturbances. Assume ∂ζN(G ˆ )/∂G ˆ ˆ) exists, then, we apply a first-order Taylor expansion to ζN(G about G0: ζN(G ˆ ) ≈ ζN(G0) +
∂ζN(G ˆ) (G ˆ - G0) | ∂G ˆ Gˆ )G0
Because ζN(G0) is a Gaussian distribution with a zero mean, after the models have been corrupted, ζN(G ˆ ) will converge to (∂ζN(G ˆ )/∂G ˆ |Gˆ )G0)(G ˆ - G0). The mean value of ζN(G ˆ ) is not equal to zero due to G ˆ * G0, and the covariance matrix is equal to ∑(G0). Therefore
∑(G )] ζ (G ˆ ) ∼ N[*0, ∑(G )] ζN(G ˆ ) ∼ N[0, N
0
0
under H0 under H1
Appendix B: Piecewise Linearization of MPC Plant Test Data The measurements of a controlled variable in a training data set at each sampling time, represented by [ym(1), ym(2), ‚‚‚ ym(N)], will provide points on the line segment in the process variable domain; the predictions, [yp(1), yp(2), ‚‚‚ yp(N)], corresponding to [ym(1), ym(2), ‚‚‚ ym(N)] will yield points on the line segment in the transform domain. Within the range of
ym(k) - ym(i) ym(j) - ym(i)
× (yp(j) - yp(i)) + yp(i)
(B1)
where ym(k) represents the measurement, and yl(k) is the linearized value of ym(k). [ym(i),yp(i)] and [ym(j),yp(j)] are the boundary points of a segment. By locating the interval [ym(i),ym(j)] in which ym(k) lies, the corresponding eq B1 is used to calculate yl(k), the linearized value. It should be noted that the number of segments must be less than, at most equal to, that of the number of sample data. By adjusting the number of segments, we may regress different piecewise linear equations using eq B1. The different level of nonlinearity of the linearized data sets can be obtained. In order to quantify the linear content of piecewise linearized data, we define the quantity DL )
∑|y (t) - y (t))| (yl(t) - ym(t)
1 N
p
(B2)
m
as the linear degree of the linearized data, where yl is the linearized data, ym is the measurement, and yp is the prediction of ym. From this definition, it was found that if the number of segments increases, the value of DL will be asymptotically approach to 1. That is to say, as a set of data is linearized the value of DL approaches unity 1, i.e., DL ) 1. The equality, DL ) 1, will hold, only if the number of segments is equal to the number of data points in the data sets. When the raw training data set is linearized by the piecewise linearization, a linearized training data set will be obtained. The nonlinearity of process is one of the sources of model mismatch. Therefore, when the nonlinearity of training data is reduced, one can identify a model which will be expected to be improved compared to the original one. Appendix C: The Proof of Theorem 2 Given ζN(G0) and ζN(G ˆ ), we have stated that ζN(G ˆ ) can be approximated at G0 by a first-order Taylor series ζN(G ˆ ) ≈ ζN(G0) +
ˆ) ∂ζN(G | (G ˆ - G0) ∂G ˆ Gˆ )G0
(C1)
where G ˆ and G0 are the coefficients of identified models which are regressed using the original training data set (Xk,Yk) and the linearized training data set [Xk,(YL)k], respectively. From the definition of normalized residual ζN(G ˆ ), it can be proven that ∂ζN(G ˆ )/∂G ˆ is equal to (from eq 13), i.e. ∂ζN(G ˆ) ∂G ˆ
)
1
N
∑X X
T k k
xN k)1
By the least-squares identification methodology, we may have N
G0 ) (
∑
N
[XkTXk])-1
k)1
∑ X (Y ) T k
N
T -1 k k
k)1
(C2)
N
∑ [X X ]) ∑ X Y
G ˆ )(
L k
k)1
T k k
k)1
(C3)
342
Ind. Eng. Chem. Res., Vol. 47, No. 2, 2008
Substituting C2 and C3 into C1, we can have
χ2global ) ζN(G0)T
N
1
ˆ ) ≈ ζN(G0) + ζN(G
Then applying C7 and C8 to C6 can be expressed as
∑
xN k)1
XTk [Yk
- (YL)k]
(C4)
∑
∑
>0
YP - Y(t) Then we can obtain
(YL)k - Yk ) DL((YP)k - Yk) Then C4 will be rewritten as ζN(G ˆ ) ≈ ζN(G0) +
∑ X DL[Y T k
xN k)1
k
- (Yp)k]
(C5)
Now we want to look for a new threshold for ζN(G ˆ ), follow the equation
∑
-1
χ2global ) ζN(G ˆ )T
(
N
1
) ζN(G0) +
∑
xN k)1
∑
(G0) ζN(G0) +
- (Yp)k
)]
1
N
×
∑ X DL[Y
xN k)1
T k
k
]
- (Yp)k)
which can be expanded as
∑
∑
[
∑
∑
]
φ2 )
(
1
xN
-1
DL
∑X [Y - (Y ) ] ∑ (G ) T k
T
k
P k
0
(
1
xN
DL
∑X
T k
[
Yk - (YP)k
)]
Then we will have χ2global e2χ2standard + 2φ2
xχ2standard + xφ2
So, we have Min{2χ2defined + 2φ2} ) xχ2standard + xφ2 Then, we can propose the new threshold for the asymptotical detection method based upon piecewise linearization:
∑ 2ζ (G ) ∑
-1
χ2global ) ζN(G0)T
(G0)ζN(G0) + 1 N T -1 (G0) Xk DL[Yk - (Yp)k] + k)1 xN
T
N
] ] ]
∑
2χ2standard + 2φ2 gχ2standard + φ2 + 2 xχ2standard xφ2 )
T
XTk DL[Yk
(
-1
[ [ [
As stated before, we desire to find the new threshold which is required as small as possible in order to reinforce the efficiency of the prediction algorithm. On the other hand, it is easy to prove that the minimum value of the right hand side is
(G0)ζN(G ˆ)
ˆ) replacing ζN(G χ2global
∑
Defining the second term of right hand side of inequality C9 N
1
-1
(G0)ζN(G0) + 2ζN(G0)T × 1 N T -1 (G0) Xk DL[Yk - (Yp)k] + xN k)1 T 1 N T Xk DL(Yk - (Yp)k) × xN k)1 1 N T -1 (G0) Xk DL(Yk - (Yp)k) < 2χ 2defined + xN k)1 T 1 N T Xk DL(Yk - (Yp)k) × 2 xN k)1 1 N T -1 (G0) Xk DL(Yk - (Yp)k) (C9) xN k)1
Note the definition of DL, for the sake of simplicity if it is assumed YL(t) - Y(t)
∑
0
[
1
∑
]
N
∑ X DL(Y T k
xN k)1
∑
-1
[
(G0)
χ2improved ) (xχ2standard + xφ2)2
Appendix D: Random Walk Unmeasured Disturbances
T
k
1
- (Yp)k) N
∑
xN k)1
In this study, an autoregressive model is used to model process unmeasured disturbances
×
]
XTk DL(Yk - (Yp)k) (C6)
On the other hand, for any n-dimension vector A and B, we can easily prove that 2ATB e A2 + B2
(C7)
Furthermore, we can select the original threshold, χ2standard, from the standard χ2 table by the freedom and specifying a false rate such that ζN(G0)T
∑
-1
(G0)ζN(G0) < χ2standard
(C10)
(C8)
X(k + 1) ) X(k) + e(k)
(D1)
where X(0) is equal to the initial value of the disturbance variable, 1, and e(k) is white noise. By taking different values of the standard deviation for the white noise, different levels of white noise can be generated. From these levels of white noise, a variety of random changes can be obtained using D1. In the study, unmeasured process disturbances are first generated by taking the standard deviation as 0.05, then different scale values, λ, equal to 0.01, 0.02, and 0.03 are used to scale the unmeasured disturbances generated from the standard deviation 0.05 to obtain different degrees of unmeasured disturbance generated. The scale equation is Xkλ ) 1 +
Xk - 1 λ 0.05
Ind. Eng. Chem. Res., Vol. 47, No. 2, 2008 343
where λ is the scale value, Xλk is referred to as the scaled unmeasured disturbance by λ at time k, and Xk is referred to as the disturbance value generated by standard deviation 0.05 at time k. For example, when the scale value, λ, is equal to 0.01, the corresponding disturbance is generated by scaling the original unmeasured disturbance with a value of 0.01. Literature Cited (1) Qin, S. J.; Li, W. Detection and Identification of Faulty Sensors in Dynamic Processes. AIChE J. 2001, 47, 1581. (2) Basseville, M. On-board Component Fault Detection and Isolation Using the Statistical Local Approach. Automatica 1998, 34, 1391. (3) Zhang, Q.; Basseville, M.; Benveniste, A. Early warning of slight changes in systems. Automatica 1994, 30, 95. (4) Zhang, Q.; Basseville, M.; Benveniste, A. Fault detection and isolation in nonlinear dynamic systems: A combined input-output and local approach. Automatica 1998, 34, 1359. (5) Narasimhan, S.; Mah, R. S. Generalized Likelihood Ratios for Gross Error Identification in Dynamic Processes. AIChE J. 1998, 34, 1321. (6) Tong, H.; Crowe, C. M. Detecting Persistent Gross Errors by Sequential Analysis of Principal Component. AIChE J. 1997, 43, 1242. (7) Singhal, A.; Seborg, D. E. Dynamic Data Rectification Using the Expectation Maximization Algorithm. AIChE J. 2000, 46, 1556. (8) MacGregor, J. F. On-line Statistical Process Control. Chem. Eng. Prog. 1988, 22. (9) Ljung, L. System Identification: Theory for the user; Prentice Hall: Englewood Cliffs, NJ, 1987. (10) Ljung, L. Asymptotic Variance Expressions for Identified BlackBox Transfer Function Model. IEEE Trans. Auto. Control 1985, 30, 834.
(11) Zhu, Y. Multivariable process identification for MPC: the asymptotic method and its applications. J. Process Control 1998, 8, 101. (12) Dayal, B. S.; MacGregor, J. F. Identification of Finite Impulse Response Models: Methods and Robustness Issues. Ind. Eng. Chem. Res. 1996, 35, 4078. (13) Zhu, Y. MultiVariable System Identification for Process Control; Pergamon: 2001. (14) Basseville, M.; Nikiforov, I. V. Detection of Abrupt Changes Theory and Application; Prentice Hall: 1993. (15) Private communication with Dr. Charlie. R. Cutler, 2003. (16) Basseville, M.; Benveniste, G. M.; Roug´ee, A. Optimal sensor location for detecting changes in dynamical behavior. IEEE Trans. Auto. Control 1987, 32, 1067. (17) Benveniste, A. M.; Metivier, M.; Priouret, P. Adaptive Algorithms and Stochastic Approximations. In Applications of Mathematics; SpringerVerlag: Berlin, 1990; Vol. 22. (18) Han, I. S.; Chung, C. B. Dynamic Modeling and Simulation of a Fluidized Catalytic Cracking Process. Chem. Eng. Sci. 2001, 56, 1951. (19) Riggs, J. B.; Karim, M. N. Chemical and Bio-Process Control; Ferret Publishing: Lubbock, TX, 2006. (20) Fox, R. F.; Gatland, I. R; Soy, R.; Vemuri, G. Fast, Accurate Algorithm for Numerical Simulation of Exponentially Correlated Colored Noise. Phys. ReV. 1988, 38, 5938. (21) Ott, R. L; Longnecker, M. An Introduction to Statistical Methods and Data Analysis, 5th ed.; Thomson Learning: 2001.
ReceiVed for reView December 17, 2006 ReVised manuscript receiVed September 21, 2007 Accepted September 24, 2007 IE061623+