Ind. Eng. Chem. Res. 2006, 45, 1677-1688
1677
Statistical Monitoring of Dynamic Multivariate Processes - Part 2. Identifying Fault Magnitude and Signature Dirk Lieftucht,† Uwe Kruger,*,† Lei Xie,*,‡,⊥ Tim Littler,§ Qian Chen,| and Shu-Qing Wang‡ Intelligent Systems and Control Research Group, Queen’s UniVersity Belfast, Belfast BT9 5AH, U.K., Institute of AdVanced Process Control, National Key Laboratory of Industrial Control Technology, Zhejiang UniVersity, Hangzhou 310027, P.R. China, Energy Systems Research Group, Queen’s UniVersity Belfast, Belfast BT9 5AH, U.K., College of Aerospace Engineering, Nanjing UniVersity of Aeronautics and Astronautics, Nanjing 210016, P.R. China, and Department of Process Dynamics and Operation, Technical UniVersity of Berlin, 10623 Berlin, Germany
This paper presents the second part of the two-part analysis of statistical monitoring of complex multivariate processes. Part I introduced an effective method to remove both autocorrelation and cross-correlation from the monitored variables. This part shows that this method, i.e., the removal of correlation, can considerably influence the magnitude and signature of fault conditions. To overcome this problem, this part introduces a compensation scheme that retains the correct magnitude and signature of step-type faults. The paper extends the compensation scheme to provide accurate estimates of the magnitude and signature of general deterministic fault conditions. The utility of this scheme is shown using simple univariate and multivariate examples, as well as an application to a benchmark simulator, and recorded data from an industrial distillation unit. 1. Introduction The work presented in this paper considers a consequent effect of removing autocorrelation and cross-correlation from monitored variables using a filtering method. In particular, filtering of the score variables can inherently alter both the magnitude and underlying signature of deterministic fault conditions, which is highly undesirable and should, therefore, be circumvented. Although the application of the Kalman innovation filter removes the effect of autocorrelation and cross-correlation upon the significance level for hypothesis testing, this paper demonstrates that an incipient fault condition may go unnoticed by plant operators if it appears to be substantially reduced in magnitude and, thus, not apparent. Moreover, the paper shows that the application of the Kalman innovation filter can also produce a different set of score variables in response to an abnormal event. This clearly hampers the diagnosis of such events. To overcome these cited problems for step-type faults, in particular, typically resulting from sensor bias or failure, this paper introduces a compensation scheme that (i) determines the correct magnitude and (ii) reveals the correct set of score variables that contributes to a deterministic fault condition. This compensation scheme is then extended to general deterministic fault conditions to (i) provide an accurate estimate of fault magnitude, (ii) approximate the fault signature, and (iii) reveal the correct set of score variables that are affected by a fault condition. The paper is divided into the following sections. Section 2 presents a brief review of dynamic process monitoring, followed by an examination of problems resulting from an application * Corresponding authors. E-mail:
[email protected] (U.K.),
[email protected] (L.X.). Tel.: +44 (0) 28 90 974059. Fax: +44 (0) 28 90 667023. † Intelligent Systems and Control Research Group, Queen’s University Belfast. ‡ Zhejiang University. § Energy Systems Research Group, Queen’s University Belfast. | Nanjing University of Aeronautics and Astronautics. ⊥ Technical University of Berlin.
of a Box-Jenkins (ARMA) filter in a univariate context. Section 3 analyzes the influence of filtering the score variables on the magnitude and signature of step and general deterministic fault conditions and subsequently proposes the inclusion of compensation, applied first to a univariate ARMA filter and then to a Kalman innovation model structure for the multivariate context using a number of simplified examples. The utility of the compensated scheme is then demonstrated in Sections 4 and 5 using the Tennessee Eastman (TE) simulator1 and recorded data from an industrial distillation process, respectively, prior to a concluding summary of the work in Section 6. 2. Problem Analysis Following a brief review of dynamic principal component analysis (DPCA) and the utilization of the Kalman innovation filter structure in Subsection 2.1, this section highlights potential problems with such filters in accurately representing the impact of abnormal process behavior. For univariate cases, the influence of ARMA filters upon step-type faults is demonstrated in Subsection 2.2 and that for general deterministic fault conditions is demonstrated in Subsection 2.3. This analysis is then extended in a multivariate framework in Subsection 2.4. Prior to this analysis, we note that any state-space model in Kalman innovation form can be described by an equivalent ARMA model.2 2.1. Brief Revision of DPCA and Score Variable Filtering. DPCA3 is characterized as principal component analysis (PCA) applied to an augmented matrix Z, Z ) [ X0 X-1 ... X-d ], in which a total of N observations of n recorded process variables are arranged to construct an autoregressive model structure of order d. For the kth instance, the DPCA decomposition of Z produces two univariate statistics, T2(k) ) tT(k)Λ-1t(k) and Q(k) ) zT(k)[ I - PPT ] z(k), for which the score vector t(k) ) PTz(k) and Λ is a diagonal matrix storing the variance of the score variables. The Hotelling’s T2 and Q statistics, T2(‚) and Q(‚), respectively, along with scatter diagrams of the score variables,4-6 can be used for on-line process monitoring. Recent published research7 has revealed that dynamic process behavior leads to the production of serial correlation of the
10.1021/ie060017b CCC: $33.50 © 2006 American Chemical Society Published on Web 02/03/2006
1678
Ind. Eng. Chem. Res., Vol. 45, No. 5, 2006
Figure 1. Residual sequence describing step-type fault of magnitude 3, introduced after 400 samples.
DPCA score variables. In addition, the first part of this work (Part I14) has shown that autocorrelation of, and cross-correlation between, score variables is to be expected with this type of behavior and that both types of correlations have the potential to substantially affect the significance level for hypothesis testing. To remedy this unwanted effect, Part 1 demonstrated that applying a Kalman innovation model structure could effectively remove both types of correlations from the score variables and prevent false alarms in excess of (1 - R) × 100%, with R being the statistical confidence of typically 0.95 or 0.99, for example. More precisely, instead of using the computed score variables, t(k), the Kalman innovation sequences, e(k) ) t(k) - Cζ(k) and ζ(k) ) Aζ(k) + Ke(k - 1), where A and C are state-space matrices and K is the Kalman gain, are used to construct the Hotelling T2 statistic and scatter diagrams. 2.2. Effect on Step-Type Faults. This section considers the outstanding issues highlighted in the Introduction concerning the impact of filtering in a condition-monitoring context. This is demonstrated here using a single autocorrelated score variable that is filtered by an ARMA filter and defined by q
e(k) ) t(k) +
p
ψi t(k - i) - ∑φi e(k - i) ∑ i)1 i)1
(1)
where ψi and θi are parameters of the ARMA filter, p and q are the number of lagged moving-average and autoregressive terms, and e(‚) is assumed to be a normally distributed sequence of zero mean and variance σe, i.e., e(‚) ∈ N(0,σe). To study the influence of the filter, characterized by p ) q ) 1, θ ) 0.6, and ψ ) -0.6, upon a step-type fault, a data set containing 1000 samples with e(‚) ∈ N(0,10-4) was simulated and a fault level of f ) 3 was superimposed on the last 600 samples of the sequence t(‚). Figure 1 shows that, although this fault was actually detected, the residual sequence e(‚) incorrectly revealed a fault magnitude of 0.75.
It should also be noted that the initial signature of the fault is also incorrectly described by the inverse ARMA filter. These alterations are a direct result of the transfer function of the ARMA filter,
e(z-1) t(z-1)
)
1 - 0.6z-1 ) T-1(z-1) 1 + 0.6z-1
(2)
where T-1(z-1) is the inverse transfer function and z-1 represents the backshift operator, i.e., z-1t(k) ) t(k - 1). Since the application of an inverse ARMA filter and superposition of the fault to the score variable are both linear operations, we can describe the residuals and the “filtered” step-type fault by -1 Ft(z ) -1 Fe(z )
) 0t(z-1) + f(z-1)
) 0e(z-1) + T-1f(z-1)
(3)
where Ft(z-1) and Fe(z-1) are the autocorrelated and residual sequences describing the fault condition and the subscript 0 refers to the original variables that are not impacted by the fault condition. Defining the magnitude of the step-type fault by η, the filtering of such a sequence by the transfer functions T-1(z-1) results in a dynamic step response function and a change in magnitude, e∞ ) limzf1{T-1(z-1)η}, unless the gain of T-1(z-1) is equal to 1, which cannot be generally assumed. With respect to the filter in eq 2, the sequence T-1(z-1) converges to
1 - 0.6z-1 × 3 ) 0.75 zf1 1 + 0.6z-1
e∞ ) lim
and the dynamic response of this filter to a step-type input is strongly oscillatory for the first 5-10 samples. Hence, the filter alters the magnitude and the initial signature of the step-type fault. Next, the impact of general deterministic fault conditions is examined.
Ind. Eng. Chem. Res., Vol. 45, No. 5, 2006 1679
Figure 2. Residual sequences describing a general deterministic fault condition, introduced after 400 samples.
2.3. Effect of General Deterministic Fault Conditions. To study the effect of general deterministic fault conditions, a simulated deterministic fault was superimposed onto an autocorrelated sequence t(‚). This sequence contained 1000 samples and was described by the ARMA process used in the previous subsection. The fault signature had the following description,
f(z-1) u(z-1)
)
0.0074z-1 + 0.0075z-2 ) F(z-1) 1 - 1.9802z-1 + 0.9851z-2
(4)
where f(‚) and u(‚) represented the fault sequence and a unit step sequence after 400 samples, respectively. After superimposing f(‚) to t(‚), the resulting sequence was filtered using the inverse ARMA filter in eq 2 to produce the following transfer function:
e(z-1) -1
u(z )
)
1 - 0.6z-1 0.0074z-1 + 0.0075z-2 ) 1 + 0.6z-1 1 - 1.9802z-1 + 0.9851z-2 T-1(z-1)F(z-1) (5)
Figure 2 shows the original autocorrelated sequence, t(‚), and the filtered sequence, e(‚), utilized for statistical process monitoring. From this figure, it is apparent that the magnitude of the fault has been significantly reduced. More precisely, the oscillatory response of F(z-1) converges to a final value of 3, i.e., limz)1 F(z-1) ) 3, and the reduction in magnitude, which results from the inverse ARMA filter, produces a final value of 0.75. According to eq 5, it should also be noticed that the filter T-1(z-1) alters the signature of the fault, but to a much lesser extent than that observed in the previous example. 2.4. Multivariate ARMA Filters. In this subsection, the impact of ARMA filters in a multivariate context is analyzed
using a simple example study that includes two score variables, t1(k) and t2(k), which have the following definition,
( ) [
]( [
) ( ) ]( )
t1(k) e (k) 0.118 -0.191 t1(k - 1) ) + 1 + t2(k) t2(k - 1) e2(k) 0.847 0.264 0.447 - 0.809 e1(k - 1) e2(k - 1) 0.859 0.130
(6)
where eT(k) ) (e1(k) e2(k)) ∈ N(0,I). Using the above ARMA filter, a total of 1000 samples of e(k) and t(k) were produced. The recorded scores were augmented by superimposing the following step-type fault on the basis that only the first score variable would be affected by the following abnormal event:
f(k) )
{( ) 10 0 0
∀ k > 400
(7)
∀ k e 400
The upper plot in Figure 3 shows, in particular, the score variables that correspond to a step-type fault after 400 samples. The lower plot in Figure 3 shows the residuals of the score variables and demonstrates that the fault magnitude, signature, and score variables that correspond to the fault are incorrectly estimated in this instance. For step-type faults, the filtered score variables are known to converge to the following steady-state p p values, e∞ ) [I + ∑i)1 Ψi]-1[I - ∑i)1 Θi]µ, where µT ) (10 0) is a vector that represents the fault magnitude for each score variable and Ψi and Θi are parameter matrices that correspond to the multivariate moving-average and autoregressive terms, respectively. Thus, the filtered sequences converged to the following steady-state values:
( ) [
e1∞ 1.447 -0.809 e2∞ ) 0.859 1.130
][ -1
]( ) ( )
0.882 0.191 10 ) -0.847 0.736 0 1.337 -8.512
(8)
1680
Ind. Eng. Chem. Res., Vol. 45, No. 5, 2006
Figure 3. Residual sequences describing a step-type fault of magnitude 10 introduced to the first score variable, injected after 400 samples.
In this example, it was determined that the filtered score variables from the simulation converged to theoretical values predicted by eq 8. Hence, the application of an ARMA filter in a multivariate context not only affects the magnitude and signature of fault conditions, it also affects which particular score variables correspond to an abnormal event. The next section proposes a compensation scheme for step-type and general deterministic fault conditions in a univariate context and extends this concept to a multivariate context.
Subsection 3.1 introduces the compensation scheme for a single score variable for step-type faults. Subsection 3.2 extends this scheme for general deterministic fault conditions, and Subsection 3.3 explores this approach in a multivariate context. 3.1. Compensation for Step-Type Faults (Univariate Case). The compensation for a step fault is based on the definition of an inverse ARMA filter,
3. Compensation for Filtered Score Variables
where v(k - 1) ) (t(k - 1) ‚‚‚ t(k - q))T, e(k - 1) ) (e(k 1) ‚ ‚ ‚ e(k - p))T, and θ and ψ are vectors containing the parameters of the ARMA filter. The occurrence of a step fault is characterized by the superposition of a constant to t(k). If the magnitude of the step-type fault is 1, the identified magnitude q of the inverse ARMA filter is given by e∞ ) (1 + ∑i)1 ψi)/(1 + p ∑i)1θi) (ref 11) and, depending on the parameters, may differ considerably from 1. More precisely, two extreme cases can be established:
The previous section showed that filtering the score variables using inverse ARMA filters might have a significant influence upon the resultant monitoring statistics. More precisely, the filter residuals may yield a substantially different magnitude and an altered signature, as compared to the original score variables. This implies that such filters influence sensitivity when detecting abnormal conditions, which is undesirable. Moreover, the filtering process may also result in a different set of residual variables being affected by a fault condition. This hampers the diagnosis of abnormal fault conditions. For a single variable, previous work by Harris and Ross8 showed that positively autocorrelated AR models tend to be insensitive to step-type faults. Wardell et al.9 extended this analysis to ARMA filters, focusing on the fault signature and the average run length of step-type faults. This was further investigated by Dyer et al.,10 who showed that an inverse ARMA filter may converge to a different steady-state value for steptype faults. In this paper, a compensation scheme is introduced to accommodate this undesired influence. This scheme is developed for a single score variable first and then extended to be a multivariate framework. In terms of fault conditions, the compensation scheme is first developed for step-type faults and later generalized to approximate any deterministic fault signature.
e(k) ) t(k) + ψTv(k - 1) - θTe(k - 1)
p
(i) 1 +
q
θi . 1 + ∑ψi f e∞ ≈ 0 ∑ i)1 i)1 p
(ii) 1 +
(9)
∑ i)1
q
θi , 1 +
ψi f e∞ . 1 ∑ i)1
Case (i) produces a fault impact that has been considerably reduced, which may render it undetected; Case (ii) presents a very sensitive monitoring approach. To reveal the actual effect of a step fault on t(k), the compensation scheme is defined as follows,
(k - 1) )
k-1 e(i)
∑ i)j k-j
(10)
Ind. Eng. Chem. Res., Vol. 45, No. 5, 2006 1681
Figure 4. Approximation of general deterministic fault condition using a fixed window approach.
where j represents the sample when the step fault first occurred, as established by the existing fault detection scheme. This compensation term, E(k - 1), is now subtracted from the score sequence v(k - 1) and the filtered score sequence e(k - 1) such that
e(k) ) t(k) + ψT(v(k - 1) - 1qE(k - 1)) - θT(e(k - 1) 1pE(k - 1)) (11) e(k) ) t(k) + ψTvˆ (k - 1) - θTeˆ (k - 1) Here, 1q and 1p are vectors of ones with dimensions of q and p, respectively. The aim of the compensation term is, thus, to isolate the influence of the ARMA filter from the step-type fault. Consequently, when a new sample becomes available, the departure of this sample from the prediction of the “corrected” inverse ARMA filter should, therefore, converge to the correct value contained in the fault. It should be noted that the fault, i.e., the integer j, can be detected by one or more of the T2 and Q statistics as well as the scatter diagrams. In contrast, Dyer et al.10 highlighted that the magnitude may be altered without the application of the above compensation term. Subsection 3.3.1 which follows provides a proof to show that the compensation term can remove the impact of a steptype fault from a multivariate ARMA filter. Next, the proposed compensation scheme is extended to approximate general fault signatures. 3.2. Compensation for General Deterministic Fault Signatures (Univariate Case). The aim of the proposed scheme is to remove the impact of step-type faults on past values of the score variable, v(k - 1), and the filtered score variable, e(k - 1). As demonstrated in Subsection 3.3.1, the difference between t(k) and the prediction of the ARMA filter, using the “corrected” sequences vˆ (k - 1) and eˆ (k - 1), converges to the “true” magnitude of the step-type fault. In a general case, however, the fault signature cannot be assumed to be a step. To approximate a general fault signature, the window in which the fault condition exists, including the samples between j and k, is divided into several smaller and disjunct windows. The compensation is then applied to each window individually. Figure 4 shows this process schematically. In this figure, w is the base window length, and h is an integer counter indicating which window is currently being used. To guarantee that only the instances inside the current window contribute to the determination of the compensation term, (.) must be set to zero after each window is passed. The window size, w, determines how accurately the fault signature can be approximated. Importantly, it is assumed that the fault is superimposed on the score variable, which represents a stochastic sequence of zero mean. More precisely, the sequence of the score variables, Ft(k), is equal to the “true” sequence 0t(k) plus the fault signature f(k), i.e., Ft(k) ) 0t(k) +
f(k). If w is selected to be too short, the stochastic component of 0t(k) will not be negligible when eq 10 is used to determine the compensation term. In contrast, if w is chosen to be too long, the fault signature cannot be approximated with high accuracy. This is demonstrated using the following simulation examples. 3.2.1. Simulation Example for a Fixed Window Size. The utility of the proposed compensation scheme for one score variable using a fixed window is now demonstrated using the example studied in Subsection 2.3. Using the transfer function (1-0.6z-1)/(1 + 0.6z-1), a sequence of 1000 samples was produced. The upper plot in Figure 5 shows the constructed score variable sequences. Given the description of the score variable in this example, it is apparent that the magnitude of a step-type fault is reduced by a factor of 4. Consequently, from an application of the inverse ARMA filter to remove the autocorrelation, the filtered score variable converges to a final value of 0.75, as observed in the middle plot of Figure 5. In the case of a general deterministic fault, using the proposed approach, a special consideration of the window size must be established. With a window size of w ) 15, the compensation produced the graph shown in the lower plot of Figure 5. 3.2.2. Simulation for Variable Window Size. The selection of the window size is a tradeoff between the ability of the compensation scheme to approximate the fault signature f(k) (short window size) and the reduction of influence of the stochastic score variable 0t(k) upon the compensation term (k) (large window size). This is exemplified by applying a very small window size, w ) 2, a very large window size, w ) 100, and the selected window size of w ) 15. Figure 6 compares the resultant sequences and indicates the following: • For the short window size, w ) 2, the variance of the stochastic component significantly increased between the 401st and 1000th sample, compared with the variance of the first 400 samples (which are not affected by the superimposed fault condition), but the fault sequence is correctly represented. • For the selected window size, w ) 15, the variance of the stochastic component of the compensated signal does not significantly differ from the original sequence e(.) and the fault signature is approximated with high accuracy. • For the long window size, w ) 100, the variance of the stochastic component of the compensated signal is almost equivalent to the uncompensated sequence; however, the signature of the fault condition is incorrectly represented. Table 1 shows the variance of the compensated sequence e(k) for various values of w and indicates that, with an increasing window length, the variance of e(k) converges, as expected, to the variance of the uncompensated sequence. The variances in Table 1 were obtained using the first 400 samples. It should be noted that increasing the window size from 2 to 15 reduces the variance sharply from 0.0049 to 0.0031. In contrast, increasing the window size further only leads to a slow reduction in variance. Consequently, the analysis of Figure 6 and Table 1 reveals that a window size of 15 presents an acceptable trade off between how much the stochastic variable 0t(k) affects the computation of the compensation term (k 1) and how accurately the fault signature can be approximated. This comparison, therefore, reveals the following heuristic for determining the window size: • Start with a larger window size, e.g., 100, and iteratively reduce the window size until the stochastic component of the compensated signal is significantly different from the uncompensated sequence.
1682
Ind. Eng. Chem. Res., Vol. 45, No. 5, 2006
Figure 5. Resultant plots of general deterministic fault condition (third example study). Upper plot f original score variable, middle plot f filtered score variable, and lower plot f filtered and compensated score variable.
Figure 6. Resultant plots for varying window length w (third example study). Upper plot f resultant plot for a window length of w ) 2, middle plot f resultant plot for a window length of w ) 15, and lower plot f resultant plot for a window length of w ) 100.
• Select the smallest window size for which the variance of the stochastic sequence, after compensation, is not significantly different from the variance of the uncompensated sequence. This prevents the compensation scheme from providing a poor approximation of a deterministic fault signature, as demonstrated in the bottom plot of Figure 6. It should be noted, however, that the compensation scheme may lead to departures if the fault
signature varies considerably over a small period of time. However, the fault can be approximated more accurately by using the compensation scheme than without, as demonstrated by this simplified example. The extension of the introduced compensation scheme to a multivariate context is discussed next. 3.3. Compensation for Multivariate Processes. In a multivariate context, the application of a Kalman innovation filter
Ind. Eng. Chem. Res., Vol. 45, No. 5, 2006 1683
where ET(k - 1) ) (1(k - 1) 2(k - 1) ‚ ‚ ‚ m(k - 1)) is defined as follows:
Table 1. Residual Variance of the Compensation Scheme for General Deterministic Fault Conditions for Various Window Lengths window length
variance
2 5 10 15 20 50 100 200 ∞
0.0049 0.0043 0.0039 0.0031 0.0029 0.0028 0.0026 0.0025 0.0024
k-1
E(k - 1) )
q
{
0∀k