1516
Ind. Eng. Chem. Res. 2001, 40, 1516-1527
Principle Component Analysis Based Control Charts with Memory Effect for Process Monitoring Junghui Chen* and Chien-Mao Liao Department of Chemical Engineering, Chung-Yuan Christian University, Chung-Li, Taiwan 320, Republic of China
Franz Ren Jen Lin China Glaze Company, Ltd., Chutung, Hsinchu, Taiwan, Republic of China
Muh-Jung Lu China Steel Corporation, Steel Making Process Section, Kaohsiung, Taiwan 81233, Republic of China
In this research, the combination of the principal component analysis (PCA) model and the traditional techniques with memory effect is developed for detecting the process influenced by a small or a moderate shift in one or more process variables. The traditional techniques, like the exponentially weighted moving average and the cumulative sum, use additional information from the past history of the process for keeping the memory effect of the process behavior trend. On the other hand, PCA can find a set of combination variables which can describe the key variations and trends for the operating data. This integrated method is particularly important for long-term performance deterioration, such as product quality degradation, because gradual small shifts are more difficult to diagnose than a sudden failure of equipment. The complementarity of these methods not only leads to some cross-fertilization between various techniques but also results in a better model. The comparison between the properties of the different combinational methods and PCA in terms of the mathematical definitions is discussed. Furthermore, the proposed methods are demonstrated through two real industrial case studies: a melting process in the glaze industry and the surface quality in a stainless steel slab. 1. Introduction To safely operate a plant or a process, various fault detection and diagnosis methodologies have been proposed. Statistical process control (SPC) charts are commonly used tools for monitoring a process and improving product quality. In a single-variable system, the Shewart chart1,2 was the first proposed method of testing the hypothesis. However, the Shewart chart had a poor performance at detecting small or moderate shifts in the process mean. Cumulative sum (CUSUM) charts proposed by Page detected shifts in the mean level of a process.3,4 Roberts proposed the exponentially weighted moving average (EWMA) chart5 as an alternative to the Shewhart chart. However, the quality of a production process output was often measured by the joint level of several correlated characteristics. In this situation, separate univariate control charts for each measured variable often resulted in false detection when these variables were mutually correlated. In the past, practitioners in industries preferred monitoring a single chart with the combination of multiple process characteristics to giving attention to individual process characteristics using individual control charts. Nowadays, the use of multivariable control charts to monitor industrial processes is increasingly popular. Several advanced techniques have been developed for multivariable SPC (MSPC). For example, Hotelling’s T 2 * To whom all correspondence should be addressed. Email:
[email protected]. Tel: 886-3-4563171 ext 4107. Fax: 886-3-4563171.
statistic was one of the common methods of constructing multivariable control charts.6-8 Like the univariate Shewhart control chart, it was also difficult for Hotelling’s T 2 to identify small or moderate shifts. Because of the CUSUM and EWMA techniques’ sensitivity to small shifts in the univariate process, a logical extension of multivariable CUSUM control charts9 and multivariable EWMA (MEWMA) control charts10-12 was developed. Both methods were based on the concept of the covariance matrix and additional information from the past history of the process. Crosier13 proposed the combinational charts with T 2 and CUSUM and found out that the performance in the integrated model outperformed that in the T 2 model only. Although this method reinforced the direction for the data with a small shift, the solution to the normal T 2 statistic could change drastically for just a small change in the data if the data covariance matrix was nearly rank deficient. In recent years, extensive research in the field of chemical process control has applied a variety of chemometric algorithms to multivariable process monitoring, such as principal component analysis (PCA) and partial least squares.14-16 The basic concept is to extract a set of combinational variables, or major principal components, which can describe the key variations and trends for the operating data. PCA can reduce dimensions when monitoring the process performance, because the key measured variables should be identified to describe the significant behavior of the process. This gives a better understanding of the fault event taking place in the process. However, like Shewhart-type
10.1021/ie000407c CCC: $20.00 © 2001 American Chemical Society Published on Web 02/22/2001
Ind. Eng. Chem. Res., Vol. 40, No. 6, 2001 1517
control charts, PCA cannot exactly detect process faults with a small disturbance because it uses information only from the most current samples. Some researchers tried to combine two SPC methods to improve the detectability. For example, EWM-PCA17 utilized EWMA on the scores data from PCA, while summed-scores PCA (SSPCA)18 constructed summed scores for the last s PCA scores. In both methods, a PCA model was first obtained. Then EWMA or s summed was applied in the scores as a basis for multivariable monitoring. Although EWM-PCA and SSPCA focused on the scores for multivariable monitoring, the confidence control limits for this type of multivariable statistics were not completely developed. In this research, the combination of PCA and the traditional techniques with memory effect is developed. This approach not only processes information with memory to improve the disturbance detectability but also utilizes the joint statistical measures T 2 and the squared residual statistic Q based on the conventional statistical control limits of PCA. Furthermore, two unification statistical bases, Q or T 2, are developed to bring out the similarities and differences for the integrated PCA-based control charts. The main interest here is the combination of PCA with MEWMA and with the multivariate s-term sum (MSSUM). MSSUM means that descriptive statistic variables are computed based on the last s terms of the measured variables. Note that PCA with MSSUM is quite different from the so-called SSPCA because the latter focused only on the summation of the last s PCA scores. The rest of this paper is organized as follows. In section 2, PCA control charts are first reviewed. Then two combinational methods, PCA with MEWMA and MSSUM, respectively, are developed in this section. They provide a basis for comparing the PCA-type methods. The pros and cons of the control charts’ performance, design, and usage are discussed and compared in section 3. In section 4, two real case studiessa melting process in the glaze industry and the surface quality in a stainless steel slabsare employed to illustrate the validity of these proposed techniques. Finally, summaries and conclusions are presented.
standard multivariate statistical technique, it can explain the measurement data via a set of linear functions which models the combinational relationship between measurement variables and latent variables. The number of the linear functions is less than that of the original variables. There is a wide variety of literature on PCA. Valuable presentations of the subject on multivariate statistics and data analysis can be found.19,20 Here only the essential parts involved in the transformation of the original distribution data into the principal components are summarized. Given the measurement, data are normal operating process data collected and put in a two-dimensional data matrix X ∈ Rm×n with m samples and n variables. The data matrix X can be decomposed in terms of r linear principal components with r e n:
2. PCA-Based Control Charts
cRx2θ2h02 θ2h0(h0 - 1) +1+ (CLQ)PCA ) θ1 θ1 θ2
When the data are projected onto a lower dimensional space for extraction of the essential information from the characteristics of the state of the process, the data modeling technique PCA is discussed. However, it is difficult for PCA to monitor process behavior with relatively small shifts. In this section, two skills with the memory effect (MEWMA and MSSUM) based on the PCA control charts are developed. The confidence control limits, Q and T 2, are derived for MEWMA-PCA and MSSUM-PCA, respectively. Note that, in the following notations, scalars are represented in small italic letters, vectors are in small bold nonitalic letters, and matrices (two-dimensional arrays) are in capital BOLD nonitalic letters. The superscript T attached to a matrix refers to the transpose. A vector is assumed to be a column vector, and row vectors are represented as transposes. 2.1. PCA. PCA, a commonly used dimensionality reduction technique, can be employed to compress noisy and correlated measurements into a simpler and smaller informative subspace for measurement data sets. As a
X ) t1p1T + t2p2T + ... + trprT + E ) TPT + E )X ˆ +E
(1)
where X ˆ is the matrix of the desired (or predicated) values; P ∈ Rn×r and T ∈ Rm×r are defined as the principal component loading and the principal component scores, respectively. E is a residual matrix. The number of the retained principal components can be determined by using cross-validation21,22 to weed out relatively unimportant directions and reduce the system dimensions. When PCA is applied to monitoring a process, two measures are commonly used based on the PCA model: the SPE Q and the Hotelling T 2. Statistical Q. The scale value, Qk, is a measurement of the goodness of fit of the sample k, xk ) [xk,1, xk,2, ..., xk,n]T, to the model.
(Qk)PCA ) xkT(I - PPT)xk
(2)
where I is the identity matrix of an appropriate size. The upper confidence control limit for the Q statistic can directly follow the measure of the nondeterministic variation in the model19
(
1
)
1/h0
(3)
n where θd ) ∑j)r+1 λjd with d ) 1, 2, 3 and h0 ) 1 2 (2θ1θ3/3θ2 ). λj is the jth eigenvalue of the correlation matrix and cR is the standard normal deviate corresponding to the upper 1 - R percentile. Statistical T 2: The scale value, Tk2, is a measure of the major variation of the sample xk to the model.
(Tk2)PCA ) xkTPΛ-1PTxk
(4)
where Λ ) diag[λ1, λ2, ..., λr]. The upper control limit for the T 2 statistic is calculated by means of the F distribution as
CLT2 )
r(m + 1)(m - 1) FR,r,m-r m2 - mr
(5)
where FR,r,m-r is the upper 100R% critical point of the F distribution with r and m - r degrees of freedom. This
1518
Ind. Eng. Chem. Res., Vol. 40, No. 6, 2001
means that the data distribution assigns probability 1 - R to the ellipsoid-formed x if
(Tk2)PCA ) xkTPΛ-1PTxk e CLT2
(Tk2)MEWMA-PCA ) zkTP
-1
(2 -λ λΛ)
PTzk
(
k
λ
)
(1 - λ)k-jxjT)P Λ ∑ 2-λ j)1
(6)
) (λ
-1
×
If any change occurs in the original data space, the corresponding change happens in this hyperplane represented by the major principal components. These two variables, Q and T 2, can be implemented by graphical and numerical ways to monitor if the samples are in this accepted region. For an easy explanation in the following subsections, a data matrix with a disturbance d, D ) [x1 + d x2 + d, ..., xm + d]T, is used for testing. The statistical T 2 and Q at the kth sample become
P
(Tk2)PCA ) (xk + d)TPΛ-1PT(xk + d)
The sum of squares of the residuals for Q is expressed as follows:
) xkTPΛ-1PTxk + 2xkTPΛ-1PTd + dTPΛ-1PTd (7)
k
PT(λ
(
∑ j)1
)
k
(1 - λ)k-jxjT) × ∑ j)1
(1 - λ)k-jxj) + 2(λ k
-1
λ
PT(λ
Λ
2-λ
∑ j)1
k
(
P
(l - λ)k-jdT) × ∑ j)1
(1 - λ)k-jd) + (λ λ
2-λ
)
k
-1
Λ
PT(λ
(1 - λ)k-jd) ∑ j)1
(12)
(Qk2)MEWMA-PCA ) zkT(I - PPT)zk k
and
) (λ
(Qk)PCA ) (xk + d)T(I - PPT)(xk + d)
k
dT(I - PPT)d (8) 2.2. Combining PCA with MEWMA. Wold (1994) first indicated that the memory effect in EWMA could be extended in the PCA model,17 but the control limits for this type of PCA model were not clearly developed. In this section, two confidence control limits Q and T 2 will be derived from the combination of PCA and MEWMA, MEWMA-PCA. The MEWMA model is first built based on the originally measured variables of the process, and then PCA is used to extract the correlation between MEWMA variables. A multivariable version of the EWMA (MEWMA) control chart, an extension of the univariate EWMA, is defined as
(9)
where 0 < λ e 1 and z0 ) 0. MEWMA is a statistical method that gives less weight to older data. Substituting recusively for zk, with k ) 2, 3, ..., we obtain k
zk ) λ
(1 - λ)k-jxj ∑ j)1
k
(1 - λ)k-jxj) + 2(λ∑(1 - λ)k-jxjT) × ∑ j)1 j)1
(λ
) xkT(I - PPT)xk + 2xkT(I - PPT)d +
zk ) λxk + (1 - λ)zk-1
(1 - λ)k-jxjT)(I - PPT) × ∑ j)1
(10)
The covariance matrix of MEWMA is Sz ) [λ/(2 - λ)]S as k gets larger. The loading matrix P is composed of the eigenvector of the covariance matrix of data Sz. That is, [λ/(2 - λ)]S ) [λ/(2 - λ)]PΛPT. If only the first r eigenvalues are kept to form a PCA model, the new score for the disturbance data D is
k
T
(I - PP )(λ
(1 - λ) ∑ j)1
k
k-j
(l - λ)k-jdT) × ∑ j)1
d) + (λ
k
(1 - λ)k-jd) ∑ j)1
(I - PPT)(λ
(13)
The control limit of T 2 for the combination of PCA and MEWMA is given by the F distribution. The F distribution of MEWMA-PCA would be different from that of PCA in the degrees of freedom because fewer principal components are needed in the MEWMA-PCA model with accumulated information. The control limit of Q becomes
(CLQ)MEWMA-PCA )
λ (CLQ)PCA 2-λ
(14)
2.3. Combining PCA with MSSUM. In this section, MSSUM is used here to improve the memory effect in the PCA model. The combination of MSSUM and PCA is called MSSUM-PCA. The concept of MSSUM-PCA is similar to that of SSPCA. However, SSPCA18 only focuses on the summed-scores construction and the control limits in the summed-scores statistic are not clearly defined. The major advantage of using MSSUMPCA is to derive two confidence control limits Q and T 2 based on the conventional statistical control limits of PCA. In MSSUM-PCA, by the cumulative sum of the past s observations, sk is defined as
k
(1 - λ)k-jxj) + ∑ j)1
k
tzk ) PTzk ) PT(λ
sk )
PT(λ
(1 - λ)k-jd) ∑ j)1
∑
xi
(15)
i)k-s+1
k
(11)
where zk ) λ(xk + d) + (1 - λ)zk-1. The Hotelling T 2 can be calculated as follows:
The covariance matrix of MSSUM is sS, and the loading matrix P is composed of the eigenvector of the covariance matrix of data. That is, sS ) sPΛPT where the first r eigenvalues are kept to build a PCA model. The
Ind. Eng. Chem. Res., Vol. 40, No. 6, 2001 1519
new score for the disturbance data D is
{
k
PT(
T
tsk ) P sk )
xj) + PT(kd) ∑ j)1
kes (16)
k
∑
T
P (
T
xj) + P (sd) s < k < n
j)k-s+1
The Hotelling T2 is
if k e s (Tk2)MSSUM-PCA ) skTP(sΛ)-1PTsk k
)(
∑ j)1
xjT)P(s?)-1PT(
k
xj) + ∑ j)1
k
2(
xjT)P(sΛ)-1PT(kd) + (kdT)P(sΛ)-1PT(kd) ∑ j)1
(17)
if s e k e m (Tk2)MSSUM-PCA ) skTP(sΛ)-1PTsk k
Figure 1. Control ellipse for two independent score variables. Points 2 (4), 15 (0), and 19 (O) are out of the elliptical bound.
k
xjT)P(sΛ)-1PT( ∑ xj) + ∑ j)k-s+1 j)k-s+1
)(
3. Comparison among PCA, MEWMA-PCA, and MSSUM-PCA
k
2(
xjT)P(sΛ)-1PT(sd) + (sdT)P(sΛ)-1PT(sd) ∑ j)k-s+1 (18)
The sum of squares of the residuals for Q is
if k e s (Qk)MSSUM-PCA ) skT(I - PPT)sk k
∑ j)1
)(
k
xjT)(I - PPT)(
xj) + ∑ j)1
k
2(
xjT)(I - PPT)(kd) + (kdT)(I - PPT)(kd) ∑ j)1
(19)
if s e k e m (Qk)MSSUM-PCA ) skT(I - PPT)sk k
k
xjT)(I - PPT)( ∑ xj) + ∑ j)k-s+1 j)k-s+1
)( k
2(
xjT)(I - PPT)(sd) + (sdT)(I - PPT)(sd) ∑ j)k-s+1
(20)
Now the control threshold in T 2 for the combination of PCA and MSSUM is given by the F distribution. The F distribution of MSSUM-PCA would be different from that of PCA in the degrees of freedom because of fewer principal components needed in the MSSUM-PCA model with accumulated information. The control limit of Q becomes
(CLQ)MSSUM-PCA ) s(CLQ)PCA
(21)
The comparison between the MEWMA approach in conjunction with PCA and the MSSUM method in conjunction with PCA is presented in this section. These methods are quite similar to EWM-PCA suggested by Wold (1994)17 and SSPCA suggested by Wachs and Lewin (1999),18 but they extract the principal components first; then the scores are performed by EWMA or s summed, respectively. In EWM-PCA and SSPCA, they only point out the use of scores instead of the individual measurement variables, but the importance of simultaneous monitoring of the individual scores is not discussed. For example, suppose that the process data are decomposed into two independent directions and associated scores t1 and t2. Figure 1 shows that the samples of t1 and t2 fall within their respective upper and lower control limits and three samples (4, 0, and O) are outside the control ellipse. This indicates that an assignable cause is presented. If the probability that either t1 or t2 exceeds 3σ control limits is 0.0027, the joint probability whose variables exceed their control limits simultaneously is (0.0027)2 ) 0.000 007 29. This joint probability is considerably smaller than the control limit in a single score. This means the type I error and the probability of joint distribution of both variables in control are not equal to their advertised level for the individual control charts. This indicates that simultaneous and independent monitoring of these score characteristics can be very misleading if two independent control charts are used. Therefore, in this paper the statistical T 2 is used for the score variables instead of the individual score control charts. In this subsection, the statistical T 2 and the statistical Q in PCA, MEWMAPCA, and MSSUM-PCA models are compared and the performance of the control charts based on these models is investigated. 3.1. Comparison in Terms of Statistical T 2. The statistical T 2 in these three control charts is quite different for the same testing data. To make a fair
1520
Ind. Eng. Chem. Res., Vol. 40, No. 6, 2001
Figure 2. Ratio of E[(Tk2)MEWMA-PCA/E[(Tk2)PCA arising from a choice of s versus samples.
Figure 3. Ratio of E[(Tk2)MSSUM-PCA]/E[(Tk2)PCA]) arising from a choice of s versus samples.
comparison among the above multivariate control charts, some straightforward matrix algebra operations lead to the following two ratios:
to the following forms:
E[(Tk2)MEWMA-PCA] 2
[( ) ] [( ) ]
E k
(1 - λ) ∑ j)1
) λ(2 - λ)(
E[(Tk )PCA] 2
E[(Tk )MEWMA-PCA] E[(Tk2)PCA]
{
k-j
)
2 kes ) k /s s s