Article pubs.acs.org/IECR
Statistical Modeling and Online Monitoring Based on Between-Set Regression Analysis
Chunhui Zhao,* Furong Gao, and Youxian Sun
State Key Laboratory of Industrial Control Technology, Department of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, China
ABSTRACT: In the present work, a monitoring strategy based on between-set regression analysis is developed for the online monitoring of processes with multiple "modes". The definition of modes here differs from the conventional one in that the modes may be different sets of variables collected for the same set of objects (called variable modes) or process measurements collected at different times (called time modes). The subject of analysis comprises two predictor data sets, corresponding to two neighboring process modes, and one quality data matrix with which both predictor data sets are associated. The basic assumption is that a certain part of the underlying quality-related process variability stays constant despite the changeover of process modes. On the basis of between-set regression analysis, the quality-relevant systematic information in each mode space is decomposed into two parts: the between-mode common subspace and the between-mode specific subspace. The former reveals the between-mode quality-relevant similarity and the latter the dissimilarity. The two parts are then used to develop an online monitoring system. The feasibility and performance of the proposed method are illustrated with a simple numerical case and a typical multiphase batch process.
© 2012 American Chemical Society
Received: November 7, 2011. Revised: May 8, 2012. Accepted: June 6, 2012. Published: June 6, 2012.
dx.doi.org/10.1021/ie300731k | Ind. Eng. Chem. Res. 2012, 51, 8495−8509
Industrial & Engineering Chemistry Research
1. INTRODUCTION In the last few decades, multivariate statistical analysis techniques such as principal component analysis (PCA)1−6 and partial least-squares (PLS)7−16 have been widely used for process analysis and monitoring. These techniques extract the underlying characteristics of measurement data and define the normal operating region by accommodating all acceptable variations. New process behavior can then be compared with the predefined behavior by the monitoring system to ensure that the process remains in a "state of statistical control". When the process moves outside the desired operating region, an "unusual and faulty" change in process behavior is said to have occurred. In practice, however, manufacturing processes tend to go through different operation patterns due to various factors, such as the phases of batch processes, where the variables are measured at different times. Moreover, for the same set of subjects, different data sets may be collected from different variables. Here each data set is considered one process mode, where a between-mode difference may arise because different variables are collected for the same set of objects (called variable mode here) or because the same set of variables is measured at different process times (called time mode here). In both cases, multiple data spaces that reveal different underlying information are obtained. Traditional multivariate statistical analysis methods often lead to erroneous monitoring results because of their fundamental assumption that the operating data follow a unimodal pattern. They mix together the general systematic information in different data spaces, so the insight provided by each individual data space is lost. In general, different modes reveal different underlying variable correlations.
The available solutions to the multimode monitoring problem can be categorized into global-modeling methods,17−22 which develop a uniform model to accommodate all operation modes, and multiple-modeling methods,23−31 which design a different local model to match each process mode. However, a global statistical model is effectively the statistical average of all operation modes and therefore cannot describe each mode closely, which may lead to low resolution for some modes. The problem is more pronounced when the modes differ substantially from each other. Multiple-modeling methods, on the other hand, may give each mode a higher resolution, but they neglect the relationship between two modes. In multimode processes, the underlying information can be explored in greater detail by dividing the data into meaningful blocks according to certain rules and building multiple specific models. The behavior of each mode can then be seen, and a more comprehensive understanding of the process can be expected. In multiphase batch processes, where there are many phases and each phase exhibits different underlying characteristics, multiphase models have been developed and widely used.32−35 In general, multiple simple phase-representative models are designed for different phases to characterize the local behavior of a batch process along the time direction. This allows changes in the statistical model structure to reflect the dynamics of the inherent process characteristics. Improvements in monitoring performance over the conventional multiway statistical analysis strategy have been reported. Unlike subphase PCA monitoring methods, which can effectively monitor all systematic process variations in a phase (X), subphase PLS focuses on monitoring the phase variations that influence the quality property (Y). Camacho and Pico proposed a multiphase PCA (MPPCA) algorithm34 for automatic phase identification so that each segment of the batch can be adequately approximated by a linear PCA model with acceptable unexplained variance. Later, they extended the MP algorithm to PLS35 for online quality prediction, including lagged variables to model the variable dynamics. Both algorithms have been reported to be useful for online monitoring in that they make use of the information associated with different process variations. However, neither algorithm uses the relationship between two phases for process analysis. Another line of research has been to model the variable correlations within each phase under the influence of other phases by multiblock PCA (MBPCA) or multiblock PLS (MBPLS).36−40 A group led by Liu applied MBPLS to regress the inner relation of two-phase measurement data.41 The method could clarify whether the root causes of errors were problems from a previous phase or changes in interphase correlations. However, both this method and MBPCA are limited to offline retrospective analysis. To use the available data sets more efficiently, the relationship between two sets should be studied in greater detail. It has been noted that despite the dissimilarity in variable correlations across different data sets, there is also a certain similarity among them.42 A multiset statistical analysis method (multiset variable correlation analysis, MsVCA)42 has been proposed to extract the common correlations along the variable dimension; the algorithm was derived in detail and its theoretical properties were analyzed. Following this theoretical development, the algorithm was successfully applied to multimode continuous process monitoring and to the analysis of the between-mode transition problem.43 Moreover, the algorithm can be modified to analyze the common variation along the object dimension and to address the systematic process variations shared across data sets (X).
Further, from the quality perspective, some process variations that influence the quality property (Y) are also shared across data sets. For example, it has been found that the ordered switch between two neighboring phases does not mean that the quality-related process characteristics have completely changed. That is, some of the underlying quality-related variability remains the same and is shared by two neighboring phases, while other quality-related variability changes from phase to phase, reflecting phase-specific characteristics. More meaningful information could be obtained if the two types of variability were separated and monitored individually. A multiset regression analysis method (in an MsRA-weight version and an MsRA-score version)44 has been proposed to relate the inherent variable correlations or systematic variations across multiple data spaces from the quality perspective; its theoretical development and properties were discussed in detail. In this work, the algorithm is modified and its practical application to online monitoring is reported. With this modified algorithm, the original process space of each mode is separated into two systematic subspaces: one containing the variations that are similar between two modes and the other containing the dissimilar ones. The two types of variations are then monitored separately. This method differs from traditional statistical techniques in that it separates the shared information from the unique information, allowing both types of information to be scrutinized. It combines the advantages of multiblock and
sub-PLS modeling methods and effectively uses the relationship between two modes for online monitoring. The rest of this paper is organized as follows. First, the MsRA algorithm for between-set analysis is revisited. The proposed between-set monitoring system is formulated by modifying the MsRA algorithm to make it suitable for both variable and time modes. In the following section, a numerical case and a practical case are reported to demonstrate the algorithm’s feasibility for online between-mode monitoring and its efficacy. Finally, conclusions are drawn in the last section.
2. METHODOLOGY 2.1. Preliminary. Regression modeling is challenging when there are multiple predictor data sets. These predictor data sets may be correlated with one another and may share a common regression structure. For more accurate regression results, the shared regression information should be distinguished from the regression information that is specific to each predictor set. Conventional regression algorithms, being single-set methods, cannot make this distinction and are therefore inefficient here. The multiset regression analysis (MsRA) algorithm has been developed for the multiset case.44 The algorithm considers the underlying quality-relevant systematic variability of more than one predictor space simultaneously. With this algorithm, the systematic predictor information is investigated according to two criteria: the cross-set relationships among predictor spaces and the regression correspondence between predictors and responses. The cross-set common predictor scores are extracted first, and the algorithm then examines how the predictors and responses relate to this "consensus". The original MsRA algorithm is briefly presented in Appendix A. For multimode processes, multiset data corresponding to different patterns or phases are obtained, providing an application platform for the MsRA algorithm. Between-set analysis is the special case in which only two process data sets are considered. With this method, each original measurement space is separated into two subspaces, each containing a different part of the predictor variability. One subspace, called the between-set common subspace, is composed of the quality-relevant process variations that are shared by both data sets. The other, called the specific subspace, consists of variations that are unique to either data set. The two subspaces thus capture two different types of predictor variability.
The separation of the quality-related information that is common to two sets from the quality-related information that is unique to either set provides more meaningful information for the development of a monitoring system. 2.2. Between-Set Subspace Separation. For between-set regression analysis, two cases are addressed. In the first case, the two data sets share the same number of objects. They are denoted as X1(N × Jx,1) and X2(N × Jx,2), where the subscripts 1 and 2 identify the data set, N denotes the number of samples, and Jx,1 and Jx,2 denote the numbers of process variables in the two data sets; both correspond to the same response data set Y(N × Jy), where Jy is the number of response variables. The two process data sets may come from different phases of a batch process or from different types of measurement variables for the same set of objects, representing different process properties. Because they have the same observation dimension, the original MsRA algorithm shown in Appendix A can be used directly for between-set regression analysis and subspace separation.
In the second case, the two data sets do not share the same number of objects. In multiphase batch processes, the durations of two neighboring phases may differ. In each phase i, Jx,i variables are collected from I similar batches over Ki time intervals. These variables form a three-way array X̲i(I × Jx,i × Ki), where subscript i = 1, 2 identifies the phase and subscript x denotes the predictor variables. The response matrix Y(I × Jy), where subscript y denotes quality, is obtained at the end of each batch. At each time k, the time-slice Xi,k(I × Jx,i) can be related to the response matrix Y(I × Jy), revealing the contribution of each predictor matrix to quality at that point in time. These contributions should be similar within the same phase.45 To perform online regression analysis, all time-slices are first normalized at each time k, and those within the same phase are then arranged variable-wise as Xi(IKi × Jx,i), where the variable dimension is unchanged and the time-slices, each holding the I batches, are placed one after another following the time index, as shown in Figure 1. Correspondingly, the representative phase responses Yi(IKi × Jy) are arranged by duplicating the normalized quality measurement Y(I × Jy) so that the time-slice regression relationship {Xi,k(I × Jx,i), Y(I × Jy)} is preserved.45 For each phase, the corresponding regression sets {Xi(IKi × Jx,i), Yi(IKi × Jy)} are thus prepared. Because the same regression weight v is applied to the quality data of both phases, the quality scores Yiv would have different lengths in the two phases; the arranged phase quality data therefore have to be preprocessed to remove the influence of different phase durations. From eq B1 in Appendix B, the squared phase score covariance can be calculated as:
$$\left(v^{T} Y_i^{T} X_i a_i\right)^2 = \left(\sum_{k=1}^{K_i} v^{T} Y^{T} X_{i,k} a_i\right)^2 \tag{1}$$
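As an illustration of the variable-wise arrangement in Figure 1 and of eq 1, the NumPy sketch below stacks the time-slices of one phase, duplicates Y accordingly, and checks eq 1 on synthetic random data. The helper names `unfold_phase` and `duplicate_quality`, the array sizes, and the weights v and a are hypothetical; the actual MsRA weights come from Appendix B and are not reproduced here. The sketch also reproduces the duration effect discussed next (identical time-slices give a squared phase covariance of $K_i^2 s^2$) and shows that dividing Yi by √Ki gives the quality scores the same length as Yv.

```python
import numpy as np

def unfold_phase(X3):
    """Variable-wise arrangement of one phase.

    X3 has shape (I, J, K): I batches, J variables, K time intervals.
    Returns an (I*K, J) matrix in which the time-slices X3[:, :, k]
    (each I x J) are stacked one after another in time order.
    """
    I, J, K = X3.shape
    return X3.transpose(2, 0, 1).reshape(K * I, J)

def duplicate_quality(Y, K):
    """Repeat the end-of-batch quality matrix Y (I x Jy) once per time-slice."""
    return np.tile(Y, (K, 1))

rng = np.random.default_rng(1)
I, J, K, Jy = 10, 3, 4, 2
X3 = rng.standard_normal((I, J, K))   # synthetic phase data, illustration only
Y = rng.standard_normal((I, Jy))      # synthetic quality data
v = rng.standard_normal(Jy)           # arbitrary quality weight
a = rng.standard_normal(J)            # arbitrary predictor weight

X_i = unfold_phase(X3)
Y_i = duplicate_quality(Y, K)

# Left- and right-hand sides of eq 1
lhs = (v @ Y_i.T @ X_i @ a) ** 2
rhs = sum(v @ Y.T @ X3[:, :, k] @ a for k in range(K)) ** 2

# Identical time-slices: the phase covariance is K * s, so its square is K^2 s^2
X3_same = np.repeat(X3[:, :, :1], K, axis=2)
s = v @ Y.T @ X3[:, :, 0] @ a
cov_same = v @ Y_i.T @ unfold_phase(X3_same) @ a

# Scaling Y_i by 1/sqrt(K) gives quality scores of the same length as Y v
len_scaled = np.linalg.norm((Y_i / np.sqrt(K)) @ v)
len_slice = np.linalg.norm(Y @ v)
print(np.isclose(lhs, rhs), np.isclose(len_scaled, len_slice))
```

Stacking by time-slice, rather than batch by batch, is what makes $Y_i^T X_i$ equal to the sum of the time-slice products $Y^T X_{i,k}$.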
Clearly, for each phase, the phase score covariance is the sum of the time-slice score covariances between each time-slice predictor matrix Xi,k and the quality matrix Y(I × Jy). The squared phase score covariance is thus directly influenced by the phase duration Ki. For ease of understanding, assume that the time-slice predictor matrices Xi,k are the same in the two phases but the two phases have different durations. Then the time-slice score covariances (vTYTXi,kai) remain the same for the two phases, denoted s here, and the squared phase score covariance in eq 1 becomes $K_i^2 s^2$. Since the objective function in eq B1 is the sum of the two squared phase score covariances, the longer phase yields a larger phase score covariance, which means that a larger weight is given to one phase than to the other. To avoid the influence of different phase durations on the covariance calculation, the phase quality data are scaled as Yi/√Ki. In addition, the original MsRA algorithm is modified for between-phase analysis in batch processes, as shown in Appendix B. With the MsRA algorithm, each original predictor data matrix Xi in space i is decomposed into two orthogonal parts, Xg,i and Xs,i, which respectively contain the quality-related variability shared by the two sets (indicated by subscript g) and that not shared by the two sets (indicated by subscript s). Xg,i and Xs,i are defined as follows:
$$\begin{aligned} X_i &= X_{g,i} + X_{s,i} \\ X_{g,i} &= X_i R_{g,i} P_{g,i}^{T} = T_{g,i} P_{g,i}^{T} = T_{g,i}\left(T_{g,i}^{T} T_{g,i}\right)^{-1} T_{g,i}^{T} X_i = G_{T_{g,i}} X_i \\ X_{s,i} &= X_i - X_i R_{g,i} P_{g,i}^{T} = \left(I - T_{g,i}\left(T_{g,i}^{T} T_{g,i}\right)^{-1} T_{g,i}^{T}\right) X_i = H_{T_{g,i}} X_i \end{aligned} \tag{2}$$
where the common subspace is spanned by the common (global) scores Tg,i(Ni × Rg), with Rg the number of common scores, which are linear combinations of Xi. The MsRA weights Rg,i are calculated using either eq A13 or eq B8. The residual part Xs,i, also called the specific subspace, reveals the quality-related variations that are dissimilar between the two phases. The loadings Pg,i are used to reconstruct the common subspace from the common scores. $G_{T_{g,i}} = T_{g,i}(T_{g,i}^{T}T_{g,i})^{-1}T_{g,i}^{T}$ is the orthogonal projector onto the column space of Tg,i, and $H_{T_{g,i}} = I - T_{g,i}(T_{g,i}^{T}T_{g,i})^{-1}T_{g,i}^{T}$ is the antiprojector with respect to the column space of Tg,i. The two subspaces, Xg,i and Xs,i, are therefore orthogonal to each other, since $(G_{T_{g,i}} X_i)^{T} H_{T_{g,i}} X_i = 0$.
Figure 1. Phase data arrangement in phase i for case 2 of between-set analysis, where the two neighboring phases have different durations.
Figure 2. Time-series profiles of the three source signals for the generation of process and quality data.
Figure 4. Profiles of the first scores in two modes extracted using the MsRA, MBPLS, and sub-PLS algorithms.
In the remaining specific subspace Xs,i, the objects carry little of the variability associated with the common regression scores. However, there is still other systematic variability, which may be unrelated to quality or not shared between modes. Conventional PCA can be performed in each specific subspace Xs,i to separate this systematic information from noise as follows:
$$\begin{aligned} T_{s,i} &= X_{s,i} P_{s,i} \\ \hat{X}_{s,i} &= X_{s,i} P_{s,i} P_{s,i}^{T} \\ E_i &= X_{s,i} - \hat{X}_{s,i} = X_i - X_{g,i} - \hat{X}_{s,i} \end{aligned} \tag{3}$$
where Ps,i(Jx,i × Rs,i), the PCA loadings obtained by performing PCA on Xs,i, reveal the major variation directions in each phase-specific subspace, and Rs,i is the number of retained principal components. Ei is the final modeling residual and is regarded as measurement noise. Each phase-specific subspace is thus further divided into two parts: the systematic specific variation X̂s,i and the final errors Ei. Moreover, the specific PCA components Ts,i are orthogonal to the common regression scores Tg,i, because $T_{g,i}^{T} X_{s,i} = T_{g,i}^{T} H_{T_{g,i}} X_i = 0$ (which also gives $X_{g,i}^{T} X_{s,i} = 0$). In summary, the underlying characteristics of each measurement space are formulated in two subspaces as follows:
$$X_i = X_{g,i} + X_{s,i} = X_{g,i} + \hat{X}_{s,i} + E_i = X_i R_{g,i} P_{g,i}^{T} + X_{s,i} P_{s,i} P_{s,i}^{T} + E_i \tag{4}$$
Figure 3. Profiles of the disturbed source signals for four fault cases (red lines denote faulty source signals; black dashed lines denote normal source signals).
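The decomposition in eqs 2−4 can be sketched in NumPy as follows. Because the MsRA weights Rg,i of eq A13 or eq B8 are not reproduced in this section, the sketch takes an arbitrary weight matrix as input (random here, purely as a placeholder); the function name `separate_subspaces` is hypothetical, and the PCA of Xs,i is done through an SVD. The assertions check the properties stated above: Xi is recovered exactly, the two subspaces are orthogonal, and the common and specific scores are orthogonal.

```python
import numpy as np

def separate_subspaces(X, R_g, n_specific):
    """Split a mean-centered data matrix X (N x J) as in eqs 2-4.

    R_g (J x Rg) stands in for the MsRA weights; n_specific is the number
    of PCA components retained in the specific subspace.
    """
    T_g = X @ R_g                                      # common scores
    # P_g^T = (T_g^T T_g)^{-1} T_g^T X, so that X_g = T_g P_g^T = G X
    P_g = np.linalg.solve(T_g.T @ T_g, T_g.T @ X).T
    X_g = T_g @ P_g.T                                  # common subspace
    X_s = X - X_g                                      # specific subspace, H X
    # PCA of the specific subspace via SVD: rows of Vt are loading directions
    _, _, Vt = np.linalg.svd(X_s, full_matrices=False)
    P_s = Vt[:n_specific].T
    T_s = X_s @ P_s
    X_s_hat = T_s @ P_s.T                              # systematic specific part
    E = X_s - X_s_hat                                  # final residual
    return {"T_g": T_g, "P_g": P_g, "X_g": X_g, "X_s": X_s,
            "T_s": T_s, "P_s": P_s, "X_s_hat": X_s_hat, "E": E}

# Usage with synthetic data and hypothetical random weights
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
X -= X.mean(axis=0)                 # mean-centering, as in preprocessing
R_g = rng.standard_normal((6, 1))   # placeholder, NOT actual MsRA weights
parts = separate_subspaces(X, R_g, n_specific=2)

assert np.allclose(X, parts["X_g"] + parts["X_s_hat"] + parts["E"])  # eq 4
assert np.allclose(parts["X_g"].T @ parts["X_s"], 0)  # orthogonal subspaces
assert np.allclose(parts["T_g"].T @ parts["T_s"], 0)  # orthogonal scores
```

Any weight matrix yields these algebraic properties; what MsRA contributes is the choice of Rg,i that makes Tg,i quality-relevant and shared between the two modes.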
Figure 5. Monitoring results for fault I in mode I and mode II using (a) the MsRA model, (b) the sub-PLS model, and (c) the MBPLS model.
2.3. Between-Set Monitoring System. Two types of statistics are commonly calculated: the $T^2$ statistic describes the systematic part captured by the monitoring model, while the SPE statistic represents the residual part that the model fails to capture. For each subspace, they are calculated at each time as follows:
$$\begin{aligned} T_{g,i}^2 &= (t_{g,i} - \bar{t}_{g,i})^{T} O_{g,i}^{-1} (t_{g,i} - \bar{t}_{g,i}) \\ T_{s,i}^2 &= (t_{s,i} - \bar{t}_{s,i})^{T} O_{s,i}^{-1} (t_{s,i} - \bar{t}_{s,i}) \\ \mathrm{SPE}_{i} &= e_{i}^{T} e_{i} \end{aligned} \tag{5}$$
where tg,i(Rg × 1) and ts,i(Rs,i × 1) are the common and specific score vectors at each time. The terms t̄g,i and t̄s,i denote the corresponding mean vectors, which are zero vectors because of mean-centering during data preprocessing. Og,i(Rg × Rg) and Os,i(Rs,i × Rs,i) are the variance-covariance matrices of the respective components, and ei(Jx,i × 1) is the residual. The monitoring system could also be developed from the first step of the MsRA calculation alone, but not without problems. For example, the covariance index used in the cost function of the first step allows many components that are common across sets to be extracted, but some variations unique to either predictor space may be included in them, making the monitoring model difficult to interpret. In the second step of the MsRA calculation, these variations are separated from Tg,i so that the two systematic parts can be monitored by the $T_{g,i}^2$ statistic and the $T_{s,i}^2$ statistic, respectively, which makes the monitoring results more accurate and easier to interpret. Assuming that the process variables are normally distributed, the control limits in the systematic subspaces of each mode can be approximated by the F-distribution with significance level α:46
$$T_i^2 \sim \frac{R_i (N_i^2 - 1)}{N_i (N_i - R_i)} F_{R_i,\, N_i - R_i,\, \alpha} \tag{6}$$
where Ri is the number of retained components in the subspace concerned (Rg for $T_{g,i}^2$ and Rs,i for $T_{s,i}^2$) and Ni is the number of modeling samples.
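A minimal NumPy sketch of the $T^2$ statistic of eq 5 and the limit of eq 6 follows, assuming that O is the sample covariance (ddof = 1) of the training scores. To keep the example dependency-free, the F quantile is passed in rather than computed; it could come from statistical tables or, for example, from SciPy's `scipy.stats.f.ppf(1 - alpha, R, N - R)`, reading $F_{R,N-R,\alpha}$ as the upper-α quantile (an interpretation, since the text only names α the significance level). The function names are hypothetical.

```python
import numpy as np

def t2_statistics(T_train, T_new):
    """Hotelling T^2 of each row of T_new (eq 5; eq 9 when T_new holds new scores).

    The mean and the variance-covariance matrix O are estimated from the
    training scores T_train (N x R); for mean-centered data the mean is ~0.
    """
    R = T_train.shape[1]
    mean = T_train.mean(axis=0)
    O = np.cov(T_train, rowvar=False).reshape(R, R)  # sample covariance, ddof=1
    D = np.atleast_2d(T_new) - mean
    # Row-wise quadratic form d^T O^{-1} d
    return np.einsum("ij,jk,ik->i", D, np.linalg.inv(O), D)

def t2_limit(n_samples, n_components, f_quantile):
    """Control limit of eq 6, given the F(R, N - R) quantile at the chosen alpha."""
    N, R = n_samples, n_components
    return R * (N**2 - 1) / (N * (N - R)) * f_quantile

# Usage with synthetic common scores (illustration only)
rng = np.random.default_rng(2)
T_g_train = rng.standard_normal((100, 1))
T_g_new = rng.standard_normal((5, 1))
print(t2_statistics(T_g_train, T_g_new))
```

The `reshape(R, R)` guards the single-component case, where `np.cov` returns a 0-d array.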
Similarly, in the residual subspace, the representative confidence limit of the squared prediction error (SPE) within each mode can be approximated by a weighted χ2 distribution:47,48
$$\mathrm{SPE}_i \sim g_i \chi^2_{h_i,\, \alpha} \tag{7}$$
Figure 6. Monitoring results for fault II in mode I and mode II using (a) the MsRA model, (b) the sub-PLS model, and (c) the MBPLS model.
where gi = νi/(2mi) and hi = 2mi2/νi, in which mi is the mean of all SPE values within the ith mode calculated by eq 5 and νi is the corresponding variance. The control limits can also be adjusted on the basis of model validation. When a new observation xnew(Jx,i × 1) becomes available, its phase affiliation is determined by the time index or, for variable modes, its variable-space affiliation by the variable index. The data are then normalized and projected onto the appropriate models to calculate the systematic variations and the residual in the different subspaces as follows:
$$\begin{aligned} t_{g,new}^{T} &= x_{new}^{T} R_{g,i} \\ x_{g,new}^{T} &= t_{g,new}^{T} P_{g,i}^{T} \\ x_{s,new} &= x_{new} - x_{g,new} \\ t_{s,new}^{T} &= x_{s,new}^{T} P_{s,i} \\ \hat{x}_{s,new}^{T} &= t_{s,new}^{T} P_{s,i}^{T} \\ e_{new} &= x_{s,new} - \hat{x}_{s,new} \end{aligned} \tag{8}$$
The new monitoring statistics are then calculated as follows:
$$\begin{aligned} T_{g,new}^2 &= t_{g,new}^{T} O_{g,i}^{-1} t_{g,new} \\ T_{s,new}^2 &= t_{s,new}^{T} O_{s,i}^{-1} t_{s,new} \\ \mathrm{SPE}_{new} &= e_{new}^{T} e_{new} \end{aligned} \tag{9}$$
All three statistics are then compared with the predetermined control limits. If all monitoring statistics stay within their predefined ranges, the current sample is deemed normal; otherwise, a fault may be occurring.
3. ILLUSTRATIONS AND DISCUSSIONS 3.1. Case Study 1. In this section, the performance of the proposed monitoring strategy is illustrated through a simple numerical case, in which two sets of variables are collected for the same set of objects and used to represent two variable modes. The proposed algorithm is compared with the sub-PLS and MBPLS algorithms.
Figure 7. Monitoring results for fault III in mode I and mode II using (a) the MsRA model, (b) the sub-PLS model, and (c) the MBPLS model.
Consider three source variables defined as follows:
$$\begin{aligned} s_1(k) &= 2\cos(0.08k)\sin(0.06k) \\ s_2(k) &= \operatorname{sign}\left[\sin(0.3k) + 3\cos(0.1k)\right] \\ s_3(k) &= \text{uniformly distributed noise in the range } [-1, 1] \end{aligned} \tag{10}$$
S = [s1 s2 s3], and the three variables are normalized and shown in Figure 2. From the three source signals, two sets of process variables are generated according to two different linear mixing relationships, X1 = SA1 and X2 = SA2:
$$A_1 = \begin{bmatrix} 0.86 & -0.55 & 0.17 & -0.33 & 0.89 \\ 0.79 & 0.65 & 0.32 & 0.12 & -0.97 \\ 0.67 & 0.46 & -0.28 & 0.27 & -0.74 \end{bmatrix} \tag{11}$$
$$A_2 = \begin{bmatrix} 2.79 & -1.13 & 3.90 & -1.12 & 0 \\ 2.86 & 1.07 & 2.01 & 0.29 & -1.01 \\ 1.17 & 2.06 & -0.80 & 0.97 & -2.07 \end{bmatrix} \tag{12}$$
These two sets of process variables represent two different variable modes, which are associated with the same quality variable y = 7s2. The relationship between the quality variable and the process variables can be calculated as follows:
$$y = X_1 a_{y1} = X_1 \begin{bmatrix} 0.92 \\ 2.13 \\ 9.27 \\ -2.62 \\ -2.31 \end{bmatrix} \tag{13}$$
$$y = X_2 a_{y2} = X_2 \begin{bmatrix} 2.55 \\ 4.28 \\ 0.71 \\ 4.52 \\ 7.54 \end{bmatrix} \tag{14}$$
A thousand samples are generated. The first half is used for model training and the second half for model testing.
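The normal data of this numerical case can be generated approximately with the NumPy sketch below. Assumptions not stated in the text: the sample index runs k = 1, ..., 1000, "normalized" means zero mean and unit variance per source, the random seed is arbitrary, and y is computed noise-free as 7s2 from the normalized s2. The printed products show why both modes relate to the same quality: A1ay1 and A2ay2 are each approximately [0, 7, 0]T, so y depends on s2 only.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed
k = np.arange(1, 1001)                  # 1000 samples, k = 1..1000 (assumed)

# Source signals
s1 = 2 * np.cos(0.08 * k) * np.sin(0.06 * k)
s2 = np.sign(np.sin(0.3 * k) + 3 * np.cos(0.1 * k))
s3 = rng.uniform(-1.0, 1.0, size=k.size)
S = np.column_stack([s1, s2, s3])
S = (S - S.mean(axis=0)) / S.std(axis=0)   # normalize each source (assumed z-score)

# Mixing matrices for the two variable modes
A1 = np.array([[0.86, -0.55, 0.17, -0.33, 0.89],
               [0.79, 0.65, 0.32, 0.12, -0.97],
               [0.67, 0.46, -0.28, 0.27, -0.74]])
A2 = np.array([[2.79, -1.13, 3.90, -1.12, 0.00],
               [2.86, 1.07, 2.01, 0.29, -1.01],
               [1.17, 2.06, -0.80, 0.97, -2.07]])

# Regression vectors relating each mode to the quality
a_y1 = np.array([0.92, 2.13, 9.27, -2.62, -2.31])
a_y2 = np.array([2.55, 4.28, 0.71, 4.52, 7.54])

# Both products are close to [0, 7, 0]: the quality depends on s2 only
print(A1 @ a_y1, A2 @ a_y2)

# Two variable modes with N(0, 0.1^2) measurement noise; shared quality y = 7 s2
X1 = S @ A1 + rng.normal(0.0, 0.1, size=(k.size, 5))
X2 = S @ A2 + rng.normal(0.0, 0.1, size=(k.size, 5))
y = 7 * S[:, 1]

# First half for training, second half for testing
X1_train, X1_test = X1[:500], X1[500:]
X2_train, X2_test = X2[:500], X2[500:]
y_train, y_test = y[:500], y[500:]
```

Because the listed coefficients are rounded to two decimals, y = X1ay1 and y = X2ay2 hold only approximately, even for noise-free data.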
Figure 8. Monitoring results for fault IV in mode I and mode II using (a) the MsRA model, (b) the sub-PLS model, and (c) the MBPLS model.
Normally distributed noise with zero mean and a standard deviation of 0.1 is added to the process data. Although not all source signals follow a Gaussian distribution exactly, the measurement data, as linear combinations of the source signals plus normal noise, can be shown to be approximately normally distributed by drawing a normal probability plot.49 Four types of faults are considered for model testing by imposing different disturbances, as shown in Figure 3:
• Fault I: a gradual change with a slope of 0.01 in the second source signal from the 100th to the 200th sample.
• Fault II: normally distributed noise with zero mean and a standard deviation of one imposed on the second source signal from the 300th to the 400th sample.
• Fault III: a unit step change in the second source signal from the 200th to the 300th sample.
• Fault IV: in addition to fault I, a step change with a magnitude of 1.5 imposed on the first source signal from the 300th to the 400th sample.
Although the abnormalities in the source signals appear large in Figure 3, their influence on the measurement data X is not significant: the perturbation of each process variable in X cannot be clearly observed (not shown here). On the basis of the generated data, the sub-PLS, MBPLS, and two-step MsRA-score models are developed to relate the two data blocks to the quality. They focus on extracting different underlying systematic variability for online monitoring. With the MBPLS algorithm, block scores are used for data deflation39 to make it suitable for online application. With the MsRA algorithm, one common regression score and one specific PCA component are extracted. Two regression scores are needed for both the sub-PLS and MBPLS algorithms. Figure 4 compares the first scores extracted from both modes by the three models for the normal test data. The first score extracted by the proposed MsRA algorithm closely resembles the second source signal, which is shared by the two modes and closely related to quality. In contrast, the scores extracted by the MBPLS algorithm, which considers both between-mode and X−Y covariances, and by the sub-PLS algorithm, which isolates the X−Y covariance in each mode, appear to reflect a combination of the first and second sources.
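The four fault scenarios listed earlier can be injected into the test part of the sources with a sketch like the one below. Sample numbers are taken as 1-based positions within the 500 test samples, each disturbance is assumed to be active only inside its stated window, and the fault I ramp is assumed to start from zero at the 100th sample; these are interpretations, as are the helper names `window` and `inject_fault`. Faulty process data would then be obtained as in the normal case, by mixing the returned sources with A1 or A2 and adding measurement noise.

```python
import numpy as np

def window(start, end):
    """Row slice for samples start..end, 1-based and inclusive."""
    return slice(start - 1, end)

def inject_fault(S_test, fault, rng):
    """Return a copy of the test sources (columns s1, s2, s3) with fault I-IV added."""
    if fault not in ("I", "II", "III", "IV"):
        raise ValueError(f"unknown fault: {fault}")
    S = S_test.copy()
    if fault in ("I", "IV"):
        # Ramp with slope 0.01 per sample on s2, samples 100-200 (starts at 0)
        S[window(100, 200), 1] += 0.01 * (np.arange(100, 201) - 100)
    if fault == "II":
        # Extra N(0, 1) noise on s2, samples 300-400
        S[window(300, 400), 1] += rng.normal(0.0, 1.0, size=101)
    if fault == "III":
        # Unit step on s2, samples 200-300
        S[window(200, 300), 1] += 1.0
    if fault == "IV":
        # Additional step of magnitude 1.5 on s1, samples 300-400
        S[window(300, 400), 0] += 1.5
    return S

# Usage on a zero baseline, so that only the disturbances are visible
rng = np.random.default_rng(0)
baseline = np.zeros((500, 3))
faults = {name: inject_fault(baseline, name, rng) for name in ("I", "II", "III", "IV")}
```

Injecting at the source level, rather than on X directly, mirrors the setup in Figure 3, where each disturbance reaches the measurements only through the mixing matrices.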
rotation, preparing for next cycle. It can be readily implemented for experiments, in which, all key process conditions such as temperature, pressure, displacement, and velocity can be measured online using the corresponding transducers, providing abundant process information. The material used in this case study is high-density polyethylene (HDPE). Twelve process variables, as shown in Table 1, are monitored online with a set of sensors. Two
If the sources of disturbance can be uncovered from the measurement data, the developed monitoring statistics would be able to clearly detect the disturbances. In the following, to illustrate the superiority of the proposed algorithm for monitoring, different disturbances are imposed on the second source which is closely related with quality and shared between two modes. In the case of fault IV, the disturbance is also imposed on the first source but at a different time. Only the proposed algorithm can extract the second source from the mixed measurement data so it is believed to perform better than the other two methods (sub-PLS and MBPLS). Here since only one common score is extracted by the MsRA algorithm and one specific score is extracted by PCA, both Tg2 and Ts2 are calculated as the mean of two serial values along time direction. For fault I in Figure 5 where the slow drift is considered, the Tg2 statistics in between-set common subspace can give out-ofcontrol alarms albeit with a certain time delay. The phenomenon can be observed for both modes. On the other hand, neither the sub-PLS model nor the MBPLS model can detect the disturbance clearly, which may be due to the fact that the disturbance is hidden by the influence of the other systematic sources. A similar phenomenon is observed for the monitoring results of fault II in Figure 6 and fault III in Figure 7. For the first three faults, when the abnormal second source is excluded from the measurement data, the process measurements are brought back to normal. Therefore, the two monitoring statistics (Ts2 and SPE) given by PCA do not trigger any out-of-control alarms, which agrees well with the real case. For fault IV, it is known that disturbances occur in both common and specific systematic subspaces but at different times. With the proposed algorithm, the two different disturbances are detected by the Tg2 and Ts2 statistics for both variable modes as shown in Figure 8a. 
With the sub-PLS model and the MBPLS model however (see Figures 8b and c), as shown by the T2 statistic, the out-of-control alarms are issued only after the 300th sample, which actually results from the disturbances being added to the first source. In summary, by comparing the monitoring results for the three different modeling methods and four abnormal test cases, the superiority of the proposed method is demonstrated. The dominant source signal, which significantly contributes to quality and is shared by the two modes, is automatically identified as the common regression score by the proposed MsRA algorithm. The other variability appears in another subspace. The monitoring results in different subspaces indicate which part the disturbance influences severely, the part that is common to both modes or the part that is not. Considering that different variables dominate in different subspaces, more fault information can be obtained using the proposed method, which makes it much easier to understand the processes under study. 3.2. Case Study 2. In this section, a typical multiphase batch process is considered. Only two neighboring phases are used as two time modes to test the monitoring strategy that is based on between-phase analysis. Injection molding,50 a key process in polymer processing, transforms polymer materials into various shapes and types of products. A typical injection molding process consists of three operation phases: injection of molten plastic into the mold, packing-holding of the material under pressure, and cooling of the plastic in the mold until the part becomes sufficiently rigid for ejection. Besides, plastication takes place in the barrel in the early cooling phase, where polymer is melted and conveyed to the barrel front by screw
Table 1. Process and Quality Variables for an Injection Molding Process
no.   variable description           unit
process variables
1     cavity temperature (CT)        °C
2     nozzle pressure (NP)           bar
3     stroke                         mm
4     injection velocity (IV)        mm/s
5     hydraulic pressure (HP)        bar
6     plastication pressure (PP)     bar
7     cavity pressure (CP)           bar
8     screw rotation speed (SRS)     RPM
9     SV1 opening (SV1)              %
10    SV2 opening (SV2)              %
11    barrel temperature (BT)        °C
12    mold temperature (MT)          °C
quality variables
1     weight                         g
2     length                         mm
3     jetting
4     record grooves
Figure 9. Regression weights in phases I and II calculated using the sub-PLS model and the MsRA model.
dx.doi.org/10.1021/ie300731k | Ind. Eng. Chem. Res. 2012, 51, 8495−8509
Industrial & Engineering Chemistry Research
Article
Figure 10. Monitoring results for fault I in phases I and II using (a) the MsRA model and (b) the sub-PLS model.
Figure 11. Monitoring results for fault II in phases I and II using (a) the MsRA model and (b) the sub-PLS model.
Figure 12. Monitoring results for fault III in phases I and II using (a) the MsRA model and (b) the sub-PLS model.

dimension indices, product length (mm) and weight (g), whose real values can be directly measured with instruments, and two surface defects, jetting and record grooves, whose real values can be quantified by an expert process operator before modeling, are chosen as the characteristics representative of the product quality to be evaluated. The process is a good candidate for applying and verifying the proposed between-phase modeling strategy. Thirty-three normal batch runs are conducted under various operating conditions using design of experiments. Using injection stroke as the indicator variable, the reference batches are made to have equal durations (1300 samples in all) by data interpolation, resulting in the descriptor array X̲(33 × 12 × 1300). The quality characteristics are measured only at the end of the process, generating the dependent matrix Y(33 × 4). The first 25 batches are used for modeling, while the remaining eight are used for model validation.

First, the process duration is partitioned into five main clusters, or phases, from the quality point of view.45,51 Various strategies26,32−35,45,51 have been reported for obtaining phase marks from different viewpoints and on different principles, providing a rich basis for phase division. In the present work, assuming that no prior process knowledge is available, C phases can be readily identified along the time direction using a clustering algorithm.45,51 In the clustering algorithm, 1300 normalized time slices Xk(25 × 12) are first obtained from X̲(25 × 12 × 1300), and the quality variables are normalized as Y(25 × 4). Then 1300 time-slice PLS weight matrices are obtained by performing PLS on the normalized data sets {Xk, Y}. These matrices are weighted using the time-varying variances of the PLS scores and fed into the clustering algorithm, which evaluates the similarity of the time slices and reveals how the process-quality relationship changes along the time direction. Matrices with similar weights share similar quality-related process characteristics and are classified into the same group, which represents one phase. Operation time information is included in the clustering algorithm so that process samples are consecutive within each cluster. The five phases are directly related to process time: the first through fifth phases span samples 1 to 37, 38 to 245, 246 to 574, 575 to 893, and 894 to 1300, respectively. The phase clustering results can also be modified using prior expertise. Here the third and fourth clusters are used as two representative neighboring phases (phases I and II) for between-phase analysis. There are 329 samples in phase I and 319 samples in phase II, forming two data arrays X̲1(25 × 12 × 329) (I = 25, Jx,1 = 12, K1 = 329) and X̲2(25 × 12 × 319) (I = 25, Jx,2 = 12, K2 = 319).

The procedure for MsRA-based modeling is as follows. First, the normalized time slices Xk(25 × 12) and the normalized quality variables Y(25 × 4) are obtained. Then two predictor data blocks Xi(IKi × 12) (i = 1, 2) are prepared by variable unfolding. These two blocks have different numbers of observations, IKi. Correspondingly, the quality data block Yi(IKi × 4) is arranged as described before. Moreover, to remove the influence of different object dimensions, as explained in subsection 2.2, each quality block is scaled as Yi/√Ki.
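A minimal NumPy sketch of the data preparation just described may help fix the array shapes. It assumes that "normalized" means z-scoring each variable across batches at every time point and that the unfolded rows are stacked time slice by time slice; both are our assumptions, not details stated in the text. The helper names `normalize_time_slices` and `unfold_phase` are hypothetical, and random numbers with the case-study dimensions stand in for the injection molding data.

```python
import numpy as np


def normalize_time_slices(X, Y):
    """Z-score every variable across batches at each time point.

    X: (I, J, K) array (batches x process variables x samples).
    Y: (I, Jy) end-of-batch quality matrix.
    Assumption: "normalized" is read as mean-centering and unit-variance
    scaling along the batch direction; the text does not spell this out.
    """
    Xn = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Yn = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    return Xn, Yn


def unfold_phase(Xn, Yn, k_start, k_end):
    """Variable-unfold the time slices k_start..k_end-1 (0-based) of one phase.

    Returns X_i of shape (I*K_i, J) and the paired quality block Y_i of
    shape (I*K_i, Jy), scaled by 1/sqrt(K_i).
    Assumption: slices are stacked in time order, so rows k*I..(k+1)*I-1
    of X_i hold the k-th slice of the phase.
    """
    K_i = k_end - k_start
    X_i = np.vstack([Xn[:, :, k] for k in range(k_start, k_end)])
    # Every slice is paired with the same end-of-batch quality rows.
    Y_i = np.tile(Yn, (K_i, 1)) / np.sqrt(K_i)
    return X_i, Y_i


# Random stand-in with the case-study dimensions (not real process data).
rng = np.random.default_rng(1)
X = rng.normal(size=(25, 12, 1300))
Y = rng.normal(size=(25, 4))
Xn, Yn = normalize_time_slices(X, Y)

# Phases I and II are samples 246-574 and 575-893 (1-based),
# i.e. the 0-based slice ranges [245, 574) and [574, 893).
X1, Y1 = unfold_phase(Xn, Yn, 245, 574)   # K_1 = 329
X2, Y2 = unfold_phase(Xn, Yn, 574, 893)   # K_2 = 319
print(X1.shape, Y1.shape, X2.shape, Y2.shape)
# (8225, 12) (8225, 4) (7975, 12) (7975, 4)
```

The 1/√Ki factor is applied to the quality block only, so each phase's predictor data stay on the normalized scale while the replicated quality rows no longer scale with the phase length.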
The MsRA weights used to calculate the regression scores are shown in Figure 9, together with the sub-PLS weights. Because the MsRA weights are scaled to yield unit-length phase scores, their original values are very small; they are therefore normalized for display. For the first regression score vector, the two algorithms assign different levels of significance to the same process variables. With the sub-PLS algorithm, the second, third, fifth, seventh, and ninth process variables all play large roles in phase I, whereas with the MsRA algorithm the eleventh process variable is dominant in phase I. The MsRA algorithm extracts only two common regression scores in both phases, together with four PCA components in phase I and five in phase II. The sub-PLS algorithm, by contrast, extracts five regression scores in phase I and seven in phase II.

Three faults are considered by adding disturbances to different process variables in different phases. For fault I, the eleventh process variable (barrel temperature) is increased by 50 °C from the 150th to the 250th sample in phase I. The $T_g^2$ monitoring results in phase I (see Figure 10) show that the proposed method clearly detects the disturbance in the common subspace. $T_s^2$ in the specific subspace also triggers out-of-control alarms, although less clearly than $T_g^2$. The $T^2$ statistic of the sub-PLS model also triggers alarms, which is attributed to the mixed influence of common and specific systematic variability. A similar result is observed for fault II (Figure 11), in which the twelfth process variable (mold temperature) is increased by 30 °C from the 20th to the 100th sample in phase II. The first two faults are generated by disturbing the process variables that dominate the subspace common to both phases, according to the weights shown in Figure 9. For the third fault, the ninth process variable (SV1 opening), which dominates the sub-PLS model, is gradually increased by 12% from the 200th sample in phase I to the end of phase II.
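To make the weights in Figure 9 more concrete, the sketch below implements the first MsRA step in its per-set form, as we read Appendix B. The leading eigenvector of $\sum_i \mathbf{Y}_i^T\mathbf{X}_i\mathbf{X}_i^T\mathbf{Y}_i$ gives $\mathbf{v}$ (eq B2), eq B3 gives $\mathbf{a}_i$ and the subscore, each $\mathbf{X}_i$ is deflated as in eq A6, and $\bar{\mathbf{R}}_i = \bar{\mathbf{A}}_i(\bar{\mathbf{P}}_i^T\bar{\mathbf{A}}_i)^{-1}$ (eq B4) then reproduces the subscores directly from the undeflated blocks. It is an illustrative sketch on random data, not the authors' implementation. The function name `msra_first_step` is hypothetical, and the second, correlation-based step and the monitoring statistics are not shown.

```python
import numpy as np


def msra_first_step(Xs, Ys, n_comp):
    """First MsRA step with one quality block per predictor block (eqs B2-B4).

    Xs: list of predictor blocks X_i, each (N_i, J_i); Ys: matching quality
    blocks Y_i, each (N_i, Jy). Returns a list of (R_bar_i, T_bar_i) pairs with
    T_bar_i = X_i @ R_bar_i for the undeflated X_i.
    """
    Xd = [X.copy() for X in Xs]             # working copies that get deflated
    A = [[] for _ in Xs]                    # subweights a_i per component
    T = [[] for _ in Xs]                    # subscores t_i per component
    for _ in range(n_comp):
        C = [X.T @ Y for X, Y in zip(Xd, Ys)]      # X_i^T Y_i, each (J_i, Jy)
        M = sum(c.T @ c for c in C)                # sum_i Y_i^T X_i X_i^T Y_i
        _, vecs = np.linalg.eigh(M)                # symmetric; eigenvalues ascending
        v = vecs[:, -1]                            # leading eigenvector, eq B2
        for i, c in enumerate(C):
            w = c @ v                              # X_i^T Y_i v
            a = w / (w @ w)                        # eq B3, lambda_i = v^T Y_i^T X_i X_i^T Y_i v
            t = Xd[i] @ a                          # subscore from the current (deflated) block
            p = Xd[i].T @ t / (t @ t)              # loading, as in eq A6
            Xd[i] = Xd[i] - np.outer(t, p)         # deflate the predictor block only
            A[i].append(a)
            T[i].append(t)
    out = []
    for X0, a_list, t_list in zip(Xs, A, T):
        A_bar, T_bar = np.column_stack(a_list), np.column_stack(t_list)
        P_bar = np.linalg.solve(T_bar.T @ T_bar, T_bar.T @ X0).T   # (J_i, n_comp)
        R_bar = A_bar @ np.linalg.inv(P_bar.T @ A_bar)             # eq B4
        out.append((R_bar, T_bar))
    return out


# Two random "phases" with different numbers of observations.
rng = np.random.default_rng(0)
Xs = [rng.normal(size=(300, 12)), rng.normal(size=(250, 12))]
Ys = [rng.normal(size=(300, 4)), rng.normal(size=(250, 4))]
results = msra_first_step(Xs, Ys, n_comp=3)
for X0, (R_bar, T_bar) in zip(Xs, results):
    print(R_bar.shape, np.allclose(X0 @ R_bar, T_bar))   # (12, 3) True
```

Because the subscores extracted this way are mutually orthogonal, computing $\bar{\mathbf{P}}_i$ from the undeflated $\mathbf{X}_i$ gives the same loadings as computing them one at a time during deflation, which is why a single weight matrix per block maps the raw unfolded data to its scores.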
As shown in Figure 12, both the MsRA and sub-PLS models can detect the abnormality. Because the change takes place gradually, the influence of the faulty variable is not noticed until the process enters phase II. In general, the proposed monitoring strategy divides the underlying systematic variability of each phase into two subspaces and monitors them separately. With the sub-PLS algorithm, however, the common scores given by the MsRA algorithm may not be extracted as systematic variability, or may be masked by the other systematic variability. The proposed strategy can therefore confidently detect disturbances that occur in the common subspace, whereas the sub-PLS model cannot.

4. CONCLUSIONS
In this paper, a statistical model is developed for between-mode online monitoring based on a multiset regression analysis (MsRA) algorithm. From the viewpoint of between-set analysis, the underlying quality-related systematic variability in each mode or phase can be accurately characterized and then monitored by a separate monitoring system in each subspace. The performance of the proposed method is demonstrated with a numerical case and an experimental case, in which it improves online monitoring. In particular, it offers quality-related insights into process behavior by considering the relationship between two data sets. What makes the MsRA model unique is that it separates, from the quality perspective, the systematic variability common across the data sets from the variability that is not common. The method is expected to suit a broad range of practical applications.

■
APPENDIX A
MsRA Algorithm
A two-step MsRA algorithm is designed to extract the regression scores that are common across data sets. The algorithm finds a different linear combination of the variables in each of the C collective predictor data sets (i.e., the subscores) and makes them all closely related to the same score in the Y space.

Step One. In the first step of MsRA, the cost function and constraints are defined as follows:

$$\max R^2 = \max \sum_{i=1}^{C} \left(\mathbf{v}^T\mathbf{Y}^T\mathbf{X}_i\mathbf{a}_i\right)^2 \quad \text{s.t.} \quad \mathbf{v}^T\mathbf{v} = 1,\; \mathbf{a}_i^T\mathbf{a}_i = 1 \tag{A1}$$

The Y-score $\mathbf{Y}\mathbf{v}$ serves as the global score to which all $\mathbf{X}_i$-scores in the different $\mathbf{X}_i$ spaces are related. The combination coefficient vector $\mathbf{a}_i$ is set to unit length. Therefore, $(\mathbf{v}^T\mathbf{Y}^T\mathbf{X}_i\mathbf{a}_i)^2$ actually models the covariance between the subregression score ($\mathbf{X}_i\mathbf{a}_i$) and the quality score ($\mathbf{Y}\mathbf{v}$). The objective function thus undesirably involves the norms of the subscore and global score vectors rather than a pure correlation analysis. Using Lagrange multipliers, the initial objective function can be expressed as an unconstrained extremum problem:

$$F(\mathbf{v}, \mathbf{a}_i, \lambda_g, \lambda_i) = \sum_{i=1}^{C} \left(\mathbf{v}^T\mathbf{Y}^T\mathbf{X}_i\mathbf{a}_i\right)^2 - \lambda_g\left(\mathbf{v}^T\mathbf{v} - 1\right) - \sum_{i=1}^{C} \lambda_i\left(\mathbf{a}_i^T\mathbf{a}_i - 1\right) \tag{A2}$$

where $\lambda_g$ and $\lambda_i$ are constant scalars. The optimization problem leads to a simple analytical solution:

$$\mathbf{Y}\mathbf{Y}^T\Big(\sum_{i=1}^{C}\mathbf{X}_i\mathbf{X}_i^T\Big)\mathbf{u} = \lambda_g\mathbf{u}, \qquad \mathbf{Q}\mathbf{u} = \lambda_g\mathbf{u} \tag{A3}$$

where $\mathbf{Q} = \mathbf{Y}\mathbf{Y}^T\big(\sum_{i=1}^{C}\mathbf{X}_i\mathbf{X}_i^T\big)$ and $\lambda_g$ is the objective parameter. The associated subweight and subscore vectors in the different data sets can be calculated by

$$\mathbf{a}_i = \frac{1}{\lambda_i}\mathbf{X}_i^T\mathbf{u}, \qquad \bar{\mathbf{t}}_i = \mathbf{X}_i\mathbf{a}_i \tag{A4}$$

where $\lambda_i$ can be calculated by

$$\lambda_i = \mathbf{u}^T\mathbf{X}_i\mathbf{X}_i^T\mathbf{u} \tag{A5}$$

and satisfies $\lambda_g = \sum_{i=1}^{C}\lambda_i$. Before the second common regression subscore is extracted in each data space, the predictor space ($\mathbf{X}_i$) is deflated by the first subscore:

$$\mathbf{p}_i^T = \left(\bar{\mathbf{t}}_i^T\bar{\mathbf{t}}_i\right)^{-1}\bar{\mathbf{t}}_i^T\mathbf{X}_i, \qquad \mathbf{X}_i = \mathbf{X}_i - \bar{\mathbf{t}}_i\mathbf{p}_i^T \tag{A6}$$

Then the procedure is repeated, and each new regression subscore is orthogonal to the previous ones. $\bar{R}$ Y-score vectors are derived, resulting in the same number of mutually orthogonal subscore vectors in each predictor space based on eq A4. The corresponding $\mathbf{a}_i$ vectors are gathered in $\bar{\mathbf{A}}_i(J_{x,i} \times \bar{R})$ and the subscores in $\bar{\mathbf{T}}_i(N \times \bar{R})$. The weights needed to calculate $\bar{\mathbf{T}}_i(N \times \bar{R})$ directly from $\mathbf{X}_i$ can be obtained much as in the PLS algorithm:9

$$\bar{\mathbf{R}}_i = \bar{\mathbf{A}}_i\left(\bar{\mathbf{P}}_i^T\bar{\mathbf{A}}_i\right)^{-1} \tag{A7}$$

where $\bar{\mathbf{P}}_i$ is the loading matrix calculated by $\bar{\mathbf{P}}_i^T = (\bar{\mathbf{T}}_i^T\bar{\mathbf{T}}_i)^{-1}\bar{\mathbf{T}}_i^T\mathbf{X}_i$. However, maximizing the covariance, obtained as the inner product between the global and subscore vectors, does not necessarily imply strong correlation. A higher covariance could result from larger norms of the subscore vectors, so some low-norm scores that reveal Y-related predictor variation common across sets are likely to be overlooked. This may lead to misinterpretation of the true predictor interrelations among sets. It may even produce pseudo-common subscores that are in fact quite different across sets and thus do not resemble the so-called global score ($\mathbf{Y}\mathbf{v}$).

Step Two. Because step one relies on the stacked subset covariances, it may fail to extract score vectors that are truly common across sets. To obtain subscores that are common across sets and closely correlated, a correlation index is used instead of the covariance index. It is applied to the result of the first step, $\bar{\mathbf{T}}_i(N \times \bar{R})$, and a
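constrained optimization problem is formulated as follows:

$$\max R^2 = \max \sum_{i=1}^{C} \left(\mathbf{v}^T\mathbf{Y}^T\bar{\mathbf{T}}_i\mathbf{a}_i\right)^2 \quad \text{s.t.} \quad \mathbf{v}^T\mathbf{Y}^T\mathbf{Y}\mathbf{v} = 1,\; \mathbf{a}_i^T\bar{\mathbf{T}}_i^T\bar{\mathbf{T}}_i\mathbf{a}_i = 1 \tag{A8}$$

Using Lagrange multipliers, the original optimization problem can be rewritten as an unconstrained extremum problem:

$$F(\mathbf{v}, \mathbf{a}_i, \lambda_g, \lambda_i) = \sum_{i=1}^{C} \left(\mathbf{v}^T\mathbf{Y}^T\bar{\mathbf{T}}_i\mathbf{a}_i\right)^2 - \lambda_g\left(\mathbf{v}^T\mathbf{Y}^T\mathbf{Y}\mathbf{v} - 1\right) - \sum_{i=1}^{C} \lambda_i\left(\mathbf{a}_i^T\bar{\mathbf{T}}_i^T\bar{\mathbf{T}}_i\mathbf{a}_i - 1\right) \tag{A9}$$

where $\lambda_g$ and $\lambda_i$ are constant scalars. The optimization problem finally leads to a simple analytical solution:

$$\mathbf{Y}\left(\mathbf{Y}^T\mathbf{Y}\right)^{-1}\mathbf{Y}^T\Big(\sum_{i=1}^{C}\bar{\mathbf{T}}_i\left(\bar{\mathbf{T}}_i^T\bar{\mathbf{T}}_i\right)^{-1}\bar{\mathbf{T}}_i^T\Big)\mathbf{u} = \lambda_g\mathbf{u}, \qquad \mathbf{S}\mathbf{u} = \lambda_g\mathbf{u} \tag{A10}$$

where $\mathbf{S} = \mathbf{Y}(\mathbf{Y}^T\mathbf{Y})^{-1}\mathbf{Y}^T\big(\sum_{i=1}^{C}\bar{\mathbf{T}}_i(\bar{\mathbf{T}}_i^T\bar{\mathbf{T}}_i)^{-1}\bar{\mathbf{T}}_i^T\big)$. The associated subweight and subscore vectors in the different data sets can be calculated by

$$\mathbf{a}_i = \frac{1}{\lambda_i}\left(\bar{\mathbf{T}}_i^T\bar{\mathbf{T}}_i\right)^{-1}\bar{\mathbf{T}}_i^T\mathbf{u}, \qquad \mathbf{t}_i = \bar{\mathbf{T}}_i\mathbf{a}_i \tag{A11}$$

where $\lambda_i$ can be calculated by

$$\lambda_i = \mathbf{u}^T\bar{\mathbf{T}}_i\left(\bar{\mathbf{T}}_i^T\bar{\mathbf{T}}_i\right)^{-1}\bar{\mathbf{T}}_i^T\mathbf{u} \tag{A12}$$

and satisfies $\lambda_g = \sum_{i=1}^{C}\lambda_i$. In turn, the next global Y-score is calculated from the second eigenvector of $\mathbf{S}$ in eq A10, and so on. These global Y-scores form a global score subspace $\mathbf{U}(N \times R)$ and, correspondingly, different predictor subscore spaces $\mathbf{T}_i(N \times R)$ composed of the $\mathbf{t}_i$ vectors. Therefore, the final weights needed to calculate the common subscores directly from the predictor variables are

$$\mathbf{R}_i = \bar{\mathbf{R}}_i\mathbf{A}_i \tag{A13}$$

where $\bar{\mathbf{R}}_i$ is the weight matrix from the first step and $\mathbf{A}_i$ is composed of the weights ($\mathbf{a}_i$) calculated in the second step. It is clear that $\mathbf{P}_i^T\mathbf{R}_i = \mathbf{I}$, where $\mathbf{P}_i$ is the loading matrix calculated in the second step by $\mathbf{P}_i^T = (\mathbf{T}_i^T\mathbf{T}_i)^{-1}\mathbf{T}_i^T\mathbf{X}_i$. Unlike the first step of extraction, the second step excludes the effect of the norms of the subscore vectors and directly maximizes the sum of their correlations. It can be seen as postprocessing of the results from the first step.

■
APPENDIX B
Modified MsRA Algorithm
Different sets of regression pairs $\{\mathbf{X}_i(N_i \times J_{x,i}), \mathbf{Y}_i(N_i \times J_y)\}$ are prepared, where $N_i$ is the number of samples in a data set. The original MsRA algorithm is modified to handle this case.

The algorithm is again performed in two steps. In the first step, the same objective function as in eq A1 is used, except that each quality data set is specific to one predictor data set:

$$\max R^2 = \max \sum_{i=1}^{C} \left(\mathbf{v}^T\mathbf{Y}_i^T\mathbf{X}_i\mathbf{a}_i\right)^2 \quad \text{s.t.} \quad \mathbf{v}^T\mathbf{v} = 1,\; \mathbf{a}_i^T\mathbf{a}_i = 1 \tag{B1}$$

By a similar derivation, the solution in eq A3 is modified as

$$\Big(\sum_{i=1}^{C}\mathbf{Y}_i^T\mathbf{X}_i\mathbf{X}_i^T\mathbf{Y}_i\Big)\mathbf{v} = \lambda_g\mathbf{v} \tag{B2}$$

The associated subweight and subscore vectors in the different predictor data sets can be calculated by

$$\mathbf{a}_i = \frac{1}{\lambda_i}\mathbf{X}_i^T\mathbf{Y}_i\mathbf{v}, \qquad \bar{\mathbf{t}}_i = \mathbf{X}_i\mathbf{a}_i \tag{B3}$$

where $\lambda_i = \mathbf{v}^T\mathbf{Y}_i^T\mathbf{X}_i\mathbf{X}_i^T\mathbf{Y}_i\mathbf{v}$. Then, deflation is performed before the subsequent regression subscores are calculated, and $\bar{R}$ Y-score vectors can be derived. The corresponding $\mathbf{a}_i$ vectors are gathered in $\bar{\mathbf{A}}_i(J_{x,i} \times \bar{R})$ and the subscores in $\bar{\mathbf{T}}_i(N_i \times \bar{R})$. The weights needed to calculate $\bar{\mathbf{T}}_i(N_i \times \bar{R})$ directly from $\mathbf{X}_i$ can be formulated much as in the PLS algorithm:9

$$\bar{\mathbf{R}}_i = \bar{\mathbf{A}}_i\left(\bar{\mathbf{P}}_i^T\bar{\mathbf{A}}_i\right)^{-1} \tag{B4}$$

where $\bar{\mathbf{P}}_i$ is the loading matrix calculated by $\bar{\mathbf{P}}_i^T = (\bar{\mathbf{T}}_i^T\bar{\mathbf{T}}_i)^{-1}\bar{\mathbf{T}}_i^T\mathbf{X}_i$. In the second step, the same objective function as in eq A8 is used, except that each quality data set is specific to one predictor data set:

$$\max R^2 = \max \sum_{i=1}^{C} \left(\mathbf{v}^T\mathbf{Y}_i^T\bar{\mathbf{T}}_i\mathbf{a}_i\right)^2 \quad \text{s.t.} \quad \mathbf{v}^T\mathbf{Y}_i^T\mathbf{Y}_i\mathbf{v} = 1,\; \mathbf{a}_i^T\bar{\mathbf{T}}_i^T\bar{\mathbf{T}}_i\mathbf{a}_i = 1 \tag{B5}$$

By a similar derivation, the solution in eq A10 is modified as

$$\tilde{\mathbf{Y}}\Big(\sum_{i=1}^{C}\mathbf{Y}_i^T\mathbf{Y}_i\Big)^{-1}\Big(\sum_{i=1}^{C}\mathbf{Y}_i^T\bar{\mathbf{T}}_i\left(\bar{\mathbf{T}}_i^T\bar{\mathbf{T}}_i\right)^{-1}\bar{\mathbf{T}}_i^T\mathbf{Y}_i\Big)\mathbf{v} = \lambda_g\tilde{\mathbf{Y}}\mathbf{v} \tag{B6}$$

where $\tilde{\mathbf{Y}}$ is the $\big(\sum_{i=1}^{C}N_i\big) \times J_y$ joint quality matrix obtained by arranging the quality data sets ($\mathbf{Y}_i$) variablewise one after another. The $\mathbf{Y}_i$-score $\mathbf{u}_i$ is calculated as $\mathbf{Y}_i\mathbf{v}$. The subweight and subscore for each predictor data set ($\mathbf{X}_i$) can be calculated by

$$\mathbf{a}_i = \frac{1}{\lambda_i}\left(\bar{\mathbf{T}}_i^T\bar{\mathbf{T}}_i\right)^{-1}\bar{\mathbf{T}}_i^T\mathbf{u}_i, \qquad \mathbf{t}_i = \bar{\mathbf{T}}_i\mathbf{a}_i \tag{B7}$$

where the parameter $\lambda_i$ is calculated by $\lambda_i = \mathbf{v}^T\mathbf{Y}_i^T\bar{\mathbf{T}}_i(\bar{\mathbf{T}}_i^T\bar{\mathbf{T}}_i)^{-1}\bar{\mathbf{T}}_i^T\mathbf{Y}_i\mathbf{v}$. Then, the subscore is normalized and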
used to deflate $\mathbf{X}_i$ for the calculation of subsequent subscores. These subscores form different predictor subscore spaces $\mathbf{T}_i(N_i \times R)$. The corresponding $\mathbf{a}_i$ vectors are gathered in $\mathbf{A}_i(\bar{R} \times R)$. Therefore, the final weights needed to calculate the common subscores directly from the process variables are

$$\mathbf{R}_i = \bar{\mathbf{R}}_i\mathbf{A}_i \tag{B8}$$

It is clear that $\mathbf{P}_i^T\mathbf{R}_i = \mathbf{I}$, where $\mathbf{P}_i$ is the loading matrix calculated in the second step by $\mathbf{P}_i^T = (\mathbf{T}_i^T\mathbf{T}_i)^{-1}\mathbf{T}_i^T\mathbf{X}_i$.

■
AUTHOR INFORMATION
Corresponding Author
*Tel.: 86-571-87951879. Fax: 86-571-87951879.
Notes
The authors declare no competing financial interest.

■
ACKNOWLEDGMENTS
This work is supported by the "Fundamental Research Funds for the Central Universities (2012QNA5012)" and the National Program on Key Basic Research Project (973 Program) under grant 2012CB720505.

■
REFERENCES
(1) Wold, S.; Esbensen, K.; Geladi, P. Principal component analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37.
(2) Dunteman, G. H. Principal Component Analysis; SAGE Publications Ltd.: London, 1989.
(3) Jackson, J. E. A User's Guide to Principal Components; Wiley: New York, 1991.
(4) Wang, X. Z.; Li, R. F. Combining conceptual clustering and principal component analysis for state space based process monitoring. Ind. Eng. Chem. Res. 1999, 38, 4345.
(5) Martin, E. B.; Morris, A. J. Enhanced bio-manufacturing through advanced multivariate statistical technologies. J. Biotechnol. 2002, 99, 223.
(6) Qin, S. J. Statistical process monitoring: Basics and beyond. J. Chemom. 2003, 17, 480.
(7) Martens, H.; Naes, T. Multivariate Calibration, 2nd ed.; Wiley: Chichester, 1994.
(8) Burnham, A. J.; Viveros, R.; MacGregor, J. F. Frameworks for latent variable multivariate regression. J. Chemom. 1996, 10, 31.
(9) Dayal, B. S.; MacGregor, J. F. Improved PLS algorithms. J. Chemom. 1997, 11, 73.
(10) Brereton, R. G. Introduction to multivariate calibration in analytical chemistry. Analyst 2000, 125, 2125.
(11) Kleinbaum, D. G.; Kupper, L. L.; Muller, K. E.; Nizam, A. Applied Regression Analysis and Other Multivariable Methods, 3rd ed.; Wadsworth Publishing Co. Inc.: CA, 2003.
(12) Ergon, R. Reduced PCR/PLSR models by subspace projections. Chemom. Intell. Lab. Syst. 2006, 81, 68.
(13) Cserhati, T.; Kosa, A.; Balogh, S. Comparison of partial least-square method and canonical correlation analysis in a quantitative structure-retention relationship study. J. Biochem. Biophys. Methods 1998, 36, 131.
(14) Anderson, T. W. Canonical correlation analysis and reduced rank regression in autoregressive models. Ann. Stat. 2002, 30, 1134.
(15) Hardoon, D. R.; Szedmak, S.; Shawe-Taylor, J. Canonical correlation analysis: An overview with application to learning methods. Neural Comput. 2004, 16, 2639.
(16) Yu, H. L.; MacGregor, J. F. Post processing methods (PLS-CCA): Simple alternatives to preprocessing methods (OSC-PLS). Chemom. Intell. Lab. Syst. 2004, 73, 199.
(17) Nomikos, P.; MacGregor, J. F. Monitoring of batch processes using multiway principal component analysis. AIChE J. 1994, 40, 1361.
(18) Nomikos, P.; MacGregor, J. F. Multivariate SPC charts for monitoring batch processes. Technometrics 1995, 37, 41.
(19) Nomikos, P.; MacGregor, J. F. Multiway partial least squares in monitoring batch processes. Chemom. Intell. Lab. Syst. 1995, 30, 97.
(20) van Sprang, E. N. M.; Ramaker, H.-J.; Westerhuis, J. A.; Gurden, S. P.; Smilde, A. K. Critical evaluation of approaches for on-line batch process monitoring. Chem. Eng. Sci. 2002, 57, 3979.
(21) Ündey, C.; Ertunc, C. S.; Cinar, A. Online Batch/Fed-Batch Process Performance Monitoring, Quality Prediction, and Variable-Contribution Analysis for Diagnosis. Ind. Eng. Chem. Res. 2003, 42, 4645.
(22) Lee, J.-M.; Yoo, C. K.; Lee, I.-B. Enhanced process monitoring of fed-batch penicillin cultivation using time-varying and multivariate statistical analysis. J. Biotechnol. 2004, 110, 119.
(23) Hwang, D. H.; Han, C. H. Real-time monitoring for a process with multiple operation modes. Control Eng. Pract. 1999, 7, 891.
(24) Lane, S.; Martin, E. B.; Kooijmans, R.; Morris, A. J. Performance monitoring of a multi-product semi-batch process. J. Process Control 2001, 11, 1.
(25) Liu, J. L. Process monitoring using Bayesian classification on PCA subspace. Ind. Eng. Chem. Res. 2004, 43, 7815.
(26) Lu, N. Y.; Gao, F. R.; Wang, F. L. A sub-PCA modeling and online monitoring strategy for batch processes. AIChE J. 2004, 50, 255.
(27) Zhao, S. J.; Zhang, J.; Xu, Y. M. Monitoring of processes with multiple operating modes through multiple principle component analysis models. Ind. Eng. Chem. Res. 2004, 43, 7025.
(28) Lee, Y. H.; Jin, H. D.; Han, C. H. On-line process state classification for adaptive monitoring. Ind. Eng. Chem. Res. 2006, 45, 3095.
(29) Zhao, S. J.; Zhang, J.; Xu, Y. M. Performance monitoring of processes with multiple operating modes through multiple PLS models. J. Process Control 2006, 16, 763.
(30) Yoo, C. K.; Villez, K.; Lee, I.; Rosen, C.; Vanrolleghem, P. A. Multi-model statistical process monitoring and diagnosis of a sequencing batch reactor. Biotechnol. Bioeng. 2007, 96, 687.
(31) Yuan, B.; Wang, X. Z. Multilevel PCA and inductive learning for knowledge extraction from operational data of batch processes. Chem. Eng. Commun. 2001, 185, 201.
(32) Ündey, C.; Cinar, A. Statistical monitoring of multistage, multiphase batch processes. IEEE Control Syst. Mag. 2002, 22, 40.
(33) Zhao, C. H.; Wang, F. L.; Mao, Z. Z.; Lu, N. Y.; Jia, M. X. Improved batch process monitoring and quality prediction based on multiphase statistical analysis. Ind. Eng. Chem. Res. 2008, 47, 835.
(34) Camacho, J.; Picó, J. Online monitoring of batch processes using multi-phase principal component analysis. J. Process Control 2006, 16, 1021.
(35) Camacho, J.; Picó, J. Multi-phase analysis framework for handling batch processes data. J. Chemom. 2008, 22, 632.
(36) MacGregor, J. F.; Jaeckle, C.; Kiparissides, C.; Koutoudi, M. Process monitoring and diagnosis by multiblock PLS methods. AIChE J. 1994, 40, 826.
(37) Kourti, T.; Nomikos, P.; MacGregor, J. F. Analysis, monitoring and fault-diagnosis of batch processes using multiblock and multiway PLS. J. Process Control 1995, 5, 277.
(38) Westerhuis, J. A.; Kourti, T.; MacGregor, J. F. Analysis of multiblock and hierarchical PCA and PLS models. J. Chemom. 1998, 12, 301.
(39) Qin, S. J.; Valle, S.; Piovoso, M. J. On unifying multiblock analysis with application to decentralized process monitoring. J. Chemom. 2001, 15, 715.
(40) Zhao, C. H.; Gao, F. R. Multiblock-Based Qualitative and Quantitative Spectral Calibration Analysis. Ind. Eng. Chem. Res. 2010, 49, 8694.
(41) Liu, J.; Wong, D. S. H. Fault detection and classification for a two-stage batch process. J. Chemom. 2008, 22, 385−398.
(42) Zhao, C. H.; Gao, F. R. A Two-step Basis Vector Extraction Strategy for Multiset Variable Correlation Analysis. Chemom. Intell. Lab. Syst. 2011, 107 (1), 147−154.
(43) Zhao, C. H.; Yao, Y.; Gao, F. R.; Wang, F. L. Statistical Analysis and Online Monitoring for Multimode Processes with Between-mode Transitions. Chem. Eng. Sci. 2010, 65 (22), 5961−5975.
(44) Zhao, C. H.; Gao, F. R. A Two-step Multiset Regression Analysis (MsRA) Algorithm. Ind. Eng. Chem. Res. 2012, 51 (3), 1337−1354.
(45) Zhao, C. H.; Wang, F. L.; Mao, Z. Z.; Lu, N. Y.; Jia, M. X. Improved Knowledge Extraction and Phase-based Quality Prediction for Batch Processes. Ind. Eng. Chem. Res. 2008, 47 (3), 825−834.
(46) Lowry, C. A.; Montgomery, D. C. A review of multivariate control charts. IIE Trans. 1995, 27, 800−810.
(47) Box, G. E. P. Some Theorems on Quadratic Forms Applied in the Study of Analysis of Variance Problems, I. Effect of Inequality of Variance in the One-Way Classification. Ann. Math. Stat. 1954, 25, 290.
(48) Jackson, J. E.; Mudholkar, G. S. Control Procedures for Residuals Associated with Principal Component Analysis. Technometrics 1979, 21, 341.
(49) Pham, H. Springer Handbook of Engineering Statistics; Springer-Verlag London Limited: London, 2006.
(50) Yang, Y.; Gao, F. Cycle-to-cycle and within-cycle adaptive control of nozzle pressures during packing-holding for thermoplastic injection molding. Polym. Eng. Sci. 1999, 39, 2042.
(51) Zhao, C. H.; Gao, F. R. Multiphase Calibration Modeling and Quality Interpretation by Priority Sorting. Chem. Eng. Sci. 2011, 66, 5400.