Just-in-Time Selection of Principal Components for Fault Detection

Equipment and Control Engineering, Zhejiang University of Technology, Hangzhou 310014, China. Ind. Eng. Chem. Res. , Article ASAP. DOI: 10.1021/ac...
Article Cite This: Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

pubs.acs.org/IECR

Just-in-Time Selection of Principal Components for Fault Detection: The Criteria Based on Principal Component Contributions to the Sample Mahalanobis Distance

Lijia Luo,* Shiyi Bao, Jianfeng Mao, and Di Tang

Institute of Process Equipment and Control Engineering, Zhejiang University of Technology, Hangzhou 310014, China

ABSTRACT: Principal component analysis (PCA) has been widely used in the field of fault detection. A main difficulty in using PCA is the selection of principal components (PCs). Different PC selection criteria have been developed in the past, but most of them do not connect the selection of PCs with fault detection. The selected PCs may be optimal for data modeling but not for fault detection. In this paper, the just-in-time cumulative percent contribution (JITCPC) criterion and the just-in-time contribution quantile (JITCQ) criterion are proposed to select PCs from the viewpoint of fault detection. In the JITCPC and JITCQ criteria, the contributions of PCs to the sample Mahalanobis distance are used to evaluate the importance of PCs to fault detection. The larger the contribution a PC makes, the more important it is for detecting a fault. The JITCPC criterion selects the leading PCs with the cumulative percent contribution (CPC) larger than a predefined threshold (e.g., 90%). The JITCQ criterion selects the PCs with contributions larger than a quantile (e.g., median) of contributions of all PCs. The PCs selected by the JITCPC or JITCQ criterion vary with samples to guarantee that the key features of each sample are captured. The selected and nonselected PCs are used to define the primary and secondary T² statistics, respectively. A fault detection method is then proposed. The effectiveness and advantages of the proposed PC selection criteria and the fault detection method are illustrated by case studies in a simulation example and in an industrial process.

1. INTRODUCTION

The safe production of high quality products is one of the main objectives of industrial processes. To meet increasing demands for higher quality products and increasingly stringent process safety regulations, control engineering techniques have been widely applied in modern industrial processes. Controllers are designed to keep the process running smoothly and safely by compensating for the effect of disturbances occurring within the process. Although process controllers can overcome many types of disturbances, there are special events which the controllers cannot handle adequately. These special events are referred to as process faults that may be caused by process parameter changes, equipment malfunctions, actuator or sensor failures, improper operations, and abnormal disturbances. Process faults can lead to out of control behaviors, product deterioration, performance degradation, and even damage to the process equipment itself or to human health. The quick and accurate detection of these process faults is helpful for minimizing downtime, increasing the operation safety, avoiding equipment damage, and reducing economic loss. Therefore, it is necessary to develop effective fault detection techniques to detect process faults as early and as accurately as possible.

In the last few decades, data-driven fault detection methods have been widely investigated.1−6 These methods often use multivariable statistical analysis techniques to analyze process data for building a monitoring model which describes the major relations among process variables. When a fault appears, it may change the inherent variable relations and this could be detected. Principal component analysis (PCA) is one of the most popular multivariable statistical analysis techniques for developing the data-driven fault detection methods. PCA is an optimal dimensionality reduction technique in terms of capturing the variance of data. Principal components (PCs) extracted from the data by PCA are a set of linear combinations of measured variables, and they account for correlations among variables. After implementing dimensionality reduction on the data, the PCA divides the measurement space into two orthogonal subspaces: a principal component subspace (PCS) consisting of the leading PCs, and a residual subspace (RS) consisting of the remaining PCs. The T² and squared prediction error (SPE) statistics1 are then defined in the PCS and RS respectively. If faults arise in the process operation, the T² and SPE statistics can be used to detect them.

Received: September 16, 2017
Revised: December 25, 2017
Accepted: February 26, 2018
Published: February 27, 2018

DOI: 10.1021/acs.iecr.7b03840 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research

Figure 1. Fault detection results of two PCs for all test samples. (a) Scree plot, (b) PC1 and PC2, (c) PC3 and PC4, (d) PC2 and PC3.

One of the main difficulties in using PCA for fault detection is choosing appropriate PCs for the PCS. In conventional PCA-based fault detection methods, the PCs corresponding to the q largest eigenvalues are often retained in the PCS. The optimal number (q) of retained PCs can be determined by different PC selection methods, such as cumulative percent variance (CPV),7 scree test on residual percent variance (RPV),8 average eigenvalue (AE),9 cross validation based on the predicted error sum of squares or R ratio,10−13 variance of the reconstruction error (VRE),14,15 Akaike information criterion (AIC),16 and imbedded error function (IEF).17 The above PC selection methods choose PCs in the order of largest to smallest eigenvalues, under the assumption that PCs corresponding to larger eigenvalues are more important than the PCs associated with smaller eigenvalues. This assumption is reasonable from the viewpoint of data modeling, since a PC corresponding to a larger eigenvalue captures a larger part of data information. However, it may not be reasonable from the viewpoint of fault detection, because faulty data are different from the normal data used to compute the PCs, and thus the main fault information may be captured by the PCs with smaller eigenvalues rather than the PCs with larger eigenvalues. In other words, the PCs selected on the basis of eigenvalues may not be optimal for revealing the largest differences between faulty data and normal data. Moreover, in conventional PC selection methods, the selected PCs do not vary with samples; namely, the same group of PCs is used for all samples. This is not suitable for fault detection, because faults can occur in any PCs, not only in the selected ones, and the effects of faults on the PCs vary with samples. A fixed group of PCs cannot capture the important features of all faulty samples. Due to the aforementioned drawbacks, the conventional PC selection methods are not suitable for use in fault detection. Therefore, it is necessary to develop more effective selection methods to choose optimal PCs for fault detection.

In this paper, new PC selection methods are proposed from the viewpoint of fault detection. The PC contributions to the sample Mahalanobis distance are used to evaluate the importance of PCs for fault detection. The PCs with larger contributions are more important for detecting faults. To choose an appropriate number of PCs for fault detection, two selection criteria named the just-in-time cumulative percent contribution (JITCPC) criterion and the just-in-time contribution quantile (JITCQ) criterion are proposed. In the JITCPC criterion, after sorting all PCs in descending order of contributions, the leading PCs with the cumulative percent contribution (CPC) larger than a predefined threshold (e.g., 90%) are selected. The JITCQ criterion selects the PCs with contributions larger than a quantile (e.g., median) of contributions of all PCs. The JITCPC and JITCQ criteria carry out the PC selection for each sample separately, and the selected PCs vary with samples to guarantee that the important features of each sample can be captured. The selected and nonselected PCs are used to define the primary and secondary T² statistics, respectively, for detecting faults. The implementation, effectiveness, and advantages of the proposed PC selection criteria and the fault detection method are illustrated by applications to a simulation example and to the Tennessee Eastman process.

The rest of the paper is organized as follows. Section 2 uses a motivating example to illustrate the need for developing effective PC selection criteria for fault detection, and then the JITCPC and JITCQ selection criteria are presented. Section 3 describes the fault detection method based on the JITCPC and JITCQ criteria. Section 4 demonstrates the effectiveness and advantages of the proposed methods using two case studies in a simulation example and in the Tennessee Eastman process. A brief summary and the conclusions are given in Section 5.

2. CRITERIA FOR SELECTING PRINCIPAL COMPONENTS FOR FAULT DETECTION

2.1. A Motivating Example. A motivating example is shown to illustrate the need for effective selection criteria which can choose appropriate principal components to improve the fault detection performance.

Example. Consider a system with four variables and three factors

$$
\begin{aligned}
x_1 &= 0.5 + 2t_1 \\
x_2 &= 1.5 - 0.5t_2 \\
x_3 &= 0.5t_1 + 0.02t_2 \\
x_4 &= 1.0 + 40(t_2 - t_1) + 10t_3
\end{aligned}
\tag{1}
$$

where t1, t2, and t3 are three mutually independent factors that follow the normal distribution with zero mean and unit standard deviation, i.e., ti ∼ N(0,1). A training data set X containing 1000 samples was generated by eq 1, and these data represent the normal conditions. A test data set Y containing 20 samples was generated using the following functions

$$
\begin{aligned}
y_1 &= 0.5 + 2k_1 \\
y_2 &= 1.5 - 0.5k_2 \\
y_3 &= 0.5k_1 + 0.02k_2 \\
y_4 &= 1.0 + 40(k_2 - k_1) + 11k_3 + 12
\end{aligned}
\tag{2}
$$

where k1, k2, and k3 are mutually independent factors that follow the normal distributions: k1 ∼ N(2,1), k2 ∼ N(1.5,1), and k3 ∼ N(2,1). The samples in Y are faulty relative to X, because they do not fit well with the system functions in eq 1. The aim of fault detection is to use the training data X to detect all faulty samples in Y.

Carrying out PCA on X produces four principal components (PCs). The eigenvalues corresponding to the four PCs are shown in the scree plot in Figure 1a. Note that the first two PCs explain about 99.6% of the variance of the data set X. Consider the problem of choosing appropriate PCs for detecting the faulty samples in Y. When using traditional methods (e.g., cumulative percent variance,7 scree test on residual percent variance,8 average eigenvalue,9 and cross validation based on the predicted error sum of squares10−13) to select the PCs to be retained in the principal component subspace, the first two PCs will be chosen, and thus the remaining two PCs constitute the residual subspace. Figure 1b and c show the fault detection results when using the first two PCs and the last two PCs, where the red circle represents the 99% control limit of the Mahalanobis distances (i.e., T² statistics) of training samples in X. In Figure 1b, only six samples (3, 9, 10, 13, 14, and 18) in Y are detected to be faulty because these samples are outside the control limit. Sixteen faulty samples are detected in Figure 1c, with the exception of four samples (2, 6, 12, and 15). Samples 2, 6, 12, and 15 are detected in neither Figure 1b nor Figure 1c. This means that even if using the T² statistic defined in the principal component subspace (i.e., PC1 and PC2) together with the SPE statistic defined in the residual subspace (i.e., PC3 and PC4), we still cannot detect all faulty samples in Y. However, all faulty samples in Y are successfully detected when using PC2 and PC3, as shown in Figure 1d.

The above example demonstrates that the selection of PCs has a great effect on the fault detection results. Most traditional PC selection methods may not choose appropriate PCs for fault detection, because they do not connect the selection of PCs with fault detection. They tend to choose the PCs which can capture most of the variance of normal data, whereas the characteristics of faulty data are ignored. Since faults can occur in any PCs, the PCs selected only according to the normal data may not be optimal for revealing the differences between faulty data and normal data. This reduces the fault detection performance. Besides, when using the traditional PC selection methods, the selected PCs do not vary with samples. This is not suitable for fault detection, because faults occur not just in the selected PCs but in any PCs for different faulty samples. To capture the key features of each faulty sample, the selected PCs should vary with samples. Therefore, effective selection criteria are urgently needed to choose optimal PCs from the viewpoint of fault detection.

2.2. Selection Criteria Based on PC Contributions to the Sample Mahalanobis Distance. Let X denote a training data set containing n samples of m variables. A total of m PCs are obtained by carrying out PCA on the training data set X. The Mahalanobis distance (MD) of a sample x can be defined in terms of the PCA scores and eigenvalues

$$
\begin{aligned}
\mathrm{MD}_x &= (x - x_0)^T S^{-1} (x - x_0) \\
&= (x - x_0)^T P P^T S^{-1} P P^T (x - x_0) \\
&= (t - t_0)^T \Lambda^{-1} (t - t_0) \\
&= \sum_{i=1}^{m} \lambda_i^{-1} (t_i - t_{0,i})^2 \\
&= \sum_{i=1}^{m} \lambda_i^{-1} (x - x_0)^T p_i p_i^T (x - x_0)
\end{aligned}
\tag{3}
$$

where x0 is the mean of n training samples, S is the covariance matrix of X, P = [p1, p2, ···, pm] (PPᵀ = PᵀP = I) is the loading matrix of PCA, t = [t1, t2, ···, tm]ᵀ = Pᵀx is the score vector of x, t0 is the score vector of x0, pi is the ith loading vector, and Λ = diag(λ1, λ2, ···, λm) = PᵀSP is a diagonal matrix consisting of the PCA eigenvalues λ1, λ2, ···, λm. Note that the covariance matrix S is assumed to be nonsingular (when S is a singular matrix, it can be replaced by Ŝ = S + εI with ε being a very small positive number). The MD provides a measure of the variation within the sample x relative to the variation within the training data. A larger MD indicates a larger deviation from the training data. A sample is considered to be faulty if its MD exceeds the normal limit. Based on the decomposition of MD in eq 3, the contribution of the ith PC to the MD of the sample x can be defined as

$$
c_{i,x} = \lambda_i^{-1} (x - x_0)^T p_i p_i^T (x - x_0)
\tag{4}
$$
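The decomposition in eqs 3 and 4 can be checked numerically on the motivating example. The following is a minimal sketch, not the authors' implementation: the helper names (`simulate`, `pc_contributions`), the random seed, and the value of ε are assumptions made for illustration. Because x3 in eq 1 is an exact linear combination of x1 and x2, the covariance matrix is singular here, so the sketch uses S + εI as the text suggests.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

def simulate(n, means=(0.0, 0.0, 0.0), gain3=10.0, offset4=0.0):
    """Draw n samples of the four-variable system; the defaults reproduce eq 1."""
    f1, f2, f3 = (rng.normal(m_i, 1.0, n) for m_i in means)
    return np.column_stack([
        0.5 + 2.0 * f1,
        1.5 - 0.5 * f2,
        0.5 * f1 + 0.02 * f2,
        1.0 + 40.0 * (f2 - f1) + gain3 * f3 + offset4,
    ])

X = simulate(1000)                                                 # training data, eq 1
Y = simulate(20, means=(2.0, 1.5, 2.0), gain3=11.0, offset4=12.0)  # test data, eq 2

# Normalize both sets with the training mean and standard deviation.
mu, sigma = X.mean(axis=0), X.std(axis=0, ddof=1)
Xn, Yn = (X - mu) / sigma, (Y - mu) / sigma

eps = 1e-6                                   # assumed value of the small constant
S = np.cov(Xn, rowvar=False) + eps * np.eye(Xn.shape[1])
x0 = Xn.mean(axis=0)

eigvals, P = np.linalg.eigh(S)               # eigh returns ascending eigenvalues
eigvals, P = eigvals[::-1], P[:, ::-1]       # reorder so PC1 has the largest one

def pc_contributions(x):
    """Contribution of every PC to the Mahalanobis distance of x (eq 4)."""
    scores = P.T @ (x - x0)                  # t - t0
    return scores**2 / eigvals

c = pc_contributions(Yn[0])
md = (Yn[0] - x0) @ np.linalg.solve(S, Yn[0] - x0)  # first line of eq 3
print("contributions:", np.round(c, 3))
print("sum of contributions:", c.sum(), "MD:", md)
```

The two printed totals should agree up to rounding, which is the content of eq 3: the contributions of all m PCs add up to the MD of the sample.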

Note that the contribution of the ith PC is proportional to the inverse of the ith PCA eigenvalue. This implies that PCs with smaller eigenvalues may make larger contributions than PCs with larger eigenvalues, and thus they are more important for fault detection. Therefore, traditional PC selection methods, which often place more importance on the PCs with larger eigenvalues, are unsuitable for use in fault detection. The PCs which make larger contributions are crucial for fault detection, because they are the main factors causing the MD to exceed the normal limit. From the viewpoint of fault detection, choosing the PCs with larger contributions is the priority. The remaining problem is to determine how many PCs should be selected for each sample. In this paper, the following two just-in-time selection criteria are proposed for determining the number of selected PCs for each sample:

(1) The just-in-time cumulative percent contribution (JITCPC) criterion: After sorting the PC contributions to the MD of the sample x in descending order, that is, c1,x ≥ c2,x ≥ ··· ≥ cm,x, the cumulative percent contribution (CPC) made by the first l PCs is defined as

$$
\mathrm{CPC}_x(l) = 100 \, \frac{\sum_{j=1}^{l} c_{j,x}}{\sum_{i=1}^{m} c_{i,x}} \, \%
\tag{5}
$$

The first $l_x^*$ PCs that reach a desired CPC (e.g., CPC ≥ 90%) are selected for the sample x.

(2) The just-in-time contribution quantile (JITCQ) criterion: For a sample x, the PCs with contributions larger than or equal to a threshold $c_x^q$ are selected, where $c_x^q$ is the q (0 < q < 1, e.g., 0.5) quantile of contributions of all PCs.

Note that the above two selection criteria should be applied to each sample separately, because the PCs may contribute very differently to different samples. Consequently, different PCs may be selected for different samples, no matter which selection criterion is used. However, when using traditional PC selection methods (e.g., CPV, AE, or RPV) based on PCA eigenvalues, the same group of PCs is selected for all samples. This is a main difference between selecting PCs according to PC contributions and according to PCA eigenvalues. When using the JITCPC criterion, the number of selected PCs may also vary with samples, but the selected PCs have similar CPCs for all samples. On the contrary, when using the JITCQ criterion, the same number of PCs is selected for all samples, but the CPC of the selected PCs may vary with samples.

3. FAULT DETECTION USING THE PROPOSED PC SELECTION CRITERIA

3.1. Primary and Secondary T² Statistics. Let X be a training data set consisting of n samples and m variables. Carrying out PCA on X produces m PCs. For a sample x, the contributions of m PCs to the MD can be computed via eq 4. Then, the PCs which are crucial for fault detection are selected using the JITCPC or JITCQ criterion. A primary T² statistic is defined as the summation of contributions of all selected PCs

$$
T_{x,p}^2 = \sum_{i=1}^{l_x} \lambda_i^{-1} (x - x_0)^T p_i p_i^T (x - x_0)
\tag{6}
$$

where lx is the number of PCs selected for the sample x. In addition, a secondary T² statistic is defined as the summation of contributions of all nonselected PCs

$$
T_{x,s}^2 = \sum_{j=1}^{h_x} \lambda_j^{-1} (x - x_0)^T p_j p_j^T (x - x_0)
\tag{7}
$$

where hx = m − lx is the number of nonselected PCs for the sample x. The primary and secondary T² statistics are two mutually complementary statistics that monitor the data variation captured by the selected and nonselected PCs, respectively.

To evaluate when large values of the above two T² statistics are significant for determining the occurrence of faults, control limits of the two T² statistics are required. Because the distributions of the two T² statistics are unknown, kernel density estimation (KDE) is applied to determine their control limits. The primary T² statistics of n training samples in X constitute a univariate set $\Omega = \{T_{x_1,p}^2, T_{x_2,p}^2, \ldots, T_{x_n,p}^2\}$. The probability density function for the primary T² statistic may be estimated from Ω using KDE18

$$
\hat{f}(T_{x,p}^2) = \frac{1}{n\theta} \sum_{i=1}^{n} K\!\left( \frac{T_{x,p}^2 - T_{x_i,p}^2}{\theta} \right)
\tag{8}
$$

where K(·) is a kernel function, and θ is a bandwidth parameter. The control limit, $T_{c,p}^2$, of the primary T² statistic at significance level α is thus obtained from

$$
\int_{-\infty}^{T_{c,p}^2} \hat{f}(T_{x,p}^2) \, dT_{x,p}^2 = 1 - \alpha
\tag{9}
$$

The control limit, $T_{c,s}^2$, of the secondary T² statistic can also be obtained by KDE in a similar way. A test sample is considered to be faulty (with (1 − α)·100% confidence) if either its primary or secondary T² statistic exceeds the corresponding control limit.

Figure 2. Fault detection procedure using the proposed PC selection criteria.

3.2. Fault Detection Procedure. A two-stage fault detection procedure is summarized as follows (see Figure 2).

Stage I: Offline data analysis
(1) Normalize the training data set X to zero mean and unit variance.
(2) Apply PCA on X to produce PCs.
(3) Use the JITCPC or JITCQ criterion to select PCs for each training sample.
(4) Compute primary and secondary T² statistics for each training sample via eq 6 and eq 7.
(5) Determine control limits for the primary and secondary T² statistics by KDE.

Stage II: Online fault detection
For each new sample, xnew:
(1) Normalize xnew using the mean and variance of training data.
(2) Project xnew onto all PCs obtained from the training data.
(3) Compute contributions of PCs to the Mahalanobis distance (MD) of xnew.
(4) Use the JITCPC or JITCQ criterion to select PCs for xnew.
(5) Compute the primary and secondary T² statistics for xnew via eq 6 and eq 7.
(6) Compare the two T² statistics to the control limits. If either of them exceeds its control limit, detect a fault.
(7) Return to step (1) and monitor the next new sample.
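The two stages above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names are hypothetical, the kernel K(·) is taken to be Gaussian, the bandwidth θ follows a common rule of thumb (the text does not specify either choice), eq 9 is solved by bisection on the KDE cumulative distribution, and the demonstration data are synthetic.

```python
import math
import numpy as np

def select_jitcpc(c, threshold=90.0):
    """JITCPC: leading PCs (by contribution) whose CPC first reaches the threshold."""
    order = np.argsort(c)[::-1]                       # descending contributions
    cpc = 100.0 * np.cumsum(c[order]) / c.sum()       # eq 5
    n_sel = min(int(np.searchsorted(cpc, threshold)) + 1, c.size)
    return order[:n_sel]

def select_jitcq(c, q=0.5):
    """JITCQ: PCs whose contribution is at least the q-quantile of all contributions."""
    return np.flatnonzero(c >= np.quantile(c, q))

def t2_pair(c, selected):
    """Primary (eq 6) and secondary (eq 7) T^2 statistics of one sample."""
    mask = np.zeros(c.size, dtype=bool)
    mask[selected] = True
    return c[mask].sum(), c[~mask].sum()

_phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def kde_limit(values, alpha=0.01):
    """Control limit from eq 9 for a Gaussian-kernel KDE (eq 8); kernel and bandwidth are assumptions."""
    values = np.asarray(values, dtype=float)
    theta = max(1.06 * values.std(ddof=1) * values.size ** (-0.2), 1e-12)
    cdf = lambda t: _phi((t - values) / theta).mean()  # integral of eq 8 up to t
    lo, hi = values.min() - 10 * theta, values.max() + 10 * theta
    for _ in range(60):                                # bisection on the KDE CDF
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf(mid) < 1.0 - alpha else (lo, mid)
    return hi

def statistics(model, x):
    """Stage II steps (1)-(5): normalize, project, contributions, selection, T^2 pair."""
    scores = model["P"].T @ ((x - model["mu"]) / model["sigma"])
    c = scores**2 / model["eigvals"]                   # eq 4 (x0 = 0 after normalization)
    return t2_pair(c, model["select"](c))

def fit(X, select, alpha=0.01):
    """Stage I: normalization, PCA, per-sample selection, KDE control limits."""
    mu, sigma = X.mean(axis=0), X.std(axis=0, ddof=1)
    eigvals, P = np.linalg.eigh(np.cov((X - mu) / sigma, rowvar=False))
    model = dict(mu=mu, sigma=sigma, eigvals=eigvals, P=P, select=select)
    t2 = np.array([statistics(model, x) for x in X])
    model["limits"] = (kde_limit(t2[:, 0], alpha), kde_limit(t2[:, 1], alpha))
    return model

def is_faulty(model, x):
    """Stage II step (6): alarm if either statistic exceeds its control limit."""
    primary, secondary = statistics(model, x)
    return bool(primary > model["limits"][0] or secondary > model["limits"][1])

# Synthetic demonstration (not data from the paper).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # correlated "normal" data
model = fit(X, select_jitcpc)
# Synthetic fault: a large shift along the loading vector with the smallest eigenvalue.
shift = 20.0 * np.sqrt(model["eigvals"][0]) * model["P"][:, 0]
x_fault = model["mu"] + model["sigma"] * shift
print("control limits:", model["limits"])
print("fault flagged:", is_faulty(model, x_fault))
```

Passing `select_jitcq` (or a lambda fixing another q) to `fit` switches the criterion without changing the rest of the procedure, which mirrors how the two criteria share the same primary and secondary T² definitions.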

4. CASE STUDIES

4.1. The Motivating Example. The proposed PC selection criteria and fault detection method are applied to the motivating example in Section 2.1. Four PCs are obtained by applying PCA to the training data set X. To make a comparison between the two selection criteria, both the JITCPC and JITCQ criteria are used for the PC selection. For the JITCPC criterion, the CPC threshold is set to 90% (i.e., CPC ≥ 90%). The median (i.e., q = 0.5) is used for the JITCQ criterion.

Table 1 shows the PCs selected by the JITCPC and JITCQ criteria for all samples in the test data set Y. When using the JITCPC criterion, different numbers of PCs were selected for the 20 test samples to ensure that the CPC for each sample meets the constraint CPC ≥ 90%. When using the JITCQ criterion with q = 0.5, the two PCs with the largest contributions were selected for each sample. Although PC3 corresponds to a much smaller eigenvalue (≈0.0163) than PC1 and PC2 (see Figure 1a), it makes the largest contribution to the MDs of most samples (with the exception of three samples 14, 15, and 18), as shown in Table 1. Therefore, PC3 is more important than PC1 and PC2 for fault detection. PC3 and PC2 are the two largest contributors for 15 samples, especially for samples 2, 6, 12, and 15. This explains why faulty samples 2, 6, 12, and 15 can be detected in the subspace of PC2 and PC3 but not in the subspaces of other PCs, as shown in Figure 1.

Table 1. PCs Selected by the JITCPC and JITCQ Criteria for 20 Test Samples(a)

| sample | JITCPC | JITCQ | sample | JITCPC | JITCQ |
|---|---|---|---|---|---|
| 1 | PC3, PC2 | PC3, PC2 | 11 | PC3, PC2 | PC3, PC2 |
| 2 | PC3, PC2, PC1 | PC3, PC2 | 12 | PC3, PC2 | PC3, PC2 |
| 3 | PC3, PC2, PC1 | PC3, PC2 | 13 | PC3, PC1 | PC3, PC1 |
| 4 | PC3, PC2 | PC3, PC2 | 14 | PC2, PC3, PC1 | PC2, PC3 |
| 5 | PC3, PC1 | PC3, PC1 | 15 | PC2, PC3 | PC2, PC3 |
| 6 | PC3, PC2 | PC3, PC2 | 16 | PC3, PC1, PC2 | PC3, PC1 |
| 7 | PC3, PC2 | PC3, PC2 | 17 | PC3, PC2 | PC3, PC2 |
| 8 | PC3, PC2 | PC3, PC2 | 18 | PC2, PC3 | PC1, PC2 |
| 9 | PC3, PC2, PC1 | PC3, PC2 | 19 | PC3, PC2 | PC3, PC2 |
| 10 | PC3, PC2, PC1 | PC3, PC2 | 20 | PC3, PC1, PC2 | PC3, PC1 |

(a) PCs are sorted in descending order of contributions.

Figure 3. Fault detection results of the primary and secondary T² statistics for 20 test samples. (a) JITCPC criterion, (b) JITCQ criterion.

The selected and nonselected PCs are used to define the primary and secondary T² statistics for fault detection. Control limits for the two T² statistics are set at the 99% level, corresponding to α = 0.01 in eq 9. Figure 3 shows the fault detection results for all samples in the test data set Y when using the JITCPC and JITCQ criteria. For comparison, Figure 4 shows the fault detection results when using the traditional PCA-based T² and SPE statistics,1 with the T² statistic defined using the first two PCs (i.e., PC1 and PC2) while the SPE statistic is defined by the last two PCs (i.e., PC3 and PC4). Figure 4 shows that the traditional PCA-based T² statistic only detects 6 faulty samples. In particular, the faulty samples 2, 6, 11, 12, 15, and 16 are detected by neither the traditional PCA-based T² statistic nor the SPE statistic. However, as shown in Figure 3, all faulty samples are successfully detected by the primary T² statistic, no matter which selection criterion is used. The secondary T² statistic detects only a few faulty samples, because it captures a very small part of the data variation of faulty samples. The above results demonstrate that the proposed JITCPC and JITCQ

criteria are effective for choosing appropriate PCs for fault detection.

Figure 4. Fault detection results of traditional PCA-based T² and SPE statistics for 20 test samples.

4.2. Tennessee Eastman Process. 4.2.1. Process Description. The Tennessee Eastman (TE) process has been widely used as a benchmark for testing various fault detection methods.19,20 This process consists of five operating units: an exothermic reactor, a flash separator, a reboiled stripper, a condenser, and a recycle compressor. Four reactants A, C, D, and E and an inert B are used to produce two products G and H and a byproduct F. In this case study, a total of 33 process variables, as listed in Table 2, are measured at a sampling interval of 3 min. A training data set was collected under the normal operating conditions, and 21 test data sets were generated under the 21 faulty operating conditions in Table 3. Each data set consists of 960 samples, corresponding to an operation time of 48 h. In each test data set, the process fault was introduced after the 160th sample.

Table 2. Variables of the TE Process

| no. | variable name | no. | variable name |
|---|---|---|---|
| 1 | A feed (stream 1) | 18 | stripper temperature |
| 2 | D feed (stream 2) | 19 | stripper steam flow |
| 3 | E feed (stream 3) | 20 | compressor work |
| 4 | A and C feed (stream 4) | 21 | reactor cooling water outlet temperature |
| 5 | recycle flow (stream 8) | 22 | condenser cooling water outlet temperature |
| 6 | reactor feed rate (stream 6) | 23 | D feed flow valve (stream 2) |
| 7 | reactor pressure | 24 | E feed flow valve (stream 3) |
| 8 | reactor level | 25 | A feed flow valve (stream 1) |
| 9 | reactor temperature | 26 | A and C feed flow valve (stream 4) |
| 10 | purge rate (stream 9) | 27 | compressor recycle valve |
| 11 | product separator temperature | 28 | purge valve (stream 9) |
| 12 | product separator level | 29 | separator pot liquid flow valve (stream 10) |
| 13 | product separator pressure | 30 | stripper liquid product flow valve (stream 11) |
| 14 | product separator underflow (stream 10) | 31 | stripper steam valve |
| 15 | stripper level | 32 | reactor cooling water flow valve |
| 16 | stripper pressure | 33 | condenser cooling water flow valve |
| 17 | stripper underflow (stream 11) | | |

Table 3. Faults in the TE Process

| no. | faulty variable | fault type |
|---|---|---|
| 1 | A/C feed ratio, B composition constant (stream 4) | step |
| 2 | B composition, A/C feed ratio constant (stream 4) | step |
| 3 | D feed temperature (stream 2) | step |
| 4 | reactor cooling water inlet temperature | step |
| 5 | condenser cooling water inlet temperature | step |
| 6 | A feed loss (stream 1) | step |
| 7 | C header pressure loss-reduced availability (stream 4) | step |
| 8 | A, B, C feed composition (stream 4) | random |
| 9 | D feed temperature (stream 2) | random |
| 10 | C feed temperature (stream 4) | random |
| 11 | reactor cooling water inlet temperature | random |
| 12 | condenser cooling water inlet temperature | random |
| 13 | reaction kinetics | slow drift |
| 14 | reactor cooling water valve | sticking |
| 15 | condenser cooling water valve | sticking |
| 16 | unknown | unknown |
| 17 | unknown | unknown |
| 18 | unknown | unknown |
| 19 | unknown | unknown |
| 20 | unknown | unknown |
| 21 | the valve for stream 4 was fixed at the steady state position | constant |

4.2.2. Fault Detection Results. A total of 33 PCs are produced by carrying out PCA on the training data set. The JITCPC and JITCQ criteria are each used to select PCs for fault detection. The desired CPC is set as 90% for the JITCPC criterion. The third quartile (i.e., q = 0.75) is used for the JITCQ criterion. The primary and secondary T² statistics are defined based on the selected and nonselected PCs, respectively. Fault detection capabilities of the two T² statistics are compared with the traditional PCA-based T² and SPE statistics.1 The PCA-based T² statistic is defined using the PCs corresponding to the 16 largest eigenvalues, and these PCs explain about 90.2% of the variance of the training data (i.e., CPV ≈ 90.2%). The remaining 17 PCs are used to define the SPE statistic. Control limits for all monitoring statistics are set at the 99% level.

Fault detection performance is quantified by fault detection rate (FDR) and false alarm rate (FAR). For each test data set, FDR and FAR are calculated by FDR = 100·n_df/n_tf % and FAR = 100·n_dn/n_tn %, where n_dn and n_df denote the numbers of samples flagged as faulty under the normal operating condition (the first 160 samples) and under the faulty operating condition (from the 161st sample onward), respectively, and n_tn and n_tf are the total numbers of actual normal and faulty samples (n_tn = 160 and n_tf = 800 for each test data set). A higher FDR and a lower FAR indicate better fault detection performance.

Table 4 shows the fault detection results of all monitoring statistics for 21 faulty data sets. It can be seen that 18 faults are clearly detected by all the monitoring statistics, with the exception of faults 3, 9, and 15. Faults 3, 9, and 15 are difficult to detect because they have little effect on the process variables in Table 2. The primary T² statistics of JITCPC and JITCQ have comparable FDRs for each of the 18 detectable faults, while the primary T² statistic of JITCPC has slightly lower FARs than the primary T² statistic of JITCQ. In addition, the two primary T² statistics both have higher FDRs than either the PCA-based T²

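Table 4. Fault Detection Results of All Monitoring Statistics for 21 Faulty Data Sets(a)

Column groups: traditional PCA (CPV ≈ 90.2%): T² and SPE; JITCPC (CPC ≥ 90%): primary and secondary T²; JITCQ (q = 0.75): primary and secondary T². All FDR and FAR values are in %.

| fault | PCA T² FDR | PCA T² FAR | PCA SPE FDR | PCA SPE FAR | JITCPC primary T² FDR | JITCPC primary T² FAR | JITCPC secondary T² FDR | JITCPC secondary T² FAR | JITCQ primary T² FDR | JITCQ primary T² FAR | JITCQ secondary T² FDR | JITCQ secondary T² FAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 99.1 | 0.0 | 99.9 | 1.9 | 100.0 | 0.6 | 100.0 | 0.6 | 99.9 | 1.9 | 99.6 | 0.0 |
| 2 | 98.4 | 0.6 | 95.1 | 0.6 | 98.4 | 0.6 | 98.4 | 0.0 | 98.5 | 0.0 | 96.9 | 0.6 |
| 3 | 1.0 | 0.0 | 3.0 | 1.3 | 4.3 | 6.9 | 3.8 | 5.0 | 4.8 | 9.4 | 3.3 | 0.6 |
| 4 | 50.9 | 0.6 | 99.9 | 2.5 | 100.0 | 0.6 | 100.0 | 1.9 | 100.0 | 2.5 | 58.0 | 0.6 |
| 5 | 23.8 | 0.6 | 23.9 | 2.5 | 100.0 | 0.6 | 100.0 | 1.9 | 100.0 | 2.5 | 28.6 | 0.6 |
| 6 | 99.0 | 0.0 | 100.0 | 0.6 | 100.0 | 0.0 | 100.0 | 0.0 | 100.0 | 0.0 | 99.8 | 0.0 |
| 7 | 100.0 | 0.0 | 100.0 | 1.3 | 100.0 | 0.0 | 100.0 | 0.0 | 100.0 | 0.0 | 89.0 | 0.0 |
| 8 | 97.0 | 0.6 | 86.3 | 0.6 | 98.0 | 0.0 | 98.1 | 0.0 | 98.4 | 0.6 | 97.1 | 1.3 |
| 9 | 1.5 | 1.9 | 2.0 | 1.9 | 3.6 | 5.6 | 3.3 | 5.0 | 4.0 | 4.4 | 2.3 | 1.3 |
| 10 | 27.9 | 0.0 | 36.1 | 0.6 | 89.8 | 0.0 | 89.6 | 0.0 | 90.8 | 0.0 | 51.1 | 1.3 |
| 11 | 52.5 | 0.0 | 61.6 | 3.1 | 81.4 | 0.6 | 80.1 | 0.6 | 82.3 | 1.9 | 50.3 | 0.6 |
| 12 | 98.4 | 0.0 | 90.3 | 1.3 | 99.9 | 0.6 | 99.9 | 0.6 | 99.9 | 2.5 | 98.0 | 1.3 |
| 13 | 93.8 | 0.0 | 95.1 | 0.6 | 95.3 | 0.0 | 95.3 | 0.0 | 95.3 | 0.6 | 94.8 | 0.6 |
| 14 | 99.9 | 0.0 | 98.9 | 0.6 | 100.0 | 0.6 | 100.0 | 0.6 | 100.0 | 0.0 | 99.8 | 1.3 |
| 15 | 1.3 | 0.6 | 2.0 | 0.0 | 7.5 | 4.4 | 6.3 | 2.5 | 8.8 | 4.4 | 4.1 | 1.3 |
| 16 | 12.3 | 3.1 | 36.3 | 0.6 | 92.8 | 3.8 | 92.4 | 3.8 | 93.5 | 5.0 | 45.5 | 0.6 |
| 17 | 79.5 | 1.3 | 95.9 | 3.1 | 97.3 | 1.9 | 97.0 | 0.6 | 97.3 | 3.8 | 88.6 | 0.6 |
| 18 | 89.1 | 0.6 | 90.5 | 3.8 | 90.4 | 1.3 | 90.3 | 0.6 | 90.5 | 2.5 | 89.9 | 0.0 |
| 19 | 11.8 | 0.0 | 16.5 | 0.6 | 94.4 | 0.6 | 93.3 | 1.9 | 94.4 | 1.3 | 44.4 | 0.6 |
| 20 | 31.1 | 0.6 | 52.8 | 1.9 | 91.0 | 0.6 | 91.0 | 0.0 | 91.3 | 1.9 | 63.5 | 0.0 |
| 21 | 41.4 | 0.0 | 48.8 | 4.4 | 56.6 | 3.8 | 55.6 | 1.9 | 57.1 | 3.8 | 47.5 | 2.5 |

(a) The highest FDR for each fault is highlighted in bold.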
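The FDR and FAR values reported in Table 4 follow the definitions given in Section 4.2.2. As a minimal sketch of that arithmetic, the hypothetical helper below (`fdr_far` is not a name from the paper) takes one boolean alarm flag per sample and assumes, as stated in the text, that the first 160 samples of a test set are normal and the remaining 800 are faulty. The alarm sequence is made up for illustration.

```python
import numpy as np

def fdr_far(alarms, n_normal=160):
    """Return (FDR, FAR) in percent from a per-sample boolean alarm sequence."""
    alarms = np.asarray(alarms, dtype=bool)
    far = 100.0 * alarms[:n_normal].mean()   # 100 * n_dn / n_tn
    fdr = 100.0 * alarms[n_normal:].mean()   # 100 * n_df / n_tf
    return fdr, far

# Made-up alarm sequence for one 960-sample test set.
alarms = np.zeros(960, dtype=bool)
alarms[[10, 50]] = True      # two false alarms among the 160 normal samples
alarms[180:] = True          # 780 of the 800 faulty samples are flagged
fdr, far = fdr_far(alarms)
print(f"FDR = {fdr:.1f}%, FAR = {far:.2f}%")
```

With 160 normal samples, every FAR is a multiple of 1/160 = 0.625%, which is why the FAR columns of Table 4 only take values such as 0.6, 1.3, and 1.9.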

or SPE statistic, especially for faults 5, 10, 11, 16, and 19−21. This shows that the proposed JITCPC and JITCQ criteria are better than the traditional CPV criterion at choosing appropriate PCs for fault detection. To further illustrate the advantages of the JITCPC and JITCQ criteria, Figures 5 and 6 show the monitoring charts for faults 10 and 19. In Figure 5a and b, the primary T2 statistics of JITCPC and JITCQ detect the occurrence of fault 10 at the 180th sample, which is at least 42 min (i.e., 14 samples) ahead of the PCA-based T2 and SPE statistics. The PCA-based T2 and SPE statistics detect less than 37% of the faulty samples (i.e., FDR < 37%), leaving many faulty samples undetected between the 350th and 650th samples and at the end of the monitoring chart (see Figure 5c). In contrast, the primary T2 statistics of JITCPC and JITCQ detect about 90% of the faulty samples; moreover, the primary T2 values of many faulty samples far exceed the control limit, which reveals that fault 10 has a significant effect on the process variables. As shown in Figure 6, fault 19 is first detected by the primary T2 statistics of JITCPC and JITCQ at the 162nd sample, which is at least 27 min (i.e., 9 samples) earlier than the PCA-based T2 and SPE statistics. Moreover, the two primary T2 statistics have high FDRs of about 90%, and they clearly show that fault 19 lasts from the 162nd sample until the end (see Figure 6a and b). However, the PCA-based SPE and T2 statistics detect only a very small part (less than 17%) of the faulty samples; moreover, the detected faulty samples are scattered in time, and their T2 and SPE values only slightly exceed the control limits (Figure 6c). Figures 5 and 6 illustrate how the primary T2 statistics of JITCPC and JITCQ outperform the traditional PCA-based T2 and SPE statistics by detecting faults more quickly and more accurately.

As shown in Table 4, Figure 5, and Figure 6, the primary and secondary T2 statistics have almost the same fault detection performance when using the JITCPC criterion. For the JITCQ criterion, however, the primary T2 statistic has better fault detection performance than the secondary T2 statistic. This is ascribed to the different PC selection rules of JITCPC and JITCQ. For a sample x, the JITCPC criterion selects the first l*x PCs with the CPC larger than or equal to a threshold, while the JITCQ criterion selects the PCs with contributions larger than or equal to a quantile of the contributions of all PCs. When using the JITCPC criterion, the PCs selected for different samples have similar CPCs but differ in number. In contrast, when using the JITCQ criterion, the PCs selected for different samples are equal in number but differ in CPC. When using the JITCPC criterion with CPC = 90%, the selected PCs of every sample have a CPC of around 90%, and the nonselected PCs have a CPC of around 10%. Because faulty samples often have larger Mahalanobis distances than training/normal samples, the JITCPC-based primary and secondary T2 statistics of faulty samples generally exceed the corresponding control limits obtained from training samples. If the JITCPC-based primary T2 statistic of a faulty sample exceeds its control limit, it is highly likely that the JITCPC-based secondary T2 statistic also exceeds its control limit. Consequently, the JITCPC-based secondary T2 statistic has fault detection performance similar to that of the primary T2 statistic. When using the JITCQ criterion with q = 0.75, the CPCs of selected PCs and nonselected PCs for training samples and samples of fault 19 are shown in Figure 7. It can be seen that the CPCs of selected PCs and nonselected PCs vary significantly with samples. For most training samples, the CPCs of selected PCs are below 80%, and the CPCs of nonselected PCs are above 20%. In contrast, for most faulty samples of fault 19, the CPCs of selected PCs exceed

DOI: 10.1021/acs.iecr.7b03840 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX


Figure 5. Monitoring charts for Fault 10. (a) JITCPC-based T2 statistics, (b) JITCQ-based T2 statistics, (c) PCA-based T2 and SPE statistics.

Figure 6. Monitoring charts for Fault 19. (a) JITCPC-based T2 statistics, (b) JITCQ-based T2 statistics, (c) PCA-based T2 and SPE statistics.

80%, and the CPCs of nonselected PCs are below 20%. For such faulty samples, the primary T2 statistic readily exceeds the control limit obtained from training samples, whereas the secondary T2 statistic may remain within its control limit. Consequently, Figure 6b shows that some faulty samples are detected by the JITCQ-based primary T2 statistic but not by the secondary T2 statistic. As shown in Table 4, the FDR of the JITCQ-based primary T2 statistic for fault 19 is about 94%, but the FDR of the JITCQ-based secondary T2 statistic is only 44%. For the same reason, the JITCQ-based primary T2 statistic has better fault detection performance than the secondary T2 statistic for other faults.
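The difference between the two selection rules discussed above can be made concrete with a small numerical sketch. The code below is illustrative only and is not the authors' implementation. It assumes autoscaled data, a PCA model that keeps all PCs, a PC contribution of t_i^2/λ_i to the squared Mahalanobis distance, and primary and secondary T2 values equal to the sums of the selected and nonselected contributions. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_pca(X_train):
    """Fit a PCA model that keeps all PCs, using autoscaled training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    # Eigendecomposition of the sample covariance of the scaled data
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    return mean, std, eigvals, eigvecs

def pc_contributions(model, x):
    """Assumed contribution t_i^2 / lambda_i of each PC to the squared
    Mahalanobis distance of sample x."""
    mean, std, eigvals, eigvecs = model
    scores = eigvecs.T @ ((x - mean) / std)
    return scores**2 / eigvals

def select_jitcpc(contrib, cpc=0.90):
    """Boolean mask of the leading PCs (largest contributions first) whose
    cumulative percent contribution first reaches `cpc`."""
    order = np.argsort(contrib)[::-1]
    cum = np.cumsum(contrib[order]) / contrib.sum()
    n_sel = min(int(np.searchsorted(cum, cpc)) + 1, contrib.size)
    mask = np.zeros(contrib.size, dtype=bool)
    mask[order[:n_sel]] = True
    return mask

def select_jitcq(contrib, q=0.75):
    """Boolean mask of the PCs whose contribution is at least the q quantile
    of the contributions of all PCs."""
    return contrib >= np.quantile(contrib, q)

def primary_secondary_t2(contrib, mask):
    """Sums of the selected and nonselected contributions (assumed T2 split)."""
    return float(contrib[mask].sum()), float(contrib[~mask].sum())

# Synthetic demonstration: one training sample and one shifted ("faulty") sample
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 8)) + 0.5 * rng.normal(size=(500, 1))
model = fit_pca(X_train)
for label, x in (("normal", X_train[0]), ("shifted", X_train[1] + 3.0)):
    c = pc_contributions(model, x)
    for name, mask in (("JITCPC", select_jitcpc(c)), ("JITCQ", select_jitcq(c))):
        t2_primary, t2_secondary = primary_secondary_t2(c, mask)
        print(label, name, "selected:", int(mask.sum()),
              "CPC of selected: %.3f" % (t2_primary / c.sum()))
```

Under these assumptions, the JITCPC mask always covers a CPC of at least the threshold while the number of selected PCs changes from sample to sample, whereas the JITCQ mask selects the same number of PCs for every sample and the CPC they cover varies.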

Figure 7. CPCs of (a) selected PCs and (b) nonselected PCs of JITCQ for training samples and samples of fault 19.

5. CONCLUSIONS

In this paper, two new selection criteria, named the just-in-time cumulative percent contribution (JITCPC) criterion and the just-in-time contribution quantile (JITCQ) criterion, are proposed for choosing appropriate principal components (PCs) for fault detection. These two selection criteria use PC contributions to the sample Mahalanobis distance to evaluate the importance of PCs for fault detection. The larger the contribution a PC makes, the more important it is for detecting a fault. In the JITCPC criterion, all PCs are sorted in descending order of contributions, and then the leading PCs whose cumulative percent contribution (CPC) reaches a predefined threshold (e.g., 85%, 90%, or 95%) are selected. The JITCQ criterion selects the PCs with contributions larger than the q (0 < q < 1, e.g., 0.25, 0.5, or 0.75) quantile of the contributions of all PCs. The JITCPC or JITCQ criterion is applied to each sample separately. The selected PCs are not fixed but vary with samples to ensure that they capture the important features of each sample. Two fault detection indices, named the primary and secondary T2 statistics, are defined using the selected and nonselected PCs, respectively. A fault detection method is then proposed. The implementation and performance of the proposed methods were illustrated by case studies in a simulation example and in the Tennessee Eastman process. The results indicate that both the JITCPC and JITCQ criteria are able to select appropriate PCs for fault detection. The primary T2 statistics of JITCPC and JITCQ detect faults more quickly and more accurately than the traditional PCA-based T2 and SPE statistics. The secondary T2 statistic of JITCPC has better fault detection performance than that of JITCQ. Although the JITCPC and JITCQ criteria are proposed for PCA in this paper, they can be easily extended to PCA-related methods (e.g., kernel PCA) and other latent variable modeling methods (e.g., partial least-squares, Fisher discriminant analysis, locality preserving projections, and canonical correlation analysis).
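The comparison "more quickly and more accurately" rests on two quantities used in the case studies: the fault detection rate (FDR) and the sample at which a fault is first detected. The following is a minimal sketch of how they could be computed from a monitoring statistic and its control limit. It assumes that a fault counts as detected at the first faulty sample whose statistic exceeds the limit, and that the sampling interval is 3 min, as implied by the statement that 14 samples correspond to 42 min. The function names, the `fault_start` argument, and the toy data are hypothetical.

```python
import numpy as np

def fault_detection_rate(stat, limit, fault_start):
    """Fraction of faulty samples (0-based index >= fault_start) whose
    statistic exceeds the control limit."""
    faulty = np.asarray(stat, dtype=float)[fault_start:]
    return float(np.mean(faulty > limit))

def first_detection(stat, limit, fault_start):
    """0-based index of the first faulty sample that exceeds the control
    limit, or None if the fault is never detected."""
    faulty = np.asarray(stat, dtype=float)[fault_start:]
    hits = np.flatnonzero(faulty > limit)
    return None if hits.size == 0 else fault_start + int(hits[0])

def lead_time_minutes(idx_early, idx_late, sampling_min=3.0):
    """Time by which detection at idx_early precedes detection at idx_late."""
    return (idx_late - idx_early) * sampling_min

# Toy statistic: the fault is assumed to start at index 2
stat = [0.5, 0.6, 0.4, 2.0, 0.3, 3.0, 4.0, 5.0]
print(fault_detection_rate(stat, limit=1.0, fault_start=2))
print(first_detection(stat, limit=1.0, fault_start=2))

# Detection 14 samples earlier at a 3 min sampling interval
print(lead_time_minutes(180, 194))  # 42.0
```

Other conventions, such as requiring several consecutive violations before declaring a fault, would change the detection sample but not the FDR computation.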



AUTHOR INFORMATION

Corresponding Author

*Phone: +86 (0571) 88320349; e-mail: [email protected].

ORCID

Lijia Luo: 0000-0002-6040-6147
Shiyi Bao: 0000-0001-9700-577X

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This study was supported by the National Natural Science Foundation of China (no. 61304116).

REFERENCES

(1) Nomikos, P.; MacGregor, J. F. Monitoring batch processes using multiway principal component analysis. AIChE J. 1994, 40, 1361−1375.
(2) Qin, S. J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36, 220−234.
(3) MacGregor, J. F.; Cinar, A. Monitoring, fault diagnosis, fault-tolerant control and optimization: Data driven methods. Comput. Chem. Eng. 2012, 47, 111−120.
(4) Ge, Z.; Song, Z.; Gao, F. Review of recent research on data-based process monitoring. Ind. Eng. Chem. Res. 2013, 52, 3543−3562.
(5) Luo, L.; Lovelett, R. J.; Ogunnaike, B. A. Hierarchical monitoring of industrial processes for fault detection, fault grade evaluation, and fault diagnosis. AIChE J. 2017, 63 (7), 2781−2795.
(6) Luo, L.; Bao, S.; Ding, Z.; Mao, J. A variable-correlation-based sparse modeling method for industrial process monitoring. Ind. Eng. Chem. Res. 2017, 56, 6981−6992.
(7) Malinowski, E. R. Factor Analysis in Chemistry; Wiley-Interscience: New York, 1991.
(8) Cattell, R. B. The scree test for the number of factors. Multivariate Behav. Res. 1966, 1 (2), 245−276.
(9) Kaiser, H. F. The application of electronic computers to factor analysis. Educ. Psychol. Meas. 1960, 20 (1), 141−151.
(10) Wold, S. Cross validatory estimation of the number of components in factor and principal components analysis. Technometrics 1978, 20, 397−406.
(11) Eastment, H. T.; Krzanowski, W. Cross-validatory choice of the number of components from a principal component analysis. Technometrics 1982, 24 (1), 73−77.
(12) Carey, R. N.; Wold, N. S.; Westgard, J. O. Principal component analysis: an alternative to referee methods in method comparison studies. Anal. Chem. 1975, 47 (11), 1824−1829.
(13) Osten, D. W. Selection of optimal regression models via cross-validation. J. Chemom. 1988, 2, 39−48.
(14) Valle, S.; Li, W.; Qin, S. J. Selection of the number of principal components: The variance of the reconstruction error criterion with a comparison to other methods. Ind. Eng. Chem. Res. 1999, 38, 4389−4401.
(15) Qin, S. J.; Dunia, R. Determining the number of principal components for best reconstruction. In IFAC DYCOPS’98, Greece, June 1998.
(16) Akaike, H. Information theory and an extension of the maximum likelihood principle. In Proceedings 2nd International Symposium on Information Theory, Petrov, Caski, Eds., 1974; pp 267−281.
(17) Malinowski, E. R. Determination of the number of factors and the experimental error in a data matrix. Anal. Chem. 1977, 49 (4), 612−617.
(18) Martin, E. B.; Morris, A. J. Non-parametric confidence bounds for process performance monitoring charts. J. Process Control 1996, 6, 349−358.
(19) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17, 245−255.
(20) Lyman, P. R.; Georgakis, C. Plant-wide control of the Tennessee Eastman problem. Comput. Chem. Eng. 1995, 19, 321−331.
