Sparse Robust Principal Component Analysis with Applications to

Jan 4, 2019 - Lijia Luo*† , Shiyi Bao† , and Chudong Tong‡. † Institute of Process Equipment and ... An, Yang, and PAN. 0 (ja),. Abstract: Sin...
0 downloads 0 Views 2MB Size
Subscriber access provided by Iowa State University | Library

Process Systems Engineering

Sparse robust principal component analysis with applications to fault detection and diagnosis Lijia Luo, Shiyi Bao, and Chudong Tong Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.8b04655 • Publication Date (Web): 04 Jan 2019 Downloaded from http://pubs.acs.org on January 10, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Sparse robust principal component analysis with applications to fault detection and diagnosis Lijia Luo†, Shiyi Bao†, and Chudong Tong‡ †Institute

of Process Equipment and Control Engineering, Zhejiang University of Technology, Hangzhou 310014, China ‡Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China

ABSTRACT: Principal component analysis (PCA) is a popular method for the modeling and analysis of the high-dimensional data. In spite of its advantages, the classical PCA also has two drawbacks. First, it is very sensitive to outliers in the data. Second, it cannot yield interpretable PCs because most of the loadings are nonzero. To overcome these drawbacks, we propose a new PCA method that has the properties of robustness and sparsity at the same time, called sparse robust PCA (SRPCA). The robustness is achieved by taking a robust covariance matrix instead of the classical covariance matrix used in PCA. Meanwhile, an additional penalty is imposed on the number of nonzero loadings to achieve the sparsity. SRPCA is not only robust against outliers, but also can yield interpretable PCs. A robust process monitoring method is developed using SRPCA. A



Corresponding Author. Tel.: +86 (0571) 88320349. E-mail address: [email protected] 1

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 31

cumulative percent contribution criterion is proposed for selecting the optimal PCs for process monitoring. The selected PCs are used to define two fault detection indices. Based on the sparsity of PCs, two-level contribution plots are developed for fault diagnosis. This fault diagnosis method narrows down the faulty variables to active variables (with nonzero loadings) in a dominant PC contributing the most to a fault. The effectiveness and advantages of the proposed method are illustrated with a case study. 1. Introduction Principal component analysis (PCA) is one of the most popular techniques for dimension reduction of the multivariate data, and it has been widely used for the monitoring of industrial processes.1-4 PCA seeks a small set of linear combinations of the original variables, called principal components (PCs), which capture as much information of the data as possible. PCs correspond to directions in which the variance of projected data is maximized. Usually the first few PCs are enough to summarize the data well. By retaining these PCs in the PCA model, the original data space is divided into two orthogonal subspaces: a PC subspace preserving most of the data information, and a residual subspace containing the remaining data information. Fault detection indices, such as the T2 and SPE statistics,1,2 can be defined in the PC subspace and in the residual subspace respectively to monitor the data. Faults are detected and diagnosed if any data cause fault detection indices out of the corresponding control limits. In the classical PCA, PCs can be obtained by a spectral decomposition of the empirical covariance matrix of the data. The empirical covariance matrix, however, is sensitive to outliers,5 i.e., data points deviating from the majority of the data. PCs can be pulled toward outliers such that they fit 2

ACS Paragon Plus Environment

Page 3 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

outliers well but may not capture features of the normal data.5 Conclusions draw from the distorted PCs can be misleading. To eliminate the undue influence of outliers, it is necessary to consider the so-called robust PCA. Various approaches to robust PCA have been proposed over the past few decades.5-8 They can be classified into six groups: 6-8 (1) taking robust covariance estimators instead of the empirical covariance matrix,9,10 (2) projection pursuit techniques,11,12 (3) a combination of the ideas of groups (1) and (2),13 (4) robust subspace estimation,14 (5) elliptical and spherical PCA,15 (6) low-rank matrix approximation.16 Although robust PCA methods have been widely investigated, there are few applications of robust PCA to process monitoring.17,18 In practice, data collected in industrial processes frequently contain outliers. The faulty data, low-quality data, and data obtained during irregular operating phases (e.g., startup or shutdown periods) can all be viewed as outliers.18,19 Therefore, any data-driven process monitoring scheme should take into account the possible presence of outliers in the training data. For the process monitoring, outliers affect not only the monitoring model but also the fault detection indices and their control limits. Fault detection indices computed using a distorted monitoring model may not expose the differences between normal data and faulty data. Control limits of fault detection indices can also be inflated by outliers such that they cannot detect true faults. Therefore, to get reliable process monitoring results, influences of outliers on the monitoring model and on fault detection indices should be eliminated at the same time. Another drawback of the classical PCA is that it does not yield interpretable PCs. In general, a PC is a combination of all original variables. Such a PC is difficult to interpret because all variables are involved. To overcome this drawback, sparse PCA is developed to produce sparse PCs consisting of a small subset of original variables. The interpretation of a sparse PC is much easier, since only a few 3

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 31

variables need to be considered. The data features extracted by sparse PCs can also be useful in understanding intrinsic connections between variables. This makes sparse PCs very helpful for analyzing the multivariate data. Up till now, many different sparse PCA algorithms have been proposed, such as the sparse PCA algorithm based on the Lasso-penalized regression,20 the DSPCA algorithm using semidefinite programming relaxations,21 the GSPCA algorithm based on a greedy search technique,22 the generalized power (GPower) algorithm,23 and the DC-PCA algorithm.24 Some sparse PCA methods have been applied to process monitoring.25-27 For example, Yu et al.25 introduced a RNSPCA method and applied it to fault diagnosis and feature discovery of industrial processes. Luo et al.26 developed a sparse PCA method and two-level contribution plots for fault detection and diagnosis. Gajjar et al.27 proposed a real-time fault detection and diagnosis method using sparse PCA. It is shown that these sparse PCA-based methods have better fault detection and diagnosis performance than the classical PCA-based methods. All these methods, however, are not robust against outliers. Moreover, a difficult problem that is not well solved is how to select the optimal sparse PCs for process monitoring.28 The price for the sparsity of PCs is a loss in the explained variance of the data. To minimizing this loss, sparse PCs capturing more variance in the data are often selected with higher priority. However, such a selection criterion may be not suitable for process monitoring, because it does not correlate the PC selection to the monitoring performance. In this paper, a new PCA method combining the properties of robustness and sparsity is proposed. To achieve the robustness, a weighted covariance matrix is introduced to replace the empirical covariance matrix used in classical PCA. This weighted covariance matrix is robust against outliers by down-weighting contributions of outlying points to the data covariance. Meanwhile, the sparsity 4

ACS Paragon Plus Environment

Page 5 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

of PCs is achieved by penalizing the number of nonzero loadings. Based on the robust PCs, an outlier detection method is also developed to identify outliers in the data set. The proposed sparse robust PCA (SRPCA) method is further applied to process monitoring. The importance of PCs to process monitoring is evaluated by their contributions to the T2 value of a sample. A cumulative percent contribution (CPC) criterion is used to select optimal PCs for process monitoring. The selected PCs are used to define the T2 and SPE statistics for fault detection. Based on the sparsity of PCs, two-level contribution plots are developed for fault diagnosis. This diagnosis method narrows down faulty variables to active variables (with nonzero loadings) in a dominant sparse PC with the most contribution to a fault. The proposed methods are tested on the benchmark Tennessee Eastman process to illustrate the effectiveness and advantages. The rest of the paper is organized as follows. The SRPCA method and an outlier detection method are proposed in Section 2. Section 3 introduces the SRPCA-based process monitoring method and its components: the CPC criterion for PC selection, the T2 and SPE statistics for fault detection, and the two-level contribution plots for fault diagnosis. Section 4 illustrates the advantages of the proposed methods through a case study. Conclusions are presented in Section 5. 2. Sparse robust principal component analysis (SRPCA) 2.1 From PCA to SRPCA PCA is a popular dimension reduction technique to project the data onto a lower-dimensional subspace that contains most of the variance in the data. Let 𝑿 ∈ ℛ𝑛 × 𝑚 be a data matrix consisting of n samples of m variables. PCA aims to find a few orthogonal latent variables, called principal components (PCs), which point in directions maximizing the variance of the data projected on them. 5

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 31

Let 𝚺 be the covariance matrix of X, the first PCA direction is given by 𝒑 ∗ = arg max 𝒑𝑇𝚺𝒑

(1)

𝒑𝑇𝒑 = 1

The optimal direction 𝒑 ∗ corresponds to the first eigenvector of 𝚺. All PCs can be obtained by a spectral decomposition of 𝚺. The covariance matrix 𝚺 is highly sensitive to outliers. When the data set contains outliers, the outcome of classical PCA may not represent the data accurately. Hence there is a need for the robust PCA, which is able to well describe the majority of the data and yield reliable results in presence of outliers. Since PCs correspond to eigenvectors of the covariance matrix, a straightforward approach to robust PCA is the replacement of the classical covariance matrix by a robust covariance matrix. The classical covariance matrix of a data set 𝑿 ∈ ℛ𝑛 × 𝑚 is computed by 1

𝑛

𝚺 = 𝑛∑𝑖 = 1(𝒙𝑖 ― 𝝁0)(𝒙𝑖 ― 𝝁0)𝑇

(2)

where 𝒙𝑖 is the ith sample in X, and 𝝁0 = ∑𝑖𝒙𝑖 𝑛 is the mean of n samples. Eq. (2) can be reformulated as29 1

𝑛

𝚺 = 𝑛∑𝑖 = 1(𝒙𝑖 ― 𝝁0)(𝒙𝑖 ― 𝝁0)𝑇 1

[

(

𝑛

𝑛

)(

𝑛

= 𝑛2 𝑛∑𝑖 = 1𝒙𝑖𝒙𝑇𝑖 ― ∑𝑖 = 1𝒙𝑖 ∑𝑗 = 1𝒙𝑇𝑗 1

𝑛

1

𝑛

)]

= 2𝑛2∑𝑖,𝑗 = 1(𝒙𝑖𝒙𝑇𝑖 ― 2𝒙𝑖𝒙𝑇𝑗 +𝒙𝑗𝒙𝑇𝑗)

(3)

= 2𝑛2∑𝑖,𝑗 = 1(𝒙𝑖 ― 𝒙𝑗)(𝒙𝑖 ― 𝒙𝑗)𝑇 It can be seen from Eq. (3) that the classical covariance matrix can be easily affected by outliers that lie far from the data majority. To reduce the influence of outliers on the covariance matrix, we introduce weight coefficients to emphasize the contribution of close samples in comparison with distant samples (i.e., outliers). This leads to a weighted covariance matrix given by

6

ACS Paragon Plus Environment

Page 7 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

𝑛

𝚺 = ∑𝑖,𝑗 = 1𝑤𝑖𝑗(𝒙𝑖 ― 𝒙𝑗)(𝒙𝑖 ― 𝒙𝑗)𝑇 = 𝑿𝑇(𝑫 ― 𝑾)𝑿 = 𝑿𝑇𝑳𝑿

(4)

with weight coefficients 𝑤𝑖𝑗 defined as 𝑤𝑖𝑗 =

{

𝑑𝑖𝑗

if 𝑑𝑖𝑗 ≠ 0 0 if 𝑑𝑖𝑗 = 0 𝑑𝑖𝑗

(5)

where 𝑑𝑖𝑗 = ‖𝒙𝑖 ― 𝒙𝑗‖2 is the distance between samples and 𝑑𝑖𝑗 = min(∑𝑖𝑑𝑖𝑗,∑𝑗𝑑𝑖𝑗), 𝑾 ∈ ℛ𝑛 × 𝑛 is a weight matrix consisting of all the 𝑤𝑖𝑗, and 𝑫 ∈ ℛ𝑛 × 𝑛 is a diagonal matrix with diagonal elements being 𝐷𝑖𝑖 = ∑𝑗𝑤𝑖𝑗. In Eq. (4), the contribution of each pair of samples to the covariance is down weighted according to distances between samples. The larger the distance is, the smaller the weight coefficient will be. Because outliers are far from the majority of the data, they suffer the largest decreases in their contributions to 𝚺. The weighted covariance matrix 𝚺 is therefore robust against outliers. Replacing the classical covariance matrix 𝚺 in Eq. (1) with the weighted covariance matrix 𝚺 yields the optimization problem of robust PCA 𝒑 ∗ = arg max 𝒑𝑇𝚺𝒑

(6)

𝒑𝑇𝒑 = 1

Robust PCs are obtained by computing the eigenvectors of 𝚺. A PC is generally a combination of almost all the original variables, because the eigenvector of 𝚺 (i.e., the loading vector of PCA) often has few zero elements. Such a PC is difficult to interpret. In most applications, however, original variables have concrete physical meanings. Hence a PC is more interpretable if it consists of only a small number of original variables. In other words, to make PCs easier to interpret, sparse loading vectors with many elements exactly equal to zero are preferred. The sparsity can be achieved on loading vectors by adding an l0-norm penalty in the objective function of robust PCA, yielding the sparse robust PCA problem 7

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ϕ(γ) = max 𝒑𝑇𝚺𝒑 ― γ‖𝒑‖0

Page 8 of 31

(7)

𝒑𝑇𝒑 ≤ 1

where ‖𝒑‖0 denotes the l0-norm (i.e., the number of nonzero elements) of the vector 𝒑, and γ is a tuning parameter controlling the tradeoff between the explained variance and the sparsity. The larger γ is, the more the elements of 𝒑 are shrunken towards zero, and the more the explained variance is sacrificed. Some algorithms for solving the l0-norm penalized optimization problem Eq. (7) are available in the literature, such as the DSPCA algorithm,21 the GPowerl0 algorithm,23 and the DC-PCA algorithm.24 The GPowerl0 algorithm is adopted in this paper, because it is proved to have better performance.23 Before using the GPowerl0 algorithm, a factorization of the form 𝚺 = 𝑿𝑇𝑿 should be identified. The symmetric matrix L in Eq. (4) can be decomposed (by eigenvalue decomposition) 𝑇

into 𝑳 = 𝑸𝚲𝑸𝑇 = 𝑽𝑇𝑽 with 𝑽 = 𝑸𝚲1/2𝑸 , where Q is a orthogonal matrix and Λ is a diagonal matrix. Based on the factorization 𝚺 = 𝑿𝑇𝑽𝑇𝑽𝑿 = 𝑿𝑇𝑿, the problem Eq. (7) is transformed into ϕ(γ) = max 𝒑𝑇𝑿𝑇𝑿𝒑 ― γ‖𝒑‖0

(8)

𝒑𝑇𝒑 ≤ 1

For γ ≥ max ‖𝒙𝑘‖22 with 𝒙𝑘 being the kth column of 𝑿, the optimal solution of Eq. (8) is 𝒑 ∗ = 𝑘

𝟎.23 In the case of 0 < γ < max ‖𝒙𝑘‖22, the problem Eq. (8) can be cast in the following form23,30 𝑘

𝑚

[

]+

2 ϕ(γ) = max ∑𝑘 = 1 (𝒙𝑇𝑘𝒂) ― γ 𝒂𝑇 𝒂 = 1

(9)

where 𝑦 + = max{0,𝑦}. According to the GPowerl0 algorithm (Algorithm 3) in ref. 23, the solution of problem Eq. (9) is obtained by repeating 𝑚

[ (

2

)] + 𝒙𝑇𝑘𝒂𝒙𝑘

𝒂 = ∑𝑘 = 1 sign (𝒙𝑇𝑘𝒂) ― γ 𝒂 = 𝒂 ‖𝒂‖

(10)

until a stopping criterion is satisfied. Given a solution 𝒂 ∗ of Eq. (9), the solution of Eq. (8) is given by23 8

ACS Paragon Plus Environment

Page 9 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

𝑝𝑘∗

=

[sign((𝒙𝑇𝑘𝒂 ∗ )2 ― γ)] + 𝒙𝑇𝑘𝒂 ∗ , 𝑘 = 1,…,𝑚 𝑚 ∑𝑖 = 1[sign((𝒙𝑇𝑖𝒂 ∗ )2 ― γ)] (𝒙𝑇𝑖𝒂 ∗ )2 +

(11)

{ │(𝒙𝑇𝑘𝒂 ∗ )2 > γ}. Note

The set of active indices (i.e., indices of nonzero elements) of 𝒑 ∗ is 𝑰 = 𝑘

that solving Eq. (8) yields only the first sparse robust PC. The following deflation procedure is used for computing more sparse robust PCs 𝑿𝑗 + 1 = 𝑿𝑗 ― 𝒂𝑗∗ 𝒑𝑗∗ 𝑇

(12)

where 𝑿1 = 𝑿, and 𝒂𝑗∗ and 𝒑𝑗∗ are solutions corresponding to the jth sparse robust PC. Variables in a data set are often measured on different scales, which may affect the outcome of SRPCA. To eliminate such undue influence, the data are normalized (i.e., centered and scaled) prior to carrying out SRPCA. Because the data set 𝑿 ∈ ℛ𝑛 × 𝑚 may contain outliers, robust estimators of data location and scale5 are used for normalization, given by

(𝒙𝑖 ― 𝜇𝑙1) 𝒙𝑖 = 𝚲𝑄―1 𝑛

(13)

where 𝒙𝑖 is the ith sample in X, 𝜇𝑙1 is the l1-median of X computed by min ∑𝑖‖𝒙𝑖 ― 𝜇𝑙1‖1

(14)

𝜇𝑙1

and 𝚲𝑄𝑛 = diag(𝜎𝑄𝑛(1),…,𝜎𝑄𝑛(𝑚)) is a diagonal matrix holding the Qn scale estimates31 of m variables, defined as 𝜎𝑄𝑛(𝑘) = 2.2219{|𝑥𝑖𝑘 ― 𝑥𝑗𝑘|;𝑖 < 𝑗}(𝑞)

()

(15)

ℎ where 𝑥𝑖𝑘 is the value of the kth variable in the ith sample, 𝑞 = 2 with ℎ = [𝑛 2] +1.31 Main steps of the SRPCA algorithm are summarized as follows. Step 1: Normalize the data set X via Eq. (13). Step 2: Compute the weighted covariance matrix 𝚺 via Eq. (4). Step 3: Decompose the matrix 𝚺 as 𝚺 = 𝑿𝑇𝑿. 9

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 31

Step 4: Compute the solution of problem Eq. (8) via Eq. (10) and Eq. (11). Step 5: Obtain more PCs using the deflation procedure in Eq. (12). 2.2 Outlier detection Based on the PCs of SRPCA, a method is further developed to identify outliers in the data set 𝑿 = {𝒙1,𝒙2,…,𝒙𝑛}. Let 𝑷 ∈ ℛ𝑚 × 𝑚 be a set of m PCs of SRPCA. The Stahel–Donoho outlyingness of the ith sample xi is computed by7 out𝑖 = max

|𝒙𝑇𝑖𝒑 ― med𝑗(𝒙𝑇𝑗𝒑)| mad𝑗(𝒙𝑇𝑗𝒑)

𝒑∈𝑷

(16)

where p is a PC belonging to 𝑷, 𝒙𝑇𝑖𝒑 is the projection of xi on p, med𝑗(𝒙𝑇𝑗𝒑) denotes the median of

{𝒙𝑇𝑗𝒑,𝑗 = 1,…,𝑛},

and mad𝑗(𝒙𝑇𝑗𝒑) = med𝑗|𝒙𝑇𝑗𝒑 ― med𝑘(𝒙𝑇𝑘𝒑)| is the MAD (median absolute

deviation) scale estimator. After the out𝑖 for all samples have been computed, h samples with the smallest outlyingness are gathered in a h-subset of “least outlying” samples, denoted as 𝑯0 = {𝒙1,…, 𝒙ℎ}. The value of h is chosen in the range of 𝑛 2 ≤ ℎ < 𝑛, but n − h should be larger than the number of outliers in the data set X. It is often recommended to take h = 0.75n. The location and scatter estimators of samples in the h-subset 𝑯0 are computed by32 1

𝝁ℎ,0 = ℎ∑𝒙

𝑘

𝚺ℎ,0 =

(17)

𝒙 ∈ 𝑯0 𝑖

𝑐(ℎ,𝑛,𝑚)𝑠(ℎ,𝑛,𝑚) ∑𝒙 ∈ 𝑯 (𝒙𝑗 ℎ―1 𝑘 0

― 𝝁ℎ,0)(𝒙𝑘 ― 𝝁ℎ,0)𝑇

(18)

where 𝑐(ℎ,𝑛,𝑚) and 𝑠(ℎ,𝑛,𝑚) are two correction factors. The 𝑐(ℎ,𝑛,𝑚) makes 𝚺ℎ Fisher consistent when samples 𝒙𝑖 follow a normal distribution, given by33 𝑐(ℎ,𝑛,𝑚) = 𝑃(𝜒2

ℎ𝑛

𝑚+2

< 𝜒2𝑚,ℎ 𝑛)

(19)

where 𝜒2𝑚,ℎ 𝑛 is the ℎ 𝑛 cutoff point of the 𝜒2𝑚 distribution. The 𝑠(ℎ,𝑛,𝑚) is to reduce the small sample bias of 𝚺ℎ, and it can be obtained using the method of Pison et al.34 The squared robust 10

ACS Paragon Plus Environment

Page 11 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

distance (SRD) of a sample 𝒙𝑖 ∈ 𝑿 is computed by ―1( 𝒙𝑖 ― 𝝁ℎ,0) 𝑑2𝑖,0 = (𝒙𝑖 ― 𝝁ℎ,0)𝑇𝚺ℎ,0

(20)

The h samples with the smallest 𝑑2𝑖,0 are stored in a new h-subset 𝑯1. In analogy to 𝑯0, the location and scatter estimators (i.e., 𝝁ℎ,1 and 𝚺ℎ,1) of 𝑯1 are computed via Eq. (17) and Eq. (18). The SRDs 𝑑2𝑖,1 for all samples 𝒙𝑖 ∈ 𝑿 are computed via Eq. (20) using 𝝁ℎ,1 and 𝚺ℎ,1. Then, h samples with the smallest 𝑑2𝑖,1 are stored in a new h-subset 𝑯2. Repeating the above procedure to update the h-subset until the determinant of 𝚺ℎ no longer reduces. On convergence, SRDs for all samples are recomputed using the location and scatter estimators of the final h-subset. The SRDs of samples are compared with a threshold to identify outliers. A remaining problem is to determine the threshold. It is important to note that outlying samples are independent of 𝝁ℎ and 𝚺ℎ. Hence SRDs of outlying samples follow an F-distribution given by35 𝑚𝑛 ∗

𝑑2~𝑐(𝑛 ∗ ― 𝑚 + 1)𝐹𝑚,𝑛 ∗ ― 𝑚 + 1

(21)

where 𝑐 is a consistency correction factor, and 𝑛 ∗ is a parameter obtained by the adjusted asymptotic method in ref. 35. The threshold for SRD can be set as the 1–ε (typically ε = 0.025 or 0.01) cutoff point of the F-distribution in Eq. (21). Samples with SRDs beyond the threshold are identified as outliers. 3. Process monitoring using SRPCA 3.1 PC selection Let 𝑿 = [𝒙1,⋯, 𝒙𝑛] ∈ ℛ𝑚 × 𝑛 be a historical data set of an industrial process that contains n samples of m process variables. The data set X may be contaminated by outliers. Applying SRPCA on X yields a total of m sparse robust PCs, and then outliers in X can be identified using the method 11

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 31

in Section 2.2. A clean data set Xc is obtained after removing outliers from X. Note that PCs obtained by SRPCA may not be mutually orthogonal, and sparse patterns of some PCs may be partly overlapped. Therefore, to ensure a good process monitoring capability, it is necessary to select an optimal subset of PCs for each sample. A criterion is introduced below for the selection of PCs. The T2 statistic is one of the most widely used fault detection indices. Let 𝑷𝑙 ∈ ℛ𝑚 × 𝑙 be a set of l (l ≤ m) PCs, and denote the projection (i.e., scores) of a sample x on 𝑷𝑙 as 𝒚 = 𝑷𝑇𝑙𝒙. The T2 statistic of x can be defined in a lower-dimensional space spanned by 𝑷𝑙 𝑇2𝑙 = (𝒚 ― 𝒚)T𝑺 ―1(𝒚 ― 𝒚) 𝑛

𝒚= 𝑺=

𝑐 𝒚 ∑𝑖 = 1 𝑖

𝑛𝑐 ∑𝑖 = (𝒚 1 𝑖

𝑛𝑐 ― 𝒚)(𝒚𝑖 ― 𝒚)𝑇

(22)

𝑛𝑐 ― 1

where 𝒚𝑖 = 𝑷𝑇𝑙𝒙𝑖 is the projection of the ith sample 𝒙𝑖 in the clean data set Xc. Given a value of l, a larger 𝑇2𝑙 is better for fault detection. The goal of PC selection is therefore to choose l PCs yielding the largest 𝑇2𝑙 . In other words, (m-l) PCs with the least contributions to the T2 value are discarded. The contribution of a PC to the T2 value is computed by the MYT decomposition,36 which decomposes the T2 statistic as 𝑇2𝑚 = 𝑇2𝑚 ― 1 + 𝑇2𝑚.1,2,…,𝑚 ― 1

(23)

where 𝑇2𝑚 and 𝑇2𝑚 ― 1 are the T2 values computed using the first m and (m-1) PCs respectively, and 𝑇2𝑚.1,2,…,𝑚 ― 1 is the conditional contribution of the mth PC given the first (m-1) PCs. Similar to 𝑇2𝑚, 𝑇2𝑚 ― 1 is further decomposed as 𝑇2𝑚 ― 1 = 𝑇2𝑚 ― 2 + 𝑇2𝑚 ― 1.1,2,…,𝑚 ― 2

(24)

Continuing in this way gives a possible MYT decomposition 𝑇2𝑚 = 𝑇21 + 𝑇22.1 + 𝑇23.1,2 +⋯ + 𝑇2𝑚.1,2,…,𝑚 ― 1 12

ACS Paragon Plus Environment

(25)

Page 13 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Other MYT decompositions are obtained by changing the permutations of PCs in Eq. (25). Based on the MYT decomposition, the decrease in the T2 value is calculated by removing m-l PCs one by one. By minimizing the decrease of the T2 value, l optimal PCs are selected. Although the above PC selection approach is easy to implement for a given l, it remains the problem of choosing a proper value for l. To avoid this problem, the following cumulative percent contribution (CPC) criterion is used to select PCs. CPC criterion: For a sample x, compute the conditional contribution 𝑇2𝑚.1,2,…,𝑚 ― 1 (see Eq. (23)) of each PC, and discard the PC with the smallest 𝑇2𝑚.1,2,…,𝑚 ― 1. For the remaining m-1 PCs, compute the conditional contribution 𝑇2𝑚 ― 1.1,2,…,𝑚 ― 2 of each PC and discard the PC with the smallest 𝑇2𝑚 ― 1.1,2,…,𝑚 ― 2. Continue to discard PCs one by one until two constraints are satisfied 𝑇2𝑟

𝐶𝑃𝐶𝑥(𝑟) = 100𝑇2 % ≥ 𝐶𝑃𝐶 ∗ 𝑚

𝐶𝑃𝐶𝑥(𝑟 ― 1) = 100

𝑇2𝑟 ― 1 𝑇2𝑚

% < 𝐶𝑃𝐶 ∗

(26) (27)

where 𝐶𝑃𝐶𝑥(𝑟) is the CPC of r remaining PCs to the overall T2 value (i.e., 𝑇2𝑚) of x, and 𝐶𝑃𝐶 ∗ is a predefined threshold (e.g., 𝐶𝑃𝐶 ∗ = 80%). The remaining r PCs are selected for the sample x. 3.2 Fault detection Let 𝑷𝑥 ∈ ℛ𝑚 × 𝑟 be a matrix that consists of the PCs selected for a sample x. The T2 and SPE values of x are computed by 𝑇2𝑥 = (𝒙 ― 𝒙)𝑇𝑷𝑥𝑺 ―1𝑷𝑇𝑥(𝒙 ― 𝒙) 𝑆𝑃𝐸𝑥 = (𝒙 ― 𝒙)𝑇(𝒙 ― 𝒙)

(28) (29) 1

𝑛

𝑐 𝒙 𝑛 is the mean of 𝑛 training samples in the clean data set X , 𝑺 = where 𝒙 = ∑𝑖 = c 𝑐 𝑛𝑐 ― 1 1 𝑖 𝑐 𝑛

𝑐 ∑𝑖 = 𝑷𝑇(𝒙 ― 𝒙)(𝒙𝑖 ― 𝒙)𝑇𝑷𝑥 is a covariance matrix, and 𝒙 = 𝑷𝑇𝑥 + 𝑷𝑇𝑥𝒙 with 𝑷𝑥+ being the 1 𝑥 𝑖

13

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 31

pseudo-inverse of 𝑷𝑥. The 𝑇2𝑥 and 𝑆𝑃𝐸𝑥 are compared with control limits to determine whether or not a fault occurs. Control limits for the T2 and SPE statistics are determined using kernel density estimation (KDE). Consider the T2 statistic for example. First, the CPC criterion is applied to select PCs for each training sample in the data set Xc. Second, the T2 value of each training sample is computed using the selected PCs. The probability density function of the T2 statistic is estimated as37 1

𝑛

𝑐 𝐾 𝑓(𝑡) = 𝑛𝑐𝜃∑𝑖 = 1

𝑡 ― 𝑇2𝑖

( ) 𝜃

(30)

where K(∙) is a kernel function, θ is a tuning parameter, and 𝑇2𝑖 is the T2 value of the ith training sample. Finally, for a given significance level α, the control limit, 𝑇2, is computed by 𝑇2

∫ ―∞𝑓(𝑡)𝑑𝑡 = 1 ― 𝛼

(31)

The control limit, 𝑆𝑃𝐸, for the SPE statistic is determined in the same way as the T2 statistic. A fault is detected, if 𝑇2𝑥 > 𝑇2 or 𝑆𝑃𝐸𝑥 > 𝑆𝑃𝐸. 3.3 Fault diagnosis Sparse PCs often reveal meaningful connections between variables, which are useful for fault diagnosis. Active variables (corresponding to nonzero loadings) in each PC can be viewed as a variable group. Two-level contribution plots,26 consisting of the group-wise contribution plot (GWCP) and the variable-wise contribution plot (VWCP), are used for fault diagnosis. The GWCP is computed by 𝑐𝑘,𝑥 = (𝒙 ― 𝒙)𝑇𝒑𝑘𝑆𝑘𝑘𝒑𝑇𝑘(𝒙 ― 𝒙)

(32)

where 𝑐𝑘,𝑥 denotes the contribution of the kth group to the T2 value of a sample x, 𝒑𝑘 is the kth PC in the matrix 𝑷𝑥, and 𝑆𝑘𝑘 is the kth diagonal element of the matrix 𝑺 ―1 in Eq. (28). The group with the largest 𝑐𝑘,𝑥 most likely contains faulty variables. The faulty variable in this group is then 14

ACS Paragon Plus Environment

Page 15 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

identified using the VWCP that is computed by 𝑐𝑘𝑗,𝑥 = 𝑆𝑘𝑘(𝑥𝑗𝑝𝑘𝑗)2

(33)

where 𝑐𝑘𝑗,𝑥 is the contribution of the jth variable in the kth group to the T2 value of x, 𝑥𝑗 is the jth variable in x, and 𝑝𝑘𝑗 is the jth element of the kth PC. Due to the sparsity of PCs, only a few variables have nonzero 𝑐𝑘𝑗,𝑥. The variable with the largest 𝑐𝑘𝑗,𝑥 is regarded as the faulty variable. 3.4 Online process monitoring The online process monitoring procedure consists of the following steps. Stage I: Data analysis and modeling Step 1: Normalize the raw data set X via Eq. (13). Step 2: Compute PCs by performing SRPCA on X. Step 3: Detect outliers in X using the method in Section 2.2. Step 4: Remove outliers from X to get a clean data set Xc. Step 5: Compute T2 and SPE values for training samples in Xc. Step 6: Determine control limits for the T2 and SPE statistics via KDE. Stage II: Online monitoring Step 1: Normalize a new sample xnew using robust location and scale estimators of training data. Step 2: Select PCs for xnew using the CPC criterion. Step 3: Compute the T2 and SPE values of xnew via Eq. (28) and Eq. (29). Step 4: Compare T2 and SPE values with control limits. If a control limit is violated, detect a fault and go to Step 5; otherwise skip to Step 6. Step 5: Identify faulty variables using two-level contribution plots. 15

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 31

Step 6: Return to Step 1 and monitor the next sample. 4. Case study: Tennessee Eastman process 4.1 Process description The Tennessee Eastman (TE) process has been widely used for testing the performance of various process monitoring methods. Fig. 1 shows a flow sheet of the TE process.26,38 This process mainly consists of five operating units: a reactor, a condenser, a compressor, a separator and a stripper. Downs and Vogel38 have built mathematic models for the simulation of TE process. More details about the TE process and simulation models can be found in ref. 38. In this case study, 33 process variables are monitored, as shown in Table 1. A total of 960 reference samples are obtained under normal operation conditions. A contaminated training data set is generated by randomly replacing a small number of reference samples with outliers. Moreover, 20 disturbances in Table 2 are introduced separately to the TE process to generate the faulty data. There are 960 samples in each faulty data set, with a process disturbance started at the 160th sample. 4.2 Fault detection results Applying SRPCA to the contaminated training data set yields 33 PCs. The CPC criterion is used to select optimal PCs for each sample, with the threshold 𝐶𝑃𝐶 ∗ set as 80%. The 99% (i.e., α = 0.01 in Eq. (31)) control limits of the T2 and SPE statistics are computed from the T2 and SPE values of normal training samples using KDE. The results from SRPCA are compared with two other methods: classical PCA with an outlier filter called IRMCD-PCA, and a robust PCA method called ROBPCA.13 In the IRMCD-PCA method, outliers are first detected and removed from the contaminated training data set by using an outlier filter based on iterated reweighted minimum 16

ACS Paragon Plus Environment

Page 17 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

covariance determinant (IRMCD)39. The classical PCA is then applied to the clean training data for building a PCA model. The first 13 PCs, which explain about 82% of the variance in the training data, are retained in the PCA model. The conventional PCA-based T2 and SPE statistics1,2 are used for fault detection. In the ROBPCA method, the score distance (SD) and the orthogonal distance (OD),13 similar to the T2 and SPE statistics, are used for fault detection. For a centered sample x, the 𝑘 SD and OD are defined as 𝑆𝐷 = ∑𝑗 = 1𝑡𝑗 𝜆𝑗 and 𝑂𝐷 = ‖𝒙 ― 𝑷𝒕‖,13 where 𝑷 is a matrix

𝑇

consisting of the first k (here k = 13) PCs of ROBPCA, 𝒕 = 𝑷 𝒙 is the projection of x, tj is the jth element of t, and 𝜆𝑗 is the jth eigenvalue of ROBPCA. The 99% control limits for SD and OD are determined by the χ2-distribution.13 Fault detection performance is evaluated by two indices: fault detection rate (FDR) and fault detect delay (FDD). The FDR is the percentage of samples successfully detected as faulty during the faulty operating period. FDD is the time delay before a fault was detected. Fault detection results of three methods are shown in Table 3. FDRs for faults 3, 9 and 15 are very low, because they have little effect on the process. Three methods all have very high FDRs for faults that are easy to be detected, such as faults 1, 2, 4, 6-8, 12-14 and 18. For faults 5, 10, 11, 16, 19 and 20, however, the SRPCA has much better monitoring performance than ROBPCA and IRMCD-PCA. Of six fault detection indices, the SRPCA-based T2 statistic has the best detection performance for all faults. Compared to four fault detection indices based on ROBPCA and IRMCD-PCA, the FDR of the SRPCA-based T2 statistic increases by more than 62% for faults 5, by more than 35% for fault 10, and by more than 10% for faults 16, 19 and 20. In addition, faults 1, 10 and 17-19 are detected by SRPCA more quickly than ROBPCA and IRMCD-PCA. These results 17

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 31

indicate that the SRPCA method outperforms the ROBPCA and IRMCD-PCA methods, by virtue of higher FDRs and shorter FDDs. The main reason is that the fault detection capability of the SRPCA-based T2 statistic is improved significantly by using the CPC criterion for PC selection. The CPC criterion chooses PCs according to their importance to fault detection, which is obviously superior to the methods that choose PCs according to eigenvalues or the explained variance. Fig. 2 and Fig. 3 show monitoring charts of three methods for faults 10 and 19, which illustrate how SRPCA yields better fault detection results than ROBPCA and IRMCD-PCA. In Fig. 2, fault 10 is detected by SRPCA at the 180th sample, which is earlier than the fault detection time of ROBPCA and IRMCD-PCA at the 187th sample and at the 209th sample respectively. Besides, more than 90% of faulty samples (i.e., the samples collected after a fault occurred at the 160th sample) are successfully detected by the SRPCA-based T2 statistic (Fig. 2a), but four fault detect indices based on ROBPCA and IRMCD-PCA only detect less than a half of faulty samples (Fig. 2b and Fig. 2c). As shown in Fig. 3, fault 19 is detected by SRPCA immediately after it occurred, while it is detected by ROBPCA and IRMCD-PCA with time delays of 9 samples and 24 samples respectively. Again, most of faulty samples are detected by the SRPCA-based T2 statistic (Fig. 3a), whereas many faulty samples are not detected in the monitoring charts of ROBPCA and IRMCD-PCA (Fig. 3b and Fig. 3c). As a consequence, the SRPCA-based T2 statistic has a high FDR of 95% for fault 19, but fault detect indices based ROBPCA and IRMCD-PCA have smaller FDRs for fault 19 (Table 3). 4.3 Fault diagnosis results Once a fault has been detected, two-level contribution plots consisting of GWCP and VWCP are immediately used to diagnose variables responsible for the fault. Faults 6 and 11 are taken as two 18

ACS Paragon Plus Environment

Page 19 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

examples to demonstrate the implementation and effectiveness of two-level contribution plots. Fig. 4 shows two-level contribution plots at the 161st sample (the detection time of fault 6) for fault 6. Note that the GWCP is not plotted in Fig 4, because only one sparse PC is selected by the CPC criterion for the 161st sample of fault 6. This selected PC consists of two variables: v1 (A feed rate) and v25 (A feed flow valve), corresponding to the control loop of the A feed flow (see Fig. 1). The VWCP in Fig. 4 shows that variables v1 and v25 both make very high contributions to the T2 value. Hence these two variables are identified as faulty variables responsible for fault 6. Based on the physical connection between variables v1 and v25, it is easy to infer that fault 6 occurred in the control loop of the A feed flow. This diagnosis is consistent with the actual cause of fault 6 – A feed loss (see Table 2). Fig. 5 shows two-level contribution plots at the 166th sample (the fault detection time) for fault 11. A total of 7 sparse PCs are selected by the CPC criterion for the 166th sample of fault 11. The GWCP in Fig. 5a shows that the 2nd PC (group 2) contributes the most to the T2 value, whereas contributions of other PCs are very small. This indicates that the 2nd PC contains faulty variables. This PC consists of two variables: v9 (reactor temperature) and v32 (reactor cooling water flow valve), corresponding to the control loop of the reactor temperature. As shown in the VWCP in Fig. 5b, the variable v32 has a much larger contribution than the variable v9. The variable v32 is therefore identified as the faulty variable responsible for fault 11. Because variables v9 and v32 are closely related to the reactor cooling system (see Fig. 1), we may infer that fault 11 occurred within the reactor cooling system. This diagnosis is consistent with the actual cause of fault 11 – Random changes in the reactor cooling water inlet temperature (see Table 2). It is evident by means of above 19

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 31

two examples that two-level contribution plots provide reliable fault diagnosis by utilizing the meaningful variable connections revealed by parse PCs. 5. Conclusions In this paper, we propose a new PCA method that combines the properties of robustness and sparsity, called sparse robust principal component analysis (SRPCA). To achieve the robustness, SRPCA replaces the empirical covariance matrix used in classical PCA by a weighted covariance matrix that is robust against outliers in the data. Compared to the classical PCA, the outcome of SRPCA is less affected by outliers. Based on the robust PCs of SRPCA, an outlier detection method is also developed to identify outliers in a data set. In addition, to achieve sparsity on PCs, an l0 norm penalty is imposed on the loading vector of each PC to shrink most of loadings to be exactly zero. Different from the ordinary PC, a sparse PC consists of only a small number of original variables. The sparsity property enables PCs to reveal meaningful connections between variables, which makes the interpretation of PCs easier. This is especially attractive for applications in the cases where the result interpretation is important. An online process monitoring method is developed based on SRPCA. A cumulative percent contribution (CPC) criterion is used to select optimal PCs for process monitoring. In this criterion, the importance of PCs to process monitoring is evaluated by the PC contribution that is computed using the MYT decomposition of the T2 statistic. For each sample, a few important PCs with the CPC exceeding a threshold are selected. Based on the selected PCs, the T2 and SPE statistics are defined as two fault detection indices. Based on the meaningful variable connections revealed by sparse PCs, two-level contribution plots are developed for fault diagnosis. The fault diagnosis 20

ACS Paragon Plus Environment

Page 21 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

capability is improved by narrowing down faulty variables to the active variables in a dominant variable group (corresponding to a sparse PC) contributing the most to a fault. A case study on the Tennessee Eastman process demonstrates that: (1) the SRPCA-based method has superior monitoring performance, in spite of the presence of outliers in the training data; (2) the CPC criterion is capable of selecting optimal PCs for process monitoring; (3) two-level contribution plots yield reliable fault diagnosis results. Acknowledgements This study was supported by the National Natural Science Foundation of China (no. 61304116). References (1) Nomikos, P.; MacGregor, J. F. Monitoring batch processes using multiway principal component analysis. AIChE J. 1994, 40(8), 1361–1375. (2) Qin, S. J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36, 220–234. (3) Ge, Z. Review on data-driven modeling and monitoring for plant-wide industrial processes. Chemom. Intell. Lab. Syst. 2017, 171, 16–25. (4) Luo, L.; Bao, S.; Mao, J.; Ding, Z. Industrial process monitoring based on knowledge−data integrated sparse model and two-level deviation magnitude plots. Ind. Eng. Chem. Res. 2018, 57, 611−622. (5) Hubert, M.; Rousseeuw, P. J.; Van Aelst, S. High-breakdown robust multivariate methods. Stat. Sci. 2008, 23, 92−119. (6) Moller, S. F.; Von Frese, J.; Bro, R. Robust methods for multivariate data analysis. J. Chemometrics 2005, 19, 549–563. (7) Daszykowski, M.; Kaczmarek, K.; Heyden, Y. V.; Walczak, B. Robust statistics in data analysis-A review basic concepts. Chemom. Intell. Lab. Syst. 2007, 85, 203–219. (8) Serneels, S.; Verdonck, T. Principal component analysis for data containing outliers and missing elements. Comput. Stat. Data An. 2008, 52, 1712–1727. (9) Croux, C.; Haesbroeck, G. Principal component analysis based on robust estimators of the covariance or correlation matrix: Influence functions and efficiencies. Biometrika, 2000, 87(3), 603– 618. (10) Salibian-Barrera, M., Van Aelst, S.; Willems, G. PCA based on multivariate MM-estimators 21

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 31

with fast and robust bootstrap. J. Amer. Statist. Assoc. 2006, 101, 1198–1211. (11) Hubert, M.; Rousseeuw, P.; Verboven, S. A fast method for robust principal components with applications to chemometrics. Chemom. Intell. Lab. Syst. 2002, 60, 101–111. (12) Croux, C.; Ruiz-Gazen, A. High breakdown estimators for principal components: the projection-pursuit approach revisited. J. Multivariate Anal. 2005, 95, 206–226. (13) Hubert, M.; Rousseeuw, P. J.; Branden, K. V. ROBPCA: a new approach to robust principal component analysis. Technometrics 2005, 47, 64–79. (14) Maronna, R. A. Principal components and orthogonal regression based on robust scales. Technometrics 2005, 47, 264–273. (15) Locantore, N.; Marron, J. S.; Simpson, D. G.; Tripoli, N.; Zhang, J. T.; Cohen, K. L. Robust principal component analysis for functional data. Test 1999, 8, 1–73. (16) Wright, J.; Ganesh, A.; Rao, S.; Peng, Y.; Ma, Y. Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. Proc. Advances in Neural Information Processing Systems, Vancouver, British Columbia, Canada, 2009. (17) Chen, J.; Bandoni, J. A.; Romagnoli, J. A. Robust PCA and normal region in multivariate statistical process monitoring. AIChE J. 1996, 42(12), 3563−3566. (18) Tharrault, Y.; Mourot, G.; Ragot, J. Fault detection and isolation with robust principal component analysis. 16th Mediterranean Conference on Control and Automation, Ajaccio, France, 2008. (19) Bao, S.; Luo, L.; Mao, J.; Tang, D.; Ding, Z. Robust monitoring of industrial processes in the presence of outliers in training data. Ind. Eng. Chem. Res. 2018, 57, 8230−8239. (20) Zou, H.; Hastie, T.; Tibshirani, R. Sparse principal component analysis. J. Comput. Graph Stat. 2006, 15, 265−286. (21) d’Aspremont, A.; El Ghaoui, L.; Jordan, M. I.; Lanckriet, G. R. G. A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 2007, 49, 434−448. (22) Moghaddam, B.; Weiss, Y.; Avidan, S. Spectral bounds for sparse PCA: Exact and greedy algorithms. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, 2006; Vol. 18, pp 915−922. (23) Journée, M.; Nesterov, Y.; Richtárik, P.; Sepulchre, R. Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 2010, 11, 517−553. (24) Sriperumbudur, B. K.; Torres, D. A.; Lanckriet, G. R. G. A majorization-minimization approach to the sparse generalized eigenvalue problem. Mach. Learn. 2011, 85, 3−39. (25) Yu, H.; Khan, F.; Garaniya, V. A sparse PCA for nonlinear fault diagnosis and robust feature discovery of industrial processes. AIChE J. 2016, 62, 1494−1513. (26) Luo, L.; Bao, S.; Mao, J.; Tang, D. Fault detection and diagnosis based on sparse PCA and two-level contribution plots. Ind. Eng. Chem. Res. 2017, 56, 225−240. (27) Gajjar, S.; Kulahci, M.; Palazoglu, A. Real-time fault detection and diagnosis using sparse 22

ACS Paragon Plus Environment

Page 23 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

principal component analysis. J. Process Control 2018, 67, 112−128. (28) Luo, L.; Bao, S.; Mao, J.; Tang, D. Just-in-time selection of principal components for fault detection: The criteria based on principal component contributions to the sample Mahalanobis distance. Ind. Eng. Chem. Res. 2018, 57, 3656−3665. (29) Luo, L. Process monitoring with global−local preserving projections. Ind. Eng. Chem. Res. 2014, 53, 7696−7705. (30) d’Aspremont, A.; Bach, F. R.; El Ghaoui, L. Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res. 2008, 9, 1269–1294. (31) Rousseeuw, P. J.; Croux, C. Alternatives to the median absolute deviation. J. Am. Stat. Assoc. 1993, 88(424), 1273–1283. (32) Rousseeuw, P. J.; Van Driessen, K. A fast algorithm for the minimum covariance determinant estimator. Technometrics 1999, 41, 212–223. (33) Croux, C.; Haesbroeck, G. Influence function and efficiency of the minimum covariance determinant scatter matrix estimator. J. Multivariate Anal. 1999, 71, 161−190. (34) Pison, G.; Van Aelst, S.; Willems, G. Small sample corrections for LTS and MCD. Metrika 2002, 55, 111−123. (35) Hardin, J.; Rocke, D. M. The distribution of robust distances. J. Comput. Graph. Stat. 2005, 14(4), 910–927. (36) Mason, R. L.; Tracy, N. D.; Young, J. C. Decomposition of T2 for multivariate control chart interpretation. J. Qual. Technol. 1995, 27, 99−108. (37) Martin, E. B.; Morris, A. J. Non-parametric confidence bounds for process performance monitoring charts. J. Process Control 1996, 6(6), 349−358. (38) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17, 245−255. (39) Cerioli, A. Multivariate outlier detection with high-breakdown estimators. J. Am. Stat. Assoc. 2010, 105(489), 147−156.

23

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 31

Table 1. Monitored variables in the TE Process.26 Reprinted with permission from ref 26. Copyright 2017 American Chemical Society. No. Variable Units No. Variable Units kscmh ℃ v1 A feed rate (stream 1) v18 Stripper temperature -1 kg h kg h-1 v2 D feed rate (stream 2) v19 Stripper steam flow kg h-1 kW v3 E feed rate (stream 3) v20 Compress work Reactor cooling water outlet kscmh ℃ v4 A and C feed rate (stream 4) v21 temperature Condenser cooling water outlet kscmh ℃ v5 Recycle flow (stream 8) v22 temperature kscmh % v6 Reactor feed rate (stream 6) v23 D feed flow valve (stream 2) kPa % v7 Reactor pressure v24 E feed flow valve (stream 3) % % v8 Reactor level v25 A feed flow valve (stream 1) ℃ v9 Reactor temperature v26 A and C feed flow valve (stream 4) % kscmh % v10 Purge rate (stream 9) v27 Compressor recycle value % v11 Product separator temperature ℃ v28 Purge valve (stream 9) Separator pot liquid flow valve % % v12 Product separator level v29 (stream 10) Stripper liquid product flow valve kPa % v13 Product separator pressure v30 (stream 11) m3 h-1 % v14 Product separator underflow v31 Stripper steam valve % % v15 Stripper level v32 Reactor cooling water flow valve Condenser cooling water flow kPa % v16 Stripper pressure v33 valve v17 Stripper underflow (stream 11) m3 h-1

24

ACS Paragon Plus Environment

Page 25 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Table 2. Disturbances in the TE process.38 Reprinted with permission from ref 38. Copyright 1993 Elsevier. No. Variable Type 1 A/C feed ratio, B composition constant (stream 4), Step 2 B composition, A/C feed ratio constant (stream 4) Step 3 D feed temperature (stream 2) Step 4 Reactor cooling water inlet temperature Step 5 Condenser cooling water inlet temperature Step 6 A feed loss (stream 1) Step 7 C header pressure loss-reduced availability (stream 4) Step 8 A, B, C feed composition (stream 4) Random 9 D feed temperature (stream 2) Random 10 C feed temperature (stream 4) Random 11 Reactor cooling water inlet temperature Random 12 Condenser cooling water inlet temperature Random 13 Reaction kinetics Slow drift 14 Reactor cooling water valve Sticking 15 Condenser cooling water valve Sticking 16 Unknown Unknown 17 Unknown Unknown 18 Unknown Unknown 19 Unknown Unknown 20 Unknown Unknown

25

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 3. Detection results of three methods for 20 faults. SRPCA

ROBPCA

IRMCD-PCA SPE SD OD T2 SPE Fault FDR FDD FDR FDD FDR FDD FDR FDD FDR FDD FDR FDD (%) (ns) (%) (ns) (%) (ns) (%) (ns) (%) (ns) (%) (ns) 1 100 0 97 7 100 2 99 4 99 7 100 2 2 99 12 97 28 99 12 94 43 98 14 96 34 3 7 / 2 / 3 / 4 / 1 / 4 / 4 100 0 17 8 56 3 100 0 23 2 100 0 5 100 0 36 0 29 12 38 0 24 0 23 0 6 100 0 100 1 100 0 100 1 99 8 100 0 7 100 0 61 0 100 0 35 0 100 0 100 0 8 98 19 94 22 98 19 87 22 97 25 87 19 9 6 / 2 / 2 / 5 / 2 / 4 / 10 91 19 46 24 37 26 56 33 30 57 28 48 11 82 5 20 119 38 6 67 5 42 6 75 5 12 100 1 98 2 95 7 97 1 98 2 91 2 13 95 37 91 49 95 40 94 42 94 48 95 37 14 100 0 27 30 75 1 100 0 100 0 100 0 15 10 / 7 / 3 / 5 / 1 / 4 / 16 94 6 38 21 31 35 84 7 14 310 29 18 17 97 19 72 32 41 37 71 27 77 28 96 21 18 91 77 89 87 88 97 90 84 89 87 90 82 19 95 1 12 10 6 11 80 9 8 165 24 24 20 91 62 62 72 50 84 78 62 31 85 52 84 “(ns)” denotes the number of samples. T2

26

ACS Paragon Plus Environment

Page 26 of 31

Page 27 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46

Industrial & Engineering Chemistry Research

Figure 1. Flow sheet of the Tennessee Eastman process.26 Reprinted with permission from ref 26. Copyright 2017 American Chemical Society. 27

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 2. Monitoring charts for fault 10: (a) SRPCA, (b) ROBPCA and (c) IRMCD-PCA.

28

ACS Paragon Plus Environment

Page 28 of 31

Page 29 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 3. Monitoring charts for fault 19: (a) SRPCA, (b) ROBPCA and (c) IRMCD-PCA.

29

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 4. Variable-wise contribution plots (VWCP) at the 161st sample for fault 6.

30

ACS Paragon Plus Environment

Page 30 of 31

Page 31 of 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 5. Two-level contribution plots at the 166th sample for fault 11: (a) GWCP, (b) VWCP.

31

ACS Paragon Plus Environment