Article

A variable-correlation-based sparse modeling method for industrial process monitoring
Lijia Luo, Shiyi Bao, Zhenyu Ding, and Jianfeng Mao
Ind. Eng. Chem. Res., Just Accepted Manuscript • Publication Date (Web): 30 May 2017
Downloaded from http://pubs.acs.org on May 30, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Industrial & Engineering Chemistry Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 40


A variable-correlation-based sparse modeling method for industrial process monitoring

Lijia Luo, Shiyi Bao∗, Zhenyu Ding, and Jianfeng Mao

Institute of Process Equipment and Control Engineering, Zhejiang University of Technology, Engineering Research Center of Process Equipment and Remanufacturing, Ministry of Education, Hangzhou 310014, China

∗Corresponding Author. Tel.: +86 (0571) 88320349; fax: +86 (0571) 88320842. E-mail address: [email protected].

ABSTRACT: Dimensionality reduction techniques are widely used in data-driven process monitoring methods to extract key features from process data. Dimensionality reduction generally causes information loss and may therefore degrade process monitoring performance. Choosing a data projection matrix that minimizes this effect is often challenging. In this paper, we introduce a method for constructing a variable-correlation-based sparse projection matrix (VCBSPM) that reduces the dimension of process data and serves as the basis of the process monitoring model. The VCBSPM is constructed on the basis of variable correlations, with each column of the VCBSPM corresponding to a group of highly correlated variables. The VCBSPM has two advantages: (1) it applies dimensionality reduction only to highly correlated variables, so the negative effect of dimensionality reduction on process monitoring performance is significantly reduced; (2) its sparsity not only improves interpretability, but also enables it to eliminate redundant interferences between variables and to reveal meaningful variable connections. These advantages make the VCBSPM-based monitoring model well suited for fault detection and diagnosis. In addition, to exploit the meaningful variable connections revealed by the VCBSPM for fault diagnosis, hierarchical contribution plots consisting of group-wise and group-variable-wise contribution plots are developed. The hierarchical contribution plots identify both the faulty groups, which correspond to actual control loops or physical links in the process, and the faulty variables responsible for a fault. The implementation, effectiveness, and key features of the proposed methods are illustrated by an industrial case study.

1. Introduction

Process safety and high product quality are crucial to industrial manufacturing processes. To guarantee process safety and product quality, it is necessary to develop dedicated process monitoring methods. With the application of automation technologies in industrial processes, large amounts of process data are collected and stored at high sampling rates, which expedites the development of data-driven monitoring methods. An effective data-driven monitoring method should, first, be able to reveal significant information in massive data; second, provide effective means to detect process faults caused by disturbances, improper operations, equipment malfunctions, etc.; and, third, identify the faulty process variables responsible for faults and diagnose their root causes. Data-driven process monitoring methods have been studied intensively over the past decades.1-12 To extract significant information from high-dimensional process data, existing data-driven monitoring methods often apply dimensionality reduction techniques, such as principal component analysis (PCA),3,5 partial least squares (PLS),6,7 locality preserving projections (LPP),10


Fisher discriminant analysis (FDA),11 nonlocal and local structure preserving projection (NLLSPP),12 etc., to the process data set. After dimensionality reduction, the original data space is divided into a reduced subspace that preserves significant process information and a residual subspace that contains only insignificant information. The T2 and SPE statistics are then defined in the reduced subspace and the residual subspace, respectively, for fault detection.13 Although the T2 and SPE statistics are two complementary statistics for monitoring data variations over the whole data space, their joint fault detection performance is generally not comparable to that of the global T2 statistic defined in the original measurement space, owing to the information loss caused by dimensionality reduction. To reduce the negative effect of dimensionality reduction on fault detection, one should avoid placing significant data information in the residual subspace by decreasing the data information captured by the residual subspace as much as possible. This can be achieved by choosing an appropriate dimension for the reduced subspace or by constructing a better projection matrix for dimensionality reduction. Various methods and criteria have been proposed for determining the dimension of the reduced subspace. For example, when using PCA for dimensionality reduction, the number of principal components can be chosen by the cumulative percent variance (CPV),14 variance of reconstruction error (VRE),15 cross validation,16 and so on. Most of them, however, do not explicitly correlate the dimension selection with the process monitoring performance. Consequently, the selected dimension of the reduced subspace, or the corresponding projection matrix, may not be optimal for process monitoring. Up to now, effective methods have been lacking for choosing an appropriate dimension of the reduced subspace or constructing an optimal projection matrix that minimizes the negative effect of dimensionality reduction on fault detection performance.
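As one concrete instance of such a criterion, CPV-based dimension selection can be sketched in a few lines of NumPy. The 90% threshold and the toy data set below are illustrative assumptions, not values taken from this paper:

```python
import numpy as np

def select_dim_by_cpv(X, cpv_threshold=0.90):
    """Pick the reduced-subspace dimension l as the smallest number of PCs
    whose cumulative percent variance (CPV) reaches cpv_threshold."""
    Xc = X - X.mean(axis=0)                   # PCA works on mean-centered data
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values of centered data
    eigvals = s**2 / (X.shape[0] - 1)         # eigenvalues of the covariance matrix
    cpv = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cpv, cpv_threshold) + 1)

# toy data: only 3 directions carry real variance in a 5-variable data set
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 5))
X += 1e-3 * rng.normal(size=(500, 5))
l = select_dim_by_cpv(X, 0.90)
print(l)  # at most 3, since the two remaining directions hold almost no variance
```

As the paragraph above notes, nothing in this rule ties the chosen l to monitoring performance: the threshold is applied to explained variance only.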


In addition, the projection matrix obtained by conventional dimensionality reduction techniques (e.g., PCA) lacks sparsity, so each latent variable generated by dimensionality reduction is a combination of almost all original variables in the data set. This often complicates and confounds the interpretation of latent variables. Interpretable latent variables, however, are very useful for process monitoring, especially for fault diagnosis. Recently, new dimensionality reduction techniques have been proposed to obtain interpretable latent variables for process monitoring. For example, Yu et al.17 proposed robust, nonlinear and sparse PCA (RNSPCA) for nonlinear fault diagnosis and robust feature discovery in industrial processes. Bao et al.18 proposed sparse global-local preserving projections (SGLPP) for enhanced fault detection and diagnosis in industrial processes. Luo et al.19 used sparse PCA (SPCA) to improve fault detection and diagnosis capabilities. These methods produce a sparse projection matrix for dimensionality reduction that reveals meaningful patterns in the data. The resulting latent variables have good interpretability because each is composed of a small number of original variables. A key issue for RNSPCA, SGLPP and SPCA is selecting the degree of sparsity (i.e., the number of nonzero elements) of the projection matrix. The sparsity is often controlled by penalty parameters,17-19 but specifying these penalty parameters is a challenging task. Furthermore, the sparsity of the projection matrix is achieved at the cost of information loss: more information may be assigned to the residual subspace when a sparse projection matrix is used for dimensionality reduction. This, in turn, may enlarge the negative effect of dimensionality reduction on fault detection and reduce the fault detection performance. Therefore, it is necessary to develop effective methods that obtain a sparse projection matrix while minimizing the information loss.
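To make the penalty-versus-sparsity trade-off concrete, the sketch below computes a first sparse loading vector by power iteration with l1 soft-thresholding. This is a generic illustration of penalty-controlled sparsity, not the specific RNSPCA, SGLPP, or SPCA algorithms of refs 17-19, and the penalty value alpha = 10 is an assumption chosen for the toy data:

```python
import numpy as np

def sparse_first_loading(X, alpha, n_iter=200):
    """First loading vector via power iteration with l1 soft-thresholding.
    alpha is the sparsity-inducing penalty parameter: alpha = 0 recovers the
    ordinary (dense) PCA loading; larger alpha zeroes more entries."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v = Vt[0]                                         # dense PCA initialization
    for _ in range(n_iter):
        u = Xc @ v
        u /= np.linalg.norm(u)
        w = Xc.T @ u
        v = np.sign(w) * np.maximum(np.abs(w) - alpha, 0.0)  # soft-threshold
        nv = np.linalg.norm(v)
        if nv == 0:
            break                                     # penalty killed every entry
        v /= nv
    return v

# toy data: variables 0-2 share one strong factor, variables 3-5 a weaker one
rng = np.random.default_rng(1)
f1 = 3.0 * rng.normal(size=(300, 1))
f2 = rng.normal(size=(300, 1))
X = np.hstack([f1 + 0.1 * rng.normal(size=(300, 3)),
               f2 + 0.1 * rng.normal(size=(300, 3))])
dense = sparse_first_loading(X, alpha=0.0)
sparse = sparse_first_loading(X, alpha=10.0)
print(np.count_nonzero(np.abs(dense) > 1e-6))   # 6: every variable loads
print(np.count_nonzero(np.abs(sparse) > 1e-6))  # 3: only the strong factor's variables
```

The sketch also shows the cost mentioned above: the thresholded direction is no longer the exact first principal component, so some variance leaks into the residual subspace.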


In this paper, to improve fault detection and diagnosis capabilities, we propose a method to construct a variable-correlation-based sparse projection matrix (VCBSPM) for reducing the dimension of process data. The VCBSPM can significantly reduce the negative effect of dimensionality reduction on fault detection performance, because it implements dimensionality reduction only for highly correlated variables. Moreover, the VCBSPM is a sparse matrix: the nonzero elements in each column of the VCBSPM correspond to process variables with high correlations. This sparsity enables the VCBSPM to produce interpretable latent variables. Consequently, the process monitoring model built by applying the VCBSPM to process data is well suited for fault detection and diagnosis. Furthermore, the VCBSPM classifies process variables into different groups according to variable correlations; each column of the VCBSPM represents a variable group consisting of highly correlated variables. To utilize the meaningful variable connections in each group to improve the fault diagnosis capability, hierarchical contribution plots are developed for fault diagnosis. The first-level group-wise contribution plot is used to identify faulty groups, and faulty variables are then identified by the second-level group-variable-wise contribution plot. The implementation, effectiveness and advantages of the proposed methods are demonstrated through a case study on the Tennessee Eastman process.

The paper is organized as follows: the next section analyzes the effect of dimensionality reduction on fault detection, followed by the introduction of the method for constructing the variable-correlation-based sparse projection matrix (VCBSPM). Section 3 describes the VCBSPM-based process monitoring method and its components: the monitoring model, fault detection indices, and hierarchical contribution plots for fault diagnosis. In Section 4, the


effectiveness and advantages of the VCBSPM-based monitoring method are demonstrated by a case study on the Tennessee Eastman process. The conclusions are presented in Section 5.

2. Constructing the sparse projection matrix based on variable correlations

2.1. Effect of dimensionality reduction on fault detection

PCA is one of the most commonly used dimensionality reduction techniques in process monitoring. We take PCA as an example to demonstrate the effect of dimensionality reduction on fault detection. For an m-dimensional data set $X = [x_1, \cdots, x_n]^T \in \mathbb{R}^{n \times m}$ consisting of n samples, the PCA model is

$$X = t_1 p_1^T + t_2 p_2^T + \cdots + t_m p_m^T = T P^T \qquad (1)$$

where $t_k$ (k = 1, …, m) are score vectors or principal components (PCs), $p_k$ are loading vectors, $T \in \mathbb{R}^{n \times m}$ is the score matrix, and $P \in \mathbb{R}^{m \times m}$ is the loading matrix. If the first l (l < m) PCs are retained, the dimension of the data set is reduced from m to l. Thus, Eq. (1) can be rewritten as

$$X = \sum_{k=1}^{l} t_k p_k^T + \sum_{k=l+1}^{m} t_k p_k^T = T_l P_l^T + T_{m-l} P_{m-l}^T = T_l P_l^T + E \qquad (2)$$

where $T_l$ and $P_l$ are the score matrix and loading matrix corresponding to the first l PCs, $T_{m-l}$ and $P_{m-l}$ are the score matrix and loading matrix corresponding to the remaining m−l PCs, and E is the residual matrix. The original measurement space is divided into a principal component subspace (PCS), $\hat{X} = T_l P_l^T$, and a residual subspace (RS), $\tilde{X} = T_{m-l} P_{m-l}^T = E$. For a sample x, the T2 and SPE statistics can be defined in the PCS and RS as13

$$T^2 = t_l^T S_l^{-1} t_l \qquad (3)$$

$$SPE = e^T e = \tilde{x}^T \tilde{x} \qquad (4)$$

where $t_l = P_l^T x$, $\hat{x} = P_l P_l^T x$, $e = x - \hat{x}$, and $S_l$ is the covariance matrix of $T_l$. A sample is regarded as a faulty sample if either its T2 or SPE statistic exceeds the pre-specified confidence limit.
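Eqs. (1)-(4) can be sketched numerically as follows (NumPy; the toy data set and the choice l = 2 are assumptions made for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, l = 500, 5, 2
X = rng.normal(size=(n, 3)) @ rng.normal(size=(3, m)) + 0.05 * rng.normal(size=(n, m))
X -= X.mean(axis=0)                      # Eq. (1) assumes mean-centered data

# PCA via SVD: loadings P are the right singular vectors, scores T = X P
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P_l = Vt[:l].T                           # loading matrix of the first l PCs
T_l = X @ P_l
S_l = (T_l.T @ T_l) / (n - 1)            # covariance matrix of the scores T_l

x = X[0]                                 # monitor one sample
t_l = P_l.T @ x
T2 = t_l @ np.linalg.solve(S_l, t_l)     # Eq. (3): T2 = t_l' S_l^{-1} t_l
e = x - P_l @ t_l                        # residual e = x - P_l P_l' x
SPE = e @ e                              # Eq. (4): SPE = e'e
print(f"T2 = {T2:.3f}, SPE = {SPE:.6f}")
```

In practice each statistic would be compared against its own confidence limit, as described in the text.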


Suppose that the covariance matrix $S = X^T X/(n-1)$ of X is nonsingular. If the dimension of the data set X is not reduced, the global T2 statistic for a sample x can be defined in the original measurement space as

$$T_g^2 = x^T S^{-1} x \qquad (5)$$

Let P be the loading matrix in Eq. (1) that consists of all loading vectors of X. We have $P^T P = P P^T = I$ and $P^T X^T X P/(n-1) = \Lambda$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$ is a diagonal matrix with $\lambda_i$ (i = 1, …, m) being the ith eigenvalue of the covariance matrix $X^T X/(n-1)$. The global T2 statistic in Eq. (5) can be reformulated as

$$T_g^2 = x^T S^{-1} x = x^T P \Lambda^{-1} P^T x = x^T \begin{bmatrix} P_l & P_{m-l} \end{bmatrix} \begin{bmatrix} \Lambda_l^{-1} & 0 \\ 0 & \Lambda_{m-l}^{-1} \end{bmatrix} \begin{bmatrix} P_l & P_{m-l} \end{bmatrix}^T x = x^T P_l \Lambda_l^{-1} P_l^T x + x^T P_{m-l} \Lambda_{m-l}^{-1} P_{m-l}^T x = T_l^2 + T_{m-l}^2 \qquad (6)$$

where $\Lambda_l = \mathrm{diag}(\lambda_1, \ldots, \lambda_l)$, $\Lambda_{m-l} = \mathrm{diag}(\lambda_{l+1}, \ldots, \lambda_m)$, and $P_l$ and $P_{m-l}$ are the two loading matrices defined in Eq. (2). Eq. (6) reveals that the global T2 statistic is the sum of two parts: $T_l^2$ and $T_{m-l}^2$. The $T_l^2$ is exactly the T2 statistic defined in Eq. (3). The $T_{m-l}^2$ is similar to the SPE statistic in Eq. (4), except that $T_{m-l}^2$ takes the form of a Mahalanobis distance while the SPE statistic takes the form of a Euclidean distance. The confidence region (i.e., in-control region) of the global T2 statistic is an ellipsoid in the measurement space, while the confidence region defined jointly by the PCA-based T2 and SPE statistics is the combination of two ellipsoids. In general, the global T2 statistic has better fault detection performance than the PCA-based T2 and SPE statistics, because it uses all data information for fault detection, without any loss of useful information. Differences between the global T2 statistic and the combination of the PCA-based T2 and SPE
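The decomposition in Eq. (6) is easy to verify numerically. In the sketch below (mean-centered toy data with a nonsingular covariance matrix, an assumption matching the derivation), the global T2 of Eq. (5) matches the sum of the two parts:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, l = 400, 6, 3
X = rng.normal(size=(n, m))
X -= X.mean(axis=0)

S = (X.T @ X) / (n - 1)                  # full covariance matrix (nonsingular here)
eigvals, P = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]        # descending eigenvalue order, as in Eq. (2)
eigvals, P = eigvals[order], P[:, order]

x = X[0]
T2_global = x @ np.linalg.solve(S, x)    # Eq. (5): x' S^{-1} x

t = P.T @ x                              # scores along all m loading directions
T2_l = np.sum(t[:l] ** 2 / eigvals[:l])      # x' P_l Lambda_l^{-1} P_l' x
T2_ml = np.sum(t[l:] ** 2 / eigvals[l:])     # x' P_{m-l} Lambda_{m-l}^{-1} P_{m-l}' x
print(abs(T2_global - (T2_l + T2_ml)))   # zero up to floating-point error
```

The two quantities coincide because P is orthogonal and diagonalizes S, exactly as the reformulation in Eq. (6) states.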


statistics can be explained with a simple example in Fig. 1, which shows the confidence regions for measurement data of two variables. In a two-dimensional space, the joint confidence region of the T2 and SPE statistics is a rectangle, while the confidence region of the global T2 statistic is an ellipse. In Fig. 1a, Fig. 1b, Fig. 1d and Fig. 1e, the joint confidence region of the PCA-based T2 and SPE statistics is larger than the confidence region of the global T2 statistic; therefore, the faulty samples within the grey shaded areas can be detected by the global T2 statistic, but they are detected by neither the PCA-based T2 statistic nor the SPE statistic. On the contrary, the joint confidence region of the PCA-based T2 and SPE statistics is smaller than the confidence region of the global T2 statistic in Fig. 1i; therefore, the normal samples within the magenta shaded areas are wrongly identified as faulty by the PCA-based T2 or SPE statistic, leading to false alarms. In Fig. 1c, Fig. 1f, Fig. 1g and Fig. 1h, the PCA-based T2 and SPE statistics cannot detect the faulty samples within the grey shaded areas, while they wrongly identify the normal samples within the magenta shaded areas as faulty. The example in Fig. 1 indicates that dimensionality reduction may degrade fault detection performance by decreasing the fault detection rate (because some faulty samples may go undetected) and increasing the false alarm rate (because some normal samples may be identified as faulty). It is easy to infer that the negative effect of dimensionality reduction on fault detection performance becomes worse as the dimension of the process data increases. There are two ways to alleviate the negative effect of dimensionality reduction on fault detection performance. One way is to combine the T2 and SPE statistics in a better way to approximate the global T2 statistic, for example, via the combined index proposed by Yue and Qin.20 The other way is to decrease the data variance captured by the residual subspace as much as possible.
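For the first way, a minimal sketch of a combined index of the Yue-Qin type is given below: each statistic is scaled by its control limit and the two are summed. The synthetic chi-square statistics and the empirical 99% quantile limits are purely illustrative assumptions, not the exact limits derived in ref 20:

```python
import numpy as np

def combined_index(T2, SPE, T2_limit, SPE_limit):
    """Combined index of the Yue-Qin type: phi = T2/T2_limit + SPE/SPE_limit,
    so a single statistic monitors both subspaces at once."""
    return T2 / T2_limit + SPE / SPE_limit

# synthetic training statistics (chi-square shapes are a common approximation)
rng = np.random.default_rng(4)
T2_train = rng.chisquare(df=3, size=1000)
SPE_train = rng.chisquare(df=2, size=1000)
T2_lim = np.quantile(T2_train, 0.99)     # empirical 99% limits (assumption)
SPE_lim = np.quantile(SPE_train, 0.99)
phi = combined_index(T2_train, SPE_train, T2_lim, SPE_lim)
phi_lim = np.quantile(phi, 0.99)         # control limit for phi itself
print(f"phi limit = {phi_lim:.3f}")
```

A new sample would then be flagged as faulty when its phi value exceeds phi_lim.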


Clearly, when the data have very small variance along one direction (namely, this direction corresponds to a very small eigenvalue of the covariance matrix of the data), the confidence limit for this direction is very small; therefore, assigning this direction to the residual subspace has little effect on the fault detection performance. In some special cases, the data may have no variance along a direction at all, for example, when two or more variables are perfectly correlated. In this case, the global T2 statistic cannot be defined by Eq. (5) because the covariance matrix S is singular; therefore, it is necessary to replace the global T2 statistic with the combination of the T2 and SPE statistics, since the SPE statistic avoids inverting the covariance matrix.

2.2. Variable-correlation-based sparse projection matrix

According to the aforementioned analysis, to decrease the effect of dimensionality reduction on fault detection as much as possible, we need only cope with those variables whose correlation coefficients are close to 1. Based on this idea, a variable-correlation-based projection matrix can be constructed for dimensionality reduction. Furthermore, to improve the sparsity of the projection matrix and thus make it more suitable for fault diagnosis, we consider constructing a variable-correlation-based sparse projection matrix (VCBSPM). Let $X = [x_1, x_2, \cdots, x_n]^T$ be a training data set consisting of n samples and m process variables. The procedure for constructing the VCBSPM is as follows:

Step 1: Calculate the correlation coefficient matrix of the training data set X.

Step 2: Specify a threshold value β (β < 1) for the correlation coefficient. To reduce the effect of dimensionality reduction on fault detection, the threshold β should be close to 1.

Step 3: Classify process variables with cross-correlation coefficients larger than or equal to β into


significantly correlated variable groups (SCVGs). Classify the remaining process variables, whose cross-correlation coefficients are all smaller than β, into a "non-significantly" correlated variable group (NSCVG).

Step 4: Implement PCA on the data of each SCVG to compute the loading vectors {p1, …, pz} and the corresponding eigenvalues {λ1, …, λz}, where z is the number of variables in the SCVG. For each SCVG, divide the original measurement space into a PCS and an RS according to two rules: (1) if k variables in the group are perfectly correlated (with cross-correlation coefficients of 1), assign the loading vectors corresponding to the k zero eigenvalues to the RS and the remaining z−k loading vectors to the PCS; (2) if the SCVG does not contain perfectly correlated variables, assign only the loading vector corresponding to the smallest eigenvalue to the RS and the other z−1 loading vectors to the PCS.

Step 5: Enlarge the loading vectors of all SCVGs to m-dimensional sparse loading vectors by adding zero elements for the variables that are not included in each SCVG. For example, let p be a loading vector of the SCVG consisting of variables v3 and v6. One can enlarge p to an m-dimensional sparse loading vector by inserting m−2 zeros, for v1–v2, v4–v5 and v7–vm, into p.

Step 6: Merge the sparse loading vectors in the PCSs of all SCVGs into a main sparse loading matrix (MSLM), and merge the sparse loading vectors in the RSs of all SCVGs into a residual sparse loading matrix (RSLM).

Step 7: Construct an auxiliary sparse loading matrix (ASLM) for the NSCVG artificially. Each column of the ASLM is an m-dimensional sparse vector that contains only one nonzero element
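Steps 1-3 of the procedure can be sketched as follows. The connected-components grouping used here is one plausible reading of Step 3, not necessarily the authors' exact implementation, and the threshold β = 0.95 and the toy data are assumptions:

```python
import numpy as np

def group_variables(X, beta=0.95):
    """Steps 1-3: build the correlation matrix, threshold it at beta, and
    return the SCVGs (as connected components) plus the NSCVG."""
    m = X.shape[1]
    R = np.abs(np.corrcoef(X, rowvar=False))   # Step 1: correlation matrix
    adj = R >= beta                            # Steps 2-3: threshold at beta
    np.fill_diagonal(adj, False)
    groups, seen = [], set()
    for i in range(m):
        if i in seen or not adj[i].any():
            continue
        stack, comp = [i], set()               # grow one connected component
        while stack:
            j = stack.pop()
            if j in comp:
                continue
            comp.add(j)
            stack.extend(int(k) for k in np.flatnonzero(adj[j]))
        seen |= comp
        groups.append(sorted(comp))
    nscvg = [i for i in range(m) if i not in seen]
    return groups, nscvg

# toy data: variables 0/1 and 2/3 are nearly perfectly correlated pairs,
# variable 4 is independent of everything else
rng = np.random.default_rng(5)
f1, f2 = rng.normal(size=300), rng.normal(size=300)
X = np.column_stack([f1, f1 + 0.01 * rng.normal(size=300),
                     f2, f2 + 0.01 * rng.normal(size=300),
                     rng.normal(size=300)])
scvgs, nscvg = group_variables(X, beta=0.95)
print(scvgs, nscvg)  # [[0, 1], [2, 3]] [4]
```

Steps 4-7 would then run PCA within each SCVG and zero-pad the resulting loading vectors to m dimensions, as described above.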
