Fault Detection and Diagnosis for Nonlinear and ... - ACS Publications

Sep 18, 2017 - per day.1 The control limits of process variables are often relaxed arbitrarily .... bivariate copula C: [0, 1] × [0, 1] → [0, 1] su...
0 downloads 0 Views 3MB Size
Subscriber access provided by Nanyang Technological Univ

Article

Fault Detection and Diagnosis for Nonlinear and NonGaussian Processes Based on Copula Subspace Division Xiang Ren, Kunping Zhu, Ting Cai, and Shaojun Li Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.7b02419 • Publication Date (Web): 18 Sep 2017 Downloaded from http://pubs.acs.org on September 25, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Industrial & Engineering Chemistry Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Fault Detection and Diagnosis for Nonlinear and Non-Gaussian Processes Based on Copula Subspace Division

Xiang Ren1,2, Kunping Zhu3, Ting Cai4, Shaojun Li1* 1

Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai, 200237, China 2

Department of Chemical and Biochemical Engineering, Rutgers University, Piscataway, NJ 08854, United States

3

Department of Mathematics, East China University of Science and Technology, Shanghai, 200237, China

4

Department of Environmental Sciences, Rutgers University, Piscataway, NJ 08854, United States

1

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Abstract A novel copula subspace division strategy is proposed for fault detection and diagnosis. High-dimensional industrial data are analyzed in two elemental subspaces: margin distribution subspace (MDS) modeled by joint margin distribution, and dependence structure subspace (DSS) modeled by copula. Highest density regions of two sub-models are introduced and quantified using probability indices. To improve the robustness of the monitoring index, a hyper-rectangular control boundary in MDS is designed, and the equivalent univariate control limits are estimated. Two associated contribution indices are also constructed for fault diagnosis. The interactive relationships among the root-cause variables are investigated via a proposed state chart. The effectiveness and superiority of the proposed approaches (double-subspace and multi-subspace) are validated using a numerical example and Tennessee Eastman chemical process. Better monitoring performance is achieved compared with some conventional approaches such as principal component analysis, independent component analysis, kernel principal component analysis and vine copula-based dependence description. The proposed multi-subspace approach fully utilizes univariate-based alarm data with a dependence restriction modulus, which is promising for industrial application. Keywords: fault detection, fault diagnosis, robust process monitoring, subspace division, copula

2

ACS Paragon Plus Environment

Page 2 of 72

Page 3 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

1. Introduction Large-scale chemical production needs a considerate number of control and measurement points to ensure process safety and product quality. Univariate monitoring strategy is still mainstream for modern chemical production, and causes an overwhelming number of alarms in distributed control system (DCS) and emergency shutdown device (ESD). In a continuous 18-day observation at a European refinery, an average of 14,250 alarms are triggered, with the peak of 26,650 per day.1 The control limits of process variables are often relaxed arbitrarily to alleviate flooding of alarm disturbances, despite the fact that this may cause misleading results and higher risks for accidents. At present, alarm flooding is becoming a threat to process safety, and accordingly, a crucial issue needs to be addressed in the field of process control.2-4 As an efficient strategy, dynamic risk assessment updates the constructed performance indicator (e.g., failure probability, economic loss) accounting for different factors, e.g., the number of historical events,5-8 the effect of process deviations,9-12 the effect of multiple shifts,13 etc. This strategy utilizes data mining techniques mainly to deal with univariate-based data. In contrast, multivariate monitoring strategy incorporates process variable variations into one or a few monitoring indices, and faults are identified according to only a small number of indices. It is worth noting that current multivariate process monitoring strategy ignores univariate-based alarms in DCS and ESD. Wider acceptance for current industry could be achieved if multivariate monitoring strategy utilizes sufficient univariate-based alarms database in DCS and ESD. This article aims to create such a multivariate statistical process monitoring system. Multivariate statistical process monitoring (MSPM), especially in fault detection, has

3

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

received much attention and development in recent years.14 One essential task is to deal with correlation behaviors among high-dimensional variables. The effectiveness for describing, simplifying or removing correlation plays a pivotal role in monitoring performance. Multivariate projection techniques are widely used in MSPM because of its low computational load, simplicity and effectiveness.15 This technique usually needs to solve an (constrained) optimization problem, aiming to find a set of vectors that capture the features of interest in the transformation space. As one of the most popular multivariate statistical projection techniques (MSPT), principal component analysis (PCA) finds a set of loading vectors that maximize the variance of each dimension in the transformation space. Linear uncorrelated features (scores) are obtained through orthogonal transformation. The process of decoupling and space division not only solves the problem of ill-conditioned covariance matrix,15 but also makes a clear distinction between noise in the data and the latent variables.16 It has been demonstrated that such a distinction offers more capability of detecting faults with small magnitudes than that in a complete space.17 Nevertheless, information retained in PCA, extracted from up to second order statistics such as mean, variance and covariance is not enough when the data under normal operating condition (NOC) follow non-Gaussian distribution. By comparison, independent component analysis (ICA) uses higher order statistics such as kurtosis and negentropy to optimize directions such that data projections onto those directions are as independent as possible.18 It should be emphasized that this method is only able to model linear and non-Gaussian processes, as the linear transformation for ICA is inadequate to capture nonlinear structures.19 To cope with the nonlinear relationships among process variables, kernel-based method was proposed.20,21 In this method, nonlinear

4

ACS Paragon Plus Environment

Page 4 of 72

Page 5 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

transformation is required to project the nonlinear data onto a higher dimensional space where the extracted features are linearly separable. Kernel function avoids direct nonlinear mapping and inner product calculation. However, the size of kernel matrix becomes rather large, thus computationally expensive as the number of samples increases. Besides, the kernel mapping is irreversible, making fault identification and diagnosis more difficult.22 Recently, Yu et al.23 proposed a nonparametric PCA by determining a set of sparse pseudoeigenvectors that maximize variations with respect to rank correlation measures such as Spearman and Kendall correlation. Rank correlation measures are capable of capturing more information (linear and monotonic nonlinear) regarding variable interaction compared with traditional Pearson correlation. Therefore, better monitoring performance can be achieved than traditional PCA. MSPT are good at capturing information of interest. However, important information is susceptible to getting lost in dimension reduction and feature extraction. In practice, it is difficult to find a transformation matrix that preserves the crucial information for detecting different types of faults. In this case, probabilistic modeling techniques show superiority. Probabilistic models can describe all statistical information of complex data. The problem is how to obtain accurate models. Given that potential dependence structures are usually difficult to quantify, and the associated high-dimensional optimization problems for multivariate distributions are also cumbersome. Thissen et al.24 constructed a kind of Gaussian mixture model (GMM) that is determined and optimized using Bayesian information criteria (BIC). Different types of restrictions to the shapes, volumes and sizes of the Gaussian components are predefined. As a result, the number of parameters to be optimized in the GMM is reduced remarkably. This method,

5

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

however, uses a few Gaussian components to describe the non-Gaussian behaviors of NOC data and pays little attention to nonlinear relationships hidden in the probabilistic model. Recently, Ren et al.25 proposed a vine copula-based dependence description (VCDD) approach for fault detection. The decomposition strategy of vine copula simplifies multivariate optimization of high-dimensional dependence structures into estimation of bivariate copulas in sparse matrix. Vine copula shows more flexibility accounting for conditional dependence, asymmetries and tail dependence. However, it requires the nonlinear relationships among variables to be monotonic. To address this issue, Yu et al.26 applied a rolling pin (RP) method27 for fault detection and diagnosis. For the NOC data, the relationships among variables become monotonic via a linear monotonization transformation. Multivariate Gaussian copula is then introduced in the transformation space. Herein, optimization is not required as rank correlation measures can uniquely determine the Gaussian copula.28 The final probabilistic model which is able to capture non-monotonic nonlinear relationships is obtained, according to the relationships of the probability distribution functions before and after the monotonicity transformation. The corresponding fault diagnosis strategy was established as well. Fault contribution to each variable is calculated based on some consecutive samples, leading to time delay in process monitoring. In addition, only dependence structures are considered, which may hardly provide sufficient information on the anomalies. Despite of the contribution analysis methods,29,30 other methods have been proposed and further investigated, including reconstruction-based method,31,32 classification-based method,33.34 dissimilarity analysis,35,36 etc. Fault isolation and diagnosis for complex processes are still challenging topics that need further research.

6

ACS Paragon Plus Environment

Page 6 of 72

Page 7 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

The main contributions of this work include: (1) Proposed the concept of subspace of copula to improve accuracy and robustness; (2) Constructed two monitoring indices for fault detection: margin deviation probability (MDP) and dependence variation probability (DVP); (3) Designed a robust hyper-rectangular control boundary in margin distribution subspace; (4) Constructed two contribution indices: margin deviation contribution (MDC) and dependence variation contribution (DVC), for nonlinear and non-Gaussian fault diagnosis; (5) Proposed state chart of pair variables for analyzing essential fault behaviors and further troubleshooting. The rest of the article is organized as follows. Section 2 provides a brief review of vine copula and highest density region, with an emphasis on the relationship between rank correlation measures and copula. Detailed description of the proposed copula subspace division monitoring approach is then given in Section 3. The validity and effectiveness of the approach is further illustrated using a numerical example (Section 4) and the Tennessee Eastman (TE) benchmark process (Section 5). Finally, conclusions are made in Section 6.

2. Preliminaries Gaussian distribution, extensively applied in industrial data analysis, preserves many attracting properties. Let X = [ X 1 , X 2 ]T denote a two-dimensional random vector that follows

multivariate Gaussian distribution. The structure is determined according to its probability density function (PDF) in Eq. 1 f X (x ) =

1  1  exp  − ( x − µ) T Σ −1 ( x − µ ) 1/ 2 2π | Σ |  2 

The vector of means µ = [ µ1 , µ 2 ]T provides the location of each margin, and the covariance

7

ACS Paragon Plus Environment

(1)

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 72

cov( X 1 , X 2 )  var( X 1 ) matrix Σ =  captures variations and correlations among variables. cov( , ) var( X 2 )  X X 2 1 

Pearson correlation is then used to measure the deterministic relationship between X 1 and X 2 with stochastic behaviors, defined by

ρ X ,X = 1

2

cov( X 1 , X 2 ) var( X 1 ) var( X 2 )

(2)

For bivariate Gaussian distribution, the best estimation for x2 is a linear function of x1, xˆ 2 = k1 x1 + k 2 (k1 ≠ 0) , where k1 = ρ X 1 , X 2

var( X 2 ) . Relevant proof is provided in the Supporting var( X 1 )

Information. Pearson’s ρ , involving up to two order statistics, is essentially a linear correlation measure. This measure is suitable to describe relationships among variables of Gaussian distribution. However, for more general multivariate distributions with non-Gaussian stochastic behaviors and nonlinear relationships, Pearson’s ρ becomes less efficient, and more appropriate correlation measures should be considered.37

2.1. Copula and rank correlation measure Sklar Theorem38 indicates that any multivariate distribution consists of two components: (1) marginal distribution functions, and (2) a linking copula function that provides the dependence structure. Let

FX 1 , X 2 ( x1 , x2 )

be the joint cumulative distribution function (CDF) of

X = [ X 1 , X 2 ]T . If the two marginal CDFs FX 1 ( x1 ) and FX 2 ( x2 ) are continuous, then there exists

a unique bivariate copula C: [0,1] × [0,1] → [0,1] such that FX ( x ) = C ( FX 1 ( x1 ), FX 2 ( x2 ))

(3)

f X ( x ) = f X 1 ( x1 ) ⋅ f X 2 ( x2 ) ⋅ c( FX 1 ( x1 ), FX 2 ( x2 ))

(4)

The joint PDF satisfies

8

ACS Paragon Plus Environment

Page 9 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Where c denotes the copula density function of C. Rank correlations, e.g., Kendall, Spearman, Gini and Blomqvist correlations, are applied to measure the monotone association, not necessarily linear, between two variables39,40 Herein, Kendall correlation measure is considered. Let [ X 1(1) , X 2(1) ]T and [ X 1( 2 ) , X 2( 2 ) ]T be independent and identically distributed random pairs with CDF FX 1 , X 2 ( x1 , x2 ) . The two pairs are concordant if ( X 1(1) − X 1( 2 ) )( X 2(1) − X 2( 2 ) ) > 0 and discordant if ( X 1(1) − X 1( 2 ) )( X 2(1) − X 2( 2 ) ) < 0 , and Kedall’s τ is defined as

τ X , X = Pr[( X 1(1) − X 1( 2 ) )( X 2(1) − X 2( 2 ) ) > 0] − Pr[( X 1(1) − X 1( 2 ) )( X 2(1) − X 2( 2 ) ) < 0] 1

2

1 1

= 4 Pr( X 1(1) > X 1( 2 ) , X 2(1) > X 2( 2 ) ) − 1 = 4 ∫ ∫ C (u1 , u2 )dC (u1 , u2 ) − 1

(5)

0 0

A larger value of Kendall’s τ indicates higher probability near a monotonic function of two random variables.39 According to Eq. 5, copula parameter (single-parameter case) is uniquely determined when bivariate copula structure and Kendall’s τ are given. Equations 3-4 can be extended to any higher dimensions. However, curse of dimensionality may occur in parameter estimation for conventional multivariate copulas.

2.2. Vine copula Vine copula is hierarchical graphic model that decomposes m-dimensional copula density into a product of ( m − 1)m / 2 bivariate conditional copulas.41 This tool is very promising for nonlinear and non-Gaussian data analysis. It is able to learn margins from different univariate distribution families, and a variety of dependence behaviors from different bivariate copula families, controlled by their structures and parameters. Consider m exchangeable random variables X = [ X 1 ,L, X m ]T with a joint PDF in Eq. 6

f X ( x ) = f X 1 ( x1 ) ⋅ f X 2 |X 1 ( x2 | x1 ) L f X m |X 1LX m −1 ( xm | x1 L xm −1 )

9

ACS Paragon Plus Environment

(6)

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 72

Here, each conditional PDF can be further factorized in an iterative manner via f X i |V ( xi | v ) = c X i ,V j |V− j ( FX i |V− j ( xi | v − j ), FV j |V− j (v j | v − j )) ⋅ f X i |V− j ( xi | v − j )

(7)

where V is a random vector representing the subset of X. V− j denotes V excluding V j . c X i ,V j |V− j denotes the density function of bivariate conditional copula, called pair copula. A compact form of vine factorization is given by m −1 m − i

m

f X ( x ) = ∏ f X i ( xi ) ⋅ ∏∏ c pair i =1 i =1 j =1 1 4243 1 4243 margins

(8)

vine copula

In Eq. 8, vine copula term has different expressions as the factorization in Eq. 7 varies. Typically, C-vine and D-vine receive much attention in application studies. More general vine copula expression can be illustrated and specified via graph theory.42 Two steps are required to calculate all the parameters in Eq. 8. In the first step, each marginal distribution is estimated separately. For a known distribution family f X i ( xi ; ψ i ) with parameters ψi (i = 1,L, m ) , ψ i can be easily estimated using xi1 ,L, xin , where n denotes the number of

observations. Otherwise, nonparametric method such as kernel density estimation is introduced. The univariate kernel estimator is formulated as 1 n  xi − xij   fˆX i ( xi ) = ∑ K nh j =1  h 



where K (⋅) is the kernel function, satisfying

+∞

−∞

(9)

K ( x )dx = 1 , and h is the smoothing parameter

or bandwidth. Several criteria have been reported43-45 for selecting h. In the second step, Bayesian inference criterion (BIC) is used to select the best structure of vine copula, as shown in Eq. 10 n m −1 m − i

BIC = −2∑∑∑ ln c pair (θ) + q ln n k =1 i =1 j =1

10

ACS Paragon Plus Environment

(10)

Page 11 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

where q denotes the number of parameters θ . To simplify computation, θ are estimated sequentially using maximum likelihood method. For sequential estimation and selection, conditional distribution in the current tree is a function of two conditional distributions in the previous tree calculated by46

FX i |V ( xi | v ) =

∂C X i ,V j |V− j ( FX i |V− j ( xi | v − j ), FV j |V− j (v j | v − j )) ∂FV j |V− j (v j | v − j )

(11)

In most cases, bivariate copula structures are controlled by a single parameter, e.g., Gaussian, Clayton, Gumbel, Frank and Joe copula, as shown in Figure 1. Such a parameter can be calculated simply based on the associated Kendall’s τ using Eq. 5. For two-parameter case, e.g., Student-t in Figure 1, a constraint optimization problem should be solved (maximum likelihood estimation) by θˆ pair = arg max pair θ

s.t.

θ

pair

n

∑ ln c

pair

(θ pair )

k =1

∈ D(θ

pair

(12)

)

where θ pair and D (θ pair ) denote parameters and definition domain of the pair copula, respectively. Each pair variable is fitted using different bivarite copula families, the best structure of the pair copula are then selected according to BIC. More details on vine copula model selection and parameter estimation are presented in the previous work.13,25 The simplified approach transforms high-dimensional optimization problem into estimation of a series of bivariate copulas listed in sparse matrix, therefore, curse of dimensionality can be alleviated.

Figure 1. Contour plots of six typical bivariate copulas, all with 0.5 Kendall’s τ .

11

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 72

2.3. Highest density region Following the construction of distribution model, a boundary needs to be defined within which the data can be considered normal, while those outside are small-probability events and defined as anomalies. For Gaussian distribution, this area can be easily calculated using Mahalanobis distance, i.e., a hyper-ellipsoid with confidence level 1 − α . However, when it comes to multivariate non-Gaussian distribution, Mahalanobis distance becomes invalid, and there are infinite numbers of areas that cover the normal data with 1 − α probability. To uniquely determine the area, highest density region (HDR)47 is introduced, and it represents the minimum 1 − α region, mathematically:

HDR ( fα ) = {x | f (x) ≥ fα }

(13)

Where f ( x ) is the PDF mapping of sample vector x, HDR ( fα ) represents the point set in the 1 − α HDR, and

fα is the α quantile of the one-dimensional random variable

f (X) .

Equation 13 can ensure the following: Pr[ x ∈ HDR ( fα )] = 1 − α

(14)

The boundary of HDR is estimated using density quantile approach with Monte-Carlo simulation. For instance, sample n observations, x1 , x 2 ,L , x n , from the m-dimensional random vector X, then calculate their joint PDF values f ( x i ) and put them in ascending order. The [α ⋅ n]th sample represents the equal-density value of 1 − α HDR, where [⋅] means rounding to

the nearest integer. Online monitoring may become time-consuming when n is too large. To address this problem, the probability value of HDR is estimated via searching a density quantile table created offline. More details can be seen in ref 25.

12

ACS Paragon Plus Environment

Page 13 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

3. Methodology According to copula theory, the shape of multivariate distribution is determined by both margins and dependence structures. In this work, we treat these two components independently, define them as margin distribution subspace (MDS) and dependence structure subspace (DSS), and propose a copula subspace division approach for fault detection and diagnosis. The motivation and advantages of the copula subspace division are discussed in the next few subsections, and some interesting results are also presented.

3.1. Margin distribution subspace and dependence structure subspace The most common idea for process monitoring is to construct a distribution model that takes into account all possible NOC data or extracted features. A proper control boundary is selected and served as a decision criterion for fault. It is generally easy to detect well the faults that cause large deviation from NOC area. For the small-magnitude faults, three main factors should be thoroughly considered to improve the monitoring performance, including completeness of training data, accuracy of distribution model, and appropriateness of control boundary, as illustrated in Figure 2.

Figure 2. Illustration of the three factors that affect monitoring performance for all possible faults and NOC data.

In practice, the above three factors may somewhat deviate from the ideal cases. First, it is difficult to guarantee the completeness of NOC data, note that collecting sufficient industrial data

13

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

sometimes is of high cost. Besides, external disturbances may appear frequently such as catalyst deactivation, season changing, device aging, etc. Therefore, it is unrealistic to specify all NOC data considering various fluctuations. Second, it may be difficult to model complex data (high dimensionality, nonlinear, non-Gaussian, etc.) with absolute precision. If such a modeling technique exists, it still makes no sense as training data are not complete. Third, any kind of control boundary will engender certain false alarm rate. The suitability of control boundary mainly depends on the distribution pattern of NOC data or even faulty data. In addition, the first two factors also affect the efficiency of the selected control boundary. Note that completeness of the training data is usually difficult to achieve, the distribution model created in a complete space may perform poorly for some types of faults. Specifically, if there is a NOC area not involved in the training dataset, then the points in that area will be mistakenly treated as faults. However, if the distribution models are established in different subspaces utilizing some latent characteristics, the results will likely become much better though the points in that area are not taken into account in training. For instance, PCA separates data information into two subspaces: principal component subspace (PCS) and residual subspace (RS). Mahalanobis distance in PCS and Euclidean distance in RS are accordingly designed, and served as two complementary monitoring statistics. The control area with 1 − α Mahalanobis distance defined in a complete space is not the same as the combination of 1 − α Mahalanobis distance in PCS and 1 − α Euclidean distance in RS. In essence, PCA is able to distinguish noises in the data and latent features, and gives flexible tolerance of random noises in RS.16 Such distinction and flexible tolerance make it possible to separate NOC data with various random noises from

14

ACS Paragon Plus Environment

Page 14 of 72

Page 15 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

small-magnitude faults that cause deviation in the PCS. In view of these issues, subspace division based on latent structures can be considered as a robust strategy when mismatch exists for the first two factors in Figure 2. The idea of copula can be extended to the field of process monitoring, where the ability to distinguish between learning of margins and learning of complex dependence structures is crucial. On the basis of vine copula model, the joint PDF can be divided into two components, accordingly, two distribution models are obtained m

f XMDS (x ) = ∏ f X i ( xi )

(15)

i =1

m −1 m −i

f XDSS ( x ) = ∏∏ c pair i =1

(16)

j =1

Equation 15 is the product of margins, termed as joint margin distribution. It reflects the stochastic behavior of each random variable removing multivariate dependence. In other words, the associated random variables are independently distributed in this subspace. While Eq. 16 can be used to capture multivariate dependence. Two probability indices based on HDR are then defined as MDP ( x ) = Pr{x ∈ HDR[ f XMDS ( x )]}

(17)

DVP ( x ) = Pr{x ∈ HDR[ f XDSS ( x )]}

(18)

where MDP (x ) and DVP (x ) represent margin deviation probability and dependence variation probability of sample x, respectively. Figure 3 illustrates the copula subspace division process. Random variables X 1 and X 2 follow non-Gaussian distribution, and their relationship is nonlinear. The 0.95 HDR of the joint PDF, an area surrounded by dashed lines, is crescent-shaped. Margins and the dependence

15

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

structure given by some copula function are herein considered separately. It is found that the transformed data in the two subspaces have some attracting properties. The joint margin ~ ~ distribution in MDS is non-Gaussian and the associated random variables X 1 and X 2 are

statistically independent. The information on nonlinear relationship is captured in DSS. In this subspace, original data are normalized with rank transformation, i.e., univariate CDF mapping. Therefore, dependence structure is analyzed in a unified unit square whose random variables in x-axis and y-axis directions, F ( X 1 ) and F ( X 2 ) are all uniformed over (0,1) . Evidently, the 0.95 HDR in the complete space is not equivalent to the combination of 0.95 HDR in the two subspaces. However, the double-subspace approach is superior as it grasps latent structures and simplifies distribution model by separating margin deviation from dependence variation. Further division of joint margin distribution is also considered. In this case, the equivalent control boundary in the MDS is designed. Quantitative analysis is presented in the next subsection.

Figure 3. Illustration of copula subspace division.

3.2. Systematic design for robust control boundary in MDS In MDS, random variables of the joint margin distribution are all statistically independent while their stochastic behaviors can be non-Gaussian. If the distribution model is accurate enough to describe the training data that cover all possible NOC areas, then the 1 − α HDR of the joint margin distribution is a good choice for control area. However, this condition, in reality, cannot be achieved. Note that discrepancy between training data and NOC data, mismatch between real

16

ACS Paragon Plus Environment

Page 16 of 72

Page 17 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

model and estimated model always exist. An exact boundary (solid line) and a 1 − α contour of the constructed model (dashed line) are depicted in Figure 4(a). Evidently, the constructed model fails to capture the structure in region 1, partly due to the lack of NOC data for training in that area. Once the testing data appear in region 1, they will be mistakenly considered as faults. In addition, due to accuracy limitation of modeling techniques, the 1 − α contour hardly keeps in align with the exact boundary, e.g., area 2. Thus, risks of type I and type II errors increase. Considering these issues, a general and simple control boundary is recommended. Given that the original 1 − α HDR of the joint margin distribution, practically, covers 100(1 − α )% sample points, the constructed control boundary equivalently should encompass 100(1 − α )% samples using a unified shape, e.g., rectangular, ellipse, etc. The 1 − α rectangular and the 1 − α ellipse are shown in Figure 4(b)-4(c) respectively. The regions that 1 − α rectangular/ellipse include but lies outside of 1 − α HDR of the joint margin distribution, may lead to type I and type II errors if 1 − α HDR strictly agrees with the exact NOC boundary. For the case where mismatch exists

between exact model and estimated model, the alternative 1 − α rectangular/ellipse is more appropriate. For instance, region 3 where information is incomplete for training, is regarded as NOC area using 1 − α rectangular, though regions 4 and 6 probably cause type I error using the unified control boundary. Due to the complexity of the three factors in Figure 2, a unified control boundary appears more suitable as it makes a systematic trade-off and is able to achieve relatively good performance accounting for the overall situation. The principles for designing such a control boundary is illustrated as follows.

17

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 72

Figure 4. Exact boundary and equivalent robust control boundary in MDS.

For a two-dimensional joint margin distribution f XMDS ( x1 , x2 ) , confidence level 1 − α 1,X2

equals the double integral over the HDR, i.e.,

1 − α = ∫∫ f XMDS ( x1 , x2 )dx1dx2 = ∫∫ f X 1 ( x1 ) f X 2 ( x2 )dx1dx2 1, X 2

(19)

Ω = HDR ( f αMDS ) = {( x1 , x2 ) | f MDS ( x1 , x2 ) ≥ f αMDS }

(20)





where

~ Given that mismatch always exists on Ω , a simple and standardized close area Ω is suggested

such that it equivalently covers 100(1 − α )% sample points in the joint margin distribution, as shown in Eq. 21 1 − α = ∫∫ f X 1 ( x1 ) f X 2 ( x2 )dx1dx2 = ∫∫~ f X 1 ( x1 ) f X 2 ( x2 )dx1dx2 Ω



(21)

~ There are infinite numbers of possible shapes for Ω . Herein for simplicity and standardization,

rectangular is considered, and the integration formula can be further decomposed as 1 − α = ∫∫~ f X 1 ( x1 ) f X 2 ( x2 )dx1dx2 = ∫ Ω

UB1 UB2



LB1 LB2

UB1 UB2 f X 1 ( x1 ) f X 2 ( x2 )dx1dx2 =  ∫ f X 1 ( x1 )dx1  ⋅  ∫ f X 2 ( x2 )dx2  LB LB  1   2 

(22) where ~ Ω = {( x1 , x2 ) | x1 ∈ [ LB1 ,UB1 ], x2 ∈ [ LB2 ,UB2 ]}

(23)

To uniquely determine the shape and location of the rectangular, a restriction is added for convenience UB1



LB1

f X 1 ( x1 )dx1 = ∫

UB2

LB2

where [ LBi ,UBi ] denotes the

f X 2 ( x2 )dx2 = 1 − α > 1 − α

(24)

1 − α HDR of the ith marginal distribution f X i ( xi ) , satisfying

18

ACS Paragon Plus Environment

Page 19 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

[ LBi ,UBi ] = HDR ( f1−

1−α

) = {xi | f ( xi ) ≥ f1−

1−α

} (i = 1, 2)

(25)

Equations 19-25 can be extended to any higher-dimensional case. An equivalent hyper-rectangular in MDS can be designed, with the following four characteristics: (1) It is simple and robust to the incompleteness of training data and the inaccuracy of distribution model; (2) The edge length of the constructed hyper-rectangular are, mathematically, equivalent to the univariate control boundaries of all process variables. It shows that the univariate-based method is responsible for capturing the margin deviation information. (3) The control boundary of the univariate-based method is provided theoretically. If the confidence levels of the univariate distributions all equal, then the value will be larger than that of the associated joint margin distribution. (4) The univariate-based probability index has crucial sensitive regions helpful for process monitoring, compared with the multivariate probability index, MDP. For simplicity and without loss of generality, Gaussian distribution is considered to demonstrate the forth point above. Herein, two random variables X 1 and X 2 are all scaled to zero mean and unit variance. If X 1 and X 2 are independent, then the 1 − α HDR of the joint Gaussian distribution is a circle, and the associated 1 − α robust rectangular becomes a square. Figure 5 displays the case when 1 − α = 0.99 . It is emphasized that the 1 − α square cannot be treated as a robust area when X 1 and X 2 are correlated, e.g., case with ρ X 1 , X 2 = 0.9 in Figure 5(b). Specifically, regions 1 and 2 will cause large type II errors using 1 − α square, which is mainly due to the loss of dependence information. Therefore, the robust rectangular boundary should be constructed exclusively in MDS without dependence information, as shown in Figure 5(a). Copula in DSS then confines NOC area to the ellipse area in Figure 5(b).

19

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 72

Figure 5. The 0.99 HDR and the univariate-based 0.995 HDR of (a) joint margin distribution (no dependence) and (b) joint distribution (with dependence) under Gaussian assumption.

The main focus is on the sensitive regions of two types of probability indices in Figure 5(a). Consider one standardized point ( a, b) , the univariate probability index is defined as a b UP = max ∫ f ( x1 )dx1 , ∫ f ( x2 )dx2  −b  −a 

(26)

and the multivariate probability index of the joint margin distribution is given by

MDP = ∫∫ 2

x1 + x 22 ≤ a 2 + b 2

f ( x1 ) f ( x2 )dx1dx2

(27)

Point ( a, b) is more sensitive to univariate probability index if UP > MDP . With the assumption that X 1 and X 2 follow standard Gaussian distribution, the sensitive region for univariate probability can be determined using Eq. 28 a b max  2 / π ∫ exp( −0.5 x12 )dx1 , 2 / π ∫ exp( −0.5 x22 )dx2  > 1 − exp[ −0.5( a 2 + b 2 )] 0 0  

(28)

Equation 28 corresponds to the shaded area in Figure 6. It shows that the two types of the probability indices have different sensitive regions. The sensitive region of UP is larger than that of MDP around the two control boundaries (dashed line). Given that faulty data may include several dimensions with normal deviation, if the number of faulty variables becomes extremely smaller than the whole dimensions, then fault information is likely overwhelmed using multivariate probability index. In this case, univariate probability index is more sensitive and appropriate for process monitoring.

Figure 6. Sensitive regions of two indices, UP and MDP, where A(1.24, 1.24), B(1.24, -1.24),

20

ACS Paragon Plus Environment

Page 21 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

C(-1.24, -1.24), D(-1.24, 1.24). The 0.99 HDR and the univariate-based 0.995 HDR of the joint margin distribution are also included.

3.3. Contribution indices and state chart for pair variables Information in MDS and DSS is further analyzed to identify variable contribution to a fault. For a real-time sample point x = [ x1 ,L, xm ]T , the contribution index of the ith variable to margin deviation is given by MDCi = Pr{xi ∈ HDR[ f X i ( xi )]} (i = 1,2,L , m)

(29)

where f X i ( xi ) is the ith marginal PDF. Margin deviation contribution (MDC) index is defined based on the probability value in HDR relating each margin, hence, the contribution for all variables has comparable scales even for non-Gaussian processes. However, this index is not sensitive to large-magnitude faults, mathematically: ∆MDCi  =0 MDlim i → ∞ ∆MD  i MD = X − µ i i i 

(30)

where µ i is the mean of X i , MDi is the distance from µi to X i , and ∆ is difference operator. As the rare-event probability is insensitive and difficult to capture, an approximate index is introduced in Eq. 31 MDCi =

xi − µ i

σi

(31)

Eq. 31 scales xi to zero mean and unit variance, where MDCi is quantified using a standardized quantile value and is sensitive to margin deviation. This index is appropriate for approximating Gaussian process or non-Gaussian processes with large-magnitude faults.

21

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 72

The dependence variation contribution (DVC) index for each variable is constructed based on

[

a pair dependence matrix cij ( FX i ( xi ), FX j ( x j ))

]

m× m

, where cij (⋅) denotes the bivariate copula of

X i and X j . For sample point x = [ x1 ,L, xm ]T , the contribution index of xi to dependence

variation is defined as

DVC i

∑ = ∑

m j =1, j ≠ i

m j =1, j ≠ i

DVPij ⋅ | τ X i , X j |

(1 − α )⋅ | τ X i , X j |

(i = 1,2,L , m )

(32)

where DVPij represents a probability value in HDR of the bivariate copula cij ( FX i ( xi ), FX j ( x j )) DVPij = Pr{( xi , x j ) ∈ HDR[cij ( FX i ( xi ), FX j ( x j ))]}

τ X ,X i

j

(33)

is the Kendall correlation between X i and X j , and 1 − α is a predefined confidence

level. The numerator of DVCi is a combination of DVPij ( j = 1,L, m, j ≠ i ) weighted by τ X i , X j , indicating that pair dependence variation with stronger correlation contributes more to the total dependence variation of the ith variable. The denominator of DVCi serves as a scaling term so that the DVC index is comparable to each variable. DVC index is able to deal with nonlinear and non-Gaussian processes, as it involves the probability of HDR appropriate for any distributions and rank correlation measures. In addition, DVC index is sensitive to dependence variation of high-dimensional process variables. For instance, assuming that xi deviates from NOC region, and the dependence structure between xi and x j ( j = 1,L, m, j ≠ i ) changes. This indicates that DVPij ( j = 1,L, m, j ≠ i ) increases or even to one, and accordingly, a large m will significantly

amplify this type of variation effect for the ith variable. Once process variables are identified as the root cause of a fault, it is of great value to further analyze the abnormal relationships among pair variables. For instance, assume xi , x j and xk are three variables that contribute most to a fault, then we want to know how this abnormal event impacts the identified variables, as well as their interaction behaviors, because it enables us to 22

ACS Paragon Plus Environment

Page 23 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

better understand the fault effects or faulty variable behaviors. For a pair variable, there are two conditions for margin 1 and margin 2 respectively, i.e., Deviation and No Deviation, and two conditions for the dependence structure, i.e., Variation and No Variation. All combinations of different conditions for the three factors lead to eight possible states as illustrated in Table 1, where √ denotes Deviation or Variation, and × denotes No Deviation or No Variation. If margin 1 and margin 2 locate in the normal region, and the dependence structure also retains normal, then the pair variable is in normal state, denoted as 0, and so on.

Table 1. Illustration of Eight Possible States (√ Denotes Deviation or Variation and × Denotes No Deviation or No Variation)

According to Figure 3, the joint distribution of pair variables is decomposed into two margins and one bivariate copula, and the associated HDR serves as a control boundary for each component. Note that bivariate copula is defined on a unit square whose domain of definition is [0,1] × [0,1] , the control boundary of each margin is added into the unit square, and a state chart

can be defined. Figure 7 provides a state chart where 1 − α HDR of the bivariate copula is leaf-shaped. Eight states, corresponding to Table 1, are formed. It is observed that (1) Deflected variables not exceeding their margin boundaries probably cause dependence variation, e.g., state 3; (2) pair variables retaining strong correlation likely deviate from their margin boundaries significantly, e.g., state 2, 4 and 5. In practice, the number of possible states sometimes is less than eight, depending on the relative position between margin boundaries and the 1 − α HDR of the bivariate copula. It should be noted that, the shape of HDR for bivariate copula varies as shown in 23

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 1, and MDC i (i = 1, 2) and DVPij are suitable indices for quantifying the state of pair variables.

Figure 7. State chart for pair variables in the unit square.

3.4. Fault detection and identification A Flowchart of the proposed approach is depicted in Figure 8. Detailed procedures are presented as follows. Firstly, information on NOC data is divided into margin distribution subspace and dependence structure subspace. Joint margin distribution model and copula model are then constructed in each subspace respectively. The control boundaries are further estimated using HDR. That is, for 1 − α confidence level, we have 1 − α HDR of the joint margin distribution and 1 − α HDR of the copula model. On the other hand, Kendall’ τ correlation matrix and pair dependence matrix of the process data are constructed and calculated, serving as an offline modeling part for fault identification. When the real-time sample x comes, MDP and DVP monitoring indices are computed. If x lies outside the 1 − α HDR of the joint margin

distribution, i.e., MDP(x) > 1 − α , then it is believed that some dimension(s) of x significantly deviates from its normal region. If x lies outside the 1 − α HDR of the copula model, i.e., DVP (x) > 1 − α , then it is believed that dependence structure of x changes. Any of the above two

situation indicates a fault. To identify the root cause variables, MDC and DVC indices are computed. Any of these two indices makes sense only when small-probability event occurs in the corresponding subspace. Variables with the largest indices are accordingly considered as the root-cause variables. Finally, the states of the pair variables are evaluated. 24

ACS Paragon Plus Environment

Page 24 of 72

Page 25 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 8. Flowchart of the proposed copula double-subspace process monitoring approach.

It should be emphasized that the offline models for fault detection and identification are different, though both of which are constructed on the basis of two subspaces. DVP in DSS is the 1 − α HDR of a multivariate copula c1,L,m ( FX 1 ( x1 ),L, FX m ( xm )) such as the aforementioned vine

copula model, while DVC is estimated through a pair dependence matrix model, where its component DVPij represents the 1 − α HDR of the bivariate copula cij ( FX i ( xi ), FX j ( x j )) . In addition, DVP is a probability ranging from zero to one which is sensitive to NOC data and faulty data. DVC is actually not the decomposition form of DVP index as it combines information on the associated bivariate copulas for variable xi . This index is very sensitive to NOC variables and root-cause variables in a specific time point. The denominator of Eq. 32 is a critical value with 1 − α HDRs of all bivariate copulas related to xi . When a fault triggers, only one or much more

DVP values may exceed 1 − α , and the numerator can be smaller or larger than the denominator.

As a result, DVC for root-cause variables can be smaller or larger than one. The copula double-subspace approach can be further improved using a robust control boundary, where MDP in MDS is replaced by a high-dimensional version of UP defined in Eq. 26. The improved approach is more robust to incompleteness of training data and model mismatch, and may have more sensitive regions for some types of faults. In addition, it can fully utilize the univariate-based alarm data stored in DCS and ESD. As shown in Figure 9, compared with the univariate process monitoring strategy, e.g., Shewhart control chart48, MSPM takes into account correlations among high-dimensional variables, thus better monitoring performance can be 25

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

guaranteed. However, traditional MSPM generally requires constructing a multivariate statistical model, and the univariate-based monitoring information is of little use in fault detection and diagnosis. The proposed strategy makes full use of the univariate-based monitoring information which is very common in industrial fields. It employs additional dependence restriction modulus to ensure good performance for process monitoring compared with traditional MSPM strategy.

Figure 9. Characteristics and development of the modeling process in statistical process monitoring.

4. Illustrative Example A numerical example is discussed to illustrate the effectiveness of the proposed copula double-subspace (CDS) approach. The data are simulated and provided in the Supporting Information of ref 25. Since the data were originally used for multimode process monitoring, we equivalently focus on two independent processes. Two groups of training data are extracted from different operating modes, each of which has 200 samples in three dimensions. That is, group 1 (for process 1) corresponds to the 200 training data of mode 3, while group 2 (for process 2) comes from the 200 training data of mode 1. 200 testing data are assigned for each process. For process 1, Fault 1 (a bias error of four added to variable 1) occurs at samples 51-100. The process is in normal state at samples 1-50 and samples 101-200. For process 2, Fault 2 (bias error of 0.5 along with a nonlinear drifting error added to variable 2) occurs at samples 101-200. To detect two types of faults, two independent models are established based on the training data in two groups. The training data for two processes are highly non-Gaussian. Heavy lower tail dependence exists 26

ACS Paragon Plus Environment

Page 26 of 72

Page 27 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

between variables 1 and 2, heavy upper tail dependence exists between variables 1 and 3, while there is no significant tail dependence between variables 2 and 3. Fault 1 is a typical fault reflecting significant margin deviation, while Fault 2, with nonlinear drifting error, is a representative of dependence variation. Figure 10 depicts the monitoring results of PCA, VCDD and the proposed CDS. Due to large magnitude deviation, all methods perform well on Fault 1. For Fault 2, both PCA-T2 and PCA-SPE fail to detect the initial anomalies (e.g., samples 101-140), and compared to VCDD, a relatively large false alarm rate is achieved. We have discussed in ref 25 that the good performance of VCDD is mainly due to the accurate modeling process. The proposed CDS method has almost the same good performance with VCDD. Two complementary indices, MDP and DVP, show that Fault 2 causes dependence variations initially (samples 101 to 170). Later, dependence variation intensifies, leading to significant magnitude deviation. The false alarm rate of CDS appears somewhat higher than VCDD. This is generally inevitable in exchange for robustness. Costs and advantages of the copula subspace division strategy will be analyzed in Section 5 using a practical industrial process.

Figure 10. Fault detection charts for Fault 1 and Fault 2, using PCA-T2, PCA-SPE, VCDD-GLP, CDS-MDP, and CDS-DVP. (0.95 confidence level)

Figure 11 shows variable contribution to T2, SPE, MDC, and DVC at sample 73 for Fault 1. Sample 73 is randomly selected, and it can represent the general performance of testing samples. The proposed contribution index makes sense only when the corresponding monitoring index 27

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

exceeds the control limit (CL). For instance, if DVP>CL, then a maximum DVCi indicates variable i contributes most to dependence variation. Herein, all four indices in Figure 11 are valid. T2

correctly identifies variable 1 as the root-cause variable for deviation in PCS. In RS, SPE mistakenly identifies variable 2 as the root cause variable, though the contribution of variable 1 is relatively large as well. In contrast, the proposed two indices, MDC and DVC, successfully identify the actual root cause variable. The high contribution of variable 1 also shows that large magnitude deviation may lead to dependence variation.

Figure 11. Variable contribution plots at sample 73 for Fault 1, using PCA-T2, PCA-SPE, CDS-MDC, and CDS-DVC.

Figure 12 depicts the real-time variable contribution for Fault 2. It shows that MDCi and DVCi are not sensitive to anomalies in the 200 testing samples. In other words, it is difficult to

provide a line (CL) in real-time MDC and DVC plots to separate the normal and faulty sample data in each dimension. However, they are good at distinguishing faulty contribution of each variable for a specific sample. To illustrate the efficiency of the proposed contribution indices, we select three consecutive samples (120-122) and analyze their performance mathematically. In Figure 13, PCA-SPE and CDS-DVC are two valid contribution indices. It shows that variable 1 misleadingly yields the highest contribution to SPE for all three samples. In contrast, DVC successfully identifies variable 2 as the root cause for samples 120 and 122, which is due to the ability of capturing complex dependence variation using various bivariate copulas.

28

ACS Paragon Plus Environment

Page 28 of 72

Page 29 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 12. Real-time variable contribution to MDC and DVC of testing samples for Fault 2.

Figure 13. Variable contribution plots at samples 120 to 122 for Fault 2, using PCA-T2, PCA-SPE, CDS-MDC, and CDS-DVC.

The reason for failing to identify sample 121 using DVC varies, which can be illustrated by Figure 14. Column represents a particular pair copula, denoted by ci , j (# , # , # ) with bivariate copula between variables i and j (copula structure, Kendall correlation coefficient τ X i , X j , copula parameter). Solid line represents the contour curve, and dashed line represents the 0.95 HDR in DSS. In each subplot, “HDR(#): #” represents “HDR(PDF value of HDR): probability value of HDR, DVPij ”. According to Eq. 32, DVC of variables 1 and 2 can be written as DVC1 = DVC2 =

DVP12 ⋅ τ X 1 , X 2 + DVP13 ⋅ τ X 1 , X 3 0.95 ⋅ (τ X 1 , X 2 + τ X 1 , X 3 ) DVP21 ⋅ τ X 2 , X 1 + DVP23 ⋅ τ X 2 , X 3 0.95 ⋅ (τ X 2 , X 1 + τ X 2 , X 3 )

=

1 ( DVP12 ⋅ w12(1) + DVP13 ⋅ w13(1) ) 0.95

1 ( 2) = ( DVP12 ⋅ w12( 2) + DVP23 ⋅ w23 ) 0.95

(34)

where wij( k ) denotes the fraction of τ X i , X j in the sum of τ associated with variable k, and it quantifies the importance of certain pair dependence. For sample 121, DVP23 equals 0.32, smaller ( 2) , it likely yields DVC1>DVC2. It should than that of sample 120 (DVP23=0.58). When w13(1) ≈ w23

be emphasized that faults in this case study only change in one dimension, which can be treated as a kind of simulated sensor fault for an open-loop system. For those widely discussed process faults, anomaly of variable 1 may lead to deviation of variable 2 or 3, depending on the multivariate dependence behaviors. We could simulate these process faults by regulating the state variables in the framework of a state equation model. However, the simulated sensor fault is only considered in this case study because it enables us to clearly observe the change and mathematical meaning of 29

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the proposed indices. As a drifting error occurs on variable 2, DVP23 generally should be a large number. DVP index is powerful for complex industrial processes because it is able to quantify complex dependence and involve importance levels for pair variables. In addition, the effect of dependence variation will accumulate in the root-cause variables if dimensions become rather large. The proposed method is further investigated and applied to the TE chemical process in the next section.

Figure 14. Highest density region of samples 120 to 122 in DSS. (0.95 confidence level) On the top, “ ci , j (# , # , # ) ” represents “bivariate copula between variables i and j (copula structure, Kendall correlation coefficient τ X i , X j , copula parameter)”. In each subplot, “HDR(#): #” represents “HDR(PDF value of HDR): probability value of HDR, DVPij ”.

5. TE Chemical Process In this section, the proposed method is applied to the TE chemical process. The flow diagram of this chemical plant is presented in Figure S1. The process consists of five major operating units, including an exothermic two-phase reactor, a product condenser, a flash separator, a recycle compressor, and a reboiled stripper. More details of this process are explained in ref 49. There are total 41 process measurements and 12 manipulated variables. To analyze variables more conveniently, 22 continuous process variables among the 41 process measurements are selected for fault detection and diagnosis. These variables are listed in Table S1. The variables are sampled with an interval of 3 min. 960 samples are observed each for training and testing. In the testing data sets, all faults are introduced at sample 161. There are 21 programmed faults, ranging from 30

ACS Paragon Plus Environment

Page 30 of 72

Page 31 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

step change, random variation, slow drift, sticking. These faults are listed in Table S2. The TE process data can be download at http://web.mit.edu/braatzgroup/links.html. For the copula double-subspace (CDS) approach, both Gaussian assumption (CDS(GM)) and kernel density estimation method (CDS(KDEM)) are employed to estimate the joint margin distribution in MDS. Copula in DSS is then optimized according to BIC. Define SL ∈ [0, 1] as the significance level of independence hypothesis tests for all pair variables. For a large SL, more independence tests are likely rejected, therefore, more pair copulas need to be estimated and selected. If SL is too small, then less pair copulas are included in vine, leading to low modeling accuracy. In this case study, BIC index is minimized when the significance level (SL) is 0.05, as shown in Figure 15. At such a SL, totally 73 pair copulas should be estimated: a trade-off between model accuracy and model complexity/computational load.

Figure 15. BIC index as a function of significance level.

For the copula multi-subspace (CMS) approach, the hyper-rectangular is designed in MDS. This hyper-rectangular is uniquely determined based on the confidence level of each univariate 1

distribution. Due to high dimensions of TE process, confidence level (Eq. 24) is 0.99 22 → 1 . The control limit (quantile of confidence level) of each margin can be estimated via the maximum/minimum value of each dimension of the normal data. The same copula model of CDS is then introduced to monitor dependence behaviors in DSS. The monitoring results of three multivariate statistical projection approaches (PCA, ICA, KPCA) and five probabilistic modeling approaches (VCDD(GM), VCDD(KDEM), CDS(GM), 31

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

CDS(KDEM), and CMS) are presented in Table 2. VCDD(GM) and VCDD(KDEM) are vine copula-based dependence description monitoring approach25 whose margins are estimated with Gaussian assumption and kernel density estimation method, respectively. The rate for Fault 0 represents the false alarm rate for the training data. For double-index approach, their union is calculated and discussed. The maximum detection rate for each fault is marked in bold. For all approaches, confidence level is set as 0.99 to achieve fair comparison. Selected principal components of PCA and KPCA are such that 85% variance is explained.20,50 Independent components are ordered and extracted according to L2 norm criterion.18 It is observed that the performance of VCDD(GM) is better than that of PCA except for Fault 3, 9, 11, 13, 14, 21 (Comparison of VCDD performance in TE process was not discussed in ref 25). Due to different abilities to extract data information of interest, ICA (mainly extract non-Gaussian information) achieves much better performance than KPCA for Fault 10, 19, 20, while KPCA (mainly extract nonlinear information) is more prominent for Fault 7, 15. VCDD(KDEM) is superior to ICA and KPCA except for Fault 3, 10, 12, 21, and outperforms PCA and VCDD(GM) for all faults. This is mainly due to the improvement of global accuracy of VCDD(KDEM) in the modeling process (describe both non-Gaussian and nonlinear information) . The proposed CDS(KDEM) outperforms VCDD(KDEM) except Fault 1, 8, 11, 12, 18. CDS(GM) outperforms VCDD(GM) for all faults. In general, training data are incomplete and the details of the model, termed as local accuracy, are hard to improve. CDS improves the robustness of the model in exchange for local accuracy. When global accuracy is inhibited (VCDD(GM)), the performance of subspace division strategy (CDS(GM)) significantly improves. In this case, less emphasis is placed on local accuracy. Such robustness can be further improved using CMS. It is 32

ACS Paragon Plus Environment

Page 32 of 72

Page 33 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

observed that CMS performs the best, with the lowest false alarm rate and the highest detection rate for almost all faults excluding Fault 8, 12, 21.

Table 2. Fault Detection Performance for 21 Faults Using Different Approaches (0.99 Confidence Level )

The performance of the proposed approach is further analyzed with two typical fault types. The fault detection charts of the TE process for Fault 6 is presented in Figure 16. It is demonstrated that all approaches can detect the faults with 100% detection rate. For CDS(GM), and CDS(KDEM), 100% detection rate is obtained with MDP, meaning that margin deviation information is sufficient to detect this type of fault. The scatter plots where Fault 6 causes a step change in A feed lose are depicted in Figure 17. Black points represent the training data, and red points represent the testing data. From a mathematical view, a constructed hyper-sphere (model without deterministic relationships) is able to set apart the raw NOC data and the faulty data. For faults that cause large deviation, the level of global accuracy (not to mention local accuracy) can be low/intermediate. A satisfying monitoring results can be achieved by only considering the information in MDS instead of making efforts in DSS modeling.

Figure 16. Fault detection charts of the Tennessee Eastman process for Fault 6.

Figure 17. Scatter plots of the Tennessee Eastman process for Fault 6. Black points represent training data, and red points represent testing data. 33

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The fault detection charts of TE process for Fault 10 is presented in Figure 18. Compared with Fault 6, Fault 10 causes intermediate variable deviation because of controller regulating. According to Table 2, PCA only obtains 43.25% detection rate as the fact that its modeling process is based on linear and Gaussian assumption. For VCDD(GM), improvements are made in dependence modeling, though margins still follow Gaussian distribution. The detection rate correspondingly increases to 66.13%. If non-Gaussian condition for margins is added (VCDD(KDEM)), the results will become even better (78.88%). Figure 19 shows the control boundaries of VCDD(GM) and VCDD(KDEM) in the scatter plot of X7 and X20. Dot denotes training data, and plus denotes anomalies that VCDD(KDEM) successfully detects and VCDD(GM) fails. Due to the improvement of global accuracy, VCDD(KDEM) detects additional 102 anomalies while no additional anomalies are detected by VCDD(GM).

Figure 18. Fault detection charts of the Tennessee Eastman process for Fault 10.

Figure 19. Control boundaries of VCDD(GM) and VCDD(KDEM), and the scatter plot for Fault 10.

For the double-subspace division approach, the detection rate of CDS(GM) for Fault 10 increases to 76.75% (66.13% with VCDD(GM)), and CDS(KDEM) increases to 82.63% (78.88% with VCDD(KDEM)). CDS(GM) (76.75%) achieves similar monitoring performance to VCDD(KDEM) (78.88%). Figure 20 depicts the control boundaries of VCDD and CDS in the 34

ACS Paragon Plus Environment

Page 34 of 72

Page 35 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

scatter plot of X7 and X20. Dot denotes training data, plus denotes anomalies that CDS successfully detects but VCDD fails, and cross denotes anomalies that VCDD successfully detects but CDS fails. It shows that CDS(GM) detects 97 additional faulty data, and VCDD(GM) detects 12 additional faulty data (see Figure 20 (a)). CDS(KDEM) detects 47 additional faulty data, and VCDD(KDEM) detects 17 additional faulty data (see Figure 20 (b)). The results indicate that, for intermediate faults, both global accuracy and local accuracy should be considered for better monitoring performance. Note that local accuracy is usually difficult to achieve due to factors like incompleteness of training data, technique restriction etc, the effects of these restrictions can accordingly be reduced by double-subspace division approach.

Figure 20. Control boundaries of VCDD(GM), CDS(GM) in MDS, VCDD(KDEM), CDS(KDEM) in MDS, and the scatter plots for Fault 10.

For CMS, it achieves higher detection rate (83.57%) than any other methods discussed above. The control boundary of CDS(KDEM) in MDS, calculated by the joint margin distribution model, may deviate from the real case (local accuracy cannot be ensured). As it clearly shows in Figure 21, the control boundary in MDS is constructed with a robust hyper-rectangular confined by the maximum/minimum values in each dimension. The better monitoring performance of CMS is probably due to the improvement of model robustness in MDS. It should be noted that robustness idea in CDS may lead to type I and type II errors as well.

Figure 21. Control boundaries of CDS(KDEM) and CMS in MDS, and the scatter plots for Fault 35

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

10.

The proposed CDS (KDEM) and CMS achieve significant improvement for detecting several faults such as Fault 3, 4, 9, 15, 19. The fault detection charts for Fault 15 and Fault 19 are provided in Figure S2 and Figure S3, respectively. For fault diagnosis, Fault 10 is selected to evaluate the performance of CDS in comparison to PCA. When the fault occurs, C feed (stream 4) temperature changes randomly. As stream 4 flows into the product stripper, the stripper temperature (X18) is directly affected. As a result, the cascade controller significantly regulates the stripper steam flow (X19) for temperature correction. During this control process, the stripper pressure (X16) alters for its high correlation with stripper temperature. Pressure variation occurs as well at the outlet of the recycle compressor through pipe 5 and 8, which results in the change of compressor work (X20). The variable contribution plots for Fault 10 are presented in Figure 22. Variable contribution is estimated based on the average of samples 201-230. In this time interval, both CDS and PCA successfully detect the faults, and thus a fair comparison is ensured. It is observed that CDS correctly identifies X16, X18, X19, and X20. In contrast, PCA fails to identify X16 and X19. Although MDC16 and MDC19 are small, DVC are large. Besides, significant margin deviation occurs in X20

and X4, and MDC20>MDC4. However, the T2 contribution index in PCA yields T202 < T42 . Such misleading results derive from the distorted dependence information extracted by PCA. For DVC, DVC12 is always zero due to the fact that X12 is independent of all other variables via

independence tests for pair variables. Therefore, data information on X12 only exists in MDS.

36

ACS Paragon Plus Environment

Page 36 of 72

Page 37 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 22. Variable contribution plots of the Tennessee Eastman process for Fault 10.

The state charts of pair variables for Fault 10 are presented in Figure 23. For the total 22 variables, pairs with the strongest dependence to each variable are analyzed. It can be seen that X18 has the strongest dependence with X19 than any other variables, and so does X19,

τ 18,19 =

max

i =1,L, 22 , i ≠18

τ 18,i = 0.77 , τ 19,18 =

max

i =1,L, 22, i ≠19

τ 19,i = 0.77 . When Fault 10 occurs, the state of this

pair X18-X19 changes from 0 to 3 frequently, and occasionally to 6 or 7, indicating that X18 and X19 in most cases has dependence variation but no significant margin deviation. This agrees with the fault identification result of CDS that MDC18, MDC19 are relatively small while DVC18, DVC19 are large. State 5 appears frequently for pair X4-X20, which means that only the margin of X20 deviates, and no dependence variation occurs for this pair. For pair X7-X13, though the state alters to 2 occasionally (margin deviation occurs for both variables, and no dependence variation). The corresponding MDC and DVC are not the largest, thus X7 (reactor pressure) and X13 (product separator pressure) are not identified as the root-cause variables. Overall, the contribution plots, combined with the state charts of some critical pair variables, are powerful tools for analyzing essential fault behaviors and further troubleshooting.

Figure 23. State charts of pair variables for Fault 10.

6. Conclusion This article proposes a subspace division strategy for copula and a novel technique to fault detection and diagnosis. For CDS approach, MDP and DVP are constructed to quantify variable deviation and dependence variation in multivariate systems. DVP is calculated via HDR of copula, 37

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

which to our knowledge was seldom investigated before. As an improved version of CDS, CMS approach further decomposes the joint margin distribution in MDS into more subspaces, and a robust hyper-rectangular is designed accordingly. For fault diagnosis, DVC is constructed by combining DVP of all pair variables with different importance levels. The state charts for pair variables are also created to analyze essential fault behaviors. Advantages of the proposed technique include: (1) It can detect and identify variables with nonlinear and non-Gaussian behaviors; (2) It is more robust than traditional VCDD approach; and (3) It is able to fully utilize univariate-based alarm data in DCS and ESD, by introducing an additional restriction modulus to DSS. The proposed technique was applied to the fault detection and diagnosis of an illustrated example and the Tennessee Eastman chemical process in comparison to PCA, ICA, KPCA and VCDD. The results demonstrate that the presented approach in general detects and identifies variables more effectively than the four approaches. It gains much higher global accuracy than PCA, ICA and KPCA in modeling, and improves robustness to local inaccuracy compared with VCDD. This study focuses on the effectiveness and robustness of the technique to different faults. However, the presented approach is less time-efficient than the traditional MSPT such as PCA. On a computer with 2.4 GHz CPU and 8 GB RAM, it takes 0.1 second on average for online monitoring of each sample. While several minutes are required for offline modeling to estimate the aforementioned 73 pair copulas, which can be alleviated using simplified or sparse vine copula.51,52 Future work includes identifying crucial pair variables for process monitoring (pairs with weak dependence may be more important), assessing the trade-off between computational 38

ACS Paragon Plus Environment

Page 38 of 72

Page 39 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

load and effectiveness (BIC may not be the most efficient one). In addition, the application of state chart for pair variables can be broadened, which, for instance, may include critical information for fault classification.

Associated Content Supporting Information The Supporting Information is available free of charge on the ACS Publications website at DOI: ie-2017-02419j. Proof of linear approximation of variables in bivariate Gaussian distribution; standardized margin deviation (SMD) index for CMS; monitored variables of the Tennessee Eastman chemical process (Table S1); programmed faults of the Tennessee Eastman chemical process (Table S2); process flow diagram of the Tennessee Eastman chemical process (Figure S1); fault detection charts of the Tennessee Eastman process for Fault 15 (Figure S2); fault detection charts of the Tennessee Eastman process for Fault 19 (Figure S3) (PDF)

Author Information Corresponding Author *

Tel.: +86-21-64253820. E-mail: [email protected].

Notes The authors declare no competing financial interest.

Acknowledgments 39

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

First author gratefully acknowledges the financial support from the TA-GA Professional Development fund for research at Rutgers University. Corresponding author appreciates the National Natural Science Foundation of China (No. 21676086 and No. 21406064), National Natural Science Foundation of Shanghai (14ZR1410500) and Fundamental Research Funds for the Central Universities under Grant 222201717006. The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper.

References (1) Grosdidier, P.; Connor, P.; Hollifield, B.; Kulkarni, S. A path forward for DCS alarm management. Hydrocarbon Processing. 2003, 82 (11), 59–68. (2) Cheng, Y.; Izadi, I.; Chen, T. Pattern matching of alarm flood sequences by a modified Smith–Waterman algorithm. Chem. Eng. Res. Des. 2013, 91 (6), 1085–1094. (3) Adhitya, A.; Cheng, S.; Lee, Z.; Srinivasan, R. Quantifying the effectiveness of an alarm management system through human factors studies. Comput. Chem. Eng. 2014, 67, 1–12. (4) Li, D.; Hu, J.; Wang, H.; Huang, W. A distributed parallel alarm management strategy for alarm reduction in chemical plants. J. Process Control. 2015, 34, 117–125. (5) Meel, A.; Seider, W. D. Plant-specific dynamic failure assessment using Bayesian theory. Chem. Eng. Sci. 2006, 61 (21), 7036–7056.

(6) Kalantarnia, M., Khan, F., Hawboldt, K. Dynamic risk assessment using failure assessment and Bayesian theory. J. Loss Prev. Process Ind. 2009, 22 (5), 600–606. (7) Pariyani, A.; Seider, W. D.; Oktem, U. G.; Soroush, M. Dynamic risk analysis using alarming 40

ACS Paragon Plus Environment

Page 40 of 72

Page 41 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

databases to improve process safety and product quality: part II–Bayesian analysis. AIChE J. 2012, 58 (3), 826–841.

(8) Khakzad, N., Khan, F., Amyotte, P. Dynamic risk analysis using bow-tie approach. Reliab. Eng. Syst. Safe. 2012, 104, 36–44.

(9) Hashemi, S. J., Ahmed, S., Khan, F. I. Risk-based operational performance analysis using loss functions. Chem. Eng. Sci. 2014, 116, 99–108. (10) Zadakbar, O., Khan, F., Imtiaz, S., Development of economic consequence methodology for process risk analysis. Risk Anal. 2015, 35 (4), 713–731. (11) Khan, F., Wang, H., Yang, M. Application of loss functions in process economic risk assessment. Chem. Eng. Res. Des. 2016, 111, 371–386. (12) Yu, H. Dynamic risk assessment of complex process operations based on a novel synthesis of soft-sensing and loss function. Process Saf. Environ. 2017, 105, 1–11. (13) Ren, X.; Li, S.; Lv, C.; Zhang, Z. Sequential dependence modeling using Bayesian theory and D-Vine Copula and its application on chemical process risk prediction. Ind. Eng. Chem. Res. 2014, 53 (38), 14788–14801.

(14) Qin, S. J. Statistical process monitoring: basics and beyond. J. Chemomet. 2003, 17 (8-9), 480–502. (15) Kourti, T.; MacGregor, J. F. Process analysis, monitoring and diagnosis, using multivariate projection methods. Chemomet. Intell. Lab. Syst. 1995, 28, 3–21. (16) Yue, H. H.; Qin, S. J. Reconstruction-based fault identification using a combined index. Ind. Eng. Chem. Res. 2001, 40 (20), 4403–4414.

(17) Tong, H.; Crowe, C. M. Detection of gross errors in data reconciliation by principal 41

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

component analysis. AIChE J. 1995, 41 (7), 1712–1722. (18) Lee, J. M.; Yoo, C. K.; Lee, I. B. Statistical process monitoring with independent component analysis. J. Process Control. 2004, 14 (5), 467–485. (19)Yang, J.; Gao, X.; Zhang, D.; Yang, J. Kernel ICA: An alternative formulation and its application to face recognition. Pattern Recogn. 2005, 38 (10), 1784–1787. (20) Lee, J. M.; Yoo, C. K.; Choi, S. W.; Vanrolleghem, P. A.; Lee, I. B. Nonlinear process monitoring using kernel principal component analysis. Chem. Eng. Sci. 2004, 59, 223–234. (21) Lee, J. M.; Qin, S. J.; Lee, I. B. Fault detection of non‐linear processes using kernel independent component analysis. Can. J. Chem. Eng. 2007, 85 (4), 526–536. (22) Choi, S. W.; Lee, C.; Lee, J. M.; Park, J. H.; Lee, I. B. Fault detection and identification of nonlinear processes based on kernel PCA. Chemomet. Intell. Lab. Syst. 2005, 75, 55–67. (23) Yu, H.; Khan, F.; Garaniya, V. A sparse PCA for nonlinear fault diagnosis and robust feature discovery of industrial processes. AIChE J. 2016, 62 (5), 1494–1513. (24) Thissen, U.; Swierenga, H.; Weijer, A. P.; Wehrens, R.; Melssen, W. J.; Buydens, L. M. C. Multivariate statistical process control using mixture modelling. J. Chemomet. 2005, 19, 23–31. (25) Ren, X.; Tian, Y.; Li, S. Vine copula-based dependence description for multivariate multimode process monitoring. Ind. Eng. Chem. Res. 2015, 54 (41), 10001–10019. (26) Yu, H.; Khan, F.; Garaniya, V. A probabilistic multivariate method for fault diagnosis of industrial processes. Chem. Eng. Res. Des. 2015, 104, 306–318. (27) Ahooyi, T. M.; Arbogast, J. E.; Soroush, M. Rolling pin method: efficient general method of joint probability modeling. Ind. Eng. Chem. Res. 2014, 53 (52), 20191–20203. 42

ACS Paragon Plus Environment

Page 42 of 72

Page 43 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

(28) Ahooyi, T. M.; Arbogast, J. E.; Soroush, M. Rebuttal to the ‘comment on “rolling pin method: efficient general method of joint probability modeling”’. Ind. Eng. Chem. Res. 2015, 54 (8), 2416–2417. (29) Liu, J.; Chen, D. Fault isolation using modified contribution plots. Comput. Chem. Eng. 2014, 61, 9–19. (30) Zhou, Z.; Wen, C.; Yang, C. Fault isolation based on k-nearest neighbour rule for industrial processes. IEEE Trans. Ind. Electron. 2016, 63 (4), 2578–2586. (31) Li, G., Alcala, C. F., Qin, S. J., Zhou, D. Generalized reconstruction-based contributions for output-relevant fault diagnosis with application to the Tennessee Eastman process. IEEE Trans. Control Syst. Technol. 2011, 19, 1114–1127.

(32) Sun, H.; Zhang, S.; Zhao, C.; Gao, F. A sparse reconstruction strategy for online fault diagnosis in nonstationary processes with no a priori fault information. Ind. Eng. Chem. Res. 2017, 56 (24), 6993–7008.

(33) Yu, J. Localized Fisher discriminant analysis based complex chemical process monitoring. AIChE J. 2011, 57 (7), 1817–1828.

(34) Yin, Z.; Hou, J. Recent advances on SVM based fault diagnosis and process monitoring in complicated industrial processes. Neurocomputing. 2016, 174, 643–650. (35) Kano, M.; Hasebe S.; Hashimoto, I. Statistical process monitoring based on dissimilarity of process data. AIChE J. 2002, 48 (6), 1231–1240. (36) Zhao, C.; Gao, F. A sparse dissimilarity analysis algorithm for incipient fault isolation with no priori fault information. Control Eng Pract. 2017, 65, 70-82. (37) Mao, G. Testing independence in high dimensions using Kendall’s tau. Comput. Stat. Data. 43

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Anal. 2018, 117, 128–137.

(38) Sklar, A. Fonctions dé Repartition á n Dimension et Leurs Marges. Publications del'Institut de Statistique de L'Université de Paris. 1959, 8, 229–231. (39) Joe, H. Multivariate models and multivariate dependence concepts; CRC Press, 1997. (40) Nelsen, R. B. An introduction to Copulas; Springer: Berlin, 2006. (41) Kurowicka, D.; Joe, H. Dependence modeling: vine copula handbook; World Scientific: New Jersey, 2011. (42) Lopez-Paz, D.; Hernández-Lobato, J. M.; Ghahramani, Z. Gaussian process vine copulas for multivariate dependence. Paper in ICML, 2013. (43) Bowman, A. W. An alternative method of cross-validation for the smoothing of density estimates. Biometrika. 1984, 71 (2), 353–360. (44) Silverman, B. W. Density estimation; Chapman & Hall: London, 1986. (45) Wand, M. P.; Jones, M. C. Multivariate plug-in bandwidth selection. Comput. Stat. 1994, 9 (2), 97–116. (46) Aas, K.; Czado, C.; Feigessi, A.; Bakken, H. Pair-copula construction of multiple dependence. Insurance. Math. Econom. 2009, 44 (2), 182–198.

(47) Hyndman, R. J. Computing and graphing highest density regions. J. Am. Stat. Assoc. 1996, 50 (2), 120–126. (48) Shewhart, W. A. Statistical method from the viewpoint of quality control; Dover: New York, 1986. (49) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17 (3), 245–255.

44

ACS Paragon Plus Environment

Page 44 of 72

Page 45 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

(50) Jiang, Q.; Yan, X. Chemical processes monitoring based on weighted principal component analysis and its application. Chemomet. Intell. Lab. Syst. 2012, 119, 11–20. (51) Nagler, T.; Czado, C. Evading the curse of dimensionality in nonparametric density estimation with simplified vine copulas. J. Multivar. Anal. 2016, 151, 69–89. (52) Müller, D.; Czado, C. Selection of sparse vine copulas in high dimensions with the Lasso. 2017, arXiv:1705.05877.

45

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 1. Illustration of Eight Possible States (√ Denotes Deviation or Variation and × Denotes No Deviation or No Variation) Margin 1

Margin 2

Dependence

State

×

×

×

0







1





×

2

×

×



3



×

×

4

×



×

5



×



6

×





7

46

ACS Paragon Plus Environment

Page 46 of 72

Page 47 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Table 2. Fault Detection Performance for 21 Faults Using Different Approaches (0.99 Confidence Level ) VCD VCD D PCA

ICA

KPCA

D

CDS(GM)

CDS(KDEM)

CMS

(KDE (GM) M) Fault

T2

SPE

Union

I2

SPE

Union

T2

SPE

Union

GLP

GLP

MDP

DVP

Union

MDP

DVP

Union

SMD

DVP

Union

0

0.63

0.83

1.46

1.04

1.04

2.08

1.98

1.46

2.29

1.04

1.04

0.94

0.94

1.88

0.94

0.94

1.88

0

0.94

0.94

1

99.12

99.75

99.75

99.62

99.88

99.88

99.75

99.75

99.75

99.75

99.88

99.50

45.75

99.75

99.63

45.75

99.75

99.75

45.75

99.88

2

98.00

98.25

98.25

97.75

98.12

98.25

98.13

98.25

98.25

98.25

98.63

98.00

91.00

98.25

98.50

91.00

98.75

98.63

91.00

98.88

3

1.00

2.38

3.38

1.5

6.50

8.00

4.38

5.00

7.00

3.25

7.38

1.25

3.63

4.75

5.13

3.63

8.13

10.00

3.63

12.25

4

0.50

1.87

2.25

1.13

3.13

4.13

2.00

2.25

3.25

2.25

4.63

0.50

3.00

3.13

2.50

3.00

5.13

6.13

3.00

8.00

5

23.38

15.25

25.25

24.37

26.50

28.88

27.00

27.00

28.63

26.38

29.00

23.38

12.50

28.25

25.88

12.50

29.75

29.88

12.50

32.75

6

100

100

100

100

100

100

100

100

100

100

100

100

97.75

100

100

97.75

100

100

97.75

100

7

37.50

22.75

38.25

34.13

36.63

39.87

42.38

42.63

44.50

40.00

45.00

39.75

17.25

43.75

43.38

17.25

47.13

48.38

17.25

50.25

8

97.00

91.63

97.75

96.25

97.75

97.88

97.38

97.75

97.88

98.50

98.88

97.38

38.38

98.63

98.00

38.38

98.63

98.00

38.38

98.63

9

1.63

2.00

3.62

1.38

5.37

6.63

3.38

4.88

6.13

3.13

7.25

1.25

3.25

4.50

5.13

3.25

7.88

9.75

3.25

12.13

10

29.75

25.75

43.25

72.75

70.75

80.50

45.00

60.00

62.00

66.13

78.88

40.88

54.00

76.75

54.25

54.00

82.63

58.13

54.00

83.75

11

11.13

46.25

47.00

27.63

31.87

40.63

34.50

40.88

41.75

36.50

49.38

25.75

15.88

35.88

42.75

15.88

47.38

49.63

15.88

51.88

12

98.50

86.25

99.00

96.75

99.12

99.50

99.50

99.13

99.50

99.25

99.38

98.75

29.88

99.25

98.75

29.88

99.25

98.88

29.88

99.38

13

93.75

95.13

95.13

94.25

94.75

94.75

94.75

94.63

94.75

94.75

95.13

94.38

45.00

94.88

94.63

45.00

95.13

95.00

45.00

95.13

14

98.25

100

100

99.88

100

100

99.88

99.88

99.88

99.88

100

99.88

79.63

99.88

100

79.63

100

100

79.63

100

15

1.00

2.88

3.88

1.38

5.37

6.75

10.13

7.13

12.13

6.75

13.88

4.50

5.13

8.88

11.88

5.13

15.88

18.63

5.13

21.50

16

18.12

11.25

23.75

20.37

32.37

40.88

32.38

35.38

39.50

31.13

45.88

23.50

21.38

37.63

34.50

21.38

47.38

42.75

21.38

51.63

17

79.25

95.75

95.87

93.75

91.63

95.87

95.38

94.63

95.88

96.38

97.13

87.88

77.25

96.88

93.63

77.25

97.13

95.00

77.25

97.50

18

89.25

90.25

90.38

89.75

90.63

90.63

89.88

89.88

90.13

90.38

90.88

89.25

81.38

90.50

89.88

81.38

90.75

91.63

81.38

91.88

19

11.00

6.63

16.63

35.13

15.50

44.37

4.13

16.63

17.13

24.13

48.13

2.00

37.88

39.25

18.50

37.88

51.25

36.88

37.88

62.88

20

24.75

41.13

46.63

77.00

67.50

81.37

45.00

50.63

52.13

74.50

81.75

34.63

69.50

79.88

45.75

69.50

82.38

52.00

69.50

83.38

21

42.50

41.50

46.75

19.63

58.13

58.37

44.63

49.75

50.63

46.13

53.38

41.38

16.50

50.13

48.25

16.50

55.25

52.00

16.50

57.38

47

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 1. Contour plots of six typical bivariate copulas, all with 0.5 Kendall’s τ .

48

ACS Paragon Plus Environment

Page 48 of 72

Page 49 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 2. Illustration of the three factors that affect monitoring performance for all possible faults and NOC data.

49

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 3. Illustration of copula subspace division.

50

ACS Paragon Plus Environment

Page 50 of 72

Page 51 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 4. Exact boundary and equivalent robust control boundary in MDS.

51

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 5. The 0.99 HDR and the univariate-based 0.995 HDR of (a) joint margin distribution (no dependence) and (b) joint distribution (with dependence) under Gaussian assumption.

52

ACS Paragon Plus Environment

Page 52 of 72

Page 53 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 6. Sensitive regions of two indices, UP and MDP, where A(1.24, 1.24), B(1.24, -1.24), C(-1.24, -1.24), D(-1.24, 1.24). The 0.99 HDR and the univariate-based 0.995 HDR of the joint margin distribution are also included.

53

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 7. State chart for pair variables in the unit cube.

54

ACS Paragon Plus Environment

Page 54 of 72

Page 55 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

NOC Training Data

Online Process Data

Margin Distribution Subspace

Dependence Stucture Subspace

Compute MDP And DVP Index

Build Joint Margin Distribution

Build Copula Model

Compare

Exceed Control Boundary?

Determine Control Boundary Using HDR

Yes Compute MDC And DVC Index

NOC Training Data

Build Marginal Distributions

No

Estimate Kendall Tau Matrix

Identify Root Cause Variables

Construct Pair Dependence Matrix

Analyze State of Pair Variables

Figure 8. Flowchart of the proposed copula double-subspace process monitoring approach.

55

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Strategy I

HighDimensional Process Data

Strategy II

Univariate Distribution Models

Ignore multiva riate d epend ence; May achie ve unre liable results an d cause alarm flo oding; Still main/commo n strategy in modern/current industrial prod uction

Multivariate Statistical Model

Take in to account complex depen dence; Univariatebased alar ms info rmatio n in DCS and ESD b ecomes invalid; Sufficie ntly investiga ted in academic field s but not widely used in in dustria l fields

Univariate Distribution Models In MDS

Make full use of univaria te-based alar ms info rmatio n in DCS an d E SD; Sho w that univar iate-based control limits mathematically could be hig her tha n previou sly tho ught

Copula Model In DSS

Ser ve as a re striction modu lus with re spe ct to multiva riate d epende nce varia tion; HDR in co pula app ears a goo d mea sur e q uantifying dep enden ce beh avior for each sin gle point.

Pro posed Stra teg y

Figure 9. Characteristics and development of the modeling process in statistical process monitoring.

56

ACS Paragon Plus Environment

Page 56 of 72

Page 57 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 10. Fault detection charts for Fault 1 and Fault 2, using PCA-T2, PCA-SPE, VCDD-GLP, CDS-MDP, and CDS-DVP. (0.95 confidence level)

57

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 11. Variable contribution plots at sample 73 for Fault 1, using PCA-T2, PCA-SPE, CDS-MDC, and CDS-DVC.

58

ACS Paragon Plus Environment

Page 58 of 72

Page 59 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 12. Real-time variable contribution to MDC and DVC of testing samples for Fault 2.

59

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 13. Variable contribution plots at samples 120 to 122 for Fault 2, using PCA-T2, PCA-SPE, CDS-MDC, and CDS-DVC.

60

ACS Paragon Plus Environment

Page 60 of 72

Page 61 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 14. Highest density region of samples 120 to 122 in DSS. (0.95 confidence level) On the top, “ ci , j (# , # , # ) ” represents “bivariate copula between variables i and j (copula structure, Kendall correlation coefficient τ X i , X j , copula parameter)”. In each subplot, “HDR(#): #” represents “HDR(PDF value of HDR): probability value of HDR, DVPij ”.

61

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 15. BIC index as a function of significance level.

62

ACS Paragon Plus Environment

Page 62 of 72

Page 63 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 16. Fault detection charts of the Tennessee Eastman process for Fault 6.

63

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 17. Scatter plots of the Tennessee Eastman process for Fault 6. Black points represent training data, and red points represent testing data.

64

ACS Paragon Plus Environment

Page 64 of 72

Page 65 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 18. Fault detection charts of the Tennessee Eastman process for Fault 10.

65

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 19. Control boundaries of VCDD(GM) and VCDD(KDEM), and the scatter plot for Fault 10.

66

ACS Paragon Plus Environment

Page 66 of 72

Page 67 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 20. Control boundaries of VCDD(GM), CDS(GM) in MDS, VCDD(KDEM), CDS(KDEM) in MDS, and the scatter plots for Fault 10.

67

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 21. Control boundaries of CDS(KDEM) and CMS in MDS, and the scatter plots for Fault 10.

68

ACS Paragon Plus Environment

Page 68 of 72

Page 69 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Figure 22. Variable contribution plots of the Tennessee Eastman process for Fault 10.

69

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 23. State charts of pair variables for Fault 10.

70

ACS Paragon Plus Environment

Page 70 of 72

Page 71 of 72

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Table of Contents Graphic

71

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

72

ACS Paragon Plus Environment

Page 72 of 72