Multiblock Independent Component Analysis Integrated with Hellinger


Article

Multiblock Independent Component Analysis integrated with Hellinger Distance and Bayesian Inference for Non-Gaussian Plant-Wide Process Monitoring Qingchao Jiang, Bei Wang, and Xuefeng Yan Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/ie403540b • Publication Date (Web): 12 Feb 2015 Downloaded from http://pubs.acs.org on February 18, 2015


Industrial & Engineering Chemistry Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Multiblock Independent Component Analysis integrated with Hellinger Distance and Bayesian Inference for Non-Gaussian Plant-Wide Process Monitoring

Qingchao Jiang&, Bei Wang&, and Xuefeng Yan*

(Key Laboratory of Advanced Control and Optimization for Chemical Processes of Ministry of Education, East China University of Science and Technology, Shanghai 200237, P. R. China)

&: These authors contributed equally to this work. *: Corresponding author: Xuefeng Yan Email address: [email protected] Tel\Fax Number: +86-21-64251036 Address: P.O. BOX 293, MeiLong Road NO. 130, Shanghai 200237, P. R. China

Abstract

A novel multiblock plant-wide process monitoring method based on Hellinger distance (HD), Bayesian inference, and independent component analysis (ICA), termed HDBICA, is proposed in this paper. Multiblock methods are usually employed for plant-wide process monitoring; however, block division is usually based on prior process knowledge that may not always be available. This paper proposes a totally data-driven multiblock monitoring method that employs HD to divide blocks automatically. Variables with similar probability distributions are divided into the same block on the basis of HD, and sub-ICA models are built for sub-block status monitoring. Finally, the monitoring results from all blocks are combined on the basis of Bayesian inference. HDBICA is exemplified by using a numerical study and the Tennessee–Eastman benchmark process. The monitoring results indicate that the performance of HDBICA is superior to the performances of ICA, kernel ICA, and other state-of-the-art variant-based methods.

Keywords: multiblock independent component analysis, Hellinger distance, Bayesian inference, plant-wide process monitoring


1. Introduction

Process monitoring has gained increasing attention in recent decades1-6 with the increasing demand for product quality and process safety. Data-based process monitoring methods, particularly multivariate statistical process monitoring (MSPM) methods, have become very popular and have progressed quickly2, 5-9 with the development of data gathering and computing technology. Among the MSPM methods, principal component analysis (PCA) is usually considered the most fundamental technique and has been researched by many scholars6, 9-14. PCA monitoring relies on two assumptions: first, the data collected from chemical processes are Gaussian distributed; second, process variables are linearly related. These assumptions are easily violated in reality, particularly in plant-wide processes, which are usually complex and have many different operation units; therefore, PCA may not function well6, 11, 15-17.

The independent component analysis (ICA) method has been proposed16, 18, 19 to relax the Gaussian assumption in PCA monitoring. ICA decomposes observed data into linear combinations of independent components (ICs), which reveal more higher-order statistical information than PCA16, 18, 20. ICA has been extensively studied because of its effectiveness, and numerous successful applications of ICA have been reported. Li and Wang19 removed the dependencies of variables by using ICA. Kano18 investigated the feasibility of ICA monitoring and demonstrated the superiority of ICA over PCA. Lee et al.16 proposed an ICA monitoring method that separates the IC space into dominant ICs and excluded ICs for process monitoring. ICA has also been extended to dynamic ICA (DICA), multiway ICA (MICA), and kernel ICA (KICA) to solve various problems21-23. Ge and Song23, 24 recently proposed a monitoring method based on ICA-PCA to extract both Gaussian and non-Gaussian information23, 24. Rashid and Yu25, 26 integrated multidimensional mutual information with ICA to evaluate the dependency between the ICs of monitored and normal benchmark data sets for monitoring performance improvements25, 26. Jiang and Yan28, 29 proposed a weighting strategy in ICA monitoring to highlight useful fault information for performance improvements. Ge and Song30 introduced a performance-driven ensemble learning ICA to consider the fault information and improve process monitoring performance. In the ICA-based monitoring scheme, kernel density estimation (KDE) is usually used to determine the control limit of the monitored statistics because the statistics do not follow a Gaussian distribution16, 22, 31.

Plant-wide processes are usually characterized by multiple operation units, a large number of variables, and complex correlations; hence, plant-wide process monitoring has become a hot research topic in recent years. The multiblock method is usually used11 to reduce process complexity. MacGregor et al.32 developed multiblock methods for monitoring the variations in each sub-block and the entire process. Westerhuis et al.33 compared several multiblock and hierarchical PCA/partial least squares (PLS) methods from an algorithmic viewpoint. Qin et al.34 further analyzed multiblock methods and defined block and variable contributions to the $T^2$ and $Q$ statistics to decentralize monitoring by using multiblock PCA. Choi and Lee35 developed block or variable contributions to the statistics in multiblock PLS monitoring. Kohonen et al.36 demonstrated the efficiency of the multiblock PLS method in relation to normal PLS. Zhang and Ma37 proposed a multiblock kernel ICA method that considers non-Gaussianity for fault detection and diagnosis. Ge and Song25 proposed two-level multiblock ICA and PCA methods for monitoring performance improvements. Studies have also reported that not all measured variables are equally important for process monitoring38, especially in a multiblock process with a large number of variables.


Therefore, several key variable selection-based methods have been developed38-41. These variable selection methods can effectively reduce process complexity and significantly improve monitoring accuracy; however, fault datasets are usually needed for the selection.

Block division is an initial and key step in multiblock methods, and the abovementioned methods usually divide blocks under the assumption that some process knowledge is known11. However, in a complex plant-wide process, process knowledge is not always available, or only a part of it is available. The block division step should be conducted automatically in this case, and totally data-driven methods are of particular interest11. Ge and Song11 recently proposed a distributed PCA (DPCA) for plant-wide process monitoring. The DPCA constructs sub-blocks according to the different directions of the PCA principal components, so that the original feature space can be divided into several sub-feature spaces automatically. Tong et al. proposed a four-subspace construction and Bayesian inference (FSCB) method, which divides the monitoring space into four subspaces to improve the process monitoring performance42. This subspace division is based on PCA and is also a totally data-based division method. Ge et al.15 proposed a method that uses linear subspace PCA and Bayesian inference (BSPCA) for nonlinear process monitoring, in which the PCA decomposition is employed to construct linear subspaces. These totally data-driven methods have significantly promoted the development of multiblock monitoring methods; however, some issues still need to be discussed. First, PCA can model linear correlations between variables but performs poorly for nonlinear relations43, 44. Second, PCA considers only the mean and variance information and ignores the probability distribution information43, 44. These shortcomings affect the dividing performance and limit the practical application of PCA.

A novel multiblock plant-wide process monitoring method that uses Hellinger distance (HD), Bayesian inference, and ICA (HDBICA) is proposed in this paper. In a plant-wide process, the number of monitored variables is large, and the variables follow different distributions, for example, Gaussian, uniform, or different orders of them (as shown in the numerical example in the Case study section). If two measured variables are distributed very differently, they can hardly be linearly correlated or generated by a linear combination of the same source signals. Therefore, dividing the variables with similar distributions into the same block can be an effective approach to reduce the nonlinearity. HD, which was introduced by Ernst Hellinger45-47, is a statistical measure that can effectively quantify the similarity between two probability distributions. After HD division, variables with similar distributions are placed in the same block. Variables in each sub-block are likely to be within the same order because they have similar distributions; hence, these variables can be considered linear combinations of some latent variables. Each sub-block is then approximately linear, and applying the ICA algorithm to extract latent components and model the process is appropriate. The advantages of the proposed HDBICA method can be summarized as follows: (i) unlike traditional multiblock methods, HDBICA does not need prior process knowledge and can divide blocks automatically; (ii) unlike the PCA-based dividing method, HDBICA considers the probability distributions of variables, so the use of ICA in extracting latent variables is appropriate; (iii) the Gaussian distribution assumption is relaxed because the ICA method is employed. After a fault has been detected, fault diagnosis is the next issue for process monitoring. The most widely used approach in PCA and ICA models to date involves the use of contribution plots6, 16, 29. For the proposed method, similar contribution plots are presented for fault diagnosis.
The remainder of this paper is organized as follows. Section 2 briefly reviews the conventional ICA-based process monitoring technique and the HD for probability distribution evaluation. Section 3 presents the details of the proposed HDBICA process monitoring method. Section 4 illustrates the performance of the HDBICA method by using both a numerical process and the Tennessee–Eastman (TE) chemical process; a comparison with state-of-the-art variant-based methods is also provided. Finally, Section 5 concludes the paper.

2. Preliminaries

This section presents a brief review of ICA and HD, which provides the background needed to introduce the proposed method.

2.1 Independent component analysis

ICA is a statistical “latent variable” model that reveals the hidden ICs of random variables, measurements, or signals. Consider a data matrix $X \in R^{l \times n}$, where $l$ and $n$ denote the numbers of measured variables and measurements, respectively. The $l$ measured variables can be expressed as linear combinations of $r$ unknown ICs $[s_1, s_2, \ldots, s_r]^T$ ($r \le l$), which form the matrix $S \in R^{r \times n}$. The relationship between them can be expressed as follows16, 20:

$$X = AS + E, \tag{1}$$

where $A \in R^{l \times r}$ is the mixing matrix and $E \in R^{l \times n}$ is the residual matrix. To estimate the IC matrix $S$ and the mixing matrix $A$, a separating matrix $W$ is needed to reconstruct a data matrix $\hat{S}$ whose statistical independence is maximized in terms of negentropy, i.e., the components in $\hat{S}$ are as independent of each other as possible16, 20.


The reconstructed matrix is given by

$$\hat{S} = WX. \tag{2}$$

For process monitoring purposes, two statistics, namely, $I^2$ and $SPE$, are employed to monitor the IC subspace and the residual subspace, respectively16, 29:

$$I^2(i) = \hat{s}(i)^T \hat{s}(i), \tag{3}$$

$$SPE(i) = e(i)^T e(i), \tag{4}$$

where $\hat{s}(i) = W x(i)$ and $e(i) = x(i) - \hat{x}(i)$. The prediction $\hat{x}(i)$ is defined as follows:

$$\hat{x}(i) = Q B^T W x(i). \tag{5}$$

In Eq. (5), $Q$ is the whitening matrix, $B$ is an orthogonal matrix, and $W = B^T Q$ is the separating matrix, which can be obtained by the FastICA algorithm16, 20. The confidence limits of the $I^2$ and $SPE$ statistics are determined by KDE for process monitoring16, 29.

2.2 Hellinger distance

HD is a type of f-divergence that quantifies the similarity between two probability distributions. HD was initially introduced by Ernst Hellinger45, 46 and is defined in terms of the Hellinger integral. Let $P$ and $Q$ denote two probability measures that are absolutely continuous with respect to a third probability measure $\lambda$. The square of the HD between $P$ and $Q$ can then be expressed as follows45, 46:

$$H^2(P, Q) = \frac{1}{2} \int \left( \sqrt{\frac{dP}{d\lambda}} - \sqrt{\frac{dQ}{d\lambda}} \right)^2 d\lambda, \tag{6}$$

where $dP/d\lambda$ and $dQ/d\lambda$ are the Radon–Nikodym derivatives of $P$ and $Q$, respectively.

The HD between two discrete probability distributions, namely, $P = (p_1, p_2, \ldots, p_k)$ and $Q = (q_1, q_2, \ldots, q_k)$, can be described as follows45, 46:

$$H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{k} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2}, \tag{7}$$

which can be considered as the Euclidean norm of the difference of the square root vectors:

$$H(P, Q) = \frac{1}{\sqrt{2}} \left\| \sqrt{P} - \sqrt{Q} \right\|_2, \tag{8}$$

where $k$ is the number of elements selected to describe each probability distribution. The probability distributions of the variables in this study are obtained by KDE, which is a non-parametric estimation method. Given a sufficient number of training samples, the density and probability distribution of each variable can be estimated by KDE. The value of $k$ can influence the HD value: an excessively high value will lead to an artificial HD value, whereas a very small value will mask the key differences between $P$ and $Q$. However, only a relative evaluation of the HD between variables is required; therefore, $k$ has little effect on the block division result. By default, the Matlab toolbox generates 100 points to represent the probability distribution, so $k = 100$ is used in the current study. The value of HD ranges between zero and one. A small HD value represents high similarity between the two variables, and the value is equal to zero only if $P = Q$. HD is employed in this study to measure the relationship between each pair of variables by regarding one variable as $P$ and the other as $Q$; this approach takes the probability distribution into account when quantifying the relationship between two variables.
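As an illustration of Eqs. (7) and (8), the following minimal Python sketch estimates the density of two variables with a Gaussian KDE on a shared grid of k = 100 points and then evaluates the discrete HD. It is not the authors' Matlab implementation: the function names, the rule-of-thumb bandwidth, the shared evaluation grid, and the normalization of the grid values to sum to one are all assumptions made for this example.

```python
import math
import random


def gaussian_kde_on_grid(samples, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` at each grid point."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    h = 1.06 * std * n ** (-1 / 5)  # rule-of-thumb bandwidth (assumed choice)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - s) / h) ** 2) for s in samples)
            for g in grid]


def hellinger_distance(p, q):
    """Discrete HD of Eqs. (7)-(8); p and q are first normalized to sum to one."""
    sp, sq = sum(p), sum(q)
    total = sum((math.sqrt(a / sp) - math.sqrt(b / sq)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


def hd_between_variables(x, y, k=100):
    """HD between two variables whose densities are evaluated on a shared k-point grid."""
    lo, hi = min(min(x), min(y)), max(max(x), max(y))
    grid = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    return hellinger_distance(gaussian_kde_on_grid(x, grid),
                              gaussian_kde_on_grid(y, grid))


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
uniform_a = [rng.uniform(-3, 3) for _ in range(500)]
uniform_b = [rng.uniform(-3, 3) for _ in range(500)]
gaussian = [rng.gauss(0, 1) for _ in range(500)]

hd_similar = hd_between_variables(uniform_a, uniform_b)
hd_different = hd_between_variables(uniform_a, gaussian)
print(f"uniform vs uniform:  {hd_similar:.3f}")
print(f"uniform vs Gaussian: {hd_different:.3f}")
```

In this sketch, two variables drawn from the same uniform distribution yield a smaller HD than a uniform and a Gaussian variable, which is the property the block division relies on.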

3. HDBICA process monitoring scheme

The detailed description of the proposed method is provided in this section.

3.1 Division of subspaces

In a plant-wide process, massive amounts of data are collected and saved for fault detection and analysis. Traditional ICA can handle a moderate amount of data but can hardly handle large amounts of data. The multiblock method addresses this issue by dividing the original variables into several subspaces to reduce variable dimensionality. The division principle, which is based on the statistical measure HD in this paper, then becomes the key point of the multiblock method. In the current study, the blocks are obtained through the HD method, and variables with similar distributions are divided into the same block. Variables with similar distributions are likely to be within the same order, and these variables can be considered as linear combinations of some latent variables. This block division can (i) reduce the process complexity and make the local blocks more linear for monitoring and (ii) explore more local behaviors for monitoring. Therefore, a division that considers the variable distributions is reasonable and will benefit the subsequent monitoring.

Suppose that the data matrix $X = [x_1, \ldots, x_l]^T \in R^{l \times n}$ is collected from the plant-wide process. The probability density functions $p(x_i)$ and $p(x_j)$ should be identified first to calculate the HD between variables $x_i$ and $x_j$. The HD value $H(p(x_i), p(x_j))$ between $x_i$ and $x_j$ is then obtained according to Eq. (8). The HD values between every pair of variables should be calculated to divide the variables accurately. For each variable, the HD values with the other $l - 1$ variables are calculated, and the HD value with itself equals zero. Thus, an HD matrix $H \in R^{l \times l}$ that reflects the similarity between every two variables is constructed. Variables with small HD values should be assigned to the same block because a small HD value indicates high similarity. A threshold value $\alpha$ can be set empirically for the division. The selection of $\alpha$ plays an important role in the division process. An overly small $\alpha$ leads to a large number of sub-blocks, whereas an overly large $\alpha$ can hardly divide the variables.


The optimal choice of $\alpha$ should take the following factors into consideration: the number of sub-blocks, the level of HD difference between the variables, and the number of original variables. When the HD between two variables stays below $\alpha$, the two variables should be placed into the same block. More than one variable can be similar to another variable at the same time, and HD values above $\alpha$ may also exist for some variables. A variable should then follow the variable with which it has the smallest HD value into the same block. On the basis of this criterion, the variables are divided as follows:

$$X = \left[ X_1^T \;\; X_2^T \;\; \cdots \;\; X_B^T \right]^T, \tag{9}$$

where $B$ represents the number of sub-blocks. Each sub-block can be expressed as $X_b \in R^{l_b \times n}$ ($b = 1, 2, \ldots, B$), where $l_b$ is the number of variables in the $b$th sub-block. An ICA model is then built in each subspace according to the following equation:

$$X_b = A_b S_b + E_b. \tag{10}$$

The corresponding statistics $I^2$ and $SPE$ in each sub-block, as well as their confidence limits, are then developed. Once the statistics of any sub-block exceed the corresponding confidence limits, a fault is indicated.

3.2 Bayesian inference

Because different $I^2$ and $SPE$ statistics are generated from each sub-block, observing all the statistics continuously is impractical, especially when a large-scale system involves many variables and the number of sub-blocks is high. Therefore, a fusion method is required to combine the statistics into a final result for direct monitoring. In this study, Bayesian inference15, 48 is adopted to combine the results in a probabilistic manner. For the monitored sample $x$, the first step is to calculate the fault probability of the $I^2$ statistic in each sub-block as follows15:

$$P_{I^2}(F \mid x_b) = \frac{P_{I^2}(x_b \mid F)\, P_{I^2}(F)}{P_{I^2}(x_b)}, \tag{11}$$

$$P_{I^2}(x_b) = P_{I^2}(x_b \mid N)\, P_{I^2}(N) + P_{I^2}(x_b \mid F)\, P_{I^2}(F), \tag{12}$$

where $x_b$ represents the data vector of the $b$th sub-block; “$N$” and “$F$” denote the normal and abnormal conditions, respectively; and $P_{I^2}(N)$ and $P_{I^2}(F)$ are the corresponding prior probabilities, which are set as $1 - \beta$ and $\beta$, respectively, where $1 - \beta$ is the confidence level and is advised to be between 0.975 and 0.995. The conditional probabilities $P_{I^2}(x_b \mid N)$ and $P_{I^2}(x_b \mid F)$ in Eq. (12) can be expressed as follows:

$$P_{I^2}(x_b \mid N) = \exp\left\{ -\frac{I_b^2(x_b)}{v\, I_{b,\lim}^2} \right\}, \tag{13}$$

$$P_{I^2}(x_b \mid F) = \exp\left\{ -\frac{I_{b,\lim}^2}{v\, I_b^2(x_b)} \right\}, \tag{14}$$

where $I_b^2(x_b)$ is the $I^2$ statistic of sub-block $b$ and $I_{b,\lim}^2$ is the corresponding confidence limit. The final monitoring statistic $BIC_{I^2}(x)$ can then be calculated by the following formula:

$$BIC_{I^2}(x) = \sum_{b=1}^{B} \left\{ \frac{P_{I^2}(x_b \mid F)\, P_{I^2}(F \mid x_b)}{\sum_{k=1}^{B} P_{I^2}(x_k \mid F)} \right\}. \tag{15}$$

Similarly, the fault probability $P_{SPE}(F \mid x_b)$ and the conditional probabilities $P_{SPE}(x_b \mid N)$ and $P_{SPE}(x_b \mid F)$ of the $SPE$ statistic can be calculated according to Eqs. (11) to (14) by replacing the $I^2$ statistic with the corresponding $SPE$ statistic. The final statistic $BIC_{SPE}(x)$ can be written as follows:

$$BIC_{SPE}(x) = \sum_{b=1}^{B} \left\{ \frac{P_{SPE}(x_b \mid F)\, P_{SPE}(F \mid x_b)}{\sum_{k=1}^{B} P_{SPE}(x_k \mid F)} \right\}. \tag{16}$$
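The fusion in Eqs. (11) to (16) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names and the example statistics and limits are hypothetical, and the factor v in Eqs. (13) and (14) is not defined in this section, so it is exposed as a parameter with an assumed default of 1. The same functions apply to either the I2 or the SPE statistic.

```python
import math


def fault_probability(stat, limit, beta=0.025, v=1.0):
    """Return (P(F | x_b), P(x_b | F)) for one sub-block statistic, Eqs. (11)-(14)."""
    stat = max(stat, 1e-12)  # guard against division by zero (assumes stat >= 0)
    p_x_given_n = math.exp(-stat / (v * limit))  # Eq. (13)
    p_x_given_f = math.exp(-limit / (v * stat))  # Eq. (14)
    # Eq. (12): total probability with priors P(N) = 1 - beta and P(F) = beta.
    p_x = p_x_given_n * (1.0 - beta) + p_x_given_f * beta
    return p_x_given_f * beta / p_x, p_x_given_f  # Eq. (11)


def bic(stats, limits, beta=0.025, v=1.0):
    """Fuse the statistics of all sub-blocks into one index, Eqs. (15)-(16)."""
    pairs = [fault_probability(s, lim, beta, v) for s, lim in zip(stats, limits)]
    weight_sum = sum(p_xf for _, p_xf in pairs)
    return sum(p_xf * p_f for p_f, p_xf in pairs) / weight_sum


# Hypothetical example with three sub-blocks and a confidence limit of 10 in each.
limits = [10.0, 10.0, 10.0]
beta = 0.025
bic_normal = bic([2.0, 3.0, 2.5], limits, beta)   # all statistics well below the limits
bic_faulty = bic([2.0, 30.0, 2.5], limits, beta)  # the second sub-block exceeds its limit
print(f"normal sample: {bic_normal:.4f}, flagged: {bic_normal > beta}")
print(f"faulty sample: {bic_faulty:.4f}, flagged: {bic_faulty > beta}")
```

One property visible in this sketch is that a statistic exactly equal to its confidence limit gives a posterior fault probability of β, because the two conditional probabilities in Eqs. (13) and (14) coincide there.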


To monitor the process, the confidence limit for $BIC_{I^2}(x)$ and $BIC_{SPE}(x)$ is set as $\beta$. If the value of $BIC_{I^2}(x)$ or $BIC_{SPE}(x)$ exceeds $\beta$, a fault is indicated; otherwise, the process is considered to be in normal condition.

3.3 Fault detection based on Hellinger distance and Bayesian inference

The flow chart of the proposed method is presented in Figure 1, which also shows the implementation procedures.

Figure 1. Flow chart of HDBICA strategy for process monitoring.

The specific steps are summarized as follows:

Step 1: Collect the training data X from the normal plant-wide process and mean-variance normalize the data;

Step 2: Calculate the HD values between each pair of variables by using Eq. (8) and establish the HD matrix H;

Step 3: Specify a threshold value $\alpha$;

Step 4: Divide the variables on the basis of the values in the HD matrix and the threshold value $\alpha$;

Step 5: Build ICA models in each sub-block and estimate the confidence limits of the corresponding statistics;


Step 6: Divide the variables of the monitored sample according to the result of Step 4 and calculate the statistics $I^2$ and $SPE$ in each sub-block;

Step 7: Specify the significance level $\beta$ and combine the results on the basis of Bayesian inference;

Step 8: Develop the new monitoring statistics $BIC_{I^2}(x)$ and $BIC_{SPE}(x)$. A fault is detected in the monitored sample when $BIC_{I^2}(x) > \beta$ or $BIC_{SPE}(x) > \beta$.
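Steps 2 to 4 can be sketched as follows. The paper describes the division rule only in words, so this Python sketch adopts one possible reading as an assumption: any two variables whose HD is below α are linked, and each group of linked variables forms a block, while a variable with no HD value below α stays in a block of its own. The function name and the toy HD matrix are hypothetical.

```python
def divide_blocks(hd_matrix, alpha):
    """Group variable indices whose pairwise HD is below alpha (connected components)."""
    n = len(hd_matrix)
    parent = list(range(n))  # union-find forest, one tree per block

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    # Link every pair of distinct variables that is closer than the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if hd_matrix[i][j] < alpha:
                parent[find(i)] = find(j)

    blocks = {}
    for i in range(n):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())


# Toy HD matrix for six variables (0-based indices): two tight groups and one outlier.
hd = [
    [0.00, 0.01, 0.02, 0.20, 0.22, 0.30],
    [0.01, 0.00, 0.02, 0.21, 0.20, 0.31],
    [0.02, 0.02, 0.00, 0.19, 0.21, 0.29],
    [0.20, 0.21, 0.19, 0.00, 0.01, 0.25],
    [0.22, 0.20, 0.21, 0.01, 0.00, 0.26],
    [0.30, 0.31, 0.29, 0.25, 0.26, 0.00],
]
print(divide_blocks(hd, alpha=0.03))  # [[0, 1, 2], [3, 4], [5]]
```

The sketch also reproduces the qualitative behavior described for α: a very small threshold leaves every variable in its own block, and a very large one merges all variables into a single block.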

3.4 Contribution plots for fault diagnosis

Faults can be detected efficiently by observing the variation of the new monitoring statistics. The next issue is correctly diagnosing the root cause of the fault. The widely used contribution plot approach is employed here to identify the variables responsible for the fault6, 16, 29. The contribution rates in each sub-block reflect different variations during fault diagnosis. When a fault occurs, the sub-blocks fall into two classes: normal blocks and abnormal blocks. The root causes of the fault are expected to lie in the abnormal blocks; thus, the contribution rates in the faulty sub-blocks are combined into the final contribution plot. In this manner, fault diagnosis is simplified and the set of possibly responsible variables is narrowed, which greatly benefits the final determination of the root causes.

4. Case study

This section provides two case studies: a simple numerical example and the TE benchmark process. The two case studies evaluate the performance of the proposed HDBICA method. The ICA, KICA, and other state-of-the-art variant-based methods are also employed for comparison.

4.1 Numerical example


A 15-variable non-Gaussian system is constructed to introduce the proposed method as follows:

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 1.65 & 1.43 & 1.6 \\ 1.56 & 1.21 & 1.4 \\ 1.72 & 2.4 & 1.8 \\ 1.68 & 1.3 & 1.7 \\ 1.59 & 2.2 & 1.75 \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \\ s_3 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \end{bmatrix}$$

$$\begin{bmatrix} x_6 \\ x_7 \\ x_8 \\ x_9 \\ x_{10} \end{bmatrix} = \begin{bmatrix} 1.25 & 1.35 & 1.83 \\ 1.47 & 1.92 & 1.36 \\ 1.54 & 1.46 & 1.38 \\ 1.84 & 2.25 & 1.42 \\ 1.38 & 1.62 & 1.15 \end{bmatrix} \begin{bmatrix} s_4^2 \\ s_5^2 \\ s_6^2 \end{bmatrix} + \begin{bmatrix} e_6 \\ e_7 \\ e_8 \\ e_9 \\ e_{10} \end{bmatrix}$$

$$\begin{bmatrix} x_{11} \\ x_{12} \\ x_{13} \\ x_{14} \\ x_{15} \end{bmatrix} = \begin{bmatrix} 1.15 & 1.25 & 1.35 \\ 1.45 & 1.55 & 1.25 \\ 1.62 & 1.47 & 1.53 \\ 1.74 & 1.69 & 1.46 \\ 1.32 & 1.42 & 1.56 \end{bmatrix} \begin{bmatrix} s_1^3 \\ s_2^3 \\ s_3^3 \end{bmatrix} + \begin{bmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{14} \\ e_{15} \end{bmatrix}, \tag{17}$$

where $[s_1 \; s_2 \; s_3]^T \sim U(-3, 3)$; $[s_4 \; s_5 \; s_6]^T \sim N(0, 1)$ follows a normal distribution with a 0 mean and 0.1 standard variance; and $[e_1, \ldots, e_{10}]^T$ and $[e_{11}, \ldots, e_{15}]^T$ are zero-mean normally distributed noise with standard variances of 0.05 and 0.1, respectively. Data vectors $x = [x_1, x_2, x_3, \ldots, x_{15}]$ consisting of 15 variables and 500 samples are generated under normal conditions for building the model.

The next step after collecting the training data is to divide the blocks by using HD. The HD values between every two variables are displayed in Figure 2. The threshold value $\alpha$ is set as 0.03. Figure 2 shows that the first five variables have small HD values among themselves, which indicates that the probability distributions of these variables are similar. These five variables are divided into the same block according to the rule introduced in the last section. Variables 6 to 10 and variables 11 to 15 are similarly divided into their own blocks, so that three blocks are constructed. The division result agrees with the actual structure of the system, and the division is therefore reasonable.
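The training data of Eq. (17) can be generated with a short script. The following Python sketch is an illustration rather than the authors' code: the helper names and the fixed random seed are hypothetical, the sources s4 to s6 are drawn from N(0, 1) as the notation states (the text also mentions a standard variance of 0.1 for them, which is not used here), and the "standard variance" of the noise is read as a standard deviation.

```python
import random

# Mixing matrices transcribed from Eq. (17).
A1 = [[1.65, 1.43, 1.60], [1.56, 1.21, 1.40], [1.72, 2.40, 1.80],
      [1.68, 1.30, 1.70], [1.59, 2.20, 1.75]]
A2 = [[1.25, 1.35, 1.83], [1.47, 1.92, 1.36], [1.54, 1.46, 1.38],
      [1.84, 2.25, 1.42], [1.38, 1.62, 1.15]]
A3 = [[1.15, 1.25, 1.35], [1.45, 1.55, 1.25], [1.62, 1.47, 1.53],
      [1.74, 1.69, 1.46], [1.32, 1.42, 1.56]]


def mix(matrix, sources, noise):
    """Return matrix @ sources + noise for one block of variables."""
    return [sum(a * s for a, s in zip(row, sources)) + e
            for row, e in zip(matrix, noise)]


def generate_sample(rng):
    """Draw one 15-variable sample under normal operating conditions."""
    s_uniform = [rng.uniform(-3.0, 3.0) for _ in range(3)]  # s1..s3
    s_normal = [rng.gauss(0.0, 1.0) for _ in range(3)]      # s4..s6, assumed N(0, 1)
    e_first = [rng.gauss(0.0, 0.05) for _ in range(10)]     # e1..e10, assumed std 0.05
    e_last = [rng.gauss(0.0, 0.1) for _ in range(5)]        # e11..e15, assumed std 0.1
    block1 = mix(A1, s_uniform, e_first[:5])                      # x1..x5
    block2 = mix(A2, [s ** 2 for s in s_normal], e_first[5:])     # x6..x10
    block3 = mix(A3, [s ** 3 for s in s_uniform], e_last)         # x11..x15
    return block1 + block2 + block3


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
training_data = [generate_sample(rng) for _ in range(500)]
print(len(training_data), len(training_data[0]))  # 500 15
```

Each row of `training_data` is one sample; columns 0 to 4, 5 to 9, and 10 to 14 correspond to the three groups of variables that the HD-based division is expected to recover.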


Figure 2. HD values between each variable.

After the blocks are divided, an ICA monitoring model is established in each sub-block, and a global ICA model is established for the entire process. The number of ICs retained in the dominant subspace is set to 3 in each sub-block and to 9 in the global ICA model. The confidence level $1 - \beta$ is set to 97.5%. One normal dataset and two fault cases are generated for testing purposes. All the datasets consist of 500 samples, and the details of the two fault cases are as follows:

Case 1: A step change of 0.2 is added to $x_1$ from Sample 151 to the end.

Case 2: A ramp change is introduced from Sample 151 to Sample 350 by adding $0.01(i - 150)$ to $x_6$, where $i$ is the sample number.

The monitoring results of the normal dataset, Case 1, and Case 2, together with some comparisons, are presented in Figures 3, 4, and 5, respectively. Figures 3(a) and 3(b) show the monitoring results of the normal dataset by ICA and HDBICA, respectively. Both ICA and HDBICA monitor this case well: the false alarm rates of both methods are low and can be neglected in practical applications. The monitoring results in each sub-block are also presented, and their false alarms can likewise be neglected.
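The two fault cases defined above can be reproduced on any normal test dataset with two small helpers. The Python sketch below is illustrative only: the function names are hypothetical, sample numbers are treated as 1-based as in the text, and an all-zero placeholder stands in for the normal dataset so that the injected offsets are easy to inspect.

```python
def inject_step(data, var, magnitude, first_sample):
    """Add a constant offset to column `var` from `first_sample` (1-based) to the end."""
    faulty = [row[:] for row in data]  # copy so the normal data stay untouched
    for i in range(first_sample, len(faulty) + 1):
        faulty[i - 1][var] += magnitude
    return faulty


def inject_ramp(data, var, slope, first_sample, last_sample):
    """Add slope * (i - (first_sample - 1)) to column `var` for samples first..last."""
    faulty = [row[:] for row in data]
    for i in range(first_sample, last_sample + 1):
        faulty[i - 1][var] += slope * (i - (first_sample - 1))
    return faulty


# Placeholder for a normal test dataset with 500 samples and 15 variables.
normal = [[0.0] * 15 for _ in range(500)]

# Case 1: step of 0.2 on x1 (column 0) from Sample 151 to the end.
case1 = inject_step(normal, var=0, magnitude=0.2, first_sample=151)

# Case 2: ramp 0.01 * (i - 150) on x6 (column 5) from Sample 151 to Sample 350.
case2 = inject_ramp(normal, var=5, slope=0.01, first_sample=151, last_sample=350)

print(case1[149][0], case1[150][0])
print(case2[150][5], case2[349][5], case2[350][5])
```

With real data, `normal` would be replaced by a freshly generated normal dataset, and the faulty copies would be passed to the monitoring models.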

Figure 3. Monitoring results of the normal condition of the simple process: (a) ICA; (b) HDBICA; (c) sub-ICA 1; (d) sub-ICA 2; (e) sub-ICA 3.

The fault in Case 1 (Fault 1) is a local fault that affects only variable $x_1$. The monitoring results of this fault by ICA and HDBICA are presented in Figures 4(a) and 4(b), respectively. Figure 4(a) shows that the fault cannot be detected by the conventional ICA method because the local behavior is not reflected in the global ICA model. Figure 4(b) shows that the fault is detected by the HDBICA method because the local behavior in block 1 is well captured by sub-ICA model 1 and is carried into the combined multiblock monitoring result. The monitoring results in sub-block 1 are presented in Figure 4(c), which shows that the fault is successfully detected by the sub-ICA model.

Figure 4. Monitoring results of Fault 1 in the simple process: (a) ICA; (b) HDBICA; (c) sub-ICA 1.

The monitoring results of the fault in Case 2 (Fault 2) by ICA and HDBICA are presented in Figures 5(a) and 5(b), respectively. Figure 5(a) shows that the fault can be detected by the two statistics in ICA; however, the detection delay and the non-detection rate are large. The fault is detected only after the 250th sample, a delay of about 100 samples relative to its introduction at Sample 151. Figure 5(b) shows that the fault is successfully detected by the SPE statistic in the HDBICA method. Both the detection delay and the non-detection rate are significantly reduced by HDBICA compared with the conventional ICA method. The monitoring results in sub-block 2 are presented in Figure 5(c), which shows that the fault is successfully indicated by the sub-ICA model.
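The detection delay and non-detection rate discussed above are simple functions of the alarm sequence. The sketch below shows one way to compute them, assuming a single statistic with a fixed control limit and a known fault window; the function name and the synthetic statistic values are illustrative and are not taken from the paper's results.

```python
def detection_metrics(stat, limit, fault_start, fault_end=None):
    """Summarise alarms for one monitoring statistic.

    stat        : statistic values, stat[0] is Sample 1
    limit       : control limit; a sample raises an alarm when stat > limit
    fault_start : 1-based sample at which the fault is introduced
    fault_end   : 1-based last faulty sample (defaults to the last sample)
    """
    fault_end = len(stat) if fault_end is None else fault_end
    alarms = [s > limit for s in stat]
    normal = alarms[:fault_start - 1]
    faulty = alarms[fault_start - 1:fault_end]
    first = next((k for k, a in enumerate(faulty) if a), None)
    return {
        "false_alarm_rate": sum(normal) / len(normal) if normal else 0.0,
        "non_detection_rate": 1.0 - sum(faulty) / len(faulty),
        "detection_delay": first,  # samples between fault onset and first alarm
    }


# Synthetic statistic: below the limit until Sample 250, above it for Samples 251-350.
stat = [0.5] * 250 + [2.0] * 100 + [0.5] * 150
print(detection_metrics(stat, limit=1.0, fault_start=151, fault_end=350))
# {'false_alarm_rate': 0.0, 'non_detection_rate': 0.5, 'detection_delay': 100}
```

With these synthetic values the first alarm is raised at Sample 251, that is, 100 samples after the fault onset at Sample 151, and half of the 200 faulty samples are missed.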

Figure 5. Monitoring results of Fault 2 in the simple process: (a) ICA; (b) HDBICA; (c) sub-ICA 2.

4.2 TE benchmark process

The TE benchmark process was created by Downs and Vogel⁴⁹ as a realistic industrial process. Various monitoring approaches have been applied to this process to test their performance. As shown in Supporting Information Figure 1, the system is composed of five major operations: a reactor, a condenser, a compressor, a separator, and a stripper. A total of 41 measured variables and 12 manipulated variables exist in this process; among these, 33 variables (Supporting Information Table 1) are selected for process monitoring according to Ge et al.²⁴ A total of 21 pre-programmed faults (Supporting Information Table 2) are simulated for testing purposes, and 960 samples are collected for each fault.⁶ All faults are introduced from Sample 161. One normal dataset without any fault is also generated with 700 samples. Further details on the TE process can be found in Chiang et al.,⁶ and the simulation code can be downloaded from http://brahms.scs.uiuc.edu.

The variables are divided according to the HD value, which measures the similarity between variables. Variables that are grouped together have similar statistical characteristics, which can improve the efficiency of fault detection. The detailed division results are given in Supporting Information Table 3.

First, the dataset collected under normal conditions (Fault 0) is analyzed to test the false alarm performance of the proposed method. The monitoring results for Fault 0 by ICA and HDBICA are presented in Figures 6(a) and 6(b), respectively. Both monitoring results indicate that the TE process is operating under normal conditions. The false alarm rates are low and can be neglected in practical applications. Introducing the multiblock monitoring strategy therefore does not degrade the monitoring performance.

Figure 6. Process monitoring results under normal condition: (a) ICA; (b) HDBICA.

The non-detection rates of all 21 faults by ICA,¹⁶ KICA,²¹ FSCB,⁴² BSPCA,¹⁵ DPCA,¹¹ and HDBICA are tabulated in Supporting Information Table 4. The proposed HDBICA method clearly outperforms the other five methods, with marked improvements for the faults that are shaded in the table. Faults 1, 2, 5, 6, 7, 8, 12, 13, 14, and 18 are easy to detect and are therefore excluded from the discussion. Faults 3, 9, and 15 are small process faults that are normally hidden and cannot be detected. However, the proposed HDBICA can detect Fault 3, which is selected for further analysis below. The conventional ICA method also has difficulty in detecting Faults 4 and 19, so these two faults are analyzed as well to demonstrate the efficiency of the proposed method.

Fault 3 is a step change in the D feed temperature (Stream 2), a small process fault that cannot be detected by most approaches.⁶ However, the proposed HDBICA method can partly detect this fault because dividing the variables into blocks reveals the hidden fault. The monitoring result of the traditional ICA method is shown in Figure 7(a), in which the fault is not detected. The HDBICA monitoring result is shown in Figure 7(b), in which the fault is detected after the 190th sample. The monitoring result of the fault sub-block is presented in Figure 7(c) to explain the HDBICA result further. The monitoring statistics fluctuate significantly and show two obvious step changes soon after Sample 161. Thus, the variables in sub-block 5 have a strong relationship with Fault 3. To determine the root causes, the proposed contribution plot is shown in Figure 7(d), which reveals that Variables 18, 19, and 31 are the most likely responsible variables. The change in the D feed temperature, combined with the behavior of the TE process, may lead to variations in the stripper temperature, stripper steam flow, and stripper steam valve, so the result of the fault analysis is plausible.

Figure 7. Process monitoring results of Fault 3: (a) ICA; (b) HDBICA; (c) sub-ICA 5; (d) contribution plot.

Fault 4 is a step change in the reactor cooling water inlet temperature.⁶ The monitoring statistics of ICA and HDBICA are shown in Figures 8(a) and 8(b), respectively, which show that the $\mathrm{BIC}_{I^2}$ statistic of HDBICA mostly stays above the confidence limit, whereas the $I^2$ and SPE statistics of ICA do not. HDBICA detects Fault 4 with a 0% missed detection rate (Supporting Information Table 4), which is superior to the conventional ICA method. The underlying cause of Fault 4 can be revealed by Figure 8(c), which is the monitoring result of sub-block 2. Once Fault 4 occurs, this sub-block exhibits variation, whereas the other sub-blocks still perform normally. The variables in sub-block 2 are therefore closely related to Fault 4. Figure 8(d) shows the contribution rates of the variables in sub-block 2 for fault analysis. Among the possibly responsible variables, Variable 32 has the largest contribution rate. Variable 32 is the reactor cooling water flow, and the root cause of Fault 4 is the change in the reactor cooling water inlet temperature (Supporting Information Tables 1 and 2); thus, the proposed contribution plot successfully narrows down the possibly responsible variables.

Figure 8. Process monitoring results of Fault 4: (a) ICA; (b) HDBICA; (c) sub-ICA 2; (d) contribution plot.

Finally, Fault 19, which is an unknown fault in the TE process,⁶ is analyzed. The monitoring results of ICA and HDBICA are presented in Figures 9(a) and 9(b), respectively. Figure 9(a) shows that the non-detection rate of the ICA method is large, whereas Figure 9(b) shows that HDBICA performs well and that non-detections are significantly reduced. Two fault sub-blocks show different levels of variation for this fault (Figures 9(c) and 9(d)). After the sub-block results are combined on the basis of Bayesian inference, the new monitoring statistic $\mathrm{BIC}_{I^2}$ performs better than any individual sub-block. Fault analysis is not considered here because Fault 19 is an unknown fault whose root causes are unknown.
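The way Bayesian inference fuses sub-block results into a single statistic can be sketched as follows. This is an illustration under stated assumptions rather than the paper's definition, which is given in an earlier section: it uses a formulation that is commonly used in Bayesian-inference-based multiblock monitoring, in which each block's statistic is converted to a fault probability through exponential likelihoods and the prior fault probability is β. The function names are hypothetical.

```python
import math


def block_fault_probability(stat, limit, beta):
    """Return (P(F | x_b), P(x_b | F)) for one block's statistic (I^2 or SPE)."""
    if stat <= 0.0:
        return 0.0, 0.0                      # no evidence of a fault at all
    p_x_given_n = math.exp(-stat / limit)    # assumed likelihood under normal operation
    p_x_given_f = math.exp(-limit / stat)    # assumed likelihood under faulty operation
    evidence = p_x_given_n * (1.0 - beta) + p_x_given_f * beta
    return p_x_given_f * beta / evidence, p_x_given_f


def bic(stats, limits, beta=0.025):
    """Fuse per-block statistics into one index; an alarm is raised when it exceeds beta."""
    pairs = [block_fault_probability(s, l, beta) for s, l in zip(stats, limits)]
    weight_sum = sum(lik for _, lik in pairs)
    if weight_sum == 0.0:
        return 0.0
    # Weighted average of the posteriors, weighted by each block's fault likelihood.
    return sum(lik * post for post, lik in pairs) / weight_sum


limits = [1.0, 1.0, 1.0]
print(bic([0.5, 0.5, 0.5], limits) < 0.025)   # True: all blocks normal
print(bic([0.5, 0.5, 5.0], limits) > 0.025)   # True: one block clearly faulty
```

In the second call only one of the three blocks is faulty, yet the fused index exceeds β. Under these assumptions, a statistic that sits exactly on its control limit maps to a fault probability of β, which is why β also serves as the limit of the fused index.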

Figure 9. Process monitoring results of Fault 19: (a) ICA; (b) HDBICA; (c) sub-block 4; (d) sub-block 5.

In summary, the proposed HDBICA outperforms the traditional ICA method and some recently developed multiblock monitoring methods. The efficiency of HDBICA has been demonstrated in this paper.

5. Conclusions

This paper presented a novel multiblock ICA method that divides variables on the basis of HD and combines the monitoring results by Bayesian inference. The distribution similarities among the variables are quantitatively measured for sub-block construction by the HD, a probabilistic measure that can account for nonlinear, independent, or other relations between variables. The sub-ICA models are then built in the low-dimensional subspaces, and the monitoring results are combined by Bayesian inference. When a fault is detected, the corresponding fault diagnosis is carried out with the proposed contribution plots. Two case studies demonstrated the superiority and practicability of the proposed method. This paper can serve as a basis for future work on the development of probability statistics and multiblock theory for the monitoring of plant-wide processes.

Associated Content

Supporting Information

Tables 1-4 as mentioned in the text and the control scheme for the Tennessee Eastman process. This information is available free of charge via the Internet at http://pubs.acs.org/.

Acknowledgments

The authors gratefully acknowledge the support of the following foundations: 973 project of China (2013CB733605), National Natural Science Foundation of China (21176073), and the Fundamental Research Funds for the Central Universities.

Notes

The authors declare no competing financial interest.

References

1. Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S. N., A review of process fault detection and diagnosis: Part II: Qualitative models and search strategies. Comput. Chem. Eng. 2003, 27, (3), 313-326.
2. Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S. N.; Yin, K., A review of process fault detection and diagnosis: Part III: Process history based methods. Comput. Chem. Eng. 2003, 27, (3), 327-346.
3. Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.; Kavuri, S. N., A review of process fault detection and diagnosis: Part I: Quantitative model-based methods. Comput. Chem. Eng. 2003, 27, (3), 293-311.
4. Ghosh, K.; Natarajan, S.; Srinivasan, R., Hierarchically distributed fault detection and identification through Dempster–Shafer evidence fusion. Ind. Eng. Chem. Res. 2011, 50, (15), 9249-9269.
5. Ge, Z.; Song, Z.; Gao, F., Review of recent research on data-based process monitoring. Ind. Eng. Chem. Res. 2013, 52, (10), 3534-3562.
6. Chiang, L. H.; Braatz, R. D.; Russell, E. L., Fault detection and diagnosis in industrial systems. Springer, London: 2001.
7. Qin, S. J.; Yu, J., Recent developments in multivariable controller performance monitoring. J. Process Control 2007, 17, (3), 221-227.
8. Yu, J., Nonlinear bioprocess monitoring using multiway kernel localized fisher discriminant analysis. Ind. Eng. Chem. Res. 2011, 50, (6), 3390-3402.
9. Qin, S. J., Statistical process monitoring: basics and beyond. J. Chemom. 2003, 17, (8‐9), 480-502.
10. Kourti, T.; MacGregor, J. F., Multivariate SPC methods for process and product monitoring. J. Quality Technology 1996, 28, (4).
11. Ge, Z.; Song, Z., Distributed PCA Model for Plant-Wide Process Monitoring. Ind. Eng. Chem. Res. 2013, 52, (5), 1947-1957.
12. Jiang, Q.; Yan, X., Chemical processes monitoring based on weighted principal component analysis and its application. Chemom. Intell. Lab. Syst. 2012, 119, 11-20.
13. Jiang, Q.; Yan, X., Weighted kernel principal component analysis based on probability density estimation and moving window and its application in nonlinear chemical process monitoring. Chemom. Intell. Lab. Syst. 2013, 127, 121-131.
14. Jiang, Q.; Yan, X., Just‐in‐time reorganized PCA integrated with SVDD for chemical process monitoring. AIChE J. 2014, 60, (3), 949-965.
15. Ge, Z.; Zhang, M.; Song, Z., Nonlinear process monitoring based on linear subspace and Bayesian inference. J. Process Control 2010, 20, (5), 676-688.
16. Lee, J.-M.; Yoo, C.; Lee, I.-B., Statistical process monitoring with independent component analysis. J. Process Control 2004, 14, (5), 467-485.
17. Schölkopf, B.; Smola, A.; Müller, K.-R., Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 1998, 10, (5), 1299-1319.
18. Kano, M.; Tanaka, S.; Hasebe, S.; Hashimoto, I.; Ohno, H., Monitoring independent components for fault detection. AIChE J. 2003, 49, (4), 969-976.
19. Li, R.; Wang, X., Dimension reduction of process dynamic trends using independent component analysis. Comput. Chem. Eng. 2002, 26, (3), 467-473.
20. Hyvärinen, A.; Oja, E., Independent component analysis: algorithms and applications. Neural Networks 2000, 13, (4), 411-430.
21. Lee, J. M.; Qin, S. J.; Lee, I. B., Fault Detection of Non‐Linear Processes Using Kernel Independent Component Analysis. Can. J. Chem. Eng. 2007, 85, (4), 526-536.
22. Lee, J.-M.; Yoo, C.; Lee, I.-B., Statistical monitoring of dynamic processes based on dynamic independent component analysis. Chem. Eng. Sci. 2004, 59, (14), 2995-3006.
23. Yoo, C. K.; Lee, J.-M.; Vanrolleghem, P. A.; Lee, I.-B., On-line monitoring of batch processes using multiway independent component analysis. Chemom. Intell. Lab. Syst. 2004, 71, (2), 151-163.
24. Ge, Z.; Song, Z., Process monitoring based on independent component analysis-principal component analysis (ICA-PCA) and similarity factors. Ind. Eng. Chem. Res. 2007, 46, (7), 2054-2063.
25. Ge, Z.; Song, Z., Two-level multiblock statistical monitoring for plant-wide processes. Korean J. Chem. Eng. 2009, 26, (6), 1467-1475.
26. Rashid, M. M.; Yu, J., A new dissimilarity method integrating multidimensional mutual information and independent component analysis for non-Gaussian dynamic process monitoring. Chemom. Intell. Lab. Syst. 2012, 115, (15), 44-58.
27. Rashid, M. M.; Yu, J., Nonlinear and Non-Gaussian Dynamic Batch Process Monitoring Using a New Multiway Kernel Independent Component Analysis and Multidimensional Mutual Information Based Dissimilarity Approach. Ind. Eng. Chem. Res. 2012, 51, (33), 10910-10920.
28. Jiang, Q.; Yan, X., Non-Gaussian chemical process monitoring with adaptively weighted independent component analysis and its applications. J. Process Control 2013, 23, (9), 1320-1331.
29. Jiang, Q.; Yan, X.; Tong, C., Double-Weighted Independent Component Analysis for Non-Gaussian Chemical Process Monitoring. Ind. Eng. Chem. Res. 2013, 52, (40), 14396-14405.
30. Ge, Z.; Song, Z., Performance-driven ensemble learning ICA model for improved non-Gaussian process monitoring. Chemom. Intell. Lab. Syst. 2013, 123, 1-8.
31. Webb, A. R.; Copsey, K. D.; Cawley, G., Statistical pattern recognition. Wiley: 2011.
32. MacGregor, J. F.; Jaeckle, C.; Kiparissides, C.; Koutoudi, M., Process monitoring and diagnosis by multiblock PLS methods. AIChE J. 1994, 40, (5), 826-838.
33. Westerhuis, J. A.; Kourti, T.; MacGregor, J. F., Analysis of multiblock and hierarchical PCA and PLS models. J. Chemom. 1998, 12, (5), 301-321.
34. Qin, S. J.; Valle, S.; Piovoso, M. J., On unifying multiblock analysis with application to decentralized process monitoring. J. Chemom. 2001, 15, (9), 715-742.
35. Choi, S. W.; Lee, I.-B., Multiblock PLS-based localized process diagnosis. J. Process Control 2005, 15, (3), 295-306.
36. Kohonen, J.; Reinikainen, S. P.; Aaljoki, K.; Perkiö, A.; Väänänen, T.; Høskuldsson, A., Multi‐block methods in multivariate process control. J. Chemom. 2008, 22, (3‐4), 281-287.
37. Zhang, Y.; Ma, C., Decentralized fault diagnosis using multiblock kernel independent component analysis. Chem. Eng. Res. Des. 2012, 90, (5), 667-676.
38. Ghosh, K.; Ramteke, M.; Srinivasan, R., Optimal variable selection for effective statistical process monitoring. Comput. Chem. Eng. 2014, 60, 260-276.
39. Srinivasan, R.; Qian, M., State-specific key variables for monitoring multi-state processes. Chem. Eng. Res. Des. 2007, 85, (12), 1630-1644.
40. Arbel, A.; Rinard, I. H.; Shinnar, R., Dynamics and control of fluidized catalytic crackers. 3. designing the control system: Choice of manipulated and measured variables for partial control. Ind. Eng. Chem. Res. 1996, 35, (7), 2215-2233.
41. Tyréus, B. D., Dominant variables for partial control. 1. A thermodynamic method for their identification. Ind. Eng. Chem. Res. 1999, 38, (4), 1432-1443.
42. Tong, C.; Song, Y.; Yan, X., Distributed Statistical Process Monitoring Based on Four-Subspace Construction and Bayesian Inference. Ind. Eng. Chem. Res. 2013, 52, (29), 9897-9907.
43. Jackson, J. E., A user's guide to principal components. Wiley-Interscience: 1991; Vol. 244.
44. Johnson, R. A., Applied multivariate statistical analysis. Prentice-Hall: New Jersey, 2007.
45. Hellinger, E., Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen (New grounds for the theory of quadratic forms of infinitely many variable). Journal of Pure and Applied Mathematics 1909, 136, 210-271.
46. Nikulin, M., Hellinger distance. Encyclopaedia of Mathematics. Kluwer Academic Publishers, U.S.A. 2002.
47. Oosterhoff, J.; van Zwet, W. R., A note on contiguity and Hellinger distance, Selected Works in Probability and Statistics. Springer: 2012.
48. Bishop, C. M.; Nasrabadi, N. M., Pattern recognition and machine learning. Springer, New York: 2006.
49. Downs, J. J.; Vogel, E. F., A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17, (3), 245-255.

List of all figures

Figure 1. Flow chart of HDBICA strategy for process monitoring.
Figure 2. HD values between each variable.
Figure 3. Monitoring results of the normal condition of the simple process: (a) ICA; (b) HDBICA; (c) sub-ICA 1; (d) sub-ICA 2; (e) sub-ICA 3.
Figure 4. Monitoring results of Fault 1 in the simple process: (a) ICA; (b) HDBICA; (c) sub-ICA 1.
Figure 5. Monitoring results of Fault 2 in the simple process: (a) ICA; (b) HDBICA; (c) sub-ICA 2.
Figure 6. Process monitoring results under normal condition: (a) ICA; (b) HDBICA.
Figure 7. Process monitoring results of Fault 3: (a) ICA; (b) HDBICA; (c) sub-ICA 5; (d) contribution plot.
Figure 8. Process monitoring results of Fault 4: (a) ICA; (b) HDBICA; (c) sub-ICA 2; (d) contribution plot.
Figure 9. Process monitoring results of Fault 19: (a) ICA; (b) HDBICA; (c) sub-block 4; (d) sub-block 5.

All figures in the manuscript

Figure 1. Flow chart of HDBICA strategy for process monitoring.

Figure 2. HD values between each variable.

Figure 3. Monitoring results of the normal condition of the simple process: (a) ICA; (b) HDBICA; (c) sub-ICA 1; (d) sub-ICA 2; (e) sub-ICA 3.

Figure 4. Monitoring results of Fault 1 in the simple process: (a) ICA; (b) HDBICA; (c) sub-ICA 1.

Figure 5. Monitoring results of Fault 2 in the simple process: (a) ICA; (b) HDBICA; (c) sub-ICA 2.

Figure 6. Process monitoring results under normal condition: (a) ICA; (b) HDBICA.

Figure 7. Process monitoring results of Fault 3: (a) ICA; (b) HDBICA; (c) sub-ICA 5; (d) contribution plot. [Panel (d) axes: variable number (horizontal) against contribution (vertical).]

Figure 8. Process monitoring results of Fault 4: (a) ICA; (b) HDBICA; (c) sub-ICA 2; (d) contribution plot.

Figure 9. Process monitoring results of Fault 19: (a) ICA; (b) HDBICA; (c) sub-block 4; (d) sub-block 5.
