
An Alternative Formulation of PCA for Process Monitoring Using Distance Correlation

Hongyang Yu,†,‡ Faisal Khan,*,†,‡ and Vikram Garaniya†

†National Centre for Maritime Engineering and Hydrodynamics, Australian Maritime College, University of Tasmania, Launceston, TAS, Australia
‡Safety and Risk Engineering Group, Faculty of Engineering and Applied Science, Memorial University of Newfoundland, St. John's, NL, Canada

ABSTRACT: Scale-invariant principal component analysis (PCA) is prevalent in process monitoring because of its simplicity and efficiency. However, a number of limitations are associated with this technique because of its underlying assumptions. This article attempts to relax these limitations by introducing three key elements. First, a semiparametric Gaussian transformation is proposed to make the process data follow a multivariate Gaussian distribution, such that standard PCA can be directly applied to explain the majority of the process data variance. The Gaussian transformation function preserves both the important statistical information and the correlation structures of the process data. Second, the eigenvectors spanning the feature space are extracted using the Spearman correlation coefficient and the distance correlation coefficient. This feature space is able to retain nonlinear and nonmonotonic correlation structures of the process data. Finally, the technique is computationally more efficient than KPCA, KICA, and improved KICA because it avoids expensive kernel mapping. Semiparametric PCA is tested on two industrial case studies and exhibits satisfactory performance.

1. INTRODUCTION

Principal component analysis (PCA) is an efficient algorithm for extracting latent features from high-dimensional data. Three basic steps are involved in the standard PCA feature-extraction procedure. The first step is to compute the covariance matrix from high-dimensional data samples. Eigen decomposition is then applied to the covariance matrix to extract a set of eigenvectors known as the principal components (PCs). In practice, the PCs that explain a significant portion of the variance are retained, whereas the others are discarded. In the third step, the retained PCs form the span of a subspace of lower dimensionality. This subspace is referred to as the feature space. The latent features of the high-dimensional data are revealed in the feature space through subspace projection.

The covariance matrix of standard PCA is sensitive to data scaling, which makes it unsuitable for process monitoring.1,2 Process data samples collected under different measurement units can easily disrupt the covariance matrix, leading to the extraction of nonrelevant PCs. This issue is easily addressed by standardizing the process data samples. Consequently, the covariance matrix of the standardized process data becomes the correlation matrix of the original process data, which is scale-invariant. The PCs are then extracted from the correlation matrix. This particular type of PCA is called scale-invariant PCA and is prevalent in process monitoring.3−6

The correlation matrix in standard scale-invariant PCA is formed by computing the Pearson correlation coefficient between each pair of process variables. The Pearson correlation coefficient provides a strictly linear measure of dependence. This substantially limits the accuracy of feature extraction using PCA from process data in which nonlinear dependences dominate. The eigen decomposition of the correlation matrix determines a set of orthogonal eigenvectors. These eigenvectors

represent the directions of maximum variation within the data space.7 The amount of data variance explained along the direction of each eigenvector is equal to its corresponding eigenvalue. If these eigenvectors are rescaled using their eigenvalues, their lengths then define the spreads of the process data in orthogonal directions. In this respect, the boundary of the process data can be outlined by a perfect hyperellipse whose major axes are the scaled eigenvectors. This implies that the process data must follow a multivariate Gaussian distribution. However, the operations of industrial processes are constantly subjected to disturbances. These can be external disturbances that are completely random in nature, or internal disturbances such as those induced by closed-loop control actions. Either type of disturbance, combined with the nonlinear dependence structures existing between process variables, can make the process data follow a non-Gaussian distribution. The limitation of the Pearson correlation coefficient in modeling nonlinear dependences and the assumption that process data follow a perfect Gaussian distribution adversely affect the process monitoring capabilities of PCA.

A number of alternative feature-extraction techniques have been proposed to address these limitations of PCA. Independent component analysis (ICA) was developed to retain the non-Gaussian variation of process data in the feature space. The feature space is spanned by a set of independent components (ICs) that are obtained by seeking the direction of maximum non-Gaussian variance within the process data space.8,9

A major drawback of the original ICA approach is that the ICs cannot be ranked according to the amount of variance they explain. Lee et al.10 proposed a modified ICA that retains the ranking of the ICs from the PCA whitening step, thus enabling the correct number of ICs to be selected for latent feature extraction. In terms of online fault detection, each IC can have a different level of sensitivity to faults. Jiang and Yan11,12 proposed adaptive weighting methods to quantify the importance of each IC for each monitoring interval: ICs having a higher rate of change are given heavier weights, whereas others are assigned less weight in real-time monitoring; this treatment of the ICs leads to improvements in the fault detection rate. Nevertheless, modified ICA is still incapable of modeling nonlinear dependence structures. To solve this problem, the kernel extension of ICA, known as kernel ICA (KICA), was developed.13 According to this approach, process data samples are first projected into a higher-dimensional kernel space in which the dependence structures between process variables become linear. Standard ICA is then performed in the kernel space for feature extraction. KICA is able to retain both nonlinear and non-Gaussian features in the feature space. This capability makes KICA superior to PCA, ICA, and kernel PCA (KPCA). The kernel matrices of KPCA and KICA are determined by computing the correlation between each pair of samples rather than variables; for a large-scale training data set, some of the training samples might share very similar features. As a result, a kernel matrix with a rank equal to the number of training data samples might be redundant. To address this issue, an improved KICA technique was developed to retain only those data samples with distinct features in the kernel matrix.14,15 In this manner, the computational effort for building the KICA model can be reduced. In addition, because the nonrelevant data samples are discarded, the improved KICA method can better capture the features of normal process operation, resulting in enhanced process monitoring performance. However, the kernel mapping of process data samples is irreversible and intractable. It is therefore not possible to perform fault diagnosis with KICA through subspace reconstruction. In addition, kernel mapping is computationally expensive, making it less desirable for large-scale processes that have more process variables and require substantially more training data samples. Furthermore, the selection of kernels has a considerable impact on the process monitoring performance of KICA, although there might not exist a best practice for this selection.

Apart from KPCA and KICA, many other feature-extraction and -classification techniques have been proposed to improve the monitoring of nonlinear or non-Gaussian processes. A kernel Gaussian mixture model was developed for chemical process monitoring.16 The Gaussian mixture model can effectively capture non-Gaussian variations in process operation while being integrated with a kernel method, making this approach capable of modeling nonlinear relationships between process variables. In addition, a copula-model-based method was studied by Yu et al.17 for complex process monitoring. Copula models rely on a nonlinear correlation measure, Spearman's correlation, to model the complete dependence structure of the process variables. The resulting model is able to capture both nonlinear and non-Gaussian features of the process operation. For many large-scale processes, a fault condition might disrupt only a small number of process variables; plantwide monitoring techniques that take all process variables into account, regardless of their relevance to the fault condition, can therefore yield degraded performance. Having realized this problem, Jiang and Yan proposed a distributed nonlinear monitoring technique for large-scale process monitoring.18 Furthermore, a statistical-pattern-analysis-based process monitoring technique was reported by Wang and He.19 The non-Gaussian and nonlinear features of the process are captured by exploring high-order moments of the data, such as skewness and kurtosis. This method also uses various statistics to describe autocorrelation and cross-correlation between process data samples. The statistics for each window of data samples are collected to form a row of new data samples, known as a statistical pattern (SP). Conventional PCA is then applied to the SPs for fault detection. In addition to the above unsupervised methods, machine-learning-based supervised fault detection and diagnosis methods, such as artificial neural networks (ANNs) and support vector machines (SVMs), have also been investigated.20−27 Most of these techniques require prelabeled process data collected under both normal and faulty operating conditions, and the number of identifiable faults relies heavily on the availability of fault data.

Although the methods summarized above have been demonstrated to outperform conventional PCA, they share a number of shortcomings: they are often computationally expensive and require the delicate tuning of several parameters. In this work, an alternative formulation of PCA is proposed to address the major drawbacks of conventional scale-invariant PCA. An extra step of semiparametric data transformation is introduced after standardization of the process data. The semiparametric data transformation is monotonic, preserves the statistical information (mean and variance) and the correlation structure of the original data, and makes the transformed process data follow a multivariate Gaussian distribution. In addition, the Pearson correlation measure is replaced by two types of nonlinear correlation measures: the Spearman correlation coefficient and the distance correlation coefficient. The Spearman correlation coefficient is limited to modeling monotonic dependence structures, whereas the distance correlation coefficient is capable of modeling both monotonic and nonmonotonic dependence structures. The eigenvectors for the feature space are extracted from the correlation matrix constructed using these two nonlinear correlation coefficients. The process monitoring procedure of the proposed technique is very similar to that of PCA: online process data samples are first transformed through the semiparametric Gaussian transformation and then projected into the feature space. For fault detection and diagnosis, the T² and squared prediction error (SPE) statistics are computed and decomposed with respect to the feature space and the residual space.

There are four major advantages of the proposed technique. First, the semiparametric Gaussian transformation converts non-Gaussian variations into Gaussian variations with minimum information loss, such that standard scale-invariant PCA can be directly applied to extract more useful latent features. Second, the use of nonlinear correlation measures to form the correlation matrix enables the extracted eigenvectors to retain nonlinear dependence structures in the feature space; the distance correlation is able to model any type of nonlinear relationship between process variables that are not statistically independent. Third, the proposed method is computationally more efficient than many nonlinear and non-Gaussian process monitoring techniques: its computational complexity is equivalent to that of conventional PCA. Fourth, the proposed technique does not require exquisite tuning of hyperparameters (as is necessary to ensure high performance for KPCA, KICA, statistics pattern analysis (SPA), SVMs, neural networks, and so on); it therefore has high practical value.

The remainder of this article is organized as follows: The mathematical notation adopted in this article is presented in section 2, followed by a brief introduction of the fundamental concepts necessary for formulating the proposed technique. The complete methodology of the proposed PCA model is explained in detail in section 3. In section 4, two case studies, a motivational example and the benchmark TE process, are used to assess the performance of the proposed technique, which is also compared with PCA, KPCA, and KICA to further demonstrate its strengths. Finally, the major findings of this research and directions for future work are summarized in section 5.


2. PRELIMINARIES

2.1. Notation. Every process variable is considered as a continuous random variable denoted as X_j, j ∈ {1, 2, ..., d}, where d is the total number of process variables. The real values each process variable can take are represented by x_j. If the process is running normally for N sampling intervals, the process data samples are collected in a data matrix of the form X = {x^i}_{i=1}^N = {x_1^i, x_2^i, ..., x_d^i}_{i=1}^N, i ∈ {1, 2, ..., N}, where each column of X stores the measured values of a single process variable. The eigenvectors of the data matrix are represented as {v_1, v_2, ..., v_d} and are ranked according to the magnitudes of their respective eigenvalues. The probability density of a single process variable X_j under a probability density function (PDF) f: ℝ → [0, 1] is expressed as f(X_j). Similarly, the cumulative distribution function (CDF) of X_j is defined as F(X_j). For each data sample x_j^i, the probability density and the probability are obtained as f(x_j^i) = p(X_j = x_j^i) and F(x_j^i) = p(X_j ≤ x_j^i), respectively.

2.2. Gaussian Copula. A copula is a function that computes the joint probability of a set of continuous random variables under probability integral transforms. The probability integral transform (PIT) of a random variable X_j is in fact its CDF, F(X_j), which follows a standard uniform distribution, F(X_j) ∼ U[0, 1]. This condition holds exactly if F(X_j) is the true CDF of X_j. In cases where F(X_j) is estimated through data fitting, it holds approximately for a sufficiently large sample size. In this respect, the copula is a joint CDF of a random vector sampled from a unit cube, C: [0, 1]^d → [0, 1].28,29 Note that, for a uniform distribution, the following equality is true

P[F(X_j) ≤ F(x_j^i)] = F(x_j^i)   (1)

so that the copula satisfies

P[F(X_1) ≤ F(x_1^i), F(X_2) ≤ F(x_2^i), ..., F(X_d) ≤ F(x_d^i)] = C[F(x_1^i), F(x_2^i), ..., F(x_d^i)]   (2)

Based on eqs 1 and 2, the following equality also holds

F(x_1^i, x_2^i, ..., x_d^i) = C[F(x_1^i), F(x_2^i), ..., F(x_d^i)]   (3)

The joint CDF of x^i = {x_1^i, x_2^i, ..., x_d^i} is thus decomposed into the univariate marginal CDFs F(x_j^i) and a copula function. If each random variable follows a Gaussian distribution, x_j^i ∼ N(μ_j, σ_j), then the copula in eq 3 is a Gaussian copula.17,30 In addition, the complete dependence structure of the random variables is captured in the copula model. This is shown by taking derivatives of both sides of eq 3 to obtain

f(x_1^i, x_2^i, ..., x_d^i) = c[F(x_1^i), F(x_2^i), ..., F(x_d^i)] ∏_{j=1}^d f(x_j^i)   (4)

where c: [0, 1]^d → [0, ∞) is the copula density function measuring the degree of dependence among the random variables. The Gaussian copula is an important link in demonstrating that the semiparametric Gaussian transformation preserves the mean and variance of the original process data samples.
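As a minimal numerical illustration of the probability integral transform behind eqs 1−4 (the gamma distribution, sample size, and seed below are arbitrary choices for illustration, not details from the study), the following Python sketch estimates F from ranks, checks that the resulting scores are approximately uniform, and maps them to approximately Gaussian scores through Φ⁻¹:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.5, size=5000)   # skewed, clearly non-Gaussian samples

# Probability integral transform with a rank-based estimate of the CDF
u = stats.rankdata(x) / (len(x) + 1.0)           # approximately Uniform(0, 1), as in eq 1

# Passing the uniform scores through the inverse standard-normal CDF gives Gaussian scores
z = stats.norm.ppf(u)

print(stats.kstest(u, "uniform").pvalue)         # large p-value: uniformity not rejected
print(stats.kstest(z, "norm").pvalue)            # large p-value: normality not rejected
```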

2.3. Spearman and Distance Correlation Coefficients. Both the Spearman and distance correlation coefficients provide a nonlinear dependence measure between two random variables. To compute the Spearman correlation coefficient, the numerical values of each random variable are first ranked according to their magnitudes. The Spearman correlation coefficient is then the Pearson correlation coefficient between the ranked random variables. Let r_X^i denote the rank of the ith numerical value of random variable X. Similarly, r_Y^i is the rank of the ith numerical value of random variable Y. Then, the Spearman correlation coefficient between X and Y is calculated as31

ρ_s = ∑_{i=1}^N (r_X^i − μ_r^X)(r_Y^i − μ_r^Y) / √[∑_{i=1}^N (r_X^i − μ_r^X)² ∑_{i=1}^N (r_Y^i − μ_r^Y)²]   (5)

where μ_r^X = μ_r^Y = (N + 1)/2. One of the most important assumptions of the Spearman correlation coefficient is that the relationship between the random variables is monotonic. For nonmonotonic relationships, the Spearman correlation coefficient fails to model the dependence structure. This is mainly because of the way the covariance is computed in the numerator of eq 5. The covariance is a measure of how much the variation of one random variable agrees with that of the other in reference to their respective means. The variations of a single ranked random variable are quantified as the signed distances between its ranked numerical values and the ranked centroid [d_{iμ_r^X}^X = (r_X^i − μ_r^X) and d_{iμ_r^Y}^Y = (r_Y^i − μ_r^Y)]. Assume that x^i and x^k are symmetric about the centroid μ_X, meaning that d_{iμ_X}^X = −d_{kμ_X}^X. It is not difficult to see that the ranked numerical values of x^i and x^k are then also symmetric about the ranked centroid μ_r^X, with d_{iμ_r^X}^X = −d_{kμ_r^X}^X. Additionally, assume that the relationship between X and Y satisfies Y = X², which is nonmonotonic and nonlinear. Therefore, the following equalities hold

y^i = (x^i)² = y^k = (x^k)²,  d_{iμ_r^Y}^Y = (r_Y^i − μ_r^Y) = d_{kμ_r^Y}^Y = (r_Y^k − μ_r^Y)   (6)

The Spearman covariance between X and Y based on these two data points, (x^i, y^i) and (x^k, y^k), is computed as

cov(X, Y) = d_{iμ_r^X}^X d_{iμ_r^Y}^Y + d_{kμ_r^X}^X d_{kμ_r^Y}^Y = d_{iμ_r^X}^X d_{iμ_r^Y}^Y − d_{iμ_r^X}^X d_{iμ_r^Y}^Y = 0   (7)

Consequently, the Spearman correlation coefficient is also zero, even though there is a perfect quadratic relationship between X and Y. The Spearman correlation coefficient is not able to model such a relationship because it is not monotonic. This characteristic could lead to poor nonlinear feature extraction for process monitoring, because nonmonotonic relationships such as the quadratic one are not uncommon in industrial processes. Nevertheless, the assumption of a monotonic relationship is a significant relaxation of the strict linearity assumed by the Pearson correlation coefficient.

The distance correlation, on the other hand, is more appropriate for modeling nonmonotonic relationships. The variations of a single random variable are quantified as the Euclidean distances between any pair of numerical values. The Euclidean distance is always positive, thus effectively avoiding the problem described by eqs 6 and 7 and enabling the distance correlation coefficient to model nonmonotonic relationships. The distance correlation coefficient is formally defined as follows: Let a_{ik} be the Euclidean distance between the ith and kth values of X, a_{ik} = ‖x^i − x^k‖_2, i, k ∈ {1, 2, ..., N}, and likewise for random variable Y, b_{ik} = ‖y^i − y^k‖_2. The a_{ik} and b_{ik} are used to form two N × N matrices, α, β ∈ ℝ^{N×N}. These two matrices are then doubly centered through the procedure

A = α − J_{N×1} α̅_{·k} − α̅_{i·} J_{1×N} + α̅,  B = β − J_{N×1} β̅_{·k} − β̅_{i·} J_{1×N} + β̅   (8)

where α̅_{·k} and β̅_{·k} are the row vectors containing the column-wise means of α and β; similarly, α̅_{i·} and β̅_{i·} are the column vectors storing the row-wise means. J_{N×1} is the all-ones matrix of size N × 1, and α̅ and β̅ are the grand means of α and β, respectively. The distance correlation coefficient between random variables X and Y is determined as32

ρ_d = ∑_{i,k=1}^N A_{i,k} B_{i,k} / √[∑_{i,k=1}^N (A_{i,k})² ∑_{i,k=1}^N (B_{i,k})²]   (9)

Note that the distance correlation coefficient has a range between 0 and 1. It is maximized at 1 when A_{i,k} > 0 and B_{i,k} > 0 for all i, k ∈ {1, 2, ..., N}, implying that any pair of data samples (x^i, y^i) and (x^k, y^k) are arranged on an "inclined" line locally and in a piecewise manner; that is, the relationship between X and Y is monotonic. Conversely, the coefficient is minimized at zero when X and Y are statistically independent. This enables the distance correlation to model any type of relationship, monotonic or nonmonotonic, as long as X and Y are not statistically independent.
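The contrast between the two coefficients is easy to reproduce numerically. The sketch below (an added illustration; the sample size and seed are arbitrary) implements eqs 8 and 9 directly with NumPy and evaluates both coefficients on the quadratic relationship Y = X² discussed above: the Spearman coefficient collapses toward zero, whereas the distance correlation remains clearly positive.

```python
import numpy as np
from scipy import stats

def distance_correlation(x, y):
    """Distance correlation of eqs 8 and 9 from doubly centered distance matrices."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])                           # a_ik = |x_i - x_k|
    b = np.abs(y[:, None] - y[None, :])                           # b_ik = |y_i - y_k|
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()   # double centering (eq 8)
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return (A * B).sum() / np.sqrt((A ** 2).sum() * (B ** 2).sum())  # eq 9

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 2000)
y = x ** 2                                          # perfect but nonmonotonic dependence

print(stats.spearmanr(x, y).correlation)            # near 0: the monotone measure fails
print(distance_correlation(x, y))                   # clearly positive
```

In a monitoring context, the same routine evaluated on every pair of transformed process variables yields the distance correlation matrix from which the eigenvectors are extracted.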

Figure 1. Logic flow diagram of the proposed semiparametric PCA.

3. METHODOLOGY

In this section, a semiparametric Gaussian transformation technique is developed based on the Gaussian copula to make the transformed process data follow a multivariate Gaussian distribution. It is also demonstrated that the first two moments and the correlation structures of the original process data are preserved through the transformation. Subsequently, a set of eigenvectors is extracted from the correlation matrix of the transformed process data, determined using the Spearman and distance correlation coefficients. In the last subsection, the process monitoring procedure based on the new PCA model is formulated. To provide a better overview of the methodology, a logic flow diagram explaining each implementation step of the proposed approach is shown in Figure 1.

3.1. Semiparametric Gaussian Transformation. A major drawback of standard scale-invariant PCA is its inability to explain non-Gaussian variances in the feature space. This weakness can be addressed if the process data samples are transformed to follow a multivariate Gaussian distribution without losing important statistical information or compromising the correlation structures. To achieve this, each transformed process variable has to follow a univariate Gaussian distribution. The Gaussian transformation function for a process variable X_j is defined as

g_j: ℝ → ℝ,  g_j(X_j) ∼ N(μ̂_j, σ̂_j²)   (10)

where μ̂_j and σ̂_j² are the mean and variance, respectively, of the transformed process variable. The joint PDF of the transformed process variables is then expressed as

f[g_1(X_1), g_2(X_2), ..., g_d(X_d)] ∼ N(μ̂, Σ̂)   (11)

The joint CDF of the transformed process variables is also derived as

F[g(x_1^i), g(x_2^i), ..., g(x_d^i)] = P[g_1(X_1) ≤ g(x_1^i), g_2(X_2) ≤ g(x_2^i), ..., g_d(X_d) ≤ g(x_d^i)]   (12)


According to section 2.2, eq 12 can be decomposed into a copula function and a set of marginal CDFs

F[g(x_1^i), g(x_2^i), ..., g(x_d^i)] = C{F[g(x_1^i)], F[g(x_2^i)], ..., F[g(x_d^i)]}   (13)

Because g_j(X_j) follows a univariate Gaussian distribution, eq 13 can be reorganized into the form

F[g(x_1^i), g(x_2^i), ..., g(x_d^i)] = C{Φ[(g(x_1^i) − μ̂_1)/σ̂_1], Φ[(g(x_2^i) − μ̂_2)/σ̂_2], ..., Φ[(g(x_d^i) − μ̂_d)/σ̂_d]}   (14)

where Φ[·] is the CDF of a standard normal distribution. The copula function in eq 14 is a Gaussian copula, as its marginal CDFs are cumulative Gaussian distributions. For the Gaussian transformation in eq 10 to be valid, the equality

F(x_1^i, x_2^i, ..., x_d^i) = F[g(x_1^i), g(x_2^i), ..., g(x_d^i)]   (15)

must be satisfied. The necessary condition for eq 15 to be true is that the first two moments of the original process variable are preserved, meaning that E[X_j] = μ_j = E[g_j(X_j)] = μ̂_j and Var[X_j] = σ_j = Var[g_j(X_j)] = σ̂_j. This condition is confirmed in lemma 1.

Lemma 1. The equality F(x_1^i, x_2^i, ..., x_d^i) = F[g(x_1^i), g(x_2^i), ..., g(x_d^i)] is true if and only if E[X_j] = E[g_j(X_j)] and Var[X_j] = Var[g_j(X_j)].

Proof. According to Sklar's theorem,33 for two cumulative CDFs to be equivalent, their marginal CDFs must be equal: F(x_j^i) = F[g(x_j^i)]. This equality combined with eq 14 yields the following necessary condition

F(x_j^i) = Φ[(g_j(x_j^i) − μ̂_j)/σ̂_j]   (16)

Equation 16 can be reformatted into the form

g_j(x_j^i) = σ̂_j Φ^{−1}[F(x_j^i)] + μ̂_j   (17)

Equation 17 is the required form of the Gaussian transformation function. Note that eq 17 is identifiable (a one-to-one mapping) if and only if μ̂_j = μ_j and σ̂_j = σ_j.34 The mean and variance of the transformed process variables can then be easily derived as

E[g_j(X_j)] = E{σ_j Φ^{−1}[F(X_j)] + μ_j} = E{σ_j Φ^{−1}[F(X_j)]} + E(μ_j) = μ_j   (18)

Var[g_j(X_j)] = Var{σ_j Φ^{−1}[F(X_j)] + μ_j} = Var{σ_j Φ^{−1}[F(X_j)]} + Var(μ_j) = σ_j   (19)

Therefore, the Gaussian transformation proposed in eq 17 preserves the mean and variance of the process variables. ■

Note that if the process data samples are standardized before transformation, the Gaussian transformation in eq 17 can be simplified as

g_j(x_j^i) = Φ^{−1}[F(x_j^i)]   (20)

because μ_j = 0 and σ_j = 1. As the distribution of the original process data is unknown, the CDF, F(X_j), in eq 17 does not have a parametric form; it has to be estimated from data. In this study, the empirical distribution function is used to approximate the CDF

F(x_j^i) = (1/(n + 1)) ∑_{k=1}^n 1_{x_j^k ≤ x_j^i}   (21)

In this respect, eq 17 is in a parametric form but employs a nonparametric CDF estimate; hence, it is a semiparametric Gaussian transformation. Note that Φ^{−1}[F(x_j^i)] is simply the van der Waerden normal score transformation.35 In addition to preserving the statistical information (mean and variance), the semiparametric transformation must preserve the correlation structures of the original process data as well, such that the feature space constructed from the transformed process data is valid for process monitoring. This necessitates that g_j(X_j) be strictly monotonically increasing with respect to X_j. This condition is confirmed in lemma 2.

Lemma 2. The transformation function g_j(X_j) is strictly monotonically increasing with respect to X_j.

Proof. Let z = Φ^{−1}[F(x_j)], so that g_j(X_j) = σ_j z + μ_j and

∂g_j(X_j)/∂X_j = σ_j (∂z/∂X_j)   (22)

Because Φ(z) = F(X_j) by construction,

Φ′(z) = φ(z) = ∂F(X_j)/∂z   (23)

Application of the chain rule then yields

∂g_j(X_j)/∂X_j = σ_j (∂z/∂F(X_j))(∂F(X_j)/∂X_j) = σ_j f(X_j)/φ{Φ^{−1}[F(X_j)]} > 0   (24)

because σ_j > 0, φ(·) > 0, and f(X_j) > 0. ■
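A minimal sketch of this transformation for a single standardized variable is given below (an added illustration; the function names and the test distribution are assumptions, not the authors' implementation). It combines the empirical CDF of eq 21 with the van der Waerden normal-score mapping of eq 20.

```python
import numpy as np
from scipy import stats

def fit_gaussian_transform(x_train):
    """Return g(x) = Phi^{-1}[F(x)] (eq 20), with F estimated as in eq 21."""
    sorted_train = np.sort(np.asarray(x_train, float))
    n = len(sorted_train)

    def g(x):
        # Empirical CDF of eq 21: F(x) = (1/(n+1)) * number of training points <= x
        F = np.searchsorted(sorted_train, x, side="right") / (n + 1.0)
        # Clip so that online samples outside the training range stay finite
        F = np.clip(F, 1.0 / (n + 1.0), n / (n + 1.0))
        return stats.norm.ppf(F)

    return g

rng = np.random.default_rng(2)
raw = rng.beta(4.0, 1.0, 500)                 # heavily skewed "process" variable
std = (raw - raw.mean()) / raw.std()          # standardize before transformation
g = fit_gaussian_transform(std)
print(g(std).mean(), g(std).std())            # approximately 0 and 1
```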

3.2. Process Monitoring Based on Semiparametric PCA. The feature space for process monitoring is spanned by a set of eigenvectors extracted from the correlation matrix. In this study, both the Spearman correlation coefficient and the distance correlation coefficient are used to generate the correlation matrix. The number of eigenvectors that need to be extracted is determined through cross-validation.36 During real-time monitoring, an online process data sample is standardized with the mean and variance of the training data samples and then transformed through eq 17. Specifically, a standardized online process data sample x_j^i of a process variable X_j is first compared with its respective training data samples using eq 21 to determine F(x_j^i). Subsequently, this probability is inserted into eq 20, and the van der Waerden normal score transformation procedure is applied. The transformed process data sample is then projected into the feature space. Suppose that the feature space is spanned by r PCs, V_r = {v_1, v_2, ..., v_r}. Then, the subspace projection of a transformed data sample g(x^i) is achieved through the following matrix operation

s^i = V_r^T [g(x^i)]^T   (25)

where s^i ∈ ℝ^r, r ≤ d. Subsequently, the T² statistic of the projected data sample is computed as

T_i² = (s^i)^T Λ_r^{−1} s^i   (26)


Industrial & Engineering Chemistry Research where Λr is a diagonal matrix whose diagonal elements are the first r eigenvalues associated with {v1, v2, ..., vr}. The SPE statistic is also calculated as ̂ V V T[g (x i)]T g (x i ) = r r

(27)

SPEi = {[g (x i)]T − g (x i)}T̂ {[g (x i)]T − g (x i)}̂

(28)
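The online computation of eqs 25−28 reduces to a few matrix operations. The sketch below (added for illustration; the variable names and the 99% quantile level are assumptions) also estimates an upper control limit as a high quantile of a kernel density estimate fitted to the training statistics, in the spirit of ref 37.

```python
import numpy as np
from scipy import stats

def monitoring_statistics(G, Vr, eigvals):
    """T^2 (eq 26) and SPE (eq 28) for the rows of G (Gaussian-transformed samples)."""
    S = G @ Vr                                   # scores s_i = Vr^T g(x_i) (eq 25)
    T2 = np.sum(S ** 2 / eigvals, axis=1)        # (s_i)^T Lambda_r^{-1} s_i
    G_hat = S @ Vr.T                             # reconstruction in the data space (eq 27)
    SPE = np.sum((G - G_hat) ** 2, axis=1)       # squared prediction error
    return T2, SPE

def kde_upper_limit(train_stat, alpha=0.99):
    """Upper control limit: the alpha quantile of a Gaussian KDE of training statistics."""
    kde = stats.gaussian_kde(train_stat)
    grid = np.linspace(0.0, 3.0 * np.max(train_stat), 4000)
    cdf = np.cumsum(kde(grid))
    cdf /= cdf[-1]                               # normalize the numerical CDF
    return grid[np.searchsorted(cdf, alpha)]
```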

For fault diagnosis, the T² statistic can be decomposed to determine the contributions of the individual process variables

T_i² = (s^i)^T Λ_r^{−1} s^i = (s^i)^T Λ_r^{−1} V_r^T [g(x^i)]^T = ∑_{j=1}^d (s^i)^T Λ_r^{−1} u_j g(x_j^i) = ∑_{j=1}^d cont_j(T_i²)   (29)

where u_j is the jth column of V_r^T. The SPE statistic measures the total reconstruction error over all process variables, and the contribution of each process variable to the SPE statistic is simply its corresponding reconstruction error

cont_j(SPE_i) = [g(x_j^i) − ĝ(x_j^i)]²   (30)
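For a single transformed sample, the two decompositions of eqs 29 and 30 can be written as follows (an added sketch; Vr and eigvals are assumed to come from the eigendecomposition of the nonlinear correlation matrix):

```python
import numpy as np

def t2_contributions(g_x, Vr, eigvals):
    """Per-variable T^2 contributions of eq 29; their sum equals T^2."""
    s = Vr.T @ g_x                        # scores of eq 25
    w = (s / eigvals) @ Vr.T              # w_j = (s^T Lambda_r^{-1}) u_j
    return w * g_x                        # cont_j(T^2) = w_j * g(x_j)

def spe_contributions(g_x, Vr):
    """Per-variable SPE contributions of eq 30: squared reconstruction errors."""
    g_hat = Vr @ (Vr.T @ g_x)
    return (g_x - g_hat) ** 2
```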

4. CASE STUDIES

Two case studies are presented in this section to demonstrate the effectiveness of the proposed PCA-based process monitoring model. The first case study is a continuous stirred tank heater (CSTH) model developed by Thornhill et al.,38 and the second case study is the benchmark TE process.39

4.1. Continuous Stirred Tank Heater. The Simulink module of the CSTH was first developed by Thornhill et al.38 The CSTH consists of a steam-heated tank in which hot and cold water are uniformly mixed. There are three monitored variables: tank level, cold water inlet flow rate, and outlet flow temperature. These monitored variables are related to each other through a set of nonlinear functions and are controlled using proportional−integral−derivative (PID) controllers. Consequently, the relationships between the monitored variables are also nonmonotonic. A process schematic of the CSTH is presented in Figure 2. In addition, detailed information on the CSTH model can be found at http://personal-pages.ps.ic.ac.uk/~nina/CSTHSimulation/index.htm.

Figure 2. Process schematic of the CSTH.


Figure 3. Probability density functions (CSTH) of the original process variables (first row) and the transformed process variables (second row).

In this case study, three PCA-based process monitoring techniques are applied to the CSTH problem: standard scale-invariant PCA, semiparametric PCA with the Spearman correlation coefficient, and the proposed semiparametric PCA using the distance correlation coefficient. For each of these techniques, 100 normal data samples are generated for eigenvector extraction. The number of eigenvectors retained is set to 2 across all techniques to ensure consistency in comparison. With regard to semiparametric PCA, the process data samples are first transformed through eq 17, with the marginal CDFs estimated using the empirical probability function. The density functions of the original process data and the transformed process data are shown in Figure 3. It can be observed that, before transformation, the process variables do not follow a Gaussian distribution, in particular the flow rate (denoted as Flow). After standardization and Gaussian transformation, all three process variables follow standard Gaussian distributions with zero mean and unit variance; the first two moments of the process data are thus preserved through the transformation. This allows standard scale-invariant PCA to be effectively applied to extract eigenvectors retaining the majority of the data variance.

During real-time process monitoring, the CSTH simulation runs for 200 sample indices and generates 200 online data samples. At sample index 100, two fault conditions are introduced in the outlet temperature and the inlet flow rate of the CSTH. The first fault condition is an added Gaussian noise with zero mean and 0.05 variance. The second fault condition is a beta noise with shape parameters 4 and 1. These fault conditions can easily propagate throughout the system by means of closed-loop control actions and upset the process operation.

The process monitoring results of all three techniques are summarized in Table 1. The best results are achieved by the proposed semiparametric PCA with distance correlation, as both its T² and SPE statistics provide fault detection rates of at least 75%. The T² statistic of semiparametric PCA based on the Spearman correlation provides the same level of performance; however, its SPE statistic delivers the lowest fault detection rate of all three techniques. This phenomenon could be due to the combination of the Gaussian transformation with a correlation matrix that is incapable of capturing nonmonotonic correlation structures. The Gaussian transformation converts all process data variations to follow Gaussian distributions, which allows the majority of the process information to be retained in the feature space; consequently, there is little information left to be retained in the residual space. In addition, the SPE statistic measuring the reconstruction error is computed through reverse projection based on the eigenvectors, and the eigenvectors extracted from the Spearman correlation matrix do not preserve the nonmonotonic relationships between process variables. These factors lead to the poor performance of the SPE statistic of semiparametric PCA using the Spearman correlation matrix. In contrast, standard scale-invariant PCA offers the worst process monitoring performance overall, with fault detection rates of less than 70% for both statistics. This poor performance is mainly caused by the incapability of this method to explain non-Gaussian variances and to model nonlinear correlation structures. The feature space of standard scale-invariant PCA is not able to capture most of the data variance; a significant portion of the variance is retained in the residual space, leading to better performance by its SPE statistic in terms of fault detection rate as compared to semiparametric PCA with the Spearman correlation. The process monitoring charts of all three techniques for faults 1 and 2 are presented in Figures 4 and 5, respectively.

In terms of fault diagnosis performance, the contribution plots are generated as the total contributions across the first 10 faulty samples for each monitored variable. The contribution plots are presented in Figures 6 and 7.
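As a sketch of how the two disturbance signals described above can be generated (only the additive fault signals are shown; wiring them into the Simulink CSTH model is not reproduced here, and the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, fault_start = 200, 100

# Fault 1: added Gaussian noise with zero mean and 0.05 variance (outlet temperature)
fault1 = np.zeros(n_samples)
fault1[fault_start:] = rng.normal(0.0, np.sqrt(0.05), n_samples - fault_start)

# Fault 2: added beta-distributed noise with shape parameters 4 and 1 (inlet flow rate)
fault2 = np.zeros(n_samples)
fault2[fault_start:] = rng.beta(4.0, 1.0, n_samples - fault_start)
```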

Table 1. Process Monitoring Results for the CSTH Case Study

                              PCA            semiparametric     semiparametric
                                             PCA (Spearman)     PCA (distance)
                              T²     SPE     T²      SPE        T²      SPE
  fault detection rate (%)
    fault 1                   68     44      80      15         80      81
    fault 2                   61     60      82      21         80      75
  false alarm rate (%)        1      1       2       1          2       1
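Table 1 reports fault detection and false alarm rates. The article does not write out these formulas, so the standard definitions are assumed in the sketch below: the percentage of post-fault (respectively, pre-fault) samples whose monitoring statistic exceeds its control limit.

```python
import numpy as np

def fault_detection_rate(stat, limit, fault_start):
    """Percentage of post-fault samples flagged by the statistic."""
    alarms = np.asarray(stat) > limit
    return 100.0 * alarms[fault_start:].mean()

def false_alarm_rate(stat, limit, fault_start):
    """Percentage of pre-fault samples falsely flagged."""
    alarms = np.asarray(stat) > limit
    return 100.0 * alarms[:fault_start].mean()
```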

Figure 4. Process monitoring charts for fault 1 of the CSTH case study.


Figure 5. Process monitoring charts for fault 2 of the CSTH case study.

Figure 6. Contribution plots for fault 1.

It can be observed that the fault diagnosis performances of all three techniques are very similar. For the first fault condition, the decomposition of the T² statistic is able to provide the correct diagnosis, namely, abnormal behavior of the temperature. The large increases in both the T² and SPE statistics for scale-invariant PCA and semiparametric PCA (distance correlation) indicate that this fault condition causes significant systematic variations and a disruption of the correlation structure. However, the disturbance is introduced as Gaussian noise, which can be easily captured by the T² statistic, leading to a correct fault diagnosis; the SPE statistic, which detects non-Gaussian variations and disruptions of the correlation structure in the residual space, is less effective in this respect. This observation is also confirmed by the second fault condition, which is introduced as a disturbance with an extremely skewed beta distribution. In this case, the decomposition of the SPE statistic is able to provide the same diagnosis result as the T² statistic, namely, abnormal behavior of the inlet flow rate. This simple case study demonstrates that the proposed technique is effective in the monitoring of nonlinear processes.


Figure 7. Contribution plots for fault 2.

Figure 8. Schematic diagram of the TE process.

4.2. Tennessee Eastman Process. In this section, semiparametric PCA is applied to the more complex benchmark TE process for further evaluation of its effectiveness. The proposed technique is also compared to KPCA, KICA, improved KICA, SPA, and moving-window KPCA (MWKPCA),40 in addition to PCA, for performance assessment. The TE process has five major operating units: an exothermic two-phase reactor, a product condenser, a vapor−liquid separator, a recycle compressor, and a reboiled product stripper. A schematic of the chemical plant is shown in Figure 8, and detailed information on this benchmark process can be found in ref 39. In this case study, 22 continuously measured process variables associated with all five operating units are used for process monitoring. These monitored process variables are summarized in Table 2.


Table 2. Monitored Process Variables for the TE Process

  variable   description                                       units
  X1         A feed (stream 1)                                 kscmh
  X2         D feed (stream 2)                                 kg/h
  X3         E feed (stream 3)                                 kg/h
  X4         A and C feeds (stream 4)                          kscmh
  X5         recycle flow (stream 8)                           kscmh
  X6         reactor feed rate (stream 6)                      kscmh
  X7         reactor pressure                                  kPa gauge
  X8         reactor level                                     %
  X9         reactor temperature                               °C
  X10        purge rate (stream 9)                             kscmh
  X11        separator temperature                             °C
  X12        separator level                                   %
  X13        separator pressure                                kPa gauge
  X14        separator underflow (stream 10)                   m3/h
  X15        stripper level                                    %
  X16        stripper pressure                                 kPa gauge
  X17        stripper underflow (stream 11)                    m3/h
  X18        stripper temperature                              °C
  X19        stripper steam flow                               kg/h
  X20        compressor work                                   kW
  X21        reactor cooling water outlet temperature          °C
  X22        condenser cooling water outlet temperature        °C

Similarly to the first case study, the process monitoring techniques are first trained with normal process data. Because of the larger number of process variables in this case study, the number of training data samples is set to 1000. For KPCA and KICA, radial basis kernels (RBKs) are used for kernel mapping, and the hyperparameters of the RBKs are determined using the methods described in refs 13 and 41. For improved KICA, the lower bound for rejecting samples with high similarity is set to 0.0001, and the number of retained process samples is determined to be 463. For SPA, the nonoverlapping window size for training is set to 20, and the number of training SPs is 50. For each SP, 185 statistics are retained according to the selection criteria presented in the work of Wang and He.19 Note that the number of SPs is significantly smaller than the number of statistics in each SP. To build the subspace model, standard PCA is applied to the SPs to extract PCs. During online monitoring, the monitoring window is forwarded one sample at a time to generate an online SP, and the T² and SPE statistics of the SP are computed for fault detection. For moving-window KPCA, the monitoring window also moves forward one sample at a time, and a KPCA model is built online for each moving window; however, the monitoring statistics of the data samples of a new moving window are computed based on the KPCA model built 50 windows before. MWKPCA is therefore computationally more expensive than KPCA. The required number of PCs to be retained for the proposed technique is determined to be 11 through cross-validation; for a consistent comparison, the number of PCs for all of the other techniques is set to the same value.

For the proposed technique with the Spearman and distance correlations, the training data samples are first transformed through eq 17. The probability density functions of process variables X1 (A feed), X10 (purge rate), and X12 (separator level) are presented in Figure 9; as in the previous example, the transformed process variables follow standard Gaussian distributions.

For real-time monitoring, 20 different fault conditions are preprogrammed into the TE process simulation. In this case study, the first 15 fault conditions, whose fault types are known, are used to test the proposed process monitoring technique. The last five fault conditions are discarded, as it is difficult to assess whether correct fault diagnoses are achieved for unknown fault types. The tested fault conditions are summarized in Table 3. The TE process is monitored for 7200 sample indices, and all of the fault conditions are introduced at sample index 3000.

Figure 9. Probability density functions (TE process) of the original process variables (first row) and the transformed process variables (second row).


Table 3. Tested Fault Conditions for the TE Process Case Study

  fault no.   fault description                                         signal type
  IDV1        A/C feed ratio, B composition constant (stream 4)         step
  IDV2        B composition, A/C feed ratio constant (stream 4)         step
  IDV3        D feed temperature (stream 2)                             step
  IDV4        reactor cooling water inlet temperature                   step
  IDV5        condenser cooling water inlet temperature                 step
  IDV6        A feed loss (stream 1)                                    step
  IDV7        C header pressure loss, reduced availability (stream 4)   step
  IDV8        A, B, C feed composition (stream 4)                       random variation
  IDV9        D feed temperature (stream 2)                             random variation
  IDV10       C feed temperature (stream 4)                             random variation
  IDV11       reactor cooling water inlet temperature                   random variation
  IDV12       condenser cooling water inlet temperature                 random variation
  IDV13       reaction kinetics                                         slow drift
  IDV14       reactor cooling water valve                               sticking
  IDV15       condenser cooling water valve                             sticking

The process monitoring results for the TE process case study are summarized in Table 4. For IDV3, -4, -5, -7, -9, and -15, the PID controllers associated with the process variables relating to the fault conditions quickly correct the abnormal variations and steer the process operation back to normal. This leads to low fault detection rates for all of the techniques. However, the proposed technique with both the Spearman and distance correlations is able to provide double-digit fault detection rates, giving clearer indications of the faults.

For all other fault conditions, the SPA method yields the best process monitoring performance, in particular for IDV11 and IDV12, where fault detection rates of greater than 90% are achieved. The main reason for this superior performance is the ability of SPA to explore high-order statistics and to take into account autocorrelation and cross-correlation between process data samples. Semiparametric PCA with the distance correlation matrix offers the second-best performance. The first reason for the high performance of the proposed PCA technique is that the distance correlation is able to model any type of nonlinear relationship between process variables as long as they are not statistically independent. In comparison, the performances of KPCA, KICA, and improved KICA in modeling nonlinear relationships rely heavily on the selection of the kernel and its corresponding hyperparameters, and a best practice for selecting these parameters might not exist. In addition, kernel mapping is not only expensive but also intractable; therefore, there is no guarantee that all nonlinear relationships between process variables are accommodated in the kernel feature space. These limitations of KPCA, KICA, and improved KICA are the main reasons for their suboptimal performances in the TE process case study. The second reason is that the semiparametric Gaussian transformation converts all types of variations into their Gaussian counterparts while still preserving crucial statistical information (mean, variance, and correlation); this enables more useful features to be retained in the feature space. Likewise, for online monitoring, all of the real-time data samples are transformed through the same semiparametric transformation such that they also follow Gaussian distributions. Note that the mean and variance of the training data samples are preserved through this online transformation as well, which ensures that the transformed online data samples follow exactly the same Gaussian distribution as the training data. Because the transformation is monotonic and rank-preserving, the relative deviation of the online data from the normal operating region is also preserved. Subsequently, the abnormal deviations of the faulty data samples from normal operating conditions are converted into abnormal deviations from the mean under the Gaussian transformation, which can be effectively quantified by the T² and SPE statistics, leading to improved fault detection performance. The process monitoring charts for IDV11 and IDV12 for all of the compared techniques are shown in Figures 10 and 11, respectively.

Table 4. Process Monitoring Results for the TE Process Case Study

fault detection rate (%)

  fault   PCA            KPCA           KICA           improved KICA  SPA            MWKPCA         semiparametric  semiparametric
  no.                                                                                               PCA (Spearman)  PCA (distance)
          T²     SPE     T²     SPE     I²     SPE     I²     SPE     T²     SPE     T²     SPE     T²     SPE      T²     SPE
  1       99.69  99.87   100    100     100    100     99.79  99.90   100    100     36.24  97.33   99.86  99.36    99.76  99.26
  2       98.67  99.61   99.18  99.22   99.23  100     99.40  99.55   99.87  100     6.95   96.88   99.55  96.69    99.02  99.60
  3       2.54   1.54    2.75   0.03    8.8    1.5     11.95  15.07   0      2.88    2.14   0.26    15.56  13.81    15.55  14.27
  4       1.35   1.06    1.42   0.38    1.47   1.3     4.83   10.76   0.4    1.71    3.64   0.74    13.21  10.81    20.45  22.79
  5       1.28   1.17    1      0       1.38   1.83    3.79   9.40    0.21   0.98    1.55   0.21    12.76  10.29    19.71  21.83
  6       99.79  100     43.25  100     97.69  100     100    100     100    100     31.14  37.07   100    100      100    100
  7       2.37   4.14    5.76   4.85    2.33   5.04    13.38  17.52   4.95   9.36    2.75   1.45    20.07  18.81    27.02  24.32
  8       90.83  93.46   92.36  98      74.89  93      95.19  95.62   98.12  97.76   96.05  95.64   98.07  97.81    97.90  98.88
  9       1.55   1       2.38   0       1.45   1.8     10.45  26.45   2.88   23.81   10.50  0.74    24.45  20.67    38.02  37.60
  10      1.68   40.68   21.17  74.65   67.8   71      88.19  88.57   74.45  96.81   74.62  68.38   70.9   46.26    86.77  56.12
  11      22.68  51.34   84.31  54.55   55.21  79.21   92.05  94.31   97.64  98.86   77.86  81.48   86.31  90.43    96.31  93.12
  12      33.43  22.65   35.09  25.07   37.65  9.88    47.50  38.83   66.19  90.98   66.81  25.55   51.19  43.52    47.26  46.14
  13      82.57  94      91.68  93.17   91.2   96.13   94.02  94.55   93.26  93.29   91.86  90.86   96.1   94.33    94.90  95.86
  14      27.23  99.41   98.52  95.17   97.5   100     100    100     100    100     97.29  96.86   93.76  54.45    96.68  99.87
  15      0.84   1.25    1.21   0.5     3.01   3.87    4.69   11.38   0      1.76    4.19   2.98    13.95  10.52    14.45  12.93

false alarm rate (%)

          1.38   0.8     1.21   1.29    2.8    1.45    2.97   1.63    2.47   1.87    2.21   1.52    1.2    1.3      1.23   1.65


Figure 10. TE process monitoring charts for IDV11.

Figure 11. TE process monitoring charts for IDV12.

On the other hand, both KPCA and KICA provide better results than basic PCA, and improved KICA outperforms conventional KICA, particularly for fault conditions IDV3, IDV7, IDV11, and IDV12. This is attributed to the fact that nonrelevant data samples are discarded: the remaining data samples for improved KICA better capture the distinct features of normal process operation. In contrast, moving-window KPCA, which is also capable of capturing the dynamic behavior of process operation, provides unsatisfactory performance. The reason for this deficiency is that, after a fault is injected into the process, faulty data samples are included in the KPCA model built for each monitoring window as the window moves forward. If the system becomes stabilized, the patterns of the faulty data samples in different monitoring windows become very similar to each other, so the KPCA models for the successive windows capture very similar features, leading to degraded performance. This is confirmed by the low detection rates for fault conditions IDV1, IDV6, and IDV7, where the system is subject to step (static) faults and becomes stabilized shortly thereafter. For varying fault conditions, such as IDV10, IDV11, and IDV12, moving-window KPCA outperforms KICA. The performance of standard PCA was found to be the worst for almost all fault conditions because of its assumption of ideal process characteristics (linear relationships and purely Gaussian variations).

Another major advantage of semiparametric PCA as compared to KPCA, KICA, SPA, and MWKPCA is that fault diagnosis through reverse projection is possible. To demonstrate this, the fault diagnosis results for IDV11 are presented in Figure 12. For IDV11, the increased random variation in the cooling water supply to the reactor tank disrupts the balance of the chemical reaction inside the tank. It is not difficult to infer that the first monitored process variable to be directly affected is the reactor temperature (X9); the second monitored process variable impacted is the reactor cooling water outlet temperature (X21). This analysis conforms with the fault diagnosis results of semiparametric PCA, in which X9 and X21 exhibit significant contributions. In comparison, the SPE statistic is able to identify only the second-most related process variable, and the T² statistic fails completely in fault diagnosis. This case study demonstrates that the proposed semiparametric PCA provides satisfactory performance for relatively complex processes. Although other techniques similar to SPA might offer better process monitoring performance, the proposed technique is simpler and more efficient. These two features establish a firm basis for applying the proposed technique to larger-scale real industrial processes in which there are substantially more process variables.

Figure 12. TE process fault diagnosis results for IDV11.

5. CONCLUSIONS

In this article, a new formulation of the PCA-based technique is proposed to overcome the major weaknesses of traditional PCA. The major novelties of this formulation are as follows: The semiparametric Gaussian transformation is able to recondition the process data to follow a multivariate Gaussian distribution while preserving the important statistical information and the correlation structure of the process data. This allows the standard PCA procedure to be effectively applied to explain non-Gaussian variance. In addition, the use of nonlinear correlation measures makes it possible to retain nonlinear correlation structures between process variables in the feature space. Finally, the proposed technique is computationally more efficient than KPCA and KICA and does not require the delicate tuning of hyperparameters needed by SPA and MWKPCA. Meanwhile, because expensive kernel mapping is avoided, fault diagnosis through reverse projection is also possible with the proposed technique. These strengths of semiparametric PCA were confirmed in two case studies, whose results demonstrate that the proposed technique is able to provide satisfactory process monitoring performance at a low computational cost.

The proposed semiparametric transformation preserves the first two moments of the process data. The first two moments are adequate for this study, as the transformed process data follow multivariate Gaussian distributions; however, a significant amount of statistical information is still stored in the higher-order moments. Additionally, the proposed technique, like conventional PCA, assumes that the latent space is static and is therefore not able to deal with process dynamics and multimode operation. In future work, the semiparametric transformation will be improved to preserve the higher-order moments of the process data, and the proposed technique will be integrated with recursive PCA for dynamic process monitoring and a mixture model for multimode process monitoring.



AUTHOR INFORMATION

Corresponding Author

*E-mail: fi[email protected].

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

H.Y. gratefully acknowledges the financial support provided as a graduate study scholarship by the Australian Maritime College and the University of Tasmania. F.K. gratefully acknowledges the financial support provided by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Vale Research Chair Grant that enabled this collaborative work.



REFERENCES

(1) Borgognone, M. G.; Bussi, J.; Hough, G. Principal component analysis in sensory analysis: Covariance or correlation matrix? Food Qual. Preference 2001, 12 (5), 323−326.
(2) Kettaneh, N.; Berglund, A.; Wold, S. PCA and PLS with very large data sets. Comput. Stat. Data Anal. 2005, 48 (1), 69−85.
(3) Bakshi, B. R. Multiscale PCA with application to multivariate statistical process monitoring. AIChE J. 1998, 44 (7), 1596−1610.
(4) Li, W.; Yue, H. H.; Valle-Cervantes, S.; Qin, S. J. Recursive PCA for adaptive process monitoring. J. Process Control 2000, 10 (5), 471−486.
(5) Wang, X.; Kruger, U.; Irwin, G. W. Process monitoring approach using fast moving window PCA. Ind. Eng. Chem. Res. 2005, 44 (15), 5691−5702.
(6) Chen, J.; Liu, K.-C. On-line batch process monitoring using dynamic PCA and dynamic PLS models. Chem. Eng. Sci. 2002, 57 (1), 63−75.
(7) Jolliffe, I. Principal Component Analysis. In Encyclopedia of Statistics in Behavioral Science; Everitt, B. S., Howell, D., Eds.; Wiley: New York, 2005; available at http://onlinelibrary.wiley.com/doi/10.1002/0470013192.bsa501/abstract (accessed Sept 6, 2015).
(8) Hyvärinen, A.; Oja, E. Independent component analysis: Algorithms and applications. Neural Networks 2000, 13 (4), 411−430.
(9) Hyvärinen, A.; Oja, E. A fast fixed-point algorithm for independent component analysis. Neural Comput. 1997, 9 (7), 1483−1492.
(10) Lee, J. M.; Qin, S. J.; Lee, I. B. Fault detection and diagnosis based on modified independent component analysis. AIChE J. 2006, 52 (10), 3501−3514.
(11) Jiang, Q.; Yan, X. Non-Gaussian chemical process monitoring with adaptively weighted independent component analysis and its applications. J. Process Control 2013, 23 (9), 1320−1331.
(12) Jiang, Q.; Yan, X.; Tong, C. Double-weighted independent component analysis for non-Gaussian chemical process monitoring. Ind. Eng. Chem. Res. 2013, 52 (40), 14396−14405.
(13) Lee, J. M.; Qin, S. J.; Lee, I. B. Fault detection of non-linear processes using kernel independent component analysis. Can. J. Chem. Eng. 2007, 85 (4), 526−536.
(14) Zhang, Y. Fault detection and diagnosis of nonlinear processes using improved kernel independent component analysis (KICA) and support vector machine (SVM). Ind. Eng. Chem. Res. 2008, 47 (18), 6961−6971.
(15) Zhang, Y.; Qin, S. J. Improved nonlinear fault detection technique and statistical analysis. AIChE J. 2008, 54 (12), 3207−3220.
(16) Yu, J. A nonlinear kernel Gaussian mixture model based inferential monitoring approach for fault detection and diagnosis of chemical processes. Chem. Eng. Sci. 2012, 68 (1), 506−519.
(17) Yu, H.; Khan, F.; Garaniya, V. A probabilistic multivariate method for fault diagnosis of industrial processes. Chem. Eng. Res. Des. 2015, 104, 306−318.
(18) Jiang, Q.; Yan, X. Nonlinear plant-wide process monitoring using MI-spectral clustering and Bayesian inference-based multiblock KPCA. J. Process Control 2015, 32, 38−50.
(19) Wang, J.; He, Q. P. Multivariate statistical process monitoring based on statistics pattern analysis. Ind. Eng. Chem. Res. 2010, 49 (17), 7858−7869.
(20) Sorsa, T.; Koivo, H. N. Application of artificial neural networks in process fault diagnosis. Automatica 1993, 29 (4), 843−849.
(21) Chiang, L. H.; Kotanchek, M. E.; Kordon, A. K. Fault diagnosis based on Fisher discriminant analysis and support vector machines. Comput. Chem. Eng. 2004, 28 (8), 1389−1401.
(22) Yélamos, I.; Escudero, G.; Graells, M.; Puigjaner, L. Performance assessment of a novel fault diagnosis system based on support vector machines. Comput. Chem. Eng. 2009, 33 (1), 244−255.
(23) Yu, J. A Bayesian inference based two-stage support vector regression framework for soft sensor development in batch bioprocesses. Comput. Chem. Eng. 2012, 41, 134−144.
(24) Jack, L.; Nandi, A. Fault detection using support vector machines and artificial neural networks, augmented by genetic algorithms. Mech. Syst. Signal Process. 2002, 16 (2), 373−390.
(25) Yu, J. A support vector clustering-based probabilistic method for unsupervised fault detection and classification of complex chemical processes using unlabeled data. AIChE J. 2013, 59 (2), 407−419.
(26) Chen, J.; Liao, C.-M. Dynamic process fault monitoring based on neural network and PCA. J. Process Control 2002, 12 (2), 277−289.
(27) Gonzaga, J.; Meleiro, L.; Kiang, C.; Maciel Filho, R. ANN-based soft-sensor for real-time process monitoring and control of an industrial polymerization process. Comput. Chem. Eng. 2009, 33 (1), 43−49.
(28) Trivedi, P. K.; Zimmer, D. M. Copula Modeling: An Introduction for Practitioners; Now Publishers Inc.: Delft, The Netherlands, 2007.
(29) Rüschendorf, L. On the distributional transform, Sklar's theorem, and the empirical copula process. J. Stat. Plann. Inference 2009, 139 (11), 3921−3927.
(30) Liu, H.; Han, F.; Yuan, M.; Lafferty, J.; Wasserman, L. High-dimensional semiparametric Gaussian copula graphical models. Ann. Stat. 2012, 40 (4), 2293−2326.
(31) Myers, J. L.; Well, A.; Lorch, R. F. Research Design and Statistical Analysis; Routledge: New York, 2010.
(32) Székely, G. J.; Rizzo, M. L.; Bakirov, N. K. Measuring and testing dependence by correlation of distances. Ann. Stat. 2007, 35 (6), 2769−2794.
(33) Sklar, A. Random variables, distribution functions, and copulas: A personal look backward and forward. Lect. Notes-Monogr. Ser. 1996, 1−14.
(34) Liu, H.; Lafferty, J.; Wasserman, L. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res. 2009, 10, 2295−2328.
(35) Conover, W. J. Practical Nonparametric Statistics, 3rd ed.; John Wiley & Sons: Hoboken, NJ, 1999.
(36) Bro, R.; Kjeldahl, K.; Smilde, A.; Kiers, H. Cross-validation of component models: A critical look at current methods. Anal. Bioanal. Chem. 2008, 390 (5), 1241−1251.
(37) Botev, Z.; Grotowski, J.; Kroese, D. Kernel density estimation via diffusion. Ann. Stat. 2010, 38 (5), 2916−2957.
(38) Thornhill, N. F.; Patwardhan, S. C.; Shah, S. L. A continuous stirred tank heater simulation model with applications. J. Process Control 2008, 18 (3), 347−360.
(39) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17 (3), 245−255.
(40) Liu, X.; Kruger, U.; Littler, T.; Xie, L.; Wang, S. Moving window kernel PCA for adaptive monitoring of nonlinear processes. Chemom. Intell. Lab. Syst. 2009, 96 (2), 132−143.
(41) Lee, J.-M.; Yoo, C.; Choi, S. W.; Vanrolleghem, P. A.; Lee, I.-B. Nonlinear process monitoring using kernel principal component analysis. Chem. Eng. Sci. 2004, 59 (1), 223−234.
