Nonlinear Process Monitoring Using Data-Dependent Kernel Global


Article

Nonlinear process monitoring using data-dependent kernel global-local preserving projections
Lijia Luo, Shiyi Bao, Jianfeng Mao, and Di Tang
Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.5b02266 • Publication Date (Web): 21 Oct 2015
Downloaded from http://pubs.acs.org on October 29, 2015


Industrial & Engineering Chemistry Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Industrial & Engineering Chemistry Research

Nonlinear process monitoring using data-dependent kernel global-local preserving projections

Lijia Luo†∗, Shiyi Bao†, Jianfeng Mao†, Di Tang†

† College of Mechanical Engineering, Zhejiang University of Technology, Engineering Research Center of Process Equipment and Remanufacturing, Ministry of Education, Hangzhou, China

ABSTRACT: In this paper, a new nonlinear dimensionality reduction method called data-dependent kernel global-local preserving projections (DDKGLPP) is proposed and used for process monitoring. To improve performance, DDKGLPP uses a data-dependent kernel rather than a conventional kernel. A unified kernel optimization framework is developed to optimize the data-dependent kernel by minimizing a data structure preserving index. The optimized kernel can unfold both global and local data structures in the feature space. Data-dependent kernel principal component analysis (DDKPCA) and data-dependent kernel locality preserving projections (DDKLPP) can also be developed under the unified kernel optimization framework. However, unlike DDKPCA and DDKLPP, DDKGLPP preserves both global and local structures of the data set when performing dimensionality reduction. Consequently, DDKGLPP is more powerful in capturing useful data characteristics. A DDKGLPP-based monitoring method is then proposed for nonlinear processes. Its performance is tested on a simple nonlinear system and the Tennessee Eastman (TE) process. The results show that the DDKGLPP-based method has much higher fault detection rates and better fault sensitivity than methods based on KPCA, KGLPP, DDKPCA and DDKLPP.



∗ Corresponding author. Tel.: +86 (0571) 88320349.

E-mail address: [email protected] (L.J. Luo).


1. Introduction

Process monitoring helps maintain normal production and ensure process safety by giving early warning of abnormal operating conditions caused by process faults.1-2 Multivariate statistical process monitoring (MSPM) has attracted a great deal of attention in recent decades.3-6 A common feature of MSPM methods is that they mainly utilize process data and rarely rely on process mechanisms or prior knowledge. Benefiting from this data-driven nature, MSPM is much easier to implement in real processes than conventional model-based techniques.7 Because the key information for characterizing process status is often masked by severe coupling among process variables and by noise in raw measurement data, MSPM typically employs dimensionality reduction techniques to seek a reduced space, which captures the main features of the process data, and a residual space, which mainly contains redundant information and noise.4,8 It then monitors data variations in the reduced space and the residual space simultaneously by constructing monitoring statistics to detect process faults.8 Many MSPM methods have been developed on the basis of conventional multivariate statistical methods, including principal component analysis (PCA),1 Fisher discriminant analysis (FDA),9 locality preserving projections (LPP),10 neighborhood preserving embedding (NPE),11 etc. Among them, PCA is the most popular. PCA maps data into a reduced space that preserves most of the variance information, i.e., the global Euclidean structure, of the data set.12 However, PCA ignores the local neighborhood structure of the data set, which is very important for characterizing topological relations among data points. The performance of PCA is thus degraded by the loss of local data structure.13 LPP and NPE, contrary to PCA, focus only on preserving the local neighborhood structure of the data set,14 while neglecting the global data structure.
They may map data into a narrow area, resulting in the loss of data variance.13 To faithfully represent data characteristics in a reduced space, several new linear dimensionality reduction methods have been developed by combining PCA with LPP or NPE.13,15-19 These methods can preserve both global and local structures of the data set. For instance, Luo13 proposed a global-local preserving projections (GLPP) method, which naturally unifies PCA and LPP under the same framework. Ma et al.18 proposed a local and nonlocal embedding (LNLE) method by integrating PCA with NPE. Miao et al.19 developed a nonlocal structure constrained neighborhood preserving embedding (NSC-NPE) method by adding a nonlocal sub-objective function to the objective function of NPE, which is effectively a combination of PCA and NPE. All these methods have been verified to perform better than PCA, LPP or NPE.

Applying the above linear methods to nonlinear process data may lead to poor and unreliable results, because they cannot adequately describe the nonlinear relations between process variables. To handle this problem, some nonlinear process monitoring methods have been proposed using the kernel trick, such as kernel principal component analysis (KPCA),20,21 kernel Fisher discriminant analysis (KFDA),22 and so on. These kernel methods carry out the corresponding linear algorithms in a higher-dimensional feature space, which is nonlinearly related to the input space and implicitly defined by a kernel function. The kernel function plays an important role in kernel methods, and an inappropriate kernel function may result in poor performance. However, it is difficult to choose an optimal kernel function. Most existing kernel methods simply use one of several standard kernel functions, such as Gaussian and polynomial kernels.23 A standard kernel function cannot ensure optimal performance, because it may not reflect the main data characteristics.
Therefore, considerable effort has been dedicated to optimizing the parameters of standard kernel functions or developing new kernel functions.24-29 In particular, a general form of data-dependent kernel was developed in refs 27-29. This data-dependent kernel can be easily optimized according to a specific objective to achieve performance improvements.28 By integrating this data-dependent kernel into KPCA, Shao et al.30 proposed an effective monitoring method for nonlinear processes. Moreover, Weinberger et al.26,31 proposed a semidefinite embedding (SDE) method as a particular form of KPCA. Unlike KPCA, SDE automatically learns an optimized kernel matrix from training data by solving a semidefinite programming (SDP) problem. SDE is more powerful than KPCA in extracting the underlying data manifold.31 Based on SDE, Shao and Rong32 as well as Liu et al.33 developed two nonlinear process monitoring methods, which have better monitoring abilities than conventional KPCA-based methods.

In this paper, a new nonlinear dimensionality reduction method called data-dependent kernel global-local preserving projections (DDKGLPP) is proposed. First, kernel GLPP (KGLPP) is developed as a nonlinear extension of GLPP.13 KGLPP can simultaneously preserve global and local data structures. KPCA and kernel LPP (KLPP) are shown to be two special cases of KGLPP. Then, employing the data-dependent kernel model,28 a unified kernel optimization framework is proposed that minimizes a data structure preserving index. This unified kernel optimization framework is applicable to KGLPP, KPCA and KLPP. The optimized data-dependent kernel can effectively unfold global and local data structures in the feature space. In this way, the performance of KGLPP, KPCA and KLPP is improved. Finally, a nonlinear process monitoring method is developed based on DDKGLPP. Its performance is tested in two case studies: a simple nonlinear system and the Tennessee Eastman (TE) process. The results demonstrate that the DDKGLPP-based monitoring method is significantly better than methods based on KPCA, KGLPP, DDKPCA and DDKLPP.


2. Kernel global-local preserving projections (KGLPP)

2.1. Algorithm description

Global-local preserving projections (GLPP) is a linear dimensionality reduction algorithm that simultaneously preserves the global Euclidean structure (i.e., data variance) and the local neighborhood structure of the data set.13 To extend the linear GLPP to the nonlinear case, a kernel GLPP (KGLPP) method is developed. Given an m-dimensional data set $X = [x_1, x_2, \ldots, x_n] \in \Re^{m \times n}$ with n samples, KGLPP first maps X from the input space into a feature space through a nonlinear mapping $\phi: \Re^m \to F^h$. Denote the mapped data set in the feature space as $\phi(X) = [\phi(x_1), \phi(x_2), \ldots, \phi(x_n)] \in F^{h \times n}$, and suppose it is centered, i.e., $\sum_{i=1}^{n} \phi(x_i) = 0$. KGLPP then seeks a low-dimensional representation $Y = [y_1, y_2, \ldots, y_n] \in F^{l \times n}$ $(l \le h)$ of $\phi(X)$, where $y_i = V^T \phi(x_i)$ $(i = 1, 2, \ldots, n)$ is the projection of $\phi(x_i)$ and $V = [v_1, v_2, \ldots, v_l] \in F^{h \times l}$ is a transfer matrix, such that Y optimally preserves both global and local structures of the data set X. To find each transfer vector v in matrix V, KGLPP minimizes the following objective function

$$
\begin{aligned}
J_{KGLPP}(v) &= \eta J_{Local}(v) + (1-\eta) J_{Global}(v) \\
&= \frac{1}{2}\Big\{ \eta \sum_{ij} (y_i - y_j)^2 W_{ij} - (1-\eta) \sum_{ij} (y_i - y_j)^2 \hat{W}_{ij} \Big\} \\
&= \eta \Big( \sum_i y_i D_{ii} y_i^T - \sum_{ij} y_i W_{ij} y_j^T \Big) - (1-\eta) \Big( \sum_i y_i \hat{D}_{ii} y_i^T - \sum_{ij} y_i \hat{W}_{ij} y_j^T \Big) \\
&= v^T \phi(X) \{ \eta (D - W) - (1-\eta)(\hat{D} - \hat{W}) \} \phi(X)^T v \\
&= v^T \phi(X) \{ \eta L - (1-\eta) \hat{L} \} \phi(X)^T v \\
&= v^T \phi(X) M \phi(X)^T v
\end{aligned}
\tag{1}
$$

where $y_i = v^T \phi(x_i) \in \Re$, $\eta \in [0,1]$ is a tradeoff coefficient, $W_{ij}$ and $\hat{W}_{ij}$ are weight coefficients that represent adjacent and nonadjacent relations between $x_i$ and $x_j$, respectively, $W$ and $\hat{W} \in \Re^{n \times n}$ are weight matrices, $D$ and $\hat{D} \in \Re^{n \times n}$ are diagonal matrices with diagonal entries $D_{ii} = \sum_j W_{ij}$ and $\hat{D}_{ii} = \sum_j \hat{W}_{ij}$, and, as read from Eq. (1), $L = D - W$, $\hat{L} = \hat{D} - \hat{W}$ and $M = \eta L - (1-\eta)\hat{L}$. The sub-objective function $J_{Local} = \frac{1}{2}\sum_{ij} (y_i - y_j)^2 W_{ij}$ is the weighted sum of the pairwise squared distances between adjacent data points, which is associated with local structure preservation. Minimizing $J_{Local}$ forces adjacent data points in X to be mapped nearby in Y. The sub-objective function $J_{Global} = -\frac{1}{2}\sum_{ij} (y_i - y_j)^2 \hat{W}_{ij}$ is the negative weighted sum of the pairwise squared distances between nonadjacent data points, corresponding to global structure preservation. Minimizing $J_{Global}$ forces nonadjacent data points in X to be mapped far apart in Y. The weight coefficients $W_{ij}$ and $\hat{W}_{ij}$ can be defined in the "Heat kernel" form13,34

$$
W_{ij} = \begin{cases} e^{-\|x_i - x_j\|^2 / \sigma_1} & \text{if } x_j \in \Omega(x_i) \text{ or } x_i \in \Omega(x_j) \\ 0 & \text{otherwise} \end{cases}
\tag{2}
$$

$$
\hat{W}_{ij} = \begin{cases} e^{-\|x_i - x_j\|^2 / \sigma_2} & \text{if } x_j \notin \Omega(x_i) \text{ and } x_i \notin \Omega(x_j) \\ 0 & \text{otherwise} \end{cases}
\tag{3}
$$

or in the "Binary" form

$$
W_{ij} = \begin{cases} 1 & \text{if } x_j \in \Omega(x_i) \text{ or } x_i \in \Omega(x_j) \\ 0 & \text{otherwise} \end{cases}
\tag{4}
$$

$$
\hat{W}_{ij} = \begin{cases} 1 & \text{if } x_j \notin \Omega(x_i) \text{ and } x_i \notin \Omega(x_j) \\ 0 & \text{otherwise} \end{cases}
\tag{5}
$$

where $\sigma_1$ and $\sigma_2$ are empirical constants, and $\Omega(x)$ denotes the neighborhood of x, defined by its k nearest neighbors. The relative importance of global and local structure preservation is controlled by the tradeoff coefficient η. One can either simply choose a value of η between 0 and 1, or calculate η by the following principle13

$$
\eta S_{Local} = (1-\eta) S_{Global} \;\Rightarrow\; \eta \rho(L) = (1-\eta) \rho(\hat{L}) \;\Rightarrow\; \eta = \frac{\rho(\hat{L})}{\rho(L) + \rho(\hat{L})}
\tag{6}
$$

where $S_{Local} = \rho(L)$ and $S_{Global} = \rho(\hat{L})$ are scales of $J_{Local}(v)$ and $J_{Global}(v)$, $\rho(\cdot)$ denotes the spectral radius of a matrix, and the matrices $L$ and $\hat{L}$ are defined in Eq. (1). The tradeoff coefficient η obtained via Eq. (6) balances the relative importance of the two sub-objective functions $J_{Local}(v)$ and $J_{Global}(v)$. Imposing the constraint $v^T (\eta \phi(X) H \phi(X)^T + (1-\eta) I) v = 1$ with $H = \eta D - (1-\eta)\hat{D}$ on Eq. (1), the final optimization problem of KGLPP is

$$
\begin{aligned}
&\min_{v} \; v^T \phi(X) M \phi(X)^T v \\
&\text{s.t.} \;\; v^T (\eta \phi(X) H \phi(X)^T + (1-\eta) I) v = 1
\end{aligned}
\tag{7}
$$

Eq. (7) can be converted to an eigenvector problem

$$
\begin{aligned}
&\phi(X) M \phi(X)^T v = \lambda (\eta \phi(X) H \phi(X)^T + (1-\eta) I) v \\
\Rightarrow\; &\phi(X)^T \phi(X) M \phi(X)^T v = \lambda \phi(X)^T (\eta \phi(X) H \phi(X)^T + (1-\eta) I) v
\end{aligned}
\tag{8}
$$

Because the eigenvectors of Eq. (8) lie in the span of $\phi(x_1), \phi(x_2), \ldots, \phi(x_n)$, they can be expressed as

$$
v = \sum_{i=1}^{n} \alpha_i \phi(x_i) = \phi(X) \alpha
\tag{9}
$$

where $\alpha_1, \alpha_2, \ldots, \alpha_n$ are expansion coefficients, and $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_n]^T \in \Re^n$. Similar to KPCA, a kernel matrix $K \in \Re^{n \times n}$ is defined for computing dot products of data points in the feature space

$$
K_{ij} = k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) = \phi(x_i)^T \phi(x_j)
\tag{10}
$$

and K can be centered to35

$$
\bar{K} = K - KS - SK + SKS
\tag{11}
$$

where $S \in \Re^{n \times n}$ is a matrix with all entries equal to 1/n. Combining Eqs. (8)-(11) gives

$$
\bar{K} M \bar{K} \alpha = \lambda (\eta \bar{K} H \bar{K} + (1-\eta) \bar{K}) \alpha
\tag{12}
$$
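The matrices entering Eq. (12) can be assembled directly from Eqs. (1)-(3) and (6). The following is a minimal sketch, not the authors' implementation: the function name `glpp_matrices` and its arguments are hypothetical, samples are stored as rows (the paper stores them as columns), all samples are assumed distinct, and the diagonal of $\hat{W}$ is treated as nonadjacent, which is consistent with $\hat{W} = 1_{n \times n}$ in Section 2.2 but is an assumption here.

```python
import numpy as np

def glpp_matrices(X, k=3, sigma1=1.0, sigma2=1.0, eta=None):
    """Build W, W_hat (Eqs. 2-3), eta (Eq. 6), M and H; rows of X are samples."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared distances

    # k nearest neighbours of each point (column 0 of argsort is the point itself,
    # assuming distinct samples)
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nn.ravel()] = True
    adj |= adj.T                      # x_j in Omega(x_i) OR x_i in Omega(x_j)

    W = np.where(adj, np.exp(-d2 / sigma1), 0.0)        # Eq. (2)
    # Eq. (3); the diagonal is counted as nonadjacent (assumption)
    W_hat = np.where(~adj, np.exp(-d2 / sigma2), 0.0)

    D, D_hat = np.diag(W.sum(axis=1)), np.diag(W_hat.sum(axis=1))
    L, L_hat = D - W, D_hat - W_hat

    if eta is None:                   # Eq. (6): balance the two spectral radii
        rho = lambda A: np.abs(np.linalg.eigvalsh(A)).max()
        eta = rho(L_hat) / (rho(L) + rho(L_hat))

    M = eta * L - (1.0 - eta) * L_hat
    H = eta * D - (1.0 - eta) * D_hat
    return W, W_hat, eta, M, H
```

Replacing the two `np.exp(...)` terms with `1.0` gives the "Binary" weights of Eqs. (4)-(5).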

Let $\alpha_1, \alpha_2, \ldots, \alpha_l$ be the eigenvectors of Eq. (12) that correspond to the l smallest eigenvalues, namely $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_l$. The projections (i.e., score vector) $y_k = [y_{k,1}, y_{k,2}, \ldots, y_{k,l}]^T \in \Re^l$ of the kth sample $\phi(x_k)$ onto the transfer vectors $v_j$ $(j = 1, \ldots, l)$ are computed by

$$
y_{k,j} = v_j \cdot \phi(x_k) = \sum_{i=1}^{n} \alpha_{j,i} (\phi(x_i) \cdot \phi(x_k)) = \sum_{i=1}^{n} \alpha_{j,i} k(x_i, x_k) = \bar{k}(X, x_k)^T \alpha_j
\tag{13}
$$

where $\alpha_j = [\alpha_{j,1}, \alpha_{j,2}, \ldots, \alpha_{j,n}]^T \in \Re^n$, and $\bar{k}(X, x_k)$ is the kth column vector of $\bar{K}$. Suppose $x_{new}$ is a new sample, corresponding to a sample $\phi(x_{new})$ in the feature space. Its projections are

$$
y_{new,j} = v_j \cdot \phi(x_{new}) = \sum_{i=1}^{n} \alpha_{j,i} (\phi(x_{new}) \cdot \phi(x_i)) = \sum_{i=1}^{n} \alpha_{j,i} k(x_{new}, x_i) = \bar{k}(X, x_{new})^T \alpha_j
\tag{14}
$$

with $\bar{k}(X, x_{new})$ being

$$
\bar{k}(X, x_{new}) = k(X, x_{new}) - Ks - Sk(X, x_{new}) + SKs
\tag{15}
$$

where $k(X, x_{new}) = [k(x_1, x_{new}), \ldots, k(x_n, x_{new})]^T \in \Re^n$ and $s = [1/n, \ldots, 1/n]^T \in \Re^n$.
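Eqs. (11), (12) and (15) can be sketched in code as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, and the solver works in the range of $\bar{K}$ (writing $\bar{K} = Z^T Z$) because $\bar{K}$ is singular after centering. It also assumes that the reduced right-hand matrix $\eta Z H Z^T + (1-\eta) I$ is positive definite, which `scipy.linalg.eigh` requires but which is not guaranteed for every η, since H can be indefinite.

```python
import numpy as np
from scipy.linalg import eigh

def center_kernel(K):
    """Eq. (11): K_bar = K - K S - S K + S K S, with every entry of S equal to 1/n."""
    n = K.shape[0]
    S = np.full((n, n), 1.0 / n)
    return K - K @ S - S @ K + S @ K @ S

def center_kernel_vector(K, k_new):
    """Eq. (15): center k(X, x_new) using the uncentered training kernel matrix K."""
    n = K.shape[0]
    S = np.full((n, n), 1.0 / n)
    s = np.full(n, 1.0 / n)
    return k_new - K @ s - S @ k_new + S @ K @ s

def kglpp_coefficients(Kc, M, H, eta, n_components, tol=1e-10):
    """Eigenpairs of Eq. (12) for the smallest eigenvalues (sketch, see assumptions)."""
    w, U = np.linalg.eigh(Kc)
    keep = w > tol * w.max()                     # drop the null space of Kc
    Z = (U[:, keep] * np.sqrt(w[keep])).T        # r x n, so that Kc = Z.T @ Z
    A = Z @ M @ Z.T
    B = eta * Z @ H @ Z.T + (1.0 - eta) * np.eye(Z.shape[0])
    lam, V = eigh(A, B)                          # ascending; needs B positive definite
    alpha = (U[:, keep] / np.sqrt(w[keep])) @ V[:, :n_components]
    return lam[:n_components], alpha
```

With `alpha` in hand, the scores of a new sample follow Eq. (14): `alpha.T @ center_kernel_vector(K, k_new)`. If the positive-definiteness assumption fails, a general (non-symmetric) generalized eigensolver would be needed instead.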

2.2. Two special cases of KGLPP

Two limit values of η lead to two special cases of KGLPP. If η = 1, Eq. (12) reduces to

$$
\bar{K} L \bar{K} \alpha = \lambda \bar{K} D \bar{K} \alpha
\tag{16}
$$

which is precisely the eigenvector problem of KLPP, as shown in Eq. (3) in ref 14. This indicates that KLPP is a special case of KGLPP, which focuses on preserving the local data structure but ignores the global data structure. If η = 0 and $\hat{W} = 1_{n \times n}$, i.e., $\hat{W}_{ij} = 1$, Eq. (8) reduces to

$$
-\phi(X) \hat{L} \phi(X)^T v = \lambda v
\tag{17}
$$

According to Eq. (1), with $\hat{W}_{ij} = 1$ and $\hat{D}_{ii} = \sum_j \hat{W}_{ij} = n$, we have

$$
\begin{aligned}
v^T \phi(X) \hat{L} \phi(X)^T v &= v^T \phi(X) (\hat{D} - \hat{W}) \phi(X)^T v \\
&= \sum_i y_i \hat{D}_{ii} y_i^T - \sum_{ij} y_i \hat{W}_{ij} y_j^T \\
&= n \sum_i y_i y_i^T - \sum_{ij} y_i y_j^T
\end{aligned}
\tag{18}
$$

Denoting $\hat{\phi} = (1/n) \sum_{i=1}^{n} \phi(x_i) = 0$ and $\bar{y} = (1/n) \sum_{i=1}^{n} y_i = v^T \hat{\phi}$ gives

$$
\begin{aligned}
n \sum_i y_i y_i^T - \sum_{ij} y_i y_j^T &= n \sum_i y_i y_i^T - \Big( \sum_i y_i \Big) \Big( \sum_j y_j \Big)^T \\
&= n \sum_i y_i y_i^T - 2n \Big( \sum_i y_i \Big) \bar{y}^T + n^2 \bar{y} \bar{y}^T \\
&= n \sum_i ( y_i y_i^T - 2 y_i \bar{y}^T + \bar{y} \bar{y}^T ) \\
&= n \Big( \sum_i (y_i - \bar{y})(y_i - \bar{y})^T \Big) \\
&= n v^T \Big( \sum_i (\phi(x_i) - \hat{\phi})(\phi(x_i) - \hat{\phi})^T \Big) v \\
&= n v^T \Big( \sum_i \phi(x_i) \phi(x_i)^T \Big) v
\end{aligned}
\tag{19}
$$

Substituting Eq. (19) and Eq. (18) into Eq. (17) yields

$$
\frac{1}{n} \Big( \sum_i \phi(x_i) \phi(x_i)^T \Big) v = -\frac{\lambda}{n^2} v = \tilde{\lambda} v
\tag{20}
$$

Eq. (20) is similar to the eigenvector problem of KPCA (see Eq. (6) in ref 35), except that all eigenvalues are multiplied by $-1/n^2$. KPCA can thus be viewed as a special case of KGLPP, which aims at preserving the global data structure but neglects the local data structure.
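The reduction in Eqs. (17)-(20) can be checked numerically. The sketch below uses a small random matrix `Phi` as a hypothetical stand-in for the (normally implicit) centered feature vectors $\phi(X)$; with $\hat{W} = 1_{n \times n}$ it confirms that $\phi(X) \hat{L} \phi(X)^T = n \sum_i \phi(x_i)\phi(x_i)^T$.

```python
import numpy as np

def global_term_all_ones(Phi):
    """Return Phi L_hat Phi^T and n Phi Phi^T for W_hat = 1_{n x n} (columns are samples)."""
    n = Phi.shape[1]
    W_hat = np.ones((n, n))
    L_hat = np.diag(W_hat.sum(axis=1)) - W_hat   # D_hat - W_hat, with D_hat_ii = n
    return Phi @ L_hat @ Phi.T, n * Phi @ Phi.T

rng = np.random.default_rng(0)
Phi = rng.normal(size=(4, 30))            # hypothetical explicit features: h = 4, n = 30
Phi -= Phi.mean(axis=1, keepdims=True)    # centering, so that sum_i phi(x_i) = 0
lhs, rhs = global_term_all_ones(Phi)
agree = np.allclose(lhs, rhs)             # the two matrices coincide for centered data
```

Because the smallest eigenvalues of $-\phi(X)\hat{L}\phi(X)^T$ correspond to the largest eigenvalues of the covariance matrix, selecting the l smallest eigenvalues in KGLPP recovers the leading principal directions in this special case.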

3. Optimizing the data-dependent kernel function

3.1. Constructing a data-dependent kernel

Standard kernels, such as Gaussian, polynomial and sigmoid kernels, can be used in KGLPP. These standard kernels perform satisfactorily in many applications, but they may not be the optimal choice for a given data set. Therefore, it is necessary to seek an optimized kernel to enhance the performance. In this paper, the data-dependent kernel used in refs 27-29 is chosen as the kernel to be optimized. Given a training data set $x_1, x_2, \ldots, x_n \in \Re^m$, the data-dependent kernel is constructed as28

$$
k(x_i, x_j) = q(x_i) q(x_j) k_0(x_i, x_j)
\tag{21}
$$

where $k_0(\cdot)$ is a basic kernel selected from standard kernels, and $q(\cdot)$ is a factor function of the form28

$$
q(x) = \beta_0 + \sum_{i=1}^{d} \beta_i k_1(x, a_i)
\tag{22}
$$

where $k_1(x, a_i) = \exp(-\|x - a_i\|^2 / \sigma)$ is a Gaussian kernel, $\{\beta_i \in \Re, i = 1, \ldots, d\}$ are normalized combination coefficients (i.e., $\sum_i \beta_i^2 = 1$), and $\{a_i \in \Re^m, i = 1, \ldots, d\}$ are "empirical cores" that can be chosen from the training data set or determined according to the distribution of training samples.28 Note that the data-dependent kernel in Eq. (21) satisfies the Mercer condition for kernel functions.28 Denote the kernel matrices that consist of $k(x_i, x_j)$ and $k_0(x_i, x_j)$ as $K \in \Re^{n \times n}$ and $K_0 \in \Re^{n \times n}$, respectively. Eq. (21) is rewritten as

$$
K = Q K_0 Q
\tag{23}
$$

where Q is a diagonal matrix whose diagonal entries are $q(x_1), q(x_2), \ldots, q(x_n)$. Letting $q = [q(x_1), q(x_2), \ldots, q(x_n)]^T \in \Re^n$ and $\beta = [\beta_0, \beta_1, \ldots, \beta_d]^T \in \Re^{d+1}$, we have

$$
q = \begin{bmatrix} 1 & k_1(x_1, a_1) & \cdots & k_1(x_1, a_d) \\ 1 & k_1(x_2, a_1) & \cdots & k_1(x_2, a_d) \\ \vdots & \vdots & \ddots & \vdots \\ 1 & k_1(x_n, a_1) & \cdots & k_1(x_n, a_d) \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_d \end{bmatrix} = K_1 \beta
\tag{24}
$$
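Eqs. (21)-(24) translate into a few lines of array code. The sketch below is illustrative only: the function names are hypothetical, samples and empirical cores are stored as rows, and a Gaussian basic kernel $k_0$ is assumed (the paper allows any standard kernel).

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """exp(-||a - b||^2 / sigma) for every row a of A and every row b of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma)

def factor_design_matrix(X, cores, sigma1):
    """K1 in Eq. (24): a column of ones followed by k1(x_i, a_j)."""
    return np.hstack([np.ones((X.shape[0], 1)), gaussian_kernel(X, cores, sigma1)])

def data_dependent_kernel(X, cores, beta, sigma0, sigma1):
    """K = Q K0 Q (Eq. 23) with q = K1 beta (Eq. 24)."""
    K0 = gaussian_kernel(X, X, sigma0)          # basic kernel matrix (Gaussian assumed)
    K1 = factor_design_matrix(X, cores, sigma1)
    q = K1 @ beta                               # factor function evaluated at each sample
    K = q[:, None] * K0 * q[None, :]            # same as diag(q) @ K0 @ diag(q)
    return K, K0, K1
```

Scaling rows and columns by `q` avoids forming the diagonal matrix Q explicitly, which matters only for large n.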

3.2. Kernel optimization

After constructing a data-dependent kernel K, an objective function is needed to optimize the combination coefficients β. Because the features of a data set are closely related to its geometrical structure, an efficient kernel function should fully unfold both global and local structures of the data set in the feature space. In other words, after mapping the data set from the input space into the feature space, the variance of the mapped data in the feature space should be maximized, and the neighborhood relations among data points in the input space should be retained in the feature space. To this end, nonadjacent data points in the input space should be mapped as far apart as possible in the feature space, while adjacent data points should be mapped as close together as possible. Thus, the following objective function should be minimized

$$
J_1 = \eta \hat{J}_{Local} - (1-\eta) \hat{J}_{Global} = \frac{1}{2} \Big\{ \eta \sum_{ij} \|\phi(x_i) - \phi(x_j)\|^2 W_{ij} - (1-\eta) \sum_{ij} \|\phi(x_i) - \phi(x_j)\|^2 \hat{W}_{ij} \Big\}
\tag{25}
$$

or

$$
J_2 = \frac{\hat{J}_{Local}}{\hat{J}_{Global}} = \frac{\sum_{ij} \|\phi(x_i) - \phi(x_j)\|^2 W_{ij}}{\sum_{ij} \|\phi(x_i) - \phi(x_j)\|^2 \hat{W}_{ij}}
\tag{26}
$$

where $\phi(\cdot)$ denotes a nonlinear mapping, and $W_{ij}$, $\hat{W}_{ij}$ and η are the same as those used in Eq. (1). The objective function $J_1$ or $J_2$, named the data structure preserving index, is a measure of weighted pairwise squared distances between mapped data points in the feature space. $J_1$ is more flexible than $J_2$, because it uses a tradeoff coefficient η to adjust the tradeoff between $\hat{J}_{Global}$ and $\hat{J}_{Local}$. However, $J_2$ is more convenient than $J_1$, because it avoids choosing η. We shall show later that the objective functions $J_1$ and $J_2$ result in two different optimization problems, which are solved in different ways. According to the definition of the kernel matrix K, we have

$$
\begin{aligned}
\hat{J}_{Local} &= \frac{1}{2} \sum_{ij} \|\phi(x_i) - \phi(x_j)\|^2 W_{ij} \\
&= \frac{1}{2} \sum_{ij} (K_{ii} - K_{ij} - K_{ji} + K_{jj}) W_{ij} \\
&= \sum_i \Big( \sum_j W_{ij} \Big) K_{ii} - \sum_{ij} K_{ij} W_{ij} \\
&= \sum_i D_{ii} K_{ii} - \sum_{ij} K_{ij} W_{ij}
\end{aligned}
\tag{27}
$$

Substituting Eq. (23) and Eq. (24) into Eq. (27) gives

$$
\hat{J}_{Local}(\beta) = \beta^T K_1^T [ (D - W) \mathbin{.*} K_0 ] K_1 \beta = \beta^T K_1^T ( L \mathbin{.*} K_0 ) K_1 \beta
\tag{28}
$$

where $L \mathbin{.*} K_0$ denotes the entry-by-entry product of the matrices $L$ and $K_0$. Similarly, we have

$$
\hat{J}_{Global}(\beta) = \beta^T K_1^T [ (\hat{D} - \hat{W}) \mathbin{.*} K_0 ] K_1 \beta = \beta^T K_1^T ( \hat{L} \mathbin{.*} K_0 ) K_1 \beta
\tag{29}
$$
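The step from Eq. (27) to Eq. (28) is compressed in the text. One way to spell it out is given below, as a reader's reconstruction that uses the shorthand $q_i = q(x_i)$ and $K_{0,ij} = k_0(x_i, x_j)$, neither of which is the paper's notation:

```latex
\begin{aligned}
\hat{J}_{Local}
 &= \sum_i D_{ii} K_{ii} - \sum_{ij} W_{ij} K_{ij}
  = \sum_{ij} (D_{ij} - W_{ij}) K_{ij}
  && \text{($D$ is diagonal)} \\
 &= \sum_{ij} q_i \, L_{ij} K_{0,ij} \, q_j
  && \text{(Eq. (21): } K_{ij} = q_i q_j K_{0,ij}\text{)} \\
 &= q^T (L \mathbin{.*} K_0)\, q
  = \beta^T K_1^T (L \mathbin{.*} K_0) K_1 \beta
  && \text{(Eq. (24): } q = K_1 \beta\text{)}
\end{aligned}
```

The same three steps with $\hat{D}$, $\hat{W}$ and $\hat{L}$ give Eq. (29).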

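The optimal combination coefficients β* are thus obtained by solving the optimization problem

$$
\begin{aligned}
&\min_{\beta} J_1(\beta) = \min_{\beta} \; \beta^T K_1^T [ (\eta L - (1-\eta) \hat{L}) \mathbin{.*} K_0 ] K_1 \beta \\
&\text{s.t.} \;\; \beta^T \beta = 1
\end{aligned}
\tag{30}
$$

or

$$
\begin{aligned}
&\min_{\beta} J_2(\beta) = \min_{\beta} \; \frac{\beta^T K_1^T ( L \mathbin{.*} K_0 ) K_1 \beta}{\beta^T K_1^T ( \hat{L} \mathbin{.*} K_0 ) K_1 \beta} \\
&\text{s.t.} \;\; \beta^T \beta = 1
\end{aligned}
\tag{31}
$$

Eq. (30) can be easily converted to an eigenvector problem

$$
K_1^T [ (\eta L - (1-\eta) \hat{L}) \mathbin{.*} K_0 ] K_1 \beta = \lambda \beta
\tag{32}
$$

Thus, β* is the eigenvector of Eq. (32) corresponding to the smallest eigenvalue. The general gradient approach is used to solve Eq. (31), which updates β by

$$
\beta_{j+1} = \beta_j - \delta_j \frac{\partial J_2(\beta_j)}{\partial \beta_j} \quad \text{and} \quad \beta_{j+1} = \frac{\beta_{j+1}}{\|\beta_{j+1}\|}
\tag{33}
$$

where $\delta_j = \delta_0 (1 - j/N)$ is the learning rate at the jth iteration, $\delta_0$ is the initial learning rate, and N denotes the prespecified maximum number of iterations. According to Eq. (31), it is easy to get

$$
\frac{\partial J_2(\beta_j)}{\partial \beta_j} = \frac{2 K_1^T [ (L - J_2(\beta_j) \hat{L}) \mathbin{.*} K_0 ] K_1}{\beta_j^T K_1^T ( \hat{L} \mathbin{.*} K_0 ) K_1 \beta_j} \beta_j
\tag{34}
$$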
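Both solution routes can be sketched compactly. In the code below, all function names are hypothetical, and the starting point of the gradient iteration, $\beta = [1, 0, \ldots, 0]^T$ (so that $q \equiv 1$ and the initial kernel equals the basic kernel), is an assumption, since the text does not state the initialization.

```python
import numpy as np

def quadratic_forms(K0, K1, L, L_hat):
    """Matrices of the quadratic forms in Eqs. (28) and (29)."""
    A_loc = K1.T @ (L * K0) @ K1          # '*' is the entry-by-entry product
    A_glob = K1.T @ (L_hat * K0) @ K1
    return A_loc, A_glob

def j2(beta, A_loc, A_glob):
    """Objective of Eq. (31)."""
    return (beta @ A_loc @ beta) / (beta @ A_glob @ beta)

def j2_gradient(beta, A_loc, A_glob):
    """Gradient of J2 as in Eq. (34)."""
    J = j2(beta, A_loc, A_glob)
    return 2.0 * (A_loc - J * A_glob) @ beta / (beta @ A_glob @ beta)

def optimize_beta_eig(K0, K1, L, L_hat, eta):
    """Eq. (32): eigenvector belonging to the smallest eigenvalue."""
    A_loc, A_glob = quadratic_forms(K0, K1, L, L_hat)
    A = eta * A_loc - (1.0 - eta) * A_glob
    _, V = np.linalg.eigh(0.5 * (A + A.T))   # symmetrize against round-off
    return V[:, 0]                           # unit norm, so beta^T beta = 1

def optimize_beta_gradient(K0, K1, L, L_hat, delta0=0.5, n_iter=100):
    """Eq. (33): normalized gradient descent on J2 with a decaying learning rate."""
    A_loc, A_glob = quadratic_forms(K0, K1, L, L_hat)
    beta = np.zeros(K1.shape[1])
    beta[0] = 1.0                            # assumed start: q(x) = 1 for all x
    for j in range(n_iter):
        delta = delta0 * (1.0 - j / n_iter)
        beta = beta - delta * j2_gradient(beta, A_loc, A_glob)
        beta /= np.linalg.norm(beta)
    return beta
```

Note that the sign of an eigenvector is arbitrary; this does not affect $K = Q K_0 Q$, because flipping the sign of β flips every $q(x_i)$ and the two factors cancel.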

3.3. Data-dependent KGLPP (DDKGLPP)

The KGLPP based on the optimized data-dependent kernel is named data-dependent KGLPP (DDKGLPP). Accordingly, data-dependent KPCA (DDKPCA) and data-dependent KLPP (DDKLPP) are two special cases of DDKGLPP. Fig. 1 illustrates the basic idea of DDKGLPP. Suppose eight data points in the input space $\Re^2$ are divided into two groups {x1, x2, x3, x4} and {x5, x6, x7, x8} according to their distances (see subplot at bottom left). These data points are mapped to {$\phi(x_1)$, $\phi(x_2)$, $\phi(x_3)$, $\phi(x_4)$} and {$\phi(x_5)$, $\phi(x_6)$, $\phi(x_7)$, $\phi(x_8)$} in the feature space $F^h$ by the underlying nonlinear mapping $\phi$ that corresponds to the optimized data-dependent kernel. Owing to the properties of the data-dependent kernel, data points in the same group move closer together in the feature space and farther from the data points in the other group (see subplot at top). In this way, the local and global structures of the data set are fully unfolded in the feature space. Then, we seek a reduced space $F^l$ by performing dimensionality reduction in the feature space using linear PCA, LPP or GLPP (see subplot at bottom right). The projection directions of PCA, LPP and GLPP are indicated by the dashed line, dotted line and dash-dotted line, respectively. PCA projects data points along the direction of maximum data variance, whereas LPP projects data points along the direction of maximum neighborhood information. The projection direction of GLPP lies between those of PCA and LPP, because GLPP combines PCA and LPP. The angle θ is determined by the tradeoff coefficient η. A more detailed discussion of the relationships among GLPP, PCA and LPP can be found in ref 13, where application examples on the Swiss-roll and intersecting data sets are used to illustrate their differences. Because the feature space is nonlinearly related to the input space, linear projections in the feature space become nonlinear in the input space. However, these nonlinear projections cannot be written explicitly, because the nonlinear mapping $\phi$ is unknown. In fact, a data-dependent kernel is calculated in the input space to avoid performing the nonlinear mapping $\phi$. The input space is thus implicitly associated with the reduced space through this kernel.


4. Nonlinear process monitoring based on DDKGLPP

4.1. Monitoring model development

Suppose $X = [x_1, x_2, \ldots, x_n] \in \Re^{m \times n}$ is a normalized training data set, where m and n are the numbers of process variables and samples. Applying DDKGLPP to X, a monitoring model is built as

$$
\begin{aligned}
\bar{K} &= Y P^{+} + E \\
Y &= \bar{K} P \\
E &= \bar{K} - Y P^{+}
\end{aligned}
\tag{35}
$$

where the symbol "+" denotes the Moore-Penrose pseudoinverse, $\bar{K} \in \Re^{n \times n}$ is the centered data-dependent kernel matrix, $P = [\alpha_1, \alpha_2, \ldots, \alpha_l] \in \Re^{n \times l}$ is a loading matrix, $\alpha_1, \alpha_2, \ldots, \alpha_l$ are loading vectors obtained by solving Eq. (12), l is the number of principal components (PCs), $Y = [y_1, y_2, \ldots, y_n]^T \in \Re^{n \times l}$ is a score matrix that is calculated via Eq. (13), and $E \in \Re^{n \times n}$ is the residual matrix. A test sample $x_{new} \in \Re^m$ can be projected onto the monitoring model by

$$
\begin{aligned}
\bar{k}(X, x_{new}) &= P^{+T} y_{new} + e_{new} \\
y_{new} &= P^T \bar{k}(X, x_{new}) \\
e_{new} &= \bar{k}(X, x_{new}) - P^{+T} y_{new}
\end{aligned}
\tag{36}
$$

where $y_{new} = [y_1, y_2, \ldots, y_l]^T \in \Re^l$ is the score vector of $x_{new}$, and $\bar{k}(X, x_{new})$ is the centered data-dependent kernel vector that is computed via Eq. (15) and Eq. (21) using the optimal combination coefficients β* obtained from the training data set.

4.2. Fault detection

The $T^2$ statistic and the squared prediction error (SPE) statistic are used to detect faults. The $T^2$ and SPE statistics measure the variations of data in the model space and in the residual space, respectively. The $T^2$ statistic of a sample x is defined as

$$
T^2 = y^T \Lambda^{-1} y
\tag{37}
$$

where y is the score vector of x, and $\Lambda = Y^T Y / (n - 1)$ is the covariance matrix of Y. The SPE statistic of a sample x is defined as

$$
SPE = \| \tilde{k}(X, x) - \hat{k}(X, x) \|^2 = \| [ (\tilde{P} \tilde{P}^{+})^T - (P P^{+})^T ] \bar{k}(X, x) \|^2
\tag{38}
$$

where $\bar{k}(X, x)$ is the centered data-dependent kernel vector of x, $\tilde{k}(X, x)$ and $\hat{k}(X, x)$ are reconstructions of $\bar{k}(X, x)$, P is defined in Eq. (35), and $\tilde{P} = [\alpha_1, \alpha_2, \ldots, \alpha_{DimFS}] \in \Re^{n \times DimFS}$ is a new loading matrix with DimFS (l < DimFS ≤ n) being the effective dimension of the feature space. The distributions of the $T^2$ and SPE statistics of the training samples can be estimated by kernel density estimation (KDE)36

$$
\hat{f}(T^2) = \frac{1}{nh} \sum_{i=1}^{n} k\Big( \frac{T^2 - T_i^2}{h} \Big)
\tag{39}
$$

$$
\hat{f}(SPE) = \frac{1}{nh} \sum_{i=1}^{n} k\Big( \frac{SPE - SPE_i}{h} \Big)
\tag{40}
$$

where h is a smoothing parameter, and k(∙) is a kernel function. Then, the control limits of the $T^2$ and SPE statistics are computed via the inverse cumulative distribution functions of $\hat{f}(T^2)$ and $\hat{f}(SPE)$, respectively.
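Eqs. (37)-(40) can be sketched as follows. The function names are hypothetical, and several choices are assumptions not fixed by the text: a Gaussian kernel with SciPy's default bandwidth for the KDE, a 99% confidence level, and a grid-based inversion of the estimated cumulative distribution.

```python
import numpy as np
from scipy.stats import gaussian_kde

def t2_statistic(Y_train, y):
    """Eq. (37): T^2 = y^T Lambda^{-1} y with Lambda = Y^T Y / (n - 1)."""
    Lam = Y_train.T @ Y_train / (Y_train.shape[0] - 1)
    Ys = np.atleast_2d(y)                       # accept one score vector or many rows
    return np.einsum("ij,ij->i", Ys @ np.linalg.inv(Lam), Ys)

def spe_statistic(P, P_tilde, k_bar):
    """Eq. (38): squared norm of the difference between the two reconstructions."""
    R = (P_tilde @ np.linalg.pinv(P_tilde)).T - (P @ np.linalg.pinv(P)).T
    r = R @ k_bar
    return float(r @ r)

def kde_control_limit(values, confidence=0.99, grid_size=4000):
    """Control limit as the `confidence` quantile of a KDE fitted to training statistics."""
    values = np.asarray(values, dtype=float)
    kde = gaussian_kde(values)                  # Gaussian kernel, default bandwidth (assumption)
    pad = 3.0 * values.std()
    grid = np.linspace(values.min() - pad, values.max() + pad, grid_size)
    cdf = np.cumsum(kde(grid))
    cdf /= cdf[-1]                              # normalize the numerical CDF to end at 1
    return grid[np.searchsorted(cdf, confidence)]
```

A useful sanity check on the training scores is that the $T^2$ values sum to $(n - 1) l$, which follows from the definition of Λ.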

4.3. Process monitoring procedure

The DDKGLPP-based monitoring procedure consists of an offline modeling phase and an online monitoring phase:

Phase I: Offline modeling
(1) Normalize the training data set X to zero mean and unit variance.
(2) Compute the optimal data-dependent kernel matrix.
(3) Apply DDKGLPP to X and construct the monitoring model.
(4) Compute the $T^2$ and SPE statistics of the training samples and determine their control limits.

Phase II: Online monitoring
(1) Collect a new sample $x_{new}$ and normalize it with the mean and variance of the training data set.
(2) Compute the data-dependent kernel vector $k(X, x_{new})$ using the optimal combination coefficients β* obtained at step (2) of Phase I.
(3) Map $x_{new}$ onto the monitoring model.
(4) Compute the $T^2$ and SPE statistics of $x_{new}$ and check whether they exceed the control limits.
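The normalization and decision steps of the two phases can be sketched as below. The helper names are hypothetical, samples are stored as rows, and the alarm rule (flag a sample when either statistic exceeds its limit) is an assumption, since the text only says the statistics are compared with the control limits.

```python
import numpy as np

def fit_scaler(X_train):
    """Phase I, step (1): column-wise mean and standard deviation of the training data."""
    return X_train.mean(axis=0), X_train.std(axis=0, ddof=1)

def scale(x, mean, std):
    """Phase II, step (1): normalize a sample (or data set) with training statistics."""
    return (x - mean) / std

def is_faulty(t2, spe, t2_limit, spe_limit):
    """Phase II, step (4): alarm when either statistic exceeds its limit (assumed rule)."""
    return bool(t2 > t2_limit or spe > spe_limit)
```

Reusing the training mean and standard deviation online keeps a new sample on the same scale as the data from which the kernel and control limits were derived.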

5. Case studies

To test the performance of the DDKGLPP-based monitoring method, two case studies are carried out: a simple nonlinear system and the Tennessee Eastman (TE) process. In both case studies, the Gaussian kernel $k(x, y) = \exp(-\|x - y\|^2 / \sigma)$ is used as the basic kernel k0(⋅), where σ is the Gaussian kernel width. One third of the training samples are randomly selected to form the empirical core set {ai} in Eq. (22). DDKGLPP is compared with its two special cases, i.e., DDKPCA and DDKLPP. The optimal combination coefficients β* of DDKGLPP are computed by solving Eq. (32), where the tradeoff coefficient η is calculated by Eq. (6). Eq. (33) is applied to compute the optimal combination coefficients β* for DDKPCA and DDKLPP. The initial learning rate δ0 is set to 0.5, and the total iteration number N is 100. Moreover, the KGLPP- and KPCA-based monitoring methods are also used for comparison. Since the data-dependent kernel function can be viewed as a modification of the basic kernel k0(⋅), the basic kernel k0(⋅) is used in KGLPP and KPCA for pairwise


comparison. For all five methods, the number of PCs (NPC) and the effective dimension of the feature space (DimFS) must be determined when designing monitoring models and calculating monitoring statistics. However, it is not easy to select appropriate values of NPC and DimFS. Two commonly used methods, cumulative percent variance (CPV) and cross-validation, do not work well in our case studies. Therefore, a two-step empirical method is used to determine NPC and DimFS. Firstly, as in ref 32, a rough value of NPC is selected based on the eigenvalue curve, taking the “turning point” at which the curve changes from a sharp decrease/increase to stabilization. If there is no such “turning point” in the eigenvalue curve, NPC is selected by the average eigenvalue approach (ref 37), in which all eigenvalues larger/smaller than the average eigenvalue are accepted. The initial DimFS is simply set to the number of training samples. Secondly, the rough values of NPC and DimFS are further adjusted by evaluating the monitoring performance for several test faults. Finally, the values giving the best monitoring performance are selected.
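The average eigenvalue approach mentioned above can be sketched in a few lines. The function name and the `keep` switch are hypothetical; the reading that "larger" applies to variance-maximizing models and "smaller" to models solved as minimization problems is an interpretation of the "larger/smaller" wording, not something the text states explicitly.

```python
import numpy as np

def npc_by_average_eigenvalue(eigenvalues, keep="larger"):
    """Count eigenvalues above (or below) the average eigenvalue.

    keep="larger"  : accept eigenvalues larger than the average
    keep="smaller" : accept eigenvalues smaller than the average
    """
    ev = np.asarray(eigenvalues, dtype=float)
    avg = ev.mean()
    if keep == "larger":
        return int((ev > avg).sum())
    if keep == "smaller":
        return int((ev < avg).sum())
    raise ValueError("keep must be 'larger' or 'smaller'")

# Illustrative eigenvalues (not from the paper); their average is 1.6.
eigs = [5.0, 2.0, 0.5, 0.3, 0.2]
n_large = npc_by_average_eigenvalue(eigs, keep="larger")
n_small = npc_by_average_eigenvalue(eigs, keep="smaller")
```

In the two-step procedure this count would only be a starting value, to be adjusted afterwards against the monitoring performance on test faults.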

5.1. A simple nonlinear system

The nonlinear system is (ref 20)

$$\begin{aligned} x_1 &= t + e_1 \\ x_2 &= t^2 - 3t + e_2 \\ x_3 &= -t^3 + 3t^2 + e_3 \end{aligned} \tag{41}$$

where $t \in [0.01, 2]$ and $e_1, e_2, e_3 \sim N(0, 0.01)$ are noise terms. A training data set was generated according to Eq. (41). Four fault data sets corresponding to the faults in Table 1 were also generated. Each data set consists of 500 samples. With σ in the basic kernel k0(⋅) set to 15, five monitoring models, namely KPCA, KGLPP, DDKPCA, DDKLPP and DDKGLPP, were constructed. The main information


about the five models is summarized in Table 2. The numbers (k) of neighbors were selected as 2, 10, 10 and 10 for DDKPCA, DDKLPP, DDKGLPP and KGLPP, respectively. For all five models, the number (l) of PCs and the DimFS were chosen to be 3 and 5, respectively. “Binary” weight coefficients were used in the DDKPCA and DDKLPP models, while “Heat kernel” weight coefficients were more appropriate for the KGLPP and DDKGLPP models. Fault detection rates (FDRs) and false alarm rates (FARs) of the five models for the four faults are compared in Table 3. DDKGLPP and DDKLPP have the largest mean FDRs, followed by KGLPP and DDKPCA, and KPCA has the smallest mean FDRs. DDKPCA gives much better monitoring results than KPCA, because its mean FDRs are significantly higher than those of KPCA. Similarly, DDKGLPP is also better than KGLPP. This indicates that using the data-dependent kernel can enhance the monitoring performance. DDKGLPP has almost the same mean FDRs as DDKLPP for both the T2 and SPE statistics, while the mean FAR of DDKLPP is slightly higher than that of DDKGLPP. Therefore, the monitoring performance of DDKGLPP is the best, as it combines the largest mean FDR with the smallest mean FAR.

Fig. 2 shows the monitoring charts of the five methods for fault 4. As shown in Fig. 2a, the T2 statistic of KPCA cannot detect the fault, and the SPE statistic detects fault 4 only after a large delay, at the 431st sample. Fig. 2b shows that the T2 and SPE statistics of DDKPCA detect the fault from the 400th and 431st samples, respectively. However, after that, many fault samples still stay below the control limits of the T2 and SPE statistics. Fig. 2c, Fig. 2d and Fig. 2e illustrate that the T2 and SPE statistics of DDKLPP, DDKGLPP and KGLPP detect the fault from about the 357th sample, which is much earlier than KPCA and DDKPCA. In particular, the T2 and SPE statistics of DDKLPP and DDKGLPP rise rapidly and almost fully exceed their control limits after the 431st sample, indicating a better monitoring


reliability than KPCA and DDKPCA. DDKGLPP performs better than DDKLPP and KGLPP because it gives fewer false alarms than DDKLPP and a higher fault detection rate than KGLPP.
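The data of this case study can be regenerated approximately from Eq. (41) and Table 1, and FDR/FAR values can be computed from any monitoring statistic, as sketched below. Several points are assumptions that the text does not settle: t is drawn uniformly from [0.01, 2], the 0.01 in N(0, 0.01) is read as a variance (standard deviation 0.1), only fault 1 (the slow drift) is implemented, and the FAR is computed from the samples before the fault starts. The function names are hypothetical.

```python
import numpy as np

def simulate(n=500, fault=None, fault_start=250, seed=0):
    """Generate n samples of the system in Eq. (41), optionally with fault 1."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.01, 2.0, size=n)        # assumption: uniform t
    e = rng.normal(0.0, 0.1, size=(n, 3))     # assumption: variance 0.01
    x1 = t + e[:, 0]
    x2 = t**2 - 3 * t + e[:, 1]
    x3 = -t**3 + 3 * t**2 + e[:, 2]
    X = np.column_stack([x1, x2, x3])
    if fault == 1:
        # Table 1, fault 1: x1 increases by 0.005(k - 250) from the 251st
        # sample on, where k is the 1-based sample number.
        k = np.arange(1, n + 1)
        X[fault_start:, 0] += 0.005 * (k[fault_start:] - 250)
    return X

def detection_rates(stat, limit, fault_start=250):
    """Return (FDR, FAR) in percent for one monitoring statistic."""
    alarms = np.asarray(stat) > limit
    fdr = float(alarms[fault_start:].mean() * 100)   # alarms on faulty samples
    far = float(alarms[:fault_start].mean() * 100)   # alarms on normal samples
    return fdr, far

X_normal = simulate(seed=1)
X_fault1 = simulate(fault=1, seed=1)
```

With the same seed, the first 250 samples of `X_normal` and `X_fault1` coincide, so any difference in a monitoring chart after sample 250 is caused by the drift alone.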

5.2. Tennessee Eastman process

The Tennessee Eastman (TE) process is widely used as a benchmark simulation for testing monitoring methods (refs 2 and 38). Fig. 3 shows a flowchart of the TE process with the control structures recommended by Lyman and Georgakis (ref 39). The TE process mainly contains five unit operations: a reactor, a condenser, a compressor, a separator and a stripper (ref 38). There are 41 measurement variables, including 22 continuous measurements and 19 composition measurements, and 12 manipulated variables in the TE process. Detailed descriptions of the TE process can be found in refs 38 and 39. This case study uses the simulation data from Braatz’s home page (http://web.mit.edu/braatzgroup/index.html). These simulation data include a training data set obtained under normal operating conditions and 21 fault data sets generated with the fault modes in Table 4. Each data set has 960 samples, and each fault occurred at the 160th sample. As listed in Table 5, 22 continuous measurements and 11 manipulated variables are selected as monitoring variables. The remaining 19 composition measurements, which cannot be measured online, and an uncontrolled manipulated variable, i.e., agitation speed, are excluded. In this case study, the parameter σ in the basic kernel k0(⋅) was set to 27D, where D is the dimension of the input space, i.e., D = 33. Table 6 summarizes the main information of the five monitoring models. The numbers (k) of neighbors were set to 10 for the DDKPCA, DDKLPP, DDKGLPP and KGLPP models. DDKPCA and DDKLPP use the “Binary” weight coefficients, while the “Heat kernel” weight coefficients were used in KGLPP and DDKGLPP.
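For reference, the basic Gaussian kernel with the width used in this case study (σ = 27D = 891 for D = 33) can be evaluated as sketched below. The sketch covers only the basic kernel k0(⋅), in the form exp(−‖x − y‖²/σ) given at the start of Section 5; the data-dependent modification is not included, and the function name is hypothetical.

```python
import numpy as np

def gaussian_kernel_matrix(A, B, sigma):
    """K[i, j] = exp(-||A[i] - B[j]||^2 / sigma) for row-wise samples."""
    sq_dist = (
        (A**2).sum(axis=1)[:, None]
        + (B**2).sum(axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Guard against tiny negative values caused by floating-point round-off.
    np.maximum(sq_dist, 0.0, out=sq_dist)
    return np.exp(-sq_dist / sigma)

D = 33                      # number of monitoring variables in the TE case study
sigma = 27 * D              # kernel width used in the TE case study
rng = np.random.default_rng(0)
X = rng.normal(size=(5, D))  # stand-in for normalized samples (synthetic)
K = gaussian_kernel_matrix(X, X, sigma)
```

Scaling σ with D keeps the exponent of order one for normalized data, since the squared distance between two samples grows roughly in proportion to the number of variables.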


Table 7 lists the FDRs of the five methods for the 21 faults in the TE process. The mean FDRs and mean FARs over all faults are compared in Table 8. Note that faults 3, 9 and 15 are small faults that impose very little disturbance on the monitoring variables, and therefore they are difficult to detect (refs 2 and 40). KPCA, KGLPP, DDKPCA and DDKLPP almost fail to detect these three faults, as their FDRs are very low. However, the T2 statistic of DDKGLPP can detect fault 15 with an FDR of 22%. The five methods show similar performance for the easily detectable faults, including faults 1, 2, 4, 6-8, 12-14, 17 and 18. The improvement of DDKGLPP is significant for faults 5, 10, 11, 16 and 19-21. For these faults, either the T2 or the SPE statistic of DDKGLPP gives much larger FDRs than those of KPCA, KGLPP, DDKPCA and DDKLPP. Using the data-dependent kernel brings a clear improvement. Taking faults 4, 10, 11, 16 and 19-21 as examples, the FDRs of the SPE statistic of DDKPCA are significantly higher than those of KPCA. The performance of DDKGLPP is also better than that of KGLPP. Overall, DDKGLPP provides the best monitoring results, because it has the largest mean FDRs of the five methods (80% for T2 and 76% for SPE in Table 8).

Monitoring charts of the five methods for two representative faults, i.e., fault 10 and fault 19, are shown in Fig. 4 and Fig. 5. Fig. 4 shows that all monitoring statistics have similar variation trends. The main differences among the five methods lie in the period between sample 350 and sample 650 and at the end of the process. In these two periods, many samples in the monitoring charts of KPCA, KGLPP, DDKPCA and DDKLPP fall below the control limits, especially for the T2 statistics of KPCA and DDKPCA and the SPE statistic of DDKLPP. As a result, the FDRs of KPCA, KGLPP, DDKPCA and DDKLPP are relatively small. However, as shown in Fig. 4d, the T2 and SPE statistics of DDKGLPP respond quickly to the occurrence of the fault, and only a small number of samples fall below the control limit of the T2 statistic in the fault condition, resulting in a higher FDR. All monitoring


charts in Fig. 5 correctly indicate that fault 19 persists from the 161st sample to the end of the process. As shown in Fig. 5a-c and Fig. 5e, however, almost half of the fault samples are missed by KPCA, DDKPCA, DDKLPP and KGLPP. Fig. 5d indicates that both the T2 and SPE statistics of DDKGLPP can detect most of the fault samples.

Fig. 6 compares score plots of the first two PCs of the normal data set and fault data set 19, where the 99% confidence ellipsoid (CE) of the normal data is plotted. The first two PCs are chosen because they are associated with the two “largest” eigenvalues and are thus more representative than the other PCs. KPCA maps all data points into a larger area to maximize the data variance, because it only aims at preserving the global data structure. As a result, KPCA fails to clearly separate fault data from normal data, since most of the fault data are located inside the CE of the normal data (see Fig. 6a). As shown in Fig. 6b, DDKPCA gives results similar to those of KPCA. Although DDKPCA takes the local data structure into account when constructing the data-dependent kernel, this advantage is partly suppressed because PCA maps data only along the directions of maximum variance when performing dimensionality reduction. Although the first two PCs of KPCA and DDKPCA fail to separate the fault data, their T2 statistics can still detect fault 19 (see Fig. 5). This is because the T2 statistic is computed based on 26 PCs, whereas the first two PCs reflect only a small part of the information in all 26 PCs. DDKLPP projects the normal data into a narrow area, because only the local neighborhood structure of the data set is preserved in the dimensionality reduction process. Nevertheless, DDKLPP partly separates fault data from normal data by mapping some fault data outside the CE of the normal data, as shown in Fig. 6c. Unlike DDKPCA and DDKLPP, DDKGLPP preserves both global and local data structures, both when constructing the data-dependent kernel and when performing dimensionality reduction. Thus, normal data are mapped into a smaller area but fault data are


scattered over a larger area. Consequently, most of the fault data are located outside the CE of the normal data, and are thus well separated from the normal data, as shown in Fig. 6d. Fig. 6e shows that KGLPP can also separate most fault data from normal data. However, its performance is not as good as that of DDKGLPP, because it appears to detect fewer fault samples than DDKGLPP. The above results demonstrate that both global and local data structures contain important process information that is helpful for process monitoring. Therefore, part of the fault information may be lost if either the global or the local data structure is destroyed. This may explain why KPCA, DDKPCA and DDKLPP provide poor monitoring results. Although KGLPP takes both global and local data structures into account, its monitoring performance is still limited because it does not use an optimal kernel. By simultaneously using an optimized data-dependent kernel and taking both global and local data structures into account, DDKGLPP can more fully capture the underlying data characteristics that are closely related to process faults. DDKGLPP is therefore more sensitive to data changes induced by faults, which improves its monitoring performance.
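Whether a score point lies inside a 99% confidence ellipse such as the one drawn in Fig. 6 can be checked as sketched below. How the CE was actually constructed is not stated in the text; the sketch assumes approximately Gaussian two-dimensional scores, so that the squared Mahalanobis distance is compared with the chi-square quantile with 2 degrees of freedom, which has the closed form −2 ln(1 − confidence). The function name is hypothetical.

```python
import numpy as np

def inside_confidence_ellipse(scores_normal, scores_test, confidence=0.99):
    """Flag test points that fall inside the ellipse fitted to normal scores.

    Both inputs are arrays of shape (n_samples, 2), i.e. the first two PCs.
    """
    mu = scores_normal.mean(axis=0)
    cov = np.cov(scores_normal, rowvar=False)
    diff = scores_test - mu
    # Squared Mahalanobis distance of every test point to the normal mean.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    # Chi-square quantile for 2 degrees of freedom (exact closed form).
    threshold = -2.0 * np.log(1.0 - confidence)
    return d2 <= threshold

# Synthetic scores (not data from the paper).
rng = np.random.default_rng(0)
normal_scores = rng.normal(size=(2000, 2))
shifted_scores = normal_scores + 10.0
frac_normal_inside = inside_confidence_ellipse(normal_scores, normal_scores).mean()
any_shifted_inside = inside_confidence_ellipse(normal_scores, shifted_scores).any()
```

The fraction of fault points falling outside the ellipse gives a simple numerical counterpart to the visual separation discussed for Fig. 6.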

6. Conclusion

A new nonlinear dimensionality reduction method called data-dependent kernel global-local preserving projections (DDKGLPP) was proposed. DDKGLPP has two main advantages. Firstly, DDKGLPP uses a data-dependent kernel rather than a standard kernel to improve performance. A unified kernel optimization framework is developed to optimize the data-dependent kernel by minimizing a data structure preserving index. This optimization framework is also applicable to KPCA and KLPP, resulting in the DDKPCA and DDKLPP methods. The optimized


data-dependent kernel can unfold data structures in the feature space more effectively than standard kernels, such as Gaussian and polynomial kernels. Secondly, unlike DDKPCA and DDKLPP, DDKGLPP aims at preserving both global and local data structures when performing dimensionality reduction. DDKPCA and DDKLPP are in fact two special cases of DDKGLPP obtained with particular parameter values. Therefore, DDKGLPP is able to capture more useful data characteristics. A nonlinear process monitoring method was then proposed based on DDKGLPP. Two case studies demonstrate that the DDKGLPP-based monitoring method outperforms the methods based on KPCA, KGLPP, DDKPCA and DDKLPP in terms of higher fault detection rates and better fault sensitivity.

Acknowledgements

This study was supported by the National Natural Science Foundation of China (No. 61304116) and the Zhejiang Provincial Natural Science Foundation of China (No. LQ13B060004).

References

(1) Nomikos, P.; MacGregor, J. F. Monitoring batch processes using multiway principal component analysis. AIChE J. 1994, 40, 1361–1375.
(2) Chiang, L. H.; Braatz, R. D.; Russell, E. L. Fault Detection and Diagnosis in Industrial Systems; Springer: New York, 2001.
(3) Ge, Z.; Song, Z.; Gao, F. Review of recent research on data-based process monitoring. Ind. Eng. Chem. Res. 2013, 52, 3543–3562.
(4) Qin, S. J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36, 220–234.


(5) MacGregor, J. F.; Cinar, A. Monitoring, fault diagnosis, fault-tolerant control and optimization: Data driven methods. Comput. Chem. Eng. 2012, 47, 111–120.
(6) Yao, Y.; Gao, F. A survey on multistage/multiphase statistical modeling methods for batch processes. Annu. Rev. Control 2009, 33, 172–183.
(7) Qin, S. J. Statistical process monitoring: basics and beyond. J. Chemometrics 2003, 17, 480–502.
(8) MacGregor, J. F.; Kourti, T. Statistical process control of multivariate processes. Control Eng. Pract. 1995, 3, 403–414.
(9) Yu, J. Localized fisher discriminant analysis based complex process monitoring. AIChE J. 2011, 57, 1817–1828.
(10) Hu, K. L.; Yuan, J. Q. Multivariate statistical process control based on multiway locality preserving projections. J. Process Control 2008, 18, 797–807.
(11) He, X. F.; Cai, D.; Yan, S. C.; Zhang, H. J. Neighborhood preserving embedding. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05), 17–21 Oct. 2005, Beijing, China, IEEE Computer Society, Washington, DC, 2005.
(12) Jolliffe, I. T. Principal component analysis; Springer: New York, 2002.
(13) Luo, L. Process monitoring with global-local preserving projections. Ind. Eng. Chem. Res. 2014, 53, 7696–7705.
(14) He, X. F.; Niyogi, P. Locality preserving projections. In Proceedings of the Conference on Advances in Neural Information Processing Systems, 8–13 Dec. 2003, Vancouver, Canada, MIT Press, Cambridge, MA, 2004.
(15) Luo, L.; Bao, S.; Gao, Z.; Yuan, J. Batch process monitoring with tensor global-local structure analysis. Ind. Eng. Chem. Res. 2013, 52, 18031–18042.


(16) Luo, L.; Bao, S.; Gao, Z.; Yuan, J. Tensor global-local preserving projections for batch process monitoring. Ind. Eng. Chem. Res. 2014, 53, 10166–10176.
(17) Zhang, M.; Ge, Z.; Song, Z.; Fu, R. Global–local structure analysis model and its application for fault detection and identification. Ind. Eng. Chem. Res. 2011, 50, 6837–6848.
(18) Ma, Y. X.; Song, B.; Shi, H. B.; Yang, Y. W. Fault detection via local and nonlocal embedding. Chem. Eng. Res. Des. 2015, 94, 538–548.
(19) Miao, A. M.; Ge, Z. Q.; Song, Z. H.; Shen, F. F. Nonlocal structure constrained neighborhood preserving embedding model and its application for fault detection. Chemom. Intell. Lab. Syst. 2015, 142, 184–196.
(20) Lee, J.-M.; Yoo, C.; Choi, S. W.; Vanrolleghem, P. A.; Lee, I.-B. Nonlinear process monitoring using kernel principal component analysis. Chem. Eng. Sci. 2004, 59, 223–234.
(21) Choi, S. W.; Lee, C.; Lee, J.-M.; Park, J. H.; Lee, I.-B. Fault detection and identification of nonlinear processes based on kernel PCA. Chemom. Intell. Lab. Syst. 2005, 75, 55–67.
(22) Yu, J. Nonlinear bioprocess monitoring using multiway kernel localized fisher discriminant analysis. Ind. Eng. Chem. Res. 2011, 50, 3390–3402.
(23) Müller, K.-R.; Mika, S.; Rätsch, G.; Tsuda, K.; Schölkopf, B. An introduction to kernel-based learning algorithms. IEEE T. Neural Networ. 2001, 12, 181–201.
(24) Vapnik, V. Statistical Learning Theory; Wiley: New York, 1998.
(25) Zhang, D.; Chen, S.; Zhou, Z.-H. Learning the kernel parameters in kernel minimum distance classifier. Pattern Recogn. 2006, 39, 133–135.
(26) Weinberger, K. Q.; Sha, F.; Saul, L. K. Learning a kernel matrix for nonlinear dimensionality reduction. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, Alberta, Canada, 2004.
(27) Amari, S.; Wu, S. Improving support vector machine classifiers by modifying kernel functions. Neural Networks 1999, 12, 783–789.
(28) Xiong, H.; Swamy, M. N. S.; Ahmad, M. O. Optimizing the kernel in the empirical feature space. IEEE T. Neural Networ. 2005, 16, 460–474.
(29) Chen, B.; Liu, H.; Bao, Z. Optimizing the data-dependent kernel under a unified kernel optimization framework. Pattern Recogn. 2008, 41, 2107–2119.
(30) Shao, J. D.; Rong, G.; Lee, J. M. Learning a data-dependent kernel function for KPCA-based nonlinear process monitoring. Chem. Eng. Res. Des. 2009, 87, 1471–1480.
(31) Weinberger, K. Q.; Saul, L. K. Unsupervised learning of image manifolds by semidefinite programming. Int. J. Comput. Vision 2006, 70, 77–90.
(32) Shao, J. D.; Rong, G. Nonlinear process monitoring based on maximum variance unfolding projections. Expert Syst. Appl. 2009, 36, 11332–11340.
(33) Liu, Y. J.; Chen, T.; Yao, Y. Nonlinear process monitoring and fault isolation using extended maximum variance unfolding. J. Process Control 2014, 24, 880–891.
(34) Belkin, M.; Niyogi, P. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Proceedings of the Conference on Advances in Neural Information Processing Systems 14, 2001, Vancouver, Canada, MIT Press, Cambridge, MA, 2002.
(35) Schölkopf, B.; Smola, A. J.; Müller, K.-R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 1998, 10, 1299–1319.
(36) Martin, E. B.; Morris, A. J. Non-parametric confidence bounds for process performance monitoring charts. J. Process Control 1996, 6, 349–358.


(37) Valle, S.; Li, W.; Qin, S. J. Selection of the number of principal components: the variance of the reconstruction error criterion with a comparison to other methods. Ind. Eng. Chem. Res. 1999, 38, 4389–4401.
(38) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17, 245–255.
(39) Lyman, P. R.; Georgakis, C. Plant-wide control of the Tennessee Eastman problem. Comput. Chem. Eng. 1995, 19, 321–331.
(40) Lee, J.; Qin, S. J.; Lee, I.-B. Fault detection and diagnosis based on modified independent component analysis. AIChE J. 2006, 52, 3501–3514.


List of Tables

Table 1. Four faults in the nonlinear system

Fault No.  Fault variable  Start time    Description                                              Fault type
1          x1              251st sample  increase of x1 by 0.005(k−250) (k is the sample number)  Slow drift
2          x2              251st sample  increase of x2 by −1                                     Step
3          x3              251st sample  x3 = −1.2t^3 + 3.1t^2 + e3                               Random
4          t               251st sample  decrease of t by −0.5                                    Step

Table 2. Model parameters and settings for five monitoring models

Model     k    l    DimFS  η      Weight type
KPCA      -    3    5      -      -
KGLPP     10   3    5      0.850  Heat kernel
DDKPCA    2    3    5      0      Binary
DDKLPP    10   3    5      1      Binary
DDKGLPP   10   3    5      0.957  Heat kernel

Table 3. Fault detection results of five methods for faults in the nonlinear system

                KPCA        KGLPP       DDKPCA      DDKLPP      DDKGLPP
                T2    SPE   T2    SPE   T2    SPE   T2    SPE   T2    SPE
FDR (%)
Fault 1         58    100   100   100   100   100   100   100   100   92
Fault 2         32    46    71    70    63    57    76    74    76    72
Fault 3         51    11    44    44    49    51    48    48    48    49
Fault 4         0     17    30    31    19    9     38    34    38    38
Mean FDR (%)    35    44    61    61    58    54    66    64    66    63
Mean FAR (%)    1     1     1     1     3     3     3     3     1     1

*Bold values present the best performance.
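As a consistency check, the "Mean FDR (%)" row of Table 3 can be reproduced from the four per-fault rows. The column order (T2 then SPE for KPCA, KGLPP, DDKPCA, DDKLPP and DDKGLPP) and round-half-up rounding are assumptions of this sketch.

```python
import numpy as np

columns = [
    "KPCA T2", "KPCA SPE", "KGLPP T2", "KGLPP SPE", "DDKPCA T2",
    "DDKPCA SPE", "DDKLPP T2", "DDKLPP SPE", "DDKGLPP T2", "DDKGLPP SPE",
]

# Per-fault FDRs (%) for faults 1 to 4, copied from Table 3.
fdr = np.array([
    [58, 100, 100, 100, 100, 100, 100, 100, 100, 92],
    [32,  46,  71,  70,  63,  57,  76,  74,  76, 72],
    [51,  11,  44,  44,  49,  51,  48,  48,  48, 49],
    [ 0,  17,  30,  31,  19,   9,  38,  34,  38, 38],
])

# Column means, rounded half up to whole percentages.
mean_fdr = np.floor(fdr.mean(axis=0) + 0.5).astype(int)
summary = dict(zip(columns, mean_fdr.tolist()))
```

The computed means are 35, 44, 61, 61, 58, 54, 66, 64, 66 and 63, which agree with the tabulated row.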


Table 4. Faults in the TE process

No.     Fault variable                                                 Type
1       A/C feed ratio (stream 4)                                      Step
2       B composition (stream 4)                                       Step
3       D feed temperature (stream 2)                                  Step
4       Reactor cooling water inlet temperature                        Step
5       Condenser cooling water inlet temperature                      Step
6       A feed loss (stream 1)                                         Step
7       C header pressure loss-reduced availability (stream 4)         Step
8       A, B, C feed composition (stream 4)                            Random
9       D feed temperature (stream 2)                                  Random
10      C feed temperature (stream 4)                                  Random
11      Reactor cooling water inlet temperature                        Random
12      Condenser cooling water inlet temperature                      Random
13      Reaction kinetics                                              Slow drift
14      Reactor cooling water valve                                    Sticking
15      Condenser cooling water valve                                  Sticking
16~20   Unknown                                                        Unknown
21      The valve for stream 4 was fixed at the steady state position  Constant


Table 5. Monitoring variables in the TE process

No.  Variable name
1    A feed (stream 1)
2    D feed (stream 2)
3    E feed (stream 3)
4    A and C feed (stream 4)
5    Recycle flow (stream 8)
6    Reactor feed rate (stream 6)
7    Reactor pressure
8    Reactor level
9    Reactor temperature
10   Purge rate (stream 9)
11   Product separator temperature
12   Product separator level
13   Product separator pressure
14   Product separator underflow
15   Stripper level
16   Stripper pressure
17   Stripper underflow (stream 11)
18   Stripper temperature
19   Stripper steam flow
20   Compressor work
21   Reactor cooling water outlet temperature
22   Separator cooling water outlet temperature
23   D feed flow (stream 2)
24   E feed flow (stream 3)
25   A feed flow (stream 1)
26   A and C feed flow (stream 4)
27   Compressor recycle valve
28   Purge valve (stream 9)
29   Separator pot liquid flow (stream 10)
30   Stripper liquid product flow (stream 11)
31   Stripper steam valve
32   Reactor cooling water valve
33   Condenser cooling water flow


Table 6. Model parameters and settings for five monitoring models

Model     k    l    DimFS  η      Weight type
KPCA      -    26   960    -      -
KGLPP     10   20   650    0.95   Heat kernel
DDKPCA    10   26   960    0      Binary
DDKLPP    10   438  550    1      Binary
DDKGLPP   10   20   650    0.932  Heat kernel


Table 7. Fault detection rates (%) of five methods for 21 faults in the TE process

Fault no.   KPCA        KGLPP       DDKPCA      DDKLPP      DDKGLPP
            T2    SPE   T2    SPE   T2    SPE   T2    SPE   T2    SPE
1           99    99    99    99    100   100   100   100   100   100
2           98    98    98    97    98    98    99    99    99    99
3           3     3     0     0     2     7     5     6     2     3
4           100   82    100   93    100   94    99    100   100   99
5           26    100   100   100   26    100   100   100   100   100
6           100   100   100   100   100   100   100   100   100   100
7           100   100   100   100   100   100   100   100   100   100
8           98    98    96    96    98    98    98    98    99    98
9           3     2     1     1     3     5     3     4     2     3
10          46    78    77    80    50    85    70    68    89    75
11          78    62    71    73    78    69    73    73    84    74
12          99    99    97    95    99    100   99    100   100   99
13          95    94    93    92    95    94    95    95    96    95
14          100   100   100   100   100   100   100   100   100   100
15          7     4     4     9     7     7     9     8     22    3
16          29    78    75    23    30    83    58    64    72    75
17          96    88    95    94    96    92    94    94    97    95
18          90    89    90    91    90    90    91    90    92    90
19          57    49    53    69    57    63    56    30    87    76
20          64    61    67    71    63    67    65    62    79    73
21          50    42    51    47    50    45    51    49    59    45

*Bold values present the best performance.


Table 8. Mean fault detection rates/false alarm rates (%) of 21 faults for five methods

            KPCA        KGLPP       DDKPCA      DDKLPP      DDKGLPP
            T2    SPE   T2    SPE   T2    SPE   T2    SPE   T2    SPE
Mean FDR    68    73    75    73    69    76    75    73    80    76
Mean FAR    1     1     1     1     2     3     2     4     2     1

*Bold values present the best performance.


Figure captions

Figure 1. The basic idea of DDKGLPP.
Figure 2. Monitoring charts of fault 4 in the nonlinear system. (a) KPCA, (b) DDKPCA, (c) DDKLPP, (d) DDKGLPP, (e) KGLPP.
Figure 3. Flowchart of the TE process.
Figure 4. Monitoring charts of fault 10 in the TE process. (a) KPCA, (b) DDKPCA, (c) DDKLPP, (d) DDKGLPP, (e) KGLPP.
Figure 5. Monitoring charts of fault 19 in the TE process. (a) KPCA, (b) DDKPCA, (c) DDKLPP, (d) DDKGLPP, (e) KGLPP.
Figure 6. Score plots of first two PCs for fault 19 and normal data. (a) KPCA, (b) DDKPCA, (c) DDKLPP, (d) DDKGLPP, (e) KGLPP.
