
Simultaneous Fault Detection and Isolation Based on Transfer Semi-Nonnegative Matrix Factorization

Qilong Jia (1,3), Yingwei Zhang* (1,2), and Wen Chen (3)

1. Department of Information Science and Engineering, Northeastern University, Shenyang, 110819, China.
2. State Key Laboratory of Synthetical Automation of Process Industry, Northeastern University, Shenyang, 110819, China.
3. Division of Engineering Technology, Wayne State University, Detroit, MI, 48202, USA.

*Corresponding author. E-mail: [email protected].

Ind. Eng. Chem. Res., Just Accepted Manuscript. DOI: 10.1021/acs.iecr.9b00030. Publication Date (Web): April 22, 2019.

Abstract: This paper proposes a simultaneous fault detection and isolation approach based on a novel transfer semi-nonnegative matrix factorization (TSNMF) algorithm. Different from existing nonnegative matrix factorization (NMF) algorithms, TSNMF takes advantage of a few labeled samples and of the geometry structures of the sample spaces to improve performance. Labeled samples are samples whose memberships are already known; unlabeled samples, on the contrary, are samples whose memberships are unknown. We demonstrate theoretically how labeled samples and the geometry structures of sample spaces improve fault detection and isolation performance. More importantly, the proposed approach achieves fault detection and isolation without monitoring statistics, which makes it easier to implement than existing approaches, and it delivers excellent performance in comparison with existing fault detection and isolation methods. In addition, the proposed approach can be readily applied to newly arriving samples and shows good generalization, which enables an online fault detection and isolation scheme. Finally, a numerical example and a case study on a penicillin fermentation process demonstrate the effectiveness of the proposed approaches.

Index Terms: Fault Detection, Fault Isolation, Data-Driven Process Monitoring, Transfer Semi-Nonnegative Matrix Factorization.

I. INTRODUCTION

It is a challenging task to build accurate analytical models, such as discrete-time and continuous-time state-space models, for complex industrial processes, owing to the complicated relationships between process variables and the implicit operational mechanisms. As a result, model-based monitoring approaches have limited applicability to complex industrial fault diagnosis. Compared with model-based fault diagnosis approaches, data-driven approaches do not rely on process models and are feasible provided that fault-free data are available. However, the fault diagnosis performance of data-driven approaches relies heavily on the quality of the training data.

Fault diagnosis techniques address challenging problems such as guaranteeing the safe operation of equipment, improving the efficiency and quality of production, and avoiding disastrous events. Owing to the great development and extensive application of distributed control systems, numerous measurements can be collected from various sensors and stored in databases, and data-driven fault diagnosis techniques aim to extract useful information from these vast historical measurements to achieve fault detection and isolation. Traditional data-driven fault diagnosis methods include principal component analysis (PCA) [1-5], partial least squares (PLS) [6-11], Fisher discriminant analysis (FDA) [12-15], independent component analysis (ICA) [16-25], and their variants.

As a novel multivariate statistical analysis method, nonnegative matrix factorization (NMF) has been applied to image analysis, data mining, text clustering, voice processing, biomedical engineering, and chemical engineering. In NMF, both the original matrix and the factor matrices are restricted to nonnegative elements. NMF is a two-factor matrix factorization algorithm that decomposes the original matrix into two low-rank factor matrices whose product well approximates the original matrix [26]. The dissimilarity between the original matrix and its approximation can be measured by the Frobenius matrix norm or by the modified Kullback-Leibler divergence, and the two low-rank factor matrices are computed simply by minimizing this dissimilarity. NMF employs the well-known multiplicative updating rules for the entries of the matrix factors to search for the solution [26].

NMF can be used for dimensionality reduction and clustering, and it has proved superior to traditional multivariate statistical analysis methods because it allows only additive, not subtractive, combinations of parts to form a whole object. Moreover, NMF has been shown to be closely related to the well-known spectral clustering and K-means clustering; Li et al. [27] provided an insightful review of NMF-based clustering algorithms. Various extensions and modifications of the standard NMF algorithm have been proposed for specific problems such as image analysis, data mining, text clustering, and voice processing.



A well-known variant of NMF, called semi-NMF and proposed by Ding et al. [28], is a special case of NMF that places no constraints on the original matrix and the basis matrix while restricting the elements of the coefficient matrix to be nonnegative. Another famous variant, convex NMF, also proposed by Ding et al. [28], represents the cluster centroids as convex combinations of the columns of the original matrix; the cluster centroids can thus be interpreted as weighted sums of training samples, i.e., they are restricted to lie in the training sample space. Recent research has indicated that if labeled samples and the geometrical structure of the sample space are used, algorithm performance improves significantly. For example, Cai et al. [29] proposed a graph-regularized NMF (GNMF) by incorporating the geometry structure of the sample space into NMF, and Liu et al. [30] proposed a constrained NMF algorithm for image representation by incorporating labeled samples into the standard NMF.

NMF has also been applied successfully to fault diagnosis. Li et al. [31] proposed a modified NMF (MNMF)-based fault detection approach and evaluated it in a case study on the well-known benchmark Tennessee Eastman process. Li et al. [32] presented a new statistical fault detection method based on an NMF framework, called generalized nonnegative matrix projection-based fault detection, whose goal is to extract latent variables from training samples for fault detection. In addition, Li et al. [33] proposed an NMF-based fault detection method for non-Gaussian processes and demonstrated its superiority over PCA-based and ICA-based fault detection approaches.

In this paper, a novel simultaneous fault detection and isolation scheme based on transfer semi-nonnegative matrix factorization (TSNMF) is proposed. The new TSNMF-based scheme achieves simultaneous fault detection and isolation without monitoring statistics. Different from the standard NMF, TSNMF takes advantage of labeled samples and of the geometry structure of the sample space. The new algorithm places no restrictions on variable distributions, such as Gaussianity, and the nonnegativity restriction on the training sample matrices is also released. We theoretically demonstrate why exploiting labeled samples and the geometry structure of the sample space improves algorithm performance.

The remainder of this article is organized as follows. In part A of the next section, the standard NMF, semi-NMF, and convex NMF are briefly introduced, and in part B the relations between several variants of NMF and K-means are discussed. In Section III, a new matrix factorization, TSNMF, is proposed; a formal convergence proof of the TSNMF algorithm is given, and the algorithm is outlined in a table for readers to better understand it. In Section IV, a TSNMF-based online simultaneous fault detection and isolation scheme is developed. In Section V, the performance of TSNMF, PCA, semi-NMF, and convex NMF is evaluated in a case study on a numerical example. In Section VI, a case study on a penicillin fermentation process is carried out to evaluate the performance of the proposed approaches. Finally, conclusions are summarized.

II. RELATED WORK

Recent studies have shown that matrix factorizations such as singular value decomposition (SVD), PCA, NMF, and their variants have a close relationship with K-means clustering. In this section, we introduce the existing results on matrix factorization in clustering applications. Subsection A introduces the standard NMF and some of its variants; Subsection B illustrates the relationship between matrix factorization and K-means.

A. Introduction to Nonnegative Matrix Factorization

NMF is a very popular data analysis method whose goal is to learn a set of bases that represent an object by combining parts to form a whole. NMF differs from traditional matrix factorization techniques because it enforces the elements of the matrices to be nonnegative. Suppose that there exist n samples from k different clusters or, equivalently, that a sample matrix X = [x_1, ..., x_n] is given in which each column denotes an m-dimensional sample. Standard NMF aims to find two low-rank nonnegative factor matrices U and V whose product well approximates the original matrix X:

X ≈ UV,   (1)

where U is an m × k matrix, V is a k × n matrix, and k ≪ min{m, n} for dimensionality reduction and clustering.

The error between the original matrix X and its approximation UV can be measured by the Frobenius matrix norm or by the modified Kullback-Leibler divergence. Taking the Frobenius norm as an example, the factor matrices U and V can be obtained by solving the constrained optimization problem

{U, V} = arg min_{U≥0, V≥0} ||X − UV||_F^2,   (2)

where X ≥ 0 and the operator || · ||_F denotes the Frobenius norm of a matrix.

In addition to NMF, two other well-known variants, semi-NMF and convex NMF, can be formulated as the following optimization problems, respectively:

{U, V} = arg min_{V≥0} ||X − UV||_F^2,   (3)

where U is an m × k matrix and V is a k × n matrix, and

{U, V} = arg min_{U≥0, V≥0} ||X − XUV||_F^2,   (4)

where U is an n × k matrix and V is a k × n matrix. The multiplicative update rules suggested by Lee and Seung [26] can be used to solve the above optimization problems by updating U and V alternately.
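As a concrete illustration of this alternating scheme, the following minimal sketch (in Python with NumPy; the function name, defaults, and initialization are our own, not code from the paper) implements the Lee-Seung multiplicative updates for problem (2):

```python
import numpy as np

def nmf_frobenius(X, k, n_iter=200, eps=1e-9, seed=0):
    """Standard NMF: X (m x n, nonnegative) ~= U (m x k) @ V (k x n).

    Uses the Lee-Seung multiplicative updates, which never increase
    ||X - UV||_F^2 and preserve the nonnegativity of U and V.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))  # nonnegative basis matrix
    V = rng.random((k, n))  # nonnegative coefficient matrix
    for _ in range(n_iter):
        V *= (U.T @ X) / (U.T @ U @ V + eps)  # eps avoids division by zero
        U *= (X @ V.T) / (U @ V @ V.T + eps)
    return U, V
```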


B. Relationship between NMF and K-means

Performing K-means clustering on X yields a cluster centroid matrix P = [p_1, ..., p_k] and a cluster indicator matrix Q, where Q_ij (the element of Q in the i-th row and j-th column) equals 1 if x_i belongs to the j-th cluster and 0 otherwise. The objective function of K-means can be formulated as

J(P, Q) = Σ_{i=1}^{n} Σ_{j=1}^{k} Q_ij ||x_i − p_j||_F^2 = ||X − PQ^T||_F^2.   (5)
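Identity (5) is easy to verify numerically; the following sketch (with our own toy data, purely illustrative) compares the K-means sum of squared errors with the Frobenius-norm form:

```python
import numpy as np

# Toy data: four 2-dimensional samples (columns of X) in two clusters.
X = np.array([[1.0, 1.2, 5.0, 5.1],
              [0.9, 1.1, 4.8, 5.2]])
labels = np.array([0, 0, 1, 1])          # cluster index of each sample
Q = np.eye(2)[labels]                    # n x k 0-1 indicator matrix
P = np.column_stack([X[:, labels == j].mean(axis=1) for j in range(2)])

sse = sum(np.sum((X[:, i] - P[:, labels[i]])**2) for i in range(X.shape[1]))
frob = np.linalg.norm(X - P @ Q.T, "fro")**2
print(np.isclose(sse, frob))             # True: identity (5) holds
```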

As a result, it can be immediately concluded from (2)-(5) that NMF and K-means have similar objective functions. Moreover, if Q_ij is allowed to vary over the interval (0, ∞) instead of the set {0, 1}, then K-means becomes NMF and its variants [28]. From the viewpoint of dimensionality reduction, the columns of U in NMF span a low-dimensional subspace, also called the latent space, and each column of V gives the coordinates of a sample in the latent space. From the viewpoint of clustering, the columns of U represent the cluster centroids and the elements of V represent the cluster indicators.

III. TRANSFER SEMI-NONNEGATIVE MATRIX FACTORIZATION

A. Geometry-Based NMF

Geometry-based NMF aims to construct two graphs, one in the original sample space (denoted the X-space) and one in the latent space (denoted the V-space), such that the X-space and the V-space have the same structure. One can then find the category of each sample and read off the clustering results in the V-space. Specifically, if two data points x_i and x_j are close to each other, then v_i and v_j are close to each other as well, as indicated in Figure 1. For example, Kuang et al. [34] proposed a geometry-based NMF, called symmetric NMF, that achieves clustering based on the geometry structure of the sample space by solving the matrix factorization problem

W ≈ V^T V,   (6)

[Fig. 1. Geometry structures of the X-space (samples x_1, ..., x_5) and the V-space (latent points v_1, ..., v_5).]

where W is the similarity matrix of a nearest neighbor graph. Nearest neighbor graphs are widely used in spectral graph theory and manifold learning to describe the geometry structure of a sample space. A graph G can be defined as G = (D, W), where D = [d_1, ..., d_n] denotes the sample set and W_ij (the element of W in the i-th row and j-th column) weights the edge between samples d_i and d_j. The most commonly used definitions of the weight matrix W are as follows. 1) 0-1 weighting: if x_i and x_j are neighbor points, then W_ij = 1, and W_ij = 0 otherwise. 2) Heat kernel weighting: if x_i and x_j are connected, then W_ij = exp(−||x_i − x_j||_2^2 / δ^2). 3) Dot-product weighting: if x_i and x_j are connected, then W_ij = x_i^T x_j. Obviously, W is a nonnegative, symmetric matrix. For heat kernel weighting, the parameter δ needs to be specified; Mika et al. [35] suggested determining δ as δ = mc, where m is the dimension of the input space and c is the variance of the data. There exist two types of graphs: fully-connected graphs, in which any two nodes are connected by an edge, and sparse graphs, in which every node is connected only to its k nearest neighbors [34]. In this manuscript, we adopt a fully-connected graph to construct the weight matrix W.

For the samples depicted in Figure 1, it is straightforward to see that x_1, x_2, and x_3 are in one cluster while x_4 and x_5 belong to another. Therefore, the similarity matrix W takes the form

W = [ ● ● ● ○ ○
      ● ● ● ○ ○
      ● ● ● ○ ○
      ○ ○ ○ ● ●
      ○ ○ ○ ● ● ],   (7)

where black dots denote large-valued elements and hollow dots denote small-valued elements [31]. Performing symmetric NMF on the similarity matrix W gives

W ≈ V^T V,  with  V = [v_1 v_2 v_3 v_4 v_5] = [ ● ● ● ○ ○
                                                ○ ○ ○ ● ● ].   (8)

Obviously, in the latent space, v_1, v_2, and v_3 are in one cluster while v_4 and v_5 are in another. This means that symmetric NMF on a similarity matrix yields a latent space whose geometry structure is identical to that of the original sample space.
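The weighting schemes above translate directly into code; the following minimal sketch (the helper name is our own) builds a fully-connected W over the columns of a sample matrix, with the heat kernel used in this paper as the default:

```python
import numpy as np

def similarity_matrix(X, delta=2.0, kind="heat"):
    """Fully-connected similarity graph over the columns of X.

    kind = "heat": W_ij = exp(-||x_i - x_j||_2^2 / delta^2)
    kind = "dot" : W_ij = x_i^T x_j
    (0-1 weighting needs a neighborhood definition, so it is omitted
    for the fully-connected graph used here.)
    """
    if kind == "dot":
        return X.T @ X
    sq = np.sum(X**2, axis=0)                         # squared column norms
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # pairwise distances^2
    return np.exp(-np.maximum(d2, 0.0) / delta**2)
```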


Moreover, the elements of the factor matrix V can be used as the clustering indicators of the original samples: the largest element of each column of V indicates the membership of the corresponding sample in the X-space. The factor matrix V can be computed by solving the constrained optimization problem

min_{V≥0} J(V) = ||V^T V − W||_F^2.   (9)

It is worth noticing that although symmetric NMF can uncover the category of each sample based on the geometry structures of the sample space and the latent space, it has no generalization capability because it ignores the cluster centroids.
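For completeness, a minimal sketch of solving problem (9) with the damped multiplicative update of Kuang et al. [34] (the helper name and the β = 0.5 damping default are our rendering, not code from the paper):

```python
import numpy as np

def symmetric_nmf(W, k, n_iter=300, beta=0.5, eps=1e-9, seed=0):
    """Solve min_{V>=0} ||V^T V - W||_F^2 for a symmetric nonnegative W.

    V is k x n; the largest entry in each column of V indicates the
    cluster membership of the corresponding sample.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    V = rng.random((k, n))
    for _ in range(n_iter):
        # Damped multiplicative update; beta = 0.5 is recommended in [34].
        V *= (1.0 - beta) + beta * (V @ W) / (V @ V.T @ V + eps)
    return V

# labels = symmetric_nmf(W, k).argmax(axis=0) gives the cluster labels.
```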

B. Transfer Semi-Nonnegative Matrix Factorization

We are now ready to introduce the new transfer semi-nonnegative matrix factorization algorithm. Given an unlabeled sample matrix X_u = [x_u^1, ..., x_u^{n_u}] and a labeled sample matrix X_l = [x_l^1, ..., x_l^{n_l}], where each column of X_u and X_l is an m-dimensional sample, the objective function of TSNMF takes the form

J(U, V_u) = ||[X_l X_u] − U[V_l V_u]||_F^2 + α ||[V_l V_u]^T [V_l V_u] − W||_F^2,   (10)


where the positive constant α is a tradeoff parameter, and the matrix V_l can be initialized in advance because the category of each sample in X_l is already known: for example, (V_l)_ij = 1 if x_l^j belongs to the i-th cluster and (V_l)_ij = 0 otherwise. There are two significant differences between TSNMF and NMF-like algorithms. First, both unlabeled and labeled samples are used as training data for TSNMF. Second, the objective function of TSNMF contains the geometry structure of the sample space for better clustering performance. Here we clarify that the meaning of the term "transfer" is two-fold: on the one hand, it means the transfer of information from the labeled samples to the unlabeled samples; on the other hand, it refers to the transfer of the geometry structure from the sample space to the latent space.

Specifically, the incorporation of labeled samples allows TSNMF to exploit prior knowledge to improve fault detection and isolation performance. Because the memberships of the labeled samples in X_l are already known, this prior knowledge is incorporated into the TSNMF model through the indicator matrix V_l, and the fault detection and isolation approaches induced by TSNMF therefore take advantage of it as well. Moreover, because X_l and X_u contain the same types of samples, and the matrix U already generates correct memberships for the labeled samples in X_l (as given by the initialized indicator matrix V_l), the matrix U will help to generate correct memberships for the unlabeled samples in X_u as well. In conclusion, the incorporation of labeled samples in TSNMF is beneficial for generating correct memberships for the labeled and unlabeled samples simultaneously.

The factor matrices U and V_u can be obtained by solving the constrained optimization problem

min_{V_u ≥ 0} J(U, V_u)
  = ||[X_l X_u] − U[V_l V_u]||_F^2 + α ||[V_l V_u]^T [V_l V_u] − W||_F^2
  = ||X_l − U V_l||_F^2 + ||X_u − U V_u||_F^2
    + α || [ V_l^T V_l  V_l^T V_u ; V_u^T V_l  V_u^T V_u ] − [ W_11  W_12 ; W_21  W_22 ] ||_F^2
  = ||X_l − U V_l||_F^2 + ||X_u − U V_u||_F^2
    + α (||V_l^T V_l − W_11||_F^2 + ||V_l^T V_u − W_12||_F^2)
    + α (||V_u^T V_l − W_21||_F^2 + ||V_u^T V_u − W_22||_F^2).   (11)

For the computation of the factor matrix U, keeping only the terms of J(U, V_u) related to U, the objective function to be minimized is

J(U) = ||X_l − U V_l||_F^2 + ||X_u − U V_u||_F^2
     = tr(X_l^T X_l − 2 X_l^T U V_l + V_l^T U^T U V_l) + tr(X_u^T X_u − 2 X_u^T U V_u + V_u^T U^T U V_u),   (12)

where the operator tr(·) denotes the trace of a matrix. Setting the derivative of J(U) with respect to U to zero,

∂J(U)/∂U = 2 U V_l V_l^T − 2 X_l V_l^T + 2 U V_u V_u^T − 2 X_u V_u^T = 0,   (13)

it follows from (13) that

U = (X_u V_u^T + X_l V_l^T)(V_u V_u^T + V_l V_l^T)^†,   (14)

where the pseudo-inverse of V_u V_u^T + V_l V_l^T is used because this matrix may be singular.

Next, the Lagrange method, the most common method for constrained optimization problems, is employed to derive the updating rule for V_u. Let Ψ = [Ψ_ij] be the Lagrange multipliers associated with the constraints V_u = [(V_u)_ij] ≥ 0. Keeping only the terms of J(U, V_u) related to V_u, the Lagrange function L(V_u, Ψ) can be defined as

L(V_u, Ψ) = ||X_u − U V_u||_F^2 + α ||V_u^T V_u − W_22||_F^2 + 2α ||V_l^T V_u − W_12||_F^2 − tr(Ψ^T V_u)
          = tr(X_u^T X_u − 2 X_u^T U V_u + V_u^T U^T U V_u)
            + α tr(V_u^T V_u V_u^T V_u − 2 V_u^T V_u W_22 + W_22^T W_22)
            + 2α tr(V_u^T V_l V_l^T V_u − 2 V_u^T V_l W_12 + W_12^T W_12)
            − tr(Ψ^T V_u).   (15)

Setting the derivative of L(V_u, Ψ) with respect to (V_u)_ij to zero, we have

∂L(V_u, Ψ)/∂(V_u)_ij = 2(U^T U V_u − U^T X_u)_ij + 4α(V_u V_u^T V_u − V_u W_22)_ij + 4α(V_l V_l^T V_u − V_l W_12)_ij − Ψ_ij = 0.   (16)

To guarantee the nonnegativity of V_u, the matrices in (16) are decomposed as follows: any matrix A with mixed-sign elements can be partitioned into the difference of two nonnegative matrices [28],

A = A^+ − A^−,   (17)


where A^+ = (|A| + A)/2 is a matrix of the same dimensions as A containing its positive elements, A^− = (|A| − A)/2 contains the magnitudes of its negative elements, and the operator | · | denotes the element-wise absolute value of a matrix. Applying (17) to (16) yields

(U^T U V_u)^+_ij − (U^T U V_u)^−_ij − (U^T X_u)^+_ij + (U^T X_u)^−_ij + 2α(V_u V_u^T V_u − V_u W_22 + V_l V_l^T V_u − V_l W_12)_ij − Ψ_ij = 0.   (18)

Multiplying both sides of (18) by (V_u)_ij and using the Kuhn-Tucker condition Ψ_ij (V_u)_ij = 0, we have

((U^T U V_u)^+_ij − (U^T U V_u)^−_ij)(V_u)_ij − ((U^T X_u)^+_ij − (U^T X_u)^−_ij)(V_u)_ij
  + 2α(V_u V_u^T V_u − V_u W_22)_ij (V_u)_ij + 2α(V_l V_l^T V_u − V_l W_12)_ij (V_u)_ij = 0.   (19)

The following multiplicative updating rule for (V_u)_ij is immediately obtained from (19):

(V_u)_ij ← (V_u)_ij × [(U^T U V_u)^−_ij + (U^T X_u)^+_ij + 2α(V_u W_22 + V_l W_12)_ij]
                      / [(U^T U V_u)^+_ij + (U^T X_u)^−_ij + 2α(V_u V_u^T V_u + V_l V_l^T V_u)_ij].   (20)

Theorem 1: The objective J(U, V_u) in (10) is nonincreasing under updating rule (20) for a fixed U; i.e., the proposed TSNMF algorithm is convergent.

The proof of Theorem 1 begins with the definition of auxiliary functions, the most widely used technique for convergence analysis: G(x, x′) is an auxiliary function of F(x) if and only if G(x, x′) ≥ F(x) and G(x, x) = F(x). This concept is important for the proof of convergence because F(x) is non-increasing under the updating rule

x^{t+1} = arg min_x G(x, x^t),   (21)

since F(x^{t+1}) ≤ G(x^{t+1}, x^t) ≤ G(x^t, x^t) = F(x^t) [36]. Convergence of updating rule (20) can therefore be proved by constructing a proper auxiliary function of J(U, V_u) and showing that updating rule (20) is identical to the rule derived from that auxiliary function. The construction of an auxiliary function reduces to finding an upper bound for each positive term and a lower bound for each negative term of the objective function to be minimized. Keeping only the terms of J(U, V_u) related to V_u, we have

F(V_u) = tr(−2 X_u^T U V_u + V_u^T U^T U V_u) + α tr(V_u^T V_u V_u^T V_u − 2 V_u^T V_u W_22)
         + 2α tr(V_u^T V_l V_l^T V_u − 2 V_u^T V_l W_12).   (22)

F(V_u) can be approximated by its first-order Taylor series expansion at V_u′ as

F(V_u) ≈ 2 tr(V_u^T U^T U V_u′ − X_u^T U V_u) + 4α tr(V_u^T V_u′ V_u′^T V_u′ − V_u^T V_u′ W_22)
         + 4α tr(V_u^T V_l V_l^T V_u′ − V_u^T V_l W_12).   (23)

Now we construct

G(V_u, V_u′) = 4α tr(V_u^T V_u′ V_u′^T V_u′ + V_u^T V_l V_l^T V_u′) + 2 tr(V_u^T (U^T U V_u′)^+ + (X_u^T U)^− V_u)
  − 2 Σ_{ik} (U^T U V_u′)^−_ik (V_u′)_ik (1 + log((V_u)_ik / (V_u′)_ik))
  − 2 Σ_{ik} (U^T X_u)^+_ik (V_u′)_ik (1 + log((V_u)_ik / (V_u′)_ik))
  − 4α Σ_{ik} (V_u′ W_22)_ik (V_u′)_ik (1 + log((V_u)_ik / (V_u′)_ik))
  − 4α Σ_{ik} (V_l W_12)_ik (V_u′)_ik (1 + log((V_u)_ik / (V_u′)_ik))   (24)

as an auxiliary function of F(V_u). The derivative of G(V_u, V_u′) with respect to (V_u)_ij is

∂G(V_u, V_u′)/∂(V_u)_ij = 4α(V_u′ V_u′^T V_u′ + V_l V_l^T V_u′)_ij + 2(U^T U V_u′)^+_ij + 2(U^T X_u)^−_ij
  − 2(U^T U V_u′)^−_ij (V_u′)_ij / (V_u)_ij − 2(U^T X_u)^+_ij (V_u′)_ij / (V_u)_ij
  − 4α(V_u′ W_22)_ij (V_u′)_ij / (V_u)_ij − 4α(V_l W_12)_ij (V_u′)_ij / (V_u)_ij.   (25)

Setting (25) to zero yields the following updating rule for (V_u)_ij, which is identical to (20):

(V_u)_ij = arg min G(V_u, V_u′) = (V_u)_ij × [(U^T U V_u)^−_ij + (U^T X_u)^+_ij + 2α(V_u W_22 + V_l W_12)_ij]
                                            / [(U^T U V_u)^+_ij + (U^T X_u)^−_ij + 2α(V_u V_u^T V_u + V_l V_l^T V_u)_ij].   (26)

The proof of Theorem 1 is complete if G(V_u, V_u′) is indeed an auxiliary function of F(V_u). It is, because for any nonnegative matrices E and F, the basic inequality x ≥ 1 + log(x) for x > 0 gives [36]

tr(EF) = Σ_{i,k} E_ik F_ki ≥ Σ_{i,k} E_ik F′_ki (1 + log(F_ki / F′_ki)),   (27)

from which G(V_u, V_u) = F(V_u) and G(V_u, V_u′) ≥ F(V_u) follow. As a result, G(V_u, V_u′) is an auxiliary function of F(V_u), and the proof of Theorem 1 is completed.

In fact, Theorem 1 implies more. The second-order derivative of G(V_u, V_u′) with respect to (V_u)_ij and (V_u)_lk is

∂²G(V_u, V_u′)/∂(V_u)_ij ∂(V_u)_lk = σ_il σ_jk H_ij,  with  σ_il = 1 if i = l and σ_il = 0 otherwise,   (28)

where

H_ij = 2(U^T U V_u′)^−_ij (V_u′)_ij / (V_u)²_ij + 2(U^T X_u)^+_ij (V_u′)_ij / (V_u)²_ij
       + 4α(V_u′ W_22)_ij (V_u′)_ij / (V_u)²_ij + 4α(V_l W_12)_ij (V_u′)_ij / (V_u)²_ij.   (29)

Because ∂²G(V_u, V_u′)/∂(V_u)_ij ∂(V_u)_lk is a diagonal matrix with nonnegative elements, G(V_u, V_u′) is a convex function with respect to V_u. Furthermore, updating rule (20) is not only convergent but also drives G(V_u, V_u′) to its global optimum [28]. The algorithm outline of TSNMF is described in Table I for readers to better understand it.


TABLE I. ALGORITHM OUTLINE OF TSNMF

Step 1:  Obtain the training samples, containing labeled and unlabeled samples.
Step 2:  Initialize the parameters δ and α.
Step 3:  Calculate W and V_l.
Step 4:  Initialize V_u with a nonnegative matrix.
Step 5:  Loop:
Step 6:    Update U as in (14).
Step 7:    For i = 1 to k:
Step 8:      For j = 1 to n_u:
Step 9:        Update (V_u)_ij as in (20).
Step 10:     End
Step 11:   End
Step 12: End loop, until convergence.
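The outline in Table I translates almost line-for-line into code. The following sketch (all names are our own, and it can reuse the hypothetical similarity_matrix helper from Section III.A) performs the closed-form update (14) for U and the elementwise multiplicative update (20) for V_u; the inner double loop of Table I is vectorized:

```python
import numpy as np

def pos(A):
    """Elementwise positive part: A+ = (|A| + A) / 2."""
    return (np.abs(A) + A) / 2.0

def neg(A):
    """Elementwise magnitude of the negative part: A- = (|A| - A) / 2."""
    return (np.abs(A) - A) / 2.0

def tsnmf(Xl, Xu, Vl, W, alpha=1.0, n_iter=500, eps=1e-9, seed=0):
    """Transfer semi-NMF: returns U (m x k) and the indicator matrix Vu (k x nu).

    Xl (m x nl): labeled samples; Xu (m x nu): unlabeled samples.
    Vl (k x nl): fixed 0-1 indicators of the labeled samples.
    W  (n x n) : similarity matrix of [Xl Xu], labeled samples first;
                 W12 and W22 below are its cross and unlabeled blocks.
    """
    nl, nu = Xl.shape[1], Xu.shape[1]
    k = Vl.shape[0]
    W12 = W[:nl, nl:]
    W22 = W[nl:, nl:]
    rng = np.random.default_rng(seed)
    Vu = rng.random((k, nu))
    for _ in range(n_iter):
        # Update U by the closed form (14); the pseudo-inverse is used
        # because Vu Vu^T + Vl Vl^T may be singular.
        U = (Xu @ Vu.T + Xl @ Vl.T) @ np.linalg.pinv(Vu @ Vu.T + Vl @ Vl.T)
        # Elementwise multiplicative update (20) for Vu.
        UtUVu = U.T @ U @ Vu
        UtXu = U.T @ Xu
        num = neg(UtUVu) + pos(UtXu) + 2 * alpha * (Vu @ W22 + Vl @ W12)
        den = pos(UtUVu) + neg(UtXu) + 2 * alpha * (Vu @ Vu.T @ Vu + Vl @ Vl.T @ Vu)
        Vu *= num / (den + eps)
    return U, Vu
```

The largest element of each column of V_u then indicates the cluster of the corresponding unlabeled sample.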


IV. SIMULTANEOUS FAULT DETECTION AND ISOLATION

Another contribution of this paper is a simultaneous fault detection and isolation approach induced by the proposed TSNMF. Fault detection and isolation are significant in real-world applications because they can provide fault diagnosis results in real time once a fault has occurred. Fault detection attempts to decide whether a fault has occurred or not; fault isolation, in this paper, aims to find the fault type from a set of candidate faults.

Suppose that there exist a total of ϕ types of faults, and denote the i-th fault sample set by X_fi = [x_fi^1, ..., x_fi^{n_i}], where n_i is the number of samples of the i-th fault type, i = 1, ..., ϕ. Let X_t = [x_t^1, ..., x_t^{n_t}] be the sample set consisting of fault-free samples, where n_t is the number of such samples, and let X_u = [x_u^1, ..., x_u^{n_u}] be the unlabeled sample set, where n_u is the number of unlabeled samples. The training sample set is then X = [X_t, X_f1, ..., X_fϕ, X_u]; for notational simplicity, let n = n_u + n_t + Σ_{i=1}^{ϕ} n_i.

Performing TSNMF on X ∈ ℜ^{m×n} yields the cluster centroid matrix U ∈ ℜ^{m×(ϕ+1)} and the cluster indicator matrix V ∈ ℜ_+^{(ϕ+1)×n}. After U and V are obtained, the factorization of TSNMF takes the form

X = UV + E,   (30)

where the matrix U contains the cluster centroids corresponding to the fault-free samples and to each type of fault samples, V contains the cluster indicators, and E denotes the residual error.

Suppose that x_new is a new sample measured online; its membership can be determined by a cluster indicator vector v_new, which is related to the matrix U through

x_new = U v_new + e,   (31)

where e is an error vector. The cluster indicator v_new ∈ ℜ_+^{ϕ+1}, v_new = [(v_new)_i]_{i=1}^{ϕ+1}, can be computed by solving the constrained optimization problem

v_new = arg min_{v_new ≥ 0} ||x_new − U v_new||_F^2.   (32)

For problem (32), a Lagrange function L(v_new, φ) can be defined as

L(v_new, φ) = x_new^T x_new − 2 x_new^T U v_new + v_new^T U^T U v_new − φ^T v_new,   (33)

where the vector φ is the Lagrange multiplier associated with the constraint v_new ≥ 0. Setting the derivative of L(v_new, φ) with respect to (v_new)_i to zero, we have

∂L(v_new, φ)/∂(v_new)_i = 2(U^T U v_new)_i − 2(U^T x_new)_i − φ_i = 0.   (34)

Letting p = U^T U v_new and q = U^T x_new, (34) can be rewritten in the concise form

2(p^+ − p^−)_i − 2(q^+ − q^−)_i − φ_i = 0,   (35)

where p^+ and p^− are two vectors of the same dimension as p consisting of its positive and negative elements, respectively (and similarly for q^+ and q^−). Using the Kuhn-Tucker condition φ_i (v_new)_i = 0, we obtain the following updating rule for (v_new)_i:

(v_new)_i = (v_new)_i × [(p^−)_i + (q^+)_i] / [(p^+)_i + (q^−)_i].   (36)

After the indicator v_new is obtained, the membership of x_new can be determined from the largest element of v_new. Specifically, if the maximal elements of the indicator vectors obtained by (36) for different samples have the same row index, then those samples belong to the same cluster. Since the indicators of the labeled samples are already known, for any new sample x_new, if the maximal element of its indicator v_new has the same row index as the maximal elements of the indicators of the j-th type of samples, then x_new is a sample of the j-th type. Thus, if the labeled samples contain both fault-free samples and all types of faulty samples, then (36) can recognize the membership of any newly arriving sample, faulty or not. Moreover, different from existing data-driven fault diagnosis approaches, the proposed scheme achieves fault detection and isolation without monitoring statistics, and detection and isolation are performed simultaneously.
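A minimal sketch of this online step, iterating rule (36) to convergence (the helper name and the fixed iteration count are our own choices):

```python
import numpy as np

def classify_online(U, x_new, n_iter=200, eps=1e-9, seed=0):
    """Membership of a new sample via the multiplicative rule (36).

    Returns the indicator vector v_new; its argmax is the row index of
    the cluster (fault-free or fault type) the sample belongs to.
    """
    rng = np.random.default_rng(seed)
    v = rng.random(U.shape[1])
    UtU, q = U.T @ U, U.T @ x_new
    qp, qm = np.maximum(q, 0), np.maximum(-q, 0)       # q^+, q^-
    for _ in range(n_iter):
        p = UtU @ v
        pp, pm = np.maximum(p, 0), np.maximum(-p, 0)   # p^+, p^-
        v *= (pm + qp) / (pp + qm + eps)
    return v

# fault_index = classify_online(U, x_new).argmax()
```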

[Fig. 2. Cluster indicator matrices (row index vs. column index) from (a) PCA and (b) Semi-NMF.]


For readers to better understand the application procedure of the proposed simultaneous fault detection and isolation approach, it is summarized in the following flow chart:

1) Collect training samples, including labeled and unlabeled samples, to construct a sample matrix; the labeled samples should include fault-free samples and all types of faulty samples.
2) Initialize the matrix V_l and the parameters α and δ.
3) Determine the similarity matrix W.
4) Perform TSNMF on the sample matrix to obtain the matrix U.
5) For any newly arriving sample x_new, use equation (36) to obtain its indicator vector v_new.
6) Perform fault detection and isolation for x_new using v_new: first, find the row index of the maximal element of v_new and of the indicators of each type of labeled samples; then, if the maximal element of v_new has the same row index as the indicators of the fault-free samples, x_new is a fault-free sample, and if it has the same row index as the indicators of the j-th type of faulty samples, x_new is a faulty sample of the j-th fault type.

V. A NUMERICAL EXAMPLE

In this section, a simple numerical example is employed to evaluate the effectiveness of the proposed TSNMF in data clustering; K-means, PCA, semi-NMF, and convex NMF are used for comparison. The matrices X_l and X_u are the labeled and unlabeled sample sets, respectively, where each column vector denotes a sample and each row the observations of one variable. The training sample set is constructed as X = [X_l X_u]. The labeled sample set is

X_l = [  1.3   2.5   6.5   7.3
         2.4   2.6  −5.5  −4.6
         4.5   4.1  −6.1  −7.7
         5.6   6.2   8.1   5.3
        −3.5  −4.2   7.2   6.5 ].

[Fig. 3. Cluster indicator matrices (row index vs. column index) from (a) Convex NMF and (b) TSNMF.]

The unlabeled sample set is

X_u = [  1.5   2.6   2.6   7.3   5.1   6.0   8.1
         1.8   3.3   3.1  −5.3  −8.3  −4.7  −7.5
         5.6   4.5   6.1  −6.8  −7.8  −6.9  −6.2
         4.1   5.5   4.4   6.6   8.3   5.2   6.5
        −4.6  −3.2  −2.8   3.5   6.3   4.5   7.6 ].

Obviously, the samples in X_l and X_u are clearly separable, which can be demonstrated by running the K-means algorithm on them. The cluster indicator v_K-means obtained by performing K-means on the sample matrix X = [X_l X_u] is

v_K-means = [ 1 1 2 2 1 1 1 2 2 2 2 ],

which indicates that the first three columns of X_u should be in the same cluster and the remaining four columns in another cluster. Moreover, the first two columns of X_l belong to the same cluster as the first three columns of X_u, while the last two columns of X_l belong to the same cluster as the last four columns of X_u. As a result, in TSNMF, convex NMF, and semi-NMF, U should be a 5 × 2 matrix and V a 2 × 11 matrix.

Next, we perform TSNMF on the sample matrix X = [X_l X_u] while performing PCA, semi-NMF, and convex NMF on the sample matrix X_u. In TSNMF, the heat kernel weighting with δ = 2 is chosen to construct the similarity matrix W, and α is chosen to be 1. The cluster indicator matrix V_l is initialized as

V_l = [ 1 1 0 0
        0 0 1 1 ].

The semi-NMF and convex NMF algorithms used in this simulation can be found in [28]. The cluster indicator matrices obtained by each algorithm are shown in Figures 2 and 3, where blocks denote the elements of the cluster indicator matrices and large-valued elements are dark in color. Figures 2 and 3 indicate that, apart from PCA, the semi-NMF, convex NMF, and TSNMF algorithms all provide correct clustering results because, in the first three columns of their indicator matrices, the values in the upper row are larger than the values in the lower row, while in the remaining columns the values in the lower row are larger than the values in the upper row. However, TSNMF gives the best clustering results in this task because it provides sharper cluster indicators than the other algorithms. In summary, TSNMF is superior to traditional clustering algorithms because it takes advantage of labeled samples and of the geometry structure of the sample space. In addition, Figure 4 shows that TSNMF finds the correct cluster centroids, while the cluster centroids found by the other methods deviate greatly from the correct ones.
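Assuming the hypothetical similarity_matrix and tsnmf helpers sketched earlier, this whole experiment reduces to a few lines:

```python
import numpy as np

Xl = np.array([[ 1.3,  2.5,  6.5,  7.3],
               [ 2.4,  2.6, -5.5, -4.6],
               [ 4.5,  4.1, -6.1, -7.7],
               [ 5.6,  6.2,  8.1,  5.3],
               [-3.5, -4.2,  7.2,  6.5]])
Xu = np.array([[ 1.5,  2.6,  2.6,  7.3,  5.1,  6.0,  8.1],
               [ 1.8,  3.3,  3.1, -5.3, -8.3, -4.7, -7.5],
               [ 5.6,  4.5,  6.1, -6.8, -7.8, -6.9, -6.2],
               [ 4.1,  5.5,  4.4,  6.6,  8.3,  5.2,  6.5],
               [-4.6, -3.2, -2.8,  3.5,  6.3,  4.5,  7.6]])
Vl = np.array([[1., 1., 0., 0.],
               [0., 0., 1., 1.]])

W = similarity_matrix(np.hstack([Xl, Xu]), delta=2.0, kind="heat")
U, Vu = tsnmf(Xl, Xu, Vl, W, alpha=1.0)
print(Vu.argmax(axis=0))   # expected clustering of Xu: [0 0 0 1 1 1 1]
```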

TABLE II. COMPUTATION EFFICIENCIES (NUMERICAL EXAMPLE)

Approach      Time (s)
PCA           0.048
Semi-NMF      0.0171
Convex NMF    0.1098
TSNMF         0.0133

[Fig. 4. Cluster centroids distribution: samples and the cluster centroids found by PCA, Semi-NMF, Convex NMF, and TSNMF in the (variable 1, variable 2) plane.]

From the viewpoint of geometry, a cluster centroid for a specific type of samples is a vector that lies within the distribution domain of those samples; from the viewpoint of algebra, it is the mean vector of those samples. In Figure 4, variables 1 and 2 refer to the variables whose observations are given by the first and second rows of the sample matrices, respectively.

For reference, the specific values of the elements of each cluster indicator matrix are

V_PCA = [ 0.11    −0.54   0.32   0.06   −0.59   0.002  −0.028
          0.684    0.195  −0.53  0.295   0.209  −0.226   0.531 ],

V_Semi-NMF = [ 0.32  0.33  0.33  0     0      0      0.056
               0     0     0     0.63  0.744  0.57   0.739 ],

V_Convex NMF = [ 0.49   0.51   0.51   0      0.002  0      0.101
                 0.06   0.04   0.013  0.599  0.721  0.55   0.716 ],

V_u (TSNMF) = [ 0.915  0.96  0.955  0.0028  0      0       0
                0      0     0      0.875   1.047  0.8085  1.048 ].

As a two-factor matrix factorization algorithm, PCA theoretically has the best low-rank approximation performance, so we evaluate the low-rank approximation performance of each algorithm by comparison with PCA. The residual errors generated by each algorithm are

PCA:         ||X_u − P(:, 1:2) P(:, 1:2)^T X_u||_F = 3.1886
Semi-NMF:    ||X_u − U V_Semi-NMF||_F = 3.2163
Convex NMF:  ||X_u − U V_Convex NMF||_F = 3.2087
TSNMF:       ||X_u − U V_u||_F = 3.4565

where P(:, 1:2) is the loading matrix in PCA consisting of the first two loading vectors. As these residual errors show, TSNMF has satisfactory low-rank approximation performance, similar to that of PCA, semi-NMF, and convex NMF; thus, exploiting labeled samples and the geometry structure of the sample space in TSNMF does not degrade the low-rank approximation accuracy.
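As a sketch of how these residuals are computed (reusing U and Vu from the hypothetical tsnmf helper; the paper does not state whether X_u is mean-centered before the PCA projection, so the uncentered form matching the expression above is used):

```python
import numpy as np

# PCA loading matrix: left singular vectors of Xu (uncentered here).
P, _, _ = np.linalg.svd(Xu, full_matrices=False)
err_pca = np.linalg.norm(Xu - P[:, :2] @ P[:, :2].T @ Xu, "fro")

# Low-rank residual of the TSNMF factorization on the unlabeled block.
err_tsnmf = np.linalg.norm(Xu - U @ Vu, "fro")
print(err_pca, err_tsnmf)
```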

TABLE III. INITIAL CONDITIONS [38]

Variable                    Value (Unit)
Substrate                   15.0 (g/L)
DO                          1.16 (mmol/L)
Biomass Concentration       0.1 (g/L)
Penicillin Concentration    0 (g/L)
Culture Volume              100.0 (L)
CO2                         0.5 (mmol/L)
pH                          5.0
Temperature                 298.0 (K)
Generated Heat              0 (kcal)

In addition to the low-rank approximation performance, computation efficiency is another index for evaluating each algorithm. We employ the tic-toc function in Matlab to measure the time spent by each algorithm; the times for this data clustering task are listed in Table II for comparison. Although Table II indicates that all of these algorithms show satisfactory computation efficiency, TSNMF is the most computationally efficient algorithm in this data clustering task.

VI. A CASE STUDY ON A PENICILLIN FERMENTATION PROCESS

In this section, a benchmark penicillin fermentation process is employed to evaluate the performance of the proposed fault diagnosis approaches. The penicillin production simulator was developed by Cenk Undey, Gulnur Birol, and Ali Cinar at the Process Modeling, Monitoring and Control Research Group, Department of Chemical and Environmental Engineering, Illinois Institute of Technology; more detailed descriptions of the PenSim v2.0 simulator can be found at http://www.chee.iit.edu/~control/software.htm. The flow chart of the penicillin fermentation process is depicted in Figure 5 [37]. The simulation lasts 300 hours with a sampling interval of one hour, and the system initial conditions and variable set points are given in Tables III and IV, respectively [38]. A total of 16 process variables, listed in Table V [37], are monitored. Moreover, 10 fault-1 samples, 10 fault-2 samples, and 90 unlabeled samples were used to train the TSNMF model, where the 90 unlabeled samples contain 30 normal samples, 30 fault-1 samples, and 30 fault-2 samples. The data models of PCA, semi-NMF, and convex NMF were trained using the 90 unlabeled samples.

A sample distribution plot, shown in Figure 6, indicates that the training samples are separable.

Page 9 of 12

pH

TABLE V M ONITORED VARIABLES37

FC

Substrate Tank

Acid

Base

FC

T Fermenter Air

Cold Water

Hot Water

No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Variables (Units) Aeration Rate (L/h) Agitator P ower (W ) Substrate F eed Rate (L/h) Substrate F eed T emperature (K) Dissolved Oxygen Saturation (%) Biomass Concentration (g/L) P enicillin Concentration (g/L) Culture V olume (L) CO2 Concentration (mmole/L) PH T emperature (K) Generated Heat (kcal/h) Acid F low Rate (mL/h) Base F low Rate (mL/h) Cold W ater F low Rate (L/h) Hot W ater F low Rate (L/h)

Fig. 5. Penicillin Fermentation Process37 46

TABLE IV S ET P OINTS38

Normal Samples Fault 1 Samples Fault 2 Samples

44

Values (Units) 8.6 (L/h) 30.0 (W ) 0.042 (L/h) 296 (K) 5.0 298.0 (K)

Agitator Power (W)

42

Variables Aeration Rate Agitator P ower Substrate F eed F low Rate Substrate F eed T emperature PH T emperature

40 38 36 34 32 30

TNMF were chosen as follows: the heating kernel weight with δ = 4 is chosen to construct the similarity matrix W, and α is set to be 1; training sample set X = [Xn , Xf1 , Xf2 , Xu ] is a matrix with size 16 × 120 where fault-free sample set Xn is of size 16 × 10, fault-1 sample set Xf1 is of size 16 × 10, fault-2 sample set Xf2 is of size 16 × 10, unlabeled sample set Xu is of size 16 × 90, U is of size 16 × 3, and Vu is of size 3 × 90. Moreover, Vl can be initialized as   1 ··· 1 0 ··· 0 0 ··· 0 . Vl = 0 · · · 0 1 · · · 1 0 · · · 0 0 · · · 0 0 · · · 0 1 · · · 1 3×30

28 8.5

9

9.5

10

10.5

11

11.5

12

12.5

13

Aeration Rate (L/h)

Fig. 6. Sample distribution plot.


[Fig. 7. Fault detection and isolation results from PCA.]

TABLE VI. FAULT DESCRIPTIONS [38]

No. of Fault   Variable Involved   Fault Type
1              Aeration Rate       Step
2              Agitator Power      Step

Two types of faults are selected for the fault detection and isolation study; a description of each fault is given in Table VI [38], and 50 samples from each fault sample set were selected for testing. A total of 220 samples were thus used in the simulation studies, in which samples 121 to 170 are fault-1 samples to be detected and isolated and samples 171 to 220 are fault-2 samples.

The fault detection and isolation results obtained using PCA are depicted in Figure 7, where the PCA method fails to identify the memberships of the testing faulty samples. Semi-NMF and convex NMF do not provide satisfactory fault detection and isolation results either, as shown in Figures 8 and 9. Meanwhile, Figure 10 indicates that TSNMF provides correct fault detection and isolation, because samples 121 to 170 are identified as fault 1 and samples 171 to 220 are identified as fault 2.

[Fig. 8. Fault detection and isolation results from convex NMF.]

[Fig. 9. Fault detection and isolation results from semi-NMF.]

[Fig. 10. Fault detection and isolation results from TSNMF.]

TABLE VII. COMPUTATION EFFICIENCIES (PENICILLIN CASE STUDY)

Approach      Time (s)
PCA           0.3084
Semi-NMF      3.1505
Convex NMF    9.0962
TSNMF         5.5121


Specifically, from the 11th to the 20th columns and from the 121st to the 170th columns of the indicator matrix, the values in the second row are larger than the elements in the other rows, i.e., the testing fault-1 samples are successfully identified. Similarly, from the 21st to the 30th columns and from the 171st to the 220th columns in Figure 10, the values in the third row are larger than the elements in the upper rows, i.e., the testing fault-2 samples are successfully identified.

Next, we analyze why TSNMF outperforms the other methods. Different from the other methods, the TSNMF model incorporates labeled samples and the geometry structure of the sample space. First, the labeled samples allow TSNMF to exploit prior knowledge about sample memberships to improve fault detection and isolation performance. Second, the incorporation of the geometry structure is also beneficial, because it enables TSNMF to perform fault detection and isolation through latent variables located in the latent space spanned by the columns of the indicator matrix V.

Finally, we evaluate the computation efficiency of each algorithm. The time spent by each algorithm to detect and isolate the unlabeled samples is listed in Table VII, which shows that PCA spends the least time on this fault detection and isolation task. PCA computes so quickly because it factorizes the sample matrix with a single singular value decomposition, without iterations, and its superiority in computation efficiency becomes even more significant as the number of samples increases. In comparison with semi-NMF and convex NMF, TSNMF also shows satisfactory computation efficiency.

VII. CONCLUSIONS

We have presented a new matrix factorization algorithm, transfer semi-nonnegative matrix factorization, which approximately factorizes a sample matrix containing labeled and unlabeled samples into the product of two low-rank matrices. We have interpreted the relationships between transfer semi-nonnegative matrix factorization and K-means clustering, and a theoretical proof of the convergence of the new algorithm has been given. Through a numerical example and a case study on the penicillin fermentation process, the new method has shown clear interpretability for clustering and has provided sharper indicators than the existing approaches. The cluster centroid matrix can be readily applied to newly arriving samples and generalizes well, which allows us to develop an online simultaneous fault detection and isolation scheme. Simulation results have shown that TSNMF outperforms the existing algorithms in fault diagnosis and demonstrates significant capability for analyzing real-world data sets.

VIII. ACKNOWLEDGEMENTS

The authors gratefully acknowledge the financial support of the China Scholarship Council (CSC) [2017] 3109.


REFERENCES

[1] Qin, S. J. Statistical process monitoring: basics and beyond. J. Chemometrics 2003, 17, 480−502.
[2] Alcala, C.; Qin, S. J. Reconstruction based contribution for process monitoring. Automatica 2009, 45, 1593−1600.
[3] Dunia, R.; Qin, S. J. Subspace approach to multidimensional fault identification and reconstruction. AIChE J. 1998, 44, 1813−1831.
[4] Lee, J. M.; Yoo, C. K.; Choi, S. W.; Vanrolleghem, P. V.; Lee, I. B. Nonlinear process monitoring using kernel principal component analysis. Chem. Eng. Sci. 2004, 59, 223−234.
[5] Ku, W.; Storer, R.; Georgakis, C. Disturbance detection and isolation by dynamic principal component analysis. Chemometr. Intell. Lab. Syst. 1995, 30, 179−196.
[6] Qin, S. J.; Dunia, R. Determining the number of principal components for best reconstruction. J. Process Control 2000, 10, 145−150.
[7] Zhou, D. H.; Li, G.; Qin, S. J. Total projection to latent structures for process monitoring. AIChE J. 2010, 56, 168−178.
[8] Fearn, T. On orthogonal signal correction. Chemometr. Intell. Lab. Syst. 2000, 50, 47−52.
[9] Li, G.; Qin, S. J.; Zhou, D. H. Geometric properties of partial least squares for process monitoring. Automatica 2010, 46, 204−210.
[10] Choi, S. W.; Lee, I. B. Multiblock PLS-based localized process diagnosis. J. Process Control 2005, 15, 295−306.
[11] Westerhuis, J. A.; Gurden, S. P.; Smilde, A. K. Generalized contribution plots in multivariate statistical process monitoring. Chemometr. Intell. Lab. Syst. 2000, 51, 95−114.
[12] Ge, Z. Q.; Zhong, S. Y.; Zhang, Y. W. Semisupervised kernel learning for FDA model and its application for fault classification in industrial processes. IEEE Trans. Ind. Inf. 2016, 12, 1403−1411.
[13] Shi, H.; Liu, J.; Wu, Y.; Zhang, K.; Zhang, L.; Xue, P. Fault diagnosis of nonlinear and large-scale processes using novel modified kernel Fisher discriminant analysis approach. Int. J. Syst. Sci. 2016, 47, 1095−1109.
[14] Zhong, S.; Wen, Q.; Ge, Z. Q. Semi-supervised Fisher discriminant analysis model for fault classification in industrial processes. Chemometr. Intell. Lab. Syst. 2014, 138, 203−211.
[15] He, Q. P.; Qin, S. J.; Wang, J. A new fault diagnosis method using fault directions in Fisher discriminant analysis. AIChE J. 2005, 51, 555−571.
[16] Cichocki, A.; Douglas, S. C.; Amari, S. Robust techniques for independent component analysis (ICA) with noisy data. Neurocomputing 1998, 22, 113−129.
[17] Hyvärinen, A.; Oja, E. Independent component analysis by general nonlinear Hebbian-like learning rules. Signal Process. 1998, 64, 301−313.
[18] Hyvärinen, A.; Oja, E. Independent component analysis: algorithms and applications. Neural Netw. 2000, 13, 411−430.
[19] Fan, J. C.; Wang, Y. Q. Fault detection and diagnosis of non-linear non-Gaussian dynamic processes using kernel dynamic independent component analysis. Inf. Sci. 2014, 259, 369−379.
[20] Hyvärinen, A. Independent component analysis in the presence of Gaussian noise by maximizing joint likelihood. Neurocomputing 1998, 22, 49−67.
[21] Hyvärinen, A. Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. Neural Netw. 1999, 10, 626−634.
[22] Douglas, S. C.; Cichocki, A. Neural networks for blind decorrelation of signals. IEEE Trans. Signal Process. 1997, 45, 2829−2842.
[23] Cardoso, J. F.; Laheld, B. H. Equivariant adaptive source separation. IEEE Trans. Signal Process. 1996, 44, 3017−3030.
[24] Yang, H. H. Serial updating rule for blind separation derived from the method of scoring. IEEE Trans. Signal Process. 1999, 47, 2279−2285.
[25] Karhunen, J.; Pajunen, P.; Oja, E. The nonlinear PCA criterion in blind source separation: relations with other approaches. Neurocomputing 1998, 22, 5−20.
[26] Lee, D.; Seung, H. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788−791.
[27] Li, T.; Ding, C. The relationships among various nonnegative matrix factorization methods for clustering. Sixth International Conference on Data Mining (ICDM'06), 2006, Hong Kong, China.
[28] Ding, C.; Li, T.; Jordan, M. I. Convex and semi-nonnegative matrix factorizations. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 45−55.
[29] Cai, D.; He, X. F.; Han, J. W.; Huang, T. S. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 1548−1560.
[30] Liu, H. F.; Wu, Z. H.; Li, X. L.; Cai, D.; Huang, T. S. Constrained nonnegative matrix factorization for image representation. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1299−1311.
[31] Li, N.; Yang, Y. P. Statistical process monitoring based on modified nonnegative matrix factorization. J. Intell. Fuzzy Syst. 2015, 28, 1359−1370.
[32] Li, X. B.; Yang, Y. P.; Zhang, W. D. Statistical process monitoring via generalized non-negative matrix projection. Chemometr. Intell. Lab. Syst. 2013, 121, 15−25.
[33] Li, X. B.; Yang, Y. P.; Zhang, W. D. Fault detection method for non-Gaussian processes based on non-negative matrix factorization. Asia-Pacific J. Chem. Eng. 2013, 8, 362−370.
[34] Kuang, D.; Ding, C.; Park, H. Symmetric nonnegative matrix factorization for graph clustering. Proceedings of the 2012 SIAM International Conference on Data Mining, 2012.
[35] Mika, S.; Schölkopf, B.; Smola, A. J.; Müller, K. R.; Scholz, M.; Rätsch, G. Kernel PCA and de-noising in feature space. Proc. Advances in Neural Information Processing Systems II, 1999, 536−542.
[36] Yang, Z. R.; Oja, E. Linear and nonlinear projective nonnegative matrix factorization. IEEE Trans. Neural Netw. 2010, 21, 734−749.
[37] Lee, J. M.; Yoo, C. K.; Lee, I. B. Fault detection of batch processes using multiway kernel principal component analysis. Comput. Chem. Eng. 2004, 28, 1837−1847.
[38] Zhai, L. R.; Zhang, Y. W.; Guan, S. P.; Fu, Y. J.; Feng, L. Nonlinear process monitoring using kernel nonnegative matrix factorization. Can. J. Chem. Eng. 2018, 96, 554−563.

[Fig. 11. TOC graphic: data from the penicillin fermentation process (normal, fault-1, and fault-2 samples), the TSNMF cluster indicator matrix, and the sample distribution in the Agitator Power vs. Aeration Rate plane.]