Process Data Analytics via Probabilistic Latent Variable Models: A Tutorial Review

Publication Date (Web): August 31, 2018

Review

Process data analytics via probabilistic latent variable models: A tutorial review Zhiqiang Ge Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.8b02913 • Publication Date (Web): 31 Aug 2018 Downloaded from http://pubs.acs.org on September 1, 2018



Industrial & Engineering Chemistry Research

[Graphical abstract: probabilistic PCA shown as latent variable modeling with a probabilistic description, leading to applications for process data analytics.]

ACS Paragon Plus Environment


Process data analytics via probabilistic latent variable models: A tutorial review

Zhiqiang Ge∗

State Key Laboratory of Industrial Control Technology, Institute of Industrial Process Control, College of Control Science and Engineering, Zhejiang University, Hangzhou, China, 310027

Abstract

Dimensionality reduction is important because data in the process industry are high-dimensional, which has made latent variable modeling methods popular in recent years. By projecting high-dimensional data into a lower-dimensional space, latent variable models are able to extract key information from process data while simultaneously improving the efficiency of data analytics. From a probabilistic viewpoint, this paper carries out a tutorial review of probabilistic latent variable models (PLVMs) for process data analytics. Detailed illustrations of the different kinds of basic PLVMs are provided, together with their research status. In addition, counterparts of those basic PLVMs are introduced and discussed for process data analytics. Several perspectives are highlighted for future research on this topic.

Keywords: Process data analytics; probabilistic modeling; Latent variable model.



Corresponding author:

E-mail address: [email protected] (Ge Z.)


1. Introduction

With the wide use of distributed control systems and new measurement devices, a large amount of data has been recorded and collected in the process industry, which has made data analytics popular in recent years. Compared with traditional modeling methods, which typically incorporate prior knowledge or experience from engineers, process data modeling and analytics are much more flexible, meaning that a data model can be built more easily and quickly. Useful information can be extracted from the process data and then transferred into effective knowledge for process understanding and decision support [1][2][3]. For example, data-based models have been developed for online process monitoring and fault diagnosis, which provide real-time information on the process operating condition and locate root causes if any abnormality or fault occurs in the process [4][5][6]; inferential or soft sensors have been developed from historical process data for online estimation or prediction of key indices and quality variables [7][8]; data clustering methods have been developed for operating mode identification; and data-driven classification models have been proposed for discriminant analysis of various data patterns, such as process faults, production grades, batch manners, and so on [9][10][11][12]. Over the past two decades, many process data modeling and analytics methods have been developed [13]-[26], among which latent variable models have played an important role. Because process data are high-dimensional, dimensionality reduction is usually necessary; otherwise, data analytics can be quite difficult. As a result, latent variable modeling methods such as principal component analysis (PCA) and partial least squares (PLS) have become very popular in process data analytics.
By projecting the data into a lower-dimensional space, latent variable models are able to extract key information from the data while simultaneously improving the efficiency of the data analytic procedures. For example, PCA/PLS can reduce the dimensionality of the process data to two or three dimensions, in which data visualization becomes possible and the main variations of the process can also be monitored effectively. To date, latent variable models have been widely used for process data monitoring, discriminant analysis, regression modeling, clustering, classification, etc. Typically used latent variable models include PCA, PLS, and independent component analysis (ICA). Recently, Bartolucci et al. [27] discussed latent variable models for dealing with the complexities of big data from different perspectives, such as simplification of the data structure, flexible representation of dependence between variables, reduction of selection bias, and problems involved in parameter estimation. However, traditional latent variable models lack a probabilistic interpretation of the process data. In fact, almost all process variables are contaminated by random noise. Hence, the process variables inherently behave in a statistical manner rather than a deterministic one. In this case, it is better to use a probabilistic model structure, which provides a more natural description of the process data. There are several additional advantages of using a probabilistic data model. First, the probabilistic model can naturally deal with missing values in the dataset, which are very common in practice. Second, an efficient expectation-maximization (EM) algorithm can be applied for parameter estimation in the probabilistic model, which can greatly reduce the computational burden, particularly for high-dimensional industrial process data. Third, it is more straightforward to generalize the single model structure to the mixture model case in order to handle more complicated problems. In addition, the probabilistic model can naturally exploit Bayesian methods for model selection and parameter tuning, which can simultaneously avoid over-fitting and make full use of the modeling dataset.
In the past years, various probabilistic forms of the latent variable model have been introduced or newly developed for process data analytics. For example, the probabilistic PCA model has been introduced for process monitoring, and later a supervised form of this model, also called probabilistic principal component regression (PCR), has been proposed for soft sensor modeling and online quality prediction. Recently, a probabilistic form of the ICA model has been introduced for non-Gaussian process modeling and monitoring. The independent latent spaces are specified with a Student's t formulation to account for both Gaussian and non-Gaussian characteristics, while the additional noise term serves as a complement for explaining underlying process uncertainties. More recently, a probabilistic form of the PLS model has also been introduced for process data regression modeling, with an extension to the mixture form for addressing more complicated data regression problems. Additionally, based on those basic probabilistic latent variable models, various counterparts have been developed in the past years to improve the performance of process data analytics. A detailed review of those counterparts is provided in section 2 of the present paper.

The rest of the paper is organized as follows. In section 2, a detailed review of existing probabilistic latent variable models for process data analytics is provided, followed by some perspectives for future research in the next section. Finally, conclusions are made.

2. Probabilistic latent variable models for process data analytics

In this section, the main ideas and research status of different kinds of probabilistic latent variable models are illustrated. The focus is on four main probabilistic latent variable models, namely probabilistic PCA, factor analysis, probabilistic PLS, and probabilistic ICA, which have become quite popular in process data analytics in recent years. Other related probabilistic latent variable models are also briefly introduced in this section. Additionally, the different methods are compared to illustrate their application status in process data analytics.


2.1. Probabilistic PCA

As a probabilistic counterpart of the basic PCA model, probabilistic PCA was first proposed by Tipping and Bishop [28]. It is based on a generative model structure and is formulated as

$$\mathbf{x} = \boldsymbol{\mu} + \mathbf{P}\mathbf{t} + \mathbf{e} \tag{1}$$

where $\mathbf{x} \in \mathbb{R}^{m}$ represents the process variables, $m$ is the number of process variables, $\boldsymbol{\mu} \in \mathbb{R}^{m}$ contains the mean values of the process variables, $\mathbf{t} \in \mathbb{R}^{k}$ is the latent variable vector, $k$ is the number of latent variables, $\mathbf{P} \in \mathbb{R}^{m \times k}$ is the weighting matrix, and $\mathbf{e} \in \mathbb{R}^{m}$ is a zero-mean white noise term with variance $\beta^{-1}\mathbf{I}$, thus $p(\mathbf{e}) = N(\mathbf{e} \mid \mathbf{0}, \beta^{-1}\mathbf{I})$. In the probabilistic PCA model, the prior distribution of the latent variable $\mathbf{t}$ is assumed to be a standard Gaussian distribution, thus $p(\mathbf{t}) = N(\mathbf{t} \mid \mathbf{0}, \mathbf{I})$. Based on the definition of the model structure and the assumption on the latent variables, the conditional probability of the process variable $\mathbf{x}$ is $p(\mathbf{x} \mid \mathbf{t}) = N(\mathbf{x} \mid \mathbf{P}\mathbf{t} + \boldsymbol{\mu}, \beta^{-1}\mathbf{I})$. Then, the marginal likelihood of $\mathbf{x}$ can be calculated by integrating out the latent variables, given as

$$p(\mathbf{x} \mid \mathbf{P}, \beta) = \int p(\mathbf{x} \mid \mathbf{t}, \mathbf{P}, \beta)\, p(\mathbf{t})\, d\mathbf{t} \tag{2}$$

An illustration of the probabilistic PCA model is provided in Figure 1.

Figure 1: Illustration of the probabilistic PCA model.
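As a minimal sketch of the generative structure in eq. (1), the following Python snippet draws samples from the model and compares their sample covariance with the marginal covariance $\mathbf{P}\mathbf{P}^T + \beta^{-1}\mathbf{I}$, which is the standard closed form of the Gaussian integral in eq. (2). The function name `sample_ppca`, the parameter values, and the use of NumPy are illustrative assumptions and are not taken from the reviewed works.

```python
import numpy as np

def sample_ppca(mu, P, beta, n_samples, rng):
    """Draw x = mu + P t + e with t ~ N(0, I) and e ~ N(0, beta^{-1} I), as in eq. (1)."""
    m, k = P.shape
    T = rng.standard_normal((n_samples, k))                  # latent variables, one row per sample
    E = rng.standard_normal((n_samples, m)) / np.sqrt(beta)  # white noise with variance 1/beta
    return mu + T @ P.T + E

rng = np.random.default_rng(0)
m, k, beta = 5, 2, 4.0                   # hypothetical sizes and noise precision
P = rng.standard_normal((m, k))          # hypothetical weighting matrix
mu = np.arange(m, dtype=float)

X = sample_ppca(mu, P, beta, 200_000, rng)
cov_model = P @ P.T + np.eye(m) / beta   # marginal covariance implied by eqs. (1)-(2)
cov_sample = np.cov(X, rowvar=False)
max_gap = np.abs(cov_sample - cov_model).max()
```

With this many samples, `max_gap` should be small relative to the entries of `cov_model`, which illustrates that the latent variables only enter the marginal distribution through the low-rank term $\mathbf{P}\mathbf{P}^T$.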

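Given the dataset $\mathbf{X} = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N)$ of $N$ data samples, $\mathbf{P}$ and $\beta$ can be determined by maximizing the following log-likelihood function

$$L(\mathbf{P}, \beta) = \ln \prod_{n=1}^{N} p(\mathbf{x}_n \mid \mathbf{P}, \beta) \tag{3}$$

To do this, the EM algorithm can be applied for computational efficiency. In the E-step, the statistics of the latent variables are estimated as

$$E(\mathbf{t}_n) = (\beta^{-1}\mathbf{I} + \mathbf{P}^T\mathbf{P})^{-1}\mathbf{P}^T\mathbf{x}_n \tag{4}$$

$$E(\mathbf{t}_n\mathbf{t}_n^T) = \beta^{-1}(\beta^{-1}\mathbf{I} + \mathbf{P}^T\mathbf{P})^{-1} + E(\mathbf{t}_n)E^T(\mathbf{t}_n) \tag{5}$$

In the M-step, the parameters are updated as

$$\mathbf{P} = \left[\sum_{n=1}^{N} \mathbf{x}_n E^T(\mathbf{t}_n)\right]\left[\sum_{n=1}^{N} E(\mathbf{t}_n\mathbf{t}_n^T)\right]^{-1} \tag{6}$$

$$\beta^{-1} = \frac{1}{ND}\sum_{n=1}^{N} \left(\mathbf{x}_n - \mathbf{P}E(\mathbf{t}_n)\right)^T\left(\mathbf{x}_n - \mathbf{P}E(\mathbf{t}_n)\right) \tag{7}$$

where $D = m$ is the dimension of the data and the samples $\mathbf{x}_n$ are taken to be mean-centered.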
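The E-step and M-step above can be sketched in a few lines of NumPy. This is a hedged illustration rather than the implementation used in the cited works: the function name `ppca_em`, the random initialization, the fixed iteration count, and the synthetic data are assumptions. One further assumption is that the noise variance update uses the full expected squared residual, expanded in the same way as eq. (25), instead of plugging only the posterior mean into eq. (7).

```python
import numpy as np

def ppca_em(X, k, n_iter=200, seed=0):
    """EM for probabilistic PCA on X (N x m); returns the mean, P (m x k), and beta^{-1}."""
    rng = np.random.default_rng(seed)
    N, m = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu                                  # mean-centered samples x_n
    P = rng.standard_normal((m, k))              # random initial weighting matrix (assumption)
    var = 1.0                                    # initial noise variance beta^{-1} (assumption)
    for _ in range(n_iter):
        # E-step, eqs. (4)-(5)
        Minv = np.linalg.inv(var * np.eye(k) + P.T @ P)
        T = Xc @ P @ Minv                        # row n holds E(t_n)^T
        S_tt = N * var * Minv + T.T @ T          # sum over n of E(t_n t_n^T)
        # M-step for P, eq. (6)
        P = (Xc.T @ T) @ np.linalg.inv(S_tt)
        # M-step for beta^{-1}: expected squared residual, expanded as in eq. (25)
        resid = np.sum(Xc**2) - 2.0 * np.sum((Xc @ P) * T) + np.trace(S_tt @ P.T @ P)
        var = resid / (N * m)
    return mu, P, var

# Synthetic data: 2 latent variables, 5 process variables, noise variance 0.1
rng = np.random.default_rng(1)
P_true = np.array([[2.0, 0.0], [0.0, 1.5], [1.0, 1.0], [0.5, -1.0], [-1.0, 0.5]])
X = 3.0 + rng.standard_normal((2000, 2)) @ P_true.T + np.sqrt(0.1) * rng.standard_normal((2000, 5))
mu_hat, P_hat, var_hat = ppca_em(X, k=2)
```

At convergence the estimated noise variance should approach the average of the discarded eigenvalues of the sample covariance, which is the known maximum-likelihood solution of probabilistic PCA and gives a convenient sanity check on the iteration.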

The optimal values of the two parameters are obtained when the EM algorithm converges. For process monitoring, two statistics, $T^2$ and SPE, can be constructed to monitor the variation in the main operating region and in the residual space, respectively. For a new process data sample $\mathbf{x}_{new}$, the estimated latent variable can be calculated by applying the probabilistic PCA model, given as

$$\mathbf{t}_{new} = \mathbf{Q}\mathbf{x}_{new} = \mathbf{P}^T(\mathbf{P}\mathbf{P}^T + \beta^{-1}\mathbf{I})^{-1}\mathbf{x}_{new} \tag{8}$$

The estimated variance of the latent variable is given as

$$\mathrm{var}(\mathbf{t}_{new}) = \mathbf{Q}(\mathbf{P}\mathbf{P}^T + \beta^{-1}\mathbf{I})\mathbf{Q}^T \tag{9}$$

Then, the $T^2$ statistic can be constructed as

$$T^2_{new} = \mathbf{t}_{new}^T\left(\mathrm{var}(\mathbf{t}_{new})\right)^{-1}\mathbf{t}_{new} \tag{10}$$

On the other hand, the SPE statistic can be constructed as

$$\mathbf{e}_{new} = \mathbf{x}_{new} - \mathbf{P}\mathbf{t}_{new} = (\mathbf{I} - \mathbf{P}\mathbf{Q})\mathbf{x}_{new}, \qquad SPE_{new} = \mathbf{e}_{new}^T(\beta^{-1}\mathbf{I})^{-1}\mathbf{e}_{new} \tag{11}$$
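The two monitoring statistics of eqs. (8)-(11) can be computed as in the sketch below. The function name `ppca_monitoring_stats` and the numerical values are hypothetical, and the sample is assumed to be centered with the model mean before eq. (8) is applied, which the equations leave implicit.

```python
import numpy as np

def ppca_monitoring_stats(x_new, mu, P, var):
    """T^2 and SPE of eqs. (8)-(11) for one sample; var is the noise variance beta^{-1}."""
    m, k = P.shape
    C = P @ P.T + var * np.eye(m)                       # P P^T + beta^{-1} I
    Q = P.T @ np.linalg.inv(C)                          # projection matrix of eq. (8)
    xc = x_new - mu                                     # assumption: center with the model mean
    t_new = Q @ xc
    var_t = Q @ C @ Q.T                                 # eq. (9)
    T2 = float(t_new @ np.linalg.solve(var_t, t_new))   # eq. (10)
    e_new = xc - P @ t_new
    SPE = float(e_new @ e_new / var)                    # eq. (11)
    return T2, SPE

# Hypothetical model parameters (m = 5, k = 2)
P = np.array([[2.0, 0.0], [0.0, 1.5], [1.0, 1.0], [0.5, -1.0], [-1.0, 0.5]])
mu = np.zeros(5)
var = 0.1
# A sample lying in the model subspace, and one far away from it
T2_in, SPE_in = ppca_monitoring_stats(P @ np.array([0.5, -0.3]), mu, P, var)
T2_out, SPE_out = ppca_monitoring_stats(np.array([0.0, 0.0, 3.0, 3.0, 3.0]), mu, P, var)
```

A sample that is consistent with the latent structure yields a small SPE, while a sample that violates the correlation structure yields a large one. The matrix $\mathbf{Q}$ in eq. (8) equals $(\mathbf{P}^T\mathbf{P} + \beta^{-1}\mathbf{I})^{-1}\mathbf{P}^T$, so it agrees with the E-step estimate in eq. (4).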

The control limits of both statistics can be determined from the $\chi^2$ distribution, e.g. $T^2_{lim} = \chi^2_{\gamma}(k)$ and $SPE_{lim} = \chi^2_{\gamma}(m)$, where $\gamma$ is the significance level. If either of the two monitoring statistics exceeds its corresponding control limit, a fault alarm should be reported in the process [4]. Since the probabilistic PCA model was introduced for process data analytics, more and more applications have been made in the process industry. For example, Chen and Liu [29] developed a mixture principal component analysis network for extraction of fuzzy rules from process data; Kim and Lee [30] introduced the probabilistic PCA model for process monitoring; Choi et al. [31] developed a maximum-likelihood based mixture principal component analysis model and used it for fault detection; Thissen et al. [32] proposed a similar mixture latent variable model for multivariate statistical process control; Chen and Sun [33] reviewed the applications of both probabilistic principal component analysis and mixture probabilistic principal component analysis models, and discussed some implementation issues that provide an alternative perspective on their application to process monitoring. Later, a similar branch and bound method was developed for isolation of faulty variables through missing variable analysis by Kariwala et al. [34]. Yang et al. [35] developed an aligned mixture probabilistic principal component analysis for fault detection of multimode chemical processes. Ge and Song [36] extended the basic probabilistic PCA model to the nonlinear form through the introduction of the kernel trick, and used it for probabilistic monitoring of industrial processes. He et al. [37] proposed a branch and bound approach to construct a reconstruction-based multivariate contribution analysis for fault isolation. Ge et al. [38] developed a mixture probabilistic PCR model for soft sensing of multimode processes, which is actually a supervised form of the mixture probabilistic PCA model. Zhou et al.
[39] proposed a similar probabilistic latent variable regression model for process-quality monitoring. Recently, the probabilistic PCA model has been extended to a robust form for process monitoring, which can effectively handle both outliers and missing data in the modeling stage [40]. Later, the robust probabilistic PCA model was modified to include the information of quality data, and was used for soft sensing of key process variables [41]. Yang et al. [42] developed an aligned mixture probabilistic principal component analysis model for fault detection of multimode chemical processes. To overcome the model selection problem, the Bayesian regularization method has been employed to determine the number of latent variables in both mixture probabilistic PCA and supervised probabilistic PCA models, in which the importance of each latent variable can also be effectively evaluated through the introduction of hyperparameters [43][44]. Ge and Song [45] introduced a variational inference component analysis method for robust monitoring and fault reconstruction. Liu et al. [46] used a similar variational inference PCA model as an initial data pre-processing step for soft sensor application in the wastewater treatment process. In order to incorporate more unlabeled training data samples for performance improvement, Ge and Song [47] developed a semi-supervised Bayesian form of the supervised probabilistic PCA model, and used it for soft sensor modeling with both labeled and unlabeled data samples. Recently, this basic semi-supervised probabilistic latent variable model has been extended to both mixture and nonlinear forms under the maximum likelihood modeling framework [48][49]. Zhou et al. [50] also developed a semi-supervised probabilistic latent variable model for process monitoring with unequal sample sizes of process variables and quality variables. More recently, Zhu et al. [51] developed a robust form of the semi-supervised mixture probabilistic principal component regression model and applied it for soft sensing of key variables in the process.
Furthermore, the basic probabilistic PCA model has also been successfully extended to the dynamic form [52][53] for fault classification and to the locally weighted form [54] for nonlinear feature extraction and soft sensor modeling, which has improved the data analytic performance in the corresponding cases.

Figure 2: Illustrations of different counterparts of the probabilistic PCA model. [Diagram: probabilistic PCA linked to its supervised, semi-supervised, nonlinear, mixture, dynamic, variational, and robust counterparts; link labels: kernel learning, learning with labeled and unlabeled samples, mixture modeling, Bayesian regularization / model selection, time-series data, variational EM.]

Figure 2 provides an overview of the relationships among different counterparts of the probabilistic PCA model. It can be seen that different probabilistic PCA models can be transformed into one another in order to solve the corresponding modeling issues. For example, the mixture probabilistic PCA model can be extended to the nonlinear form by introducing the kernel trick into the mixture probabilistic modeling framework; the supervised probabilistic PCA model can be used to describe semi-supervised datasets by incorporating unlabeled data samples in model construction; and, through the introduction of the variational inference strategy, the robust probabilistic PCA model can be trained under the Bayesian learning framework with additional hyperparameters. Figure 3 and Figure 4 illustrate the application status of different probabilistic PCA models in general areas and in process data analytics. It can be seen that mixture probabilistic PCA, robust probabilistic PCA, and dynamic probabilistic PCA account for most applications in general areas, while mixture probabilistic PCA and dynamic probabilistic PCA are the two most widely used methods in process data analytics. Detailed illustrations of several representative models and their applications for process data analytics are introduced in the following subsections.

Figure 3: General application status of different probabilistic PCA models.

Figure 4: Application status of different probabilistic PCA models in process data analytics.

2.1.1. Mixture probabilistic PCA model

The model structure of the mixture form of the probabilistic PCA model is given as [31]:

$$\begin{aligned} \mathbf{x}_{i,k} &= \boldsymbol{\mu}_k + \mathbf{P}_k\mathbf{t}_{i,k} + \mathbf{e}_{i,k}, \quad k = 1, 2, \ldots, K \\ \mathbf{x}_i &= \sum_{k=1}^{K} p(k)\,\mathbf{x}_{i,k}, \quad i = 1, 2, \ldots, n \end{aligned} \tag{12}$$

where $K$ is the number of local models and $p(k)$ is the mixing proportion of each individual model, with the constraint $\sum_{k=1}^{K} p(k) = 1$. $\mathbf{P}_k$ is the weighting matrix of the $k$-th individual model, $\mathbf{t}_k \in \mathbb{R}^{q \times 1}$ is the latent variable vector, $q$ is the number of retained latent variables in each local model, and $\mathbf{e}_k \in \mathbb{R}^{m \times 1}$ is the noise vector. An illustration of the mixture probabilistic PCA model is given in Figure 5.

Figure 5: Illustration of the mixture probabilistic PCA model.

Similar to the single PPCA model, it is assumed that the probability density functions of both the latent variable and the measurement noise in each individual model are Gaussian; thus $p(\mathbf{t}_k) = N(\mathbf{0}, \mathbf{I})$ and $p(\mathbf{e}_k) = N(\mathbf{0}, \sigma_{x,k}^2\mathbf{I})$. Therefore, the marginal distribution of the process variables in each individual model is given as:

$$p(\mathbf{x} \mid \mathbf{P}_k, \sigma_{x,k}^2) = \int p(\mathbf{x} \mid \mathbf{t}_k, \mathbf{P}_k, \sigma_{x,k}^2)\, p(\mathbf{t}_k)\, d\mathbf{t}_k \tag{13}$$

To obtain the optimal parameter set of the model, the following log-likelihood function needs to be maximized:

$$L(\mathbf{X} \mid \Theta) = \ln \prod_{i=1}^{n} p(\mathbf{x}_i \mid \Theta) = \sum_{i=1}^{n} \ln p(\mathbf{x}_i \mid \Theta) = \sum_{i=1}^{n} \ln \sum_{k=1}^{K} p(\mathbf{x}_i \mid k, \Theta)\, p(k) \tag{14}$$

where $\Theta = \{\Theta_k\} = \{\mathbf{P}_k, \sigma_{x,k}^2, \boldsymbol{\mu}_{x,k}\}$. It is quite difficult to obtain a closed-form solution through traditional optimization methods for such a nonlinear optimization problem. Fortunately, the efficient Expectation-Maximization (EM) algorithm can be applied [55]. Instead of maximizing eq. (14) directly, the EM algorithm maximizes the expected complete-data log-likelihood function, in which both $\mathbf{t}$ and $k$ are treated as hidden variables. The expected complete-data log-likelihood with respect to the joint distribution of $\mathbf{t}$ and $k$ can be derived as

$$\begin{aligned} E[L(\mathbf{X} \mid \Theta)] &= \sum_{i=1}^{n}\sum_{k=1}^{K} \int p(\mathbf{t}_{i,k}, k \mid \mathbf{x}_i, \Theta_{old}) \ln[p(\mathbf{x}_i, \mathbf{t}_{i,k}, k \mid \Theta)]\, d\mathbf{t}_{i,k} \\ &= \sum_{i=1}^{n}\sum_{k=1}^{K} p(k \mid \mathbf{x}_i, \Theta_{old}) \int p(\mathbf{t}_{i,k} \mid \mathbf{x}_i, k, \Theta_{old}) \ln[p(\mathbf{x}_i, \mathbf{t}_{i,k}, k \mid \Theta)]\, d\mathbf{t}_{i,k} \\ &= \sum_{i=1}^{n}\sum_{k=1}^{K} p(k \mid \mathbf{x}_i, \Theta_{old}) \int p(\mathbf{t}_{i,k} \mid \mathbf{x}_i, k, \Theta_{old}) \ln[p(\mathbf{x}_i, \mathbf{t}_{i,k} \mid k, \Theta)\, p(k)]\, d\mathbf{t}_{i,k} \\ &= \sum_{i=1}^{n}\sum_{k=1}^{K} p(k \mid \mathbf{x}_i, \Theta_{old}) \left\{ \ln p(k) + \int p(\mathbf{t}_{i,k} \mid \mathbf{x}_i, k, \Theta_{old}) \ln[p(\mathbf{x}_i, \mathbf{t}_{i,k} \mid k, \Theta)]\, d\mathbf{t}_{i,k} \right\} \end{aligned} \tag{15}$$

Given the parameters $\Theta_{old}$ from the previous step, the aim of the E-step is to calculate the posterior probabilities of the hidden variables $\mathbf{t}$ and $k$. In the M-step, the parameter set $\Theta_{new}$ is updated by maximizing the expected complete-data log-likelihood function $E[L(\mathbf{X} \mid \Theta)]$. When the EM algorithm converges, the optimal parameter set is obtained. Detailed derivations of both steps can be found in Tipping and Bishop [56].
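The posterior $p(k \mid \mathbf{x}_i, \Theta_{old})$ that weights every term of eq. (15) can be evaluated directly once each local marginal in eq. (13) is written as a Gaussian with covariance $\mathbf{P}_k\mathbf{P}_k^T + \sigma_{x,k}^2\mathbf{I}$. The sketch below shows this part of the E-step together with the log-likelihood of eq. (14); the function names, the two local models, and the test samples are hypothetical, and the M-step updates are not shown.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Row-wise log N(x | mean, cov) for X of shape (n, m)."""
    m = mean.size
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (m * np.log(2.0 * np.pi) + logdet + maha)

def mppca_responsibilities(X, weights, means, loadings, noise_vars):
    """Posterior p(k | x_i) and the log-likelihood of eq. (14) for a mixture of PPCA models."""
    n, m = X.shape
    log_joint = np.column_stack([
        np.log(w) + log_gauss(X, mu_k, P_k @ P_k.T + s2_k * np.eye(m))
        for w, mu_k, P_k, s2_k in zip(weights, means, loadings, noise_vars)
    ])                                              # log[p(x_i | k) p(k)], shape (n, K)
    shift = log_joint.max(axis=1, keepdims=True)    # log-sum-exp for numerical stability
    log_px = shift[:, 0] + np.log(np.exp(log_joint - shift).sum(axis=1))
    resp = np.exp(log_joint - log_px[:, None])      # p(k | x_i)
    return resp, log_px.sum()

# Two hypothetical local models (K = 2, m = 3, q = 1)
weights = [0.6, 0.4]
means = [np.zeros(3), np.full(3, 5.0)]
loadings = [np.array([[1.0], [0.5], [0.0]]), np.array([[0.0], [1.0], [1.0]])]
noise_vars = [0.2, 0.3]
X = np.array([[0.1, -0.2, 0.0], [5.2, 4.9, 5.1]])
resp, loglik = mppca_responsibilities(X, weights, means, loadings, noise_vars)
```

Each row of `resp` sums to one, and a sample close to the mean of one local model is assigned almost entirely to that model, which is what allows the mixture form to separate operating modes.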

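2.1.2. Supervised PPCA model

The supervised probabilistic PCA model, which is also known as the probabilistic PCR model, can be formulated by the following generative model structure [38]

$$\mathbf{x} = \boldsymbol{\mu}_x + \mathbf{P}\mathbf{t} + \mathbf{e} \tag{16}$$

$$\mathbf{y} = \boldsymbol{\mu}_y + \mathbf{C}\mathbf{t} + \mathbf{f} \tag{17}$$

where $\mathbf{P} \in \mathbb{R}^{m \times q}$ and $\mathbf{C} \in \mathbb{R}^{r \times q}$ are the loading matrix and regression matrix, $\mathbf{t} \in \mathbb{R}^{q \times 1}$ is the principal component vector with $p(\mathbf{t}) = N(\mathbf{0}, \mathbf{I})$, where $\mathbf{I}$ is an identity matrix, and $\mathbf{e} \in \mathbb{R}^{m \times 1}$ and $\mathbf{f} \in \mathbb{R}^{r \times 1}$ are the noises of the measurement and predicted variables, respectively. In this model, both noises are assumed to be Gaussian, $p(\mathbf{e}) = N(\mathbf{0}, \sigma_x^2\mathbf{I})$ and $p(\mathbf{f}) = N(\mathbf{0}, \sigma_y^2\mathbf{I})$, in which $\sigma_x^2$ and $\sigma_y^2$ are the noise variances. An illustration of the supervised probabilistic PCA model is given in Figure 6.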

Figure 6: Illustration of the supervised probabilistic PCA model.

The marginal probability $p(\mathbf{x}, \mathbf{y})$ is given as [57]

$$p(\mathbf{x}, \mathbf{y} \mid \mathbf{P}, \mathbf{C}, \sigma_x^2, \sigma_y^2) = \int p(\mathbf{x} \mid \mathbf{t}, \mathbf{P}, \sigma_x^2)\, p(\mathbf{y} \mid \mathbf{t}, \mathbf{C}, \sigma_y^2)\, p(\mathbf{t})\, d\mathbf{t} \tag{18}$$

Similarly, the optimal parameter set $\{\mathbf{P}, \mathbf{C}, \sigma_x^2, \sigma_y^2\}$ can be computed by maximizing the following log-likelihood function

$$L(\mathbf{P}, \mathbf{C}, \sigma_x^2, \sigma_y^2) = \ln \prod_{i=1}^{n} p(\mathbf{x}_i, \mathbf{y}_i \mid \mathbf{P}, \mathbf{C}, \sigma_x^2, \sigma_y^2) \tag{19}$$

Detailed derivation of the EM algorithm for this model can be found in [57]. By considering multiple individual models, the mixture form of the supervised probabilistic PCA model can be formulated as:

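$$\begin{aligned} \mathbf{x}_{i,k} &= \boldsymbol{\mu}_{x,k} + \mathbf{P}_k\mathbf{t}_{i,k} + \mathbf{e}_{i,k}; \quad \mathbf{y}_{i,k} = \boldsymbol{\mu}_{y,k} + \mathbf{C}_k\mathbf{t}_{i,k} + \mathbf{f}_{i,k} \\ \mathbf{x}_i &= \sum_{k=1}^{K} p(k)\,\mathbf{x}_{i,k}; \quad \mathbf{y}_i = \sum_{k=1}^{K} p(k)\,\mathbf{y}_{i,k} \end{aligned} \tag{20}$$

where $k = 1, 2, \ldots, K$, $i = 1, 2, \ldots, n$, and $p(k)$ is the mixing proportion of each individual model, with the constraint $\sum_{k=1}^{K} p(k) = 1$. $\mathbf{P}_k$ and $\mathbf{C}_k$ are the weighting matrices of the $k$-th individual model, $\mathbf{t}_k \in \mathbb{R}^{q \times 1}$ is the latent variable vector, and $\mathbf{e}_k \in \mathbb{R}^{m \times 1}$ and $\mathbf{f}_k \in \mathbb{R}^{r \times 1}$ are the noise vectors of the input and output variables in each individual model. Similarly, the EM algorithm can be applied to determine the optimal parameter set by maximizing the following log-likelihood function: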

$$L(\mathbf{X}, \mathbf{Y} \mid \Theta) = \ln \prod_{i=1}^{n} p(\mathbf{x}_i, \mathbf{y}_i \mid \Theta) = \sum_{i=1}^{n} \ln p(\mathbf{x}_i, \mathbf{y}_i \mid \Theta) = \sum_{i=1}^{n} \ln \sum_{k=1}^{K} p(\mathbf{x}_i, \mathbf{y}_i \mid k, \Theta)\, p(k) \tag{21}$$

where $\Theta = \{\Theta_k\} = \{\mathbf{P}_k, \mathbf{C}_k, \sigma_{x,k}^2, \sigma_{y,k}^2, \boldsymbol{\mu}_{x,k}, \boldsymbol{\mu}_{y,k}\}$. Detailed derivation of the EM algorithm can be found in [38].
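For the single-model case of eqs. (16)-(17), a soft sensor prediction follows from Gaussian conditioning: the posterior mean of the latent variable given a new input is $E(\mathbf{t} \mid \mathbf{x}) = (\mathbf{P}^T\mathbf{P} + \sigma_x^2\mathbf{I})^{-1}\mathbf{P}^T(\mathbf{x} - \boldsymbol{\mu}_x)$, and the predicted output is $\boldsymbol{\mu}_y + \mathbf{C}E(\mathbf{t} \mid \mathbf{x})$. The sketch below assumes the parameters have already been estimated, for example by the EM algorithm referenced above; the function name `ppcr_predict` and all numerical values are hypothetical.

```python
import numpy as np

def ppcr_predict(x_new, mu_x, mu_y, P, C, var_x):
    """Predict y from x under eqs. (16)-(17): y_hat = mu_y + C E(t | x)."""
    q = P.shape[1]
    M = P.T @ P + var_x * np.eye(q)
    t_hat = np.linalg.solve(M, P.T @ (x_new - mu_x))   # posterior mean E(t | x)
    return mu_y + C @ t_hat

# Hypothetical parameters: m = 4 inputs, r = 1 output, q = 2 latent variables
P = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
C = np.array([[2.0, -1.0]])
mu_x, mu_y = np.zeros(4), np.array([10.0])
t_true = np.array([0.5, -0.25])
x_new = mu_x + P @ t_true                 # noise-free input generated by t_true
y_hat = ppcr_predict(x_new, mu_x, mu_y, P, C, var_x=1e-8)
```

With a very small input noise variance, the latent variable is recovered almost exactly and the prediction approaches $\boldsymbol{\mu}_y + \mathbf{C}\mathbf{t}$; larger values of $\sigma_x^2$ shrink the latent estimate toward zero and the prediction toward the output mean.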

2.1.3. Kernel probabilistic PCA model

Similarly to the probabilistic PCA model, the EM algorithm can also be used for parameter learning of the kernel PPCA method, which includes two iterative steps: Expectation (E-step) and Maximization (M-step). In the E-step, the model parameters are fixed to calculate the expected statistics of the latent variable, given as [36]

$$E(\mathbf{t}) = (\mathbf{P}^T\mathbf{P} + \beta^{-1}\mathbf{I})^{-1}\mathbf{P}^T\mathbf{x} \tag{22}$$

$$E(\mathbf{t}\mathbf{t}^T) = \beta^{-1}(\mathbf{P}^T\mathbf{P} + \beta^{-1}\mathbf{I})^{-1} + E(\mathbf{t})E^T(\mathbf{t}) \tag{23}$$

In the M-step, the log-likelihood function is maximized with respect to the model parameters. As a result, the following updates can be obtained:

$$\hat{\mathbf{P}} = \left(\sum_{i=1}^{n} \mathbf{x}_i\mathbf{t}_i^T\right)\left(\sum_{i=1}^{n} \mathbf{t}_i\mathbf{t}_i^T\right)^{-1} \tag{24}$$

$$\beta^{-1} = \sum_{i=1}^{n} \left\{\mathbf{x}_i^T\mathbf{x}_i - 2E^T(\mathbf{t}_i \mid \mathbf{x}_i)\hat{\mathbf{P}}^T\mathbf{x}_i + \mathrm{Tr}[E(\mathbf{t}_i\mathbf{t}_i^T \mid \mathbf{x}_i)\hat{\mathbf{P}}^T\hat{\mathbf{P}}]\right\} / (mn) \tag{25}$$

For kernel PPCA model learning, a nonlinear EM algorithm should be developed, which can be done by introducing a kernel trick in the E-step and the M-step, respectively [36]. In the first step, the sum of covariances can be defined as $\mathbf{C} = \sum_{i=1}^{n} \mathbf{t}_i\mathbf{t}_i^T$. Given $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n]^T$ and $\mathbf{T} = [\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_n]^T$, the parameter matrix $\hat{\mathbf{P}}$ can be rewritten as [36]

$$\mathbf{P} = \mathbf{X}^T\mathbf{T}\mathbf{C}^{-1} \tag{26}$$

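In order to determine the distribution of the latent variable in the E-step, $\mathbf{P}^T\mathbf{P}$ and $\mathbf{P}^T\mathbf{x}$ are calculated as follows

$$\mathbf{P}^T\mathbf{P} = \mathbf{C}^{-1}\mathbf{T}^T\mathbf{X}\mathbf{X}^T\mathbf{T}\mathbf{C}^{-1} \tag{27}$$

$$\mathbf{P}^T\mathbf{x} = \mathbf{C}^{-1}\mathbf{T}^T\mathbf{X}\mathbf{x} \tag{28}$$

By introducing the kernel trick $\mathbf{K} = \mathbf{X}\mathbf{X}^T$, eqs. (27) and (28) can be reformulated as follows

$$\mathbf{P}^T\mathbf{P} = \mathbf{C}^{-1}\mathbf{T}^T\mathbf{K}\mathbf{T}\mathbf{C}^{-1} \tag{29}$$

$$\mathbf{P}^T\mathbf{x} = \mathbf{C}^{-1}\mathbf{T}^T\mathbf{k} \tag{30}$$

where $\mathbf{k} = \mathbf{X}\mathbf{x}$ is a kernel vector. Therefore, the first- and second-order statistics of the latent variable can be obtained as

$$E(\mathbf{t}) = (\mathbf{C}^{-1}\mathbf{T}^T\mathbf{K}\mathbf{T}\mathbf{C}^{-1} + \beta^{-1}\mathbf{I})^{-1}\mathbf{C}^{-1}\mathbf{T}^T\mathbf{k} \tag{31}$$

$$E(\mathbf{t}\mathbf{t}^T) = \beta^{-1}(\mathbf{C}^{-1}\mathbf{T}^T\mathbf{K}\mathbf{T}\mathbf{C}^{-1} + \beta^{-1}\mathbf{I})^{-1} + E(\mathbf{t})E^T(\mathbf{t}) \tag{32}$$

Then the matrices $\mathbf{T}$ and $\mathbf{C}$ can be re-calculated as

$$\hat{\mathbf{T}} = \mathbf{K}\mathbf{T}\mathbf{C}^{-1}(\mathbf{C}^{-1}\mathbf{T}^T\mathbf{K}\mathbf{T}\mathbf{C}^{-1} + \beta^{-1}\mathbf{I})^{-1} \tag{33}$$

$$\hat{\mathbf{C}} = n\beta^{-1}(\mathbf{C}^{-1}\mathbf{T}^T\mathbf{K}\mathbf{T}\mathbf{C}^{-1} + \beta^{-1}\mathbf{I})^{-1} + \hat{\mathbf{T}}^T\hat{\mathbf{T}} \tag{34}$$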

It can be found that, with the introduction of the kernel trick, the explicit nonlinear relationship between the original variable and the latent variable has been eliminated. Therefore, the EM algorithm of the kernel PPCA method can be iterated through eqs. (33) and (34) until the following log-likelihood function converges [36]

$$L(\mathbf{X}) = -\frac{n}{2}\left\{ \ln\left|\mathbf{C}^{-1}\mathbf{T}^T\mathbf{K}\mathbf{T}\mathbf{C}^{-1} + \beta^{-1}\mathbf{I}\right| - \frac{\beta}{n}\,\mathrm{tr}\left[\mathbf{K}\mathbf{T}\mathbf{C}^{-1}(\mathbf{C}^{-1}\mathbf{T}^T\mathbf{K}\mathbf{T}\mathbf{C}^{-1} + \beta^{-1}\mathbf{I})^{-1}\mathbf{C}^{-1}\mathbf{T}^T\mathbf{K}\right] \right\} + \mathrm{const} \tag{35}$$

where $\mathrm{tr}(\cdot)$ is the trace operator and const represents the constant term in the log-likelihood function.
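One pass of the kernel updates in eqs. (33) and (34) can be sketched as follows. This is an illustration under stated assumptions: the function name `kernel_ppca_step` is hypothetical, the noise variance $\beta^{-1}$ is held fixed because no kernelized update for it is given in this section, and a linear kernel $\mathbf{K} = \mathbf{X}\mathbf{X}^T$ is used so that the result can be compared with the ordinary E-step of eq. (22).

```python
import numpy as np

def kernel_ppca_step(K, T, C, var):
    """One pass of eqs. (33)-(34); K is the n x n kernel matrix, var is beta^{-1} (held fixed)."""
    n, q = T.shape
    Cinv = np.linalg.inv(C)
    A = K @ T @ Cinv                               # K T C^{-1}
    M = Cinv @ T.T @ A + var * np.eye(q)           # C^{-1} T^T K T C^{-1} + beta^{-1} I
    Minv = np.linalg.inv(M)
    T_new = A @ Minv                               # eq. (33)
    C_new = n * var * Minv + T_new.T @ T_new       # eq. (34)
    return T_new, C_new

rng = np.random.default_rng(0)
n, m, q, var = 30, 4, 2, 0.5                       # hypothetical sizes and noise variance
X = rng.standard_normal((n, m))
T0 = rng.standard_normal((n, q))
C0 = T0.T @ T0 + n * var * np.eye(q)               # symmetric positive definite starting value
T_kernel, C_kernel = kernel_ppca_step(X @ X.T, T0, C0, var)   # linear kernel K = X X^T

# Same update in the original space: P from eq. (26), then E(t_i) from eq. (22)
P = X.T @ T0 @ np.linalg.inv(C0)
T_linear = X @ P @ np.linalg.inv(P.T @ P + var * np.eye(q))
```

For the linear kernel, `T_kernel` and `T_linear` coincide, which confirms that the kernel form only re-expresses the E-step through inner products. A nonlinear kernel matrix could be passed in place of `X @ X.T`; any centering of the kernel matrix in feature space is left out of this sketch.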

2.1.4. Semi-supervised probabilistic PCA model

Following a similar generative model structure, the semi-supervised probabilistic PCA model can be formulated as follows [48]

$$\mathbf{x}_i = \boldsymbol{\mu}_x + \mathbf{P}\mathbf{t}_i + \mathbf{e}_i, \qquad \mathbf{y}_j = \boldsymbol{\mu}_y + \mathbf{C}\mathbf{t}_j + \mathbf{f}_j \tag{36}$$

where $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, n_1$, $n_1$ is the size of the labeled dataset, and $n_2 = n - n_1$ is the size of the unlabeled dataset. $\mathbf{P} \in \mathbb{R}^{m \times q}$ and $\mathbf{C} \in \mathbb{R}^{r \times q}$ are weighting matrices, where $m$ is the number of input variables and $r$ is the number of output variables. $\mathbf{t} \in \mathbb{R}^{q \times 1}$ is the latent variable vector, and $\mathbf{e} \in \mathbb{R}^{m \times 1}$ and $\mathbf{f} \in \mathbb{R}^{r \times 1}$ are the noises of the input and output variables, respectively. An illustration of the semi-supervised probabilistic PCA model is provided in Figure 7.

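Figure 7: Illustration of the semi-supervised probabilistic PCA model.

Given the labeled dataset $\mathbf{X}_1 = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{n_1}]^T$, $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{n_1}]^T$ and the unlabeled dataset $\mathbf{X}_2 = [\mathbf{x}_{n_1+1}, \mathbf{x}_{n_1+2}, \ldots, \mathbf{x}_{n_1+n_2}]^T$, the marginal distributions of the input and output variables can be calculated as:

$$p(\mathbf{x}_j, \mathbf{y}_j \,|\, \mathbf{P}, \mathbf{C}, \sigma_x^2, \sigma_y^2) = \int p(\mathbf{x}_j \,|\, \mathbf{t}_j, \mathbf{P}, \sigma_x^2)\, p(\mathbf{y}_j \,|\, \mathbf{t}_j, \mathbf{C}, \sigma_y^2)\, p(\mathbf{t}_j)\, d\mathbf{t}_j \tag{37}$$

$$p(\mathbf{x}_{n_1+i} \,|\, \mathbf{P}, \sigma_x^2) = \int p(\mathbf{x}_{n_1+i} \,|\, \mathbf{t}_{n_1+i}, \mathbf{P}, \sigma_x^2)\, p(\mathbf{t}_{n_1+i})\, d\mathbf{t}_{n_1+i} \tag{38}$$

where $j = 1, 2, \ldots, n_1$ and $i = 1, 2, \ldots, n_2$. Then, the log-likelihood function can be derived as follows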


$$\begin{aligned} L(\mathbf{X}, \mathbf{Y}) &= L(\mathbf{X}_1, \mathbf{Y}) + L(\mathbf{X}_2) \\ &= \ln \prod_{j=1}^{n_1} p(\mathbf{x}_j, \mathbf{y}_j \,|\, \mathbf{P}, \mathbf{C}, \sigma_x^2, \sigma_y^2) + \ln \prod_{i=1}^{n_2} p(\mathbf{x}_{n_1+i} \,|\, \mathbf{P}, \sigma_x^2) \end{aligned} \tag{39}$$

By maximizing the log-likelihood function, the parameter set of the semi-supervised probabilistic PCA model, $\Theta = \{\mathbf{P}, \mathbf{C}, \sigma_x^2, \sigma_y^2\}$, can be optimized. Similarly, the single semi-supervised probabilistic PCA model can be easily extended to the mixture form, detailed information of which can be found in [48].
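To make eq. (39) concrete, the sketch below evaluates the two likelihood terms in closed form. It assumes the usual PPCA prior $\mathbf{t} \sim N(\mathbf{0}, \mathbf{I})$ and isotropic Gaussian noises, so that the integrals in eqs. (37) and (38) reduce to Gaussian densities with covariances $[\mathbf{P};\mathbf{C}][\mathbf{P};\mathbf{C}]^T + \mathrm{blockdiag}(\sigma_x^2\mathbf{I}, \sigma_y^2\mathbf{I})$ and $\mathbf{P}\mathbf{P}^T + \sigma_x^2\mathbf{I}$. The function names and the toy data are hypothetical.

```python
import numpy as np


def gauss_logpdf(Z, mean, cov):
    """Row-wise log N(z | mean, cov) for the rows of Z."""
    d = Z - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", d @ np.linalg.inv(cov), d)
    return -0.5 * (Z.shape[1] * np.log(2 * np.pi) + logdet + maha)


def sspca_loglik(X1, Y, X2, mu_x, mu_y, P, C, var_x, var_y):
    """Log-likelihood of eq. (39): labeled term plus unlabeled term."""
    m, r = P.shape[0], C.shape[0]
    cov_x = P @ P.T + var_x * np.eye(m)                      # marginal of eq. (38)
    W = np.vstack([P, C])
    noise = np.diag(np.r_[np.full(m, var_x), np.full(r, var_y)])
    cov_xy = W @ W.T + noise                                 # marginal of eq. (37)
    labeled = gauss_logpdf(np.hstack([X1, Y]), np.r_[mu_x, mu_y], cov_xy).sum()
    unlabeled = gauss_logpdf(X2, mu_x, cov_x).sum()
    return labeled + unlabeled


# Toy example (hypothetical sizes): 5 labeled and 8 unlabeled samples.
rng = np.random.default_rng(0)
m, r, q = 4, 2, 2
P, C = rng.standard_normal((m, q)), rng.standard_normal((r, q))
mu_x, mu_y = np.zeros(m), np.zeros(r)
X1, Y = rng.standard_normal((5, m)), rng.standard_normal((5, r))
X2 = rng.standard_normal((8, m))
ll = sspca_loglik(X1, Y, X2, mu_x, mu_y, P, C, 0.5, 0.2)
```

The unlabeled samples enter only through the input-side parameters, which is how they help to refine $\mathbf{P}$ and $\sigma_x^2$ when labels are scarce.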

2.1.5. Robust Probabilistic PCA model

Given dataset $\mathbf{Y} = \{\mathbf{y}_n \,|\, \mathbf{y}_n \in R^D\}_{n=1}^N$, the robust PPCA method aims to learn the following latent variable model [40]

$$\mathbf{y} = \mathbf{W}\mathbf{x} + \boldsymbol{\mu} + \mathbf{e} \tag{40}$$

where $\mathbf{W} \in R^{D \times d}$ is the weighting matrix, $\mathbf{x} \in R^d$ denotes the latent variables, $\boldsymbol{\mu} \in R^D$ contains the mean values, and $\mathbf{e}$ refers to the measurement noise. Different from the traditional PPCA method, robust PPCA defines a Student's t-distribution for the latent variable. The prior of the latent space is given as $\mathbf{x} \sim t(\mathbf{0}, \mathbf{I}_d, \nu)$ and the prior for the noise is given as $\mathbf{e} \sim t(\mathbf{0}, \boldsymbol{\Lambda}, \nu)$, where $\nu$ is the degree of freedom, $\boldsymbol{\Lambda} = \tau\mathbf{I}_D$ is the variance, and $\mathbf{I}_d$ is the d-dimensional identity matrix. The basic form of the multivariate Student's t-distribution is given as follows:

$$t(\mathbf{x} \,|\, \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) = \frac{\Gamma\left(\frac{\nu + d}{2}\right)|\boldsymbol{\Sigma}|^{-1/2}}{\Gamma\left(\frac{\nu}{2}\right)(\nu\pi)^{d/2}}\left[1 + \frac{(\mathbf{x} - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})}{\nu}\right]^{-(\nu + d)/2} \tag{41}$$

where $\Gamma(x) = \int_0^\infty z^{x-1}e^{-z}\,dz$ is the gamma function. Typically, this non-Gaussian distribution can be derived by an infinite integration of Gaussians, given as follows [58]:

$$t(\mathbf{x} \,|\, \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) = \int_0^\infty N(\mathbf{x} \,|\, \boldsymbol{\mu}, u^{-1}\boldsymbol{\Sigma})\,\mathrm{Ga}(u \,|\, \nu/2, \nu/2)\,du \tag{42}$$

where the Gamma distribution is defined as

$$\mathrm{Ga}(u \,|\, \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}u^{\alpha - 1}e^{-\beta u} \tag{43}$$
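The equivalence between eq. (41) and the scale mixture of eq. (42) can be checked numerically in one dimension. The sketch below compares the closed-form density with a trapezoidal approximation of the integral over $u$; the function names, the integration grid and the truncation point `u_max` are choices made for this example only.

```python
import math
import numpy as np


def student_t_pdf(x, mu, sigma2, nu):
    """Closed form of eq. (41) for d = 1 with scalar scale sigma2."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi * sigma2))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p((x - mu) ** 2 / (nu * sigma2)))


def scale_mixture_pdf(x, mu, sigma2, nu, u_max=60.0, n_grid=200001):
    """Trapezoidal approximation of eq. (42) for d = 1."""
    u = np.linspace(1e-8, u_max, n_grid)
    a = b = nu / 2
    # N(x | mu, sigma2 / u): Gaussian whose variance is scaled by 1/u
    gauss = np.sqrt(u / (2 * np.pi * sigma2)) * np.exp(-0.5 * u * (x - mu) ** 2 / sigma2)
    # Ga(u | a, b) of eq. (43), evaluated in log space for stability
    log_gamma_pdf = a * math.log(b) - math.lgamma(a) + (a - 1) * np.log(u) - b * u
    f = gauss * np.exp(log_gamma_pdf)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(u)))


closed_form = student_t_pdf(1.5, 0.0, 1.0, 3.0)
mixture = scale_mixture_pdf(1.5, 0.0, 1.0, 3.0)
```

Small values of $u$ correspond to Gaussians with inflated variance, which is where the heavy tails of the t-distribution, and hence the robustness to outliers, come from.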

A comparison between the Gaussian distribution and student-t distribution is shown in Figure 8, while the difference between the basic and robust probabilistic PCA models is illustrated in Figure 9.

Figure 8: Comparison between Gaussian distribution and student-t distribution.

Figure 9: Comparison between probabilistic PCA and robust probabilistic PCA models.

Similarly, the single robust probabilistic PCA model can be extended to the mixture form. Given dataset $\mathbf{Y}$, the mixture robust PPCA model aims to approximate the overall data distribution by a mixture of $J$ local RPPCA individual models, given as follows [58]:

$$p(\mathbf{Y}) = \sum_{j=1}^{J} c_j\, p(\mathbf{Y} \,|\, \theta_j) \tag{44}$$

where $c_j$ is the component weight, subject to $\sum_{j=1}^{J} c_j = 1$, and $\theta_j = \{\boldsymbol{\mu}_j, \tau_j, \mathbf{W}_j, \nu_j\}$ is the parameter set for the jth individual model. The model parameter set $\Theta = \{c_j, \theta_j\}_{j=1}^{J}$ can be derived by maximizing the following complete-data log-likelihood

$$\log L(\Theta)_{\mathrm{mrppca}} = \sum_{j=1}^{J}\sum_{n=1}^{N} z_{jn}\log p(\mathbf{x}_{jn}, \mathbf{y}_n; \theta_j) \tag{45}$$

Here, the indicator variable $z_{jn} = 1$ if the nth sample is from the jth individual model. The parameters can be iteratively optimized through the EM algorithm. Similar to the probabilistic PCA model, the robust model can also be made supervised, which can describe the relationship between two sets of process variables. Therefore, a robust regression model can be derived. With this regression model, a robust soft sensor can be constructed for online estimation of key variables in the process. Detailed descriptions of algorithms and industrial applications of the robust supervised probabilistic PCA model can be found in [41].

2.1.6. Bayesian regularization of probabilistic PCA model

An important assumption of the probabilistic PCA model is that the dimensionality of the latent variable, $k$, is known beforehand. In practice, if the number of data samples is limited, selecting the number of latent variables becomes problematic, because the probabilistic PCA method itself does not provide any mechanism to determine the effective latent variable dimensionality. If there are not enough data samples available for cross-validation, it is difficult to determine this important number. Therefore, it is desirable that the number of effective latent variables be determined automatically during model development, especially when training data samples are limited. This problem can be addressed by the Bayesian regularization method, in which a prior distribution over the weighting matrix $\mathbf{P}$ is introduced. The posterior distribution of the weighting matrix, $p(\mathbf{P} \,|\, \mathbf{X})$, can then be determined, up to normalization, by multiplying the prior and the likelihood function. In this method, a hyperparameter vector $\boldsymbol{\alpha} = \{\alpha_1, \alpha_2, \ldots, \alpha_d\}$ is introduced to control the dimensionality of the latent space, and the prior is defined as [59]

$$p(\mathbf{P} \,|\, \boldsymbol{\alpha}) = \prod_{i=1}^{d}\left(\frac{\alpha_i}{2\pi}\right)^{m/2}\exp\left\{-\frac{1}{2}\alpha_i\|\mathbf{p}_i\|^2\right\} \tag{46}$$

where $\mathbf{p}_i$ is the i-th column of the weighting matrix $\mathbf{P}$. Each $\alpha_i$ controls the inverse variance of $\mathbf{p}_i$: if $\alpha_i$ is large, the corresponding $\mathbf{p}_i$ will tend to be very small and can thus be removed from the latent space loading matrix. For Bayesian regularization of the mixture probabilistic PCA model, a hyperparameter matrix can be defined as follows [43]

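$$\boldsymbol{\alpha} = \begin{bmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1d} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2d} \\ \vdots & \vdots & \alpha_{ci} & \vdots \\ \alpha_{C1} & \alpha_{C2} & \cdots & \alpha_{Cd} \end{bmatrix} \tag{47}$$

where $c = 1, 2, \ldots, C$ indexes the local models in the mixture model and $i = 1, 2, \ldots, d = m - 1$. Furthermore, if the information of the quality/key variables can be incorporated, the Bayesian regularization method can also be applied to the supervised probabilistic PCA model, where two hyperparameter matrices can be defined, given as

$$\boldsymbol{\alpha} = \begin{bmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1d} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2d} \\ \vdots & \vdots & \alpha_{ci} & \vdots \\ \alpha_{C1} & \alpha_{C2} & \cdots & \alpha_{Cd} \end{bmatrix} \tag{48}$$

$$\boldsymbol{\beta} = \begin{bmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1d} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2d} \\ \vdots & \vdots & \beta_{ki} & \vdots \\ \beta_{K1} & \beta_{K2} & \cdots & \beta_{Kd} \end{bmatrix} \tag{49}$$

where $c = 1, 2, \ldots, C$ indexes the local models in the mixture model and $i = 1, 2, \ldots, d = m - 1$. Each element of the hyperparameter matrices $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ controls the inverse variance of the corresponding vector in the weighting matrices. More detailed information about the Bayesian regularization method for the supervised probabilistic PCA model can be found in [44].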
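The column-wise prior of eq. (46) is simple to evaluate, and doing so shows how a large $\alpha_i$ effectively switches a latent direction off. In the sketch below, `log_ard_prior` computes $\ln p(\mathbf{P} \,|\, \boldsymbol{\alpha})$, and `effective_columns` is a hypothetical helper that keeps the columns whose $\alpha_i$ lies below a user-chosen threshold; neither the threshold value nor the helper is prescribed by [59].

```python
import numpy as np


def log_ard_prior(P, alpha):
    """ln p(P | alpha) of eq. (46); P is m x d and alpha has length d."""
    m = P.shape[0]
    col_sq_norm = np.sum(P ** 2, axis=0)        # ||p_i||^2 for each column
    return float(np.sum(0.5 * m * np.log(alpha / (2 * np.pi))
                        - 0.5 * alpha * col_sq_norm))


def effective_columns(alpha, threshold=1e3):
    """Indices of columns whose alpha_i stays below a (hypothetical) pruning threshold."""
    return np.flatnonzero(np.asarray(alpha) < threshold)


# Toy loading matrix: the second column is nearly zero and has a very large alpha.
P = np.array([[1.0, 1e-4], [0.5, -2e-4], [-0.3, 1e-4]])
alpha = np.array([2.0, 1e6])
log_prior = log_ard_prior(P, alpha)
kept = effective_columns(alpha)
```

Because the prior standard deviation of each element of $\mathbf{p}_i$ is $\alpha_i^{-1/2}$, a column with $\alpha_i = 10^6$ is confined to values of order $10^{-3}$ and contributes almost nothing to the reconstruction.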


2.2. Factor analysis

Similar to the basic PPCA method, the aim of FA model training is to find the optimal parameter set $\Theta = \{\mathbf{A}, \boldsymbol{\Sigma}\}$, the model structure of which is described as follows

$$\mathbf{x} = \boldsymbol{\mu} + \mathbf{A}\mathbf{t} + \mathbf{e} \tag{50}$$

where $\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \cdots, \mathbf{a}_k] \in R^{m \times k}$ is the loading matrix and the variance matrix of the measurement noise $\mathbf{e}$ is denoted as $\boldsymbol{\Sigma} = \mathrm{diag}\{\sigma_p^2\}_{p = 1, 2, \ldots, m}$, in which different noise levels are assumed for different measurement variables. An illustration of the FA model is shown in Figure 10.

Figure 10: Illustration of the FA model.

In the past years, many applications have been built on the FA model. For example, Kim et al. [60] developed an FA-based model for calibration, prediction and process monitoring under an incomplete data case; Setarehdan [61] proposed a modified evolving window based FA model and applied it to process monitoring; Ilin et al. [62] developed a nonlinear dynamical factor analysis model for state change detection; Jiang et al. [63] proposed a resolution method for two-way data from on-line Fourier-transform Raman spectroscopy, which is based on parallel vector analysis (PVA) and window factor analysis (WFA). Wise et al. [64] compared principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for fault detection in a semiconductor etch process. Surribas et al. [65] combined the parallel factor analysis method with a PLS regression model and applied it to the on-line monitoring of Pichia pastoris cultures. Amigo et al. [66] developed an on-line parallel factor analysis model for monitoring of bioprocesses. Bahram and Mabhooti [67] proposed a rank annihilation factor analysis method for multicomponent kinetic spectrophotometric determination by using difference spectra. Ge and Song [68] extended the single FA model to the mixture model and developed a monitoring scheme for multimode industrial processes. Kompany-Zareh [69] formulated an on-line monitoring method for a continuous pharmaceutical process by using parallel factor analysis and unfolding multivariate statistical process control representation. Wen et al. [70] proposed a data-based linear Gaussian state-space model for monitoring dynamic processes, which is actually a special dynamic form of the traditional FA model. Jiang and Yan [71] modified the basic factor analysis model and used it for multivariate statistical process monitoring. Kuo et al. [72] used a dynamic factor analysis model for identifying nearshore groundwater and river hydrochemical variables influencing the water quality of the Kaoping River Estuary. Ma and Shi [73] proposed a multimode process monitoring method based on aligned mixture factor analysis. Tong et al. [74] formulated a novel alternating least-squares method based on fixed region scanning evolving factor analysis (FRSEFA) and applied it to process monitoring. Jiang and Yan [75] developed an adaptively weighted factor analysis model and used it for probabilistic monitoring of chemical processes. More recently, Zhao et al. [76] made a probabilistic analysis of monitoring statistics for the factor analysis model in the presence of both complete and incomplete measurements. Zhu et al. [77] proposed a Bayesian robust factor analyzer based Dirichlet process mixture model for modeling multimode process data. Melendez et al. [78] developed a parallel factor analysis model for monitoring data from a grape harvest in Qualified Designation of Origin Rioja, including spatial and temporal variability.
Ge recently extended the basic FA model to a supervised form, based on which different soft sensor models were constructed for online measurement of key variables in the process [79]. Yao and Ge [80] developed a locally weighted prediction method for latent factor analysis with supervised and semi-supervised process data, and later extended it to an adaptive form for application in state-shifting processes [81]. Ge and Chen [82] recently developed a novel dynamic probabilistic latent variable model for process data modeling and regression application. Figure 11 provides an overview of the relationships among different counterparts of the factor analysis model. Similar to the probabilistic PCA model, different counterparts of the FA model can also be transformed into each other by incorporating additional information or using different model structures. To examine the research status of different FA models, Figure 12 and Figure 13 illustrate the application status of those models in general areas and for process data analytics. It can be seen that the dynamic factor analysis model accounts for the most applications in both general areas and process data analytics. Besides, the mixture FA model, the robust FA model, and the parallel FA model have also played important roles in the applications for process data analytics. Detailed illustrations of several representative FA models and their applications for process data analytics are introduced in the following subsections.

Figure 11: Overview of different counterparts of the factor analysis model.


Figure 12: General application status of different FA models.

Figure 13: Application status of different FA models in process data analytics.

2.2.1. Mixture FA model

In the mixture FA model, it is assumed that $K$ local FA models are incorporated. As a result, the multivariate distribution of the process variable can be given as

$$p(\mathbf{x}) = \sum_{k=1}^{K} p(\mathbf{x} \,|\, k)\,p(k) \tag{51}$$

where $p(k)$ is the mixing proportion of each local FA model, subject to $\sum_{k=1}^{K} p(k) = 1$. Then $p(\mathbf{x} \,|\, k)$ can be calculated as follows

$$p(\mathbf{x} \,|\, k) = \int p(\mathbf{x} \,|\, \mathbf{t}, k)\,p(\mathbf{t} \,|\, k)\,d\mathbf{t} \tag{52}$$

which is a Gaussian distribution $N(\boldsymbol{\mu}_k, \mathbf{P}_k\mathbf{P}_k^T + \boldsymbol{\Sigma}_k)$. A description of the mixture FA model is given in Figure 14.

Figure 14: Illustration of the mixture FA model.

Given dataset $\mathbf{X} \in R^{n \times m}$, the aim of the mixture FA model is to estimate the parameter sets $\Theta_k = \{\boldsymbol{\mu}_k, \mathbf{P}_k, \boldsymbol{\Sigma}_k, p(k)\}$ $(k = 1, 2, \ldots, K)$ by maximizing the following log-likelihood function

$$L(\mathbf{X}, \Theta) = \ln\prod_{i=1}^{n} p(\mathbf{x}_i \,|\, \Theta) = \sum_{i=1}^{n}\ln\left[\sum_{k=1}^{K} p(\mathbf{x}_i, k \,|\, \Theta)\right] \tag{53}$$

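Again, the EM algorithm can be employed to avoid the complex nonlinear optimization problem. The objective function for the mixture MLMFA model is given as

$$\begin{aligned} E(L) &= \sum_{i=1}^{n}\sum_{k=1}^{K} p(k \,|\, \mathbf{x}_i, \Theta_{\mathrm{old}})\ln\left(p(\mathbf{x}_i, \mathbf{t}, k \,|\, \Theta_{\mathrm{new}})\right) \\ &= \sum_{i=1}^{n}\sum_{k=1}^{K} p(k \,|\, \mathbf{x}_i, \Theta_{\mathrm{old}})\left\{\ln\left[p(k \,|\, \Theta_{\mathrm{new}})\right] + \int p(\mathbf{t} \,|\, \mathbf{x}_i, k, \Theta_{\mathrm{old}})\ln\left(p(\mathbf{x}_i, \mathbf{t} \,|\, \Theta_{\mathrm{new}})\right)d\mathbf{t}\right\} \end{aligned} \tag{54}$$

By iteratively alternating the E-step and the M-step of the EM algorithm, the optimal parameter set can be determined. A more detailed description of the mixture FA model can be found in [68].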
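The weights $p(k \,|\, \mathbf{x}_i, \Theta_{\mathrm{old}})$ in eq. (54) and the log-likelihood of eq. (53) can both be computed from the Gaussian marginals $N(\boldsymbol{\mu}_k, \mathbf{P}_k\mathbf{P}_k^T + \boldsymbol{\Sigma}_k)$. The following sketch does this with a log-sum-exp for numerical stability; the function names and the two-component toy parameters are hypothetical, and only this part of the E-step is shown (the latent posterior moments and the M-step are omitted).

```python
import numpy as np


def gauss_logpdf(X, mean, cov):
    """Row-wise log N(x | mean, cov) for the rows of X."""
    d = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", d @ np.linalg.inv(cov), d)
    return -0.5 * (X.shape[1] * np.log(2 * np.pi) + logdet + maha)


def mfa_responsibilities(X, weights, means, loadings, noise_vars):
    """Return p(k | x_i) for every sample and the log-likelihood of eq. (53)."""
    log_joint = np.column_stack([
        np.log(w) + gauss_logpdf(X, mu, Pk @ Pk.T + np.diag(s))   # ln p(x_i, k)
        for w, mu, Pk, s in zip(weights, means, loadings, noise_vars)
    ])
    mx = log_joint.max(axis=1, keepdims=True)                      # log-sum-exp over k
    log_px = mx[:, 0] + np.log(np.exp(log_joint - mx).sum(axis=1))
    resp = np.exp(log_joint - log_px[:, None])
    return resp, float(log_px.sum())


# Two well-separated local FA models in two dimensions (toy values).
weights = np.array([0.4, 0.6])
means = [np.zeros(2), np.full(2, 10.0)]
loadings = [np.array([[1.0], [0.5]]), np.array([[0.2], [1.0]])]
noise_vars = [np.array([0.1, 0.2]), np.array([0.3, 0.1])]
X = np.array([[0.1, -0.2], [9.8, 10.3]])
resp, loglik = mfa_responsibilities(X, weights, means, loadings, noise_vars)
```

In a multimode process, the rows of `resp` indicate which operating mode most plausibly generated each sample.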


2.2.2. Robust mixture FA model

Given dataset $\mathbf{Y} = \{\mathbf{y}_n\}$, $\mathbf{y}_n \in R^P$, $n = 1, 2, \ldots, N$, the aim of robust mixture FA is to obtain the distribution combined by a finite mixture of $K$ local robust FA components as follows:

$$p(\mathbf{Y}) = \sum_{i=1}^{K} c_i\,p(\mathbf{Y} \,|\, \theta_i) \tag{55}$$

Here, $c_i$ is the component weight, which is subject to $\sum_{i=1}^{K} c_i = 1$, and $\theta_i$ is the parameter set for the ith local model. The structure of each local model is given as:

$$\mathbf{y}_n = \mathbf{W}_i\mathbf{x}_{in} + \boldsymbol{\mu}_i + \mathbf{e}_i \tag{56}$$

The Student's t distribution for both latent and measurement variables is defined as [83]:

$$t(\mathbf{y} \,|\, \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) = \frac{\Gamma\left(\frac{\nu + P}{2}\right)|\boldsymbol{\Sigma}|^{-1/2}}{\Gamma\left(\frac{\nu}{2}\right)(\nu\pi)^{P/2}}\left[1 + \frac{(\mathbf{y} - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu})}{\nu}\right]^{-(\nu + P)/2} \tag{57}$$

Here, $\Gamma(x) = \int_0^\infty z^{x-1}e^{-z}\,dz$ is the gamma function, $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ refer to the mean vector and the diagonal covariance matrix, and $\nu$ is the prior parameter, also known as the degree of freedom. The probabilistic model structure for robust mixture FA is depicted in Figure 15.

Figure 15: Illustration of the robust mixture FA model.

For simplicity, a binary latent indicator variable $z_{ni} \in \{0, 1\}$ can be defined for each measurement, with $z_{ni} = 1$ if $\mathbf{y}_n$ is from the ith local model. Similarly, the optimal parameter set of robust mixture FA can be determined by the EM algorithm, through maximizing the following complete-data log-likelihood function [83]:

$$\log L(\Theta)_{\mathrm{RMFA}} = \sum_{i=1}^{I}\sum_{n=1}^{N} z_{in}\log p(\mathbf{x}_{in}, \mathbf{y}_n; \theta_i) \tag{58}$$

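2.2.3. Dynamic FA model

Based on the structure of the state space model, the dynamic FA model can be formulated as [70]

$$\begin{aligned} \mathbf{t}(k+1) &= \mathbf{A}\mathbf{t}(k) + \mathbf{w}(k+1) \\ \mathbf{x}(k+1) &= \mathbf{P}\mathbf{t}(k+1) + \mathbf{v}(k+1) \end{aligned} \tag{59}$$

where $k = 1, 2, \ldots, N - 1$ is the sample index of the process data, $\mathbf{A} \in \Re^{l \times l}$ is the state transition matrix, $\mathbf{P} \in \Re^{m \times l}$ is the loading matrix, and $\mathbf{w} \sim N(\mathbf{0}, \boldsymbol{\Gamma})$ and $\mathbf{v} \sim N(\mathbf{0}, \boldsymbol{\Sigma})$ are zero-mean noises with variance matrices $\boldsymbol{\Gamma}$ and $\boldsymbol{\Sigma}$. The distribution of the latent variables $\mathbf{t}$ is assumed to be Gaussian, $\mathbf{t} \sim N(\boldsymbol{\mu}, \mathbf{V})$, and the initial distribution of the latent variable is assumed to be $\mathbf{t}(1) \sim N(\boldsymbol{\mu}_1, \mathbf{V}_1)$. Therefore, the model parameter set of the dynamic FA model can be represented as $\Theta = \{\mathbf{A}, \mathbf{P}, \boldsymbol{\Sigma}, \boldsymbol{\Gamma}, \boldsymbol{\mu}_1, \mathbf{V}_1\}$. An illustration of the dynamic FA model structure is depicted in Figure 16.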

Figure 16: Illustration of the dynamic FA model.

If the state transition matrix $\mathbf{A}$ of system (59) is set to zero, the dynamic model degenerates to a static model. In static models such as PPCA and FA, the process can be explained by latent variables whose dimensionality is lower than that of the measurements, owing to the correlations among different variables. In the dynamic FA model, the data information can be captured by state variables defined in a lower-dimensional space. The outputs of model (59), $\mathbf{x}$, are measurements of the system, while $\mathbf{t}$ denotes the hidden variables. Given the model parameters $\Theta$ and measurement data $\mathbf{x}(1), \mathbf{x}(2), \ldots, \mathbf{x}(k)$, the hidden states $\mathbf{t}(1), \mathbf{t}(2), \ldots, \mathbf{t}(k)$, as well as the later measurements $\mathbf{x}(k+1), \mathbf{x}(k+2), \ldots$, can be estimated via the Kalman filter. Given the initial distribution of the latent variable, the conditional distributions of the predicted latent variable and the original variable can be formulated, which also follow Gaussian distributions, namely

$$p(\mathbf{t}_{k+1} \,|\, \mathbf{t}_k) = N(\mathbf{A}\mathbf{t}_k, \boldsymbol{\Gamma}), \qquad p(\mathbf{x}_{k+1} \,|\, \mathbf{t}_{k+1}) = N(\mathbf{P}\mathbf{t}_{k+1}, \boldsymbol{\Sigma}).$$

The log-likelihood function of the complete data can be given as follows [70]

$$\ln p(\mathbf{X}, \mathbf{T} \,|\, \Theta) = \sum_{k=1}^{N}\ln p(\mathbf{x}_k \,|\, \mathbf{t}_k, \mathbf{P}, \boldsymbol{\Sigma}) + \sum_{k=2}^{N}\ln p(\mathbf{t}_k \,|\, \mathbf{t}_{k-1}, \mathbf{A}, \boldsymbol{\Gamma}) + \ln p(\mathbf{t}_1 \,|\, \boldsymbol{\mu}_1, \mathbf{V}_1) \tag{60}$$

where $\mathbf{X}$ and $\mathbf{T}$ are datasets of the measurement and latent variables, respectively. To maximize the complete-data log-likelihood, the parameter set $\Theta$ can be estimated through an iterative EM algorithm. More detailed information about the parameter learning process can be found in [70].
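For given parameters $\Theta$, the hidden states of model (59) can be estimated with the standard Kalman filter recursion. The sketch below is a textbook implementation written for this model's notation, not the specific algorithm of [70]; the function name `kalman_filter` and the scalar toy system are assumptions made for the example.

```python
import numpy as np


def kalman_filter(X, A, P, Gamma, Sigma, mu1, V1):
    """Filtered means and covariances of t(k) given x(1), ..., x(k)."""
    n, l = X.shape[0], A.shape[0]
    means = np.zeros((n, l))
    covs = np.zeros((n, l, l))
    mu_pred, V_pred = mu1, V1                      # prior on t(1)
    for k in range(n):
        # Measurement update with x(k) = P t(k) + v(k)
        S = P @ V_pred @ P.T + Sigma               # innovation covariance
        G = V_pred @ P.T @ np.linalg.inv(S)        # Kalman gain
        means[k] = mu_pred + G @ (X[k] - P @ mu_pred)
        covs[k] = V_pred - G @ P @ V_pred
        # Time update with t(k+1) = A t(k) + w(k+1)
        mu_pred = A @ means[k]
        V_pred = A @ covs[k] @ A.T + Gamma
    return means, covs


# Scalar toy system (hypothetical values) with two measurements.
A = np.array([[0.5]])
P = np.array([[1.0]])
Gamma = np.array([[1.0]])
Sigma = np.array([[1.0]])
mu1, V1 = np.array([0.0]), np.array([[1.0]])
X = np.array([[2.0], [1.0]])
means, covs = kalman_filter(X, A, P, Gamma, Sigma, mu1, V1)
```

A one-step-ahead prediction of the next measurement follows as $\mathbf{P}\mathbf{A}$ times the last filtered mean. Setting `A` to zero removes the dependence on past samples, which corresponds to the static case mentioned above.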

2.2.4. Bayesian robust FA model

In the Bayesian robust FA model, the distribution of the variance parameter $\boldsymbol{\tau}$ is defined by a Gamma distribution as:

$$p(\boldsymbol{\tau}) = \prod_{i=1}^{I}\prod_{p=1}^{P}\mathrm{Ga}(\tau_{ip} \,|\, a_\tau, b_\tau) \tag{61}$$

where $a_\tau$ and $b_\tau$ are the prior hyperparameters. The conditional distribution of the mean vector $\boldsymbol{\mu}$, given the scaling prior and the variance parameter, can then be defined as follows:

$$p(\boldsymbol{\mu} \,|\, \boldsymbol{\beta}, \boldsymbol{\tau}) = \prod_{i=1}^{I}\prod_{p=1}^{P} N(\mu_{ip} \,|\, 0, \beta_i^{-1}\tau_{ip}^{-1}) \tag{62}$$

Here, the scaling variable $\boldsymbol{\beta}$ is introduced to control the variance range of the mean vector; its distribution is given by a Gamma distribution:

$$p(\boldsymbol{\beta}) = \prod_{i=1}^{I}\mathrm{Ga}(\beta_i \,|\, a_\beta, b_\beta) \tag{63}$$

The variance scaling variable $\mathbf{U}$ is given as:

$$p(\mathbf{U} \,|\, \mathbf{Z}, \boldsymbol{\nu}) = \prod_{n=1}^{N}\prod_{i=1}^{I}\prod_{p=1}^{P}\mathrm{Ga}\left(u_{nip} \,\Big|\, \frac{\nu_{ip}}{2}, \frac{\nu_{ip}}{2}\right)^{z_{ni}} \tag{64}$$

Given the mixture coefficients, the latent indicator $\mathbf{Z}$ can be defined as follows:

$$p(\mathbf{Z} \,|\, \boldsymbol{\pi}) = \prod_{n=1}^{N}\prod_{i=1}^{I}\pi_i^{z_{ni}} \tag{65}$$

Note that a distinct scaling variable is defined for each dimension to capture its individual characteristics. For simplicity, the latent variable $\mathbf{X}$ can be defined as follows [84]:

$$p(\mathbf{X} \,|\, \mathbf{Z}) = \prod_{n=1}^{N}\prod_{i=1}^{I} N(\mathbf{x}_{ni} \,|\, \mathbf{0}, \mathbf{I}_Q)^{z_{ni}} \tag{66}$$

where $\mathbf{I}_Q$ denotes the Q-dimensional identity matrix. Then, the measurement space is given as:

$$p(\mathbf{Y} \,|\, \mathbf{Z}, \mathbf{U}, \mathbf{X}, \mathbf{W}, \boldsymbol{\mu}, \boldsymbol{\tau}) = \prod_{n=1}^{N}\prod_{i=1}^{I}\prod_{p=1}^{P} N(y_{np} \,|\, \mathbf{w}_{ip}\mathbf{x}_{ni} + \mu_{ip}, \tau_{ip}^{-1}u_{nip}^{-1})^{z_{ni}} \tag{67}$$

Here, $\mathbf{w}_{ip}$ refers to the pth row of the loading matrix, whose distribution is given as [85]:

$$p(\mathbf{W} \,|\, \boldsymbol{\tau}, \boldsymbol{\alpha}) = \prod_{i=1}^{I}\prod_{p=1}^{P}\prod_{q=1}^{Q} N(w_{ipq} \,|\, 0, \tau_{ip}^{-1}\alpha_{iq}^{-1}) \tag{68}$$

where the prior scaling variable is given as:

$$p(\boldsymbol{\alpha}) = \prod_{i=1}^{I}\prod_{q=1}^{Q}\mathrm{Ga}(\alpha_{iq} \,|\, a_\alpha, b_\alpha) \tag{69}$$

It should be noted that a prior is defined on each column of the projection matrix, based on which each dimension in the latent space can be adjusted by the corresponding column of the score matrix. More detailed information about the Bayesian robust factor analysis model can be found in [77].

2.3. Probabilistic PLS

Different from the basic PPCA method, the main idea of the probabilistic PLS (PPLS) model is to use a part of the latent variables of $\mathbf{x}$ to explain $\mathbf{y}$, while keeping the remaining latent variables to explain its own information. The model structure of PPLS is given as [86]

$$\mathbf{x} = \boldsymbol{\mu}_x + \mathbf{P}\mathbf{t}_s + \mathbf{Q}\mathbf{t}_b + \mathbf{e}_x \tag{70}$$

$$\mathbf{y} = \boldsymbol{\mu}_y + \mathbf{C}\mathbf{t}_s + \mathbf{e}_y \tag{71}$$

where $\mathbf{P} \in R^{m \times q_s}$, $\mathbf{C} \in R^{r \times q_s}$ and $\mathbf{Q} \in R^{m \times q_b}$ are loading matrices, $\mathbf{t}_s \in R^{q_s \times 1}$ is the latent variable vector used to explain the information of $\mathbf{y}$, $\mathbf{t}_b \in R^{q_b \times 1}$ is the remaining latent variable vector used to explain $\mathbf{x}$, $\boldsymbol{\mu}_x$ and $\boldsymbol{\mu}_y$ are the mean vectors of $\mathbf{x}$ and $\mathbf{y}$, and $\mathbf{e}_x \in R^{m \times 1}$ and $\mathbf{e}_y \in R^{r \times 1}$ are the measurement noises of $\mathbf{x}$ and $\mathbf{y}$. An illustration of the probabilistic PLS model is provided in Figure 17.

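Figure 17: Illustration of the probabilistic PLS model.

In the PPLS model, both the latent variables and the measurement noises are assumed to be Gaussian, thus $p(\mathbf{t}_s) = N(\mathbf{0}, \mathbf{I})$, $p(\mathbf{t}_b) = N(\mathbf{0}, \mathbf{I})$, $p(\mathbf{e}_x) = N(\mathbf{0}, \boldsymbol{\Sigma}_x)$, and $p(\mathbf{e}_y) = N(\mathbf{0}, \boldsymbol{\Sigma}_y)$. Here, heterogeneous noise variances are assumed for both $\mathbf{x}$ and $\mathbf{y}$, thus $\boldsymbol{\Sigma}_x = \mathrm{diag}\{\sigma_{x,u}^2\}_{u = 1, 2, \ldots, m}$ and $\boldsymbol{\Sigma}_y = \mathrm{diag}\{\sigma_{y,v}^2\}_{v = 1, 2, \ldots, r}$. Given datasets $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n]^T \in R^{n \times m}$ and $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n]^T \in R^{n \times r}$, the optimal parameter set of the probabilistic PLS model $\Theta = \{\boldsymbol{\mu}_x, \boldsymbol{\mu}_y, \mathbf{P}, \mathbf{Q}, \mathbf{C}, \boldsymbol{\Sigma}_x, \boldsymbol{\Sigma}_y\}$ can be determined by maximizing the following log-likelihood function

$$L(\mathbf{X}, \mathbf{Y} \,|\, \boldsymbol{\mu}_x, \boldsymbol{\mu}_y, \mathbf{P}, \mathbf{Q}, \mathbf{C}, \boldsymbol{\Sigma}_x, \boldsymbol{\Sigma}_y) = \ln\prod_{i=1}^{n} p(\mathbf{x}_i, \mathbf{y}_i \,|\, \boldsymbol{\mu}_x, \boldsymbol{\mu}_y, \mathbf{P}, \mathbf{Q}, \mathbf{C}, \boldsymbol{\Sigma}_x, \boldsymbol{\Sigma}_y) \tag{72}$$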
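Under the Gaussian assumptions above, $\mathbf{x}$ and $\mathbf{y}$ are jointly Gaussian, so the density in eq. (72) and the regression of $\mathbf{y}$ on $\mathbf{x}$ follow from standard Gaussian marginalization and conditioning. The sketch below builds the joint covariance with blocks $\mathbf{P}\mathbf{P}^T + \mathbf{Q}\mathbf{Q}^T + \boldsymbol{\Sigma}_x$, $\mathbf{P}\mathbf{C}^T$ and $\mathbf{C}\mathbf{C}^T + \boldsymbol{\Sigma}_y$, and predicts $\mathbf{y}$ by its conditional mean; the function names and toy parameters are hypothetical, and parameter estimation by EM is not shown.

```python
import numpy as np


def ppls_joint_cov(P, Q, C, sig_x, sig_y):
    """Covariance of the stacked vector [x; y] implied by eqs. (70)-(71)."""
    Sxx = P @ P.T + Q @ Q.T + np.diag(sig_x)
    Sxy = P @ C.T                                # only t_s is shared between x and y
    Syy = C @ C.T + np.diag(sig_y)
    return np.block([[Sxx, Sxy], [Sxy.T, Syy]])


def ppls_predict(x, mu_x, mu_y, P, Q, C, sig_x):
    """Conditional mean E[y | x] of the joint Gaussian."""
    Sxx = P @ P.T + Q @ Q.T + np.diag(sig_x)
    return mu_y + C @ P.T @ np.linalg.solve(Sxx, x - mu_x)


# Toy model: m = 3 inputs, r = 1 output, q_s = q_b = 1.
P = np.array([[1.0], [0.5], [0.0]])
Q = np.array([[0.0], [0.3], [1.0]])
C = np.array([[2.0]])
sig_x = np.array([0.1, 0.1, 0.1])
sig_y = np.array([0.05])
cov = ppls_joint_cov(P, Q, C, sig_x, sig_y)
y_hat = ppls_predict(np.array([1.0, 0.5, 0.0]), np.zeros(3), np.zeros(1), P, Q, C, sig_x)
```

The cross-covariance contains only $\mathbf{P}\mathbf{C}^T$: the block $\mathbf{Q}\mathbf{t}_b$ explains variation in $\mathbf{x}$ that carries no information about $\mathbf{y}$.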

Similarly, the single probabilistic PLS model can be extended to the mixture form, in which $K$ local probabilistic PLS components are assumed to be incorporated. The mixture form of the probabilistic PLS model is given as

$$\begin{aligned} \mathbf{x}_i &= \sum_{k=1}^{K} p(k)\,\mathbf{x}_{i,k} = \sum_{k=1}^{K} p(k)\left(\boldsymbol{\mu}_{x,k} + \mathbf{P}_k\mathbf{t}_{i,k}^{s} + \mathbf{Q}_k\mathbf{t}_{i,k}^{b} + \mathbf{e}_{x,i,k}\right) \\ \mathbf{y}_i &= \sum_{k=1}^{K} p(k)\,\mathbf{y}_{i,k} = \sum_{k=1}^{K} p(k)\left(\boldsymbol{\mu}_{y,k} + \mathbf{C}_k\mathbf{t}_{i,k}^{s} + \mathbf{e}_{y,i,k}\right) \end{aligned} \tag{73}$$

where $i = 1, 2, \ldots, n$, $n$ is the number of data samples, $\mathbf{P}_k \in R^{m \times q_s}$, $\mathbf{C}_k \in R^{r \times q_s}$ and $\mathbf{Q}_k \in R^{m \times q_b}$ are loading matrices of the k-th individual model, $\mathbf{t}_{i,k}^{s} \in R^{q_s \times 1}$ is the latent variable vector used to explain the information of $\mathbf{y}_i$, $\mathbf{t}_{i,k}^{b} \in R^{q_b \times 1}$ is the remaining latent variable vector used to explain $\mathbf{x}_i$, $\boldsymbol{\mu}_{x,k}$ and $\boldsymbol{\mu}_{y,k}$ are the mean vectors of $\mathbf{x}_i$ and $\mathbf{y}_i$, and $\mathbf{e}_{x,i,k} \in R^{m \times 1}$ and $\mathbf{e}_{y,i,k} \in R^{r \times 1}$ are measurement noises in the k-th local model. Given the datasets $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n]^T \in R^{n \times m}$ and $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n]^T \in R^{n \times r}$, the optimal parameter set $\Theta = \{\Theta_k\}_{k = 1, 2, \ldots, K} = \{\boldsymbol{\mu}_{x,k}, \boldsymbol{\mu}_{y,k}, \mathbf{P}_k, \mathbf{Q}_k, \mathbf{C}_k, \boldsymbol{\Sigma}_{x,k}, \boldsymbol{\Sigma}_{y,k}\}$ of the mixture model can be obtained by maximizing the following log-likelihood function

$$\begin{aligned} L(\mathbf{X}, \mathbf{Y} \,|\, \Theta) &= \ln\prod_{i=1}^{n} p(\mathbf{x}_i, \mathbf{y}_i \,|\, \Theta) = \sum_{i=1}^{n}\ln p(\mathbf{x}_i, \mathbf{y}_i \,|\, \Theta) \\ &= \sum_{i=1}^{n}\ln\sum_{k=1}^{K} p(\mathbf{x}_i, \mathbf{y}_i \,|\, k, \Theta)\,p(k) \end{aligned} \tag{74}$$

where $p(k)$ is the mixing proportion of each local model, subject to $\sum_{k=1}^{K} p(k) = 1$. A detailed derivation of the EM algorithm for the mixture probabilistic PLS model is provided in [86].

2.4. Probabilistic ICA

Similar to PPCA, the model structure of PICA is given in the following generative form

$$\mathbf{x}_n = \mathbf{A}\mathbf{s}_n + \mathbf{e} \tag{75}$$

where $\mathbf{s}_n$ is the non-Gaussian distributed variable vector and the noise $\mathbf{e}$ is assumed to be Gaussian. Supposing there are a total of $r$ independent variables, the prior and likelihood can be defined as [87]:


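$$p(\mathbf{s}_n) = \prod_{j=1}^{r} p(s_n^j) \tag{76}$$

$$p(\mathbf{x}_n \,|\, \mathbf{s}_n, \mathbf{A}, \beta) = \sqrt{\frac{\beta}{2\pi}}\,e^{-\frac{\beta}{2}(\mathbf{x}_n - \mathbf{A}\mathbf{s}_n)^T(\mathbf{x}_n - \mathbf{A}\mathbf{s}_n)} \tag{77}$$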

The probability distribution of all non-Gaussian variables can be approximated by means of explicit Gaussian mixtures. With this method, however, one needs to determine the mixture components, which can be quite difficult. Alternatively, the Student's t distribution can be employed to deal with the non-Gaussianity of the variables. The Student's t distribution with adjustable tails can essentially be considered as an infinite mixture of Gaussians with various scaling variances. In other words, the Student's t sources (with zero mean in ICA) unify Gaussian mixtures as a limit case [88], which can be given as:

$$\begin{aligned} p(s_n^j) = t(s_n^j \,|\, 0, \sigma_j^2, \nu_j) &= \frac{\Gamma\left(\frac{\nu_j + 1}{2}\right)}{\Gamma\left(\frac{\nu_j}{2}\right)\sqrt{\nu_j\pi\sigma_j^2}}\left(1 + \frac{(s_n^j)^2}{\nu_j\sigma_j^2}\right)^{-(\nu_j + 1)/2} \\ &= \int_0^\infty N\left(s_n^j \,|\, 0, (u_n^j)^{-1}\sigma_j^2\right)\mathrm{Ga}(u_n^j \,|\, a_{u_j}, b_{u_j})\,du_n^j \end{aligned} \tag{78}$$

where $\Gamma(x) = \int_0^\infty z^{x-1}e^{-z}\,dz$ is the gamma function, $\mathrm{Ga}(\cdot)$ denotes the Gamma distribution with priors $a_{u_j} = \nu_j/2$ and $b_{u_j} = \nu_j\sigma_j^2/2$, $\sigma_j$ is the scaling value, and $\nu_j$ is the degree of freedom that regulates the tail heaviness. The Student's t distribution tends to be Gaussian as $\nu \to \infty$ [89]. For comparison, the graphical structures for both probabilistic PCA and probabilistic ICA are shown in Figure 18.


Figure 18: Graphical structures for probabilistic PCA and probabilistic ICA.

For parameter learning of the probabilistic ICA model, one can resort to the variational Bayesian EM (VBEM) method for some deterministic approximations [90][91]. By defining $\mathbf{T} = \{\mathbf{S}, \mathbf{U}\}$ for the latent variables, the log-likelihood is given as follows

$$\log p(\mathbf{X} \,|\, \Theta) = \log\frac{p(\mathbf{T}, \mathbf{X} \,|\, \Theta)}{p(\mathbf{T} \,|\, \mathbf{X}, \Theta)} \tag{79}$$

By introducing an auxiliary distribution $q(\mathbf{T})$ as the variational term, we can get

$$\begin{aligned} \log\frac{p(\mathbf{T}, \mathbf{X} \,|\, \Theta)}{p(\mathbf{T} \,|\, \mathbf{X}, \Theta)} &= \log\int q(\mathbf{T})\frac{p(\mathbf{T}, \mathbf{X} \,|\, \Theta)\,q(\mathbf{T})}{p(\mathbf{T} \,|\, \mathbf{X}, \Theta)\,q(\mathbf{T})}\,d\mathbf{T} \\ &\geq \int q(\mathbf{T})\left[\log\frac{p(\mathbf{T}, \mathbf{X} \,|\, \Theta)}{q(\mathbf{T})} - \log\frac{p(\mathbf{T} \,|\, \mathbf{X}, \Theta)}{q(\mathbf{T})}\right]d\mathbf{T} \\ &= F(q(\mathbf{T}), \Theta) + \mathrm{KL}(q \,\|\, p) \geq F(q(\mathbf{T}), \Theta) \end{aligned} \tag{80}$$

It should be noted that the first inequality is obtained from Jensen's inequality, while the second one follows from the fact that the Kullback-Leibler divergence is non-negative, with $\mathrm{KL}(q \,\|\, p) = 0$ only if the variational function equals the true conditional posterior. $F(q(\mathbf{T}), \Theta)$ is called the lower bound; maximizing it pushes the log-likelihood up from below, since one has

$$\log p(\mathbf{X} \,|\, \Theta) - F(q(\mathbf{T}), \Theta) = \mathrm{KL}(q \,\|\, p) \geq 0 \tag{81}$$
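The decomposition in eq. (81) can be verified on a toy model with a discrete latent variable, where every quantity is a finite sum. In the sketch below, `joint` holds hypothetical values of $p(\mathbf{T} = t, \mathbf{X} = \mathbf{x}_{\mathrm{obs}} \,|\, \Theta)$ for three latent states and `q` is an arbitrary variational distribution; the function name `bound_terms` is likewise an assumption of this example.

```python
import numpy as np


def bound_terms(joint, q):
    """Return (log evidence, lower bound F, KL(q || posterior)) for a discrete latent."""
    evidence = joint.sum()                        # p(X | Theta), marginalizing T
    posterior = joint / evidence                  # p(T | X, Theta)
    F = float(np.sum(q * (np.log(joint) - np.log(q))))
    kl = float(np.sum(q * (np.log(q) - np.log(posterior))))
    return float(np.log(evidence)), F, kl


joint = np.array([0.10, 0.05, 0.25])   # hypothetical p(T = t, X = x_obs) values
q = np.array([0.5, 0.3, 0.2])          # arbitrary variational distribution
log_evidence, F, kl = bound_terms(joint, q)

# Choosing q equal to the true posterior closes the gap.
_, F_opt, kl_opt = bound_terms(joint, joint / joint.sum())
```

For any valid `q`, `F + kl` reproduces `log_evidence`, so raising `F` over `q` is the same as lowering the KL term.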

It is worth noting that the VB method optimizes over the distributions of the hidden variables. However, the optimization of the lower bound is still hard due to the intractable marginalization. Hence, a variational approximation can be further applied by factorizing the latent distributions independently as $q(\mathbf{T}) \approx q(\mathbf{S})\,q(\mathbf{U})$. By using the mean-field approximation and further constraining each latent distribution to be an easy-to-handle conjugate-exponential-family distribution, a VBEM algorithm can be efficiently developed. A more detailed derivation of the VBEM algorithm for the probabilistic ICA model can be found in [92].

2.5. Discussions and Summary

Besides the probabilistic latent variable models introduced in the above subsections, several other probabilistic latent variable models have been introduced for process data analytics in recent years. Here are some recent application examples. Ge and Song [93] introduced a nonlinear probabilistic latent variable model for process monitoring, called generative topographic mapping, which improved the monitoring performance compared to the linear probabilistic PCA method. Based on the Gaussian process model structure, a Gaussian process latent variable model was introduced for nonlinear probabilistic monitoring [94], which can be considered a special form of the nonlinear probabilistic PCA model. Chen and Jiang [95] developed hidden semi-Markov models for diagnosis of multiphase batch operation. Yu [96] developed a multiway discrete hidden Markov model-based approach for dynamic batch process monitoring and fault classification. A support vector clustering-based probabilistic method was proposed for unsupervised fault detection and classification of complex chemical processes [97]. Sen et al. [98] developed a multiway continuous hidden Markov model-based approach for fault detection and diagnosis. Bartolucci et al. [99] provided a review of a general latent Markov model framework for the analysis of longitudinal data. Several books on hidden Markov models and their applications have also been published [100][101]. Jiang and Yan [102] extended the neighborhood preserving embedding model to a probabilistic weighted form and applied it to chemical process monitoring. Li et al. [103] proposed an increasing mapping-based hidden Markov model for dynamic process monitoring and diagnosis. More recently, Shang et al. [104] constructed a probabilistic slow feature analysis-based representation learning method for massive process data and used it for soft sensor modeling. Escobar et al. [105] combined generative topographic mapping with a graph theory-based unsupervised approach for nonlinear fault identification. Wen et al. [106] developed a multimode process monitoring scheme based on a mixture canonical variate analysis model. Chen and Ge [107] developed a switching linear dynamical system-based approach for process fault detection and classification. Zhu et al. [108] proposed a Bayesian robust linear dynamic system approach for dynamic process monitoring. Zhou et al. [109] developed a multimode process monitoring scheme based on a switching autoregressive dynamic latent variable model. Yuan et al. [110] proposed a weighted linear dynamic system for feature representation and soft sensor application in nonlinear dynamic industrial processes.

Owing to space limitations, we are unable to review all probabilistic latent variable models that have been used for process data analytics. The application statuses of some representative probabilistic latent variable models are summarized here, including probabilistic PCA, factor analysis, probabilistic PLS, probabilistic ICA, the hidden Markov model, the linear dynamical system, the Gaussian process latent variable model, and generative topographic mapping. Figure 19 describes the application statuses of those probabilistic latent variable models in general research areas, while the application statuses in process data analytics are summarized in Figure 20.
It can be seen that the factor analysis model is the most widely used in general research areas, while the hidden Markov model and the linear dynamical system model also account for many applications. In process data analytics, probabilistic PCA, factor analysis, and the hidden Markov model are the three most popular methods, together accounting for almost 80% of all applications to date. Although the application examples of other probabilistic latent variable models are quite limited, there is no doubt that their potential in process data analytics will be further exploited. At the same time, it can be expected that more and more new probabilistic latent variable models will be developed or introduced for process data analytics.

Figure 19: Application statuses of different probabilistic latent variable models in general research areas.

Figure 20: Application statuses of different probabilistic latent variable models in process data analytics.


3. Perspectives for future research

With the growing demand for data analytics in the process industry, probabilistic latent variable models will keep playing an important role. Although many probabilistic latent variable models have been introduced for process data analytics over the past several decades, their power is far from fully explored. At the same time, some challenging problems, either in the probabilistic latent variable model itself or in its application to process data analytics, need further investigation. The following subsections discuss and highlight some important issues in probabilistic latent variable model-driven process data analytics, which we think may lead to more future research on this topic.

3.1. Model selection and performance evaluation

For data-driven modeling, model selection is always an important issue. Generally speaking, all data models are wrong, but some are useful. Therefore, we need to assume a model structure when carrying out process data modeling and analytics. However, this is a difficult task, which may be related to the nature of the process, the data characteristics of particular operating conditions, the aim of data modeling and analytics, and so on. For probabilistic latent variable models in particular, an appropriate type of probabilistic model needs to be selected on the basis of process analysis and data examination. For example, if all process variables are Gaussian distributed and linearly correlated with each other, a linear Gaussian probabilistic latent variable model such as probabilistic PCA is sufficient for data characterization; if some process variables are non-Gaussian distributed, or the relationships among variables are nonlinear, then a non-Gaussian or nonlinear probabilistic latent variable model needs to be employed; if the dynamic nature of the process data is significant, a dynamical probabilistic latent variable model is required.
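The selection rule described above can be illustrated with a small data-examination sketch. It is only one possible heuristic under stated assumptions, not a procedure taken from the reviewed literature: excess kurtosis stands in for a Gaussianity check, lag-1 autocorrelation stands in for a check on dynamics, nonlinearity is not tested at all, and the function names and the thresholds `kurt_tol` and `acf_tol` are hypothetical choices made for illustration.

```python
import random
import statistics


def excess_kurtosis(x):
    """Sample excess kurtosis; close to 0 for Gaussian data."""
    m = statistics.fmean(x)
    s2 = statistics.fmean([(v - m) ** 2 for v in x])
    m4 = statistics.fmean([(v - m) ** 4 for v in x])
    return m4 / (s2 ** 2) - 3.0


def lag1_autocorr(x):
    """Lag-1 autocorrelation; close to 0 for serially independent data."""
    m = statistics.fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def suggest_model_family(columns, kurt_tol=1.0, acf_tol=0.5):
    """Map simple per-variable checks to a candidate model family.

    Both thresholds are hypothetical and would need tuning per process.
    """
    dynamic = any(abs(lag1_autocorr(c)) > acf_tol for c in columns)
    non_gaussian = any(abs(excess_kurtosis(c)) > kurt_tol for c in columns)
    if dynamic:  # checked first: strong serial correlation dominates the choice
        return "dynamical PLVM (e.g. linear dynamical system, HMM)"
    if non_gaussian:
        return "non-Gaussian PLVM (e.g. probabilistic ICA)"
    return "linear Gaussian PLVM (e.g. probabilistic PCA)"


# Synthetic variables for illustration only.
rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(5000)]
uniform = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
ar1 = [0.0]
for _ in range(4999):
    ar1.append(0.9 * ar1[-1] + rng.gauss(0.0, 1.0))

for name, series in [("gaussian", gaussian), ("uniform", uniform), ("ar1", ar1)]:
    print(name, "->", suggest_model_family([series]))
```

In practice such checks would complement, rather than replace, the process analysis mentioned above, since summary statistics alone cannot reveal the physical reasons behind non-Gaussian or dynamic behavior.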


When a data model has been selected, its performance needs to be evaluated in its application to the process. If performance degradation is detected, the model should be updated or replaced; otherwise, both the effectiveness and the efficiency of process data analytics may deteriorate greatly. However, how to evaluate the performance of a data model is still an open question. Recently, Ge and Liu [111] proposed an analytic hierarchy process-based fuzzy decision fusion system for model performance evaluation and prioritization, and applied it to process monitoring. For probabilistic latent variable models, the performance evaluation scheme needs to be designed specifically, taking into account both the probabilistic model structure and the impact of the latent variables. To date, however, performance evaluation for probabilistic latent variable models has rarely been studied, although it should be a key step in process data analytics.

3.2. Large-scale process data analytics

With the development of the modern process industry, the scale of processes has become very large. Recently, modeling and monitoring of large-scale or plant-wide processes have become quite popular. For probabilistic latent variable models, as for other data-driven methods, handling the big datasets from large-scale processes can be challenging. Fortunately, a distributed modeling method has been proposed for plant-wide process monitoring, which provides a general modeling framework for large-scale process data analytics [112][113]. Based on this distributed modeling framework, distributed probabilistic latent variable models can be developed. However, several important issues need to be considered to guarantee the effectiveness of the distributed probabilistic latent variable model. First, how to divide the whole process into different blocks should be reconsidered under the probabilistic latent variable modeling framework. Second, model selection and performance evaluation in the various parts of the process need to be well addressed, as well as model validation and maintenance. Third, how to effectively integrate the different probabilistic latent variable models is also an important step for large-scale process analytics. Furthermore, the big data problem, which has attracted much attention in recent years, needs to be well considered, particularly for large-scale industrial processes. Several studies on big data have already been published for process data analytics [114][115][116].
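The three steps above (block division, local modeling, and integration) can be sketched as follows. This is a deliberately simplified illustration and not the framework of [112][113]: the block partition is assumed to be given by the user, an independent-Gaussian model stands in for each local probabilistic latent variable model, the control limit is an empirical quantile of the training statistics, and the integration rule simply reports every block whose local statistic exceeds its limit. All function names are hypothetical.

```python
import random
import statistics


def fit_block(train_rows, idx):
    """Fit an independent-Gaussian model (means, stds) to one block of variables."""
    cols = [[row[j] for row in train_rows] for j in idx]
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return means, stds


def block_stat(row, idx, model):
    """Sum of squared standardized deviations of one sample within one block."""
    means, stds = model
    return sum(((row[j] - m) / s) ** 2 for j, m, s in zip(idx, means, stds))


def fit_distributed(train_rows, blocks, alpha=0.99):
    """Fit one local model per block, each with its own empirical control limit."""
    fitted = []
    for idx in blocks:
        model = fit_block(train_rows, idx)
        stats = sorted(block_stat(r, idx, model) for r in train_rows)
        limit = stats[int(alpha * (len(stats) - 1))]
        fitted.append((idx, model, limit))
    return fitted


def monitor(row, fitted):
    """Return the indices of the blocks whose local statistic exceeds its limit."""
    return [b for b, (idx, model, limit) in enumerate(fitted)
            if block_stat(row, idx, model) > limit]


# Synthetic normal-operation data with four variables split into two blocks.
rng = random.Random(1)
train = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2000)]
fitted = fit_distributed(train, blocks=[[0, 1], [2, 3]])

normal_flags = monitor([0.0, 0.0, 0.0, 0.0], fitted)
faulty_flags = monitor([0.0, 0.0, 8.0, 0.0], fitted)  # large deviation in variable 2
print(normal_flags, faulty_flags)
```

A side benefit of the block-wise structure is that the flagged block index already localizes the abnormal region of the plant; more refined integration rules, such as probabilistic combination of the local results, would replace the simple rule in `monitor`.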

3.3. Probabilistic model fusion

Although different types of probabilistic latent variable models have been introduced for process data analytics in recent years, each has its own ability to deal with specific data characteristics and process natures. In other words, a single probabilistic latent variable model that provides good modeling and analytical performance under one condition may not give satisfactory results in another modeling environment. Therefore, different probabilistic latent variable models have diverse modeling performances in a specific process. As discussed above, model selection and performance evaluation provide an effective way to control the model structure of the probabilistic latent variable model. An alternative is to fuse different probabilistic latent variable models. In this way, the advantages of the various probabilistic latent variable models can be enhanced, while their shortcomings can be significantly reduced. A more promising route to performance improvement is to use model selection as an initial step, after which the selected probabilistic latent variable models are fused for further process data analytics.
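One simple way to picture such fusion, assuming each candidate model exposes a predictive density, is to weight the models by their likelihood on validation data and to evaluate the resulting mixture density. The sketch below does this for two univariate Gaussian stand-in models; it is a Bayesian-model-averaging-style illustration with an equal prior over models, not a fusion scheme proposed in this review, and every function name and data value in it is hypothetical.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate Gaussian."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def fusion_weights(val_loglik):
    """Softmax of per-model validation log-likelihoods (equal prior over models)."""
    top = max(val_loglik)
    unnorm = [math.exp(v - top) for v in val_loglik]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def fused_logpdf(x, models, weights):
    """Log-density of the weighted mixture of the individual model densities."""
    terms = [math.log(w) + gaussian_logpdf(x, m, s)
             for w, (m, s) in zip(weights, models) if w > 0.0]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Two stand-in models, each given as (mean, std), and a few validation samples.
models = [(0.0, 1.0), (3.0, 1.0)]
validation = [0.1, -0.2, 0.3, 2.9]
val_loglik = [sum(gaussian_logpdf(v, m, s) for v in validation) for m, s in models]
weights = fusion_weights(val_loglik)
print(weights)
print(fused_logpdf(0.5, models, weights))
```

Because the weights depend exponentially on the total validation log-likelihood, they tend to concentrate on a single model as the validation set grows; per-sample or locally adaptive weights are a common remedy when the aim is to combine complementary strengths rather than to pick a winner.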

4. Conclusions

In this paper, a tutorial review of probabilistic latent variable models has been carried out for the purpose of process data analytics. In particular, four commonly used probabilistic latent variable models, namely probabilistic PCA, factor analysis, probabilistic PLS, and probabilistic ICA, are illustrated, with their research statuses discussed in detail. Besides these basic forms, further PLVM counterparts have been introduced for process data analytics. The application statuses of different types of probabilistic latent variable models are discussed and analyzed. For future research on this topic, several perspectives are highlighted, including model selection and performance evaluation, large-scale process data analytics, and probabilistic model fusion.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) (61722310, 61673337), the Natural Science Foundation of Zhejiang Province (LR18F030001), and the Fundamental Research Funds for the Central Universities (2018XZZX002-09).

References

[1] Ge Z, Song Z, Ding S, Huang B. Data mining and analytics in the process industry: the role of machine learning. IEEE Access, 2017, 5, 20590-20616.
[2] Ge Z. Review on data-driven modeling and monitoring for plant-wide industrial processes. Chemometrics & Intelligent Laboratory Systems, 2017, 171, 16-25.
[3] Kano M, Ogawa M. The state of the art in chemical process control in Japan: Good practice and questionnaire survey. Journal of Process Control, 2010, 20, 969-982.
[4] Qin SJ. Survey on data-driven industrial process monitoring and diagnosis. Annual Reviews in Control, 2012, 36(2), 220-234.
[5] Ge Z, Song Z, Gao F. Review of recent research on data-based process monitoring. Industrial & Engineering Chemistry Research, 2013, 52, 3543-3562.
[6] MacGregor J, Cinar A. Monitoring, fault diagnosis, fault-tolerant control and optimization: Data driven methods. Computers and Chemical Engineering, 2012, 47, 111-120.
[7] Khatibisepehr S, Huang B, Khare S. Design of inferential sensors in the process industry: A review of Bayesian methods. Journal of Process Control, 2013, 23, 1575-1596.
[8] Yao Y, Gao F. A survey on multistage/multiphase statistical modeling methods for batch processes. Annual Reviews in Control, 2009, 33, 172-183.
[9] Wang XZ. Data mining and knowledge discovery for process monitoring and control. Springer, London, 2012.
[10] Chiang LH, Braatz RD, Russell EL. Fault detection and diagnosis in industrial systems. Springer, 2001.
[11] Kruger U, Xie L. Statistical monitoring of complex multivariate processes. John Wiley & Sons Ltd, West Sussex, UK, 2012.
[12] Ge Z, Song Z. Multivariate Statistical Process Control: Process Monitoring Methods and Applications. Springer, London, 2013.
[13] Liu H, Shah S, Jiang W. On-line outlier detection and data cleaning. Computers and Chemical Engineering, 2004, 28, 1635-1647.
[14] Imtiaz S, Shah S. Treatment of missing values in process data analysis. Canadian Journal of Chemical Engineering, 2008, 86, 838-858.
[15] Nelson P, Taylor P, MacGregor J. Missing data methods in PCA and PLS: Score calculations with incomplete observations. Chemometrics and Intelligent Laboratory Systems, 1996, 35, 45-65.
[16] Russell E, Chiang L, Braatz R. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2000, 51, 81-93.
[17] Choi S, Morris J, Lee I. Dynamic model-based batch process monitoring. Chemical Engineering Science, 2008, 63, 622-636.
[18] Kourti T, Lee J, MacGregor J. Experiences with industrial applications of projection methods for multivariate statistical process control. Computers and Chemical Engineering, 1996, 20, 745-750.
[19] Thornhill N, Cox J, Paulonis M. Diagnosis of plant-wide oscillation through data-driven analysis and process understanding. Control Engineering Practice, 2003, 11, 1481-1490.
[20] Chiang L, Russell E, Braatz R. Fault diagnosis in chemical processes using Fisher discriminant analysis, discriminant partial least squares, and principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2000, 50, 243-252.
[21] Venkatasubramanian V, Rengaswamy R, Yin K, Kavuri S. A review of process fault detection and diagnosis Part I: Quantitative model-based methods. Computers and Chemical Engineering, 2003, 27, 293-311.
[22] Kassidas A, Taylor P, MacGregor J. Off-line diagnosis of deterministic faults in continuous dynamic multivariable processes using speech recognition methods. Journal of Process Control, 1998, 8, 381-393.
[23] Chen T, Morris J, Martin E. Probability density estimation via an infinite Gaussian mixture model: application to statistical process monitoring. Journal of the Royal Statistical Society Series C-Applied Statistics, 2006, 55, 699-715.
[24] Thornhill N, Choudhury M, Shah S. The impact of compression on data-driven process analyses. Journal of Process Control, 2004, 14, 389-398.
[25] Wang R, Edgar T, Baldea M, Nixon M, Wojsznis W, Dunia R. Process fault detection using time-explicit Kiviat diagrams. AIChE Journal, 2015, 61, 4277-4293.
[26] Dunia R, Edgar T, Nixon M. Process Monitoring Using Principal Components in Parallel Coordinates. AIChE Journal, 2013, 59, 445-456.
[27] Bartolucci F, Bacci S, Mira A. On the role of latent variable models in the era of big data. Statistics and Probability Letters, 2018, 136, 165-169.
[28] Tipping M, Bishop C. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society, Series B, 1999, 61, 611-622.
[29] Chen JH, Liu JL. Using mixture principal component analysis networks to extract fuzzy rules from data. Industrial & Engineering Chemistry Research, 2000, 39, 2355-2367.
[30] Kim D, Lee I. Process monitoring based on probabilistic PCA. Chemometrics & Intelligent Laboratory Systems, 2003, 67, 109-123.
[31] Choi SW, Martin EB, Morris AJ. Fault detection based on a maximum-likelihood mixture principal component analysis (PCA). Industrial & Engineering Chemistry Research, 2005, 44, 2316-2327.
[32] Thissen U, Swierenga H, de Weijer AP, Wehrens R, Melssen WJ, Buydens LMC. Multivariate statistical process control using mixture modeling. Journal of Chemometrics, 2005, 19, 23-31.
[33] Chen T, Sun Y. Probabilistic contribution analysis for statistical process monitoring: A missing variable approach. Control Engineering Practice, 2009, 17, 469-477.
[34] Kariwala V, Odiowei P, Cao Y, Chen T. A branch and bound method for isolation of faulty variables through missing variable analysis. Journal of Process Control, 2010, 20, 1198-1206.
[35] Yang Y, Ma Y, Song B, Shi H. An aligned mixture probabilistic principal component analysis for fault detection of multimode chemical processes. Chinese Journal of Chemical Engineering, 2015, 23, 1357-1363.
[36] Ge Z, Song Z. Kernel Generalization of PPCA for Nonlinear Probabilistic Monitoring. Industrial & Engineering Chemistry Research, 2010, 49, 11832-11836.
[37] He B, Yang X, Chen T, Zhang J. Reconstruction-based multivariate contribution analysis for fault isolation: A branch and bound approach. Journal of Process Control, 2012, 22, 1228-1236.
[38] Ge Z, Gao F, Song Z. Mixture probabilistic PCR model for soft sensing of multimode processes. Chemometrics & Intelligent Laboratory Systems, 2011, 105, 91-105.
[39] Zhou L, Chen J, Song Z, Ge Z, Miao A. Probabilistic latent variable regression model for process-quality monitoring. Chemical Engineering Science, 2014, 116, 296-305.
[40] Zhu J, Ge Z, Song Z. Robust modeling of mixture probabilistic principal component analysis and process monitoring application. AIChE Journal, 2014, 60, 2143-2157.
[41] Zhu J, Ge Z, Song Z. Robust Supervised Probabilistic Principal Component Analysis model for soft sensing of key process variables. Chemical Engineering Science, 2015, 122, 573-584.
[42] Yang Y, Ma Y, Song B, Shi H. An aligned mixture probabilistic principal component analysis for fault detection of multimode chemical processes. Chinese Journal of Chemical Engineering, 2015, 23, 1357-1363.
[43] Ge Z, Song Z. Mixture Bayesian Regularization Method of PPCA for Multimode Process Monitoring. AIChE Journal, 2010, 56, 2838-2849.
[44] Ge Z. Mixture Bayesian regularization of PCR model and soft sensing application. IEEE Transactions on Industrial Electronics, 2015, 62, 4336-4343.
[45] Ge Z, Song Z. Robust monitoring and fault reconstruction based on variational inference component analysis. Journal of Process Control, 2011, 21, 462-474.
[46] Liu Y, Chen J, Sun Z, Li Y, Huang D. A probabilistic self-validating soft-sensor with application to wastewater treatment. Computers and Chemical Engineering, 2014, 71, 263-280.
[47] Ge Z, Song Z. Semi-supervised Bayesian method for soft sensor modeling with unlabeled data samples. AIChE Journal, 2011, 57, 2109-2119.
[48] Ge Z, Huang B, Song Z. Mixture semi-supervised principal component regression model and soft sensor application. AIChE Journal, 2014, 60, 533-545.
[49] Ge Z, Huang B, Song Z. Nonlinear semi-supervised principal component regression for soft sensor modeling and its mixture form. Journal of Chemometrics, 2014, 28, 793-804.
[50] Zhou L, Chen J, Song Z, Ge Z. Semi-supervised PLVR models for process monitoring with unequal sample sizes of process variables and quality variables. Journal of Process Control, 2015, 26, 1-16.
[51] Zhu J, Ge Z, Song Z. Robust semi-supervised mixture probabilistic principal component regression model development and application to soft sensors. Journal of Process Control, 2015, 32, 25-37.
[52] Zhu J, Ge Z, Song Z. Dynamic Mixture Probabilistic PCA Classifier modeling and application for Fault Classification. Journal of Chemometrics, 2015, 29, 361-370.
[53] Zhu J, Ge Z, Song Z. HMM driven robust probabilistic principal component analyzer for dynamic fault classification. IEEE Transactions on Industrial Electronics, 2015, 62, 3814-3821.
[54] Yuan X, Ye L, Bao L, Ge Z, Song Z. Nonlinear feature extraction for soft sensor modeling based on weighted probabilistic PCA. Chemometrics & Intelligent Laboratory Systems, 2015, 147, 167-175.
[55] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 1977, 39, 1-39.
[56] Tipping M, Bishop C. Mixture Probabilistic Principal Component Analysers. Neural Computation, 1999, 11, 443-482.
[57] Yu SP, Yu K, Tresp V, Kriegel HP, Wu MR. Supervised probabilistic principal component analysis. 12th ACM International Conference on Knowledge Discovery and Data Mining, 2006, 464-473.
[58] Archambeau C, Delannay N, Verleysen M. Mixtures of robust probabilistic principal component analyzers. Neurocomputing, 2008, 71, 1274-1282.
[59] Bishop CM. Bayesian PCA. Advances in Neural Information Processing Systems, 1999, 11, 382-388.
[60] Kim D, Yoo C, Kim Y, Jung J, Lee I. Calibration, prediction and process monitoring model based on factor analysis for incomplete process data. Journal of Chemical Engineering of Japan, 2005, 38, 1025-1034.
[61] Setarehdan S. Modified evolving window factor analysis for process monitoring. Journal of Chemometrics, 2004, 18, 414-421.
[62] Ilin A, Valpola H, Oja E. Nonlinear dynamical factor analysis for state change detection. IEEE Transactions on Neural Networks, 2004, 15, 559-575.
[63] Jiang J, Ozaki Y, Kleimann M, Siesler H. Resolution of two-way data from on-line Fourier-transform Raman spectroscopic monitoring of the anionic dispersion polymerization of styrene and 1,3-butadiene by parallel vector analysis (PVA) and window factor analysis (WFA). Journal of Chemometrics, 2004, 70, 83-92.
[64] Wise B, Gallagher N, Butler S, White D, Barna G. A comparison of principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for fault detection in a semiconductor etch process. Journal of Chemometrics, 1999, 13, 379-396.
[65] Surribas A, Amigo J, Coello J, Montesinos J, Valero F, Maspoch S. Parallel factor analysis combined with PLS regression applied to the on-line monitoring of Pichia pastoris cultures. Analytical and Bioanalytical Chemistry, 2006, 385, 1281-1288.
[66] Amigo J, Surribas A, Coello J, Montesinos J, Maspoch S, Valero F. On-line parallel factor analysis. A step forward in the monitoring of bioprocesses in real time. Chemometrics & Intelligent Laboratory Systems, 2008, 92, 44-52.
[67] Bahram M, Mabhooti M. Rank annihilation factor analysis for multicomponent kinetic spectrophotometric determination using difference spectra. Journal of Chemometrics, 2009, 23, 236-247.
[68] Ge Z, Song Z. Maximum-likelihood mixture factor analysis model and its application for process monitoring. Chemometrics & Intelligent Laboratory Systems, 2010, 102, 53-61.
[69] Kompany-Zareh M. On-Line Monitoring of a Continuous Pharmaceutical Process Using Parallel Factor Analysis and Unfolding Multivariate Statistical Process Control Representation. Journal of the Iranian Chemical Society, 2011, 8, 209-222.
[70] Wen Q, Ge Z, Song Z. Data-based linear Gaussian state-space model for dynamic process monitoring. AIChE Journal, 2012, 58, 3763-3776.
[71] Jiang Q, Yan X. Multivariate Statistical Process Monitoring Using Modified Factor Analysis and Its Application. Journal of Chemical Engineering of Japan, 2012, 45, 829-839.
[72] Kuo Y, Jang C, Yu H, Chen S, Chu H. Identifying nearshore groundwater and river hydrochemical variables influencing water quality of Kaoping River Estuary using dynamic factor analysis. Journal of Hydrology, 2013, 486, 39-47.
[73] Ma Y, Shi H. Multimode Process Monitoring Based on Aligned Mixture Factor Analysis. Industrial & Engineering Chemistry Research, 2014, 53, 786-799.
[74] Tong P, Wu T, Wang X, Zhang H, Kang Y, Du Y. A novel alternating least-squares method based on fixed region scanning evolving factor analysis (FRSEFA) and its application in process monitoring. Analytical Methods, 2014, 6, 7883-7890.
[75] Jiang Q, Yan X. Probabilistic monitoring of chemical processes using adaptively weighted factor analysis and its application. Chemical Engineering Research and Design, 2014, 92, 127-138.
[76] Zhao Z, Li Q, Huang B, Liu F, Ge Z. Process monitoring based on factor analysis: Probabilistic analysis of monitoring statistics in presence of both complete and incomplete measurements. Chemometrics & Intelligent Laboratory Systems, 2015, 142, 18-27.
[77] Zhu J, Ge Z, Song Z. Multimode process data modeling: A Dirichlet process mixture model based Bayesian robust factor analyzer approach. Chemometrics & Intelligent Laboratory Systems, 2015, 142, 231-244.
[78] Melendez E, Sarabia L, Ortiz M. Parallel factor analysis for monitoring data from a grape harvest in Qualified Designation of Origin Rioja including spatial and temporal variability. Chemometrics & Intelligent Laboratory Systems, 2015, 147, 167-175.
[79] Ge Z. Supervised latent factor analysis for process data regression modeling and soft sensor application. IEEE Transactions on Control Systems Technology, 2016, 24, 1004-1011.
[80] Yao L, Ge Z. Locally weighted prediction methods for latent factor analysis with supervised and semi-supervised process data. IEEE Transactions on Automation Science and Engineering, 2017, 14, 126-138.
[81] Yao L, Ge Z. Moving Window Adaptive Soft Sensor for State Shifting Process Based on Weighted Supervised Latent Factor Analysis. Control Engineering Practice, 2017, 61, 72-80.
[82] Ge Z, Chen X. Dynamic probabilistic latent variable model for process data modeling and regression application. IEEE Transactions on Control Systems Technology, 2018, 10.1109/TCST.2017.2767022.
[83] Fang Y, Jeong MK. Robust probabilistic multivariate calibration model. Technometrics, 2008, 50, 305-316.
[84] Luttinen J, Ilin A, Karhunen J. Bayesian robust PCA of incomplete data. Neural Processing Letters, 2012, 36, 189-202.
[85] Zhao JH, Yu PL. A note on variational Bayesian factor analysis. Neural Networks, 2009, 22, 988-997.
[86] Zheng J, Ge Z, Song Z. Probabilistic learning of partial least squares regression model: Theory and industrial applications. Chemometrics & Intelligent Laboratory Systems, 2016, 158, 80-90.
[87] Tipping ME, Lawrence ND. Variational inference for Student-t models: robust Bayesian interpolation and generalised component analysis. Neurocomputing, 2005, 69, 123-141.
[88] Svensén M, Bishop CM. Robust Bayesian mixture modelling. Neurocomputing, 2005, 64, 235-252.
[89] Chen T, Martin E, Montague G. Robust probabilistic PCA with missing data and contribution analysis for outlier detection. Computational Statistics & Data Analysis, 2009, 53, 3706-3716.
[90] Beal MJ. Variational algorithms for approximate Bayesian inference. University of London, 2003.
[91] Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK. An introduction to variational methods for graphical models. Machine Learning, 1999, 37, 183-233.
[92] Zhu J, Ge Z, Song Z. Non-Gaussian industrial process monitoring with probabilistic independent component analyzer. IEEE Transactions on Automation Science and Engineering, 2017, 14, 1309-1319.
[93] Ge Z, Song Z. A nonlinear probabilistic method for process monitoring. Industrial & Engineering Chemistry Research, 2010, 49, 1770-1778.
[94] Ge Z, Song Z. Nonlinear probabilistic fault detection based on Gaussian process latent variable model. Industrial & Engineering Chemistry Research, 2010, 49, 4792-4799.
[95] Chen J, Jiang Y. Development of hidden semi-Markov models for diagnosis of multiphase batch operation. Chemical Engineering Science, 2011, 66, 1087-1099.
[96] Yu J. Multiway discrete hidden Markov model-based approach for dynamic batch process monitoring and fault classification. AIChE Journal, 2012, 58, 2714-2725.
[97] Yu J. A Support Vector Clustering-Based Probabilistic Method for Unsupervised Fault Detection and Classification of Complex Chemical Processes Using Unlabeled Data. AIChE Journal, 2013, 59, 407-419.
[98] Sen D, Raihan D, Chidambaram M. Multiway continuous hidden Markov model-based approach for fault detection and diagnosis. AIChE Journal, 2014, 60, 2035-2047.
[99] Bartolucci F, Farcomeni A, Pennoni F. Latent Markov models: a review of a general framework for the analysis of longitudinal data with covariates (with discussion). Test, 2014, 23, 433-486.
[100] Bartolucci F, Farcomeni A, Pennoni F. Latent Markov Models for Longitudinal Data. Chapman and Hall/CRC Press, Boca Raton, FL, 2013.
[101] Zucchini W, MacDonald IL, Langrock R. Hidden Markov models for time series: an introduction using R. Chapman and Hall/CRC, 2016.
[102] Jiang Q, Yan X. Probabilistic Weighted NPE-SVDD for chemical process monitoring. Control Engineering Practice, 2014, 28, 74-89.
[103] Li Z, Fang H, Xia L. Increasing mapping based hidden Markov model for dynamic process monitoring and diagnosis. Expert Systems with Applications, 2014, 41, 744-751.
[104] Shang C, Huang B, Yang F, Huang D. Probabilistic slow feature analysis-based representation learning from massive process data for soft sensor modeling. AIChE Journal, 2015, 61, 4126-4139.
[105] Escobar M, Kaneko H, Funatsu K. Combined generative topographic mapping and graph theory unsupervised approach for nonlinear fault identification. AIChE Journal, 2015, 61, 1559-1571.
[106] Wen Q, Ge Z, Song Z. Multimode dynamic process monitoring based on mixture canonical variate analysis model. Industrial & Engineering Chemistry Research, 2015, 54, 1605-1614.
[107] Chen X, Ge Z. Switching LDS-based Approach for Process Fault Detection and Classification. Chemometrics & Intelligent Laboratory Systems, 2015, 146, 169-178.
[108] Zhu J, Ge Z, Song Z. Bayesian robust linear dynamic system approach for dynamic process monitoring. Journal of Process Control, 2016, 40, 62-77.
[109] Zhou L, Zheng J, Ge Z, Song Z, Shan S. Multimode Process Monitoring Based on Switching Autoregressive Dynamic Latent Variable Model. IEEE Transactions on Industrial Electronics, 2018, 65, 8184-8194.
[110] Yuan X, Wang Y, Yang C, Ge Z, Song Z, Gui W. Weighted Linear Dynamic System for Feature Representation and Soft Sensor Application in Nonlinear Dynamic Industrial Processes. IEEE Transactions on Industrial Electronics, 2018, 65, 1508-1517.
[111] Ge Z, Liu Y. Analytic Hierarchy Process Based Fuzzy Decision Fusion System for Model Prioritization and Process Monitoring Application. IEEE Transactions on Industrial Informatics, 2018, 10.1109/TII.2018.2836153.
[112] Ge Z, Chen J. Plant-wide industrial process monitoring: A distributed modeling framework. IEEE Transactions on Industrial Informatics, 2016, 12, 310-321.
[113] Ge Z. Distributed predictive modeling framework for prediction and diagnosis of key performance index in plant-wide processes. Journal of Process Control, 2018, 65, 107-117.
[114] Zhu J, Ge Z, Song Z. Distributed Parallel PCA for Modeling and Monitoring of Large-scale Plant-wide Processes with Big Data. IEEE Transactions on Industrial Informatics, 2017, 13(4), 1877-1885.
[115] Yao L, Ge Z. Big data quality prediction in the process industry: a distributed parallel modeling framework. Journal of Process Control, 2018, 68, 1-13.
[116] Yao L, Ge Z. Scalable Semi-supervised GMM for Big Data Quality Prediction in Multimode Processes. IEEE Transactions on Industrial Electronics, 2018, 10.1109/TIE.2018.2856200.

TOC Graphic

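Probabilistic latent variable model relating the latent variable $\mathbf{t}$ to the observation $\mathbf{x}$ through the loading matrix $\mathbf{P}$, the mean $\boldsymbol{\mu}$, and the noise covariance $\beta^{-1}\mathbf{I}$:

$$p(\mathbf{t}) = \mathcal{N}(\mathbf{t} \mid \mathbf{0}, \mathbf{I})$$

$$p(\mathbf{x} \mid \mathbf{t}) = \mathcal{N}(\mathbf{x} \mid \mathbf{P}\mathbf{t} + \boldsymbol{\mu}, \beta^{-1}\mathbf{I})$$

$$p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \mathbf{M}), \qquad \mathbf{M} = \mathbf{P}\mathbf{P}^{T} + \beta^{-1}\mathbf{I}$$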

[Figure: CO content versus sample points, comparing three curves: real-time output estimation, estimation of measured output, and real value of measured output. First panel: sample points 0 to 2 × 10^4. Second panel: detail of sample points 3000 to 6000.]