An Online Performance Monitoring and Modeling Paradigm Based on Just-in-time Learning and Extreme Learning Machine for Non-Gaussian Chemical Processes

Authors
Xin Peng, Yang Tang, Wenli Du, Feng Qian*
Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai, China

*Corresponding author: Feng Qian, [email protected], Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai, China
Abstract
This paper proposes a novel performance monitoring and online modeling method for non-Gaussian chemical processes with multiple operating conditions. Within the proposed framework, a kernel Extreme Learning Machine (ELM) technique is used to efficiently extract features from high-dimensional process data. In addition, the Fastfood kernel is introduced into the kernel ELM to reduce the computational cost, which is otherwise high at prediction time. A modified Just-in-time learning (JITL) technique is then applied for online modeling. Within JITL, a novel similarity index, the modified adjusted cosine similarity (MACS), is proposed to improve the prediction performance of online modeling. The proposed paradigm provides an efficient, accurate, and fast approach to monitoring and modeling multimode chemical processes. Its validity and effectiveness are evaluated on a synthetic non-Gaussian multimode model and a distillation system.

Keywords: Non-linear process, Non-Gaussian process, Extreme learning machine, Fastfood kernel, Gaussian kernel, Just-in-time learning, Online modeling, Process monitoring
1 Introduction

In the last decade, the monitoring of modern industrial manufacturing systems (e.g., chemical engineering 1, 2 and the biological industry 3, 4) has received increasing attention because of higher process safety and quality requirements 5-7. In such large systems, most maintenance costs are attributable to control system malfunctions (i.e., aging, unanticipated interactions of components, and misuse of components) 8. However, detecting abnormalities and their corresponding root causes becomes increasingly difficult because of the complexity and scale of these modern process systems. Statistical learning based methods for chemometrics and process modeling and monitoring have recently been developed 9. These methods build on conventional statistical approaches related to time series analysis, regression, and classification.

Statistical methods, often referred to as statistical process monitoring (SPM), have achieved great success owing to the high availability of historical process data from distributed control systems (DCS). These data can be used to construct a statistical model for predicting and supervising the status of the process. SPM-based monitoring methods take the serial and spatial correlation of process data into account, which provides a more efficient and precise monitoring framework than conventional model-based approaches and methods that merely set a control threshold for each observation. This feature renders SPM-based methods effective especially when the process data form high-dimensional, highly correlated multivariate datasets. To recapitulate, Multivariate Statistical Process Monitoring (MSPM) 10 mainly focuses on decorrelating the high-dimensional data in order to extract the key features contained in it. By analyzing the information in key features that reflect the operating condition of the process 11, faulty behavior occurring in the process can be detected.

Principal component analysis (PCA) and partial least squares (PLS) are two of the most comprehensively researched MSPM methods 12, 13 in the field of high-dimensional process modeling and monitoring. PCA is a dimension reduction procedure concerned with elucidating the covariance structure of a set of measurements; in particular, it identifies the principal directions in which the raw process data vary. Like PCA, PLS exploits the spatial correlation of process data; unlike PCA, however, PLS decomposes the measurements into a feature space where the correlation between the predictor and predicted variables is maximized 14. For these methods, the derivation of the control limits for Hotelling's T2 and squared prediction error (SPE) monitoring statistics is based on the assumption that the normal operation data follow a multivariate Gaussian distribution 15, which may not be satisfied in practice. In fact, practical industrial process data are often significantly non-Gaussian.
Consequently, conventional PCA and PLS usually suffer when no proper modification is made to handle the non-Gaussianity. Total PLS (T-PLS) was proposed to address an inherent flaw of PLS and provide a more detailed decomposition of the process variable matrix, which is beneficial for non-Gaussian process monitoring 16. Meanwhile, Peng proposed the Total KPLS (T-KPLS) 17 model for nonlinear non-Gaussian processes, and Jia 18 extended this model to quality-related non-Gaussian process monitoring via singular value decomposition. Beyond PCA and PLS, novel methods such as Independent Component Analysis (ICA) have been proposed to monitor non-Gaussian processes by separating the raw data into independent components (ICs) instead of the orthogonal components used in the PCA framework 19. The ICs are assumed to be non-Gaussian and mutually independent in terms of high-order statistics, which implies that they can preserve non-Gaussian features more effectively than traditional PCA/PLS-based methods 20. A number of ICA-based applications have been successfully applied to chemical process monitoring 21. However, conventional ICA is still based on a linearity assumption, which may result in unsatisfactory performance in practice. Hence, several modifications of ICA have been proposed. For instance, Zhao et al. 22 proposed a Kernel ICA-PCA to deal with nonlinear features of the data, and Jiang et al. developed a double-weighted ICA strategy 23 to increase the sensitivity of fault detection.

However, some industrial processes change their operating mode according to their manufacturing strategies, product requirements, and economic considerations 24. In these cases, conventional MSPM methods, such as PCA and ICA, may perform poorly in multimode situations owing to their single-operating-mode assumption 25. Considering that both PCA and ICA can improperly interpret such process data, a Bayesian network based dimension reduction method 26 was proposed to deal with non-linear non-Gaussian variables. Although this method was successfully applied to non-linear chemical processes, it, like traditional PCA and ICA, still lacks a mechanism for multimode process data because it is merely a dimension reduction method.

In terms of multimode process modeling, current state-of-the-art methods can be categorized into three major classes. The first class is based on a precise global model 27: by combining features from all operating conditions, it merges all the local models while minimizing the dissimilarity among them. The second class builds multiple local models, one for each individual operating condition; in this strategy, the key factors for performance monitoring are the classification criterion for the monitored sample and the clustering method used to divide the training data into multiple subsets. K-means and fuzzy C-means are two of the most commonly used approaches to separate the training data into
their corresponding clusters. The third class uses adaptive models that are updated periodically to track variations in the data structure. Adaptive techniques include recursive methods 28, moving windows (MW) 29, time difference (TD) 30, and Just-in-time learning (JITL) 31. These methods are effective especially when applied to the performance monitoring of processes with mode shifting 32. However, an adaptive model tends to be over-trained when the updating criterion is improperly selected.
The present paper constructs an appropriate adaptive paradigm for online performance monitoring and modeling of a multimode non-Gaussian process. Conventional monitoring methods (e.g., PCA and ICA) extract non-Gaussian features from raw process data inefficiently, and the traditional Gaussian kernel may incur a high computational cost. The newly proposed paradigm avoids these disadvantages by providing an efficient way to project high-dimensional non-linear raw process data into a low-dimensional linear feature space at relatively low computational cost. In this paper, the extreme learning machine (ELM) is introduced into performance monitoring and modeling because of its outstanding performance in extracting features from high-dimensional process data. Specifically, the proposed method first identifies the current operating mode by kernel density estimation (KDE). Meanwhile, a promising approximate kernel expansion, the Fastfood kernel, which significantly accelerates the evaluation of the kernel function, is introduced to replace conventional kernel tricks and thereby speed up the ELM. This modified ELM provides an efficient framework for online fault detection. Then, once a fault is detected in the process, the online JITL method is applied to estimate the current status. In this phase, a modified similarity index is introduced to enhance the precision of the predictions of the proposed ELM-based modeling and monitoring method. This hybrid paradigm provides a rapid and accurate approach to online modeling and monitoring of the status of a non-Gaussian process with multiple operating modes. The newly proposed method is applied to a distillation system with three operating points to validate its efficiency and efficacy.

The remainder of the paper is organized as follows. Section 2 briefly discusses the preliminaries of the extreme learning machine, non-linear projection kernel tricks, and JITL. Section 3 presents the details of the proposed method. Section 4 demonstrates the performance of the proposed method on a synthetic non-Gaussian multimode toy model and a distillation system with three operating modes. Finally, a brief conclusion is presented in Section 5.
2 Preliminaries

2.1 Extreme learning machine
ELM is a promising data mining method that has been successfully applied to regression and to the classification of large and multi-labeled datasets. The ELM framework (shown in Figure 1) is based on least squares theory for modeling the training dataset. ELM was originally designed for single-hidden-layer feedforward neural networks and was subsequently extended to more generalized networks that need not resemble conventional neural networks 33.

In contrast to conventional training methods such as support vector machines (SVM), ELM models data with a single layer of hidden nodes whose input weights are randomly assigned and never updated. Therefore, the training speed of ELM is considerably faster than that of SVM, which renders ELM a suitable method for online process modeling and monitoring. The training stage of ELM consists of two phases. In the first phase, the hidden layer is constructed from a fixed number of randomly generated mapping neurons; a sigmoid or Gaussian function can be selected as the mapping function 34. In the second phase, ELM determines the output weights by minimizing the sum of the squared prediction errors, as shown in Equation (6).

In the ELM framework, the data sample $x = \{x_1, x_2, \ldots, x_n\}^T \in \mathbb{R}^{n \times m}$, where $n$ is the number of samples and $m$ is the dimension of each sample, is observed and used to learn features. The sample is expressed as follows:
\[ x = Hs, \quad s = WBx, \tag{1} \]

where $s$ denotes the set of independent components and $H \in \mathbb{R}^{n \times n}$ represents the unknown mixing matrix; $W \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times n}$ are the demixing matrix and the component-selecting matrix, respectively. Therefore, each component $s_i$ can be defined as:

\[ s_i(x) = \sum_{i=1}^{L} \beta_i\, G_i(x, a_i, b_i), \tag{2} \]

where $G_i(x, a_i, b_i)$ is the output of the $i$th hidden node. ELM aims to resolve the optimization problem:

\[ J(\beta, s) = \left\| G(x, a, b)\,\beta - s \right\|_2^2. \tag{3} \]

In Equations (2) and (3), $L$ denotes the number of hidden nodes in the ELM; $\beta$, defined as $\beta = [\beta_1, \ldots, \beta_L]$, is the weight vector in the matrix $B$ that connects the feature
space and its corresponding components; $g_i$ is the activation function; and $a_i$ and $b_i$ are the weight and bias parameters between the hidden nodes and the inputs, respectively.

The activation function for a specific hidden node can be expressed as:

\[ G_i(x, a_i, b_i) = g_i(a_i x + b_i). \tag{4} \]

In contrast to conventional training methods such as SVM, all parameters except $\beta_i$ are generated randomly. After resolving the optimization problem (using the least squares method), for a specific input $x = \{x_1, x_2, \ldots, x_n\}^T$, the weight matrix $\beta$ can be calculated by:

\[ \beta = H^{\dagger} s, \tag{5} \]

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$, with

\[ H = \begin{bmatrix} G(x_1, a_1, b_1) & \cdots & G(x_1, a_L, b_L) \\ \vdots & \ddots & \vdots \\ G(x_n, a_1, b_1) & \cdots & G(x_n, a_L, b_L) \end{bmatrix}_{n \times L}. \]

However, in the original ELM framework, process noise affects the model accuracy because all the parameters are generated randomly, so some of the hidden nodes are irrelevant to the process data. With too many nodes the ELM model is over-fitted; with too few, it is under-fitted. Therefore, a sparse method is introduced into ELM to improve its performance on noisy data, and the objective function of ELM becomes:

\[ J(\beta, s) = \left\| H\beta - s \right\|_2^2 + \zeta \left\| \beta \right\|_1, \tag{6} \]

where $\|\beta\|_1$ is the L1-norm of $\beta$ and $\zeta$ is the sparsity tuning parameter. Initially, more nodes are generated than are needed to represent the process data, and the L1-norm penalty prunes the redundant hidden nodes. Alternatively, other modified methods, such as pruned-ELM (P-ELM) 35 and optimally pruned-ELM (OP-ELM) 36, have been proposed to avoid overfitting in the presence of noisy data.
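To make the two training phases concrete, the following is a minimal sketch of basic (non-sparse) ELM fitting per Equations (4)-(5), written in Python with NumPy; the function names and the choice of a sigmoid activation are our own illustrative assumptions, not code from the paper.

```python
import numpy as np

def elm_train(X, S, L=50, seed=0):
    """Minimal ELM fit per Eqs. (4)-(5): random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    A = rng.standard_normal((m, L))          # random input weights a_i (never updated)
    b = rng.standard_normal(L)               # random biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))   # sigmoid activation, Eq. (4)
    beta = np.linalg.pinv(H) @ S             # Moore-Penrose solution, Eq. (5)
    return A, b, beta

def elm_predict(X, A, b, beta):
    """Propagate new samples through the fixed random hidden layer."""
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta
```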
2.2 Kernel Trick

Kernel methods, a class of algorithms for machine learning and pattern analysis, have received considerable attention in the field of nonlinear chemical processes in recent decades. By using the kernel trick, ELM can effectively handle nonlinear modeling. The idea of kernel-based ELM is to nonlinearly map the process data into a feature space in which the data have a more linear structure. In that feature space, ELM can be used to
properly extract the data features. Specifically, by computing the inner products between the projections of all pairs of data points in the higher-dimensional feature space, the kernel trick yields a more linear data structure in that space. The key idea is to define a nonlinear mapping $x_i \mapsto \Phi(x_i) \in F$ for $x_i \in \mathbb{R}^n$ ($i = 1, 2, \ldots, n$) and then apply a linear method (e.g., PCA, PLS, or ELM) in the newly defined feature space $F$.

After whitening the data (viz. $\sum_{i=1}^{n} \Phi(x_i) = 0$), the covariance matrix in the space $F$ can be calculated by:

\[ \mathrm{Cov} = \frac{1}{n} \sum_{i=1}^{n} \Phi(x_i)\, \Phi(x_i)^T. \tag{7} \]

Thus, the principal components can be extracted by computing the eigenvectors of the matrix $\mathrm{Cov}$:

\[ \lambda \cdot Q = \mathrm{Cov} \cdot Q. \tag{8} \]

Instead of directly eigen-decomposing the covariance matrix, the kernel trick can be applied as an alternative way to find the principal components. The kernel is defined through a Gram matrix as:

\[ [K]_{ij} = K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j) = \left\langle \Phi(x_i), \Phi(x_j) \right\rangle. \tag{9} \]

Thus, the kernel matrix is obtained as $K = \Theta^T \Theta$, where $\Theta = \left( \Phi(x_1), \ldots, \Phi(x_n) \right)$. By introducing a kernel function $k(x_i, x_j) = \left\langle \Phi(x_i), \Phi(x_j) \right\rangle$,
the inner products in the feature space can be calculated and the explicit nonlinear mapping avoided. Some widely used kernel functions are:

Sigmoid: \( k(x_i, x_j) = \tanh\!\left( \delta_0 \langle x_i, x_j \rangle + \delta_1 \right) \)

Polynomial: \( k(x_i, x_j) = \left( \langle x_i, x_j \rangle + 1 \right)^{\tau} \)

Gaussian RBF: \( k(x_i, x_j) = \exp\!\left( -\| x_i - x_j \|^2 / 2\sigma^2 \right) \)

where $\delta_0$ and $\delta_1$ are the parameters of the sigmoid kernel, $\tau$ is a positive integer for the polynomial kernel, and $\sigma$ is the bandwidth of the Gaussian RBF kernel. Once the kernel matrix is obtained, mean centering and variance scaling can be performed as:

\[ \tilde{K} = K - 1_n K - K 1_n + 1_n K 1_n. \tag{10} \]
In Equation (10), the matrix $1_n$, whose entries all equal $1/n$, is defined as:

\[ 1_n = \frac{1}{n} \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{bmatrix}_{n \times n}. \tag{11} \]
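As a small worked example of the computations above, the sketch below builds a Gaussian RBF Gram matrix per Equation (9) and centers it per Equations (10)-(11). It assumes Python with NumPy; the function names are our own illustrative choices.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """Gram matrix K of the Gaussian RBF kernel, Eq. (9)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared pairwise distances
    return np.exp(-d2 / (2.0 * sigma**2))

def center_kernel(K):
    """Mean-center the kernel matrix in feature space, Eq. (10)."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)                 # the 1_n matrix of Eq. (11)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n
```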
2.3 Just-in-time learning framework
Just-in-Time learning (JITL), also called model-on-demand, instance-based learning, or lazy learning, was developed as a dynamic approach to modeling nonlinear systems 37, 38 in the fields of chemical process modeling, monitoring, and control. Compared with conventional modeling methods, JITL focuses on modeling the current situation from a set of the nearest or most similar samples, and it performs online learning only when it is needed 39. The model is therefore inherently adaptive to changes in the process characteristics 40. This feature enables JITL to use process data collected under the nominal operating condition for offline modeling and then to update the model according to the online process data.

In brief, the JITL method is particularly suitable when the samples are not fully available or when the process modes change during the online monitoring phase. Compared with conventional offline global modeling, JITL modeling focuses on local model structures constructed from the relevant samples; consequently, the current status of the process can be described by a local JITL model. The frameworks of both global and JITL modeling are illustrated in Figure 2.
The major steps of JITL (sketched in code below) are:
1. Relevant data samples are selected to match the new monitored sample according to a similarity measure (e.g., Euclidean distance, Mahalanobis distance, or mutual information 41);
2. A local model is constructed based on the relevant samples;
3. Model outputs (e.g., monitoring statistics or model predictions) are derived from both the local model and the new monitored sample.
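As a concrete illustration of these three steps, the following is a minimal Python/NumPy sketch. The Euclidean distance and the ridge-regularized local linear model are placeholder choices for exposition (the paper itself uses the MACS index of Section 3.2 and a kernel ELM as the local model); `n_relevant` and all names are illustrative assumptions.

```python
import numpy as np

def jitl_predict(x_new, X_train, Y_train, n_relevant=12):
    """Generic JITL cycle: select relevant samples, fit a local model, predict."""
    dist = np.linalg.norm(X_train - x_new, axis=1)       # step 1: similarity ranking
    idx = np.argsort(dist)[:n_relevant]                  # most relevant samples
    Xr, Yr = X_train[idx], Y_train[idx]
    reg = 1e-6 * np.eye(Xr.shape[1])                     # small ridge for stability
    theta = np.linalg.solve(Xr.T @ Xr + reg, Xr.T @ Yr)  # step 2: local model
    return x_new @ theta                                 # step 3: local prediction
```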
Recently, several interesting JITL-based modeling methods have been proposed. A probabilistic JITL (P-JITL) has been proposed to deal with data samples that contain missing values in chemical processes; this method uses a symmetric Kullback-Leibler divergence to measure the difference between two distributions 31. Distance-based 42 and angle-based 43 JITLs have also been applied to process modeling. The monitoring results and estimates of JITL are degraded by improper selection of the relevant samples, so a suitable similarity criterion for the relevant samples is important in JITL, and proposing a modified JITL online modeling method for online ELM modeling is therefore beneficial.

3 Online ELM modeling and monitoring based on Fastfood kernel and JITL

3.1 Fastfood kernel-based ELM

Since conventional ELM and the aforementioned LARS-based ELM are still linear modeling methods, their ability to model and monitor high-dimensional nonlinear processes is limited. A natural idea is to combine an effective kernel with ELM for nonlinear processes 44. By using a kernel trick, the lower-dimensional nonlinear data can be mapped to a higher-dimensional linear feature space described by the hidden nodes.

The main advantage of the kernel trick is that it avoids nonlinear optimization, which can be complicated and computationally expensive. To clarify, for a process dataset $x = \{x_1, x_2, \ldots, x_k, \ldots, x_n\}^T$, given that the nonlinear mapping $h(x_k)$ of the process data is unknown, the hidden-layer feature mapping can be expressed as $h(x_k) = [g(x_k, a_1, b_1), \ldots, g(x_k, a_L, b_L)]$, where $L$ denotes the number of hidden nodes in the ELM.

According to Equation (5) and the partial derivative of Equation (6), $\beta$ can be defined as:

\[ \beta = H^T \left( \frac{I}{\zeta} + H H^T \right)^{-1} s, \tag{12} \]

where $s = \{s_1, s_2, \ldots, s_n\}^T$ and $H$ is the hidden-layer output matrix defined in Section 2.1. Thus, the corresponding output function is calculated by:

\[ f(x_k) = h(x_k)\,\beta = h(x_k)\, H^T \left( \frac{I}{\zeta} + H H^T \right)^{-1} s. \tag{13} \]

By utilizing the kernel trick, in the kernel space,

\[ \Theta = H H^T, \qquad \Theta_{i,j} = k(x_i, x_j) = \left\langle \Phi(x_i), \Phi(x_j) \right\rangle. \tag{14} \]

Thus, Equation (13) can be transformed into:

\[ f(x_k) = h(x_k)\, H^T \left( \frac{I}{\zeta} + H H^T \right)^{-1} s = \begin{bmatrix} k(x_k, x_1) \\ \vdots \\ k(x_k, x_n) \end{bmatrix}^T \left( \frac{I}{\zeta} + \Theta \right)^{-1} s. \tag{15} \]

The conventional Gaussian or sigmoid function can be selected as the kernel for kernel-based ELM. However, owing to the computational cost of these two kernels 45, they may not be suitable for online modeling, and a specific approximate kernel expansion that cuts down the computation time is needed. Despite the successful application of kernels to nonlinear process modeling, the disadvantage of state-of-the-art kernel methods in high-dimensional process monitoring is that large-scale data make evaluating the kernel function extremely expensive, especially in the online modeling and monitoring phase. To overcome this problem, Le, Sarlos, and Smola 46 proposed an approximate kernel expansion called the Fastfood kernel to accelerate computation. The idea of the Fastfood kernel is to fit the kernel approximation via a product of diagonal and simple matrices based on the Walsh-Hadamard transform 47.

For an m-dimensional dataset $x = \{x_1, x_2, \ldots, x_n\}^T \in \mathbb{R}^{n \times m}$, the approximate Gaussian radial basis function (RBF) kernel feature mapping can be defined as:

\[ \Phi_j(x_i) = \frac{1}{\sqrt{n}} \exp\!\left( \mathrm{i}\, [V x_i]_j \right), \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, m, \tag{16} \]

so that $k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$, where

\[ V = \frac{1}{\sigma \sqrt{n}}\, S H G \Pi A E. \tag{17} \]

In Equation (17), $S$, $G$, and $E$ are diagonal random matrices that can be computed once and then stored, $\Pi$ denotes a random permutation matrix, and $A$ stands for a Walsh-Hadamard transform matrix, which can be calculated recursively by using the Hadamard-ordered fast Walsh-Hadamard transform (FWHT) as:

\[ A_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \quad \text{and} \quad A_n = \frac{1}{\sqrt{2}} \begin{bmatrix} A_{n-1} & A_{n-1} \\ A_{n-1} & -A_{n-1} \end{bmatrix}. \tag{18} \]

The FWHT can be considered a generalized form of the Fourier transform; it is orthogonal, symmetric, and linear. In summary, the computational cost of the Fastfood kernel, $O(n \log n)$, is lower than that of the conventional Gaussian RBF kernel, $O(n^2 m)$ 48. This characteristic makes the Fastfood kernel suitable for online modeling: by using the Fastfood kernel trick instead of the conventional Gaussian RBF kernel, ELM can effectively extract the nonlinear features in the feature space at a relatively low computational cost.

3.2 Similarity index based on posterior probability and cosine similarity

In the JITL framework, the samples with high similarity to the new monitored sample (often referred to as the relevant dataset) are selected to build a local model, so the similarity criterion is crucial for online JITL modeling. The Euclidean distance and the Mahalanobis distance are two of the most commonly used similarity indices. However, because they neglect the non-Gaussian features of process data, the performance of a conventional JITL model is not always satisfactory. In particular, when the process data have pronounced multimode features, a global distance-based index may not characterize the local features of the dataset properly. Cheng and Chiu proposed a comprehensive similarity factor (SF), which combines distance and angle indices and shows better performance than the Euclidean distance alone 43. In practice, the assumption that the process data follow a unimodal Gaussian distribution may be invalid. Hence, a similarity index that combines mode-cluster information with a conventional similarity index could be more appropriate for modeling practical process data. A schematic diagram of the proposed index is shown in Figure 3.

Given a process with $M$ modes $x = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}^T$, where $k$ denotes the $k$th mode and $X_k = \{x_1, x_2, \ldots, x_{n_k}\}_k^T \in \mathbb{R}^{n_k \times m}$ represents the process data sampled from the $k$th operating mode, the whole process dataset can be expressed as $X = \{X_1, X_2, \ldots, X_M\}^T$. For a new monitored sample $x_{new}$ that needs online modeling, a cosine similarity index can be calculated as:

\[ \mathrm{sim}_{\cos}(x_{new}, x_i) = \frac{\left\langle x_{new}, x_i \right\rangle}{\| x_i \|_2 \, \| x_{new} \|_2}. \tag{19} \]

However, measuring the similarity with Equation (19) has two crucial drawbacks: (1) the differences in distribution scales among different modes are not considered, and (2) the equation is a global similarity measure, whereas each point belongs to a particular sub-mode, so its similarity index should be weighted toward that sub-mode. To solve this problem, an adjusted local cosine similarity (ACS) index is constructed for each sub-mode, and a single index (called the modified adjusted local cosine similarity (MACS) index) is subsequently obtained by weighting each sub-mode according to the probability that the data point belongs to it. The ACS is defined as:

\[ \mathrm{ACS}(x_{new}, x_i) = \frac{\left\langle x_{new} - \bar{X}_k, \; x_i - \bar{X}_k \right\rangle}{\| x_i - \bar{X}_k \|_2 \, \| x_{new} - \bar{X}_k \|_2}, \tag{20} \]

where $\bar{X}_k$ denotes the mean of the samples from the $k$th mode. To determine the probabilities of membership in each mode, we use Bayesian inference:

\[ p(C_k \mid x_{new}) = \frac{p(C_k)\, p(x_{new} \mid C_k)}{\sum_{i=1}^{M} p(C_i)\, p(x_{new} \mid C_i)}. \tag{21} \]

In Equation (21), $p(C_k)$ denotes the prior probability that an arbitrary sample is generated from the $k$th mode $C_k$. Then, we can define the MACS:

\[ \mathrm{MACS}(x_{new}, x_i) = \sum_{j=1}^{M} p(C_j \mid x_{new}) \cdot \mathrm{ACS}_j(x_{new}, x_i), \tag{22} \]

where $\mathrm{ACS}_j$ is the ACS of Equation (20) evaluated with respect to mode $j$. In the above equation, the posterior probability $p(C_j \mid x_{new})$ is determined by the conditional probability $p(x_{new} \mid C_k)$, which can be estimated by using KDE.

Multivariate KDE via the Parzen-Rosenblatt window method is extensively used as a nonparametric approach to estimate the probability density function of each output $s_i$ from the local kernel ELM model that represents the $k$th mode. Given a new monitored sample $x_{new}$, a multivariate kernel estimator can be constructed as:

\[ p(x_{new} \mid C_k) = \frac{1}{n_k} \sum_{j=1}^{n_k} \frac{1}{h} K\!\left( \frac{x_{new} - x_{j,k}}{h} \right), \tag{23} \]

where $K$ stands for a kernel function, $h$ is the Parzen-window bandwidth that acts as a smoothing parameter in the KDE method, and $n_k$ is the number of samples in the mode. The density can be regarded as a measure of whether a new monitored sample belongs to the same mode as the reference data: a high density of the monitored sample with respect to a specific mode indicates that the sample probably belongs to the same mode as the reference data. The kernel function $K$ should be unimodal and symmetrically smooth with a peak at zero. As discussed in Section 2.2, several widely used kernels can be applied to estimate the density; however, given that the choice of kernel function is not the key point of this section and a Gaussian kernel is always a safe choice 49, it will be used for the estimation below:

\[ \varphi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}. \tag{24} \]

Thus, by introducing a Gaussian kernel, Equation (23) can be transformed into:

\[ p(x_{new} \mid C_k) = \frac{1}{n_k h_k \sqrt{2\pi}} \sum_{i=1}^{n_k} \exp\!\left( -\frac{\left( x_{new} - x_{i,k} \right)^2}{2 h_k^2} \right). \tag{25} \]

According to Bayesian inference theory, once the prior is determined, the posterior probability is affected only by the conditional probability. This value, which will be relatively small or even zero when the new monitored sample does not belong to the $k$th mode, reflects the actual local structural relationship of the monitored sample with each sub-mode.

The choice of window bandwidth is crucial for the final KDE estimate and depends on several factors, such as the dimension of the training dataset, the data distribution, and the choice of kernel function. In this paper, the adaptive window bandwidth method proposed by Botev, Grotowski, and Kroese 50 is applied to obtain a suitable bandwidth $h$ for each mode. Because sufficient training data from each mode of the multimode process can easily be obtained, the performance of KDE can be assured. Once $p(x_{new} \mid C_k)$ is calculated, the corresponding similarity index can be derived from Equation (22), and the relevant samples are subsequently selected according to this similarity index for online JITL modeling.

In summary, the proposed method has two major stages, as illustrated in Chart 1. In the framework of the proposed method, the JITL, based on the newly proposed similarity index (i.e., MACS), focuses on selecting a suitable sample set; the Fastfood kernel based modified ELM then uses the sample set selected by the JITL for local modeling. Specifically, in the offline stage, the training dataset is used to construct local ELM models with the Fastfood kernel-based ELM algorithm, and both the local models and the training data are stored for the online stage. In the online monitoring stage, each new monitored sample is classified into a certain mode according to its posterior probability and is monitored by its corresponding local ELM model. Once an abnormality is detected, the relevant dataset is selected according to the modified cosine similarity index, and the online JITL-based modeling and estimation method is then used to predict the process status.
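To make the pieces of this section concrete, the following is a minimal, self-contained Python/NumPy sketch of the online building blocks: an FWHT and Fastfood-style random features standing in for Equations (16)-(18), a regularized feature-space solve analogous to Equation (15), the KDE-based mode posterior of Equations (21) and (25), and the MACS index of Equations (20)-(22). All function names, normalizations, and the simplified isotropic KDE are our own illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def fwht(a):
    """Iterative Hadamard-ordered fast Walsh-Hadamard transform (unnormalized), cf. Eq. (18)."""
    a = a.astype(float).copy()
    h, n = 1, len(a)
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def fastfood_map(X, sigma=1.0, seed=0):
    """Fastfood-style random Fourier features approximating a Gaussian RBF kernel,
    cf. Eqs. (16)-(17). Assumes the dimension d is a power of two (zero-pad otherwise)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    B = rng.choice([-1.0, 1.0], d)                          # random sign diagonal
    G = rng.standard_normal(d)                              # Gaussian diagonal
    P = rng.permutation(d)                                  # permutation Pi
    S = np.sqrt(rng.chisquare(d, d)) / np.linalg.norm(G)    # row-length correction
    Z = np.array([S * fwht(G * fwht(x * B)[P]) for x in X]) / (sigma * np.sqrt(d))
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(d)   # real-valued features

def kernel_elm_fit(Phi, S_out, zeta=1.0):
    """Regularized least-squares output weights, the feature-space analogue of Eq. (15)."""
    A = Phi.T @ Phi + np.eye(Phi.shape[1]) / zeta
    return np.linalg.solve(A, Phi.T @ S_out)

def mode_posteriors(x_new, modes, h=1.0, priors=None):
    """Posterior p(C_k | x_new) via a simplified isotropic Gaussian KDE, Eqs. (21), (25).
    `modes` is a list of (n_k x m) training arrays, one per operating mode."""
    M = len(modes)
    priors = np.full(M, 1.0 / M) if priors is None else np.asarray(priors)
    lik = np.empty(M)
    for k, Xk in enumerate(modes):
        d2 = np.sum((Xk - x_new) ** 2, axis=1)
        lik[k] = np.mean(np.exp(-d2 / (2 * h**2))) / (h * np.sqrt(2 * np.pi))
    post = priors * lik
    return post / post.sum()

def macs(x_new, x_i, modes, post):
    """MACS index, Eqs. (20)-(22): posterior-weighted adjusted cosine similarities."""
    val = 0.0
    for k, Xk in enumerate(modes):
        mu = Xk.mean(axis=0)                                # mode mean, Eq. (20)
        a, b = x_new - mu, x_i - mu
        val += post[k] * (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return val
```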
4 Case Studies and discussion

4.1 Illustrative synthetic example
A synthetic example is generated to demonstrate the performance of the proposed multimode Fastfood-kernel-based ELM approach. The synthetic model consists of 7 inputs and 1 output, and there are 3 different operating modes. First, the 5 source variables are generated according to the following equations:

\[
\begin{aligned}
s_1(t) &= 2\cos(0.08t)\sin(0.06t) \\
s_2(t) &= \sin(0.3t) + 3\cos(0.1t) \\
s_3(t) &= \sin(0.4t) + 3\cos(0.1t) \\
s_4(t) &= \cos(0.1t) - \sin(0.05t) \\
s_5(t) &= \text{uniformly distributed noise in } [-1, 1].
\end{aligned}
\tag{26}
\]
The mixing matrices $\Gamma_1$ and $\Gamma_2$ are defined as:

\[
\Gamma_1 = \begin{bmatrix}
0.86 & -0.55 & 0.17 & -0.33 & 0.65 \\
0.79 & 0.32 & 0.12 & 0.46 & -0.28 \\
0.67 & 0.27 & 0.15 & 0.56 & 0.84 \\
0.23 & 0.95 & 0.12 & 0.47 & 0.89 \\
0.34 & 0.2 & 0.8 & -0.97 & 0.4 \\
0.5 & -0.74 & -0.3 & -0.45 & 0.23 \\
0.13 & 0.14 & 0.92 & 0.19 & 0.56
\end{bmatrix}, \qquad
\Gamma_2 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 1 & 0 & 0 \\
1 & 1 & 1 & 1 & 1 & 1 & 0 \\
1 & 1 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}. \tag{27}
\]

Modes 1, 2, and 3 are generated by:

Mode 1: \( x(t) = \Gamma_1 \left( s(t) - 8 \right) + e_{\mathrm{noise}}, \quad y(t) = 0.8\,x_1(t) + 0.6\,x_2(t) + 1.5\,x_3(t) \)

Mode 2: \( x(t) = \Gamma_2 \Gamma_1 \left( s(t) - 2 \right) + e_{\mathrm{noise}}, \quad y(t) = 2.4\,x_2(t) + 1.6\,x_3(t) + 4\,x_4(t) \)

Mode 3: \( x(t) = \Gamma_2^2 \Gamma_1 \left( s(t) + 2 \right) + e_{\mathrm{noise}}, \quad y(t) = 1.2\,x_1(t) + 0.4\,x_2(t) + x_4(t) \)

where $e_{\mathrm{noise}}$ is normally distributed noise, $e_{\mathrm{noise}} \sim N(0, 0.01)$, and the output is polluted by Gaussian noise:

\[ y = y + \eta, \quad \eta \sim N(0, 0.1). \]
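For reproducibility, a short Python/NumPy sketch of this data generator follows, covering the Equation (26) sources and the three mode equations. The helper names are our own, and the second parameter of $N(\cdot,\cdot)$ is read as a variance.

```python
import numpy as np

def generate_sources(T, seed=0):
    """Source signals s_1..s_5 of Eq. (26), shape (T, 5)."""
    t = np.arange(1, T + 1)
    rng = np.random.default_rng(seed)
    return np.column_stack([
        2 * np.cos(0.08 * t) * np.sin(0.06 * t),
        np.sin(0.3 * t) + 3 * np.cos(0.1 * t),
        np.sin(0.4 * t) + 3 * np.cos(0.1 * t),
        np.cos(0.1 * t) - np.sin(0.05 * t),
        rng.uniform(-1, 1, T),
    ])

def generate_mode(s, G1, G2, mode, rng):
    """Inputs x(t) and output y(t) for one operating mode, per the mode equations above."""
    if mode == 1:
        x = (s - 8) @ G1.T
        y = 0.8 * x[:, 0] + 0.6 * x[:, 1] + 1.5 * x[:, 2]
    elif mode == 2:
        x = (s - 2) @ (G2 @ G1).T
        y = 2.4 * x[:, 1] + 1.6 * x[:, 2] + 4 * x[:, 3]
    else:
        x = (s + 2) @ (G2 @ G2 @ G1).T
        y = 1.2 * x[:, 0] + 0.4 * x[:, 1] + x[:, 3]
    x += rng.normal(0, np.sqrt(0.01), x.shape)   # e_noise ~ N(0, 0.01)
    y += rng.normal(0, np.sqrt(0.1), len(y))     # output noise eta ~ N(0, 0.1)
    return x, y
```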
In the training phase, 250 samples are generated in each mode, so the training dataset contains 750 samples representing all 3 modes under normal conditions. The process starts in mode 1 and is followed by mode 2 and mode 3 sequentially. As Figure 4 shows, the illustrative process operating under 3 different modes is significantly non-Gaussian. Figure 5 shows the observations of the seven inputs ($x(t)$), which reveal that the dynamic process works under three different modes.

In the testing phase, a set of test data generated from all 3 modes is used to illustrate the performance of the proposed method. The process starts in mode 2, is followed by mode 1, and ends in mode 3. Three samples (the 15th sample for mode 2, the 275th sample for mode 1, and the 550th sample for mode 3) are selected as examples, and their cosine similarity and adjusted cosine similarity to each sample of the training data are shown in Figures 6 and 7. As the figures show, although both similarity indices can correctly discriminate the modes, their performance is unsatisfactory, especially in modes 1 and 2, because of the very small differences between the samples in these modes. In a more complex system, this can easily lead to an improper selection of the relevant dataset in JITL online modeling.

The posterior probability is then introduced into the similarity index to improve its performance and, meanwhile, to identify the modes. The conditional probability of membership in mode 1 for all 7 inputs is shown in Figure 8. Certain dimensions of the conditional probabilities are relatively small or even zero owing to the drift of the model structure; this drives the joint posterior probability toward zero when the sample does not belong to that mode. The joint posterior probability over all 7 inputs is illustrated in Figure 9, which shows that posterior probability techniques can be used to identify the mode structure. Therefore, by combining the adjusted cosine similarity and the posterior probability, the newly proposed MACS can select the relevant dataset more appropriately than the conventional cosine similarity. The performance of MACS is shown in Figure 10; all three modes are unambiguously identified from the data.

Following this discussion of the similarity index for online JITL modeling, a scenario (Scenario 1) is designed to demonstrate the performance of the newly proposed online ELM modeling and monitoring method. Because the data from mode 3 are easily differentiated from the other 2 modes (even with the basic cosine similarity), mode 3 is excluded in order to highlight the differences between ACS and MACS. In this scenario, the process starts in mode 1 under normal conditions. In the second phase, a step fault (Fault 1) $[0, 0, 1, 1, 0, 0, 0]^T$ is introduced into the process from the 251st sample and lasts for 250 samples. The process then switches to mode 2 under normal conditions from samples 1001 to 1501. Finally, another step fault (Fault 2) $[-1, 1, 0, 0, 0, 0, 0]^T$ is added to the process in phase 4 for another 500 samples. Hotelling's T2 is applied to monitor the process, and the confidence level of the control limit is set to 95%. The performance of the proposed method is shown in Figure 11. Taking the second phase as an example, when the fault is detected, both the MACS and ACS similarity indices are applied to select the relevant dataset. Figure 12 shows that the estimation result of MACS outperforms that of ACS.
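The Hotelling's T2 statistic used here can be computed as in the following sketch (Python with NumPy/SciPy); the function name is ours, and the standard F-distribution control-limit formula for a new observation is our illustrative rendering, not code from the paper.

```python
import numpy as np
from scipy import stats

def hotelling_t2(X_train, x_new, alpha=0.05):
    """Hotelling's T2 statistic for a new sample and its F-distribution
    control limit at the (1 - alpha) confidence level."""
    n, m = X_train.shape
    mu = X_train.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X_train, rowvar=False))
    d = x_new - mu
    t2 = d @ S_inv @ d
    limit = (m * (n - 1) * (n + 1)) / (n * (n - m)) * stats.f.ppf(1 - alpha, m, n - m)
    return t2, t2 > limit   # statistic and fault flag
```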
4.2 Distillation column system

Distillation is the physical separation of a mixture into two or more chemical products with different boiling points through selective evaporation and condensation; the more volatile components are separated from the mixture when it is heated. The vapor that comes off may represent an essentially complete separation, or a partial separation that increases the concentration of selected components relative to the liquid from which it evolved. In industrial chemistry, the distillation column is considered a practically universal operation unit. In continuous distillation (also known as fractional distillation), the process has at least two output fractions. The overhead product, which is one or more volatile distillate fractions, is boiled, vaporized, and then condensed to a liquid, while the bottom product, which consists of the less volatile components, exits from the bottom of the column 51. In practice, the continuous rectification process is a multistage countercurrent distillation operation. Real-time monitoring and modeling of the status of the distillation column are crucial for improving control quality.

The original model, created by Villalba Torán 52, has 4 manipulated variables, 4 controlled variables, and 3 measured input disturbances, plus 41 mole fractions and temperatures corresponding to every column stage. A schematic of the distillation column illustrating this model is shown in Figure 13, and explanatory measurements are listed in Table 1. Figure 14 depicts the data characteristics of the normalized output measurements B, D, xB, and yD. As can be seen in this figure, the process status varies drastically and dynamically. In practice, product purity is the key index for a continuous process; however, online component analysis of xB and yD may not be available in a distillation column. Therefore, following Jain 53, the product purity index PI is defined as:

\[ PI = x_B \cdot y_D. \tag{28} \]

In practical industrial processes, distillation columns are often operated under different operating conditions related to various quality requirements. In this paper, 3 operating
conditions are introduced into the fractional distillation system; the details of these 3 modes can be found in Table 2.

Figure 15 shows the normal probability plots for each variable in all three modes. The data behave linearly in such a plot if the dataset follows a Gaussian distribution; otherwise, distortion occurs when the process actually follows another kind of distribution. It can be seen from Figure 15 that all the outputs are non-Gaussian, and the other measurements (not shown) are likewise non-Gaussian. Therefore, applying a kernel-based multimode learning method is reasonable.

In constructing an appropriate Fastfood kernel-based ELM for each mode, it is important to determine the ideal number of outputs. According to Sivalingamaiah and Reddy 54, the number $k_s$ of outputs $s$ can be calculated by:

\[ k_s = \left\lceil \log_2 \left( N_{msr} / 2 \right) \right\rceil, \tag{29} \]

where $N_{msr}$ denotes the dimension of the measurement vector. In this distillation process, 90 measurements are taken at each sample; hence, the output number is set to 6. Note that the output number can be even less than 6 because of data redundancy. Consequently, the sparsity requirement of ELM in Equation (6) can be met, and satisfactory performance can be expected.

Several types of abnormal conditions can be introduced into the distillation column process.
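As a quick check of Equation (29), the following hypothetical helper (reading the formula with a ceiling, which makes $N_{msr} = 90$ indeed yield 6) computes the output count:

```python
import math

def n_outputs(n_msr):
    """Eq. (29): number of ELM outputs from the measurement dimension."""
    return math.ceil(math.log2(n_msr / 2))

print(n_outputs(90))  # ceil(log2(45)) = 6
```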
A detailed description is listed in Table 3. In this distillation process simulation, a proportional-integral (PI) controller failure refers to the case where the valve malfunctions, preventing the PI controller from tracking the set point. Two scenarios are designed to assess the monitoring and modeling qualities of the newly proposed method; detailed information on these 2 cases is listed in Table 4. These 2 scenarios cover all 3 major operating modes and the typical faults (ramp, spike, and PI controller failure). The confidence parameter ($\alpha$) of the control limits used in both scenarios is set to 5%. The training data are sampled from all 3 operating modes. In each mode, the distillation column runs for 6 hours with a sampling time of 0.5 minutes, giving 720 samples per mode; the whole training dataset thus consists of 2160 samples representing the 3 normal operating modes. In addition, 20 samples of each type of abnormal event are added to the training dataset. Additive white Gaussian noise (AWGN) is added to the process data in both the offline training and online monitoring phases, with a signal-to-noise ratio of 99%. To demonstrate the effectiveness of our technique, we constructed Scenarios 2 and 3 (Scenario 1 was presented in Section 4.1); the details of each scenario are shown in Table 4.
As seen in Figure 16, the occurrences of the predefined faults 2 and 4 cause a drift of the process model, which leads to a drop in the joint probability. However, because the joint conditional probability is then extremely low or even zero, the proposed method can still identify the mode effectively. Once a new monitored sample is classified into a certain mode according to its posterior probability, online monitoring based on the newly proposed Fastfood kernel based ELM can be employed to monitor the distillation process status; the performance of this method applied to Scenario 2 is shown in Figure 17. The method is capable of quickly detecting both faults.

In Scenario 2, when the abnormalities are detected, online JITL modeling can be applied as a soft sensor to predict the behavior of the distillation system. For example, in stage 2, fault 2 is introduced into the process and monitored by the proposed Fastfood kernel based method, and JITL modeling is invoked once the fault is detected. In this paper, the number of relevant samples is 12. The prediction results are shown in Figure 18, revealing that both ACS and MACS perform reasonably well. The root-mean-square error (RMSE) index is applied to measure the online modeling performance; it is $7.3071 \times 10^{-5}$ for MACS and $1.0617 \times 10^{-4}$ for ACS, indicating that MACS outperforms ACS in this case.

From Equation (29), the number of outputs of the kernel-based ELM can be set to 7 to achieve satisfactory modeling performance; however, owing to data redundancy, the dimension of the outputs can be reduced even further. Given the complexity of MACS, which requires the calculation of the posterior probability, the CPU time should be taken into consideration. Figure 19 shows the effect of the number of outputs on the CPU time and RMSE of the MACS calculation. The computation cost increases approximately linearly, whereas the corresponding RMSE does not decrease significantly beyond 3 outputs. Consequently, to balance modeling quality against computation cost, a rational choice for the output number of the ELM is 3 or 4; in this paper, the output number is set to 4 for the distillation system.

Once the output number of the ELM is determined, the results of analyzing the three indices (standard cosine, ACS, and MACS) with the original RBF kernel or the Fastfood kernel are shown in Table 5. The RMSE of MACS with a Fastfood kernel is slightly inferior to that with the original Gaussian RBF kernel, but the computational cost of MACS is reduced by 57.86% relative to the RBF kernel while the performance decreases only slightly. In addition, the KDE and the calculation of the posterior probability significantly increase the CPU time (which explains the faster performance of the cosine and ACS methods), but the resulting gain in modeling quality should not be neglected. Therefore, the combination of MACS and the Fastfood kernel keeps a proper balance between CPU time and model quality.
The online JITL modeling result and its corresponding estimation error in the fourth phase (Mode 2, Fault 4) are shown in Figure 20.

For Scenario 3, the process starts in mode 2. In phase 2, a ramp fault occurs on the molar feed rate F, with an amplitude of 20% and a duration of 10 minutes. In the next phase (phase 3), the process switches into mode 3 for an hour. Finally, a PI controller failure occurs on yD in the last phase. In brief, the monitoring result, the online JITL prediction results, and the corresponding estimation errors are illustrated in Figures 21, 22, and 23, respectively. As shown in Figure 21, the monitoring performance for Scenario 3 is satisfactory, and both faults are quickly identified. This is achieved because the proposed method can identify the differences between mode 2 and mode 3 and monitor the process status according to the corresponding mode. Once the abnormalities are detected, the online JITL model can properly track the dynamic characteristics via the KDE-based cosine similarity technique. The predictions of the JITL model are beneficial for subsequent fault diagnosis and for loss assessment when a fault happens; in particular, they can serve as a soft sensor that provides useful information for further system control when an abnormality occurs.
5 Conclusion

In this paper, a novel online monitoring and modeling method is proposed for performance monitoring and mode identification of non-Gaussian multimode processes. In the proposed paradigm, KDE techniques and a Bayesian classifier are used to estimate the likelihood of each operating mode for each monitored sample according to its posterior probability. Performance monitoring is then carried out with a local model constructed using the Fastfood kernel based ELM method, which can identify abnormalities in the process efficiently. ELM provides a novel and accurate way to model a non-Gaussian process by projecting the raw process data into a higher-dimensional hidden-node space and then extracting the proper information from the hidden nodes. Once a fault is detected, the current process status, which is crucial for further fault diagnosis, correction, and loss evaluation, can be estimated by an online JITL-based method. The relevant dataset of the monitored samples is always beneficial because it provides underlying process information that is helpful for predicting the process status. Therefore, to improve the quality of online modeling, a novel relevant-dataset selection method based on the adjusted cosine similarity and a Bayesian classifier is proposed in this paper. Case studies on both a numerical model and the distillation system benchmark process reveal its effectiveness for non-Gaussian multimode process monitoring. However, the transitional status between different modes cannot be strictly classified into one certain mode.
The Bayesian classifier of the proposed method does not take transition data into account, which may leave the method unable to distinguish transition data from fault data. Further study will focus on online modeling and monitoring of multimode processes with transition data; the differences between transition status and fault status in a multimode process should be analyzed to avoid false alarms when the process switches from one operating point to another.
References
1. Chen, J. Y.; Yu, J.; Mori, J.; Rashid, M. M.; Hu, G. S.; Yu, H. L.; Flores-Cerrillo, J.; Megan, L., A non-Gaussian pattern matching based dynamic process monitoring approach and its application to cryogenic air separation process. Comput Chem Eng 2013, 58, (45), 40-53.
2. Peng, X.; Tang, Y.; Du, W.; Qian, F., Multimode Process Monitoring and Fault Detection: A Sparse Modeling and Dictionary Learning Method. IEEE Transactions on Industrial Electronics 2017, PP, (99), 1-1.
3. Gregersen, L.; Jorgensen, S. B., Supervision of fed-batch fermentations. Chemical Engineering Journal 1999, 75, (1), 69-76.
4. Peng, X.; Tang, Y.; He, W.; Du, W.; Qian, F., A Just-in-Time Learning based Monitoring and Classification Method for Hyper/Hypocalcemia Diagnosis. IEEE/ACM Trans Comput Biol Bioinform 2017.
5. Qin, S. J.; Valle, S.; Piovoso, M. J., On unifying multiblock analysis with application to decentralized process monitoring. J Chemom 2001, 15, (9), 715-742.
6. Ge, Z.; Song, Z., Process Monitoring Based on Independent Component Analysis-Principal Component Analysis (ICA-PCA) and Similarity Factors. Ind Eng Chem Res 2007, 46, (7), 2054-2063.
7. Zhao, C. H.; Gao, F. R., Fault-relevant Principal Component Analysis (FPCA) method for multivariate statistical modeling and process monitoring. Chemometrics and Intelligent Laboratory Systems 2014, 133, 1-16.
8. Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S. N.; Yin, K., A review of process fault detection and diagnosis. Comput Chem Eng 2003, 27, (3), 327-346.
9. Yin, S.; Ding, S. X.; Haghani, A.; Hao, H.; Zhang, P., A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process. J. Process. Contr. 2012, 22, (9), 1567-1581.
10. MacGregor, J.; Cinar, A., Monitoring, fault diagnosis, fault-tolerant control and optimization: Data driven methods. Comput Chem Eng 2012, 47, 111-120.
11. Joe Qin, S., Statistical process monitoring: basics and beyond. J Chemom 2003, 17, (8-9), 480-502.
12. Zhang, Y. W.; Ma, C., Fault diagnosis of nonlinear processes using multiscale KPCA and multiscale KPLS. Chem Eng Sci 2011, 66, (1), 64-72.
13. Ding, S. X., Data-driven design of monitoring and diagnosis systems for dynamic processes: A review of subspace technique based schemes and some recent results. J Process Control 2014, 24, (2), 431-449.
14. Li, G.; Liu, B.; Qin, S. J.; Zhou, D., Quality relevant data-driven modeling and monitoring of multivariate dynamic processes: the dynamic T-PLS approach. IEEE Trans Neural Netw 2011, 22, (12), 2262-2271.
15. Yu, J.; Qin, S. J., Multiway Gaussian Mixture Model Based Multiphase Batch Process Monitoring. Ind Eng Chem Res 2009, 48, (18), 8585-8594.
16. Zhou, D.; Li, G.; Qin, S. J., Total projection to latent structures for process monitoring. AIChE Journal 2009, 56, (1), 168-178.
17. Peng, K. X.; Zhang, K.; Li, G., Quality-Related Process Monitoring Based on Total Kernel PLS Model and Its Industrial Application. Mathematical Problems in Engineering 2013, 2013, (4), 1-14.
18. Jia, Q. L.; Zhang, Y. W., Quality-related fault detection approach based on dynamic kernel partial least squares. Chemical Engineering Research & Design 2016, 106, 242-252.
19. Chien, J. T.; Hsieh, H. L., Convex Divergence ICA for Blind Source Separation. IEEE Transactions on Audio Speech and Language Processing 2012, 20, (1), 302-313.
20. Ge, Z.; Song, Z., Process Monitoring Based on Independent Component Analysis-Principal Component Analysis (ICA-PCA) and Similarity Factors. Ind Eng Chem Res 2007, 46, (7), 2054-2063.
21. Chen, J.; Yu, J.; Mori, J.; Rashid, M. M.; Hu, G.; Yu, H.; Flores-Cerrillo, J.; Megan, L., An independent component analysis and mutual information based non-Gaussian pattern matching method for fault detection and diagnosis of complex cryogenic air separation process. 2013; pp 2797-2802.
22. Zhao, C.; Gao, F.; Wang, F., Nonlinear Batch Process Monitoring Using Phase-Based Kernel Independent Component Analysis-Principal Component Analysis (KICA-PCA). Ind Eng Chem Res 2009, 48, 12.
23. Jiang, Q. C.; Yan, X. F.; Tong, C. D., Double-Weighted Independent Component Analysis for Non-Gaussian Chemical Process Monitoring. Ind Eng Chem Res 2013, 52, (40), 14396-14405.
24. Wang, F. L.; Tan, S.; Peng, J.; Chang, Y. Q., Process monitoring based on mode identification for multi-mode process with transitions. Chemometrics and Intelligent Laboratory Systems 2012, 110, (1), 144-155.
25. Ma, H. H.; Hu, Y.; Shi, H. B., A novel local neighborhood standardization strategy and its application in fault detection of multimode processes. Chemometrics and Intelligent Laboratory Systems 2012, 118, 287-300.
26. Gonzalez, R.; Huang, B.; Lau, E., Process monitoring using kernel density estimation and Bayesian networking with an industrial case study. ISA Trans 2015, 58, 330-347.
27. Ma, Y. X.; Shi, H. B., Multimode Process Monitoring Based on Aligned Mixture Factor Analysis. Ind Eng Chem Res 2014, 53, (2), 786-799.
28. Qin, S. J., Recursive PLS algorithms for adaptive data modeling. Comput Chem Eng 1998, 22, (4-5), 503-514.
29. Jiang, J. H.; Berry, R. J.; Siesler, H. W.; Ozaki, Y., Wavelength interval selection in multicomponent spectral analysis by moving window partial least-squares regression with applications to mid-infrared and near-infrared spectroscopic data. Anal Chem 2002, 74, (14), 3555-3565.
30. Kaneko, H.; Funatsu, K., Maintenance-free soft sensor models with time difference of process variables. Chemometrics and Intelligent Laboratory Systems 2011, 107, (2), 312-317.
31. Yuan, X.; Ge, Z.; Huang, B.; Song, Z., A Probabilistic Just-in-Time Learning Framework for Soft Sensor Development With Missing Data. IEEE Transactions on Control Systems Technology 2017, 25, (3), 1124-1132.
32. Choi, S. W.; Martin, E. B.; Morris, A. J.; Lee, I. B., Adaptive multivariate statistical process control for monitoring time-varying processes. Ind Eng Chem Res 2006, 45, (9), 3108-3118.
33. Huang, G. B.; Chen, L., Enhanced random search based incremental extreme learning machine. Neurocomputing 2008, 71, (16-18), 3460-3468.
34. Huang, G.; Song, S.; Gupta, J. N.; Wu, C., Semi-supervised and unsupervised extreme learning machines. IEEE Trans Cybern 2014, 44, (12), 2405-2417.
35. Rong, H. J.; Ong, Y. S.; Tan, A. H.; Zhu, Z. X., A fast pruned-extreme learning machine for classification problem. Neurocomputing 2008, 72, (1-3), 359-366.
36. Miche, Y.; Sorjamaa, A.; Bas, P.; Simula, O.; Jutten, C.; Lendasse, A., OP-ELM: optimally pruned extreme learning machine. IEEE Trans Neural Netw 2010, 21, (1), 158-162.
37. Fdez-Riverola, F.; Iglesias, E. L.; Diaz, F.; Mendez, J. R.; Corchado, J. M., Applying lazy learning algorithms to tackle concept drift in spam filtering. Expert Syst Appl 2007, 33, (1), 36-48.
38. Ge, Z. Q.; Song, Z. H., Online monitoring of nonlinear multiple mode processes based on adaptive local model approach. Control Eng Pract 2008, 16, (12), 1427-1437.
39. Cybenko, G., Just-in-time learning and estimation. NATO ASI Series F: Computer and Systems Sciences 1996, 153, 423-434.
40. Kim, S.; Kano, M.; Hasebe, S.; Takinami, A.; Seki, T., Long-Term Industrial Applications of Inferential Control Based on Just-In-Time Soft-Sensors: Economical Impact and Challenges. Ind Eng Chem Res 2013, 52, (35), 12346-12356.
41. Chen, J.; Yu, J., Independent component analysis mixture model based dissimilarity method for performance monitoring of Non-Gaussian dynamic processes with shifting operating conditions. Ind Eng Chem Res 2014, 53, (13), 5055-5066.
42. Ge, Z.; Song, Z., A comparative study of just-in-time-learning based methods for online soft sensor modeling. Chemometrics and Intelligent Laboratory Systems 2010, 104, 306-317.
43. Cheng, C.; Chiu, M. S., A new data-based methodology for nonlinear process modeling. Chem Eng Sci 2004, 59, (13), 2801-2810.
44. Huang, G. B.; Zhou, H.; Ding, X.; Zhang, R., Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern B Cybern 2012, 42, (2), 513-529.
45. Yang, Z.; Moczulski, M.; Denil, M.; de Freitas, N.; Smola, A.; Song, L.; Wang, Z., Deep Fried Convnets. arXiv preprint arXiv:1412.7149, 2014.
46. Le, Q.; Sarlos, T.; Smola, A., Fastfood: computing Hilbert space expansions in loglinear time. In Proceedings of the 30th International Conference on Machine Learning, 2013; pp 244-252.
47. Ahmed, N.; Rao, K. R., Walsh-Hadamard transform. In Orthogonal Transforms for Digital Signal Processing, Springer: 1975; pp 99-152.
48. Zhao, J.; Meng, D., FastMMD: Ensemble of Circular Discrepancy for Efficient Two-Sample Test. Neural Comput 2015, 27, (6), 1345-1372.
49. Elgammal, A.; Duraiswami, R.; Harwood, D.; Davis, L. S., Background and foreground modeling using nonparametric kernel density estimation for visual surveillance. Proceedings of the IEEE 2002, 90, (7), 1151-1163.
50. Botev, Z. I.; Grotowski, J. F.; Kroese, D. P., Kernel Density Estimation Via Diffusion. Ann Stat 2010, 38, (5), 2916-2957.
51. Henley, E. J.; Seader, J. D.; Roper, D. K., Separation process principles. Wiley: 2011.
52. Villalba Torán, P. M., Multivariate statistical process monitoring of a distillation. 2013.
53. Jain, S.; Kim, J. K.; Smith, R., Operational Optimization of Batch Distillation Systems. Ind Eng Chem Res 2012, 51, (16), 5749-5761.
54. Sivalingamaiah, M.; Reddy, B. V., Texture Segmentation Using Multichannel Gabor Filtering. IOSR Journal of Electronics and Communication Engineering 2012, 2, 22-26.
Acknowledgement
This work was supported by the National Natural Science Foundation of China (61590923, 61422303, 61333010) and by the "Shu Guang" project supported by the Shanghai Municipal Education Commission and the Shanghai Education Development Foundation.
Figure Captions
Fig. 1. Illustration of the Extreme Learning Machine in the field of process modeling
Fig. 2. Comparison between the global modeling and JITL modeling monitoring methods
Fig. 3. Schematic diagram of the proposed similarity based on cosine similarity and posterior probability
Fig. 4. Normal probability plot for the outputs of the 3 modes in the illustrative synthetic process
Fig. 5. Process status depicted by the input observations of the synthetic process
Fig. 6. Cosine similarity of 3 samples (15th, 275th, and 550th) to the training dataset
Fig. 7. Adjusted cosine similarity of 3 samples (15th, 275th, and 550th) to the training dataset
Fig. 8. Conditional probabilities of each input according to Mode 1
Fig. 9. Mode identification result based on joint posterior probability method (Mode 1)
Fig. 10. Modified adjusted cosine similarity (MACS) of 3 samples (15th, 275th, and 550th) to the training dataset
Fig. 11. Hotelling's T2 statistics monitoring performance of the illustrative synthetic example
Fig. 12. Prediction results of the two similarity indices in the second phase (Mode 1, Fault 1; (a) MACS, (b) ACS)
Fig. 13. Schematic diagram of a typical distillation column
Fig. 14. Characteristics of the normalized output data in the distillation system
Fig. 15. Characteristics of the normalized output data in the distillation system
Fig. 16. Joint conditional probability according to Mode 1 and Mode 2 in Scenario 2
Fig. 17. Hotelling's T2 statistics monitoring performance for the predesigned Scenario 2 of the distillation system
Fig. 18. Prediction results of the two similarity indices in Scenario 2 (Mode 1, Fault 2; above, MACS; below, ACS)
Fig. 19. CPU time and RMSE for different numbers of ELM outputs with 12 relevant samples
Fig. 20. Prediction and estimation results for Mode 2, Fault 4 in Scenario 2
Fig. 21. Hotelling's T2 statistics monitoring performance for the predesigned Scenario 3 of the distillation system
Fig. 22. Prediction and estimation results for Mode 2, Fault 1 in Scenario 3
Fig. 23. Prediction and estimation results for Mode 3, Fault 5 in Scenario 3
Chart 1. Flow chart of the proposed kernel-based ELM monitoring and modeling method
Table 1. Measurements and outputs for the distillation column

Outputs        Description
yD             the vapor composition of light components of D (mole fraction)
xB             the liquid composition of light components of B (mole fraction)

Measurements   Description
FM             molar feed rate (kmol/min)
FV             volumetric feed flow (L/h)
zF             feed composition (mole fraction)
TF             feed temperature (Celsius)
qF             the fraction of liquid in the feed
MD             the liquid holdup of the overhead (kmol)
MB             the liquid holdup of the bottom (kmol)
deltaL         variation of reflux flow rate (kmol/min)
deltaV         variation of boilup flow rate (kmol/min)
L              reflux flow rate (kmol/min)
V              boilup flow rate (kmol/min)
D              top product flow rate (kmol/min)
B              bottom product flow rate (kmol/min)
x2-x40         the liquid composition of light components at stages 2-40
T2-T40         temperature at stages 2-40 (Celsius)
Table 2. Description of operation modes in the distillation system

Mode   xB     yD     L        V
1      0.01   0.99   2.6889   3.2294
2      0.01   0.96   2.2537   2.8100
3      0.05   0.99   2.3243   2.8435
Table 3. Abnormal events and corresponding fault types

Fault   Description                              Type
1       Molar feed rate in F                     Pulse/Ramp/Spike
2       Feed composition change in zF            Pulse/Ramp/Spike
3       Fluctuation of feed temperature in TF    Spike/Pulse
4       PI controller failure (xB)               Step
5       PI controller failure (yD)               Step
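For readers reproducing the benchmark, the sketch below shows one plausible way to synthesize the fault signatures named in Table 3 (step, pulse, ramp, spike) as additive disturbances on a measured variable. The shapes follow the usual definitions; the function name and parameterization are illustrative assumptions, not the authors' simulation code.

```python
# Hedged sketch of additive fault signatures (cf. Table 3). Assumes
# start + duration <= n; parameters are illustrative only.
import numpy as np

def fault_signal(kind, n, start, amplitude, duration=1):
    """Additive disturbance of length n: 'step' holds after start, 'pulse'
    lasts `duration` samples, 'ramp' grows linearly over `duration` and then
    holds, and 'spike' hits a single sample."""
    f = np.zeros(n)
    if kind == "step":
        f[start:] = amplitude
    elif kind == "pulse":
        f[start:start + duration] = amplitude
    elif kind == "ramp":
        f[start:start + duration] = np.linspace(0, amplitude, duration)
        f[start + duration:] = amplitude
    elif kind == "spike":
        f[start] = amplitude
    return f

# e.g., a 20% ramp on the molar feed rate F over 10 min (Fault 1 as
# configured for Scenario 3 in Table 4 below):
# F_faulty = F_nominal * (1 + fault_signal("ramp", len(F_nominal), 0, 0.20, 10))
```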
Table 4. Benchmark scenarios of the distillation column system

No.          Test scenario
Scenario 2   Normal condition: Samples 1-120, Mode 1
             Faulty condition: Samples 121-240, Mode 1, Fault 2: feed composition change in zF (type: spike, start time: 0 min, amplitude: 15%, duration: 20 min)
             Normal condition: Samples 241-360, Mode 2
             Faulty condition: Samples 361-480, Mode 2, Fault 4: PI controller failure (xB) (start time: 0 min)
Scenario 3   Normal condition: Samples 1-120, Mode 2
             Faulty condition: Samples 481-960, Mode 2, Fault 1: molar feed rate change in F (type: ramp, start time: 0 min, amplitude: 20%, duration: 10 min)
             Normal condition: Samples 961-1440, Mode 3
             Faulty condition: Samples 1441-1920, Mode 3, Fault 5: PI controller failure (yD) (start time: 0 min)
Table 5. RMSE and CPU time of three different similarity indices with different kernels

Method                     RMSE (10^-5)   CPU time (s)
Cosine, Fastfood           17.131         7.22
ACS, Fastfood              11.9314        8.11
MACS, exact Gaussian RBF   6.6825         68.78
MACS, Fastfood             7.3071         39.8
Schematic diagram of the proposed similarity based on cosine similarity and posterior probability