Quality-related locally weighted non-Gaussian regression based soft sensing for multimode processes
Yuchen He, Binbin Zhu, Chenyang Liu, and Jiusun Zeng
Ind. Eng. Chem. Res., Just Accepted Manuscript. DOI: 10.1021/acs.iecr.8b04075. Publication Date (Web): December 9, 2018. Downloaded from http://pubs.acs.org on December 14, 2018.


Quality-related locally weighted non-Gaussian regression based soft sensing for multimode processes

Yuchen He1, Binbin Zhu1, Chenyang Liu1, Jiusun Zeng2*

1College of Mechanical & Electrical Engineering, China Jiliang University, Hangzhou 310018, Zhejiang, China
2College of Metrology & Measurement Engineering, China Jiliang University, Hangzhou 310018, Zhejiang, China

Abstract


This paper develops a novel just-in-time (JIT) learning based soft sensor for multimode processes. The involved multimode datasets are assumed to be non-Gaussian distributed and time varying. A supervised non-Gaussian latent structure (SNGLS) is introduced to model the relationship between predictor variables and quality variables. To handle the multimode process, a moving window approach is adopted, based on which a new similarity measure is proposed by integrating window confidence and between-sample local similarity. The similarity between the current query sample and the dataset in a specific window is quantified by the window confidence using support vector data description (SVDD). Based on the data in the moving window, the SNGLS model is constructed and used to estimate the between-sample local similarity. The two similarities are integrated and used as sample weights, and a locally weighted structure is designed for key quality variable estimation. The performance of the developed method is demonstrated by application studies on the Tennessee Eastman (TE) process and a pre-decarburization absorption unit. It is shown that the proposed method outperforms competitive methods in the prediction accuracy of key quality variables.


Keywords: Multimode processes; Locally weighted non-Gaussian regression; Just-in-time learning; Soft sensor


1. Introduction


In the past few decades, soft sensors have been widely employed as an important technique to predict key variables in industrial processes. It is common practice to construct regression models to describe the relationship between easy-to-measure process variables and key quality variables that are difficult to measure online1-4. By introducing these regression models, it is possible to predict the variables of interest. Among different kinds of soft sensing models, data-driven multivariate calibration methods have been widely used. Classical data-driven multivariate calibration methods, such as principal component regression (PCR)5, 6, partial least squares (PLS)7, 8, kernel partial least squares (KPLS)9, 10 and support vector machines (SVM)11, 12, have been successfully applied to soft sensor modeling in fields such as chemical engineering and medical processes.


For soft sensor models, it is often the case that they perform well in the early stage. However, due to variations in system and material conditions, changes in process dynamics are inevitable. The performance of soft sensors may deteriorate as time evolves if the model is not consistent with the current situation. Hence, soft sensors should be routinely updated to maintain their predictive accuracy, which is time-consuming and costly. As a consequence, a large number of methods have been proposed to cope with this situation. Among them, the most popular is the just-in-time learning (JITL) method13, 14. The traditional JITL method mainly consists of three steps: selection of similar samples, construction of online models, and making predictions according to the query sample, among which sample selection and online model construction are the most critical.

In JITL methods, proper sample selection leads to good prediction performance. Several kinds of similarity measures have been proposed for sample selection, such as Euclidean distance and subspace angle15, 16. When defining a similarity measure, most methods consider the similarity between the inputs of the training samples and the query sample, whilst the output information in the training samples is largely ignored. However, the absence of output data may result in inappropriate sample selection due to loss of information17. To cope with this problem, researchers have proposed several schemes to employ the output information in similarity measures. One possibility is to first predict the output of the query sample using an appropriate model, and then define the similarity between the training samples and the query sample in both the input and output space18, 19. Note that since the output of the query sample in those methods is estimated from the input data, no new information is added. In addition, introducing a new model may bring in more uncertainty. Hence, such methods are problematic.


Bearing this in mind, Yuan et al. constructed a PLS model within a supervised latent space20. In this method, the inputs of both the training samples and the query sample are projected into the latent space, and the projections are further employed to construct the similarity measure. In this way, the output information is utilized in the modeling stage, and the method was shown to be more accurate than conventional methods. Despite this progress, two problems remain. As shown in the work of Yuan20, the supervised latent structure is built on the assumptions that the process is Gaussian distributed and relatively stable. Unfortunately, neither assumption holds for many practical processes. In addition, traditional locally weighting techniques only consider one-to-one similarity between the query and training samples, whilst ignoring the mode information of the training samples. In practice, even the same inputs may produce sharply different output values in different operational modes. Therefore, the mode information must be considered in soft sensor development.

In this paper, a locally weighted non-Gaussian regression method is proposed using the SNGLS method, which is a non-Gaussian extension of traditional independent component regression (ICR) models21. Instead of maximizing covariance, the latent structure is established by maximizing the mutual information between input and output variables22, 23. The SNGLS considers not only the second-order information but also the higher-order information omitted in the PLS model; hence it is more suitable for modeling non-Gaussian processes.

To cope with the multimode characteristics of the process, a moving window strategy is applied. The similarity between the query sample and a training sample is defined by integrating the window confidence and the between-sample similarity. The window confidence between the query sample and a data window is estimated using the SVDD classifier, while the similarity between the query sample and a specific sample (the between-sample similarity) is obtained using the SNGLS model. The reasons for using the integrated similarity are twofold. The window confidence only considers the similarity between the query sample and the whole data window, so it ignores the differences between samples in the same window. Meanwhile, the between-sample similarity is estimated on an individual basis, so it ignores the mode information generally captured in a data window. By using the integrated similarity, both the window confidence and the local sample-to-sample similarity are considered, so that more information is exploited and the method is more suitable for modeling multimode processes.


Compared with traditional JITL methods, which define similarity measures using only the information in the input space whilst ignoring the information in the output space, in this paper we consider the information in both the input and output spaces using the SNGLS. Moreover, both the window confidence and the between-sample similarity are considered in the similarity measure, so that better modeling performance can be achieved.


The remainder of this paper is organized as follows. Sample selection using supervised structures is explained in Section 2. The NGR (non-Gaussian regression) and SVDD methods are briefly introduced as preliminaries in Section 3. Section 4 provides the details of the SNGLS. The locally weighted non-Gaussian regression is discussed in Section 5, where sample selection and the supervised non-Gaussian regression are implemented for multimode process soft sensing. In Section 6, the performance of the proposed method is demonstrated through two case studies involving the Tennessee Eastman (TE) benchmark and an industrial process. Finally, conclusions are drawn in Section 7.


2. Similarity measurement and sample selection using supervised latent space


In conventional JITL soft sensors, the similarities between the query sample and the historical samples are calculated first. These similarities indicate the importance of each historical sample in model construction. Samples with larger similarities are commonly assigned larger weights in the training model. A good similarity measure can greatly enhance the accuracy of online modeling. According to the types of data information involved, the similarity measures in the literature can be roughly divided into three categories.

2.1 Input information of historical data and query sample are used

In this category, only the inputs of the historical data and the query sample are involved in the similarity measure, while the outputs are not considered. Among the numerous similarity measures proposed in the literature, the Euclidean distance based similarity is perhaps the most popular, expressed as follows

$$DIS = \sqrt{(x_{new} - x_i)^T (x_{new} - x_i)} \quad (1)$$

where $x_i$ and $DIS$ represent the ith historical sample and its corresponding Euclidean distance to the new query sample $x_{new}$. Other relevant similarity measures include vector-angle based and relevance based methods12, 24, among others. The problem with these similarity measures is that the output of the historical data is totally neglected, which may result in loss of important information and have a negative impact on the prediction accuracy of the developed soft sensors.
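As a minimal sketch of this first category, the distance of Eq. (1) can be computed for every historical sample and converted into locally weighting coefficients with a Gaussian kernel; the bandwidth `phi` below is an illustrative tuning choice of ours, not a quantity from the paper:

```python
import numpy as np

def euclidean_similarity(X_hist, x_new, phi=1.0):
    """Eq.(1) per historical sample, converted to similarity weights.

    X_hist : (N, m) array of historical inputs
    x_new  : (m,) query input
    phi    : kernel bandwidth (an assumed tuning parameter)
    """
    dis = np.sqrt(((X_hist - x_new) ** 2).sum(axis=1))  # Eq.(1), row-wise
    return np.exp(-(dis / phi) ** 2)                    # larger = more similar

X_hist = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
w = euclidean_similarity(X_hist, np.array([0.0, 0.0]))
# the historical sample identical to the query receives the largest weight
```

A monotone kernel of the distance is a common choice here; any decreasing function of $DIS$ would serve the same purpose.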

2.2 Both the input and output information of historical and query samples are used

To solve the problem mentioned in sub-section 2.1, an intuitive solution is to take the output data into consideration so that the similarity of both the input and output data can be measured. Unfortunately, the output part of the query sample is always inaccessible. One solution is to replace the true output corresponding to the query inputs by the predicted output $\tilde{y}_{new}$, obtained from the available data ($x_{new}$, $X$ and $Y$) using conventional methods like PCR25. The output distance between the ith training sample and the query sample can then be defined as

$$D_{i,y} = \frac{\left| y_i - \tilde{y}_{new} \right|}{\sum_{j=1}^{N} \left| y_j - \tilde{y}_{new} \right|} \quad (2)$$

The distances defined in Eqs. (1) and (2) are then integrated into a single similarity measure. Compared with the approach in sub-section 2.1, this approach may improve the modeling accuracy to some extent since the output is also involved. However, it should be noted that the performance of sample selection highly depends on the accuracy of the predicted output $\tilde{y}_{new}$. This may cause problems when the prediction of the output is unreliable.
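The normalized output distance of Eq. (2) is straightforward to evaluate once a preliminary prediction $\tilde{y}_{new}$ is available; a sketch (the source of the prediction, e.g. a PCR model, is left abstract here):

```python
import numpy as np

def output_distance(y_hist, y_new_pred):
    """Normalized output distance of Eq.(2), using a predicted output
    y_new_pred for the query sample (e.g. from a preliminary PCR model)."""
    diff = np.abs(y_hist - y_new_pred)
    return diff / diff.sum()

y_hist = np.array([1.0, 2.0, 5.0])
d_y = output_distance(y_hist, y_new_pred=1.5)
# d_y sums to one; samples whose outputs are far from the prediction score high
```

Note that the quality of these distances is entirely hostage to the accuracy of `y_new_pred`, which is exactly the weakness described above.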

2.3 Inputs of query data and all historical data are used


To solve the problems of the similarity measures used in sub-sections 2.1 and 2.2, another method was proposed in Ref. 20. Compared with the methods of sub-section 2.2, this method does not require prediction of the output of the query sample and thus avoids introducing additional modeling errors into the similarity measure. Instead, a supervised structure is first established based on the training data. Both the historical samples and the query sample are then projected into this structure so that the similarities between the corresponding score vectors can be measured. Here, a simple example is given. Suppose there are three predictor variables $x_1$, $x_2$, $x_3$, related to the key variable through $y = x_1^2 + x_2^2$. The query sample is given as $S_Q(0, 0, 0)$, while three candidate historical samples are $S_1(1, 0, 2)$, $S_2(0, 1, 2)$ and $S_3(2, 0, 0)$. Based on the criterion of sub-section 2.1, $S_3$ would be selected as the most similar sample since it is the closest to $S_Q$. However, the key variable $y$ has nothing to do with $x_3$, which indicates that it is inappropriate to apply the Euclidean distance directly to the raw data in similar sample selection. This example shows the importance of the variable-wise relationship in the sample selection procedure. Compared with the previous unsupervised methods, this third kind of method better utilizes the relationship between variables during sample selection.
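The toy example above can be checked directly: raw Euclidean distance ranks $S_3$ as the best match even though it is the worst candidate in terms of the key variable:

```python
import numpy as np

y = lambda s: s[0] ** 2 + s[1] ** 2          # key variable; x3 is irrelevant

SQ = np.array([0.0, 0.0, 0.0])               # query sample
cands = {"S1": np.array([1.0, 0.0, 2.0]),
         "S2": np.array([0.0, 1.0, 2.0]),
         "S3": np.array([2.0, 0.0, 0.0])}

d_raw = {k: np.linalg.norm(s - SQ) for k, s in cands.items()}  # criterion of 2.1
d_out = {k: abs(y(s) - y(SQ)) for k, s in cands.items()}       # true output gap

# S3 wins on raw distance (2 < sqrt(5)) but has the largest output error (4 vs 1)
```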


This is achieved using the supervised structure of PLS. In PLS, the score vectors of the inputs are obtained by projecting the raw data into the latent space, and the relationship between the score vectors and the output is constructed simultaneously. By using the latent space structure of PLS, both the input and output information are considered in the modeling stage. No additional modeling stage is required; hence this approach can be more effective than the method described in sub-section 2.2.


The problem with the method proposed in Ref. 20 is that the Gaussian assumption of PLS limits its application in many non-Gaussian situations. Also, a PLS model is built by maximizing the covariance of two latent variables, which means that the variables in the latent space only reveal second-order information. In fact, variables in many modern industrial processes, such as ammonia processes and polymerization reaction processes, usually exhibit higher-order correlation. In these processes, the PLS method is generally not suitable.

In this paper, a novel SNGLS based method is designed to overcome the shortcomings of PLS based approaches. The details of the SNGLS are discussed in Section 4. Before that, some preliminaries are presented regarding the SVDD classifier and the NGR method.

3. Preliminaries

3.1 SVDD


The SVDD is an effective approach to determine whether a sample is similar to a reference dataset of non-Gaussian distributed data, and it has been widely used as a similarity measure on many occasions26, 27. It constructs a hypersphere of minimal volume that envelops as many samples as possible. The hypersphere has two important parameters: the center $a$ and the radius $R_s$. Suppose $W_L$ samples are included in the reference dataset; the parameters of the hypersphere are derived as follows

$$\min F(R_s, a) = R_s^2 + C\sum_i \xi_i \quad \text{s.t. } \|s_i - a\|^2 \le R_s^2 + \xi_i,\ \xi_i \ge 0,\ i = 1, 2, \ldots, W_L \quad (3)$$

where $s_i$ denotes the corresponding data, while $C$ and $\xi_i$ represent the tuning parameter and the slack variables, respectively. The above optimization problem can be transformed into the following dual form

$$\max \sum_i h_i (s_i, s_i) - \sum_i \sum_j h_i h_j (s_i, s_j) \quad \text{s.t. } \sum_i h_i = 1,\ h_i \in [0, C] \quad (4)$$

where $h_i$ represents the coefficient of the ith support vector. The two parameters of the hypersphere are then given as follows

$$a = \sum_i h_i s_i, \qquad R_s^2 = (s_k, s_k) - 2\sum_i h_i (s_k, s_i) + \sum_i \sum_j h_i h_j (s_i, s_j) \quad (5)$$

where $s_k$, $k = 1, \ldots, K$ are the support vectors. The closer a sample is to the center, the higher its similarity with the reference dataset. Conversely, if a new sample falls outside the hypersphere, it is likely to be an outlier.

3.2 NGR


In our previous work, the NGR method was introduced to provide a novel solution for non-Gaussian regression28; it is an extension of the ICA method. In ICA, independent components and their weight vectors are obtained by maximizing the negentropy of the latent variable in the input data space. However, ICA does not consider the relationship between input and output. To account for this relationship, the NGR method introduces the mutual information between the latent variables extracted from the input and output spaces. In this way, both the non-Gaussianity and the input-output relationship are considered. Different from ICA, the independent components are extracted by solving the following optimization problem

$$\{w_i, c_i\} = \arg\max_{w_i, c_i}\left\{\alpha J(w_i^T \Gamma_x X_O) + \beta I(w_i^T \Gamma_x X_O,\ c_i^T \Gamma_y Y_O) + \gamma J(c_i^T \Gamma_y Y_O)\right\},\ i = 1, \ldots, d_i \quad \text{s.t. } w_i^T w_i = 1,\ c_i^T c_i = 1 \quad (6)$$

where $X_O$ and $Y_O$ represent the original data, while $w_i$, $c_i$ and $d_i$ represent the corresponding weight vectors and the number of latent variables in the NGR method, respectively. $I(w_i^T \Gamma_x X_O, c_i^T \Gamma_y Y_O)$ denotes the mutual information between the two independent components, where $\Gamma_x$ and $\Gamma_y$ are the whitening matrices of $X_O$ and $Y_O$, respectively, $J(\cdot)$ represents the negentropy operator, and the nonnegative coefficients $\alpha$, $\beta$ and $\gamma$ balance the three terms. It should be noted that the first and third terms in Eq. (6) have the same form as the objective of traditional independent component analysis (ICA): when the data are Gaussian distributed, $J(w_i^T \Gamma_x X_O)$ and $J(c_i^T \Gamma_y Y_O)$ equal zero. In the ICA method, the non-Gaussian components are obtained by maximizing these terms, while the correlation between the two spaces is neglected. To overcome this problem, the NGR method balances data non-Gaussianity and data correlation in Eq. (6). When $\alpha = 1, \beta = 0, \gamma = 0$ or $\alpha = 0, \beta = 0, \gamma = 1$, the latent variables in Eq. (6) reduce to the independent components of ICA, and the optimization problem becomes the maximization of the correlation coefficient if $X_O$ and $Y_O$ are Gaussian distributed. For more details on the NGR method, please refer to our previous work28.
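A rough numerical sketch of the Eq. (6) objective for a single pair of directions. The moment-based negentropy approximation, the Gaussian-correlation stand-in for the mutual information term, the weights, and the grid search over directions are all illustrative assumptions of ours, not the estimators or optimizer used in the paper:

```python
import numpy as np

def negentropy(u):
    """Classic moment-based negentropy approximation
    J(u) ~ E(u^3)^2/12 + kurt(u)^2/48, standing in for J(.) in Eq.(6)."""
    u = (u - u.mean()) / u.std()
    return np.mean(u**3) ** 2 / 12 + (np.mean(u**4) - 3.0) ** 2 / 48

def mutual_info_gauss(f, g):
    """Gaussian mutual information -0.5*ln(1 - rho^2), an illustrative
    stand-in for the mutual information term I(., .) in Eq.(6)."""
    rho = np.corrcoef(f, g)[0, 1]
    return -0.5 * np.log(1.0 - rho**2)

def ngr_objective(X, y, w, alpha=0.3, beta=0.4, gamma=0.3):
    """alpha*J(Xw) + beta*I(Xw, y) + gamma*J(y), cf. Eq.(6), with ||w|| = 1."""
    f = X @ (w / np.linalg.norm(w))
    return (alpha * negentropy(f) + beta * mutual_info_gauss(f, y)
            + gamma * negentropy(y))

# Toy data: a non-Gaussian (uniform) source drives the output, plus a nuisance input.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 2000)
x2 = rng.standard_normal(2000)
X = np.column_stack([x1, x2])
yv = x1 ** 3 + 0.1 * rng.standard_normal(2000)

# Brute-force grid search over directions in the plane as a toy optimizer.
angles = np.linspace(0, np.pi, 181)
best = max(angles, key=lambda t: ngr_objective(X, yv, np.array([np.cos(t), np.sin(t)])))
# the selected direction leans strongly toward x1, the output-relevant source
```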


4. Supervised non-Gaussian latent structure (SNGLS)


The supervised modeling framework mentioned in sub-section 2.3 constructs a latent space for sample selection. However, most current methods are not suitable for modeling non-Gaussian distributed data in multimode processes. To solve this problem, a supervised non-Gaussian structure is applied to replace PLS in the sample selection; in this paper, the NGR method is extended to form a non-Gaussian latent space model. To clearly illustrate the method, the process data are assumed to be stable in this section. In the next section, the non-Gaussian regression method is extended to a locally weighted form to cope with the time-varying nature of multimode processes.

4.1 The model structure of SNGLS

Suppose that $X_O$ ($n \times m$) and $Y_O$ ($n \times k$) are the original input and output, respectively. The model structure of SNGLS can be presented as follows

$$X_O = TP^T + E, \qquad Y_O = UQ^T + F \quad (7)$$

where $T = [t_1, t_2, \ldots, t_d]$ and $U = [u_1, u_2, \ldots, u_d]$ are the score matrices, $P$ and $Q$ are the corresponding loading matrices, and $E$ and $F$ are the residual matrices of $X_O$ and $Y_O$. Note that Eq. (7) has a structure similar to the PLS model, where the weight vectors are obtained by maximizing the following covariance of latent variables29

$$\{w_j, c_j\} = \arg\max_{w_j, c_j} \operatorname{cov}(w_j^T \Gamma_x X_O,\ c_j^T \Gamma_y Y_O),\ j = 1, \ldots, d_j \quad \text{s.t. } w_j^T w_j = 1,\ c_j^T c_j = 1 \quad (8)$$

where $w_j$ and $c_j$ are the corresponding weight vectors of the PLS method, $\Gamma_x$ and $\Gamma_y$ are the whitening matrices of $X_O$ and $Y_O$, respectively, and $d_j$ represents the number of latent variables in the PLS method.
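For reference, the first weight pair maximizing the covariance objective of Eq. (8) can be obtained from the leading singular vectors of the cross-product matrix of the data; the sketch below skips the whitening step and assumes already standardized data:

```python
import numpy as np

def pls_first_weights(X, Y):
    """First PLS weight pair (w, c), cf. Eq.(8): the leading left/right
    singular vectors of X^T Y maximize cov(Xw, Yc) with ||w|| = ||c|| = 1."""
    U, s, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
    return U[:, 0], Vt[0]

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
Y = X[:, [0]] + 0.05 * rng.standard_normal((500, 1))  # output driven by x1 only
w, c = pls_first_weights(X, Y)
# w points (up to sign) along the first input coordinate
```

As the surrounding text notes, this construction captures only second-order statistics, which is exactly what SNGLS goes beyond.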


Covariance, however, is only a second-order statistic and cannot describe the behavior of higher-order information. Also, the basic assumption that data in multimode processes are Gaussian distributed is often invalid in practice. To deal with non-Gaussian distributed datasets, a non-Gaussian calibration framework is proposed by replacing the covariance in Eq. (8) with the mutual information in Eq. (6). The next sub-section discusses the relationship between SNGLS and PLS.

4.2 Relationship between SNGLS and PLS

Before discussing the relationship between SNGLS and PLS, the concept of mutual information should be introduced. According to previous references22, 30, the mutual information between two random variables $f$ and $g$ can be estimated as

$$I(f, g) = H(f) + H(g) - H(f, g) \quad (9)$$

where $H(f) = -\int p(f)\log p(f)\,df$ and $H(g) = -\int p(g)\log p(g)\,dg$ represent the marginal entropies of the two variables, with $p(f)$ and $p(g)$ the corresponding marginal probability density functions, and $H(f, g) = -\iint p(f, g)\log p(f, g)\,df\,dg$ is the joint entropy of $f$ and $g$, with $p(f, g)$ the joint probability density function. Several methods have been proposed for mutual information estimation22, 30, 31; among them, the Edgeworth expansion is applied in this paper. According to previous work31, the marginal entropy can be estimated as follows

$$H(p) \approx H(\phi_p) - \frac{1}{12}\sum_{i=1}^{d}\left(\kappa^{i,i,i}\right)^2 - \frac{1}{4}\sum_{\substack{i,j=1 \\ i \neq j}}^{d}\left(\kappa^{i,i,j}\right)^2 - \frac{1}{72}\sum_{\substack{i,j,k=1 \\ i<j<k}}^{d}\left(\kappa^{i,j,k}\right)^2 \quad (10)$$

where $\phi_p$ is a Gaussian density function of mean $\mu$ and covariance $\Sigma$; $(i, j, k)$, $(i, j, k, l)$ and $(i, j, k, l, p, q)$ index input dimensions; $h_{i,j,k}$, $h_{i,j,k,l}$ and $h_{i,j,k,l,p,q}$ are the Hermite polynomials of the corresponding dimensions; and $\kappa^{i,j,k}$, $\kappa^{i,j,k,l}$ and $\kappa^{l,p,q}$ represent the standardized cumulants of the corresponding dimensions. Eq. (10) gives a comprehensive approach to estimating the marginal and joint entropies in Eq. (9), from which the corresponding entropies can be obtained31:

$$H(f) = \frac{1}{2}\ln \sigma_f^2 + \frac{1}{2}\ln 2\pi + \frac{1}{2} - \frac{E(f^3)^2}{12(\sigma_f^2)^3}$$

$$H(g) = \frac{1}{2}\ln \sigma_g^2 + \frac{1}{2}\ln 2\pi + \frac{1}{2} - \frac{E(g^3)^2}{12(\sigma_g^2)^3}$$

$$H(f, g) = \frac{1}{2}\ln\!\left(\sigma_f^2 \sigma_g^2 - (\sigma_{fg})^2\right) + \ln 2\pi + 1 - \frac{1}{12}\left[\frac{E(f^3)^2}{(\sigma_f^2)^3} + \frac{E(g^3)^2}{(\sigma_g^2)^3}\right] - \frac{1}{4}\left[\frac{E(f^2 g)^2}{(\sigma_f^2)^2 \sigma_g^2} + \frac{E(g^2 f)^2}{(\sigma_g^2)^2 \sigma_f^2}\right] \quad (11)$$

where $E(\cdot)$ represents the expectation of a variable, and $\sigma_f^2$, $\sigma_g^2$ and $\sigma_{fg}$ denote the variances and covariance of $f$ and $g$, respectively. Substituting Eq. (11) into Eq. (9), the mutual information between the two random variables can be rewritten as

$$I(f, g) = \frac{1}{4}\left[\frac{E(f^2 g)^2}{(\sigma_f^2)^2 \sigma_g^2} + \frac{E(g^2 f)^2}{(\sigma_g^2)^2 \sigma_f^2}\right] - \frac{1}{2}\ln\!\left(1 - \frac{(\sigma_{fg})^2}{\sigma_f^2 \sigma_g^2}\right) \quad (12)$$

12

Suppose

13

existences of whitening matrices  x ,  y and constraints in Eq.(6). Therefore, Eq.(12) can be modified as

14

follows

15 16

I ( f , g) 



1  E ( f 2 g )2  E ( g 2 f )2   12 ln 1  ( 2fg )2  4



(13)
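Eq. (13) is straightforward to evaluate from samples; a minimal sketch, assuming $f$ and $g$ have already been whitened to zero mean and unit variance:

```python
import numpy as np

def mutual_info_edgeworth(f, g):
    """Edgeworth-based mutual information approximation of Eq.(13).
    Assumes f and g are zero-mean with unit variance (whitened data)."""
    sigma_fg = np.mean(f * g)                 # covariance = correlation here
    return ((np.mean(f**2 * g) ** 2 + np.mean(g**2 * f) ** 2) / 4
            - 0.5 * np.log(1.0 - sigma_fg ** 2))

std = lambda v: (v - v.mean()) / v.std()
rng = np.random.default_rng(1)
f = rng.standard_normal(5000)
g_ind = rng.standard_normal(5000)                               # independent of f
g_cor = 0.9 * f + np.sqrt(1 - 0.81) * rng.standard_normal(5000)  # correlated with f
mi_ind = mutual_info_edgeworth(std(f), std(g_ind))
mi_cor = mutual_info_edgeworth(std(f), std(g_cor))
# the dependent pair yields a clearly larger mutual information estimate
```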

Substituting Eq. (13) into Eq. (6), the pair of weight vectors and the novel supervised structure in Eq. (7) can then be obtained. It should be noted that the cost function of Eq. (8) is already contained in the second term of Eq. (13), which means the covariance-based supervised structure is a special case of SNGLS. Compared with the PLS method, SNGLS retains additional higher-order information, which improves the modeling performance when the data are non-Gaussian distributed. In the next section, SNGLS is extended into a locally weighted form to deal with multimode process regression.

locally weighted form to deal with multimode processes regression.

6 7

5. Locally weighted non-Gaussian regression for multimode processes

8

In this section, a novel framework involving sample similarity measure and locally weighting scheme

9

is designed for time-varying processes. For the purpose of sample selection and online modeling, a new

10

similarity measure is proposed by integrating the window confidence and between-sample local similarities.

11

The flowchart of the proposed method is shown in Figure1. Firstly, in order to handle the time-varying

12

characteristics of the process, the training data are divided into a series of overlapping windows, each of

13

which is considered stable so that it can be modeled by a single SNGLS model. The window confidence with

14

respect to the current query sample is quantified by SVDD. In order to get the between-sample similarities,

15

the SNGLS is constructed in each moving window to set up a series of local models. After that, both training

16

samples and the current query sample are projected onto the local models to obtain local similarities. The

17

global similarity between the same pair of training sample and query sample is then derived through the

18

integration of the local similarities and corresponding window confidence. Those global similarities are

19

considered as weights and assigned to each training sample, which finally leads to the locally weighted non-

20

Gaussian regression solution. In summary, the locally weighted non-Gaussian regression method consists of

21

four steps: 1) window confidence calculation, 2) local similarity measurement, 3) global similarity 14

ACS Paragon Plus Environment

Page 14 of 40

Page 15 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

1

measurement, 4) modeling and output prediction. These steps will be explained in detail in the following four

2

sub-sections.

3

[Figure 1 about here.]
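The four steps can be sketched as the following workflow skeleton. The helper callables, their signatures and the final confidence-weighted combination are hypothetical stand-ins of ours for the procedures detailed below, not the paper's implementation:

```python
import numpy as np

def locally_weighted_prediction(windows, x_query, window_conf, local_sim, local_predict):
    """Workflow sketch of Figure 1. `windows` is a list of (X, Y) training
    windows; the three callables stand in for the SVDD window confidence
    (step 1), the SNGLS between-sample similarity (step 2) and a per-window
    locally weighted predictor (step 4)."""
    preds, confs = [], []
    for X_w, Y_w in windows:
        ws = window_conf(X_w, x_query)            # step 1: window confidence
        sim = local_sim(X_w, x_query)             # step 2: local similarities
        g = ws * sim                              # step 3: global similarity
        preds.append(local_predict(X_w, Y_w, g))  # step 4: local prediction
        confs.append(g.mean())
    confs = np.asarray(confs)
    return float(np.dot(confs, preds) / confs.sum())

# Toy stand-ins: inverse-distance confidence/similarity, similarity-weighted mean.
conf = lambda Xw, xq: 1.0 / (1.0 + np.linalg.norm(Xw.mean(axis=0) - xq))
sim = lambda Xw, xq: 1.0 / (1.0 + np.linalg.norm(Xw - xq, axis=1))
pred = lambda Xw, Yw, g: float(np.dot(g, Yw) / g.sum())

rng = np.random.default_rng(0)
X1, Y1 = rng.normal(0.0, 0.1, (20, 2)), np.full(20, 1.0)  # mode 1, output near 1
X2, Y2 = rng.normal(5.0, 0.1, (20, 2)), np.full(20, 3.0)  # mode 2, output near 3
yhat = locally_weighted_prediction([(X1, Y1), (X2, Y2)], np.zeros(2), conf, sim, pred)
# the query lies in mode 1, so the prediction stays close to 1.0
```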

5.1 Window confidence

In continuous multimode processes, it is reasonable to assume that the data characteristics of neighboring samples are similar, which makes it feasible to model the relationship between the input and output variables within a local area through a single SNGLS. It is therefore important to determine the window confidence by quantifying the similarity between the query sample and each data window. The schematic of the window confidence calculation is shown in Figure 2. The training samples are first divided into a series of overlapping windows of length L. In order to reduce the computation load and avoid updating the SVDD classifier every time the window slides forward, a window step of M is introduced, which can be determined according to the discussion of Zhu et al.32 Local information can be well captured using the moving window strategy, and the overlapping structure also guarantees the consistency of process modeling.
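The overlapping window division described above can be sketched in a few lines. The following fragment is illustrative only (not the authors' implementation); the defaults L = 20 and M = 4 are the values adopted later in the case study.

```python
import numpy as np

def divide_windows(X, L=20, M=4):
    """Split the training matrix X (n_samples x n_vars) into
    overlapping windows of length L, sliding forward by step M."""
    n = X.shape[0]
    return [X[s:s + L] for s in range(0, n - L + 1, M)]

# Example: 100 samples of 3 variables -> 21 overlapping 20-sample windows
X = np.random.randn(100, 3)
windows = divide_windows(X)
print(len(windows), windows[0].shape)  # 21 (20, 3)
```

Consecutive windows share L − M samples, which is what provides the modeling consistency mentioned above.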

Since the process data are assumed to be non-Gaussian distributed, SVDD is applied to determine the window confidence between the query sample and each moving window. The confidence of the θth window is defined as follows:

WS_θ = R_{s,θ}^2 / ||Φ(x_q) − a_θ||^2        (14)

where Φ(x_q) represents the query sample projected into the kernel space, a_θ and R_{s,θ} represent the center and radius of the hypersphere in the θth window, respectively, and ||Φ(x_q) − a_θ|| is the corresponding distance between the query sample and the center in the high-dimensional space. For the window confidence defined in Eq.(14), a greater WS_θ value indicates that the query sample is more similar to the reference data in the moving window.
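Once an SVDD has been trained for a window, the confidence of Eq.(14) only requires the kernel-space distance between the query sample and the hypersphere center. The sketch below is an illustration, not the authors' code: it assumes the support vectors `sv`, their coefficients `alpha` and the radius `R` are already available from some SVDD solver, and it uses a Gaussian kernel to expand the distance.

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2, axis=-1))

def window_confidence(xq, sv, alpha, R, gamma=0.5):
    """Eq. (14): WS = R^2 / ||phi(xq) - a||^2, with the squared
    kernel-space distance expanded as
    k(xq, xq) - 2 sum_i alpha_i k(xq, x_i) + sum_ij alpha_i alpha_j k(x_i, x_j)."""
    k_qq = rbf(xq, xq, gamma)
    k_qs = rbf(sv, xq, gamma)
    K_ss = np.exp(-gamma * np.sum((sv[:, None] - sv[None, :]) ** 2, axis=-1))
    d2 = k_qq - 2.0 * (alpha @ k_qs) + alpha @ K_ss @ alpha
    return R ** 2 / d2

# toy check: a single "support vector" at the origin with weight 1
sv = np.array([[0.0, 0.0]])
alpha = np.array([1.0])
ws_near = window_confidence(np.array([1.0, 0.0]), sv, alpha, R=0.5)
ws_far = window_confidence(np.array([2.0, 0.0]), sv, alpha, R=0.5)
```

As Eq.(14) requires, the confidence decreases monotonically as the query sample moves away from the window's hypersphere center (`ws_far < ws_near`).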


[Figure 2 about here.]

5.2 Between-sample similarity computation

On the basis of the window division, a series of SNGLS models can be established in the corresponding moving windows. In order to capture the local information, the between-sample similarity is estimated based on the SNGLS model in each moving window. The schematic of the between-sample similarity measurement is shown in Figure 3. Suppose that the rth training sample belongs to the θth moving window, and an SNGLS is established using the samples in that window. Both the input of the training sample x_r and the current query sample x_q are then projected into the latent space to obtain the corresponding latent scores t_r and t_q. The sample distance based on the θth SNGLS can then be written in the following form:

DIS_{θ,r} = (t_q − t_r)^T (t_q − t_r)        (16)

The local similarity between the two samples is further defined as follows:

Sim_{θ,r} = 1 / DIS_{θ,r}        (17)

where Sim_{θ,r} represents the between-sample local similarity between the rth training sample and the query sample under the θth SNGLS.
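The distance and similarity of Eqs.(16) and (17) can be sketched as follows. The projection matrix `P` here is an arbitrary stand-in; in the method it comes from the window's SNGLS model, which is not reproduced in this fragment.

```python
import numpy as np

def local_similarity(xq, xr, P):
    """Eqs. (16)-(17): project both samples onto the latent space of one
    window model and invert the squared score distance."""
    tq, tr = xq @ P, xr @ P          # latent scores t_q, t_r
    dis = (tq - tr) @ (tq - tr)      # Eq. (16)
    return 1.0 / dis                 # Eq. (17)

# toy 3 -> 2 projection that discards the third variable
P = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
sim = local_similarity(np.array([1.0, 0.0, 5.0]),
                       np.array([0.0, 0.0, -5.0]), P)
print(sim)  # 1.0: only the distance in the latent space matters
```

Note that the large difference in the third variable does not affect the similarity, which is the point of measuring distance in the supervised latent space rather than in the raw input space.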

[Figure 3 about here.]

5.3 Integrated similarities

In order to account for both the between-sample similarity and the mode information, a new similarity measure is obtained by integrating the between-sample similarities and the window confidences. It should be noted that, because the moving windows overlap, each sample may be assigned to several windows, which makes it difficult to decide which local similarity to use in the integration. Assume that the rth training sample is assigned to K different moving windows; hence there are K different between-sample similarities between this training sample and the query sample. To handle this problem, the integrated similarity is defined as follows:

Sim_r = Σ_{θ=1}^{K} WS_θ · Sim_{θ,r}        (18)

where Sim_r represents the integrated global similarity between the rth training sample and the query sample. In fact, many kinds of integration strategies can be used to combine the window confidences and the between-sample similarities, for example, simply summing them up. By multiplying them, however, both the window confidence and the between-sample similarity must be relatively large to produce a significant similarity, whereas other strategies such as summation are not as effective. In the integration, the local similarity Sim_{θ,r} is given a large weight when the corresponding window confidence is large. On the contrary, the local similarity is regarded as unreliable if the window does not have a strong relationship with the query sample. Eq.(18) thus balances the different local similarities obtained from different moving windows for the same pair of samples. Compared with the local similarities, the global similarity can handle similarity measurement when the process is time-varying. Furthermore, it should be noted that the SNGLS parameters can be obtained in advance in the offline procedure, which greatly reduces the computational complexity.

The schematic of the global similarity measurement is shown in Figure 4.


[Figure 4 about here.]
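The confidence-weighted sum of Eq.(18) can be illustrated with a minimal sketch (the numbers below are arbitrary toy values):

```python
import numpy as np

def global_similarity(WS, Sim_r):
    """Eq. (18): confidence-weighted sum of the local similarities of one
    training sample over the K windows it belongs to."""
    return float(np.asarray(WS) @ np.asarray(Sim_r))

# The unreliable window (confidence 0.02) contributes little even though
# its local similarity is large, which is the behavior argued for above.
sim_global = global_similarity(WS=[0.9, 0.02], Sim_r=[2.0, 50.0])  # ~ 2.8
```

With a plain sum instead of the product form, the spurious local similarity of 50.0 would have dominated the result.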

5.4 Output prediction using locally weighted structure

On the basis of the previous subsection, the global similarities of all training samples with respect to the current query sample are obtained as SIM = [Sim_1, Sim_2, …, Sim_r, …, Sim_n]. These similarities are then taken as weights and assigned to the training samples for online modeling and output prediction. The locally weighted training datasets X_LW and Y_LW can be obtained as follows:

X_LW = diag(SIM) X_O,  Y_LW = diag(SIM) Y_O        (19)

Substituting the modified training datasets into Eq.(7) gives

X_LW = T_LW P_LW^T + E_LW,  Y_LW = U_LW Q_LW^T + F_LW        (20)

where T_LW, U_LW, P_LW, Q_LW, E_LW and F_LW are the corresponding parameters, and the unmixing matrix Ŵ_LW can be obtained simultaneously. The regression parameter Θ between the key variables and the extracted independent score vector t = x_q Ŵ_LW can be derived using least squares regression:

Ŷ = Θ^T t + e        (21)

Θ = (Ŵ_LW^T Σ_{X_LW} Ŵ_LW)^{−1} Ŵ_LW^T Σ_{X_LW Y_LW}

where Σ_{X_LW Y_LW}, Ŵ_LW and Σ_{X_LW} denote the covariance matrix, the whitening matrix and the variance matrix of the modified training data, respectively, and Ŷ represents the predicted output variables.
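The row-weighting of Eq.(19) and the subsequent regression of Eq.(21) can be sketched with ordinary least squares standing in for the SNGLS latent step, which is not reproduced here; the function below is an illustration of the weighting mechanics only.

```python
import numpy as np

def locally_weighted_fit_predict(X, Y, sim, xq):
    """Sketch of Eqs. (19)-(21): scale each training row by its global
    similarity, then estimate the regression parameter by least squares.
    (sqrt is taken so the squared-error objective is sim-weighted.)"""
    w = np.sqrt(np.asarray(sim, dtype=float))
    Xlw = X * w[:, None]                                 # Eq. (19), row-wise weighting
    Ylw = Y * w[:, None]
    theta, *_ = np.linalg.lstsq(Xlw, Ylw, rcond=None)    # Eq. (21) analogue
    return xq @ theta

# with uniform weights and exactly linear data the true slope is recovered
X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = 2.0 * X
yq = locally_weighted_fit_predict(X, Y, np.ones(4), np.array([1.5]))
# yq is approximately [3.0] (true slope 2 evaluated at x = 1.5)
```

Assigning a near-zero similarity to a sample effectively removes it from the fit, which is exactly how the global similarities localize the model around the query sample.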

In the latent structure of Eq.(20), both the sample importance and the variable importance are considered in the locally weighted SNGLS, which makes it more applicable to the soft sensing of multimode processes. Latent variables that best describe the input-output relationship are chosen to measure the sample similarity, while the sample importance is represented by the different weights assigned to the training samples. Compared with other locally weighted methods, the proposed method can better handle the variable


prediction when the process data are non-Gaussian distributed and time-varying. In the next section, the effectiveness of the proposed method is demonstrated on a simulation benchmark and a real industrial process. The main steps of the locally weighted non-Gaussian regression are summarized as follows:

1) Divide the training samples into a series of moving windows;
2) Calculate the window confidence between each moving window and the query sample using Eq.(14);
3) Establish a series of SNGLS models using the data in each moving window;
4) Project the training samples in each window and the query sample onto the corresponding SNGLS;
5) Calculate the local similarities using Eq.(16) and Eq.(17);
6) Calculate the global similarities using Eq.(18);
7) Assign the global similarities as weights to each training sample;
8) Construct the locally weighted non-Gaussian latent structure using Eq.(20);
9) Predict the key variables using Eq.(21);
10) Move to the next query sample and repeat steps 3) to 9) until all online samples are predicted.
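The steps above can be strung together on synthetic data as follows. This is only a structural sketch under stated substitutions: a PCA-style window projection stands in for the SNGLS (steps 3-4) and a Gaussian distance to the window mean stands in for the SVDD confidence (step 2), so the numbers it produces are not those of the actual method.

```python
import numpy as np

def predict_query(X, Y, xq, L=20, M=4):
    """Structural sketch of steps 1)-9) for a single query sample."""
    n = X.shape[0]
    sims = np.zeros(n)
    for s in range(0, n - L + 1, M):                  # step 1: overlapping windows
        Xw, idx = X[s:s + L], np.arange(s, s + L)
        center = Xw.mean(axis=0)
        ws = np.exp(-np.sum((xq - center) ** 2))      # step 2: crude window confidence
        # steps 3-4: one latent direction per window (top right singular vector)
        _, _, Vt = np.linalg.svd(Xw - center, full_matrices=False)
        tq, tw = (xq - center) @ Vt[0], (Xw - center) @ Vt[0]
        dis = (tq - tw) ** 2 + 1e-12                  # step 5: Eq. (16)
        sims[idx] += ws / dis                         # steps 5-6: Eqs. (17)-(18)
    w = sims / sims.sum()                             # step 7: normalized weights
    theta, *_ = np.linalg.lstsq(X * w[:, None], Y * w[:, None], rcond=None)
    return xq @ theta                                 # steps 8-9: weighted prediction

X = np.linspace(0.0, 10.0, 50).reshape(-1, 1)
Y = 2.0 * X                                           # underlying relationship y = 2x
yq = predict_query(X, Y, np.array([5.0]))
# yq is approximately [10.0]
```

Step 10 is simply a loop over this function for each incoming query sample; in the real method the window models and SVDD classifiers are trained once offline, so only the projections and the final weighted regression are computed online.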

6. Case study

6.1 Case 1: TE benchmark

The Tennessee Eastman (TE) industrial process has been widely used to test the performance of process monitoring and soft sensing methods. The TE process includes 41 measurement variables and 12 manipulated variables in 5 basic operation units; a detailed description of the process can be found in Ref. 33. In order to test the proposed method, 10000 training samples operated under two stable modes and one transition mode are collected. For the purpose of soft sensor modeling, process variables with small variations are excluded in order to avoid singularity. As a result, 28 variables are selected as input variables, and the


A constituent in stream 6 is selected as the output variable, as shown in Table 1. Hence, the dimension of the training data is 10000×29. Meanwhile, another 10000 samples, collected under the same operating conditions as the training data, are chosen as the test samples, so the dimension of the online query samples is 10000×28. According to Ref. 32, the window length L and the window step M are set to 20 and 4, respectively. For the SNGLS method, the two tuning parameters are set to 0.8 and 0.2 according to Ref. 28.

[Table 1 about here.]

For comparison, the LWNGR (locally weighted non-Gaussian regression) and LWPLS (locally weighted partial least squares) methods are also tested.28 In LWNGR and LWPLS, the sample similarities are obtained based on the Euclidean distance; these similarities are then designated as weights and assigned to each training sample. The prediction results are shown in Figure 5, with subplot (a) showing the results of the proposed method and subplots (b) and (c) showing the results of LWNGR and LWPLS, respectively. It can be seen that the proposed method tracks the quality variable well even when fluctuations occur around the 5000th to 6000th sample points.

On the other hand, LWNGR can also track the quality variable, although not as well as the proposed method. In sharp contrast, the performance of LWPLS deteriorates when sudden changes occur in the process condition. This is expected: the basic assumption of the LWPLS method, namely that the process data follow a Gaussian distribution, is not valid, which makes the similarity measure in LWPLS unreliable.

[Figure 5 about here.]

To better evaluate the performance, the root mean squared error (RMSE) and the R^2 statistic are considered, which are defined as follows:

RMSE = sqrt( (1/N_test) Σ_{i=1}^{N_test} (ŷ_i − y_i)^2 )

R^2 = 1 − [ Σ_{i=1}^{N_test} (y_i − ŷ_i)^2 ] / [ Σ_{i=1}^{N_test} (y_i − ȳ)^2 ]        (22)

where y_i, ȳ and ŷ_i are the real value, the average value and the estimated value, respectively, and N_test is the number of test samples. A higher RMSE value indicates worse accuracy, and a higher R^2 value indicates better accuracy. The RMSE and R^2 values of all methods are shown in Table 2. It can be seen that the proposed method achieves the best prediction performance in terms of both the RMSE and R^2 values.
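The two criteria of Eq.(22) are straightforward to compute; a minimal implementation is:

```python
import numpy as np

def rmse(y, yhat):
    """Eq. (22), first part: root mean squared prediction error."""
    y, yhat = np.asarray(y), np.asarray(yhat)
    return float(np.sqrt(np.mean((yhat - y) ** 2)))

def r2(y, yhat):
    """Eq. (22), second part: 1 - SSE / SST."""
    y, yhat = np.asarray(y), np.asarray(yhat)
    return float(1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2))

y = [1.0, 2.0, 3.0, 4.0]
yhat = [1.0, 2.0, 3.0, 5.0]
print(rmse(y, yhat), r2(y, yhat))  # 0.5 0.8
```

Note that R^2 can be negative (as for LWPLS in Table 2) whenever the predictions are worse than simply using the mean of the test outputs.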


[Table 2 about here.]

6.2 Case 2: Pre-decarburization absorption unit

In this subsection, the pre-decarburization absorption unit of a real industrial ammonia synthesis process is considered. This unit is a critical component of the ammonia synthesis process, which produces NH3. In the ammonia synthesis process, hydrogen is considered one of the most critical variables. Hydrogen is mainly generated from the raw material methane; therefore, the pretreatment section of the ammonia synthesis process includes a methane transforming procedure. Classical methane transformation usually includes three parts: the pre-reformer, the primary reformer and the secondary reformer. In practical operation, the transformation mainly takes place in the primary reformer. The raw natural gas is a mixture of CH4 and CO2, and the carbon dioxide should be eliminated from the natural gas. The pre-decarburization absorption unit aims at absorbing the carbon dioxide in the raw natural gas, and this unit is expected to be optimized in order to enhance the performance of the hydrogen production process. For more details about the ammonia process, interested readers can refer to Ref. 33.

In this case study, the content of residual CO2 (AI15002.PV) is considered as the quality variable, and


another 20 process variables are collected as input variables, which are shown in Table 3.

[Table 3 about here.]

In this case, a training dataset containing 6400 samples with fluctuations in the supplies is considered. Another 6500 samples, collected under the same conditions as the training data, are used as test samples. Again, LWNGR and LWPLS are introduced for comparison. The prediction results are shown in Figure 6, with subplot (a) showing the results of the proposed method and subplots (b) and (c) showing the results of LWNGR and LWPLS, respectively.

In Figure 6(c), the trends of the key variable can be roughly tracked by LWPLS (such as the big changes around the 1000th sample point) owing to the locally weighted structure. However, the prediction deviations are relatively large across the whole process. This is expected: in the LWPLS method, the similarities between the training samples and the query sample are measured by the Euclidean distance, which may be inaccurate due to the non-Gaussianity of the process data. Furthermore, the latent structure of PLS is constructed by maximizing the covariance between the predictor variables and the quality variable, which neglects the higher-order information. In Figure 6(b), a similar situation occurs when the LWNGR method is applied. Compared with the LWPLS method, the non-Gaussianity of the process variables is considered in the latent space construction, and the prediction performance is accordingly improved over that of LWPLS; the residual content of CO2 can be tracked well when the process is relatively stable. Unfortunately, the sample selection steps in LWPLS and LWNGR remain the same, making it difficult to track the frequent changes occurring in the pre-decarburization unit. This can be seen from the comparison between Figure 6(a) and Figure 6(b). The results indicate that the proposed method can better predict the content of residual CO2 even when strong fluctuations occur in the process. In order to better evaluate the prediction performance,


the RMSE and R^2 values are also considered and shown in Table 4, which further confirms that the proposed method outperforms both LWNGR and LWPLS.

[Figure 6 about here.]

[Table 4 about here.]

7. Conclusion

In this work, a novel quality-related locally weighted non-Gaussian regression method is proposed for multimode processes. First, the SNGLS is proposed to depict the variable relationships in non-Gaussian distributed data; both higher-order and lower-order information are considered in the latent space construction. Second, a locally weighted approach is introduced to cope with the time-varying characteristics of the process. Different from the classical JITL method and its variants, sample selection is implemented in a supervised manner using the latent space of the SNGLS. A moving window strategy is applied to separate the whole process into several overlapping segments, each of which is described by an SNGLS. Each pair of training sample and query sample is then projected onto the relevant SNGLS models to obtain a series of local similarities, while the window confidence is evaluated by the SVDD method. On this basis, the final similarity is derived by combining the local similarities with the window confidences. These similarities are taken as weights and assigned to the original data, and the resulting locally weighted dataset is adopted for the non-Gaussian regression. Finally, the effectiveness of the algorithm is validated on the TE benchmark and the pre-decarburization absorption unit. The results show that the proposed method predicts the key variables well.

Acknowledgement

This work is supported by the Zhejiang Provincial Natural Science Foundation (LQ19F030007) and the National Natural Science Foundation of China (61673358).


References

1. Kim, K.; Lee, J.-M.; Lee, I.-B. A novel multivariate regression approach based on kernel partial least squares with orthogonal signal correction. Chemometrics and Intelligent Laboratory Systems 2005, 79 (1–2), 22–30.
2. Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems 2001, 58 (2), 109–130.
3. Zhang, Y.; Zhang, Y. Complex process monitoring using modified partial least squares method of independent component regression. Chemometrics and Intelligent Laboratory Systems 2009, 98 (2), 143–148.
4. Massy, W. F. Principal Components Regression in Exploratory Statistical Research. Journal of the American Statistical Association 1965, 60 (309), 234–256.
5. Heidelberg, S. B. Principal Component Regression. Betascript Publishing, 2010; p 1954.
6. Surhone, L. M.; Timpledon, M. T.; Marseken, S. F. Principal Component Regression. Springer Berlin Heidelberg, 2013; p 1954.
7. Abdi, H. Partial Least Square Regression (PLS-Regression). Encyclopedia of Measurement & Statistics, 2007.
8. Liu, J.; Chen, D. S.; Shen, J. F. Development of Self-Validating Soft Sensors Using Fast Moving Window Partial Least Squares. Industrial & Engineering Chemistry Research 2016, 49 (22), 11530–11546.
9. Rosipal, R.; Trejo, L. J. Kernel partial least squares regression in reproducing kernel Hilbert space. Journal of Machine Learning Research 2002, 2, 97–123.
10. Bai, Y.; Jian, X.; Long, Y. Kernel Partial Least-Squares Regression. In IEEE International Joint Conference on Neural Networks Proceedings, 2006; pp 1231–1238.
11. Feng, R.; Shen, W.; Shao, H. A soft sensor modeling approach using support vector machines. In Proceedings of the American Control Conference, 2003; Vol. 5, pp 3702–3707.
12. Zheng, X. X.; Feng, Q. Soft Sensor Modeling Based on PCA and Support Vector Machines. Journal of System Simulation 2006, 18 (3), 739–741.
13. Ge, Z.; Song, Z. A comparative study of just-in-time-learning based methods for online soft sensor modeling. Chemometrics and Intelligent Laboratory Systems 2010, 104 (2), 306–317.
14. Cybenko, G. Just-in-Time Learning and Estimation. NATO ASI, 1996.
15. Cheng, C.; Chiu, M.-S. A new data-based methodology for nonlinear process modeling. Chemical Engineering Science 2004, 59 (13), 2801–2810.
16. Cheng, C.; Chiu, M.-S. Nonlinear process monitoring using JITL-PCA. Chemometrics and Intelligent Laboratory Systems 2005, 76 (1), 1–13.
17. Chen, M.; Khare, S.; Huang, B. A unified recursive just-in-time approach with industrial near infrared spectroscopy application. Chemometrics and Intelligent Laboratory Systems 2014, 135, 133–140.
18. Chen, M.; Khare, S.; Huang, B.; Zhang, H.; Lau, E.; Feng, E. Recursive Wavelength-Selection Strategy to Update Near-Infrared Spectroscopy Model with an Industrial Application. Industrial & Engineering Chemistry Research 2013, 52 (23), 7886–7895.
19. Chang, S. Y.; Baughman, E. H.; McIntosh, B. C. Implementation of Locally Weighted Regression to Maintain Calibrations on FT-NIR Analyzers for Industrial Processes. Applied Spectroscopy 2001, 55 (9), 1199–1206.
20. Yuan, X.; Huang, B.; Ge, Z.; Song, Z. Double locally weighted principal component regression for soft sensor with sample selection under supervised latent structure. Chemometrics and Intelligent Laboratory Systems 2016, 153, 116–125.
21. Zhao, C.; Gao, F.; Wang, F. An improved independent component regression modeling and quantitative calibration procedure. AIChE Journal 2010, 56 (6), 1519–1535.
22. Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Physical Review E 2004, 69 (6), 066138.
23. Rashid, M. M.; Yu, J. A new dissimilarity method integrating multidimensional mutual information and independent component analysis for non-Gaussian dynamic process monitoring. Chemometrics and Intelligent Laboratory Systems 2012, 115, 44–58.
24. Chen, K.; Liang, Y.; Gao, Z.; Liu, Y. Just-in-Time Correntropy Soft Sensor with Noisy Data for Industrial Silicon Content Prediction. Sensors 2017, 17 (8), 1830.
25. Wang, Z.; Isaksson, T.; Kowalski, B. R. New approach for distance measurement in locally weighted regression. Analytical Chemistry 1994, 66 (2), 249–260.
26. Zhu, Z.; Song, Z.; Palazoglu, A. Transition Process Modeling and Monitoring Based on Dynamic Ensemble Clustering and Multiclass Support Vector Data Description. Industrial & Engineering Chemistry Research 2011, 50 (24), 13969–13983.
27. Yao, M.; Wang, H.; Xu, W. Batch process monitoring based on functional data analysis and support vector data description. Journal of Process Control 2014, 24 (7), 1085–1097.
28. Zeng, J.; Xie, L.; Kruger, U.; Gao, C. A non-Gaussian regression algorithm based on mutual information maximization. Chemometrics and Intelligent Laboratory Systems 2012, 111 (1), 1–19.
29. Chiang, L. H. Data-driven Methods for Fault Detection and Diagnosis in Chemical Processes, 2000.
30. Tsimpiris, A.; Vlachos, I.; Kugiumtzis, D. Nearest neighbor estimate of conditional mutual information in feature selection. Expert Systems with Applications 2012, 39 (16), 12697–12708.
31. Van Hulle, M. M. Edgeworth Approximation of Multivariate Differential Entropy. MIT Press, 2005; pp 1903–1910.
32. Zhu, Z.; Song, Z.; Palazoglu, A. Process pattern construction and multi-mode monitoring. Journal of Process Control 2012, 22 (1), 247–262.
33. Filippi, E. Ammonia Synthesis Process. US Patent, 2006.


List of Figures

Figure 1. The flowchart of the proposed non-Gaussian regression method
Figure 2. The schematic of process localization and window confidence calculation
Figure 3. The schematic of between-sample local similarity measurement
Figure 4. The schematic of global similarity measurement
Figure 5. Comparison of prediction performance between the three methods in the TE process
Figure 6. Comparison of prediction performance between the three methods in the pre-decarburization absorption unit

Figure 1. The flowchart of the proposed non-Gaussian regression method

Figure 2. The schematic of process localization and window confidence calculation

Figure 3. The schematic of between-sample local similarity measurement

Figure 4. The schematic of global similarity measurement

Figure 5. Comparison of prediction performance between the three methods in the TE process: (a) the proposed method; (b) LWNGR; (c) LWPLS

Figure 6. Comparison of prediction performance between the three methods in the pre-decarburization absorption unit: (a) the proposed method; (b) LWNGR; (c) LWPLS

List of Tables

Table 1. The predictor variables and output variable in the TE process
Table 2. The comparison of RMSE and R^2 statistics in the TE process
Table 3. The predictor variables and output variable in the pre-decarburization absorption unit
Table 4. The comparison of RMSE and R^2 statistics in the pre-decarburization absorption unit

Table 1. The predictor variables and output variable in the TE process

No.  Predictor variable
1    A feed (stream 1)
2    D feed (stream 2)
3    E feed (stream 3)
4    A & C feed (stream 4)
5    Recycle flow (stream 8)
6    Reactor feed rate (stream 6)
7    Reactor pressure
8    Reactor level
9    Reactor temperature
10   Purge rate (stream 9)
11   Product separator temperature
12   Product separator level
13   Product separator pressure
14   Product separator underflow (stream 10)
15   Stripper level
16   Stripper pressure
17   Stripper underflow (stream 11)
18   Stripper temperature
19   Stripper steam flow
20   Reactor cooling water outlet temperature
21   Separator cooling water outlet temperature
22   D feed flow (stream 2)
23   E feed flow (stream 3)
24   A feed flow (stream 1)
25   A & C feed flow (stream 4)
26   Separator pot liquid flow (stream 10)
27   Stripper liquid product flow (stream 11)
28   Reactor cooling water flow

Output variable: A constituent in stream 6

Table 2. The comparison of RMSE and R^2 statistics in the TE process

                 The proposed method   LWNGR    LWPLS
RMSE value       0.1418                0.2497   1.1431
R^2 statistic    0.9026                0.6981   -5.3266

Table 3. The predictor variables and output variable in the pre-decarburization absorption unit

Tag            Predictor variable
FI15001.PV     The Flow-rate of Feed NG
LI15001.PV     The Level of 15-F001
PDI15001.PV    The Pressure Difference of 15-F001
PI15001.PV     The Pressure of Feed NG
TI15001.PV     The Temperature of Feed NG
LIC15002.PV    The Level of 15-F002
PDI15002.PV    The Pressure Difference of 15-C001
PIC15002.PV    The Pressure of Process Gas at 15-F001
TI15002.PV     The Temperature of Process Gas at 15-F002
PI15003.PV     The Pressure of Process Gas at 15-F002
TI15003.PV     The Temperature of Process Gas at 15-C001
LI15004.PV     The Level #1 of 15-C001
PI15004.PV     The Pressure of Process Gas to 15-C001
LI15005.PV     The Level #2 of 15-C001
TI15005.PV     The Temperature in the Middle of 15-C001
LI15006.PV     The Level #3 of 15-C001
PI15006.PV     The Pressure of Process Gas at the Top of 15-C001
TI15006.PV     The Temperature of Amine Liquor to 15-C001
TI15007.PV     The Temperature of Process Gas at the Top of 15-C001
LIC15010.PV    The Level of Regeneration Column

Tag            Output variable
AI15002.PV     The content of Residual CO2 in the process gas

Table 4. The comparison of RMSE and R^2 statistics in the pre-decarburization absorption unit

                 The proposed method   LWNGR    LWPLS
RMSE value       0.0667                0.1872   0.2845
R^2 statistic    0.9572                0.6644   0.2135