Article pubs.acs.org/IECR

Soft Sensor Development Based on the Hierarchical Ensemble of Gaussian Process Regression Models for Nonlinear and Non-Gaussian Chemical Processes

Li Wang,† Huaiping Jin,*,† Xiangguang Chen,*,† Jiayu Dai,† Kai Yang,†,‡ and Dongxiang Zhang†

† Department of Chemical Engineering, Beijing Institute of Technology, Beijing 100081, People’s Republic of China
‡ Beijing Research & Design Institute of Rubber Industry, Beijing 100143, People’s Republic of China



Supporting Information

ABSTRACT: Chemical processes are often characterized by nonlinearity, non-Gaussianity, shifting modes, and inherent uncertainty, which pose significant challenges for accurate quality prediction. Therefore, a novel soft sensor based on the hierarchical ensemble of Gaussian process regression models (HEGPR) is developed for the quality variable prediction of nonlinear and non-Gaussian chemical processes. The method first creates a set of diverse input variable sets based on multiple random resampling data sets and a partial mutual information criterion. Then, a set of sample partition based ensemble Gaussian process regression (SP-EGPR) models is built from the different input variable sets and the corresponding subspace training data sets using the Gaussian mixture model. Next, the influential local SP-EGPR models retained after partial least-squares (PLS) pruning are used for the first level of ensemble learning. Finally, the second level of ensemble learning is achieved by integrating the high-performance predictions from the local SP-EGPR models into the overall prediction mean and variance by Bayesian inference and the finite mixture mechanism. The usefulness and superiority of the proposed HEGPR soft sensor are verified with the Tennessee Eastman chemical process and an industrial rubber-mixing process.

Received: January 18, 2016
Revised: May 13, 2016
Accepted: June 15, 2016
Industrial & Engineering Chemistry Research
DOI: 10.1021/acs.iecr.6b00240 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
© XXXX American Chemical Society

1. INTRODUCTION
In chemical processes, accurate real-time measurements of quality variables are highly desirable because they play an important role in process control, monitoring, and optimization. However, such key variables are traditionally measured through either infrequent offline analysis or expensive online measuring devices, which may lead to poor quality and control performance, high cost, and even safety issues.1 Over the last two decades, soft sensor technology, also known as inferential estimation or virtual sensing, has gained fast-growing attention in both academia and industry.2−5 The critical quality variables can be estimated online with predictive models instead of hardware instruments or offline laboratory analysis. Moreover, these virtual sensors are essentially computational programs and are thus economically feasible and easy to implement in real-world applications.

The heart of a soft sensor is the predictive model that describes the mathematical relationship between the target variable (difficult-to-measure variable) and the secondary variables (easy-to-measure variables). In general, soft sensors can be classified into two groups: first-principles models and data-driven models. The former are desirable for industrial operations because they are easy to interpret. However, they require in-depth mechanistic knowledge of chemical processes, which is often unavailable for complex industrial processes. In contrast, data-driven soft sensors rely only on the operating data and use the easy-to-measure variables as model inputs for online prediction. Modern instrument and measurement techniques allow large amounts of process data to be collected, stored, and analyzed, making the data-driven soft sensor a promising solution to online quality estimation. Therefore, a variety of data-driven soft sensors have been developed and applied to chemical processes.6−8

The most common conventional data-driven soft sensors used in chemical plants are multivariate statistical techniques such as principal component analysis (PCA)9−11 and partial least-squares (PLS),12,13 which project the original process variables onto a latent space and build the predictive models within a low-dimensional space. These approaches have gained popularity because of their capability of handling the strong collinearity between process variables. However, chemical processes are usually characterized by complex nonlinearity, and thus, the resulting linear models become ill-suited and perform poorly. Therefore, linear methods are often combined with kernel functions, such as kernel PLS and kernel PCA,14−16 to handle process nonlinearity. Apart from multivariate statistical analysis, some machine learning techniques, such as artificial neural network (ANN),17,18 support vector machine (SVM),19−21 and Gaussian process regression (GPR),22−25 have also been successfully introduced to soft sensor applications.

While a variety of soft sensors have been developed for quality prediction of chemical processes, many of them rely on a single global model to achieve a universal generalization performance.26,27 Such methods rely on the assumption that the operation mode is constant throughout the whole process. In practice, however, industrial chemical processes often exhibit multimode/multiphase behaviors due to changes of operating conditions or product demands.28−30 Thus, the global models may lead to inaccurate predictions in local operation regions. Alternatively, local learning methods have attracted increasing interest in soft sensor development due to their superiority in dealing with strong nonlinearity in multiphase/multimode processes.31−34 Compared with the global modeling methods, local learning based soft sensors employ the philosophy of “divide and conquer” to build locally valid prediction models. The most common local learning methods are just-in-time (JIT) learning and the multimodel method.27,35,36 A JIT learning model is dynamically built as the need arises, and the local model is discarded after the prediction is obtained. However, due to the high computational complexity, JIT learning based soft sensors are not so competitive in terms of their real-time performance. For a multimodel method, a collection of local models is constructed offline over different local domains, and the overall prediction results are obtained either by selecting the most relevant local model in a deterministic way or by combining multiple local models using certain weighting strategies. The latter approach is known as ensemble learning, which is the focus of the present work.

Ensemble learning has been proven to be a valuable tool for developing soft sensors.37,38 It combines a set of local predictions to obtain the final prediction instead of relying on a single local model, which can remarkably improve the prediction accuracy of the soft sensors, especially for unstable models that are sensitive to initializations and small changes of process data, such as neural network and rule learning algorithms.39 The first task of an ensemble learning algorithm is to partition the process data into several subsets. The way of creating local partitions depends on the algorithm. Conventional approaches for this purpose include bagging,40,41 boosting,42 clustering,43 and the subspace method.44 The first three methods manipulate the training samples, and the last one manipulates the input variables to construct the local models.

To date, both manipulation operations have been applied to many different learning algorithms to create ensembles.45,46 In practical applications, however, the local models of many available ensemble soft sensors are built only by manipulating the training samples, and the diversity of input variables is ignored. In practice, the input variable selection for soft sensor modeling often exhibits a degree of uncertainty due to differences in personal knowledge and experience or in the data sets used to evaluate the relevance between variables. In addition, a single input variable set is not enough to capture all of the complex characteristics of a chemical process, and thus, the resulting ensemble models cannot consistently perform well. Therefore, it is appealing to combine the sample and variable based partition methods in an attempt to enhance the prediction accuracy of ensemble soft sensors.

To address the issues mentioned above, a novel soft sensor based on the hierarchical ensemble of Gaussian process regression models (HEGPR) is proposed in the present work. The HEGPR method employs a hierarchical model structure to perform the ensemble at two levels, including the sample partition (SP) based ensemble by manipulating training samples and the variable partition (VP) based ensemble by manipulating input variables. At the first level of ensemble in the HEGPR approach, multiple SP based ensemble GPR models (SP-EGPR) are built with diverse input variable sets selected by a random resampling strategy and the partial mutual information (PMI) criterion. The resulting subspace training data set of each input variable set is divided into different local domains by the Gaussian mixture model (GMM) method to build multiple GPR models for the isolated local regions. The posterior probability of any new test sample with respect to the different local models can be estimated by the Bayesian inference strategy, and the local models are further combined by the finite mixture mechanism. To ensure the diversity and efficiency of the input variable sets, PLS regression is used to select the SP-EGPR models that perform well and exhibit little redundancy. At the second level of ensemble, the local prediction means and variances obtained from the well-performing SP-EGPR models are integrated to produce the final prediction mean and variance, again by Bayesian inference and the finite mixture mechanism.

The hierarchical ensemble learning model makes the HEGPR soft sensing method especially useful for the modeling problems of multimode/multiphase chemical processes due to its sample partition based ensemble. In addition, the HEGPR model is insensitive to the input variable selection due to the diverse input variable sets built by the variable partition strategy. Therefore, the proposed HEGPR soft sensor is potentially able to provide superior prediction accuracy over the conventional single level based ensemble methods.

The rest of this paper is organized as follows. Section 2 briefly reviews GPR, the PMI criterion, and GMM. The novel HEGPR soft sensing method is described in detail in Section 3. In Section 4, the effectiveness and superiority of HEGPR are demonstrated by comparing its performance for the Tennessee Eastman chemical process and an industrial rubber-mixing process with those of conventional soft sensor methods. Conclusions are drawn in Section 5.

Figure 1. Flow diagram of the SP-EGPR method.

Figure 2. Flow diagram of the proposed HEGPR soft sensor.

Figure 3. Flow diagram of the Tennessee Eastman process.

2. PRELIMINARIES
In this section, GPR, the PMI criterion, and GMM are briefly reviewed.

2.1. Gaussian Process Regression. Consider a training data set consisting of N data points $\{\mathbf{x}_i, y_i\}_{i=1}^{N}$. The regression model characterizing the functional dependence of the output variable on the input variables can be formulated as

$$y = f(\mathbf{x}) + \varepsilon \tag{1}$$

where ε is Gaussian noise with zero mean and variance $\sigma_n^2$ and f(·) represents the unknown functional dependence. The Gaussian process for regression is defined such that the output observations obey a Gaussian prior distribution with zero mean, which can be expressed as

$$\mathbf{y} \sim \mathcal{N}(\mathbf{0}, \mathbf{C}) \tag{2}$$

where $\mathbf{C}$ denotes the N × N covariance matrix with the ijth element defined by a covariance function, $C_{ij} = C(\mathbf{x}_i, \mathbf{x}_j; \boldsymbol{\theta})$. Various covariance functions have been proposed.23,47 In our work, the Matérn covariance function with the noise term is selected as shown in the following:

$$C(\mathbf{x}_i, \mathbf{x}_j; \boldsymbol{\theta}) = \sigma_f^2\left(1 + \frac{\sqrt{3}\,\lVert \mathbf{x}_i - \mathbf{x}_j \rVert}{l}\right)\exp\left(-\frac{\sqrt{3}\,\lVert \mathbf{x}_i - \mathbf{x}_j \rVert}{l}\right) + \delta_{ij}\sigma_n^2 \tag{3}$$

where $\delta_{ij} = 1$ if i = j and $\delta_{ij} = 0$ otherwise; $\boldsymbol{\theta} = \{\sigma_f, l, \sigma_n^2\}$ denotes a set of non-negative hyper-parameters that define the covariance function and differentiate GPR from other parametric regression methods. The main task of training a GPR model is to estimate the hyper-parameter set $\boldsymbol{\theta}$. By applying a Bayesian approach, $\boldsymbol{\theta}$ can be obtained by maximizing the log-likelihood function given as

$$\log p(\mathbf{y}|\mathbf{X}) = -\frac{1}{2}\mathbf{y}^T\mathbf{C}^{-1}\mathbf{y} - \frac{1}{2}\log|\mathbf{C}| - \frac{N}{2}\log(2\pi) \tag{4}$$

The optimal hyper-parameters can be calculated from the partial derivative of the log-likelihood with respect to each hyper-parameter θ as follows:

$$\frac{\partial[\log p(\mathbf{y}|\mathbf{X})]}{\partial\theta} = -\frac{1}{2}\mathrm{tr}\left(\mathbf{C}^{-1}\frac{\partial\mathbf{C}}{\partial\theta}\right) + \frac{1}{2}\mathbf{y}^T\mathbf{C}^{-1}\frac{\partial\mathbf{C}}{\partial\theta}\mathbf{C}^{-1}\mathbf{y} \tag{5}$$

where $\partial\mathbf{C}/\partial\theta$ can be obtained from the covariance function and tr(·) is an operator used to calculate the trace of a matrix.

The GPR model is a flexible, probabilistic, and nonparametric model that can provide not only the prediction result (the mean) but also the confidence level (the variance) to an operator. For a new test sample $\mathbf{x}_*$, the output of the GPR model obeys a Gaussian distribution with the mean ($\hat{y}_*$) and variance ($\sigma_*^2$), as follows:

$$\hat{y}_* = \mathbf{k}_*^T\mathbf{C}^{-1}\mathbf{y} \tag{6}$$

$$\sigma_*^2 = C(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^T\mathbf{C}^{-1}\mathbf{k}_* \tag{7}$$
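The prediction step of eqs 3, 6, and 7 can be illustrated with a short numerical sketch. It is not the authors' implementation: the function names `matern32` and `gpr_predict`, the fixed hyper-parameter values, and the use of a Cholesky factorization in place of an explicit inverse are all assumptions made for illustration, and the hyper-parameter optimization of eqs 4 and 5 is left out.

```python
import numpy as np

def matern32(A, B, sigma_f, length):
    """Matern covariance of eq 3 between the rows of A and B, without the noise term."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    r = np.sqrt(3.0) * np.sqrt(sq) / length
    return sigma_f**2 * (1.0 + r) * np.exp(-r)

def gpr_predict(X, y, X_new, sigma_f=1.0, length=1.0, sigma_n=0.1):
    """Predictive mean (eq 6) and variance (eq 7) for each row of X_new.

    The hyper-parameters are fixed here (an assumption); the paper
    estimates them by maximizing the log-likelihood of eq 4.
    """
    C = matern32(X, X, sigma_f, length) + sigma_n**2 * np.eye(len(X))
    K_star = matern32(X_new, X, sigma_f, length)      # each row is k_*^T
    L = np.linalg.cholesky(C)                         # C = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # C^{-1} y
    mean = K_star @ alpha                             # eq 6
    V = np.linalg.solve(L, K_star.T)                  # L^{-1} k_*
    # Assumption: C(x_*, x_*) includes the noise term, as in eq 3 with i = j.
    c_star = sigma_f**2 + sigma_n**2
    var = c_star - (V**2).sum(axis=0)                 # eq 7
    return mean, var
```

Calling `gpr_predict(X, y, X_new)` with `X` of shape (N, d), `y` of shape (N,), and `X_new` of shape (M, d) returns two arrays of length M. Far from the training data the mean falls back to the zero prior mean and the variance approaches `sigma_f**2 + sigma_n**2`.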

In eqs 6 and 7, $\mathbf{k}_* = [C(\mathbf{x}_*, \mathbf{x}_1), C(\mathbf{x}_*, \mathbf{x}_2), ..., C(\mathbf{x}_*, \mathbf{x}_N)]^T$ is the covariance vector between the new sample and the training samples.

2.2. Partial Mutual Information. The PMI method provides a measure of the partial dependence between a new input and the output given a set of preselected inputs. Compared with the mutual information criterion, the PMI method can more effectively handle input redundancy and improve the computational efficiency by removing unnecessary variables.48 Given a set of pre-existing predictors Z, the PMI between the variables Y and X can be defined as

$$\mathrm{PMI}(X, Y|Z) = \iint f_{X',Y'}(x', y')\ln\left[\frac{f_{X',Y'}(x', y')}{f_{X'}(x')f_{Y'}(y')}\right]dx'\,dy' \tag{8}$$

where

$$x' = x - E[x|Z]; \quad y' = y - E[y|Z] \tag{9}$$

and E[·] denotes the expectation operator. Subtracting the conditional expectations removes the effect of the existing predictors Z, so that the resulting variables x′ and y′ represent only the residual information in the variables X and Y. The true PMI values between the variables are often unknown in practical applications. Alternatively, a sample estimate of the PMI score can be formulated as49

$$\mathrm{PMI}(X, Y|Z) = \frac{1}{N}\sum_{i=1}^{N}\ln\left[\frac{f_{X',Y'}(x_i', y_i')}{f_{X'}(x_i')f_{Y'}(y_i')}\right] \tag{10}$$

where $x_i' = x_i - E[x_i|Z]$ and $y_i' = y_i - E[y_i|Z]$ represent the residual components of the ith data pair, i = 1, ..., N, and $f_{X'}(x_i')$, $f_{Y'}(y_i')$, and $f_{X',Y'}(x_i', y_i')$ are the marginal and joint probability densities, respectively.

2.3. Gaussian Mixture Models. GMM is an effective probabilistic approach for data clustering.50 GMM assumes that all data points follow a mixture of a finite number of Gaussian distributions with unknown parameters. The input data $\mathbf{x} \in \mathbb{R}^{N \times d}$ can be assumed to follow a C-component Gaussian mixture distribution given by

$$P(\mathbf{x}|\Theta) = \sum_{i=1}^{C}\pi_i\,\mathcal{N}(\mathbf{x}|\Theta_i) \tag{11}$$

where C denotes the number of Gaussian components, $\Theta = \{\pi_1, ..., \pi_C, \boldsymbol{\mu}_1, ..., \boldsymbol{\mu}_C, \boldsymbol{\Sigma}_1, ..., \boldsymbol{\Sigma}_C\}$ is the vector of the parameters of the Gaussian mixture model, and $\pi_i$ is the prior probability of the ith Gaussian component. The mean vector $\boldsymbol{\mu}_i$ and covariance matrix $\boldsymbol{\Sigma}_i$ specify a multivariate Gaussian distribution $\mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$, and the corresponding probability density function is as follows:

$$P(\mathbf{x}|\Theta_i) = \frac{1}{\sqrt{(2\pi)^d\det(\boldsymbol{\Sigma}_i)}}\exp\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T\boldsymbol{\Sigma}_i^{-1}(\mathbf{x} - \boldsymbol{\mu}_i)\right] \tag{12}$$

The GMM model parameters can be estimated with a modified expectation maximization (E-M) algorithm.51,52 With the finite mixture model established, the posterior probability of $\mathbf{x}$ with respect to the ith component can be calculated by

$$P(\Theta_i|\mathbf{x}) = \frac{\pi_i\,\mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)}{\sum_{k=1}^{C}\pi_k\,\mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)} \tag{13}$$

The mixing coefficients satisfy

$$\sum_{i=1}^{C}\pi_i = 1, \quad 0 \le \pi_i \le 1 \tag{14}$$

3. PROPOSED HEGPR SOFT SENSOR
In this section, we develop the HEGPR soft sensor. Briefly, multiple SP-EGPR models are first built with diverse input variable sets as the base-level ensemble members and are then used as the inputs for the high-level ensemble of the model. The PMI criterion based input variable selection is presented in Section 3.1, followed by the SP-EGPR method in Section 3.2. The HEGPR soft sensing framework is presented in Section 3.3, and the parameter selection of the HEGPR model is discussed in Section 3.4. The implementation procedure of the HEGPR soft sensor is summarized in Section 3.5.

3.1. Input Variable Selection. The aim of input variable selection is to select the “best” subset of input variables, also called predictors, to ensure the estimation accuracy and reliability of a soft sensor. Unnecessary predictors may degrade performance by introducing noise or collinearity into the estimation. A soft sensor works satisfactorily and economically only if the most relevant variables are measured and employed. However, input variable selection remains challenging in practice. To address this issue, Ge proposed a way of constructing diverse input variable sets based on principal component decomposition.53 In the present work, the PMI criterion presented in Section 2.2 is used to select the best input variables for building an HEGPR soft sensor.

The true PMI values between variables are usually unknown; thus, they have to be estimated from the available data. In this work, a k-nearest neighbor estimator is employed.54 Two critical parameters are needed for the PMI based input variable selection: the number of neighbors (k) and the PMI threshold. A small k results in a large variance in the estimation. A large k gives a small variance but may drive the PMI estimates toward zero, so that highly dependent variables can no longer be distinguished from irrelevant ones. Therefore, k is determined by K-fold resampling and a permutation test.55 The PMI threshold decides when the input variable selection stops and thus significantly influences the selection results. In the present work, the termination criterion is determined based on a confidence limit.56,57 First, p different arrangements of the independent variables are created by bootstrapping. The αth percentile PMI score between the randomized variable and the output variable is then calculated and used as the confidence limit to determine whether the input variable has a significant dependence on the output. A PMI score of the original input variable greater than this threshold indicates significant dependence between the candidate variable and the output variable, conditional on the preselected variables. In the present application, p = 100 and α = 95% are used.

The PMI based selection procedure of input variables is briefly summarized as follows:
Step 1: Initialize the input variables as $S = \{X_i\}_{i=1}^{d}$ and the set of selected input variables as Z = ∅.
Step 2: Compute the PMI values between the candidate input and output variables, given the selected input variables Z.
Step 3: Sort the PMI values calculated in the previous step. If the highest PMI score is greater than the 95th percentile confidence limit, add the corresponding input variable to the selected set, that is, set Z ← Z + {Xi} and S ← S − {Xi}.
Step 4: Repeat Steps 2 and 3 until the highest PMI score is less than the 95th percentile confidence limit, and then finish the input variable selection.

3.2. SP-EGPR Modeling. Chemical processes are often characterized by strong nonlinearity, non-Gaussian behavior, and multiple operation modes/phases. Therefore, a conventional global model may not effectively capture the characteristics of local process regions, leading to poor predictions of quality variables. To tackle this problem, SP-EGPR is developed, where the local domains are built by GMM clustering and the local models are combined by Bayesian inference and the finite mixture mechanism. SP-EGPR is then used for creating the first level of ensemble in the HEGPR modeling framework.

3.2.1. Identification of Local Domains. A complex chemical process with multiple operation modes or phases first needs to be divided into different local domains for ensemble modeling. It is assumed that the process data within a local domain are in the same mode/phase and have similar prediction characteristics. The multimode/multiphase process is then partitioned by a GMM model into local domains, which are defined by a set of local training data sets $\{\mathbf{X}^{(1)}, \mathbf{y}^{(1)}\}, \{\mathbf{X}^{(2)}, \mathbf{y}^{(2)}\}, ..., \{\mathbf{X}^{(C)}, \mathbf{y}^{(C)}\}$.

3.2.2. Construction of Local GPR Models. The local GPR models can then be built with the local domain data sets denoted as $\mathcal{M}_i = \{\mathbf{X}^{(i)}, \mathbf{y}^{(i)}\}$, i = 1, 2, ..., C. The local GPR model is a probabilistic model, and its output is a predictive Gaussian distribution specified by a mean and a variance. The mean is the required prediction, and the variance can be interpreted as the confidence level of the model.58 For any new test sample $\mathbf{x}_*$, the localized prediction output $\hat{y}_{*,i}$ of the ith local model GPR_i follows a Gaussian distribution:

$$\hat{y}_{*,i} \sim \mathcal{N}(E(\hat{y}_{*,i}), \mathrm{Var}(\hat{y}_{*,i})) \tag{15}$$

Thus, the localized GPR models can be formulated as

$$\mathrm{GPR}_i: \begin{cases} \hat{y}_{*,i} = E(\hat{y}_{*,i}) = \mathbf{k}_{*,i}^T\mathbf{C}^{-1}\mathbf{y} \\ \sigma_{*,i}^2 = \mathrm{Var}(\hat{y}_{*,i}) = C(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_{*,i}^T\mathbf{C}^{-1}\mathbf{k}_{*,i} \end{cases}, \quad i = 1, 2, ..., C \tag{16}$$

Here $\mathbf{C}$, $\mathbf{y}$, and $\mathbf{k}_{*,i}$ are computed from the ith local data set.

3.2.3. Mixture of Local GPR Models. For a new test sample, a set of local prediction means and variances can be obtained from the different local models GPR_i, i = 1, 2, ..., C. These local predictions are combined to provide the final output by Bayesian inference and the finite mixture mechanism.26 During online operation, the posterior probabilities of $\mathbf{x}_*$ with respect to the different local domains $\mathcal{M}_i$ (i = 1, 2, ..., C) can be estimated by the Bayesian inference strategy as follows:

$$P(\mathcal{M}_i|\mathbf{x}_*) = \frac{P(\mathbf{x}_*|\mathcal{M}_i)P(\mathcal{M}_i)}{P(\mathbf{x}_*)} = \frac{P(\mathbf{x}_*|\mathcal{M}_i)P(\mathcal{M}_i)}{\sum_{k=1}^{C}P(\mathbf{x}_*|\mathcal{M}_k)P(\mathcal{M}_k)}, \quad i = 1, 2, ..., C \tag{17}$$

where $P(\mathbf{x}_*|\mathcal{M}_i)$ is the conditional probability calculated from eq 12 and $P(\mathcal{M}_i)$ is the prior probability

$$P(\mathcal{M}_i) = \pi_i, \quad i = 1, 2, ..., C \tag{18}$$

The prediction output $\hat{y}_*$ of an SP-EGPR model for a new sample $\mathbf{x}_*$ is given as a weighted sum of the local prediction outputs by the finite mixture mechanism, that is

$$\hat{y}_* = E(y_*) = \sum_{i=1}^{C}P(\mathcal{M}_i|\mathbf{x}_*)\,\hat{y}_{*,i} \tag{19}$$

where $\hat{y}_{*,i}$ is the prediction value of the ith local model and $P(\mathcal{M}_i|\mathbf{x}_*)$ is the corresponding posterior probability. Similarly, the mixture variance $\sigma_*^2$ is calculated as follows:

$$\sigma_*^2 = \mathrm{Var}(y_*) = E(y_* - E(\hat{y}_*))^2 = \sum_{i=1}^{C}P(\mathcal{M}_i|\mathbf{x}_*)\,E(y_* - \hat{y}_*)^2 \tag{20}$$

Writing $y_* - \hat{y}_* = (y_* - \hat{y}_{*,i}) + (\hat{y}_{*,i} - \hat{y}_*)$ and expanding the square, eq 20 can be rewritten as

$$\sigma_*^2 = \sum_{i=1}^{C}P(\mathcal{M}_i|\mathbf{x}_*)\left[E(y_* - \hat{y}_{*,i})^2 + E(\hat{y}_{*,i} - \hat{y}_*)^2 + E(2(y_* - \hat{y}_{*,i})(\hat{y}_{*,i} - \hat{y}_*))\right] \tag{21}$$

It can be concluded from eq 17 that the posterior probability satisfies

$$\sum_{i=1}^{C}P(\mathcal{M}_i|\mathbf{x}_*) = 1 \tag{22}$$

Thus,

$$\sum_{i=1}^{C}P(\mathcal{M}_i|\mathbf{x}_*)\,E(y_* - \hat{y}_{*,i})^2 = \sum_{i=1}^{C}P(\mathcal{M}_i|\mathbf{x}_*)\,\sigma_{*,i}^2 \tag{23}$$

$$\sum_{i=1}^{C}P(\mathcal{M}_i|\mathbf{x}_*)\,E(\hat{y}_{*,i} - \hat{y}_*)^2 = \sum_{i=1}^{C}P(\mathcal{M}_i|\mathbf{x}_*)\,(\hat{y}_{*,i} - \hat{y}_*)^2 \tag{24}$$

$$\sum_{i=1}^{C}P(\mathcal{M}_i|\mathbf{x}_*)\,E(2(y_* - \hat{y}_{*,i})(\hat{y}_{*,i} - \hat{y}_*)) = 0 \tag{25}$$

where $\sigma_{*,i}^2$ is the prediction variance calculated from the ith local GPR model. Finally, the overall variance is given as

$$\sigma_*^2 = \sum_{i=1}^{C}P(\mathcal{M}_i|\mathbf{x}_*)\left[\sigma_{*,i}^2 + (\hat{y}_{*,i} - \hat{y}_*)^2\right] \tag{26}$$

3.3. HEGPR Modeling. The overall HEGPR modeling consists of three critical steps: (i) creating multiple input variable sets; (ii) building local SP-EGPR models; (iii) combining local SP-EGPR models.

3.3.1. Creating Multiple Input Variable Sets. The ensemble at the first level of HEGPR can be easily realized by the SP-EGPR method. The ensemble at the second level is achieved by manipulating input variables to create diversity in the input variable selection, which enables one to construct a hierarchical model structure. Consider an initial training data set with N observations $D = \{\mathbf{x}_i, y_i\}$, i = 1, ..., N, where $\mathbf{x}_i$ consists of a large number of potential input variables. Two steps are required to produce the essential $N_{\mathrm{IVS}}$ input variable sets (IVS). First, $N_{\mathrm{IVS}}$ subsets of training data $D_1, ..., D_{N_{\mathrm{IVS}}}$, each of which contains $N_{\mathrm{RS}}$ samples, are built by random resampling. Each data point is selected with a probability of 1/N. Second, $N_{\mathrm{IVS}}$ different input variable sets are obtained from the $N_{\mathrm{IVS}}$ resampled data sets by the PMI criterion.
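Both ensemble levels rely on the same finite mixture combination of local Gaussian predictions. A minimal sketch of eqs 17, 19, and 26 for a single query sample is given below; the function name `mixture_predict` and its argument names are hypothetical, and the conditional probabilities are assumed to have been evaluated beforehand (for example, from eq 12).

```python
import numpy as np

def mixture_predict(likelihoods, priors, local_means, local_vars):
    """Combine local predictions for one query sample.

    likelihoods : conditional probabilities of the query under each local model
    priors      : prior probabilities of the local models (eq 18)
    local_means : local prediction means
    local_vars  : local prediction variances
    """
    likelihoods = np.asarray(likelihoods, dtype=float)
    priors = np.asarray(priors, dtype=float)
    local_means = np.asarray(local_means, dtype=float)
    local_vars = np.asarray(local_vars, dtype=float)

    joint = likelihoods * priors
    post = joint / joint.sum()                     # posterior weights, eq 17
    mean = np.sum(post * local_means)              # mixture mean, eq 19
    # Mixture variance, eq 26: within-model variance plus spread of the local means.
    var = np.sum(post * (local_vars + (local_means - mean) ** 2))
    return post, mean, var
```

For example, `mixture_predict([0.2, 0.6], [0.5, 0.5], [1.0, 3.0], [0.1, 0.3])` gives posterior weights 0.25 and 0.75, a mixture mean of 2.5, and a mixture variance of 1.0. The second term of eq 26 makes the overall variance grow when the local models disagree, even if each is individually confident.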

Table 1. Input and Output Variables of the Soft Sensor for Quality Prediction
symbol   variable description
x1       D feed flow (stream 2)
x2       E feed flow (stream 3)
x3       A feed flow (stream 1)
x4       A and C feed flow (stream 4)
x5       purge valve (stream 9)
x6       separator pot liquid flow (stream 10)
x7       stripper liquid product flow (stream 11)
x8       reactor cooling water flow
x9       condenser cooling water flow
x10      A feed (stream 1)
x11      D feed (stream 2)
x12      E feed (stream 3)
x13      A and C feed (stream 4)
x14      recycle flow (stream 8)
x15      reactor feed rate (stream 6)
x16      reactor pressure
x17      reactor level
x18      reactor temperature
x19      purge rate (stream 9)
x20      product separator temperature
x21      product separator level
x22      product separator pressure
x23      product separator underflow (stream 10)
x24      stripper level
x25      stripper pressure
x26      stripper underflow (stream 11)
x27      stripper temperature
x28      stripper steam flow
x29      compressor work
x30      reactor cooling water outlet temperature
x31      separator cooling water outlet temperature
y1       E component measurement (stream 9)
y2       B component measurement (stream 9)
y3       C component measurement (stream 6)

Table 2. Operation Modes of the TE Process
mode   G/H mass ratio   production rate (m3/h)
1      54/46            22.89
2      40/60            22.89
3      40/60            19.45
4      54/46            19.45
5      46/54            22.89

Table 3. Selected Variables from the Whole Training Data by PMI Criterion
output   selected input variables
y1       x1, x2, x3, x5, x6, x9, x11, x12, x14, x15, x16, x17, x18, x19, x21, x22, x24, x27, x29, x31
y2       x1, x2, x3, x11, x12, x14, x15, x17, x19, x21, x22, x24, x29, x31
y3       x2, x3, x5, x11, x14, x15, x16, x17, x18, x21, x23, x24, x29

Figure 4. PMI values of the selected variables and their corresponding 95th percentile randomized sample PMI scores from the whole training data for quality variables y1, y2, and y3.
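The stepwise selection of Section 3.1, which yields variable lists such as those in Table 3 and the comparison against randomized scores in Figure 4, can be sketched as follows. This is a simplified illustration under loudly stated assumptions, not the paper's estimator: the conditional expectations are replaced by a linear least-squares fit, the k-nearest neighbor PMI estimate is replaced by the closed-form mutual information of two Gaussian variables, and the randomized variables are generated by random permutation. All function names are hypothetical.

```python
import numpy as np

def residual(v, Z):
    """v minus a least-squares fit on the columns of Z (stand-in for v - E[v|Z])."""
    if Z.shape[1] == 0:
        return v - v.mean()
    A = np.column_stack([np.ones(len(v)), Z])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef

def dependence_score(x_res, y_res):
    """Gaussian mutual information of two residuals (stand-in for the kNN PMI estimate)."""
    rho = np.corrcoef(x_res, y_res)[0, 1]
    return -0.5 * np.log(max(1.0 - rho**2, 1e-12))

def select_inputs(X, y, p=100, alpha=95, seed=0):
    rng = np.random.default_rng(seed)
    candidates = list(range(X.shape[1]))       # Step 1: S holds all inputs, Z is empty
    selected = []
    while candidates:
        Z = X[:, np.array(selected, dtype=int)]
        y_res = residual(y, Z)
        # Step 2: score every candidate given the already selected inputs.
        scores = {j: dependence_score(residual(X[:, j], Z), y_res) for j in candidates}
        best = max(scores, key=scores.get)
        # Confidence limit: alpha-th percentile of the scores of p shuffled copies.
        x_res = residual(X[:, best], Z)
        null = [dependence_score(rng.permutation(x_res), y_res) for _ in range(p)]
        if scores[best] <= np.percentile(null, alpha):
            break                              # Step 4: stop, no significant candidate left
        selected.append(best)                  # Step 3: move the best candidate from S to Z
        candidates.remove(best)
    return selected
```

Residualizing both the candidate and the output on the already selected inputs is what keeps redundant variables from being picked twice: a candidate that merely duplicates a selected input has almost no residual dependence left on the output.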

results of NIVS local SP-EGPR models are used as the input data and the actual output samples of the target variable are used as the output data, which are denoted as

Unlike conventional soft sensors developed only from a single input variable set, the HEGPR method employs a set of diverse combinations of input variables. Therefore, it is able to reduce the dependence and sensitivity of the prediction performance to the input variable selection and thus potentially improve the prediction accuracy. 3.3.2. Building Local SP-EGPR Models. Once multiple input variable sets {31, ..., 3 NIVS} are constructed, NIVS subspace data sets with the corresponding input variables can then be obtained. These variable partition based input data sets can be denoted as {X1, ..., XNIVS} with Xi{i = 1, ..., NIVS} corresponding to the input variables in 3 i . NIVS local SP-EGPR models can be developed with the local data sets {Xi, y}, i = 1, ... NIVS. Another issue of this step is how to remove local SP-EGPR models with redundancy and/or low performance. Actually, a good ensemble model should build independent subspace models which are not only accurate but also diverse. Therefore, PLS analysis is used for the pruning of local SP-EGPR models in this work. To build the PLS regression model, the prediction

XPLS = [y ̂ ,1 , ..., y ̂ , N ] * * IVS

(27)

y = [y1 , ..., yN ]T

(28)

where XPLS ∈ 9 N × NIVS and y ∈ 9 N × 1 are the input and output matrices, respectively. The ith input vector is ŷ*,i = [ŷ*,i1, ŷ*,i2, ..., ŷ*,iN]T, where ŷ*,ij is the prediction value of the ith SP-EGPR model for the jth training sample. A linear regression model between XPLS and y can be obtained by the PLS algorithm as follows: y ̂ = β0 + β1y ̂ ,1 + β2y ̂ ,2 + ··· + βi y ̂ , i + ··· + βN y ̂ , N IVS * IVS * * * * (29)

where ŷ*,i is the prediction output of the ith SP-EGPR model and βi is the regression coefficient indicating the importance of the corresponding local SP-EGPR model. To determine which F

DOI: 10.1021/acs.iecr.6b00240 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research

Figure 5. Selected variables for each input variable set in the HEGPR soft sensor of quality variable y1 (NRS = 800, NIVS = 40).

Table 4. Quantitative Comparison of Prediction RMSE and R2 from Different Soft Sensor Modeling Methods y1

y2 2

method

RMSE

R

GPR SP-EGPR VP-EGPR HEGPR1 HEGPR2 HEGPR

0.3452 0.3367 0.3123 0.3658 0.3366 0.2964

0.9857 0.9864 0.9883 0.9838 0.9864 0.9895

RMSE

R

0.2693 0.2434 0.1942 0.2475 0.2211 0.1832

0.9760 0.9804 0.9875 0.9797 0.9838 0.9889

M

∑i = 1 |βs, i| N

∑k =IVS1 |βk |

RMSE

R2

0.5587 0.5534 0.5229 0.4669 0.4962 0.4843

0.9589 0.9596 0.9640 0.9713 0.9675 0.9691

3.3.3. Combining Local SP-EGPR Models. A set of local SP-EGPR models for different input variable sets can be constructed during the training phase as described above. The real prediction of a new test sample is unknown. Therefore, the optimization of suitable models and their combination is important. Two strategies including the deterministic and ensemble methods are available to give the final prediction. The former chooses only one best local model. It performs well in handling multimode/multiphase operations. However, it may not be able to efficiently characterize the between-mode/ between-phase process dynamics.27 Therefore, the ensemble method has attracted growing attention in practical applications.14,59,60 In addition, recent studies have shown that the ensemble of a few individual local models, also called the selective ensemble, may have better generalization capability than simply combining all available local models.1,61 Thus, the selective ensemble strategy is used in the present work for the ensemble at the second level of HEGPR. In what follows, the combination of local SP-EGPR models is described in the Bayesian framework. Given a query sample x*,

model is retained for the ensemble at the second level of HEGPR, a contribution ratio (CR) is introduced on the basis of the regression coefficients in the present work. The absolute values of the regression coefficients β are sorted in a descending order of βs = [|βs,1|, |βs,2|, ..., |βs,NIVS|], where |βs,1| and |βs,NIVS| are the maximum and minimal values, respectively. Then, the CR value can be calculated as CR =

y3 2

(30)

where M denotes the final number of local SP-EGPR models used in the HEGPR model. A larger CR value indicates that more local models are retained. The presence of unimportant and/or redundant local models may lead to performance degradation instead of improvement and increase the computational complexity of the HEGPR model. Thus, an appropriate CR threshold, for example, 95%, is required to enable automatic selection of local SP-EGPR models. G

DOI: 10.1021/acs.iecr.6b00240 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Article

Industrial & Engineering Chemistry Research

Figure 6. Trends of the actual values, prediction values, confidence intervals, and prediction errors of quality variable y1 using the HEGPR soft sensor (NRS = 800, NIVS = 40).

M pairs of local prediction means and variances are estimated with the corresponding local SP-EGPR models. Then, Me local predictions with low prediction variances are selected for the ensemble during online operation. The mixture predictions are provided by Bayesian inference and a finite mixture mechanism as follows:

$$\hat{y}_{*} = \sum_{i=1}^{M_e} P(\mathrm{IVS}_i \mid x_{*})\, \hat{y}_{*,i} \tag{31}$$

$$\sigma_{*}^{2} = \sum_{i=1}^{M_e} P(\mathrm{IVS}_i \mid x_{*}) \left[ \sigma_{*,i}^{2} + (\hat{y}_{*,i} - \hat{y}_{*})^{2} \right] \tag{32}$$

where P(IVSi|x*) is the posterior probability of x* with respect to the ith local SP-EGPR model resulting from the ith input variable set and ŷ*,i and σ*,i2 are the predicted output and variance of the ith local SP-EGPR model. By Bayes' rule, P(IVSi|x*) is given as

$$P(\mathrm{IVS}_i \mid x_{*}) = \frac{P(x_{*} \mid \mathrm{IVS}_i)\, P(\mathrm{IVS}_i)}{\sum_{k=1}^{M_e} P(x_{*} \mid \mathrm{IVS}_k)\, P(\mathrm{IVS}_k)} \tag{33}$$

where P(x*|IVSk) is the conditional probability and P(IVSi) is the prior probability. Without any process or expert knowledge, P(IVSi) of each local model is assumed to be equal:

$$P(\mathrm{IVS}_i) = \frac{1}{M_e}, \quad i = 1, 2, \ldots, M_e \tag{34}$$

P(x*|IVSk) is then evaluated from the prediction variance as follows:

$$P(x_{*} \mid \mathrm{IVS}_k) = \exp\left( -\gamma\, \frac{\sigma_{*,k}^{2}}{\hat{y}_{*,k}} \right) \tag{35}$$

where γ ≥ 0 is a tuning parameter that controls the weights.

3.4. Parameter Selection of the HEGPR Soft Sensor. Several critical parameters can significantly influence the prediction accuracy as well as the model complexity of the HEGPR soft sensor and, thus, need to be carefully selected. These parameters include (i) the number of initial input variable sets (NIVS); (ii) the number of samples within a random resampling data set (NRS); (iii) the contribution ratio (CR); (iv) the number of latent variables (LV) for PLS regression; (v) the tuning parameter (γ) in eq 35; and (vi) the number of local SP-EGPR models involved in the ensemble of each prediction (Me). Among these parameters, NIVS is determined by trial and error, CR is set to 95%, and LV is selected by 5-fold cross validation with the prediction results of various SP-EGPR models as the training data for the PLS regression. The remaining


Figure 7. Trends of the actual values, prediction values, confidence intervals, and prediction errors of quality variable y2 using the HEGPR soft sensor (NRS = 600, NIVS = 40).

parameters including NRS, γ, and Me can be optimized by 5-fold cross validation on the whole training data.

3.5. Implementation of the HEGPR Soft Sensor. Figure 1 shows the flow diagram of the SP-EGPR method used to create the base-level ensemble models of the HEGPR soft sensor. Figure 2 shows the flow diagram of the HEGPR modeling. The detailed step-by-step procedure of the HEGPR soft sensor is summarized below:
(i) Collect the process input and output data for model training.
(ii) Create NIVS sets of training data by the random resampling method.
(iii) Build NIVS input variable sets using the PMI criterion and then construct NIVS sets of subspace data sets with different input variables.
(iv) Estimate the GMM models of each input variable set from the isolated subspace data set to divide the process into local domains.
(v) Build the localized GPR models of each identified local domain from Step (iv).
(vi) Develop NIVS SP-EGPR models based on the established input variable sets and conduct the first level ensemble of the HEGPR model.
(vii) Select M local SP-EGPR models showing good performances and with small redundancy by the PLS regression method.
(viii) Estimate the local prediction means and variances of a new test sample by the corresponding SP-EGPR models.
(ix) Select Me best local SP-EGPR models resulting in low prediction variances and combine the corresponding local prediction means and variances by Bayesian inference and the finite mixture mechanism.
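Steps (viii) and (ix) can be sketched as follows for a single query sample. This is a minimal sketch under stated assumptions, not the authors' code: the local means and variances are taken as given, the likelihood of eq 35 is read as exp(−γσ²/ŷ) (which presumes positive predicted outputs), and the function name `combine_local_predictions` is hypothetical. Because the priors are equal, they cancel in Bayes' rule and the posterior weights reduce to the normalized likelihoods.

```python
import numpy as np

def combine_local_predictions(means, variances, gamma=1.0, m_e=None):
    """Combine local SP-EGPR predictions for one query sample.

    means, variances: local prediction means and variances (length M).
    gamma: tuning parameter (>= 0) controlling how strongly low variance is rewarded.
    m_e: number of lowest-variance local predictions to keep (defaults to all M).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if m_e is None:
        m_e = means.size
    # Keep the m_e local predictions with the lowest prediction variance.
    keep = np.argsort(variances, kind="stable")[:m_e]
    mu, var = means[keep], variances[keep]
    # Likelihood from the variance-to-prediction ratio (assumed reading of eq 35).
    likelihood = np.exp(-gamma * var / mu)
    # Equal priors cancel, so the posterior is the normalized likelihood.
    posterior = likelihood / likelihood.sum()
    # Finite-mixture mean and variance of the ensemble prediction.
    y_hat = float(np.sum(posterior * mu))
    sigma2 = float(np.sum(posterior * (var + (mu - y_hat) ** 2)))
    return y_hat, sigma2, keep, posterior

# Illustrative numbers only, not values from the paper.
y_hat, sigma2, keep, weights = combine_local_predictions(
    [10.0, 10.0, 12.0], [0.1, 0.1, 5.0], gamma=1.0, m_e=2
)
print(y_hat, sigma2, keep, weights)
```

With γ = 0 all retained local models receive equal weight; larger γ shifts weight toward the local models that are most confident at the query point.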

4. APPLICATION EXAMPLE
The performance of the proposed HEGPR soft sensing method is verified with two application examples using two performance indices, the root-mean-square error (RMSE) and the coefficient of determination (R2):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N_{\mathrm{test}}} \sum_{i=1}^{N_{\mathrm{test}}} (\hat{y}_i - y_i)^2} \tag{36}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N_{\mathrm{test}}} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{N_{\mathrm{test}}} (y_i - \bar{y})^2} \tag{37}$$

where Ntest represents the number of testing samples, ŷi and yi are the ith predicted and actual measurements, respectively, and ȳ denotes the mean of the actual measurements. RMSE indicates the prediction accuracy of the soft sensor model, and R2 gives the proportion of the total variance in the output data that is explained by the soft sensor model.
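For reference, the two indices can be computed directly from their definitions. The snippet below is a small sketch with hypothetical function names (`rmse`, `r_squared`) and made-up numbers; it is not tied to the data sets used in this work.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error over the test samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(y_true, y_pred), r_squared(y_true, y_pred))
```

A lower RMSE and an R2 closer to 1 both indicate better agreement between predictions and measurements.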


Figure 8. Trends of the actual values, prediction values, confidence intervals, and prediction errors of quality variable y3 using the HEGPR soft sensor (NRS = 800, NIVS = 40).
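The confidence intervals plotted alongside the predictions follow from the prediction mean and variance. This section does not state the confidence level or the distributional assumption used for the figures, so the sketch below assumes a Gaussian approximation of the predictive distribution and a 95% level; the function name `confidence_interval` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def confidence_interval(mean, variance, level=0.95):
    """Two-sided interval mean +/- z * sqrt(variance) under a Gaussian assumption."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% level
    mean = np.asarray(mean, dtype=float)
    half_width = z * np.sqrt(np.asarray(variance, dtype=float))
    return mean - half_width, mean + half_width

# Illustrative values only: a prediction of 10 with variance 4.
lower, upper = confidence_interval([10.0], [4.0])
print(lower, upper)
```

A wide interval flags a query sample for which the online prediction should be treated with caution.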

The methods involved in the following investigation include:
(i) GPR (Gaussian process regression): a global GPR model.
(ii) SP-EGPR (sample partition based ensemble GPR model): a single-level ensemble model constructed by manipulating training samples.
(iii) VP-EGPR (variable partition based ensemble GPR model): a single-level ensemble model constructed by manipulating input variables.
(iv) HEGPR1 (hierarchical ensemble of GPR models): a two-level ensemble model constructed by the principal component decomposition presented by Ge53 and GMM.
(v) HEGPR2 (hierarchical ensemble of GPR models): a two-level ensemble model constructed by manipulating both training samples and input variables with PLS's fixed regression coefficients.
(vi) HEGPR (hierarchical ensemble of GPR models): a two-level ensemble model constructed by manipulating both training samples and input variables, as proposed in this paper.
The configuration of the computer is as follows: OS, Windows XP (32 bit); CPU, Pentium(R) Dual-Core E6600 (3.06 GHz × 2); RAM, 1.96 GB; software, Matlab 2010a.

4.1. Tennessee Eastman Process. 4.1.1. Process Description. The Tennessee Eastman (TE) process is a simulation

Table 5. Effects of the Number of Random Resampling Samples (NRS) and the Number of Initial Input Variable Sets (NIVS) on Prediction Results of Target Variable y1a

Prediction results are reported for γ = 1.

| NRS | NIVS | M (CR = 95%) | Me | RMSE | R2 |
|---|---|---|---|---|---|
| 600 | 10 | 9 | 8 | 0.3281 | 0.9871 |
| 600 | 20 | 17 | 4 | 0.3153 | 0.9881 |
| 600 | 30 | 22 | 6 | 0.3111 | 0.9884 |
| 600 | 40 | 31 | 3 | 0.3013 | 0.9891 |
| 600 | 50 | 40 | 11 | 0.3104 | 0.9884 |
| 800 | 10 | 7 | 3 | 0.3191 | 0.9878 |
| 800 | 20 | 15 | 7 | 0.3191 | 0.9878 |
| 800 | 30 | 22 | 4 | 0.2985 | 0.9893 |
| 800 | 40 | 31 | 6 | 0.2964 | 0.9895 |
| 800 | 50 | 39 | 7 | 0.3156 | 0.9881 |
| 1000 | 10 | 9 | 6 | 0.3166 | 0.9880 |
| 1000 | 20 | 16 | 6 | 0.3125 | 0.9883 |
| 1000 | 30 | 23 | 12 | 0.3103 | 0.9885 |
| 1000 | 40 | 32 | 5 | 0.3121 | 0.9883 |
| 1000 | 50 | 38 | 13 | 0.3080 | 0.9886 |

a M denotes the number of local SP-EGPR models in the HEGPR model after PLS based pruning. Me denotes the number of local SP-EGPR models involved in the ensemble of each prediction. CR denotes the contribution ratio for PLS pruning of local SP-EGPR models. γ is a tuning parameter in eq 35.


sets are acquired at a sampling interval of 12 min in 5 different operation modes due to the changes in the G/H mass ratio and stripper underflow (production rate) (Table 2). Other set points during the simulation are held at the system default values. The training and test data sets consist of 1350 and 451 samples, respectively.

4.1.2. Prediction Results and Discussion. To build different soft sensors, input variables are first selected by the PMI criterion. The GPR and SP-EGPR models employ a single input variable set selected from the whole training data, as tabulated in Table 3. Figure 4 shows the PMI values of the selected variables and the corresponding 95th percentile PMI scores from the whole training data. As can be seen, the PMI value of the selected variable approaches the 95th percentile PMI score as the input variable selection proceeds. This indicates that the dependence between the remaining candidate variables and the output variable gradually declines as more variables are selected. The variable selection procedure stops only when the PMI value is lower than the 95th percentile PMI score. Unlike the GPR and SP-EGPR models, VP-EGPR and HEGPR apply a set of diverse input variable sets selected by random resampling and the PMI criterion. The diversity of input variables is assured by the random resampling of training data, while PLS pruning is used to remove unimportant and redundant input variable sets. For example, the input variable sets for online prediction of quality variable y1 are illustrated in Figure 5. As can be seen, the input variables from different local training subsets differ from each other, which shows that variable selection is sensitive to the training samples. To test the prediction capabilities of the proposed HEGPR soft sensor, five operation modes are involved as presented in Table 2.

A quantitative comparison of the prediction performance from different soft sensors is listed in Table 4. Clearly, the GPR model gives the worst prediction performance due to its global model structure. The single-level ensemble models, i.e., SP-EGPR and VP-EGPR, improve the estimation accuracy by manipulating training samples and input variables, respectively, to construct local models. As shown, the HEGPR1 soft sensor based on the principal component decomposition for the block division method gives better prediction performance for quality variable y3, but for quality variables y1 and y2, the prediction performance is very poor. Similarly, although HEGPR2 predicts better than the global model (GPR) and the single-level ensemble models (SP-EGPR and VP-EGPR), the weights redetermined for each query sample perform better than PLS's fixed regression coefficients. The result shows that, although the PLS regression can to some extent ensure the diversity and efficiency of input variable sets, there are still some poor local predictions for different test samples. In comparison, the HEGPR soft sensor proposed in this paper performs much better than the global model (GPR), the single-level ensemble models (SP-EGPR and VP-EGPR), and the other two hierarchical ensemble models. The improved prediction performance of HEGPR is due to its hierarchical ensemble strategy, which manipulates both training samples and input variables. On one hand, the sample partition (i.e., manipulation of samples) based ensemble can effectively handle the multimode/multiphase characteristics of chemical processes and thus provides more accurate predictions. On the other hand, the variable partition (i.e., manipulation of input variables) ensemble allows one to reduce the sensitivity of the

Figure 9. Selected local SP-EGPR models for each run of ensemble prediction using the HEGPR soft sensor during the online prediction of quality variable y1 (NIVS = 40, NRS = 800).

Figure 10. Prediction RMSE values of quality variable y1 using the HEGPR soft sensor before and after PLS pruning (NIVS = 40, CR = 95%).

of an industrial chemical process.62 The TE process has been widely used to develop, study, and evaluate algorithms for quality prediction and process monitoring and control.51,63,64 Figure 3 shows the flow diagram of the TE process. The process consists of five major operation units including an exothermic two-phase reactor, a product condenser, a vapor−liquid separator, a recycle compressor, and a product stripper. Two products (G and H) and one byproduct are produced via four irreversible and exothermic reactions with four reactants (A, C, D, and E). Among the 41 measurements of the process, 22 variables are continuous measurements and the rest are the chemical compositions from gas chromatography. Besides, there are 12 manipulated variables in the process. In the present study, 9 manipulated variables and 22 continuous measurements are selected as the initial inputs. The remaining 3 manipulated variables are removed because they remain constant. Three components, denoted as B, C, and E, are chosen as the outputs for soft sensor development. All the variables are tabulated in Table 1. The input and output data


Figure 11. Prediction RMSE values of local SP-EGPR models in the HEGPR soft sensor (NIVS = 40 for all 3 quality variables, and NRS = 800 for y1 and y3, NRS = 600 for y2). The red line represents the RMSE value of HEGPR, and the blue one represents the RMSE values of local SP-EGPR models. M is the number of local SP-EGPR models in the HEGPR model after PLS pruning.

input variable sets NIVS) on the prediction performance of HEGPR are investigated, as listed in Table 5. Given different NRS and NIVS, the number of local SP-EGPR models M in the HEGPR model is automatically determined after PLS pruning with CR = 95%, and then γ and Me are selected by 5-fold cross validation. It can be seen that the model performance of HEGPR depends on both NRS and NIVS. The lowest RMSE and highest R2 are obtained with NRS = 800 and NIVS = 40. NRS should be neither too small nor too large. A small NRS may lead to poor performance of the local SP-EGPR models because the training information is insufficient for the PMI based variable selection. Conversely, a large NRS cannot effectively ensure the diversity of input variable sets, which the ensemble learning of HEGPR needs to enhance the prediction accuracy. With regard to the creation of diverse input variable sets, a relatively large NIVS is preferable. During the online operation phase, only the Me (≤M) best local predictions are selected for each run of ensemble

prediction performance to the input variable selection results, which further enhances the prediction accuracy of HEGPR. More intuitively, the prediction results of quality variables y1, y2, and y3 using HEGPR are depicted in Figures 6, 7, and 8, respectively. Though the entire process consists of 5 operation modes, the prediction values of y1, y2, and y3 still coincide well with the actual measurements, indicating high prediction accuracy. These accurate prediction results confirm the effectiveness of the proposed HEGPR approach in modeling the switched dynamics due to the operating mode changes and in accounting for process uncertainty. Besides providing accurate online estimations, HEGPR also provides the confidence interval indicated by the prediction variance, which is especially useful for plant operators to evaluate the reliability of online quality variable predictions. In what follows, the effects of the model parameters (i.e., the number of random resampling samples NRS and the number of


rubber-mixing process works in different shifting modes to produce various types of tires. Because rubber mixing is the first step of tire manufacturing, the quality of the mixed rubber is crucial. Different kinds of additional agents are mixed during the rubber mixing process; then, various complex chemical reactions occur. Five input variables are selected: temperature in the mixer chamber, motor power, motor speed, ram pressure, and energy, which are listed in Table 6. These input variables and their delay variables

prediction instead of combining all local prediction results, as shown in Table 5. Taking the online prediction of quality variable y1 as an example, the selection results of local SP-EGPR models for the second level of ensemble of HEGPR are illustrated in Figure 9. It can be seen that different local SP-EGPR models are integrated in the HEGPR model for estimation of different query points, which in turn reveals that the process is dynamic and changing, and thus, the HEGPR model needs to be adjusted by mixing different local SP-EGPR models. Additionally, it is worth emphasizing that not all initial input variable sets are appropriate for creating high-performance local SP-EGPR models to build the HEGPR model, owing to poor performance and/or redundancy. Therefore, PLS pruning is employed to select the M most influential local SP-EGPR models from the NIVS initial local models. Figure 10 shows the prediction RMSE values of component E in stream 9 (y1) using the HEGPR soft sensor before and after PLS pruning. The improved prediction accuracy of HEGPR after PLS pruning reveals that the model selection improves the prediction performance by discarding the poor and redundant local SP-EGPR models. Apart from the overall prediction performance of HEGPR, it is also interesting to know the performance of each local SP-EGPR model when applied independently. The prediction RMSE values of local SP-EGPR models in the HEGPR model are illustrated in Figure 11. As shown in Figure 11a,b, the individual SP-EGPR models show a poor performance, whereas the prediction accuracy of HEGPR is significantly improved by combining the predictions of local SP-EGPR models. In comparison, Figure 11c shows that some local SP-EGPR models deliver performance similar to or even better than that of the HEGPR model. At first glance, this may suggest that a single local SP-EGPR model could replace the more complex ensemble model, i.e., HEGPR. However, considering the high variance of the predictions of local SP-EGPR models, in practice, it is very difficult to identify the local model that can perform best on the query data. Therefore, by applying the ensemble strategy, the proposed HEGPR method can effectively handle the strong variance of local predictions and thus achieve performance very close to the best local SP-EGPR model.

4.2. Industrial Rubber-Mixing Process. 4.2.1. Process Description. Another case study is a real industrial batch process, a rubber-mixing process at a tire company located in the east of China. The industrial mixers of the rubber-mixing process from the company are shown in Figure 12. The industrial

Table 6. Measurements during the Rubber Mixing Process

| symbol | variable description |
|---|---|
| x1 | temperature in mixer chamber |
| x2 | motor power |
| x3 | motor speed |
| x4 | ram pressure |
| x5 | energy |

are the variables for soft sensor development. The Mooney viscosity is selected as the output variable. Although the final quality of the rubber products highly depends on the quality of the mixed rubber and the Mooney viscosity reflects the degree of polymerization and the molecular weight of the mixed rubber, the Mooney viscosity cannot be measured online in real time. The rubber mixing process is a batch process which continues for about 4 h. The Mooney viscosity value is obtained by a viscometer at the end of one batch. The long offline analysis delay makes it difficult to control the quality of mixed rubber. Therefore, it is highly desirable to develop an effective soft sensor for the prediction of Mooney viscosity.

4.2.2. Prediction Results and Discussion. The prediction result of Mooney viscosity is obtained from the soft sensor developed from the real industrial data. A quantitative comparison of the prediction performance of Mooney viscosity from different soft sensors is listed in Table 7.

Table 7. Quantitative Comparison of Prediction RMSE and R2 from Different Soft Sensor Modeling Methods

| method | RMSE | R2 |
|---|---|---|
| GPR | 5.4576 | 0.8654 |
| SP-EGPR | 4.4036 | 0.9124 |
| VP-EGPR | 3.8504 | 0.9330 |
| HEGPR | 3.0627 | 0.9576 |
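To put the RMSE values of Table 7 on a common scale, the relative RMSE reduction of HEGPR with respect to each baseline can be worked out as (RMSE_baseline − RMSE_HEGPR)/RMSE_baseline. The snippet below is simple arithmetic on the tabulated numbers; the helper name `relative_rmse_reduction` is hypothetical and the percentages it prints are derived here, not reported by the authors.

```python
# RMSE values copied from Table 7.
rmse_table7 = {"GPR": 5.4576, "SP-EGPR": 4.4036, "VP-EGPR": 3.8504, "HEGPR": 3.0627}

def relative_rmse_reduction(baseline, proposed):
    """Fraction by which the proposed RMSE is lower than the baseline RMSE."""
    return (baseline - proposed) / baseline

reductions = {
    name: relative_rmse_reduction(value, rmse_table7["HEGPR"])
    for name, value in rmse_table7.items()
    if name != "HEGPR"
}

for name, reduction in reductions.items():
    print(f"HEGPR vs {name}: {100 * reduction:.1f}% lower RMSE")
```

The reduction is largest against the global GPR model and smallest against VP-EGPR, in line with the ranking in the table.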

More intuitively, the prediction results of Mooney viscosity from the HEGPR soft sensor are shown in Figure 13. Table 7 shows that, similar to the application to the TE process, the prediction result of the HEGPR soft sensor is superior to those of GPR, SP-EGPR, and VP-EGPR. Moreover, the prediction values of Mooney viscosity coincide well with the actual measurements. Therefore, the proposed HEGPR soft sensor is also applicable to these real-life data sets. The above application results confirm the superiority of HEGPR over other conventional methods in predicting the quality variables of nonlinear and non-Gaussian chemical processes. Finally, the real-time performance is important for a prediction model. Although the proposed HEGPR soft sensor is constructed offline, its online prediction cost is much higher than that of the other three soft sensors, because the improvement of prediction performance comes at the cost of increased model complexity. The computational cost

Figure 12. Industrial rubber mixers of the tire company.


Figure 13. Trends of the actual values, prediction values, confidence intervals, and prediction errors of Mooney viscosity using the HEGPR soft sensor (NRS = 800, NIVS = 40).

by the proposed approach, thus providing accurate online predictions of quality variables. In addition to soft sensor applications, the proposed HEGPR method can be potentially extended to address different modeling problems in nonlinear chemical and biological systems. Future work can be focused on the development of the hierarchical ensemble learning based adaptive soft sensors.

of the proposed method is closely related to the number of local models. The average CPU time for one query sample of the HEGPR soft sensor in the rubber-mixing process is 1.5345 s, which is much shorter than the analysis interval and meets the requirement of real-time prediction for the chemical industry. Weighing the computational burden against the prediction accuracy, the proposed HEGPR method is acceptable. The continuous and timely measurements of Mooney viscosity can further enable advanced control and monitoring.



ASSOCIATED CONTENT

Supporting Information

5. CONCLUSIONS
In the present work, a novel soft sensor named HEGPR is proposed for the quality prediction of nonlinear and non-Gaussian chemical processes. First, a set of input variable sets are obtained by a random resampling strategy and PMI criterion. A set of SP-EGPR models are then developed with each input variable set by GMM and GPR methods. During the online phase, the target quality variable prediction is achieved by the hierarchical ensemble learning at two levels involving manipulation of both training samples and input variables, which can significantly improve the prediction accuracy and reliability. Due to the two-level ensemble model structure, HEGPR allows one to deal with multiple operation modes and shifting process dynamics as well as reduce the sensitivity of the prediction performance to the input variable selection results. The proposed HEGPR soft sensor is applied to the TE chemical process with 5 operation modes being considered and a real industrial rubber-mixing process with different shifting modes. The application results show that the HEGPR approach delivers better prediction performances than the conventional global and ensemble models. The shifting dynamics, multiple operation modes, and process uncertainty can be well captured

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.iecr.6b00240. A further illustration of the industrial rubber-mixing process, a diagram of the process, is shown in Figure S1 (TIF).



AUTHOR INFORMATION

Corresponding Authors

*Tel.: +86 13601333018. Fax: +86 010 68914662. E-mail: [email protected] (H.J.).
*Tel.: +86 13601333018. Fax: +86 010 68914662. E-mail: [email protected] (X.C.).

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS
This work is financially supported by the International Science & Technology Cooperation Program of China (No. 2014DFR61080).




REFERENCES

(1) Shao, W.; Tian, X. Adaptive soft sensor for quality prediction of chemical processes based on selective ensemble of local partial least squares models. Chem. Eng. Res. Des. 2015, 95, 113−132.
(2) Kadlec, P.; Gabrys, B.; Strandt, S. Data-driven soft sensors in the process industry. Comput. Chem. Eng. 2009, 33 (4), 795−814.
(3) Kano, M.; Fujiwara, K. Virtual sensing technology in process industries: trends and challenges revealed by recent industrial applications. J. Chem. Eng. Jpn. 2013, 46 (1), 1−17.
(4) Liu, Y.; Gao, Z.; Li, P.; Wang, H. Just-in-time kernel learning with adaptive parameter selection for soft sensor modeling of batch processes. Ind. Eng. Chem. Res. 2012, 51 (11), 4313−4327.
(5) Slišković, D.; Grbić, R.; Hocenski, Ž. Methods for plant data-based process modeling in soft-sensor development. AUTOMATIKA: časopis za automatiku, mjerenje, elektroniku, računarstvo i komunikacije 2012, 52 (4), 306−318.
(6) Kadlec, P.; Grbić, R.; Gabrys, B. Review of adaptation mechanisms for data-driven soft sensors. Comput. Chem. Eng. 2011, 35 (1), 1−24.
(7) Kano, M.; Ogawa, M. The state of the art in chemical process control in Japan: Good practice and questionnaire survey. J. Process Control 2010, 20 (9), 969−982.
(8) Lin, B.; Recke, B.; Knudsen, J. K.; Jørgensen, S. B. A systematic approach for soft sensor development. Comput. Chem. Eng. 2007, 31 (5), 419−425.
(9) Moore, B. Principal component analysis in linear systems: Controllability, observability, and model reduction. IEEE Trans. Autom. Control 1981, 26 (1), 17−32.
(10) Zhu, J.; Ge, Z.; Song, Z. Robust supervised probabilistic principal component analysis model for soft sensing of key process variables. Chem. Eng. Sci. 2015, 122, 573−584.
(11) Ge, Z.; Huang, B.; Song, Z. Mixture semisupervised principal component regression model and soft sensor application. AIChE J. 2014, 60 (2), 533−545.
(12) Wang, Z. X.; He, Q.; Wang, J. Comparison of different variable selection methods for partial least squares soft sensor development. In American Control Conference (ACC), 4−6 June 2014; pp 3116−3121.
(13) Galicia, H. J.; He, Q. P.; Wang, J. A reduced order soft sensor approach and its application to a continuous digester. J. Process Control 2011, 21 (4), 489−500.
(14) Yu, J. Multiway Gaussian mixture model based adaptive kernel partial least squares regression method for soft sensor estimation and reliable quality prediction of nonlinear multiphase batch processes. Ind. Eng. Chem. Res. 2012, 51 (40), 13227−13237.
(15) Cao, L. J.; Chua, K. S.; Chong, W. K.; Lee, H. P.; Gu, Q. M. A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing 2003, 55 (1−2), 321−336.
(16) Yuan, X.; Ge, Z.; Song, Z. Locally weighted kernel principal component regression model for soft sensing of nonlinear time-variant processes. Ind. Eng. Chem. Res. 2014, 53 (35), 13736−13749.
(17) Zilouchian, A.; Jamshidi, M. Intelligent control systems using soft computing methodologies; CRC Press, Inc.: Boca Raton, FL, 2000.
(18) Forouzantabar, A.; Talebi, H. A.; Sedigh, A. K. Adaptive neural network control of bilateral teleoperation with constant time delay. Nonlinear Dynamics 2012, 67 (2), 1123−1134.
(19) Desai, K.; Badhe, Y.; Tambe, S. S.; Kulkarni, B. D. Soft-sensor development for fed-batch bioreactors using support vector regression. Biochem. Eng. J. 2006, 27 (3), 225−239.
(20) Yu, J. A Bayesian inference based two-stage support vector regression framework for soft sensor development in batch bioprocesses. Comput. Chem. Eng. 2012, 41, 134−144.
(21) Wang, J.; Yu, T.; Jin, C. On-line Estimation of Biomass in Fermentation Process Using Support Vector Machine. Chin. J. Chem. Eng. 2006, 14 (3), 383−388.
(22) Chen, T.; Ren, J. Bagging for Gaussian process regression. Neurocomputing 2009, 72 (7), 1605−1610.
(23) Roberts, S.; Osborne, M.; Ebden, M.; Reece, S.; Gibson, N.; Aigrain, S. Gaussian processes for time-series modelling. Philos. Trans. R. Soc., A 2013, 371 (1984), 20110550.
(24) Pani, A. K.; Mohanta, H. K. A survey of data treatment techniques for soft sensor design. Chem. Prod. Process Model. 2011, 6 (1), 1934−2659, DOI: 10.2202/1934-2659.1536.
(25) Liu, Y.; Chen, T.; Chen, J. Auto-Switch Gaussian Process Regression-Based Probabilistic Soft Sensors for Industrial Multigrade Processes with Transitions. Ind. Eng. Chem. Res. 2015, 54 (18), 5037−5047.
(26) Jin, H.; Chen, X.; Wang, L.; Yang, K.; Wu, L. Adaptive soft sensor development based on online ensemble Gaussian process regression for nonlinear time-varying batch processes. Ind. Eng. Chem. Res. 2015, 54 (30), 7320−7345.
(27) Jin, H.; Chen, X.; Yang, J.; Zhang, H.; Wang, L.; Wu, L. Multimodel adaptive soft sensor modeling method using local learning and online support vector regression for nonlinear time-variant batch processes. Chem. Eng. Sci. 2015, 131, 282−303.
(28) Ge, Z.; Song, Z.; Wang, P. Probabilistic combination of local independent component regression model for multimode quality prediction in chemical processes. Chem. Eng. Res. Des. 2014, 92 (3), 509−521.
(29) Soares, S. G.; Araújo, R. An on-line weighted ensemble of regressor models to handle concept drifts. Engineering Applications of Artificial Intelligence 2015, 37, 392−406.
(30) Soares, S. G.; Araújo, R. A dynamic and on-line ensemble regression for changing environments. Expert Systems with Applications 2015, 42 (6), 2935−2948.
(31) Kaneko, H.; Funatsu, K. Adaptive soft sensor based on online support vector regression and Bayesian ensemble learning for various states in chemical plants. Chemom. Intell. Lab. Syst. 2014, 137, 57−66.
(32) Ni, W.; Brown, S. D.; Man, R. A localized adaptive soft sensor for dynamic system modeling. Chem. Eng. Sci. 2014, 111, 350−363.
(33) Khatibisepehr, S.; Huang, B.; Khare, S.; et al. A probabilistic framework for real-time performance assessment of inferential sensors. Control Engineering Practice 2014, 26, 136−150.
(34) Corona, F.; Mulas, M.; Haimi, H.; Sundell, L.; Heinonen, M.; Vahala, R. Monitoring nitrate concentrations in the denitrifying postfiltration unit of a municipal wastewater treatment plant. J. Process Control 2013, 23 (2), 158−170.
(35) Liu, Y.; Gao, Z. Industrial melt index prediction with the ensemble anti-outlier just-in-time Gaussian process regression modeling method. J. Appl. Polym. Sci. 2015, 132 (22), 41958.
(36) Ni, W.; Tan, S. K.; Ng, W. J.; Brown, S. D. Localized, adaptive recursive partial least squares regression for dynamic system modeling. Ind. Eng. Chem. Res. 2012, 51 (23), 8025−8039.
(37) Shao, W.; Tian, X.; Wang, P. Soft sensor development for nonlinear and time-varying processes based on supervised ensemble learning with improved process state partition. Asia-Pac. J. Chem. Eng. 2015, 10 (2), 282−296.
(38) Ge, Z.; Song, Z. Ensemble independent component regression models and soft sensing application. Chemom. Intell. Lab. Syst. 2014, 130, 115−122.
(39) Dietterich, T. G. Ensemble Methods in Machine Learning. In Multiple Classifier Systems, Proceedings; Springer: Berlin, Heidelberg, London, 2000; pp 1−15.
(40) Ge, Z.; Song, Z. Bagging support vector data description model for batch process monitoring. J. Process Control 2013, 23 (8), 1090−1096.
(41) Breiman, L. Bagging predictors. Machine Learning 1996, 24 (2), 123−140.
(42) Cao, D.-S.; Xu, Q.-S.; Liang, Y.-Z.; Zhang, L.-X.; Li, H.-D. The boosting: A new idea of building models. Chemom. Intell. Lab. Syst. 2010, 100 (1), 1−11.
(43) Briem, G. J.; Benediktsson, J. A.; Sveinsson, J. R. Multiple classifiers applied to multisource remote sensing data. IEEE Transactions on Geoscience and Remote Sensing 2002, 40 (10), 2291−2299.
(44) Tao, D.; Tang, X.; Li, X.; Wu, X. Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 2006, 28 (7), 1088−1099.
(45) Huang, Y.; Englehart, K. B.; Hudgins, B.; Chan, A. D. C. A Gaussian mixture model based classification scheme for myoelectric control of powered upper limb prostheses. IEEE Trans. Biomed. Eng. 2005, 52 (11), 1801−1811.
(46) McCrary, J. Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics 2008, 142 (2), 698−714.
(47) Rasmussen, C. E. Gaussian processes for machine learning; MIT Press: Cambridge, 2006.
(48) Yuan, C.; Zhang, X.; Xu, S. Partial mutual information for input selection of time series prediction. In 2011 Chinese Control and Decision Conference (CCDC); IEEE: New York, 2011; pp 2010−2014.
(49) Sharma, A. Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 1 - A strategy for system predictor identification. J. Hydrol. 2000, 239 (1), 232−239.
(50) Anzai, Y. Pattern Recognition and Machine Learning; Academic Press, Inc.: London, 1992.
(51) Yu, J. Online quality prediction of nonlinear and non-Gaussian chemical processes with shifting dynamics using finite mixture model based Gaussian process regression approach. Chem. Eng. Sci. 2012, 82, 22−30.
(52) Figueiredo, M. A. T.; Jain, A. K. Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence 2002, 24 (3), 381−396.
(53) Ge, Z. Quality prediction and analysis for large-scale processes based on multi-level principal component modeling strategy. Control Engineering Practice 2014, 31, 9−23.
(54) Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E 2004, 69 (6), 066138.
(55) François, D.; Rossi, F.; Wertz, V.; Verleysen, M. Resampling methods for parameter-free and robust feature selection with mutual information. Neurocomputing 2007, 70 (7), 1276−1288.
(56) Fernando, T.; Maier, H.; Dandy, G. Selection of input variables for data driven models: An average shifted histogram partial mutual information estimator approach. J. Hydrol. 2009, 367 (3), 165−176.
(57) Luna, I.; Soares, S.; Ballini, R. Partial mutual information criterion for modelling time series via neural networks. In Proc. of the 11th Information Processing and Management of Uncertainty International Conference, 2006; pp 2012−2019.
(58) Gregorčič, G.; Lightbody, G. Gaussian process approach for modelling of nonlinear systems. Engineering Applications of Artificial Intelligence 2009, 22 (4−5), 522−533.
(59) Yu, J.; Chen, K.; Rashid, M. M. A Bayesian model averaging based multi-kernel Gaussian process regression framework for nonlinear state estimation and quality prediction of multiphase batch processes with transient dynamics and uncertainty. Chem. Eng. Sci. 2013, 93, 96−109.
(60) Li, X.; Su, H.; Chu, J. Multiple model soft sensor based on affinity propagation, gaussian process and bayesian committee machine. Chin. J. Chem. Eng. 2009, 17 (1), 95−99.
(61) Tang, J.; Chai, T.; Yu, W.; Zhao, L. Modeling Load Parameters of Ball Mill in Grinding Process Based on Selective Ensemble Multisensor Information. IEEE Transactions on Automation Science and Engineering 2013, 10 (3), 726−740.
(62) Downs, J. J.; Vogel, E. F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17 (3), 245−255.
(63) Grbić, R.; Slišković, D.; Kadlec, P. Adaptive soft sensor for online prediction and process monitoring based on a mixture of Gaussian process models. Comput. Chem. Eng. 2013, 58, 84−97.
(64) He, Y.-L.; Geng, Z.-Q.; Zhu, Q.-X. Data driven soft sensor development for complex chemical processes using extreme learning machine. Chem. Eng. Res. Des. 2015, 102, 1−11.
