Article pubs.acs.org/IECR
Adaptive Soft Sensor Development Based on Online Ensemble Gaussian Process Regression for Nonlinear Time-Varying Batch Processes Huaiping Jin,* Xiangguang Chen,* Li Wang, Kai Yang, and Lei Wu Department of Chemical Engineering, Beijing Institute of Technology, Beijing 100081, People’s Republic of China S Supporting Information *
ABSTRACT: Traditional soft sensors may be ill-suited for batch processes because they cannot efficiently handle process nonlinearity and/or time-varying changes as well as provide the prediction uncertainty. Therefore, a novel adaptive soft sensor, referred to as online ensemble Gaussian process regression (OEGPR), is proposed for nonlinear time-varying batch processes. The batch process is first divided into multiple local domains using a just-in-time localization procedure, which is equipped with a probabilistic analysis mechanism to detect and remove the redundant local domains. Then the localized GPR models and probabilistic data descriptors (PDD) are built for all isolated domains. Using Bayesian inference, the posterior probabilities of any test sample with respect to different local domains are estimated and set as the adaptive mixture weights of local predictions. Further, the overall mean and variance of the predictive distribution of the target variable are estimated via the finite mixture mechanism. Additionally, the OEGPR method performs adaptation at two levels to handle time-varying behavior: (i) local GPR and PDD models; and (ii) the mixture weights. The effectiveness of the OEGPR approach is demonstrated through a simulated fed-batch penicillin fermentation process as well as an industrial fed-batch chlortetracycline fermentation process. black box) soft sensors.12 The former class of approaches requires in-depth understanding of the chemical and physical principles underlying the process, which is often unavailable in complex industrial applications. Alternatively, the data-driven methods only depend on plant operating data and rarely require any process knowledge. With the utilization of modern instrument and measurement techniques, large amounts of data in industrial processes are collected, stored, and analyzed, thereby rendering data-driven models preferable to the firstprinciple ones for complex processes.9−13 Thus, this work is concerned with data-driven soft sensors for providing accurate and reliable online predictions of critical variables in batch processes. The most common data-driven soft sensors are based on multivariate statistical techniques such as principle component regression (PCR),14−17 partial least-squares (PLS),18−25 and independent component regression (ICR).26−28 This type of methods usually projects the original data onto a lowerdimensional subspace to extract variable features and then identifies the regression model within the new subspace. These techniques gained popularity because of their statistical background, ease of interpretability, and strong capability of handling process data colinearity. However, these methods are essentially linear modeling techniques and thus cannot handle process nonlinearity unless certain nonlinear variations like kernel functions are introduced, such as kernel PCR (KPCR)29,30 and kernel PLS (KPLS).30−32 Moreover, a wide range of machine learning methods have also been widely
1. INTRODUCTION Batch processes have been around for many millennia, probably since the beginning of human civilization.1 Cooking, bread making, tanning, and wine making are some of the batch processes that humans relied upon for survival and pleasure. Nowadays, batch processes have been widely used to produce low-volume and high-value-added products, including food, pharmaceuticals, specialty chemicals, and materials for microelectronics, etc. However, the lack of reliable online measurements of key product quality variables poses a great challenge to control batch processes accurately, automatically, and optimally. Traditionally, these key variables are determined through offline chemical analysis or online analyzers, both of which often encounter unacceptable expenses of analytical instruments or large measurement delay. As a promising solution toward realtime online estimation, soft sensor techniques have gained increasing popularity in both academia and industry since the early 1970s2−5 until today, especially over the past two decades.6−13 There are several other popular terms to describe soft sensors in the literature, including software sensor, soft sensing, virtual sensor/sensing, inferential sensor/sensing/estimation, etc. The basic idea of a soft sensor is to first construct a mathematical model for describing the relationship between the difficult-to-measure variable and other highly related easyto-measure variables. Then the target variable can be predicted in an online fashion by using the online measurements of easyto-measure variables as model inputs. Consequently, the developed virtual sensors only rely on software programs instead of hardware sensors or analyzers. While a variety of soft sensors are available, the majority of these can be classified into two groups: model-driven (fundamental, first-principle) and data-driven (empirical, © XXXX American Chemical Society
Received: April 20, 2015 Revised: June 14, 2015 Accepted: July 14, 2015
A
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
questionnaire survey,57 this issue was considered as the most important problem of current soft sensors. Traditionally, soft sensor models are not adaptive, and once deployed into real-life operation, the model does not change anymore. As opposed to this, most of the processes, where the soft sensors are applied, exhibit some kind of time-varying behavior. The main reasons for such a behavior include changes of process input (raw) materials, process fouling, abrasion of mechanic components, catalyst deactivation, switching of different product quality grades, changes in external environment (e.g., weather, seasons), etc. In machining learning, all these drifting problems are summarized under the term concept drift.58,59 As a result of the above changes, it is often the case that the performance of static soft sensors starts to deteriorate during online operation. Such behavior is called model degradation, which can be categorized into three types: shift of inputs, shift of outputs, and change of slope between inputs and outputs.60 According to the change rate, concept drift can be distinguished between: (i) gradual drifts with slow changes; and (ii) abrupt changes, where the concept changes immediately from one state to another. To reduce the performance deterioration, soft sensors need to be equipped with adaptation mechanisms to allow accommodating new process states automatically.59,60 To this end, various strategies have been proposed to enable online adaptation of soft sensors. A popular solution toward model adaptation is the moving window (MW) approach. As new samples are available, the window slides along the data so that the oldest sample is removed from and the newest sample is added to the model. There have been numerous soft sensors incorporating MW strategy for online adaptation.21,39,61−64 However, there are some drawbacks associated with the MW method. One of them is that all the data within the window require to be stored, which can be problematic for large windows and memory limited applications. Another critical issue is the setting of window size and step size, which has a critical influence on the model performance. Inappropriate setting can result in performance deterioration instead of improvement. Another set of adaptive techniques is based on the recursive adaptation methods. Unlike MW method, the recursive adaptation usually involves down-weighting the previous model by using a forgetting factor rather than adjusting the window data, e.g. the well-known recursive PLS (RPLS).65 Various adaptive soft sensors equipped with the recursive adaptation mechanism have been proposed.19,20,66,67 The recursive adaptation methods have similar issues with parameter selection as the MW technique. The selection of forgetting factor, for example, has a critical influence on the performance of adaptive soft sensors. Moreover, a particular drawback of recursive methods is that it cannot efficiently deal with abrupt changes because a query sampled just after an abrupt change becomes significantly from the prioritized samples. When the process is operated in a narrow region, the recursive method tends to adapt the soft sensor excessively. In addition to MW and recursive adaptation methods, time difference (TD) modeling was also proposed to maintain soft sensor performance.68,69 Instead of directly modeling the relationship between the input and output variables, TD models are built based on the time difference values of the input and output variables. As a result, the effects of deterioration with age such as the drift and gradual changes in the state of the plants can be well addressed without online updating of the soft sensor model. However, TD models
applied to soft sensor modeling, including artificial neural networks (ANN),33−35 support vector regression (SVR),36−41 least-squares SVR (LSSVR),42,43 Gaussian process regression (GPR),44−51 etc. To develop a high-performance data-driven soft sensor, two critical issues should be well tackled: (i) process nonlinearity and (ii) time-varying changes in process characteristics. In general, industrial processes can be classed into four categories: linear time-invariant (LTIV), linear time-variant (LTV), nonlinear time-invariant (NLTIV), and nonlinear time-variant (NLTV).22 As for batch processes, most of them belong to the NLTV type because they are usually characterized by strong inherent nonlinearity and time-varying behavior. Therefore, both nonlinearity and time-varying dynamics of processes need to be taken into full consideration when designing accurate and reliable soft sensors for batch processes. More issues concerning soft sensor development were discussed in a recent review.12 To deal with the process nonlinearity, the most straightforward approach is to use nonlinear techniques such as ANN, SVR, and GPR instead of linear models such as PCR and PLS. However, these methods are usually based on a single regression model given the underlying assumption of a constant operating phase and conditions throughout the batch process. For multiphase/multimode batch processes, global models, which aim to achieve a universal generalization performance, may result in inaccurate predictions in some local regions where process characteristics change. Moreover, due to the great model complexity, global modeling methods suffer from the difficulties in performing optimizations and updating models when new data are available. To overcome the limitations of global approaches, many attempts have been paid to local learning based soft sensor modeling. Local learning applies the philosophy of “divide and conquer” and constructs a set of local models, also called local experts, each of which is responsible for one specific operating region of the process such that the process nonlinearity can be well modeled. One can distinguish two common local learning methods in the field of soft sensing applications, namely ensemble learning19,41,47,52 and just-in-time (JIT) learning.32,53−56 Under the ensemble learning framework, process data are first divided into multiple local domains by manipulating input features or training samples. Then local models are built for all identified local domains. As a result, instead of using a global model, each of the local models provides a prediction of the target variable when implemented online. Further, all local predictions are combined to obtain the final estimation of the ensemble. Note that both construction of local domains and building of local models in ensemble modeling are completed offline. Conversely, JIT learning performs localization and local models building in an online manner. There are three main steps to predict the target variable under the JIT learning framework: (i) select a relevant data set similar to the query data from historical database via some similarity measure; (ii) construct a local model from the selected samples; and (iii) discard the built model after its use for estimation. The frequent reconstruction of local models enables the JIT model to efficiently capture the nonlinear relationship between the input and output variables. Apart from the process nonlinearity, another critical issue of soft sensor development is how to cope with changes in process characteristics and maintain high estimation accuracy for a long period of time, i.e. model maintenance. As revealed in a recent B
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
becomes especially serious when deploying soft sensors into plants of batch processes with long operating period and numerous reactors. Also, JIT method encounters the difficulties in sharing soft sensor models because the online built local model is highly specialized in a narrow region thus it is not applicable to other states. Another challenge of JIT learning is the definition of an appropriate similarity index, which has a critical influence on the performance of JIT models. In contrast, ensemble learning based soft sensors are built offline based on the process data from different batches and tanks and thus can be applied to any similar batch or tank. As a result, the computational load is significantly reduced by sharing the offline built soft sensors for different batch operations. Therefore, this paper aims to develop adaptive soft sensor under the ensemble learning framework for nonlinear timevarying batch processes. Typically, the development of ensemble based adaptive soft sensors consists of four basic tasks: (i) localization of process or data; (ii) selection of local modeling technique and local models building; (iii) combination of local models; and (vi) adaptation of ensemble models. The first crucial task in ensemble learning is the localization, which partitions the entire process or data into local regions. As for data-driven modeling, the localization procedure focus on dividing the process data into different local domains and thus local models can be constructed from the corresponding local data sets. The common approaches for this purpose in soft sensing applications are clustering methods such as fuzzy cmeans (FCM) 5 0 , 7 2 and Gaussian mixture model (GMM).31,47−49 Nevertheless, there are several problems associated with these partition methods. First, the number of clusters usually needs to be determined in advance, whereas the quantitative and precise information about the process divisions is often unavailable in practice. Second, without retraining from scratch, it is difficult to include or remove a local domain online. Third, clustering methods do not consider overlapping between local domains. In industrial applications, however, different operating modes are often overlapped or have nonlinear boundaries. In such a case, the conventional divisions may fail to characterize the transient dynamics. An alternative popular way of achieving localization is the moving window (MW) method, which constructs local domains by selecting a local data set consisting of some consecutive samples included in a certain period of time.15,73 This method assumes that the correlation among variables in a window data set is expected to be very similar. However, MW localization encounters the difficulty in selecting appropriate window size and step size. An additional issue of this approach is that the window size is often predetermined and fixed. To enable adaptive localization, MW method can be integrated with statistical hypothesis testing methods such as t-test and x2test.19,24,25,41 Three basic steps are involved in such hybrid partition methods. First, an initial moving window is constructed by selecting a preset number of consecutive samples. Then a local model is constructed from the window data and provides the prediction residuals of the training data. Further, shift the window forward and construct a new window data set. Following this, the prediction residuals of the new window data can be calculated using the previously built model. Finally, statistical testing is performed on the two sets of residuals from two windows to decide if the new window samples belong to the same process state as the samples from the initial window. To ensure sufficient diversity, a mechanism
cannot account for the abrupt changes and may be ill-suited for online prediction due to their limited capability of handling missing values, infrequent samplings, and process outliers. This is because TD model is highly dependent on previous measurements, which are not always available in practical situations. An alternative solution for model adaptation is the offset compensation (OC) technique,21,23,66,67 which has been widely used for eliminating the prediction bias based on the prediction errors at previous sampling instants. However, OC correction method encounters a particular issue that it cannot function well when abrupt changes occur and/or previous measurements are acquired infrequently. For JIT soft sensors, model adaptation is conducted by updating the training database. A simple approach toward solving this problem is to use moving window strategy to include the newest sample to and remove the oldest sample from the database. However, one major concern of simply discarding old samples is that some important information contained in old data is ignored. This problem can be alleviated by using the maximal similarity replacement (MSR) rule.23,32 The rationale of MSR method is to use the newly measured sample to replace the old sample that is most similar to the new data. If only the time is used as the similarity measure, MW updating can be regarded as a special case of the MSR method. A comprehensive research concerning database management was conducted by Kaneko and Funastu,70 which presents a database monitoring index (DMI) to judge whether new measured data should be stored in a database or not. By using the DMI measure, data having much information are included in the database while data with little information are discarded. In this way, highly accurate JIT models can be constructed with less time by appropriately managing the database and maintaining sufficient information. Another problem of updating JIT models arises from the fact that, in most cases, only the space relevance, e.g. Euclidean distance based similarity criterion, is used to select local modeling samples. Such selection strategy cannot efficiently make full use of new measurements, which represents the current process state. This is because the newly obtained samples are very probably ignored or assigned little importance due to the inappropriate definition of similarity measure. One solution to this problem is to combine the space relevance and time relevance such that new samples can play a critical role when building local models online.71 Despite the availability of the above-mentioned local learning methods (i.e., ensemble learning and JIT learning) and several adaptation schemes (i.e., MW, recursive adaptation, OC, and database updating), these methods are often separately applied to soft sensor development. Thus, combining the benefits of local learning approaches and adaptation mechanisms is a promising option to develop adaptive soft sensors that can handle process nonlinearity as well as time-varying changes. For JTI learning, there is a general consensus that it can cope with changes in process characteristics as well as nonlinearity. However, it suffers from some significant limitations. One major concern is the heavy computational load resulting from the frequent reconstruction of local models.23 In JIT learning, a local model needs to be rebuilt from scratch whenever some kind of external factors, such as query time, batch number and tank number of batch processes, has changed. Consequently, a large number of local models required to be built, which consumes extensive computational resources. This problem C
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
respect to different local domains,43,47−49,76 (iii) fuzzy memberships,72 and (iv) monitoring metrics such as I2 and SPE statistics.28 Alternatively, adaptive weights can also be determined based on the estimated prediction capability of local models on the query data, such as the prediction accuracy on the recently measured samples or on the similar samples to the query data,40,41,77,78 and the prediction uncertainty.45,46,50 The next issue concerning ensemble soft sensors is the incorporation of adaptation mechanisms. This allows automatic adaptation of soft sensors when new measurements are acquired. The adaptation of ensemble soft sensors can be performed at two levels: (i) adaptation of the local models; and (ii) adaptation of the combination weights.59,77,78 The former adaptation type can be achieved by applying techniques such as recursive and moving window updating. An additional mechanism for the adaptation of this level is online inclusion of new models to or removal of poor models from the ensemble. As for the level of combination weights, the aforementioned adaptive weighting methods can be used. The overall performance of ensemble soft sensors depends on all aspects discussed above. After addressing the above-mentioned problems, a highperformance ensemble soft sensor can be developed. Despite this, there remains an issue of evaluating the accuracy of prediction of soft sensors, which is discussed under the term applicability domains (ADs).79,80 During the offline phase, the estimation performance of soft sensor models can be easily assessed using a testing data set, whereas during the online phase it often becomes very difficult due to the lack of online measurements of target variable. AD is of vital importance in soft sensor applications since it can assist the operators to evaluate the reliability of online predictions and distinguish between the variations in the process variables and y-analyzer fault. One approach for this purpose is to quantitatively explore the relationships between the distances to models (DMs) and the prediction errors.79 In this method, DM can be defined as the distance to the average of training data or the distance to the nearest neighbor of training data. The general idea of this method is that the larger the distances to models are, the lower the estimation accuracy of prediction would be. An alternative approach is to estimate the data density.80 This method is based on the assumption that a soft sensor model is well trained in data regions where the data density is high and the prediction accuracy of the model will be high in these same regions. Conversely, the model will not be trained sufficiently in data regions where the data density is low and the prediction accuracy of the model will be low in these regions. Another efficient strategy of evaluating the prediction accuracy of the soft sensor model is to estimate the prediction variance in the Bayesian framework, such as the GPR technique.44 Overall, many of the current ensemble soft sensors encounter the following problems more or less: (i) inappropriate localization; (ii) nonadaptive weighting; (iii) unavailability of adaptation capability; and (iv) incapability of estimating prediction uncertainty. To address these issues, a novel adaptive soft sensor, referred to as online ensemble Gaussian process regression (OEGPR), is proposed for state estimation and quality prediction of nonlinear time-varying batch processes. The OEGPR approach allows efficiently dealing with the changes in process characteristics as well as nonlinearity, thus delivering reliable and accurate estimations. Another outstanding feature of the OEGPR approach is that it can provide
for detecting and removing redundant local domains should be considered.25 However, an additional potential problem of the MW division method is that samples with a certain moving window may come from multiple states, especially when abrupt changes occur and the window size is set inappropriately. In addition to the clustering and MW methods, bagging and boosting,74 which are originally proposed for solving classification problems, have also been applied to develop ensemble soft sensors.27,46,75 Bagging first creates a set of subsets from the original training data using bootstrap resampling method. Then multiple local models can be developed based on different resampling data sets. This method is especially suitable for “unstable” models, i.e. the models that are sensitive (in terms of model parameters and prediction performance) to small changes in the training data. However, this method may result in high redundancy between data sets and needs to build a large number of individual models. In contrast, boosting involves training multiple base models known as weak learners in sequence, where the error function used to train a new model depends on the performance of the previous models.74 Once all base models have been built, their predictions are then combined to produce the final prediction whose performance can be significantly better than that of any of the weak learners. However, even though they can well extract crucial information underlying the training data, both bagging and boosting methods are not suitable for handling concept drift issues during online operating. After constructing a set of local domains, local models can be built from corresponding local data sets. A key issue of this step is how to choose the local modeling technique. When dealing with slightly nonlinear processes, linear modeling methods such as PCR, PLS and ICR are preferable because they are easy to optimize and can be operated either in sample-wise or blockwise adaptation modes. If process exhibits strong nonlinear behavior, nonlinear techniques such as ANN, SVR and GPR should be preferred. Another crucial task to develop ensemble based soft sensor is the combination of local models. For any query data, each of the local models can provide a prediction of the target variable. Then all local predictions have to be integrated to obtain the final estimation. The simplest and very common combination technique is the simple averaging rule, which gives the final prediction as the mean of the local predictions. However, the averaging approach is not an optimal combination method and the weighted combination is preferable. Thus, different weights are assigned to the local models and the final estimation is given as a weighted sum of local predictions. Generally, the weighted ensemble methods can be classified into two types: nonadaptive and adaptive weightings. In the nonadaptive case, the weights with respect to different local models are often determined using linear27,34 or nonlinear9 regression techniques and often remain unchanged once deployed into real-life operation. Such nonadaptive weighting approaches tend to assign larger weights to the models that give better prediction performance on the training data. Thus, the generalization capability of ensemble models for new test data may be compromised. In fact, the prediction capability of local models varies with the query data and thus adaptive weights are preferable in order to track timevarying behavior of process. Often, adaptive weights are determined based on the relevance between the query data and local models. To this end, the following indices can be used: (i) distances between the query data and centers of local domains,54 (ii) posterior probabilities of query data with D
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
mean. Thus, the output observations follow a Gaussian distribution as
the estimation of prediction variance. The OEGPR method is outlined as follows: (i) Local domains of batch processes are constructed using a JIT localization procedure. The JIT partition strategy consists of two main operations. First, a sample within the training database is selected as the query data and a local domain data set is constructed with those samples whose similarities to the query data are higher than those of other data. Second, a probabilistic analysis mechanism is used to detect and remove the redundant local domain. Repeat the above procedure until each of the training data has been traversed. (ii) The GPR method is used to build local models because of its strong capability of handling process nonlinearity as well as providing prediction uncertainty, i.e. the prediction variance. Additionally, a probabilistic data descriptor (PDD) is built for each of local domains. (iii) For any test sample, the overall mean and variance of the predictive distribution of target variable are estimated from local predictions based on the finite mixture mechanism. The mixture weights of local models are set as the posterior probabilities of the query data with respect to different local domains. (iv) The adaptation of the OEGPR method is performed not only on the level of the local GPR and PDD models, but also on the level of the mixture weights. The local GPR and PDD models with the maximum posterior probability for the new sample are updated using a moving window strategy. This two-level adaptation mechanism allows the OEPGR method to efficiently address the concept drift issues of batch processes. The rest of this paper is organized as follows. Section 2 briefly reviews Gaussian process regression. Then the novel OEGPR adaptive soft sensing method is developed in Section 3. Section 4 demonstrates the effectiveness of the OEGPR method through a simulated fed-batch penicillin fermentation process as well as an industrial chlortetracycline fed-batch fermentation process. Finally, the concluding remarks are drawn in Section 5.
y ∼ 5(0, C) where C is an n × n covariance matrix given by ⎡ C(x1, x1) ⋯ C(x1, x n) ⎤ ⎥ ⎢ ⋱ ⋮ C=⎢ ⋮ ⎥ ⎥ ⎢ ⎣C(x n, x1) ⋯ C(x n, x n)⎦
⎛ ⎡y⎤ ⎜0 , 5 ∼ ⎢y ⎥ ⎜ ⎣ *⎦ ⎝
(6)
where k∗ = [C(x∗,x1), ···, C(x∗, xn)] . With the covariance function, the posterior predictive distribution p(y∗|X,y,x∗) of the output variable y∗, given its inputs x∗, is Gaussian with mean (ŷ∗) and variance (σ2∗) given by ⎧ y ̂ = (y ) = k TC−1y ⎪* * * ⎨ 2 ⎪ σ = Var(y ) = C(x , x ) − k TC−1k ⎩ * * * * * *
(7)
where (·) and VAR(·) denote the operators for calculating the mean and variance, respectively. The capability of providing the prediction variance is an interesting feature, which distinguishes the GPR method from other machine learning techniques. It can be seen that a covariance function is the crucial ingredient in a Gaussian process predictor, as it encodes our assumptions about the function which we wish to learn. From the Gaussian process view, it is the covariance function that defines nearness or similarity between samples. The covariance function can be defined as any function that can generate a non-negative definite covariance matrix for any set of data points. Various forms of covariance functions have been proposed, such as the squared exponential (SE) covariance function and the Mateŕn class of covariance functions.44 In our application, a Materń covariance function with the noise term is used, which is given as
(1)
⎛ C(x i , x j) = σ f2⎜⎜1 + ⎝
3 || x i − x j || ⎞ ⎛ 3 || x i − x j || ⎞ ⎟⎟exp⎜⎜ − ⎟⎟ l l ⎠ ⎝ ⎠
+ σn2δij
(8)
with the positive hyperparameters Θ = {σ2f , l, σ2n}, where σ2f is the output scale, l is the input scale, σ2n is the noise variance; δij = 1 if i = j; otherwise, δij = 0. The hyperparameters Θ can be obtained by maximizing the log-likelihood of the training data: 1 1 n log p(y|X) = − y TC−1y − log|C| − log(2π ) 2 2 2
(2)
(9)
The partial derivative of the log-likelihood with respect to each hyperparameter (denoted by a generic notation θ) is
Then the Gaussian process can be written as f (x) ∼ 5(m(x), C(x , x′))
⎡C ⎤⎞ k * ⎥⎟ ⎢ ⎢⎣ k T C(x , x )⎥⎦⎟⎠ * * * T
where x = [x1, ···, xd] is a d-dimensional input vector; f(·) represents the unknown function, while ϵ denotes the Gaussian noise with zero mean and variance σ2n. From the function-space view, a Gaussian process is completely specified by its mean function m(x) and covariance function C(x, x′), which are defined as follows: ⎧ ⎪ m(x) = [f (x)] ⎨ ⎪ ⎩C(x , x′) = [(f (x) − m(x))(f (x′) − m(x′))]
(5)
Given a test input x∗, the joint distribution of the training outputs y and the test output y∗ according to the prior is
2. REVIEW OF GAUSSIAN PROCESS REGRESSION A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.44 Given a set of input and output samples Z = {X, y} = {xi, yi}in= 1, the regression model can be formulated as y = f (x) + ϵ
(4)
∂[log p(y|X)] ∂C ⎞ 1 ∂C 1 ⎛ = − tr⎜C−1 ⎟ + y TC−1 C−1y ⎝ ⎠ ∂θ ∂θ ∂θ 2 2
(3)
If the data are properly scaled, it can be assumed that the data {X, y} are generated from a Gaussian process with zero
(10) E
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research where ∂C/∂θ can be calculated from the covariance function and tr(·) is an operator used for calculating the trace of a matrix.
3. PROPOSED OEGPR SOFT SENSING ALGORITHM The novel OEGPR adaptive soft sensing algorithm can be split into the following steps: (i) construction of local domains; (ii) building of local models; (iii) combination of local models; and (iv) model adaptation. The former two steps are completed at the offline stage while the latter two steps are performed during the online operation. The above steps, as well as the parameter selection and online implementation procedure of the OEGPR approach, will be discussed in detail in the following sections. 3.1. Construction of local domains. The aim of this step is to divide the historical process data into multiple local domains, which represent different states of the process. The process data within a local domain are assumed to exhibit similar characteristics and thus the local regression relationship between the input and output data can be well modeled. A crucial issue of achieving localization is to partition the relevant samples into a group using some similarity criterion. To this end, in this work, a new just-in-time (JIT) localization procedure is proposed by integrating JIT relevant sample selection with a probabilistic analysis mechanism for redundancy check. A local domain is first built from the selected relevant samples and then a probabilistic analysis is performed to detect and remove the redundant local domain. The JIT localization method is inspired by the JIT learning. The basic idea of this approach is that any training sample, together with its similar samples, can be assumed to exhibit similar process characteristics. Thus, a set of local domains can be built for all historical samples by repeatedly performing relevant sample selection as JIT learning. In this way, all process states in the historical data can be identified. Consider a historical data set Z = {X, y} = {xi, yi}in= 1 with xi, yi, and n representing the input, output, and number of samples, respectively. Then a local domain can be constructed from samples relevant to the query data. To evaluate the relevance between samples, a similarity measure should be defined. While various indices, such as the angle-based55 and correlationbased15 measures, have been proposed for JIT learning, the Euclidean distance-based similarity is adopted in our research for simplicity but without loss of generality: sim(x i , x j) = exp( −|| x i − x j ||)
Figure 1. Illustration of JIT localization for local domains construction.
samples with the same distances from the query data are excluded from the local domain. By repeating the above procedure, all potential local domains can be built. However, a potential problem associated with this localization approach is that local domains may encounter great redundancy. This problem results from the fact that the above JIT localization procedure essentially mimics the process of JIT learning and thus leads to the same number of potential local domains as that of the historical samples. In fact, a good ensemble model should build local models that are not only accurate but also diverse. The redundant local domains may not improve the prediction performance but increase the model complexity instead. To address the above issue, a probabilistic analysis method is proposed to detect and remove the redundant local domains. Assume that a total of (M − 1) local domains, LD = {LDm}m M= −1 1, have been constructed by applying the JIT localization procedure. Dealing with the redundant issue consists of three steps. For a new query data xnew ∈ X, a potential local domain LDnew can be first built. Then the redundancy between the new and old local domains is evaluated. Further, make a decision to store or discard the new local domain. If the process dynamics underlying the samples similar to the query data have been captured by the previously built local domains, the new local domain is not required. Otherwise, the new local model is stored as the newest one, i.e. LDM. After a new local domain is built, there remains a curial issue that how to evaluate the redundancy between the new and old local domains. To this end, the posterior probabilities of the new local domain data with respect to all local domains are analyzed. For a local domain LDm, its input data Xm can be assumed to follow a multivariate Gaussian distribution. Thus, a probabilistic data descriptor (PDD) is defined as the probability density function of the local domain data:
(11)
Provided a training sample xq ∈ X, the similarity values between xq and each historical data (xi, yi) are calculated as si = sim(xq, x i),
i = 1, 2, ···, n
(12)
Then, arrange all si in descending order and the local domain data set Zq = {Xq, yq} for the training data xq can be constructed by selecting L most relevant samples (xi, yi) corresponding to the largest si to the Lth largest si. The basic concept of JIT localization is illustrated in Figure 1. It is shown that the samples close to the query data are assumed to belong to the same process state and are included in the same local domain. Also, as shown in the figure, there may exist partial overlapping between local domains. Furthermore, it can be seen that some similar samples located in the circle are not included in the corresponding local domain. This is because the local domain size L has been fixed such that some of the F
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research 1
p(x i|LDm) =
d
(2π ) |Σm|
⎡ 1 exp⎢ − (x i − μm )T Σ−m1 ⎣ 2
Jnew =
p(x new|LDm)p(LDm) p(x new)
p(x new|LDm)p(LDm) M
∑i = 1 p(x new|LDi )p(LDi )
,
m = 1, 2, ⋯ , M (14)
where p(LDm) and p(xnew|LDm) are the prior and conditional probabilities, respectively. Here, p(xnew|LDm) is calculated from eq 13. Without any process or expert knowledge, the prior probability for each local domain is assumed to be equal, that is p(LDm) =
1 , M
m = 1, 2, ···, M
(15)
Similarly, the posterior probabilities of the newly built local domain data Xnew with respect to the new local domain LDnew are calculated as follows: p(LDnew |x i) =
p(x i|LDnew )p(LDnew ) M
(17)
∑m = 1 p(x i|LDm)p(LDm)
,
(18)
where median{·} denotes the median operator, and γ is a tuning factor controlling the redundancy between local domains. The purpose of using the median value is to filter the extreme values and obtain a reliable threshold. If Jnew < J,̅ then LDnew is retained; otherwise, LDnew is discarded. After setting the local domain size L and threshold factor γ, all satisfactory local domains can be identified adaptively by using the proposed JIT localization method. To analyze the partition results, three indices are considered: (i) The number of local domains, i.e. M ∈ [1, n]. (ii) The sample-to-sample ratio (SSR) between the total number of samples from all local domains with duplicates and that of original training samples without duplicates, i.e. SSR = (LM)/n. (iii) The sample coverage rate (SCR), which is defined as the proportion of the samples constructing the local domains without duplicates in the historical data, i.e. SCR = (LM − ndup)/n, with ndup denoting the number of duplicated samples. Generally, M descends with L while growing with γ. For a given L, both SSR and SCR increase with γ. In contrast, for a given γ, both SSR and SCR vary irregularly with L. Compared with the conventional clustering or moving window (MW) based localization methods, the proposed JIT localization approach has the following features: (i) It provides the possibility to use flexible similarity measures for selecting relevant samples. Generally, similarity indices used for JIT modeling are also applicable to the JIT localization method. (ii) Instead of selecting successive samples included in a certain period of time as in the MW division method, the JIT localization strategy partitions the historical data into local domains based on a user-defined similarity criterion. (iii) It needs not preset the number of local domains, whereas in the clustering methods the number of clusters often needs to be given in advance. (iv) It allows partial overlapping between local domains, which is useful for characterizing the transitional process dynamics. This is because, in industrial applications, the operating modes are often overlapped or have nonlinear boundaries. (v) When a new process state is detected, if necessary, it supports online inclusion of a new local domain without repartitioning the process data from scratch. 3.2. Building of local models. After identifying the local domains, one local predictive model, also called local expert, and one probabilistic data descriptor (PDD) are built for each of the local domains. The building of local PDD models has been presented in Section 3.1. The local experts are based on the GPR technique due to the following useful features:44 (i) Similar to the ANN, SVR, and LSSVR methods, GPR models are capable of efficiently dealing with process nonlinearity when using a kernel function to map the input data from the original space onto a highdimensional feature space. (ii) As well as handling nonlinearity, the GPR model also provides the prediction distribution of target variable and
(13)
where the d-dimensional vector μm is called the mean, the d × d matrix Σm is called the covariance, and |Σm| denotes the determinant of Σm. μm and Σm are estimated using the local domain data Xm and completely specify the mth PDD model. Given the new query data xnew, a new local domain LDnew can be constructed from local data set Znew = {Xnew, ynew}. Then a new PDD model can be built from the data Xnew. Now suppose that a total of M local domains, {LD1, LD2, ···, LDM−1,LDnew}, have been isolated, where the previous ones {LDm}mM=−1 1 are assumed to meet the redundancy requirement, while the Mth one LDnew remains to be checked. If there exists one local domain LDm(m ∈ {1, 2, ···, M − 1}) that is highly similar to LDnew, then LDnew and LDm are thought to be redundant. In this case, LDnew is discarded; otherwise, LDnew is stored as the Mth local domain. In this work, a probabilistic analysis based redundancy check mechanism is proposed. This method is based on the assumption that a new local domain LDnew triggered by xnew is usually similar to a previous local domain LDm if the posterior probability of xnew with respect to LDm is large as that to LDnew. The larger the posterior probability p(LDm|xnew) is, the more similar the new local domain LDnew to LDm would be, which reveals more redundant information between two local domains. With a set of local domains {LDm}mM= −1 1, only the most similar one is involved in the redundancy check. That is, the maximum of {p(LDm|xnew)}m M= −1 1 is chosen to judge whether LDnew should be stored or not. Meanwhile, the decision threshold of posterior probability is determined by the posterior probabilities of the new local domain data with respect to LDnew. The posterior probabilities of xnew with respect to different local domains can be estimated through Bayesian inference strategy as follows:
=
p(LDm|x new)
J ̅ = γ × median{p(LDnew |x i)|x i ∈ X new}
⎤ (x i − μm )⎥ ⎦
p(LDm|x new) =
max
m = 1,2, ···, M − 1
xi ∈ X new (16)
With these posterior probabilities, the redundant index Jnew between the new local domain LDnew and old local domains {LDm}mM=−1 1, and the decision threshold J ̅ are defined as follows: G
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
With the component densities p(ym,new|Θm), m = 1,···, M, moments of a finite mixture distribution are quite easily obtained. To determine the expectation (H(ynew)/ϑ) of a function H(ynew) of ynew with respect to the mixture density in eq 22, consider first the expectation (H(ynew)/Θm) of H(ynew) with respect to the component density p(ym,new|Θm):
consequently gives the prediction variance. This is beneficial when a soft sensor is implemented in the plants for monitoring purposes, since the prediction variance indicates the degree of prediction reliability. (iii) The GPR model has only a few training hyperparameters, which can be automatically optimized with an iterative method. (iv) The GPR model structure can be determined relatively easily. (v) Prior knowledge is possible to be included in the GPR model by using appropriate mean and covariance functions. After this step, a set of localized GPR and PDD models are built and stored: PDDm : p(x new|LDm)
(H(ynew )|Θm) =
(19)
M
ynew ̂ = (ynew |ϑ) =
2 = Var(ynew |ϑ) σnew
= ((ynew − ynew ̂ )2 |ϑ) M
=
∑ ωm((ym,new
− ynew ̂ )2 |Θm)
m=1 M
=
∑ ωm((ym,new
− ym̂ ,new + ym̂ ,new − ynew ̂ )2 |Θm)
m=1 M
=
∑ ωm[((ym,new
− ym̂ ,new )2 |Θm)
m=1
+ ((ym̂ ,new − ynew ̂ )2 |Θm) + (2(ym ,new − ym̂ ,new )(ym̂ ,new − ynew ̂ )|Θm)] (27)
(21)
with ((ym ,new − ym̂ ,new )2 |Θm) = σm2 ,new
(28)
((ym̂ ,new − ynew ̂ )2 |Θm) = (ym̂ ,new − ynew ̂ )2
(29)
(2(ym ,new − ym̂ ,new )(ym̂ ,new − ynew ̂ )|Θm) = 2(ym̂ ,new − ynew ̂ )E(ym ,new − ym̂ ,new |Θm) = 2(ym̂ ,new − ynew ̂ )(ym̂ ,new − ym̂ ,new )
(22)
=0
where ϑ = {Θ1, ···, ΘM, ω} denotes the vector of parameters of the mixture density. ω = {ω1, ···, ωM} is the vector of mixture weights satisfying the following constraint:
(30)
σ2m,new
where is the local prediction variance computed from the mth local GPR model. Thus, the overall variance σ2new is rewritten as follows: M
M 1
(26)
where ŷm,new = (ynew|Θm) is the component mean estimated from the mth local GPR model. Similarly, by substituting H(ynew) = (ynew − (ynew|ϑ))2 = (ynew − ŷnew)2 into eq 25, the overall variance (σ2new) is calculated as follows:
M
∑ ωm = 1
∑ ωmym̂ ,new m=1
with a density function p(ym,new|Θm), where Θm denotes the vector of parameters. (ym,new) and Var(ym,new) are the prediction mean and variance, respectively, which are calculated from the mth local GPR model. Assume that the local outputs of the target variable ym,new, ···, yM,new are independent realizations of the overall output ynew. Thus, ynew can be said to arise from a finite mixture distribution with the mixture density function p(ynew|ϑ) as follows:
0 ≤ ωm ≤ 1,
(25)
For the proposed OEGPR method, the overall mean (ŷnew) of the distribution generated by p(ynew|ϑ) is obtained with H(ynew) = ynew:
where km,new = [C(xnew, xm,1),···,C(xnew, xm,L)] with xm,i ∈ Xm, i = 1, 2, ···, L, and m = 1, 2, ···, M. p(xnew|LDm) is the likelihood value of xnew with respect to the mth local domain. ŷm,new and σ2m,new are the prediction mean and variance, respectively, which are calculated from the mth local GPR model. 3.3. Combination of local models. During the training phase, M local GPR models F = {GPRm}mM= 1 have been developed from the isolated local domains. From the set of predictors F, each of its members is making predictions of the output variable given the input samples. To obtain the final predictions during the online phase, these localized predictions have to be combined. In what follows, the combination problem will be addressed under the finite mixture framework.81 For a new query data xnew, the predictive distribution of the mth localized output ym,new of the target variable is estimated from the mth local GPR model and ym,new follows a Gaussian distribution as
1
∑ ωm(H(ynew )|Θm) m=1
(20)
∑ ωmp(ym,new |Θm)
(24)
M
E(H(ynew )|ϑ) =
T
p(ynew |ϑ) =
H(ynew )p(ym ,new |Θm)dynew
Then E(H(ynew)|ϑ) is given by
⎧ ŷ = (ym ,new ) = k mT ,newC−m1ym ⎪ m ,new ⎪ GPR m: ⎨ σ 2 = Var(ym ,new ) ⎪ m ,new ⎪ = C(x new , x new) − k mT ,newC−m1k m ,new ⎩
ym ,new ∼ 5((ym ,new ), Var(ym ,new ))
+∞
∫−∞
2 σnew =
(23)
∑ ωm(σm2,new + (ym̂ ,new m=1
H
− ynew ̂ )2 )
(31)
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research Further, by applying the Bayes’ theorem,76 the mixture weights ωm is determined as the posterior probability of xnew with respect to the mth local domain, that is
ωm = p(LDm|x new)
PDD models at a relatively low level. In this work, the adaptation of the target local PDD model is performed through a moving window (MW) method.83 For a new data xnew, suppose that the mth PDD model with the maximum posterior probability p(LDm|xnew) is selected and then the adaptation of PDD model is done by updating the mean vector μm and covariance matrix Σm. By applying MW updating, the latest sample is added to the local PDD model while the oldest sample is discarded to keep a fixed size of local domain samples, i.e. the window size L. Suppose that the input data matrix of the mth local domain at time instant t is given as
(32)
Note that the posterior probability p(LDm|xnew) is estimated in the same way as presented in Section 3.1. To sum up, for a new test point xnew, the local prediction mean and variance values from M local GPR models, together with the posterior probabilities of xnew with respect to different local domains, are given as {ym̂ ,new , σm2 ,new , p(LDm|x new)},
m = 1, 2, ···, M
X (mt) = {x (mt),1, x (mt),2 , ···, x (mt ), L} with μm(t ) and Σ(mt)
(33)
Then the final mean and variance of the predictive distribution of the target variable are estimated by combining all local predictions: M ⎧ ⎪ y ̂ = ∑ p(LDm|x new)y ̂ m ,new ⎪ new m=1 ⎪ M ⎨ 2 ⎪ σnew = ∑ p(LDm|x new) ⎪ m=1 ⎪ σm2 ,new + (ym̂ ,new − ynew { ̂ )2 } ⎩
(35)
After performing MW operation, the new local data matrix at time instant t+1 is given as X (mt+ 1) = {x (mt+,11), x (mt+,21), ⋯ , x (mt +, L1)} = {x (mt),2 , x (mt),3, ⋯ , x (mt ), L , x (mt +, L1)} with μm(t + 1) and Σ(mt+ 1) (36)
where x(tm,L+ 1) denotes the newest sample. It and X(tm + 1) share a transient data matrix:
can be seen that X(t) m
X̃ m = {x (mt),2 , x (mt),3, ···, x (mt ), L}
(34)
(37)
Next, a two-step procedure is used to recursively update the mean vector and covariance matrix. First, with the removal of (t) the oldest sample xm,1 from Xm(t), the mean vector and covariance matrix of X̃ m can be written as follows:
3.4. Adaptation of the OEGPR soft sensor. To reduce the model degradation as a result of the time-varying behavior of process, soft sensor model should be equipped with an adaptation mechanism. The adaptation operation allows soft sensor model to accommodate the changing process states. In the proposed OEGPR method, model adaptation is performed at two levels: (i) adaptation of the local PDD and GPR models; and (ii) adaptation of the mixture weights. In what follows, the adaptation mechanism of local PDD and GPR models will be discussed in detail. As for the adaptation of the mixture weights, two scenarios need to be clarified. In the case without updating the local models using new measurements, the posterior probabilities of the query data with respect to different local domains are used as the mixture weights of local models. In contrast, when new data are available for local models adaptation, the adaptive weights are calculated in the same way as the first scenario except that the posterior probabilities are computed based on the already updated PDD models and Bayesian inference. 3.4.1. Adaptation of local PDD model. At the offline development stage, the training data are divided into a set of local domains. Then a set of PDD models are built from the isolated local domain data sets. Nevertheless, due to the timevarying behavior of process, the local domain data distribution may change and thus the PDD models need to be updated. When new data are available, it is a common practice to update all local domains in various degrees of weights according to their relevance to the new data.49,82 However, a potential limitation of this method is that the computational load for online updating grows with the number of local domains. This problem is especially acute when a large number of local domains are involved. Thus, it may be preferable to update the most relevant local domain rather than all domains to guarantee high efficiency. The strategy is to update the local PDD model with the maximum posterior probability for the new data. This strategy can maintain the diversity of local domains while constraining the computational load for online updating of
μm̃ =
L 1 μm(t ) − x (mt),1 L−1 L−1
(38)
L − 1 (t ) 1 Σ̃ m = [Σm − (x (mt),1 − μm(t ) )T L−2 L−1 (x (mt),1 − μm(t ) ) − (Δμm(̃ t ) )T Δμm(̃ t ) ]
(39)
where = − μ̃m. With μ̃m and Σ̃m, the recursive relations of mean vector and covariance matrix associated with new data x(tm,L+ 1) are derived as follows: Δμ̃(t) m
μ(t) m
μm(t + 1) =
L−1 1 μm̃ + x (mt +, L1) L L
Σ(mt+ 1) =
1 L−2 ̃ (x (mt +, L1) − μm(t + 1) )T Σm + L−1 L−1
(40)
(x (mt +, L1) − μm(t + 1) ) + (Δμm(̃ t + 1) )T Δμm(̃ t + 1)
Δμ̃(tm + 1)
(41)
μ(tm + 1)
where = − μ̃m. By substituting eq 38 into eq 40, and eq 39 into eq 41, the recursive calculations of the mean vector and covariance matrix are given by μm(t + 1) = μm(t ) +
1 (t + 1) (x m , L − x (mt),1) L
(42)
1 (x (mt),1 − μm(t ) )T (x (mt),1 − μm(t ) ) L−1 1 (x (mt +, L1) − μm(t + 1) )T − (Δμm(̃ t ) )T Δμm(̃ t ) + L−1
Σ(mt+ 1) = Σ(mt) −
(x (mt +, L1) − μm(t + 1) ) + (Δμm(̃ t + 1) )T Δμm(̃ t + 1) I
(43)
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research Further, Σ(tm + 1) can be expressed in terms of the known input (t + 1) (t) (t + 1) vectors and mean matrices (i.e., x(t) ) as m,1, xm,L , μm , and μm follows: Σ(mt+ 1) = Σ(mt) −
It is not desirable to calculate the covariance matrix using the new window data from scratch. Instead, the new covariance matrix can be obtained recursively based on the old covariance matrix and new data. Fortunately, removal of the oldest sample corresponds to removing the first column and first row of the covariance matrix due to the marginalization property of the Gaussian process.44 In contrast, inclusion of the newest sample results in a new covariance matrix by adding a row and a column. By manipulating the row and column, a new covariance matrix C(tm + 1) can be easily obtained from C(t) m . However, as shown in the prediction formula of the GPR model, the model update requires the inverse of C(tm + 1), which is computationally and memory demanding (requiring 6 (n3) operations for a n × n covariance matrix). To address this issue, the inverse can be efficiently calculated (only requiring 6 (n2) operations) in a −1 recursive way based on the previous inverse (C(t) provided m) during the offline development stage. Two steps are involved in updating the inverse of the covariance matrix.84 First, with the deletion of the oldest sample, the first row and column are removed from C(t) m, resulting in a new covariance matrix C̃ m as follows:
L (x (mt),1 − μm(t ) )T (x (mt),1 − μm(t ) ) (L − 1)2
1 (x (mt +, L1) − μm(t + 1) )T (x (mt +, L1) − μm(t + 1) ) L−1 1 [x (mt),1 − Lμm(t ) + (L − 1)x (mt +, L1)]T + [L(L − 1)]2 +
[x (mt),1 − Lμm(t ) + (L − 1)x (mt +, L1)] (44)
By applying the new mean vector and covariance matrix, the mth local PDD model is updated as follows: p(x new|LDm) =
1 d
(2π )
|Σ(mt+ 1)|
1 exp − (x new − μm(t + 1) )T 2
{
[Σ(mt+ 1)]−1 (x new − μm(t + 1) )}
⎡ C(x (t ) , x (t ) ) ⋯ C(x (t ) , x (t ) )⎤ m ,2 m ,2 m ,2 m,L ⎥ ⎢ ⎢ ⎥ ⋮ ⋱ ⋮ C̃ m = ⎢ ⎥ ⎢C(x (t ) , x (t ) ) ⋯ C(x (t ) , x (t ) )⎥ ⎣ m,L m ,2 m,L m,L ⎦
(45)
3.4.2. Adaptation of local GPR model. As a result of the changes in process characteristics, the underlying relationship between the input and output data also exhibits time-varying behavior. For this reason, the local GPR models should be updated. In the proposed OEGPR algorithm, the moving window approach49,84,85 is employed for updating local GPR models. Similar to the adaptation of local PDD models, only the local GPR model most relevant to the new data performs online adaptation. When a new data is acquired, the window slides forward such that the oldest sample is removed from and the newest one is added to the window. In this way, not only is the new process state tracked, but also computational and memory requirements are controlled by the fixed window size. Suppose that the mth local GPR model with the maximum posterior probability for the new data x(tm,L+ 1) is selected for the adaptation purpose. At time instant t during online operation, the samples in a window of size L can be given as Z(mt) = {X (mt), y(mt)} = {(x (mt),1, y(mt),1), ···, (x (mt ), L , y(mt ), L)}
Meanwhile, the covariance matrix C(t) m is written as ⎡u vT⎤ ⎥ C(mt) = ⎢ ⎢⎣ v C̃ m ⎥⎦
C(x(t) m,1,
C(mt)
[C(x(t) m,2,
⎡ e fT ⎤ [C(mt)]−1 = ⎢ ⎥ ⎣f G ⎦
x(t) m,1),
···, C(x(tm,L+ 1), −1 t, [C(t) m ] , is
(51) −1 of [C(t) and f is a column vector m] (t) −1 ̃ Cm and [C(t) m ] , the inverse of Cm
where e is the first element with (L-1) elements. With can be easily expressed in terms of the known elements of −1 [C(t) as follows: m]
(46)
C(mt)[C(mt)]−1 = IL ⎧ ve + C̃ mf = 0 ⎪ ⇒⎨ T ⎪ ⎩ vf + C̃ mG = IL − 1
(47)
With the new input-output data (x(tm,L+ 1), y(tm,L+ 1)) acquired at time instant t+1, the model data corresponding to the mth local domain is updated in a moving window fashion. After removing the oldest sample and adding the newest one, the window data are updated as
ffT ⇒ C̃ −m1 = G − e
(52)
where IL and IL−1 are L- and (L-1)-dimensional identity matrices, respectively. In the second step, a new sample is added to the model by adding a row and a column, resulting in a new covariance matrix as follows:
Z(mt+ 1) = {X (mt+ 1), y(mt+ 1)} = {(x (mt),2 , y(mt),2), ··· ,(x (mt ), L , y(mt ), L), (x (mt +, L1), y(mt +, L1))}
(50)
x(t) m,1)
where u = and v = T x(t) m,1)] . Then the inverse matrix at time instant rewritten as follows:
with an regularized covariance matrix expressed as ⎡ C(x (t ) , x (t ) ) ⋯ C(x (t ) , x (t ) ) ⎤ m ,1 m ,1 m ,1 m , L ⎥ ⎢ ⎥ ⎢ ⋮ ⋱ ⋮ = ⎥ ⎢ ⎢C(x (t ) , x (t ) ) ⋯ C(x (t ) , x (t ) )⎥ ⎣ m,L m ,1 m,L m,L ⎦
(49)
(48)
⎡ C̃ b ⎤ m ⎥ C(mt+ 1) = ⎢ ⎢⎣ bT a ⎥⎦
Then the adaptation of the mth local GPR model can be simply achieved by updating the corresponding covariance matrix. J
(53) DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
Figure 2. Schematic diagram of the proposed OEGPR soft sensing approach. (t + 1) T where b = [C(x(tm,L+ 1), x(t) , x(t) m,2), ···, C(xm,L m,L)] , and a = C(x(tm,L+ 1), x(tm,L+ 1)). Then the inverse of the matrix C(tm + 1) is required for prediction. To this end, the target inverse matrix [C(tm + 1)]−1 is first expressed as
⎡ E h⎤ [C(mt+ 1)]−1 = ⎢ T ⎥ ⎣h r ⎦
Finally, the formula for predicting the mean and variance of the predictive distribution is updated as follows: ⎧ ŷ T (t + 1) −1 (t + 1) ⎪ m ,new = k m ,new[Cm ] ym ⎪ ⎨ 2 T (t + 1) −1 ⎪ σm ,new = C(x new , x new) − k m ,new[Cm ] ⎪ k m ,new ⎩
(54)
C̃ −1 m ,
C(tm + 1)
With and the inverse matrix at time instant t, calculated in terms of the known elements of C̃ −1 m as follows: C(mt+ 1)[C(mt+ 1)]−1
It should be noted that, before performing online prediction, y(tm + 1) are required to be rescaled to satisfy the assumption of zero mean process prior since the mean value of the output variable may change due to the inclusion and deletion of samples.63 Correspondingly, the prediction mean also requires the reverse scaling to obtain the final estimation. 3.5. Parameter selection of the OEGPR soft sensor. In the proposed OEGPR algorithm, there are two critical parameters that need to be selected carefully during the offline development stage. They are the local modeling size L and the threshold factor γ, which have a critical influence on the prediction performance as well as the model complexity of the OEGPR soft sensor.
= IL
⎧ C̃ E + bhT = I m L−1 ⎪ ⎪ ̃ ⇒ ⎨ Cmh + br = 0 ⎪ ⎪ T ⎩ b h + ar = 1 ⇒
[C(mt+ 1)]−1
with g = (a −
⎡ C̃ −1(I + bbT(C̃ −1)T g ) −C̃ −1bg ⎤ m m m ⎥ =⎢ ⎢−(C̃ −1b)T g ⎥ g ⎣ ⎦ m
(56)
(55)
−1 bTC̃ −1 m b) .
K
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research Although k-fold (e.g., 5-fold and 10-fold) cross-validation methods have been widely used to optimize model parameters, they may encounter the problem that the size of training data used for cross-validation is significantly different from that used for all training data. Such change in modeling size tends to result in suboptimal results though the model parameters have achieved the optimal results during the cross-validation procedure. To alleviate the above-mentioned problem, a batch-based leave-one-out cross-validation (LOOCV) method is used to select the OEGPR parameters when applied to batch processes. The strategy is to first remove one batch from the training set and then predict the removed batch based on the learning of the remaining batches. Each batch in the training set is predicted in the same way. The exhaustive grid-search strategy is used and the parameters leading to the minimum crossvalidation error are applied to the OEGPR soft sensor. To reduce the offline computational load, a relatively small search range is preferable. Thus, a suggestion regarding the selection of the OEGPR parameters is that the search range of {L, γ} can be initially determined by evaluating the localization indices (i.e., M, SSR, and SCR) during the stage of local domains construction. In additional to the presented batchbased LOOCV method, the cross-validation method can also be combined with computational intelligence techniques86 such as the genetic algorithm (GA) and particle swarm optimization (PSO) methods. 3.6. Implementation procedure of the OEGPR soft sensor. The schematic diagram of the proposed OEGPR soft sensing approach is illustrated in Figure 2, and the overall procedure using the OEGPR method for online prediction is listed below. (i) Collect the input and output data of batch process for soft sensor model training. (ii) Unfold the three-dimensional data matrix via a variablewise way. (iii) Construct a set of local domains using the JIT localization method and a probabilistic analysis is conducted to detect and remove the redundant local domains. (iv) Build localized GPR and PDD models for all the isolated local domains. (v) For a new test sample, calculate the localized mean and variance of the predictive distribution of the target variable using local GPR models. (vi) The posterior probabilities of the query data with respect to different local domains are estimated using Bayesian inference and set as the mixture weights of local models. (vii) The overall prediction mean and variance of the target variable is obtained by combining all local predictions based on the finite mixture mechanism. (viii) When a new data is available, update the local GPR and PDD models with the maximum posterior probability for the query data using the moving window technique. (ix) Go to step (v) for the prediction of the next query data.
(i) PLS (partial least-squares regression): global PLS model.18 (ii) GPR (Gaussian process regression): global GPR model.44 (iii) RPLS (recursive PLS): PLS model updated recursively through incremental integration of new data using a forgetting factor.65 (iv) JITPLS (just-in-time PLS): PLS model constructed from data whose Euclidean distances to the query data are smaller than those of other data. (v) MWGPR (moving window GPR): GPR model constructed from most recently measured data. (vi) LWPLS (locally weighted PLS): PLS model constructed from weighting data with a similarity matrix.54 The similarity measure defined by Kim et al.22 is used. (vii) OEGPR (online ensemble GPR): the proposed method. The optimal parameters for the above modeling methods are selected by the batch-based LOOCV method presented in Section 3.5 or by trial and error. These parameters include: (i) the number of latent variable (LV) for PLS, JITPLS and LWPLS, (ii) the local modeling size (L) for JITPLS and OEGPR; (iii) the localization parameter (φ) for LWPLS; (iii) the threshold factor (γ) for OEGPR. The forgetting factor (λ) for RPLS and window size (W) for MWGPR are selected by trial and error to maximize the prediction performance. The Mateŕn covariance function defined in Section 2 is used for both GPR and OEGPR methods. A total of 7 soft sensors are developed and their characteristics are compared in Table 1. Table 1. Characteristics of Different Soft Sensor Modeling Methods Adaptation Model structure
Method PLS GPR RPLS JITPLS LWPLS MWGPR OEGPR
Learning
Single model Single model Single model Single model Single model Single model Multiple models
Global Global Global Local Local Local Local
Itema
(i) (i), (i), (i), (i),
(iii) (iii) (iii) (ii)
Mechanism
Recursive adaptation Just-in-time learning Just-in-time learning Moving window Adaptive weighting Moving window updating
a
(i) Model’s parameters; (ii) combination weights of local models; and (iii) database of historical samples.
The computer configurations for experiments are as follows: OS: Windows XP (32 bit); CPU: Pentium(R) Dual-Core E6600 (3.06 GHz × 2); RAM: 1.96G byte; and MATLAB 2010a is used. To assess the prediction performance of soft sensors, two performance indices, namely root-mean-square error (RMSE) and coefficient of determination (R2), are used: RMSE =
4. CASE STUDIES The effectiveness of the proposed OEGPR soft sensing method is verified through two batch processes, which are a simulated fed-batch penicillin cultivation process and an industrial fedbatch chlortetracycline (CTC) fermentation process. The methods for comparison are as follows:
R2 = 1 −
1 ntest
n test
∑ (yi ̂ − yi )2 i=1
n ∑i =test1 (yi n ∑i =test1 (yi
(57)
− yi )2 − y ̅ )2
(58)
where yi and ŷi are the observed and predicted values of the output variable, respectively; y ̅ represents the mean value, and L
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research ntest denotes the number of testing samples. The prediction accuracy of soft sensor model can be indicated by the RMSE criterion while R2 can provide information about how much of the total variance in the output data can be explained by the model. 4.1. Simulated fed-batch penicillin fermentation process. 4.1.1. Process description. The penicillin fermentation process has been a well-known benchmark process used for exploring modeling, monitoring, and control of batch processes.31,41,87−91 The flow diagram of the penicillin cultivation process is illustrated in Figure 3. Since formation
Table 2. Input Variables of Soft Sensors in Fed-Batch Penicillin Fermentation Processes No.
Variable description (Unit)
1 2 3 4 5 6 7 8 9 10 11 12
Culture time (h) Aeration rate (L/h) Agitator power (W) Substrate feed rate (L/h) Substrate feed temperature (K) Dissolved oxygen concentration (g/L) Culture volume (L) Carbon dioxide concentration (g/L) pH (−) Fermenter temperature (K) Generated heat (kcal) Cooling water flow rate (L/h)
further evaluate the model performance of handling betweenbatch changes in process characteristics. 4.1.2. Online prediction results of penicillin concentration. During the offline development stage, the design of the OEGPR soft sensor consists of two steps: (i) construction of local domains and (ii) building of local GPR and PDD models. In the first step, where a set of local domains is identified, two critical model parameters, namely the local modeling size (L) and threshold factor (γ), control the construction process of local domains. Three indices, namely the number of local domains (M), sample-to-sample ratio (SSR), and sample coverage rate (SCR), are used to evaluate the localization results. The relationships between the model parameters (i.e., L and γ) and the evaluation indices (i.e., M, SSR, and SCR) are illustrated in Figure 5. It can be seen that M grows with γ and decreases with L. SSR increases with γ while varying with L irregularly and insensitively. In comparison, SCR shows an increasing trend with γ while exhibiting irregular changes with L. By analyzing the partition results, the search ranges for different model parameters can be determined more reasonably and thus the computational load for offline training can be effectively reduced. For example, L should not be too small while γ should not be too large in order to control the number of local domains. After setting the initial search ranges, the optimal OEGPR model parameters need to be further determined using the batch-based LOOCV method. In addition, the parameters for PLS, RPLS, JITPLS, LWPLS and MWGPR methods are selected using the LOOCV method or by trial and error. The search ranges for different parameters are determined as follows: (i) The number of latent variables for PLS, JITPLS, and LWPLS methods, LV ∈ {1, 2, ···, 12}; (ii) The forgetting factor for RPLS, λ ∈ {0.5, 0.55, ···, 1}; (iii) The local modeling size, L ∈ {50, 150, ···, 300} for JITPLS and L ∈ {100, 150, ···, 500} for OEGPR; (iv) The localization parameter for LWPLS, φ ∈ {0.05, 0.1, 0.5, 1, 5}; (v) The window size for MWGPR, W ∈ {100, 150, ···, 500}; (vi) The threshold factor for OEGPR, γ ∈ {10−7, 5 × 10−7, 10−6, 5 × 10−6, ···, 1}. The optimal parameters for different soft sensors are as follows: (i) PLS: LVopt = 11;
Figure 3. Flow diagram of fed-batch penicillin fermentation process.
of secondary metabolites (in this case, penicillin) is not associated with cell growth, it is a common practice to grow the cells in a batch culture followed by a fed-batch operation to promote synthesis of antibiotic. During the cultivation process, two cascade controllers are used to maintain the pH and temperature by manipulating the acid/base and cold/hot water flow rates, respectively. Meanwhile, the sterile substrate and air are continuously fed into the bioreactor to supply the nutrients for cell growth and product formation and to maintain the necessary oxygen consumption for microorganisms. The entire penicillin fermentation process has the duration of 400 h while the initial batch operation lasts only about 40 h. A simulator named PenSim has been developed by the Process Modeling, Monitoring, and Control Research Group of Illinois Institute of Technology for simulating fed-batch penicillin fermentation process.87 PenSim is capable of simulating the penicillin production process under various operating conditions and can be downloaded from the Web site http://simulator.iit.edu/web/pensim/index.html. In this work, the penicillin concentration, which is often analyzed offline in practical situations due to the lack of reliable online soft sensors, is chosen as the target variable, and the highly related input variables listed in Table 2 are selected for soft sensor development. All process data for experiments are collected by running the PenSim platform. The sampling interval is set as 1 h and the typical trend plots of process variables are depicted in Figure 4. Under the default operation settings listed in Table S1 (see Supporting Information), a total of 10 training batches are collected for soft sensor model learning while the additional 10 test batches are obtained to assess the prediction accuracy. Another 6 test sets, each of which contains 10 test batches with changes in initial conditions or set points, are collected to M
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
Figure 4. Trend plots of process variables in fed-batch penicillin fermentation processes.
(ii) RPLS: LVopt = 11, λopt = 0.9; (iii) JITPLS: LVopt = 11, Lopt = 150; (iv) LWPLS: LVopt = 11, φopt = 0.1; (v) MWGPR: W = 400; (vi) OEGPR: Lopt = 300, γopt = 10−6 For the OEGPR soft sensor, the above model parameters result in construction of 17 local domains and consequently 17 local GPR models and 17 PDD models. The states of local domains construction throughout the training samples from 10 batches are depicted in Figure 6. This figure indicates that the number of local domains constructed from each batch data, Mi, decreases when the training sample number gets large. In fact, with few exceptions, most of the local domains are built only from the former two batches. Such behavior stems from the fact that batch processes exhibit recurring process dynamics, thus most of the process characteristics have been captured when scanning the former samples (e.g., samples 1−800 in this case) using the proposed JIT localization method. When arriving at the latter samples, most of the built local domains are removed due to high redundancy. Though the localization of process is completed offline in this work, the JIT localization method allows building new local domains online using the same way as that used in the offline stage when new process states are detected. Two test cases are designed to examine the accuracy and reliability of soft sensors. In the first case, 10 test batches are collected from PenSim simulator under default settings of initial conditions and set points. During the second test case, abrupt changes are introduced to the initial conditions or set points, and consequently result in another 6 test sets, each of which contains 10 test batches from the same operation settings.
Based on the optimal model parameters, different soft sensors are built and their prediction accuracies are assessed through the first test case. The quantitative comparison of prediction results from different soft sensors are listed in Table 3, where the two extreme scenarios without (0%) and with (100%) available target data for model adaptation are presented. Meanwhile, Figure 7 shows the prediction RMSE values from different adaptive soft sensors for the different percentages of target values availability. As shown in Table 3 and Figure 7, for the case without online adaptation, PLS leads to fairly poor predictions due to the inherent linear model structure, whereas the penicillin cultivation process is of strong nonlinearity. In comparison, GPR performs much better than PLS due to better capability of handing nonlinearity. Among the adaptive soft sensors, MWGPR does not function well when model adaptation is not involved because only limited historical information is included in the moving window and remains unchanged. In contrast, JITPLS, LWPLS, and OEGPR achieve high prediction performance in the case without model adaptation because they are based on local learning and thus can well deal with nonlinearity. Though GPR model performs well, it is unattractive for soft sensor development for two main reasons. First, the computational load for global optimization for GPR training may be unacceptable when a large number of training samples are used. Second, global GPR model is unfavorable for online updating when new process data are available. Thus, local learning methods are preferable because they are capable of efficiently capturing process nonlinearity as well as providing great convenience for online adaptation. N
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
Table 3. Prediction RMSE and R2 Values of Different Soft Sensors in Fed-Batch Penicillin Fermentation Processes (L = 300, γ = 10−6, M = 17) 0%a of target data
100%a of target data
Method
RMSE
R2
RMSE
R2
PLS GPR RPLS JITPLS LWPLS MWGPR OEGPR
0.0430 0.0213 0.0430 0.0265 0.0263 0.0318 0.0220
0.9914 0.9979 0.9914 0.9967 0.9968 0.9953 0.9978
0.0430 0.0213 0.0119 0.0151 0.0078 0.0083 0.0066
0.9914 0.9979 0.9993 0.9989 0.9997 0.9997 0.9998
a
Percentage of target data available for model adaptation of adaptive soft sensors.
Figure 5. Effects of local modeling size (L) and threshold factor (γ) on: (a) the number of local domains (M), (b) sample-to-sample ratio (SSR), and (c) sample coverage rate (SCR). Figure 7. Prediction RMSE values under different percentages of target data available for model adaptation using different adaptive soft sensors in fed-batch penicillin fermentation processes (L = 300, γ = 10−6, M = 17).
In the case with available data for model adaptation, adaptive soft sensors are updated in order to address concept drift issues. As shown in Figure 7, all adaptive soft sensors perform much better in the case with model adaptation than the nonadaptive case. This is because new process states can be captured by updating soft sensor models using new measurements. Among the five adaptive soft sensors, RPLS performs badly when only a
small number of new target data are available, which is due to the limitation of global model structure in dealing with nonlinearity. For the local learning based adaptive soft sensors,
Figure 6. States of local domains construction in fed-batch penicillin fermentation processes (L = 300, γ = 10−6, M = 17). The state of 1 indicates an appeal to construct a new local domain while the state of 0 indicates a rejection. Mi (i = 1, 2, ···, 10) denotes the number of local domains constructed from the ith batch data. O
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research the performance improvement of JITPLS is relatively slow when new data are continuously added to the database. In contrast, the LWPLS, MWGPR and OEGPR methods outperform RPLS and JITPLS significantly. Overall, OEGPR results in the highest accuracy throughout different percentages of target values availability for model adaptation. As well as providing accurate predictions, OEGPR also provides the prediction variance of the target variable. This outstanding feature allows the operators to evaluate the reliability of online predictions. The relationship between the relative standard deviation (RSTD) and absolute value of relative error (ARE) is shown in Figure 8. Note that the relative
Figure 8. Relationship between the relative standard deviation and absolute value of relative error using an OEGPR soft sensor in a fedbatch penicillin fermentation process. RSTD = |σi/yi| and ARE = |(yi − ŷi)/yi|.
Figure 9. Comparison of average CPU times of different adaptive soft sensors under various percentages of available target data in fed-batch penicillin fermentation processes.
indices are adopted in order to eliminate the influence of magnitude differences of the target variable. As can be seen from the figure, the ARE values increase with the RSTD values, and their distributions become wider. Thus, the prediction variance of the OEGPR model can tell the operators whether a prediction value is reliable or not, which is particularly useful to indicate the occurrence of significant prediction errors. Apart from the prediction accuracy, the realtime performance of soft sensors for online estimation is also considerably important for monitoring and control of processes. Figure 9 illustrates the average CPU time (s) of different adaptive soft sensors under various percentages of target data availability. It can be clearly seen that RPLS and JITPLS are much more efficient than other methods. RPLS can be conducted very fast since the model adaptation is performed in a recursive way. JITPLS requires little CPU time because it is essentially linear model and only a small number of relevant samples are selected for online modeling. In comparison, the increase of the computational load of LWPLS method is due to the increase of local modeling size. MWGPR is the most time-consuming method compared to other ones due to frequent online training of GPR models. As for the proposed OEGPR approach, computational load is mainly focused on the offline model development stage, especially the identification of local domains and construction of local models. During the online phase, OEGPR can be conducted fast and the average CPU time for predicting one testing sample is less than 0.3 s. For the fed-batch penicillin fermentation process, such short prediction
time is negligible compared to the time-consuming offline analysis. The detailed prediction results of one test batch using the OEGPR soft sensor are shown in Figure 10. These include (i) trend plots of the actual and predicted penicillin concentration, (ii) standard deviations, (iii) prediction errors, and (iv) CPU time for online prediction. Figure 10a shows that the online predictions match well the actual measurements, which demonstrates the effectiveness of the OEGPR method to adaptively capture the nonlinear relationships between the input and output variables throughout the process. Furthermore, the prediction errors and standard deviations are analyzed to quantitatively evaluate the estimation accuracy. Comparing Figure 10b with Figure 10c reveals that the prediction reliability can be indicated by the prediction standard deviations when lacking the actual measurements. One can readily see that large prediction errors usually correspond to large prediction uncertainty, i.e. standard deviation. Apart from the estimation accuracy, the realtime performance of the OEGPR soft sensor is also illustrated in Figure 10d. It can be seen that OEGPR requires little computational cost with an average CPU time of 0.2834 s for online prediction. Such online prediction efficiency is completely acceptable compared to the offline analysis costing several hours. Therefore, the proposed OEGPR soft sensing approach can be potentially applied to the small-scale as well as the large-scale batch processes. P
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
Figure 10. Prediction results of one test batch in a fed-batch penicillin fermentation process (100% of target data, L = 300, γ = 10−6, M = 17). (a) Trend plots of the actual and predicted values; (b) prediction standard deviations; (c) prediction errors; and (d) CPU time for online prediction.
Figure 11. Posterior probabilities of the query data with respect to different local models of the OEGPR soft sensor from one test batch in a fedbatch penicillin fermentation process (L = 300, γ = 10−6, M = 17).
total of 17 local predictions are integrated based on finite mixture mechanism to give the final prediction mean and variance for every query data. One can see that different parts of the online data are dominated by different local models. This behavior suggests that the process characteristics are always
In what follows, the posterior probabilities of the query data with respect to different local models from one test batch are depicted in Figure 11. Unlike global models, OEGPR makes predictions by combining a set of predictions from local models. In the case of penicillin concentration prediction, a Q
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
Figure 12. Prediction RMSE values of the OEGPR soft sensor under different settings of local modeling size, threshold factor, and percentage of available target data in a fed-batch penicillin fermentation process.
changing and can be captured by different local models in various degrees. The contributions of the local models are indicated by the posterior probabilities. Though all local models are involved in ensemble prediction, only a few ones with large posterior probabilities make the main contributions for each query data while the remaining models make a little influence on final prediction. For example, the local model 9 makes significant contribution from about 140 to 270 h while making a very little effect on the final prediction during the rest of
process. The tremendous differences of the change trends of posterior probabilities from different local models also imply the fact that the redundancy between local domains is small. It can be further concluded that the diversity of local domains can be guaranteed by using the proposed JIT localization method due to the incorporation of a probabilistic analysis method to detect and delete the redundant local domains. To further investigate the estimation performance and adaptive capability of the OEGPR soft sensor, Figure 12 R
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
Figure 13. Trend plots of prediction RMSE values of 6 test sets under different percentages of target data available for model adaptation using different adaptive soft sensors in a fed-batch penicillin fermentation process (L = 300, γ = 10−6, M = 17).
the bottom. In comparison, for the case with 100% of target data for model adaptation, the position of the RMSE plot with L = 100 and that with L = 500 are reversed, as shown in Figure 12f. This shows that the OEGPR soft sensor with a relatively small L is prone to result in a small RMSE value when the model are fully updated using new measurements. In other words, the OEGPR soft sensor with a smaller L can adapt to new process states more quickly. In general, large L is preferable from the viewpoint of reliability of the model but
presents the prediction RMSE values for different settings of local modeling size (L), threshold factor (γ) and percentage of available target data. The two critical model parameters (i.e., L and γ) not only control the model complexity but also affect the model prediction accuracy as well as the adaptive capability. For the scenario without online updating (i.e., 0% of target data), a relatively large L tends to result in a small RMSE value, as shown in Figure 12a. For example, in most cases, the RMSE plot with L = 100 is at the top while the plot with L = 500 is at S
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
Figure 14. Trend plots of the actual and predicted penicillin concentration from two test batches with changes in initial conditions or set points using the OEGPR soft sensor in a fed-batch penicillin fermentation process (L = 300, γ = 10−6, M = 17). (a) 10% increase in substrate concentration (0% of target data); (b) 10% increase in substrate concentration (100% of target data); (c) 10% increase in substrate feed rate (0% of target data); and (d) 10% increase in substrate feed rate (100% of target data).
from the two cases of target data availability (0% and 100%) is listed in Table S3 (see Supporting Information). From the figure and the table, it can be observed that all adaptive soft sensors lead to poor predictions when no new measurements for model adaptation due to the batch-to-batch variations caused by changes in initial conditions or set points. With the increase of new data available for model adaptation, the prediction performance of different methods is gradually improved. Similar to the prediction results in the first test case, LWPLS, MWGPR and OEGPR performs better than RPLS and JITPLS in terms of the RMSE trend plots. Overall, OEGPR is superior to other methods in the test sets 3−6 except for the test set 1 where MWGPR performs best and the test 2 where LWPLS performs best. It is also worth noting that, when lacking sufficient new data for model adaptation (e.g., 0% and 5% of target data), OEGPR also delivers poor predictions as other methods. This is because the current test sets are significantly different from the historical training data. Thus, the inclusion of as much as historical information may deteriorate the model performance instead of improvement. Finally, the prediction results of two test batches in the second test case using the OEGPR soft sensor are shown in Figure 14. It can be readily observed from Figure 14a and Figure 14c that OEGPR leads to poor predictions when soft sensor model is not updated using new data. In this case, there exist significant deviations between the actual and predicted values throughout the cultivation process. The prediction standard deviations also indicate the performance degradation. In comparison, the prediction accuracy of the OEGPR soft sensor is significantly improved after performing model
it is not good from the viewpoint of adaptation of the model. It can be observed that, with the increase of the available target data for model adaptation from Figure 12a to Figure 12f, the RMSE plots of the OEGPR soft sensor for small L values gradually move downward while the RMSE plots from large L values move upward. Therefore, in order to select an appropriate L, there is a trade-off between the model adaptive capability using new data and the estimation accuracy without sufficient online adaptation. In addition to the local modeling size L, the threshold factor γ also has an obvious effect on the estimation performance of the OEGPR soft sensor. The RMSE plots in Figure 12a−12c show that the model performance is not greatly influenced by γ when only a small number of available target data for model adaptation. In contrast, as shown in Figure 12d−12f, a small γ is preferable to obtain accurate predictions when a large number of target data are available for model adaptation. In the first test case, the between-batch changes in process characteristics are mainly caused by the addition of the pseudorandom binary signal (PRBS) to the glucose feed rate to mimic the real process environment. In contrast, in the second test case, large variations are introduced to the initial conditions or set points to motivate the significant betweenbatch changes in process characteristics. The second test case is designed to further evaluate the prediction performance and adaptive capability of the five adaptive soft sensors: RPLS, JITPLS, LWPLS, MWGPR, and OEGPR. Figure 13 presents the trend plots of the prediction RMSE values of 6 test sets under different percentages of available target values. Meanwhile, the quantitative comparison T
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
Ethernet interconnecting programmable logical controllers, which actuate directly on process equipment, with humanmachine interface (HMI). Five control loops are involved in CTC production: temperature, pH, antifoam feeding, dissolved oxygen (DO) concentration, and substrate feeding. The temperature is controlled at 32 °C in batch phase and 29 °C in fed-batch phase by manipulating the flow rate of cold water. During the cultivation, ammonia−water is automatically fed in to control the pH value at about 5.95. Antifoam feeding is triggered when the foam level touches the foam detection electrode. DO is controlled by manipulating air flow rate using a cascade PID controller. A typical time course of process variables is presented in Figure 16.
adaptation, as shown in Figure 14b and Figure 14d. For example, compared with Figure 14c with an RMSE of 0.09401, Figure 14d shows a significant performance improvement with an RMSE of 0.005445. These estimation results show that OEGPR can efficiently cope with time-varying changes and maintain high prediction accuracy. The above experimental results from two test cases confirm that the proposed OEGPR adaptive soft sensor is more desirable than other conventional methods in dealing with time-varying changes and nonlinearity, as well as providing prediction uncertainty. 4.2. Industrial fed-batch CTC fermentation process. 4.2.1. Process description. The chlortetracycline (CTC) fermentation process under study is an industrial process carried out in a biochemical plant attached to Charoen Pokphand Group Co., Ltd. The cultivation of Streptomyces aureofaciens is conducted in an air-lift stirred fermenter with a volume of 136 m3. Figure 15 illustrates the flow diagram and
Figure 16. Trend plots of process variables in an industrial fed-batch CTC fermentation process.
During the fed-batch operations of industrial CTC production process, the substrate feed rate remains to be tuned manually by skilled operators according to their experience. This is mainly because no reliable online sensor is available for measuring substrate concentration. Therefore, a soft sensor that can provide realtime estimates of substrate concentration is required for enabling efficient feeding control.92 The input variables of soft sensors are listed in Table 4. After removing some batches with large deviations from the statistical mean plot (see Figure 17), a total of 32 training batches are collected to construct the database for soft sensor Table 4. Input Variables of Soft Sensors in an Industrial FedBatch CTC Fermentation Process Figure 15. Flow diagram and fermenter of an industrial fed-batch CTC fermentation process.
fermenter of industrial fed-batch CTC cultivation process. The fermenter serves as the major equipment in the process with the substrate and air continuously fed in to supply the raw materials for cell growth and maintain the necessary oxygen consumption for microorganisms. A distributed control system is applied to allow the automatic supervision and control of the main process variables. The supervisory system is built by using U
No.
Variable description (Unit)
1 2 3 4 5 6 7 8 9
Culture time (min) Temperature (°C) pH (−) DO (%) Air flow rate (m3/h) Total air consumption (m3) Substrate feed rate (L/h) Total substrate consumption (L) Total alkali consumption (L) DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
Table 5. Prediction RMSE and R2 Values of Different Soft Sensors in an Industrial Fed-Batch CTC Fermentation Process (L = 500, γ = 5 × 10−7, M = 2) 0%a of target data
100%a of target data
Method
RMSE
R2
RMSE
R2
PLS GPR RPLS JITPLS LWPLS MWGPR OEGPR
0.3623 0.3036 0.3623 0.3045 0.3127 0.3420 0.2990
0.9268 0.9486 0.9268 0.9483 0.9455 0.9348 0.9502
0.3623 0.3036 0.3200 0.2927 0.3021 0.2963 0.2852
0.9268 0.9486 0.9429 0.9523 0.9491 0.9511 0.9547
a
Percentage of target data available for model adaptation of adaptive soft sensors.
Figure 17. Statistical distribution of substrate concentration in an industrial fed-batch CTC fermentation process.
training while the additional 32 batches are obtained to assess the model accuracy and adaptive capability. Though the sampling interval of online variables is 5 min, the target variable, i.e. substrate concentration, is only offline analyzed every 4 h. Therefore, the online measurements are downsampled to correspond to the available output data. 4.2.2. Online prediction results of substrate concentration. Before soft sensors are used for online predictions, some model parameters need to be selected. For the substrate concentration prediction, the search ranges for different parameters are determined as follows: (i) The number of latent variables for PLS, JITPLS, and LWPLS methods, LV ∈ {1, 2, ···, 9}; (ii) The forgetting factor for RPLS, λ ∈ {0.5, 0.55, ···, 1}; (iii) The local modeling size, L ∈ {50, 150, ···, 300} for JITPLS and L ∈ {100, 150, ···, 500} for OEGPR; (iv) The localization parameter for LWPLS, φ ∈ {0.05, 0.1, 0.5, 1,5}; (v) The window size for MWGPR, W ∈ {100, 150, ···, 500}; (vi) The threshold factor for OEGPR, γ ∈ {10−7, 5 × 10−7, 10−6, 5 × 10−6, ···, 1} The optimal parameters for different soft sensors are selected as follows: (i) PLS: LVopt = 7; (ii) RPLS: LVopt = 7, λopt = 0.8; (iii) JITPLS: LVopt = 7, Lopt = 150; (iv) LWPLS: LVopt = 7, φopt = 1; (v) MWGPR: W = 400; (vi) OEGPR: Lopt = 500, γopt = 5 × 10−7. For the OEGPR soft sensor, the above model parameters result in construction of 2 local domains and consequently 2 local GPR models and 2 PDD models. The quantitative comparison of online predictions for substrate concentration using different soft sensors is listed in Table 5. Meanwhile, the prediction RMSE values under various percentages of available target data using different adaptive soft sensors are depicted in Figure 18. Without considering the model adaptation, it can be seen that global PLS leads to the worst predictions since it cannot address the process nonlinearity. Similarly, MWGPR also leads to poor predictions in the case of 0% of target data. In comparison, the global GPR,
Figure 18. Prediction RMSE values under various percentages of target data available for model adaptation using different adaptive soft sensors in an industrial fed-batch CTC fermentation process (L = 500, γ = 5 × 10−7, M = 2).
JITPLS, LWPLS and OEGPR methods obtain much lower RMSE values than PLS and MWGPR methods due to their strong capability of handling nonlinearity and full use of historical information. However, the prediction accuracy is still not satisfactory in the case without model adaptation. When soft sensor models are updated using new data, the prediction performance of the RPLS, JITPLS, LWPLS, MWGPR and OEGPR is enhanced. Among the five adaptive soft sensors, the proposed OEGPR method performs best. These results demonstrate the superiority of the OEGPR method over conventional soft sensors in dealing with process nonlinearity as well time-varying changes in process characteristics. The detailed prediction results of one test batch using the OEGPR soft sensor are shown in Figure 19. As shown in Figure 19a−19c, the OEGPR method can accurately predict the trend of substrate concentration, which is very meaningful for assisting the monitoring and substrate feeding control of fedbatch CTC fermentation process. Figure 19d presents the CPU time plot for online prediction, showing that OEGPR exhibits high realtime performance with an average CPU time of 0.1655 s. Furthermore, the posterior probabilities of the query data with respect to the two local models are shown in Figure 19e. It can be seen that, during the former (about 0−30 h) and the latter (about 60−100 h) stages, local mode1s 1 and 2 make the V
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
Figure 19. Prediction results of one test batch using the OEGPR soft sensor in an industrial fed-batch CTC fermentation process (L = 500, γ = 5 × 10−7, M = 2). (a) Trend plots of the actual and predicted values; (b) prediction standard deviations; (c) prediction errors; (d) CPU time for online prediction; and (e) posterior probabilities of query data with respect to different local models.
approach has the attractive merit of providing the prediction uncertainty, which is impossible for many conventional soft sensors. The OEGPR method is applied to a simulated fed-batch penicillin fermentation process and an industrial fed-batch CTC fermentation process. The estimation performance of the OEGPR soft sensor is evaluated and compared with that of several traditional soft sensors under different percentages of target values available for model adaptation. The developed OEGPR soft sensor is shown to perform better than those traditional methods. As well as dealing with the gradual changes, the prediction results from the second test case of the penicillin cultivation process reveal that the OEGPR method can also handle abrupt changes caused by the variations in initial conditions or set points. Furthermore, it is shown that the prediction variance of the model can be used as an efficient indicator to evaluate the reliability of online predictions. In addition, the realtime performance of the OEGPR method is also demonstrated to be satisfactory. Apart from soft sensor applications, the presented method has the potential of addressing different modeling problems in nonlinear timevarying batch processes.
main contributions, respectively. During the middle stage (about 30−60 h), both the two local models make significant contributions to the final predictions. The above estimation results for substrate concentration prediction demonstrate the effectiveness of the proposed OEGPR method in terms of the prediction accuracy, capability of handling time-varying changes and realtime performance for online prediction.
5. CONCLUSIONS The OEGPR adaptive soft sensing algorithm, the main contribution of this work, is proposed to estimate the difficult-to-measure variables online for nonlinear time-varying batch processes. First, the batch process is divided into a set of local domains using a JIT localization method equipped with a probabilistic analysis mechanism to detect and remove the redundant local domains. Then multiple local GPR models and probabilistic data descriptors (PDD) are built for all isolated local domains. Further, the final predicted mean and variance are obtained based on the finite mixture mechanism, where the mixture weights are determined as the posterior probabilities of the query data with respect to different local domains. When new measurements are available, the OEGPR method provides the possibility to perform adaptation at two levels: (i) adaptation of the local GPR and PDD models and (ii) adaptation of the mixture weights. By exploiting the local learning framework, the OEGPR method can effectively capture the nonlinear input−output relationship by building locally valid models and performing ensemble prediction. Moreover, incorporation of the two-level adaptation mechanism enables the OEGPR method to accommodate the changing environments and thus maintain high estimation performance. Additionally, the OEGPR
■
ASSOCIATED CONTENT
S Supporting Information *
The default operation settings of initial conditions and set points of the PenSim simulator (Table S1); optimal prediction results of the OEGPR soft sensor under different local modeling sizes in fed-batch penicillin fermentation processes (Table S2); quantitative comparison of prediction results from 6 test sets with changes in initial conditions or set points using different adaptive soft sensors in fed-batch penicillin fermentation processes (Table S3). The Supporting InformaW
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research
(19) Kadlec, P.; Gabrys, B. Local learning-based adaptive soft sensor for catalyst activation prediction. AIChE J. 2011, 57, 1288−1301. (20) Ni, W.; Tan, S. K.; Ng, W. J.; Brown, S. D. Localized, adaptive recursive partial least squares regression for dynamic system modeling. Ind. Eng. Chem. Res. 2012, 51, 8025−8039. (21) Ni, W.; Br own, S. D.; Man, R. A localized adaptive soft sensor for dynamic system modeling. Chem. Eng. Sci. 2014, 111, 350−363. (22) Kim, S.; Kano, M.; Hasebe, S.; Takinami, A.; Seki, T. LongTerm Industrial Applications of Inferential Control Based on Just-InTime Soft-Sensors: Economical Impact and Challenges. Ind. Eng. Chem. Res. 2013, 52, 12346−12356. (23) Jin, H.; Chen, X.; Yang, J.; Wang, L.; Wu, L. Online local learning based adaptive soft sensor and its application to an industrial fed-batch chlortetracycline fermentation process. Chemom. Intell. Lab. Syst. 2015, 143, 58−78. (24) Shao, W.; Tian, X.; Wang, P. Local partial least squares based online soft sensing method for multi-output processes with adaptive process states division. Chin. J. Chem. Eng. 2014, 22, 828−836. (25) Shao, W.; Tian, X. Adaptive Soft Sensor for Quality Prediction of Chemical Processes Based on Selective Ensemble of Local Partial Least Squares Models. Chem. Eng. Res. Des. 2015, 95, 113−132. (26) Kaneko, H.; Arakawa, M.; Funatsu, K. Development of a new soft sensor method using independent component analysis and partial least squares. AIChE J. 2009, 55, 87−98. (27) Ge, Z.; Song, Z. Ensemble independent component regression models and soft sensing application. Chemom. Intell. Lab. Syst. 2014, 130, 115−122. (28) Ge, Z.; Song, Z.; Wang, P. Probabilistic combination of local independent component regression model for multimode quality prediction in chemical processes. Chem. Eng. Res. Des. 2014, 92, 509− 521. (29) Yuan, X.; Ge, Z.; Song, Z. Locally Weighted Kernel Principal Component Regression Model for Soft Sensing of Nonlinear TimeVariant Processes. Ind. Eng. Chem. Res. 2014, 53, 13736−13749. (30) Rosipal, R.; Trejo, L. J. Kernel partial least squares regression in reproducing kernel hilbert space. Journal of Machine Learning Research 2001, 2, 97−123. (31) Yu, J. Multiway Gaussian mixture model based adaptive kernel partial least squares regression method for soft sensor estimation and reliable quality prediction of nonlinear multiphase batch processes. Ind. Eng. Chem. Res. 2012, 51, 13227−13237. (32) Jin, H.; Chen, X.; Yang, J.; Wu, L. Adaptive soft sensor modeling framework based on just-in-time learning and kernel partial least squares regression for nonlinear multiphase batch processes. Comput. Chem. Eng. 2014, 71, 77−93. (33) Gonzaga, J.; Meleiro, L.; Kiang, C.; Maciel Filho, R. ANN-based soft-sensor for real-time process monitoring and control of an industrial polymerization process. Comput. Chem. Eng. 2009, 33, 43−49. (34) Niu, D.-p.; Wang, F.-l.; Zhang, L.-l.; He, D.-k.; Jia, M.-x. Neural network ensemble modeling for nosiheptide fermentation process based on partial least squares regression. Chemom. Intell. Lab. Syst. 2011, 105, 125−130. (35) Cui, L.; Xie, P.; Sun, J.; Yu, T.; Yuan, J. Data-driven prediction of the product formation in industrial 2-keto-L-gulonic acid fermentation. Comput. Chem. Eng. 2012, 36, 386−391. (36) Desai, K.; Badhe, Y.; Tambe, S. S.; Kulkarni, B. D. Soft-sensor development for fed-batch bioreactors using support vector regression. Biochem. Eng. J. 2006, 27, 225−239. (37) Jain, P.; Rahman, I.; Kulkarni, B. Development of a soft sensor for a batch distillation column using support vector regression techniques. Chem. Eng. Res. Des. 2007, 85, 283−287. (38) Yu, J. A Bayesian inference based two-stage support vector regression framework for soft sensor development in batch bioprocesses. Comput. Chem. Eng. 2012, 41, 134−144. (39) Kaneko, H.; Funatsu, K. Application of online support vector regression for soft sensors. AIChE J. 2014, 60, 600−612.
tion is available free of charge on the ACS Publications website at DOI: 10.1021/acs.iecr.5b01495.
■
AUTHOR INFORMATION
Corresponding Authors
*Tel.: +86 13601333018; Fax: +86 010 68914662. E-mail:
[email protected] (H. Jin). *Tel.: +86 13601333018; Fax: +86 010 68914662. E-mail:
[email protected] (X. Chen). Notes
The authors declare no competing financial interest.
■
ACKNOWLEDGMENTS The authors thank the Charoen Pokphand Group for their financial support (No. 20131041017) and for providing the dataset of industrial fed-batch CTC fermentation processes.
■
REFERENCES
(1) Cinar, A.; Parulekar, S. J.; Undey, C.; Birol, G. Batch Fermentation: Modeling: Monitoring, and Control; CRC Press: New York, 2003. (2) Weber, R.; Brosilow, C. The use of secondary measurements to improve control. AIChE J. 1972, 18, 614−623. (3) Joseph, B.; Brosilow, C. B. Inferential control of processes: Part I. Steady state analysis and design. AIChE J. 1978, 24, 485−492. (4) Brosilow, C.; Tong, M. Inferential control of processes: Part II. The structure and dynamics of inferential control systems. AIChE J. 1978, 24, 492−500. (5) Joseph, B.; Brosilow, C. Inferential control of processes: Part III. Construction of optimal and suboptimal dynamic estimators. AIChE J. 1978, 24, 500−509. (6) Tham, M. T.; Montague, G. A.; Morris, A. J.; Lant, P. A. Softsensors for process estimation and inferential control. J. Process Control 1991, 1, 3−14. (7) Kresta, J.; Marlin, T.; MacGregor, J. Development of inferential process models using PLS. Comput. Chem. Eng. 1994, 18, 597−611. (8) Park, S.; Han, C. A nonlinear soft sensor based on multivariate smoothing procedure for quality estimation in distillation columns. Comput. Chem. Eng. 2000, 24, 871−877. (9) Fortuna, L.; Graziani, S.; Rizzo, A.; Xibilia, M. G. Soft sensors for monitoring and control of industrial processes; Springer-Verlag: London, 2007. (10) Lin, B.; Recke, B.; Knudsen, J. K.; Jørgensen, S. B. A systematic approach for soft sensor development. Comput. Chem. Eng. 2007, 31, 419−425. (11) Kano, M.; Nakagawa, Y. Data-based process monitoring, process control, and quality improvement: Recent developments and applications in steel industry. Comput. Chem. Eng. 2008, 32, 12−24. (12) Kadlec, P.; Gabrys, B.; Strandt, S. Data-driven soft sensors in the process industry. Comput. Chem. Eng. 2009, 33, 795−814. (13) Kano, M.; Fujiwara, K. Virtual sensing technology in process industries: trends and challenges revealed by recent industrial applications. J. Chem. Eng. Jpn. 2013, 46, 1−17. (14) Jolliffe, I. T. Principal component analysis, 2nd ed.; Springer: New York, 2002. (15) Fujiwara, K.; Kano, M.; Hasebe, S.; Takinami, A. Soft-sensor development using correlation-based just-in-time modeling. AIChE J. 2009, 55, 1754−1765. (16) Ge, Z.; Song, Z.; Kano, M. External analysis-based regression model for robust soft sensing of multimode chemical processes. AIChE J. 2014, 60, 136−147. (17) Ge, Z.; Huang, B.; Song, Z. Mixture semisupervised principal component regression model and soft sensor application. AIChE J. 2014, 60, 533−545. (18) Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109−130. X
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research (40) Kaneko, H.; Funatsu, K. Adaptive soft sensor based on online support vector regression and Bayesian ensemble learning for various states in chemical plants. Chemom. Intell. Lab. Syst. 2014, 137, 57−66. (41) Jin, H.; Chen, X.; Yang, J.; Zhang, H.; Wang, L.; Wu, L. Multimodel adaptive soft sensor modeling method using local learning and online support vector regression for nonlinear time-variant batch processes. Chem. Eng. Sci. 2015, 131, 282−303. (42) Liu, Y.; Gao, Z.; Li, P.; Wang, H. Just-in-time kernel learning with adaptive parameter selection for soft sensor modeling of batch processes. Ind. Eng. Chem. Res. 2012, 51, 4313−4327. (43) Liu, Y.; Chen, J. Integrated soft sensor using just-in-time support vector regression and probabilistic analysis for quality prediction of multi-grade processes. J. Process Control 2013, 23, 793−804. (44) Rasmussen, C.; Williams, C. Gaussian processes for machine learning; MIT Press: London, 2006. (45) Li, X.; Su, H.; Chu, J. Multiple model soft sensor based on affinity propagation, gaussian process and bayesian committee machine. Chin. J. Chem. Eng. 2009, 17, 95−99. (46) Chen, T.; Ren, J. Bagging for Gaussian process regression. Neurocomputing 2009, 72, 1605−1610. (47) Yu, J. Online quality prediction of nonlinear and non-Gaussian chemical processes with shifting dynamics using finite mixture model based Gaussian process regression approach. Chem. Eng. Sci. 2012, 82, 22−30. (48) Yu, J.; Chen, K.; Rashid, M. M. A Bayesian model averaging based multi-kernel Gaussian process regression framework for nonlinear state estimation and quality prediction of multiphase batch processes with transient dynamics and uncertainty. Chem. Eng. Sci. 2013, 93, 96−109. (49) Grbić, R.; Slišković, D.; Kadlec, P. Adaptive soft sensor for online prediction and process monitoring based on a mixture of Gaussian process models. Comput. Chem. Eng. 2013, 58, 84−97. (50) Liu, Y.; Gao, Z. Real-time property prediction for an industrial rubber-mixing process with probabilistic ensemble Gaussian process regression models. J. Appl. Polym. Sci. 2015, 132, http://dx.doi.org/10. 1002/app.41432, 10.1002/app.41432. (51) Liu, Y.; Gao, Z. Industrial melt index prediction with the ensemble anti-outlier just-in-time Gaussian process regression modeling method. J. Appl. Polym. Sci. 2015, 132, http://dx.doi.org/ 10.1002/app.41958, 10.1002/app.41958. (52) Soares, S.; Araújo, R.; Sousa, P.; Souza, F. Design and application of soft sensor using ensemble methods. 2011 IEEE 16th Conference on Emerging Technologies & Factory Automation (ETFA, Toulouse, France, September 5−9, 2011; pp 1−8. (53) Atkeson, C. G.; Moore, A. W.; Schaal, S. 1997. Locally weighted learning. Artificial Intelligence Review 1997, 11, 11−73. (54) Schaal, S.; Atkeson, C. G.; Vijayakumar, S. Scalable techniques from nonparametric statistics for real time robot learning. Applied Intelligence 2002, 17, 49−60. (55) Cheng, C.; Chiu, M.-S. A new data-based methodology for nonlinear process modeling. Chem. Eng. Sci. 2004, 59, 2801−2810. (56) Chen, K.; Ji, J.; Wang, H.; Liu, Y.; Song, Z. Adaptive local kernel-based learning for soft sensor modeling of nonlinear processes. Chem. Eng. Res. Des. 2011, 89, 2117−2124. (57) Kano, M.; Ogawa, M. The state of the art in chemical process control in Japan: Good practice and questionnaire survey. J. Process Control 2010, 20, 969−982. (58) Tsymbal, A. The problem of concept drift: definitions and related work. Technical Report; Department of Computer Science, Trinity College Dublin, The University of Dublin: Ireland, 2004. (59) Kadlec, P.; Grbić, R.; Gabrys, B. Review of adaptation mechanisms for data-driven soft sensors. Comput. Chem. Eng. 2011, 35, 1−24. (60) Kaneko, H.; Funatsu, K. Classification of the degradation of soft sensor models and discussion on adaptive models. AIChE J. 2013, 59, 2339−2347. (61) Wang, X.; Kruger, U.; Irwin, G. W. Process monitoring approach using fast moving window PCA. Ind. Eng. Chem. Res. 2005, 44, 5691−5702.
(62) Ni, W.; Tan, S. K.; Ng, W. J.; Brown, S. D. Moving-window GPR for nonlinear dynamic system modeling with dual updating and dual preprocessing. Ind. Eng. Chem. Res. 2012, 51, 6416−6428. (63) Grbic, R.; Sliskovic, D.; Kadlec, P. Adaptive soft sensor for online prediction based on moving window Gaussian process regression. 2012 11th International Conference on Machine Learning and Applications (ICMLA), Boca Raton, Florida, USA, December 12− 15, 2012; pp 428−433. (64) Kaneko, H.; Funatsu, K. Moving window and just-in-time soft sensor model based on time differences considering a small number of measurements. Ind. Eng. Chem. Res. 2015, 54, 700−704. (65) Qin, S. J. Recursive PLS algorithms for adaptive data modeling. Comput. Chem. Eng. 1998, 22, 503−514. (66) Mu, S.; Zeng, Y.; Liu, R.; Wu, P.; Su, H.; Chu, J. Online dual updating with recursive PLS model and its application in predicting crystal size of purified terephthalic acid (PTA) process. J. Process Control 2006, 16, 557−566. (67) Ahmed, F.; Nazir, S.; Yeo, Y. K. A recursive PLS-based soft sensor for prediction of the melt index during grade change operations in HDPE plant. Korean J. Chem. Eng. 2009, 26, 14−20. (68) Kaneko, H.; Funatsu, K. Maintenance-free soft sensor models with time difference of process variables. Chemom. Intell. Lab. Syst. 2011, 107, 312−317. (69) Kaneko, H.; Funatsu, K. A soft sensor method based on values predicted from multiple intervals of time difference for improvement and estimation of prediction accuracy. Chemom. Intell. Lab. Syst. 2011, 109, 197−206. (70) Kaneko, H.; Funatsu, K. Database monitoring index for adaptive soft sensors and the application to industrial process. AIChE J. 2014, 60, 160−169. (71) Chen, M.; Khare, S.; Huang, B. A unified recursive just-in-time approach with industrial near infrared spectroscopy application. Chemom. Intell. Lab. Syst. 2014, 135, 133−140. (72) Liu, J. On-line soft sensor for polyethylene process with multiple production grades. Control Engineering Practice 2007, 15, 769−778. (73) Xie, L.; Zeng, J.; Gao, C. Novel just-in-time learning-based soft sensor utilizing non-Gaussian information. IEEE Transactions on Control Systems Technology 2014, 22, 360−368. (74) Bishop, C. M. Pattern recognition and machine learning; The MIT Press: London, 2006. (75) Tian, H.-X.; Mao, Z.-Z. An ensemble ELM based on modified AdaBoost. RT algorithm for predicting the temperature of molten steel in ladle furnace. IEEE Transactions on Automation Science and Engineering 2010, 7, 73−80. (76) Khatibisepehr, S.; Huang, B.; Khare, S. Design of inferential sensors in the process industry: A review of Bayesian methods. J. Process Control 2013, 23, 1575−1596. (77) Soares, S. G.; Araújo, R. A dynamic and on-line ensemble regression for changing environments. Expert Systems with Applications 2015, 42, 2935−2948. (78) Gomes Soares, S.; Araújo, R. An on-line weighted ensemble of regressor models to handle concept drifts. Engineering Applications of Artificial Intelligence 2015, 37, 392−406. (79) Kaneko, H.; Arakawa, M.; Funatsu, K. Applicability domains and accuracy of prediction of soft sensor models. AIChE J. 2011, 57, 1506−1513. (80) Kaneko, H.; Funatsu, K. Estimation of predictive accuracy of soft sensor models based on data density. Chemom. Intell. Lab. Syst. 2013, 128, 111−117. (81) Frühwirth-Schnatter, S. Finite mixture and Markov switching models; Springer-Verlag: New York, 2006. (82) Fan, M.; Ge, Z.; Song, Z. Adaptive Gaussian mixture modelbased relevant sample selection for JITL soft sensor development. Ind. Eng. Chem. Res. 2014, 53, 19979−19986. (83) Xie, X.; Shi, H. Dynamic multimode process modeling and monitoring using adaptive Gaussian mixture models. Ind. Eng. Chem. Res. 2012, 51, 5497−5505. (84) Van Vaerenbergh, S.; Via, J.; Santamaría, I. A sliding-window kernel RLS algorithm and its application to nonlinear channel Y
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Article
Industrial & Engineering Chemistry Research identification. 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toulouse, France, May 14−19, 2006; pp V−V. (85) Liu, W.; Principe, J. C.; Haykin, S. Kernel Adaptive Filtering: A Comprehensive Introduction; John Wiley & Sons: Hoboken, NJ, 2011. (86) Chen, L. Z.; Nguang, S. K.; Chen, X. D. Modelling and optimization of biotechnological processes: artificial intelligence approaches; Springer Science & Business Media: 2006. (87) Birol, G.; Ü ndey, C.; Cinar, A. A modular simulation package for fed-batch fermentation: penicillin production. Comput. Chem. Eng. 2002, 26, 1553−1565. (88) Zhao, C.; Wang, F.; Gao, F.; Lu, N.; Jia, M. Adaptive monitoring method for batch processes based on phase dissimilarity updating with limited modeling data. Ind. Eng. Chem. Res. 2007, 46, 4943−4953. (89) Zhao, C.; Wang, F.; Mao, Z.; Lu, N.; Jia, M. Adaptive monitoring based on independent component analysis for multiphase batch processes with limited modeling data. Ind. Eng. Chem. Res. 2008, 47, 3104−3113. (90) Zhao, C.; Wang, F.; Gao, F. Improved calibration investigation using phase-wise local and cumulative quality interpretation and prediction. Chemom. Intell. Lab. Syst. 2009, 95, 107−121. (91) Hu, Y.; Ma, H.; Shi, H. Enhanced batch process monitoring using just-in-time-learning based kernel partial least squares. Chemom. Intell. Lab. Syst. 2013, 123, 15−27. (92) Jin, H.; Chen, X.; Yang, J.; Wu, L.; Wang, L. Hybrid intelligent control of substrate feeding for industrial fed-batch chlortetracycline fermentation process. ISA Trans. 2014, 53, 1822−1837.
Z
DOI: 10.1021/acs.iecr.5b01495 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX