
Development of Interval Soft Sensors Using Enhanced Just-in-Time Learning and Inductive Confidence Predictor

Yiqi Liu,*,†,‡ Daoping Huang,† and Yan Li†

† College of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, People's Republic of China
‡ Advanced Water Management Centre, University of Queensland, Hawken Building L501, Level 5, St. Lucia Campus, Queensland 4072, Australia

ABSTRACT: In the development of soft sensors for chemical processes, outliers in the input variables and the time-varying nature of the process are difficult to address and often result in unreliable predictions. Motivated by these issues, a new just-in-time (JIT) learning scheme is derived to track the normal changes of processes regardless of abrupt noise. The approach combines a proposed robust nearest correlation (RNC) algorithm with multimodel ensemble learning to enhance conventional JIT learning. Furthermore, to gauge the quality of a given prediction, we integrate this JIT learning with the inductive confidence predictor (ICP) to yield a new soft sensor, called the "interval soft sensor", which generates not only prediction values but also associated confidence values that represent the credibility of the soft sensor's output. These ideas were applied to a wastewater treatment process. The proposed interval soft sensor was shown to be effective for prediction both in the absence and in the presence of outliers in the process and, because its model updating is independent of the output data, for validating the online analyzer.

Received: May 17, 2011. Revised: December 17, 2011. Accepted: December 20, 2011. Published: December 20, 2011.

1. INTRODUCTION

In process industries, many quality variables need to be measured online, but most of them are hard to measure online because of technical difficulties, large measurement delays, high investment costs, and so on. Soft sensors are good alternatives to solve this problem. Mainly, the partial least squares (PLS)1−3 method has been used as a modeling method for soft sensors. The principal component regression (PCR) method,4 the nonlinear PLS method,5 artificial neural networks,6,7 the support vector machine (SVM) based regression method,8 and others have also been utilized as soft sensor models. However, building a high performance soft sensor is very laborious, since input variables and samples for model construction have to be selected carefully and parameters have to be tuned appropriately. In addition, even when a good soft sensor is developed successfully, its estimation performance deteriorates as process characteristics change. Therefore, different recursive methods have been proposed, such as recursive PLS9 and recursive support vector regression.10 Although these methods can adapt the soft sensor model to a new operating condition recursively, they cannot cope with abrupt changes of the process. Moreover, when the process stays within a certain operating region for a period of time, a recursive method will adapt the soft sensor model excessively because of its blind updating. To overcome these challenges, just-in-time (JIT) learning was recently developed as an attractive alternative to deal with the continuity and nonlinearity of processes. It has been applied in process monitoring and in soft sensors.11 The efficiency of JIT learning depends largely upon its ability to select the most similar data set and to build a local model. The estimation accuracy of the conventional JIT algorithm is not always high, because the samples for local modeling are selected on the basis of only the distance between the samples and the query point. As such, efforts were made by Cheng and Chiu12 to select samples on the basis of not only the distance but also the angle between two samples. Unfortunately, the angle does not always describe the

correlation among variables adequately, because there are always some special pairs of samples that are orthogonal to each other. As a solution to this issue, Fujiwara et al.13 introduced correlation instead of the distance or the angle. Despite these improvements in the data selection method, the presence of outliers in the newly arriving data is not taken into account, which may result in invalid data selection. Furthermore, even if data selection is well accomplished, a single linear local model in the conventional JIT algorithm may not always function well, since highly nonlinear variable relationships widely exist in complex processes. Little attention has been devoted to addressing all of these problems. In addition, soft sensor modeling methods such as principal component analysis (PCA), PLS, and SVM are advanced and have been successfully applied to predict process variables in different engineering fields. However, soft sensors based on these methods only output a bare prediction without any associated confidence values and hence have to rely on previous experience, or on relatively loose theoretical upper bounds on the probability of error, to gauge the quality of the given prediction. Many methods exist in statistics and machine learning for solving such problems.14,15 Quite often Bayesian methods16 are used on the basis of probabilistic measures of their quality. They require, however, strong extra assumptions, and in practice there are great computational difficulties. Probably approximately correct (PAC) theory,17 in contrast, only makes the general independently and identically distributed (i.i.d.) assumption. Nonetheless, in order for the PAC methods to give nontrivial results, the data set should


Figure 1. Cross validation procedure.

be particularly clean. As such, the transductive confidence predictor18,19 (TCP) was proposed. Yet, the main disadvantage of the existing variants of TCP is their relative computational inefficiency. The inductive confidence predictor (ICP) appears to be a good alternative to these methods, as the only assumption made by ICP is the i.i.d. assumption, and the improvement in computational efficiency is massive.20 Thus, ICP is employed for uncertainty description in this work. To the authors' knowledge, there is little information available in the literature on describing uncertainties for ensemble learning. The aim of this paper is to extend the idea of JIT learning in several directions as well as to combine it with ICP to yield an interval soft sensor. First, in order to make soft sensors using the JIT model not only sensitive to normal process changes but also robust to outliers and noise, a robust nearest correlation (RNC) data selection algorithm is proposed, which takes into account the correlation together with the distance for data selection and uses Jolliffe parameters21 for outlier detection. Moreover, unlike the classical local models, multimodel ensemble learning is adopted as the local model because of its nonlinear characteristics and its fitness for modeling small data sets. Ensemble learning typically refers to methods that generate several models which are combined to make a prediction. This approach has been the object of a significant amount of research in recent years, and good results have been reported.22−24 The resulting enhanced JIT modeling is referred to as the JIT-ENS algorithm. In addition, integrating the JIT-ENS algorithm with ICP forms a new soft sensor, called the "interval soft sensor", which gives a set of possible values rather than a single real value, in contrast to traditional soft sensors. Such interval soft sensors have the potential for validating online analyzers, since the maintenance of analyzers is very laborious and their performance is unreliable. This paper is organized as follows: in section 2, the basic concepts of JIT learning, ensemble learning, and ICP are briefly described, followed by the development of the JIT-ENS algorithm and the interval soft sensors in section 3. Section 4 is devoted to a case study of BOD5 prediction in a wastewater treatment plant (WWTP).

2. PRELIMINARIES

2.1. JIT Learning. In general, a global linear model does not function well when a process has strong nonlinearity over its operating range. Thus, methods that divide a process operating region into multiple small regions and build a local model in each small region have been proposed. JIT learning,11 also called "lazy learning", is a local learning technique which postpones all computation until an explicit request for a prediction is received. The request is fulfilled by interpolating locally the samples considered relevant according to a distance measure; each prediction therefore requires a local modeling procedure, i.e., an identification. The major features of JIT learning13 are as follows:
1. The new input and output data are stored in a database once they are available.
2. Only when a prediction is required, a local polynomial model is built from samples located in a neighboring region around the newly arriving data point, and the output variables are estimated.
3. The built local model is discarded after its use for estimation.
However, samples for local modeling should be selected carefully, and the online computational load becomes large when JIT modeling is used.

2.2. Ensemble Learning. If we average the outputs of several different models, we call the result an ensemble model,25 i.e., ensemble learning. Building an ensemble of models has been a common way to improve the stability and accuracy of regression models since it was discovered. The novelty of the approach presented herein consists of building heterogeneous ensembles with several different model classes (multilayer perceptron neural network (MLP), K-nearest neighbor (K-NN), and ridge regression) on different subsets of the training data using a cross validation scheme.25 In order to select models for the final ensemble, we use cross validation (CV) for model training, model selection, and estimation of the expected regression error. The implementation of these models, together with a more detailed description, can be found in ref 25.


As seen from Figure 1, first of all we isolate a "test set" that is held out from the training procedure and used only for final evaluation in every round. For a K-fold CV the data are divided K times into a "training set" and a "test set", with both sets containing randomly drawn subsets of the data without replication. After the data partition we have K CV partitions, each with a training and a test set. This leads to K rounds, and in every round we train MLP, K-NN, and ridge regression models with a variety of model parameters (see refs 25 and 26 for overviews of the models and the related model parameters) and select only one model to become a member of the final ensemble, namely the best performer with respect to the test set. This means that all models have to compete with each other in a fair tournament, because they are trained and tested on the same data set. The model with the lowest regression error in each CV fold is taken out and added to the final ensemble. The regression error is

E_{\mathrm{train}} = \sum_{i \in M_{\mathrm{train}}} (y_i - f(\vec{x}_i))^2    (1)

where the sum is taken over all members of the training set M_train. The procedure stops once the ensemble has the desired size. After the procedure is repeated K times, K different models are obtained. These models are used to build the final ensemble

\hat{f}(x) = \frac{1}{K} \sum_{i=1}^{K} f_i(x)    (2)

To avoid overfitting problems, equal weights for each model are used. It is noticed that the generalization error of the ensemble,25

e(x) = (y(x) - \hat{f}(x))^2    (3)

can now be decomposed in the following manner:

e(x) = \bar{\varepsilon} - \bar{a} = \sum_{i=1}^{K} w_i (y(x) - f_i(x))^2 - \sum_{i=1}^{K} w_i (f_i(x) - \hat{f}(x))^2    (4)

where \bar{\varepsilon} is the weighted average error of the individual models and \bar{a} is the average ambiguity of the ensemble. On the basis of this error decomposition, we can see that the ensemble generalization error e(x) is always smaller than the average error \bar{\varepsilon} of the individual models, since the ambiguity \bar{a} is nonnegative. This is why ensemble learning generally achieves better performance.
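As an illustration of this CV-based construction, the following Python sketch builds a heterogeneous ensemble from the three model classes named above and averages the fold winners with equal weights (eqs 1 and 2). The paper's own implementation uses the MATLAB toolbox of ref 26; this sketch, including all function names and hyperparameter values, is ours.

import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def build_ensemble(X, y, n_folds=4):
    # Candidate model classes of section 2.2; hyperparameters are placeholders.
    # n_folds = 4 mirrors the CV partition value chosen later in section 4.
    candidates = [MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000),
                  KNeighborsRegressor(n_neighbors=5),
                  Ridge(alpha=1.0)]
    ensemble = []
    for train_idx, test_idx in KFold(n_splits=n_folds, shuffle=True).split(X):
        # Train every candidate on this fold's training set ...
        fitted = [clone(m).fit(X[train_idx], y[train_idx]) for m in candidates]
        # ... and keep the single best performer on the fold's test set (eq 1).
        errors = [mean_squared_error(y[test_idx], m.predict(X[test_idx]))
                  for m in fitted]
        ensemble.append(fitted[int(np.argmin(errors))])
    return ensemble

def ensemble_predict(ensemble, X):
    # Equally weighted average of the K fold winners (eq 2).
    return np.mean([m.predict(X) for m in ensemble], axis=0)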

2.3. ICP. Confidence values are an indication of how likely each prediction is to be correct. To facilitate this confidence description, ICP is employed in this paper. ICP, as suggested by its name, replaces the transductive inference19 used in the original CP approach with inductive inference. As a result, ICPs are almost as computationally efficient as their underlying algorithms, as proved by Papadopoulos et al.20 This section gives a brief description of the ICP framework; for a more detailed description, refer to ref 27. Given a training set {(x_1, y_1), ..., (x_l, y_l)} of l samples, i = 1, ..., l, and a new input x_{l+g} ∈ R^m, ICP assumes every possible output y_{l+g} of the new sample x_{l+g} and checks how likely it is that the extended set of samples

\{(x_1, y_1), \ldots, (x_l, y_l), (x_{l+g}, y_{l+g})\}    (5)

is i.i.d.; i.e., it checks the likelihood of y_{l+g} being the true output of x_{l+g}, as this is the only unknown value in eq 5. A value \alpha_i^{y_{l+g}} should be assigned to every pair (x_i, y_i) in eq 5 to indicate how nonconforming this pair is with respect to the rest of the samples in the same set; it is defined as the nonconformity score of the pair (x_i, y_i). More specifically, the nonconformity score of a pair (x_i, y_i) is the degree of disagreement between the actual output y_i and the prediction \hat{y}_i. To compute the values \alpha_i^{y_{l+g}} efficiently, ICP first splits the training set (of size l) into two smaller sets: the proper training set with m < l samples and the calibration set with q := l - m samples. It then utilizes the proper training set for training its regression algorithm and the calibration set for calculating the p value of each possible output y_{l+g}. From this point on, it only needs to compute the nonconformity score \alpha_{l+g}^{y_{l+g}} of each new sample x_{l+g} being assigned each possible output y_{l+g} and to calculate the p value of y_{l+g} as

p(y_{l+g}) = \frac{\#\{i = m+1, \ldots, l, l+g : \alpha_i \geq \alpha_{l+g}^{y_{l+g}}\}}{q + 1}    (6)

where \#A stands for the number of elements in the set A. Assuming the p value of every possible output can be calculated as above, all outputs whose p value is under the significance level δ have at most a δ chance of being the true output. Consequently, given a confidence level 1 − δ, a regression ICP outputs the set

\{y_{l+g} : p(y_{l+g}) > \delta\}    (7)

which can indeed be proved valid; see ref 28. To use ICP in conjunction with the JIT-ENS algorithm, a nonconformity measure is defined first. Because a nonconformity measure is a function describing the disagreement between the actual output y_i and the prediction \hat{y}_i for sample x_i, it is defined here as the absolute difference between the two:

\alpha_i = |y_i - \hat{y}_i|    (8)
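To make the calibration step concrete, the following Python sketch turns the calibration scores of eq 8 into the predictive region of eq 7. Under the absolute-residual nonconformity measure, the set of outputs with p value above δ is an interval centered on the point prediction; this rank-based shortcut is standard in ICP regression (refs 20 and 28). The code is an illustration with our own names, not the authors' implementation.

import numpy as np

def icp_interval(point_pred, calib_alphas, delta=0.10):
    # With alpha = |y - y_hat| (eq 8), the set {y : p(y) > delta} of eq 7
    # is point_pred +/- alpha_(s), where alpha_(s) is the s-th largest
    # calibration score and s = floor(delta * (q + 1)).
    q = len(calib_alphas)
    alphas = np.sort(calib_alphas)[::-1]        # descending, as in Table 2
    s = max(int(np.floor(delta * (q + 1))), 1)  # assumes delta*(q+1) >= 1
    half_width = alphas[s - 1]
    return point_pred - half_width, point_pred + half_width

# Example with 19 calibration residuals at 90% confidence (delta = 0.10):
rng = np.random.default_rng(0)
calib = np.abs(rng.normal(0.0, 0.5, size=19))
low, high = icp_interval(12.3, calib, delta=0.10)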

3. SOFT SENSOR DESIGN USING INTERVAL JIT-ENS ALGORITHM

Conventional JIT modeling selects data samples on the basis of distance, regardless of the correlation among variables and of outliers in the database. Moreover, its local model is a linear polynomial regression model, which results in unreliable tracking when it is used for a nonlinear process. In the present work, a new JIT modeling scheme that uses both the correlation and the distance among variables for data selection is proposed. In the proposed method, the data selection method, referred to as the robust nearest correlation (RNC) method, uses Jolliffe parameters21 for outlier detection. Furthermore, to address the limitation of the polynomial model in the original JIT learning, this paper proposes multimodel ensemble learning as the local model for JIT learning. On the other hand, in contrast to traditional soft sensors, every prediction output is not a single real value but a set of possible values, called the "predictor region", because the enhanced JIT learning soft sensor model is combined with the ICP technique. This soft sensor, herein called the "interval soft sensor", is shown in Figure 2.

3.1. JIT-ENS Algorithm. 3.1.1. Sample Selection for Local Modeling Using the Proposed RNC. 1. Correlation Coefficient.


Figure 2. Schematic interval JIT-ENS.

The correlation coefficient C_{i,j} can be used as an index to indicate the similarity between two vectors x_i and x_j ∈ R^n. C_{i,j} is defined as follows:

x'_i = x_i - x_{\mathrm{new}}    (9)

x'_j = x_j - x_{\mathrm{new}}    (10)

C_{i,j} = \frac{x'^{T}_i x'_j}{\|x'_i\| \|x'_j\|}    (11)

The query x_new is the newly measured point; x_i and x_j are samples from the database, and the correlation coefficient between x_new and them is calculated. The larger the correlation coefficient becomes, the more similar the samples are judged to be to the sample x_new. Note that we set C_{i,j} = 1 and put x_i into the data space S directly if x'_i = 0.

2. Outlier Detection Using Jolliffe Parameters Together with PCA. The purpose of the JIT algorithm is to obtain a very accurate dynamic model able to estimate the system output in real time. By using the correlation coefficient, the JIT algorithm can select the data that are most relevant to x_new. However, if the arriving data point x_new is invalid or out of range, selecting a best-fitting data set using the correlation coefficient is impossible. This paper therefore introduces Jolliffe parameters to detect outliers before data selection. As reported by Fortuna et al.,6 numerous procedures have been suggested for detecting outliers with respect to a single variable, but the literature on multivariate outliers is less extensive. The excellent performance of Jolliffe parameters was shown by Fortuna et al.6 Furthermore, Jolliffe parameters are chosen for the outlier detection algorithm because they integrate easily with PCA and with the subsequent JIT data selection method. After PCA decomposition of the original data has been performed, the Jolliffe parameters are as follows:

d_{1i}^2 = \sum_{k=M-p+1}^{M} t_{ik}^2    (12)

d_{2i}^2 = \sum_{k=M-p+1}^{M} \frac{t_{ik}^2}{\sigma_k}    (13)

d_{3i}^2 = \sum_{k=M-p+1}^{M} \sigma_k t_{ik}^2    (14)

where the index i refers to the ith sample of the considered projected variable; M, N, and p (≤ M) denote the numbers of variables, samples, and principal components retained in the PCA model, respectively; t_{ik} is the ith sample of the kth principal component (or latent variable); and σ_k is the variance of the kth component. Among the three Jolliffe parameters, d_2^2, which weights each squared score by the inverse of the component variance, is employed (in contrast to d_1^2) to detect and penalize components with a low value of variance, while d_3^2 is constructed to detect and avoid observations that inflate the data set variance.30 When using Jolliffe parameters to detect outliers, a suitable limit should be defined; for simplicity, the 3σ limit is adopted herein. Since the modeling method in the subsequent section is a recursive method, the 3σ limit also needs to be updated recursively as

m_{i+1} = \frac{N-1}{N} m_i + \frac{1}{N} x_{\mathrm{new}}, \qquad \sigma_{i+1}^2 = \frac{N-2}{N-1} \sigma_i^2 + \frac{1}{N-1} (x_{\mathrm{new}} - m_{i+1})^2    (15)

where m_i and σ_i² are the mean and variance of the first i samples and m_{i+1} and σ_{i+1}² are the mean and variance of the i + 1 samples, respectively.
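As an illustration of eqs 12−15, the following Python sketch computes the three Jolliffe parameters from a PCA score matrix and maintains the recursively updated 3σ limit. It is a simplified reading of the RNC outlier check, with our own function names and the usual mean + 3σ interpretation of the limit, not the authors' code.

import numpy as np

def jolliffe_parameters(T, sigma, p):
    # Jolliffe parameters (eqs 12-14) for every sample.
    # T: N x M score matrix (columns = principal components),
    # sigma: variances of the M components, p: retained components.
    Tp = T[:, -p:]                      # scores of components M-p+1 ... M
    sp = sigma[-p:]
    d1 = np.sum(Tp**2, axis=1)          # eq 12
    d2 = np.sum(Tp**2 / sp, axis=1)     # eq 13
    d3 = np.sum(sp * Tp**2, axis=1)     # eq 14
    return d1, d2, d3

def update_limit(m, var, x_new, N):
    # Recursive update of mean and variance for the 3-sigma limit (eq 15).
    m_new = (N - 1) / N * m + x_new / N
    var_new = (N - 2) / (N - 1) * var + (x_new - m_new)**2 / (N - 1)
    return m_new, var_new

def is_outlier(d, m, var):
    # A sample is flagged when its Jolliffe parameter exceeds the 3-sigma limit.
    return d > m + 3.0 * np.sqrt(var)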


Figure 3. Example of different states of xnew.

To further illustrate the data selection method, assume that x_new is a newly arriving point and is used as the base for selecting relevant data. Parts a and b of Figure 3 show the different sample selection behavior of the enhanced JIT modeling and of conventional JIT modeling. The samples are divided into two groups, "circle" and "star", which have different correlations. In conventional JIT, samples are selected on the basis of distance only, regardless of correlation, as noticed in Figure 3b. In contrast, JIT-ENS can select the samples whose correlation best fits the new sample, as seen in Figure 3a. However, good data selection is impossible if only the correlation is considered, since it is quite possible that x_new is an outlier, as presented in Figure 3c. If x_new is an outlier, it may deteriorate the ensuing data selection and can even result in no data points being selected for local modeling. On the other hand, it is obvious in Figure 3d that the relevant modeling space can be derived easily and correctly when x_new is checked against the 3σ limit and identified as a normal point.

3. Index for Data Selection. If the newly arriving data point is not an outlier, the index for data selection is designed next. In PCA, the loading matrix V_p ∈ R^{M×p} is derived as the right singular matrix of X ∈ R^{N×M}, whose ith row is x_i^T, and the column space of V_p is the subspace spanned by the principal components. All variables are mean-centered and appropriately scaled. The score is the projection of X onto the subspace spanned by the principal components. The score matrix T_p ∈ R^{N×p} is given by

T_p = X V_p    (16)

from which X can be estimated with V_p:

\hat{X} = T_p V_p^T = X V_p V_p^T    (17)

The residual matrix is

E = X - \hat{X} = X(I - V_p V_p^T)    (18)

The Q statistic is defined as

Q = \sum_{i=1}^{M} (x_i - \hat{x}_i)^2    (19)

where x_i and \hat{x}_i are the elements of the corresponding rows of X and \hat{X}, respectively. In addition, to guarantee that a sample is located within the modeling data and to avoid extrapolation, Hotelling's T² statistic is used, defined as

T^2 = \sum_{i=1}^{p} \frac{t_i^2}{\sigma_{t_i}^2}    (20)

where σ_{t_i} denotes the standard deviation of the ith score t_i. The Q statistic and the T² statistic can be integrated into a single index for data selection, as proposed by Raich and Cinar:29

J = \lambda T^2 + (1 - \lambda) Q    (21)

where 0 ≤ λ ≤ 1.
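The data selection index is easy to state in code. The following Python sketch evaluates eqs 16−21 for one candidate sample; all names are ours, and the default λ = 0.1 mirrors the choice later adopted in the case study (section 4).

import numpy as np

def selection_index(x, Vp, score_std, lam=0.1):
    # Data selection index J (eq 21) combining Q (eq 19) and T^2 (eq 20)
    # for a single mean-centered, scaled sample x.
    t = x @ Vp                          # scores (eq 16)
    x_hat = t @ Vp.T                    # reconstruction (eq 17)
    Q = np.sum((x - x_hat)**2)          # squared residual (eqs 18 and 19)
    T2 = np.sum((t / score_std)**2)     # Hotelling's T^2 (eq 20)
    return lam * T2 + (1.0 - lam) * Q   # eq 21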

4. Algorithm of the Robust Nearest Correlation. The detailed RNC algorithm is summarized in Table 1. Suppose that the samples stored in the database are x_i or x_j and that the query is x_new.

Table 1. RNC Algorithm
1. Select the number of components p, the lower limit of the correlation coefficient lo (0 ≤ lo ≤ 1), and the relaxation step re.
2. Detect and delete the outliers in the stored data using eqs 12−14 and the 3σ limit.
3. Judge whether x_new is an outlier using the 3σ limit; if yes, delete it and wait for a new arriving point; otherwise, go ahead.
4. Update the variance using eq 15.
5. Calculate the correlation coefficients using eqs 9−11 with respect to x_new.
6. Choose the samples with C_{i,j} ≥ lo and put them into the subspace S.
7. If the number of samples in S is larger than p, go to the next step; otherwise, set lo = lo − re and return to step 6.
8. Determine the index J according to eqs 19−21.
9. Take the first k1 samples in ascending order of J, or the samples whose J is smaller than the threshold J̄, as the samples similar to x_new; these form the most relevant data set for the subsequent modeling.

In Table 1, step 3, x_new is deleted if it is an outlier; the model and the output obtained the previous time are then kept unchanged and used this time. Otherwise, x_new is the base for relevant data selection. As for Table 1, step 7, re is utilized to relax lo so as to guarantee that the number of samples in the space S is larger than p.

3.1.2. Local Model and Its Integration with the JIT-ENS Algorithm. Unlike the traditional JIT model, which uses the polynomial method for the local model, in this paper a simple but effective ensemble algorithm proposed by Merkwirth et al.26 is used as the local model for JIT modeling. This improvement has two advantages: (1) owing to the diversity of the different models, the combined models can exhibit different behaviors; (2) the soft sensor can be made robust because of its inherently redundant design. Additionally, with the aim of reducing the memory usage of conventional JIT, a moving window technique is introduced: when x_new becomes available, it is added to the database and the oldest sample is deleted. Figure 4 provides a summary of the procedure of the JIT-ENS algorithm.

Figure 4. JIT-ENS algorithm structure.

3.2. Design Interval Soft Sensors with Inductive Conformal Predictor. Confidence values are an indication of how likely each prediction is to be correct. In the ideal case, a confidence of 99% or higher for all samples in a set means that the percentage of erroneous predictions in that set will not exceed 1%. For this reason, we use ICP to estimate confidence and prediction intervals for the JIT-ENS algorithm. The resulting soft sensors can give confidence prediction values rather than bare predictions. The JIT-ENS regression inductive conformal predictor algorithm with this measure is described as follows: after the data are chosen by the RNC algorithm, the data set for modeling, placed in S, is obtained. The JIT-ENS algorithm coupled with the ICP algorithm is summarized in Table 2.

Table 2. JIT-ENS Algorithm Coupled with ICP Algorithm
1. Treat the data in the space S as the training set.
2. Split the training set {(x_1,y_1), ..., (x_l,y_l)} into two subsets: the proper training set {(x_1,y_1), ..., (x_m,y_m)} and the calibration set {(x_{m+1},y_{m+1}), ..., (x_{m+q},y_{m+q})}.
3. Use the proper training set to train the multimodel ensemble learning illustrated in section 2.2.
4. For each pair (x_{m+i},y_{m+i}), i = 1, ..., q, in the calibration set, supply the input pattern x_{m+i} to the trained multimodel ensemble to obtain ŷ_{m+i} and calculate the nonconformity score α_{m+i} with eq 8.
5. Sort the nonconformity scores of the calibration samples in descending order, obtaining the sequence α_{(m+1)}, ..., α_{(m+q)}.
6. Utilize eq 6 to calculate the confidence region for the new prediction when x_new arrives.
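Putting Tables 1 and 2 together, one prediction cycle of the interval soft sensor can be sketched in Python. The sketch reuses the illustrative helpers defined earlier (selection_index from the sketch after eq 21; build_ensemble and ensemble_predict from section 2.2; icp_interval from section 2.3); the Jolliffe outlier check and the correlation filter of Table 1, steps 2−7, are assumed to have been applied already, and all names and default values are ours rather than the authors'.

import numpy as np

def interval_jit_ens_predict(x_new, X, y, Vp, score_std,
                             k=100, delta=0.10, calib_fraction=0.3):
    # Table 1, steps 8-9 (simplified): rank the stored samples by the
    # index J of eq 21, centering each sample at the query in the spirit
    # of eqs 9 and 10, and keep the k best.
    J = np.array([selection_index(xi - x_new, Vp, score_std) for xi in X])
    keep = np.argsort(J)[:k]
    Xs, ys = X[keep], y[keep]

    # Table 2, steps 1-2: split S into proper training and calibration sets.
    q = max(1, int(calib_fraction * len(Xs)))
    X_prop, y_prop = Xs[:-q], ys[:-q]
    X_cal, y_cal = Xs[-q:], ys[-q:]

    # Table 2, step 3: train the local multimodel ensemble (section 2.2).
    ensemble = build_ensemble(X_prop, y_prop)

    # Table 2, steps 4-5: nonconformity scores on the calibration set (eq 8).
    alphas = np.abs(y_cal - ensemble_predict(ensemble, X_cal))

    # Table 2, step 6: point prediction plus ICP confidence region (eqs 6, 7).
    y_hat = ensemble_predict(ensemble, x_new[None, :])[0]
    return y_hat, icp_interval(y_hat, alphas, delta)

A moving-window update (section 3.1.2) would additionally append (x_new, y_new) to the stored data and drop the oldest sample once the reference measurement becomes available.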

Figure 5. Wastewater plant process.

4. RESULTS AND DISCUSSION

This is a particularly interesting case study for the proposed methodology. The biological wastewater treatment plant investigated in this work is an activated sludge plant, a common example of a wastewater treatment process. It is designed for the

Table 3. Parameters Selected for Inferring (position: parameters)
overall plant [xx-G]: sedimentable solids [SED], suspended solids [SS], biological oxygen demand [DBO], chemical oxygen demand [DQO]
primary settlers [xx-P]: DBO, SS
secondary settlers [xx-S]: DBO, DQO
influent to WWTP [xx-E]: DBO, DQO
secondary treatment [xx-D]: DQO, DBO, SS, pH, SED
output [xx-S]: DQO, SED, SS, pH

removal of organic matter and nutrients. In this process, process knowledge is very limited, and the BOD5 online analyzer tends to be unreliable. A reasonably accurate inferential model for BOD5 prediction is very desirable because of the 5-day delay inherent in the laboratory measurement. In addition, even if an accurate inferential model for BOD5 prediction is obtained, the large variety and complexity of the biological species present in WWTPs make it difficult to identify the role of each one in the process, which leads to uncertainty in the initial conditions. In fact, the quality of the wastewater is strongly influenced by weather conditions and seasonal changes, and such fluctuations often result in degraded performance or even plant failure. Consequently, outliers arise in the input sensors of a soft sensor as well as in the output sensor, i.e., the analyzer. As displayed in Figure 5, the studied wastewater plant process31 comprises four elements: pretreatment, primary settlers, aeration tanks, and secondary settlers. The parameters to be measured are also presented. The data were collected daily from the operation of the WWTP over a period of about 2 years. A total of 400 data records have been used, each consisting of 38 process variables. At the same time, designing a soft sensor requires the application of preprocessing techniques to

Figure 6. Prediction result of BOD5 using PLS, RMSE = 0.6951, and r = 0.7661.

Figure 7. Prediction result of BOD5 using JIT, RMSE = 1.2241, and r = 0.4057.

Figure 8. Prediction result of BOD5 using JIT-ENS, RMSE = 0.950, and r = 0.8815.

Figure 9. Outlier detection using Jolliffe parameters.

select the relevant variables. Thus, automatic clustering algorithms based on Kohonen's self-organizing maps (SOMs)32 are used to detect redundant and irrelevant features. Through this preprocessing, 19 process variables are selected as the inputs of the models; they are shown in Table 3. Also, 200 samples are utilized for training, and the remaining 200 samples are used for testing the performance of the models. The objective variable y is the concentration of BOD5, and the explanatory variables X are the 19 variables, which include biological oxygen demand, suspended solids, sedimentable solids, and so on.

4.1. Prediction Performance of the JIT-ENS Inferential Model. To assess precisely the prediction performance of the inferential model, the root mean square error (RMSE) and the correlation coefficient r are used. The RMSE criterion, used for quality comparisons of the different methods, is defined as

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}    (22)
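For reference, both figures of merit are a few lines of code; this generic Python snippet is ours and is not tied to the paper's data.

import numpy as np

def rmse(y, y_hat):
    # Root mean square error (eq 22).
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat))**2)))

def corr_r(y, y_hat):
    # Pearson correlation coefficient r between measured and predicted values.
    return float(np.corrcoef(y, y_hat)[0, 1])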

In this section, the BOD5 of the Barcelona WWTP is predicted for algorithm testing and performance evaluation. There are some abrupt changes in the pH of the effluent and in the COD of

Figure 10. Comparison of the coefficient r and RMSE among JIT-ENS, JIT, and PLS.

the secondary settlers in the wastewater treatment plant, and these lead to the occurrence of outliers. To show that a linear model is unsuitable for the nonlinear wastewater treatment process, PLS, which is popular for soft sensor modeling in the chemical process industry, is employed. The number of latent variables used in the PLS model is determined by trial and error to maximize the prediction performance. The estimation result is shown in Figure 6, illustrating that PLS does not function well. This is because the linear PLS model is not suitable for a nonlinear process and has difficulty dealing with outliers. Conventional JIT uses a polynomial as the local model, with the Euclidean distance as the measure for selecting samples to build local models. The simulation is based on the MATLAB Lazy Learning Toolbox.33 The prediction result is given in Figure 7 and demonstrates that JIT modeling does not function well either, especially when suffering from outliers in the process. One reason for the poor performance of JIT seems to be that correlations among variables and outlier detection are not taken into consideration. Besides, because highly nonlinear variable relationships widely exist in wastewater treatment, a


simple polynomial local model cannot characterize the complex process well. To demonstrate this, JIT-ENS is applied to the same problem to estimate BOD5. The criterion for selecting a data set is to minimize the index J. Because we pay more attention to the Q statistic, λ = 0.1 is set in eq 21. The local model is updated every day once BOD5 is measured. The window size W and the window moving width ds are 100 and 1, respectively. In addition, lo is 0.9. Another important parameter is the cross validation (CV) partition value, which is related to the computational complexity of the soft sensor. By minimizing the RMSE through trial and error, 4 is chosen as the best value for the CV partition. Figure 8 displays the estimation result, demonstrating clearly an excellent fit between the test measurements and the JIT-ENS predictions over 200 days. Also, it is obvious from Figure 9 that the outliers can be detected by the RNC algorithm. After the outliers are identified, they are all deleted and replaced by the means of the variables from the moving window; by doing so, the estimation accuracy is increased. The RMSE and r of JIT-ENS are improved by about 68 and 121%, respectively, in comparison with those of conventional JIT. These results show clearly that the RNC algorithm incorporated into the JIT algorithm functions very well, even though JIT-ENS is subject to outliers in the model updating step. Additionally, to further validate the

Table 4. CPU Time and Prediction Performance

lo     CPU time (s)    r        RMSE
0.6    94.76           0.8936   0.3963
0.7    75.35           0.8734   0.5345
0.8    57.55           0.8199   0.5347
0.9    49.14           0.8991   0.3865

Table 5. Empirical Reliability of the JIT-ENS Algorithm with Different Confidence Levels

confidence level             90%     95%     99%
empirical reliability (%)    94.5    97.4    99.5

ability of JIT-ENS, we divide the 200 test data into 10 groups of 20 data points each. It is obvious from Figure 10 that the accuracy of the JIT-ENS algorithm is also the best in terms of RMSE and r, even though there are no outliers before the 119th sample. This is because the correlation together with the distance, instead of the distance alone, is used for data selection, and the local polynomial model is replaced by multimodel ensemble learning. Besides the fitting results, another aspect worth mentioning is the CPU time. Compared to the conventional JIT soft sensor, which needs 3.3 s for a prediction, the JIT-ENS soft sensor makes its prediction in almost 49 s. This computational burden arises largely from the ensemble of MLP, K-NN, and ridge regression, which increases the prediction accuracy at the cost of computing time. To further improve the reliability of the model, a larger W may be better; however, this would reduce the adaptation ability of the model. As a compromise, W = 100 is determined by trial and error. Note that the running time of conventional JIT, 3.3 s, is much less than that of JIT-ENS. It is important to underline, however, that, compared with JIT-ENS with a moving window, conventional JIT suffers from a curse-of-dimensionality problem, which results in a huge computational burden as time goes on. Given a 5-day delay for the lab test and 1/2 h for the BOD5 online analyzer, 49 s is a good performance and sufficient to meet the requirements of feedback control. lo is an essential factor for the modeling size of JIT-ENS, which is highly related to the soft sensor performance and the computational complexity. Thus, different lo values from 0.6 to 0.9 are evaluated for JIT-ENS soft sensor development. The performance of the JIT-ENS soft sensors with different lo values is tabulated in Table 4. The computer configuration is as follows: OS, Windows XP (32 bit); CPU, Pentium Dual-Core E5400 (2.7 GHz); RAM, 2 GB. MATLAB 2009a is employed. It can be seen that the computation time is reduced as the lo value gets larger. However, there is not much improvement in RMSE when lo is smaller than 0.9. When lo = 0.9, the model obtains the best performance.

4.2. Validation of Interval Soft Sensors. Next, the usefulness of interval JIT-ENS, i.e., the interval soft sensor, is analyzed. Since WWTPs present a highly uncertain environment, traditional soft sensors with bare predictions cannot meet these demands. This paper, thus, proposes an interval soft sensor using

Figure 11. Prediction result of interval JIT-ENS soft sensors with 90% confidence.

ICP. Such a soft sensor takes into account not only the uncertainty of WWTPs but also the model uncertainty. The first step is to check how reliable the obtained predictive regions are. We count the percentage of wrong predictive intervals, in other words, how many times the algorithm fails to give a predictive region that contains the real output of a test sample. The results in Table 5 confirm the validity of our algorithm: the rate of successful predictions is at least equal to the desired accuracy.

Table 6. Median Width of the Predictions Made by the JIT-ENS Algorithm at Different Accuracy Levels

confidence level    90%       95%       99%
median width        1.2028    1.5556    2.2487
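Both checks of this section, the empirical reliability of Table 5 and the median interval width of Table 6, reduce to a few lines once the predictive intervals are available. The following Python snippet is a generic illustration with our own names.

import numpy as np

def empirical_reliability(y_true, lows, highs):
    # Fraction of test samples whose true value falls inside its interval
    # (to be compared with the target confidence level, as in Table 5).
    inside = (np.asarray(lows) <= y_true) & (y_true <= np.asarray(highs))
    return 100.0 * np.mean(inside)

def median_width(lows, highs):
    # Median interval width; preferred over the mean because it is robust
    # to a few extreme intervals (as in Table 6).
    return float(np.median(np.asarray(highs) - np.asarray(lows)))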

Figure 12. Medians, upper and lower quartiles, and 10th and 90th percentiles of the distributions of the predictive interval widths using eq 8 as the strangeness value.

Figure 11 complements the information given in Table 5 for prediction with 90% confidence. This case shows that prediction uncertainty is an important issue for a model that is updated with new data. During a transition stage, some of the input variables are adjusted to bring the process to a new steady state, and the confidence intervals widen because the adjusted process variables deviate from their steady-state values. The second step is to check the tightness of our predictive regions by calculating the median length of all predictive regions obtained at a specific significance level. This gives us a measure of how efficient our algorithm is. We prefer the median value to the mean because it is more robust: if a few of the predictions are extreme due to noise or overfitting, the average will be affected, while the median remains unchanged. Table 6 summarizes the results, demonstrating clearly that the median width gets larger as we move toward higher confidence levels; it shows the tightness of the predictive regions obtained at each level. Figure 12 complements the information given in Table 6 for the JIT-ENS algorithm by also providing other characteristics of the distribution of the predictive interval widths. To further validate the effectiveness of the interval soft sensor, it is used to check the online output analyzer. In an industrial process, a common situation in which abrupt noise is encountered is when an online analyzer is installed; there is therefore a need to check the output of the online analyzer. Because the model update does not depend on the online analyzer, the intervals can be used to validate whether the online analyzer is influenced by noise. Assuming some noise occurs in the online analyzer, Figure 13 shows that the confidence intervals of the predictions are capable of showing whether the online output analyzer is out of the interval and thereby of indicating the status of the online output analyzer. Finally, it is important to note that the results of this case study clearly show that the proposed interval soft sensor can cope not only with abrupt changes in the process characteristics but also with availability problems of the online analyzer. All of these problems are common in chemical processes. Especially in microbial systems, including the pharmaceutical industry and food systems, the online analyzer is unreliable and many variables are needed

Figure 13. Prediction result of interval JIT-ENS soft sensors with 90% confidence and validating the online analyzer when it suffers from outliers or abrupt noises.

for prediction. An unavailable analyzer or sensor usually causes outliers. Besides the potential use of interval soft sensors for parameter estimation, they may also be used in robust min−max control.34
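The analyzer validation just described amounts to flagging analyzer readings that fall outside the soft sensor's predictive region. A minimal sketch, with hypothetical names:

import numpy as np

def validate_analyzer(analyzer_values, lows, highs):
    # Flag analyzer readings lying outside the interval soft sensor's
    # predictive region (cf. Figure 13); a flagged reading suggests that
    # the analyzer, not the process, is at fault, since the model update
    # does not depend on the analyzer output.
    v = np.asarray(analyzer_values)
    return (v < np.asarray(lows)) | (v > np.asarray(highs))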

5. CONCLUSIONS

In this work, we propose an interval soft sensor to cope with normal changes as well as abrupt noise in the process characteristics and in the input−output sensors. An experimental study demonstrates the validity and benefits of this approach. PLS, JIT, and JIT-ENS have been compared in terms of RMSE and r. The RMSE and r of JIT-ENS are improved by about 68 and 121%, respectively, in comparison with those of conventional JIT, because of the combination of the RNC algorithm and ensemble learning. In addition, such an interval soft sensor complements the bare prediction with measures of confidence in that prediction, thereby providing more information with which to validate the online output analyzer. In the current work, the computational burden is not treated in depth, which could compromise the applicability of the JIT-ENS algorithm; making the ensemble learning model recursive is the recommended solution. In future work, the JIT-ENS algorithm could be integrated with recursive ensemble learning, and a further modification of the outlier detection method could also be considered.

AUTHOR INFORMATION

Corresponding Author

*Tel.: +61 0452214426. Fax: +86 20 87114189. E-mail: liuyiqi769@sina.com.

ACKNOWLEDGMENT

This work was supported by the Fundamental Research Funds for the Central Universities, SCUT (2012ZM0102, 2011ZM0120). Many thanks should also be given to Manabu Kano and his group at Kyoto University for their generosity in providing the Lazy Learning Toolbox.

REFERENCES

(1) Facco, P.; Doplicher, F.; Bezzo, F.; Barolo, M. Moving average PLS soft sensor for online product quality estimation in an industrial batch polymerization process. J. Process Control 2009, 19, 520.
(2) Martens, H.; Hoy, M.; Westad, F.; Folkenberg, D.; Martens, M. Analysis of designed experiments by stabilised PLS regression and jackknifing. Chemom. Intell. Lab. Syst. 2001, 58, 151.
(3) Liu, J.; Chen, D. S.; et al. Development of Self-Validating Soft Sensors Using Fast Moving Window Partial Least Squares. Ind. Eng. Chem. Res. 2010, 49, 11530.
(4) Aguado, D.; Montoya, T.; Borras, L.; Seco, A.; Ferrer, J. Using SOM and PCA for analysing and interpreting data from a P-removal SBR. Eng. Appl. Artif. Intell. 2008, 21, 919.
(5) Khayamian, T. Robustness of PARAFAC and N-PLS regression models in relation to homoscedastic and heteroscedastic noise. Chemom. Intell. Lab. Syst. 2007, 88, 35.
(6) Fortuna, L.; Graziani, S.; Xibilia, M. G. Soft sensors for product quality monitoring in debutanizer distillation columns. Control Eng. Pract. 2004, 13, 499.
(7) Alhoniemi, E. S. A. Process monitoring and modeling using the self-organizing map. Integr. Comput.-Aided Eng. 1999, 6, 3.


(8) Yan, W.; Shao, H.; Wang, X. Soft sensing modeling based on support vector machine and Bayesian model selection. Comput. Chem. Eng. 2004, 28, 1489.
(9) Qin, J. S. Recursive PLS algorithms for adaptive data modeling. Comput. Chem. Eng. 1998, 22, 503.
(10) Liu, M.; Huang, D.; et al. Combining KPCA with LSSVM for the Mooney-viscosity forecasting. In Proceedings of the 2nd International Conference on Genetic and Evolutionary Computing (WGEC 2008); IEEE Computer Society: Los Alamitos, CA, 2008; p 522.
(11) Bontempi, G.; Birattari, M.; Bersini, H. Lazy learning for local modeling and control design. Int. J. Control 1999, 72, 643.
(12) Cheng, C.; Chiu, M. S. A New Data-Based Methodology for Nonlinear Process Modeling. Chem. Eng. Sci. 2004, 59, 2801.
(13) Fujiwara, K.; Kano, M.; Hasebe, S.; Takinami, A. Soft Sensor Development Using Correlation-Based Just-in-Time Modeling. AIChE J. 2009, 55, 1754.
(14) Graepel, T.; Herbrich, R.; Obermayer, K. Bayesian transduction. In Advances in Neural Information Processing Systems 12; MIT Press: Denver, 2000; pp 456−462.
(15) Heskes, T. Practical confidence and prediction intervals. In Advances in Neural Information Processing Systems 9; MIT Press: Cambridge, 1997; pp 176−182.
(16) Melluish, T.; Saunders, C.; Nouretdinov, I.; Vovk, V. Comparing the Bayes and typicalness frameworks. In Proceedings of the 12th European Conference on Machine Learning (ECML 2001); Springer-Verlag: London, 2001; p 360.
(17) Cristianini, N.; Shawe-Taylor, J. Support Vector Machines and Other Kernel-Based Learning Methods [Online]; Cambridge University Press: London, 2000; pp 130−240. http://www.support-vector.net/ (accessed Nov 17, 2010).
(18) Gammerman, A.; Vapnik, V.; Vovk, V. Learning by transduction. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI 1998); Morgan Kaufmann: San Francisco, CA, 1998; p 148.
(19) Proedrou, K.; Nouretdinov, I.; Vovk, V.; Gammerman, A. Transductive Confidence Machines for Pattern Recognition. In Proceedings of the 13th European Conference on Machine Learning (ECML 2002); Springer-Verlag: London, 2002; p 381.
(20) Papadopoulos, H.; Proedrou, K.; et al. Inductive Confidence Machines for Regression. In Proceedings of the 13th European Conference on Machine Learning (ECML 2002); Springer-Verlag: London, 2002; p 185.
(21) Warne, K.; Prasad, G.; Rezvani, S.; et al. Statistical and computational intelligence techniques for inferential model development: a comparative evaluation and a novel proposition for fusion. Eng. Appl. Artif. Intell. 2004, 17, 871.
(22) Weigel, A. P.; Liniger, M. A.; Appenzeller, C. Can multi-model combination really enhance the prediction skill of probabilistic ensemble forecasts? Q. J. R. Meteorol. Soc. 2008, 134, 241.
(23) Mevik, B. H.; Segtnan, V. H.; Næs, T. Ensemble methods and partial least squares regression. J. Chemom. 2004, 18, 498.
(24) Liu, J. On-line soft sensor for polyethylene process with multiple production grades. Control Eng. Pract. 2007, 15, 769.
(25) Wichard, J. D.; Ogorzalek, M. Time series prediction with ensemble models. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2004); IEEE Computer Society: Los Alamitos, CA, 2004; p 1625.
(26) Merkwirth, C.; Wichard, J.; et al. A software toolbox for constructing ensembles of heterogeneous linear and nonlinear models. In Proceedings of the 2005 European Conference on Circuit Theory and Design (ECCTD 2005); IEEE Computer Society: Los Alamitos, CA, 2005; p 197.
(27) Papadopoulos, H.; Proedrou, K.; et al. Inductive Confidence Machines for Regression. In Proceedings of the 13th European Conference on Machine Learning (ECML 2002); Springer-Verlag: London, 2002; p 185.
(28) Papadopoulos, H.; Haralambous, H. Neural networks regression inductive conformal predictor and its application to total electron


content prediction. In Proceedings of the International Conference on Artificial Neural Networks (ICANN 2010); Springer-Verlag: London, 2010; p 6352.
(29) Raich, A.; Cinar, A. Statistical process monitoring and disturbance diagnosis in multivariable continuous processes. AIChE J. 1994, 42, 995.
(30) Fortuna, L.; Graziani, S.; Rizzo, A.; Xibilia, M. G. Soft Sensors for Monitoring and Control of Industrial Processes; Advances in Industrial Control; Springer: London, 2007.
(31) Blake, C. L.; Merz, C. J. UCI Repository of Machine Learning Databases; University of California: Irvine, CA, 2007; http://www.ics.uci.edu/~mlearn/MLRepository.html (accessed Feb 18, 1998).
(32) Rallo, R.; Ferre-Gine, J.; Giralt, F. Best Feature Selection and Data Completion for the Design of Soft Neural Sensors; Research Report; Universitat Rovira i Virgili: Tarragona, Spain.
(33) Bontempi, G.; Birattari, M.; Bersini, H. Lazy learners at work: the lazy learning toolbox. Presented at the 7th European Congress on Intelligent Techniques and Soft Computing (ECITSC 1999), 1999.
(34) Rapaport, A.; Harmand, J. Robust regulation of a class of partially observed nonlinear continuous bioreactors. J. Process Control 2002, 12, 291.
