Article pubs.acs.org/IECR

Development of a Novel Adaptive Soft-Sensor Using Variational Bayesian PLS with Accounting for Online Identification of Key Variables

Yiqi Liu,† Yongping Pan,‡ and Daoping Huang*,†

† School of Automation Science & Engineering, South China University of Technology, Wushang Road, Guang Zhou 510640, China
‡ Department of Biomedical Engineering, National University of Singapore, Singapore Medical Drive, Singapore 117575, Singapore

Supporting Information

ABSTRACT: Soft-sensors are the most common strategy for estimating hard-to-measure variables in chemical processes. Recent research has shown that accurate prediction of hard-to-measure variables can significantly improve system performance. However, deterioration of predictive ability resulting from dramatic changes in operating conditions often renders a generic soft-sensor inadequate. This study developed an adaptive soft-sensor that combines the Moving Window and Time Differencing techniques to account for both long-term and short-term information in modeling. At each model update, the most insensitive variables were removed by VIP (variable importance in projection). By further integrating Variational Bayesian PLS (VBPLS) as the predictive model, not only prediction values but also a measure of their credibility for hard-to-measure quantities can be obtained. The proposed methodology was first demonstrated by applying the designed algorithm to a WWTP simulated with the well-established model BSM1 and then extended to a real WWTP with data collected from the field. Results showed that the proposed strategy significantly improved the prediction performance.

1. INTRODUCTION
Soft-sensors are widely used to estimate variables that are difficult to measure online because of technical difficulty, large measurement delays, high investment cost, and so on.1−3 To build a proper relationship between easy-to-measure variables x and hard-to-measure variables y, statistical methods including, but not limited to, partial least-squares (PLS),4 principal component regression (PCR),5 nonlinear PLS,6 and support vector machine based regression7 have been studied as soft-sensor models. One of the bottlenecks limiting their widespread application is that their prediction performance sometimes deteriorates under highly varying operating conditions. In response to this degradation, the Moving Window (MW) method8 has been proposed to collect the most recent and fairly long-term data for modeling each time a new data point arrives. However, since the reconstructed models tend to specialize in predictions over a narrow data range, poor predictions can be expected once the process undergoes dramatic changes, because the data collected for modeling are not very informative about such large variations. Given its ability to adjust in a timely manner to highly varying patterns, the just-in-time (JIT) model9,10 provides an alternative for dealing with this problem. Nonetheless, to build a model with high accuracy when the prediction is affected by drift or abrupt changes during model reconstruction, the corresponding patterns must be included in the training data set. One potential way to do so is to perform time differencing (TD) before modeling.11 In other words, the soft-sensor model is built on the time difference of the explanatory variables X and that of the objective variable y, which can potentially stationarize the data series and incorporate the latest variations to facilitate model construction.

© 2014 American Chemical Society

However, because the process characteristics differ considerably before and after a state transition and too little information is included for model building, the prediction accuracy of a TD model could be low when the system encounters new operating conditions, even though rapid change points can be accounted for better than with MW and JIT. Therefore, designing an adaptive model that accounts for both rapid change points and the new stable states is imperative. However, building an adaptive soft-sensor with high predictive ability is laborious, since the input variables for model construction have to be selected carefully. Multivariate statistical models, such as PLS and PCR,12,13 commonly serve as soft-sensor models, with the goal of dealing with high-dimensional data sets. It was commonly stated that no feature selection was required for these methods. However, this attitude has changed, and it is now widely recognized that, in some situations, reducing the number of variables can improve the model predictions, ease interpretation, and lower measurement costs.14 Sometimes, the reduced models may even perform slightly worse; nevertheless, adequate models with as few variables as possible are sometimes desired. Additional advantages of variable selection are a reduced risk of overfitting and a lower computational load. An optimal way to perform variable selection is to try all combinations of variables and select the best one.15 Unfortunately, this method is computationally prohibitive. Moreover, even if it were possible to test all combinations of variables, there is still a high risk of overfitting unless the number of samples is much larger than the number of combinations of variables. For these reasons, a number of variable selection methods, such as Akaike's information criterion (AIC),15 Bayesian information criterion (BIC),15 interval PLS (iPLS),16 the genetic algorithm (GA),17,18 and so on, aim to find a good set of variables rather than the optimal one. Even though several techniques have been devoted to selecting the most informative variables, different sets of variables are supposed to be used at different stages of model construction once the states change frequently. Most methods, especially GA, require a high variables/objects ratio and are computationally intensive for variable selection.17 It can happen that these models cannot be used because they would model noise instead of information. Also, different runs of the GA may yield different variables for online model construction, even with the same data set. It is, therefore, necessary to implement a reliable algorithm with low computational intensity for online variable selection.

In this paper, MW is first used to collect proper data for adaptive model construction. To offset the large phase lag caused by the delay when a predefined window is used for selection, we borrowed the time differencing method from the ARMA (autoregressive−moving-average) model, rather than the ideas proposed by Kaneko et al.19 This also offers more potential to break up the dynamic characteristics of the process and stationarize the data series than relying purely on the first-order time differencing proposed by Kaneko et al.19 Additionally, the integration of TD and MW enables a new adaptive modeling method that not only tracks dramatic changes but also improves the prediction accuracy after the process has reached a new state.

To reduce the risk of overfitting and the computational burden, VIP20 is implemented in the adaptive model at each step to remove the most insensitive variables. The idea behind this measure is to accumulate the importance of each variable as reflected by its weight in each component of the PLS model. To be consistent with the variable selection algorithm, a Bayesian PLS, termed Variational Bayesian PLS (VBPLS),21 is used as the soft-sensor model. The most attractive feature of VBPLS is its automatic estimation of the optimal number of latent components, so that the soft-sensor model can be updated without resorting to parameter setting by trial and error. Additionally, because all parameters are assigned proper distributions and inferred by the Variational Bayesian method, the uncertainties can be assessed through the prediction variance of VBPLS and further used to evaluate how reliable the predicted values are.

In Section 2, the PLS and Variational Bayesian PLS models are introduced. Section 3 presents the proposed adaptive soft-sensor. The proposed soft-sensor is validated on two data sets from different WWTPs with different dynamics in Section 4. Finally, Section 5 concludes.

Received: September 25, 2014
Revised: November 19, 2014
Accepted: December 2, 2014
Published: December 2, 2014
Industrial & Engineering Chemistry Research, DOI: 10.1021/ie503807e Ind. Eng. Chem. Res. 2015, 54, 338−350

$$Y = ZQ + \epsilon_Y \tag{2}$$

where Z is the score matrix, G is the X-loading matrix, and Q is the y-loading matrix. ϵX is the matrix of X residuals, and ϵY is the matrix of Y residuals. The parameter b is generally identified using the least-squares method:

$$b = W(GW)^{-1}Q \tag{3}$$

where W is the X-weight matrix. The VIP score is commonly used as a measure of the importance of the X-variables and is applied for variable selection. The VIP score for the ith variable is defined as follows:20

$$\mathrm{VIP}_i = \sqrt{\, p \sum_{j=1}^{k} \left\{ SS_j \left( \frac{w_{ij}}{\lVert w_j \rVert} \right)^{2} \right\} \Big/ \sum_{j=1}^{k} SS_j \,} \tag{4}$$

$$SS_j = q_j^{2}\, z_j' z_j \tag{5}$$

where SSj is the sum of squares explained by the jth component, p is the number of X-variables, (wij/∥wj∥)2 represents the importance of the ith variable in the jth component, and k is the number of latent variables.

2.2. VBPLS. The soft-sensor model used in this paper is a Bayesian formulation of PLS. To facilitate the Bayesian treatment of PLS, eq 1 is reformulated as follows:

$$Z = XP + \epsilon_Z \tag{6}$$

where P and Q are the p × k and k × q loading matrices, Z is the N × k latent score matrix with elements zil and rows zi, and ϵZ and ϵY are the matrices of residuals. Z lies in an intermediate k-dimensional latent space, with k typically lower than p and q. Together with eq 2, this gives the definition of PLS. In this model, the parameters are assigned specific distributions: normal and Wishart distributions are defined for the latent and output variables as well as for the loading matrices.

$$z' \sim N(x'P, \Omega), \quad \Omega^{-1} \sim \omega(A, l), \quad p_l \sim N(0, \Sigma_l), \quad \Sigma_l^{-1} \sim \omega(B_l, \upsilon_l) \tag{7}$$

$$y' \sim N(z'Q, \Psi), \quad \Psi^{-1} \sim \omega(C, k), \quad q_j \sim N(0, \Gamma_j), \quad \Gamma_j^{-1} \sim \omega(D_j, \varsigma) \tag{8}$$

with l = 1,...,k and j = 1,...,q, where pl is the lth column of P and qj is the jth column of Q; A, Bl, C, and Dj are the scale matrix hyperparameters of the Wishart prior distributions, and l, υl, k, and ς are the corresponding degrees of freedom. The graphical representation in plate notation is shown in Figure 1. As illustrated in ref 21, the parameters are computed as follows:
(1) Initialize Z to the first k principal components.
(2) Initialize P, Q, and Z using the PLS model.
(3) Compute the distributions of Ω−1, P, and Σ−1 using eqs 9, 10, and 11.

2. PRELIMINARIES
2.1. PLS and VIP. PLS is an algorithm that relates a set of explanatory variables X to a response y through the linear relationship y = Xb + ϵ. For simplicity, we concentrate on a single response rather than multiple responses. A PLS model consists of the following two equations:20

$$X = ZG + \epsilon_X \tag{1}$$

$$F(\Omega^{-1}) = \omega(\Omega^{-1}; \tilde{A}^{-1}, \tilde{l}) \tag{9}$$

with $\tilde{l} = l + N$ and $\tilde{A}^{-1} = (E[Z'Z] + E[P'X'XP] - \mu_Z' X \mu_P - \mu_P' X' \mu_Z + A^{-1})^{-1}$, where $E[Z'Z] = \sum_{n=1}^{N} E[z_n z_n'] = \sum_{n=1}^{N} (S_{z_n} + \mu_{z_n}\mu_{z_n}')$.

most correlated variable selection in the stationary variables; (4) construction of the Variational Bayesian PLS model and prediction at each step of window moving; (5) reconstruction of y(t) and its associated prediction variance from the differenced values; (6) description of uncertainties through the variances from Variational Bayesian PLS.

3.1. Moving Window to Collect the Most Recent Data for Modeling. The first step of this methodology is to capture the most recent and relatively long-term information by using the Moving Window method. Given a series of data, the first window is obtained by taking an initial subset of fixed, predefined length from the series. The subset is then modified by “shifting forward”, that is, excluding the oldest sample of the series and including the next sample following the original subset. This creates a new subset of data for model reconstruction at each step, and the process is repeated over the entire data series. Since the data collection horizon keeps the same length but slides along by one sampling interval at each step, this way of collecting data is also called a receding horizon strategy. The challenging task in the Moving Window method lies in the selection of the window width. With a proper selection, sufficient information can be included to facilitate model construction at each step of window moving. Given that a similar pattern recurs cyclically in many chemical processes, such as wastewater treatment processes and batch processes, the window width should preferably cover one such pattern completely. For example, because citizens follow very similar diurnal work and rest schedules, flow and concentrations exhibit a relatively similar pattern every day in the wastewater treatment process. The window width is therefore supposed to cover the data of 1 day for sequential model building.

3.2. Time Differencing Preprocess. The MW technique is, in fact, best suited to stationary series; abnormal data in the training set could otherwise cause the constructed model to deviate severely. A remedy is to introduce TD to break up such behavior. The time differencing modeling method for soft-sensors was proposed by Kaneko et al.,22 who difference the explanatory data x(t) and the objective data y(t) before performing regression modeling. This potentially enables the resulting model to track dramatic changes as quickly as possible. In fact, this method is similar to first-order differencing in the ARIMA model.23 However, because of the pronounced seasonality in many processes with a high sampling rate, first-order differencing alone may not stationarize the series properly. Therefore, we further introduce a seasonality removal step into the time differencing method in this paper, as given in eqs 17 and 18.
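The receding-horizon data collection described in Section 3.1 can be illustrated with a minimal Python sketch. The function name `moving_windows` and the toy series are assumptions made for illustration only; they are not taken from the authors' implementation.

```python
from collections import deque
from typing import Iterable, Iterator, List


def moving_windows(samples: Iterable[float], width: int) -> Iterator[List[float]]:
    """Yield the most recent `width` samples each time a new sample arrives."""
    if width <= 0:
        raise ValueError("width must be positive")
    # A bounded deque drops the oldest sample automatically when a new one is appended.
    window: deque = deque(maxlen=width)
    for sample in samples:
        window.append(sample)
        if len(window) == width:  # start only once the first window is full
            yield list(window)


if __name__ == "__main__":
    # Hypothetical toy series; each printed list is one training subset.
    for subset in moving_windows([1.0, 2.0, 3.0, 4.0, 5.0], width=3):
        print(subset)
```

With data sampled every 15 min, a width of 96 samples would cover 1 day, which corresponds to the `window_length = 96` setting listed in Table 1.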

Figure 1. Bayesian hierarchy of the proposed Bayesian PLS model, where Z lies in the latent space.

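Because Ω−1 is assumed to be diagonal, F(P) factorizes as $F(P) = \prod_{l=1}^{k} F(p_l)$, and for each factor we have

$$F(p_l) = N(p_l; \mu_{p_l}, S_{p_l}) \tag{10}$$

with $S_{p_l} = (E[\Sigma^{-1}] + E[\Omega^{-1}_{ll}]\, X'X)^{-1}$ and $\mu_{p_l} = E[\Omega^{-1}_{ll}]\, S_{p_l} X' \mu_{Z_{\cdot l}}$.

$$F(\Sigma^{-1}) = \omega(\Sigma^{-1}; \tilde{B}^{-1}, \tilde{\upsilon}) \tag{11}$$

with $\tilde{B}^{-1} = (E[PP'] + B^{-1})^{-1}$ and $\tilde{\upsilon} = \upsilon + k$.

(4) Compute the distributions of Ψ−1, Q, and Γ−1 using eqs 12, 13, and 14.

$$F(\Psi^{-1}) = \omega(\Psi^{-1}; \tilde{C}^{-1}, \tilde{k}) \tag{12}$$

with $\tilde{C}^{-1} = (Y'Y + E[Q'Z'ZQ] - Y'\mu_Z\mu_Q - \mu_Q'\mu_Z'Y + C^{-1})^{-1}$ and $\tilde{k} = k + N$. Because Ψ−1 is taken to be diagonal, F(Q) factorizes as $F(Q) = \prod_{j=1}^{q} F(q_j)$, and the corresponding distribution of qj is

$$F(q_j) = N(q_j; \mu_{q_j}, S_{q_j}) \tag{13}$$

with $S_{q_j} = (E[\Gamma^{-1}] + E[\Psi^{-1}_{jj}]\, E[Z'Z])^{-1}$ and $\mu_{q_j} = E[\Psi^{-1}_{jj}]\, S_{q_j} \mu_Z' Y_j$.

$$F(\Gamma^{-1}) = \omega(\Gamma^{-1}; \tilde{D}^{-1}, \tilde{\varsigma}) \tag{14}$$

with $\tilde{D}^{-1} = (E[QQ'] + D^{-1})^{-1}$ and $\tilde{\varsigma} = \varsigma + q$.

(5) Compute the distribution of Z using eq 15.

$$F(z_n) = N(z_n; \mu_{z_n}, S_{z_n}) \tag{15}$$

with $S_{z_n} = (E[\Omega^{-1}] + E[Q\Psi^{-1}Q'])^{-1}$ and $\mu_{z_n} = S_{z_n}(E[\Omega^{-1}]\mu_P' x_n + \mu_Q E[\Psi^{-1}] y_n)$. We can compute

$$E[Q\Psi^{-1}Q'] = \mu_Q E[\Psi^{-1}] \mu_Q' + \sum_{j_1=1}^{q} \sum_{j_2=1}^{q} E[\Psi^{-1}_{j_1 j_2}]\, S_{Q_{j_1 j_2}} \tag{16}$$

(6) Repeat steps 3−5 until convergence.
(7) Calculate the prediction using eq 8.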

3. AN ADAPTIVE SOFT-SENSOR DEVELOPMENT BASED ON VBPLS MODEL WITH ACCOUNTING FOR FEATURE SELECTION DYNAMIC CHARACTER ONLINE
To establish an adaptive model that describes a process with dramatic changes or drift, six steps are involved, namely: (1) implementation of the Moving Window to select recent sample data for further online modeling; (2) differencing of X(t) and y(t) to stationarize the series; (3) implementation of VIP for the

$$x_d(t) = x(t) - x(t-1) - \mu_x(t) \tag{17}$$

$$y_d(t) = y(t) - y(t-1) - \mu_y(t) \tag{18}$$

where x(t) and y(t) are the raw data and μx and μy are their general trend data, for example, the typical diurnal variation of wastewater flow in the WWTP. The relationship between the explanatory variables and the objective variable can then be modeled as follows:

$$y_d(t) = f(x_d(t)) + e(t) \tag{19}$$

where f is the regression model and e(t) is the associated error at time t. Similarly, the prediction of the differenced objective variable can be obtained as

$$y_d(t+1) = f(x_d(t+1)) + e(t+1) \tag{20}$$

Figure 2. Schematic of the proposed soft-sensor.

instead of a conformity measurement in this paper. In this sense, VIP is implemented to remove the most irrelevant, noisy, or unreliable variables, which contribute most to poor prediction and model complexity. Therefore, only a few of the variables with the very lowest VIP values are supposed to be removed. If the model performance improves, the method can be repeated on the reduced data set until no further improvement is found. Because VIP is used as a nonconformity measurement, it is recommended to keep the VIP limit below 0.5, a value obtained by trial and error.

3.4. Soft-Sensor Implementation with VBPLS. Many different strategies can be employed for the soft-sensor model, with PLS being very commonly used in chemical processes because of its ability to deal with high-dimensional variables. Since a PLS model is generally built without considering the uncertainties of its identified parameters, the Variational Bayesian (VB) method was chosen for PLS parameter identification in this paper, which reformulates the PLS model as VBPLS. The VB method is, in fact, a way to approximate a posterior distribution that has a highly complex form for which expectations are not analytically tractable.28,29 In this paper, the computation of the latent variables together with all other parameters is treated as a posterior calculation and carried out by VB. Also, because VB is employed for parameter identification, the risk of overfitting can be reduced.29 Any task in chemical processes presupposes the availability of a working process model that connects perfectly to reality. However, complete knowledge of a process is impossible to obtain, and this incompleteness reveals itself through inaccuracies in the parameters of the model. In fact, a model is only an approximate abstraction of a process. As a result, there is no perfect model that precisely describes a physical process. This is commonly known as model uncertainty.

Given that model uncertainties are generally unavoidable, a first step is to quantify them, while yielding high-quality outputs within the capacity and limitations of the system. For example, if the prediction value is 3 units and the uncertainty at the 99% confidence level is 0.1 units, then there is a 99% chance that the true value lies within the interval 3 ± 0.1 units. Uncertainty is based on existing metrological standards and is calculated from all error sources affecting the online measurement, such as model parameters, process noise, and measurement errors.30 Thus, confidence values are a good way to indicate how likely each prediction is to be correct. One can estimate the uncertainty of the predictions by computing the variances of y from the VBPLS model. The overall steps of the proposed soft-sensor implementation are shown in Figure 2. First, once a new sample is available, the window moves one step ahead to include the new sample and discard the oldest one. Second, differencing is applied to the data within the window. Third, VIP is performed on the resulting data samples. These steps are detailed in Figure 2. Fourth, the VBPLS model is identified and constructed and then used for prediction. Finally, the predicted values and the obtained variances are recovered by dedifferencing.
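To make the link between a predictive variance and an interval such as 3 ± 0.1 units concrete, the following Python sketch converts a mean and a variance into a two-sided interval. It assumes a Gaussian predictive distribution, which is an assumption of this illustration and not a statement about the exact predictive form of VBPLS; the function name `prediction_interval` and the numerical values are hypothetical.

```python
from statistics import NormalDist
from typing import Tuple


def prediction_interval(mean: float, variance: float, level: float = 0.99) -> Tuple[float, float]:
    """Two-sided interval for a Gaussian predictive distribution (assumed form)."""
    if variance < 0.0:
        raise ValueError("variance must be non-negative")
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    # Standard normal quantile that leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * variance ** 0.5
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    # Hypothetical prediction of 3 units with a predictive variance of 0.0015.
    low, high = prediction_interval(3.0, 0.0015, level=0.99)
    print(low, high)
```

Under this Gaussian assumption, a predictive variance of about 0.0015 gives a 99% half-width close to 0.1 units, which is the situation described in the example above.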

Moreover, the real prediction value can be reconstructed by dedifferencing as follows:

$$y(t+1) = y_d(t+1) + y(t) + \mu_y(t+1) \tag{21}$$
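The differencing of eq 18 and the reconstruction of eq 21 can be checked with a small round trip. The sketch below is a minimal illustration in Python: the helper names (`typical_profile`, `difference`, `reconstruct_next`) and the toy data are assumptions, and the typical profile is taken as the per-time-step average over stacked periods, which is one plausible reading of how the trend term μ is obtained.

```python
from typing import List, Sequence


def typical_profile(series: Sequence[float], period: int) -> List[float]:
    """Average the samples that share the same position within each period."""
    if period <= 0 or len(series) < period:
        raise ValueError("need at least one full period of data")
    sums = [0.0] * period
    counts = [0] * period
    for t, value in enumerate(series):
        sums[t % period] += value
        counts[t % period] += 1
    return [s / c for s, c in zip(sums, counts)]


def difference(y: Sequence[float], mu: Sequence[float]) -> List[float]:
    """Eq 18: y_d(t) = y(t) - y(t-1) - mu(t), for t = 1, ..., len(y) - 1."""
    return [y[t] - y[t - 1] - mu[t] for t in range(1, len(y))]


def reconstruct_next(y_d_next: float, y_now: float, mu_next: float) -> float:
    """Eq 21: y(t+1) = y_d(t+1) + y(t) + mu(t+1)."""
    return y_d_next + y_now + mu_next


if __name__ == "__main__":
    y = [2.0, 4.0, 3.0, 5.0]           # hypothetical series with a period of 2 samples
    profile = typical_profile(y, 2)     # one average value per position in the period
    mu = [profile[t % 2] for t in range(len(y))]
    y_d = difference(y, mu)
    # Rebuild y(1), y(2), ... from the differenced values and the previous raw value.
    rebuilt = [reconstruct_next(y_d[t - 1], y[t - 1], mu[t]) for t in range(1, len(y))]
    print(y_d, rebuilt)
```

In an online setting the previous raw value y(t) would come from the last available measurement or prediction, which is a design choice this sketch leaves open.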

The basis of differencing is the recognition that many chemical systems exhibit circular and time-delayed relationships, as has been shown in many published papers.24,25 Because of such circular and time-delayed relationships, it is plausible that combining past and current inputs facilitates modeling. By constructing time difference models, the negative effects of deterioration with age, such as drift and dramatic changes in the state of the plant, can be reduced significantly. This mainly results from the fact that the autocorrelation of the variables can be removed and the corresponding process can be stationarized, which eases the model complexity. Also because of time differencing, the prediction values are able to follow the state of the plant smoothly even under significant changes. Normally, the correct amount of differencing is the one that yields a time series fluctuating around a well-defined mean value. Note that if the series still exhibits a long-term trend, or otherwise lacks a tendency to return to its mean value, then a higher order of differencing is needed, as follows:

$$y_{dh}(t) = y(t) - 2y(t-1) + y(t-2) - \mu_y(t) \tag{22}$$

$$y(t) = y_{dh}(t) + 2y(t-1) - y(t-2) + \mu_y(t) \tag{23}$$
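As a short worked step, added here for clarity and using only the symbols already defined, the second-order term is the first difference of two consecutive first differences, and eq 23 follows by solving eq 22 for y(t):

```latex
\begin{aligned}
\bigl[y(t) - y(t-1)\bigr] - \bigl[y(t-1) - y(t-2)\bigr] &= y(t) - 2y(t-1) + y(t-2), \\
y_{dh}(t) &= y(t) - 2y(t-1) + y(t-2) - \mu_y(t) \\
\Longrightarrow\quad y(t) &= y_{dh}(t) + 2y(t-1) - y(t-2) + \mu_y(t).
\end{aligned}
```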

x(t) has a similar form to y(t). In most cases, in fact, differencing as in eqs 17 and 18 is enough to handle almost all nonstationary series.26 Whether further differencing is needed is usually decided by evaluating the behavior of the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the data sets x(t), x(t − 1),...,x(t − n) and y(t), y(t − 1),...,y(t − n), as previously described.26,27 Time differencing is, in fact, an efficient way to track changes of state in a plant. However, even with the highest differencing order, the information considered in the model reaches merely two steps backward. For example, to predict y(t), only y(t − 1) and y(t − 2) can be used for model construction if one relies merely on time differencing with the highest order, i.e., the second order. This limited information would worsen the prediction accuracy when the process comes to a new and stable state. However, the Moving Window, which is more suitable for stationary processes, can cover more of the information needed for the new state. This motivates combining MW and TD in this paper to formulate a new method for selecting proper data for soft-sensor modeling.

3.3. VIP. To reduce the risk of overfitting and simplify the recursive model when performing model reconstruction during window moving, the VIP method is incorporated into online model reconstruction in this paper. Generally, a VIP smaller than one indicates an unimportant variable, which could probably be removed.15 It is not advisable to simply remove everything below one, but rather to select variables using a threshold between 0.83 and 1.21, as proposed by Mehmood et al.20 However, this proposal is more suitable for high-dimensional data sets, for example, those with more than one hundred variables.
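The VIP screening discussed above can be sketched in plain Python from the quantities in eqs 4 and 5. This is an illustrative sketch only: the function names `vip_scores` and `keep_variables`, the toy matrices, and the limit values are assumptions, and the square-root form of the VIP definition (as in ref 20) is assumed.

```python
import math
from typing import List, Sequence


def vip_scores(
    W: Sequence[Sequence[float]],  # p x k X-weight matrix, W[i][j] = w_ij
    q: Sequence[float],            # k y-loadings (single response)
    Z: Sequence[Sequence[float]],  # N x k score matrix
) -> List[float]:
    """Return VIP_i for every X-variable following eqs 4 and 5."""
    p, k = len(W), len(q)
    # Eq 5: SS_j = q_j^2 * z_j' z_j, the sum of squares explained by component j.
    ss = [q[j] ** 2 * sum(row[j] ** 2 for row in Z) for j in range(k)]
    # Squared Euclidean norm of each weight vector w_j (column j of W).
    norm_sq = [sum(W[i][j] ** 2 for i in range(p)) for j in range(k)]
    total = sum(ss)
    return [
        math.sqrt(p * sum(ss[j] * W[i][j] ** 2 / norm_sq[j] for j in range(k)) / total)
        for i in range(p)
    ]


def keep_variables(vip: Sequence[float], vip_limit: float) -> List[int]:
    """Indices of the variables whose VIP is not below the limit."""
    return [i for i, score in enumerate(vip) if score >= vip_limit]


if __name__ == "__main__":
    # Hypothetical toy model with p = 2 variables and k = 2 components.
    W = [[3.0, 1.0], [4.0, 0.0]]
    q = [1.0, 1.0]
    Z = [[1.0, 0.0], [0.0, 2.0]]
    vip = vip_scores(W, q, Z)
    print(vip, keep_variables(vip, vip_limit=0.6))
```

A useful sanity check on any implementation of this definition is that the squared VIP scores sum to p, the number of X-variables.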
Also, it is important to inspect the model before a variable is included in the model or discarded.15 Given the significant fluctuations of operating conditions in the processes, VIP is performed as a nonconformity measurement

4. CASE STUDIES
Two simulation studies were carried out to assess the performance of the proposed soft-sensor. The first case study represents a highly instrumented WWTP with significant dynamic behavior at a 15 min sampling interval, whereas the second one represents a

Figure 3. Schematic of wastewater plant for BSM1.

Figure 4. Flow rate and NH4+ + NH3 nitrogen concentration profiles. Left: flow rate. Right: NH4+ + NH3 nitrogen concentration.

Table 1. Parameter Settings and Time Consumed by the Four Types of Models

type of model | param. | time consumed (min)
PLS | No. of components = 6 | 0.17
VBPLS | No. of steps: options.cyc = 200. The stopping error: options.tol = 0.001. The maximum No. of components: options.k = 14. The standardized method: options.initialization = ‘pls’; options.adaptive = 1 | 1.02
MV-VIP-VBPLS | No. of steps: options.cyc = 200. The stopping error: options.tol = 0.001. The maximum No. of components: options.k = 14. The standardized method: options.initialization = ‘pls’; options.adaptive = 1; vip_limit = 0.25; window_length = 96 | 3.1
proposed model | No. of steps: options.cyc = 200. The stopping error: options.tol = 0.001. The maximum No. of components: options.k = 14. The standardized method: options.initialization = ‘pls’; options.adaptive = 1; vip_limit = 0.25; window_length = 96 | 4.9

4.1.2. Scenario Definition. The aim of this case study is to develop a soft-sensor that ensures suitable predictions of BOD5 regardless of dry or wet weather. Prediction performance was assessed by comparing the proposed soft-sensor with three other models (PLS, VBPLS, and MV-VIP-VBPLS) under three typical weather conditions (dry, rainy, and stormy weather). In this case, 14 important variables were selected for model construction, as shown in Table S1 in the Supporting Information. Each variable was sampled every 15 min. The BSM1 was simulated for 14 days, half of which were used for training and the other half for validation.

4.1.3. Results and Discussion. (1). Prediction Performance and Results. (1) Data Preprocess. As depicted in Figure 4, the historical data sets revealed that, during dry-weather periods, daily flow and nitrate concentration follow a very similar pattern, which is also observed in other variables (not shown). In this respect, the typical data profile for each variable under dry weather conditions was generated. This was done by stacking all the historical data in one set (from 0 to 1 day) and calculating from it the average flow at each time step. Closer

situation in which the WWTP is sparsely instrumented and data are sampled at an interval of 1 day. The computer configuration was as follows: OS, Windows 7 (32 bit); CPU, i3; RAM, 4 GB. MATLAB 2011a was employed.

4.1. Benchmark Simulation Model 1 (BSM1). 4.1.1. Background. The simulation platform used is the BSM1 developed by the second IWA Respirometry Task Group together with the COST 682/624 Actions, which offers an unbiased benchmarking system for comparing various control strategies (http://www.benchmarkwwtp.org/). In this study, a plant layout was selected from the simulation benchmark, consisting of five-compartment biological tanks (5999 m3) and a secondary settler (depth 4 m, 10 layers, 6000 m3) treating an average flow of 20 000 m3/day with an average biodegradable chemical oxygen demand (COD) concentration of 300 mg/L. To remove organic matter, nitrification and denitrification were involved. The last three compartments of the bioreactor are aerated, whereas the others are not (Figure 3).

Figure 5. Prediction profiles of BOD5 under dry weather.

Figure 6. Prediction profiles of BOD5 under rainy weather.

analysis of the data sets showed that the daily data from Monday to Friday were very similar, whereas a slightly different pattern (a shift in amplitude) was observed in the weekend profiles. Therefore, two typical data profiles were generated for each variable: one for weekdays and one for weekends. This pattern is consistent with the diurnal water consumption of the inhabitants. By subtracting the typical data from the historical data, the dynamic behavior in the wastewater plant data set could be broken up, thereby providing better data for constructing a predictive model. In fact, the typical data served as μX and μy in the differencing stage. The window width was initialized as given in Table 1. Through the window, data for local modeling were selected. Because of the seasonality in the historical data within the window, the typical data for the input and output variables were removed accordingly. The resulting data were then differenced and stationarized using eqs 17 and 18. The final step employed VIP to remove the most insensitive variables within the window. Overall, the most stationary and sensitive variables are obtained for subsequent modeling.

(2) Performance under Different Weather Conditions. First, the proposed soft-sensor was validated and compared with the other three models under dry weather conditions. In most cases, a wastewater treatment plant is operated continuously under dry weather. As can be seen in Figure 5a, the PLS model made an acceptably good prediction, with RMSE and correlation coefficient of 0.0629 and 0.982, respectively. However, the prediction clearly deteriorated when the process reached a peak or valley, which could arise from the negative influence of the strong nonlinear characteristics in the

Figure 7. Prediction profiles of BOD5 under stormy weather.

rainy weather, and stormy weather in this case, four predictive models were considered. VBPLS achieved fairly better results than PLS, suggesting that the Variational Bayesian method for parameter identification is indeed capable of improving the prediction performance. Also, the automatic search for the optimum number of latent variables acted as an important positive factor for the predictions. However, in wet weather (rainy and stormy weather), the performance of VBPLS deteriorated, mainly because VBPLS was trained offline without considering online feature changes. Even though automatic latent variable selection was involved, the adaptive operating range is still limited. Then, the MW and VIP methods were used for data preprocessing and were expected to enhance the predictive ability of VBPLS. In this methodology, the predictive model was updated each time a new data point arrived. The data for modeling were filtered in two ways: the most time-correlated data points were chosen by the Moving Window, and the most feature-correlated variables were chosen by VIP. Unfortunately, the prediction of BOD5 worsened significantly at some points, even though this model approximated the values at the peaks in rainy and stormy weather better. This mainly arises from the fact that the MW method is more suitable for stationary series than for series with dramatic changes. Also, data selection for modeling is carried out very frequently, which potentially leads to overfitting in the construction of the models. Given that WWTPs are not operated only in dry weather but also experience dramatic changes in wet weather, the MW method cannot properly describe such obvious changes with only the relatively long-term information. To deal with this problem, the time differencing method was included in the proposed soft-sensor. By doing so, both long-term and short-term information in the window were taken into account, offering useful data to correct the delay of information. In addition, Figure 8 from day 8 to day 9 reveals that, by performing time differencing, the data under all three weather conditions were almost stationarized and close to zero. Even though few rapidly increased or decreased

process. VBPLS and MV-VIP-VBPLS were also simulated, showing that VBPLS improved the prediction performance significantly, whereas MV-VIP-VBPLS performed relatively poorly in terms of RMSE and r (Figure 5b and c). Evidently, simply combining Moving Window and VIP worsened the performance of VBPLS. To illustrate the efficiency of the proposed soft-sensor, its predictions are shown in Figure 5d, indicating that the proposed soft-sensor achieved the best performance, with RMSE and r being 0.0217 and 0.9979, respectively. Second, to further illustrate the performance of the different prediction methods, the prediction profiles of BOD5 are presented under wet weather conditions (rainy weather and stormy weather). In rainy weather, BOD5 changes dramatically and the change lasts for a long time. This requires the soft-sensor not only to track the abrupt changes accurately but also to predict BOD5 under this new state over the next few days. As shown in Figure 6, the proposed soft-sensor was able to estimate BOD5 accurately, even at the peaks, with RMSE and r being 0.047 and 0.998, respectively, whereas relatively poor results were obtained with the three other models. It is also worth noting that MV-VIP-VBPLS sometimes produced rather poor predictions in comparison with all other scenarios. The difference between the stormy and rainy weather conditions for BOD5 is that, in stormy weather, a significant BOD5 peak (up to 6 g COD/m3) appeared in the data and fell back to a valley within a short period. The predicted concentrations of BOD5 are shown in Figure 7, indicating that none of PLS, VBPLS, and MV-VIP-VBPLS was able to capture the peaks in stormy weather. MV-VIP-VBPLS performed worst, with significant deviations at some points. Once the storm stopped, the performance of MV-VIP-VBPLS did not recover as expected. 
In comparison, the proposed soft-sensor achieved the best performance not only in the dry state but also in the stormy state. (2). Discussion. (1) Results Discussion. In order to predict BOD5 at the discharge of the WWTP during the dry weather,

DOI: 10.1021/ie503807e Ind. Eng. Chem. Res. 2015, 54, 338−350

Article

Industrial & Engineering Chemistry Research

It is important to note that the best performance of the proposed soft-sensor was achieved at the cost of computation time. However, compared with the sampling interval of the data, the time consumed was acceptable, as shown in Table 1. It is also important to emphasize that, owing to the similarity in model structure, PLS is used as a pretraining tool to initialize the parameters to be identified, which amounts to a constraint on the region of parameter space where a solution is allowed. The constraint forces the solutions toward the right ones and captures significant statistical structure in the input.31 (3) Uncertainty Description. Unlike a generic soft-sensor, the proposed soft-sensor can also describe uncertainty. Thus, to further validate the effectiveness of the uncertainty description, the variances of the predictions obtained from VBPLS were used to calculate the confidence intervals. In an industrial process, it is common to encounter abrupt noise and various uncertainties when the online analyzer is calibrating and running. There is, therefore, a need to check the output of the soft-sensor. The confidence intervals of the predictions show how reliable each prediction is, as revealed in Figure 10. Results in Figure 10 clearly show that the confidence intervals of the predictions from the proposed soft-sensor become narrow with the negative effect of dramatic changes in these processes resulting from extreme weather, in the sense that the confidence in the predicted values obtained by the proposed soft-sensor was not as high as in previous sections. 4.2. A Real Wastewater Treatment Plant. 4.2.1. Background. In contrast to the previous case study with its highly instrumented installation, the application considered here is a simple Activated Sludge WWTP aiming to remove organic matter and nutrients. 
In this process, the influent rate and the population of microorganisms (both in quality and number of species) vary over time, process knowledge is very limited, and data are collected at 1-day intervals because of weak instrumentation. Furthermore, an online analyzer tends to be unavailable because of its sensitivity to weather conditions and seasonal changes. Such complexity and fluctuations often result in degradation of the online analyzer performance or even failure. These conditions are very common for small WWTPs in rural areas. As shown in Figure 11, the wastewater plant process32,33 comprises four elements: pretreatment, primary settlers,

Figure 8. Stationarized profiles of BOD5 under different weather scenarios.
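The time differencing behind profiles such as those in Figure 8 can be illustrated with a minimal sketch. It assumes a differencing interval of one sample and a synthetic series containing a single step change; the function name time_difference and the data are hypothetical and are not taken from this study.

```python
import numpy as np

def time_difference(series, interval=1):
    """Return series[t] - series[t - interval] for every valid t."""
    series = np.asarray(series, dtype=float)
    if interval < 1 or interval >= len(series):
        raise ValueError("interval must be in [1, len(series) - 1]")
    return series[interval:] - series[:-interval]

# Hypothetical BOD5-like series: flat at 2.0, then an abrupt step to 5.0.
raw = np.array([2.0, 2.0, 2.0, 5.0, 5.0, 5.0])
diff = time_difference(raw)
print(diff)  # only the transition sample differs from zero
```

In this toy series the level shift collapses into one nonzero transition point while the remaining samples sit at zero, which mirrors the behavior described for the differenced profiles.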

points were not stabilized as well as others, their values were reduced significantly. This would facilitate the subsequent modeling. (2) Parameter Setting. In all three weather conditions, the parameters for the four types of models were set as shown in Table 1. In this case study, the number of components for PLS was obtained by trial and error. Such laborious work would be impractical for high-dimensional data. In contrast, VBPLS does not require any important parameters to be set, except the number of iteration steps, which was set to 200 to ensure that VBPLS converged. The window_length of MV was set to 96, covering 1 day of information. The VIP limit was set so that the most insensitive variables were not involved in the reconstruction of the model. It is evident in Figure 9 that most of the variables were selected frequently (variables 1, 2, 4, 5, 7, 8, 10, 13, and 14), whereas five variables were selected with lower frequency. Note that the third variable was considered an important input variable for soft-sensor modeling under wet weather but negligible under dry weather. This necessitates online selection of input variables for soft-sensor modeling when operating conditions change.

Figure 9. Total number of times each input variable was selected during the validation period.
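The VIP-based screening counted in Figure 9 can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this study: ordinary single-response PLS (NIPALS) stands in for VBPLS, the data are synthetic, the function name pls1_vip is hypothetical, and the VIP limit of 1.0 is a commonly used default rather than the value in Table 1.

```python
import numpy as np

def pls1_vip(X, y, n_components):
    """VIP scores from a single-response PLS fitted by NIPALS (stand-in for VBPLS)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    Xr = X - X.mean(axis=0)          # centered, then deflated in the loop
    yr = y - y.mean()
    n_vars = X.shape[1]
    weights, explained = [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:             # nothing left in y to explain
            break
        w /= norm                    # unit-norm weight vector
        t = Xr @ w                   # scores
        tt = t @ t
        p = Xr.T @ t / tt            # X loadings
        q = (yr @ t) / tt            # y loading
        Xr = Xr - np.outer(t, p)     # deflate X
        yr = yr - q * t              # deflate y
        weights.append(w)
        explained.append(q ** 2 * tt)  # y variation explained by this component
    if not weights:
        raise ValueError("no PLS component could be extracted")
    W = np.array(weights)            # shape (components, variables)
    ss = np.array(explained)         # shape (components,)
    return np.sqrt(n_vars * (ss @ W ** 2) / ss.sum())

# Synthetic window of 96 samples: only variables 0 and 3 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(96, 5))
y = 2.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=96)

vip = pls1_vip(X, y, n_components=2)
vip_limit = 1.0                      # assumed threshold, not the paper's setting
selected = np.flatnonzero(vip >= vip_limit)
print(np.round(vip, 2), selected)
```

Because each weight vector has unit norm, the squared VIP scores sum to the number of variables, so a limit of 1.0 corresponds to the average importance. Repeating this selection in every window and counting how often each index appears would give a tally of the kind shown in Figure 9.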


to maintain an appropriate level of biomass, allowing the oxidation of the organic matter. The remaining sludge is purged. Many variables related to the organic matter and microorganisms are measured in the plant, yielding a large amount of information that is difficult to manage. This plant treats a wastewater flow of approximately 35 000 m3/d. 4.2.2. Scenario Definition. Typically, a wastewater treatment plant in a rural area is poorly instrumented. Therefore, manual measurement has been an important way to verify that the effluent of the WWTP is in good condition. Given the delay of such measurements, soft-sensors provide an alternative to address this issue. Prediction performance was assessed by comparing the proposed soft-sensor with three other models (PLS, VBPLS, and MV-VIP-VBPLS). A total of 527 days of daily data from the online sensors were recorded, each sample consisting of 38 process variables. Of these samples, 200 were used for training and another 200 were used to test the performance of the proposed soft-sensor. The objective variable y is the concentration of BOD in the effluent of the WWTP, and the explanatory variables X are 37 input variables, which are tabulated in Table S2 in the Supporting Information. 4.2.3. Results and Discussion. (1). Prediction Performance and Results. The proposed soft-sensor was tested for BOD prediction in this case study. The window width was initialized as in the previous case study (Table 1). Data for local modeling were selected through the window. Then, the data collected by each window were processed by differencing and VIP in turn. Finally, the most stationary and sensitive variables were obtained for subsequent modeling. In this study, VBPLS was used as the local model. Under normal conditions, the proposed soft-sensor predicted the variations with the best performance in terms of RMSE and r compared with PLS, VBPLS, and MV-VIP-VBPLS, as shown in Figure 12. 
It is worth noting that VBPLS performed better than PLS, suggesting that the prediction ability can indeed be improved by the Variational Bayesian method. Regrettably, MV-VIP-VBPLS did not yield the expected improvement. In Section 4.1, prediction using the proposed soft-sensor under wet weather conditions was considered. In such a scenario, the corresponding variations are reflected in the inputs of the soft-sensor and benefit its subsequent model building. However, there is still a possibility that such critical variations are not reflected in any input variables x, therefore leading to

Figure 10. Prediction profiles of BOD5 under different weather conditions with confidence intervals.

aeration tanks, and secondary settlers. After the primary settler, the wastewater is first treated in the bioreactor, where the level of substrate is reduced by the action of the microorganisms. The wastewater then flows to a secondary settler, where the biomass sludge settles. Thus, clean water remains at the top of the settler and is carried out of the plant. A fraction of the sludge is returned to the input of the bioreactor in order

Figure 11. Wastewater plant for validation.


Figure 12. Prediction profiles of BOD using four types of models.
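The two indices used to compare the models can be computed as in the sketch below. It assumes that r denotes the Pearson correlation coefficient between measured and predicted values; the function names and the sample numbers are hypothetical and do not reproduce any result reported here.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of r)."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Hypothetical measured and predicted BOD values.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(measured, predicted), pearson_r(measured, predicted))
```

A lower RMSE and an r closer to 1 both indicate a better fit, which is the sense in which the models are ranked in the text.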

Figure 13. Prediction profiles of BOD using VBPLS, MV-VIP-VBPLS, and the proposed model when suffering from abrupt changes and drifts.

unexpected predictions. Abrupt changes and drifts are two common types of process variation in wastewater treatment. The proposed soft-sensor was tested together with two other methods under abrupt changes and drifts in the

following section. Figure 13c and f clearly show that the proposed soft-sensor obtained the best fit for both abrupt changes and drift, with RMSE being 17.6 and 8.56, respectively. However, because offline information was used for


Moving Window to improve the prediction ability for subsequent soft-sensor model construction. The parameters for the four types of models were set as in Table 1, except that the number of components for PLS was 11, the maximum number of components for VBPLS was 36, and window_length was 80. The window_length of the Moving Window was chosen according to the seasonal variation of the data. The total number of times each input was selected is shown in Supporting Information Figure S1, suggesting the necessity of online input variable selection.
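The role of window_length in the Moving Window update can be sketched with a small index generator. This is a simplified illustration, assuming that the model is rebuilt before every new sample from the immediately preceding window_length samples; the function name moving_windows and the sample count of 400 are hypothetical.

```python
def moving_windows(n_samples, window_length):
    """Yield (start, stop) index pairs for a sliding training window.

    Samples start..stop-1 form the training window; sample `stop` is the
    point to be predicted once the local model has been rebuilt.
    """
    if window_length < 1:
        raise ValueError("window_length must be positive")
    for stop in range(window_length, n_samples):
        yield stop - window_length, stop

# window_length = 80 as in this case study; 400 samples is an assumed count.
windows = list(moving_windows(n_samples=400, window_length=80))
print(len(windows), windows[0], windows[-1])
```

Each pair would delimit the rows passed to differencing, VIP screening, and the local model for one update step.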

VBPLS modeling without considering online changes, VBPLS completely failed to track the variations [Figure 13a and d]. The MV-VIP-VBPLS model is capable of catching the trends of abrupt changes or drifts only after enough steps have been taken, as shown in Figure 13b and e. Therefore, significant delays occurred when the MV-VIP-VBPLS model was used for prediction. This mainly results from the fact that only relatively long-term information was included in the Moving Window, thereby obscuring the latest variations, which would otherwise enhance the soft-sensor model construction. (2). Discussion. (1) Results Discussion and Parameter Setting. In this case study, BOD at the discharge of the WWTP was used to assess the prediction performance of the proposed soft-sensor. Four types of predictive models (PLS, VBPLS, MV-VIP-VBPLS, and the proposed soft-sensor) were considered. VBPLS achieved better results than PLS because the algorithm involves the Variational Bayesian method for parameter identification and the automatic search for the optimum number of latent variables. However, offline training data, rather than online information, were used for model construction. The lack of feature variation information in the explanatory variables adds further complexity to accurate soft-sensor modeling in this case study. For both abrupt changes and drifts, VBPLS failed to catch the variations, even with automatic latent variable selection. Even when the Moving Window method was involved, the predictive model did not perform as well as expected. The Moving Window model could not capture the latest variations between x and y because the number of data points for model construction was set to 80. In our methodology, the time differencing method was included in the proposed soft-sensor. Therefore, not only long-term information in the window but also short-term information was taken into account and further used to correct the disruption caused by information delay. 
Also, the variable series can be stationarized

Figure 15. Prediction profiles of BOD under three scenarios (normal state, abrupt changes, and drift) with confidence intervals.

Figure 14. Comparisons of BOD series under normal state and the ones with time differencing under abrupt changes and drift.

(2). Uncertainty Description. To further validate the effectiveness of the uncertainty description, the variances of the predictions obtained from VBPLS were used to calculate the confidence intervals each time a new VBPLS model was built in a Moving Window. As revealed in Figure 15, when the process suffered from abrupt changes and drift, confidence intervals of the predictions could still be obtained, showing how reliable each prediction is. Similar to the previous case study, the results clearly show that the confidence intervals of the predictions from the proposed soft-sensor become narrow with the negative effect of dramatic changes in these processes.
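How a predictive variance translates into an interval can be sketched as follows. The sketch assumes a Gaussian predictive distribution and a 95% level (z = 1.96); neither the level nor the numbers below are taken from this study, and the function name confidence_interval is hypothetical.

```python
import numpy as np

def confidence_interval(mean, variance, z=1.96):
    """Return (lower, upper) bounds as mean -/+ z * sqrt(variance)."""
    mean = np.asarray(mean, dtype=float)
    std = np.sqrt(np.asarray(variance, dtype=float))
    return mean - z * std, mean + z * std

# Hypothetical predictive means and variances for three BOD samples.
pred_mean = np.array([20.0, 22.0, 35.0])
pred_var = np.array([1.0, 1.0, 9.0])

lower, upper = confidence_interval(pred_mean, pred_var)
width = upper - lower
print(np.round(lower, 2), np.round(upper, 2), np.round(width, 2))
```

The interval half-width scales with the square root of the variance, so the width of the band at each sample is a direct readout of how certain the model is about that prediction.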

by the time differencing. Following the case study in the previous section, the BOD series in one window, including both abrupt and drift variations, is taken as an example in Figure 14. Figure 14 shows that time differencing largely removed the abrupt changes and drift, ensuring that the variable was reasonably stationary even though a few transitional points remained unstable. It is nevertheless worth emphasizing that the values of the few points with rapid variations were reduced dramatically. This would facilitate the use of


(10) Kaneko, H.; Funatsu, K. Classification of the degradation of soft sensor models and discussion on adaptive models. AIChE J. 2013, 59, 2339.
(11) Kaneko, H.; Funatsu, K. Maintenance-free soft sensor models with time difference of process variables. Chemom. Intell. Lab. Syst. 2011, 107, 312.
(12) Ergon, R. Reduced PCR/PLSR models by subspace projections. Chemom. Intell. Lab. Syst. 2006, 81, 68.
(13) Zhang, J. Offset-free inferential feedback control of distillation compositions based on PCR and PLS models. Chem. Eng. Technol. 2006, 29, 560.
(14) Facco, P.; Doplicher, F.; Bezzo, F.; Barolo, M. Moving average PLS soft sensor for online product quality estimation in an industrial batch polymerization process. J. Process Control 2009, 19, 520.
(15) Andersen, C. M.; Bro, R. Variable selection in regression: A tutorial. J. Chemom. 2010, 24, 728.
(16) Nørgaard, L. S. A.; Wagner, J.; Nielsen, J. P.; Munck, L.; Engelsen, S. B. Interval partial least-squares regression (iPLS): A comparative chemometric study with an example from near-infrared spectroscopy. Appl. Spectrosc. 2000, 54, 413.
(17) Kaneko, H.; Funatsu, K. A new process variable and dynamics selection method based on a genetic algorithm-based wavelength selection method. AIChE J. 2012, 58, 1829.
(18) Riccardo, L.; Amparo, L. G. Genetic algorithms applied to feature selection in PLS regression: how and when to use them. Chemom. Intell. Lab. Syst. 1998, 41, 195.
(19) Kaneko, H.; Funatsu, K. Development of soft sensor models based on time difference of process variables with accounting for nonlinear relationship. Ind. Eng. Chem. Res. 2011, 50, 10643.
(20) Mehmood, T.; Liland, K. H.; Snipen, L.; Sæbø, S. A review of variable selection methods in Partial Least Squares Regression. Chemom. Intell. Lab. Syst. 2012, 118, 62.
(21) Vidaurre, D.; van Gerven, M. A. J.; Bielza, C.; Larrañaga, P.; Heskes, T. Bayesian sparse partial least squares. Neur. Comput. 2013, 25, 3318.
(22) Kaneko, H.; Funatsu, K. A soft sensor method based on values predicted from multiple intervals of time difference for improvement and estimation of prediction accuracy. Chemom. Intell. Lab. Syst. 2011, 109, 197.
(23) Box, G. E. P.; Jenkins, G. Time Series Analysis, Forecasting, and Control; Holden-Day, Inc.: San Francisco, 1990; p 500.
(24) Tufa, L. D.; Ramasamy, M. Closed-loop identification of systems with uncertain time delays using ARX−OBF structure. J. Process Control 2011, 21, 1148.
(25) Sun, Y.; Babovic, V.; Chan, E. S. Multi-step-ahead model error prediction using time-delay neural networks combined with chaos theory. J. Hydrol. 2010, 395, 109.
(26) Chen, J.; Ganigué, R.; Liu, Y.; Yuan, Z. Real-time multistep prediction of sewer flow for online chemical dosing control. J. Environ. Eng. 2014, 140, 0401.
(27) Brockwell, P. J.; Davis, R. A. Time Series: Theory and Methods; Springer Verlag: New York, 2009.
(28) Liu, Y.; Pan, Y.; Sun, Z.; Huang, D. Statistical monitoring of wastewater treatment plants using variational Bayesian PCA. Ind. Eng. Chem. Res. 2014, 53, 3272.
(29) Bishop, C. M. Pattern Recognition and Machine Learning, 1st ed.; Springer: New York, 2006.
(30) Liu, Y.; Huang, D.; Li, Y. Development of interval soft sensors using enhanced just-in-time learning and inductive confidence predictor. Ind. Eng. Chem. Res. 2011, 51, 3356.
(31) Bengio, Y. Learning deep architectures for AI. Found. Trends Mach. Learn. 2009, 2, 1.
(32) Blake, C. L.; Merz, C. J. UCI Repository of Machine Learning Databases; University of California: Irvine, CA, 1996; http://www.ics.uci.edu/~mlearn/MLRepository.html (accessed Feb. 18, 1998).
(33) Belanche, L.; Sànchez, M.; Cortés, U.; Serra, P. A knowledge-based system for the diagnosis of waste-water treatment plants. Ind. Eng. Appl. Artif. Intell. Exp. Syst. 1992, 604, 324.

5. CONCLUSIONS The developed adaptive soft-sensors were successfully used for prediction in highly and poorly instrumented WWTPs with different characteristics. Owing to proper coordination of the MW, TD, and VIP techniques, the proposed soft-sensor was able to track rapid changes of state and even to make better predictions when the process became stable, compared with PLS, VBPLS, and MV-VIP-VBPLS, in terms of RMSE and r. With VBPLS as the predictive model, the proposed soft-sensor generates both the prediction values and the credibility of information for hard-to-measure quantities. This study further demonstrated the potential of the proposed soft-sensor for handling extreme weather conditions and undetected changes of state in WWTPs, and even the possibility of extending it to batch processes with frequent operation mode changes.



ASSOCIATED CONTENT

Supporting Information

This material is available free of charge via the Internet at http://pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*Tel: +86 13610008387. Fax: +86 20 87114189. E-mail: [email protected].

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS This work was supported by the National Natural Science Foundation of China (61403142), the Specialized Research Fund for the Doctoral Program of Higher Education of China (20120172110026), and the Fundamental Research Funds for the Central Universities, SCUT (2014ZB0028, 2014ZZ0043). The authors also thank the reviewers of this paper.



REFERENCES

(1) Liu, Y.; Huang, D.; Li, Z. A SEVA soft sensor method based on self-calibration model and uncertainty description algorithm. Chemom. Intell. Lab. Syst. 2013, 126, 38.
(2) Kadlec, P.; Gabrys, B.; Strandt, S. Data-driven Soft Sensors in the process industry. Comput. Chem. Eng. 2009, 33, 795.
(3) Kano, M.; Nakagawa, Y. Data-based process monitoring, process control, and quality improvement: Recent developments and applications in steel industry. Comput. Chem. Eng. 2008, 32, 12.
(4) Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109.
(5) Salahshoor, K.; Komari Alaei, H. A new on-line predictive monitoring using an integrated approach adaptive filter and PCA. 4th International Workshop on Soft Computing Applications (SOFA); IEEE: New York, 2010.
(6) Durante, C.; Cocchi, M.; Grandi, M.; Marchetti, A.; Bro, R. Application of N-PLS to gas chromatographic and sensory data of traditional balsamic vinegars of modena. Chemom. Intell. Lab. Syst. 2006, 83, 54.
(7) Yan, W.; Shao, H.; Wang, X. Soft sensing modeling based on support vector machine and Bayesian model selection. Comput. Chem. Eng. 2004, 28, 1489.
(8) Dayal, B. S.; MacGregor, J. F. Recursive exponentially weighted PLS and its applications to adaptive control and prediction. J. Process Control 1997, 7, 169.
(9) Cheng, C.; Chiu, M.-S. Nonlinear process monitoring using JITL-PCA. Chemom. Intell. Lab. Syst. 2005, 76, 1.
