
Reservoir Computing with Sensitivity Analysis Input Scaling Regulation and Redundant Unit Pruning for Modeling Fed-Batch Bioprocesses

Heshan Wang and Xuefeng Yan*

Key Laboratory of Advanced Control and Optimization for Chemical Processes of Ministry of Education, East China University of Science and Technology, Shanghai 200237, People's Republic of China

ABSTRACT: Although reservoir computing (RC) is an effective approach to designing and training recurrent neural networks, the optimization of such systems involves a number of parameters that are set manually or by brute-force search, such as the network size, the input scaling parameters, and the spectral radius. To create an optimal echo state network (ESN), we propose a modified RC that combines sensitivity analysis input scaling regulation (SAISR) and a redundant unit pruning algorithm (RUPA). SAISR is first employed to obtain the optimal input scaling parameters: an ESN without tuning is established, and its input scaling parameters are then tuned based on the Sobol' sensitivity analysis. Second, RUPA is employed to prune redundant readout connections. A fed-batch penicillin cultivation process is chosen to demonstrate the applicability of the modified RC. The results show that the input scaling parameter has a stronger influence than the other ESN parameters and that SAISR-ESN outperforms the ESN without tuning. The RUPA method improves the generalization performance and reduces the size of the ESN. The prediction performance of RUPA-SAISR-ESN is compared with those of existing methods, and the results indicate the superiority of RUPA-SAISR-ESN in the fed-batch penicillin cultivation process.

Received: January 23, 2014. Revised: March 24, 2014. Accepted: March 27, 2014. Published: March 27, 2014.
Industrial & Engineering Chemistry Research (Article). dx.doi.org/10.1021/ie500296f | Ind. Eng. Chem. Res. 2014, 53, 6789−6797. © 2014 American Chemical Society.

1. INTRODUCTION

Fed-batch processes have been utilized to produce high value added products in the chemical, biological, food, pharmaceutical, and semiconductor industries. General features of fed-batch biological processes include strong nonlinearity, nonsteady-state operation, intrinsic time variation, batch-to-batch variation, and uncertainty caused by the drifting of raw materials.1 These features complicate modeling and control. An adequate model must be obtained before further optimization and control strategies can be developed. However, certain parameters, such as bioparameters, which are both complex and important, cannot be measured online during the actual fermentation process. Effective process predictive models, instead of hardware instruments or offline laboratory analysis, have therefore become essential for providing reliable online measurements of critical product quality or environmental variables.2 In the past few years, soft sensors have attracted increasing attention in both academia and industry because of their inferential measurement capability built on process data and predictive models. Facco et al.3 proposed a methodology for the automatic maintenance of partial least squares (PLS) soft sensors in fed-batch processing. Ji et al.4 proposed a semisupervised recursive weighted kernel regression (RWKR) method for modeling fed-batch processes that utilizes unlabeled data better than classical harmonic functions and is more robust than the relevance vector machine (RVM) for the soft-sensing modeling of fed-batch processes. Wang et al.5 presented a hybrid model that combines the advantages of the least squares support vector machine (LSSVM) and a kinetics model; experimental results showed that the hybrid modeling method has higher prediction accuracy and a more powerful generalization ability than the single LSSVM. Yu6 proposed a multiway Gaussian mixture model (MGMM) based adaptive kernel partial least squares (AKPLS) method to handle online quality prediction of the penicillin fed-batch fermentation process with multiple operating phases; the computational results demonstrated that the proposed MGMM-AKPLS approach is superior to the conventional kernel PLS model. Yu7 subsequently presented a two-stage support vector regression (SVR) method integrating a Bayesian inference (BI) strategy, and the results showed significant improvement of the BI-SVR approach over the regular SVR method in predicting measurements.

Recurrent neural networks (RNNs) have a dynamic memory and can process temporal context information because they internally preserve a nonlinear transformation of the input history. RNNs are highly promising tools for solving complex tasks involving temporal dependence, nonlinearity, time variation, and uncertainty,8 and can thus be used for complex dynamic tasks such as fed-batch processes. However, the application of RNNs to real-world problems is not always feasible because of high computational training costs, slow convergence, and vanishing gradients.9 Reservoir computing (RC)8,10 was then introduced as a more effective way to use the computational power of RNNs without the inconvenience of training the internal weights. From this perspective, the reservoir functions as a complex nonlinear dynamic filter that transforms the input signals through a high-dimensional temporal map, similar to the operation of an explicit, temporal kernel function. The striking feature of RC is that the recurrent part of the network can be left untrained after initialization as long as it satisfies certain easy-to-check properties. The RC training procedure is a simple adjustment of output weights to fit training data. Echo state networks (ESNs),11 liquid state machines (LSMs),12 the back-propagation decorrelation neural network,13 and Evolino14 are examples of popular RC methods. In this paper, we focus on the ESN approach, one of the simplest yet effective forms of RC. ESNs have been successfully applied in several sequential domains, such as time-series prediction,15,16 nonlinear system identification,17 speech processing,18 and mobile traffic forecasting.19 However, an underlying theoretical framework or a principled approach for designing a good ESN is lacking, and ESN has several drawbacks, including the number of parameters that must be set manually or by brute-force search when optimizing these systems, such as the network size, input scaling parameters, and spectral radius.20 Thus, one of the major benefits of RC (short training time compared with other RNN learning rules) is partly diminished by the extensive parameter searches necessary to obtain optimal reservoir settings. Over the past few decades, numerous researchers have sought to optimize the reservoir dynamics by experimentally studying the relationship between reservoir parameters and reservoir performance,10,21,22 as well as by pruning the network.23,24 Verstraeten et al.10,21 presented a series of experimental results and gave an overview of the influence of the reservoir parameters (node type, network size, spectral radius, input scaling) on performance across a variety of benchmark tests. However, these are greedy methods that require many experiments. Venayagamoorthy and Shishir22 studied the effects of varying two important ESN parameters, the spectral radius and the settling time, on ESN performance.

However, a principled method for choosing the reservoir parameters has still not been established. This study introduces a sensitivity analysis input scaling regulation (SAISR), which tunes the input scaling parameter based on the Sobol' sensitivity analysis (SSA), together with a redundant unit pruning algorithm (RUPA), to design an improved ESN model for the penicillin fed-batch process. The results exhibit the superiority of the proposed methods in the fed-batch penicillin cultivation process. The rest of this paper is organized as follows. Section 2 provides a brief overview of the ESN design, the ESN training algorithm, and SSA. The SAISR and RUPA methods for designing and improving the ESN are described in section 3. A fed-batch penicillin fermentation process is introduced in section 4. Experimental results and discussion are presented in section 5. Finally, a brief conclusion is provided in section 6.

2. ESN AND SOBOL' SENSITIVITY ANALYSIS

2.1. Architecture and Training Algorithm of ESN. An ESN is a recurrent discrete-time neural network with K input units, N internal (reservoir) units, and L output units, as shown in Figure S1 (see the Supporting Information). The reservoir state s(t) and output o(t) at discrete time step t are described by25

s(t) = f(W_u^s u(t) + W_s^s s(t-1) + W_o^s o^T(t-1))    (1)

o(t) = s^T(t) W_s^o    (2)

where f is the reservoir activation function (typically a hyperbolic tangent or other sigmoidal function); u(t), s(t), and o(t) are, respectively, the input, reservoir state, and output at discrete time step t; W_u^s, W_s^s, W_o^s, and W_s^o denote, respectively, the input, reservoir, feedback, and readout matrices; W_u^s is an N × K weight matrix, W_s^s is an N × N weight matrix, and W_o^s and W_s^o are N × L weight matrices; s(t) is initialized as a zero vector; and "T" denotes the transpose. W_s^s, W_o^s, and W_u^s are fixed prior to training with random values drawn from a uniform distribution. Only the readout matrix W_s^o requires training. To ensure the echo state property, the reservoir connection matrix W_s^s is typically scaled as

W_s^s ← α W_s^s / |λ_max|    (3)

where λ_max is the spectral radius of W_s^s and 0 < α < 1 is the scaling parameter. Generally, the input variable u(t) is scaled by β = [β_1 β_2 ... β_p] to increase the performance of the ESN:

[u_1(t) u_2(t) ... u_p(t)] ← [β_1 u_1(t) β_2 u_2(t) ... β_p u_p(t)]

A large absolute β implies that the network is strongly driven by its input, and vice versa. During training, the reservoir states obtained are collected in the state matrix S:

S = [s^T(1); s^T(2); ...; s^T(n)]    (4)

and the corresponding target outputs are collected in the target output matrix O:

O = [o(1); o(2); ...; o(n)]    (5)

where n is the number of training samples. The readout matrix should then solve a linear regression problem,

S W_s^o = O    (6)

The traditional method uses the least-squares solution

W_s^o = arg min_w ||S w - O||^2

where ||·|| denotes the Euclidean norm, so that the readout matrix W_s^o is expressed as

W_s^o = (S^T S)^{-1} S^T O    (7)

Details of the ESN architecture and training algorithm are described in the Supporting Information.

2.2. Sobol' Sensitivity Analysis. Sensitivity analysis (SA) is the study of how uncertainty in the output of a model (numerical or otherwise) can be attributed to different sources of uncertainty in the model input factors. The SSA method26 is a global and model-independent SA method based on variance decomposition that can handle nonlinear and nonmonotonic functions and models. Consider the model function

O = f(u) = f(u_1, u_2, ..., u_p)    (8)


where O is the output and u = (u_1, u_2, ..., u_p) contains p independent input factors, each varying over its own probability density function. Sobol'26 introduced the first-order sensitivity index by decomposing the model function f into summands of increasing dimensionality:

f(u_1, u_2, ..., u_p) = f_0 + \sum_{i=1}^{p} f_i(u_i) + \sum_{i=1}^{p-1} \sum_{j=i+1}^{p} f_{ij}(u_i, u_j) + ... + f_{1,...,p}(u_1, ..., u_p)    (9)

The input factors are independent, and each term in eq 9 is chosen with zero average and is square integrable. f_0 is a constant equal to the expectation of the output, E(O), and the integral of every summand over any of its own variables is zero, i.e.,

\int_0^1 f_{i_1,...,i_s}(u_{i_1}, u_{i_2}, ..., u_{i_s}) du_{i_k} = 0,  1 ≤ k ≤ s    (10)

Consequently, all of the summands are mutually orthogonal. The total unconditional variance can be defined as

V(O) = \int_{Ω^p} f^2(u) du - f_0^2    (11)

with Ω^p representing the p-dimensional unit hyperspace (i.e., parameter ranges are scaled between 0 and 1). The partial variances are computed from each of the terms in eq 9:

V_{i_1,...,i_s} = \int_0^1 ... \int_0^1 f_{i_1,...,i_s}^2(u_{i_1}, ..., u_{i_s}) du_{i_1} ... du_{i_s}    (12)

Because the summands are mutually orthogonal, the variance decomposition can be obtained as

V(O) = \sum_{i=1}^{p} V_i + \sum_{i=1}^{p-1} \sum_{j=i+1}^{p} V_{ij} + ... + V_{1,...,p}    (13)

Consequently, the contributions of individual parameters and of parameter interactions to the total output variance can be determined. These contributions are characterized by the ratio of the partial variance to the total variance, namely, the Sobol' sensitivity indices (SI):

first-order SI: S_i = V_i / V    (14)

second-order SI: S_{ij} = V_{ij} / V    (15)

total SI: S_{Ti} = S_i + \sum_{j≠i} S_{ij} + ... = 1 - V_{~i} / V    (16)

where V_{~i} is the variance contribution of all parameters except u_i. For additive models, under the assumption of orthogonal input factors, S_{Ti} and S_i are equal and the sum of all S_i is 1. For nonadditive models, S_{Ti} is greater than S_i, the sum of all S_i is less than 1, and the sum of all S_{Ti} is greater than 1.

3. OPTIMAL DESIGN OF ESN

Assume that the three-way data matrix of process variables is X(I × J × K), where I is the number of batches, J is the number of variables, and K is the number of observations within each batch. The three-way matrices of input and output variables are unfolded into two-dimensional matrices by stacking data from different sampling instants across batches along each variable column, as shown in Figure S2 in the Supporting Information, and normalized to [−1, 1].

3.1. Sensitivity Analysis Input Scaling Regulation Algorithm. In this study, we investigate the effect of two parameters, namely, the network size (N) and the input scaling parameter (β), on ESN performance in the penicillin fed-batch process. β can be tuned by the SAISR method, and N can be reduced by the RUPA method. The main objective of SAISR is to design an improved ESN model for the penicillin fed-batch process by tuning the input scaling parameter based on SSA. The main steps of SAISR are as follows:

1. An ESN is established with internal connection weights W_s^s set to random values. The internal units are standard sigmoid units with transfer function f = tanh. The input scaling parameter is the default unit vector

β = [β_1 β_2 ... β_p] = [1 1 ... 1]

The input weights W_u^s are generated randomly from a uniform distribution over the interval [−1, 1]. The feedback weights W_o^s are generated randomly from a uniform distribution over the interval [−1, 1] when feedback connections are present.

2. The ESN is driven by the variable-wise unfolded and normalized training set. The network starts at time t = 1 with zero starting state s(t) = 0. Data from the initial washout period are dismissed, and the remaining network states s(t) are gathered into the matrix S. Meanwhile, the target values from the training set are collected in the matrix O. The output unit weights are then calculated using eq 7.

3. The total Sobol' sensitivity indices S_T = [S_{T1} S_{T2} ... S_{Tp}] of the input variables are calculated using Sobol' quasi-random sequences (eq 16).27 The inputs are then sorted by decreasing sensitivity index for efficient tuning of the input scaling parameters.

4. β is adjusted based on S_T. SA can determine the most influential inputs of a model: the input scaling parameter of an important input variable should be increased, and vice versa. In this study, we increase the scaling parameter β_m of the most significant input to 2. Every other input scaling parameter β_i is adjusted proportionally, with β_i/β_m = S_{Ti}/S_{Tm}. Thus,

β_i = β_m S_{Ti}/S_{Tm} = 2 S_{Ti}/S_{Tm}    (17)

The new input scaling parameter β̂ is thereby obtained.

5. A new ESN model with input scaling parameter β̂ is established and trained on the same training data as in step 2. The new SAISR-ESN model, which will subsequently be optimized and simplified by RUPA, is established.

3.2. RC Redundant Unit Pruning Algorithm. The internal layer of an ESN is sparsely connected. Thus, a seeming contradiction appears in that each output node is connected to all internal nodes.28 For ESNs, whose reservoirs are considerably large (usually N = 100−1000), pruning is an effective method to reduce the generalization error.


Pruning also simplifies the network complexity. The idea presented in this paper is the exclusion of irrelevant states in S from the readout (i.e., setting the corresponding connection weights in W_s^o to zero). The remaining connection weights are then computed by regression. The states of an internal node can be regarded as values of a stochastic variable, so selecting the nonzero output connection weights can be treated as a variable selection problem, more often known in the literature as feature selection.29 We propose a novel RUPA method in which a large reservoir is used first and its size is then reduced by pruning the readout connections. Pruning decreases the cost of computing the output and identifies redundant internal units. The main steps of RUPA are as follows:

1. An ESN is established by the SAISR method, and the SAISR-ESN is run on the training data.

2. The network starts at time t = 1 with zero starting state s(t) = 0. Data from the initial washout period are dismissed, and the remaining network states s(t) are gathered into the matrix S. Meanwhile, the target values from the training set are collected in the matrix O. The output unit weights are then calculated by the pseudoinverse method (eq 7), and the unpruned performance error RMSE_U is obtained.

3. The correlation coefficient between each pair of reservoir unit states is calculated, forming the correlation matrix R of the internal units. Based on the internal unit output matrix S, the correlation coefficient r_nm (0 ≤ |r_nm| ≤ 1) between the state vectors s(n) and s(m) of the nth and mth internal units is calculated as

r_nm = \sum_{k=1}^{h} (s_{nk} - s̄_n)(s_{mk} - s̄_m) / \sqrt{ \sum_{k=1}^{h} (s_{nk} - s̄_n)^2 \sum_{k=1}^{h} (s_{mk} - s̄_m)^2 }    (18)

s̄_n = (1/h) \sum_{k=1}^{h} s_{nk}    (19)

where h is the number of collected states and s̄_n and s̄_m are the mean values of the states of the nth and mth internal units, respectively. With l denoting the number of removed connections, the correlation matrices are

R_{h-l} =
[ 1          r_{12}      ...  r_{1,h-l} ]
[ r_{21}     1           ...  r_{2,h-l} ]
[ ...        ...         ...  ...       ]
[ r_{h-l,1}  r_{h-l,2}   ...  1         ]

and

R0_{h-l} =
[ 0          r_{12}      ...  r_{1,h-l} ]
[ r_{21}     0           ...  r_{2,h-l} ]
[ ...        ...         ...  ...       ]
[ r_{h-l,1}  r_{h-l,2}   ...  0         ]

The latter can be written as R0_{h-l} = [r_1 r_2 ... r_{h-l}] (0 ≤ l ≤ h − l), where r_i collects the correlation coefficients of the ith neuron with the others.

4. The two most correlated internal neurons, the nth and the mth, are selected according to step 3; these two neurons exhibit a redundant relation.

4.1. The nth readout connection is removed while the mth is retained. The output unit weights are recomputed by the pseudoinverse method, and the training error NMSE_N is calculated.

4.2. The mth readout connection is removed while the nth is retained. The output unit weights are recomputed by the pseudoinverse method, and the training error NMSE_M is calculated.

4.3. NMSE_N and NMSE_M are compared. If NMSE_N < NMSE_M, the pruning of step 4.1 is adopted and the pruned error NMSE = NMSE_N is collected; otherwise, the pruning of step 4.2 is adopted and NMSE = NMSE_M is collected.

5. We return to step 3 and calculate the correlation coefficients between the remaining reservoir unit states.

6. The procedure continues until the remaining reservoir size equals the minimum reservoir size. In this study, the minimum reservoir size is N = 50.

7. The best SAISR-ESN structure is chosen as the final RUPA-SAISR-ESN model. Considering both the simplification and the precision of SAISR-ESN, we can either select the simplest architecture whose error does not exceed the initial error or select the reservoir size with the minimum testing error.

The flow diagram of the ESN design by SAISR and RUPA is shown in Figure 1.

Figure 1. Flow diagram of the ESN design by the SAISR and RUPA approach.
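To make eqs 1−7 concrete before moving to the case study, here is a minimal numpy sketch of an ESN without output feedback. The toy signal, the dimensions, and α = 0.9 are illustrative assumptions, not the settings used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions: K inputs, N reservoir units, L = 1 output (illustrative sizes).
K, N = 3, 100

# Fixed random weights (eq 1) and spectral-radius rescaling (eq 3).
Wsu = rng.uniform(-1, 1, (N, K))
Wss = rng.uniform(-1, 1, (N, N))
alpha = 0.9
Wss *= alpha / np.max(np.abs(np.linalg.eigvals(Wss)))

def run_reservoir(U, beta):
    """Drive the reservoir with the input sequence U (T x K) scaled by beta."""
    s = np.zeros(N)
    states = []
    for u in U * beta:                      # input scaling by beta
        s = np.tanh(Wsu @ u + Wss @ s)      # eq 1, without output feedback
        states.append(s)
    return np.array(states)                 # state matrix S (eq 4)

# Toy task: predict a nonlinear function of current and lagged inputs.
T, washout = 1000, 100
U = rng.uniform(-0.5, 0.5, (T, K))
O = np.tanh(U[:, :1] + 0.5 * np.roll(U[:, 1:2], 1, axis=0))  # targets (T x 1)

beta = np.ones(K)                           # default unit scaling
S = run_reservoir(U, beta)[washout:]        # discard the washout states
Otr = O[washout:]
Wos = np.linalg.pinv(S) @ Otr               # readout training, eq 7
pred = S @ Wos                              # eqs 2 and 6 in batch form
rmse = np.sqrt(np.mean((Otr - pred) ** 2))
```

Only `Wos` is fitted; the random `Wsu` and `Wss` stay untouched, which is the defining shortcut of RC.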

4. FED-BATCH PENICILLIN CULTIVATION PROCESS

The fed-batch fermentation process is a complex bioprocess of microorganism growth, propagation, and metabolism. In this study, a modular simulator (PenSim v2.0) developed by the monitoring and control group of the Illinois Institute of Technology (provided at http://mypages.iit.edu/∼cinar) is used to investigate the fed-batch penicillin fermentation process. A flowchart of the penicillin fermentation process is presented in Figure S3 (Supporting Information). In this process, four key concentration variables related to product quality are chosen as outputs of the predictive model, while the other nine measured variables, listed in Table S1 (Supporting Information), are used as inputs (as described in ref 7). The input and output variables are denoted u(1)−u(9) and o(1)−o(4), respectively. The initial operating conditions of the fed-batch fermentation process are summarized in Table S2 (Supporting Information). A total of 30 training batches are collected under nominal operation for ESN model training, and an additional 10 batches are obtained to test model accuracy and reliability. The sampling period of the fed-batch penicillin fermentation process is 0.5 h, and trend plots of the input variables are shown in Figure S4 (Supporting Information). The 30 training batches and 10 testing batches are unfolded by variable-wise unfolding and normalized to [−1, 1].

Four ESN models are established, one for each output variable. We present results for three reservoir sizes, N = [100:200:500] (the notation [s:d:e] denotes a series of numbers starting from s and increased by increments of d until the ceiling e is reached), and nine spectral radii, λ_max = [0.1:0.1:0.9]. Because the reservoir construction is stochastic, each test set performance measure for ESN is calculated over 10 simulation runs and reported as a mean. The input scaling parameter β, initially the default unit vector

β = [β_1 β_2 ... β_9] = [1 1 ... 1]

is tuned by the SAISR method. Model performance is evaluated by the root-mean-square error (RMSE):

RMSE = \sqrt{ \sum_{t=1}^{N} (o(t) - ô(t))^2 / N }    (20)

where ô(t) is the actual readout output, o(t) is the desired output, and N is here the total number of samples o(t).

Figure 2. Influence of reservoir size and spectral radius on the performance of the fed-batch penicillin fermentation process, with β = [1 1 ... 1] as default.

Table 1. Total SI of the Four ESN Models

input   o(1) S_Ti   o(2) S_Ti   o(3) S_Ti   o(4) S_Ti
u(1)    0.0288      0.0201      0.0258      0.0063
u(2)    0.0285      0.0330      0.1427      0.0459
u(3)    0.0453      0.0218      0.0217      0.0527
u(4)    0.0224      0.0212      0.0275      0.0693
u(5)    0.0199      0.0123      0.0163      0.0830
u(6)    0.0794      0.3021      0.0263      0.1172
u(7)    0.0490      0.0183      0.0095      0.0411
u(8)    0.8160      0.5938      0.7022      0.7111
u(9)    0.0204      0.0574      0.1223      0.3901
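Step 3 of SAISR estimates the total indices S_Ti (eq 16) numerically; the paper uses Sobol' quasi-random sequences,27 but the estimator itself can be sketched with plain Monte Carlo sampling and a Jansen-type formula. The test function below is an illustrative assumption, chosen so that the exact indices are known:

```python
import numpy as np

def total_sobol_indices(f, p, n=4096, rng=None):
    """Estimate total Sobol' indices S_Ti of f over the unit hypercube.

    Uses two independent sample matrices A and B and the Jansen estimator:
    S_Ti ~ mean((f(A) - f(A_Bi))^2) / (2 * Var(f)), where A_Bi is A with
    column i replaced by the corresponding column of B. Plain Monte Carlo
    stands in for the quasi-random sequences used in the paper.
    """
    rng = rng or np.random.default_rng(0)
    A = rng.uniform(0.0, 1.0, (n, p))
    B = rng.uniform(0.0, 1.0, (n, p))
    fA = f(A)
    var = np.var(np.concatenate([fA, f(B)]))
    st = np.empty(p)
    for i in range(p):
        ABi = A.copy()
        ABi[:, i] = B[:, i]          # resample only factor i
        st[i] = np.mean((fA - f(ABi)) ** 2) / (2.0 * var)
    return st

# Additive test model: O = u1 + 2*u3, so V = (1 + 4)/12 and the exact
# total indices are [0.2, 0.0, 0.8].
st = total_sobol_indices(lambda u: u[:, 0] + 2.0 * u[:, 2], p=3)
```

For this additive model the total and first-order indices coincide, matching the remark after eq 16; for the ESN models of Table 1 the sum of the S_Ti exceeds 1 because of interactions.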

5. RESULTS AND DISCUSSION

5.1. Comparison of Model Prediction Results of the ESN and SAISR-ESN Approaches. First, we investigate the effect of the two parameters N and λ_max on ESN performance in the penicillin fed-batch process. The influence of the reservoir size and the spectral radius on the performance of the fed-batch penicillin fermentation process is presented in Figure 2.

Figure 3. Diagram of the total SI for the four ESN models.

Figure 4. Testing performance with N = 500 and λ_max = 0.9 by SAISR for one batch of testing data.

According to Figure 2, the dissolved oxygen concentration (o(2)) is predicted better than the three other outputs, and the penicillin concentration (o(4)) is predicted poorly under the default scaling.

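The proportional rescaling of eq 17 can be checked numerically against the total SI values in Table 1; for the substrate model o(1) it reproduces the reported β̂1. This is a small verification sketch, not part of the original paper:

```python
import numpy as np

# Total Sobol' indices for the substrate model o(1), taken from Table 1.
st = np.array([0.0288, 0.0285, 0.0453, 0.0224, 0.0199,
               0.0794, 0.0490, 0.8160, 0.0204])

# Eq 17: scale the most influential input (here u(8)) to 2 and the
# remaining inputs in proportion to their total indices.
beta_hat = 2.0 * st / st.max()

# Rounded to two decimals this matches the reported vector beta_hat_1.
print(np.round(beta_hat, 2))
```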

With β = [1 1 ... 1], N = 500 and λ_max = 0.9 are sufficient to establish an ESN model for the fed-batch penicillin fermentation process, according to Figure 2. The total SI of the four models are presented in Table 1 and Figure 3. The new input scaling parameters β̂1, β̂2, β̂3, and β̂4 (β̂x denotes the input scaling parameter of the xth model) are obtained according to eq 17 and Table 1:

β̂1 = [0.07 0.07 0.11 0.05 0.05 0.19 0.12 2.00 0.05]
β̂2 = [0.07 0.11 0.07 0.07 0.04 1.02 0.06 2.00 0.19]
β̂3 = [0.07 0.41 0.06 0.08 0.05 0.07 0.03 2.00 0.35]
β̂4 = [0.02 0.13 0.15 0.19 0.23 0.33 0.12 2.00 1.10]

Four new ESN models (SAISR-ESN) are then established with the new input scaling parameters β̂1−β̂4. The testing performance with N = 500 and λ_max = 0.9 by SAISR for one batch of testing data is presented in Figure 4, and the testing RMSE for the 10 batches of testing data is shown in Table 2.

Table 2. Comparison of RMSE Values of Quality Variable Predictions Using a Single ESN without β Tuning and SAISR-ESN

             substrate concn         dissolved oxygen concn   biomass concn           penicillin concn
batch no.    ESN (β)   SAISR (β̂1)   ESN (β)   SAISR (β̂2)    ESN (β)   SAISR (β̂3)   ESN (β)   SAISR (β̂4)
1            0.0792    0.0044        0.0540    0.0470         0.1959    0.0172         1.0806    0.0329
2            0.0554    0.0022        0.0554    0.0436         0.0466    0.0264         0.3212    0.0231
3            0.1018    0.0015        0.0544    0.0431         0.3225    0.0144         0.7879    0.0510
4            0.2253    0.0021        0.0941    0.0487         0.2020    0.0445         1.0320    0.0551
5            0.2203    0.0048        0.1237    0.0572         0.1597    0.0510         1.1241    0.0304
6            0.5106    0.0039        0.0874    0.0480         0.3008    0.0315         0.4166    0.0294
7            0.2732    0.0046        0.0852    0.0593         0.2471    0.0493         0.4174    0.0493
8            0.0693    0.0028        0.0686    0.0484         0.1499    0.0456         0.7658    0.0348
9            0.1173    0.0057        0.0509    0.0496         0.2924    0.0170         1.0540    0.0375
10           0.0388    0.0020        0.0582    0.0513         0.0626    0.0560         1.0683    0.0434
av           0.1691    0.0034        0.0732    0.0496         0.1979    0.0353         0.8068    0.0387

Figure 5. Testing performance of one batch of data utilizing SAISR-ESN based on RUPA when the spectral radius is 0.9 and the network size is 500.

5.2. Improvement and Simplification of SAISR-ESN with RUPA, and Comparison of Model Prediction Results of RUPA-SAISR-ESN and Other Methods. Although the SAISR method establishes an ESN model with good performance, the network size can be reduced and simplified by RUPA, and the generalization error can be lowered by pruning. The initial reservoir size of the ESN for the fed-batch penicillin cultivation task is N = 500, as described previously. The readout connections are pruned individually according to the correlation coefficient until N equals 50.
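The pruning loop applied here (steps 3−6 of RUPA: find the most correlated pair of remaining units, drop whichever readout removal hurts the training error less, and repeat down to the minimum size) can be sketched as follows. The data and sizes are illustrative assumptions, far smaller than the N = 500 reservoir used in the study:

```python
import numpy as np

def rupa_prune(S, O, n_min):
    """Correlation-based readout pruning (RUPA sketch).

    S: (samples x N) reservoir state matrix; O: (samples x L) targets.
    Returns the indices of the retained reservoir units.
    """
    keep = list(range(S.shape[1]))

    def train_err(idx):
        W = np.linalg.pinv(S[:, idx]) @ O          # eq 7 on retained units
        return np.sqrt(np.mean((S[:, idx] @ W - O) ** 2))

    while len(keep) > n_min:
        # Absolute correlation matrix with zeroed diagonal (R0 of step 3).
        R = np.abs(np.corrcoef(S[:, keep], rowvar=False))
        np.fill_diagonal(R, 0.0)
        n, m = np.unravel_index(np.argmax(R), R.shape)  # most correlated pair
        drop_n = [u for j, u in enumerate(keep) if j != n]
        drop_m = [u for j, u in enumerate(keep) if j != m]
        # Keep whichever removal gives the smaller training error (step 4.3).
        keep = drop_n if train_err(drop_n) <= train_err(drop_m) else drop_m
    return keep

# Illustrative state matrix: 20 base units plus 10 near-duplicates,
# pruned down to a minimum size of 10.
rng = np.random.default_rng(1)
S = rng.normal(size=(200, 20))
S = np.hstack([S, S[:, :10] + 0.01 * rng.normal(size=(200, 10))])
O = rng.normal(size=(200, 1))
kept = rupa_prune(S, O, n_min=10)
```

Recomputing the pseudoinverse for both candidate removals at every step is the costly part; the near-duplicate columns are the redundant units the correlation criterion is meant to find.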

simplify the ESN in fed-batch bioprocesses. Unlike the bruteforce searching methods, the proposed SAISR is a principled approach to a good ESN design. The proposed SAISR approach is applied to the fed-batch penicillin cultivation process, and its performance is compared to that of the ESN without tuning. The results indicate that the generated heat is more significant than other input variables in the ESN models, and the input scaling parameter has a stronger influence than other ESN parameters for fed-batch bioprocesses. RUPA allows significant reduction of the ESN size. The comparison results demonstrate the superiority of RUPA-SAISR-ESN in the fed-batch penicillin cultivation process. Although RC has been criticized for its black-box nature, the results of this study can enable ESN users and designers to make more informed adjustments and determine optimal settings faster. The fed-batch penicillin fermentation process is recognized as a dynamic modeling test object. In the past few years, a lot of methods (PLS, RWKR, RVM, LSSVM, MGMM-AKPLS, etc.) have been proposed to model the fed-batch penicillin process. The prediction performance of the proposed method is compared with those of the existing methods, and the results indicate the superiority of RUPA-SAISR-ESN in the fed-batch penicillin cultivation process. Therefore, the proposed method is a better choice for modeling the fed-batch penicillin process. Also, the proposed method is suitable for the processes which have the same features as the fed-batch penicillin process, and their performances should be similar to the performance in the fed-batch penicillin process. On the condition that there is a significant difference between the modeling object and the fedbatch penicillin process, the performance of the object’s model developed by the proposed method needs to be analyzed further.

minimum correlation coefficients are 0.9727 and 0.7426, respectively, in the pruning process. The testing performance of one batch of data that employ SAISR-ESN with RUPA when the spectral radius is 0.9 is shown in Figure 5. To validate the performance of the proposed method, several methods, such as the relevance vector machine (RVM),4 RWKR,4 BI-SVR,7 kernel partial least squares (KPLS),6 and MGMM-AKPLS,6 are implemented for the comparison. The comparison of mean RMSE values of quality variable predictions by ESN with RUPA-SAISR and other methods is presented in Table 3. Table 3. Comparison of Mean RMSE Values of Quality Variable Predictions Using ESN with RUPA-SAISR and Other Methods substrate concn RVM RWKR KPLS MGMMAKPLS BI-SVR RUPA-SAISRESN

dissolved oxygen concn

biomass concn

penicillin concn

− − 0.5410 0.0730

− − − −

0.1927 0.1998 0.8970 0.0920

0.0428 0.0425 0.1950 0.0380

0.0850 0.0027

0.0030 0.0467

0.0710 0.0219

0.0110 0.0352

5.3. Discussion. The experimental results indicate that generated heat (u(8)) is more important than the other input variables in the ESN models. Aeration rate (u(1)), substrate feed rate (u(3)), substrate feed temperature (u(4)), CO2 concentration (u(5)), and fermenter temperature (u(7)) appear to be less insignificant in the ESN models, as shown in Figure 3. The input scaling parameter has a more important influence than network size and spectral radius in ESN, as presented in Figures 2 and 4. According to Figure 4, the SAISR-ESN method significantly outperforms ESN without input scaling tuning, particularly in the penicillin concentration (o(4)) model. The experimental results clearly show that RUPA allows significant reduction in the reservoir size, while the performance is improved until the reservoir size reaches a certain value. Even if the network size is reduced to 90−200, the performance remains not worse than for N = 500, as shown in Figure 5. The substrate concentration and biomass concentration prediction performance of RUPA-SAISR-ESN significantly outperforms that of other methods, according to Table 3. The dissolved oxygen concentration prediction performance of RUPA-SAISRESN is worse than that of BI-SVR. Because the BI procedure is designed to identify measurement biases and misalignments via posterior probabilities, the significant model deviations resulted from the measurement uncertainty can be avoided through BI. However, the proposed method may suffer from misalignments. The penicillin concentration prediction performance of RUPASAISR-ESN is better than that of the other methods except BISVR, according to Table 3. Although the penicillin concentration prediction of the proposed method is slightly worse than that of BI-SVR, the RMSEs of BI-SVR and the proposed method are of the same order of magnitude. We believe the reason for the deviation is the randomness of data when we run the modular simulator.



ASSOCIATED CONTENT

Supporting Information

Detailed introduction of ESN and Sobol’ sensitivity analysis, experimental data unfolding, and fed-batch penicillin cultivation process; Figures S1−S4; Tables S1 and S2. This material is available free of charge via the Internet at http://pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*Tel.: +86-21-64251036. Fax: +86-21-64251036. E-mail: [email protected].

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

The authors gratefully acknowledge the support of the following foundations: the 973 Project of China (2013CB733605), the National Natural Science Foundation of China (21176073), and the Fundamental Research Funds for the Central Universities.



REFERENCES

(1) Ashoori, A.; Moshiri, B.; Khaki-Sedigh, A.; Bakhtiari, M. R. Optimal control of a nonlinear fed-batch fermentation process using model predictive approach. J. Process Control 2009, 19, 1162−1173.
(2) Yu, J. Nonlinear bioprocess monitoring using multiway kernel localized Fisher discriminant analysis. Ind. Eng. Chem. Res. 2011, 50, 3390−3402.
(3) Facco, P.; Bezzo, F.; Barolo, M. Nearest-neighbor method for the automatic maintenance of multivariate statistical soft sensors in batch processing. Ind. Eng. Chem. Res. 2010, 49, 2336−2347.

6. CONCLUSION

In this study, a novel input scaling regulation method based on SSA and RUPA is developed to design an improved ESN and

dx.doi.org/10.1021/ie500296f | Ind. Eng. Chem. Res. 2014, 53, 6789−6797


(4) Ji, J.; Wang, H.; Chen, K.; Liu, Y.; Zhang, N.; Yan, J. Recursive weighted kernel regression for semi-supervised soft-sensing modeling of fed-batch processes. J. Taiwan Inst. Chem. Eng. 2012, 43, 67−76.
(5) Wang, X.; Chen, J.; Liu, C.; Pan, F. Hybrid modeling of penicillin fermentation process based on least square support vector machine. Chem. Eng. Res. Des. 2010, 88, 415−420.
(6) Yu, J. Multiway Gaussian mixture model based adaptive kernel partial least squares regression method for soft sensor estimation and reliable quality prediction of nonlinear multiphase batch processes. Ind. Eng. Chem. Res. 2012, 51, 13227−13237.
(7) Yu, J. A Bayesian inference based two-stage support vector regression framework for soft sensor development in batch bioprocesses. Comput. Chem. Eng. 2012, 41, 134−144.
(8) Lukoševičius, M.; Jaeger, H. Survey: Reservoir computing approaches to recurrent neural network training. Comput. Sci. Rev. 2009, 3, 127−149.
(9) Schrauwen, B.; Verstraeten, D.; Van Campenhout, J. An overview of reservoir computing: theory, applications and implementations. In Proceedings of the 15th European Symposium on Artificial Neural Networks; Ghent University Academic Bibliography: Ghent, Belgium, 2007; pp 471−482.
(10) Verstraeten, D.; Schrauwen, B.; d'Haene, M.; Stroobandt, D. An experimental unification of reservoir computing methods. Neural Networks 2007, 20, 391−403.
(11) Jaeger, H. The "Echo State" Approach to Analysing and Training Recurrent Neural Networks; GMD Technical Report 148; German National Research Center for Information Technology, 2001.
(12) Maass, W.; Natschläger, T.; Markram, H. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Comput. 2002, 14, 2531−2560.
(13) Steil, J. J. Backpropagation-decorrelation: Online recurrent learning with O(N) complexity. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks; Institute of Electrical and Electronics Engineers: New York, 2004; pp 843−848.
(14) Schmidhuber, J.; Wierstra, D.; Gagliolo, M.; Gomez, F. Training recurrent networks by Evolino. Neural Comput. 2007, 19, 757−779.
(15) Jaeger, H.; Haas, H. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 2004, 304, 78−80.
(16) Song, Y.; Li, Y.; Wang, Q.; Li, C. Multi-steps prediction of chaotic time series based on echo state network. In Fifth International Conference on Bio-Inspired Computing: Theories and Applications; Institute of Electrical and Electronics Engineers: New York, 2010; pp 669−672.
(17) Jaeger, H. Adaptive nonlinear system identification with echo state networks. In Advances in Neural Information Processing Systems; Becker, S., Thrun, S., Obermayer, K., Eds.; Neural Information Processing Systems Foundation: La Jolla, CA, 2002; Vol. 15, pp 593−600.
(18) Skowronski, M. D.; Harris, J. G. Minimum mean squared error time series classification using an echo state network prediction model. In Proceedings of the 2006 IEEE International Symposium on Circuits and Systems; Institute of Electrical and Electronics Engineers: New York, 2006; Vol. 4, p 3156.
(19) Yu, P.; Miao, L.; Jia, G. Clustered complex echo state networks for traffic forecasting with prior knowledge. In Instrumentation and Measurement Technology Conference (I2MTC); Institute of Electrical and Electronics Engineers: New York, 2011; pp 1−5.
(20) Ozturk, M. C.; Xu, D.; Príncipe, J. C. Analysis and design of echo state networks. Neural Comput. 2007, 19, 111−138.
(21) Verstraeten, D.; Dambre, J.; Dutoit, X.; Schrauwen, B. Memory versus non-linearity in reservoirs. In 2010 International Joint Conference on Neural Networks (IJCNN); Institute of Electrical and Electronics Engineers: New York, 2010; pp 1−8.
(22) Venayagamoorthy, G. K.; Shishir, B. Effects of spectral radius and settling time in the performance of echo state networks. Neural Networks 2009, 22, 861−863.

(23) Dutoit, X.; Schrauwen, B.; Van Campenhout, J.; Stroobandt, D.; Van Brussel, H.; Nuttin, M. Pruning and regularization in reservoir computing. Neurocomputing 2009, 72, 1534−1546.
(24) Dutoit, X.; Van Brussel, H.; Nuttin, M. A first attempt of reservoir pruning for classification problems. In European Symposium on Artificial Neural Networks, Bruges, Belgium; Ghent University Academic Bibliography: Ghent, Belgium, 2007; pp 507−512.
(25) Jaeger, H. Tutorial on Training Recurrent Neural Networks, Covering BPTT, RTRL, EKF and the "Echo State Network" Approach; GMD Technical Report 159; German National Research Center for Information Technology, 2002.
(26) Sobol', I. M. On sensitivity estimation for nonlinear mathematical models. Mat. Model. 1990, 2, 112−118.
(27) Sobol', I. M. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 2001, 55, 271−280.
(28) Kobialka, H.-U.; Kayani, U. Echo state networks with sparse output connections. Lect. Notes Comput. Sci. 2010, 6352, 356−361.
(29) Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157−1182.
