
Ind. Eng. Chem. Res. 2008, 47, 7072–7080

Optimization of an Artificial Neural Network by Selecting the Training Function. Application to Olive Oil Mills Waste

José S. Torrecilla,* José M. Aragón, and María C. Palancar

Department of Chemical Engineering, Faculty of Chemistry, University Complutense of Madrid, 28040 Madrid, Spain

This article describes the selection of the training algorithm of an artificial neural network (ANN) used to model the drying of olive oil mill waste in a fluidized-bed dryer. The ANN used was a three-layer perceptron that predicts the moisture value at time t + T from experimental data (solid moisture and input air and fluidized-bed temperatures) at time t, where T is the sampling time. In this study, 14 training algorithms were tested. The selection was carried out by applying several statistical tests to the real and predicted moisture values. Afterward, an experimental design was carried out to analyze the influence of the training function parameters on the ANN performance. Finally, Polak-Ribiere conjugate gradient backpropagation was selected as the best training algorithm. The ANN trained with the selected algorithm predicted the moisture with a mean prediction error of 1.6% and a correlation coefficient of 0.998.

Introduction

The drying of granular materials in fluidized beds has been widely used for five decades in many industries, such as the chemical, food, metallurgical, agricultural, and pharmaceutical industries. Fluidized beds have the following advantages over other drying systems (rotary dryer, continuous-tray dryer, tunnel dryer, etc.): (a) the heat- and mass-transfer coefficients between solid and gas are higher than in other contact systems; (b) the overheating of heat-sensitive products is avoided because there is a rapid exchange of heat and mass between gas and solid; (c) the intense solid mixing leads to nearly isothermal conditions throughout the fluidized bed, and the control of the drying process can be easily achieved; (d) the size of solid particles that can be fluidized varies widely (from 1 × 10^-4 m to 1 × 10^-3 m); and (e) there are no mechanical moving parts, so that maintenance costs are low. Some of these advantages allow the drying processing time and the temperature required for drying of the wet solid to be reduced, so that the power consumption is lower than in other drying systems.

The drying process performance depends on heat and mass transfer, physical diffusion, and the physical and chemical properties of the solid, among other variables. These parameters are controlled by several complicated chemical and physical mechanisms and can be modeled using complicated mathematical algorithms. The main difficulty is that there is not yet a satisfactory linkage between the drying mechanism and the overall dryer performance. At present, many models are used to describe fluidized-bed dryers. Most of them are based on energy and mass balances with rather severe simplifications of reality. The simplest ones assume that the fluidized-bed temperature is uniform.1 These simple models rely on many other conditions that are not always achieved in real situations.2 Other models are based on the two-phase theory, which assumes that the gas flows through the bed in two phases called bubbles and emulsion.2,3 These models have wide applicability, but they have the disadvantage that they use a great number of parameters, e.g., geometrical data of the vessel, solid and gas properties, etc.

* To whom correspondence should be addressed. Tel.: +34 91 394 42 40. Fax: +34 91 394 42 43. E-mail: [email protected].

Other more sophisticated methods are based on three-phase models,4-7 in which the transfer phenomena are described by three phases: the bubble (dilute) phase, the interstitial phase, and the solid phase. These models require a greater number of parameters than the two-phase theories, and therefore, their mathematical complexity makes their use and implementation difficult. When the physicochemical properties of the wet solid to be dried are heterogeneous and change with time, the accuracy and precision of the models decrease.

Artificial neural networks (ANNs) have been applied to drying processes. In 1993, Huang and Mujumdar proposed an ANN to model a tissue paper dryer.8 In 1995, Trelea et al. used an ANN to predict the dried maize grain quality,9 and Jinescu and Lavric studied the modeling of a fixed-bed dryer for sebacic acid grains of small particle size.10 Different ANN topologies have been used to model the moisture distribution in fixed-bed dryers.11,12 Castellanos et al. and Torrecilla et al. have applied ANNs to model steady-state and non-steady-state fluidized-bed drying processes.13-15

An ANN is an algorithm that does not require any knowledge of the drying mechanism. After an adequate optimization, it is able to predict output data from new sets of input data. In 1958, the multilayer perceptron (MLP) was developed by Rosenblatt,16 and today, its underlying concept is still used. The main characteristics of this type of ANN are that it is (a) fast and easy to implement; (b) a good pattern classifier; (c) good at forming internal representations of features in input data or classification; and (d) well-studied, with many successful applications.

Table 1. Physical Properties of Orujo

physical property                    value
moisture (% wet basis)               60-70
oil content (% wet basis)            3.4
olive skin (% dry solid)             2
pulp (% dry solid)                   78
stone (% dry solid)                  20
particle size of dry solid (mm)      0.5-2

Table 2. Chemical Properties of Orujo

chemical compound      value
sugars (%)             4.8
polyphenol (ppm)       23000
nitrogen (%)           0.8
phosphorus (%)         0.25
potassium (%)          1.8



Figure 1. Fluidized-bed dryer.

Figure 2. ANN computation process.

Table 3. Experimental Conditions of Learning and Verification Data

experimental conditions                                      range
initial moisture (% wet basis)                               65-70
initial moisture in the fluidized-bed dryer (% wet basis)    45-55
final moisture (% wet basis)                                 8
input air temperature (°C)                                   60-160
fluidized-bed temperature (°C)                               50-150
air flow (L/s, at normal conditions)                         4-4.5
solid flow (kg/h)                                            5

Because of these characteristics, the MLP is one of the ANNs used most frequently to model chemical processes.17 The MLP consists of several artificial neurons arranged in two or more layers. Each neuron receives information from all of the neurons of the previous layer. The neurons are information-processing units that are fundamental for MLP operation. The inputs to each neuron are summed, and the result is transformed by an activation function (e.g., the sigmoid function) that limits the amplitude of the neuron output. The output of each neuron is multiplied by a coefficient, called a weight, before being input to every neuron in the following layer. The most important characteristic of an ANN is its ability to learn from its environment; this learning results from the process by which the connection weights are updated.

The process of weight updating is called learning or training. The training process is achieved by applying a backpropagation (BP) procedure. Several training algorithms use the BP procedure, and although each one has its own advantages, such as calculation rate and computation and storage requirements, no single algorithm is best suited to all problems. The performance of each algorithm depends on the process to be modeled and on the learning sample and training mode used.

The aim of this work was to study different training algorithms that use the BP procedure in order to optimize an ANN used to model a drying process. The ANN designed should therefore allow solid moisture predictions to be made with the least prediction error, the highest adaptation to real data, and the highest training rate. This article is organized as follows: First, the experimental setup is described, and the ANN model and training algorithms used are explained. Second, the best training algorithms are selected by applying statistical tests to the validation and ANN-predicted databases. Finally, the sensitivity of the ANN to the algorithm parameters is evaluated.

Experimental Setup

The process modeled by the ANN is the drying of orujo (pomace) in a fluidized bed. Descriptions of this solid and of the experimental setup used to obtain the training and validation database are provided in the following sections.

Physicochemical Description of the Solid. The solid to be dried is a waste of olive oil manufacturing named orujo.18,19 (It is also known as pomace.) Approximately five million tons of orujo are produced each year in Spain; it must be dried before any subsequent use or recycling. This solid-liquid waste is a non-Newtonian and heterogeneous product. It is similar to a high-viscosity slurry with a high water content (60-70%, wet basis).20

Table 4. Advantages and Disadvantages of the Studied Training Algorithms (MATLAB tool: description)

Gradient Descent with Variable Learning Rate26,27
gradient descendent BP (TRAINGD): Basic gradient descent. Slow response; it can be used in incremental-mode training.
gradient descendent with momentum BP (TRAINGDM): Gradient descent with momentum. Generally faster than TRAINGD. TRAINGDM can be used in incremental-mode training.
gradient descendent with adaptive learning rate BP (TRAINGDA); gradient descendent with momentum and adaptive linear BP (TRAINGDX): Adaptive learning rate. Faster training than TRAINGD, but it can be used only in batch-mode training.

Resilient BP23
random-order incremental update (TRAINR)
resilient BP (RPROP) (TRAINRP): Resilient BP. Simple batch-mode training algorithm with fast convergence and minimal storage requirements.

Conjugate Gradient Descent27-29
Fletcher-Powell conjugate gradient BP (TRAINCGF): Fletcher-Reeves conjugate gradient algorithm. It has the smallest storage requirement of the conjugate gradient algorithms.
Polak-Ribiere conjugate gradient BP (TRAINCGP): Polak-Ribiere conjugate gradient algorithm. Slightly larger storage requirements than TRAINCGF. Faster convergence on some problems.
Powell-Beale conjugate gradient BP (TRAINCGB): Powell-Beale conjugate gradient algorithm. Slightly larger storage requirements than TRAINCGP. Generally faster convergence.
scaled conjugate gradient BP (TRAINSCG): Scaled conjugate gradient algorithm. The only conjugate gradient algorithm that requires no line search. Very good general-purpose training algorithm.

Quasi-Newton Algorithm30
BFGS quasi-Newton BP (TRAINBFG): BFGS quasi-Newton method. It requires storage of the approximate Hessian matrix and more computation in each iteration than the conjugate gradient algorithms, but it usually converges in fewer iterations.
one-step secant BP (TRAINOSS): One-step secant method. Compromise between conjugate gradient methods and quasi-Newton methods.

Levenberg-Marquardt31,32
Levenberg-Marquardt BP (TRAINLM): Levenberg-Marquardt algorithm. It is the fastest training algorithm for networks of moderate size. It has a memory reduction feature for use when the training set is large.

Automated Regularization33
Bayesian regularization (TRAINBR): Bayesian regularization. Modification of the Levenberg-Marquardt training algorithm to produce networks that generalize well. It reduces the difficulty of determining the optimum network architecture.

Table 5. Experimental Conditions of Learning and Verification Data

                                     learning range        verification range
variable                             minimum   maximum     minimum   maximum
input air temperature (°C)           70        160         80        150
fluidized-bed temperature (°C)       60        120         70        120

The physical and chemical characteristics of orujo make it difficult to transport, handle, treat, and store, and large ponds are necessary for interim storage. The properties and composition of orujo change with time. Some typical orujo properties are as follows:

Physical Properties. Orujo is composed of olive skin, pulp residues, olive stones, and water (vegetation and process water). The main properties are listed in Table 1. Dry orujo forms indefinable agglomerates with varying solid particle properties (e.g., shape, particle size, and density).

Chemical Composition. The main chemical compounds contained in orujo are listed in Table 2.18,21 The sugars are dissolved in the water (vegetation and process water) and give orujo a high fermentation rate. The sugars contained in orujo also make solid handling difficult because the solid sticks to the equipment walls. Their composition and quantity depend on harvest area, storage time, manufacturing process, and type of olive, among other factors. The oil content in orujo, called orujo oil (pomace oil), is less than 3.5%, and it mainly depends on the manufacturing process, harvest area, olive type, etc. The organic material of orujo explains its high biochemical oxygen demand (BOD). The uncontrolled disposal of orujo has an important environmental impact because of its high fermentation rate and BOD value. Nevertheless, the chemical composition of orujo is useful for several purposes; for instance, the pulp is used to make orujo oil, compost (high contents of nitrogen, phosphorus, and potassium), or power through combustion or gasification. Activated carbon can be made from the stone. Wax and essential oils can be extracted from the skin. It is important to remark, in relation to the present work, that the first stage of all of these uses is the drying of orujo from a moisture of around 70% (wet basis) to around 8% (wet basis) (close to the equilibrium moisture).

Experimental Setup Used. The fluidized-bed dryer consists of a cylindrical chamber (5.4 × 10^-2 m i.d. and 4.75 × 10^-1 m height) joined by a conical device to an upper cylindrical chamber (1.92 × 10^-1 m i.d. and 3.05 × 10^-1 m height). The fluidized-bed dryer is made of stainless steel with a thickness of 3 × 10^-3 m. It can operate in continuous mode with respect to the solid and can treat up to 5 kg/h of wet solid. The fluidized-bed dryer is shown in Figure 1. The wet solid is fed from the hopper to the fluidized dryer. The air is heated to an adequate temperature and flows inside the dryer from the lower part of the fluidized bed. When the solid is dried, it is removed, and its moisture is measured. In order to obtain the essential information from the drying process and to design and update the ANN, several devices were installed in the dryer:


mass flow meter/controller, thermocouples, and moisture sensor. The moisture sensor is an electrical capacitor that uses the wet solid as its dielectric; therefore, the solid moisture is measured on-line in terms of the capacitance value. The wet and dry solid moisture is also measured off-line according to the UNE 55031 standard in order to test the moisture sensor. The operating conditions used in this study are listed in Table 3.

ANN Model

ANN Description. The ANN employed is a three-layer perceptron that predicts the output solid moisture at a sampling time T after the current time t (i.e., at time t + T) from the values of the variables at the current time. It consists of artificial neurons arranged in input, hidden, and output layers. The input layer is used only to input real data to the ANN; the nonlinear calculations are carried out in the other layers. The topology of the ANN consists of three nodes in the input layer: the input air temperature, the fluidized-bed temperature, and the output moisture at the same time t. A previous experimental study showed that these variables are sufficient to describe the drying process in a fluidized bed.15 The input real data are normalized by dividing the measured variables by their maximum foreseeable values. The output of the ANN is the predicted moisture at time t + T. The sampling time, T, is typically 1 min. In the hidden layer, there are four neurons. This is an optimal topology obtained in a previous work.15 The calculation process carried out in each artificial neuron of the hidden and output layers is the one described in the Introduction section. All inputs at one neuron are added (eq 1), and the result is transformed by the sigmoid function (eq 2). This calculated value is the output of the artificial neuron.

x_k = \sum_{j=1} w_{jk} y_j    (1)

y_k = f(x_k) = \frac{1}{1 + e^{-x_k}}    (2)

The nomenclature used is illustrated in Figure 2. MATLAB version 7 (R14) and QuickBasic version 5.4 were used for the calculations.

Figure 3. Example of an experimental run: (a) input air temperature, (b) fluidized-bed temperature, and (c) solid moisture as functions of time.

Training Process. The ANN training is achieved by applying the BP procedure. The BP is a gradient descent optimization procedure in which the mean square error (eq 3) performance index is minimized.22

E_k = \frac{1}{2} \sum_{k} (r_k - y_k)^2    (3)

There are several different BP training algorithms. In this work, training and validation steps were carried out for each of the 14 different training functions (Table 4) contained in the MATLAB ANN toolbox.23 When the gradient descendent training algorithm (TRAINGD) is used, the weights are updated in the direction of the negative gradient of the performance index by applying eq 4

x_{k+1} = x_k - \eta_k g_k    (4)

where x_k is the current vector of weights, \eta_k is the learning rate, and g_k is the current gradient. The learning rate, \eta_k, determines the calculation time required for algorithm convergence. The algorithm converges quickly when the learning rate is large, but it can become unstable. If the learning rate is too small, the time required by the algorithm to converge is long. The gradient descendent with momentum algorithm (TRAINGDM) introduces a new adjustable parameter, the momentum, p, as shown in eq 5

\Delta x_{k+1} = p \Delta x_k + \eta_k g_k    (5)

The modification of the weight vector at the current time step depends on both the current gradient and the weight change of the previous step. Intuitively, the rationale for the use of the momentum term is that the steepest descent is particularly slow when there is a long and narrow valley in the error function surface. In this situation, the direction of the gradient is almost perpendicular to the long axis of the valley. The system thus oscillates back and forth in the direction of the short axis and moves very slowly along the long axis of the valley. The momentum term helps average out the oscillations along the short axis while simultaneously adding up contributions along the long axis.24 The gradient descendent with adaptive learning rate BP (TRAINGDA) and gradient descendent with momentum and adaptive linear BP (TRAINGDX) functions are algorithms similar to the one given by eq 5. They have variable learning rates and momenta. There are other algorithms that can converge from 10 to 100 times faster than the TRAINGD and TRAINGDM algorithms.
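To make eqs 1-5 concrete, the following Python sketch implements the three-input, four-hidden-neuron perceptron described above, together with the gradient descent (eq 4) and momentum (eq 5) weight updates. It is an illustrative reconstruction, not the authors' MATLAB code; the array values, the learning rate, and the momentum are arbitrary example choices, and for brevity only the output-layer weights are updated.

```python
import numpy as np

def sigmoid(x):
    # Eq 2: squashing activation used in the hidden and output layers
    return 1.0 / (1.0 + np.exp(-x))

def forward(inputs, w_hidden, w_output):
    # Eq 1: each neuron sums its weighted inputs; eq 2 transforms the sum
    hidden = sigmoid(w_hidden @ inputs)        # 4 hidden neurons
    output = sigmoid(w_output @ hidden)        # predicted moisture at t + T
    return hidden, output

def mse(target, output):
    # Eq 3: mean square error performance index (with the 1/2 factor of the paper)
    return 0.5 * np.sum((target - output) ** 2)

# Normalized inputs at time t: input air temperature, bed temperature, moisture
inputs = np.array([0.8, 0.6, 0.5])
target = np.array([0.45])                      # measured moisture at t + T

rng = np.random.default_rng(0)
w_hidden = rng.normal(scale=0.5, size=(4, 3))  # input -> hidden weights
w_output = rng.normal(scale=0.5, size=(1, 4))  # hidden -> output weights

eta, p = 0.1, 0.9                              # learning rate and momentum (example values)
delta_prev = np.zeros_like(w_output)

for _ in range(100):
    hidden, output = forward(inputs, w_hidden, w_output)
    # Gradient of eq 3 with respect to the output-layer weights
    error = output - target
    grad_out = (error * output * (1 - output))[:, None] * hidden[None, :]

    # Eq 4 (plain gradient descent) would simply be: w_output -= eta * grad_out
    # Eq 5 (gradient descent with momentum), stepping along the negative gradient:
    delta = p * delta_prev - eta * grad_out
    w_output += delta
    delta_prev = delta
    # Full backpropagation would also propagate the error to w_hidden.

print("MSE after training:", mse(target, forward(inputs, w_hidden, w_output)[1]))
```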


Figure 4. P values and correlation coefficients for all training algorithms and statistical tests: (a) inferential parametric tests, (b) nonparametric tests based on measures of central tendency, (c) nonparametric tests based on the variance, (d) correlation coefficient.

Figure 5. Mean P values of the statistical tests for all training algorithms.

Table 6. TRAINCGP and TRAINCGB: Line Search Parameters and Studied Ranges

parameter      range
Min_grad       1 × 10^-10 to 0.1
δ              1 × 10^-4 to 0.1
Scale_tol      1-100
γ              1 × 10^-4 to 0.1
Max_step       10-1000
Min_step       1 × 10^-7 to 0.1
Bmax           10-30

They can be based either on heuristic techniques or on standard numerical optimization techniques. Among the algorithms based on standard numerical optimization techniques are the conjugate gradient algorithms. In TRAINGD and TRAINGDM, the learning rate determines a fixed length of the weight update (step size), always in the descent gradient direction. In most of the conjugate gradient algorithms, the step size is adjusted after each iteration, with the adjustment being made along the conjugate gradient direction. There are different versions of conjugate gradient algorithms; a general description of a conjugate gradient algorithm is given by eqs 6 and 7. The main differences among the various versions are the manner in which the parameter \beta_k is computed and the criterion used to periodically reset the search direction. Equations 8 and 9 give the \beta_k update for the Fletcher-Reeves and Polak-Ribiere algorithms, respectively.

x_{k+1} = x_k + \alpha_k p_k    (6)

p_k = -g_k + \beta_k p_{k-1}    (7)

\beta_k = \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}}    (8)

\beta_k = \frac{\Delta g_{k-1}^T g_k}{g_{k-1}^T g_{k-1}}    (9)

The inequality proposed by Powell-Beale to reset the search direction is given by

|g_{k-1}^T g_k| \geq 0.2 \|g_k\|^2    (10)
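As an illustration of eqs 6-10, the sketch below applies a Polak-Ribiere conjugate gradient update with a Powell-Beale restart test to a generic differentiable error function. It is only a schematic reconstruction of the update rules discussed here, using a fixed step length in place of the line search that TRAINCGP and TRAINCGB actually perform; the quadratic test function stands in for the ANN error surface.

```python
import numpy as np

def grad_E(x):
    # Gradient of a stand-in error surface (a simple quadratic bowl);
    # in the ANN case this would be the backpropagated gradient of eq 3.
    A = np.array([[3.0, 0.5], [0.5, 1.0]])
    return A @ x

x = np.array([2.0, -1.5])       # current weight vector
g = grad_E(x)
p = -g                          # first search direction: steepest descent
alpha = 0.1                     # fixed step length (a line search is used in practice)

for k in range(50):
    x = x + alpha * p           # eq 6
    g_prev, g = g, grad_E(x)

    # Eq 9 (Polak-Ribiere): beta computed from the change in the gradient
    beta = (g - g_prev) @ g / (g_prev @ g_prev)
    # Eq 8 (Fletcher-Reeves) would instead be: beta = (g @ g) / (g_prev @ g_prev)

    # Eq 10 (Powell-Beale restart): if successive gradients lose orthogonality,
    # reset the search direction to steepest descent.
    if abs(g_prev @ g) >= 0.2 * (g @ g):
        p = -g
    else:
        p = -g + beta * p       # eq 7

print("weights after 50 iterations:", x, "gradient norm:", np.linalg.norm(g))
```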

The basic calculation step of the quasi-Newton methods, eq 11, converges faster than the conjugate gradient methods, but it requires the calculation of the Hessian matrix of the performance index, A_k, at the current values of the weights and biases. This calculation is complex and expensive to perform. The Broyden, Fletcher, Goldfarb, and Shanno (TRAINBFG) updating method and the Levenberg-Marquardt (TRAINLM) algorithm avoid this difficulty because they update an approximate Hessian matrix at each iteration of the algorithm. The one-step secant (TRAINOSS) method is an improvement of the TRAINBFG method because it decreases the storage and computation of each TRAINBFG iteration. The TRAINOSS method does not store the complete Hessian matrix; it assumes that the previous Hessian matrix was the identity matrix.

x_{k+1} = x_k - A_k^{-1} g_k    (11)

The resilient BP (TRAINRP) method can be applied when the activation function of the MLP is the sigmoid function or another "squashing" function. These functions have slopes that approach zero as the inputs become large. Therefore, if a steepest-descent method is used to train an MLP with a squashing function, the gradient can have a very small magnitude, the weight changes can be small, and the time required for convergence can be large. In the resilient BP method, only the

sign of the derivative of the performance function with respect to the weight is used to determine the direction of the weight update. If the derivative has the same sign for two successive iterations, the update value for each weight is increased by a prefixed factor. If the derivative has different signs for two successive iterations, the update value for each weight is decreased by another prefixed factor. If the derivative is zero, then the update value remains the same.
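The sign-based update just described can be written out as a short sketch. This is an illustrative reconstruction of the general resilient BP rule, not the MATLAB implementation; the increase and decrease factors (1.2 and 0.5) and the step-size bounds are typical textbook values chosen only for the example.

```python
import numpy as np

def rprop_step(grad, grad_prev, step, inc=1.2, dec=0.5,
               step_min=1e-6, step_max=50.0):
    """One resilient BP update: returns the weight change and the adapted step sizes."""
    sign_change = grad * grad_prev
    # Same sign on two successive iterations: increase the update value.
    step = np.where(sign_change > 0, np.minimum(step * inc, step_max), step)
    # Opposite signs: decrease the update value.
    step = np.where(sign_change < 0, np.maximum(step * dec, step_min), step)
    # Zero derivative: the update value remains the same (neither branch applies).
    delta_w = -np.sign(grad) * step   # only the sign of the derivative is used
    return delta_w, step

# Example: two successive gradient evaluations for a vector of three weights
grad_prev = np.array([0.4, -0.2, 0.0])
grad      = np.array([0.1,  0.3, 0.0])
step      = np.full(3, 0.1)            # initial update values

delta_w, step = rprop_step(grad, grad_prev, step)
print("weight changes:", delta_w, "new step sizes:", step)
```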

The Bayesian regularization (TRAINBR) method improves the ANN generalization. This method involves modifying the usual performance index. The performance index used for the Bayesian regularization method is given by

F = \beta E_d + \alpha E_w    (12)

where \alpha and \beta are parameters to be optimized, E_d is the mean sum of the squared network errors, and E_w is the sum of the squares of the network weights. The advantages and disadvantages of each algorithm are presented in Table 4.

Training and Validation Samples. Two independent data sets were used to optimize the ANN: a training set made up of 584 sets of input and output values and a validation set made up of 123 sets. All data sets were obtained from the experimental results of the installation described in the Experimental Setup Used section. The experimental conditions used are listed in Table 5. One representative run that provided a learning data set is shown in Figure 3. This run was carried out with an air flow rate of 4.1 L/s. In Figure 3, the input air temperature, the fluidized-bed temperature, and the moisture are plotted as functions of the experimental time. A more detailed description of the experimental results can be found in Torrecilla et al.15 One data set is composed of three columns. One column contains the values of the input air temperature, the second contains the values of the fluidized-bed temperature, and the third contains the values of the output solid moisture. The number of rows is equal to the number of minutes taken by the run (for instance, if an actual run lasts 15 min, the sample data set has 15 rows). Therefore, the three variable values contained in each row correspond to the values measured at the same time.

Results and Discussion

The optimization of the ANN was performed in two steps. First, a selection of the best training algorithms was made. This selection was made in terms of the best prediction capability and the time requirements. Second, a study of the influence of the ANN and the training algorithm parameters on the ANN performance was made.

Training Function Selection. The ANN used to select the best training algorithms is the one described in the ANN Description section. The selection of the training function was carried out in two basic steps: training and validation. In the first step, the weights were optimized. In the second, the optimized weights were validated taking into account the prediction error. Statgraphics Plus version 5.1 was used to carry out the statistical tests in order to select the most adequate training function. Training and validation steps were carried out for 14 different training functions, as listed in Table 4. The data predicted in the validation step using all training functions were compared with the corresponding experimental data; that is, the 14 predicted data sets were compared with the same real data set one by one. The training function selection was carried out using two types of mathematical tools, viz., correlation coefficient values and statistical tests.

First, using the ANN-estimated values obtained with each training function, the correlation coefficients were calculated to quantify the quality of the least-squares fit to the experimental values, that is, to study how well the regression model explained the variability of these values.
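The comparison between each predicted moisture series and the experimental series can be illustrated with a few of the tests enumerated later in this section. The sketch below relies on SciPy's standard two-sample tests; it is a schematic of the selection procedure, not the Statgraphics analysis actually performed, and the `train_and_predict` helper is a hypothetical stand-in for training the ANN with a given MATLAB training function.

```python
import numpy as np
from scipy import stats

def train_and_predict(train_fcn, training_set, validation_inputs):
    """Placeholder: train the ANN with one training function (e.g., 'traincgp')
    and return the predicted validation moistures."""
    raise NotImplementedError

def compare(real, predicted):
    """Correlation coefficient, mean prediction error, and a few P values."""
    r = np.corrcoef(real, predicted)[0, 1]
    mean_error = np.mean(np.abs(real - predicted))
    p_values = {
        "t test (means)": stats.ttest_ind(real, predicted).pvalue,
        "Mann-Whitney (median)": stats.mannwhitneyu(real, predicted).pvalue,
        "Kolmogorov-Smirnov": stats.ks_2samp(real, predicted).pvalue,
        "Kruskal-Wallis": stats.kruskal(real, predicted).pvalue,
        "Levene (variance)": stats.levene(real, predicted).pvalue,
        "Bartlett (variance)": stats.bartlett(real, predicted).pvalue,
    }
    return r, mean_error, p_values

# Rank the candidate training functions: keep those whose P values exceed 0.05
# (null hypothesis of equal statistics not rejected) and whose correlation is highest.
# for fcn in ["traingd", "traingdm", "traincgp", "traingdx", "traincgb"]:
#     predicted = train_and_predict(fcn, training_set, validation_inputs)
#     r, err, p = compare(validation_moisture, predicted)
```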

Table 7. Factors with More Influence on ANN Performance

                     most important factors
training function    correlation coefficient      mean prediction error     number of iterations
TRAINCGP             Min_grad, α_k                Min_grad, Max_step        Min_grad, Scale_tol, δ
TRAINGDX             topology, LR                 p, LR, topology           p, LR, Max_perf_inc, LR_dec
TRAINCGB             topology, Scale_tol, Bmax    all factors               topology

Table 8. TRAINGDX Parameters and Studied Ranges

parameter        range
η_k              1 × 10^-3 to 1
LR_inc           1-2
LR_dec           1 × 10^-3 to 1
Max_perf_inc     1-2
p                1 × 10^-3 to 1

Table 9. Statistical Differences between the Validation Database and the Databases Predicted by the Selected Training Algorithms

                               validation database    TRAINCGP    TRAINGDX    TRAINCGB
average                        0.195                  0.207       0.206       0.221
variance                       0.010                  0.010       0.010       0.005
standard deviation             0.102                  0.100       0.100       0.073
minimum                        0                      0.022       0.024       0.053
maximum                        0.340                  0.340       0.339       0.384
kurtosis                       -0.846                 -0.909      -0.931      -0.973

Verification Process Comparisons
mean prediction error                                 0.016       0.016       0.029
mean square error                                     0.006       0.006       0.017
iterations                                            43073       187190      9

Statistical Comparisons
t test (means)                                        0.7458      0.758       0.102
F-test (standard deviation)                           0.910       0.897       0.040
Mann-Whitney (median)                                 0.705       0.783       0
Kolmogorov-Smirnov                                    1           1           0
Cochran (variance)                                    0.910       0.897       0.03
Bartlett (variance)                                   0.910       0.897       0.03
Levene (variance)                                     0.876       0.863       0.201
Kruskal-Wallis                                        0.692       0.770       0.009
correlation coefficient                               0.998       0.998       0.82

Table 10. Correlation Coefficient and Mean Prediction Error Decrease Rates

training algorithm    correlation coefficient (%)    mean prediction error (%)
TRAINCGP              2.1                            34.1
TRAINGDX              0.1                            2.2
TRAINCGB              4.8                            62.6

Second, the comparison was carried out by applying several statistical tests: nonparametric methods based on measures of central tendency (Kolmogorov-Smirnov test and Mann-Whitney-Wilcoxon test), nonparametric methods based on the variance (Kruskal-Wallis test, Cochran test, Bartlett test, and Levene test), and inferential parametric tests for significance (F-test and t-test). All of these statistical analyses were performed to determine whether there are significant statistical differences between the moisture data provided by the experimental runs (real data) and those predicted by the ANN. The P value was used to check the null hypothesis, which assumes that the statistical parameters of the two series are equal. If the P value is greater than 0.05, the null hypothesis cannot be rejected; the closer the P value is to 1, the greater the confidence with which the null hypothesis is accepted. The correlation coefficients of the moisture prediction data versus the real data are shown in Figure 4. Given that the correlation coefficients of TRAINGDM and TRAINGD are less than 0.8, these algorithms were disregarded. TRAINCGP and TRAINGDX provide the highest correlation coefficients, 0.998 and 0.97, respectively.


Since most of the correlation coefficients are higher than 0.9 (Figure 4), other statistical tools (vide supra) are required to select the best training function. The P values calculated for all statistical differences between the moisture data provided by the runs and those predicted by the ANN implemented with each one of the training algorithms are shown in Figure 4. In all cases, a confidence level of 95% was used. The P values range between 0 and 0.988. In general, the Kolmogorov-Smirnov test (Figure 4b) is the test that provides the lowest P values. Some training algorithms, for example, TRAINGD, TRAINGDM, and TRAINR, can be rejected because the P values of all of the statistical tests were low. The selection of the two best training algorithms from those remaining was made considering that a P value closer to 1 indicates a better training algorithm. The Polak-Ribiere conjugate gradient BP (TRAINCGP) and the gradient descent with momentum and adaptive linear BP (TRAINGDX) are the training algorithms that have the highest P values. The P values obtained for all statistical tests, with the exception of the Kolmogorov-Smirnov test applied to the TRAINCGP algorithm, were greater than 0.05. Also, TRAINCGP and TRAINGDX are the training algorithms that provided the highest values of the mean of the P values of all statistical tests (0.52 and 0.74 for TRAINCGP and TRAINGDX, respectively), as shown in Figure 5. This fact confirms that the real and predicted solid moisture data have similar statistical distributions. In summary, taking into account the correlation coefficient values and the results of the statistical tests, the training functions selected were TRAINCGP and TRAINGDX.

Once an ANN is modeled, some practical aspects should be considered to establish the design criteria, such as the accuracy of the sensors or the sampling time required for system identification. A realistic ANN design does not require prediction accuracy higher than that of the sensor measurements, and the ANN calculation time should be lower than the sampling time required for system identification. Therefore, a second criterion to form the basis of ANN selection is the ANN time requirement to carry out the predictions. From this criterion of the minimum time requirement, the best training algorithm is Powell-Beale conjugate gradient BP (TRAINCGB). When TRAINCGB is used to train the ANN, the P values obtained for all statistical tests, with the exception of the central tendency tests, are greater than 0.05. This fact confirms that the real and predicted solid moisture data have similar variances and significances.

Analysis of ANN Sensitivity. A study was carried out to determine the parameters that have the greatest influence on the ANN performance. The study was done for the ANN trained with the three best training algorithms selected in the previous section, namely, TRAINCGP, TRAINGDX, and TRAINCGB, and it was carried out by following a factorial design. The factors considered in the factorial design were the ANN, training algorithm, and line search parameters. The factorial design responses were the correlation coefficient; the number of iterations; and the mean prediction error (MSE), given by

MSE = \frac{1}{N} \sum_{k=1}^{N} (r_k - y_k)^2    (13)

where N is the number of concurrent vectors of the learning sample, r_k is the target output value, and y_k is the network output value (eq 2).


Figure 6. Experimental moisture values from the validation samples versus the corresponding values estimated by the ANN trained with the TRAINCGP training function.

The factors with more influence on the ANN performance were selected from the factorial design obtained for each of the three selected algorithms.

Polak-Ribiere Conjugate Gradient Backpropagation (TRAINCGP). The ANN parameter considered as a factorial design variable was the topology, that is, the number of neurons in the hidden layer; the range studied was from 1 to 20 neurons. The parameters required by the TRAINCGP algorithm (eqs 6, 7, and 9) are α_k and β_k. The parameter α_k is a factor that determines the error reduction, and it was varied from 1 × 10^-4 to 0.1; β_k is a factor that determines a sufficiently large step size in the search for the minimum error, and it was also varied from 1 × 10^-4 to 0.1. The ranges studied for the line search parameters are shown in Table 6. These ranges were selected following the recommendations of Vacic.25 The responses depend on all of the factors studied, but the degree of influence is a function of the factor and response considered. Thus, the correlation coefficient, r^2, depends on all of the factors studied, but those with the greatest influence are Min_grad and α_k (Table 7). The correlation coefficient increases when Min_grad and α_k decrease. The factors with the greatest influence on the mean prediction error are Min_grad and Max_step (Table 7). The mean prediction error decreases when Min_grad decreases and Max_step increases. The number of iterations is the response most sensitive to the factors (Table 7). The number of iterations decreases when Min_grad, β_k, δ, and γ increase and when Scale_tol, α_k, Max_step, Min_step, Bmax, and topology decrease. Min_grad, Scale_tol, and δ are the main influencing factors (Table 7).
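A minimal sketch of this kind of factorial screening is given below: each factor is varied between a low and a high level, the ANN is retrained for every combination, and the main effect of each factor on a chosen response (here the mean prediction error of eq 13) is estimated. The factor names follow Table 6, while `train_ann` is a hypothetical helper standing in for the actual MATLAB training runs; the levels shown are only examples taken from the studied ranges.

```python
import itertools
import numpy as np

# Two-level factorial design over a few TRAINCGP factors (levels from the Table 6 ranges)
factors = {
    "topology": (1, 20),          # neurons in the hidden layer
    "Min_grad": (1e-10, 0.1),
    "Max_step": (10, 1000),
    "Scale_tol": (1, 100),
}

def train_ann(settings):
    """Hypothetical helper: train the ANN with the given factor settings and
    return the mean prediction error (eq 13) on the validation set."""
    raise NotImplementedError

runs = []
for levels in itertools.product(*factors.values()):           # 2**4 = 16 runs
    settings = dict(zip(factors.keys(), levels))
    # runs.append((settings, train_ann(settings)))             # response of this run

def main_effect(runs, name, high):
    # Main effect of one factor: mean response at its high level minus
    # mean response at its low level (computed once the runs are available).
    hi = [resp for cfg, resp in runs if cfg[name] == high]
    lo = [resp for cfg, resp in runs if cfg[name] != high]
    return np.mean(hi) - np.mean(lo)
```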

Gradient Descent with Momentum and Adaptive Linear Backpropagation (TRAINGDX). The ANN parameter considered as the factorial design variable was the number of neurons in the hidden layer; the range studied was from 1 to 20 neurons. The parameters required by TRAINGDX (eq 5) are η_k, the learning coefficient; LR_inc, the ratio by which the learning coefficient is increased; LR_dec, the ratio by which the learning coefficient is decreased; Max_perf_inc, the parameter that controls the variation of the error; and p, the momentum. The studied ranges of these parameters were taken from Demuth et al. and are listed in Table 8.23 The results show that the responses depend on all of the factors studied, but the degree of influence is a function of the factor and response considered. Thus, the correlation coefficient, r^2, at 95% statistical confidence, depends mainly on the topology (the number of neurons in the hidden layer, Table 7). The correlation coefficient depends on all other factors but to a lower degree; for example, the correlation coefficient depends on LR but about 72% less than on topology. The mean prediction error mainly depends on p, LR, and topology (Table 7). It also depends on all of the other factors but less than on p, LR, and topology; for example, the error depends on LR_dec but about 86% less than on p. The number of iterations mainly depends on p, LR, Max_perf_inc, and LR_dec (Table 7). Again, it depends on all of the other factors but to a lower degree; for example, the iteration number depends on LR but about 85% less than on p. The ANN topology has a lower influence on the number of iterations.

Powell-Beale Conjugate Gradient BP (TRAINCGB). The ANN parameter considered as the factorial design variable was the topology, i.e., the number of neurons in the hidden layer; the range studied was from 1 to 20 neurons. The parameters required by TRAINCGB (eqs 6 and 10) and the ranges studied are the same as for TRAINCGP (Table 6). The correlation coefficient, r^2, increases when topology, Scale_tol, and Bmax increase (Table 7); the influence of the other parameters on the correlation coefficient is the inverse, and the influences of Bmax and Scale_tol are the lowest. The mean prediction error depends on all of the factors (Table 7). With a 90% confidence level, the influence of the topology on the iteration number is the highest (Table 7); the iteration number increases with the topology. The selection of TRAINCGB was made on the basis of the number of iterations required. Therefore, in the study of this algorithm, two new variables were defined to evaluate the algorithm efficiency: the decrease rates of the correlation coefficient and of the mean prediction error. They were calculated by dividing the difference between the initial (time equal to zero) and final values of the correlation coefficient and the mean prediction error, respectively, by the iteration number. In the verification process, when the matrix of weights was optimized, the correlation coefficient decrease rate was 8.2%, and that of the mean prediction error was 92.4%.

Comparison of Training Algorithms. The statistics of the validation database and those obtained when the three selected algorithms, TRAINCGP, TRAINGDX, and TRAINCGB, were used to train the ANN are reported in Table 9. The results show that, in terms of the statistical properties, the TRAINCGP and TRAINGDX algorithms are similar. The main difference between the two algorithms is the number of iterations necessary to reach the optimal weight matrix. TRAINCGP required 43073 iterations to optimize the weight matrix, which is less than the number of iterations required by the TRAINGDX algorithm (187190).


The mean prediction error for both algorithms was 1.6%, which is lower than the 4.6% mean prediction error obtained with the same ANN trained with gradient descendent training. The decrease rates of the correlation coefficient and of the mean prediction error obtained with ANNs trained with each of the three selected training algorithms are reported in Table 10. TRAINCGB shows the highest values of both variables. In summary, although it requires a larger number of iterations, TRAINCGP was selected in light of the statistical results reported in Table 9. The experimental data and the related results predicted by the ANN with TRAINCGP are shown in Figure 6.

Conclusions

The influence of the training algorithm used in the ANN training process has been studied. Statistical tools were used in order to discriminate among the different training algorithms studied. The results obtained show that the training algorithm used during ANN training has a significant influence on the ANN prediction accuracy and on the computing time requirements. To obtain a good model accuracy, the best training algorithm of those studied is the Polak-Ribiere conjugate gradient BP method. The ANN implemented with this algorithm was able to model the drying process with the lowest mean prediction error (1.6%) and required fewer iterations (43073) than the statistically comparable TRAINGDX algorithm. The correlation coefficient of real versus predicted values was 0.998. Therefore, the resulting trained ANN is a good tool for modeling the fluidized-bed dryer. These results were compared with those of another ANN developed in a previous study and trained with gradient descendent training.15 The previously developed ANN predicted moistures with a higher prediction error (4.5%) than the ANN described here (1.6%), and its correlation coefficient (0.95) is lower than that obtained with the present optimized ANN (0.998). Therefore, the improvements developed in this study allow a better ANN model to be obtained, and the fluidized-bed dryer can be well modeled by the adequately designed and optimized ANN.

Acknowledgment

This study was carried out with financial support from the Commission of the European Communities, FAIR Program, RTD Project CT1996-1420, and from the National RTD Plan, Spanish Ministry of Education and Culture, Project AMB97-1293CE. J.S.T. was supported by a Ramon y Cajal research contract from the Spanish Ministry of Education and Culture.

Literature Cited

(1) Viswanathan, K. Model for continuous drying of solids in fluidized/spouted beds. Can. J. Chem. Eng. 1986, 64, 87–95.
(2) Lai, F.; Yiming, C.; Fan, L. Modeling and simulation of a continuous fluidized-bed dryer. Chem. Eng. Sci. 1986, 41, 2419–2430.
(3) Donsì, G.; Ferrari, G. Modeling of Two Component Fluid Bed Dryers: An Approach to the Evaluation of the Drying Time. In Drying '92; Mujumdar, A. S., Ed.; Elsevier Science Publishing: Amsterdam, 1992; pp 493–502.
(4) Hoebink, J.; Rietema, K. Drying granular solids in a fluidized bed I: Description on basis of mass and heat transfer coefficients. Chem. Eng. Sci. 1980, 35, 2135–2139.
(5) Panda, R.; Rao, S. Dynamic model of a fluidized bed dryer. Drying Technol. 1993, 11, 589–602.

(6) Palánz, B. A mathematical model for continuous fluidized bed drying. Chem. Eng. Sci. 1983, 38, 1045–1059.
(7) Zahed, A.; Zhu, J.; Grace, J. Modeling and simulation of batch and continuous fluidized bed dryers. Drying Technol. 1995, 13, 1–28.
(8) Huang, B.; Mujumdar, A. Use of neural network to predict industrial dryer performance. Drying Technol. 1993, 11, 525–541.
(9) Trelea, I.; Courtois, F.; Trystam, G. Modélisation de la cinétique de séchage et de la dégradation de la qualité amidonnière du maïs par réseaux de neurones. Récents Prog. Génie Procédés 1995, 9, 135–140.
(10) Jinescu, G.; Lavric, V. The artificial neural networks and the drying process modelling. Drying Technol. 1995, 13, 1579–1586.
(11) Farkas, I.; Reményi, P.; Biró, A. A neural network topology for modeling grain drying. Comput. Electron. Agric. 2000, 26, 147–158.
(12) Farkas, I.; Reményi, P.; Biró, A. Modeling aspect of grain drying with a neural network. Comput. Electron. Agric. 2000, 29, 99–113.
(13) Castellanos, J. A.; Palancar, M. C.; Aragón, J. M. Neural network model for fluidised bed dryers. Drying Technol. 2001, 19, 1023–1044.
(14) Castellanos, J. A.; Palancar, M. C.; Aragón, J. M. Designing and optimizing a neural network for the modeling of a fluidized-bed drying process. Ind. Eng. Chem. Res. 2002, 41, 2262–2269.
(15) Torrecilla, J. S.; Aragón, J. M.; Palancar, M. C. Modeling the drying of a high-moisture solid with an artificial neural network. Ind. Eng. Chem. Res. 2005, 44, 8057–8066.
(16) Rosenblatt, F. The Perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 1958, 65, 386–408.
(17) Torrecilla, J. S.; Mena, M. L.; Yáñez-Sedeño, P.; García, J. Quantification of Phenolic Compounds in Olive Oil Mill Wastewater by Artificial Neural Network/Laccase Biosensor. J. Agric. Food Chem. 2007, 55, 7418–7426.
(18) Uceda, M.; Hermoso, M.; González, F.; González, J. Evolución de la tecnología de extracción del aceite de oliva. Aliment., Equipos Tecnol. 1995, 4, 12–18.
(19) Alba, J. Nuevas tecnologías para la obtención del aceite de oliva. Fruticultura Prof. 1994, 62, 85–95.
(20) Suria, E. Secado de orujo de aceituna procedente del decanter de dos fases. Aliment., Equipos Tecnol. 1996, 10, 12–24.
(21) Uceda, M.; Hermoso, M.; González, J. Evolución de la tecnología del aceite de oliva, nuevos sistemas ecológicos; ensayos y conclusiones. Aliment., Equipos Tecnol. 1995, 5, 93–98.
(22) Hagan, M. T.; Demuth, H. B. Neural networks for control. In Proceedings of the 1999 American Control Conference; IEEE Press: Piscataway, NJ, 1999; pp 1642–1656.
(23) Demuth, H.; Beale, M.; Hagan, M. T. Neural Network Toolbox for Use with MATLAB. Neural Network Toolbox User's Guide; The MathWorks Inc.: Natick, MA, 2005.
(24) Qian, N. On the momentum term in gradient descent learning algorithms. Neural Netw. 1999, 12, 145–151.
(25) Vacic, V. Summary of the training functions in Matlab's NN toolbox; University of California, Riverside: Riverside, CA, 2005; available at http://www.cs.ucr.edu/~vladimir/cs171/nn_summary.pdf.
(26) Rumelhart, D. E.; Hinton, G.; Williams, R. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
(27) Hagan, M. T.; Demuth, H. B.; Beale, M. H. Neural Network Design; PWS Publishing: Boston, MA, 1996.
(28) Fletcher, R.; Reeves, C. M. Function minimization by conjugate gradient. Comput. J. 1964, 7, 149–154.
(29) Moller, M. F. A scaled conjugate gradient algorithm for fast supervised learning. Neural Netw.
1993, 6, 525–533.
(30) Dennis, J. E.; Schnabel, R. B. Numerical Methods for Unconstrained Optimization and Nonlinear Equations; Prentice-Hall: Englewood Cliffs, NJ, 1983.
(31) Foresee, F. D.; Hagan, M. T. Gauss-Newton approximation to Bayesian regularization. In Proceedings of the International Joint Conference on Neural Networks; IEEE Press: Piscataway, NJ, 1997; pp 1930–1935.
(32) Hagan, M. T.; Menhaj, M. Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Networks 1994, 5, 989–993.
(33) MacKay, D. J. C. Bayesian interpolation. Neural Comput. 1992, 4, 415–447.

Received for review January 22, 2008
Revised manuscript received June 11, 2008
Accepted June 24, 2008

IE8001205