
Ind. Eng. Chem. Res. 2002, 41, 2262-2269

Designing and Optimizing a Neural Network for the Modeling of a Fluidized-Bed Drying Process

José A. Castellanos, María C. Palancar, and José M. Aragón*

Department of Chemical Engineering, Faculty of Chemistry, Complutense University of Madrid, 28040 Madrid, Spain

A wet granular solid material (alperujo, a waste from the olive mills) was dried in a fluidized-bed dryer (FBD) system. The drying curves (data of moisture vs time) were fitted to an exponential equation and then interpolated and used as learning data for an artificial neural network (ANN). The target is to predict the moisture of the solid from operating-conditions data. The ANN has three layers, with four inputs, four hidden neurons, and one output. Several criteria are given to improve the ANN training, e.g., selecting the data sets (the number of data and the order in which they are shown to the network), tuning the learning coefficient (set at 1.5), and optimizing the sigmoid function (two adjustable parameters, α and β, set at 3 and 9). The optimized ANN can predict the evolution of the moisture of the solid with a model error of ±1.57%.

Introduction

"Alperujo" is a wet solid waste generated in olive mills that use the modern two-phase decanters for the extraction of olive oil. Alperujo has a high water content (55-65%) and a remaining oil fraction (3-6%) that can be extracted with hexane from dried alperujo. The best moisture content for this purpose is about 8%. Currently, alperujo is dried in rotary cylindrical drums, which operate at a high temperature (the inlet air temperature is 500-600 °C). Operation in this type of dryer is frequently problematic because other compounds present in alperujo, e.g., sugars (about 0.3%), pectins, and resins, decompose at high temperatures, forming a strong protective shield around the particles (a resistance to water vaporization) and promoting the granulation of the solid. Fluidized-bed systems are gas-solid contactors that have been widely used in many operations and processes over the last few decades. The drying of granular solids is one attractive application of fluidized beds because of the uniform temperature conditions that can be reached inside the bed.
Hot spots are practically avoided in fluidized systems, and the drying gas enters at a lower temperature than in a rotary drum dryer (e.g., 125 °C instead of 600 °C). Consequently, the problems of granulation and of decomposition of sugars and other troublesome compounds can be diminished significantly; moreover, because of the good gas-solid contact, the thermal efficiency is higher and the required drying time can be shorter than in rotary drum dryers. The rigorous study of fluidized-bed dryers (FBDs) is very difficult. The classical models of FBDs are based on energy and mass balances and on kinetic and equilibrium laws, and they are solved with varying degrees of simplification. For example, Viswanathan1 assumed that the temperature and other properties inside the bed are uniform and that the gas throughout the bed is in thermal and chemical equilibrium with the solid.

* To whom correspondence should be addressed. Telephone and Fax: +34-91-394 41 73. E-mail: [email protected].

Ülkü and Uckan2 modeled a batch FBD as a homogeneous system; bubbles are ignored, and the rate of internal diffusion inside the solid particles

is the rate-controlling step of the drying process. Several authors3-5 have used the two-phase theory to model FBDs. The assumptions are that the gas flows through the bed in two different ways and that the bed is formed by two phases: a dense phase (emulsion) and a diluted one (gas bubbles). Heat and mass transfer between the two phases are also assumed. Other authors6-8 have used a three-phase model that involves a dilute phase (bubbles), the interstitial gas, and the solid particles. Simplified models can be valid under some circumstances but usually not in many real situations. The most elaborate models (two- and three-phase models) have wide applicability, but they require the knowledge of many parameters concerning the vessel geometry and the properties of the solid particles and the gas. For example, the FBD model developed by Lai et al.4 requires 22 input data. Some of these data are well-known, e.g., some basic properties of air, water, and steam. Others can be measured, e.g., the moisture, temperature, and flow rate of the solid; the temperature, humidity, and flow rate of the input air; the height and diameter of the bed; and the temperature of the inner wall of the dryer. The model of Lai also includes several properties of the fluidized system that usually are only roughly known, e.g., the fraction of the bed occupied by the bubbles, the mass-exchange coefficient between the bubbles and the emulsion, the humidity of the air interface on the particles, and the convective heat-transfer coefficients (air-particle and bed-wall of the dryer). Some authors5,9 have presented a review of experimental and estimating methods for some of these parameters. However, most of the parameters are roughly known or can be estimated only approximately. In addition, the properties required by the FBD model of Lai et al.4 for the drying of alperujo would be the density, critical moisture, heat capacity, and superficial velocity of minimum fluidization.
However, the alperujo characteristics depend on the moisture, pulp/pit content, particle size distribution of the material, etc., and are difficult to determine. These models calculate the outlet solid moisture and temperature by applying mass and energy conservation equations to the bubble phase, the emulsion gas, and a single particle. The mass and energy conservation equations of the bubble phase and emulsion gas are written for steady-state conditions. They are first-order differential

10.1021/ie000950t CCC: $22.00 © 2002 American Chemical Society Published on Web 04/02/2002


equations of the emulsion-gas and gas-bubble moisture and enthalpy with respect to the bed elevation. The mass and energy conservation equations applied to a single particle are first-order differential equations with respect to time. The equations must be solved simultaneously. The bubble-phase and emulsion-gas mass conservation equations and the emulsion-gas energy conservation equation can be integrated directly; the rest of the equations must be solved by numerical approximation. The main problem is that nonmeasurable variables appear in some of these equations, so the solution of the model equations must be obtained through a two-dimensional trial-and-error procedure. According to the model of Lai et al.,4 the humidity and temperature of the air filling the interstices of the emulsion phase must be assumed. Therefore, the calculation of the final unknown variables (temperature and moisture of the output solid) cannot be made directly but only by successive iterations. The computing time can be long; e.g., a 200-MHz PC can take many seconds to process each iteration, a delay that could promote unstable behavior if the FBD dynamic model is implemented in online control systems. The use of empirical models avoids the difficulties described above. Empirical models based on artificial neural networks (ANNs) are useful for modeling and for designing process control systems.10-12 Excellent information about ANNs is available in books such as ref 13. The chemical engineering applications of ANNs have increased in the last 2 decades, and the application of ANNs to model and control processes is abundant and diverse. Baratti et al.14 have shown that implementing ANNs to monitor and control two distillation columns operating in a refinery can significantly improve the operation. Cubillos and Lima15 have modeled a four-stage flotation unit with a hybrid neural model.
Different aspects of the operation of wastewater treatment plants have been treated by means of ANN models. For example, Schuhen and Koehne16 have applied a hybrid neural network to control the NH4 concentration in the discharge of an activated-sludge tank with an upstream equalizing basin. Syu and Chen17 have used a control system based on a back-propagation neural network to reduce the volume of reagent consumed to reach the required chemical oxygen demand in a wastewater treatment plant. Milanič et al.18 have optimized the control of an industrial hydrolysis batch process using a model-based strategy that relies on an ANN to identify the process dynamics of the plant. Breusegem et al.19 have used an ANN to predict fermentation variables. Optimization tasks are also treated by means of ANNs. Thus, Smith20 has used an ANN to optimize a treatment of wastepaper, achieving a cost reduction of 32%. Boot and Roland21 have developed a closed-loop supervisory control based on ANNs to reduce NOx emissions and to improve the heat rate in a combustion plant. Neutralization processes (control of pH) are strongly nonlinear. ANNs have been used to model the dynamics of neutralization and to design neutralization control systems; several papers are related to these applications.22-27 With the exception of Cheng et al.,24 who used a recurrent neural network trained by a nonlinear programming code, the authors have used multilayer feedforward networks trained by back-propagation algorithms. Regarding control, the ANN is implemented in different forms in the control

system. For example, Nahas et al.23 and Palancar et al.25 have developed inverse-model control systems. Bhat and McAvoy22 have compared the efficiencies of inverse-model and optimal control systems that use an ANN to predict future values of pH. Yeo and Kwon27 have developed a method to tune a proportional-integral-derivative controller by means of an ANN. The use of ANNs for modeling dryers is rarely documented in the open literature; there is a scarcity of experimental data, of methods for designing the learning data, and of criteria for optimizing the training. Trelea et al.28 have used an ANN to express kinetic aspects of drying and to predict the quality of dried corn. Jinescu and Lavric29 have modeled a fixed-bed dryer with an ANN. Huang and Mujumdar30 have used an ANN to model the performance of an industrial dryer of tissue paper. The ANNs used in drying systems usually predict the output values of the temperature and moisture of the solid. The training is made with data obtained by simulation, and the results have validated the use of ANNs to properly predict the performance of many dryers. Balasubramanian et al.31 have used a three-layer ANN for modeling a FBD. The input layer has three nodes (temperature and flow rate of the inlet air and particle residence time) and a bias. The output layer has two nodes, which contain the predicted values of the temperature and humidity of the outlet air. The hidden layer has 10 nodes. The activation function for all of the nodes is the sigmoid function, and the back-propagation algorithm was used to train the ANN. The training database was formed by 120 data sets obtained from a FBD simulation model. The aim of this paper is to verify that an ANN can be used to predict the moisture-time curves obtained in the drying of alperujo in a FBD. In addition, a method to obtain a representative learning database and several criteria to provide an optimal training of the ANN are proposed.
The paper is organized as follows. First, the experimental setup and procedure are described, followed by the description of a basic starting model based on an ANN. This ANN was first trained using a database of data sets obtained experimentally. The second step was the optimization of the parameters of the ANN, establishing some criteria to optimize the training process and the learning database. A new learning database was generated by interpolation and optimized by numeric filtration. Using the optimized database, a new optimization was made by considering the number of iterations, the number of hidden-layer nodes, the learning coefficient, the sigmoid function parameters, and the ordering of the learning data. Finally, the prediction results obtained with the ANN trained by the classical and by the optimized methods are compared.

Experimental Setup

Data were obtained in a medium-size pilot plant (Figure 1), which consists of a fluidized bed of 5.4 cm i.d. and 40 cm height. The wet solid is fed from the hopper by means of a J valve. Another J valve, located at the bottom of the fluidized bed, serves to take samples of solid from the bed. Several measurement and control devices along the plant are used to control the mass flow rate and temperature of the input air and to monitor the temperature of the fluidized bed. The moisture of the solid is measured by weighing the samples of solid before and after a thermal treatment for 24 h in a furnace at 105 °C.
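The gravimetric determination just described reduces to a one-line wet-basis calculation. The sketch below is only illustrative; the sample masses are hypothetical, not measurements from the paper.

```python
# Illustrative sketch (not from the paper): wet-basis moisture from the
# before/after weighing procedure. Sample masses below are hypothetical.

def wet_basis_moisture(mass_wet: float, mass_dry: float) -> float:
    """Moisture fraction on a wet basis: evaporated water over wet mass."""
    return (mass_wet - mass_dry) / mass_wet

# e.g., a 250 g sample weighing 100 g after 24 h at 105 degC
moisture = wet_basis_moisture(250.0, 100.0)   # 0.60, i.e., ~60% wet basis
```

This is the convention behind statements such as "the moisture of the fresh alperujo is about 60% (wet basis)".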


Neural Network Description

The ANN used in this work is based on the multilayer perceptron.13 It has three layers: input, hidden, and output. The input layer has four nodes with the values of time (t), the bed temperature, the inlet air temperature, and the moisture of the solid in the fluidized bed at time t. The output is a single node containing the moisture of the solid in the fluidized bed. All of the input and output values are refreshed every 30 s. All of the input and output data are normalized between 0 and 1 (by dividing each by its expected or measured maximum value). The training is made using known values of the input and output data sets. More details about the basic method of training and its optimization are given in the following sections. Once the ANN is trained, the desired drying curves are obtained by feeding the ANN with the adequate input data. The transfer function used in the ANN to calculate the output of every neuron is a modified sigmoid function:

o_j = 1/(1 + e^(-αs_j)/β)   (1)

where the sum of inputs to each neuron is

s_j = ∑_i w_ij(k) o_i   (2)

Figure 1. Diagram of the pilot plant.

The temperature of the input air stream ranged between 70 and 250 °C. This range is adequate to reach a fluidized-bed temperature between 60 and 120 °C. A bed temperature lower than 60 °C would lead to very long drying times. The upper value of 120 °C should not be exceeded because of the risk of alperujo self-combustion, which could start near 150 °C. The moisture of the fresh alperujo coming from the olive mill is about 60% (wet basis). This means that fresh alperujo is similar to a sludge and is hard to handle and transport (not to mention the impossibility of feeding and fluidizing such a material). For this reason, fresh alperujo must be mixed with dry alperujo before being fed into the fluidized bed. This preliminary operation yields a granulated mixture that is manageable and does not behave as a sludge; the procedure is common practice in industrial dryers based on rotary drums. Several batches of material with different mean moistures were prepared by mixing fresh (wet) and dry alperujo in ratios between 50/50 and 60/40. The experimental procedure has the following steps: (1) loading the vessel with 200-300 g of dry alperujo; (2) fluidizing this mass with a fixed flow rate of hot air; (3) loading 200-300 g of fresh alperujo, added when a steady-state temperature is reached (time zero); (4) measuring the reduction of the solid moisture inside the fluidized bed with time. Following this procedure, a series of 17 drying curves (solid moisture vs time) was obtained. The parameters of each curve are the initial moisture of the solid and the temperature of the inlet air stream.

The parameters α and β in eq 1 modify the value of the derivative of the threshold function. Therefore, these parameters can be used to optimize the learning rate, in a way similar to previous publications.29 The training is made using the back-propagation algorithm,35 a tool frequently used because of its simplicity and robustness. This algorithm tries to minimize the error after each iteration:

E_j = (1/2)(d_j - o_j)²   (3)

The successive changes in the connection values (weights) are proportional to the first derivative of the error with respect to the weights. The weight updating is calculated by

w_ij(k+1) = w_ij(k) + μ δ_j o_i   (4)

where δ_j, for the output layer, is

δ_j = (d_j - o_j) f′(o_j)   (5)

and for the hidden layer

δ_j = (∑_h δ_h w_hj) f′(o_j)   (6)

where the derivative of the sigmoid function is

f′(o_j) = o_j(1 - o_j)   (7)

Generation of Interpolated Learning Data

The ANN described above was used to study the influence of the sample size used in the training (the number of data sets included in the database) and of the number of iterations (the number of times the complete database is shown to the ANN). Once these aspects were optimized, the parameters of the ANN (α, β, μ, and


Figure 2. Description of the interpolation by tangents.

Figure 3. Example of interpolation by tangents and data selection.

number of hidden neurons) were optimized again with the new database. The results and a comparison between both ANNs are described in the last sections. The optimal quantity of learning data is unpredictable and difficult to know a priori. With only a few data, the system is roughly defined, even though the ANN may describe the experimental data used for learning without significant errors. On the other hand, if the ANN is trained with too many data, one cannot necessarily expect improved performance, because the ANN may suffer "overlearning" and a subsequent loss of prediction efficiency. An objective of this paper is to define an optimal number of samples for the learning process. The runs provided 292 data sets, which were used in the basic training described in the following section. If more learning data were desired to train the ANN, the experimental work required to obtain them would be too laborious and expensive. The solution proposed here is to generate learning data by interpolating a rather limited series of experimental values. The interpolated data sets are obtained by fitting the experimental curves of solid moisture vs time to the following equation:

H = a + b e^(-ct)   (8)

Some authors32,33 have used eq 8 as an empirical model of drying kinetics in FBDs, obtaining relationships between the coefficients a, b, and c and the operating conditions. Here, however, such relationships were tried unsuccessfully, and this is the reason for developing the ANN model. Each experimental drying curve was fitted to eq 8, and interpolated data were calculated to reach the desired sets of learning data. The data sets contained in the learning database should represent the behavior of the system and should provide dynamic information.34 Nevertheless, this fitting should be used carefully: if the data are bad, an ANN trained on curves adjusted to them could give deceptively good predictions. The interpolation could be made either by using equal time intervals for each point or by selecting a series of specific points. The selection of discrete points of the curves requires some criteria to reach, as far as possible, the maximum "quality" of each interpolated point. This quality depends on many factors, some of them unpredictable, such as the shape of the curve itself. For example, in Figure 2 the selection of two discrete points of the curve H = f(t) depends on the admissible error, because this error is larger if points d and g are selected than if points a and c are selected. To provide a criterion for obtaining a suitable database from the curves fitted by eq 8, an algorithm of interpolation by tangents is proposed. This algorithm serves to select interpolated data of a given quality. The geometrical meaning of the algorithm is shown in Figure 2. It is assumed that only some points have to be selected among several points, a, b, ..., h, which have been generated by interpolating the fitted curve at equally spaced intervals of the independent variable. For a given point a, the first trial of the following point to select would be g; the error is measured by the vertical distance between the curve and the straight line parallel to its tangent. If this error is greater than the desired accuracy, points successively closer to a are tried, e.g., points f, e, etc., until the error is less than the specified one. The rest of the intermediate data are ignored (for that accuracy). An example of a result of this algorithm is given in Figure 3, which shows 9 data points selected from a series of 29 interpolated data. The original 29 interpolated data are equally spaced in time (one point every 0.5 min), and the selected points are closer together in the first period and farther apart later.
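One plausible reading of the interpolation-by-tangents selection can be sketched as follows. The greedy pass below uses the maximum vertical chord-to-curve deviation (which, for a smooth curve, is attained where the tangent is parallel to the chord); the eq 8 coefficients and the tolerance are hypothetical, not fitted alperujo values, and the paper's exact error measure may differ.

```python
import math

# A plausible sketch of interpolation-by-tangents selection (assumptions:
# chord-deviation error measure; hypothetical eq 8 coefficients).

def drying_curve(t, a=0.08, b=0.52, c=0.15):      # eq 8: H = a + b*exp(-c*t)
    return a + b * math.exp(-c * t)

def select_points(ts, hs, tol):
    """Greedy selection: keep the first point, then repeatedly jump as far
    as the chord-to-curve deviation allows; always keep the last point."""
    keep = [0]
    i = 0
    while i < len(ts) - 1:
        j = len(ts) - 1
        while j > i + 1:
            # max vertical distance between the chord (i, j) and the curve
            slope = (hs[j] - hs[i]) / (ts[j] - ts[i])
            dev = max(abs(hs[k] - (hs[i] + slope * (ts[k] - ts[i])))
                      for k in range(i + 1, j))
            if dev <= tol:
                break
            j -= 1                # try a point closer to i
        keep.append(j)
        i = j
    return keep

ts = [0.5 * k for k in range(29)]                 # one point every 0.5 min
hs = [drying_curve(t) for t in ts]
selected = select_points(ts, hs, tol=0.002)
```

Because the curvature of eq 8 is largest at small t, the accepted jumps are short early in the drying and longer later, reproducing the spacing pattern seen in Figure 3.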

Basic Classical Training of the ANN

First, an ANN model was obtained using a classical optimization method based on a trial-and-error procedure to select the number of hidden neurons and a good value of the learning coefficient. The error criterion was minimization of the learning error, eq 3. The parameters α and β of the sigmoid function, eq 1, were set at 1. The training was made with a database of 292 experimental data sets. Each input data set comprises the values, at time t, of the bed temperature, the inlet air temperature, and the moisture of the solid in the bed. The desired value of the single output node, the moisture of the solid in the bed at time t + 0.5, was supplied by calculating interpolated data with eq 8. The training was carried out by showing the complete database to the ANN 5000 times. From the trial-and-error method, the learning coefficient μ was finally set at 2 and the number of neurons in the intermediate layer at 5. A bias was tested but finally discarded because it did not significantly improve the error.

Optimization of the Database

Many samples of different sizes were obtained using the interpolation-by-tangents algorithm. Concretely, a new set of 471 interpolated data was obtained from the initial 292 experimental data. Then, several databases were selected from these 471 data and used in the ANN training. The size of each selected database depends on the tolerance, i.e., the maximum error assumed in the interpolation algorithm (Table 1).

Table 1. Relation between the Maximum Error Accepted and the Sample Size

tolerance (%)   SS obtained
8.0              55
5.0              68
3.0              85
2.0             101
1.7             107
1.5             113
1.3             124
1.0             138
0.8             158
0.5             198
0.2             300
0.0             471

The selected databases were tested as learning data by studying the learning error, eq 3, and the model error, eq 9, which represents the mean deviation of all of the points used in the ANN training, defined by

E = [∑_{j=1}^{N} (d_j - o_j)² / SS]^{1/2} × 100   (9)

where SS is the sample size (number of data sets in the database), d_j the desired value for output j, and o_j the actual output. The variations of both errors with SS are not constant: when SS is low, the errors vary strongly, whereas for large SS the variation becomes very smooth. This change in the error variation is a good basis for establishing the optimal value of SS. Therefore, a further analysis was made following a criterion based on two complementary measures of the efficiency of iteration: the specific differential rate of loss of information (SDRLI), defined by eq 10, and the accumulated specific differential rate of loss of information (ASDRLI), defined as the sum of the SDRLI values from the highest SS to the smallest one.

SDRLI = -(dE/dSS)(1/SS)   (10)

The evolution of the ASDRLI with SS is shown in Figure 4. A similar result is obtained for the SDRLI. It was verified that the loss of information becomes meaningful when the sample size is less than about 100 data sets, independent of the number of iterations.

Figure 4. Effect of the sample size and the number of iterations on the ASDRLI (ANN topology 4:5:1, μ = 2, basic training with 292 data sets).

Figure 5. Effect of the sample size and the number of iterations on the UCC (ANN topology 4:5:1, μ = 2, basic training with 292 data sets).

To analyze the computational effort required by the ANN to use the learning data, the effects of the number of iterations and of the sample size were studied by means of a new parameter, the unitary computational cost (UCC), which expresses the number of operations needed to improve the error and is defined as

UCC = computing work/error variation = (SS × N)/∆E   (11)

where N is the number of iterations required to decrease the model error by ∆E % using a sample size SS. From the three-dimensional fit of the UCC vs the number of iterations and SS (Figure 5), it can be deduced that for either a high or a very low SS the data are used poorly during the ANN training. Using the ASDRLI criterion, the lower limit was set at 107 data sets. Although the UCC is better for smaller sizes, training the ANN with a sample of 107 data sets during 2000 iterations was judged good enough to apply the subsequent optimization criteria.

Optimization of the ANN

The optimization was made in three steps: (1) optimization of the number of hidden nodes; (2) optimization of the learning coefficient; and (3) improvement using a threshold function based on the modified sigmoid function, eq 1, with two adjustable parameters, α and β. In all cases, the training used the database selected in the preceding section (107 data sets), with a maximum of 2000 iterations, following the back-propagation algorithm,35 eqs 3-7. Numbers of hidden nodes from 3 to 6 were tried. The learning error with the 4:3:1 topology is the largest; the 4:4:1 and 4:5:1 topologies behave similarly; and the 4:6:1 topology does not converge at all. From these considerations, the 4:4:1 topology was finally taken as the optimal one. The whole study described up to this point was completed with a learning coefficient μ = 2. Once the topology of the ANN (4:4:1), the sample size (107), and the number of iterations (2000) were fixed, the learning coefficient was optimized as well. Several values of μ in the range 0.3-2 were tried. The model error after 2000 iterations for each value of the learning coefficient is shown in Table 2.
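Returning briefly to the database-size criteria, eqs 9-11 are straightforward to evaluate numerically. The sketch below uses hypothetical (sample size, model error) pairs, chosen only to mimic the reported trend of errors growing quickly below about 100 data sets; only the sample sizes themselves come from Table 1.

```python
# Numerical sketch of eqs 10 and 11 on synthetic data: the sample sizes are
# those of Table 1, but the error values are hypothetical illustrations.

ss = [55, 68, 85, 101, 107, 113, 124, 138, 158, 198, 300, 471]
err = [6.8, 5.1, 3.6, 2.6, 2.4, 2.3, 2.25, 2.2, 2.15, 2.1, 2.05, 2.0]

def sdrli(ss, err):
    """Finite-difference form of eq 10: SDRLI = -(dE/dSS)/SS."""
    out = []
    for k in range(1, len(ss)):
        dE = err[k] - err[k - 1]
        dSS = ss[k] - ss[k - 1]
        out.append(-(dE / dSS) / ss[k])
    return out

def asdrli(ss, err):
    """ASDRLI: SDRLI values accumulated from the highest SS downward."""
    vals = sdrli(ss, err)
    acc, total = [], 0.0
    for v in reversed(vals):
        total += v
        acc.append(total)
    return list(reversed(acc))

def ucc(ss_size, n_iter, delta_err):
    """Eq 11: unitary computational cost, (SS * N) / dE."""
    return ss_size * n_iter / delta_err

rates = sdrli(ss, err)
accumulated = asdrli(ss, err)
```

On such a curve the SDRLI is largest at small sample sizes, which is the behavior used in the paper to locate the lower limit of about 100 data sets.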

Table 2. Learning Coefficient Optimization (Model Error after 2000 Iterations)

learning coefficient   model error (%)
0.3     2.82
0.7     2.50
0.85    2.42
1.0     2.40
1.2     2.39
1.5     2.38
1.7     2.38
2.0     2.39

From these results, a learning coefficient of 1.5 leads to the best convergence and the lowest model error. Similar results are reached with μ = 1.7 between 600 and 1000 iterations. The use of μ higher than 1 produces fast convergence. The optimal learning coefficient was finally set at 1.5, rather than 1.7, because, as suggested by Haykin,13 a higher coefficient means a higher probability of overfitting. The parameter α affects the slope of the sigmoid curve at the inflection point, and β modifies the location of this point. Depending on the combination of these two parameters, the shape of the sigmoid can be modified in very different ways. When α and β are set at 1, the elementary sigmoid function is obtained. Once the learning coefficient was optimized, the ANN performance was improved by tuning α and β. The optimal values of these parameters were sought by a trial-and-error method. To start this method, some initial values of α and β are required. The most appropriate initial values can be selected by means of a theoretical analysis of the threshold function. Thus, if the right-hand side of eq 1 is multiplied and divided by β,

o_j ≡ f(x) = β/(β + e^(-αx))   (12)

If β < 0 with α ∈ ℝ and x ∈ [0, 1], then |β| can exceed |β + e^(-αx)|, so that |f(x)| > 1, and consequently the function is invalid. Moreover, the denominator of eq 12 can become zero for inputs such that

x)-

ln(-β) R

(13)

Negative values of β thus admit real inputs at which the function is singular, and malfunction of the network simulator is possible. Consequently, β should be positive. Dynamic systems can be classified very simply by the relation between inputs and outputs: when the output increases with the input, the system can be called direct or proportional; in the opposite case, the system is called inverse. The analysis is easiest when the system has only one input and one output. If the system complexity increases, the global behavior can be studied by means of a factorial experimental design, and the most suitable values for α and β can be deduced. The appropriate range of these parameters is quite different for direct and inverse systems. For a direct problem, if x → 0 implies f(x) → 0, then e^(-αx)/β → ∞; as x → 0, this expression reduces to 1/β → ∞, so β must be positive and low. If x → 1 implies f(x) → 1, then e^(-αx)/β → 0; because β must be greater than zero, α must be positive and high. For an inverse problem, if x → 0 implies f(x) → 1, then e^(-αx)/β → 0; as x → 0, this reduces to 1/β → 0, so β must be positive and high. If x → 1 implies f(x) → 0, then e^(-αx)/β → ∞; independent of the value of β, α must be negative with a high absolute value.

To know the type of problem in the particular case under study, a factorial experimental design was made to identify the most important variable(s). A 2^4 factorial design was made with the following factors: t = time, TB = bed temperature at time t, TA = inlet air temperature at time t, and H = fluidized-bed moisture at time t. The output response (Y) is the ANN prediction of the fluidized-bed moisture at time t + 0.5. From the analysis of the results, the influences of t and H agree with the real performance of the FBD, while those of TB and TA do not. Assuming that only primary influences are meaningful, the relative importance of each factor is as follows: t → 11.2%; TB → 5.3%; TA → 10.0%; H → 73.4%. H is the most important variable, and consequently the global system behavior is strongly conditioned by it. The time t is also significant, as are the t-H, TB-TA, and TA-H interactions. The qualitative conclusion is that the high influence of H (a proportional variable) confirms the direct nature of the system. Therefore, the parameters α and β should be positive.

Once the range of values of α and β was determined, the best values were decided by trial and error. The results are shown in Figure 6. The three-dimensional surface shows low values of the model error for α greater than 3. When α is less than 3, the model error first decreases with β, then passes through a series of low values, and finally increases with β. In view of the model error, the best combination observed is α = 3 and β = 9.

Figure 6. Effects of the sigmoid coefficients, α and β, on the model error (ANN topology 4:4:1, optimized μ = 1.5, optimized database with 107 data sets, 2000 iterations).

Effects of the Order in Which the Learning Data Are Shown to the ANN

The last factor studied was the strategy for varying the order in which the learning data sets are shown to the ANN during training. Several strategies were studied; the most relevant results are shown in Table 3.
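The main-effect analysis behind such importance percentages can be sketched as follows. The surrogate response function below is hypothetical, with coefficients chosen only to echo the reported importances; in the paper, the responses came from the trained ANN itself.

```python
from itertools import product

# Sketch of main-effect estimation from a 2^4 factorial design. The linear
# response function is a hypothetical stand-in for the ANN prediction,
# built so that the moisture factor H dominates.

def response(t, tb, ta, h):          # factor levels coded as -1 / +1
    return 0.11 * t + 0.05 * tb + 0.10 * ta + 0.73 * h

levels = list(product([-1, 1], repeat=4))     # all 16 runs
ys = [response(*lv) for lv in levels]

def main_effect(idx):
    """Average response at the +1 level minus average at the -1 level."""
    hi = [y for lv, y in zip(levels, ys) if lv[idx] == 1]
    lo = [y for lv, y in zip(levels, ys) if lv[idx] == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

effects = [main_effect(i) for i in range(4)]          # t, TB, TA, H
total = sum(abs(e) for e in effects)
importance = [100 * abs(e) / total for e in effects]  # percent per factor
```

With a balanced two-level design, each main effect isolates one factor exactly, so the dominant positive effect of H is what marks the system as direct.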
The same original database of 107 data sets used in the preceding sections was presented to the ANN following four different strategies. The database was divided into two sections. One section (zone 1 in Table 3) was formed with all of the data sets corresponding to solid moisture lower than 17%. The other

2268

Ind. Eng. Chem. Res., Vol. 41, No. 9, 2002

Table 3. Comparison of Five Strategies of Showing Learning Data

strategy   data used          iterations    computing work   SQR      model error (%)
E1         107 sets (total)   2000          214 000          56.4     1.59
E2         zone 1 + total     1000 + 1000   199 000          55.0     1.57
E3         zone 2 + total     1000 + 1000   122 000          2560.8   10.74 (a)
E4         total + zone 1     1000 + 1000   199 000          767.8    5.88
E5         total + zone 2     1000 + 1000   122 000          1735.5   8.84

(a) Overfitting occurred.
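The "computing work" column of Table 3 (sample size × iterations, summed over the training phases of each strategy) can be reproduced directly. The zone sizes used below, 92 sets in zone 1 and 15 sets in zone 2, are not stated explicitly in the text; they are inferred from the tabulated computing-work values (92 + 107 = 199 thousand and 15 + 107 = 122 thousand operations per 1000 iterations), so treat them as assumptions.

```python
# Computing work = sum over training phases of (sample size x iterations).
TOTAL = 107            # full database (given in the text)
ZONE1 = 92             # sets with moisture < 17% (inferred from Table 3)
ZONE2 = TOTAL - ZONE1  # sets with moisture > 17% -> 15 (inferred)

def computing_work(phases):
    """phases: list of (sample_size, iterations) pairs."""
    return sum(size * iters for size, iters in phases)

strategies = {
    "E1": [(TOTAL, 2000)],
    "E2": [(ZONE1, 1000), (TOTAL, 1000)],
    "E3": [(ZONE2, 1000), (TOTAL, 1000)],
    "E4": [(TOTAL, 1000), (ZONE1, 1000)],
    "E5": [(TOTAL, 1000), (ZONE2, 1000)],
}

work = {name: computing_work(phases) for name, phases in strategies.items()}
```

Note that the computing work is insensitive to the order of the phases (E2 vs E4, E3 vs E5); only the prediction error distinguishes them.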

section consisted of the data sets corresponding to solid moisture greater than 17% (zone 2 in Table 3).

To compare the fitness of each strategy, the SQR (square of the sum of the learning error), the model error, and the computing work were used. The computing work is defined as the product of the sample size and the number of iterations used in the training.

The order chosen to present the data sets to the ANN affects both the prediction error and the computing time. The ANNs trained with strategies E3, E4, and E5 are outperformed by those trained with strategies E1 and E2. Strategy E1 is very simple, and E2 is slightly better in view of its lower UCC and because it achieves a better approximation in fewer operations.

The effect of the order in which the learning data are presented on the ANN learning process can be attributed to the use of a sequential training mode. During sequential training, every time a new data set is shown to the ANN, the weights of the connections between nodes change. If the number of iterations required to train the ANN is large, the ANN learns the most recent data sets properly but can forget the older ones. This can also occur if the training is completed first with one type of data set and then with another.36 An interesting study of this issue was done by Joya et al.,37 who compared six different ways of showing the learning data. These authors observed that the predictions obtained with the ANN differed depending on the order in which the database was presented during training, and they explain these results in terms of the evolution of the ANN during its training. The sequential mode of training can lead to multiple optimum solutions; this could be avoided by using a batch training mode.
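The order sensitivity of sequential training discussed above can be shown with a minimal sketch: a single linear neuron trained by the delta rule on the same two hypothetical patterns in opposite orders ends one epoch with different weights, whereas a batch update (gradient accumulated over the whole epoch, applied once) is order-independent. The network, data, and learning coefficient here are illustrative, not those of the paper.

```python
def sequential_epoch(w, data, mu):
    """Delta-rule update applied immediately after each pattern (sequential mode)."""
    for x, d in data:
        o = w * x                    # linear neuron output
        w = w + mu * (d - o) * x     # weight changes before the next pattern
    return w

def batch_epoch(w, data, mu):
    """Gradient accumulated over the whole epoch, applied once (batch mode)."""
    grad = sum((d - w * x) * x for x, d in data)
    return w + mu * grad

data = [(1.0, 2.0), (3.0, 1.0)]      # hypothetical (input, target) pairs
mu = 0.1                             # illustrative learning coefficient

w_seq_ab = sequential_epoch(0.0, data, mu)
w_seq_ba = sequential_epoch(0.0, list(reversed(data)), mu)
w_bat_ab = batch_epoch(0.0, data, mu)
w_bat_ba = batch_epoch(0.0, list(reversed(data)), mu)
```

After one epoch the sequential weights differ between the two presentation orders, while the batch weights coincide, which is the mechanism behind the E2-E5 differences in Table 3.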
Nevertheless, the sequential mode is widely used because it is simple to implement, requires less local storage, and provides effective solutions to large and difficult problems.13

Final Comparison

Table 4 summarizes the main characteristics of the two ANNs, one trained by a basic classical method and the other with the optimization criteria proposed here. These criteria are based on a better sample-size selection, the optimization of the transfer-function parameters, and the data-presentation strategy. The model error of the optimized ANN is appreciably lower than that of the basically trained ANN. Moreover, it was achieved with less information (fewer sample data) and in a shorter computing time (fewer iterations). The total number of operations during the learning process is 8-9 times lower when the new optimization criteria are applied. Furthermore, the learning coefficient is lower than the one previously used (1.5 instead of 2), and it

Table 4. Comparison of Learning Methods and Optimization Criteria

                         basic training            optimized network
SS                       292                       107
data quality             low (nonrepresentative)   high (representative)
learning coefficient     2.0                       1.5
no. of hidden nodes      5                         4
sigmoid parameters       α = 1; β = 1              α = 3; β = 9
strategy (see Table 3)   E1                        E2
no. of iterations        5000                      2000
model error (%)          2.31                      1.57

is preferred, as suggested by Haykin,13 because it reduces the probability of overfitting, e.g., if the ANN is used as a part of a control system with online training.

Conclusions

1. The dynamics of the drying of alperujo in the FBD can be modeled by an ANN with four inputs (time, bed temperature, inlet air temperature, and solid moisture) and one output (the moisture of the solid every 30 s).

2. The accuracy of the model is rather satisfactory, and the model error is of the same order as the experimental error (±1%).

3. The optimal hidden layer of the ANN has four nodes, the best learning coefficient is 1.5, and the optimal sigmoid parameters are α = 3 and β = 9. The best data-presentation strategy is strategy E2.

4. A method has been developed to adjust the size of the primary experimental information and to produce different sets of highly meaningful data with a known and limited information loss. This method is based on fitting the experimental data to eq 8, interpolating new data from the fitted curves, and selecting training data by the algorithm of tangents. The proposed method may be useful to study the effects of the sample size on the learning process and to know the relative importance of each datum. Furthermore, it contributes to reducing the computing time by allowing the use of fewer learning samples with known errors of interpolation.

5. Two criteria have been proposed to evaluate the quality of information (SDRLI and ASDRLI) and the profit the ANN makes of the learning data (UCC). These criteria are useful to find the optimum sample size.

6. The model error is a better index than the learning error to evaluate how good the ANN model is.

7. From the results and experience obtained during this study, the ASDRLI and UCC are the best indices to serve as selection criteria. In the case of controversy between the ASDRLI and UCC (e.g., if, comparing two ANNs, one has a better ASDRLI and the other a better UCC), it would be preferable to select the ANN with the better system description (i.e., the better ASDRLI).

8. The sigmoid parameters, α and β, serve to improve significantly the performance of the ANN and are useful tools to study the sensitivity of the ANN to the variables involved.

Acknowledgment

The study has been carried out with financial support from the Commission of the European Communities, Agriculture and Fisheries (FAIR) specific RTD program,


CT96-1420, “Improvements of Treatments and Validation of the Liquid-Solid Waste from the Two-Phases Olive Oil Extraction (IMPROLIVE)”. It does not necessarily reflect the Commission’s views and in no way anticipates its future policy in this area.

Nomenclature

a = parameter in eq 8
b = parameter in eq 8
c = parameter in eq 8
dj = desired output for neuron j
E = model error (%)
Ej = output error for neuron j
f′(oj) = first derivative of the sigmoid function at neuron j
H = moisture of the solid inside the fluidized bed (kg water/kg dry solid)
k = iteration counter
N = number of iterations
oj = output of neuron j
sj = sum of inputs of neuron j
SS = sample size, number of data sets
TA = temperature of the inlet air (°C)
TB = temperature of the fluidized bed (°C)
t = time (min)
wij(k) = weight connecting neuron i to neuron j at iteration k

Greek Characters

α = sigmoid function coefficient
β = sigmoid function coefficient
δ = delta, function of the back-propagated error and the sigmoid derivative
∆E = model error variation
µ = learning coefficient
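Tying the nomenclature together: assuming the standard back-propagation rule of Rumelhart et al. (ref 35), the symbols above combine as sketched below. The paper's sigmoid carries the extra parameters α and β, and its exact update is not reproduced in this section, so treat this as the conventional formulation rather than a quotation.

```latex
% Output error and delta for an output neuron j:
E_j = d_j - o_j, \qquad \delta_j = E_j \, f'(o_j)
% Delta for a hidden neuron j (sum over downstream neurons l):
\delta_j = f'(o_j) \sum_l \delta_l \, w_{jl}(k)
% Weight update at iteration k, with learning coefficient \mu:
w_{ij}(k+1) = w_{ij}(k) + \mu \, \delta_j \, o_i
```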

Literature Cited

(1) Viswanathan, K. Model for Continuous Drying of Solids in Fluidized/Spouted Beds. Can. J. Chem. Eng. 1986, 64 (2), 87.
(2) Ulkü, S.; Uckan, C. Corn Drying in Fluidized Beds. In Proceedings of Drying ’86; Mujumdar, A. S., Ed.; McGraw-Hill-Hemisphere: New York, 1987.
(3) Donsi, G.; Ferrari, G. Modeling of Two-Component Fluid Bed Dryers: An Approach to the Evaluation of the Drying Time. In Proceedings of Drying ’92; Mujumdar, A. S., Ed.; Elsevier Science Publishers: New York, 1992.
(4) Lai, S. F.; Yiming, C.; Fan, L. T. Modeling and Simulation of a Continuous Fluidized Bed Dryer. Chem. Eng. Sci. 1986, 41 (9), 2419.
(5) Zahed, A. H.; Zhu, J. X.; Grace, J. R. Modeling and Simulation of Batch and Continuous Fluidized Bed Dryers. Drying Technol. 1995, 13 (1-2), 1.
(6) Hoebink, J. H. B. J.; Rietema, K. Drying Granular Solids in Fluidized Bed. I: Description on Basis of Mass and Heat Transfer Coefficients. Chem. Eng. Sci. 1980, 35, 2135.
(7) Paláncz, B. A Mathematical Model for Continuous Fluidized Bed Drying. Chem. Eng. Sci. 1983, 38 (7), 1045.
(8) Panda, R. C.; Rao, S. R. Fluidized Bed Dryers: Dynamic Modeling and Control. Chem. Eng. Technol. 1991, 14, 307.
(9) Panda, R. C.; Rao, S. R. Dynamic Model of a Fluidization Bed Dryer. Drying Technol. 1993, 11, 589.
(10) Thibault, J.; Grandjean, B. P. A. Neural Networks in Process Control: A Survey. In Proceedings of the IFAC Symposium ADCHEM ’91, Toulouse, France, 1991; Pergamon Press: New York, 1991; p 295.
(11) Narendra, K. S.; Parthasarathy, K. Identification and Control of Dynamical Systems Using Neural Networks. IEEE Trans. Neural Networks 1990, 1 (1), 4.
(12) Suykens, J. A. K.; Vandewalle, J. P. L.; De Moor, B. L. R. Artificial Neural Networks for Modeling and Control of Non-Linear Systems; Kluwer: Dordrecht, The Netherlands, 1996.
(13) Haykin, S. Neural Networks: A Comprehensive Foundation; Prentice Hall International Editions: Upper Saddle River, NJ, 1999.
(14) Baratti, R.; Vacca, G.; Servida, A. Neural Network Modeling of Distillation Columns. Hydrocarbon Process. 1995, June, 35.
(15) Cubillos, F. A.; Lima, E. L. Adaptive Hybrid Neural Network Models for Process Control. Comput. Chem. Eng. 1998, 22, S989.
(16) Schuhen, M.; Koehne, M. Predictive Control of Wastewater Cleaning Processes in Sewage Treatment Plants by Hybrid Neural Networks. Int. Wiss. Kolloq. Kaffee 1997, 290.
(17) Syu, M. J.; Chen, B. C. Back-Propagation Neural Network Adaptive Control of a Continuous Wastewater Treatment Process. Ind. Eng. Chem. Res. 1998, 37 (9), 3625.
(18) Milanič, S.; Šel, D.; Strmčnik, R. K. Applying Artificial Neural Network Models to Control a Time Variant Chemical Plant. Comput. Chem. Eng. 1997, 21, S637.
(19) Breusegen, V.; Thibault, J.; Chéruy, A. Adaptive Neural Models for On-Line Prediction in Fermentation. Can. J. Chem. Eng. 1991, 69, 481.
(20) Smith, B. A. A Tool for Process Optimization: Neural Network Software. Recycl.; Tappi Press: Atlanta, GA, 1996; p 163.
(21) Boot, R. C.; Roland, W. B. Neural Network-Based Combustion Optimization Reduces NOx Emissions While Improving Performance. Proc. Am. Power Conf. 1998, 60 (2), 667.
(22) Bhat, N.; McAvoy, T. J. Use of Neural Networks for Dynamic Modeling and Control of Chemical Process Systems. Comput. Chem. Eng. 1990, 14 (4-5), 573.
(23) Nahas, E. P.; Henson, M. A.; Seborg, D. E. Nonlinear Internal Model Control Strategy for Neural Network Models. Comput. Chem. Eng. 1992, 16 (12), 1039.
(24) Cheng, Y.; Karjala, T. W.; Himmelblau, D. M. Identification of Nonlinear Dynamic Processes with Unknown and Variable Dead Time Using an Internal Recurrent Neural Network. Ind. Eng. Chem. Res. 1995, 34, 1735.
(25) Palancar, M. C.; Aragón, J. M.; Torrecilla, J. S. pH-Control System Based on Artificial Neural Networks. Ind. Eng. Chem. Res. 1998, 37 (7), 2729.
(26) Wang, H.; Oh, Y.; Yoon, E. S. Strategies for Modeling and Control of Nonlinear Chemical Processes Using Neural Networks. Comput. Chem. Eng. 1998, 22, S823.
(27) Yeo, Y. K.; Kwon, T. I. A Neural PID Controller for the pH Neutralization Process. Ind. Eng. Chem. Res. 1999, 38, 978.
(28) Trelea, I. C.; Courtois, F.; Trystram, G. Dynamic Models for Drying and Wet-Milling Quality Degradation of Corn Using Neural Networks. Drying Technol. 1997, 15 (3-4), 1095.
(29) Jinescu, G.; Lavric, V. The Artificial Neural Networks and the Drying Process Modeling. Drying Technol. 1995, 13 (5-7), 1579.
(30) Huang, B.; Mujumdar, A. S. Use of Neural Network to Predict Industrial Dryer Performance. Drying Technol. 1993, 11 (3), 525.
(31) Balasubramanian, R. C.; Panda, V. S.; Rao, R. Modeling of a Fluidized Bed Drier Using Artificial Neural Network. Drying Technol. 1996, 14 (7-8), 1881.
(32) Tosi, E.; Ciappini, M.; Tapiz, L.; Masciarelli, R. Cinética del Secado de Maíz (Zea mays). Ecuación Empírica para el Secado en Lecho Fluidizado. Rev. Agroquím. Tecnol. Aliment. 1987, 27 (1), 60.
(33) Tosi, E.; Tamburelli, M.; Prado, R. Nota. Cinética del Secado de Soja (Glycine max). Ecuación Empírica para el Secado en Lecho Fluidizado. Rev. Agroquím. Tecnol. Aliment. 1990, 30 (1), 130.
(34) Ramasamy, S.; Deshpande, P. B.; Paxton, G. E.; Hajare, R. P. Consider Neural Networks for Process Identification. Hydrocarbon Process. 1995, June, 59.
(35) Rumelhart, D. E.; Hinton, G. E.; Williams, R. J. Learning Representations by Back-Propagating Errors. Nature 1986, 323, 533.
(36) Freeman, J. A.; Skapura, D. M. Neural Networks: Algorithms, Applications and Programming Techniques; Addison-Wesley: Reading, MA, 1991.
(37) Joya, G.; Frías, J. J.; Marín, M. M.; Sandoval, F. Evolución a Nivel Microscópico de una Red Neuronal Artificial: Nuevas Estrategias de Aprendizaje. Inf. Autom. 1994, 27 (2), 46.

Received for review November 6, 2000
Revised manuscript received September 26, 2001
Accepted January 29, 2002

IE000950T