Ind. Eng. Chem. Res. 2002, 41, 2543-2551


Integrated Genetic Algorithm-Artificial Neural Network Strategy for Modeling Important Multiphase-Flow Characteristics

Laurentiu A. Tarca, Bernard P. A. Grandjean,* and Faïçal Larachi

Department of Chemical Engineering & CERPIC, Laval University, Sainte-Foy, Québec, Canada G1K 7P4

Numerous investigations have shown that artificial neural networks (ANNs) can be successful for correlating experimental data sets of macroscopic multiphase-flow characteristics, e.g., holdup, pressure drop, and interfacial mass transfer. The approach proved its worth especially when rigorous fluid mechanics treatment based on the solution of first-principle equations is not tractable. One perennial obstacle facing correlations is the choice of a low-dimensionality input vector containing the most expressive dimensionless independent variables allowing the best correlation of the dependent output variable. Because no clue is known in advance, one has recourse to a laborious, often inefficient, and nonsystematic trial-and-error procedure to identify, from a broad reservoir of possible candidates, the most relevant combination of ANN input dimensionless variables. The combinatorial nature of the problem renders the determination of the best combination, especially for multiphase flows, computationally difficult because of the large scale of the search space of combinations. A methodology is devised in this work to cope with this computational complexity by illustrating the potential of genetic algorithms (GAs) to efficiently identify the elite ANN input combination required for the prediction of desired characteristics. The multiobjective function to be minimized is a composite criterion that includes the ANN prediction errors on both learning and generalization data sets, as well as a penalty function that embeds phenomenological rules accounting for ANN model likelihood and adherence to behavior dictated by the process physics. The proof of concept of the integrated GA-ANN methodology was illustrated using a comprehensive database of experimental total liquid holdup for countercurrent gas-liquid flows in randomly packed towers, from which the best liquid hold-up correlation was extracted.

1. Introduction

In recent years, soft computing using artificial neural networks (ANNs) has proved to be a pragmatic alternative for correlating multiphase-flow characteristics. ANN computing is indeed a powerful black-box approach used to map complex nonlinear system behaviors such as those exhibited by multiphase-flow macroscopic transport characteristics. ANN correlations are useful shortcuts especially when a rigorous theoretical treatment of the multiphase-flow problem from first principles is lacking or time-consuming. Successful examples where ANN computing was used for correlating heat, mass, and momentum transport in multiphase flow abound in the literature. To name just a few, ANN correlations were derived for the pressure gradient in distillation column1,2 and textile fabrics3 applications, for flooding inception and interfacial mass transfer in countercurrent random-packing towers,4,5 for mass-transfer applications in stirred tanks,6 trickle beds,7 and fast fluidized beds,8 for the displacement of water during infiltration of nonaqueous phase liquids in porous media,9 for holdups and wake parameters in gas-liquid-solid fluidization,10 for the prediction of the bubble diameter in bubble columns,11 and for improvement of the simulation of multiphase-flow behavior in pipelines.12 ANNs are useful modeling tools particularly if there exist pertinent wide-ranging data sets, or databases, of some input variables along with the respective values of the desired output characteristics. The traditional approach used until now to identify the most relevant ANN model inputs, generally dimensionless Buckingham Π groups, is a laborious trial-and-error procedure. It consists of choosing an arbitrary combination of inputs used for training, on a learning data set, several ANN models differing by the number of nodes in their hidden layer. The resulting models are further tested on a validation data set to evaluate their generalization performance. The ANN model retained among all of the simulated ones is the one that yields the smallest relative error on both training and generalization data sets. Next, the retained model is thoroughly tested for phenomenological consistency within the valid range of the working database to check whether it restores the trends expected from the known process physics. Any misbehavior disqualifies the choice, a new combination of inputs is chosen, and the search is resumed afresh. Until now, this time-consuming approach was not automated because the human expertise regarding phenomenological consistency was somewhat difficult to formulate mathematically into an optimization criterion. Beyond that, nothing ensures that this blind-search approach would successfully identify the most relevant set of dimensionless inputs. This is especially acute in multiphase-flow contexts where dimensionless groups abound and the combinatorial problem is explosive. Finding the best ANN model would become a matter of chance.

* Corresponding author. Tel.: 418-656-2859. Fax: 418-656-5993. E-mail: [email protected].
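For concreteness, the trial-and-error selection loop described above can be sketched as follows. The `error` callable is a stand-in for the full train-and-validate step, and the candidate group names and "optimal" pair are purely illustrative:

```python
from itertools import combinations

def select_ann(candidates, m, hidden_range, error):
    """Exhaustive trial-and-error selection sketch: evaluate one ANN per
    (input combination, hidden-node count) pair and keep the pair with the
    smallest combined training + validation error."""
    best = None
    for inputs in combinations(candidates, m):
        for J in hidden_range:
            e = error(inputs, J)        # stand-in for train + validate
            if best is None or e < best[0]:
                best = (e, inputs, J)
    return best

# Stand-in error: pretend inputs {"ReL", "FrG"} with J = 9 are optimal.
err = lambda inp, J: abs(J - 9) + len({"ReL", "FrG"} - set(inp))
best = select_ann(["ReL", "FrG", "WeL", "MoL"], 2, range(7, 12), err)
```

Because the outer loop enumerates every m-sized combination, this blind search scales combinatorially with the number of candidate groups, which is precisely what motivates the GA developed in this work.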

10.1021/ie010478r CCC: $22.00 © 2002 American Chemical Society Published on Web 04/18/2002


Table 1. Typical Structure of the Database for Applying the GA-ANN Methodology

    candidate inputs (independent variables)         output (dependent variable)
    CI1       CI2       ...       CIM                O
    --------------------------------------------------------------------------
    CI1,1     CI1,2     ...       CI1,M              O1
    CI2,1     CI2,2     ...       CI2,M              O2
    ...       ...       ...       ...                ...
    CIN,1     CIN,2     ...       CIN,M              ON

Assuming that the agreement between an ANN model and the expected physical evidence can be assessed automatically using an expert-system-like package of rules, the trial-and-error method would become suitable for a computer algorithm. However, the main problem of how to find the fittest input combinations remains when the evaluation of all of the combinations is CPU-intensive and impossible to perform within reasonable time limits. Genetic algorithms (GAs) have been successfully applied to such explosive combinatorial problems where high-quality solutions are needed within reduced search times. Based on the mechanisms of natural selection and natural genetics, GAs efficiently exploit the information from already evaluated input combinations, i.e., parent specimens, while ensuring good exploration of the search space.13 GAs and ANNs have been combined in several different ways. Mostly, GAs have been used to generate (i) the ANN connectivity weights,14 (ii) the ANN architecture,15 and (iii) both the ANN architecture and weights simultaneously.16 The present contribution is intended to provide an integrated GA-ANN methodology to facilitate and accelerate the development of an ANN regression model (three-layer perceptron type) on a given database. The GA implemented in this study is designed to identify the most relevant ANN input combinations, resulting in a neural model that minimizes a multiobjective criterion that includes the ANN prediction errors on the learning and generalization data sets and, most importantly, a penalty function that embeds the phenomenological rules accounting for ANN model likelihood. The integrated GA-ANN methodology is validated on a comprehensive liquid hold-up database of countercurrent randomly dumped packed towers with the aim of finding the best liquid hold-up ANN correlation.
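The composite criterion and its fitness transform (formalized later as eqs 3 and 4) can be sketched as follows; the function names and the value of C are illustrative, and the default multipliers are the values the authors retain in section 4.2:

```python
def composite_criterion(aare_train, aare_gen, ppc, alpha=0.8, beta=0.2, gamma=0.05):
    """Composite criterion Q of eq 3b: weighted training error, weighted
    generalization error, and a phenomenological-consistency penalty
    (number of behavioral rules the candidate ANN model violates)."""
    return alpha * aare_train + beta * aare_gen + gamma * ppc

def fitness_values(q_values, C=1.1):
    """Eq 4: linear transform turning minimization of Q into maximization,
    with C > 1 keeping all fitness values positive."""
    q_max = max(q_values)
    return [C * q_max - q for q in q_values]

# Two hypothetical specimens: an accurate, rule-abiding model versus a
# slightly more accurate model that violates two phenomenological rules.
qs = [composite_criterion(0.12, 0.15, 0), composite_criterion(0.10, 0.14, 2)]
fits = fitness_values(qs)
```

Note how the penalty term lets the rule-abiding specimen win the fitness comparison even though its raw errors are marginally larger.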
The aforementioned multiphase-flow problem is akin to problems in areas where database mining using GAs and soft computing using neural networks have been applied.17

2. Problem Statement

Given, as in Table 1, a sufficiently large database (N occurrences) in the form of dimensionless Buckingham Π groups, where M candidate Π groups, CI1, CI2, ..., CIM (N > M ≫ 1), redundantly embed the physical and operational parameters stemming from a process, find an ANN model that uses as inputs only a subset of m pertinent Π groups among the M ones to predict an output O, a key process characteristic. The three-layer ANN model to identify, in the case of a single output, is described by the following set of equations18 (using normalized data and sigmoidal activation functions):

$$O = \frac{1}{1 + \exp\left(-\sum_{j=1}^{J+1} w_j H_j\right)} \qquad (1)$$

with

$$H_j = \frac{1}{1 + \exp\left(-\sum_{i=1}^{m+1} w_{ij} I_i\right)} \quad \text{for } 1 \le j \le J \qquad (2)$$

where m is the number of inputs, Ii = CIS(i) ∈ {CI1, CI2, ..., CIM} for 1 ≤ i ≤ m, with S an m-input selector; J is the number of hidden neurons in the hidden layer; Im+1 = HJ+1 = 1 are the biases; and wj and wij are the connectivity weights. The model is expected to fulfill the following requirements:

(i) Accuracy: The model must be very accurate in prediction, preferably to the level of the experimental error with which the output is measured.

(ii) Phenomenological Consistency: The model must preserve, at least within the documented domain of the database, the expected behavior of the output in accordance with all known aspects of the process physics.

(iii) Low Complexity: The ANN model must preferably involve a minimal number of inputs (m Π groups) and hidden neurons (J), resulting correspondingly in a minimal number of connectivity weights (the [mJ + 2J + 1] neural fitting parameters).

3. Problem Solving Methodology

A usual approach to solving multiobjective problems consists of optimizing a primary response function while turning the other functions into constraints.19 The GA practice, on the contrary, consists of optimizing a composite objective function that sanctions violations of the restrictions by means of the penalty method.13 In our problem, we preferred to mix both approaches. Table 2 reports, in hierarchical order, the parameters to be identified and how their searches have been managed and integrated. Because the number of inputs, m, and the number of hidden nodes, J, must desirably be low (to minimize model complexity), they have been varied by discrete sweeps over selected ranges. From our past experience in neural modeling on data of interest, m has been varied in the range 4-6 and J in the range 2m - 1 to 2m + 3.20 The determination of the input selector S consists of the identification of m pertinent inputs among the M available ones.
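As a minimal sketch, the forward pass of eqs 1 and 2 can be written in NumPy as follows (the weights are random placeholders, not a trained model; m = 5 and J = 12 reproduce the mJ + 2J + 1 = 85 weights of the model retained later in this work):

```python
import numpy as np

def ann_forward(inputs, w_hidden, w_out):
    """Three-layer perceptron of eqs 1-2: sigmoidal hidden layer and
    sigmoidal output, with bias terms I_{m+1} = H_{J+1} = 1."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    x = np.append(inputs, 1.0)       # I_{m+1} = 1 (input-layer bias)
    h = sigmoid(w_hidden @ x)        # H_j for 1 <= j <= J        (eq 2)
    h = np.append(h, 1.0)            # H_{J+1} = 1 (hidden-layer bias)
    return sigmoid(w_out @ h)        # O                          (eq 1)

m, J = 5, 12
rng = np.random.default_rng(0)
w_hidden = rng.normal(size=(J, m + 1))   # w_ij, bias column included
w_out = rng.normal(size=J + 1)           # w_j, bias weight included
O = ann_forward(rng.random(m), w_hidden, w_out)   # normalized output in (0, 1)
```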
It is a combinatorial and tedious task because the search space of combinations is large and the solution S must provide both phenomenological consistency and accuracy in the resulting ANN model. Few search techniques exist (enumerative search, random walk, simulated annealing, and GAs) that allow solutions to be found over discrete domains using only the value of the function at different points of the domain. Because of its robustness13 and natural appeal, the GA technique is employed in this work to search for the best input selector S. The connectivity weights wij and wj are adjusted by minimizing the sum of squares of the prediction errors on part of the data, referred to as the training data set,


Figure 1. Logical flow diagram for the GA-ANN methodology.

using the Broyden-Fletcher-Goldfarb-Shanno variable metric method.21 The remaining part of the data is used to evaluate the generalization capability of the ANN model. These steps are processed by means of the slave software NNFit18 and will not be detailed in this work because they have been discussed previously.4,5,7,10 The integrated GA-ANN procedure used to handle the problem is presented in Figure 1 and is detailed in the following sections.

3.1. GA Encoding Solutions.

The GA approach requires a string representation of the m-input selector, S. In our context, S is a selection of indices representing some of the candidate input variables of the database sketched in Table 1. The encoding modality chosen was M-sized bit strings allowing only m "one-bit" values per string (Figure 2). In this binary representation of solutions, M corresponds to the total number of candidate input variables (or input columns) in the database (Table 1). The "1" at a given rank of the string stands

for an input being selected that occupies the same rank in the database. Conversely, the "0" stands for an input variable being discarded.

3.2. Multiobjective Criterion and Fitness Function.

To identify the best input selector S and its related ANN model fulfilling the two requirements i and ii of

Figure 2. Bit string representation of m-input selector S.


Figure 3. Stepwise construction of the composite criterion, Q(S).

Table 2. Parameter Identification Strategy^a

    parameter to identify    search method                                objective function
    ---------------------------------------------------------------------------------------------
    m                        trial and error in the range mmin to mmax    expert decision
    S                        GA, using binary bit strings                 multiobjective fitness (eq 4)
    J                        trial and error in the range Jmin to Jmax    multiobjective criterion (eq 3)
    wj, wij                  BFGS variable metric method                  least squares on prediction errors

a m = number of inputs. S = input selection operator. J = number of hidden nodes. wj, wij = connectivity weights.

section 2, the following composite criterion, Q, was formulated:

$$Q(S) = \min \{Q_J(S),\; J_{min} \le J \le J_{max}\} \qquad (3a)$$

with

$$Q_J(S) = \alpha\,\mathrm{AARE}[\mathrm{ANN}_J(S)]_T + \beta\,\mathrm{AARE}[\mathrm{ANN}_J(S)]_G + \gamma\,\mathrm{PPC}[\mathrm{ANN}_J(S)] \qquad (3b)$$

In eq 3, α, β, and γ are conveniently selected weighting multipliers. AARE[ANNJ(S)]T is the average absolute relative error that the ANN (having J hidden nodes) achieves on the training data set for a given input combination (or specimen) S. Equivalently, AARE[ANNJ(S)]G measures how accurate the ANN model is on the generalization data set that was left aside when the neural connectivity weights were optimized on the training set. Inclusion in the composite criterion of a penalty for phenomenological consistency, PPC[ANNJ(S)], ideally guarantees that the model is not prone to violate the expected behavior of the simulated output. By expected behavior is meant an ensemble of prescribed behavioral rules known to govern the phenomenon of interest,

and which are embedded, as will be shown in section 4.2, in the term PPC[ANNJ(S)]. Ideally, the penalty term is zero if the topological features of the ANN function meet all of the rules. The multipliers α, β, and γ provide versatility in targeting models that better fit the training data set or models that generalize better while satisfying, through the PPC term, the phenomenological consistency to various degrees. The stepwise construction of the criterion Q(S) is shown in Figure 3. In GA practice, fitness maximization is preferred to the classical minimization problem. Hence, the better the solution, the greater its fitness value. Because every minimization problem can be turned into a maximization problem, the composite criterion Q(S) can easily be switched into a fitness function using the simple linear transformation:22

$$\mathrm{Fitness}(S) = C\,Q_{max} - Q(S) \qquad (4)$$

where C is a conversion coefficient greater than 1 that ensures positive fitness function values and Qmax is the maximum value of Q among the population of MAXPOP specimens S.

3.3. Building the First Generation.

Starting with a null M-sized string (all bits are zero), each m-input S


Figure 5. Unsuccessful trial to produce a 4 one-bit valued specimen in the first step of the modified two-point crossover.

Figure 4. Successful trial to produce 4 one-bit valued specimens in the modified two-step two-point crossover.

specimen of the first generation is built by randomly and equiprobably turning m of the M zeros into ones. The operation is repeated MAXPOP times. A uniform random number generator based on the Knuth subtractive method21 was used throughout this work.

3.4. Making the Next Generations.

Once the initial population becomes available, it is allowed to evolve in order to identify better specimens that maximize the fitness function (eq 4). The evolution process rests on the so-called reproduction, recombination (crossover), and mutation operators pioneered in the area of artificial systems by Holland.23

(a) Reproduction Operator. The purpose of this operator is to ensure that the fittest specimens perpetuate through offspring and/or have greater chances of appearing in the next generation. Numerous schemes are known that introduce various levels of determinism into the selection process. Among them, three have been tested in this work: roulette wheel selection with elitism, stochastic remainder selection without replacement, and stochastic remainder selection without replacement with elitism.13,24 The third method was the one finally retained in our GA.

(b) Modified Recombination Operator. No matter how perfect it is, reproduction does not create new, better specimens, whereas recombination and mutation do. The recombination operator combines useful features from two different specimens, yielding an offspring. For instance, a classical two-point crossover recombination, taking a random start point and length for the selected substring, would produce from two parent specimens two new children by simply interchanging a selected region in specimen A with the corresponding region in specimen B. Though this recombination proves efficient for unconstrained GAs,24,25 it is unsuitable in our context because the compulsory m one-bit values in the specimens are not automatically preserved during the parent-offspring transition.
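The first-generation construction of section 3.3 can be sketched as follows (Python's `random.sample` stands in for the Knuth subtractive generator used by the authors):

```python
import random

def random_specimen(M, m):
    """One first-generation specimen: an M-bit string holding exactly m
    ones, placed equiprobably at random (section 3.3)."""
    bits = [0] * M
    for pos in random.sample(range(M), m):  # m distinct random positions
        bits[pos] = 1
    return bits

def first_generation(M, m, maxpop):
    """Repeat the construction MAXPOP times to obtain the initial population."""
    return [random_specimen(M, m) for _ in range(maxpop)]

# Dimensions used later in this work: M = 27 candidate inputs, m = 5.
population = first_generation(M=27, m=5, maxpop=50)
```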
The crossover operator was modified to ensure conservative passage of the fixed m one-bit values to the offspring. This was done by splitting the crossover operation into two distinct steps. In the first step (Figure 4), a substring in specimen A (with random length and start point) is selected and transferred into specimen B at the same location. The resulting child, specimen C, is retained provided it possesses m one-bit values like its parents (Figure 4) and is distinct from them. If not (Figure 5), the first step

Figure 6. m-conservative modified mutation: case of a "0 → 1" mutation followed by a repair mutation.

is repeated until the condition is satisfied or the number of trials exceeds a given value. The second step is then performed to create the second offspring, specimen D (Figure 4). This step is identical to the first one, except that now a substring in specimen B is transferred into A. A synthetic description of the recombination operator acting on the whole population of specimens issued from the reproduction adheres to the following steps: (i) Randomly split into two equal sets the population obtained at the end of the reproduction step. (ii) Take all pairs of specimens having the same order number in each part and simulate a coin toss weighted with the crossover probability pc. (iii) If the coin shows "true", apply a modified crossover as described earlier.

(c) Modified Mutation Operator. Mutation prevents permanent loss of useful information and maintains diversity within the population. A specimen is altered by mutation with a low probability pm. Classical mutation consists of changing the value of one single bit at a randomly chosen position in the string. As in the case of crossover, classical mutation is not m one-bit conservative. Therefore, to allow new features to be introduced into the specimens while maintaining the m one-bit structure of the strings, a two-step mutation, namely, mutation and repair mutation, was defined. The repair mutation merely does the inverse of what the mutation did by acting on another randomly chosen opposite bit value in the same specimen, restoring the constant number of ones in the string. For example, if the mutation is 0 → 1, then the repair mutation is 1 → 0 on a different, randomly chosen one-bit value in the specimen, as shown in Figure 6. The following is a summary of operations when applying modified mutation: (i) A coin toss weighted with a mutation probability pm for each bit is simulated for all of the MAXPOP specimens of the population.
(ii) If the coin shows "true", mutation and repair mutation are applied, after which the algorithm skips to the next specimen, authorizing just one operation per specimen. The mutation probability must be kept very low, because too many mutations may erase useful parts of the combinations, thus rendering the search directionless.
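A minimal sketch of the m-conserving crossover and mutation operators described above (function names and the trial budget are illustrative):

```python
import random

def modified_crossover(a, b, max_trials=20):
    """First step of the modified two-point crossover (section 3.4b): a
    random substring of parent A is transferred into parent B at the same
    location; the child is kept only if it still carries m one-bits and
    differs from both parents."""
    m = sum(a)
    for _ in range(max_trials):
        start = random.randrange(len(a))
        end = random.randrange(start, len(a))
        child = b[:start] + a[start:end + 1] + b[end + 1:]
        if sum(child) == m and child != a and child != b:
            return child
    return None  # no feasible child found within the trial budget

def mutate_with_repair(s):
    """Modified mutation (section 3.4c): flip one randomly chosen bit, then
    'repair' by flipping an opposite-valued bit elsewhere, so that the
    number of ones in the string is conserved."""
    s = s[:]
    pos = random.randrange(len(s))
    s[pos] ^= 1
    # Repair: flip another bit that now has the same value as the mutated one.
    others = [i for i in range(len(s)) if i != pos and s[i] == s[pos]]
    s[random.choice(others)] ^= 1
    return s

random.seed(1)
parent_a = [1, 1, 1, 0, 0, 0, 0, 0]   # m = 3 one-bits
parent_b = [0, 0, 0, 0, 0, 1, 1, 1]
child = modified_crossover(parent_a, parent_b)
mutant = mutate_with_repair(parent_a)
```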


General Remarks on the Constrained GA. The parameters needed to run a GA are the population size, MAXPOP, and the crossover and mutation probabilities, pc and pm. The choice of these parameters is important for the GA's global efficiency. The parameter set MAXPOP = 50, pc = 0.6, and pm = 0.003 was used in this work; it was inspired by the general recommendations of De Jong24 and adapted to the peculiarity of our constrained GA by trial and error. Regarding the issue of constraining the specimens to m nonnull bit strings, one could have argued for an alternative route in which extra penalty terms in the fitness function would prevent larger m-input combinations from dominating the population. However, from an efficiency standpoint, unconstrained GAs would have been less appropriate. One obvious reason is that searching among $\sum_{m=m_{min}}^{m_{max}} C_M^m$ solutions ($m_{min} > 1$ and $m_{max} < M$) is far more efficient than searching among the whole $\sum_{m=1}^{M} C_M^m$ combinations. Moreover, inclusion of penalty terms in the objective function is worthwhile only if, within the interrogation space, the feasible regions are larger than the unfeasible ones.26 The fact that in our case $\sum_{m=1}^{M} C_M^m \gg \sum_{m=m_{min}}^{m_{max}} C_M^m$ means that unconstrained GAs would have spent most of the time evaluating unfeasible solutions. Combinations that are too small or too large are also unfeasible solutions: too low m values do not yield accurate ANN models, whereas if m is too large, the resulting ANN models are cumbersome and thus not acceptable. The tradeoff between low and high m values is problem-dependent, and m has to be specifically tailored as explained in Figure 1 and Table 2.
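With the database dimensions used later in this work (M = 27 candidate inputs, m swept over 4-6; see section 4), the size argument can be made concrete:

```python
from math import comb

M = 27                                                    # candidate input columns
constrained = sum(comb(M, m) for m in range(4, 7))        # m restricted to 4-6
unconstrained = sum(comb(M, m) for m in range(1, M + 1))  # all sizes, = 2**M - 1
ratio = unconstrained / constrained                       # speedup factor from the constraint
```

The constrained search space is roughly 340 times smaller than the unconstrained one, and comb(27, 5) reproduces the "ca. 81 000 combinations" cited in section 4.1 for m = 5.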

4. Results and Discussion

The proof of concept of the integrated GA-ANN methodology will be illustrated using a comprehensive database concerning the total liquid holdup for countercurrent gas-liquid flows in randomly packed towers. Recall that the goal behind this approach is to identify the liquid hold-up ANN model that best satisfies the three requirements summarized in section 2. The data-mining role of the GA consists of interrogating a broad reservoir of M input vectors to enable the extraction of an elite of m inputs best mapping, through the ANN, the (hold-up) output.

4.1. Brief Overview of the Liquid Hold-Up Database.

A large liquid hold-up database (1483 experimental points) set up in a recent study27 was reorganized by converting all of the physical properties and operating parameters relevant to the modeling of liquid holdup into M = 27 dimensionless Buckingham Π groups (or candidate inputs) according to the Table 1 format. These groups, listed below, cover all possible force ratios or external effects the liquid hold-up might experience in randomly packed beds.

Liquid phase: Reynolds (ReL), Blake (BlL), Froude (FrL), Weber (WeL), Morton (MoL), Eötvös (EoL), modified Eötvös (EoL′), Galileo (GaL), modified Galileo (GaL′), Stokes (StL), modified Stokes (StL′), Capillary (CaL), and Ohnesorge (OhL).

Gas phase: ReG, BlG, FrG, GaG, GaG′, StG, and StG′.

Solid phase: wall factors K1, K2, and K3 and bed parameters B and SB.

Two-phase: Lockhart-Martinelli number (χ) and Saberian number (Sa).

Details as to how these groups are defined and which force ratios they include were described at length elsewhere27 and are skipped here for the sake of brevity. The database thus constructed has N = 1438 rows and M = 27 columns of candidate inputs. The best m-input selector, S, to be identified must contain a minimum number of elements, m, and has to be found among all possible combinations of the M = 27 input columns. To appreciate how computationally laborious this task can be, for m = 5 the combinatorial size, ca. 81 000 combinations, would require 84 CPU days on a dual 800 MHz processor to identify the optimal ANN model using an enumerative technique.

4.2. Evaluation of the PPC Term and Choice of the α, β, and γ Multipliers.

To run the GA-ANN procedure, the penalty for phenomenological consistency appearing in the composite criterion (eq 3) needs to be formulated. As mentioned in section 3.2, this term must embed prescribed behavioral rules which force the ANN model to behave in a proper manner. Such rules are inferred after tedious expert-system analyses that combine (i) thorough inspection of the trends exhibited by the liquid holdup in the database, (ii) consensual observations from the literature, and (iii) any qualitative and quantitative information revealed by first-principle-based phenomenological models in the field such as, for example, the Billet and Schultes liquid hold-up models in the preloading and loading regions.28,29 As a result, the following six rules can be stated in the form of inequalities (for the symbols, see the Glossary section):

$$\frac{\partial \varepsilon_L}{\partial \rho_G} > 0 \qquad (5)$$

$$\frac{\partial \varepsilon_L}{\partial \rho_L} < 0 \qquad (6)$$

$$\frac{\partial \varepsilon_L}{\partial \sigma_L} > 0 \qquad (7)$$

$$\frac{\partial \varepsilon_L}{\partial u_L} > 0 \qquad (8)$$

$$\frac{\partial \varepsilon_L}{\partial \mu_L} > 0 \qquad (9)$$

$$\frac{\partial \varepsilon_L}{\partial u_G} > 0 \qquad (10)$$

Each gradient was evaluated at two points chosen in the vicinity of the interval edges in the database for each physical variable appearing in eqs 5-10 (see ref 27 for database limits). For the six physical variables FG, FL, σL, uL, µL, and uG, the gradient conditions are considered fulfilled if they prove true simultaneously at the two points near the edges of the corresponding valid intervals. To better distinguish between models, the gradient conditions were equivalently recast into 10 rules:

$$\left.\frac{\partial \varepsilon_L}{\partial \rho_G}\right|_1 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial \rho_G}\right|_2 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial u_G}\right|_1 > 0 \qquad (11)$$

$$\left.\frac{\partial \varepsilon_L}{\partial \rho_G}\right|_1 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial \rho_G}\right|_2 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial u_G}\right|_2 > 0 \qquad (12)$$

$$\left.\frac{\partial \varepsilon_L}{\partial \rho_L}\right|_1 < 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial \rho_L}\right|_2 < 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial u_G}\right|_1 > 0 \qquad (13)$$

$$\left.\frac{\partial \varepsilon_L}{\partial \rho_L}\right|_1 < 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial \rho_L}\right|_2 < 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial u_G}\right|_2 > 0 \qquad (14)$$

$$\left.\frac{\partial \varepsilon_L}{\partial \sigma_L}\right|_1 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial \sigma_L}\right|_2 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial u_G}\right|_1 > 0 \qquad (15)$$

$$\left.\frac{\partial \varepsilon_L}{\partial \sigma_L}\right|_1 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial \sigma_L}\right|_2 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial u_G}\right|_2 > 0 \qquad (16)$$

$$\left.\frac{\partial \varepsilon_L}{\partial u_L}\right|_1 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial u_L}\right|_2 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial u_G}\right|_1 > 0 \qquad (17)$$

$$\left.\frac{\partial \varepsilon_L}{\partial u_L}\right|_1 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial u_L}\right|_2 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial u_G}\right|_2 > 0 \qquad (18)$$

$$\left.\frac{\partial \varepsilon_L}{\partial \mu_L}\right|_1 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial \mu_L}\right|_2 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial u_G}\right|_1 > 0 \qquad (19)$$

$$\left.\frac{\partial \varepsilon_L}{\partial \mu_L}\right|_1 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial \mu_L}\right|_2 > 0 \;\&\; \left.\frac{\partial \varepsilon_L}{\partial u_G}\right|_2 > 0 \qquad (20)$$

Each index indicates the point where the gradient was evaluated: 1 at the beginning and 2 at the end of the valid range of each physical variable, while "&" stands for the logical AND. A scale from 0 to 10, measuring the extent of disagreement with physical evidence, is then established to quantify how many rules are violated by a given ANN model. The PPC term is expressed simply as the number of rules transgressed by an ANN having J hidden nodes and using input selector S. If no rules are transgressed by the ANN model, PPC[ANNJ(S)] = 0 and the model has no penalty. Conversely, if the model violates all of the rules, the penalty is maximum, i.e., equals 10. The multipliers α and β, i.e., the weighting coefficients of the training and generalization AAREs (eq 3b), were assigned the values 0.8 and 0.2, respectively. These values corresponded to the proportions in which the initial database was split into training and generalization sets. Several values for the penalty coefficient γ were tested, and a value of 0.05 was retained.

4.3. GA Optimization through Generations.

As an example of the evolution of the performance of a population through successive generations, Figure 7 reports the evolutions of both the average criterion of the population and the minimum criterion, which occurs for the best specimen. The average criterion measures how well the population, as an ensemble, is doing, as well as how fast it is converging to the optimal solution. The minimum criterion indicates how well the GA has performed in finding a minimum-cost solution.30 In Figure 7, the sharp decrease occurring at the 22nd generation is related to the first creation of a fully phenomenologically consistent ANN model (i.e., PPC[ANNJ(S)] = 0).

Figure 7. Best and population-averaged criterion Q(S) in a typical GA run searching ANN models to predict liquid holdup for m = 5, Jmin = 9, and Jmax = 13. Computational system specification: dual CPU, 1000 MHz; operating system, Linux; computation time, 8 h.

Figure 8. Evolution of the best criterion for various numbers of ANN inputs, m.

4.4. Identification and Description of the Solution Model.

The exposed methodology implies a systematic search with the GA of ANN models for several values of m, i.e., the number of nodes in the ANN input layer, with the objective of choosing the model with the least complexity, full phenomenological consistency, and the best accuracy. A search was conducted by launching GA runs for m = 4-6. The evolution through generations of the best criterion is illustrated in Figure 8. It is observed that the first occurrence of a fully phenomenologically consistent model arises, for m = 4 and 6, after 3 and 6 generations, respectively. After 22 generations, the penalty term, PPC, becomes zero for the three cases. The criterion then reduces to the weighted sum of the AAREs on both training and generalization data sets, and it is observed that the ANN models present lower criterion values with increasing m. Considering that there is no significant improvement in prediction performance between m = 5 and 6, we suggest retaining the model having less complexity, i.e., that for m = 5. The best ANN model found with m = 5 involves J = 12 hidden neurons and is expressed as the function εL = ANN(BlG, WeL, StL′, K2, K3). It involves



Figure 9. Parity chart of the ANN model εL = ANN(BlG, WeL, StL′, K2, K3) for the learning (●) and the generalization (○) data sets. Dotted lines represent ±30% envelopes.

significant dimensionless numbers describing the liquid, gas, and solid phases, thus making the model applicable to predicting liquid holdup for different types of beds and fluids. The model AARE is 12.8% on the whole database; the standard deviation is 11.7%. The model requires only 85 connectivity weights and, most importantly, fulfills all 10 imposed rules given by eqs 11-20. The parity chart of the ANN model, shown in Figure 9, reveals good agreement between experimental and predicted liquid hold-up data, with an almost uniform scatter of the data around the parity line. The performance of the model identified using the integrated GA-ANN procedure is quite similar to, but slightly better than, that reported by Piché et al.27

5. Conclusion

A GA-based methodology was proposed to determine the best three-layer feed-forward ANN correlation for mapping a desired output variable when a large number of candidate independent input variables may influence the output. The integrated GA-ANN methodology operates with a multiobjective function to be minimized that includes the ANN prediction errors on both learning and generalization data sets, as well as a penalty function that embeds phenomenological rules accounting for ANN model likelihood and adherence to behavior dictated by the process physics. This methodology was tested, and found to be successful, using a comprehensive database of experimental total liquid holdup for countercurrent gas-liquid flows in randomly packed towers, from which the best liquid hold-up correlation was extracted. In this example, the gain is tremendous because the GA-ANN strategy converged to an accurate and consistent ANN model within several hours rather than the few weeks required without GAs. Because of the methodology's design and the GAs' population-based approach, the search process is inherently parallel.
It is expected that implementing the suggested integrated GA-ANN methodology on parallel-processing computers could make the development of ANN models extremely fast, even with very large databases.
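Because each candidate selector's ANN is trained independently, the population can be evaluated concurrently. The sketch below is a minimal illustration of that idea; `evaluate_selector` is a hypothetical stand-in for the actual train-and-score step, and a thread pool is used only for brevity (a process pool would suit CPU-bound ANN training).

```python
# Minimal sketch of population-level parallel fitness evaluation.
# `evaluate_selector` is a placeholder: in practice it would train an
# ANN on the inputs chosen by the selector and return its criterion Q.
from concurrent.futures import ThreadPoolExecutor

def evaluate_selector(selector):
    # Stand-in fitness: count of selected inputs (illustration only).
    return sum(selector)

def evaluate_population(population, workers=4):
    """Evaluate all selectors concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate_selector, population))
```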

Nomenclature

aT = bed specific surface area (m2/m3)
AARE = absolute average relative error
ANNj(S) = ANN having j hidden nodes that uses the m-input selector S
Bl = Blake number = ρGuG/[aT(1 − ε)µG]
C = scaling coefficient in the linear conversion of the criterion into a fitness function
dPV = diameter of the sphere with a volume equivalent to that of the particle (m)
DC = column diameter
Hj = activation function of neuron j in the hidden layer
Ii = input variable representing a column in the database
J = number of nodes in the hidden layer
Jmax = maximum number of nodes in the hidden layer
K2 = wall factor, K2 = 2(dPV/DC)
K3 = wall factor, K3 = (dPV/DC)(Z/DC)
m = number of ANN inputs selected by S
M = number of input columns in the database
MAXPOP = size of the population
N = number of records in the database
O = output variable of interest for a particular problem
PPC = number of phenomenological rules an ANN model violates
Q(S) = value of the criterion for S
S = m-input selector
StL′ = modified Stokes number = µLuLaT2/(ρLg)
u = phase velocity (m/s)
wij, wj = ANN connectivity weights
We = Weber number = uL2ρLdP/(2σL)
Z = bed height

Greek Letters

εL = total liquid hold-up
ε = bed porosity
µ = phase viscosity (kg/m·s)
ρ = phase density (kg/m3)
σ = phase surface tension (N/m)

Abbreviations

ANN = artificial neural network
GA = genetic algorithm

Subscripts

exp = experimental
G = gas, generalization
G+T = generalization and training
L = liquid
max = maximum
min = minimum
pred = predicted
T = training
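The dimensionless groups listed above can be computed directly from raw physical properties. The helper functions below are an illustrative sketch (variable names are assumptions; SI units throughout) that mirrors the definitions in the nomenclature term by term.

```python
# Illustrative computation of the ANN input dimensionless groups from
# raw physical properties (SI units). Function and argument names are
# assumptions for this sketch, not identifiers from the paper.

def blake(rho_g, u_g, a_t, eps, mu_g):
    """Blake number Bl = rho_G * u_G / (a_T * (1 - eps) * mu_G)."""
    return rho_g * u_g / (a_t * (1.0 - eps) * mu_g)

def stokes_mod(mu_l, u_l, a_t, rho_l, g=9.81):
    """Modified Stokes number StL' = mu_L * u_L * a_T**2 / (rho_L * g)."""
    return mu_l * u_l * a_t**2 / (rho_l * g)

def weber(u_l, rho_l, d_p, sigma_l):
    """Weber number We = u_L**2 * rho_L * d_P / (2 * sigma_L)."""
    return u_l**2 * rho_l * d_p / (2.0 * sigma_l)

def wall_factors(d_pv, d_c, z):
    """Wall factors K2 = 2*(d_PV/D_C) and K3 = (d_PV/D_C)*(Z/D_C)."""
    return 2.0 * d_pv / d_c, (d_pv / d_c) * (z / d_c)
```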

Literature Cited

(1) Whaley, A. K.; Bode, C. A.; Ghosh, J. G.; Eldridge, R. B. HETP and Pressure Drop Prediction for Structured Packing Distillation Columns Using a Neural Network. Ind. Eng. Chem. Res. 1999, 38, 1736-1739.
(2) Pollock, G. S.; Eldridge, R. B. Neural Network Modeling of Structured Packing Height Equivalent to a Theoretical Plate. Ind. Eng. Chem. Res. 2000, 39, 1520-1525.
(3) Brasquet, C.; Lecloirec, P. Pressure Drop Through Textile Fabrics: Experimental Data Modelling Using Classical Models and Neural Networks. Chem. Eng. Sci. 2000, 55, 2767-2778.
(4) Piché, S.; Larachi, F.; Grandjean, B. P. A. Flooding Capacity in Packed Towers: Database, Correlations and Analysis. Ind. Eng. Chem. Res. 2001, 40, 476-487.
(5) Piché, S.; Grandjean, B. P. A.; Iliuta, I.; Larachi, F. Interfacial Mass Transfer in Randomly Packed Towers: A Confident Correlation for Environmental Applications. Environ. Sci. Technol. 2001, 35, 4817-4822.
(6) Yang, H.; Fang, B. S.; Reuss, M. kLa Correlation Established on the Basis of a Neural Network Model. Can. J. Chem. Eng. 1999, 77, 838-843.
(7) Iliuta, I.; Larachi, F.; Grandjean, B. P. A.; Wild, G. Gas-Liquid Interfacial Mass Transfer in Trickle-Bed Reactors: State-of-Art Correlations. Chem. Eng. Sci. 1999, 54, 5633-5645.
(8) Zamankhan, P.; Malinen, P.; Lepomaki, H. Application of Neural Networks to Mass Transfer Predictions in a Fast Fluidized Bed of Fine Solids. AIChE J. 1997, 43, 1684-1690.
(9) Morshed, J.; Powers, S. E. Regression and Dimensional Analysis for Modeling Two-Phase Flow. Transp. Porous Media 2000, 38, 205-221.
(10) Larachi, F.; Belfares, L.; Iliuta, I.; Grandjean, B. P. A. Three-Phase Fluidization Macroscopic Hydrodynamics Revisited. Ind. Eng. Chem. Res. 2001, 40, 993-1008.
(11) Jamialahmadi, M.; Zehtaban, M. R.; Muller-Steinhagen, H.; Sarrafi, A.; Smith, J. M. Study of Bubble Formation under Constant Flow Conditions. Trans. Inst. Chem. Eng. 2001, 79, 523-532.
(12) Rey-Fabret, I.; Sankar, R.; Duret, E.; Heintze, E.; Henriot, V. Neural Network Tools for Improving Tacite Hydrodynamic Simulation of Multiphase Flow Behavior in Pipelines. Oil Gas Sci. Technol. 2001, 56, 471-478.
(13) Goldberg, D. E. Genetic Algorithms in Search, Optimization, and Machine Learning; Addison-Wesley: Reading, MA, 1989.
(14) Morshed, J.; Kaluarachchi, J. Application of Neural Network and Genetic Algorithm in Flow and Transport Simulations. Adv. Water Resour. 1998, 22, 145-158.
(15) Blanco, A.; Delgado, M.; Pegalajar, M. C. A Genetic Algorithm to Obtain the Optimal Recurrent Neural Network. Int. J. Approximate Reasoning 2000, 23, 67-83.
(16) Gao, F.; Li, M.; Wang, F.; Wang, B.; Yue, P. L. Genetic Algorithms and Evolutionary Programming Hybrid Strategy for Structure and Weight Learning for Multi-layer Feed-forward Neural Networks. Ind. Eng. Chem. Res. 1999, 38, 4330-4336.
(17) Cundari, T. R.; Russo, M. Database Mining Using Soft Computing Techniques. An Integrated Neural Network-Fuzzy Logic-Genetic Algorithm Approach. J. Chem. Inf. Comput. Sci. 2000, 41 (2), 281-287.
(18) Cloutier, P.; Tibirna, C.; Grandjean, B. P. A.; Thibault, J. NNFit, logiciel de régression utilisant les réseaux à couches. http://www.gch.ulaval.ca/~nnfit, 1997.
(19) Viennet, R.; Fonteix, C.; Marc, I. New Multicriteria Optimization Method Based on the Use of a Diploid Genetic Algorithm: Example of an Industrial Problem. In Artificial Evolution; Alliot, J.-M., Lutton, E., Ronald, E., Schoenauer, M., Snyers, D., Eds.; Lecture Notes in Computer Science; Springer: Berlin, 1995; Vol. 1063, pp 120-127.
(20) Maren, A. J.; Harston, C. T.; Pap, R. M. Neural Computing Applications; Academic Press: San Diego, 1990; p 240.
(21) Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; Vetterling, W. T. Numerical Recipes: The Art of Scientific Computing; Cambridge University Press: Cambridge, U.K., 1989.
(22) Friese, T.; Ulbig, P.; Schultz, S. Use of Evolutionary Algorithms for the Calculation of Group Contribution Parameters in Order to Predict Thermodynamic Properties. Comput. Chem. Eng. 1998, 22, 1559-1572.
(23) Holland, J. H. Adaptation in Natural and Artificial Systems; University of Michigan Press: Ann Arbor, MI, 1975. Cited in ref 13.
(24) De Jong, K. A. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Ph.D. Thesis, University of Michigan, Ann Arbor, MI, 1976. Cited in ref 13.
(25) Frantz, D. R. Nonlinearities in Genetic Adaptive Search. Doctoral Dissertation, University of Michigan, Ann Arbor, MI, 1972; Diss. Abstr. Int. B 1972, 33 (11), 5240-5241 (University Microfilms No. 73-11, 116). Cited in ref 13.
(26) Lohl, T.; Schultz, C.; Engell, S. Sequencing of Batch Operations for a Highly Coupled Production Process: Genetic Algorithms versus Mathematical Programming. Comput. Chem. Eng. 1998, 22, S579-S585.
(27) Piché, S.; Larachi, F.; Grandjean, B. P. A. Improved Liquid Hold-up Correlation for Randomly Packed Towers. Trans. Inst. Chem. Eng. 2001, 79, 71-79.
(28) Billet, R.; Schultes, M. A Physical Model for the Prediction of Liquid Hold-up in Two-Phase Countercurrent Columns. Chem. Eng. Technol. 1993, 16, 370-375.
(29) Billet, R.; Schultes, M. Prediction of Mass Transfer in Columns with Dumped and Arranged Packings. Trans. Inst. Chem. Eng. 1999, 77, 498-504.
(30) Carroll, D. L. Chemical Laser Modeling with Genetic Algorithms. AIAA J. 1996, 34, 338-346.

Received for review May 30, 2001
Accepted March 13, 2002

IE010478R