
Ind. Eng. Chem. Res. 1999, 38, 4330-4336

Genetic Algorithms and Evolutionary Programming Hybrid Strategy for Structure and Weight Learning for Multilayer Feedforward Neural Networks

Furong Gao,* Mingzhong Li, Fuli Wang, Baoguo Wang, and PoLock Yue

Department of Chemical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong

* To whom correspondence should be addressed. E-mail: [email protected].

A hybrid strategy (GAs-EP) combining genetic algorithms (GAs) and evolutionary programming (EP) via a matrix group encoding is proposed to evolve a multilayer feedforward neural network by simultaneously acquiring the network structure and weights. The strategy uses EP to evolve the neural networks and GAs to diversify the individuals of the neural network population. It inherits the strengths and suppresses the shortcomings of GAs and EP in their separate forms. The resulting strategy is simple and practical, and it converges quickly. Its effectiveness has been demonstrated through its application to polymer melt temperature prediction in injection molding.

1. Introduction

Artificial neural networks have many successful applications in soft measurements and control of chemical processes.1-4 Among these applications, the multilayer feedforward neural network (MFNN) is one of the most popular topologies. Normally, the MFNN design consists of two separate steps: (1) the determination of a proper MFNN structure, which includes the number of hidden layers and the number of nodes in each layer, and (2) the training of the network with learning algorithms to minimize, for the given structure, the mean-squared error between the desired and the actual outputs over the given training samples.

There are several traditional methods that may be used to automatically determine the neural network configuration and weights: pruning, growing, and growing and pruning. Pruning techniques5 start with a network with over-specified nodes and then prune the excess nodes. Growing starts with a network with under-specified nodes and adds nodes as needed. For growing neural networks, Moody and Antsaklis6 proposed an algorithm called dependence identification for constructing and training a multilayer neural network. Growing and pruning networks, e.g., the space partition network (SPAN) proposed by Lee and Sheu,7 can add or delete nodes during training. All of the above methods are susceptible to trapping at structural local optima, and the result thus depends on the initial network structure.

Recently, genetic algorithms (GAs) and evolutionary programming (EP)8 have been developed to mimic the natural process of evolution, and they have been used in their separate forms to evolve the neural network structure as well as the weights. For instance, a strategy based on a parallel genetic algorithm has been proposed by Maniezzo9 to evolve feedforward artificial neural networks. An evolutionary programming algorithm called GNARL has been presented by Angeline et al.10 to acquire both the structure and weights of recurrent networks.

Genetic algorithms are effective for global search but are slow in fine-tuning once a promising region of the search space has been identified. Evolutionary programming, on the other hand, is a stable search algorithm, but it is susceptible to stagnation at local minima. It would be ideal to combine GAs and EP, as they have complementary strengths; the difficulty is the lack of a common encoding scheme that allows them to work together.

In this paper, a new encoding procedure based on a group of matrixes to represent MFNNs is proposed to combine GAs and EP for learning both the structures and weights of the networks. The proposed algorithm, code-named GAs-EP, uses EP to evolve the neural networks and, at the same time, uses GAs to diversify the individuals of the neural network population. Consequently, the advantages of both GAs and EP are inherited, and their individual shortcomings are suppressed. This simple and practical algorithm converges quickly and can be implemented in a parallel way. The proposed method can be applied not only to the evolution of neural networks but also to other optimization problems. Two standard benchmark problems, the XOR and Multiplexer problems, are used to test the effectiveness of the proposed algorithm. Finally, the proposed algorithm is applied to the prediction of the polymer melt temperature of an injection molding process.

This paper is organized as follows. A brief introduction to GAs and EP is given in section 2, and the GAs-EP algorithm is presented in section 3. The applications of the proposed algorithm are given in section 4, and the paper is summarized in section 5.

2. Background

Genetic algorithms rely on chromosomal operators, while EP stresses behavioral changes at the species level. A brief introduction is given in this section; detailed background can be found in the paper by Fogel.8

2.1. Genetic Algorithms. Genetic algorithms evolve a population of individuals based on the mechanics of natural selection, genetics, and evolution.



Each individual of the population represents a trial solution of the problem and is termed a chromosome, with each element described as a gene. The value of a gene is called its allele. All chromosomes are usually represented by binary strings. The value of a chromosome in the context of the problem is measured by a fitness function. A GA in its simplest form uses three operators: reproduction, crossover, and mutation. Reproduction (selection) is a process in which individual strings are copied according to their fitness values to produce a new population. The reproduction process can be conducted in several ways; roulette-wheel selection is the most commonly used: individuals are extracted by spinning a simulated biased roulette wheel with slots of sizes proportional to the fitness of the individuals. Crossover operates on a selected pair with a crossover probability Pc. First, two strings from the reproduced population are randomly selected for mating. At a randomly selected crossover site (a bit position) the strings are crossed, forming new individuals (offspring) by juxtaposing the first part of one parent with the last part of the other. Mutation modifies the alleles of each individual of the population with a low mutation probability, Pm. Usually the new allelic value is chosen randomly with a uniform probability distribution. A typical GA procedure may be described as follows:

1. Randomly generate an initial population X(0) = {x1, x2, ..., xN}.

2. Compute the fitness F(xi) of each individual xi of the current population X(t).

3. Generate an intermediate population Xr(t) by reproduction.

4. Generate X(t+1) by crossover and mutation operations on Xr(t).

5. Set t = t + 1; if max{F(xi)} > ε or t ≥ Tmax, where ε is a preselected constant and Tmax is the maximum allowable number of iterations, the procedure ends; otherwise go to step 2.

The process of generating new offspring and selecting those with high fitness values continues until a satisfactory solution is reached or the allowable number of iterations is exhausted.
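To make the above procedure concrete, the following is a minimal sketch of a generational GA maximizing a fitness function over binary strings. It is an illustration only: the string length, parameter values, and the one-max toy objective are assumptions made for the example, not settings used in this work.

import random

def simple_ga(fitness, n_bits=16, pop_size=30, pc=0.8, pm=0.02, t_max=200, eps=0.99):
    """Minimal generational GA: roulette-wheel reproduction, one-point
    crossover with probability pc, and bitwise mutation with probability pm."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for t in range(t_max):
        fits = [fitness(x) for x in pop]
        if max(fits) > eps:                      # stopping criterion: max F(x) > eps
            break
        # Reproduction: roulette-wheel selection proportional to fitness
        parents = random.choices(pop, weights=[f + 1e-9 for f in fits], k=pop_size)
        # Crossover: pair up parents and cross at a random site with probability pc
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if random.random() < pc:
                site = random.randint(1, n_bits - 1)
                a, b = a[:site] + b[site:], b[:site] + a[site:]
            offspring += [a[:], b[:]]
        # Mutation: flip each bit with a low probability pm
        pop = [[bit ^ (random.random() < pm) for bit in x] for x in offspring]
    return max(pop, key=fitness)

# Example: maximize the fraction of ones in the string (the "one-max" toy problem)
best = simple_ga(lambda x: sum(x) / len(x))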

2.2. Evolutionary Programming. Similar to GAs, evolutionary programming is a neo-Darwinian search paradigm. This multiagent paradigm provides a systematic stochastic optimization technique for an arbitrary objective function. When applied to real-valued function optimization, the problem is defined as finding the real-valued n-dimensional vector x that is associated with the extremum of a function F(x): R^n → R. Without loss of generality, let the procedure be implemented as a minimization process. The simplest method is implemented as follows:

1. An initial population of parent vectors xi, i = 1, 2, ..., N, is randomly selected from a feasible range in each dimension. The distribution of initial trials is typically uniform.

2. An offspring vector xi(t)′, i = 1, 2, ..., N, is created from each parent xi(t) by adding a zero-mean Gaussian random variable with a preselected standard deviation.

3. A selection operation then determines which of these vectors are retained by comparing the values F(xi(t)) and F(xi(t)′), i = 1, 2, ..., N. The vector xi(t+1), i = 1, 2, ..., N, that possesses the smaller value becomes the new parent for the next generation.

4. Set t = t + 1; if min{F(xi)} < ε or t ≥ Tmax, where ε is a preselected constant and Tmax is the maximum allowable number of iterations, the procedure ends; otherwise go to step 2.

The process of generating new offspring and selecting those with the smallest values continues until a satisfactory solution is reached or the allowable number of iterations is exhausted. Both EP and GAs operate on a population of candidate solutions and employ a selection criterion to determine which solutions should be retained for future generations. EP, however, differs from GAs in the following respects: (a) the representation of a problem follows in a natural fashion, thus obviating the need for a dual representation, and (b) EP has better stability, as offspring are created through mutation operations. The drawbacks of EP for optimization are that (a) the constant standard deviation (step size) in each dimension makes the procedure converge slowly and (b) the point-to-point nature of the search makes it susceptible to stagnation at local minima.
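As an illustration of steps 1-4, a minimal EP sketch for minimizing a real-valued function with a fixed mutation step size is given below; the objective function, dimension, and constants are assumed for the example.

import random

def simple_ep(f, dim=5, pop_size=30, sigma=0.1, t_max=1000, eps=1e-6, lo=-5.0, hi=5.0):
    """Minimal EP: each parent produces one offspring by adding zero-mean
    Gaussian noise, and the better of the pair survives to the next generation."""
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for t in range(t_max):
        if min(f(x) for x in pop) < eps:          # stopping criterion: min F(x) < eps
            break
        new_pop = []
        for parent in pop:
            # Offspring: parent plus a zero-mean Gaussian perturbation (fixed step size)
            child = [xi + random.gauss(0.0, sigma) for xi in parent]
            # Selection: keep whichever of parent/offspring has the smaller value
            new_pop.append(child if f(child) < f(parent) else parent)
        pop = new_pop
    return min(pop, key=f)

# Example: minimize the sphere function sum(x_i^2)
best = simple_ep(lambda x: sum(xi * xi for xi in x))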

3. GAs and EP Hybrid Strategy

In this section, GAs and EP, which have complementary strengths, are combined to learn both the structure and weights of the networks. An encoding procedure based on a group of matrixes is proposed to represent MFNNs. The crossover operation is adapted to this representation and consists of the following four operations: selection, structure assimilation, site selection and crossover, and structure restoration.

3.1. Matrix Encoding of MFNNs. A typical MFNN structure, shown in Figure 1, is assumed to have n0 input nodes, L hidden layers with ni hidden nodes in the ith hidden layer, and nL+1 output nodes. The activation function F(x) is typically a sigmoidal function. To treat the thresholds of the hidden and output nodes as network weights, a node x0 with a fixed value of 1 is added to the input layer, and a node hi,0 with a fixed value of 1 is added to the ith hidden layer. The network is described by the following equations:

s_{i,j} = \sum_{k=0}^{n_{i-1}} w_{j,k}^{i} h_{i-1,k}, \qquad i = 1, 2, \ldots, L+1; \; j = 1, 2, \ldots, n_i    (1)

h_{i,j} = F(s_{i,j})    (2)

In the above equations, h0,k is the network input xk, and hL+1,j is the network output yj. Equations 1 and 2 can be combined into

H_i = F\left( W_i \begin{bmatrix} 1 \\ H_{i-1} \end{bmatrix} \right), \qquad i = 1, 2, \ldots, L+1    (3)

where

H_i = [h_{i,1}, h_{i,2}, \ldots, h_{i,n_i}]^{T}

and

W_i = \begin{bmatrix}
w_{1,0}^{i} & w_{1,1}^{i} & \cdots & w_{1,n_{i-1}}^{i} \\
w_{2,0}^{i} & w_{2,1}^{i} & \cdots & w_{2,n_{i-1}}^{i} \\
\vdots & \vdots & & \vdots \\
w_{n_i,0}^{i} & w_{n_i,1}^{i} & \cdots & w_{n_i,n_{i-1}}^{i}
\end{bmatrix}
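To illustrate the matrix group encoding and eq 3, the following sketch represents an MFNN as a list of matrices W1, ..., WL+1 (with the first column of each matrix holding the thresholds) and evaluates the network layer by layer with a sigmoid activation. The layer sizes, the use of NumPy, and the helper names random_mfnn and forward are assumptions made for the example.

import numpy as np

def random_mfnn(layer_sizes, rng=np.random.default_rng(0)):
    """Encode an MFNN as a matrix group [W1, ..., W(L+1)].
    W_i has shape (n_i, n_{i-1} + 1); its first column holds the thresholds."""
    return [rng.uniform(-1.0, 1.0, size=(n_out, n_in + 1))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(weights, x):
    """Evaluate eq 3: H_i = F(W_i [1; H_{i-1}]) with a sigmoid activation F."""
    h = np.asarray(x, dtype=float)
    for W in weights:
        h = np.concatenate(([1.0], h))      # prepend the fixed node h_{i,0} = 1
        h = 1.0 / (1.0 + np.exp(-W @ h))    # F(s) = 1 / (1 + exp(-s))
    return h

# Example: a 2-input network with one hidden layer of 3 nodes and 1 output
net = random_mfnn([2, 3, 1])
y = forward(net, [0.5, -1.0])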


Figure 1. Multilayer feedforward network and its structure.

Equation 3 indicates that the matrix group W1, W2, ..., WL+1 uniquely represents the MFNN of Figure 1. In matrix Wi, the first column contains the thresholds of the nodes of the ith layer, and the columns from the second to the last contain the weights from the (i - 1)th layer to the ith layer. Each matrix in the matrix group satisfies the following relationships:

1. n_0 = \mathrm{Column}(W_1) - 1    (4)

where n0 is the input node number and Column(W1) is the column number of W1.

2. n_i = \mathrm{Row}(W_i) = \mathrm{Column}(W_{i+1}) - 1, \qquad i = 1, 2, \ldots, L    (5)

where ni is the node number of the ith hidden layer and Row(Wi) represents the number of rows in Wi.

3. n_{L+1} = \mathrm{Row}(W_{L+1})    (6)

where nL+1 is the output node number. The preceding relationships indicate that an MFNN can be uniquely represented by a matrix group, and vice versa. It is therefore possible to view the matrix group as an encoding of the MFNN.

3.2. Fitness of MFNNs. The fitness function may be defined according to the performance requirements; a common definition is

F(\eta_i) = 1 - \frac{E(\eta_i)}{E_{\max}}    (7)

where ηi denotes the ith MFNN of the population, E is the error function over the given training set {{X(1), D(1)}, {X(2), D(2)}, ..., {X(P), D(P)}}, in which X and D are the input and desired output of the network and P is the number of training data pairs, and Emax is the maximum value of the error function in the population. Three possible error functions are the sum of squared errors, the sum of absolute errors, and the sum of exponential absolute errors. For simplicity, the mean square error function is used here, which can be written as

E = \frac{1}{2 P n_{L+1}} \sum_{k=1}^{P} \sum_{i=1}^{n_{L+1}} \left( d_i(k) - y_i(k) \right)^2    (8)

To prevent the network from structural over-fitting, it is necessary to include structural information in the error cost function. Equation 8 is thus refined to

E = \left[ 1 + \lambda \left( \frac{L}{L_{\max}} + \frac{\sum_{i=1}^{L} n_i}{N_{\max}} \right) \right] \frac{1}{2 P n_{L+1}} \sum_{k=1}^{P} \sum_{i=1}^{n_{L+1}} \left( d_i(k) - y_i(k) \right)^2    (9)

where λ is a weight factor, L is the hidden layer number, Lmax is the maximum hidden layer number in the population, and Nmax is the maximum of the sum of the network neurons in the population.
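A minimal sketch of eqs 7-9 follows: the penalized error of eq 9 is computed for each network and converted into the fitness of eq 7. It reuses the forward and random_mfnn helpers from the encoding sketch above; the helper names, data, and parameter values are illustrative assumptions.

import numpy as np

def penalized_error(net, X, D, lam, L_max, N_max):
    """Eq 9: mean square error scaled by a structure penalty.
    net is a matrix group [W1, ..., W(L+1)]; X, D are training inputs/targets."""
    L = len(net) - 1                              # number of hidden layers
    n_hidden = sum(W.shape[0] for W in net[:-1])  # total hidden nodes, sum_i n_i
    mse = np.mean([(forward(net, x) - np.asarray(d)) ** 2 for x, d in zip(X, D)]) / 2.0
    return (1.0 + lam * (L / L_max + n_hidden / N_max)) * mse

def fitness(errors):
    """Eq 7: F(eta_i) = 1 - E(eta_i) / E_max over the current population."""
    errors = np.asarray(errors, dtype=float)
    return 1.0 - errors / errors.max()

# Example with two random networks on a toy training set
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
D = [[0.0], [1.0], [1.0], [0.0]]
pop = [random_mfnn([2, 3, 1]), random_mfnn([2, 4, 4, 1])]
errs = [penalized_error(net, X, D, lam=1.0, L_max=3, N_max=20) for net in pop]
fits = fitness(errs)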

3.3. Crossover Operations for the Matrix Encoding. Four operations are introduced for the crossover of the matrix representation: a selection operation, a structure assimilation operation, a crossover operation, and a structure restoration operation. Two individuals of the population are chosen by the selection operation as parents to produce offspring with a given crossover probability Pc. Crossover cannot be performed directly on networks that have different numbers of nodes in their corresponding hidden layers. To overcome this problem, a structure assimilation operation is proposed. This operation equalizes the node numbers by adding new nodes with random initial weights to the hidden layer that has fewer nodes. After structure assimilation, the two parents have not only the same number of hidden layers but also the same number of nodes in their corresponding hidden layers, so the crossover operation can be performed: a matrix is chosen from the group with uniform probability, and a single cutting row of that matrix is chosen with the same probability; then the first part of one matrix group and the last part of the other matrix group are juxtaposed to form two new individuals. A structure restoration operation finally deletes the nodes added by the assimilation operation.
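Under simplifying assumptions, the structure assimilation and row crossover might be sketched as follows for two parents with the same number of hidden layers. Padding with small random weights stands in for the added nodes, the restoration step (deleting the padded nodes afterwards) is omitted, and the particular way the two matrix groups are juxtaposed is one plausible reading of the description above rather than the exact operation used in the paper.

import numpy as np

def _pad(W, rows, cols, rng):
    """Embed W in a (rows, cols) matrix padded with small random weights."""
    P = rng.uniform(-0.1, 0.1, size=(rows, cols))
    P[:W.shape[0], :W.shape[1]] = W
    return P

def assimilate(net_a, net_b, rng):
    """Structure assimilation: pad each layer of both matrix groups to a common shape."""
    out_a, out_b = [], []
    for Wa, Wb in zip(net_a, net_b):
        rows, cols = max(Wa.shape[0], Wb.shape[0]), max(Wa.shape[1], Wb.shape[1])
        out_a.append(_pad(Wa, rows, cols, rng))
        out_b.append(_pad(Wb, rows, cols, rng))
    return out_a, out_b

def crossover(net_a, net_b, rng=np.random.default_rng(2)):
    """Row crossover on assimilated matrix groups: pick a matrix and a cutting row,
    then juxtapose the first part of one parent with the last part of the other."""
    a, b = assimilate(net_a, net_b, rng)
    m = rng.integers(len(a))                                   # which matrix W_m to cut
    r = rng.integers(1, a[m].shape[0]) if a[m].shape[0] > 1 else 0
    child1, child2 = [W.copy() for W in a], [W.copy() for W in b]
    child1[m][r:], child2[m][r:] = b[m][r:].copy(), a[m][r:].copy()
    child1[m + 1:] = [W.copy() for W in b[m + 1:]]
    child2[m + 1:] = [W.copy() for W in a[m + 1:]]
    return child1, child2

# Example: cross a 2-3-1 network with a 2-5-1 network (shapes as in the encoding sketch)
rng = np.random.default_rng(3)
p1 = [rng.uniform(-1, 1, (3, 3)), rng.uniform(-1, 1, (1, 4))]
p2 = [rng.uniform(-1, 1, (5, 3)), rng.uniform(-1, 1, (1, 6))]
c1, c2 = crossover(p1, p2)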


Figure 2. Two different MFNNs and their matrix group encoding: (a) network structure of the first MFNN; (b) encoding matrix of the first MFNN; (c) network structure of the second MFNN; (d) encoding matrix of the second MFNN.

Figure 3. MFNNs and their matrix group encoding after the structure assimilation operation: (a) network structure of the first MFNN; (b) encoding matrix of the first MFNN; (c) network structure of the second MFNN; (d) encoding matrix of the second MFNN. (In the figure, ● denotes the added nodes and their corresponding weights for the first network, and ■ denotes the added nodes and their corresponding weights for the second network.)

A crossover example is given for the two MFNNs whose structures and matrix group encodings are shown in Figure 2. The structure assimilation operation transforms the structures and the corresponding encoding matrix groups of these two MFNNs into those shown in Figure 3. A cutting point is randomly chosen at the second row of the second matrix W2. The offspring produced by this crossover operation are illustrated in Figure 4. Applying the structure restoration operation to the new offspring gives the final results shown in Figure 5.

3.4. Mutation Operation of MFNNs. Both parametric mutation and structural mutation are incorporated in the proposed algorithm.

Parametric Mutation. Parametric mutation is accomplished by EP, which perturbs each weight of a network with Gaussian noise. Offspring are generated from parents according to the equation

W_O = W_P + \Delta W_P    (10)

where WO represents the weights of the offspring matrix group, WP the weights of the parent matrix group, and ∆WP a matrix whose elements are random variables with the Gaussian distribution N(0, R·EP), in which R is a scaling coefficient and EP is the mean square error of the parent network. The scaling factor R is a probabilistic analogue of the learning rate in gradient-based methods. For simplicity, a constant R is used in this paper.
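A minimal sketch of the parametric mutation of eq 10 is given below, assuming the list-of-matrices encoding used above and interpreting the second argument of N(0, R·EP) as a variance; the function name and example values are illustrative.

import numpy as np

def parametric_mutation(parent, parent_mse, alpha, rng=np.random.default_rng(4)):
    """Eq 10: W_O = W_P + dW_P, where each element of dW_P is drawn from a
    zero-mean Gaussian whose variance is the scaling coefficient times the
    parent's mean square error (variance interpretation assumed)."""
    sigma = np.sqrt(alpha * parent_mse)          # std. dev. for N(0, R * E_P)
    return [W + rng.normal(0.0, sigma, size=W.shape) for W in parent]

# Example: mutate a parent whose current mean square error is 0.02, with R = 10
child = parametric_mutation(
    [np.zeros((3, 3)), np.zeros((1, 4))], parent_mse=0.02, alpha=10.0)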

Figure 4. Offspring produced by the crossover operation and their matrix encoding: (a) network structure of the first MFNN; (b) encoding matrix of the first MFNN; (c) network structure of the second MFNN; (d) encoding matrix of the second MFNN.

Figure 5. The final networks and their matrix group encoding: (a) network structure of the first MFNN; (b) encoding matrix of the first MFNN; (c) network structure of the second MFNN; (d) encoding matrix of the second MFNN.

Structural Mutation. In the GAs-EP algorithm, the number of hidden nodes of the parent network is altered by deleting or adding nodes in the hidden layers. To avoid radical jumps in fitness from parent to offspring, the structural mutation attempts to preserve the network behavior by initializing the weights of added nodes to zero. Node deletion, unfortunately, may still cause jumps in fitness. The node adding or deleting operation is conducted with a structural mutation probability, Pm. In this paper, only one node is added or deleted at each structural mutation operation.

3.5. GAs-EP Operation Procedure. The combined GAs-EP algorithm may be summarized in the following procedure:

1. Initialization: specify the number of individuals, N; the maximum number of hidden layers, M; the maximum number of hidden nodes, H; the maximum number of iterations, T; the crossover probability, Pc; the structure mutation probability, Pm; the scaling coefficient, R; the weight factor, λ; and the desired performance target, ε.

2. Subpopulation generation: randomly generate M subpopulations, each with N/M individuals.


Make the individuals in the first subpopulation have one hidden layer, the individuals in the second subpopulation have two hidden layers, and so on, with the individuals in the Mth subpopulation having M hidden layers.

3. Evaluate the error of every individual in each subpopulation according to eq 9. If E(ηi) < ε or t ≥ T, go to step 6.

4. The following steps are repeated for each subpopulation (a sketch of this step is given after the remarks below): (a) The individuals are first evaluated by the fitness function (eq 7), and the networks scoring in the top 50% are designated as the parents of the next generation. (b) 80% of the parents are selected to generate offspring by the parametric mutation operation of eq 10. (c) The other 20% of the parents generate offspring by the GA operations with the crossover probability Pc and the structure mutation probability Pm. (d) A new subpopulation is generated by selecting the N/M individuals with the top fitness from among the parents, the offspring, and the networks that were not selected as parents.

5. Set t = t + 1 and go to step 3.

6. Select the MFNN with the minimum error as the optimal solution.

Remark 1. Both the proposed algorithm and the exhaustive search method involve an initial solution pool; the proposed algorithm, however, differs from the exhaustive search method, which requires the solution of the problem to be in the initial solution pool. The proposed algorithm, which has the ability to evolve the solution population, can find an optimal solution even when the solution is not within the initial population pool.

Remark 2. The proposed algorithm combines GAs, a global search operation that is slow in fine-tuning, and EP, an operation that is stable but susceptible to local minima, to form a new strategy for evolving neural networks. The proposed algorithm relies on the GA operations to diversify the network population while it uses the EP operations to evolve the networks. This allows the advantages of both GAs and EP to be inherited. The effectiveness of the proposed scheme will be demonstrated through simulation in the following section. Our experience suggests that assigning 80% and 20% of the population to the EP and GA operations, respectively, is a good distribution; these percentages, however, may be adjusted for different problems.
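The selection logic of step 4 might look like the following sketch: the top half of a subpopulation becomes the parent set, 80% of the parents are mutated by EP (eq 10), the remaining 20% are recombined by the matrix-group crossover, and the next subpopulation keeps the fittest networks from among the parents, offspring, and the rest. The helpers penalized_error, parametric_mutation, and crossover are the ones sketched earlier; structural mutation and the probabilities Pc and Pm are omitted for brevity, so this is an illustrative simplification rather than the full procedure.

import numpy as np

def next_subpopulation(subpop, X, D, alpha=10.0, lam=1.0, L_max=3, N_max=20,
                       rng=np.random.default_rng(5)):
    """One GAs-EP generation for a single subpopulation (step 4 of the procedure)."""
    errs = [penalized_error(net, X, D, lam, L_max, N_max) for net in subpop]
    order = np.argsort(errs)                       # best (smallest error) first
    half = len(subpop) // 2
    parents = [subpop[i] for i in order[:half]]    # top 50% become parents
    rest = [subpop[i] for i in order[half:]]       # networks not selected as parents

    offspring = []
    n_ep = int(round(0.8 * len(parents)))          # 80% of parents -> EP mutation
    for net in parents[:n_ep]:
        mse = penalized_error(net, X, D, 0.0, L_max, N_max)   # lam = 0: plain MSE as E_P
        offspring.append(parametric_mutation(net, mse, alpha, rng))
    for a, b in zip(parents[n_ep::2], parents[n_ep + 1::2]):  # remaining 20% -> crossover
        offspring.extend(crossover(a, b, rng))

    # Survivor selection: keep the fittest among parents, offspring, and the rest
    pool = parents + offspring + rest
    pool_errs = [penalized_error(net, X, D, lam, L_max, N_max) for net in pool]
    keep = np.argsort(pool_errs)[:len(subpop)]
    return [pool[i] for i in keep]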

Figure 6. Comparison of the mean square error curves for the XOR problem with the proposed algorithm and with EP and GAs in their separate forms.

Figure 7. Comparison of the mean square error curves for the Multiplexer problem with the proposed algorithm and with EP and GAs in their separate forms.

4. Simulation and Experimental Results

In this section, three examples are given to demonstrate the effectiveness of the proposed GAs-EP algorithm. The first two are computer simulations of standard benchmark problems, XOR and Multiplexer. The third example develops a neural network for the prediction of the polymer melt temperature in an injection molding process. In all cases, the sigmoid function f(x) = 1/(1 + e^{-x}) is used, the weights of the MFNNs are initialized randomly in the range [-1, 1], and the weight factor λ = 1 is selected.

4.1. Simulations. XOR Problem. The XOR problem is a hard classification task that is often used as a standard benchmark to test new learning algorithms. The network consists of two input nodes and one output node whose output is the exclusive OR of the inputs. The parameters of the algorithm are selected as follows: the number of individuals N = 30, the maximum number of hidden layers M = 3, the maximum number of hidden nodes H = 10, the maximum number of iterations T = 1000, the crossover probability Pc = 0.8, the structure mutation probability Pm = 0.2, the scaling factor R = 50, and the desired performance target ε = 0.0005. The resulting network has two hidden layers, with four nodes in the first hidden layer and four nodes in the second hidden layer. The mean square error curve of the proposed algorithm is given in Figure 6 (solid line).

To illustrate the advantage of the proposed algorithm, EP and GAs in their separate forms are also used to train the neural network. In these cases, 100% of the parents are selected to generate offspring by the EP or GA algorithm, and all other simulation parameters are the same as for the proposed algorithm. The mean square error curves of EP (dotted line) and GAs (dashed line) are shown in Figure 6. With GAs, a global search scheme that is slow in fine-tuning, the mean square error curve declines slowly. With EP, a stable fine-tuning search scheme, the mean square error declines faster than with GAs in this case, but still much more slowly than with the proposed algorithm. The superiority of the proposed scheme for the XOR benchmark problem can be clearly seen in Figure 6.

Multiplexer Problem. The Multiplexer is another common benchmark for testing neural network learning algorithms. The network consists of six inputs and one output. Among the inputs, four are data bits and the other two are address bits that select the data to be transmitted; the desired output is equal to the selected input. The mean square error curve of the proposed algorithm is given in Figure 7. The resulting network has three hidden layers, with eight nodes in the first hidden layer, four nodes in the second hidden layer, and two nodes in the third hidden layer.

Table 1. Network Structures and Training Results

no.      network output   hidden layers   nodes per hidden layer   training epochs
MFNN1    T1               1               15                       374
MFNN2    T2               1               8                        1024
MFNN3    T3               2               7, 6                     3978
MFNN4    T4               2               14, 12                   3885
MFNN5    T5               1               18                       2324
MFNN6    T6               1               15                       2529
MFNN7    T7               1               12                       3283
MFNN8    S3               2               10, 7                    4129

Figure 8. Melt temperature profile representation.

The network learning is also performed via EP and GAs in their separate forms; the mean square error curves are shown in Figure 7 for comparison. In this simulation, all parameters for EP, GAs, and the proposed algorithm are set the same as in the XOR simulation. The superiority of the proposed algorithm is also clearly shown in this example.

4.2. Prediction of Melt Temperature in an Injection Molding Process. Injection molding is widely used in plastics processing to produce parts ranging from simple toys to precision lenses. It is estimated that approximately 32 wt % of all plastic materials go through injection molding. The process features three basic components: an injection unit to melt and transfer the plastic to the mold, a clamping unit to close, hold, and open the mold, and an injection mold to form the part and act as a heat exchanger. Applications of computers in the injection molding process have mainly focused on mold design and the simulation of the plastic melt flow in the mold cavity.

In injection molding, the melt temperature at the nozzle exit has a decisive effect on the quality of the molded parts. This temperature is mainly determined by three factors: (1) the melt temperature distribution in the reservoir of the injection molding machine during the plastication phase, (2) the heat conduction between the melt and the surroundings during the dwell phase, and (3) the shear heating effect during the injection phase. In the current research, the effect of the plastication conditions on the melt temperature in the reservoir is investigated to gain insight into the processing of plastics. The operating conditions that affect the melt temperature during the plastication phase can be represented by the following seven process variables: the nozzle heater temperature (Tn), the barrel heater temperatures (Tz1, Tz2, and Tz3), the screw rotation speed (rpm), the back pressure (Pb), and the required stroke length (SL).

The temperature profile in the barrel reservoir is characterized by seven points, as shown in Figure 8. Points (T1, S1) and (T7, S7) denote the temperatures when the stroke lengths are 0 and 100% of the full stroke (St), respectively. Point (T3, S3) represents the maximum temperature and the corresponding stroke length. Point (T2, S2) denotes the temperature when the stroke length equals the average value of S1 and S3. Points (T5, S5) and (T6, S6) represent the temperatures when the stroke lengths are 75% and 90% of the full stroke, respectively. Point (T4, S4) denotes the temperature when the stroke length is the average value of S3 and S5.

Figure 9. Comparison of the prediction and measurements.

Therefore, when the seven temperatures (T1, T2, T3, T4, T5, T6, and T7) and the stroke length S3 are determined, the seven points can be obtained and the temperature profile can be established.

It is difficult to measure the melt temperature distribution in the reservoir. However, if the melt is injected at a very low injection velocity, the shear heating effect of injection on the melt can be ignored, and the melt temperature at the nozzle exit can approximately represent the melt temperature distribution in the reservoir. It is also difficult to establish a mathematical relationship between the plastication conditions and the melt temperature from fundamental principles. The inputs and outputs of the process can, however, be measured, and with sufficient input and output data, neural networks can be trained to capture the relationship between the plastication conditions and the melt temperature.

Eight feedforward networks are developed to predict the melt temperature using the proposed GAs-EP algorithm. Each network has seven inputs (Tn, Tz1, Tz2, Tz3, rpm, Pb, and SL) and one of the eight outputs (T1, T2, T3, T4, T5, T6, T7, and S3). The networks are trained with 64 sets of experimental data and tested with 10 sets of experimental data not used in training. The parameters of the algorithm are chosen as follows: the number of individuals N = 60, the maximum number of hidden layers M = 3, the maximum number of hidden nodes H = 20, the maximum number of iterations T = 5000, the crossover probability Pc = 0.8, the structure mutation probability Pm = 0.2, the scaling factor R = 10, and the desired performance target ε = 0.001. The resulting network structures are shown in Table 1. The seven points are obtained from the eight neural network outputs, and the temperature profile is then fitted by cubic spline interpolation. Three sets of experimental data and the corresponding prediction results are shown in Figure 9. It can be seen that the neural networks have good generalization capabilities.
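As an illustration of how the profile is reconstructed from the eight network outputs, the sketch below builds the seven (stroke, temperature) points from T1-T7 and S3 and fits them with a cubic spline; the function name and all numerical values are made up for the example and are not measurements from this work.

import numpy as np
from scipy.interpolate import CubicSpline

def melt_temperature_profile(T, S3, full_stroke=100.0):
    """Build the seven profile points from the predicted T1..T7 and S3, then
    fit the temperature profile with a cubic spline."""
    T1, T2, T3, T4, T5, T6, T7 = T
    S1, S7 = 0.0, full_stroke                 # 0% and 100% of the full stroke
    S5, S6 = 0.75 * full_stroke, 0.90 * full_stroke
    S2 = 0.5 * (S1 + S3)                      # midway between S1 and S3
    S4 = 0.5 * (S3 + S5)                      # midway between S3 and S5
    strokes = [S1, S2, S3, S4, S5, S6, S7]
    temps = [T1, T2, T3, T4, T5, T6, T7]
    return CubicSpline(strokes, temps)

# Example with made-up predictions (temperatures in deg C, stroke in % of full stroke)
profile = melt_temperature_profile([205, 212, 218, 216, 213, 210, 208], S3=35.0)
stroke_grid = np.linspace(0, 100, 50)
temperature_curve = profile(stroke_grid)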


5. Conclusion

A GAs-EP algorithm for simultaneously learning multilayer feedforward network structures and weights has been proposed, and it has been shown to be effective through both simulations and an application to injection molding.

Acknowledgment

This work was supported in part by the Hong Kong Research Grant Council under Grant HKUST666/96P.

Literature Cited

(1) Wang, H. Y. O.; Yoon, E. S. Strategies for modeling and control of nonlinear chemical processes using neural networks. Comput. Chem. Eng. 1998, 22 (S), 823.

(2) Syu, M. J.; Chen, B. C. Backpropagation neural-network adaptive-control of a continuous wastewater treatment process. Ind. Eng. Chem. Res. 1998, 37 (8), 3625.

(3) Kim, S. J.; Lee, M. H.; Park, S. M.; Lee, S. Y.; Park, C. H. A neural linearizing control for nonlinear chemical processes. Comput. Chem. Eng. 1997, 21 (2), 187.

(4) Milanic, S. D.; Hvala, S. N.; Strmcnik, S.; Karba, R. Applying artificial neural-network models to control a time-variant chemical-plant. Comput. Chem. Eng. 1997, 21 (S), 637.

(5) Reed, R. Pruning algorithms - a survey. IEEE Trans. Neural Networks 1993, 4 (5), 740.

(6) Moody, J. O.; Antsaklis, P. J. The dependence identification neural network construction algorithm. IEEE Trans. Neural Networks 1996, 7 (1), 3.

(7) Lee, B. W.; Sheu, B. J. Modified Hopfield neural networks for retrieving the optimal solution. IEEE Trans. Neural Networks 1991, 2 (1), 137.

(8) Fogel, D. B. An introduction to simulated evolutionary optimization. IEEE Trans. Neural Networks 1994, 5 (1), 3.

(9) Maniezzo, V. Genetic evolution of the topology and weight distribution of neural networks. IEEE Trans. Neural Networks 1994, 5 (1), 39-53.

(10) Angeline, P. J.; Saunders, G. M.; Pollack, J. B. An evolutionary algorithm that constructs recurrent neural networks. IEEE Trans. Neural Networks 1994, 5 (1), 54.

Received for review April 9, 1999
Revised manuscript received August 17, 1999
Accepted August 23, 1999

IE990256H