
Neural Networks Applied in Chemistry. II. Neuro-Evolutionary Techniques in Process Modeling and Optimization

Hugh Cartwright† and Silvia Curteanu*,‡

†Physical and Theoretical Chemistry Laboratory, Oxford University, South Parks Road, Oxford, England OX1 3QZ
‡Department of Chemical Engineering, “Gheorghe Asachi” Technical University Iasi, Bd. Prof. dr. doc. Dimitrie Mangeron, No. 73, 700050, Iasi, Romania



ABSTRACT: Artificial neural networks are widely used in data analysis and to control dynamic processes. These tools are powerful and versatile, but the way in which they are constructed, in particular their architecture, strongly affects their value and reliability. We review here some key techniques for optimizing artificial neural networks and comment on their use in process modeling and optimization. Neuro-evolutionary techniques are described and compared, with the goal of providing efficient modeling methodologies which employ an optimal neural model. We also discuss how neural networks and evolutionary algorithms can be combined. Applications from chemical engineering illustrate the effectiveness and reliability of the hybrid neuro-evolutionary methods.

1. Introduction: Computational Models in Process Engineering. The practical understanding that scientists have of the operation of a complex industrial process often exceeds their knowledge of the chemistry and physics that underlie it. Not enough may be known to build a comprehensive theoretical model, and many processes display nonlinear behavior, making a closed-form analytical description problematic. Whether the process is ill-defined or nonlinear, a computationally expensive numerical method may be required if the process is to be satisfactorily modeled. In conventional empirical (regression) modeling, a numerical representation of the experimental data acts as a proxy for a model that incorporates scientifically realistic principles.1 To provide flexibility, a data-fitting function of quite general form may lie at the heart of the empirical model, but while generality increases power, the increased computational effort does not guarantee an acceptable fit to experimental data.

A dependable model is particularly important in process control. Bioprocesses such as fermentation may move from the laboratory to industrial scale using an empirical process model which is not readily amenable to analytical representation. This is a minor limitation if the process runs as intended, but more serious if deviations from expected behavior occur, a common event during scale-up. It is desirable therefore that, when an empirical or “black box” model is used, it be comprehensive and, ideally, fully optimized.

Two approaches to model optimization are common. In a direct search, an objective function is created to measure how well the model reproduces system behavior. Model parameters are set, the value of the objective function is calculated, and the predictions of the model are compared to experimental data. Based on the level of agreement with experiment, the parameter values are then randomly or heuristically modified and the process repeated until prediction and experiment are in satisfactory agreement. Gradient descent is used in a similar fashion, except that derivatives of the objective function replace, or are used with, heuristics and random adjustments to guide the parameter search.

Neither method is ideal: the first is potentially slow, since changes to the parameter values may move the model in an unproductive direction. Gradient searches cannot be applied to discontinuous or nondifferentiable functions, as the gradients are not available.2 In both methods, the initial parameter values are to an extent arbitrary (were that not the case there would be no need to refine them), and convergence to a satisfactory solution depends on how well, or how fortuitously, they are chosen. There is also a danger that the search may become trapped at a solution of modest quality, bringing refinement to an end.

Further difficulties exist, as goals may conflict or be noncommensurable. To take a chemical engineering example: in an endothermic synthesis it is desirable to maximize reaction yield but to minimize energy consumption. If there are several goals, they must be collapsed into a multiobjective function that is computationally tractable and experimentally reasonable, which complicates optimization.3 If the function is cast as a weighted average of objectives, prior knowledge is needed to set the weights, as the different objectives may interact, in a probably nonlinear way, within the algorithm. The objectives could be expressed as a vector and optimized as a group, but it then may be difficult to find a single optimal solution, and a set of several solutions of comparable quality, referred to as a Pareto set, may be the result.
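To make the direct-search idea concrete, the fragment below refines the two parameters of an invented exponential model by random perturbation, scoring candidates with a weighted sum of a misfit term and a second, competing objective. The model, data, weights, and step size are placeholders chosen only for illustration.

```python
# Minimal sketch of a direct (random-perturbation) parameter search with a
# weighted-sum objective; all values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
y_exp = 2.0 * np.exp(-1.5 * t) + rng.normal(scale=0.02, size=t.size)  # mock "experimental" data

def model(params, t):
    a, k = params
    return a * np.exp(-k * t)

def objective(params, w_fit=1.0, w_cost=0.05):
    misfit = np.mean((model(params, t) - y_exp) ** 2)   # agreement with experiment
    cost = params[1] ** 2                               # stand-in for a second, conflicting goal
    return w_fit * misfit + w_cost * cost               # weighted-average formulation

params, best = np.array([1.0, 1.0]), np.inf             # arbitrary starting guess
for _ in range(5000):
    trial = params + rng.normal(scale=0.05, size=2)     # random/heuristic modification
    f = objective(trial)
    if f < best:                                        # keep the trial only if it improves
        params, best = trial, f
print(params, best)
```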

Received: January 8, 2013. Revised: August 8, 2013. Accepted: August 11, 2013. Published: August 12, 2013.

dx.doi.org/10.1021/ie4000954 | Ind. Eng. Chem. Res. 2013, 52, 12673−12688

Industrial & Engineering Chemistry Research

Review

2. Intelligent Methods for Process Modeling. The need for reliable process models has raised interest in “intelligent” algorithms, such as those within artificial intelligence (AI). One of the most popular of these algorithms is the artificial neural network (ANN), a computational tool loosely based upon the structure and behavior of the brain. When first conceived, it was thought that a sufficiently sophisticated ANN could reproduce in silicon the operation of the brain; this hope proved overoptimistic, but ANNs have proved valuable in the modeling of numerous scientific and other processes.

An ANN combines several discrete software units, known as neurons, which perform the same simple mathematical tasks. In the most popular type of network, the feed-forward multilayer perceptron (MLP), these form layers, within which each neuron is linked to every neuron in the immediately adjacent layers. On each link is a weight which amplifies or diminishes signals that travel along it. MLPs are simpler to construct and faster than phenomenological models, though the essential one-off “training” can be lengthy. Provided they are of suitable size and appropriately trained, MLPs can approximate any continuous function.4 Their primary disadvantage is that they are even more opaque than empirical models; they are the archetypal “black box”, so when applied to an industrial process they form a model which contains no knowledge of the chemistry or physics of the process.

3. Creating a Working Artificial Neural Network. There are three steps in the preparation of an ANN: creation of the layered structure and the interneuron links, which we shall refer to collectively as the “architecture”; selection of values for the parameters that control operation of the network; and determination of the connection weights, through training.

In a layered network, the numbers of layers and neurons are adjustable. The numbers of neurons in the input and output layers are largely defined by the problem, since one neuron is allocated to each type of input data, such as temperature, pH, or flow rate, while one output neuron is allocated to each item of data fed back into the environment, such as a predicted pressure or product concentration. If the number of input variables is large, or if there is redundancy among them, it is helpful to reduce dimensionality by choosing a subset;5 this reduces the time required for training and leads to a more robust network. Between the input and output layers lie hidden layers, whose number is adjustable. Most applications require no more than three hidden layers, each containing, at least in principle, any number of neurons.

The parameters to be set include the numerical variables required by the algorithm, the type of activation and error functions, and the training algorithm. The choice of these depends on the problem being tackled and is a significant issue in real-world applications.6,7 To reduce computational requirements, a metamodel may be used.99 Training, which may be supervised, using error correction, or unsupervised, with no access to training data, is commonly based on iterative gradient descent, particularly back-propagation (BP) in its various forms (see, for example, QuickProp100 or RPROP101,102). Although convergence of BP-based methods can be encouraged by gradient following,101,103 the rate of convergence depends on the initial weights and may be slow; there is no certainty that the global optimum will be found.
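As a concrete illustration of these choices, the sketch below builds and trains a small MLP with scikit-learn; the synthetic data, the two hidden layers, and every parameter value are arbitrary placeholders rather than settings taken from any study cited here.

```python
# Minimal sketch: building and training a feed-forward MLP (illustrative values only).
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))                         # e.g. temperature, pH, flow rate (scaled)
y = X[:, 0] * np.sin(3 * X[:, 1]) + 0.1 * X[:, 2]      # synthetic stand-in for plant data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Architecture and training parameters are exactly the quantities a neuro-evolutionary
# method would later optimize; here they are simply fixed by hand.
net = MLPRegressor(hidden_layer_sizes=(8, 4),          # two hidden layers
                   activation="tanh",
                   solver="adam",
                   learning_rate_init=1e-3,
                   max_iter=2000,
                   random_state=0)
net.fit(X_train, y_train)
print("test R^2:", net.score(X_test, y_test))
```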

4. Optimizing the Neural Network. Neural networks are widely used, conceptually simpler than many numerical algorithms, and readily available as commercial software. One might assume that all aspects of their use would be well understood, but the simplicity is misleading and their use in research is not without difficulties. They are not a true “turnkey” method, as the quality of the results depends on how the network is set up;8 performance is sensitive to the architecture, and a poorly chosen architecture will wreck an ANN application. Very simple architectures are not flexible enough to learn, while, perversely, large networks may be too flexible and learn examples rather than rules. Finding the best architecture is an optimization problem. The literature in this area is extensive but patchy, containing many examples of architectures chosen by experience rather than by algorithm.

It is, however, easy to set the simplest features of the architecture, the input and output layer sizes. Unless there is redundancy in the inputs, the number of inputs is determined by the problem. Output redundancy is unlikely, as there is one output for each environmental parameter for which a prediction is required; if supervised learning is used, this equals the number of factors for which training data exist. While the size of these layers is determined by the number of environmental variables, there is no limitation on the number of neurons in the hidden layers, or on the number of layers.

Optimization of the topology of the hidden layers, which is a key step, can be accomplished using trial-and-error, empirical or statistical methods, constructive or destructive methods, and evolutionary techniques.9,10 In trial-and-error, the architecture is repeatedly tested and modified by hand until an acceptable architecture is located. This simple method is not suitable for large networks, as it covers little of the search space, and user bias may divert the search from promising architectures. To reduce noise, the architecture may be changed only when the error diminishes, but this increases the computing overhead and transforms the search into a gradient descent, which is prone to trapping in local minima.

More sophisticated ways to seek a good architecture are available. Empirical or statistical methods use the effect of changes in architecture or parameters on performance to guide the search; Fiszelew et al.104 used BP training with early stopping in this way. A network can also be built piecemeal from a small beginning. In a constructive algorithm, the network grows as neurons and/or layers are added until performance is adequate. The inverse of this procedure is a pruning algorithm, in which neurons and/or layers are deleted from a large network until performance falls below an acceptable level; by returning to the smallest network which gave satisfactory performance, a suitable network is identified.

Among other promising methods for optimizing architecture are evolutionary algorithms,11−13 on which we concentrate here. An evolutionary search can optimize any of the ANN parameters, provided that they are specified in a way that is compatible with the algorithm.105,106 Like other methods, evolutionary algorithms cannot always guarantee to locate the optimum architecture, but they typically outperform rival methods in developing complex networks for real-world problems.

As well as the architecture, other network parameters must be chosen, and again various approaches exist. In early studies, network architecture was predefined and the search focused on the parameters: see, for example, Marin and Sandoval107 or Yao.108 De Falco et al.109 used such an approach to determine network weights, while Keesing and Stork110 combined a genetic algorithm with a neural network to solve simple pattern recognition problems. Learning rate and momentum, and their mutual interaction, were considered by Vlahogianni et al.,111 Kim and Yum,112 and Packianather et al.,113 while Almeida and Ludermir,8 Dam and Saraf,114 Sukthomya and Tannock,115 Kim and Yum,112 and Wang et al.116 investigated how activation functions and the sizes of the training and testing sets affect performance. Other elements to which optimization procedures


Industrial & Engineering Chemistry Research

Review

have been applied include learning rules,8,117 the error function, and noise factors.116 Transfer functions and training algorithms are usually picked by hand, though Annunziato and co-workers118 applied evolutionary algorithms to the off-line evolution of weights and transfer functions of feed-forward ANNs, improving on the performance when weights alone were adjusted.

5. An Introduction to Evolutionary Algorithms (EAs). Evolutionary algorithms, which draw inspiration from natural evolution, form a powerful family of methods that may design any or all of the architecture, internal parameters, connection weights, and learning rules of an ANN.105,106,14 Both the “selection of species” under environmental pressure and the processes of reproduction and mutation have computational analogues in the family of EAs, which comprises the genetic algorithm (GA), genetic programming (GP), evolutionary programming (EP), and evolutionary strategies (ES). Each is a population-based stochastic search employing a fitness criterion. An initial population of potential solutions (in the applications we discuss, a “solution” is an ANN of value in solving a scientific problem) is iteratively improved through fitness-based selection and the generation of diversity.15 Population-based EAs can create several elements of a Pareto optimal set in a single run when the chance of finding a unique optimal solution is low.105 The GA, currently the most popular EA, was sketched out by Holland16,17 and developed by numerous groups. It is suited to problems in which the search space is very large or discontinuous, or that contain mixed continuous-discrete variables;119,120 problems of this nature present particular challenges within applied science.18 In addition to its use in network design, the GA has been used in many other applications, including process control, chemical mass balance, multipurpose chemical batch plant design, and scheduling.19−21 GAs scale well, a helpful characteristic since evaluation of solutions is computation-intensive.121

In a basic GA, an initial population of random solutions is created and the quality of each solution (its “fitness”) is calculated. A new population is prepared by selecting members from the old one with a probability related to their fitness; evolutionary manipulation then creates new, potentially better, solutions. The fitness of each new solution is determined, and the cycle repeats until a predefined termination criterion is met, generally the emergence of a solution of acceptable quality.2,122

The fitness need not be a pure number. Real-world problems include constraints, such as a restriction on temperature or processing rate, which can be handled in two ways: each potential solution can be examined and, if it fails to meet a constraint, it can be replaced by a new random solution, prepared so that the constraint is satisfied. Alternatively, solutions that break a constraint can be given a fitness penalty; this latter tactic creates a more diverse population, but at the expense of additional computing time, and with the risk that the population may fill with constraint-breaking solutions. Other factors may contribute to the fitness: architectural simplicity promotes learning rather than remembering, so the fitness may be linked to network complexity.
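The loop just described can be written in a few lines. The toy genetic algorithm below uses fitness-proportional selection, one-point crossover, and bitwise mutation, and folds both a constraint penalty and a complexity term into the fitness; the encoding and all constants are invented for illustration and are not those of any cited study.

```python
# Minimal GA skeleton: fitness-proportional selection, crossover, mutation,
# with a constraint penalty and a complexity term folded into the fitness.
import random

GENES, POP, GENS = 12, 30, 60

def fitness(ind):
    value = sum(ind)                                   # toy objective: maximize number of 1s
    penalty = 5.0 if sum(ind[:3]) == 3 else 0.0        # mock constraint: first three genes not all 1
    complexity = 0.1 * sum(ind)                        # mock preference for "simpler" solutions
    return max(value - penalty - complexity, 1e-6)

def select(pop, fits):
    return random.choices(pop, weights=fits, k=2)      # fitness-proportional ("roulette") selection

def crossover(a, b):
    cut = random.randrange(1, GENES)                   # one-point crossover
    return a[:cut] + b[cut:]

def mutate(ind, rate=0.02):
    return [1 - g if random.random() < rate else g for g in ind]

pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENS):
    fits = [fitness(ind) for ind in pop]
    pop = [mutate(crossover(*select(pop, fits))) for _ in range(POP)]
best = max(pop, key=fitness)
print(best, fitness(best))
```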
In an example of a multifactor function, Mizuta et al.123 used a GA to both design and train a network; their fitness function depended on the number of valid networks, the number of hidden neurons, and the number of connections. The GA-ANN is a powerful combination for several reasons. Functions that measure the quality of the architecture are

discontinuous, since a minimal change in architecture may change the performance abruptly, but, unlike gradient-descent methods, the GA works with a population of solutions, so the search is less likely to be trapped in a local optimum.9 In addition, parallel processing, which is readily incorporated into EA methods, increases processing speed.

6. Coding the Evolutionary Algorithm. The solutions that an EA manipulates are encoded; the effectiveness of the algorithm depends on how this is done. When applied to ANNs, each part of an EA solution might represent as little as one interneuron connection, or as much as a complete, functioning network. The population commonly comprises a number of genotypes, each being an array of genes taking a value from a defined domain. The genotype codes a phenotype, or candidate solution, in this case a neural network architecture.

In direct or “strong specification” encoding, every network connection is coded explicitly.120 Most early workers124,125 defined the network directly as a binary string, and this is still done by some (see, for example, Gan et al.126), while Gray encoding has also been used.127 Castillo et al.6,7 combined a GA with BP in a direct method to train MLPs with a single hidden layer. While direct representations have the merit of simplicity, they scale poorly, as the string length increases exponentially with the number of neurons,128 so they are not usually a feasible option for optimizing complex networks.

Indirect or “weak specification” encoding describes the arrangements of layers and nodes but not individual connections, and is reminiscent of biological gene reusability.129−132 This encoding reduces string length, but variable-length strings complicate interpretation.10 Within computational limits, any feed-forward architecture can be created, so indirect encoding ensures generality. White and Ligomenides133 used a GA in which each member of the population was a complete network, with different sizes permitted; each allele was a node with its associated links. Though there is some flexibility in this approach, the network sizes were limited to those generated during initialization. Developmental encodings go further: a genome specifies a process that in turn controls construction of the network, so the representation is particularly compact.134,135 The networks produced by decoding may incorporate learning algorithms specified by the genes, or these algorithms may be preselected.

The evolutionary procedure manipulates a population of genotypes, applying the usual operators with the aim of evolving phenotypes of high fitness.104 Coding may include both weights and learning parameters,136,137 or even a modification of the BP algorithm itself, as described in Kinnebrock’s genetic approach:138 a mutation operator modifies the network weights after each iteration, though such alterations do not necessarily improve the net classification error. The flexibility that characterizes the GA encourages tailoring of representation schemes to a particular problem. Typical of such efforts, Mizuta et al.123 represented the architecture by a header-and-body chromosome. The header (Figure 1, left) holds an identification, the fitness function, and tags for the output neurons, while the body (Figure 1, right) contains identification data for the hidden neurons, their connections, and the connection weights.

Figure 1. (Left) Chromosome header;46 (right) chromosome body (per hidden neuron).46
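A minimal sketch of the direct (“strong specification”) idea follows: a genotype listing hidden-layer sizes and one bit per possible connection is decoded into a concrete feed-forward topology. The representation is deliberately simplistic and hypothetical; it is not the scheme of Mizuta et al. or of any other work cited above.

```python
# Minimal sketch of direct ("strong specification") encoding of a small feed-forward net.
# Genotype: hidden-layer sizes plus one bit per possible connection between adjacent layers.
import numpy as np

def decode(genotype, n_in=4, n_out=2):
    hidden = [h for h in genotype["hidden"] if h > 0]            # drop empty layers
    sizes = [n_in] + hidden + [n_out]
    masks, k = [], 0
    bits = genotype["conn_bits"]
    for a, b in zip(sizes[:-1], sizes[1:]):
        need = a * b
        masks.append(np.array(bits[k:k + need]).reshape(a, b))   # 1 = connection present
        k += need
    return sizes, masks

genotype = {
    "hidden": [3, 0],                     # one hidden layer of 3 neurons (second layer unused)
    "conn_bits": [1] * (4 * 3 + 3 * 2),   # fully connected: one bit per possible weight
}
sizes, masks = decode(genotype)
print(sizes)                              # [4, 3, 2]
print([m.shape for m in masks])
```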
7. Evolving a Network. The evolutionary process in topology design is depicted in Figure 2.

Figure 2. The evolutionary process of topology design.

The efficiency of this simple scheme can be impaired by the permutation problem: the mapping from a coded representation of an ANN to the real network may yield two networks that are functionally equivalent but in which the nodes are differently ordered.22 To minimize these effects, Fiszelew and co-workers104 implemented a phenotype crossover, an operation on the network itself rather than on the genes in the population.

The surface across which an architecture search takes place is infinite, as the number of nodes and connections is unbounded. It is also complex and noisy: architecture evolves outside the training procedure, so performance depends indirectly on architecture, via the connection weights. The disruptive influence of noise can be lessened by training each architecture several times from different initial weights, with the best result used to determine phenotype fitness, but such a procedure is computationally expensive. A further complication is that the surface is deceptive, since networks of similar structure may perform differently,23 while different networks may deliver comparable performance. A suitable GA can address each of these problems.

EAs can, in principle, design any type of network; examples include feed-forward networks,10,114,123 modular neural networks,24,139 recurrent networks,25,140 and neural network ensembles.26−28,141 Reflecting this variety of approaches, a range of EAs can be used, among them GAs,10,30,114,123,135,142 differential evolution (DE),31,32,143 the group search optimizer,33−35 and bacterial foraging algorithms.144

Benardos and Vosniakos9 have proposed a method of architecture determination that relies upon novel criteria to help the GA quantify both ANN performance and complexity. As the number of hidden layers and neurons largely determines the generalization ability of the network, these authors concentrated on finding suitable values for these parameters. They used a multiobjective optimization method in which the solution space of all possible architectures was divided into regions that were searched independently. The best solutions from each region were retained, and terms were added to the objective function to reflect the generalization error. Early stopping allowed small networks to be trained sufficiently to learn the data associations while reducing overfitting in complex networks. Their “complete” fitness function combined four factors: training error, generalization error, a criterion favoring small architectures, and a solution space criterion that favored networks whose predictions were consistent throughout the solution space.

In a thermodynamics-based study, Gao and Loney36 used genetic programming to design networks with no prespecified type of architecture, to predict the phase equilibria of poly(ethylene glycol)/phosphate aqueous systems. The final network contained several recursive (feedback) links, which the authors suggested were the network’s attempt to include the effect of experimental error.

Optimization of the architecture and of the network parameters is best regarded as a single optimization task.14,105,106,123 This may be accomplished by incorporating both weights and learning parameters into chromosomes,136,137 or through a modification of the BP algorithm.138 Mizuta et al.123 used a GA with a fitness function dependent on both the output errors and the complexity of the proposed network, determining the number of hidden neurons and connections, and the values of the weights. Bebis et al.142 coupled GAs with weight elimination145 to prune initially oversized networks, while Dam and Saraf114 used a GA to design a neural network and to find the optimum number of inputs. During network design, all potential inputs were used and the number of neurons in the input layer was varied. Subsequently, chromosomes using different subsets of the input set were created and the GA allowed to select the best set of inputs.

To encourage selection of the best architecture, Arifovic and Gencay146 used a local elitism strategy after a GA determined the connection structure, number of hidden units, and type of inputs. A Monte Carlo algorithm was used to assess the sensitivity of the procedure. Castillo et al.6,7 took a similar approach: after using the GA to find an initial set of weights and the hidden layer size, BP was used to modify the weights. However, unlike other approaches,139,147 the maximum hidden layer size was not set in advance, though computational restrictions effectively provided an upper limit. Meta-learning principles have been applied by Kordik et al.26 to optimize neural network structure and function. Several neuron types trained by different optimization algorithms were combined to build supervised feed-forward networks in a procedure dubbed Group Adaptive Models Evolution.

The variables in a GA-ANN method are not limited to those within the network; the power of a GA optimizer depends on its own internal parameters, so GA operators and variables may also be chosen to optimize the operation of the GA itself. Koumousis and Katsaras148 considered how a variable population size and a reinitialization strategy might affect the GA, while Cao and Wu149 treated optimization of the mutation and crossover probabilities as a controlled Markov process.150 Mizuta et al.123 employed standard selection and mutation, ignoring crossover because the chromosome length was variable, though other authors have successfully used variable-length chromosomes: Kerachian and Karamouz151 developed a varying-length GA (VLGA), in which the chromosome is sequentially lengthened; extended versions of the VLGA requiring significantly less computation have been presented by Zahraie et al.152

8. Evolutionary Algorithms Used to Select Input Parameters. In some problems, there is redundancy among the input parameters. D’heygere et al.37 used a GA to select input


Industrial & Engineering Chemistry Research

Review

variables, using a binary chromosome in which each gene represented an input. The performance of a feed-forward ANN was enhanced by reducing the number of variables from 17 to between 5 and 10. To identify input nodes that conveyed no new information, Nadi et al.153 used an evolutionary algorithm, monitoring the weights that link an input to the hidden layer. A weight near zero indicated a redundant input, which was eliminated. However, this method of determining redundancy was slower than alternative methods, and, if two inputs provide similar information, optimization is likely to blend the inputs rather than drive the weight of one of them to zero.

A study by Mouton et al.38 used neural networks to investigate how river characteristics affected the abundance of Asellus (Crustacea, Isopoda). Several methods were used to assess the relative importance of 24 environmental variables, and so identify the key variables. All methods ranked the input variables similarly, suggesting that the particular method used was not critical. Indeed, for data sets in which the level of redundancy is high, there is evidence that heuristics encoded as lookup tables derived from reconstructability analysis may be almost as effective as ANNs working on a subset of the input data.39 A network can itself be used to identify a parsimonious input set. Kourentzes and Crone40 describe an application to multiyear time series forecasting, in which the components in different series can change in unpredictable ways. They argued that automated input selection is possible using an ANN-based filter, relying on the identification of seasonal frequencies and the separation of deterministic and stochastic signals.

9. Differential Evolution and Other Algorithms. Though the most widely used method to evolve ANNs is the GA, the search capabilities of DE make it a potential alternative. DE is a population-based metaheuristic, loosely based upon Darwinian principles.41,42 The algorithm can select evolutionary procedures to create new variants, strategies, or mutation routes, such as new mechanisms to select values for the control parameters.154,155 Among the earliest attempts to optimize a neural network using DE was that of Fischer et al.,156 who used a fixed architecture to find a new training method. Subsequently, Plagianakos et al.31 used DE to train neural networks with discrete activation functions. Lahiri and Khalfe143 attempted to determine a partial architecture, using DE to vary the number of neurons in the hidden layer, the weights, and the activation functions. Bhuiyan157 proposed a DE-based algorithm to determine topology and to partially train a network during evolution.

A DE-ANN was used by Subudhi for system identification.43,44 The DE trained the ANN to near the global minimum, after which the network was trained by a Levenberg−Marquardt (LM) algorithm. Subudhi argued that this approach provided better identification accuracy and was more efficient than typical neural or neuro-fuzzy methods. Figure 3 compares the error curves for these techniques, and Figure 4 shows the relationship between mean squared error and number of iterations. Chen and co-workers45 described an improved DE algorithm to encode prior knowledge into networks during training. LM descent and random perturbation were used to speed up DE and to help escape local minima. The algorithm’s efficiency was demonstrated by using it to model chemical curves with increasing monotonicity.
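For reference, the fragment below shows one generation of the classic DE/rand/1/bin scheme applied to a real-valued vector standing in for network weights; the error function, population size, and the fixed F and CR values are illustrative only, and the cited studies use more elaborate variants.

```python
# Minimal sketch of DE/rand/1/bin generations applied to a real-valued vector
# (here a stand-in for network weights); F and CR are fixed, illustrative values.
import numpy as np

rng = np.random.default_rng(2)

def error(w):                        # placeholder for a network training error
    return float(np.sum((w - 0.3) ** 2))

DIM, NP, F, CR = 10, 20, 0.7, 0.9
pop = rng.uniform(-1, 1, size=(NP, DIM))
cost = np.array([error(w) for w in pop])

for _ in range(200):                 # generations
    for i in range(NP):
        r1, r2, r3 = rng.choice([j for j in range(NP) if j != i], size=3, replace=False)
        mutant = pop[r1] + F * (pop[r2] - pop[r3])         # differential mutation
        cross = rng.random(DIM) < CR
        cross[rng.integers(DIM)] = True                    # ensure at least one gene crosses
        trial = np.where(cross, mutant, pop[i])            # binomial crossover
        if (c := error(trial)) <= cost[i]:                 # greedy selection
            pop[i], cost[i] = trial, c
print(cost.min())
```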
Venkatraman and Yen46 proposed a generic framework for constrained optimization problems.

Figure 3. Comparison of errors in modeling for DE-ANN and neuro-fuzzy identification.

Figure 4. Mean squared error obtained with different methods for system identification.

These can often be cast as multiobjective optimizations, for which many approaches exist, including the vector evaluated GA (VEGA),158 multiobjective GA,159 niched Pareto GA (NPGA),160 strength Pareto approach,161 nondominated sorting genetic algorithm (NSGA),162 elitist NSGA-II,163 NSGA-II with jumping genes,47 and extended multiobjective GA.164 Other workers have investigated hybrid algorithms, in which two or more methods are combined to create optimizers.165 Fan and Lampinen166 proposed a trigonometric strategy, in which the objective function plays a part in mutation to increase the rate of convergence.167

Recently, swarm intelligence (SI) algorithms, which mimic aspects of animal behavior, have begun to attract interest. The particle swarm optimizer (PSO) and ant colony optimizer (ACO) are among the most promising methods. PSO, which is related to bird flocking, is a trajectory-tracking algorithm. Although it contains no evolutionary operators, instead resembling a directed stochastic search,48 it shares some elements with EAs and is thus a related technique. In an SI algorithm, many nonintelligent agents interact in a way which leads to intelligent behavior. There is no central control of the agents: each investigates its immediate environment independently and in simple fashion, yet the overall behavior is complex. PSO offers several potential advantages in science: it is simple to understand and implement, is well-suited to parallel computation, does not require access to derivatives, scales well, and has few adjustable parameters, though this final characteristic does reduce flexibility. Variants of PSO have appeared, such as active target PSO,168 adaptive mutation PSO,169 adaptive PSO guided by acceleration information,170 angle modulated PSO,171 best rotation PSO,172 cooperatively coevolving particle swarms,173 modified genetic PSO,174 and others.175
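The canonical PSO velocity and position update is compact enough to show directly; the sketch below minimizes a placeholder objective, and the inertia and acceleration constants are typical textbook values rather than those of any study discussed here.

```python
# Minimal sketch of the canonical PSO velocity/position update (illustrative constants).
import numpy as np

rng = np.random.default_rng(3)

def cost(x):                          # placeholder objective, e.g. a network error surface
    return float(np.sum(x ** 2))

DIM, N, W, C1, C2 = 5, 15, 0.7, 1.5, 1.5
x = rng.uniform(-1, 1, size=(N, DIM))
v = np.zeros_like(x)
pbest, pcost = x.copy(), np.array([cost(p) for p in x])
gbest = pbest[pcost.argmin()].copy()

for _ in range(100):
    r1, r2 = rng.random((N, DIM)), rng.random((N, DIM))
    v = W * v + C1 * r1 * (pbest - x) + C2 * r2 * (gbest - x)   # inertia + cognitive + social
    x = x + v
    c = np.array([cost(p) for p in x])
    better = c < pcost
    pbest[better], pcost[better] = x[better], c[better]
    gbest = pbest[pcost.argmin()].copy()
print(cost(gbest))
```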


Industrial & Engineering Chemistry Research

Review

Ahmadi and Shadizadeh49 used PSO methods to determine the network weights in a feed-forward network whose role was to predict the degree of asphaltene precipitation in oil samples; they obtained good agreement with experimental data and argued that their results were superior to alternative methods of prediction. Zhao has also used PSO to optimize an ANN,50 applying the tool to the fitting of experimental data from a tin(IV) oxide gas nanosensor. Once again the network optimized using PSO was found to be superior to a traditional BP network.

It is not necessary that the PSO be used to optimize the network before analysis; it may instead be used to refine results generated by a network. In the rapidly expanding field of proteomics there is a need to predict both the structure of proteins and their function; this is an application of considerable commercial interest, and of interest more generally to structural biologists. Saraswathi and co-workers51 have used an ANN to predict the secondary structure of proteins, with an initial structure predicted by the network being refined by a PSO; results were comparable to those obtained using alternative strategies.

Cai et al.140 combined PSO and evolution strategies to train recurrent neural networks for analyzing time series. The usual evolutionary operators were integrated into a conventional PSO algorithm to retain the best particles; mutation maintained diversity within the population. An improved PSO with a self-adaptive evolution strategy was used by Yu et al.176 to design ANNs. The architecture and parameters coevolved, using the EA as an operator for a local search through small Gaussian mutations. Almeida and Ludermir8 have suggested an automatic search to optimize the parameters and performance of ANNs that takes advantage of several methods: ES, PSO, and GA are combined with local searches and the BP and LM training algorithms. A small portion of the space is searched in depth to refine the solutions found by the ES. The solutions are submitted to a training phase in which BP and/or LM execute a more precise local search, and the solutions are finally submitted to a GA selection process.

Ant colony optimization is also trajectory-tracking in nature, simulating the creation of a pheromone trail laid down by successful searches. ACO is readily scalable and, like PSO, exists in several variants, among them ant system,52 max−min ant system,177 elitist ant system,178 ACO with adaptive refinement,179 continuous ACO,180 and others. ACO has been used in a variety of recent applications. For example, Goodarzi et al.53 studied its use as a prefilter to extract key descriptors from a large pool of theoretically derived descriptors in a study of non-nucleoside HIV-1 reverse transcriptase inhibitors. Multiple linear regression, partial least-squares regression, and radial basis function neural networks were compared; the neural network approach was found to be best, and superior to comparative molecular field analysis approaches.

Li and Liu54 identified a niche for a PSO in their study of the synthesis of polypropylene. The melt index is an important property of polymers, as it indicates the progress of polymerization, but the authors argued that polymerization relies on too many variables for the melt index to be predicted directly from a complete set of process variables. They therefore used an ACO step to optimize both a radial basis function network and the choice of process variables.
They obtained satisfactory agreement with experiment, but did not consider whether an

input trimming procedure could be used to eliminate process variables found by the network to be of little value.

Ant colony and particle swarming methods have potential as “add-ons” to enhance the performance of ANNs; the situation is similar for artificial bee colony (ABC) algorithms. Karaboga and Ozturk,181 for example, used an ABC to train feed-forward ANNs, which were then used on standard machine learning problems. The results suggested that, like ant colony and swarming methods, the algorithm can be helpful in training ANNs, but it remains unclear whether it can compete with more conventional approaches or is best used as an aid to them.

The group search optimizer (GSO) has recently joined the SI family. The GSO mimics a “producer−scrounger” model of animal behavior.48 Vision-like scanning mechanisms are included, and “rangers” perform random walks to avoid local minima. Variants of GSO include the quick group search optimizer,182 improved group search optimizer,183 hybrid GSO,184 GSO combined with PSO,185 and improved GSO with a quantum-behaved swarm.186 Few applications of GSO have been published, among them some for developing ANNs.34,35 A novel multilayer neural network based on a fast bacterial swarming algorithm for image compression has been proposed by Ying and co-workers.144 They used bacterial foraging, incorporating such features as chemotaxis, metabolism, and quorum sensing, to improve the quality of reconstructed images. Liu et al.24 proposed a PSO-based memetic algorithm in which the evolutionary searching of PSO is linked to adaptive training to enhance the search for the best set of weights and biases, balancing exploration and exploitation. Further combined methods include GA + BP,6,7 GA + LM,141 NSGA-II + Quasi-Newton,27 DE + LM,43 DE + BP,157 PSO + EA,140,176 and PSO + ES + GA + BP (LM).8

10. Chemical, Engineering, and Related Applications. Many examples exist of the use of evolutionary methods, often in combination with ANNs, in chemical and engineering problems and in related fields. In this section we cover some of the more significant and interesting applications.

Using a GA, Dam and Saraf114 developed ANN-based sensors to predict the properties of petroleum distillates. Network design, including selection of the architecture and activation functions, was completed in a few hours. A network ensemble approach was used by Dondeti and co-workers141 to study spectrophotometric multicomponent analysis. Several ANNs, with varied numbers of inputs, hidden neurons, initializations, and training sets, were trained with the LM algorithm. A GA created a pool of trained ANNs from which subsets were selected to form an ensemble. A population of several such ensembles then evolved to generate the fittest ensemble.

In a solid state chemistry application, rare earth metals and lanthanum were encapsulated in silica by in situ complexation to chelating groups attached to the silica.30 The relation between reaction mixture composition and fluorescence intensity was modeled by ANNs optimized using GAs. Subsequently, networks were combined in stacks to yield a model which outperformed models comprising individual neural networks. In a further solid state chemistry application, Cartwright and Leontjev55 described the use of a GA-ANN hybrid algorithm in the search for high-efficiency solid-state phosphors, a crucial component in light-emitting diodes. They found a novel phosphor composition whose efficiency is


Industrial & Engineering Chemistry Research

Review

predicted to lie well beyond that of previously known phosphors.

The development and optimization of a stacked neural network through an evolutionary hyper-heuristic has been described by Furtuna and co-workers.27 A hyper-heuristic based on the NSGA-II multiobjective evolutionary algorithm incorporated Quasi-Newton (QN) optimization. QN was used to train each network in the stack and create a Pareto optimal front. The number of stack networks, the weights for each network output, and the number of hidden neurons in each ANN form the decision variables in model optimization. Each stacked neural network was trained and used to model polyacrylamide-based multicomponent hydrogel synthesis, relating yield and swelling degree to reaction conditions. The resulting Pareto optimal front (Figure 5) contains a discontinuity due to disconnected areas in the solution space.27

Figure 5. The Pareto optimal front provided by the NSGA-II-QN-SNN evolutionary hyper-heuristic.

In Figure 5, Total Hn is the number of hidden neurons in the stack, and perf_index is used to evaluate the accuracy and the generalization capacity of the stacked neural network:

perf_index = r − (MSEtrain + MSEtest)   (1)

where MSEtrain is the mean squared error from training, MSEtest is the mean squared error obtained for testing, and r is the linear correlation coefficient at testing. In this study, the goal was to find the best topology and weights while minimizing stack size. The best balance between the potentially conflicting objectives of stack complexity and performance index was found to be perf_index = 0.89 and Total Hn = 16, with a stack of two networks.
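Eq 1 is simple to evaluate for any candidate stack once its training error, testing error, and testing correlation are known; the snippet below shows the computation for two hypothetical stacks whose metric values are invented purely to demonstrate the formula.

```python
# Evaluating the performance index of eq 1 for candidate stacked networks.
# The r, MSE_train, and MSE_test values are invented for illustration.
def perf_index(r, mse_train, mse_test):
    return r - (mse_train + mse_test)

candidates = {
    "stack_A": dict(r=0.93, mse_train=0.020, mse_test=0.035),
    "stack_B": dict(r=0.90, mse_train=0.005, mse_test=0.010),
}
for name, metrics in candidates.items():
    print(name, round(perf_index(**metrics), 3))
```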

Lahiri and Ghanta56 described a robust hybrid ANN for process engineering problems, such as the prediction of solid−liquid slurry flow holdup. Meta-parameters for an MLP network were optimized by minimizing the generalization error; DE was used to optimize the network parameters and to train the neural network, but the question of how the network topology might best be determined was largely unexplored.

Dragoi et al.28 used both a classical DE algorithm and a self-adaptive mechanism to optimize neural networks for prediction and classification problems. Oxygen mass transfer in stirred bioreactors is a key factor in microorganism growth in aerobic biosynthesis. The mass transfer rate was related to the viscosity, superficial air speed, and oxygen fraction, among other factors, using both individual and stacked networks. The parameters considered were the number of hidden layers and the number of neurons within them, the activation functions, the biases, and the input weights. For classification, individual networks were found to give acceptable solutions; neural network based modeling with the DE algorithm was found to generate near-optimal network topology. A typical result is presented in Figure 6.

Figure 6. Experimental and predicted oxygen mass transfer coefficients in stirred bioreactors.

In this method, a simple self-adaptive mechanism was employed to find near-optimal control parameters of the algorithm, and direct encoding was used. A stack generalization routine combined the networks, leading to individual networks of MLP(4:11:1:1), MLP(4:1:1), and MLP(4:3:1).
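The self-adaptive mechanisms mentioned above are not specified in detail here; one widely used rule, in the style of the jDE algorithm of Brest and co-workers, lets each individual carry its own F and CR and occasionally resamples them, as sketched below. This is a generic illustration and not necessarily the mechanism used in the cited work.

```python
# Generic sketch of jDE-style self-adaptation: each individual carries its own F and CR,
# which are occasionally resampled before being used to build its trial vector.
import numpy as np

rng = np.random.default_rng(4)
NP, TAU_F, TAU_CR = 20, 0.1, 0.1
F = np.full(NP, 0.5)            # per-individual mutation factors
CR = np.full(NP, 0.9)           # per-individual crossover rates

def adapted_parameters(i):
    """Return the (possibly resampled) F and CR to use for individual i this generation."""
    f = rng.uniform(0.1, 1.0) if rng.random() < TAU_F else F[i]
    cr = rng.random() if rng.random() < TAU_CR else CR[i]
    return f, cr

for i in range(NP):
    f_i, cr_i = adapted_parameters(i)
    # ... build and evaluate the trial vector for individual i using f_i and cr_i ...
    trial_better = rng.random() < 0.5        # placeholder for the real selection test
    if trial_better:
        F[i], CR[i] = f_i, cr_i              # successful settings survive with the individual
```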

A self-adaptive version of DE has been used to simultaneously optimize the architecture and parameters of networks for the classification of organic liquid crystals,32 using geometrical molecular descriptors estimated by molecular mechanics. A network with one hidden layer was able to make accurate predictions, suggesting that the proposed method was both efficient and flexible. In Table 1, which presents some results obtained from the best classifier, 1 represents liquid crystalline behavior and 0 other situations.

Table 1. Testing data and results obtained from the best neural model in the prediction of liquid crystalline behavior

    input data                            experimental   predicted
    9.21   22.91   0.090   486.66         0              0
    9.21    8.61   0.161   296.329        1              1
    9.21   16.43   0.111   418.497        0              0
    9.23   16.62   0.111   380.535        0              0
    9.21   17.51   0.109   428.579        1              1
    9.21   18.48   0.105   442.607        1              1
    9.21   15.2    0.117   404.47         0              0
    9.21   18.93   0.101   444.579        0              0
    9.21   20.63   0.095   458.606        0              0
    9.22   28.62   0.076   522.778        0              0
    9.22   12.51   0.134   340.426        1              1

Freeze-drying is widely used industrially in food preparation and to extract drugs from solution, as it protects heat-sensitive material. Heuristic models have been used to optimize and monitor this process, but Dragoi et al.29 have proposed a black-box model for the freeze-drying of pharmaceuticals, based on a self-adaptive DE-BP combination. Using this model, the


Industrial & Engineering Chemistry Research

Review

temporal variation of both temperature and residual ice content can be determined off-line, given the temperature and pressure in the drying chamber. An alert can then be issued for overtemperature conditions, or when drying is complete. Alternatively, the product temperature can be used as input to the network, with the amount of residual ice being provided as the output.

In related work, Dragoi et al.25 used ANNs to monitor pharmaceutical freeze-drying, using data from a phenomenological model for training. A self-adaptive DE was combined with BP to determine the structure and parameters of the neural network model and the delay times (Figure 7A,B).

Figure 7. General structure (A) and the best model (B) of the neural network model used to estimate the maximum product temperature and the drying time.

Using experimental data and time-delayed inputs, the ANN was able to estimate future product temperature and residual ice for a given heating shelf temperature and drying chamber pressure. The duration of the primary drying phase and the maximum product temperature could be predicted for constant operating conditions. Figure 7B shows the best ANN from a series of simulations. The three fixed inputs are the time, the temperature of the heating shelf, Tshelf, and the pressure in the drying chamber, Pchamber; there are six time-recurrent inputs. The number of recurrent inputs must be modest to prevent the network becoming too complex. Each fixed input has a major influence on the process and is easy to measure; less simple to measure are the product temperature (T) and the thickness of the dried cake (Ldried). The network was trained using data derived from the heat transfer coefficient between the heating shelf and the product and from the mass transfer resistance of the dried cake to vapor flow. Reliable estimations were obtained when the sensor was used to monitor a process characterized by different values of the heat and mass transfer coefficients.

As another example, a self-adaptive variant of the DE algorithm has been proposed to find the optimal topology of an ANN for predicting oxygen mass transfer in stirred bioreactors in the presence of n-dodecane as an oxygen vector.57 The algorithm included two initialization strategies and a modified mutation. Mass transfer coefficients were accurately predicted, and the best models were used to determine the optimal conditions for which the mass transfer coefficient was maximized.

GSO has been used with a three-layer feed-forward ANN to investigate how ultrasound data from grinding machines might be modeled and analyzed.33 Good convergence and generalization performance was demonstrated on two benchmark problems,34,58 showing that the method is capable of detecting machine malfunction from ultrasound data. Yan et al.59 have adapted the basic GSO to improve the diversity of scroungers. Their tests on standard learning problems, and on the prediction of ammonia concentration in a fertilizer plant, suggest that their algorithm can at least compete with alternative approaches. Silva and co-workers35 introduced two hybrid GSO methods based on a divide-and-conquer paradigm, employing cooperative behavior among multiple GSO groups to improve performance. They found better generalization performance than traditional GSO on benchmark data sets.

Cholesteryl esters are believed to be important in the human immune system, but their separation, a necessary step in characterization, is still under investigation. Jansen and co-workers60 designed an ANN-GA to assess the principal factors that influence the success of separation methods: temperature, flow rate, and organic layer composition. Chromatographic separation of samples of human milk was assessed using mass spectrometry, and the data were fed into an ANN-GA that used Levenberg−Marquardt backpropagation with a sigmoid transfer function. The optimum architecture was, as one might anticipate, found to be simple, with one hidden layer of three nodes. Satisfactory agreement with experimental data was obtained.

Lisa et al.61 have studied the thermostability of organic compounds using an ANN-GA. Structural descriptors from molecular modeling were filtered using a GA to determine which influenced stability. Selection of the minimum subset of input parameters, shown in Table 2, was performed with a GA. Of the 20 parameters, the first five to eight, with weights significantly higher than the rest, were assumed to form a suitable input set. This culling of input parameters, in which the most significant features are selected, is similar in its effect to methods that rely on target transformation factor analysis;62 it can offer a useful reduction in computational effort without a significant loss in the predictive ability of the network.

Table 2. Weights of Molecular Descriptors for Thermostability of Organic Compounds

     structural parameter   description                                    weight
  1. P                      polarizability                                 1.2174
  2. NA                     number of aromatic cycles in the molecule      1.1010
  3. NF                     number of ferrocene units in the molecule      0.9425
  4. NC                     number of cholesteryl units in the molecule    0.8142
  5. M                      molecular weight                               0.8050
  6. C=O                    number of C=O bonds in the molecule            0.7253
  7. N=C                    number of N=C bonds in the molecule            0.6432
  8. N=N                    number of N=N bonds in the molecule            0.6088
  9. C−O                    number of C−O bonds in the molecule            0.0995
 10. C−N                    number of C−N bonds in the molecule            0.0717
 11. C−C                    number of C−C bonds in the molecule            0.0520
 12. C−H                    number of C−H bonds in the molecule            0.0089
 13. Ltot                   total length of the molecule                   0.0077
 14. S                      surface of the molecule                        0.0024
 15. V                      total volume                                   0.0019
 16. C                      number of carbon atoms in the molecule         0.00077
 17. Fe                     number of iron atoms in the molecule           0.00056
 18. H                      number of hydrogen atoms in the molecule       0.00033
 19. N                      number of nitrogen atoms in the molecule       0.00006
 20. O                      number of oxygen atoms in the molecule         0.00003

Ahmad and co-workers63 have used a GA to design the architecture for an ANN model used in cancer diagnosis, achieving a reported 97% accuracy. Silva and Biscaia64 used a GA to obtain a Pareto optimal set of temperature and initiator feed in a study of monomer conversion in batch styrene polymerization. Optimization of process parameters to maximize the yield of gasoline and minimize coke formation in an FCCU, using NSGA-II with jumping genes, has been reported by Kasat and Gupta.47 Guria et al.65 used similar methods to maximize recovery in a two-cell flotation circuit, while studies of the design and operating parameters for an industrial


Industrial & Engineering Chemistry Research

Review

ethylene reactor using NSGA-II have been carried out by Tarafder et al.66 Mohanty2 used a real-parameter NSGA to investigate gas-phase reactions involving oxygen and methane.

A modified DE algorithm with only one population was found by Babu and Angira67 to reduce memory and computational requirements for benchmark test functions and five nonlinear chemical engineering problems. DE was used by Kapadi and Gudi68 to model fed-batch fermentation, with a reduction of dimensionality by control vector parametrization. Simple representative problems, as well as a complex, multiple-feed problem of simultaneous saccharification and fermentation, demonstrated the validity of this approach, yielding a reported productivity increase of 20%. Classical DE and its variants, such as opposition-based DE, adaptive DE, and adaptive opposition-based DE, were used by Yüzgeç69 to determine the feeding flow profile of an industrial-scale fed-batch baker's yeast fermentation process, maximizing biomass and minimizing ethanol production.

The control of high-purity distillation processes is an industrially important nonlinear problem. A strategy based on a radial basis function neural network, with integrated control and online optimization using predictive control of the split ratio, is described by Lü et al.70 They demonstrated integrated control and online optimization of gas separation plants, leading to rapid attainment of steady-state conditions. Chemical distillation processes have also been addressed by Klett71 in a study of the ways in which fuzzy logic models could be used to control batch distillations; the model was found to yield a performance comparable to that of manual process control. Huang et al.72 have used a fuzzy inference system to assess and control the treatment of wastewater from a paper-making plant, taking into account the predicted chemical oxygen demand of the waste. Clustering techniques and principal component analysis were used to optimize the fuzzy rules. Predictive fuzzy methods have also been used by Precup et al.73 for the speed control of electrical drives under conditions of nonconstant inertia, using a preferred phase margin in the fuzzy system.

Curteanu and Leon74 have applied an optimization method based on a GA and ANNs to a polymerization process. The GA determined the network topology to be used in the optimization procedure. To obtain a polymeric material with prespecified molecular weight and other key properties, at least two variables must be specified: the initiator addition policy and the reactor temperature. The control variable vector, u, contained the initial initiator concentration, I0, and the temperature, T. An admissible control input u* was used to minimize the performance index J:

Min J[u] = wQ·Qf(u) + wx·(1 − xf(u)) + wDPn·(1 − DPnf(u)/DPnd)²   (2)

In eq 2, J is the objective function to be minimized, the w are weighting factors, Q is the polydispersity index (Q = DPw/DPn, where DPw is the weight-average polymerization degree), x is the monomer conversion, and DPn is the number-average chain length, with DPnd the desired value at the end of the reaction and DPnf the actual value corresponding to the final reaction time. The neural network model within the optimization procedure was formulated as ANN [Inputs: T, I0. Outputs: x, DPn, DPw].

A potential difficulty of any optimization in which several objectives exist is that it may be difficult to choose the weights (w in eq 2) when little is known about the problem. The authors' approach avoids this difficulty by computing the optimal values for these weights within the GA, along with optimal values for the decision variables. A result for DPnd = 1500 was T = 81.5 °C and I0 = 11.6 mol/m3, with wx = 26.1874, wQ = 10.2436, and wDPn = 271.1614; the outputs of the model MLP(2:24:3) were found to be x = 0.9698, DPn = 1466.67, DPw = 7837.72, and Q = 5.3439.
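Eq 2 is straightforward to evaluate once the neural model has returned x, Q, and DPn for a candidate control vector; the function below encodes the formula and, as a usage example, simply substitutes the optimal weights and model outputs reported above for DPnd = 1500.

```python
# Evaluating the scalar objective of eq 2 for one candidate control vector.
# In the cited work x, Q, and DPn come from the MLP(2:24:3) model given T and I0;
# here the reported optimum is substituted to show the arithmetic.
def performance_index(x, Q, DPn, DPn_d, w_x, w_Q, w_DPn):
    return w_Q * Q + w_x * (1.0 - x) + w_DPn * (1.0 - DPn / DPn_d) ** 2

J = performance_index(x=0.9698, Q=5.3439, DPn=1466.67, DPn_d=1500.0,
                      w_x=26.1874, w_Q=10.2436, w_DPn=271.1614)
print(J)
```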

ANN-GA has also been applied to the synthesis of statistical dimethyl−methylvinylsiloxane copolymers.75 A feed-forward neural network modeled the dependence of monomer conversion and copolymer composition on the working conditions (temperature, reaction time, amount of catalyst, and initial composition). The training and validation sets were gathered by ring-opening copolymerization of octamethylcyclotetrasiloxane (D4) with 1,3,5,7-tetravinyl-1,3,5,7-tetramethylcyclotetrasiloxane (D4V), with a cation exchange catalyst (a styrene−divinylbenzene copolymer containing sulfonic groups), in the absence of solvent. This model was included within an optimization procedure based on a scalar objective function and solved with a GA to determine the optimal control variables and objective weights. Figure 8 provides a flowchart for the ANN-GA procedure.

Figure 8. Optimization based on ANN and GA in a copolymerization process.

The GA fitness function was calculated as

J = wx·(1 − xf) + wF1·(1 − F1f/F1d)²   (3)

In eq 3, J is the objective function, the w are weighting factors (wx for conversion and wF1 for copolymer composition), xf is the

conversion at the end of the reaction, F1d is the desired value of the copolymer composition, and F1f is the actual value of the copolymer composition at the final reaction time. The GA provided optimal values for the decision variables and the objective function weights. From these inputs, the neural network computed x and F1 for comparison with F1d; optimization was judged to be complete when the two values were very similar.

Further chemical engineering applications using ANNs and GAs with a scalar objective function include estimation of the optimal conditions for dyestuff wastewater treatment through heterogeneous photocatalytic oxidation,76 determination of the optimum conditions for the absorption of copper ions from aqueous solutions by silica functionalized with dihydroxyazomethine groups,77 determination of the molecular structures of thermally stable liquid crystalline ferrocene derivatives and phenyl compounds,61 and optimization of the color removal efficiency for TiO2-assisted photodegradation of the cationic dye Alcian Blue 8 GX.78

Furtuna et al.79 modeled a complex polymerization process using two neural networks and a vectorial GA. Polydimethylsiloxane nanoparticles were obtained by nanoprecipitation, using a siloxane surfactant as stabilizer. By minimizing particle diameter and polydispersity, the optimum values for the surfactant and polymer concentrations and the storage temperature were found. To improve the performance of the NSGA, a genetic operator was introduced: the transposition operator, or “real jumping genes” (NSGA-II-RJG). The algorithm included two fitness functions: an ANN which modeled the dependence of particle diameter on surfactant and polymer concentrations and storage temperature, and a second network modeling the dependence of polydispersity on surfactant and polymer concentrations. Nondominated solutions of great diversity were found rapidly. Figure 9 illustrates the optimal decision variables79 (surfactant concentration, polymer concentration, and storage temperature) corresponding to each point from the optimal Pareto front. The storage temperature was found to affect only the average diameter, while the surfactant concentration and the polymer concentration affected both optimization objectives.

Figure 9. The optimal decision variables corresponding to each point from the Pareto front obtained using NSGA-II-RJG in a polymerization process.

Regime identification is important for slurry pipeline design, but phenomenological models in the area are unreliable over a range of conditions. On the basis of a data set of around 800 measurements collected from the open literature, Lahiri and Ghanta80 used DE to tune ANN parameters. DE was found to

Figure 9. The optimal decision variables corresponding to each point from the Pareto front obtained using NSGA-II-RJG in a polymerization process.

Wu et al.81 developed a nonlinear predictive controller based on an improved radial basis function (RBF) neural network identification model, using it to relate fuel use to load change. A GA was applied to optimize the parameters of the RBF neural network, and the resultant controller compared well with constant fuel use models.
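For readers unfamiliar with RBF models, the following minimal sketch shows what such a parameter set looks like: the Gaussian centers, widths, and output weights are packed into one flat vector that a GA (or DE) could evolve against a prediction-error fitness. The architecture and data here are assumptions for illustration, not those of ref 81.

import numpy as np

def rbf_predict(params, X, n_centers=4, n_inputs=2):
    # Unpack a flat parameter vector into centers, widths, and linear output weights.
    c_end = n_centers * n_inputs
    centers = params[:c_end].reshape(n_centers, n_inputs)
    widths = np.abs(params[c_end:c_end + n_centers]) + 1e-6
    weights = params[c_end + n_centers:c_end + 2 * n_centers]
    bias = params[-1]
    # Gaussian basis functions: phi_j(x) = exp(-||x - c_j||^2 / (2 s_j^2))
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    phi = np.exp(-dists ** 2 / (2.0 * widths ** 2))
    return phi @ weights + bias

# An evolutionary algorithm would evolve this flat vector against a fitness
# such as the mean squared prediction error on identification data.
n_params = 4 * 2 + 4 + 4 + 1
params = np.random.default_rng(2).normal(size=n_params)
X_demo = np.random.default_rng(3).uniform(-1, 1, size=(5, 2))
print(rbf_predict(params, X_demo))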


Bayesian Networks, which are nonlinear stochastic regressors, have been successfully applied in a variety of scientific areas. This type of network is used to simulate domains containing uncertainty caused by incomplete domain knowledge, by randomness in the mechanisms governing the behavior of the domain, or by both. They can help in decision making and are especially useful when many interlinked factors need to be considered. Such a network was used by Malekmohammadi et al.82 to develop operating rules for a cascade system of reservoirs whose role is mainly to meet flood control and irrigation supply needs. The inputs were monthly inflows, reservoir storage, and water demand. The long-term optimization model was developed using an extended version of the Varying Chromosome Length Genetic Algorithm. To incorporate reservoir preparedness for flood control, flood damage was included in a short-term optimization model, which then provided data for the Bayesian Network. This neuro-evolutionary technique has been extended to include a Bayesian network for predicting operating rules.

Prior-knowledge-based feed-forward networks have shown superior performance in modeling chemical processes. Training a network to approximate a function can generally be cast as minimization of an error function, accomplished by iteratively adjusting the connection weights. Networks trained in this way have no knowledge of the chemical process that they are modeling, yet a description of the actual process is likely to include information such as monotonicity and concavity which, though not contained in the sample data, is potentially valuable. This "prior knowledge" represents chemical and physical understanding, so its reliability and accuracy can be established. Network training data are finite, may be sparse and noisy, and may not encapsulate prior knowledge in a form that the network can easily model,45 so networks trained using prior knowledge may perform more reliably and offer an effective way to exploit additional process information.83,84

Lahiri et al.1 used an ANN with DE to model and optimize a catalytic industrial ethylene oxide reactor. A model correlating operating and performance variables was constructed using the ANN, and process yield and selectivity were then optimized using DE. Narendra and Parthasarathy85 have shown that multilayer neural networks can be used effectively for the identification of nonlinear dynamic systems. Other groups that have tackled this problem include Subudhi,44 Subudhi and Jena,86 Lahiri and Ghanta,80 and Dragoi et al.25,29

MOGA-ANN combines a multiobjective GA with adaptive neural networks.87 Within a GA search, a full fitness model is progressively replaced with a periodically (re)trained adaptive neural network metamodel, where (re)training uses data supplied by the full model. The methodology was tested on a benchmark problem and then applied to determine optimal sampling locations for pressure loggers in a water distribution system.

Various versions of DE have been used to tackle chemical processing problems. Angira and Babu88 studied the operating conditions of an alkylation unit and the dynamic optimization of a batch reactor for an industrial-scale yeast fermentation process. Further examples69 include determination of control policies in semi/fed-batch reactors,68 multiobjective optimization of an adiabatic styrene reactor,89 and studies of the oxidation of terephthalic acid.90 DE has also been combined with neural networks and other techniques, including its use in modeling the boiling point curve of crude oil, the relationship between pressure and entropy,45 and the free radical polymerization of styrene.91

Fernandez and co-workers applied a GA to quantitative structure-activity relationship (QSAR) modeling with Bayesian-regularized ANNs. By selecting BRANN inputs inside a GA framework, performance superior to that of other models was achieved, and feature selection of significant molecular descriptors provided insights into the main structural and atomic properties.92
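The GA-driven input selection just described can be illustrated with a small sketch in which each chromosome is a binary mask over candidate descriptors and fitness is the validation error of a model trained only on the selected inputs. The data, the stand-in ridge regressor (used in place of a BRANN to keep the example short), and all settings are assumptions for illustration, not the setup of ref 92.

import numpy as np

rng = np.random.default_rng(4)

# Synthetic "descriptor" matrix: 12 candidate inputs, only a few of them informative.
X = rng.normal(size=(150, 12))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * X[:, 7] + 0.1 * rng.normal(size=150)
X_train, X_val = X[:100], X[100:]
y_train, y_val = y[:100], y[100:]

def fitness(mask):
    # Validation error of a ridge model trained on the selected inputs only.
    # (A neural network could be substituted here; ridge keeps the sketch short.)
    if not mask.any():
        return np.inf
    A = X_train[:, mask]
    w = np.linalg.solve(A.T @ A + 1e-3 * np.eye(A.shape[1]), A.T @ y_train)
    pred = X_val[:, mask] @ w
    return np.mean((pred - y_val) ** 2)

# Binary GA over input masks: tournament selection, uniform crossover, bit-flip mutation.
pop = rng.random((30, 12)) < 0.5
for gen in range(60):
    scores = np.array([fitness(ind) for ind in pop])
    children = []
    for _ in range(len(pop)):
        i, j = rng.integers(len(pop), size=2)
        p1 = pop[i] if scores[i] < scores[j] else pop[j]
        p2 = pop[rng.integers(len(pop))]
        child = np.where(rng.random(12) < 0.5, p1, p2)   # uniform crossover
        child ^= rng.random(12) < 0.05                    # bit-flip mutation
        children.append(child)
    pop = np.array(children)

best = pop[np.argmin([fitness(ind) for ind in pop])]
print("selected inputs:", np.flatnonzero(best))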
Artificial immune systems (AIS) are a group of computational methods inspired by the vertebrate immune system,93,94 and belong to the general class of nature-inspired metaheuristics. They have many properties desirable in an artificial system, such as uniqueness, self-organization, learning and memory adaptation, anomaly detection, recognition, robustness, and scalability.95,96 Clonal selection (CS), the most widely used of these algorithms, mimics the mechanisms of clonal selection, clonal expansion, and affinity maturation from clonal selection theory, and CS algorithms were developed as effective mechanisms for search and optimization.97 The cells undergo a process of cloning, mutation, and selection that is very similar to the concept of evolution employed in evolutionary algorithms. Only a few applications of AIS in chemical engineering have been published. For instance, a feed-forward neural network has been designed by employing a CS algorithm, the optimal model being applied to the simulation of heavy metal removal from residual waters.98 The CS-NN methodology proved efficient and accurate, producing a neural model with good generalization capability.
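A minimal clonal selection loop, in the spirit of the mechanism described above, might look as follows; the toy affinity function and hyperparameters are illustrative assumptions, not the implementation of ref 98.

import numpy as np

rng = np.random.default_rng(5)

def affinity(x):
    # Toy fitness: higher is better; a neural-model error could replace this.
    return -np.sum((x - 0.3) ** 2)

POP, N_CLONES, DIM = 20, 5, 4
antibodies = rng.uniform(-1, 1, size=(POP, DIM))
for generation in range(100):
    fit = np.array([affinity(ab) for ab in antibodies])
    order = np.argsort(-fit)                      # best antibodies first
    next_gen = []
    for rank, idx in enumerate(order):
        # Clone each antibody; better-ranked antibodies mutate less (affinity maturation).
        mutation_scale = 0.02 * (1 + rank)
        clones = antibodies[idx] + rng.normal(0.0, mutation_scale, size=(N_CLONES, DIM))
        clones_fit = np.array([affinity(c) for c in clones])
        best_clone = clones[np.argmax(clones_fit)]
        # Keep the better of the original antibody and its best clone (selection).
        next_gen.append(best_clone if clones_fit.max() > fit[idx] else antibodies[idx])
    antibodies = np.array(next_gen)

best = max(antibodies, key=affinity)
print("best antibody:", best, "affinity:", affinity(best))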
11. Conclusion. The use of artificial neural networks is now well established across science, and particularly in chemical engineering. ANNs offer advantages in speed, ease of use, and suitability for situations in which no adequate analytical model is available. Their use nevertheless requires care, and the literature contains examples in which networks have been applied with inappropriate architectures or parameter choices; a cautious approach to these tools is therefore necessary. Research groups are increasingly taking advantage of techniques that can be combined with ANNs to determine optimum, or near-optimum, network architectures and parameters, and such techniques promise to enhance the future value and reliability of neural network algorithms in science. Evolutionary algorithms and neural networks, combined at different levels, have proved to be an efficient modeling and optimization tool, suitable for tackling a variety of chemical engineering processes; their use will continue to grow.



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This work was supported by the “Partnership in priority areas - PN-II” program, financed by ANCS, CNDI-UEFISCDI, project PN-II-PT-PCCA-2011-3.2-0732, No. 23/2012.



REFERENCES

(1) Lahiri, S. K.; Khalfe, N.; Garawi, M. A. Process modelling and optimization strategies integrating neural networks and differential evolution. Hydrocarbon Proc. 2008, No. SpecialReport, 35. (2) Mohanty, S. Multiobjective optimization of synthesis gas production using non-dominated sorting genetic algorithm. Comput. Chem. Eng. 2006, 30, 1019. (3) Sunar, M.; Kahraman, R. A comparative study of multiobjective optimization methods in structural design. Turk. J. Eng. Environ. Sci. 2001, 25, 69.


(4) Chen, S.; Billings, S. A. Neural networks for nonlinear dynamic system modeling and identification. Int. J. Control. 1992, 56, 319. (5) D’heygere, T.; Goethals, P. L. M.; De Pauw, N. Genetic algorithms for optimization of predictive ecosystems models based on decision trees and neural networks. Ecol. Model 2006, 195, 20. (6) Castillo, P. A.; Merelo, J. J.; Prieto, A.; Rivas, V.; Romero, G. GProp: Global optimization of multilayer perceptrons using GAs. Neurocomputing 2000, 35, 149. (7) Castillo, P. A.; Carpio, J.; Merelo, J. J.; Prieto, A.; Rivas, V. Evolving multilayer perceptrons. Neural Process. Lett. 2000, 12, 115. (8) Almeida, L. M.; Ludermir, T. B. A multi-objective memetic and hybrid methodology for optimizing the parameters and performance of artificial neural networks. Neurocomputing 2010, 73, 1438. (9) Benardos, P. G.; Vosniakos, G. C. Optimizing feedforward artificial neural network architecture. Eng. Appl. Artif. Intell. 2007, 20, 365. (10) Curteanu, S.; Cartwright, H. Neural networks applied in chemistry. I. Determination of the optimal topology of neural networks. J. Chemom. 2011, 25, 527. (11) Almeida, L. M.; Ludermir, T. B. Automatically searching nearoptimal artificial neural networks. Proc. Eur. Symp. Artif. Neural Networks (ESANN’07) 2007, 549. (12) Almeida, L. M.; Ludermir, T. B. An evolutionary approach for tuning artificial neural network parameters. Proc. 3rd Intl. Workshop Hybrid Artif. Intell. Syst. (HAIS’08) 2008, 156. (13) Almeida, L. M.; Ludermir, T. B. An improved method for automatically searching near-optimal artificial neural networks. IEEE Intl. Joint Conf. Neural Networks (IJCNN’08) (IEEE World Congress on Computational Intelligence) 2008, 2235. (14) Xin, Y. Evolving artificial neural networks. Proc. IEEE 1999, 87, 1423. (15) Jebari, K.; Bouroumi, A.; Ettouhami, A. Parameters control in GAs for dynamic optimization. Int. J. Comput. Int. Syst. 2013, 1, 47. (16) Holland, J. H. Genetic algorithms and the optimal allocation of trials. SIAM J. Comput. 1973, 2, 88. (17) Holland, J. H. Adaptation in Natural and Artificial Systems; University of Michigan Press: Ann Arbor, MI, 1975. (18) Cartwright, H. M. The Genetic Algorithm in Science. Pestic. Sci. 1995, 45, 171. (19) Yan, X. F.; Chen, D. Z.; Hu, S. X. Chaos-genetic algorithms for optimizing the operating conditions based on RBF-PLS model. Comput. Chem. Eng. 2003, 27, 1393. (20) Cartwright, H. M.; Long, R. A. Simultaneous optimization of chemical flowshop sequencing and topology using Genetic Algorithms. Ind. Eng. Chem. Res. 1993, 32, 2706. (21) Cartwright, H. M.; Harris, S. P. Analysis of the distribution of airborne pollution using genetic algorithms. Atmos. Environ. A-Gen. 1993, 27, 1783. (22) Hancock, P. Genetic Algorithms and permutation problems: A comparison of recombination operators for neural net structure specification. Proceedings of Genetic Algorithms and Neural Networks, COGANN-92, Baltimore, Maryland, June 6, 1992. (23) Goldberg, D. E. Genetic Algorithms in Search, Optimization and Machine Learning; Addison-Wesley Longman Publishing Co., Inc.: Boston, MA, 1989. (24) Liu, L. B.; Wang, Y. J.; Huang, D. Designing neural networks using PSO-based memetic algorithm. Proc. 4th Intl. Symp. Neural Networks (ISNN’07) 2007, 219. (25) Drăgoi, E. N.; Curteanu, S.; Fissore, D. On the use of artificial neural networks to monitor a pharmaceutical freeze-drying process. Dry. Technol. 2013, 31, 72. (26) Kordík, P.; Koutník, J.; Drchal, J.; Kovárí̌ k, O.; Č epek, M.; Šnorek, M. 
Meta-learning approach to neural network optimization. Neural Networks 2010, 23, 568. (27) Furtună, R.; Curteanu, S.; Leon, F. Multi-objective optimization of a stacked neural network using NSGA-II-QNSNN algorithm. Appl. Soft Comput. 2012, 12 (1), 133. (28) Drăgoi, E. N.; Curteanu, S.; Leon, F.; Galaction, A. I.; Cascaval, D. Modeling of oxygen mass transfer in the presence of oxygen-vectors

using neural networks developed by differential evolution algorithm. Eng. Appl. Artif. Intel. 2011, 24, 1214. (29) Drăgoi, E. N.; Curteanu, S.; Fissore, D. Freeze-drying modeling and monitoring using a new neuro-evolutive technique. Chem. Eng. Sci. 2012, 72, 195. (30) Curteanu, S.; Nistor, A.; Curteanu, A.; Airinei, A.; Cazacu, M. Applying soft computing methods to fluorescence modelling of the polydimethylsiloxane/silica composites containing lanthanum. J. Appl. Polym. Sci. 2010, 117, 3160. (31) Plagianakos, P.; Magoulas, G. D.; Nousis, N. K.; Vrahatis, M. N. Training multilayer networks with discrete activation functions. Proceedings of the INNS-IEEE International Joint Conference on Neural Networks, Washington DC, July 15−19, 2001. (32) Drăgoi, E. N.; Curteanu, S.; Lisa, C. A neuro-evolutive technique applied for predicting the liquid crystalline property of some organic compounds. Eng. Optimiz. 2012, 44, 1261. (33) He, S.; Li, X. Application of a group search optimization based Artificial Neural Network to machine condition monitoring. IEEE International Conference on Emerging Technologies and Factory Automation, ETFA, Hamburg, Germany, Sept 15−18, 2008. (34) He, S.; Wu, Q. H.; Saunders, J. R. Group search optimizer: an optimization algorithm inspired by animal searching behavior. IEEE Trans. Evolut. Comput. 2009, 13 (5), 973. (35) Silva, D. N. G.; Pacifico, L. D. S.; Ludermir, T. B. Improved group search optimizer based on cooperation among groups for feedforward networks training with weight decay. Conf. Proc. IEEE Intl. Syst. Man Cyber. 2011, 2133. (36) Gao, L.; Loney, N. W. New hybrid neural network model for prediction of phase equilibrium in a two-phase extraction system. Ind. Eng. Chem. Res. 2002, 41, 112. (37) D’heygere, T.; Goethals, P. L. M.; De Pauw, N. Use of genetic algorithms to select input variables in decision tree models for the prediction of benthic macroinvertebrates. Ecol. Model 2003, 160, 291. (38) Mouton, A. M.; Dedecker, A. P.; Lek, S.; Goethals, P. L. M. selecting variables for habitat suitability of asellus (crustacea, isopoda) by applying input variable contribution methods to artificial neural network models. environ. model. assess. 2010, 15, 65. (39) Shervais, S.; Zwick, M. Using reconstructability analysis to select input variables for artificial neural networks. Proc. IEEE Intl. Joint Conf. Neural Networks 2003, 1−4, 3022. (40) Kourentzes, N.; Crone, S. F. Frequency independent automatic input variable selection for neural networks for forecasting. 2010 International Joint Conference on Neural Networks (IJCNN 2010), July 18−23, 2010, Barcelona, Spain. (41) Storn, R. M.; Price, K. V. Differential evolutionA simple and efficient adaptive scheme for global optimization over continuous spaces. Technical Report TR-95-012; International Computer Science Institute: Berkley, CA, 1995. (42) Price, K. V.; Storn, R. M.; Lampien, J. A. Differential Evolution. A Practical Approach to Global Optimization; Springer: Berlin, 2005. (43) Subudhi, B.; Jena, D. Differential evolution and Levenberg Marquardt trained neural network scheme for nonlinear system identification. Neural Process. Lett. 2008, 27, 285. (44) Subudhi, B. A combined differential evolution and neural network approach to nonlinear system identification. Proceedings of TENCON 2008 IEEE Region 10 Conference, University of Hyderabad, India, Nov. 19−21, 2008, (45) Chen, C. W.; Chen, D. Z.; Cao, G. Z. 
An improved differential evolution algorithm in training and encoding prior knowledge into feedforward networks with application in chemistry. Chemom. Intell. Lab. 2002, 64, 27. (46) Venkatraman, S.; Yen, G. G. A generic framework for constrained optimization using genetic algorithms. IEEE T. Evolut. Comput. 2005, 9, 424. (47) Kasat, R. B.; Gupta, S. K. Multi-objective optimization of an industrial fluidized-bed catalytic cracking unit (FCCU) using genetic algorithm (GA) with jumping genes operator. Comput. Chem. Eng. 2003, 27, 1785.


(48) Tang, W. J.; Wu, Q. H. Biologically inspired optimization: A review. Trans. Intl. Meas. Control. 2009, 31 (6), 495. (49) Ahmadi, M. A.; Shadizadeh, S. R. New approach for prediction of asphaltene precipitation due to natural depletion by using evolutionary algorithm concept. Fuel 2012, 102, 716. (50) Zhao, W. BP neural network based on PSO algorithm for temperature characteristics of gas nanosensor. J. Comput. 2012, 7, 2318. (51) Saraswathi, S.; Fernández-Martínez, J. L.; Kolinski, A.; Jernigan, R. L.; Kloczkowski, A. Fast learning optimized prediction methodology (FLOPRED) for protein secondary structure prediction. J. Mol. Model 2012, 18, 4275. (52) Zhao, J. H.; Liu, Z.; Dao, M. T. Reliability optimization using multiobjective ant colony system approaches. Reliab. Eng. Syst. Safe 2007, 92 (1), 109. (53) Goodarzi, M.; Freitas, M. P.; Vander Heyden, Y. Linear and nonlinear quantitative structure-activity relationship modeling of the HIV-1 reverse transcriptase inhibiting activities of thiocarbamates. Anal. Chim. Acta 2011, 705, 166. (54) Li, J.; Liu, X. Melt index prediction by RBF neural network optimized with an adaptive new ant colony optimization algorithm. J. Appl. Polym. Sci. 2011, 119, 3093. (55) Cartwright, H. M.; Leontjev, A. Use of a Genetic Algorithm Neural Network hybrid in the search for high-efficiency solid-state phosphors. WSEAS T. Comput. 2011, 10, 396. (56) Lahiri, S. K.; Ghanta, K. C. Artificial neural network model with the parameter tuning assisted by a differential evolution technique: The study of the hold up of the slurry flow in a pipeline. Chem. Ind. Chem. Eng. Q. 2009, 15 (2), 103. (57) Drăgoi, E. N.; Curteanu, S.; Galaction, A. I.; Caşcaval, D. Optimization methodology based on neural networks and self-adaptive differential evolution algorithm applied to an aerobic fermetation process. Appl. Soft Comput. 2013, 13, 222. (58) He, S.; Wu, Q. H. A novel group search optimizer inspired by animal behavioural ecology; IEEE Congress on Evolutionary Computation: Vancouver, 2006. (59) Yan, X.; Yang, W.; Shi, H. A group search optimization based on improved small world and its application on neural network training in ammonia synthesis. Neurocomputing 2012, 97, 94. (60) Jansen, M. A.; Kiwata, J.; Arceo, J.; Faull, K. F.; Hanrahan, G.; Porter, E. Evolving neural network optimization of cholesteryl ester separation by reversed-phase HPLC. Anal. Bioanal. Chem. 2010, 397, 2367. (61) Lisa, G.; Apreutesei Wilson, D.; Curteanu, S.; Lisa, C.; Piuleac, C. G.; Bulacovschi, V. Ferrocene derivatives thermostability prediction using neural networks and genetic algorithms. Thermochim. Acta 2011, 521 (1−2), 26. (62) Cartwright, H. M. Factor-analysis of the tungsten(VI)−Rutin system. Michrochem. J. 1986, 34, 313. (63) Ahmad, F.; Mat-Isa, N. A.; Hussain, Z.; Boudville, R.; Osman, M. K. Genetic Algorithm-Artificial Neural Network (GA-ANN) Hybrid Intelligence for Cancer Diagnosis. Proceedings of the 2010 2nd International Conference on Computational Intelligence, Communication Systems and Networks (CICSyN 2010). 2010, 78− 83. (64) Silva, C. M.; Biscaia, E. C. Genetic algorithm development for multi-objective optimization of batch free-radical polymerization reactors. Comput. Chem. Eng. 2003, 27, 1329. (65) Guria, C.; Verma, M.; Mehrotra, S. P.; Gupta, S. K. Multiobjective optimal synthesis and design of froth flotation circuits for mineral processing, using the jumping gene adaptation of genetic algorithm. Ind. Eng. Chem. Res. 2005, 44, 2621. (66) Tarafder, A.; Lee, B. C. S.; Ray, A. 
K.; Rangaiah, G. P. Multiobjective optimization of an industrial ethylene reactor using a nondominated sorting genetic algorithm. Ind. Eng. Chem. Res. 2005, 44, 124. (67) Babu, B. V.; Angira, R. Modified differential evolution (MDE) for optimization of non-linear chemical processes. Comput. Chem. Eng. 2006, 30, 989.

(68) Kapadi, M. D.; Gudi, R. D. Optimal control of fed-batch fermentation involving multiple feeds using differential evolution. Process. Biochem. 2004, 39, 1709. (69) Yüzgeç, U. Performance comparison of differential evolution techniques on optimization of feeding profile for an industrial scale baker’s yeast fermentation process. Intl. Soc. Automat. Trans. 2010, 49, 167. (70) Lü, W.; Zhu, Y.; Huang, D.; Jiang, Y. A New Strategy of Integrated Control and On-line Optimization on High-purity Distillation Process. Chinese J. Chem. Eng. 2010, 18 (1), 66. (71) Klett, G. Application of fuzzy control in chemical distillation processes. Proc. 2nd IEEE Intl. Conf. Fuzzy Syst. 1993, 1, 375. (72) Huang, M.; Ma, Y.; Wan, J.; Zhang, H.; Wang, Y. Modeling a paper-making wastewater treatment process by means of an adaptive network-based fuzzy inference system and principal component analysis. Ind. Eng. Chem. Res. 2012, 51, 6166. (73) Precup, R.-E.; Preitl, S.; Faur, G. PI predictive fuzzy controllers for electrical drive speed control: Methods and software for stable development. Comput. Ind. 2003, 52, 253. (74) Curteanu, S.; Leon, F. Optimization strategy based on genetic algorithms and neural networks applied to a polymerization process. Int. J. Quantum Chem. 2008, 108, 617. (75) Curteanu, S.; Cazacu, M. Neural networks and genetic algorithms used for modeling and optimization of the siloxanesiloxane copolymers synthesis. J. Macromol. Sci. A. 2007, A45 (1), 23. (76) Suditu, G. D.; Secula, M.; Piuleac, C. G.; Curteanu, S.; Poulios, I. Genetic algorithms and neural networks based optimization applied to the wastewater decolorization by photocatalytic reaction. Rev. Chim.-Bucharest 2008, 7, 816. (77) Piuleac, C. G.; Curteanu, S.; Cazacu, M. Optimization by NNGA technique of the metal complexing processPotential application in wastewater treatment. Environ. Eng. Manage. J. 2010, 9 (2), 239. (78) Caliman, F. A.; Curteanu, C.; Betianu, C.; Gavrilescu, M.; Poulios, I. Neural networks and genetic algorithms optimization of the photocatalytic degradation of alcian blue 8gx. J. Adv. Oxid. Technol. 2008, 11 (2), 316. (79) Furtună, R.; Curteanu, S.; Racleş, C. NSGA-II-JG applied to multiobjective optimization of polymeric nanoparticles synthesis with silicone surfactants. Cent. Eur. J. Chem. 2011, 9 (6), 1080. (80) Lahiri, S. K.; Ghanta, K. C. Regime identification of slurry transport in pipelinesA novel modeling approach using ANN and differential evolution. Chem. Ind. Chem. Eng. Q. 2010, 16 (4), 329. (81) Wu, X. J.; Zhu, X. J.; Cao, G. Y.; Tu, H. Y. Predictive control of SOFC based on GA-RBF neural network model. J. Power Sources 2008, 179 (1), 232. (82) Malekmohammadi, B.; Kerachian, R.; Zahraie, B. Developing monthly operating rules for a cascade system of reservoirs: Application of Bayesian Networks. Environ. Modell. Software 2009, 24, 1420. (83) Chen, C. W.; Chen, D. Z.; Ye, S. X. Feedforward networks based on prior knowledge and its application in modeling the true boiling point curve of the crude oil. J. Chem. Eng. Chin. Univ. 2001, 15 (4), 351. (84) Chen, C. W.; Chen, D. Z.; Wu, S. X. Prior-knowledge-based feedforward network simulation of true boiling point curve of crude oil. Comput. Chem. 2001, 25, 541. (85) Narendra, K. S.; Parthaasarathy, K. Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Networks 2005, 16, 8624. (86) Subudhi, B.; Jena, D. A differential evolution based neural network approach to nonlinear system identification. Appl. Soft Comput. 
2011, 11, 861. (87) Behzadian, K.; Kapelan, Z.; Savic, D.; Ardeshir, A. Stochastic sampling design using a multi-objective genetic algorithm and adaptive neural networks. Environ. Modell. Software 2009, 24, 530. (88) Angira, R.; Babu, B. V. Performance of modified differential evolution for optimal design of complex and non-linear chemical processes. J. Exp. Theor. Artif. Intell. 2006, 18, 501.


(89) Babu, B. V.; Chakole, P. G.; Syed Mubeen, J. H. Multiobjective differential evolution (MODE) for optimization of adiabatic styrene reactor. Chem. Eng. Sci. 2005, 60, 4822. (90) Gujarathi, A. M.; Babu, B. V. Improved multiobjective differential evolution (MODE) approach for purified terephthalic acid (PTA) oxidation process. Mater. Manuf. Process 2009, 24, 303. (91) Curteanu, S.; Leon, F.; Furtuna, R.; Dragoi, E. N.; Curteanu, N. Comparison between different methods for developing neural network topology applied to a complex polymerization process. The 2010 International Joint Conference on Neural Networks IJCNN, IEEE, Barcelona, Spain, July 18−23, 2010, 1. (92) Fernandez, M.; Caballero, J.; Fernandez, L. Genetic algorithm optimization in drug design QSAR: Bayesian-regularized genetic neural networks (BRGNN) and genetic algorithm-optimized support vectors machines (GA-SVM). Mol. Divers. 2011, 15, 269. (93) Bernardino, H.; Barbosa, H. Artificial immune systems for optimization. In: Nature-Inspired Algorithms for Optimization: Studies in Computational Intelligence; Chiong, R., Ed.; Springer: Berlin, Heidelberg, 2009; pp 389−411. (94) Brownlee, J., Clonal selection algorithms. Technical Report 070209A; Swinburne University of Technology: Melbourne, Australia, 2007. (95) Abdul Hamid, M. B.; Abdul Rahman, T. K. Short term load forecasting using an artificial neural network trained by artificial immune system learning algorithm. 12th Intl. Conf. Comput. Modell. Simulat. (UKSim) 2010, 408−413. (96) Timmis, J.; Hone, A.; Stibor, T.; Clark, E. Theoretical advances in artificial immune systems. Theor. Comput. Sci. 2008, 403, 11. (97) Cutello, V.; Nicosia, G.; Pavone, M. Real coded clonal selection algorithm for unconstrained global optimization using a hybrid inversely proportional hypermutation operator. Proc. 2006 ACM Symp. Appl. Comput. (SAC ’06) 2011, 950−954. (98) Dragoi, E.; Suditu, G. D.; Curteanu, S. Modelling methodology based on Artificial Immune System algorithm and neural networks applied to removal of heavy metals from residual waters. Environ. Eng. Manage. J. 2012, 11 (11), 1907. (99) Blanning, R. W. The construction and implementation of metamodels. Simulation 1975, 24 (6), 177. (100) Fahlman, S. E. Faster-learning variations on back-propagation: An empirical study. Proceedings of the 1988 Connectionist Models Summer School; Morgan Kaufmann: Los Altos, CA, 1988. (101) Riedmiller, M.; Braun, H. A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In Ruspini, H., Ed.; Proceedings of the ICNN93, San Francisco, March 28−April 1, 1993, 586−591. (102) Riedmiller, M. RPROP: Description and implementation details. University of Karlsruhe: Karlsruhe, Germany, 1994. (103) Fahlman, S.; Lebière, C. The cascade-correlation learning architecture. In Neural Information Systems 2; Touretzky, D. S., Ed.; Morgan-Kaufmann: Los Altos, CA, 1990; pp 524−532. (104) Fiszelew, A.; Britos, P.; Ochoa, A.; Merlino, H.; Fernandez, E.; Garcia-Martinez, R. Finding optimal neural network architecture using genetic algorithms. Adv. Comput. Sci. Eng. Res. Comput. Sci. 2007, 27, 15. (105) Coello Coello, C. A.; Lamont, G. B.; Van Veldhizen, D. A. Evolutionary Algorithms for Solving Multi-objective Problems, 2nd ed.; Goldberg, D. E.; Koza, J. R., Eds.; Springer: Berlin, 2007. (106) Eiben, A. E.; Smith, J. E. Introduction to Evolutionary Computing; Springer: Berlin, 2003. (107) Marin, F. J.; Sandoval, F. Diseno de redes neuronales artificiales mediante algoritmos genéticos. 
Computacion Neuronal; Universidad de Santiago de Compostela: Galicia, Spain, 1995; p 385. (108) Yao, X. A review of evolutionary artificial neural networks; CSIRO Australia, 1992. (109) De Falco, I.; Iazzetta, A.; Natale, P.; Tarantino, E. Evolutionary neural networks for nonlinear dynamics modelling. Lecture Notes Comput. Sci. 1998, 1498, 593−602.

(110) Keesing, R.; Stork, D. G. Evolution and learning in neural networks: the number and distribution of learning trials affect the rate of evolution. Adv. Neural Inform. Process. Syst. 1991, 3, 805. (111) Vlahogianni, E. I.; Karlaftis, M. G.; Golias, J. C. Optimized and meta-optimized neural networks for short-term traffic flow prediction: A genetic approach. Transport. Res. C-Emer. 2005, 13 (3), 211. (112) Kim, Y. S.; Yum, B. J. Robust design of multilayer feedforward neural networks: An experimental approach. Eng. Appl. Artif. Intell. 2004, 17 (3), 249. (113) Packianather, M. S.; Drake, P. R.; Rowlands, H. Optimizing the parameters of multilayered feedforward neural networks through Taguchi design of experiments. Qual. Reliab. Eng. Int. 2000, 16, 461. (114) Dam, M.; Saraf, D. N. Design of neural networks using genetic algorithm for on-line property estimation of crude fractionator products. Comput. Chem. Eng. 2006, 30, 722. (115) Sukthomya, W.; Tannock, J. The optimization of neural network parameters using Taguchi’s design of experiments approach: an application in manufacturing process modeling. Neural Comput. Appl. 2005, 14, 337. (116) Wang, Q.; Stockton, D. J.; Baguley, P. Process cost modeling using neural networks. Int. J. Prod. Res. 2000, 38 (16), 3811. (117) Sureerattanan, S.; Sureerattanan, N. New Training Method and Optimal Structure of Backpropagation Networks. Advances in Natural Computation; Springer: Berlin/Heidelberg, 2005. (118) Annunziato, M.; Bertini, I.; Lucchetti, M.; Pizzuti, S. Evolving weights and transfer functions in feed forward neural networks. Proc. EUNITE 2003. European Network on Intelligent Technologies for Smart Adaptive Systems: Oulu, Finland, July 10−12, 2003. (119) Laouafi, F.; Boukadoum, A.; Leulmi, S. Reactive power dispatch with hybrid formulation: Particle swarm optimization and improved Genetic Algorithms with real coding. Int. Rev. Elec. Eng. IREE 2010, 5, 601. (120) Floreano, D.; Durr, P.; Mattiussi, C. Neuroevolution: From architectures to learning. Evol. Intell. 2008, 1, 47. (121) Ragg, T.; Gutjahr, S.; Sa, H. M. Automatic determination of optimal network topologies based on information theory and evolution. 23rd EUROMICRO Conference '97 New Frontiers of Information Technology 2007, 549. (122) Jang, W. H.; Hahn, J.; Hall, R. K. Genetic/quadratic search algorithm for plant economic optimizations using a process simulator. Comput. Chem. Eng. 2005, 30, 285. (123) Mizuta, S.; Sato, T.; Lao, D.; Ikeda, M.; Shimizu, T. Structure Design of Neural Networks Using Genetic Algorithms. Complex Syst. 2001, 13, 161. (124) Miller, G. F.; Todd, P. M.; Hegde, S. U. Designing neural networks using genetic algorithms. Proceedings of Third International Conference on Genetic Algorithms and Their Applications; Morgan Kaufmann: San Mateo, CA, 1989; p 379. (125) Whitely, D.; Starkweather, T.; Bogart, C. Genetic algorithm and neural networks: Optimizing connections and connectivity. Parallel Comput. 1990, 14, 347. (126) Gan, M.; Peng, H.; Dong, X. P. A hybrid algorithm to optimize RBF network architecture and parameters for nonlinear time series prediction. Appl. Math. Model 2012, 36, 2911. (127) Schraudolph, N. N.; Belew, R. K. Dynamic parameter encoding for genetic algorithms. Mach. Learn. 1992, 9 (1), 9. (128) Kitano, H. Neurogenetic learning: An integrated method of designing and training neural networks using genetic algorithm. Phys. D. 1994, 75, 225. (129) Mouret, J. B.; Doncieux, S. P. 
MENNAG: a modular, regular and hierarchical encoding for neural-networks based on attribute grammars. Evol. Intell. 2008, 1, 187. (130) Harp, S. A.; Samad, T.; Guha, A. Towards the genetic synthesis of neural networks. Proceedings of Third International Conference on Genetic Algorithms and Their Applications; Morgan Kaufmann: San Mateo, CA, 1989. (131) Schaffer, J. D.; Caruna, R. A.; Eshelman, L. J. Using genetic search to exploit the emergent behavior of neural networks. Phys. D. 1990, 42, 244.


(154) Brest, J. Constrained Real-Parameter Optimization with e-SelfAdaptive Dif ferential Evolution. In Constraint-Handling in Evolutionary Optimization; Mezura-Montes, E., Ed.; Springer: Berlin, 2009. (155) Storn, R. On the usage of differential evolution for function optimization. In Biennial Conference of the North American Fuzzy Information Processing Society − NAFIPS; Smoth, M. H.; Lee, M. A.; Keller, J.; Yen, J., Eds.; IEEE: Berkeley, CA, 1996. (156) Fischer, M. M.; Reismann, M.; Hlavackova-Schindler, K. Parameter estimation in neural spatial interaction modelling by a derivative free global optimization method. Proceedings of IV International Conference on Geocomputation, Fredericksburg, VA, July 25−28, 1999. (157) Bhuiyan, M. Z. A.; An algorithm for determining neural network architecture using differential evolution. In Business Intelligence and Financial Engineering. International Conference on (BIFE 2009); Wabg, S.; Yu, L.; Wen, F.; He, S.; Fang, Y.; Lai, K. K., Eds.; IEE Computer Society: Washington, DC, 2009; p 3. (158) Schaffer, J. D. Multiple objective optimization with vector evaluated genetic algorithms. Proc. 1st Intl. Conf. Genet. Algorithms 1985, 93. (159) Fonseca, C. M.; Fleming, P. J. Genetic algorithms for multiobjective optimization: Formulation, discussion, and generalization. Proc. 5th Intl. Conf. Genet. Algorithms 1993, 416. (160) Horn, J.; Nafploitis, N.; Goldberg, D. E. A niched Pareto genetic algorithm for multiobjective optimization. Proc. 1st IEEE Conf. Evol. Comput. 1994, 82. (161) Zitzler, E.; Thiele, L. Multi objective optimization using evolutionary algorithmsA comparative case study. Lect. Notes Comput. Sci. 1998, 1498, 292−301. (162) Srinivas, N.; Deb, K. Multi-objective function optimization using non-dominated sorting genetic algorithms. Evol. Comput. 1994, 2 (3), 221. (163) Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evolut. Comput. 2002, 6, 182. (164) Rodriguez-Vasquez, K.; Fonseca, C. M.; Fleming, P. J. Identifying the structure of nonlinear dynamic systems using multiobjective genetic programming. IEEE Trans. Syst. Man Cy. A . 2004, 34, 531. (165) Qian, B.; Wang, L.; Hu, R.; Wang, W. L.; Huang, D. X.; Wang, X. A hybrid differential evolution method for permutation flow-shop scheduling. Int. J. Adv. Manuf. Technol. 2008, 38, 757. (166) Fan, H. Y.; Lampinen, J. A Trigonometric Mutation Operation to Differential Evolution. J. Global Opt. 2003, 27, 105. (167) Fan, H. Y.; Lampinen, J. A directed mutation operation for the differential evolution algorithm. Int. J. Ind. Eng.-Appl. P. 2003, 1, 6. (168) Zhang, Y. N.; Hu, Q. N.; Teng, H. F. Active target particle swarm optimization: Research Articles. J. Concurr. Comput.-Pract. E. 2008, 20 (1), 29. (169) Pant, M.; Thangaraj, R.; Abraham, A. Particle swarm optimization using adaptive mutation. IEEE/DEXA'08 2008, 519. (170) Zeng, J.; Hu, J.; Jie, J. Adaptive particle swarm optimization guided by acceleration information. Proc. IEEE/ ICCIAS. 2006, 1, 351. (171) Pampara, G.; Franken, N.; Engelbrecht, A. P. Combining particle swarm optimization with angle modulation to solve binary problems. IEEE Congress Evol. Comput. 2005, 1, 89. (172) Alviar, J. B.; Peña, J.; Hincapié, R. Subpopulation best rotation: A modification on PSO. Rev. Facultad Ingen. 2007, 40, 118. (173) Yao, X. Cooperatively Coevolving Particle Swarms for Large Scale Optimization; Conf. of EPSRC. IEEE Trans. Evol. Comput. 2012, 16, 210−224. 
(174) Zhiming, L.; Cheng, W.; Jian, L. Solving constrained optimization via a modified genetic particle swarm optimization. First Intl. Workshop on Knowledge Discovery and Data Mining 2008, 217−220. (175) Sedighizadeh, D.; Masehian, E. Particle Swarm Optimization Methods. Taxonomy and Applications. Int. J. Comput. Theory Eng. 2009, 1 (5), 1793.

(132) Kitano, H. Designing neural networks using genetic algorithms with graph generation system. Complex Syst. 1990, 4, 461. (133) White, D.; Ligomenides, P. GANNet: A genetic algorithm for optimizing topology and weights in neural network design. Lect. Notes Comput. Sci. 1993, 686, 322−327. (134) Plagianakos, V.; Tasoulis, D.; Vrahatis, M. A Review of Major Application Areas of Differential Evolution, In Advances in Differential Evolution; Chakraborty, U. Ed.; Springer: Berlin, 2008. (135) Boozarjomehr, R. B.; Svrcek, W. Y. Automatic design of neural network structures. Comput. Chem. Eng. 2001, 25, 1075. (136) Merelo, J. J.; Paton, M.; Canas, A.; Prieto, A.; Moran, F. Optimization of a competitive learning neural network by genetic algorithms. Lect. Notes Comput. Sci. 1993, 686, 185−192. (137) Petridis, V.; Kazarlis, S.; Papaikonomu, A.; Filelis, A. A hybrid genetic algorithm for training neural networks. Artif. Neural Networks 1992, 2, 953. (138) Kinnebrock, W. Accelerating the standard backpropagation method using a genetic approach. Neurocomputing 1994, 6, 583. (139) Yao, X.; Liu, Y. Towards designing artificial neural networks by evolution. Appl. Math. Comput. 1998, 91 (1), 83. (140) Cai, X.; Zhang, N.; Venayagamoorthy, G. K.; Wunschil, D. C. Time series prediction with recurrent neural networks trained by a hybrid PSO-EA algorithm. Neurocomputing 2007, 70 (13-15), 2342. (141) Dondeti, S.; Kannan, K.; Manavalan, R. Genetic algorithm optimized neural networks ensemble for estimation of mefenamic acid and paracetamol in tablets. Acta Chim. Slov. 2005, 52, 440. (142) Bebis, G.; Georgiopoulos, M.; Kasparis, T. Coupling weight elimination with genetic algorithms to reduce network size and preserve generalization. Neurocomputing 1997, 17, 167. (143) Lahiri, S.K.; Khalfe, N. Modeling of commercial ethylene oxide reactor: A hybrid approach by artificial neural network & differential evolution. Int. J. Chem. React. Eng. 2010, 8, Article A4. (144) Ying, C.; Hua, M.; Zhen, J.; Shao, Z. B. Image compression using multilayer neural networks based on fast bacterial swarming algorithm. Proceedings of the Seventh International Conference on Machine Learning and Cybernetics: Kunming, China, July 12−15, 2008. (145) Weigend, A.; Rumelhart, D.; Huberman, B. Generalization by weight elimination with application to forecasting. Adv. Neural Inf. Proc. Syst. 1991, 3, 875. (146) Arifovic, J.; Gencay, R. Using genetic algorithms to select architecture of a feedforward artificial neural network. Phys. A. 2001, 289, 574. (147) Liu Y.; Yao, X. Evolving modular neural network which generalise well. In Proceedings of the IEEE International Conference on Evolutionary Computation, Indianapolis, Indiana, April 13−16, 1997. (148) Koumousis, V. K.; Katsaras, C. P. A saw-tooth genetic algorithm combing the effects of variable population size and reinitialization to enhance performance. IEEE Trans. Evolut. Comput. 2006, 10, 19. (149) Cao, Y. J.; Wu, Q. H. Optimisation of control parameters in genetic algorithms: A stochastic approach. Int. J. Syst. Sci. 1999, 30, 551. (150) Katare, S.; Bhan, A.; Caruthers, J.; Delgass, W. N. A hybrid genetic algorithm for efficient parameter estimation of large kinetic models. Comput. Chem. Eng. 2004, 28, 2569. (151) Kerachian, R.; Karamouz, M. Optimal reservoir operation considering the water quality issues: A stochastic conflict resolution approach. Water Resour. Res. 2006, 42 (12), 1. (152) Zahraie, B.; Kerachian, R.; Malekmohammadi, B. 
Reservoir operation optimization using adaptive varying chromosome length genetic algorithm. Water Int. 2008, IWRA 33(3), 380. (153) Nadi, A.; Tayarani-Bathaie, S. S.; Safabakhsh, R. Evolution of Neural Network Architecture and Weights Using Mutation Based Genetic Algorithm. Proceedings of the 14th International CSI Computer Conference (CSICC'09) Tehran, Iran. Oct. 20−21, 2009, p 536.


(176) Yu, J.; Wang, S.; Xi, L. Letters: evolving artificial neural networks using an improved PSO and DPSO. Neurocomputing 2008, 71 (4-6), 1054. (177) Zecchin, A. C.; Maier, H. R.; Simpson, A. R.; Roberts, A.; Berrisford, M. J.; Leonard, M. Max−Min ant system applied to water distribution system optimization. Modsim 2003international congress on modeling and simulation; Modeling and Simulation Society of Australia and New Zealand Inc: Townsville, Australia, 2003; Vol. 2. (178) Afshar, M. H. A new transition rule for ant colony optimisation algorithms: application to pipe network optimisation problems. Eng. Optimiz. 2005, 37 (5), 525. (179) Afshar, M. H. Improving the efficiency of ant algorithms using adaptive refinement: application to storm water network design. Adv. Water Res. 2006, 29, 1371. (180) Afshar, M. H. A parameter free Continuous Ant Colony Optimization Algorithm for the optimal design of storm sewer networks: Constrained and unconstrained approach. Adv. Eng. Soft. 2010, 41 (2), 188. (181) Karaboga, D.; Ozturk, C. Neural networks training by artificial bee colony algorithm pattern classification. Neural Network World 2009, 19, 279. (182) Guand, Q.; Feng, L.; Lijuan, L. A quick group search optimizer and its application to the optimal design of double layer grid shells. AIP Conf. Proc. 2009, 1233, 718−723. (183) Shen, H.; Zhu, Y.; Niu, B.; Wu, Q. H. An improved group search optimizer for mechanical design optimization problems. Prog. Nat. Sci. 2009, 19, 91. (184) Fang, J.; Cui, Z.; Cai, X.; Zeng, J. A hybrid group search optimizer with metropolis rule. 1020 Intl. Conf. Modell., Ident., Control 2010, 556−561. (185) Kang, Q.; Lan, T.; Yan, Y.; Wang, L.; Wu, Q. Group search optimizer based optimal location and capacity of distributed generations. Neurocomputing 2012, 78 (1), 55. (186) Chen, D.; Wang, J.; Zou, F.; Hou, W.; Zhao, C. An improved group search optimizer with operation of quantum-behaved swarm and its application. Appl. Soft Comput. 2012, 12, 712.
