Ind. Eng. Chem. Res. 2002, 41, 2159-2169
Reaction Modeling and Optimization Using Neural Networks and Genetic Algorithms: Case Study Involving TS-1-Catalyzed Hydroxylation of Benzene

Somnath Nandi,† P. Mukherjee,‡ S. S. Tambe,*,† Rajiv Kumar,‡ and B. D. Kulkarni†

Chemical Engineering Division and Catalysis Division, National Chemical Laboratory, Pune 411 008, India
This paper proposes a hybrid process modeling and optimization formalism integrating artificial neural networks (ANNs) and genetic algorithms (GAs). The resultant ANN-GA strategy has the advantage that it allows process modeling and optimization exclusively on the basis of process input-output data. In the hybrid strategy, an ANN-based process model is first developed from the input-output process data. Next, the input space of the model, representing the process input variables, is optimized using GAs with a view to simultaneously maximizing multiple process output variables. GAs are stochastic optimization methods possessing certain unique advantages over the commonly used gradient-based deterministic algorithms. The efficacy of the hybrid formalism has been evaluated for modeling and optimizing the zeolite (TS-1)-catalyzed hydroxylation of benzene to phenol, whereby several sets of optimized operating conditions have been obtained. A few optimized solutions have also been subjected to experimental verification, and the results obtained thereby matched the GA-maximized values of the three reaction output variables with good accuracy.

1. Introduction

Predicting a priori the chemical composition and/or structure of an improved or a new catalyst that is ideally suited for a specific reaction is a difficult task, and therefore new catalysts are traditionally explored heuristically. Having obtained a new or an improved catalyst, the catalytic process is designed by carrying out computer-based process modeling and optimization. This activity offers a cost-effective alternative to conducting the large number of experiments that are otherwise needed to (i) study the effects of various reaction input variables and (ii) improve the overall process performance.
* Corresponding author. E-mail: [email protected]. Phone: +91-20-5893095. Fax: +91-20-5893041. † Chemical Engineering Division. ‡ Catalysis Division.

Conventionally, process models are developed using the "first principles (phenomenological)" approach, wherein detailed knowledge of the mass, momentum, and energy balances, and of other chemical engineering principles, is a prerequisite. More often than not, fulfilling these prerequisites becomes practically challenging owing to (i) the lack of sufficient understanding of the kinetic and transport phenomena underlying the catalytic process and (ii) the tedious, costly, and numerous experiments needed to acquire the requisite process data. The nonlinear nature of most chemical reactions poses additional difficulties for the development and solution of the resultant nonlinear phenomenological models. Thus, it becomes necessary to explore alternatives to phenomenological process modeling. One such option is employing linear/nonlinear regression techniques for building empirical models. Despite the availability of efficient numerical methods, the regression approach suffers from a significant drawback: the form of the
data-fitting function needs to be prespecified. In the case of reactions exhibiting nonlinear dynamical behavior, guessing an appropriate form of the nonlinear fitting function becomes difficult; the functional forms are therefore selected heuristically, which leads to a significant increase in the computational effort. Even after expending such effort, there is no guarantee that a functional form that fits the experimental data with reasonable accuracy can indeed be found. In the last decade, artificial neural networks (ANNs) have been found to be an attractive tool for steady-state/dynamic process modeling in situations where the development of phenomenological or regression models becomes either impractical or cumbersome.1-8 The most widely utilized ANN paradigm is the multilayered perceptron (MLP), which approximates the nonlinear relationships existing between an input set of data (causal process variables) and the corresponding output (dependent variables) data set. A three-layered MLP with a single intermediate (hidden) layer housing a sufficiently large number of nodes (also termed neurons or processing elements) can approximate (map) any nonlinear computable function to an arbitrary degree of accuracy.9,10 It learns the approximation through a numerical procedure called "network training", wherein the network parameters (weights) are adjusted iteratively such that the network, in response to the input patterns in an example set, accurately produces the corresponding outputs. The advantages of an ANN-based model are as follows: (i) it can be constructed solely from the historic process input-output data (example set); (ii) no knowledge of the process phenomenology is necessary for the model development; (iii) a properly trained model possesses generalization ability, due to which it can accurately predict the outputs for a new input data set; (iv) even multiple input-multiple output relationships can be simultaneously approximated.
Being exclusively data-based, ANN models require example data which are noise-free and statistically well-distributed. It may also be noted that ANN-based models are not good at extrapolation. This weakness, which is shared by all data-based nonlinear models, can be overcome by collecting more example data in the regions where extrapolation is desired.

Employing an appropriate optimization algorithm in conjunction with the process model allows a design engineer to obtain the optimal values of the process input variables that maximize or minimize a specified objective function. Conventionally, various deterministic gradient-based methods11 are used for performing optimization of phenomenological models. Most of these methods require that the objective function simultaneously satisfy the smoothness, continuity, and differentiability criteria. Although the nonlinear relationships approximated by an ANN model can be expressed in the form of generic closed-form expressions, the objective function(s) derived thereby cannot be guaranteed to satisfy these criteria. Thus, the gradient-based methods cannot be efficiently used for optimizing the input space of an ANN model, and it therefore becomes necessary to explore alternative optimization formalisms that are lenient toward the form of the objective function. In recent years, genetic algorithms (GAs), which are members of the stochastic optimization formalisms, have been used with great success in solving problems involving very large search spaces. The GAs were originally developed as genetic engineering models mimicking population evolution in natural systems.12,13 Specifically, GAs enforce the "survival-of-the-fittest" and "genetic propagation of characteristics" principles of biological evolution for searching the solution space of an optimization problem.

10.1021/ie010414g CCC: $22.00 © 2002 American Chemical Society. Published on Web 03/30/2002.
The principal features possessed by the GAs are as follows: (i) they require only scalar values, and not the second- and/or first-order derivatives, of the objective function; (ii) they are capable of handling nonlinear and noisy objective functions; (iii) they perform global searches and thus are more likely to arrive at or near the global optimum; (iv) they do not impose preconditions, such as smoothness, differentiability, and continuity, on the form of the objective function. Because of these attractive features, GAs are being increasingly used for solving diverse optimization problems in chemical engineering.14-19 In the present paper, the ANN and GA formalisms are integrated to arrive at a hybrid process modeling and optimization strategy. The strategy (henceforth referred to as "ANN-GA") uses an ANN as the process modeling paradigm and a GA for optimizing the input space of the ANN model such that an improved process performance is realized. This paper is structured as follows. In the following section, the ANN-GA hybrid methodology is described along with its implementation details. Next, the results pertaining to the experimental verification of the ANN-GA methodology for the modeling and optimization of the TS-1-catalyzed benzene hydroxylation to phenol process are presented and discussed.

2. ANN-GA Hybrid Methodology

The process optimization objective under consideration is expressed as follows: given the catalytic process data comprising values of the multiple process inputs and the corresponding values of the multiple process outputs, find the optimal values of the process inputs
Figure 1. Schematic of an MLP network.
such that the prespecified measures of process performance are simultaneously maximized. The ANN-GA strategy fulfills the above-stated objective in two steps. In the first step, an ANN-based process model is developed. This model has inputs describing the process operating parameters and variables (reactant/catalyst concentration, temperature, pressure, etc.) and outputs representing the process output variables (conversion, selectivity, etc.). In the second step of the ANN-GA procedure, the input space of the ANN model is optimized using a GA such that the optimized process inputs result in enhanced values of the output variables. The next two subsections (2.1 and 2.2, respectively) provide the details of the ANN-based modeling and the GA-based optimization of the ANN model.

2.1. ANN-Based Modeling.

The MLP network used in the model development is depicted in Figure 1. As shown, the network usually consists of three layers of nodes. The layers described as input, hidden, and output layers comprise N, L, and K numbers of processing nodes, respectively. Each node in the input (hidden) layer is linked to all of the nodes in the hidden (output) layer using weighted connections. In addition to the N and L numbers of input and hidden nodes, the MLP architecture also houses a bias node (with a fixed output of +1) in its input and hidden layers; the bias nodes are also connected to all of the nodes in the subsequent layer, and they provide additional adjustable parameters (weights) for the model fitting. The number of nodes (N) in the MLP network's input layer is equal to the number of inputs in the process, whereas the number of output nodes (K) equals the number of process outputs. However, the number of hidden nodes (L) is an adjustable parameter whose magnitude is determined by issues such as the desired approximation and generalization capabilities of the network model.
The MLP network is a nonlinear function-mapping device that determines the K-dimensional nonlinear function vector f, where f: X → Y. Here, X is a set of N-dimensional input vectors (X = {x_p}; p = 1, 2, ..., P and x = [x_1, x_2, ..., x_n, ..., x_N]^T), and Y is the set of corresponding K-dimensional output vectors (Y = {y_p}; p = 1, 2, ..., P and y = [y_1, y_2, ..., y_k, ..., y_K]^T). The precise form of f is determined by (i) the network topology, (ii) the choice of the activation function used for computing the outputs of the hidden and output nodes, and (iii) the network weight matrices W^H and W^O (they refer to the weights between
input and hidden nodes and hidden and output nodes, respectively). Thus, the nonlinear mapping can be expressed as
y = y(x; W)    (1)

where W = {W^H, W^O}. This equation suggests that y is a function of x, which is parametrized by W (Bishop21). It is now possible to write the closed-form expression of the input-output relationship approximated by the three-layered MLP as

y_k = \tilde{f}_2 \left( \sum_{l=0}^{L} w_{lk}^{O} \, \tilde{f}_1 \left( \sum_{n=0}^{N} w_{nl}^{H} x_n \right) \right); \quad k = 1, 2, ..., K    (2)
where y_k refers to the kth network output; \tilde{f}_1 and \tilde{f}_2 denote the nonlinear activation functions; w_{lk}^{O} refers to the weight between the lth hidden node and the kth output node; w_{nl}^{H} is the weight between the nth input node and the lth hidden node; and x_n represents the nth network input. Note that, in eq 2, the bias node is indexed as the zeroth node in the respective layer. In order that an MLP network approximates the nonlinear relationship existing between the process inputs and the outputs, it needs to be trained in a manner such that a prespecified error function is minimized. In essence, the MLP-training procedure aims at obtaining an optimal set (W) of the network weight matrices W^H and W^O, which minimizes the error function. The commonly employed error function is the root-mean-squared error (RMSE), defined as

RMSE = \sqrt{\frac{\sum_{i=1}^{N_P} E_i}{N_P K}}    (3)

where N_P refers to the number of patterns used in the training, i denotes the index of the input pattern (vector), and E_i represents the sum-squared error (SSE) pertaining to the ith training pattern. The SSE is expressed as

E_i = \sum_{k=1}^{K} (t_{ik} - y_{ik})^2    (4)

where t_{ik} and y_{ik} are the desired (target) and predicted values of the kth output node, respectively. The most widely used formalism for the RMSE minimization is the error-back-propagation (EBP) algorithm,20 utilizing a gradient-descent technique known as the generalized delta rule (GDR). In the EBP methodology, the weight matrix set, W, is initially randomized. Thereafter, an input vector from the training set is applied to the network's input nodes and the outputs of the hidden and output nodes are computed: first the weighted sum of all of the node-specific inputs is evaluated, which is then transformed using a nonlinear activation function, such as the logistic sigmoid. The outputs from the output nodes {y_{ik}} are then compared with their target values {t_{ik}}, and the difference is used to compute the SSE defined in eq 4. Upon SSE computation, the weight matrices W^H and W^O are updated using the GDR framework. This procedure, when repeated with the remaining input patterns in the training set, completes one network training iteration. For the RMSE minimization, several training iterations are usually necessary. The RMSE minimization procedure by itself does not ensure that the trained network would possess the much-desired generalization ability, owing to which the network also predicts accurately the outputs corresponding to a new set of input data. To ensure that the MLP model possesses satisfactory generalization capability, care must be exercised to avoid what is known as "network overfitting". It occurs when (i) the network is trained excessively (overtraining) and/or (ii) the network architecture houses more hidden nodes than necessary (overparametrization). The procedure for obtaining an optimal network architecture and the corresponding optimal weight matrix set W is described in Appendix I (for more details, see works by Bishop,21 Tambe et al.,22 Freeman and Skapura,23 and Nandi et al.24). The EBP training algorithm makes use of two adjustable parameters, namely, the learning rate (η, where 0 < η ≤ 1) and the momentum coefficient (α, where 0 < α ≤ 1). The magnitudes of both of these parameters also need to be optimized heuristically.

2.2. GA-Based Optimization of the ANN Model.

Having developed an ANN-based process model, a GA is used to optimize the N-dimensional input space (x) of the ANN model. The corresponding optimization objective can be defined as follows: find the N-dimensional optimal decision vector, x* = [x_1*, x_2*, ..., x_n*, ..., x_N*]^T, such that it maximizes the K-dimensional objective function vector, f, defined as

f(x*, W) = \sum_{k=1}^{K} f_k(x*, W)    (5)
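As an illustration of eqs 2-4, the forward pass and training error of a three-layered MLP can be sketched as follows. This is a minimal sketch, not the authors' code: the layer sizes and weights are arbitrary, and the logistic sigmoid is assumed for both \tilde{f}_1 and \tilde{f}_2.

```python
# Minimal sketch of the MLP forward pass (eq 2) and the RMSE (eqs 3 and 4).
# Layer sizes and weights here are illustrative, not the paper's model.
import numpy as np

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, WH, WO):
    """x: (N,) inputs; WH: (L, N+1) and WO: (K, L+1) weight matrices,
    with the bias weight stored in column 0 (the 'zeroth node' of eq 2)."""
    h = sigmoid(WH @ np.concatenate(([1.0], x)))     # hidden-node outputs
    return sigmoid(WO @ np.concatenate(([1.0], h)))  # network outputs y_k

def rmse(X, T, WH, WO):
    """Eq 3: sqrt( sum_i E_i / (N_P * K) ), with E_i the SSE of eq 4."""
    NP, K = T.shape
    sse = sum(np.sum((T[i] - mlp_forward(X[i], WH, WO)) ** 2)
              for i in range(NP))
    return np.sqrt(sse / (NP * K))
```

With N = 5 inputs, L = 4 hidden nodes, and K = 3 outputs (the architecture arrived at later in the paper), WH would be 4 × 6 and WO would be 3 × 5.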
It can be noticed in the above formulation that the problem belongs to multiobjective (MO) optimization, that is, simultaneous maximization of f_k(x*, W); k = 1, 2, ..., K. A simple and popular approach to solving the MO optimization problem is to aggregate the multiple objectives into a single-objective (SO) function. In the aggregation method (also termed the "weighting objective" method), each objective function is multiplied by a weighting coefficient (w_k) and the resulting functions are combined to form a single profit function to be maximized. The maximization of the single aggregated objective function can be expressed as

maximize \hat{f}(x, W) = \sum_{k=1}^{K} w_k f_k(x, W); \quad 0 \le w_k \le 1, \quad \sum_{k=1}^{K} w_k = 1    (6)
where \hat{f} refers to the aggregated function. In the GA procedure, the search for an optimal solution (decision) vector, x*, begins from a randomly initialized population of probable (candidate) solutions. The solutions, usually coded in the form of binary strings (chromosomes), are then tested to measure their fitness in fulfilling the optimization objective. Subsequently, a main loop of operations comprising (i) selection of better (fitter) parent chromosomes, (ii) production of an offspring solution population by crossing over the genetic material between pairs of fitter parent chromosomes, and (iii) mutation (bit-flipping) of elements of the offspring strings is executed. Implementation of this loop generates a new population of candidate solutions which, as compared to the previous population, usually fares better at fulfilling the optimization objective. The best string that evolves after repeating the above-described loop several times forms the solution to the optimization problem. The stepwise procedure for the GA-based optimization of the ANN model is provided in Appendix II (see also the flowchart in Figure 2).

Figure 2. Flowchart showing implementation of the ANN-GA hybrid methodology.

3. Modeling and Optimization of the Benzene Hydroxylation Reaction Using the ANN-GA Strategy

A commonly encountered difficulty in catalytic liquid heterophase organic transformations is allowing an interaction between the mutually immiscible reagents and the substrate.25 Under heterogeneous conditions (organic phase-aqueous phase reaction), the observed reaction rates are very low owing to the limitation on the mass transfer from one phase to another via a phase boundary. To overcome this difficulty, cosolvents are commonly used, which eliminate the phase boundaries completely and thereby make the system homogeneous. A drawback of cosolvent utilization, however, is that it leads to product-separation and solvent-recycle (workup) problems; also, the usage of organic cosolvents is not environment-friendly. An additional difficulty for the reactions carried out in the presence of cosolvents is that the reaction conversion is usually too low to make the process economically viable. An alternative approach to carrying out heterophase reactions is utilization of phase-transfer catalysts (PTCs).26-28 This approach also leads to workup problems because a PTC promotes emulsion formation. The tetraalkylammonium-cation-supported resins29 and aluminas are examples of PTCs and have been used as efficient triphase catalysts (solid catalyst residing at the interface of the two liquid layers) in certain organic transformations.30,31 In addition to the usage of cosolvents and PTCs, heterophasic reactions have also been conducted in the presence of crystalline microporous solid catalysts. In particular, titanosilicalite (TS-1)32,33 has been extensively used in the oxidation of various organic substrates. Typically, TS-1 is employed in the presence of dilute hydrogen peroxide (H2O2) and cosolvents, such as acetone and methanol.34-36 A superior TS-1-based process for the direct one-step oxidation (hydroxylation) of simple aromatic compounds (benzene, toluene, anisole, etc.) has been reported recently.37,38 The process employs microporous TS-1 as the solid catalyst, aqueous H2O2 as the oxidant, and water as the dispersion medium [a solid-liquid-liquid (S-L-L)-type triphase system]. This triphasic catalytic process is eco-friendly (green) because it uses water as the dispersion medium in place of an organic cosolvent. A significant success of the TS-1-based triphasic system has been achieved for the benzene hydroxylation to phenol reaction,39 wherein a 15-20-fold increase (as compared to the biphasic system) in the benzene conversion was realized; the selectivity for the desired product (phenol) was also significantly higher. The TS-1-catalyzed benzene hydroxylation to phenol reaction is described as

C6H6 + H2O2 → C6H5OH + H2O (over TS-1)
In this reaction, apart from phenol, which is the desired reaction product, small amounts of secondary products such as hydroquinone, catechol, and p-benzoquinone are also formed. With phenol being a widely consumed industrial chemical, it is economically sensible to optimize the TS-1-catalyzed benzene to phenol process. For achieving this objective, we have used the ANN-GA hybrid modeling and optimization formalism described earlier. The specific goal of this process modeling and optimization study is the development of an ANN-based process model from the reaction input-output data and the GA-based optimization of the ANN model with a view to obtaining the optimal reaction conditions effecting (a) higher phenol selectivity, (b) enhanced H2O2 utilization, and (c) increased benzene conversion. We thus have an MO optimization problem with nonconflicting objectives at hand, because it considers simultaneous maximization of the multiple (three) objectives.
Figure 3. Schematic of the reactor setup.
4. Results and Discussion

4.1. ANN-Based Modeling of the Benzene Hydroxylation Reaction.

The TS-1 catalyst was synthesized using phosphoric acid (85% aqueous) as the promoter and characterized via well-known techniques, such as X-ray diffraction (Rigaku D MAX III VC), scanning electron microscopy (JEOL JSM 5200), Fourier transform infrared spectroscopy (Nicolet 60 SXB spectrometer), ultraviolet-visible spectroscopy (UV-2101 PC), and Brunauer-Emmett-Teller methods. The details of the catalyst synthesis and characterization are given in works by Bhaumik et al.39 and Kumar et al.40 Bhaumik and coauthors have tested the TS-1 catalyst extensively using a two-neck round-bottomed glass batch reactor (capacity 50 mL) fitted with a water condenser. In the reactor (Figure 3), benzene was reacted with the requisite amount of aqueous H2O2, and the products were analyzed by high-resolution capillary gas chromatography using a flame ionization detector. A total of 24 experiments, including those reported in the work by Bhaumik et al.,39 were performed to study the effects of five reaction input variables, namely, reaction time, catalyst weight percentage, reaction temperature, benzene to peroxide mole ratio, and water to benzene weight ratio. The catalyst performance was monitored in terms of three reaction output variables, namely, phenol selectivity (mol %), peroxide utilization (mol %), and benzene conversion (mol %). The peroxide utilization is evaluated using
H2O2 utilization (mol %) = 100 y_{prxd} / y_{prxd}^{0}    (7)
where y_{prxd} refers to the moles of H2O2 consumed to produce the oxygenated benzene products and y_{prxd}^{0} denotes the moles of H2O2 taken initially. The values of the five operating conditions and the three reaction outputs corresponding to the 24 experiments are tabulated in Table 1. For modeling purposes, the reaction operating-conditions data can be viewed as an example input matrix (X) of size 24 × 5 and the corresponding reaction output data as an example output matrix (Y) of size 24 × 3. For ANN training, each row of X represents a five-dimensional input vector x = [x_1, x_2, ..., x_5]^T, and the corresponding row of matrix Y denotes the three-dimensional desired (target) output vector y = [y_1, y_2, y_3]^T.
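Eq 7 is straightforward to evaluate; a small sketch follows, with illustrative numbers rather than the paper's measured values:

```python
def h2o2_utilization(y_prxd, y0_prxd):
    """H2O2 utilization (mol %) = 100 * y_prxd / y0_prxd (eq 7)."""
    return 100.0 * y_prxd / y0_prxd

# Illustrative: 0.5 mol of H2O2 consumed toward oxygenated products
# out of 2.0 mol charged corresponds to 25% utilization.
```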
The MLP network architecture considered for modeling has five input nodes (N = 5) for representing the five input variables and three output nodes (K = 3) for describing as many output variables. Prior to the network training, the 24 example input-output vectors were partitioned into training and test sets comprising 19 and 5 patterns, respectively; the test set patterns are marked separately in Table 1. In each network training iteration, the training set was utilized for adjusting the weight matrix set, W, and the test set was used for gauging the network's generalization performance. When the training and test sets were utilized in this manner, an optimal process model was developed by following the six-step procedure described in Appendix I. In the network training procedure, the logistic sigmoid activation function was used for computing the outputs of the hidden and output nodes. During the development of an optimal MLP model, the effect of varying the number of hidden nodes on the RMS errors pertaining to the training and test sets was rigorously studied. The results of this study are portrayed in Figure 4, where it is seen that an MLP architecture housing four hidden nodes resulted in the least RMSE value for the test set (E_tst = 0.0485). The values of the learning rate (η) and momentum coefficient (α) that resulted in the minimum E_tst value were 0.3 and 0.01, respectively; the corresponding RMSE value for the training set (E_trn) was 0.0482. The small and comparable E_trn and E_tst magnitudes suggest that the MLP-based model possesses good approximation and generalization characteristics. A comparison of the network-predicted and desired values of the three reaction output variables is depicted in Figure 5.

4.2. GA-Based Optimization of the MLP Model.

The MO optimization problem involving simultaneous maximization of the three reaction output variables is converted into an SO optimization problem (see eq 6) by defining
maximize \hat{f}(x, W) = w_1 y_1 + w_2 y_2 + w_3 y_3; \quad x_n^L \le x_n \le x_n^U    (8)

where \hat{f} represents the aggregated objective function; the N-dimensional decision vector, x, denotes the experimental operating conditions (N = 5); w_k (k = 1-3) refers to the kth weighting coefficient; x_n^L and x_n^U respectively represent the lower and upper limiting values of x_n; and y_1, y_2, and y_3 represent the three reaction output variables. The five-dimensional input space of the MLP-based process model was optimized following the GA-based procedure outlined in Appendix II; the values of the GA-specific parameters used in the optimization simulations were l_chr = 60, N_pop = 30, p_cr = 0.95, p_mut = 0.01, and N_g^max = 250. For computing the fitness value (ξ_j) of the jth candidate solution (x_j) in a population, the following fitness function was employed:
ξ_j = w_1 (y_{j1}/100) + w_2 (y_{j2}/100) + w_3 (y_{j3}/100); \quad \sum_{k=1}^{3} w_k = 1.0; \quad j = 1, 2, ..., N_pop    (9)
where the weighting coefficients w_1, w_2, and w_3 represent the relative importance of the three objectives and y_{j1}, y_{j2}, and y_{j3} represent the MLP model-predicted values of the three output variables when the jth candidate
Table 1. Experimental Data Used in the ANN-Based Modeling
(reaction operating conditions (ANN input): x1 = reaction time (h); x2 = catalyst concn (wt %);a x3 = temp (K); x4 = benzene/H2O2 (mol/mol); x5 = water/benzene (w/w). Reaction outputs (ANN output): y1 = phenol selectivity (%); y2 = H2O2 utilization (%); y3 = benzene conversion (%).)

expt. no. | x1   | x2   | x3  | x4   | x5   | y1   | y2   | y3
1b        | 2.0  | 15.0 | 333 | 1.5  | 5.0  | 89.5 | 86.7 | 52.3
2         | 2.0  | 15.0 | 333 | 2.0  | 5.0  | 90.4 | 87.2 | 41.1
3         | 2.0  | 15.0 | 333 | 3.0  | 5.0  | 95.0 | 90.7 | 28.8
4         | 3.0  | 15.0 | 323 | 1.0  | 5.0  | 96.0 | 70.5 | 67.8
5         | 1.5  | 15.0 | 343 | 1.0  | 5.0  | 86.5 | 89.2 | 78.6
6         | 1.0  | 15.0 | 353 | 1.0  | 5.0  | 78.4 | 87.9 | 72.3
7         | 0.25 | 15.0 | 333 | 8.0  | 5.0  | 75.0 | 91.0 | 9.1
8b        | 0.5  | 15.0 | 333 | 4.0  | 5.0  | 79.8 | 91.4 | 19.0
9         | 0.75 | 15.0 | 333 | 2.67 | 5.0  | 84.6 | 93.3 | 30.3
10        | 1.0  | 15.0 | 333 | 2.0  | 5.0  | 87.4 | 92.3 | 41.0
11b       | 1.5  | 15.0 | 333 | 1.33 | 5.0  | 91.6 | 90.8 | 63.0
12        | 2.0  | 15.0 | 333 | 1.0  | 5.0  | 88.6 | 90.0 | 80.8
13        | 4.0  | 5.0  | 333 | 2.0  | 5.0  | 90.8 | 58.5 | 26.8
14        | 3.0  | 10.0 | 333 | 2.0  | 5.0  | 86.4 | 84.7 | 37.3
15b       | 2.5  | 12.5 | 333 | 2.0  | 5.0  | 85.0 | 87.6 | 38.1
16        | 1.5  | 20.0 | 333 | 2.0  | 5.0  | 65.7 | 90.3 | 33.6
17        | 2.0  | 15.0 | 333 | 2.0  | 1.0  | 60.3 | 39.9 | 14.3
18        | 2.0  | 15.0 | 333 | 2.0  | 2.0  | 64.9 | 81.3 | 30.1
19b       | 2.0  | 15.0 | 333 | 2.0  | 3.0  | 71.7 | 85.7 | 33.4
20        | 2.0  | 15.0 | 333 | 2.0  | 4.0  | 79.8 | 87.4 | 36.2
21        | 2.0  | 15.0 | 333 | 2.0  | 7.5  | 90.3 | 75.9 | 34.6
22        | 2.0  | 9.8  | 333 | 2.0  | 3.0  | 92.9 | 85.7 | 40.0
23        | 2.4  | 24.9 | 347 | 1.0  | 7.2  | 77.6 | 92.3 | 81.2
24        | 3.0  | 30.0 | 340 | 1.5  | 10.0 | 92.0 | 96.3 | 59.4

a Weight percentage with respect to benzene. b Test set pattern.
Figure 4. Plot showing the effect of varying numbers of hidden nodes on the RMSE with respect to training and test sets.
solution is applied to the network's input nodes. Because maximizing the percentages of the three output variables, namely, phenol selectivity, hydrogen peroxide utilization, and benzene conversion, are equally important objectives, the values of w_1, w_2, and w_3 were fixed at 0.333. During GA implementation, the search for the optimal solutions was restricted to the following ranges [x_n^L, x_n^U] of the five process input variables: (1) reaction time (x_1) = [0.25, 4.0], (2) catalyst concentration (x_2) = [5.0, 30.0], (3) temperature (x_3) = [323.0, 353.0], (4) benzene to peroxide ratio (x_4) = [1.0, 8.0], and (5) water to benzene ratio (x_5) = [1.0, 10.0]. These ranges are the same over which the process input variables were varied for collecting the experimental data listed in Table 1. When a process model attains a nonlinear character, as in the present case, it is possible to obtain a locally optimal solution rather than the globally optimal one. Thus, in the case of an objective function maximization problem, a thorough exploration of the solution space is necessary to secure a solution that corresponds to the tallest local or the global maximum. For exploring the solution space exhaustively, the GA-based optimization simulations were repeated by using each time a
different randomly initialized population of the candidate solutions (step 1 of the GA procedure). Dissimilar initial populations ensure that the GA begins its search from a different search subspace, which helps in locating the tallest local or the global maximum. Following this approach, numerous (≈50) optimal solutions were obtained, and it was observed that many of the solutions were similar. In Table 2, the top three dissimilar solutions in the fitness hierarchy that maximize all three output variables simultaneously are listed. The similarity between a large number of GA-based solutions suggests that the ANN model and, consequently, the objective function derived from it possess only a few maxima, which are captured repeatedly in the multiple GA simulations. Such a scenario is possible when (i) the input data used for building the ANN model are diversified but the resultant model still comprises only a few maxima or (ii) the input data are not diversified enough and, therefore, the resultant model is unable to represent the large number of intrinsically present maxima. An occurrence of the first scenario is not worrisome because the GA will certainly capture solutions corresponding to the few maxima. However, a precaution such as using a diversified input space for developing the ANN model must be exercised to avoid the undesirable effects (i.e., the GA being unable to capture the true global maximum) emanating from the second scenario. It has been observed experimentally that higher (>1) values of the benzene to hydrogen peroxide ratio effect enhanced phenol selectivity (see experiment nos. 1-3 and 12). Accordingly, in another set of GA-based optimization simulations, the benzene to H2O2 ratio was constrained to lie in the 1.75-3.0 range. The top two optimized solutions obtained thereby are listed in Table 2 (nos. 4 and 5).
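The multi-start GA search described above can be sketched as follows. This is an illustrative binary-coded GA, not the authors' implementation: `ann_model` is a hypothetical stand-in for the trained MLP (here it simply returns constants), and the 12-bits-per-variable encoding is an assumption chosen so that 5 × 12 = 60 matches the reported chromosome length l_chr = 60; the remaining parameters (N_pop = 30, p_cr = 0.95, p_mut = 0.01, N_g^max = 250) are taken from the text.

```python
# Illustrative binary-coded GA over the five-dimensional input space.
# ann_model is a placeholder for the trained MLP; swap in a real model
# to reproduce the kind of search performed in the paper.
import random

BOUNDS = [(0.25, 4.0), (5.0, 30.0), (323.0, 353.0), (1.0, 8.0), (1.0, 10.0)]
BITS = 12                      # assumed bits per variable (5 * 12 = 60 = l_chr)
NPOP, PCR, PMUT, NGEN = 30, 0.95, 0.01, 250
W = (1 / 3, 1 / 3, 1 / 3)      # equal weighting coefficients (eq 9)

def decode(chrom):
    """Map each 12-bit segment onto its [x_L, x_U] range."""
    xs = []
    for i, (lo, hi) in enumerate(BOUNDS):
        seg = chrom[i * BITS:(i + 1) * BITS]
        frac = int("".join(map(str, seg)), 2) / (2 ** BITS - 1)
        xs.append(lo + frac * (hi - lo))
    return xs

def ann_model(x):
    """Placeholder for the trained MLP; returns three outputs in percent."""
    return [50.0, 50.0, 50.0]

def fitness(chrom):
    y1, y2, y3 = ann_model(decode(chrom))
    return W[0] * y1 / 100 + W[1] * y2 / 100 + W[2] * y3 / 100   # eq 9

def ga():
    pop = [[random.randint(0, 1) for _ in range(len(BOUNDS) * BITS)]
           for _ in range(NPOP)]
    for _ in range(NGEN):
        scores = [fitness(c) for c in pop]
        total = sum(scores)
        def pick():   # roulette-wheel (fitness-proportionate) selection
            r, acc = random.uniform(0, total), 0.0
            for c, s in zip(pop, scores):
                acc += s
                if acc >= r:
                    return c
            return pop[-1]
        nxt = []
        while len(nxt) < NPOP:
            a, b = pick()[:], pick()[:]
            if random.random() < PCR:           # single-point crossover
                cut = random.randrange(1, len(a))
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for c in (a, b):                    # bit-flip mutation
                nxt.append([bit ^ 1 if random.random() < PMUT else bit
                            for bit in c])
        pop = nxt[:NPOP]
    return decode(max(pop, key=fitness))
```

Repeating `ga()` with different random seeds mimics the multi-start strategy of re-running the GA from dissimilar initial populations.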
It is important at this juncture to verify whether the best solution found by the GA indeed corresponds to the tallest local or the global maximum on the objective function surface. Such an analysis can be performed by plotting the surface formed by the objective function values as a function of varying values of the decision variables. In the present case, it is not possible to depict the objective function surface because the decision space is five-dimensional. Alternatively, the objective function can be mapped as a function of variations in a single decision variable. The five plots generated in this manner are depicted in Figure 6, where the objective function (eq 8) values evaluated using the ANN model are plotted as a function of the varying values of each reaction input variable. These plots correspond to optimized solution 1 (see Table 2), where an input variable was systematically varied while maintaining the remaining four variables at their optimal magnitudes. It is seen in the five panels of Figure 6 that the GA has indeed found values of the five process input variables that either exactly (Figure 6a,c,d) or very nearly (Figure 6b,e) correspond to the tallest local maximum.

Figure 5. Plots comparing the experimental and model-predicted values of the three output variables.

4.3. Experimental Verification of GA-Optimized Solutions.

The five GA-optimized solutions in Table 2 were subjected to verification by conducting as many fresh experiments, and the results obtained thereby, along with the values of the verification error (E_k^V), can also be found in Table 2; for calculating the E_k^V values, the following expression was utilized:

E_k^V = \frac{100 (y_k^p - y_k^v)}{y_k^p}; \quad k = 1, 2, 3    (10)
where ypk refers to the GA-maximized value of the kth output variable and yvk represents the corresponding value obtained via experimental verification. The average absolute verification error values pertaining to the three output variables are 0.99% (phenol selectivity), 2.11% (H2O2 utilization), and 5.06% (benzene conversion). It can be observed from these magnitudes that the values of all of the three output variables corresponding to the verification experiments exhibit a reasonably good match with the respective GA-maximized values. It is necessary at this point to delve into the possible reasons that led to the small discrepancies between the GA-maximized and experimentally verified values of the three output variables. Collection of the process input-output data always suffers from two types of errors, namely, instrumental and measurement errors. For instance, however good the control system may be, the values of process variables do vary, albeit within a narrow window, around their nominal set points (instrumental error). When process variables are monitored and analyzed manually, they are subjected to the measurement errors. An additional type of error, namely, the approximation error, gets introduced while fitting the process models. For instance, modeling performed phenomenologically or using the ANN approach is vulnerable to approximation errors because an inherently microscopic physicochemical phenomenon is approximated in terms of a macroscopic or an empirical model. In the present study, these errors could have contributed to the small discrepancies (0.99-5.06%) observed between the GA-maximized and experimentally verified values of the three output variables. Nevertheless, it can be noticed from the experimental validation results that the ANN-GA formalism has been successful in obtaining a number of solutions effecting a significant improvement in the overall process performance. 
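Using the Table 2 values, eq 10 and the quoted average absolute verification errors can be reproduced with a short Python check (the function name `verification_error` is ours, introduced for illustration, not the paper's):

```python
# Table 2 values: GA-maximized (yp) and experimentally verified (yv)
# magnitudes of the three outputs for optimized solutions 1-5.
yp = {"phenol selectivity": [93.3, 93.3, 92.8, 94.3, 93.9],
      "H2O2 utilization":   [89.8, 90.0, 89.3, 90.5, 90.0],
      "benzene conversion": [80.2, 79.8, 79.4, 46.5, 49.3]}
yv = {"phenol selectivity": [95.0, 94.0, 91.0, 94.0, 94.0],
      "H2O2 utilization":   [87.9, 90.4, 90.5, 93.7, 92.8],
      "benzene conversion": [83.8, 85.3, 83.0, 44.2, 47.1]}

def verification_error(yp_k, yv_k):
    """Eq 10: EVk = 100 (ypk - yvk) / ypk, in percent."""
    return 100.0 * (yp_k - yv_k) / yp_k

# Average absolute verification error per output variable.
avg_abs_error = {
    k: sum(abs(verification_error(p, v)) for p, v in zip(yp[k], yv[k])) / len(yp[k])
    for k in yp}
```

Rounded to two decimals, the averages recover the 0.99%, 2.11%, and 5.06% figures quoted in the text.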
It is observed from the experimental data used for developing the ANN model that the maximum values of the three output variables were (i) 96.0% (phenol selectivity), (ii) 96.3% (H2O2 utilization), and (iii) 81.2% (benzene conversion). These values, however, were obtained in three different experiments (nos. 4, 24, and 23). In these experiments, although one output was high, the values of the remaining two were suboptimal. Because values of the reaction output variables are nonlinearly dependent on the reaction input variables, it was not possible to manually guess the optimal input conditions that would concurrently maximize all of the three reaction output variables. The experimentally verified results clearly indicate that the ANN-GA hybrid methodology has indeed succeeded in optimizing the reaction operating conditions, which simultaneously maximize all of the three reaction outputs.

Table 2. Experimental Verification of the Optimized Operating Conditions Obtained Using the Hybrid ANN-GA Methodology

Optimized process input variables:

expt. no.   time (h): x1   catal. concn:a x2   temp (K): x3   B to H2O2 ratio (mol/mol):b x4   W to B ratio (w/w):c x5
1           3.0            25.0                349.0          1.0                              8.0
2           2.8            22.5                345.5          1.0                              8.6
3           3.4            22.9                348.0          1.0                              9.5
4           3.7            28.5                343.6          2.0                              9.9
5           4.0            29.8                332.1          1.9                              10.0

Reaction output variables (GA-maximized value ypk, experimentally verified value yvk, and verification errord Evk):

        phenol selectivity (%)       H2O2 utilization (%)         benzene conversion (%)
expt.   yp1     yv1     Ev1          yp2     yv2     Ev2          yp3     yv3     Ev3
1       93.3    95.0    -1.82        89.8    87.9    +2.11        80.2    83.8    -4.49
2       93.3    94.0    -0.75        90.0    90.4    -0.44        79.8    85.3    -6.89
3       92.8    91.0    +1.94        89.3    90.5    -1.34        79.4    83.0    -4.53
4       94.3    94.0    +0.32        90.5    93.7    -3.53        46.5    44.2    +4.95
5       93.9    94.0    -0.11        90.0    92.8    -3.11        49.3    47.1    +4.46

a Catalyst concentration expressed as a weight percentage with respect to benzene. b Benzene to H2O2 ratio. c Water to benzene ratio. d Verification error (%) computed using eq 10.

5. Conclusion

In summary, this paper presents a hybrid process modeling and optimization methodology integrating two artificial intelligence formalisms, namely, neural networks and GAs. The first step in implementing the hybrid methodology is the development of an ANN-based process model. A significant advantage of such a model is that it can be constructed exclusively from the historic process input-output data, without invoking the process phenomenology. Also, it is easy to develop multiple input-multiple output nonlinear models using ANNs. In the present study, although the most popular and simple EBP algorithm has been employed for training the MLP network, other algorithms such as Quickprop41 and RPROP42 could also be utilized. In the second phase of the ANN-GA implementation, the input space of the ANN model, representing process input variables, is optimized using GAs such that the model outputs are simultaneously maximized. The choice of the GAs as an optimization paradigm primarily stems from the following: (i) they are lenient toward the form of the objective function, (ii) GAs need only the scalar values of the objective function and not its derivatives, and (iii) they search the solution space stochastically and, therefore, are most likely to find the globally optimal solution. Besides GAs, there exist other stochastic optimization algorithms, namely, simulated annealing,43 tabu search,44,45 etc. These algorithms, though they differ in their rigor, share many of the above-stated attractive features of the GAs and, therefore, could be employed for optimizing the input space of the ANN model. The efficacy of the ANN-GA formalism was tested for modeling and optimizing the TS-1-catalyzed benzene hydroxylation to phenol reaction. Here, experimental process data were used for constructing the ANN model. The input space of the ANN model comprising five input variables was then optimized with a view to simultaneously maximize the corresponding three output variables. A few GA-optimized solutions were subjected to the experimental verification.
The results obtained thereby validate the predictions of the ANN-GA hybrid formalism with a good accuracy. To conclude, the ANN-GA methodology presented in this paper offers a cost-effective and relatively simple alternative for process modeling and optimization.
Acknowledgment

S.N. thanks the Council of Scientific and Industrial Research (CSIR), the Government of India, New Delhi, for a Senior Research Fellowship (SRF).

Appendix I: Stepwise Procedure for Developing an Optimal MLP Model

The stepwise procedure for obtaining an optimal network architecture and the weight matrix set W is presented below. The procedure ensures that the network is not overfitted.
1. Partition the available example input-output data into two sets, namely, training and test sets.
2. Fix the number of nodes in the MLP's hidden layer to a small number (1 or 2).
3. Initialize the network weight set, W, randomly.
4. Train the network over several iterations using input-output vectors in the training set such that the RMSE with respect to the test set (defined as Etst and computed at the end of each iteration) is minimized. Store the weight matrix that has resulted in the lowest Etst magnitude.
5. Repeat steps 3 and 4 many times (say 10-15) by changing the seed value of the random number generator. This ensures that the weights are initialized differently in each repeated training run, which helps in exploring the error surface rigorously and thereby locating the deepest local or the global minimum on the error surface.
6. Go to step 2 and repeat steps 3-5 by systematically incrementing the number of hidden nodes (L); the network architecture and the weight set, W, that result in the overall least Etst magnitude are taken as optimal.

Appendix II: Stepwise Procedure for the GA-Based Optimization of the ANN Model

The stepwise numerical procedure for the GA-based optimization of the ANN model is provided below (also see the flowchart in Figure 2).
Step 1 (initialization of solution population): Set the generation index (Ng) to zero and generate a population of Npop binary strings (chromosomes) randomly; each string possessing a total of lchr bits is divided into as many segments as the number of decision variables (N) to be optimized.
Step 2 (fitness computation): Evaluate the performance (fitness) of the solution strings in the current population using a prespecified fitness function (ξ). In this step, the jth binary-coded chromosome (j = 1, 2, ..., Npop) is converted to its equivalent decimal-valued solution vector (xj) using the following relationship:
xjn = xLn + (xUn - xLn)Sn/(2^ln - 1);  Σ(n=1 to N) ln = lchr    (II.1)
where xLn and xUn respectively refer to the lower and upper limiting values of xn, ln is the size of the nth binary segment, and Sn denotes the decimal equivalent of the nth binary segment. Next, vector xj is applied to the trained ANN to obtain the corresponding outputs

Figure 6. Effect of the variation in a process input variable on the objective function, f̂. The plots pertain to the GA-searched optimal solution: x1 = 3.0, x2 = 25.0, x3 = 349.0, x4 = 1.0, and x5 = 8.0.
yj that are subsequently utilized to compute the fitness value (ξj) of the solution string. Upon computation of the fitness scores of all of the chromosomes in the current population, the chromosomes are ranked in decreasing order of their fitness scores.
Step 3 (selection of parents): Choose Npop parent chromosomes from the current population to form the mating pool. The members of this pool possess relatively high fitness scores and are used to produce offspring strings. The commonly used parent selection techniques are the roulette-wheel (RW) method and the more stable variant of the RW method known as the stochastic remainder selection (SRS) technique.12
Step 4 (crossover): Randomly form Npop/2 parent pairs from the mating pool and perform crossover on each pair with a probability equal to pcr. In crossover, each member of the parent pair is cut at the same randomly chosen crossover point along the length of the parent strings. As a result, two substrings are formed from each parent string. The substrings are then mutually exchanged between the parents and combined to obtain two offspring chromosomes. This crossover operation is repeated on the other randomly selected parent strings until an offspring population of the size of the parent pool is generated.
Step 5 (mutation): Perform the mutation (bit-flipping) operation on the offspring strings, where the probability of a bit getting flipped (zero to one or vice versa) is equal to pmut; the recommended range of pmut is 0.01-0.05.
Step 6: Increment the generation index by 1: Ng = Ng + 1.
Step 7: Repeat steps 2-6 on the new-generation offspring strings until convergence is achieved. The criterion for the GA convergence could be the following: Ng exceeds its maximum limit (Ngmax), or the fitness score of the best string in the offspring population undergoes a very small or no change over successive generations. After GA convergence is achieved, the string possessing the highest fitness value is decoded to obtain the optimized decision variable vector, x*.

Nomenclature

Ei = sum-squared error corresponding to the ith training pattern
Etrn = RMSE corresponding to the training set
Etst = RMSE corresponding to the test set
EVk = verification error corresponding to the kth process output variable
f̂ = aggregated objective function
f̃1 = nonlinear activation function between input and hidden nodes
f̃2 = nonlinear activation function between hidden and output nodes
f = vector of nonlinear functions approximated by the ANN
K = number of nodes in the network's output layer
lchr = length of the chromosome
L = number of nodes in the network's hidden layer
N = number of nodes in the network's input layer
Ng = generation index
Ngmax = maximum number of generations over which GA simulations are conducted
NP = number of input-output patterns used for network training
Npop = size of the candidate solution (chromosome) population
pcr = crossover probability
pmut = mutation probability
tik = desired (target) output value for the kth output node when the ith input pattern is presented to the network's input layer
wk = kth weighting coefficient
W = weight matrix set {WH, WO}
WH = weight matrix between input and hidden nodes
WO = weight matrix between hidden and output nodes
x = N-dimensional vector of process input (decision) variables
x* = optimal decision variable vector
xj = jth decimal-valued solution vector
y = K-dimensional vector of process output variables
yj = ANN model outputs corresponding to xj
yprxd = moles of H2O2 consumed to produce oxygenated benzene products
y0prxd = moles of H2O2 taken initially
yik = ANN-model-predicted value of the kth output node when the ith input pattern is presented to the network's input layer
ypk = GA-maximized value of the kth process output variable
yvk = experimentally verified value of the kth process output variable

Greek Symbols

α = momentum coefficient in the EBP algorithm
η = learning rate in the EBP algorithm
ξj = fitness score of the jth chromosome
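The stepwise GA of Appendix II, including the eq II.1 decoding, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: parent selection is simplified to truncation of the ranked population (in place of the RW/SRS schemes named in step 3), and a toy quadratic objective with a peak at (1.0, 2.0) stands in for the trained ANN model.

```python
import random

def decode(chrom, bounds, bits):
    """Eq II.1: map each ln-bit segment (decimal value Sn) of the
    chromosome onto [xLn, xUn] via xn = xLn + (xUn - xLn)*Sn/(2^ln - 1)."""
    x, pos = [], 0
    for (lo, hi), l in zip(bounds, bits):
        s = int("".join(map(str, chrom[pos:pos + l])), 2)
        x.append(lo + (hi - lo) * s / (2 ** l - 1))
        pos += l
    return x

def ga_maximize(fitness, bounds, bits, npop=40, ngen=60,
                pcr=0.9, pmut=0.03, seed=0):
    rng = random.Random(seed)
    lchr = sum(bits)                                    # total string length
    pop = [[rng.randint(0, 1) for _ in range(lchr)]     # step 1: random
           for _ in range(npop)]                        #   initial population
    best = None
    for _ in range(ngen):                               # steps 2-7
        scored = sorted(pop, key=lambda c: fitness(decode(c, bounds, bits)),
                        reverse=True)                   # step 2: rank by fitness
        f_top = fitness(decode(scored[0], bounds, bits))
        if best is None or f_top > best[0]:
            best = (f_top, decode(scored[0], bounds, bits))
        pool = scored[:npop // 2]                       # step 3 (simplified
        children = []                                   #   truncation selection)
        while len(children) < npop:                     # step 4: single-point
            p1, p2 = rng.choice(pool), rng.choice(pool)  #   crossover
            if rng.random() < pcr:
                cut = rng.randrange(1, lchr)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for c in (c1, c2):                          # step 5: bit-flip mutation
                children.append([b ^ 1 if rng.random() < pmut else b
                                 for b in c])
        pop = children                                  # step 6: next generation
    return best                                         # (fitness, x*)

# Toy objective standing in for the ANN model: maximum at x = (1.0, 2.0).
f_best, x_best = ga_maximize(
    lambda x: -((x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2),
    bounds=[(0.0, 4.0), (0.0, 4.0)], bits=[12, 12])
```

With two 12-bit segments the decoded grid resolution over [0, 4] is about 0.001, and the loop recovers a solution close to the known peak.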
Literature Cited

(1) Bhat, N.; McAvoy, T. Use of Neural Nets for Modeling and Control of Chemical Process Systems. Comput. Chem. Eng. 1990, 14, 573.
(2) Hernandez, E.; Arkun, Y. Study of the Control-relevant Properties of Back-Propagation Neural Network Models of Nonlinear Dynamical Systems. Comput. Chem. Eng. 1992, 4, 227.
(3) Nahas, E.; Henson, M.; Seborg, D. Nonlinear Internal Model Strategy for Neural Network Models. Comput. Chem. Eng. 1992, 16, 1039.
(4) Ramasamy, S.; Tambe, S. S.; Kulkarni, B. D.; Deshpande, P. B. Robust Nonlinear Control with Neural Networks. Proc. R. Soc. London A 1995, 449, 655.
(5) Chitra, S. P. Use Neural Networks for Problem Solving. Chem. Eng. Prog. 1993, 89 (4), 44.
(6) Tendulkar, S. B.; Tambe, S. S.; Chandra, I.; Rao, P. V.; Naik, R. V.; Kulkarni, B. D. Hydroxylation of Phenol to Dihydroxybenzenes: Development of Artificial-Neural-Network-Based Process Identification and Model Predictive Control Strategies for a Pilot Plant Scale Reactor. Ind. Eng. Chem. Res. 1998, 37, 2081.
(7) Bhagat, P. An Introduction to Neural Nets. Chem. Eng. Prog. 1990, 86 (4), 44.
(8) Narendra, K.; Parthasarthy, K. Identification and Control of Dynamical Systems. IEEE Trans. Neural Networks 1990, 1, 4.
(9) Hornik, K.; Stinchcombe, M.; White, H. Multilayer Feedforward Networks are Universal Approximators. Neural Networks 1989, 2, 359.
(10) Poggio, T.; Girosi, F. Regularization Algorithms for Learning that are Equivalent to Multilayer Networks. Science 1990, 247, 978.
(11) Edgar, T. F.; Himmelblau, D. M. Optimization of Chemical Processes; McGraw-Hill: Singapore, 1989.
(12) Goldberg, D. E. Genetic Algorithms in Search, Optimization, and Machine Learning; Addison-Wesley: New York, 1989.
(13) Davis, L., Ed. Handbook of Genetic Algorithms; Van Nostrand Reinhold: New York, 1991.
(14) Cartwright, H. M.; Long, R. A. Simultaneous Optimization of Chemical Flowshop Sequencing and Topology using Genetic Algorithms. Ind. Eng. Chem. Res. 1993, 32, 2706.
(15) Hanagandi, V.; Ploehn, H.; Nikolaou, M. Solution of the Self-Consistent Field Model for Polymer Adsorption by Genetic Algorithms. Chem. Eng. Sci. 1996, 51, 1071.
(16) Garcia, S.; Scott, E. P. Use of Genetic Algorithms in Thermal Property Estimation: I. Experimental Design Optimization. Numer. Heat Transfer, Part A 1998, 33, 135.
(17) Garcia, S.; Guynn, J.; Scott, E. P. Use of Genetic Algorithms in Thermal Property Estimation: II. Simultaneous Estimation of Thermal Properties. Numer. Heat Transfer, Part A 1998, 33, 149.
(18) Garrard, A.; Fraga, E. S. Mass Exchange Network Synthesis Using Genetic Algorithms. Comput. Chem. Eng. 1998, 22, 1837.
(19) Polifke, W.; Geng, W.; Dobbeling, K. Optimization of Rate Coefficients for Simplified Reaction Mechanisms with Genetic Algorithms. Combust. Flame 1998, 113, 119.
(20) Rumelhart, D.; Hinton, G.; Williams, R. Learning Representations by Backpropagating Errors. Nature 1986, 323, 533.
(21) Bishop, C. Neural Networks and Their Applications. Rev. Sci. Instrum. 1994, 65, 1803.
(22) Tambe, S. S.; Kulkarni, B. D.; Deshpande, P. B. Elements of Artificial Neural Networks with Selected Applications in Chemical Engineering, and Chemical & Biological Sciences; Simulation & Advanced Controls Inc.: Louisville, KY, 1996.
(23) Freeman, J. A.; Skapura, D. M. Neural Networks: Algorithms, Applications, and Programming Techniques; Addison-Wesley: Reading, MA, 1991.
(24) Nandi, S.; Ghosh, S.; Tambe, S. S.; Kulkarni, B. D. Artificial Neural-Network-Assisted Stochastic Process Optimization Strategies. AIChE J. 2001, 47, 126.
(25) Menger, F. M. Interfacial Physical Organic Chemistry. Imidazole-Catalyzed Ester Hydrolysis at a Water-Heptane Boundary. J. Am. Chem. Soc. 1970, 92, 5965.
(26) Starks, C. M. Phase-Transfer Catalysis; I. Heterogeneous Reactions Involving Anion Transfer by Quaternary Ammonium and Phosphonium Salts. J. Am. Chem. Soc. 1971, 93, 195.
(27) Dockx, J. Quaternary Ammonium Compounds in Organic Synthesis. Synthesis 1973, 441.
(28) Dehmlow, E. V. Phase Transfer Catalyzed Two-phase Reactions in Preparative Organic Chemistry. Angew. Chem., Int. Ed. Engl. 1974, 13, 170.
(29) Regen, S. L. Triphase Catalysis. Angew. Chem., Int. Ed. Engl. 1979, 18, 421.
(30) Regen, S. L. Triphase Catalysis. J. Am. Chem. Soc. 1975, 97, 5956.
(31) Molinari, H.; Montanari, F.; Quici, S.; Tundo, P. Polymer-Supported Phase-Transfer Catalysts. High Catalytic Activity of Ammonium and Phosphonium Quaternary Salts Bonded to a Polystyrene Matrix. J. Am. Chem. Soc. 1979, 101, 3920.
(32) Notari, B. Synthesis and Catalytic Properties of Titanium Containing Zeolites. Stud. Surf. Sci. Catal. 1988, 37, 413.
(33) Thangaraj, A.; Kumar, R.; Mirajkar, S. P.; Ratnasamy, P. Catalytic Properties of Crystalline Titanium Silicalites; I. Synthesis and Characterization of Titanium-rich Zeolites with MFI Structures. J. Catal. 1991, 130, 1.
(34) Thangaraj, A.; Kumar, R.; Ratnasamy, P. Direct Catalytic Hydroxylation of Benzene with Hydrogen Peroxide over Titanium-Silicate Zeolites. Appl. Catal. 1990, 57, L1.
(35) Thangaraj, A.; Kumar, R.; Ratnasamy, P. Catalytic Properties of Crystalline Titanium Silicalites; II. Hydroxylation of Phenol with Hydrogen Peroxide over TS-1 Zeolites. J. Catal. 1991, 131, 294.
(36) Kumar, R.; Raj, A.; Kumar, S. B.; Ratnasamy, P. Convenient Synthesis of Crystalline Microporous Transition Metal Silicates using Complexing Agents. Stud. Surf. Sci. Catal. 1994, 84A, 109.
(37) Bhaumik, A.; Kumar, R. Titanium Silicate Molecular Sieve (TS-1)/H2O2 Induced Triphase Catalysis in the Oxidation of Hydrophobic Organic Compounds with Significant Enhancement of Activity and Para-Selectivity. J. Chem. Soc., Chem. Commun. 1995, 349.
(38) Kumar, R.; Mukherjee, P.; Bhaumik, A. Enhancement in the Reaction Rates in the Hydroxylation of Aromatics over TS-1/H2O2 under Solvent Free Triphase Conditions. Catal. Today 1999, 49, 185.
(39) Bhaumik, A.; Mukherjee, P.; Kumar, R. Triphase Catalysis Over Titanium-Silicate Molecular Sieves under Solvent-free Conditions; I. Direct Hydroxylation of Benzene. J. Catal. 1998, 178, 101.
(40) Kumar, R.; Bhaumik, A.; Ahedi, R. K.; Ganapathy, S. Promoter-Induced Enhancement of the Crystallization Rate of Zeolites and Related Molecular Sieves. Nature 1996, 381, 298.
(41) Fahlman, S. E. An Empirical Study of Learning Speed in Back-Propagation Networks; Technical Report CMU-CS-88-162; Computer Science Department, Carnegie-Mellon University: Pittsburgh, PA, 1988.
(42) Riedmiller, M.; Braun, H. A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm. Proceedings of the IEEE International Conference on Neural Networks, San Francisco, CA, Mar 28-Apr 1, 1993.
(43) Kirkpatrick, S.; Gelatt, C. D., Jr.; Vecchi, M. P. Optimization by Simulated Annealing. Science 1983, 220 (4598), 671.
(44) Glover, F. Tabu Search Part I. ORSA J. Comput. 1989, 1, 190.
(45) Glover, F. Tabu Search Part II. ORSA J. Comput. 1990, 2, 4.
Received for review May 9, 2001
Revised manuscript received September 28, 2001
Accepted January 29, 2002

IE010414G