Ind. Eng. Chem. Res. 2001, 40, 4525-4535
Modeling and Optimal Control of a Batch Polymerization Reactor Using a Hybrid Stacked Recurrent Neural Network Model
Yuan Tian,† Jie Zhang,* and Julian Morris
Centre for Process Analytics and Control Technology, Department of Chemical & Process Engineering, University of Newcastle, Newcastle upon Tyne NE1 7RU, U.K.
This paper presents a novel nonlinear hybrid modeling approach aimed at obtaining improvements in model performance and robustness to new data in the optimal control of a batch MMA polymerization reactor. The hybrid model combines a simplified mechanistic model, which does not consider the gel effect, with stacked recurrent neural networks. Stacked recurrent neural networks are built to characterize the gel effect, which is one of the most difficult parts of polymerization modeling. Sparsely sampled data on polymer quality were interpolated using a cubic spline function to generate data for neural network training. Comparative studies with the use of a single neural network show that stacked networks give superior performance and improved robustness. Optimal reactor temperature control policies are then calculated using the hybrid stacked neural network model. It is shown that the optimal control strategy based on the hybrid stacked neural network model offers much more robust performance than that based on a hybrid single neural network model.
1. Introduction
Many polymeric products are low-volume, high-value-added specialty materials designed to perform specified functions. Consequently, the most prevalent mode of high-value-added polymerization is in batch reactors, which allow for great flexibility. This flexibility is becoming increasingly important in market-driven multiproduct manufacturing, where only a few batches are produced at any one time. This leads to a lack of model-building data. The operation of a batch process is intrinsically dynamic, with the process gain and time constant varying with operating conditions usually over very wide ranges during a batch cycle.
* Corresponding author. Phone: +44-191-2227240, Fax: +44-191-2225292. E-mail: [email protected].
† Present address: Panasonic Technology Inc., 2 Research Way, Princeton, NJ 08540.
Because of these features of batch processes, their effective control in multiproduct manufacturing demands controllers that are capable of providing good dynamic response over the entire operating ranges of the process variables and that can be configured from minimal process data in a highly responsive manufacturing environment. This contrasts with the large amounts of data and the precise control that can be achieved over a small range as required in many continuous processes.1
Appropriate process control technology and optimization provide a leverage point for cost reduction and improvement in product uniformity by enabling processes to be operated close to economic and plant constraints. Basically, two types of optimization problems are encountered in polymerization reactors.2 The first, the static optimization problem, deals with the selection of the best time-invariant controls so that, in the absence of disturbances, the molecular properties attain some desired values. The second, the time optimal control problem, refers to the determination of the optimal control trajectories to translate a polymer reaction from its initial state to a desired final state.
The latter, which is associated with the dynamic operation of batch and semibatch reactors, is the main concern of this paper. The solution to these optimization problems can be obtained in terms of the following elements: (i) an accurate model of the process determined from minimal process data and capable of being frequently changed or updated, (ii) a selected number of generic control variables, (iii) a flexible objective function, and (iv) a suitable numerical method for solving the specific optimization problem.
Ensuring product quality is a much more complex issue in polymerization processes than in conventional short-chain reactions because the molecular and morphological properties of a polymer product strongly influence its physical, chemical, thermal, rheological, and mechanical properties, as well as the polymer’s end-use applications. In this respect, the development of comprehensive mathematical models to predict “polymer quality” in terms of process operating conditions in a polymerization reactor is usually the key to the efficient production of high-quality, tailored polymers and the improvement of plant operability and economics.
Many researchers have carried out extensive studies on the mechanistic modeling of polymerization reactors.2,3 However, a complete mechanistic model might contain an excessively large number of differential equations and might require too much computer time for numerical solution.4 Associated with the larger number of differential equations are many reaction kinetic parameters, which are usually not easy to determine.
In addition, model development, especially the modeling of diffusion-controlled reactions under batch or semibatch conditions, is also extremely difficult in a multiproduct manufacturing environment where many different polymerizations can be run.4 Diffusion-controlled free-radical polymerization is the manifestation of mesoscale mass transfer phenomena, and diffusion-controlled termination, propagation, and initiation reactions have been related to the well-known phenomena of the gel effect, the glass effect, and the cage effect, respectively. It is generally accepted that the most important and complicated part of a polymerization model lies in the equations for the gel and glass effects. Several workers have attempted to model these effects using semiempirical, time-varying rate constants for the termination and propagation reactions. For example, Ross and Laurence5 used the concept of free volume to express the apparent rate constants in terms of simple equations involving monomer conversion and temperature and curve-fitted the parameters of their model. Cardenas and O’Driscoll6-8 developed a model by assuming that two populations of radicals exist in a high-conversion polymerization system: long-chain radicals, which become entangled with neighboring molecules and therefore have restricted mobility, and shorter radicals, whose mobilities are not strongly affected by diffusional effects. Marten and Hamielec9 further improved this model by using a free-volume theory. Chiu et al.10 presented a model that had several advantages over earlier models. Several interesting studies with an engineering bias followed the publication of this model, focusing on such aspects as optimization,11,12 parametric sensitivity,13 and optimal parameter estimation.14 Achilias and Kiparissides15 proposed a unified mathematical framework for modeling diffusion-controlled polymerization reactions, whereas Ray et al.16 developed a new model to account for the gel and glass effects. In the study of Ray et al.,16 the model parameters were tuned using experimental data on the isothermal bulk and solution polymerization of methyl methacrylate (MMA) in batch reactors. In most cases, the model parameters are obtained by curve fitting (tuning) of available data under isothermal conditions, which is arguably unrealistic when the reactor temperature is being manipulated to achieve different polymer properties.
10.1021/ie0010565 CCC: $20.00 © 2001 American Chemical Society Published on Web 09/25/2001
To overcome the difficulties in the mechanistic modeling of polymerization processes, data-based empirical models and hybrid models should be utilized. In the past decade, many researchers have explored the integration of process knowledge into black box models to provide “grey box” or “hybrid” models. For example, Psichogios and Ungar17 implemented a neural network in series with a physical model to estimate the parameters of the first principles model. Tsen et al.18 applied a physical model to extend the domain of the training set, whereas Thompson and Kramer19 combined a physical model in parallel with a radial basis function neural network representation to enhance the extrapolation ability of the empirical model. However, to the authors’ knowledge, there have been no reported works on the development of a gel effect model using neural networks, which is then incorporated into a simplified first principles model.
In this study, a hybrid model structure integrating a simplified mechanistic model and neural networks is proposed. The simplified mechanistic model includes mass and energy balances but neglects the gel and glass effects by setting the associated reaction rate parameters to constants and predicts the monomer conversion, number-average molecular weight, and weight-average molecular weight. Because of the neglect of the gel effect, the predictions have large errors, especially at the later stages of polymerization. The prediction residuals are then modeled by three stacked recurrent neural networks, which have enhanced long-range prediction capabilities. It is shown that improved model generalization capability can be obtained by using this hybrid model structure. The hybrid model is then used to predict how different temperature policies will influence the behavior of the polymerization in terms of conversion, average molecular weights, and polydispersity. Optimal control policies for the batch MMA reactor are then computed for on-line application in real time. The results show that good performance and robustness are achieved in the effective control of the batch polymerization process.
The paper is organized as follows. Section 2 introduces the concept of stacked neural networks. In section 3, a batch MMA polymerization reactor is described, along with the optimal control objective. The hybrid modeling methodology is presented in section 4. Section 5 presents the optimal control of this reactor based on the hybrid model. The last section draws some concluding remarks.
2. Stacked Neural Networks
Neural networks have been increasingly used in the chemical process industry, especially for dealing with some complex nonlinear processes where process understanding is limited.20,21 The quality of a neural network, or its fitness-for-purpose, depends on the amount and representativeness of the available training data and the network training methods. In general, there is no assurance that any individual models have extracted all relevant information from the data set.22 Many researchers have shown that simply combining multiple neural networks can generate more accurate predictions than using any one of the individual networks alone.23-25 The idea of combining models to improve prediction accuracy originated with the work of Bates and Granger,26 who combined two different models for forecasting a time series and reported improved predictions using the combined model. A similar idea was introduced by Hansen and Salamon,24 who suggested training a group of networks that had the same architecture but were initialized with different connection weights.
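As a rough illustration of why combining predictors can help, the sketch below averages six noisy predictors of a known signal and compares root-mean-square errors. It is not the authors' networks: the sine target, the noise level, and the names `rmse` and `average_predictions` are assumptions made only for this example.

```python
import math
import random


def rmse(pred, target):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))


def average_predictions(all_preds):
    """Simple point-by-point average of several predictors."""
    n_models = len(all_preds)
    return [sum(values) / n_models for values in zip(*all_preds)]


# Hypothetical setup: a known target and six "networks" that each see it
# through independent Gaussian noise (stand-ins for separately trained models).
rng = random.Random(0)
target = [math.sin(0.1 * k) for k in range(200)]
models = [[y + rng.gauss(0.0, 0.2) for y in target] for _ in range(6)]

individual = [rmse(m, target) for m in models]
combined = rmse(average_predictions(models), target)

print("mean individual RMSE:", sum(individual) / len(individual))
print("combined RMSE:       ", combined)
```

The RMSE of the averaged predictor can never exceed the mean of the individual RMSEs, and the gain is largest when the individual errors are weakly correlated.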
Cooper27 suggested constructing a multi-neural-network system in which a number of networks independently compete to accomplish the required classification task. The multinetwork system learns from experience the networks that have the most effective separators, and those networks determine the final classification. Mani28 suggested training a portfolio of networks, possibly of different topologies, using a variety of learning techniques. Pearlmutter and Rosenfeld29 independently trained a number of identical networks on the same data and then averaged the results. They concluded that replication almost always results in a decrease in the expected complexity of the network and improves the expected generalization.
The idea of combining neural network models is based on the premise that different neural networks capture different aspects of process behavior and that aggregating this information should reduce uncertainty and provide more accurate predictions. It is always possible that a stacked network could outperform a single best-trained network for the following reasons:
(i) Optimal network architecture cannot always be guaranteed.
(ii) The optimization of network parameters is a problem with many local minima. Even for a given architecture, final network parameters can differ between one run of the algorithm and another. For example, variations in the initial starting conditions for network training can lead to different solutions for the network parameters. Therefore, the different networks might have independent errors on different subsets of the input space.
(iii) Different activation functions and learning algorithms can also lead to different generalization characteristics, and no one activation function or learning algorithm is uniformly the best.
(iv) Convergence criteria used for network training can lead to very different solutions for a given network architecture.
Therefore, combining existing neural networks can provide a practical approach to developing a better overall model for prediction, rather than searching for a single best-trained model. Figure 1 shows the combination structure of a typical stacked neural network.
Figure 1. Structure of a stacked neural network (with N single networks).
Figure 2. Batch polymerization reactor.
Different methods for combining individual networks have been developed. Wolpert30 proposed stacked generalization as a technique for combining models that is performed by minimizing the squared prediction error using cross validation. Breiman31 imposed proper constraints on the weights. Freund and Schapire32 proposed using the confidence of an individual model as the weight for that model. Zhang et al.33 proposed using principal component regression (PCR) to explore, and thus discount, correlations among individual models. The usual approach is to model the true output as a linear combination of the outputs of the individual networks. A simple average or weighted average of the candidate outputs is a very common and practical method of combining the models. The overall output of the stacked neural network as a weighted combination of the individual network outputs is described by
$$ y(x) = \sum_{i=1}^{N} w_i y_i(x) \tag{1} $$
where $y(x)$ is the stacked neural network predictor with network input $x$, $y_i(x)$ is the $i$th network predictor, $w_i$ is the stacking weight for the $i$th network output, and $N$ is the total number of the individual networks.
3. A Batch Polymerization Process
3.1. Process Description. The batch polymerization reactor studied here is a pilot-scale polymerization reactor installed at the Department of Chemical Engineering, Aristotle University of Thessaloniki, Greece. The reactor is shown in Figure 2. The reaction is the free-radical solution polymerization of MMA with a
water solvent and benzoyl peroxide initiator. The reactor is provided with a stirrer for thorough agitation of the reaction mixture. Heating and cooling of the reaction mixture are achieved by circulating water at appropriate temperatures through the reactor jacket. The reactor temperature is controlled by a cascade control system consisting of a primary PID and two secondary PI controllers. The reactor temperature is fed back to the primary controller, whose output is taken as the set point of the two secondary controllers. During the polymerization, samples are taken, and the conversion is determined gravimetrically, while the weight-average molecular weight and polydispersity are determined by gel permeation chromatography. The general description of the chemical reactions taking place during the free radical polymerization of MMA can be expressed by
Initiator decomposition: $I \xrightarrow{K_d} 2I'$
Initiation: $I' + M \xrightarrow{K_i} R_1$
Propagation: $R_x + M \xrightarrow{K_p} R_{x+1}$
Chain transfer to monomer: $R_x + M \xrightarrow{K_m} P_x + R_1$
Termination by disproportionation: $R_x + R_y \xrightarrow{K_{td}} P_x + P_y$
Termination by combination: $R_x + R_y \xrightarrow{K_{pc}} P_{x+y}$
A detailed mathematical model covering the reaction kinetics and heat and mass balances has been developed for the solution polymerization of MMA.15 Using this model, a rigorous simulation program has been developed and used as the real process to generate polymerization data under different batch operating conditions and to test the developed models and optimal control strategies.
3.2. Formulation of the Optimal Control Problem. A great number of optimization problems for polymerization reactors have been studied in terms of a single scalar objective function that combines all identifiable performance measures with appropriate weighting factors chosen a priori. However, the combination of several terms into a single objective function is not always easy, in the sense that some controlled variables might react in opposite directions to manipulations of a control variable. In general, the optimization objectives will fall into the following categories: molecular property specifications; safety, reactor, and environmental constraints; and economic objectives. The control of the specified molecular properties is often carried out by manipulating the reaction temperature, which influences the monomer conversion and the molecular weights significantly. In this work, the optimal temperature trajectory is calculated so that the reactor will produce the polymer product with the desired number-average molecular weight at the prespecified monomer conversion rate and polydispersity. The objective function is given by
$$ J = w_1\left[\frac{\mathrm{PD}(t_f)}{\mathrm{PD}_d} - 1\right]^2 + w_2\left[\mathrm{CONV}(t_f) - 1\right]^2 \tag{2} $$
where PD is the polydispersity, which equals the weight-average molecular weight divided by the number-average molecular weight; $\mathrm{PD}_d$ is the desired polydispersity (set to 3 in this study); CONV is the monomer conversion; $t_f$ is the final batch time, which is set to 120 min in this study; and $w_1$ and $w_2$ are the weighting factors that specify which of the two terms in eq 2 is more important. In this study, we consider the two terms to be of equal importance (i.e., it is equally important to have a desired final polydispersity and a high final conversion), and hence, both $w_1$ and $w_2$ are set to 1. The monomer property constraint is on the number-average molecular weight (MN)
$$ 0.95 \le \frac{\mathrm{MN}(t_f)}{3.0 \times 10^{5}} \le 1.05 \tag{3} $$
Also from a practical point of view, it might be desirable to impose an additional restriction on a manipulated variable, namely, the temperature profile, which is called move suppression34
$$ |T_{i+1} - T_i| \le \Delta T, \quad i = 1, \ldots, m \tag{4} $$
where $m$ is the number of segments in the temperature control profile. Here, the given batch time is divided into $m$ ($m = 10$ in this study) subintervals of equal length $(0, t_1), (t_1, t_2), \ldots, (t_{m-1}, t_m)$, with $t_m = t_f$, so that the length of each subinterval is $L = t_f/m$. During each time subinterval, the temperature $T_i$ is kept constant. In this case, $\Delta T$ is fixed at 20 K, as it would be unrealistic to have more than 20-K temperature variations over the length of each subinterval $L$ ($L = 12$ min) by conventional regulating facilities in this reactor. The temperature is bounded by
$$ 310\ \mathrm{K} \le T \le 360\ \mathrm{K} \tag{5} $$
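The objective of eq 2 and the constraints of eqs 3-5 can be written down directly. The following is a minimal sketch under stated assumptions: the function names, the default arguments, and the sample temperature profile are illustrative and not taken from the paper, and the move-suppression check is applied to consecutive segments of a given profile.

```python
def objective(pd_final, conv_final, pd_desired=3.0, w1=1.0, w2=1.0):
    """Eq 2: weighted squared deviations of final polydispersity and conversion."""
    return w1 * (pd_final / pd_desired - 1.0) ** 2 + w2 * (conv_final - 1.0) ** 2


def mn_within_spec(mn_final, mn_target=3.0e5, tol=0.05):
    """Eq 3: final number-average molecular weight within +/-5% of the target."""
    return (1.0 - tol) <= mn_final / mn_target <= (1.0 + tol)


def profile_is_feasible(temps, delta_t=20.0, t_min=310.0, t_max=360.0):
    """Eqs 4 and 5: move suppression and bounds on a piecewise-constant profile (K)."""
    in_bounds = all(t_min <= t <= t_max for t in temps)
    small_moves = all(abs(b - a) <= delta_t for a, b in zip(temps, temps[1:]))
    return in_bounds and small_moves


# Hypothetical 10-segment temperature profile (12 min per segment).
profile = [330.0, 340.0, 350.0, 345.0, 340.0, 335.0, 330.0, 325.0, 320.0, 315.0]
print(profile_is_feasible(profile))   # bounds and step sizes are respected
print(objective(3.3, 0.9))            # both terms contribute about 0.01
print(mn_within_spec(2.9e5))          # 2.9e5 / 3.0e5 is about 0.967
```

An optimizer would search over the ten segment temperatures, using a process model to map each candidate profile to the final polydispersity, conversion, and MN.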
4. Modeling the Batch MMA Polymerization Reactor Using a Hybrid Model
4.1. Diffusion-Controlled Reaction. The kinetics of polymerization is characterized by the fact that the termination and propagation rates are controlled by diffusion phenomena.35 The gel effect or the Trommsdorff effect has been attributed to the decrease of the termination rate constant caused by a decrease of the mobility of polymer chains. This phenomenon strongly affects the final polymer properties, as it leads to a broader molecular weight distribution. The instantaneous molecular weight increases rapidly; the weight-average molecular weight increases more than the number-average molecular weight, so that the polydispersity is larger. This effect can also cause the thermal runaway of a polymerization reactor. In the later stages of reaction, the glass transition temperature of the reaction mass becomes higher than the polymerization temperature, and the reaction mixture becomes glassy in nature. A decrease of the mobility of the monomer molecules causes a decrease of the propagation rate constant. A consequence of this phenomenon is the “freezing” of the reaction mixture at conversions below 100%. At the limiting conversion, the glass transition temperature of the polymer/monomer mixture becomes equal to the polymerization temperature. This effect is referred to as the glass effect.2 Diffusion-controlled reactions are related to both effects, and many researchers have regarded them as the most important parts of a polymerization model.
4.2. Hybrid Model. In this study, a hybrid stacked recurrent neural network model is developed to describe the dynamic behavior of a batch MMA polymerization reactor. The hybrid model is developed in two stages. The first stage is to develop a simplified mechanistic model, which is represented by mass and energy balances. This model specifies process variable interactions from physical considerations. Without taking into account the gel effect, the corresponding reaction rate kinetic parameters in the simplified first principles model are set to constants. The simplified mechanistic model predicts the monomer conversion, the number-average molecular weight, and the weight-average molecular weight. Because of the neglect of the gel and glass effects, the simplified mechanistic model has significant prediction errors, especially at high conversions close to the onset of the gel effect.
This can be seen from Figure 4, where the solid lines represent the differences between the full model and the simplified model predictions of the conversion, number-average molecular weight, and weight-average molecular weight. The second stage is to build a stacked recurrent neural network model to predict the residuals of the simplified first principles model. The predicted residuals are added to the predictions from the simplified mechanistic model to form the final hybrid model predictions. Because the focus in batch process control is on the end-of-batch product quality, the model should offer accurate long-range predictions. Long-range prediction means that, given the current conditions of a process and a sequence of future control actions, a sequence of future process outputs can be predicted. This usually involves feeding back model predictions through one or more time delay units to obtain further future predictions. Recurrent neural networks can usually offer good long-range predictions because the training objective in a recurrent neural network is to minimize its long-range prediction errors.36,37 To enhance long-range predictions, stacked recurrent neural networks were used in this study.
4.3. Development of the Stacked Recurrent Neural Network Model. Three stacked networks were developed to model the residuals of the simplified mechanistic model in predicting the conversion (Conv), number-average molecular weight (MN), and weight-average molecular weight (MW). For each stacked neural network, six different networks were developed. This number of individual networks is a compromise between the fact that too many networks will greatly increase the computing time and too few networks will
not actually benefit from the stacked generalization. Studies of Zhang38,39 show that significant improvements in model generalization can usually be achieved by stacking the first five or six individual networks. Stacking further networks generally will not significantly improve model generalization performance. Diversity of individual models is a key issue in the stacked generalization and is not easy to achieve because all models are trained to do essentially similar tasks. In this study, individual networks for each stacked network were given different model input structures, training data, network topologies, and initial weights so that the different networks could perform differently in different operating regions and learn different parts of the problem to form the complementary stacked output. The first important step of the neural network model development is a proper selection of input variables. Because of the complexity of the nonlinear polymerization process, it is almost impossible to guarantee which model input structure is the best. With a careful analysis of the reactor based on the experimental observations as well as on the theory of mass transfer with simultaneous chemical reaction, a base model structure is determined for each stacked network with small adjustments being made to individual networks, not only to reduce the correlation but also to increase the generalization capability. Theoretical analysis and experiments of Zhang38,39 have shown that significant improvement in model generalization can be achieved by stacking individual models with diverse model characteristics (i.e., models not heavily correlated with each other). Zhang38,39 uses bootstrap resampling to generate different data sets for network training to reduce the correlations among the individual networks and principal component regression to obtain the stacking weights. 
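The bootstrap resampling step mentioned above can be pictured with a short sketch. It is a generic illustration of sampling with replacement, not the procedure of the cited studies; the integer data set and the name `bootstrap_sample` are placeholders.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


rng = random.Random(42)
data = list(range(10))  # hypothetical stand-in for 10 training samples
replicates = [bootstrap_sample(data, rng) for _ in range(6)]

# Each replicate would train one individual network; some samples repeat
# and others are left out, which makes the trained networks differ.
for k, rep in enumerate(replicates, start=1):
    print(f"network {k}: {sorted(rep)}")
```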
However, such a technique cannot be directly applied in this study because recurrent neural networks have to be trained on a sequence (or several sequences) of continuous data. In this study, we try to develop a diverse set of individual recurrent neural network models by varying model input structures. Because diffusion phenomena are heavily affected by temperature, which plays an important role in the onset and extent of the two effects,40 reactor temperatures were included in the inputs for every network model. Conversion is closely related to the behavior of the number- and weight-average molecular weights. The proper model structures for the individual networks are listed as follows (where the subscript b represents the base model structure)
(a) Individual networks for conversion
$$ r_{\mathrm{Conv},b}(t) = f_b[T(t-1), \mathrm{Conv}(t-1), \mathrm{Conv}(t-2), r_{\mathrm{Conv},b}(t-1), r_{\mathrm{Conv},b}(t-2)] \quad (b \text{ represents networks 1-3}) \tag{6} $$
$$ r_{\mathrm{Conv},4}(t) = f_4[T(t-1), \mathrm{Conv}(t-1), r_{\mathrm{Conv},4}(t-1), r_{\mathrm{Conv},4}(t-2)] \tag{7} $$
$$ r_{\mathrm{Conv},5}(t) = f_5[T(t-1), \mathrm{Conv}(t-1), \mathrm{Conv}(t-2), \mathrm{Conv}(t-3), r_{\mathrm{Conv},5}(t-1), r_{\mathrm{Conv},5}(t-2)] \tag{8} $$
$$ r_{\mathrm{Conv},6}(t) = f_6[T(t-1), \mathrm{Conv}(t-1), \mathrm{Conv}(t-2), r_{\mathrm{Conv},6}(t-1), r_{\mathrm{Conv},6}(t-2), r_{\mathrm{Conv},6}(t-3)] \tag{9} $$
(b) Individual networks for MN
$$ r_{\mathrm{MN},b}(t) = g_b[T(t-1), \mathrm{Conv}(t-1), \mathrm{Conv}(t-2), \mathrm{MN}(t-1), r_{\mathrm{MN},b}(t-1), r_{\mathrm{MN},b}(t-2)] \quad (b \text{ represents networks 1-5}) \tag{10} $$
$$ r_{\mathrm{MN},6}(t) = g_6[T(t-1), \mathrm{Conv}(t-1), \mathrm{MN}(t-1), \mathrm{MN}(t-2), r_{\mathrm{MN},6}(t-1), r_{\mathrm{MN},6}(t-2)] \tag{11} $$
(c) Individual networks for MW
$$ r_{\mathrm{MW},b}(t) = h_b[T(t-1), \mathrm{Conv}(t-1), r_{\mathrm{MW},b}(t-1), r_{\mathrm{MW},b}(t-2)] \quad (b \text{ represents networks 1-4}) \tag{12} $$
$$ r_{\mathrm{MW},5}(t) = h_5[T(t-1), \mathrm{Conv}(t-1), \mathrm{MW}(t-1), r_{\mathrm{MW},5}(t-1), r_{\mathrm{MW},5}(t-2)] \tag{13} $$
$$ r_{\mathrm{MW},6}(t) = h_6[T(t-1), \mathrm{Conv}(t-1), \mathrm{Conv}(t-2), r_{\mathrm{MW},6}(t-1), r_{\mathrm{MW},6}(t-2)] \tag{14} $$
where $T$ is the reactor temperature; Conv, MN, and MW denote, respectively, the conversion and the number- and weight-average molecular weights from the simplified first principles model; $r_{\mathrm{Conv},i}$, $r_{\mathrm{MN},i}$, and $r_{\mathrm{MW},i}$ represent, respectively, the residuals of Conv, MN, and MW; and $f_i$, $g_i$, and $h_i$ are the underlying nonlinear functions defined by the individual networks. The different network architectures presented above incorporate different information about the reaction process, which is very important for the stacked network to acquire the overall result comprehensively. The final hybrid model predictions are obtained as follows
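$$ \mathrm{Conv}_{\mathrm{final}} = \mathrm{Conv} + r_{\mathrm{Conv}} \tag{15} $$
$$ \mathrm{MN}_{\mathrm{final}} = \mathrm{MN} + r_{\mathrm{MN}} \tag{16} $$
$$ \mathrm{MW}_{\mathrm{final}} = \mathrm{MW} + r_{\mathrm{MW}} \tag{17} $$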
where Conv, MN, and MW are, respectively, the conversion and number- and weight-average molecular weights from the simplified first principles model; $r_{\mathrm{Conv}}$, $r_{\mathrm{MN}}$, and $r_{\mathrm{MW}}$ represent, respectively, the predicted residuals of Conv, MN, and MW from the stacked recurrent neural networks; and $\mathrm{Conv}_{\mathrm{final}}$, $\mathrm{MN}_{\mathrm{final}}$, and $\mathrm{MW}_{\mathrm{final}}$ are, respectively, the final hybrid model predictions of the conversion and the number- and weight-average molecular weights.
In this study, 20 different sets of temperature profiles were randomly chosen within a reasonable range and used to generate simulated process operation data, which form the original learning data. The reactor temperature was sampled every minute, whereas conversion, MN, and MW were sampled every 8 min, with the whole batch time set to 120 min. The conversion, number-average molecular weight, and weight-average molecular weight were corrupted with random measurement noises following a normal distribution with zero means and standard deviations of 0.008, 2000, and 5000, respectively. Data for conversion, MN, and MW were interpolated using a cubic spline to obtain their values at every minute. Figure 3 shows the measured, simulated, and cubic spline interpolated values. It can be seen that the interpolated values are quite close to the simulated values (the true values). The measured reactor temperature data and the interpolated polymer quality data were used to build hybrid recurrent neural network models. All of the data were scaled to zero mean and unit variance before being applied to network training.
Figure 3. Measured, simulated, and cubic spline interpolated data for (a) conversion, (b) MN, and (c) MW.
For each stacked network, training data for the 6 individual recurrent networks were randomly selected from the 20 sets of learning data. Fifteen sets of data were chosen for each individual network as training data, while the remaining five sets were used as early-stopping validation data. In this way, the stacked network as a whole sees all of the data sets, but each individual network avoids overfitting by use of the cross-validation-based “early stopping” rule. In the course of network training, the number of hidden neurons was determined by cross-validation to be in the range of 5-20. The number of hidden neurons giving the best prediction performance on the validation data was selected. The hidden-layer neurons use the sigmoid function as the activation function, whereas the output-layer neuron uses the linear activation function. Network weights were initialized for each network as random numbers uniformly distributed in the range (-0.1, 0.1). Each neural network is trained using the Levenberg-Marquardt optimization algorithm along with a cross-validation-based early stopping mechanism to prevent overfitting. The network’s prediction performance on the validation data is continuously checked during training, and training is stopped at the point where the root-mean-square error (RMSE) of the validation data reaches a minimum. Early stopping is an implicit way to implement regularization, which can improve model robustness.41
After the training of the individual networks, each of the three stacked networks was constructed by combining the corresponding individual neural networks with a set of combination weights. Many approaches were explored in deciding the combination weights, including stacked generalization, stacked regression, PCR, etc. In this work, five sets of combination-weight-test data were generated, which were different from the learning data, to optimize the weighting factors. Feasible sequential quadratic programming (FSQP)42 was used in this optimization problem, in which the combination weights were constrained to be positive and the sum of the weights was constrained to be equal to 1.0. For $r_{\mathrm{Conv}}$ and $r_{\mathrm{MN}}$, the obtained combination weights were roughly similar to each other, which implies that every individual network makes almost the same contribution to the stacked generalization. Consequently, the simple average method could be used to construct the corresponding stacked networks. In contrast, for $r_{\mathrm{MW}}$, the combination weights are clearly different from each other. The reason could be that MW is a much more complicated parameter than Conv and MN in the polymerization reactions so that the modeling and prediction of MW could easily deviate. The stacking weights for the MW predictor are given below.
$$ w_{\mathrm{MW}} = [0.05, 0.48, 0.15, 0.27, 0.02, 0.03] \tag{18} $$
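To make the weighted combination of eq 1 concrete with the MW stacking weights of eq 18, here is a minimal sketch. The six residual predictions in `outputs` are invented numbers for illustration only, and `check_weights` and `stacked_output` are hypothetical helper names; the weights themselves are those reported in eq 18.

```python
W_MW = [0.05, 0.48, 0.15, 0.27, 0.02, 0.03]  # stacking weights reported in eq 18


def check_weights(weights, tol=1e-9):
    """Constraints stated in the text: positive weights that sum to 1."""
    return all(w > 0 for w in weights) and abs(sum(weights) - 1.0) < tol


def stacked_output(weights, outputs):
    """Eq 1: weighted combination of the individual network outputs."""
    return sum(w * y for w, y in zip(weights, outputs))


# Hypothetical residual predictions from the six individual MW networks
# at one time step (illustrative numbers, not from the paper).
outputs = [1200.0, 900.0, 1100.0, 950.0, 1500.0, 700.0]

weighted = stacked_output(W_MW, outputs)
simple = sum(outputs) / len(outputs)
print("weights valid: ", check_weights(W_MW))
print("weighted:      ", weighted)
print("simple average:", simple)
```

With these weights, networks 2 and 4 dominate the combined MW residual, whereas a simple average would treat all six networks equally.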
4.4. Results and Discussion. To verify the three stacked neural networks, 5 sets of testing data were generated in addition to the 20 sets of learning data and the 5 sets of combination-weight-test data. All of the data were corrupted with measurement noise, as mentioned earlier. Predictions of r•Conv, r•MN, and r•MW for one set of the testing data are plotted in Figure 4, where the solid lines represent the spline interpolated values, and the dotted lines represent the predicted values. As can be seen from the figure, the stacked neural network predictions are quite accurate. To demonstrate the robustness of stacked networks, three single best-trained neural networks were developed to represent r•Conv, r•MN, and r•MW using the same learning data. Comparisons of the RMSE for all of the testing data sets between stacked networks and single best-trained networks are illustrated in Figure 5. In Figure 5, “Stacked NN” refers to stacking networks with different input structures, “Stacked Basic NN” refers to stacking networks with identical input structures (the basic structures indicated by eqs 6, 10, and 12), and “Single NN” refers to the best single neural networks (according to performance on the validation data). The prediction capabilities of both methods are
Ind. Eng. Chem. Res., Vol. 40, No. 21, 2001 4531
Figure 5. RMSE comparisons of (a) Conv, (b) MN, and (c) MW between stacked and single neural network predictions on five sets of testing data.
Figure 4. Stacked neural network predictions of r•Conv, r•MN, and r•MW.
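The RMSE comparison between stacked and single networks summarized in Figure 5 can be sketched in Python as follows. This is a minimal illustration only: the function names `rmse` and `rmse_reduction_percent` are hypothetical, and the numbers are made-up placeholders, not results from this study.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(reference))


def rmse_reduction_percent(rmse_single, rmse_stacked):
    """Percentage by which the stacked RMSE is lower than the single RMSE."""
    return 100.0 * (rmse_single - rmse_stacked) / rmse_single


# Placeholder values for one testing data set (hypothetical).
reference = [0.0, 1.0, 2.0, 3.0]
stacked_pred = [0.1, 0.9, 2.1, 2.9]
single_pred = [0.4, 0.6, 2.4, 2.6]

rmse_stacked = rmse(stacked_pred, reference)
rmse_single = rmse(single_pred, reference)
reduction = rmse_reduction_percent(rmse_single, rmse_stacked)
print(rmse_stacked, rmse_single, reduction)
```

Repeating the calculation for each testing data set and each output gives the per-set comparison that a bar chart such as Figure 5 displays.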
clearly demonstrated. For the best single networks, although good results can be achieved in some cases, there are always other cases where the best single networks show very poor predictive performance. Stacked networks consistently outperform the best single networks, with reductions in the prediction RMSE of as much as 79.3% in the case of weight-average molecular weight. Stacked networks also exhibit much more consistent and accurate performance than single best-trained networks, as shown in Figure 5, especially in the case of MW, where the single network predictions are seen to be rather unstable. Figure 5 also shows that stacking networks with different input structures outperforms stacking networks with identical input structures. The stacking technique provides a much better choice for representing complicated polymerization processes, bridging the gap in robust empirical modeling. Figure 6 shows the comparisons between the simulations and the hybrid stacked neural network model predictions of Conv, MN, and MW for one set of the testing data. These results are in fairly good agreement, not only in terms of the monomer conversion, but also in terms of the number- and weight-average molecular weights. Good agreement between the simulation results and the model predictions is also confirmed in Figure 7, which presents the results obtained with a different initial initiator concentration (a 6% variation from the nominal condition) for the same temperature profile. This demonstrates the good generalization capability of the hybrid model.
Figure 6. Comparisons of simulations and hybrid model predictions for Conv, MN, and MW.
Once a model has been developed, it is always necessary to detect any departure of the process from the standard behavior. This is particularly the case for nonlinear systems where nonparametric confidence bounds are required. Confidence bounds are of strategic importance in industrial applications of neural networks. They can provide an indication of the reliability of the resulting model predictions on previously unseen data.43,44 Here, 95% confidence bounds have been generated for the hybrid stacked neural network models using the method presented by Zhang et al.44 Figure 8 shows the confidence bounds for the hybrid model predictions. For clarity in the illustration, the predictions and confidence bounds are shown every 6 min. Note that these confidence bounds might not be very accurate because of the limited number of individual networks, but they might still offer some indication of model prediction reliability. The narrower the confidence bounds are, the higher the confidence in the prediction is. It can be observed from Figure 8 that both
Figure 7. Comparisons of simulations and hybrid model predictions for Conv, MN, and MW under variations in the initial conditions.
the Conv and MN predictions have tight confidence bounds, whereas the confidence bounds for MW show less confidence toward the end of the batch process. This could mostly be due to the gel and glass effects near the end of the batch.
5. Optimal Control of the Batch MMA Process
Temperature optimal control of polymerization reactors is the most common method being used to produce polymers with desired quality properties, including monomer conversion, molecular weight properties, and polydispersity.45,46 The optimization process always
Figure 9. Optimal temperature profiles based on the full mechanistic model and the hybrid model.
Figure 8. Simulations and hybrid model predictions for Conv, MN, and MW with confidence bounds.
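A rough Python sketch of how 95% bounds of this kind could be formed from the spread of the individual network predictions is given below. It assumes that the stacked prediction is the simple average of the individual predictions and that the standard error is the sample standard deviation across networks, with 1.96 as the 95% normal quantile. The exact procedure of ref 44 may differ, the function name `confidence_bounds` is hypothetical, and the six prediction values are placeholders.

```python
import math


def confidence_bounds(individual_preds, z=1.96):
    """Return (lower, center, upper) from the spread of network predictions.

    Assumptions: the center is the simple average, and the standard error
    is the sample standard deviation of the individual predictions.
    """
    n = len(individual_preds)
    if n < 2:
        raise ValueError("at least two individual predictions are required")
    center = sum(individual_preds) / n
    variance = sum((p - center) ** 2 for p in individual_preds) / (n - 1)
    half_width = z * math.sqrt(variance)
    return center - half_width, center, center + half_width


# Placeholder predictions of six individual networks at one time instant.
six_networks = [0.52, 0.49, 0.51, 0.50, 0.53, 0.48]
lower, center, upper = confidence_bounds(six_networks)
print(lower, center, upper)
```

With only six networks the estimated spread is itself uncertain, which is consistent with the caution that such bounds give an indication of reliability rather than an exact interval.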
results in a two-point-boundary-value problem that is normally solved by Pontryagin’s maximum principle. The optimal control problem for this batch reactor is to obtain a polymer with desired final conversion and polydispersity that also satisfies the constraint on the number-average molecular weight. The numerical solution of this optimization problem is achieved by using sequential quadratic programming (SQP). Figure 9 shows the optimal temperature profile for the batch polymerization reactor calculated from the hybrid stacked neural network model, in comparison with that based on the full mechanistic model (“real process”). The solid line represents the optimal temperature profile obtained from the full mechanistic model, and
the dotted line represents the profile obtained from the hybrid stacked neural network model. It can be seen that the two temperature profiles follow similar trends. The hybrid-model-based control has an objective function value of 0.0259 on the hybrid model and an objective function value of 0.0279 on the simulation. The two values are fairly close, indicating the accuracy of the hybrid model. Figure 10 shows the resulting conversion, number-average molecular weight, and weight-average molecular weight based on the optimal temperature profile from the hybrid model. The solid lines stand for the results from the simulation, whereas the dotted lines stand for the results from the hybrid stacked recurrent neural network model. These results indicate that the hybrid stacked neural network model provides excellent performance on process modeling and, hence, leads to reliable optimal control. For the purpose of comparison, a best-single-network-based hybrid model (as explained in the previous section) was also used to compute the optimal temperature control policy for the batch reactor. Although a seemingly better solution was achieved, with an objective function value of 0.0215 on the best-single-network-based hybrid model, the value on the real process is in fact 0.0356. This means that the hybrid model with the single network can lead to unreliable optimal control as a result of its lack of model generalization capability. Figure 11 shows the simulated and predicted weight-average molecular weights under this unreliable optimal control. Inspection of Figure 11 shows distinct offsets in MW, indicating the poor model performance. Figure 12 shows the objective function values of three optimal control policies calculated from the hybrid stacked neural network model, the full mechanistic model, and the hybrid best single network model on the respective models and on the real process.
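The gap between the value each policy attains on its own model and on the real process can be made explicit with a few lines of Python. The four objective values are those quoted above; the function name `relative_mismatch` is hypothetical, and treating a lower objective value as better is inferred from the wording of the text rather than from a stated objective function.

```python
def relative_mismatch(value_on_model, value_on_process):
    """Relative change in the objective when moving from model to process."""
    return (value_on_process - value_on_model) / value_on_model


# Objective function values quoted in the text.
stacked_model, stacked_process = 0.0259, 0.0279
single_model, single_process = 0.0215, 0.0356

stacked_gap = relative_mismatch(stacked_model, stacked_process)
single_gap = relative_mismatch(single_model, single_process)

print(f"stacked hybrid model: {100 * stacked_gap:.1f}% mismatch")
print(f"single-network hybrid model: {100 * single_gap:.1f}% mismatch")
```

Under that reading, the single-network policy looks better on its own model (0.0215 against 0.0259) but is worse on the real process (0.0356 against 0.0279), which is the reversal the comparison is meant to expose.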
It can be seen that the control policy based on the hybrid stacked neural network model performs reliably on the real process, whereas that based on the hybrid best single network model offers significantly degraded performance on the real process. Once again, this shows the robustness and accuracy of the stacked neural network modeling technique.
6. Conclusions
A good process model is a necessary prerequisite for the application of optimal control strategies to batch
polymerization reactors. In this work, a novel approach to the combined mechanistic and empirical modeling of polymerization processes has been proposed. Conventional empirical modeling approaches identify and use a single network for prediction, whereas in this paper, a novel stacked recurrent neural network architecture is presented that effectively integrates the knowledge acquired by different networks to obtain a better predictive model to express the difficult gel effect. Confidence bounds are also incorporated to indicate the reliability of the stacked nets. Because very little work has been done on stacked recurrent neural networks that take the dynamic characteristics of the process into consideration and also on approaches to dealing with gel effect modeling, this contribution shows many promising results in the modeling and optimal control of complex batch polymerization processes. In comparison with the best-single-network-based hybrid model, stacked recurrent neural network modeling demonstrates a strong capability for the modeling of a wide variety of nonlinear complex polymerization processes and proves to be a much more reliable and accurate choice in the control of polymerization processes. Improved model generalization also allows less conservative safety and environmental constraints to be set.
Figure 10. Simulations and hybrid model predictions of Conv, MN, and MW under the optimal temperature control policy.
Figure 11. Simulations and single neural network predictions of MW.
Figure 12. Optimal objective function values for hybrid models and actual process.
Acknowledgment
This work was carried out with the support of the Centre for Process Analytics and Control Technology (CPACT) under the Optimization of Batch Reactor Operations project, and the authors are grateful to all of the industrial members of this project. The authors particularly thank Prof. C. Kiparissides of Aristotle University in Thessaloniki, Greece, for providing the polymerization simulation program, as well as colleagues in CPACT for their constructive comments. The authors also thank the anonymous reviewers for their suggestions for improving the quality of this paper.
Literature Cited
(1) Soroush, M.; Kravaris, C. Nonlinear Control of a Batch Polymerization Reactor: An Experimental Study. AIChE J. 1992, 38, 1429-1448.
(2) Kiparissides, C. Polymerisation Reactor Modeling: A Review of Recent Development and Future Directions. Chem. Eng. Sci. 1996, 10, 1637-1659.
(3) Ray, W. H. Modeling of Addition Polymerization Processes: Free Radical, Ionic, Group Transfer, and Ziegler-Natta Kinetics. Can. J. Chem. Eng. 1991, 69, 626.
(4) Penlidis, A.; Ponnuswamy, S. R.; Kiparissides, C.; O’Driscoll, K. F. Polymer Reaction Engineering: Modeling Considerations for Control Studies. Chem. Eng. J. 1992, 50, 95-107.
(5) Ross, R. T.; Laurence, R. L. Gel Effect and Free Volume in the Bulk Polymerization of Methyl Methacrylate; Bouton, T. C., Chappelear, D. C., Eds.; AIChE Symposium Series; AIChE: New York, 1976; Vol. 72, p 74.
(6) Cardenas, J.; O’Driscoll, K. F. High-Conversion Polymerizations. 1. Theory and Application to Methyl Methacrylate. J. Polym. Sci. 1976, 14 (A-1), 883-897.
(7) Cardenas, J.; O’Driscoll, K. F. High-Conversion Polymerizations. 2. Influence of Chain Transfer on the Gel Effect. J. Polym. Sci. 1977, 15 (A-1), 1883-1888.
(8) Cardenas, J.; O’Driscoll, K. F. High-Conversion Polymerizations. 3. Kinetic Behaviour of Ethyl Methacrylate. J. Polym. Sci. 1977, 15 (A-1), 2097-2108.
(9) Marten, F. L.; Hamielec, A. E. High Conversion Diffusion-Controlled Polymerization. ACS Symp. Ser. 1979, 104, 43-70.
(10) Chiu, W. Y.; Carratt, G. M.; Soong, D. S. A Computer Model for the Gel Effect in Free-Radical Polymerization. Macromolecules 1983, 16, 348.
(11) Louie, B. M.; Soong, D. S. Optimization of Batch Polymerization Processes: Narrowing the MWD. I. Model Simulation. J. Appl. Polym. Sci. 1985, 30, 3707.
(12) Vaid, N. R.; Gupta, S. K. Optimal Temperature Profiles for Polymerization in the Presence of End-Point Constraints. Polym. Eng. Sci. 1991, 31, 1708.
(13) Kapoor, B.; Gupta, S. K.; Varma, A. Parametric Sensitivity of Chain Polymerization Reactors Exhibiting the Trommsdorff Effect. Polym. Eng. Sci. 1989, 29, 1246.
(14) Ravikumar, V.; Gupta, S. K. Optimal Parameter Estimation for Methyl Methacrylate Polymerization. Polymer 1991, 32, 3233.
(15) Achilias, D.; Kiparissides, C. Development of a General Mathematical Framework for Modelling Diffusion-Controlled Free-Radical Polymerization Reactions. Macromolecules 1992, 25, 3739.
(16) Ray, A. B.; Saraf, D. N.; Gupta, S. K. Free Radical Polymerizations Associated with the Trommsdorff Effect Under Semibatch Reactor Conditions. I: Modeling. Polym. Eng. Sci. 1995, 35, 1290-1299.
(17) Psichogios, D. C.; Ungar, L. H. Direct and Indirect Model Based Control Using Artificial Neural Networks. Ind. Eng. Chem. Res. 1991, 30, 2564.
(18) Tsen, A. Y. D.; Jang, S. S.; Wong, D. S. H.; Joseph, B. Predictive Control of Quality in Batch Polymerization Using Hybrid ANN Model. AIChE J. 1996, 42, 455-465.
(19) Thompson, M. L.; Kramer, M. A. Modeling Chemical Processes Using Prior Knowledge and Neural Networks. AIChE J. 1994, 40, 1328-1340.
(20) Bulsari, A. B., Ed. Computer-Aided Chemical Engineering; Elsevier: Amsterdam, 1995; Vol. 6, Neural Networks for Chemical Engineers.
(21) Mujtaba, I. M., Hussain, M. A., Eds. Application of Neural Networks And Other Learning Technologies in Process Engineering; Imperial College Press: London, 2001.
(22) Sridhar, D. V.; Seagrave, R. C.; Bartlett, E. B. Process Modelling Using Stacked Neural Networks. AIChE J. 1996, 42, 2529-2539.
(23) Alpaydin, E. Multiple Networks for Function Learning. In Proceedings of the 1993 IEEE International Conference on Neural Networks; IEEE Press: Piscataway, NJ, 1993; Vol. 1, pp 9-14.
(24) Hansen, L. K.; Salamon, P. Neural Network Ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 1990, 12, 993-1001.
(25) Perrone, M. P.; Cooper, L. N. When Networks Disagree: Ensemble Methods for Hybrid Neural Networks. In Neural Networks for Speech and Image Processing; Chapman & Hall: London, 1993.
(26) Bates, J. M.; Granger, C. W. J. The Combination of Forecasts. Oper. Res. Q. 1969, 20, 451-468.
(27) Cooper, L. Hybrid Neural Network Architectures, Equilibrium Systems that Pay Attention, Neural Networks: Theory and Applications; Academic Press: New York, 1991; pp 81-96.
(28) Mani, G. Lowering Variance of Decisions by Using Artificial Neural Networks Portfolios. Neural Comput. 1991, 3, 484-486.
(29) Pearlmutter, B. A.; Rosenfeld, R. Chaitin-Kolmogorov Complexity and Generalization in Neural Networks. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, 1991; Vol. 3, pp 925-931.
(30) Wolpert, D. H. Stacked Generalization. Neural Networks 1992, 5, 241-259.
(31) Breiman, L. Stacked Regression. Mach. Learning 1996, 24, 49-64.
(32) Freund, Y.; Schapire, R. Experiment With a New Boosting Algorithm. In Machine Learning: Proceedings of the 13th International Conference; Morgan Kaufmann: San Francisco, CA, 1996; pp 148-156.
(33) Zhang, J.; Martin, E. B.; Morris, A. J.; Kiparissides, C. Inferential Estimation of Polymer Quality Using Stacked Neural Networks. Comput. Chem. Eng. 1997, 21, s1025-s1030.
(34) Dadebo, S. A.; McAuley, K. B. Dynamic Optimization of Constrained Chemical Engineering Problems Using Dynamic Programming. Comput. Chem. Eng. 1995, 19, 513-525.
(35) Baillagou, P. E.; Soong, D. S. Major Factors Contributing to the Nonlinear Kinetics of Free-Radical Polymerization. Chem. Eng. Sci. 1985, 40, 75.
(36) Su, H. T.; McAvoy, T. J.; Werbos, P. Long Term Prediction of Chemical Processes Using Recurrent Neural Networks: A Parallel Training Approach. Ind. Eng. Chem. Res. 1992, 31, 1338-1352.
(37) Zhang, J.; Morris, A. J.; Martin, E. B. Long Term Prediction Models Based on Mixed Order Locally Recurrent Neural Networks. Comput. Chem. Eng. 1998, 22, 1051-1063.
(38) Zhang, J. Developing Robust Nonlinear Models through Bootstrap Aggregated Neural Networks. Neurocomputing 1999, 25, 93-113.
(39) Zhang, J. Inferential Estimation of Polymer Quality Using Bootstrap Aggregated Neural Networks. Neural Networks 1999, 12, 927-938.
(40) Maschio, G.; Bello, T.; Scali, C. Optimization of Batch Polymerisation Reactors: Modeling and Experimental Results for Suspension Polymerization of Methyl Methacrylate. Chem. Eng. Sci. 1992, 47, 2609-2614.
(41) Sjoberg, J.; Zhang, Q.; Ljung, L.; Benveniste, A.; Delyon, B.; Glorennec, P. Y.; Hjalmarsson, H.; Juditsky, A. Nonlinear Black-Box Modeling in System Identification: A Unified Overview. Automatica 1995, 31, 1691-1724.
(42) Zhou, J. L.; Tits, A. User’s Guide for FSQP Version 3.7; Technical Report SRC-TR-92-107r5; Institute for Systems Research, University of Maryland: College Park, MD, 1997.
(43) Shao, R.; Martin, E. B.; Zhang, J.; Morris, A. J. Confidence Bounds for Neural Network Representations. Comput. Chem. Eng. 1997, 21, s1173-s1178.
(44) Zhang, J.; Morris, A. J.; Martin, E. B.; Kiparissides, C. Estimation of Impurity and Fouling in Batch Polymerisation Reactors through the Application of Neural Networks. Comput. Chem. Eng. 1999, 23, 301-314.
(45) Hicks, J.; Mohan, A.; Ray, W. H. The Optimal Control of Polymerization Reactors. Can. J. Chem. Eng. 1969, 47, 590.
(46) Thomas, I. M.; Kiparissides, C. Computation of the Near-Optimal Temperature and Initiator Policies for a Batch Polymerization Reactor. Can. J. Chem. Eng. 1984, 62, 284-291.
Received for review December 7, 2000
Revised manuscript received June 25, 2001
Accepted August 22, 2001
IE0010565