Developing targets for the performance index of a chemical reactor

Developing targets for the performance index of a chemical reactor network: isothermal systems. Luke K. E. Achenie, and Lorenz T. Biegler. Ind. Eng. C...
0 downloads 0 Views 3MB Size
Ind. Eng. Chem. Res. 1988,27, 1811-1821

1811

PROCESS ENGINEERING AND DESIGN Developing Targets for the Performance Index of a Chemical Reactor Network: Isothermal Systems Luke K. E. Achenie and Lorenz T. Biegler* Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213 Target-based synthesis is a systems approach t o reactor synthesis; i t maximizes the performance index of a chemical reactor network as measured by its objective function (such as yield or reactor volume). The approach converts the synthesis problem into an optimal control formulation in which the residence time density function and the micromixing function are the control variables. T h e optimal control problem is then solved by using a gradient-based algorithm that employs successive quadratic programing and adjoint variables. Examples are given t o illustrate the approach taken. 1. Introduction A chemical reactor is frequently the key unit in a chemical process. Although the overall profitability of the chemical plant depends on the cost of the various units (heat exchangers and separators, for example) and on everyday operating costs, we focus on the reactor design in isolation here because it strongly affects downstream concerns. An extensive amount of work has been done in this area (Levenspiel, 1972; Aris, 1961); however, most of the approaches have been based on heuristics and on graphical techniques. Nestoridis et al. (1986) recently used convex analysis to show that the yield of a reaction (occurring in segregated flow) is maximized if such a reaction is carried out in a combination of two plug flow reactors in parallel. While this analytic approach is encouraging, the assumption of segregated flow and the convexity of the objective function (in this case, yield) is limiting. The algorithmic approach that we use is a viable alternative especially when the kinetics and the performance index (yield, selectivity, reactor size, etc.) are complicated functions of concentration as well as other process variables and parameters. We pose the reactor synthesis problem as the following two part problem: 1. Given the kinetics of a homogeneous reaction, what combination of ideal reactors, namely CSTR’s, recycle reactors, and PFR’s, will maximize a given performance index? 2. What is a bound (target) on the performance index irrespective of the reactor type and configuration? Optimal reactor networks for homogeneous reactions depend to a large extent on mixing, temperature effects, and reaction times. To tackle problem 1, we postulate a superstructure of reactors, together with a set of decision variables that exhibit the effects of mixing, temperature, and reaction times. By manipulation of these variables, an optimal subnetwork of reactors is derived (Achenie and Biegler, 1986a,b). However, the optimal structure depends to a large extent on the complexity of the superstructure, which may be characterized by how many substructures it contains. The need for an independent measure of the

* To whom correspondence should be addressed.

performance index, free of restrictions imposed by the reactor type and configuration, is thus clearly present. We propose to solve the target problem by determining the optimal micro- and macromixing profiles for a reactor model that takes into account both kinds of mixing. For isothermal homogeneous reactions occurring in a reactor with premixed feed, Chauhan et al. (1972) have shown that two extremes of mixing put a bound on the yield of a single reaction that has a concave or convex rate function. These mixing extremes are realized in a completely segregated reactor and a “maximum mixed reactor (Zwietering, 1959),respectively. It is immediately apparent that, for more complex kinetics and for performance indexes other than yield, the two extremes of mixing will not give rise to extreme values of the performance index. In fact, a reactor with an intermediate degree of mixing may result in the maximum yield or selectivity. To predict or simulate the effect of mixing on the throughput of a reaction (which may involve many elementary steps), a number of mixing models for reactors with both premixed and unmixed feeds have been proposed (see Rao and Edwards (1973) for a comparison of some of these models). Features that are common among these models are the macromixing and micromixing concepts. While on one hand macromixing is characterized by the residence time density function, on the other hand the micromixing level is a measure of how intimately molecules mix with each other and how early or late this mixing occurs (see for example Levenspiel (1972)). When these concepts are combined, the mixing models are capable of simulating various degrees of mixing (between completely segregated and maximum mixedness) with varying degrees of success. Three of the models that, in our opinion, are suitable for optimization are due to Ng and Rippin (1965), Villermaux and Zoulalian (19691, and Jackson and Glasser (1986). In Ng and Rippin’s two-environment model, molecules of the fresh feed initially enter a segregated environment. A fraction of molecules then transfer to the maximum mixed environment at a rate proportional to the total number of molecules in the segregated region. When the transfer coefficient, h, is varied from zero to infinity complete segregation, intermediate mixing and maximum mixedness are simulated. Since this model is a one-pa-

0888-588518812627-1811$01.50/0 0 1988 American Chemical Society

1812 Ind. Eng. Chem. Res., Vol. 27, No. 10, 1988

rameter model, its range of applicability is somewhat limited. Jackson’s model can be thought of as a number of plug flow reactors in parallel, each with a different residence time. By allowing these reactors to exchange material at various “remaining lives”, a wide range of degrees of mixing are simulated. While this approach is elegant and one of the more realistic ones, the resulting formulation is currently too complicated for optimization (Glasser et al., 1987);one of the reasons for this is the fact that there is a need to optimize a general joint probability distribution function, a function of the age and residual life of a molecule. The third model by Villermaux and Zoulalian is closely related to the Ng and Rippin model. Here, also, there is a segregated environment and a maximum mixed environment. They postulated that a fraction, h, of the molecules that will remain in a reactor for t + At units of time will spend all this time in the maximum mixed environment. The remaining fraction, 1- h, will spend all its time in the segregated environment. By choosing a different fraction, h, for different aggregates of molecules, one obtains a distribution, h(t),that describes the micromixing pattern. The residue time density (RTD) remains intact, and it characterizes the macromixing in the system. Because there is no exchange of material between the two mixing environments in the Villermaux model, it is less general and can be classified as a special case of an “extended” Ng and Rippin model.

2. Optimal Control Problem Of the three models, the Ng and Rippin model appears to have the best tradeoff between a realistic model and one that is easy to implement in an optimization algorithm. The model is appropriate for homogeneous reactions involving low-viscosity reactants in continuous flow reactors. In the model, the transfer coefficient, h, is a constant based on the assumption that all molecules in the segregated environment (irrespective of age) have the same probability of transferring to the maximum mixed environment. In our implementation, we have increased the range of applicability of the model by allowing h to be dependent on the age of a molecule in the segregated environment. With this modification, the model becomes a multiparameter model since it is a continuous function of age. The modification is justified on the grounds that molecules that have been in the segregated environment for a longer period of time will tend to have a greater probability of transferring to the maximum mixed environment than the younger molecules. For generality, however, we do not assume that h is a monotonic function of age. By use of the modified Ng and Rippin model as a basis, an optimal control problem (solved as a nonlinear program or NLP) can be formulated in which the objective function is the performance index and the decision variables are the macromixing function (the RTD) and the micromixing function (the transfer function, h). The mass balances and the bounds on the decisions form the set of constraints in this problem. To solve the optimal control problem, we approximate the residence time density function (associated with the RTD) and h(t)by a set of basis functions on finite elements. The NPL is solved by using successive quadratic programming (SQP). With the optimal control formulation, one can obtain solutions to the following problems: 1. For a given RTD, what micromixing profile will maximize the performance index? 2. Given a micromixing function, what RTD will maximize the performance index?

residual life O-/

t+

Xexit

p age Figure 1. Scheme of intermediate mixed model.

3. What are the optimal RTD and micromixing profiles that will give the optimal performance index? Since the mixing models use the residual life and age concepts, the optimal control problem may not yield the details of the reactor type and configuration. If the intermediate mixed reactor model is capable of simulating all degrees of mixing, and if mixing and reaction are the dominant determinants of the product mix, then the performance index thus obtained is a bound on the performance index that can be obtained by using the superstructure approach to reactor network design. However, although the modified Ng and Rippin intermediate mixed reactor model is capable of simulating a wide range of mixing states, we cannot make the claim that it covers all mixing states. From Figure 1,it can be seen that this model only considers a single region of maximum mixedness. Multiple regions of maximum mixedness can be beneficial for some reactions and can be treated as a straightforward extension of this model. Here, one needs only to add multiple mixed regions in Figure 1and corresponding h profiles for each mixed region. Despite this, the target approach represents an independent approach to getting an “approximate” upper bound on the performance index. Subsequently, the richness of the reactor network superstructure can be measured by how close its optimal performance index is to that obtained in the residual life/age domain. Although the modified Ng and Rippin model is based on isothermal homogeneous systems, it can be extended to nonisothermal systems. Our optimal control formulation can be made to include species-dependent SRTD and micromixing functions. Thus, a target for the performance index of a reactor superstructure which has component splits (by distillation, for instance) can be calculated. 3. The Modified Ng and Rippin Model In Figure 1,“MAX” and “SEG” refer to the maximum mixed and the segregated environments,respectively. X o is the fresh feed flow rate, and Xedtis the average exit concentration vector. According to the Ng and Rippin model, molecules in the fresh feed initially enter the segregated environment. Within this environment,molecules that have the same age (i.e., entered the reactor together) are well mixed; however, they do not mix with younger or older molecules. Some of the molecules leave the segregated environment and proceed directly to the reactor exit. Some enter the maximum mixed environment to mix and react with molecules of different ages. Since molecules that are mixed cannot spontaneously unmix (on thermodynamic grounds), the only molecules that can mix are those that have the same residual life and will therefore leave

Ind. Eng. Chem. Res., Vol. 27, No. 10,1988 1813

The key assumption is that the rate of transfer of material of age t from the segregated environment to the maximum mixed environment is proportional to the amount of material of age t remaining in the segregated environment. If m(t)is the amount of material of age t , then dm/dt = -h(t)m(t) e z,

which implies where g(t) = J t0h ( s ) ds and

mo = m(0) Figure 2. Some representations for a PFR.

/

rFl-

SEG

I

The derivation of the mass balances for the scheme in Figure 1 closely parallels that of Ng and Rippin. With reference to Figure 1, the following terms can be defined for the two mixing environments: cy is the age of a molecule in the segregated environment, X is the residual life of a molecule in the maximum mixed environment, C , is the vector of concentrations in the segregated environment, C,, is the vector of concentrations in the maximum mixed environment, Ceit is the average concentration at the exit of the reacting system, Cois the concentration of feed to reactor, and Crefis the reference concentration. In addition, p is the density of reaction mixture, assumed constant; r is the rate of production of a species by reaction; V is the total volume of system (both environments), f is the residence time density (or distribution) function (RTD), and h is the micromixing function. The concentrations can now be calculated, as a function off and h, by writing mass balances over the two mixing environments. Referring to eq 1,the volume of material in the segregated and maximum mixed environments having residual lives between X and X + dX is given by

Figure 3. Representations for some reactor trains.

the maximum mixed environment together. Mixing, if it occurs, takes place as soon as possible. Within the intermediate mixed reactor schematic, representations for some reactor trains can be found (see Figures 2 and 3). According to the model, the rate a t which molecules leave the segregated environment to enter the maximum mixed environment is directly proportional to the number of molecules present in the segregated environment. In this work, the modification that is made is that the rate is directly proportional to the number of molecules with age between a and a + da. As such, while in the original Ng and Rippin formulation the rate is not dependent on age, the modification allows the rate to depend on age in the segregated environment. The reasoning behind this is explained in the previous section. The main difference between the segregated and the maximum mixed environments is that in the segregated environment molecules with different residual lives but the same age are well mixed; on the other hand, in the maximum mixed environment, molecules with different ages but the same residual lives are mixed. Therefore, at each cross section in either environment there are no concentration gradients. Some of these shortcomings can be alleviated by postulating species-dependent RTDs. Other ways.to extend the model will be to postulate more than one segregated and/or maximum mixed environment.

and

respectively. Subsequently, the moles of reactant in the segregated and maximum mixed environments for residual lives between X and X + dX are given by

and

respectively. Note that the concentrations in the maximum mixed environment are not functions of age. Now the mass balance in the maximum mixed environment can be written as [accumulation of chemical species in the maximum mixed environment) = {rate of supply from the segregated environment) + [rate of production by reaction) (2) Equation 2 is expressed in mathematical terms through the next few equations. Since residual life is in the op-

1814 Ind. Eng. Chem. Res., Vol. 27, No. 10, 1988

posite direction of time, the mass balance a t a given X is given by -d/dX lCm,(X)lm(l 0 - e-g("))f(a+ A) dal = LmCSe,(a)h(a)e-g(")f(a + A) d a r[Cm,(X)I

+

Sm(le-g("))f(a+ 0

-

X) d a (3)

Moreover, one can show that d/dX

Sm(le-g("))f(a+ A) d a = -

0

L--h(a)e-g(")f(a + A) d a (4) and the mass balance can be simplified to --[Lm(l - e-g("))f(a+ A) d a ) d/dX Cm,(X)

value for C,(m) to integrate eq 6 backwards. Numerical experience has also given credence to this finding. A possible qualitative explanation for this is that, since the RTD is such that very few molecules have unusually long residence times, the concentration of these molecules does not affect the exit concentrations to a significant extent, and hence any finite concentration assigned to molecules that have long residual lives will lead to the same exit concentrations. 4. Formulation of the Nonlinear Program To avoid making the formulation unit specific, we normalize (nondimensionalize) the concentration vector, C , and other quantities by letting X = C/Cref= concentrations, Xo = C0/Cref= feed concentrations, and F(X) = r/Cref = rate of reaction. Also let [ = CY + X; then eq 6 becomes

+

Cm,(X)S,mh(a)e-g(')~(a+ X) d a =

r [ C m a x ( X ) ] l m-( le-g("))f(a 0

For constant

p,

+ A) d a

(5)

this can be simplified to give

Equations 7 and 8 become

dCmax(X) dX S,-[Cmax(h) - C,,,(a)]h(a)e-g(")f(a + X) d a -4Cmax)

+

Lm(l.O - e-g("))f(a + A) d a (6)

The exit concentrations arise from an averaging over the maximum mixed and segregated environments a t zero residual life; i.e.,

[

Cexit= l m0 ( l . O - e-g("))f(a)dalCm,(0)

+

so that

S,-CSeg(a) e-g(a)f(a)d a (7) The concentration profile in the segregated environment is defined by dCseg/da = r(Cseg(a))

(8)

Following Ng and Rippin, eq 6 can be solved by using the Zwietering boundary condition: dCmax/dX= 0 at X = m (9) This boundary condition makes it necessary to integrate eq 6 backwards using C-(m). The solution resulting from eq 6 and the boundary condition, eq 9, is given by 0 = -r(Cm,) + S,-[Cmax(X) - CBeg(a)lh(.)e-g'"'f(~+ lim

A--

Jm(l.O

+ A) d a

da (10)

- e-g("))f(a

Equation 10 is generally very difficult to solve. However, is finite. Glasser in a chemical reaction system, C,,(m) et al. (1986) have shown that for some reaction systems different values for Cmm(m)yield the same values for Cm,(0), which is needed in the expression for Cexit,the average exit concentration. Given this property, one can thus avoid solving eq 10 and instead use an arbitrary finite

where the fact that L m f ( a d) a = 1

has been used. Also,

Next replace t = m by t,, (a suitably large value) and divide the interval [O,t,,] into N finite elements (see Figure 41, [tj,tj+,], such that t, ti+, V j E [l,N].Here t , = 0 and tN+,= tm,. The present formulation of the target problem has the following features: (i) an objective function that depends only on exit concentrations and decision variables (the vector Y )and (ii) constraints imposed on the average residence time. Thus, the objective function can include residence time, which is a measure of reactor volume and hence the investment cost of the reactor. Other costs associated with the processing of the reactor effluent can be included in the objective function, as long as these costs can be expressed only as functions of the decisions and/or the effluent flows.

Ind. Eng. Chem. Res., Vol. 27, No. 10, 1988 1815 N M

max

Figure 4. Partitioning of [O,t,,]

into N finite elements.

For the target problem, an objective is chosen and it is optimized subject to the mass balances in the modified Ng and Rippin model. In this regard, we define J = J(Xexit,Y)

as the objective function and choose as decision variables (Y) the piecewise continuous functions h and f (being the micro- and the macromixing functions, respectively), the average residence time, T , and the feed, X,,. With the above definitions, the target problem can be cast from an optimal control problem to a nonlinear program (NLP) defined over N finite elements as follows: max J(Xexit( Y), Y) = Y) NLPA

N M

h(t) h, = j=1 iChjiC$ji(t,ajh) =l

hji 1 0

(32)

Using this discretization, the set of decisions, Y, becomes Y = [Uji),(hji),.if,ajh,(7jJ,XOTIT Note that for a particular problem one need not use the full set of decision variables. For example, one may fix Xo or set aif = ajh = a constant. There are an infinite number of basis functions, 4(t,a) that can be used. However, for faster convergence, the choice of bases should take into account characteristics of the nonlinear programming algorithm. In the test examples, to be discussed later, two sets of basis functions are described. The bases, (4ji(t)),can be defined globally (i.e., on the entire interval [O,t,,]), or locally, on an element:

subject to

[tj,tj+~lC

[o,tmaxl

Since residence time density functions can be discontinuous on [O,t,] (e.g., the PFR RTD), global interpolation is usually inappropriate. By definition, a set (@ji(t)) will have compact support (be non-zero) only on the j t h element. f, and h, are used to denote the finite element approximations of f and h, respectively. The partitioning of [O,tm,] is illustrated in Figure 4. In Figure 4 [tj,tj+l] E j t h finite element tl = 0,

~ N + I=

tj+l > tj, j

f[ l a 1

Choosing N equal to one corresponds to global interpolation; otherwise, for local interpolation on N elements, N is greater than one.

Here T is the average residence time. The boundary conditions (22)-(27) express the fact that the states (concentrations) are continuous at the knots. Equation 29 ensures that the fraction of molecules with residence time less than infinity is one. The average residence time, T, may be fixed a t some value or it may be allowed to find its optimum level, which can be shown to be bounded by tmax.

Since f ( t ) and h(t)are assumed piecewise continuous, they can be approximated on the j t h element (see Figure 4) by a linear combination of a set of M linearly independent basis functions (bases), C$ji(t,aj). Here aj is a parameter that influences the shape of the basis function. Thus,

5. Deriving Gradients for Quadratic Program There are a number of ways to solve the nonlinear optimal control problem (NLPA). Among these are methods based on Pontryagin’s strong maximum principle and control vector parameterization coupled with direct search methods. The nonlinearity of the modified Ng and Rippin model (with respect to the control variables f and h) makes the above methods impractical. To solve NLPA, a gradient-based method is used. Solving a system of adjoint equations is a convenient way of getting gradient information which is then used in the solution of NLPA. Adjoint variables, which are Lagrange multipliers on the differential equations, are functions of time. The gradient of the objective function with respect to the set of decision variables is expressed in terms of the adjoints and states. For clarity, the adjoints and gradients have been relegated to the Appendix. 6. Strategy for Solving Optimal Control Problem The target problem has been formulated as a nonlinear program (NLP). Different methods exist for its solution. For the NLP, a gradient-based algorithm is employed that uses successive quadratic programing (SQP) to update the decision variables. The SQP step employs the gradients that were presented in the previous section. Let Y be the vector of decision variables:

Y = [Uji),(hji),aif,cujh,o,XoTIT Then in terms of Y, the NLP takes the simpler form NLPB max J[Xe,,( Y),Y] = fi( Y)

1816 Ind. Eng. Chem. Res., Vol. 27, No. 10, 1988

subject to (33)

s,i, IS ( Y ) IS"

(34)

The nonlinear constraint set (34) contains the integral equality constraints on f (eq 29 and 30) and may involve some or all the decision variables. For example, one may also require the RTD and micromixing functions to be continuous at the knots. Note also that the integral constraints on f are linear in and, in general, nonlinear in a ! . Finally, the following strategy for solving the NLP is proposed: (i) Provide an initial guess for the decision variables, Y . (ii) For a given value of decisions, Yk, a t the lzth iteration, solve the nonlinear state equations (19)-(21) together with the associated boundary conditions using a Fortran routine such as LSODE. (iii) Evaluate the objective function J. (iv) Using state variables from (ii), solve the linear adjoint equations (see Appendix). (v) Use the set of gradient equations to calculate the derivatives of the objective function with respect to the above decision variables. (vi) Calculate a search direction for the decision variables by setting up and solving the following quadratic program:

vji]

maxdkQ = VT9(Yk)dk + 0.5dkTBkdk quadratic program subject to

S,i, - AYk IAdk IS,,

"1

- AYk

Here Bkis a positive definite Hessian matrix constructed by a quasi-Newton updating formula and serves as an approximation to v2L(Yk);L is the Lagrangian function constructed from NLPB. With the search direction, dk, update Y by Yk+l = Yk akdk,where a k is stepwise selected so that a sufficient decrease in 9 is found at Yk+l. This successive quadratic programming (SQP) algorithm is based on Han (1977) and Powell (1977). A complete description of an updated algorithm is given in Biegler and Cuthrell(l985). The above QP is solved by using a Fortran package, QPSOL, by Gill and Murray (1978). (vii) If the Kuhn-Tucker condutions for a stationary point for the NLP are satisfied to a specified tolerance, 6, STOP. Otherwise, go to (ii). Depending on the general nature of the objective function, J,there could be multiple local optima for NLPB. As a result, to increase the likelihood of finding a global optimum, it might be useful to restart the algorithm from different initial points.

+

7. Numerical Integration and Sources of Error In the calculation of the gradients, multiple integral terms (see eq 47, for example) were evaluated by using either an n-point Gaussian quadrature or Simpson's rule; the Fortran routine, QBO~AD(Harwell Library), was found useful, in this regard. On a uniform mesh, if h is the mesh size, using Simpson's rule results in an error of 0(h5).The Hamiltonian, H, is a function of the states and the adjoint variables. To perform the above integration, interpolation was used on the stored values of the states and adjoints. Subsequently, there were three main sources of error that impacted on the gradient evaluation, namely, error from (i) the differential equation solver, ~ ~ 0 (ii) 4 ; interpolation of stored values of the states and adjoints; and (iii) quadrature.

0

: 0.0

.

, . 0.2

1 0.4

I

0.6

.

,

0.8

.

1

1.0

.

1

1.2

TIME

Figure 5. Basis on [0,1] (basis 1)for different values of a.

Figure 6. Williams and Otto flow sheet.

To reduce errors from (i) and (iii), smaller error tolerances were used; also, error type (ii) was reduced by using a finer mesh. Reformulating the integral as a differential equation and solving it simultaneouslywith the adjoints is a good way to cut down on errors that affect the gradients. However, in the target-based synthesis, such reformulation is difficult since the integrand is a function of the limits on the integrals as well. 8. Test Examples In the test examples, two sets of basis functions, basis 1and basis 2, are used to illustrate the target approach. The f i t set involves exponential functions, and the second set contains constant elements. Using the second set is equivalent to approximating the RTD and micromixing profiles by piecewise constant functions; that is, the profiles are assumed to be constant on each element. As will be seen later, the choice of basis functions affects the quality or the accuracy of the maximum objective function. For the first set of basis functions, M in eq 38 and 39 is chosen to be 1. This means that, on the j t h element (see Figure 4), the RTD is approximated by a constant, fjl, multiplied by one basis function, $jl; the micromixing function is approximated in the same way. The basis function is chosen as follows: for t E [tj,tj+J $jl(t,a)= aj exp(ajt)/[exp(ajtj+l) - exp(ajti)l basis 1

The parameter ajis to be chosen appropriately, by the optimization algorithm. Basis 1is plotted in Figure 5 for different values of aj. The choice of basis 1is motivated by the following: (i) f and h are nonnegative over [O,t-]; (ii) the shapes of the basis mimic the shapes of the RTDs normally found in the chemical engineering literature; (iii) as an exponential function, the basis is easy to integrate or differentiate and is otherwise easy to manipulate; (iv) S i m = f m ( t ) dt = 1 implies E$l E ,: f j i = 1,a linear constraint. Note that the integral constraint above is found in NLPA, and it expresses the fact that the fraction of molecules with residence time less than t,, (i.e., infinity) is equal to one.

Ind. Eng. Chem. Res., Vol. 27, No. 10, 1988 1817 30 6

-.

4

-.

1

RTD 2 -.

0

-. I

-2

0

1

2

1

3

time

Figure 7. RTD for example A.

time

Figure 8. h, for example A.

Similarly, M = 1 for the second set of basis functions. Here, for t E [tj,tj+J, 4j1 = 1.0 basis 2

A number of examples have been chosen to illustrate the target approach to reactor synthesis (involving homogeneous fluid phase reactions). The first two examples deal with the Van de Vusse reaction (isothermal), which has been well studied by a number of researchers (e.g., Chitra and Govind, 1981). In both of these problems, the yield (defined as the normalized concentration) of the intermediate species, B, is maximized. The next three examples represent more complex kinetics, with selectivity of the valuable product as the objective function. In many process problems with complex kinetics, it becomes difficult to infer an optimal reactor design with qualitative arguments. Moreover, in many process problems, selectivity and not the yield is the more relevant objective. Example A. The isothermal Van de Vusse reaction involves four species for which the objective is the maximization of the yield of the intermediate species, B. The kinetic constants in the mechanism are given by A L B - Ck 2 k31

D

kl = 10 s-' ( f i r s t o r d e r ) k2 = 1 s-' ( f i r s t o r d e r ) k g = 1 L / ( m o l s) (second order)

molar flow r a t e of r e a c t a n t , A = 5 8 molls J = x exit Cno-0.58 m o l / L B

In terms of normalized concentrations, X , the reaction rate vector for the four components is given by

F(X) = [-k1 -

0.5k3XA2;k1xA - k2XB; k2XB; 0.5k3XA2IT

where

kl = kl

k,

= k2

k,

= CAok,

This example was solved by Chitra and Govind (1981) and later by Achenie and Biegler (1986a,b),who reported an optimal reactor network of a single plug flow reactor. By use of the set of basis functions, basis 1, the target approach resulted in an optimal B yield of 0.4391 mol/L with an optimal residence time of 0.2370 s. For this problem, 18 elements were used and t,, was set to 2.5. In comparison, Achenie and Biegler (1986a,b) report an optimal yield of 0.4368 with a single plug flow reactor with a residence time of 0.2965 s. As shown in Figure 7, the RTD shows one long peak next to a shorter one. This may point to two plug flow like reactors in parallel. On the other hand, if one imagines a smooth curve drawn through the data points, the RTD will look like an exponential RTD. The micromixing profile (Figure 8) shows a substantial degree of micromixing in the region of the two peaks. This points to

time

Figure 9. RTD for example B.

time

Figure 10. h, for example B.

significant backmixjng in the optimal reactor configuration. This example was resolved by using basis 2; that is, the RTD and micromixing profiles were assumed to be piecewise constant. Although the optimal objective function ( J = 0.4331) was slightly smaller than that reported above, the profiles were quite similar to those obtained from basis 1. Example B. Here, the reaction diagram is the same as the one in example A except that now CAO = 5.8 mol/L. Both Chitra and Govind (1981) and Achenie and Biegler (1986b) reported an optimal reactor network of one CSTR followed by a PFR, for the superstructure problem. The optimal RTD for such a reactor train can be desoribed as an exponential RTD with a time delay. The target approach resulted in an optimal B yield of 3.6956 mol/L and an optimal residence time of 0.2631 s. Using a single CSTR followed by a PFR, Achenie and Biegler (1986b) reported an optimal B yield of 3.6806 mol/L with an optimal overall average residence time of 0.2381 s. The optimal RTD (Figure 9) shows a lag followed by either a Dirac delta function (plug flow like) or exponential with a very large time constant. The fact that the micromixing function (Figure 10) is large over most of the interval would suggest a network of a small CSTR followed by a PFR as opposed to a PFR followed by a CSTR. Example B was again solved by using basis 2. As in example A, although the optimal objective function ( J = 3.6106) was slightly smaller than that reported above, the profiles were quite similar to those obained from basis 1. Example C. The a-pinene problem (as stated by Stewart and Sorensen (1981)) is a reaction network that consists of five major species and a number of reversible and irreversible steps given below. This system has mainly been studied for parameter estimation of the rate constants. A literature search has

1818 Ind. Eng. Chem. Res., Vol. 27, No. 10, 1988

I

0

.1

1

3

2

5

4

6

L,

time

7

time

Figure 12. h, for example C.

Figure 11. RTD for e x a m p l e C.

failed to turn up any “optimal” reactor network for this reaction. At the temperature of the reaction, 580 OC, the rate constants are 41

A-E

D

*6-

0

k g = 0 149 4 0 / S - ’ k p = 0 266 8 7 / s - ’ k , = 0 333 8 4 / s - ’ kd = 0 1 8 9 5 7 L / ( m o l s) k 5 = 0 0 0 9 5 9 8 L / ( m o l s ) k6 = 0 2 9 4 2 5 / ~ - ’ k 7 = o 011 9 3 2 I s - l [ A , 0, C . D. E l = [ a - p i n e n e . a - a n d R - p y r o n e n e . dimer, 8110- ocimene, d i p e n t e n e ]

A more complicated objective than an intermediate or product yield is chosen for targeting the optimal reactor network. Here the objective is to maximize the selectivity of C over D, defined by Xc/XD. The reaction rate vector for the five components is given by F ( X ) = [-(&I + kz)XA - 2k5XA2;- & , x B k3f;L); & S X A ~h 4 X ~ - k&c; ~ + k 6 X ~- k,XD_2 k 4 X ~+~2h7Xc; k l X ~ j T

+

where

kL = CIefn-lk,

Cref= 1.0

and n is the order of the reaction step. The target approach resulted in an optimal selectivity of 0.2336 and a residence time of T = 2.894 s. Here the number of elements is 18, and t,, equals 6.0. The optimal RTD starts with a truncated exponential, is zero for most of the interval, and ends with a step function. Together with the micromixing function, it would seem to suggest that the optimal configuration is a CSTR in parallel with a PFR with large residence time (see Figures 11 and 12). Example D. Now let us consider the isothermal Denbigh reaction. For this reaction, the target approach will be used for reactor network synthesis. The following scheme characterizes this reaction: A k , _ B L C kg1

D

k4/

E

-

k t = 1 0 L / ( m o l s) (second order) (first order) k2 k 3 = 0 6 s-‘ k4 = 0 1 L / ( m o l s) ( s e c o n d order) feed f l o w r a t e = 100 L / s A , = 6 0 m o l / L = i n i t i a l Concentration of s p e c i e s A DO = 0 6 m o l / L = i n i t i a l concentration o f s p e c i e s D

For target-based synthesis, the reaction rate vector becomes

For both the target and superstructure synthesis, the objective function considered is the selectivity of species B to species D, defined by

J

XB/XD

From the target approach, the optimal value of the objective function was 1.306 with an RTD and micromixing profiles which suggested a segregated plug flow like reactor (see Figures 13 and 14). By use of this information, the recycle superstructure-based algorithm was run with an initial guess of two plug flow reactors; the two reactors both had a residence time of 0.2 s. For the superstructure, the optimal value of the objective function was 1.322, which was slightly higher than that obtained from the target; the optimal structure consisted of two PFRs in series (which is equivalent to one PFR with an average residence time equal to the sum of the residence times of the two PFR’s). One reason why the target objective is smaller than the superstructure objective has to do with the fact that it is difficult to approximate an impulse function (RTD for a PFR) using a set of basis functions. Despite this, it is remarkable that the target did a good job of predicting the objective function and the optimal structure. Example E. This example, the “plant problem”, is a modified form of the Williams and Otto problem (as stated in Ray and Szekely (1973) and in Di Bella and Stevens (1965)). A chemical plant produces a chemical P from reactants A and B. G is a heavy oily byproduct of no economic value. Components C and E are intermediates that can be used as plant fuel. The process flow sheet is illustrated in Figure 6 and the reaction diagram is as follows: A t the temperature of the reaction, 120 “C, the rate constants are ki

A+B-C k2

C+B-P+E k3

P+C-G

kl = 6.1074 h w t fraction kz = 15.0034 h w t fraction k3 = 9.9851 h wt fraction The Williams-Otto problem has been cited in the process optimization literature as a “typical plant problem”. As seen in Figure 6, product P is removed in a distillation column, while intermediate C reacts further with P to form the waste product G. Because of the recycling of reactants and intermediates, a maximization of the product flow following the separation can be achieved by maximizing product selectivity. For example, here, the objective is to maximize the selectivity of P over C and G , given by Xp/ (X, + X,). The reaction rate vector €or the six components is given by

Ind. Eng. Chem. Res., Vol. 27, No. 10, 1988 1819

where ki = kipAo/FR and p~~ is the mass flow rate of species A and F R is the effluent from the reactor. The target approach resulted in an optimal selectivity of 0.7583 and a residence time of 7 = 0.643 h. In the original statement of the Williams-Otto problem, the reactor is modeled for simplicity as a CSTR. On the other hand, from the targeting approach, the optimal RTD and the micromixing functions (Figures 15 and 16) seem to suggest two plug flow like reactors in parallel, with a significant degree of micromixing. 9. Discussion

Before delving into a general discussion of the results in examples A-F, the differences between the RTD and the micromixing function should be put in perspective. The RTD, f ( t ) ,is the fraction of molecules in the effluent (exit) stream which have been in the reactor network for a period between t and t + At; therefore, the RTD is defined with reference to molecules a t the exit of the network, regardless of which environment these molecules sojourned. On the other hand, the micromixing function, h ( t ) ,is the transfer function which governs the transfer of mass from the segregated environment to the maximum mixed environment; hence, h(t)is defined with reference to molecules in the segregated environment. However, since molecules in the segregated environment eventually exit the reactor network, interpreting h(t)in the context of molecules exiting the network is valid, but one must exercise caution! In examples A and B, the targeting formulation resulted in an objective function better than the one based on a superstructure of reactors. Examples A and B were solved with both basis 1 and basis 2; the former proved to be better in terms of giving a larger value of the objective function. In example A, the micromixing profile (Figure 8) starts at a large value and tapers gradually toward the end; this means that, within the segregated environment, the rate of transfer of “young”molecules to the maximum mixed environment is high. In other words, there is a significant amount of early mixing in the system. In example B, the RTD (Figure 9) is closer to a PFR than to a CSTR; if indeed it is a PFR, then the corresponding micromixing profile (Figure 10) would suggest that an appropriate representation for the optimal structure is as shown in the second diagram in Figure 2. The objective function for example C is the selectivity of the intermediate species C to D. The RTD and the micromixing profiles for this example (Figures 11 and 12) suggests a reactor with a truncated exponential RTD and a PFR in parallel. In the context of the reaction diagram, since there are two reversible steps, a reactor with a large residence time will be needed to ensure equilibrium; apparently this is achieved by having a PFR with a large residence time. In addition, since the D to C step is second order, a segregated reactor such as a PFR is required to shift the equilibrium in favor of species C. A PFR will also aid the A to C step, because the order of the reaction step is two. The foregoing analysis would suggest that mixing would be detrimental to the selectivity of C to D; however, the micromixing function suggests that young molecules in the segregated environment transfer to the maximum mixed environment at a large rate. To resolve this dichotomy, one can postulate that most of the molecules in the fresh feed transfer to the maximum mixed environment immediately after entering the segregated environ-

ment. Therefore, even though most of the molecules are in the maximum mixed environment, they have a limited opportunity to mix with more material from the segregated environment; in short very little mixing goes on, even in the maximum mixed environment. In example D, where the selectivity of species B to D is the objective function, the reaction step that produces B is second order, while the step that produces D is first order; it is not surprising, therefore, that the RTD and micromixing profiles suggest that a PFR is the optimal network for the selectivity of B to D (see Figures 13 and 14). In example E where the objective function is the selectivity of the product, P, over the byproducts C and G, all the reaction steps are second order. To maximize the selectivity, a lot of P should be produced; this calls for a PFR. However, P reacts with C in a second-order step to form the undesirable product, G; hence, to minimize the production of G, backmixing will be needed for this step. In summary, it will seem that neither complete segregation nor maximum mixedness will maximize the selectivity. Therefore, it is reasonable that the target approach picked RTD and micromixing profiles which point to two tubular reactors (in parallel) with significant backmixing (see Figures 15 and 16). In this respect, the target approach is quite successful. As seen from the above examples, however, the target approach doesn’t yield the details of the corresponding reactor network, although it is possible to infer the latter from the RTD and micromixing profiles. It should be borne in mind, however, that one possible realization of the optimal RTD and h, profiles is precisely the scheme given in Figure 1. If the RTD profile is zero on the interval [a,b],then there are no molecules with residence times between [ a $ ] . Hence, the corresponding micromixing profile is redundant and has no physical meaning (see Figures 7 and 8, for example). The following characteristics of the target approach also deserve mention: (i) The target approach assumes that the RTD is species independent; in other words all chemical species are assumed to have the same RTD. It is therefore reasonable to expect that this approach may be inferior to the superstructure approach if component splits are allowed. To get around this limitation an RTD will have to be assigned to each chemical species. (ii) The local profiles (i.e., on an element) of h, and f, are influenced to some extent by the shapes of the basis functions used. Also to avoid nonlinear constraints on f,, the knot positions were chosen a priori (uniformly placed). By deriving gradients with respect to knot placement, one can allow knot movement as part of the optimization problem. For the target approach, most of the computational effort was in solving the mass balances and the interpolation of the states for the gradient evaluation. In developing the target approach, relatively little attention was paid to implementing the algorithm in ways that would cut down on the computational effort; instead the emphasis was on showing that the target idea was feasible. Solving the adjoints and state equations simultaneously is expected to decrease the computational effort. 10. Conclusions

In this paper we have presented a “target” approach that employs macro- and micromixing concepts in a nonlinear programming formulation and thereby offers a systematic approach to bounding the performance index defined on a reactor network. In addition to its being automatic (as

1820 Ind. Eng. Chem. Res., Vol. 27, No. 10, 1988 1 4

n

.c

I L

1

t me

Figure 13. RTD for example D. c ,,

rl

Figure 14. h , for example D. 1.

T

I 0

2

3

time

Figure 15. RTD for example E.

Acknowledgment Financial support under NSF Grant ENG-8451058 is gratefully acknowledged.

c,

-t

,I

unlike the modified Ng and Rippin model, the Glasser et al. model can handle systems that are described by distributed parameter systems; i.e., concentrations have spatial dependency. In simple terms, one can think of Glasser et al.’s model as a “multienvironment” model. Unfortunately, the generality of the Glasser et al. model has resulted in a model that, at present, does not lend itself to a nonlinear programming formulation; one of the reasons for this is the fact that there is a need to optimize a general joint probability distribution function, a function of the age and residual life of a molecule. Another reason is the need to solve partial differential equations as part of an NLP formulation. By allowing the segregated and the maximum mixed environments to exchange heat with their surroundings, the Ng and Rippin model can be extended to nonisothermal systems as well as multiple maximum mixed environments; this will be reported in a follow up paper. Despite the perceived deficiencies of the target model, it is a model which is complex or rich enough to model real reactors, and yet simple enough to be used in an optimization framework for reactor synthesis; this, in essence, is the real strength of the model! The next step is to develop a systematic procedure for using the residence time density and the micromixing functions to aid in building the reactor network. In addition to extending the procedure to nonisothermal reactions, constraints on the states (other than the mass balances) are also being considered.

; { ,

,

, 2

,

3

time

Figure 16. h, for example E.

opposed to case driven), the approach does not have the “dimensionality” problem that a geometric or graphical approach may have. In other words, there is virtually no restriction on the number of species that may be present in a reaction mixture. This approach was demonstrated on several examples drawn from the literature. On all of these examples, the target objective functions have been as good as or better than reported literature values. On simple examples, the resulting profiles also offer some hope of interpretation of the final network structure. The intermediate mixed reactor model used in the target approach can handle arbitrary RTD’s and micromixing functions; however, besides being limited to low-viscosity reaction mixtures, the “two-environment” concept may be inadequate for the following reasons: (i) mass transfer is from the segregated to the maximum mixed environment, and not vice versa; (ii) the model also assumes that there are no spatial concentration gradients in any of the environments; thus, it is limited to systems that can be described by lumped parameter models. It is felt, however, that since the transfer function (micromixing function) for exchange of material between the two environments is an arbitrary function of the age of a molecule, it will offset some of the deficiencies of the original Ng and Rippin model. As mentioned earlier, the model by Glasser et al. (1986) is, perhaps, a much more realistic model, since it provides for a larger number of micromixing states. Furthermore,

Nomenclature cy = age of a molecule in the segregated environment X = residual life of a molecule in the maximum mixed environment C , = vector of concentrations in the segregated environment C,, = vector of concentrations in the maximum mixed environment Cedt= average concentration at the exit of the reacting system Co = concentration of feed to reactor Cref = reference concentration p = density of reaction mixture, assumed constant r = rate of production of a species by reaction V = total volume of system (both environments) f = residence time density (or distribution function-RTD) h = micromixing function X = C/Cref= concentrations X , = Co/Cref= feed concentrations F ( X ) = r/Cref = rate of reaction Appendix: Deriving Gradients for Quadratic Program A.l. Adjoints. To solve NLPA, a gradient-based method is used. Solving a system of adjoint equations is a convenient way of getting gradient information that is then used in the solution of NLPA. Adjoint variables, which are Lagrange multipliers on the differential equations, are functions of time. The derivation of the adjoints is straightforward; however, it involves a lot of bookkeeping. As such, only a brief summary of the adjoint system of equations will be given. A full derivation can be found in Achenie (1988). Let us define a Hamiltonian function, H, as H = - h A T ( K m ( t ) J Y X m = ) + V m ( t ) ) + x ~ ~ F ( x , , J+ X D T X , e g ( t ) e-gm(t)f, ( t ) + x E [K, ~ ( t ) x m a x ( t ) - z m , ( t )I (A.1)

Ind. Eng. Chem. Res., Vol. 27, No. 10, 1988 1821

for j = 1,...,(N+1) and i = 1,...,M .

Literature Cited Achenie, L. K. E. ”Optimization Based Approaches to Chemical Reactor Network Synthesis”. Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, 1988. Achenie, L. K. E.; Biegler, L. T. “Algorithmic Synthesis of Chemical Reactor Networks Using Mathematical Programming”. Znd. Eng. Chem. Fundam. 1986a, 25,621-627. Achenie, L. K. E.; Biegler, L. T. “An Algorithmic Method for Reactor Network Synthesis”. Report EDRC-O6-20-86,1986b;Engineering Design Research Center, Carnegie Mellon. Aris, R. The Optimal Design of Chemical Reactors; Academic: New York, 1961. Biegler, L. T.; Cuthrell, J. E. “Improved Infeasible Path Optimization for Sequential Modular Simulators-11: The Optimization Algorithm”. Comput. Chem. Eng. 1985,9, 257-267. Chauhan, S. P.; Bell, J. P.; Adler, R. J. “On Optimum Mixing in Continuous Homogeneous Reactors”. Chem. Eng. Sci. 1972,27, 585-591.

Chitra, S. P.; Govind, R. ”Yield Optimization for Complex Reactor Systems”. Chem. Eng. Sci. 1981, 36, 1219-1225. Di Bella, C. W.; Stevens, W. F. “Process Optimization by Nonlinear Programming”. Znd. Eng. Chem. Process Des. Dev. 1965,4,16-20. Gill, P. E.; Murray, W. “Numerically Stable Methods for Quadratic Programming”. Math. Program. 1978, 14, 349-372. Glasser, D.; Crowe, C. M.; Jackson, R. “Zwietering’s MaximumMixed Reactor Model and the Existence of Multiple Steady States”. Chem. Eng. Commun. 1986, 40, 41-48. Glasser, D.; Crowe, C.; Hildebrandt, D. “A Geometric Approach to Steady Flow Reactors: the Attainable Region and Optimization in Concentration Space”. Znd. Eng. Chem. Res. 1987,26(9), 1803. Han, S. P. “A Globally Convergent Method for Nonlinear Programming”. J. Opt. Theory Appl. 1977, 22(3), 297-309. Jackson, R.; Glasser, D. “A General Mixing Model for Steady Flow Chemical Reactors”. Chem. Eng. Commun. 1986,42, 17-35. Levenspiel, 0. Chemical Reaction Engineering; Wiley: New York, 1972.

Nestoridis, V.; Andreou, I.; Vayenas, C. G. “Optimal Residence Time Policy for Product Yield Maximization in Chemical Reactors”. J. Opt. Theory Appl. 1986, 49(2), 271-287. Ng, D. Y. C.; Rippin, D. W. T. “The Effect of Incomplete Mixing on Conversion in Homogeneous Reactions”. In Third Symposium on Chemical Reaction Engineering; Pergamon: Oxford, 1965. Powell, M. J. D. “A Fast Algorithm for Nonlinearly Constrained Optimization Calculations”. In Dundee Conference on Numerical Analysis; Springer Verlag: Dundee, Scotland, 1977. Rao, D. P.; Edwards, L. L. “Mixing Effects in Stirred Tank Ractors: a Comparison of Models”. Chem. Eng. Sci. 1973,28,1179-1192. Ray, H. W.; Szekely, J. Process Optimization with Applications in Metallurgy and Chemical Engineering; Wiley: New York, 1973. Stewart, W. E.; Sorensen, J. P. “Computer-Aided Modeling of Reaction Networks”. Foundations of Computer-Aided Chemical Process Design; Mah, R. S., Ed.; Engineering Foundation: New York, 1981; Vol. 11. Villermaux, J.; Zoulalian, A. “Etat de Melange du fluide dans un Reacteur Continu. A Propos d’un Modele de Weinstein et Adler”. Chem. Eng. Sci. 1969,24, 1513-1517. Zwietering, N. “The Degree of Mixing in Continuous Flow Systems”. Chem. Eng. Sci. 1959,11, 1-15. Received for review January 12, 1988 Revised manuscript received May 20, 1988 Accepted June 6, 1988