Handling Inequality Constraints in Optimal Control by Problem

Apr 17, 2009 - Establishment of optimal control for systems, where constraints involve both control and state, is very difficult. In some problems the...
1 downloads 8 Views 295KB Size
9622

Ind. Eng. Chem. Res. 2009, 48, 9622–9630

Handling Inequality Constraints in Optimal Control by Problem Reformulation Rein Luus* Department of Chemical Engineering, UniVersity of Toronto, Toronto, ON M5S 3E5, Canada

Establishment of optimal control for systems, where constraints involve both control and state, is very difficult. In some problems the difficulty is reduced significantly by transforming the optimal control problem. For illustration, the optimal control of a nonisothermal fed-batch reactor with heat removal constraint is considered. Although there are only two control variables, the feed rate and the temperature, the heat removal rate constraint makes the optimal control problem very difficult. To parametrize the optimal control problem, the time interval is divided into P time stages of variable length, and piecewise constant control is used at each time stage. Establishment of the optimal control policy is very challenging owing to the low sensitivity, the heat removal constraint, and the need for a large number of time stages for adequate approximation. However, by reformulating the optimal control problem where heat generation, rather than temperature, is used as a control variable, we are able to get greater accuracy with a smaller number of time stages. The optimal control policy is established with iterative dynamic programming and checked with LJ-optimization procedure. dynamic programming (IDP),13 and to show the benefits of slightly reformulating the optimal control problem.

Introduction In engineering design and applications, we are frequently faced with the problem of determining the control policy to achieve the best operation of the system within specified constraints. If the system is well-behaved, methods based on Pontryagin’s maximum principle may be successfully used. Although the resulting optimal control problem becomes a boundary value problem, if the system of equations is wellbehaved, the optimal control policy can be readily obtained by boundary condition iteration,1 or by control vector iteration.2 In fact, if the time horizon is reasonably short, boundary condition iteration provides very rapid quadratic convergence to the optimal control policy of well-behaved systems even if there are constraints on the control.3 However, many chemical engineering systems are not wellbehaved and alternate procedures may have to be used to obtain the optimal control policy. For some systems, the optimal control policy is highly oscillatory in nature and there can be doubts whether the optimal control policy is actually achieved.4 A seemingly simple isothermal fed-batch reactor optimization problem5 has presented challenges for many optimization procedures, and, as can be anticipated, establishing the optimal control of nonisothermal fed-batch reactors is much more difficult. Recently, Schlegel and Marquardt6 showed that the control policy thought to be optimal for a nonisothermal fedbatch reactor could be improved to provide a 2% improvement in the reported yield, but their reported control policy is still not the global optimum. Numerous methods have been proposed to deal with the establishment of optimal control of badly behaved systems such as fed-batch reactors where the very low sensitivity makes the establishment of the optimal control policy quite difficult.7-10 When temperature is included as an additional control variable and heat constraint is added, the problem of establishing the optimal control becomes much more difficult.11,12 The goal of this paper is to investigate establishing the optimal control policy for nonisothermal fed-batch reactors by iterative * To whom correspondence should be addressed. E-mail: [email protected].

Problem Formulation We consider the model of the nonisothermal fed-batch reactor as modeled by Srinivasan and Bonvin,11 where the exothermic reaction k1

k2

A + BfCfD is occurring. It is required to determine the optimal temperature profile and the feed rate to maximize the yield VcC, where V denotes the volume and cC is the concentration of the desired species C, in a given batch time tf, so that the heat removal capacity of the reactor would not be exceeded. The equations describing the reactor are dx1 ) -k1x1cB (1) dt dx2 ) k2(x1 - x2) (2) dt dx3 ) u1 (3) dt where x1 ) VcA, x2 ) V(cA + cC), and x3 ) V, the volume of the liquid in the reactor, and cB ) (20x3 + x1 - 29.139)/x3, and u1 is the feed rate. The initial state is xT(0) ) [10 10 1 ] and the rate constants are

( (

(4)

) )

k1 ) 2 × 103 exp

-2.0 × 104 8.31(u2 + 273.15)

k2 ) 8 × 1012 exp

-8.0 × 104 8.31(u2 + 273.15)

(5) (6)

where u2 is the temperature in degrees Celsius. The reactions are exothermic, where the heat produced by the reaction is Q ) 5 × 105k1x1cB + 5 × 104k2(x2 - x1)

10.1021/ie801806t CCC: $40.75  2009 American Chemical Society Published on Web 04/17/2009

(7)

Ind. Eng. Chem. Res., Vol. 48, No. 21, 2009

The constraints on the feed rate and temperature are 0 e u1 e 2

(8)

20 e u2 e 50

(9)

and the constraints on the volume and heat withdrawal rate are 0 < x3 e 1.1

(10)

Q e 5 × 106

(11)

At the batch time tf ) 0.25 h, it is required to maximize the yield of the desired product: I ) x2(tf) - x1(tf)

(12)

Equation 11 is a very difficult inequality constraint since Q is a function of both the state x and control u. We divide the time interval [0,tf] into P time stages of variable length as suggested by Bojkov and Luus,14 and use piecewise constant control at each time stage. For optimization, we use iterative dynamic programming (IDP) as presented in detail by Luus.13 For clarity, here we outline the algorithm. Algorithm for IDP. Let us consider the general problem, where we wish to choose the m-dimensional control vector u in the time interval [0, tf] to maximize the performance index that is an explicit function of the n-dimensional state at the final time x(tf): I ) Φ(x(tf))

(13)

subject to the mathematical model dx ) f(x,u), dt

x(0) given

(14)

the constraints on the control variables j ) 1, 2, ..., m

a j e uj e bj,

(15)

and the general inequality constraints for state and control in the form i ) 1, 2, ..., k

ψi(x,u) e 0,

(16)

for the entire time interval. To deal with general inequality constraints on the state, we follow the penalty function approach used by Luus,15 with a small variation. Instead of difference equations, we use differential equations as suggested by Mekarapiruk and Luus16 to construct the penalty functions. This approach was also used successfully by Shelokar et al.17 We therefore introduce k state constraint Variables through the differential equations

{

dxn+i if ψi(x, u) e 0 0 ) ψi(x, u) if ψi(x, u) > 0 dt

(17)

for i ) 1, 2, ..., k, with the initial condition xn+i(0) ) 0,

i ) 1, 2, ..., k

(18)

At the final time, tf, xn+i(tf) therefore gives the total violation of the ith inequality constraint integrated over time. The advantage of taking such value is that sometimes a violation may occur for a short time inside a time stage, and this violation may go by unnoticed if the difference equation approach is used where the violations are checked only at the ends of the time stages. Now we choose the augmented performance index to be maximized as k

J)I-

∑θx

i n+i(tf)

i)1

(19)

9623

where θi > 0 are penalty function factors for the inequality state constraints. The given time interval is divided into P time stages of varying length, and we consider the case where the control is kept constant in each time interval, although different control parametrizations, such as piecewise linear, could also be used. As suggested by Bojkov and Luus,14 the time variable t is transformed to τ, so that the time interval [0,tf] becomes [0,1] and in the new time variable all stages are of equal length. The differential equation in the k th time interval then becomes dx ) V(k) Pf(x, u) dτ

(20)

and the stage length at each stage V(k) becomes an additional control variable to be determined through optimization. The algorithm then becomes the following: 1. Choose the number of time stages P, the number of grid points N, the number of allowable values for control (including stage lengths) R at each grid point, the region contraction factor γ used after every iteration, the region restoration factor η, initial values for the control and the stage lengths, the initial region sizes, the number of iterations to be used in every pass, and the number of passes. 2. By choosing N values for control and the stage lengths around the best available control and stage lengths inside the region sizes, integrate eq 20 from τ ) 0 to τ ) 1 to generate N trajectories. The N values for x at the beginning of each time stage make up the N grid points at each stage. 3. Starting at stage P, corresponding to the normalized time τ )(P - 1)/P, for each grid point generate R sets of values for control and stage length: u(P) ) u*j(P) + Dr j(P)

(21)

V(P) ) V*j(P) + ωw j(P)

(22)

where D is an m × m diagonal matrix with different random numbers between -1 and 1 along the diagonal and ω is another random number between -1 and 1; u*j(P) is the best value for control, and V*j(P) is the best value for the stage length, both obtained for that particular grid point in the previous iteration; w j is the stage length at iteration j. Integrate eq 20 from τ ) (P - 1)/P to τ ) 1 once with each of the R allowable values for control and stage length to yield x(tf) so that the performance index can be evaluated. Compare the R values of the performance index and choose the control and stage length which give the maximum value. The corresponding control and stage length are stored for use in step 4. 4. Step back to stage P - 1, corresponding to time τ ) (P - 2)/P. For each grid point generate R allowable sets of control and stage lengths. Integrate eq 20 from τ ) (P - 2)/P to τ ) (P - 1)/P once with each of the R sets. To continue integration, choose the control and the stage length from step 3 that corresponds to the grid point that is closest to the x at τ ) (P - 1)/P. Now compare the R values of the performance index and store in memory the control policy and the stage length that yield the maximum value. 5. Step back to stage P - 2 and continue the procedure in the previous step. Continue until stage 1 corresponding to τ ) 0 with the given initial state as the grid point is reached. Make the comparison of the R values of the performance index to give the best control and the best stage length for this stage. We now have the best control policy and the best stage length for each stage in the sense of maximizing the performance index from the allowable choices.

9624

Ind. Eng. Chem. Res., Vol. 48, No. 21, 2009

6. In preparation for the next iteration, reduce the size of the allowable regions r j+1(k) ) γr j(k),

k ) 1, 2, ..., P

(23)

w j)1(k) ) γw j(k),

k ) 1, 2, ..., P

(24)

where γ is the region reduction factor and j is the iteration index. Use the best control policy and the best stage lengths from step 5 as the midpoint for the next iteration. 7. Increment the iteration index j by 1 and go to step 2 to generate another set of grid points. Continue for the specified number of iterations. 8. Increment the pass number index by 1, set the iteration index j to 1, and restore the region sizes to η times the region sizes used at the beginning of the pass, or choose the region sizes from the amount that the corresponding variables have changed, and go to step 2. Continue for the specified number of passes and examine the results. The clipping technique is used to handle the control constraints given in eq 15. In step 2, in generating the grid points, the suggestion of Fikar et al.18 is used where the values to control are generated at random inside the region. This method of generating grid points has been found to be especially useful for the optimal control of fed-batch reactors.19 Although IDP cannot guarantee obtaining the global optimum from any starting value for control,20 it has been found useful in time optimal control of distillation columns,21-23 and in the study of oscillatory systems.24 For some systems the computations can be carried out fast enough to enable its use for online optimal control.25,26 Numerical Results. All computations were done in double precision using WATCOM FORTRAN77 compiler version 9.5 on an AMD Athlon/3800 (2.4 GHz) personal computer, which is about 1.5 times faster than Pentium4/2.4 GHz computer. To obtain an accurate integration, the IMSL subroutine DVERK27 was used with a local error tolerance of 10-8. DVERK is a Runge-Kutta subroutine based on Verner’s fifth and sixth order pair formulas. We first consider solving the optimal control problem as presented in eqs 1-12. To deal with the heat removal constraint in eq 11, we introduce the state variable x4 which is initially chosen to be zero, and

{

dx4 Q × 10-6 - 5.0 if Q > 5.0 × 106 ) dt 0 if Q e 5.0 × 106

(25)

Here, to keep all the state variables of the same order of magnitude, the heat constraint violation is multiplied by 10-6. Thus, if there is no constraint violation, then x4 remains zero and whenever the heat removal constraint is violated, a positive value is assigned to the derivative and x4 becomes positive. Then x4(tf) is incorporated as a penalty into an augmented performance index in efforts to drive the constraint violation to zero. The value of x4(tf) shows the extent to which the constraint has been violated. A very small value will indicate that there is a violation, and methods can be used to reduce the violation to a negligible amount. This approach has been found to be effective by Mekarapiruk and Luus28 in dealing with state constraints. We are using stage lengths of variable lengths and since the final time tf is specified, this introduces another term into the augmented performance index. Also, it is expected that for maximum yield, the volume should be at its maximum value, so the upper limit on the volume in eq 10 is treated as an equality constraint at the final time t ) tf. This assumption is verified

through sensitivity analysis. We therefore choose the augmented performance index to be maximized as J ) [x2(tf) - x1(tf)] - θ1[(x3(tf) - 1.1 - s1)2 + (tfc - 0.25 - s2)2] - θ2x4(tf) (26) where the calculated value of the final time is P

tfc )

∑ V(k)

(27)

k)1

The penalty function factors θ1 and θ2 are positive and sufficiently large to provide convergence; the shifting terms s1 and s2 are put equal to zero initially and are updated after every pass according to ) sq1 - (x3(tf) - 1.1) sq+1 1

(28)

) sq2 - (tfc - 0.25) sq+1 2

(29)

where q is the pass number. The use of shifting terms to deal with equality constraints in optimization was first proposed by Luus29 for steady state optimization and used in IDP by Luus and Storey30 and is discussed in some detail in Luus13 and Luus et al.31,32 An interesting aspect of shifting terms is that, upon convergence, they give the sensitivity information. At their final converged values, -2θ1s1 gives the sensitivity of the performance index to the violation of the volume constraint, and 2θ1s2 gives the sensitivity of the performance index to the violation to the final time constraint. Thus if s1 is negative, then increasing the volume beyond 1.1 L will increase the performance index, and will confirm our expectation of getting the maximum yield when the volume reaches its maximum allowed value. Similarly, if s2 is negative, then an increase in tf will increase the performance index. For a large number of optimal control problems that have been solved by this approach, it has been found that there is a wide range over which the penalty function factors may be chosen to obtain successful results, so here each penalty function factor was assigned a value of 100. For a preliminary run we took P ) 25. The initial values for the control variables were taken as 1.0 and 35, respectively, and the initial stage lengths were chosen to be 0.01 for each time stage. The initial region sizes were taken as 1, 25, and 0.001. The region contraction factor γ ) 0.95 was used with the region restoration factor of η ) 0.90. For each pass, 20 iterations were used, and a total of 100 passes were used. The results of this preliminary run, shown in Table 1, indicate that more than a single grid point is necessary, but no more than 7 are required for this problem, and no more than 100 random points per iteration are necessary. The total computation time on the Athlon/3800 was 3 h, with 107 s for the case with N ) 1, R ) 25 to 2353 s for the case with N ) 7, R ) 100. The control policy for the best value, I ) 2.0825 with s1 ) -0.047746, s2 ) -0.267783, in Table 1 was used as a starting value for a refined run with R ) 50 and N ) 1, 3, and 5. The Table 1. Performance Index I as a Function of the Number of Grid Points N Generated by Assigning Control Values at Random, And the Number of Random Points R Used at Each Time Stage in Each Iteration with the Use of θ1 ) θ2 ) 102 with P ) 25 performance index, I number of grid points, N

R ) 25

R ) 50

R ) 100

1 3 5 7

2.06592 2.07762 2.07957 2.07631

2.07319 2.07774 2.08114 2.07949

2.07946 2.07932 2.08053 2.08250

Ind. Eng. Chem. Res., Vol. 48, No. 21, 2009

9625

Table 2. Control Policy with the Use of P ) 25 Time Stages of Variable Length Giving I ) 2.0931 stage

time

u1

u2

u3 ) ∆t

Q

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

0.000775 0.001662 0.002658 0.003780 0.004904 0.006224 0.008019 0.009914 0.012149 0.014783 0.018472 0.032874 0.072193 0.119033 0.126270 0.132793 0.139320 0.144853 0.150427 0.155194 0.160257 0.166534 0.201936 0.223716 0.250000

1.999999 1.999949 2.000000 2.000000 1.998614 1.999366 1.999136 1.999341 1.999193 1.999962 2.000000 0.614723 0.621229 0.633107 0.017450 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

48.789321 47.623159 46.366872 45.013031 43.721329 42.275984 40.421285 38.592236 36.585655 34.400843 31.620800 31.618181 31.602861 31.600542 31.659878 33.547491 35.397620 37.331527 39.046367 40.841850 42.439509 44.195145 46.449477 45.603403 44.869540

0.000775 0.000887 0.000996 0.001123 0.001124 0.001319 0.001795 0.001895 0.002234 0.002635 0.003689 0.014401 0.039319 0.046841 0.007237 0.006523 0.006527 0.005533 0.005574 0.004766 0.005063 0.006277 0.035402 0.021780 0.026284

4999999.99 5000000.00 5000000.00 5000000.00 5000000.00 5000000.00 5000000.00 5000000.00 4999999.99 5000000.00 4999999.99 4999649.26 4997163.35 4995120.40 4761327.74 4768272.97 4760398.47 4788888.44 4780954.76 4806047.71 4788638.53 4731942.28 3653868.25 2989470.14 2388897.74

negative value of s1 shows that x3 is at its upper bound at the optimum, validating the assumption made in using the equality constraint. Changing the seed number for generating random numbers had negligible effect. After several refined runs, the maximum value I ) 2.0931 was obtained with the control policy shown in Table 2. The feed rate u1 starts at its maximum value, drops to 0.6 at 0.033 h, and then to zero at 0.12 h. It is interesting to note that the first time stage is very short and the temperature u2 is close to its maximum allowable value, but drops very rapidly in the beginning, reaching a minimum value of 31.6 °C, then increases to a maximum of 46.5 °C, and then decreases to 44.9 °C at the final time. In the last column is shown the heat removal rate Q at the end of each time stage. It is noted that for the first half of the time period, the heat removal rate is at its maximum value. To get a more accurate control policy, the number of time stages P was doubled to 50. A very small improvement to I ) 2.0949 was obtained in the performance index, but the control policy was not changed much as can be seen in Figures 1 and 2. The best control policy obtained here is quite different from that reported by Srinivasan and Bonvin,11 where the feed rate starts at 0.56, increases to 1.2 and then decreases gradually to 0.57 before dropping to zero at 0.16 h. Since at the optimum, Q is changing in a very smooth fashion, to check the results, it was decided to reformulate the problem where the heat rate Q is used as a control variable rather than temperature. Reformulation of the Optimal Control Problem. Instead of having the control u2 appearing as the temperature in the rate constants, an alternate way of expressing this optimal control problem is to have u2 ) Q

( (

-2.0 × 104 k1 ) 2 × 10 exp 8.31(T + 273.15)

) )

-8.0 × 104 k2 ) 8 × 10 exp 8.31(T + 273.15) 12

Figure 2. Optimal temperature policy with the use of P ) 50 time stages with the original formulation; I ) 2.0949.

(30)

When a value is given to Q, eq 7 is solved numerically to yield the temperature T by the method suggested by Luus.33 Equality constraints in optimization are readily accomplished and the results are very accurate.34 Identification of active inequality constraints and using these as equalities has been shown to be feasible and this approach improves the convergence properties of direct search optimization very significantly for many problems.35 Here, the algebraic equation is solved by Newton’s method inside the differential equation subroutine, so it is solved many times in each integration time step, yielding an almost continuous profile for the temperature. To get the gradients, a forward and backward perturbation of 10-4 was used. The optimal control problem is still the same, but this formulation is expected to give greater accuracy. We still need the auxiliary variable x4, but here it becomes

(31)

T - 50 if T > 50 dx4 ) 20 - T if T < 20 dt 0 if 20 e T e 50

Then eq 7 is used to obtain the temperature T, where 3

Figure 1. Optimal feed rate policy with the use of P ) 50 time stages with the original formulation; I ) 2.0949.

(32)

{

(33)

As before, each penalty function factor was assigned a value of 100 and we took P ) 25. The initial values for the control variables were taken as 1.0 and 2.5 × 106, respectively, and

9626

Ind. Eng. Chem. Res., Vol. 48, No. 21, 2009

Table 3. Performance Index I as a Function of the Number of Grid Points N Generated by Assigning Control Values at Random, and the Number of Random Points R Used at Each Time Stage in Each Iteration with the Use of θ1 ) θ2 ) 102 for the Reformulated Optimal Control Problem with P ) 25 performance index, I number of grid points, N

R ) 25

R ) 50

R ) 100

1 3 5 7

2.09473 2.10097 2.10055 2.10151

2.09598 2.10121 2.10170 2.10121

2.09096 2.10055 2.10136 2.10157

Table 4. Optimal Control Policy for the Reformulated Problem with the Use of P ) 25 Time Stages of Variable Length, Yielding I ) 10170 stage

time

u1

u2 × 10-6

u3 ) ∆t

temperature

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

0.00038 0.00734 0.03678 0.04626 0.06900 0.09655 0.11986 0.14736 0.16948 0.17330 0.17720 0.18117 0.18528 0.18957 0.19407 0.19885 0.20395 0.20934 0.21492 0.22060 0.22636 0.23218 0.23805 0.24399 0.25000

2.00000 2.00000 2.00000 1.16240 0.67829 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000

5.0000 5.0000 5.0000 5.0000 5.0000 5.0000 5.0000 5.0000 5.0000 4.7994 4.6079 4.4241 4.2464 4.0725 3.9007 3.7308 3.5604 3.3903 3.2238 3.0652 2.9149 2.7724 2.6378 2.5104 2.3896

0.000375 0.006966 0.029438 0.009483 0.022735 0.027554 0.023307 0.027500 0.022120 0.003828 0.003892 0.003977 0.004108 0.004287 0.004507 0.004773 0.005102 0.005391 0.005580 0.005677 0.005759 0.005821 0.005877 0.005940 0.006007

49.3335 41.1541 21.5100 20.0058 20.0079 26.0262 31.9438 40.2188 48.1778 47.9398 47.7158 47.5064 47.3144 47.1340 46.9600 46.7997 46.6402 46.4655 46.2725 46.0836 45.9038 45.7304 45.5654 45.4099 45.2626

the initial stage lengths were chosen to be 0.01 for each time stage. The initial region sizes were taken as 1, 1.25 × 106, and 0.001. To solve the algebraic equation, the initial value for T was taken as 35 °C for the first iteration of the first pass. Thereafter, the best previously obtained value of T for a stage was used as the starting value for that stage. The calculations were calculated until the calculated Q was within 10-5 of u2. This provided 11 figure accuracy. The region contraction factor γ ) 0.95 was used with the region restoration factor of η ) 0.90. As before, for each pass, 20 iterations were used, and a total of 100 passes were used. There was no constraint violation (x4(tf) ) 0) and considerably better results were obtained, as is shown in Table 3. As expected, the computation time was higher, about double the time required with the original formulation. Here, with P ) 25, the best value I ) 2.1017 is 0.2% better than obtained with the original formulation. It is most interesting to note that the control policy for u1 as shown in Table 4 is slightly different from the control policy in Table 2. The shifting terms obtained here: s1 ) -0.050327, s2 ) -0.26705 are quite close to those obtained with the original formulation. In the last column is the calculated temperature at the end of each stage. We see that the temperature profile is quite similar to the one obtained before, but it reaches the lower bound of 20 °C at t ) 0.046 h. Increasing the number of time stages to P ) 50 increased the performance index to 2.1018 and changed the control policy very slightly as is seen in Figures 3 and 4. Figure 5 shows the temperature profile. It is interesting to see that the temperature

Figure 3. Optimal feed rate policy with the use of P ) 50 time stages with reformulation; I ) 2.1018.

Figure 4. Optimal heat rate policy with the use of P ) 50 time stages with reformulation; I ) 2.1018.

starts at 49.84 °C which is very close to the upper limit and then falls off to the lower limit, and then rises to a peak of 48 °C, and then gradually decreases. Due to the low sensitivity, the optimal control policy is very difficult to obtain accurately with the original formulation, but the reformulated problem provides a way of getting excellent results. Checking Results with Direct Search Optimization Procedure. To check the optimal control policy as obtained with IDP, the reformulated problem was run with direct search optimization using time stages of constant length. The Luus-Jaakola (LJ) optimization procedure36,37 was used here because of simplicity and high reliability.38 To improve convergence, line search was used after every pass, as suggested by Luus.39 The time interval was divided into P ) 25 time stages, each of length 0.01 h. In each pass, 20 iterations were used and a maximum of 200 passes were allowed. The region reduction factor 0.95 was used to reduce the size of the region after every iteration, and after every pass the region was restored to 0.85 of its value at the beginning of each pass. The initial values for u1 and u2 were chosen randomly in

Ind. Eng. Chem. Res., Vol. 48, No. 21, 2009

Figure 5. Optimal temperature profile with the use of P ) 50 time stages with reformulation; I ) 2.1018.

9627

Figure 6. Optimal feed rate policy obtained with LJ optimization procedure by using P ) 50 time stages of equal length with reformulation; I ) 2.1017.

Table 5. Performance Index I as a Function of the Number of Random Points R Used in Each Iteration Using LJ Optimization Procedure with the Use of θ1 ) θ2 ) 102 for the Reformulated Optimal Control Problem with P ) 25 Stages, Each of Length 0.01, for 5 Runs with Different Seeds for the Random Number Generator in Each Run performance index I run number

R ) 200

R ) 500

R ) 1000

1 2 3 4 5

2.1013 2.1012 2.1014 2.1013 2.1012

2.1014 2.1012 2.1014 2.1014 2.1013

2.1014 2.1014 2.1014 2.1014 2.1014

(0,2) and (0, 5.0 × 106) respectively. The initial region sizes were chosen to be 0.25 and 2.5 × 106, respectively. By using different seeds for the random number generator, five runs were performed with R ) 200, R ) 500, and R ) 1000. The results in Table 5 show that the choice of the seed for the random numbers has very little effect and at least 500 random points are necessary to get the performance index I ) 2.1014. With R ) 1000 I ) 2.1014 was obtained each time. The average computation time with the use of 500 random points per iteration was 507 s for the 5 cases. When the number of stages was increased to 50, the performance index was increased to 2.1017. The optimal control policy obtained with P ) 50 stages of equal length is shown in Figures 6 and 7. As can be seen, the results are quite close to the optimal control policy obtained with IDP, presented earlier in Figures 3 and 4. However, since the stages are of equal length, the switching in the feed rate in Figure 6 is not very accurate. Thus a run was performed with the use of P ) 100 time stages, each of length 0.0025 h. An improved value I ) 2.1018 was obtained, and the switching in the feed rate was now much closer to that displayed in Figure 3. Thus by reformulating the problem, direct search optimization can be readily used to give results close to the optimal.

Figure 7. Optimal heat rate policy obtained with LJ optimization procedure by using P ) 50 time stages of equal length with reformulation; I ) 2.1017.

thermal fed-batch reactor as presented by Srinivasan et al.12 The system equations and initial state are the same, but the batch time is 0.5 h instead of 0.25 h, and there are few other differences: cB ) (20x3 + x1 - 28.8315)/x3

(34)

the rate constants are

(

k1 ) 4 exp

-6.0 × 103 8.31(u2 + 273.15)

(

k2 ) 800 exp

)

-2.0 × 104 8.31(u2 + 273.15)

(35)

)

(36)

and the heat produced by the reaction is Discussion of results Although the reformulated problem requires more computational effort, the problem of piecewise constant temperature profile is avoided. Recently Schlegel and Marquardt6 considered the optimal control of a slightly different noniso-

Q ) 3 × 104k1x1cB + 104k2(x2 - x1)

(37)

The constraints on the feed rate and heat rate are also different: 0 e u1 e 1

(38)

9628

Ind. Eng. Chem. Res., Vol. 48, No. 21, 2009

Q e 1.5 × 105 (39) The rest of the equations and the performance index are the same as before. When this problem was reformulated, using Q instead of T as the second control variable, the computer program that was used for the earlier example, could be used easily. With the use of P ) 60 stages of variable length, the maximum performance index I ) 2.05269 was obtained. By doubling the number of stages to 120, the performance index was increased to 2.05271. Although this is only marginally better than 2.05270 reported by Schlegel and Marquardt,6 the control policy close to t ) 0, is somewhat different, as shown in Figures 8 and 9. Here, Q is at its upper bound, whereas in that paper Q is initially significantly below the upper bound. The temperature profile is given in Figure 10. The temperature starts at 49.88 °C rather than 46 °C as reported by Schlegel and Marquardt.6 The rest of the temperature profile is almost identical to theirs. There was no violation of the constraints, with the V(tf) ) 1.100000 and tfc ) 0.500000 with the shifting terms s1 ) -0.040031 and s2 ) -0.009118. Because of the low sensitivity, it is very difficult to get the optimal control policy very accurately. Although IDP requires considerably more computational effort than sequential quadratic programming, for highly

Figure 8. Optimal feed rate policy with the use of P ) 120 time stages with reformulation of the second optimal problem; I ) 2.05271.

Figure 9. Optimal heat rate policy with the use of P ) 120 time stages with reformulation of the second optimal problem; I ) 2.05271.

Figure 10. Optimal temperature profile with the use of P ) 120 time stages with reformulation of the second optimal problem; I ) 2.05271.

nonlinear systems the reliability of IDP has been found to be considerably greater. This was especially true when difficulties were experienced in the use of SQP in establishing the optimal bifunctional catalyst blend in a tubular reactor.40 At present, the computer time on personal computers is essentially free, so the expenditure of a few hours to solve a complex problem is not excessive. Once the nature of the optimal control policy is known, then the computational effort is reduced substantially, as is the case of establishing the optimal control of oscillatory systems.24 For the nonisothermal fed-batch reactor, very accurate results were possible because the problem of searching for the best temperature was replaced by searching for the best heat rate. In the original formulation, piecewise constant control for temperature is not good, since under optimal control, the temperature changes very rapidly initially, so the stage lengths initially should be extremely short. Piecewise linear parametrization may diminish this difficulty, but this was not attempted. Reformulation of the problem provided a much more attractive alternative. Conclusions Reformulating a nonisothermal fed-batch reactor optimal control problem given by Srinivasan and Bonvin11 provided a significant simplification for establishing the optimal control. Instead of using the temperature as the second control variable, the heat produced by the reaction was used. This resulted in better results with a relatively small number of time stages. The reformulated problem involved solving a nonlinear algebraic equation inside the integration routine and required about twice as much computational effort for the same number of time stages. The better results, however, showed that the benefit of determining the heat rate rather than the temperature for the nonisothermal fed-batch reactor was worth the additional effort. For the reformulated problem, the optimal control policy is also readily obtained with direct search optimization procedure. A maximum yield of 2.1018 moles was obtained for this mathematical model of a nonisothermal fed-batch reactor. The optimal control policy for this model is quite different from the one reported in the literature and was cross-checked by a different optimization procedure. Successful results were also obtained with a similar optimal control problem, where the optimal control policy had evaded several researchers.

Ind. Eng. Chem. Res., Vol. 48, No. 21, 2009

Iterative dynamic programming provides a good means of obtaining the optimal control policy of highly nonlinear systems. To get to the vicinity of the global optimum, however, more than one grid point at each time step was required for both of the systems considered. As noted by Fikar et al.18 and by Rusnak et al.,26 here we also found that faster convergence resulted if the grid points are generated by assigning control values at random rather than uniformly. Computations showed that for this problem no more than 7 grid points are required at each time stage if the initial control policy is far from the optimum. For refining the results 3 grid points are quite adequate. Acknowledgment After my official retirement in 2004, the office space and access to the library provided to me by University of Toronto is gratefully appreciated. Nomenclature aj ) lower bound on uj bj ) upper bound on uj cA ) concentration of reactant A (mol/L) cB ) concentration of reactant B (mol/L) cC ) concentration of desired product C (mol/L) D ) diagonal matrix with random numbers in (-1,1) f ) continuous vector function of x and u I ) performance index, yield of desired product C (mol) J ) augmented performance index (mol) k ) number of inequality constraints k1 ) rate constant for first reaction (L/(mol h)) k2 ) rate constant for second reaction (L/h) m ) number of control variables n ) number of state variables N ) number of grid points used at each time stage P ) number of time stages q ) pass number Q ) heat produced by reaction (J/h) r ) region size vector for control variables R ) number of random points per iteration s1 ) shifting term for volume constraint (L) s2 ) shifting term for batch time constraint (h) t ) time (h) tf ) batch time (h) tfc ) calculated batch time (h) T ) temperature (°C) u1 ) feed rate (L/h) u2 ) temperature (°C) u ) control vector V ) stage length V ) volume (L) x1 ) moles of A (mol) x2 ) moles of A and C combined (mol) x3 ) volume of liquid in reactor (L) x4 ) heat constraint violation variable, temperature constraint variable x ) state vector w ) region size for stage length Greek Symbols γ ) amount by which the region sizes are reduced after every iteration ε ) region collapse parameter specifying the minimum region size η ) region restoration factor θ1 ) penalty function factor for equality constraints θ2 ) penalty function factor for inequality constraints

9629

τ ) transformed time Φ ) a continuous function of final state ψ ) general inequality constraint variable ω ) random number between -1 and 1

Literature Cited (1) Luus, R. Boundary condition iteration, BCI. Encyclopedia of Optimization, 2nd ed.; Floudas, C. A., Pardalos, P. M., Eds.; Springer, New York, 2009; p 313. (2) Luus, R. Control vector iteration, CVI. Encyclopedia of Optimization, 2nd ed.; Floudas, C. A., Pardalos, P. M., Eds.; Springer, New York, 2009; p 509. (3) Luus, R. Further developments in the new approach to boundary condition iteration in optimal control. Can. J. Chem. Eng. 2001, 79, 968. (4) Luus, R. Parametrization in nonlinear optimal control problems. Optimization 2006, 55, 65. (5) Park, S.; Ramirez, W. F. Optimal production of secreted protein in fed-batch reactors. AIChE J. 1988, 34, 1550. (6) Schlegel, M.; Marquardt, W. Detection and exploitation of the control switching structure in the solution of dynamic optimization problems. J. Process Control 2006, 16, 275. (7) Yang, R. L.; Wu, C. P. Global Optimal Control by Accelerated Simulated Annealing, Paper presented at the First Asian Control Conference, Tokyo, Japan, 1994. (8) Luus, R. Sensitivity of control policy on yield of a fed-batch reactor. Proceedings of IASTED International Conference on Modelling and Simulation, Pittsburgh, PA, April 27-29, 1995; Hamza, M. H., Ed.; IASTED/Acta Press: Calgary, AB, 1995; p 224. (9) Luus, R.; Hennessy, D. Optimization of fed-batch reactors by the Luus-Jaakola optimization procedure. Ind. Eng. Chem. Res. 1999, 38, 1948. (10) Schlegel, M.; Stockmann, K.; Binder, T.; Marquardt, W. Dynamic optimization using adaptive control vector parameterization. Comput. Chem. Eng. 2005, 29, 1731. (11) Srinivasan, B.; Bonvin, D. Characterization of optimal temperature and feed-rate policies for discontinuous two-reaction systems. Ind. Eng. Chem. Res. 2003, 42, 5607. (12) Srinivasan, B.; Palanki, S.; Bonvin, D. Dynamic optimization of batch processes I. Characterization of the nominal solution. Comput. Chem. Eng. 2003, 27, 1. (13) Luus, R. IteratiVe Dynamic Programming; Chapman & Hall/CRC: London, UK, 2000. (14) Bojkov, B.; Luus, R. Optimal control of nonlinear systems with unspecified final times. Chem. Eng. Sci. 1996, 51, 905. (15) Luus, R. Application of iterative dynamic programming to state constrained optimal control problems. Hung. J. Ind. Chem 1991, 19, 245. (16) Mekarapiruk, W.; Luus, R. Optimal control of inequality state constrained systems. Ind. Eng. Chem. Res. 1997, 36, 1686. (17) Shelokar, P. S.; Jayaraman, V. K.; Kulkarni, B. D. Multicanonical jump walk annealing assisted by Tabu for dynamic optimization of chemical engineering processes. Eur. J. Oper. Res 2008, 185, 1213. (18) Fikar, M.; Latifi, A. M.; Fournier, F.; Creff, Y. Application of iterative dynamic programming to optimal control of a distillation column. Can. J. Chem. Eng. 1998, 76, 1110. (19) Luus, R. Choosing grid points in solving singular optimal control problems by iterative dynamic programming. Proceedings of the 10th IASTED International Conference on Intelligent Systems and Control; Sztandera, L. M., Ed.; ACTA Press: Anaheim, CA, 2007; p 425. (20) Luus, R.; Galli, M. Multiplicity of solutions in using dynamic programming for optimal control. Hung. J. Ind. Chem 1991, 19, 55. (21) Luus, R. Time optimal control of a binary distillation column. DeV. Chem. Eng. Mineral Process 2002, 10, 19. (22) Woinaroschy, A. Time-optimal control of startup distillation columns by iterative dynamic programming. Ind. Eng. Chem. Res. 2008, 47, 4158. (23) Woinaroschy, A. Time-optimal control of startup distillation of nonideal mixtures, Ind. Eng. Chem. Res. 2009, 48, 3873. (24) Luus, R. Optimal control of oscillatory systems by iterative dynamic programming. J. Ind. Manage. Optim. 2008, 4, 1. (25) Luus, R.; Okongwu, O. N. Towards practical optimal control of batch reactors. Chem. Eng. J. 1999, 75, 1. (26) Rusnak, A.; Fikar, M.; Latifi, M. A.; Meszaros, A. Receding horizon iterative dynamic programming with discrete time models. Comput. Chem. Eng. 2001, 25, 161. (27) Hull, T. E.; Enright, W. D., Jackson, K. R. User Guide to DVERKsA Subroutine for SolVing Nonstiff ODE’s; Department of Computer Science, University of Toronto: Canada, 1976; Report 100.

9630

Ind. Eng. Chem. Res., Vol. 48, No. 21, 2009

(28) Mekarapiruk, W.; Luus, R. Iterative dynamic programming with adaptive scheme for region size determination. Hung. J. Ind. Chem. 1999, 27, 235. (29) Luus, R. Handling difficult equality constraints in direct search optimization. Hung. J. Ind. Chem 1996, 24, 285. (30) Luus, R.; Storey, C. Optimal control of final state constrained systems, Proceedings of the IASTED International Conference on Modeling, Simulation, and Optimization; Singapore August 11-13, 1997; Hamza, M. H., Ed.; IASTED/Acta Press:Calgary, AB, 1997; p 245. (31) Luus, R.; Mekarapiruk, W.; Storey, C. Evaluation of penalty functions for optimal control. Proceedings of the International Conference on Optimization Techniques and Applications (ICOTA ′98); Curtin Printing Services: Perth, Western Australia, 1998; p 724. (32) Luus, R.; Mekarapiruk, W.; Storey, C. Evaluation of penalty functions for optimal control. Optimization Methods and Applications; Yang, X., Teo, K. L., Caccetta, L., Eds.; Kluwer Academic Publishers: London, UK, 2001; p 81. (33) Luus, R. Effective solution procedure for systems of nonlinear algebraic equations. Hung. J. Ind. Chem 1999, 27, 307. (34) Luus, R. Handling difficult equality constraints in direct search optimization. Part 2. Hung. J. Ind. Chem 2000, 28, 211. (35) Luus, R.; Sabaliauskas, K.; Harapyn, I. Handling inequality constraints in direct search optimization. Eng. Optim. 2006, 38, 391.

(36) Luus, R.; Jaakola, T. H. I. Optimization by direct search and systematic reduction of the size of search region. AIChE J. 1973, 19, 760. (37) Luus, R. Direct search Luus-Jaakola optimization procedure. Encyclopedia of Optimization, 2nd ed.; Floudas, C. A., Pardalos, P. M., Ed.; Springer, New York, 2009; p 735. (38) Liao, B.; Luus, R. Comparison of the Luus-Jaakola optimization procedure and the genetic algorithm. Eng. Optim. 2005, 37, 381. (39) Luus, R. Use of line search in the Luus-Jaakola optimization procedure. Proceedings of the 3rd IASTED International Conference on Computer Intelligence; Banff, Alberta, Canada, July 2-4, 2007; Acta Press: Anaheim, CA, 2007; p 128. (40) Luus, R.; Dittrich, J.; Keil, F. J. Multiplicity of solutions in the optimization of bifunctional catalyst blend in tubular reactor. Can. J. Chem. Eng. 1992, 70, 780.

ReceiVed for reView November 25, 2008 ReVised manuscript receiVed March 28, 2009 Accepted April 2, 2009 IE801806T