
Ind. Eng. Chem. Res. 2000, 39, 84-91

PROCESS DESIGN AND CONTROL

Optimal Control by Iterative Dynamic Programming with Deterministic and Random Candidates for Control

Wichaya Mekarapiruk and Rein Luus*

Department of Chemical Engineering, University of Toronto, Toronto, Ontario M5S 3E5, Canada

* To whom correspondence should be addressed. Telephone: 416-978-5200. Fax: 416-978-8605. E-mail: luus@ecf.utoronto.ca.

In addition to randomly chosen candidates for control, we examine the effect of also including deterministic control candidates in iterative dynamic programming (IDP) to improve the chance of achieving the globally optimal solution. Two types of deterministic control candidates (shifting and smoothing candidates) are chosen on the basis of the control policy obtained in the previous iteration. The search for the optimal value of the control in the subsequent iteration is then made over the combined set of control candidates chosen randomly and deterministically. Three highly nonlinear and multimodal chemical engineering optimal control problems are used to illustrate the advantages of this procedure in obtaining the global optimum.

Introduction

In recent years, iterative dynamic programming (IDP) has been developed and applied to successfully solve many different classes of chemical engineering optimal control problems.1 In the most recent IDP algorithms, the candidates for control are chosen at random inside a region that is contracted after every iteration. This provides a high probability of attaining the globally optimal solution and ease of implementation, because no gradient information or auxiliary variables are required. The use of a random search approach with IDP was suggested by Bojkov and Luus2 as a better alternative to the earlier approach of choosing the test values for the control variables over a uniform distribution.3 However, even though the chance of getting the global optimum with IDP is high, if the system is highly nonlinear the convergence is sometimes slow and a local optimum may result.4-6

To improve the convergence properties and the probability of obtaining the global optimum with IDP, in this paper we investigate the use of a deterministic approach together with the existing random approach to select the test values for the control variables. The candidates for control chosen deterministically here can be classified into two types: shifting and smoothing candidates. The shifting candidate is meant to shift the control policy forward or backward along the time scale to prevent stalling at a local optimum. The smoothing candidate, on the other hand, is meant to smooth the control profile and facilitate the convergence of the algorithm. These control candidates are calculated from the best control policy obtained in the previous iteration. They are then used in combination with the control candidates chosen randomly to establish the search space for IDP in the next iteration. In this way, the deterministic candidates are taken only as "suggestions", and neither the deterministic nor the random component of the search mechanism dominates the convergence of the algorithm. Thus, we can take advantage both of the robustness generally provided by the random candidates and of the fast convergence rate and the guard against stalling at a local optimum offered by the deterministic candidates. To illustrate and test the procedure, we choose three highly nonlinear chemical engineering optimal control problems for which difficulties in optimization have been reported.

Optimal Control Problem

We consider the continuous dynamic system described by the differential equation

dx/dt = f(x, u)    (1)

where the initial state x(0) is given. The state vector x is an (n × 1) vector and u is an (m × 1) control vector bounded by

αj ≤ uj ≤ βj,    j = 1, 2, ..., m    (2)

The performance index associated with this system is a scalar function of the state at final time tf given by

I = Φ(x(tf))    (3)

The optimal control problem is to find the control policy u(t) in the time interval 0 ≤ t < tf that maximizes the performance index in eq 3.

Deterministic Candidates for Control

The IDP algorithm used here is based on the algorithm developed by Luus3 for seeking piecewise constant control, with region sizes contracted in a multipass fashion as recommended by Luus7 for high-dimensional


systems. The main features of IDP are given in the review paper by Luus.1 As test values for control at each time stage in each iteration, here we choose R candidates at random and also three additional deterministic candidates based on the values for control at the adjacent stages. The calculation of these deterministic candidates is similar to the use of the shift and smoothing reproductive operators to generate new offspring in the evolutionary optimization method proposed by Pham.8 At each time stage, these three additional candidates for control are chosen as follows:

for 1 < k < P:

[u_c]^q_{R+1,k} = [u_s]^{q-1}_{k-1}    (4)

[u_c]^q_{R+2,k} = [u_s]^{q-1}_{k+1}    (5)

[u_c]^q_{R+3,k} = φ[u_s]^{q-1}_{k-1} + (1 - 2φ)[u_s]^{q-1}_k + φ[u_s]^{q-1}_{k+1}    (6)

for k = 1:

[u_c]^q_{R+1,1} = [u_s]^{q-1}_2    (7)

[u_c]^q_{R+2,1} = (1 - φ)[u_s]^{q-1}_1 + φ[u_s]^{q-1}_2    (8)

[u_c]^q_{R+3,1} = φ[u_s]^{q-1}_1 + (1 - φ)[u_s]^{q-1}_2    (9)

for k = P:

[u_c]^q_{R+1,P} = [u_s]^{q-1}_{P-1}    (10)

[u_c]^q_{R+2,P} = (1 - φ)[u_s]^{q-1}_P + φ[u_s]^{q-1}_{P-1}    (11)

[u_c]^q_{R+3,P} = φ[u_s]^{q-1}_P + (1 - φ)[u_s]^{q-1}_{P-1}    (12)

where φ is a smoothing parameter, 0 ≤ φ ≤ 0.5. Here [u_c]^q_{i,k} denotes the ith control candidate for u at stage k and iteration q, and [u_s]^{q-1}_k denotes the best control value for u at stage k and iteration q - 1. The first R values of u_c are the control candidates chosen at random over some allowable regions, and eqs 4-12 give the additional three candidates. Therefore, the total number of candidates for control used at each grid point in each time stage is M = R + 3.

The control candidates chosen in eqs 4, 5, 7, and 10 are shifting candidates, which are simply taken to be equal to the best control values obtained from the previous iteration for the adjacent stage(s). Because a local optimum is often just the optimal control policy shifted one or a few stages forward or backward along the time scale, with a similar size and shape,4,9 shifting candidates chosen in this way are expected to provide an escape route from a local optimum. This is especially important if the control policy involves a large change or switching. The candidates for control calculated in eqs 6, 8, 9, 11, and 12 are used to smooth the control policy; here, φ is the smoothing factor with a value in the range 0-0.5. The motivation for introducing the smoothing candidates is the fact that the optimal control policy always contains some smooth portion(s). Thus, by incorporating the smoothing candidates into the search space of IDP, the smooth portions of the optimal control policy can be established more readily, which then reduces the computational effort required for IDP. Here we shall use the same smoothing factor φ for all time stages. The choice of the value of φ will be discussed in the examples.
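To make eqs 4-12 concrete, the sketch below shows one way the three deterministic candidates could be generated and appended to the R random candidates at a given stage. It is written in Python purely for illustration (the computations reported in this paper were carried out in Fortran); the function name, array layout, and random sampling scheme are assumptions, not the authors' code.

```python
import numpy as np

def candidates_for_stage(u_best, k, R, alpha, beta, r, phi, rng):
    """Build the M = R + 3 control candidates for stage k (eqs 4-12).

    u_best : (P, m) array, best piecewise-constant policy from iteration q - 1
    k      : 0-based stage index (k = 0 is stage 1 of the paper, k = P - 1 is stage P)
    R      : number of randomly chosen candidates
    alpha, beta : lower and upper bounds on the control (eq 2)
    r      : current region size for the random search
    phi    : smoothing factor, 0 <= phi <= 0.5
    """
    P, m = u_best.shape
    # R random candidates taken inside the allowable region around the best value
    rand = np.clip(u_best[k] + r * rng.uniform(-1.0, 1.0, size=(R, m)), alpha, beta)

    if k == 0:                                   # first stage: eqs 7-9
        nxt = u_best[1]
        det = [nxt,
               (1 - phi) * u_best[0] + phi * nxt,
               phi * u_best[0] + (1 - phi) * nxt]
    elif k == P - 1:                             # last stage: eqs 10-12
        prev = u_best[P - 2]
        det = [prev,
               (1 - phi) * u_best[P - 1] + phi * prev,
               phi * u_best[P - 1] + (1 - phi) * prev]
    else:                                        # interior stages: eqs 4-6
        prev, nxt = u_best[k - 1], u_best[k + 1]
        det = [prev,                             # value from the preceding stage (eq 4)
               nxt,                              # value from the following stage (eq 5)
               phi * prev + (1 - 2 * phi) * u_best[k] + phi * nxt]  # smoothing (eq 6)
    return np.vstack([rand, det])                # shape (R + 3, m)
```

The best of these M candidates at each stage then becomes the policy carried into the next iteration, exactly as with purely random candidates.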

Numerical Results

All computations were performed in double precision on a Pentium/300 personal computer using the Watcom Fortran compiler, version 9.5. The compiler's built-in random number generator URAND was used for the generation of random numbers. The Fortran subroutine DVERK of Hull et al.10 was used for the integration of the differential equations, with the local error tolerance chosen to be 10^-6 for examples 1 and 2 and 10^-8 for example 3.

Example 1: Ethanol Fermentation Problem

To illustrate the procedure, let us first choose the ethanol fermentation process as formulated and used for optimal control by Hong,11 and subsequently used for optimal control studies by Hartig et al.,12 Bojkov and Luus,13 and Luus and Hennessy.14 The differential equations describing the ethanol fermentation are

dx1/dt = A x1 - u x1/x4    (13)

dx2/dt = -10 A x1 + u (150 - x2)/x4    (14)

dx3/dt = B x1 - u x3/x4    (15)

dx4/dt = u    (16)

with

A = [0.408/(1 + x3/16)] [x2/(0.22 + x2)]    (17)

B = [1/(1 + x3/71.5)] [x2/(0.44 + x2)]    (18)

where x1 represents the cell mass concentration, x2 is the substrate concentration, x3 is the desired product concentration, and x4 is the volume of the liquid inside the reactor. The initial state is specified as

x(0) = [1  150  0  10]^T    (19)

with the feed rate to the reactor u constrained by

0 ≤ u ≤ 12    (20)

The liquid volume is limited by the 200 L vessel size, so that at the final time tf,

x4(tf) - 200 ≤ 0    (21)

The performance index to be maximized is the total mass of the desired product at the final time:


I = x3(tf) x4(tf)    (22)
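For readers who wish to experiment with this example, a minimal simulation sketch of eqs 13-22 is given below, using Python and SciPy in place of the Fortran/DVERK combination used in the paper; the local error tolerance of 10^-6 matches the value quoted under Numerical Results, while the helper names and the piecewise-constant policy representation are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

def ethanol_rhs(t, x, u):
    """Right-hand side of eqs 13-16 for a constant feed rate u."""
    x1, x2, x3, x4 = x
    A = (0.408 / (1.0 + x3 / 16.0)) * (x2 / (0.22 + x2))   # eq 17
    B = (1.0 / (1.0 + x3 / 71.5)) * (x2 / (0.44 + x2))     # eq 18
    return [A * x1 - u * x1 / x4,
            -10.0 * A * x1 + u * (150.0 - x2) / x4,
            B * x1 - u * x3 / x4,
            u]

def performance_index(policy, tf=63.0):
    """Integrate over len(policy) equal stages; return I = x3(tf)*x4(tf) and x(tf)."""
    x = np.array([1.0, 150.0, 0.0, 10.0])                  # initial state, eq 19
    edges = np.linspace(0.0, tf, len(policy) + 1)
    for k, u in enumerate(policy):
        sol = solve_ivp(ethanol_rhs, (edges[k], edges[k + 1]), x,
                        args=(u,), rtol=1e-6, atol=1e-6)
        x = sol.y[:, -1]
    return x[2] * x[3], x                                  # eq 22 and the final state

# e.g. the constant policy u = 6 over P = 20 stages used later as the starting point
I, xf = performance_index([6.0] * 20)
```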

For maximum yield, we expect the liquid volume to be at its maximum allowed value at the final time. Therefore, instead of the inequality constraint in eq 21, we use the equality constraint

h = x4(tf) - 200 = 0    (23)


Therefore, we formulate the augmented performance index to be maximized as

J = x3(tf) x4(tf) - θ(h - s)^2    (24)

where a penalty function with the shifting term s is added to the performance index in eq 22. The value of the shifting term is initially chosen to be zero and is updated after every iteration by the amount of violation of the equality constraint at the final time tf, i.e., by

s^{q+1} = s^q - (x4(tf) - 200)    (25)

where q is the iteration number. The use of the quadratic penalty function with a shifting term was introduced by Luus15 as a way of dealing with difficult equality constraints in steady-state optimization. It was found to be efficient in handling equality state constraints in optimal control problems.16-18
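A minimal sketch of the bookkeeping implied by eqs 24 and 25 is shown below; it assumes the hypothetical performance_index helper from the earlier Example 1 sketch and is only meant to illustrate how the shifting term s could be updated once per iteration, not the authors' implementation.

```python
def augmented_index(policy, s, theta=1.0):
    """Return J of eq 24 and the equality-constraint value h of eq 23."""
    I, xf = performance_index(policy)      # eq 22, hypothetical helper defined above
    h = xf[3] - 200.0                      # h = x4(tf) - 200 (eq 23)
    return I - theta * (h - s) ** 2, h

# One iteration's worth of bookkeeping for the shifting term (eq 25):
s = 0.0                                    # initial value of the shifting term
policy = [6.0] * 20                        # best policy from the current iteration
J, h = augmented_index(policy, s)
s = s - h                                  # s(q+1) = s(q) - (x4(tf) - 200)
```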

To have a direct comparison with the results in the literature,12,14 we chose the final time to be 63.0 h and considered the case where the time interval is divided into 20 stages (P = 20). Hartig et al.12 compared three optimization methods, IDP, sequential quadratic programming (SQP), and the direct search optimization method of Luus and Jaakola19 (LJ optimization), in solving this problem and reported that IDP yielded the highest success rate with P = 20. However, they found that, due to numerous local optima, a large number of grid points had to be used in order to achieve the global optimum. In fact, with the use of 23 grid points and between 21 and 41 allowable values for control, the global optimum was obtained in 18 out of 28 runs. Luus and Hennessy14 applied LJ optimization with an adaptive scheme for adjusting the region size and with a shifting term in the penalty function to solve this problem, and achieved reasonably high success rates for P = 10 and 15. However, when P = 20, the overall success rate was found to be only 35%.

In the work by Hartig et al.,12 IDP was used with an absolute-value penalty function to handle the inequality state constraint in eq 21. Here we shall first examine how well the regular IDP algorithm performs when used with a quadratic penalty function including the shifting term. We took the penalty function factor θ = 1, chose the initial control to be halfway between the bounds, namely u(0) = 6, and took the initial region size to be the size of the interval between the upper and lower bounds on u, i.e., r(0) = 12. The reduction factor γ and the restoration factor η were taken to be 0.8 and 0.9, respectively. We allowed a maximum of 50 passes, each pass consisting of 20 iterations. As is shown in Table 1, the optimum I = 20841.1 is here achieved with a smaller number of grid points and a smaller number of allowable candidates for control than reported by Hartig et al.12 Of the 18 runs, the global optimum was obtained seven times; thus, the success rate in getting the global optimum is reasonably good but not very high (39%). Table 1 also gives the results obtained with r(0) = 6. When a smaller initial region size is used, the optimum is more difficult to reach with the regular IDP algorithm: of the 18 runs, the global optimum was obtained only two times (11%).

Table 1. Number of Passes Required To Reach I = 20841.1 When the Regular IDP Algorithm with θ = 1 and Different Values of M and N in Example 1 Were Used (Asterisk Denotes a Local Optimum)

                                  r(0) = 12                  r(0) = 6
number of grid points, N    M = 6   M = 8   M = 10     M = 6   M = 8   M = 10
3                             *       *       10         *       *       *
5                             *       *        *         *       *       *
7                             *      10        *         *       8       *
9                            15       *        *         *       *       *
11                           14      12       14         8       *       *
13                            *       *       15         *       *       *

Now, let us use IDP with the combined deterministic and randomly chosen candidates. The effect of the smoothing factor φ and the penalty factor θ is shown in Table 2. Here the parameters for IDP were taken to be the same as in the previous case. The initial region size was chosen to be r(0) = 6, and we used a total of 6 allowable candidates for control (M = 6). As is seen in the table, for each value of φ the chance of getting the global optimum was higher than 60%. Furthermore, for an intermediate value of φ (0.15 ≤ φ ≤ 0.35), the global optimum was obtained in all of the runs. The effect of the penalty function factor θ on the convergence rate, as seen in Table 2, is not great. However, for values of θ greater than 2, a larger number of passes is generally required to reach the optimum.

Table 2. Number of Passes Required To Reach I = 20841.1 in Example 1 When N = 3, M = 6, and Different Values of φ and θ Were Used (Asterisk Denotes a Local Optimum)

                            penalty function factor, θ
smoothing factor, φ     0.1    0.2     1     2     5    10
0                        22     10    28    20     *     *
0.10                     14     25    15     *     *    43
0.15                     22     11    19    14    32    42
0.20                     21      6    20    13    27    47
0.25                     15     28    18    20    25    36
0.30                     22     24    17    17    23    42
0.35                     23     19     9    29    24    31
0.40                      7     22     *    15    26    45
0.50                     26      8    13    19    18    38

To compare these results with those for regular IDP, we solved the problem using the present procedure with r(0) = 6, the number of grid points taken within the same range as in Table 1, and the total number of allowable candidates ranging from 6 to 14. The smoothing factor φ was chosen to be 0.25 and the penalty factor θ was chosen to be 1. The results, given in Table 3, show that the success rate obtained with the proposed procedure was 87%, which is considerably higher than the success rate achieved with the regular IDP algorithm. The computation times are reasonable: for example, the computation time for 18 passes with N = 3 and M = 6 was 4 min, and that for 5 passes with N = 13 and M = 14 was about 12 min on a Pentium/300.

Table 3. Number of Passes Required To Reach I = 20841.1 When φ = 0.25, θ = 1, and Different Values of M and N in Example 1 Were Used (Asterisk Denotes a Local Optimum)

                            total number of candidates for control, M = R + 3
number of grid points, N       6      8     10     12     14
3                             18     12     18     11      7
5                             19     10      *     10      *
7                              *     15     20      8      5
9                             14      8      8      8     15
11                            19     10     14      *      6
13                            21      5      5     11      5

However, it is noted that even when a large number of grid points, N = 11, was used, there was still a chance of obtaining a local optimum for this problem. This shows that care is needed in choosing the parameters for IDP. To achieve a high probability of obtaining the global optimum, several runs should be made, changing the number of grid points and the number of candidates for control. This is not a difficult task because, when the present method of choosing control candidates is used with IDP, the range of the parameters yielding the global optimum is increased.
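The parameter choices described above (reduction factor γ, restoration factor η, a fixed number of iterations per pass) suggest the following outer-loop skeleton. This is a simplified sketch with a single grid point per stage, reusing the hypothetical candidates_for_stage and augmented_index helpers from the earlier sketches; it is not the authors' program, and the N-grid-point bookkeeping of full IDP is omitted for brevity.

```python
import numpy as np

def idp(policy0, r0, alpha, beta, phi=0.25, theta=1.0, R=3,
        passes=50, iters_per_pass=20, gamma=0.8, eta=0.9, seed=0):
    """Bare-bones multipass IDP with combined random/deterministic candidates.

    Single grid point per stage; the region size is contracted by gamma after
    every iteration, the region used at the start of each new pass is eta times
    that of the previous pass, and the shifting term s is updated after every
    iteration (eq 25).
    """
    rng = np.random.default_rng(seed)
    best = np.array(policy0, dtype=float).reshape(-1, 1)
    P = best.shape[0]
    s, r_pass = 0.0, float(r0)
    for _ in range(passes):
        r = r_pass
        for _ in range(iters_per_pass):
            for k in range(P - 1, -1, -1):             # sweep the stages
                cands = candidates_for_stage(best, k, R, alpha, beta, r, phi, rng)
                J_best, _ = augmented_index(best[:, 0], s, theta)
                for c in cands:                        # M = R + 3 candidates
                    trial = best.copy()
                    trial[k] = c
                    J, _ = augmented_index(trial[:, 0], s, theta)
                    if J > J_best:
                        J_best, best = J, trial
            _, h = augmented_index(best[:, 0], s, theta)
            s -= h                                     # eq 25
            r *= gamma                                 # contract the search region
        r_pass *= eta                                  # region at the start of the next pass
    return best[:, 0], s

# e.g. for example 1: idp([6.0] * 20, r0=12.0, alpha=0.0, beta=12.0)
```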

Example 2: Bifunctional Catalyst Problem

A very challenging optimal control problem possessing numerous local optima is the bifunctional catalyst problem as formulated by Luus et al.4 and considered by Luus and Bojkov.20 The system is described by the seven differential equations

dx1/dt = -k1 x1    (26)

dx2/dt = k1 x1 - (k2 + k3) x2 + k4 x5    (27)

dx3/dt = k2 x2    (28)

dx4/dt = -k6 x4 + k5 x5    (29)

dx5/dt = k3 x2 + k6 x4 - (k4 + k5 + k8 + k9) x5 + k7 x6 + k10 x7    (30)

dx6/dt = k8 x5 - k7 x6    (31)

dx7/dt = k9 x5 - k10 x7    (32)

with the initial condition

x(0) = [1  0  0  0  0  0  0]^T    (33)

The rate expressions are given by

ki = ci1 + ci2 u + ci3 u^2 + ci4 u^3,    i = 1, 2, ..., 10    (34)

where the coefficients cij are given by Luus et al.4 The objective is to determine the catalyst blend u along the length of the reactor, specified by the characteristic reaction time interval 0 ≤ t < 2000 g·h/mol, such that the performance index

I = x7(tf)    (35)

is maximized subject to the constraints

0.60 ≤ u ≤ 0.90    (36)
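The structure of eqs 26-34 can be written compactly as in the sketch below. The 10 × 4 coefficient matrix c of eq 34 is deliberately left as an input because its values are given by Luus et al.4 and are not reproduced in this paper; the function name and argument layout are assumptions for illustration.

```python
import numpy as np

def catalyst_rhs(t, x, u, c):
    """Right-hand side of eqs 26-32; c is the 10 x 4 coefficient matrix of eq 34
    (its entries are given by Luus et al.4 and are not reproduced here)."""
    k = c[:, 0] + c[:, 1] * u + c[:, 2] * u**2 + c[:, 3] * u**3   # eq 34; k[0] is k1, etc.
    x1, x2, x3, x4, x5, x6, x7 = x
    return [-k[0] * x1,                                            # eq 26
            k[0] * x1 - (k[1] + k[2]) * x2 + k[3] * x5,            # eq 27
            k[1] * x2,                                             # eq 28
            -k[5] * x4 + k[4] * x5,                                # eq 29
            k[2] * x2 + k[5] * x4 - (k[3] + k[4] + k[7] + k[8]) * x5
            + k[6] * x6 + k[9] * x7,                               # eq 30
            k[7] * x5 - k[6] * x6,                                 # eq 31
            k[8] * x5 - k[9] * x7]                                 # eq 32
```

The performance index of eq 35 is then simply the value of x7 after integrating over the interval 0 ≤ t < 2000 g·h/mol, with the piecewise-constant blend u kept within the bounds of eq 36.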

As was done by Luus and Bojkov,20 let us divide the time interval into 10 stages, i.e., P = 10. As the initial control policy, we chose the value of u halfway between the bounds specified by eq 36, i.e., u(0) = 0.75, and as the initial region size we chose r(0) = 0.3. Only a single grid point (N = 1) was used, and a run consisted of only a single pass with the reduction factor γ taken to be 0.8. To investigate the effect of the smoothing factor, we solved the problem using the proposed procedure with different values of φ. Table 4 shows the number of iterations required to reach the global optimum of I = 10.0942 × 10^-3 when φ is in the range 0-0.5 and different numbers of total allowable control candidates are chosen. A larger number of allowable control candidates tends to yield the optimum in fewer iterations, but the effect of φ does not seem to be significant for this problem. The number of iterations required to reach the optimum does not vary much for different values of φ; in fact, it does not differ much from the number of iterations required when φ = 0, suggesting that, for this problem, using only the shifting candidates along with randomly chosen values should suffice. The problem was then solved by incorporating only the shifting candidates for control into the set of allowable control candidates. In this case, the total number of allowable control candidates becomes M = R + 2 for stages 2 to P - 1 and M = R + 1 for stages 1 and P. Table 5 shows the performance index obtained after 50 iterations for different values of γ and r(0) when taking R = 2 (i.e., M = 4 for stages 2 to P - 1 and M = 3 for stages 1 and P). In all runs, the global optimum was


obtained to at least five-figure accuracy. The computation time required for 50 iterations was 30 s on a Pentium/300 computer.

Table 4. Number of Iterations Required To Reach I = 10.0942 × 10^-3 in Example 2 When γ = 0.8 and Different Values of φ and M Were Used

                         total number of candidates for control, M = R + 3
smoothing factor, φ        5      7      9     11     13
0                         27     23     14     17     16
0.10                      24     20     16     17     12
0.20                      22     24     19     17     16
0.25                      27     23     18     17     16
0.30                      27     23     18     17     16
0.40                      22     23     14     17     16
0.50                      22     23     15     17     16

Table 5. Value of the Performance Index I × 10^3 Obtained with Different Values of r(0) and γ in Example 2 When R = 2 Was Used (Numbers in Parentheses Indicate the Number of Iterations Required To Reach I = 10.0942 × 10^-3)

                                       initial region size, r(0)
reduction factor, γ      0.1           0.2           0.3           0.4           0.5
0.60                  10.0940       10.0941       10.0939       10.0938       10.0942 (12)
0.65                  10.0942 (20)  10.0941       10.0941       10.0942 (16)  10.0936
0.70                  10.0942 (18)  10.0942 (13)  10.0942 (15)  10.0938       10.0942 (20)
0.75                  10.0942 (20)  10.0942 (20)  10.0942 (21)  10.0942 (21)  10.0942 (24)
0.80                  10.0942 (18)  10.0942 (24)  10.0942 (24)  10.0942 (24)  10.0942 (28)
0.85                  10.0942 (22)  10.0942 (29)  10.0942 (29)  10.0942 (37)  10.0942 (29)
0.90                  10.0942 (38)  10.0942 (40)  10.0942 (41)  10.0942 (40)  10.0942 (36)

The results in Table 5 compare favorably to those obtained by Luus and Bojkov20 with the regular IDP algorithm. In their work, Luus and Bojkov20 chose the value of γ to be in the range 0.7-0.9 and the initial region size in the range 0.2-0.5. They found that, to obtain the global optimum, the region size chosen had to be greater than 0.2, and when the initial region was chosen to be greater than 0.3 there was still some chance that the global optimum would be missed. However, one can see from Table 5 that, using the present procedure, even if the initial region size is as small as 0.1, the global optimum is still obtained. If the reduction factor γ was chosen to be equal to or greater than 0.75, the global optimum was always reached to six-figure accuracy.

Example 3: Fed-Batch Reactor for Protein Production Problem

Next let us consider the problem of determining the optimal nutrient and inducer feeding rates for a fed-batch reactor, as introduced by Lee and Ramirez21 and used for optimal control studies by Luus.1 The system is described by the following differential equations:

dx1/dt = u1 + u2    (37)

dx2/dt = µ x2 - ψ x2    (38)

dx3/dt = 100 u1/x1 - ψ x3 - µ x2/0.51    (39)

dx4/dt = RfP x2 - ψ x4    (40)

dx5/dt = 4 u2/x1 - ψ x5    (41)

dx6/dt = -k1 x6    (42)

dx7/dt = k2(1 - x7)    (43)

dx8/dt = u2    (44)

where

µ = (0.407 x3/ζ)[x6 + 0.22 x7/(0.22 + x5)]    (45)

ψ = (u1 + u2)/x1    (46)

ζ = 0.108 + x3 + x3^2/14814.8    (47)

RfP = (0.095 x3/ζ)[(0.0005 + x5)/(0.022 + x5)]    (48)

and

k1 = k2 = 0.09 x5/(0.034 + x5)    (49)

The initial state is specified as

x(0) = [1  0.1  40  0  0  1  0  0]^T    (50)

The final time is specified as tf = 10 h, and the performance index to be maximized is

I = x1(tf) x4(tf) - Q x8(tf)    (51)

where Q is the relative cost factor associated with the inducer feed. The constraints on the control variables are

0 ≤ uj ≤ 1,    j = 1, 2    (52)
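Since eqs 37-52 fully specify this example, a minimal simulation sketch is given below (Python/SciPy rather than the Fortran/DVERK combination used in the paper; the tolerance of 10^-8 matches the value quoted under Numerical Results). The helper names and the piecewise-constant (u1, u2) policy representation are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

def fedbatch_rhs(t, x, u1, u2):
    """Right-hand side of eqs 37-44, with the auxiliary expressions of eqs 45-49."""
    x1, x2, x3, x4, x5, x6, x7, x8 = x
    zeta = 0.108 + x3 + x3**2 / 14814.8                        # eq 47
    mu = (0.407 * x3 / zeta) * (x6 + 0.22 * x7 / (0.22 + x5))  # eq 45
    psi = (u1 + u2) / x1                                       # eq 46
    Rfp = (0.095 * x3 / zeta) * (0.0005 + x5) / (0.022 + x5)   # eq 48
    k1 = k2 = 0.09 * x5 / (0.034 + x5)                         # eq 49
    return [u1 + u2,                                           # eq 37
            mu * x2 - psi * x2,                                # eq 38
            100.0 * u1 / x1 - psi * x3 - mu * x2 / 0.51,       # eq 39
            Rfp * x2 - psi * x4,                               # eq 40
            4.0 * u2 / x1 - psi * x5,                          # eq 41
            -k1 * x6,                                          # eq 42
            k2 * (1.0 - x7),                                   # eq 43
            u2]                                                # eq 44

def protein_index(policy, Q=5.0, tf=10.0):
    """Integrate a piecewise-constant (u1, u2) policy and return eq 51."""
    x = np.array([1.0, 0.1, 40.0, 0.0, 0.0, 1.0, 0.0, 0.0])    # eq 50
    edges = np.linspace(0.0, tf, len(policy) + 1)
    for k, (u1, u2) in enumerate(policy):
        sol = solve_ivp(fedbatch_rhs, (edges[k], edges[k + 1]), x,
                        args=(u1, u2), rtol=1e-8, atol=1e-8)
        x = sol.y[:, -1]
    return x[0] * x[3] - Q * x[7]                              # I = x1*x4 - Q*x8 at tf
```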

First, let us consider the case where the relative cost of the inducer to the value of the product is 5 (Q = 5). Luus1 reported that, in this case, u1 is zero throughout the time interval at the optimum; thus, the optimization can be carried out on u2 alone. By using IDP with flexible stage lengths, Luus1 achieved the performance index I = 0.816480. This value is 2.2% better than the performance index of 0.7988 reported by Lee and Ramirez.21

To solve this problem with stages of equal length, using IDP with the deterministic-random search, let us divide the time interval into 50 stages (P = 50). To check the observation made by Luus1 that, when Q = 5, u1 can be set to zero, we first performed the optimization on both u1 and u2. We chose as initial control values u1(0) = u2(0) = 0 and as initial region sizes r1(0) = r2(0) = 0.5. The reduction factor γ and the region restoration factor η were taken to be 0.8. The smoothing factor was arbitrarily taken to be 0.25. A few preliminary runs showed that using 3, 5, or 7 grid points yielded rather slow convergence to the optimum, owing to the nonlinearities and the low sensitivity of the system. Therefore, we used 9 grid points (N = 9) and 12 allowable candidates for control (M = 12, or R = 9). After 12 passes, each consisting of 50 iterations, I = 0.816480 was achieved, the same performance index as reported by Luus.1 The control policy for u2 is given in Figure 1. It has a similar shape to, but is somewhat smoother than, the control policy given by Luus.1 The value of the control u1 at the optimum is indeed zero throughout the time interval. This allows us to simplify the problem and reduce the computational effort by setting u1 = 0 and performing the optimization on u2 only.

We now study the effect of the smoothing factor φ for this problem. As the initial control policy, we took u2(0) = 0, and we chose the initial region size r(0) to be 0.5. Each run consists of only a single pass with 100 iterations. Table 6 shows the values of the performance index obtained when N = 9 and different values of the number of allowable candidates for control M and the smoothing factor φ were used. The results show that the value of φ has some effect on the convergence properties of the algorithm for this problem: an intermediate value of φ (0.15 ≤ φ ≤ 0.35) tends to yield a better result than a value at the extremes.

Figure 1. Optimal control policy of u2 for example 3 when Q = 5, yielding I = 0.816480.

Table 6. Value of the Performance Index Obtained When N = 9 Was Used (Numbers in Parentheses Indicate the Number of Iterations Required To Reach I = 0.816480)

                          total number of candidates for control, M = R + 3
smoothing factor, φ        6              8              10             12             14
0                      0.816439       0.816418       0.816471       0.816464       0.816479
0.05                   0.816480 (53)  0.816480 (53)  0.816480 (50)  0.816479       0.816480 (45)
0.10                   0.816480 (46)  0.816480 (44)  0.816480 (43)  0.816480 (42)  0.816480 (40)
0.15                   0.816480 (42)  0.816480 (42)  0.816480 (41)  0.816480 (40)  0.816480 (43)
0.20                   0.816480 (44)  0.816480 (42)  0.816480 (42)  0.816480 (42)  0.816480 (39)
0.25                   0.816480 (42)  0.816480 (39)  0.816480 (42)  0.816480 (39)  0.816480 (40)
0.30                   0.816480 (56)  0.816480 (43)  0.816480 (42)  0.816480 (39)  0.816480 (40)
0.35                   0.816480 (43)  0.816480 (53)  0.816480 (53)  0.816480 (41)  0.816480 (40)
0.40                   0.816478       0.816480 (37)  0.816480 (41)  0.816479       0.816480 (40)
0.45                   0.816473       0.816473       0.816479       0.816479       0.816480 (39)
0.50                   0.816478       0.816480 (46)  0.816478       0.816478       0.816479

Figure 2. Control policy of u2 yielding I = 0.816478 for example 3 when Q = 5, showing the low sensitivity of the performance index to the control policy.

The computation time required for 60 iterations ranges from 219 s when M = 6 is used to 510 s when M = 14 is used. Note the low sensitivity of the performance index to the control policy: Figure 2 shows a control policy yielding a performance index I = 0.816478, obtained after a single pass when φ = 0.5 and R = 7 were chosen. This control policy is quite different from the optimal control policy in Figure 1, even though the performance index obtained in this case is only 0.0002% lower than the optimal value. This kind of low sensitivity of the performance index to a change in the control policy makes the problem very difficult to solve.

Next let us consider the case where there is no penalty for the use of inducer feed (Q = 0). In this case, u1 is not zero, and the search must be carried out on both u1 and u2. Using IDP with 10 stages of flexible lengths, Luus1 obtained I = 1.00960. Here, we took the initial control values to be u1(0) = u2(0) = 0 and the initial region sizes to be 0.5 for both controls. The smoothing factor was chosen to be 0.25. After four passes, each consisting of 50 iterations, the performance index I = 1.009600 was


achieved without any difficulty. The computation time required was 25 min. To further refine the control policy, several runs were made; the best control policy obtained, yielding I = 1.009604, is given in Figure 3. The value of u1 reaches a peak of 0.95 at stage 44 (corresponding to 8.6 ≤ t ≤ 8.8) and then decreases to zero, whereas u2 reaches its maximum of 1.0 also at stage 44 and remains at 1.0 until the final time.

Figure 3. Optimal control policy of u1 and u2 for example 3 when Q = 0.

Now, let us take the relative cost of the inducer to be just slightly greater than zero, i.e., Q = 0.002. Using the same IDP parameters as above, we obtained the performance index I = 1.007738. The optimal control policy for this case, shown in Figure 4, indicates that increasing the value of Q from zero by only a very small amount causes a significant reduction in the controls u1 and u2.



Figure 4. Optimal control policy of u1 and u2 for example 3 when Q = 0.002.

The peak value for u1 is reduced from 0.95 to 0.08, and the value of u2 rapidly decreases to zero after reaching its maximum value of 1.0 at stage 46 (corresponding to 9 ≤ t ≤ 9.2). Although Q penalizes the control u2 only, it is interesting to note that it affects u1 more strongly.

Conclusions

The use of combined deterministic and randomly chosen values for the control candidates in iterative dynamic programming improves convergence to the optimum for the three difficult chemical engineering problems chosen here. The effect of the smoothing factor φ was found to be problem-dependent, but an intermediate value of φ = 0.25 gave good results in all three examples. Even though care is still necessary in choosing the parameters for IDP, the ranges of the parameters yielding the global optimum become greater when the deterministic candidates are included, and the success rate of IDP in achieving the globally optimal solution is improved.

Acknowledgment

Financial support from the Natural Sciences and Engineering Research Council of Canada is gratefully acknowledged.

Nomenclature

A = expression defined by eq 17
B = expression defined by eq 18
f = nonlinear function of x and u, dimension (n × 1)
h = equality constraint defined in eq 23
I = performance index
J = augmented performance index
k = index used for stage number
m = number of control variables
M = total number of allowable values for the control
n = number of state variables
N = number of grid points used in IDP
P = number of time stages
Q = relative cost factor used in eq 51
r = region over which allowable values for the control are taken
R = number of randomly chosen allowable values for the control
RfP = nonlinear function defined by eq 48
s = shifting term used in the quadratic penalty function
t = time
tf = final time
u = scalar control variable
u = control vector, dimension (m × 1)
uc = candidate for the control
us = best control value
x = state vector, dimension (n × 1)

Greek Letters

αj = lower bound on control uj
βj = upper bound on control uj
γ = reduction factor for the control region after every iteration
ζ = nonlinear function defined in eq 47
η = region restoration factor used after every pass
θ = penalty factor
µ = nonlinear function defined in eq 45
φ = smoothing factor
Φ = performance index
ψ = nonlinear function defined in eq 46

Subscripts

f = final value
i = index used for control candidate number
j = index used for control variable number
k = index used for stage number

Superscripts

(0) = initial value
q = index used for iteration number

Literature Cited

(1) Luus, R. Iterative Dynamic Programming: From Curiosity to a Practical Optimization Procedure. Control Intelligent Syst. 1998, 26, 1.
(2) Bojkov, B.; Luus, R. Use of Random Admissible Values for Control in Iterative Dynamic Programming. Ind. Eng. Chem. Res. 1992, 31, 1308.
(3) Luus, R. Optimal Control by Dynamic Programming Using Accessible Grid Points and Region Reduction. Hung. J. Ind. Chem. 1989, 17, 523.
(4) Luus, R.; Dittrich, J.; Keil, F. J. Multiplicity of Solutions in the Optimization of a Bifunctional Catalyst Blend in a Tubular Reactor. Can. J. Chem. Eng. 1992, 70, 780.
(5) Bojkov, B.; Luus, R. Evaluation of the Parameters Used in Iterative Dynamic Programming. Can. J. Chem. Eng. 1993, 71, 451.
(6) Mekarapiruk, W.; Luus, R. Optimal Control of Final State Constrained Systems. Can. J. Chem. Eng. 1997, 75, 806.
(7) Luus, R. Numerical Convergence Properties of Iterative Dynamic Programming when Applied to High Dimensional Systems. Chem. Eng. Res. Des. 1996, 74, 55.
(8) Pham, Q. T. Dynamic Optimization of Chemical Engineering Processes by an Evolutionary Method. Comput. Chem. Eng. 1998, 22, 1089.
(9) Luus, R. Application of Iterative Dynamic Programming to Optimal Control of Nonseparable Problems. Hung. J. Ind. Chem. 1997, 25, 293.
(10) Hull, T. E.; Enright, W. D.; Jackson, K. R. User Guide to DVERK, a Subroutine for Solving Nonstiff ODE's. Report 100; Department of Computer Science, University of Toronto: Toronto, Canada, 1976.
(11) Hong, J. Optimal Substrate Feeding Policy for Fed Batch Fermentation with Substrate and Product Inhibition Kinetics. Biotechnol. Bioeng. 1986, 27, 1421.
(12) Hartig, F.; Keil, F. J.; Luus, R. Comparison of Optimization Methods for a Fed-Batch Reactor. Hung. J. Ind. Chem. 1995, 23, 141.
(13) Bojkov, B.; Luus, R. Optimal Control of Nonlinear Systems with Unspecified Final Times. Chem. Eng. Sci. 1996, 51, 905.
(14) Luus, R.; Hennessy, D. Optimization of Fed-Batch Reactors by the Luus-Jaakola Optimization Procedure. Ind. Eng. Chem. Res. 1999, 38, 1948.
(15) Luus, R. Handling Difficult Equality Constraints in Direct Search Optimization. Hung. J. Ind. Chem. 1996, 24, 285.
(16) Mekarapiruk, W.; Luus, R. Optimal Control of Inequality State Constrained Systems. Ind. Eng. Chem. Res. 1997, 36, 1686.
(17) Luus, R.; Storey, C. Optimal Control of Final State Constrained Systems. Proc. IASTED Int. Conf. on Modelling, Simulation and Control, Singapore, Aug 11-13, 1997; IASTED/Acta Press: Anaheim, CA, 1997; p 245.
(18) Luus, R.; Mekarapiruk, W.; Storey, C. Evaluation of Penalty Functions for Optimal Control. Proc. Int. Conf. on Optimization Techniques and Applications ICOTA '98, Perth, Australia, July 1-4, 1998; Curtin Printing Services: Curtin, Australia, 1998; p 724.
(19) Luus, R.; Jaakola, T. H. I. Optimization by Direct Search and Systematic Reduction of the Size of Search Region. AIChE J. 1973, 19, 760.
(20) Luus, R.; Bojkov, B. Global Optimization of the Bifunctional Catalyst Problem. Can. J. Chem. Eng. 1994, 72, 160.
(21) Lee, J.; Ramirez, W. F. Optimal Fed-Batch Control of Induced Foreign Protein Production by Recombinant Bacteria. AIChE J. 1994, 40, 899.

Received for review June 1, 1999
Revised manuscript received September 29, 1999
Accepted October 12, 1999

IE990381T