Ind. Eng. Chem. Res. 1993, 32, 859-865


PROCESS DESIGN AND CONTROL

Piecewise Linear Continuous Optimal Control by Iterative Dynamic Programming

Rein Luus
Department of Chemical Engineering, University of Toronto, Toronto, Ontario M5S 1A4, Canada

Iterative dynamic programming is extended to provide piecewise linear continuous control policies for systems described by sets of ordinary differential equations. To ensure continuity in the optimal control policy, only a single grid point is used for the state at each time stage. Three examples that are used to test the viability of the procedure show that the proposed procedure is attractive from the computational point of view and provides reliable results even for highly nonlinear systems that exhibit singular control.

Introduction

Iterative dynamic programming overcomes the curse of dimensionality usually associated with dynamic programming by allowing a very coarse grid to be used to generate accurate results. Recently Bojkov and Luus (1992) showed that 11 grid points were adequate for a system having 20 state variables and 20 control variables. From the preliminary design point of view, the nature of the optimal control policy does not matter, so the use of piecewise constant control is a reasonable approximation. However, in the implementation of the control policy, there are situations where the use of discontinuous control may be undesirable, as discussed by Teo et al. (1991) in considering the control of a container crane by a hoist motor and a trolley drive motor. Also in chemical engineering, we prefer to have a control policy that is continuous, rather than a policy that requires sudden switching from one level to another. Even when the optimal control policy is bang-bang, a simulation study of a fed-batch reactor by Luus (1991a) showed that very little is gained by very large switches. It is therefore desirable to carry out optimal control studies where a continuous control policy is specified. Teo et al. (1991) considered obtaining piecewise linear continuous control by introducing an extra set of differential equations with bounds on the derivatives. By iterative dynamic programming, however, no extra equations are necessary, since the conditions of piecewise linearity and continuity can be readily handled. The purpose of this paper is to extend iterative dynamic programming to situations where the resulting optimal control policy is piecewise linear and continuous, and to examine the effect of imposing such a condition on the control policy by considering several examples.

Problem Formulation

Let us consider the system described by the differential equation

dx/dt = f(x,u)   (1)

with x(0) given, where x is an (n × 1) state vector and u is an (m × 1) control vector bounded by

αi ≤ ui ≤ βi,  i = 1, 2, ..., m   (2)

The state variables are constrained by s inequality constraints of the type

ψi(x) ≤ 0,  i = 1, 2, ..., s   (3)

Associated with the system is a performance index

I[x(0), tf] = Φ(x(tf))   (4)

For most of the problems state inequality constraints do not exist, so for ease of presentation we first ignore eq 3. However, in the last numerical example we show how the state inequality can be incorporated into the algorithm. The optimal control problem is to find the control policy u(t) in the time interval 0 ≤ t < tf such that the performance index in eq 4 is minimized (or maximized), and such that u(t) is piecewise linear and continuous. To solve this optimal control problem by iterative dynamic programming, we divide the time interval [0, tf] into P time stages, each of length L, and seek a piecewise linear control policy in the time interval (tk, tk+1) given by

uj(t) = uj(k) + [uj(k+1) − uj(k)](t − tk)/L   (5)

where uj(k) is the value of uj at tk and uj(k+1) is the value of uj at tk+1. The optimal control problem then is to find u(k), k = 0, 1, ..., P − 1, such that the performance index in eq 4 is minimized. For the last stage we specify that u is kept constant at the value u(P−1). In iterative dynamic programming, eq 5 does not introduce any difficulties, since uj(k+1) is known from the previous stage, and all that remains is to find the best value for uj(k) at the beginning of the interval. To provide a continuous control policy, we shall use a single grid point at each time stage. There may be situations where convergence difficulties result, but from the point of view of simplicity this is a good compromise. To handle the inequality constraints in eq 3, the method of penalty functions, as used for final state constraints by Luus and Rosen (1991) and for general state constraints by Luus (1991b), can be used.

Iterative Dynamic Programming Algorithm for Piecewise Linear Control
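As a concrete illustration of the interpolation in eq 5, the stage-wise evaluation of the control can be written in a few lines of Python (our own sketch; the function name and arguments are not from the paper, and the last stage is held constant as specified above):

```python
def piecewise_linear_control(t, u_stage, L):
    """Evaluate a piecewise linear continuous control policy (eq 5).

    u_stage[k] holds u(k), the control value at time t_k = k*L, for
    k = 0, ..., P-1.  Between t_k and t_{k+1} the control is interpolated
    linearly; over the last stage it is held constant at u(P-1).
    """
    P = len(u_stage)
    k = min(int(t // L), P - 1)
    if k == P - 1:                        # final stage: constant control
        return u_stage[-1]
    frac = (t - k * L) / L                # fraction of stage k elapsed
    return u_stage[k] + (u_stage[k + 1] - u_stage[k]) * frac
```

With u_stage = [0.0, 1.0, 2.0] and L = 1, for example, the policy ramps linearly from 0 to 2 over the first two stages and stays at 2.0 over the last stage.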

The iterative dynamic programming procedure using systematic region contraction as described by Luus (1991a) is modified to impose the constraint of having a piecewise

0888-5885/93/2632-0859$04.00/0 © 1993 American Chemical Society


linear continuous control policy. The steps can be summarized in the following algorithm.

Algorithm:
1. Divide the time interval [0, tf] into P time stages, each of length L.
2. Choose the number of allowable values M for u.
3. Choose an initial profile for each ui, initial region size ri, and the contraction factor γ.
4. By using the initial control policy, integrate eq 1 from t = 0 to tf to generate the x trajectory and store the values of x at the beginning of each time stage, so that x(k−1) corresponds to the value of x at the beginning of stage k.
5. Starting at stage P, corresponding to time tf − L, integrate eq 1 from tf − L to tf, using as an initial state x(P−1) from step 4, once with each of the allowable values for the control vector, which is kept constant throughout the time interval. Choose the control u(P−1) that gives the minimum value for the performance index, and store the value for use in step 6.
6. Step back to stage P − 1, corresponding to time tf − 2L. For each allowable value of u(P−2), integrate eq 1 by using as an initial state x(P−2) chosen from step 4 and the control policy computed from eq 5, where u(P−1) is the stored value from step 5. Continue integration until t = tf, using for the last stage the constant value u(P−1) from step 5. Compare the M values for the performance index and choose the u(P−2) that gives the smallest value.
7. Continue the procedure until stage 1, corresponding to the initial time t = 0, is reached. As before, compare M values of the performance index and choose the control u(0) that gives the minimum value for the performance index. Store the corresponding x trajectory.
8. Reduce the region for allowable control

r(j+1) = γ r(j)   (6)

where j is the iteration index. Use the optimal control policy from step 7 as the midpoint for the allowable values for the control u at each stage.
9. Increment the iteration index j by 1 and go to step 5. Continue the procedure for a specified number of iterations, such as 30, and examine the results.
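For a scalar control, steps 1-9 can be sketched as below. This is a minimal illustration of the idea, not the author's code: the integrator is a crude fixed-step Euler scheme, the names `simulate_stage`, `candidates`, and `idp_piecewise_linear` are ours, and the candidate values follow the uniform scheme u*, u* ± (2i/(M − 1))r discussed below, with the trial control at stage k interpolated linearly toward the stored u(k+1) per eq 5:

```python
import numpy as np

def simulate_stage(f, x, u_a, u_b, L, n_sub=20):
    """Integrate dx/dt = f(x, u) over one stage of length L with the
    control varying linearly from u_a to u_b (explicit Euler)."""
    h = L / n_sub
    for i in range(n_sub):
        u = u_a + (u_b - u_a) * (i * h) / L
        x = x + h * f(x, u)
    return x

def candidates(u_best, r, M, lo, hi):
    """Uniform candidate scheme u*, u* +/- (2i/(M-1))r, clipped to the
    control bounds (M is assumed odd)."""
    offs = np.arange(-(M - 1) // 2, (M - 1) // 2 + 1)
    return np.clip(u_best + 2.0 * offs * r / (M - 1), lo, hi)

def idp_piecewise_linear(f, phi, x0, tf, P, M, r0, gamma, n_iter, lo, hi, u_init):
    """Steps 1-9 of the algorithm for a scalar control; phi(x(tf)) is minimized."""
    L = tf / P                                     # step 1
    u = np.full(P, u_init, dtype=float)            # step 3: initial policy
    r = r0
    for _ in range(n_iter):
        # step 4: forward pass, store x at the beginning of each stage
        x_stage = [np.asarray(x0, dtype=float)]
        for k in range(P):
            ub = u[k + 1] if k < P - 1 else u[k]   # last stage held constant
            x_stage.append(simulate_stage(f, x_stage[-1], u[k], ub, L))
        # steps 5-7: backward pass over the stages
        for k in range(P - 1, -1, -1):
            best_u, best_I = u[k], None
            for c in candidates(u[k], r, M, lo, hi):
                ub = u[k + 1] if k < P - 1 else c
                x = simulate_stage(f, x_stage[k], c, ub, L)
                for j in range(k + 1, P):          # continue with stored policy
                    ubj = u[j + 1] if j < P - 1 else u[j]
                    x = simulate_stage(f, x, u[j], ubj, L)
                I = phi(x)
                if best_I is None or I < best_I:
                    best_u, best_I = c, I
            u[k] = best_u
        r *= gamma                                 # step 8: contract the region
    return u
```

On the toy problem dx/dt = u, x(0) = 0, minimizing (x(1) − 1)², the routine quickly settles on u ≈ 1 at every stage.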
There are several methods that can be used to choose allowable values for the control at a particular stage. If the control is a scalar, then the easiest way to choose the M candidates is by choosing u*, u* ± (2i/(M − 1))r, i = 1, 2, ..., (M − 1)/2, where u* is the best value from the previous iteration. It is obvious that if u is not scalar, this scheme will run into difficulties when the dimensionality of u is high. This problem was resolved by Bojkov and Luus (1992), who suggested the use of random choices for u inside the allowable region. That worked very well and enabled a problem with 20 state variables and 20 control variables to be solved readily with iterative dynamic programming. In the present investigation we are interested in testing the viability of the use of piecewise linear control in iterative dynamic programming as outlined in the above algorithm, so the uniform procedure for choosing candidates for control is used throughout this investigation. To examine the viability of this algorithm, we consider several examples.

Example 1: Optimal Control of a CSTR. Let us first consider the nonlinear CSTR considered by Luus and Cormack (1972), where the equations describing the chemical reactor are given by

dx1/dt = −(2 + u)(x1 + 0.25) + (x2 + 0.5) exp[25x1/(x1 + 2)]   (7)


Figure 1. Convergence of the proposed algorithm with M = 3, γ = 0.75.

dx2/dt = 0.5 − x2 − (x2 + 0.5) exp[25x1/(x1 + 2)]   (8)

dx3/dt = x1² + x2² + 0.1u²   (9)

with the initial state

x(0) = [0.09 0.09 0.0]T   (10)

The optimal control problem is to choose the scalar control u in the time interval 0 ≤ t < tf to minimize

I = x3(tf)   (11)

where tf = 0.78. Employing a control vector iteration procedure based on Pontryagin's maximum principle, Luus and Cormack (1972) showed that there exists a local minimum of I = 0.244 425 and a global minimum of I = 0.133 094. By using dynamic programming in an iterative fashion, with an (11 × 11) grid for the variables x1 and x2 and 21 allowable values for control, Luus (1990) obtained I = 0.133 21 by using 78 stages. However, the computation time was prohibitively high, almost 3 h on the CRAY XMP/24 computer. By using only accessible states for the grid elements as proposed by Luus (1989), Luus and Galli (1991) were able to obtain I = 0.133 16 with 80 stages, using only 2 min of computation time on the CRAY. This is a well-suited problem for piecewise linear continuous control because the optimal control policy itself is smooth and continuous. As was done by Luus and Galli (1991), it was decided to use a multipass method where the number of stages is doubled after every pass, each pass consisting of 30 iterations, and the best control policy obtained is used as an initial control policy for the next pass. For the initial pass P = 10 was chosen and four passes were made. The initial control policy u = 2 was used and r(0) = 2 was chosen with a reduction factor γ = 0.75. Three allowable values, M = 3, were chosen as the candidates for control. From an initial value of 0.416 75, the performance index was rapidly reduced to 0.135 31 at the end of the first pass, as shown in Figure 1. At the end of the second pass
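The reactor dynamics are easy to simulate directly. The sketch below (ours, not from the paper) integrates eqs 7 and 8, plus the quadrature state x3 whose terminal value is the performance index, with the constant initial policy u = 2, using a classical fourth-order Runge-Kutta step; the integrand of x3 is assumed to be the standard Luus-Cormack form x1² + x2² + 0.1u²:

```python
import math

def cstr_rhs(x, u):
    """Right-hand sides for the Luus-Cormack CSTR (eqs 7 and 8); the x3
    integrand x1^2 + x2^2 + 0.1 u^2 is the standard assumed form."""
    x1, x2, x3 = x
    e = math.exp(25.0 * x1 / (x1 + 2.0))
    return [-(2.0 + u) * (x1 + 0.25) + (x2 + 0.5) * e,
            0.5 - x2 - (x2 + 0.5) * e,
            x1 ** 2 + x2 ** 2 + 0.1 * u ** 2]

def rk4_step(rhs, x, u, h):
    """One classical fourth-order Runge-Kutta step for dx/dt = rhs(x, u)."""
    k1 = rhs(x, u)
    k2 = rhs([xi + 0.5 * h * ki for xi, ki in zip(x, k1)], u)
    k3 = rhs([xi + 0.5 * h * ki for xi, ki in zip(x, k2)], u)
    k4 = rhs([xi + h * ki for xi, ki in zip(x, k3)], u)
    return [xi + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

# integrate the initial state from t = 0 to tf = 0.78 with constant u = 2
x, n = [0.09, 0.09, 0.0], 780
for _ in range(n):
    x = rk4_step(cstr_rhs, x, 2.0, 0.78 / n)
# x[2] is the performance index I for this (non-optimal) constant policy
```

The final value of x[2] can be compared against the starting performance index of 0.416 75 quoted above for this policy.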


Figure 2. Continuation of the run for passes 3 and 4, yielding I = 0.133 12.

with P = 20, I = 0.133 52 was obtained. The computation time for the first two passes was 35.2 s on the 486/33 personal computer. The computations were continued for two more passes to provide a more refined control policy. As is shown in Figure 2, the convergence is systematic and rapid. The computation time for these two additional passes was 124.5 s on the personal computer. The resulting value of 0.133 12 for the performance index at the end of the fourth pass with P = 80 is very close to the optimum of 0.133 094 obtained by Luus and Cormack (1972) by using the second variation method in control vector iteration. Figure 3 shows the advantage of using piecewise linear continuous control rather than piecewise constant control. It is noted that the extrapolated value for each curve is approximately 0.133 09. The piecewise linear control policy is given in Figure 4. As can be seen, the policy obtained with 10 stages is very close to the optimal. When the procedure was repeated with the same values for all parameters, except with u = 1 as the initial control policy, convergence to the local optimum with I = 0.244 41 was obtained. This is marginally better than the 0.244 425 reported by Luus and Cormack (1972). As was shown by Luus and Galli (1991), depending on the initial control policy and the initial region size, one may get either the local optimum or the global optimum, but the global optimum is very easily obtained for this system.

Example 2: Nondifferentiable System. We next consider the system used by Thomopoulos and Papadakis (1991), who showed the difficulty of convergence of several optimization techniques. The system is described by

dx1/dt = x2   (12)

dx2/dt = −x1 − x2 + u + d   (13)

dx3/dt = 5x1² + 2.5x2² + 0.5u²   (14)

with

x(0) = [0 0 0]T   (15)

where the disturbance term is given by

d = 100[U(t−0.5) − U(t−0.6)]   (16)

This means that there is a rectangular pulse of magnitude 100 applied from t = 0.5 until t = 0.6.
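This system is simple to simulate. The sketch below (our own, not from the paper) integrates the equations with explicit Euler under the zero-control policy; the integrand coefficients 5, 2.5, and 0.5 assumed here reproduce, approximately, the starting performance index of about 220 quoted in the discussion below:

```python
def rhs(t, x, u):
    """Right-hand sides of eqs 12-14; d is the rectangular pulse of eq 16."""
    d = 100.0 if 0.5 <= t < 0.6 else 0.0
    x1, x2, x3 = x
    return [x2,
            -x1 - x2 + u + d,
            5.0 * x1 ** 2 + 2.5 * x2 ** 2 + 0.5 * u ** 2]

# explicit Euler from t = 0 to tf = 2.0 s with the initial policy u = 0
x, t, n = [0.0, 0.0, 0.0], 0.0, 4000
h = 2.0 / n
for _ in range(n):
    x = [xi + step * fi for xi, step, fi in zip(x, [h, h, h], rhs(t, x, 0.0))]
    t += h
# x[2] approximates I for u = 0; the pulse alone drives the cost
```

The state stays at the origin until the pulse arrives at t = 0.5, after which the disturbance excites the oscillatory modes and the cost accumulates.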


Figure 3. Comparison of piecewise linear continuous control to piecewise constant control, plotted against 1/P²: (o-o) piecewise linear; (Δ- - -Δ) piecewise constant.


Figure 4. Piecewise linear control policies: (-) 10-stage approximation, I = 0.135 31; (- -) 40-stage approximation, I = 0.133 19.


The problem is to find the control u such that x3(tf) is minimized, where tf = 2.0 s. Thomopoulos and Papadakis reported a minimum value of 58.538 for the performance index. The discontinuities of the Hamiltonian at t = 0.5 and t = 0.6 prompted the authors to use an exponential smoothing procedure. In iterative dynamic programming, however, no differentiation is required, and therefore no special procedures are necessary. For iterative dynamic programming, we chose γ = 0.75, M = 3, an initial control policy u = 0.0, an initial region size r(0) = 5.0, and a three-pass method, starting with P = 10. As is shown in Figure 5, the convergence in the first pass was very rapid, from the initial value of 220.39 to 58.27 within 10 iterations. In the second pass with P = 20, the initial region size was restored to 5.0 and, as an initial control policy, the optimal control policy obtained in the first pass was used. The performance index



Figure 7. Optimal piecewise constant control policy for example 2, I = 58.20.


Figure 5. Convergence of the proposed algorithm for example 2 with M = 3, γ = 0.75, P = 10.

was reduced to 58.19. In the third pass a minimum of 58.18 was reached. Although 20 iterations were allowed for each pass, the convergence to the minimum value in each pass was achieved within 10 iterations. The total computation time for this three-pass run was 68.1 s on the 486/33 personal computer. The resulting piecewise linear control policy is given in Figure 6. For comparison, a piecewise constant control policy was also run. After three passes requiring 58.3 s of CPU time, a performance index of 58.59 was reached after the first pass, 58.28 after the second, and 58.20 at the end of the third pass with P = 40. The resulting control policy is given in Figure 7. It is obvious for this problem that the piecewise linear continuous control policy is considerably better. The state variables are given in Figure 8. The results are slightly better than those reported by Thomopoulos and Papadakis (1991), and there are no approximations other than the linear fitting of the control policy over very short time segments.

Figure 6. Optimal piecewise linear control policy for example 2, I = 58.18.

Example 3: Fed-Batch Fermenter. As a third example to test the viability of piecewise linear continuous control, we consider a fed-batch reactor for which the optimal control policy is discontinuous. We therefore consider the fed-batch fermenter involving biosynthesis of penicillin as studied by Lim et al. (1986) and revised by Cuthrell and Biegler (1989). The fermenter is described by the four differential equations

dx1/dt = h1x1 − u[x1/(500x4)]   (17)

dx2/dt = h2x1 − 0.01x2 − u[x2/(500x4)]   (18)

dx3/dt = −h1x1/0.47 − h2x1/1.2 − x1[0.029x3/(0.0001 + x3)] + (u/x4)(1 − x3/500)   (19)

dx4/dt = u/500   (20)

with

h1 = 0.11[x3/(0.006x1 + x3)]   (21)

h2 = 0.0055[x3/(0.0001 + x3(1 + 10x3))]   (22)

where the numerical values for the parameters given by Cuthrell and Biegler have been substituted directly into the equations. The state variables are the concentrations of biomass, product, and substrate, and the volume, respectively. The initial state is

x(0) = [1.5 0 0 7]T   (23)

The constraints on the feed rate are
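The fermenter model translates directly into code. The sketch below is our own illustration: it assumes the standard Cuthrell-Biegler parameter values (yield coefficients 0.47 and 1.2, feed substrate concentration of 500, and the specific rates h1 and h2 shown above), and the constant feed u = 10 with a 10-h horizon are arbitrary choices for demonstration, not values from the paper:

```python
def fermenter_rhs(x, u):
    """Right-hand sides of eqs 17-20, with h1 and h2 from eqs 21 and 22."""
    x1, x2, x3, x4 = x
    # guard avoids 0/0 in the unlikely case that x1 and x3 both vanish
    h1 = 0.11 * x3 / (0.006 * x1 + x3) if (0.006 * x1 + x3) > 0.0 else 0.0
    h2 = 0.0055 * x3 / (0.0001 + x3 * (1.0 + 10.0 * x3))
    return [h1 * x1 - u * x1 / (500.0 * x4),
            h2 * x1 - 0.01 * x2 - u * x2 / (500.0 * x4),
            -h1 * x1 / 0.47 - h2 * x1 / 1.2
            - x1 * 0.029 * x3 / (0.0001 + x3)
            + (u / x4) * (1.0 - x3 / 500.0),
            u / 500.0]

# Euler integration of the initial state of eq 23 with a constant feed u = 10
x, h = [1.5, 0.0, 0.0, 7.0], 0.01
for _ in range(1000):                     # 10-h horizon
    x = [xi + h * fi for xi, fi in zip(x, fermenter_rhs(x, 10.0))]
# the volume grows linearly: x4 = 7 + 10 t / 500
```

Note that the dilution terms in eqs 17 and 18 follow directly from the volume balance dx4/dt = u/500, which is why the volume trajectory is exactly linear for a constant feed.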



Figure 8. Optimal state variable trajectories for example 2: (-) x1; (- -) x2.

0 ≤ u ≤ 50   (24)

The constraints on the state variables are

0 ≤ x1 ≤ 40   (25)

0 ≤ x3 ≤ 25   (26)

0 ≤ x4 ≤ 10   (27)

The performance index to be maximized is the total amount of penicillin produced at time tf, given by

I = x2(tf) x4(tf)   (28)

The optimal control problem is to find the final time tf and the control policy u(t) in the time interval 0 ≤ t < tf that will give a maximum value to the performance index given in eq 28. The problem in this form corresponds to Case III of Cuthrell and Biegler (1989). To solve this optimal control problem by iterative dynamic programming, we divide the time interval into P stages of equal length

L = tf/P   (29)

and seek a piecewise linear continuous control policy for which we have to determine the values u(0), u(1), ..., u(P−1) that maximize the performance index. If P is taken to be sufficiently large, then a good approximation to the optimal control policy should be obtained. It is proposed to handle the state constraints imposed by eqs 25-27 by using penalty functions as outlined by Luus and Rosen (1991) and by Luus (1991b). By using the same penalty function factor θ for each of the constraints, we introduce the augmented performance index J (eq 30), obtained by subtracting from I a penalty term proportional, through θ, to the violations of each of the state constraints in eqs 25-27 accumulated over all P stages. The optimal control problem now is to find the point values u(0), u(1), ..., u(P−1) that maximize the augmented performance index. Since the equations are highly nonlinear, it was decided to use for integration the DVERK subroutine as developed by Hull et al. (1976). The use of a 10⁻⁶ tolerance gave reliable results. We chose P = 20, θ = 40, and γ = 0.70. For this problem it is necessary to have a good initial control policy in order to get reliable results. Therefore, the starting control policy was taken from the optimal piecewise constant control policy corresponding to tf = 132 h. The initial region size was chosen to be 1.0 and 20 iterations were allowed. A performance index of 87.842 was obtained in 38 min of CPU time on the 486/33 personal computer. Another run was made with an initial region size of 0.2, yielding 87.851 for the performance index. This value, as compared to the 87.69 obtained by Cuthrell and Biegler (1989), is a significant improvement, but is not quite as good as the 87.948 obtained by iterative dynamic programming using piecewise constant control. Therefore, there is a loss of only 0.11% in performance from imposing linear continuous control. The comparison of piecewise linear and piecewise constant control policies is shown in Figure 9, and the corresponding state variables are given in Figure 10. It is difficult to determine the optimal batch time tf, because in the range tf = 129.5 h to tf = 133.5 h the optimal performance index is between 87.85 and 87.86 and the peak is not well-defined, as shown in Figure 11. For tf below 129.5 h, however, the performance index starts to decrease noticeably, reaching a value of 87.80 at tf = 128 h. It is interesting to examine the effect of the volume constraint on the performance index. The volume constraint was relaxed in increments of 0.01, and optimization was carried out with a batch time tf = 129.5 h.
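One concrete way to build such an augmented index is sketched below. This is our own illustration: the quadratic form of the violation terms is an assumption on our part, while the single factor θ and the constraint bounds follow the text and eqs 25-27:

```python
def augmented_index(I, x_stage, theta):
    """Augmented performance index for maximization: subtract theta times
    the accumulated state-constraint violations of eqs 25-27.

    x_stage holds the state vector x(k) at each stage boundary.
    A quadratic violation term is assumed (one common choice).
    """
    penalty = 0.0
    for x in x_stage:
        penalty += max(0.0, x[0] - 40.0) ** 2     # x1 <= 40 (eq 25)
        penalty += max(0.0, x[2] - 25.0) ** 2     # x3 <= 25 (eq 26)
        penalty += max(0.0, x[3] - 10.0) ** 2     # x4 <= 10 (eq 27)
        penalty += max(0.0, -x[0]) ** 2           # lower bounds of zero
        penalty += max(0.0, -x[2]) ** 2
        penalty += max(0.0, -x[3]) ** 2
    return I - theta * penalty
```

A feasible trajectory leaves I unchanged; for example, a single stage with x4 = 10.5 and θ = 40 costs 40 × 0.5² = 10 units of performance, which strongly discourages overfilling the reactor.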
The results shown in Figure 12 indicate that the performance index can be increased by 3% to 90.5 by increasing the volume capacity of the tank by 1% to 10.1 L. The increase is mostly due to the higher concentration of the product. A similar increase can be obtained by lowering the initial volume in the reactor from the given value of 7 L to 6.9 L, as is shown in Figure 13. The control policy corresponding to the 6.9-L initial condition is given in Figure 14. The performance index of 90.521 at 6.9 L is 3% higher than the 87.845 obtained with the given initial volume of 7.0 L. This shows the importance of examining both the design possibilities and the operating possibilities for obtaining an optimal system. Although only a single state grid point was used at each time stage to provide a continuous control policy, no convergence difficulties were experienced for these three examples. If, however, the dimensionality of the system is very high and there are many control variables, it is recommended to use a multipass method where the number of stages is kept constant and the algorithm is used repeatedly. After each pass the initial region size for the control is reset to a smaller value than what was used at the beginning of the previous pass. Conceptually there appears to be no difficulty in handling optimal control


Figure 9. Comparison of optimal piecewise linear continuous control to optimal piecewise constant control for example 3 with tf = 132 h: (- - -) piecewise constant; (-) piecewise linear.


Figure 10. State variable profiles with optimal piecewise linear control with tf = 132 h: (-) x1; (- -) x2; (- - -) x3; (-- --) x4.


problems of very high dimension by taking a relatively small number of test points at each grid point. Then, by using such a multipass method, convergence to some specified tolerance should be possible. Research in this area is continuing.

Conclusions

The use of piecewise linear continuous control in iterative dynamic programming provides a good means of approximating the optimal control policy. If the optimal control policy is continuous, as is the case in the first two examples, the piecewise linear continuous control policy provides a better approximation than piecewise constant control. If the optimal control policy is discontinuous as in the last example, very little is lost by using a piecewise linear continuous control instead of piecewise constant control, since the extra benefit obtained by very rapid bang-bang type of control is quite negligible. Since piecewise linear control is easier to implement, the present work extends the usefulness of iterative dynamic programming.


Figure 12. Effect of increasing the volume capacity for the reactor; tf = 129.5 h.

Acknowledgment

Financial support from the Natural Sciences and Engineering Research Council of Canada, Grant A-3515, is gratefully acknowledged.

Nomenclature

d = disturbance term

f = nonlinear function of x and u (n × 1)
I = performance index
i = general index, variable number
j = general index, iteration number
J = augmented performance index
k = general index, stage number
L = length of stage
m = number of control variables
M = number of allowable values for the control vector
n = number of state variables
p = penalty factors used in augmented performance index


t = time
t0 = initial time
tf = final time
T = temperature, K
u = control vector (m × 1)
U = unit function
x = state vector (n × 1)

Greek Letters

αi = lower bound on control ui
βi = upper bound on control ui

γ = reduction factor for the control region
θ = penalty function factor
Φ = function of final state used for performance index
ψi = inequality constraint

Literature Cited


Figure 13. Effect of changing the initial volume in the reactor; tf = 129.5 h.


Figure 14. Piecewise linear optimal control policy with initial volume of 6.9 L, yielding a performance index of 90.521.

P = number of stages
r = region over which control is taken (m × 1)
s = number of inequality constraints

Bojkov, B.; Luus, R. Use of Random Admissible Values for Control in Iterative Dynamic Programming. Ind. Eng. Chem. Res. 1992, 31, 1308-1314.
Cuthrell, J. E.; Biegler, L. T. Simultaneous Optimization and Solution Methods for Batch Reactor Control Profiles. Comput. Chem. Eng. 1989, 13, 49-62.
Hull, T. E.; Enright, W. D.; Jackson, K. R. "User's Guide for DVERK: A Subroutine for Solving Nonstiff ODEs"; Report 100; Department of Computer Science, University of Toronto: Toronto, Canada, 1976.
Lim, H. C.; Tayeb, Y. J.; Modak, J. M.; Bonte, P. Computational Algorithms for Optimal Feed Rates for a Class of Fed-Batch Fermentation: Numerical Results for Penicillin and Cell Mass Production. Biotechnol. Bioeng. 1986, 28, 1408-1420.
Luus, R. Optimal Control by Dynamic Programming Using Accessible Grid Points and Region Contraction. Hung. J. Ind. Chem. 1989, 17, 523-543.
Luus, R. Optimal Control by Dynamic Programming Using Systematic Reduction in Grid Size. Int. J. Control 1990, 51, 995-1013.
Luus, R. Effect of the Choice of Final Time in Optimal Control of Nonlinear Systems. Can. J. Chem. Eng. 1991a, 69, 144-151.
Luus, R. Application of Iterative Dynamic Programming to State Constrained Optimal Control Problems. Hung. J. Ind. Chem. 1991b, 19, 245-254.
Luus, R.; Cormack, D. E. Multiplicity of Solutions Resulting from the Use of Variational Methods in Optimal Control Problems. Can. J. Chem. Eng. 1972, 50, 309-311.
Luus, R.; Galli, M. Multiplicity of Solutions in Using Dynamic Programming for Optimal Control. Hung. J. Ind. Chem. 1991, 19, 55-62.
Luus, R.; Rosen, O. Application of Dynamic Programming to Final State Constrained Optimal Control Problems. Ind. Eng. Chem. Res. 1991, 30, 1525-1530.
Teo, K. L.; Goh, C. J.; Wong, K. H. A Unified Computational Approach to Optimal Control Problems; Pitman Monographs and Surveys in Pure and Applied Mathematics 55; Longman Scientific & Technical: Essex, England, 1991; pp 226-230.
Thomopoulos, S. C. A.; Papadakis, I. N. M.
A Single Shot Method for Optimal Step Computation in Gradient Algorithms. Proceedings of the 1991 American Control Conference; Automatic Control Council, IEEE Service Center: Piscataway, NJ, 1991; pp 2419-2422.

Received for review September 9, 1992
Revised manuscript received December 22, 1992
Accepted January 25, 1993