
ASPECTS OF THE FORWARD DYNAMIC PROGRAMMING ALGORITHM

JOHN H. SEINFELD¹ AND LEON LAPIDUS
Department of Chemical Engineering, Princeton University, Princeton, N. J.

¹ Present address: Department of Chemical Engineering, California Institute of Technology, Pasadena, Calif.

The computational features of the forward dynamic programming algorithm as applied to a nonautonomous optimal control problem are discussed and a detailed numerical example is presented.

DYNAMIC programming may be used for solving multistage optimization and control problems. Bellman's principle of optimality leads to a computer algorithm in which the last stage is optimized first (Aris, 1964; Bellman, 1957) (the backward algorithm). However, to begin at the last stage, knowledge of the final state or stage number is required. Thus, in certain nonautonomous systems, solution of the optimization or control problem by this customary procedure is extremely time-consuming, because the dynamic programming algorithm must be solved over and over, as in a boundary-value problem. A forward dynamic programming algorithm, in which the first stage is optimized initially, exists as a dual of, although much less known than, the backward algorithm. This dual technique has been discussed by Dreyfus (1965) and Nemhauser (1966). Bhavnani and Chen (1966) noted the utility of the forward approach in solving certain time-dependent control problems with fixed and free final times. Aris, Nemhauser, and Wilde (1964) have discussed the general technique of working from the fixed to the free end of straight-chain, cyclic, and branching systems using dynamic programming.

In the present paper, the authors discuss the computational features of the forward dynamic programming algorithm as applied to a nonautonomous optimal control problem. To help in this discussion a detailed numerical example is presented; consideration is given to effective means of reducing the computer storage problem normally associated with dynamic programming. Since computation time (Lapidus and Luus, 1967) may often be a major drawback in the use of dynamic programming, it is also pointed out how this time may vary significantly between the use of the forward and backward algorithms.

Principle of Optimality

Consider a system governed by the differential equations

\frac{dx}{dt} = f[x(t), u(t), t]     (1)

where x(t) is a p-dimensional state vector and u(t) an r-dimensional control vector. In terms of an optimal control problem, u(t) ∈ U must be chosen such that the scalar performance index

I[u(t), t_f] = \Phi[x(t_f)] + \int_0^{t_f} J[x(t), u(t)]\, dt     (2)

is minimized.

In discrete notation, Equations 1 and 2 become

x_{n+1} - x_n = f(x_n, u_n, n\Delta t)     (3)

and

I[u_n, N] = \Phi(x_N) + \sum_{n=1}^{N-1} J(x_n, u_n)\,\Delta t     (4)

where N is the number of time stages. In terms of the well known backward dynamic programming algorithm, Bellman's principle of optimality states (Bellman, 1957): "An optimal policy has the property that whatever the initial state and initial decisions are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." Application of Bellman's principle to the problem detailed above leads to the recurrence relation

F_{N-n}(x_n) = \min_{u_n} [J(x_n, u_n)\,\Delta t + F_{N-n-1}(x_{n+1})]     (5)

where F_{N-n}(x_n) is the cumulative value of the performance index, Equation 4, obtained by using the optimal policy from stage n to stage N, with x_n the state resulting from the nth stage. In this form the explicit algorithm embedded in Equation 5 requires a backward pass starting from stage N (the last stage) and working toward stage 1 (the initial stage). A second pass in the opposite direction then develops the explicit optimal stage information. This is called the backward algorithm.

Bhavnani and Chen (1966) have formulated a dual to the above principle of optimality: "An optimal policy has the property that whatever the ensuing state and decisions are, the preceding decisions must constitute an optimal policy with regard to the state existing before the last decision."

When applied to the system of Equations 3 and 4, this leads to a forward recurrence relation

H_n(x_n) = \min_{u_{n-1}} [J(x_{n-1}, u_{n-1})\,\Delta t + H_{n-1}(x_{n-1})]     (6)

where H_n(x_n) is the cumulative value of the performance index, Equation 4, obtained by using the optimal policy from stage 1 to stage n, with x_n the state resulting from the nth stage.

In this form the explicit algorithm embedded in Equation 6 requires a forward pass starting from stage 1 and working toward stage N. A second pass in the opposite direction then develops the explicit optimal stage information. This is called the forward algorithm.
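To make the duality between Equations 5 and 6 concrete, the following minimal sketch (an illustration added here, not part of the original paper) applies both recursions to a small discrete system; the state grid, control set, stage cost J, and terminal cost Φ are all assumed for the example. The optimal value of Equation 4 obtained by the backward pass from a fixed initial state coincides with the value recovered by the forward pass when the terminal term is added over the reachable final states.

```python
# Minimal sketch (toy data, not from the paper): the backward recursion of
# Equation 5 and the forward recursion of Equation 6 give the same optimum.
STATES = range(-5, 6)        # admissible discrete states x_n
CONTROLS = (-1, 0, 1)        # admissible discrete controls u_n
N = 6                        # number of stages
DT = 1.0                     # time increment

def step(x, u):
    """Discrete dynamics corresponding to Equation 3 (clamped to the grid)."""
    return max(min(x + u, 5), -5)

def J(x, u):
    """Assumed stage cost."""
    return x * x + u * u

def Phi(x):
    """Assumed terminal cost."""
    return (x - 3) ** 2

def backward(x0):
    """Equation 5: cost-to-go F, built from the last stage toward the first."""
    F = {x: Phi(x) for x in STATES}
    for _ in range(N):
        F = {x: min(J(x, u) * DT + F[step(x, u)] for u in CONTROLS) for x in STATES}
    return F[x0]

def forward(x0):
    """Equation 6: cost-to-arrive H, built from the first stage toward the last."""
    H = {x0: 0.0}
    for _ in range(N):
        H_new = {}
        for x, cost in H.items():
            for u in CONTROLS:
                x_next = step(x, u)
                cand = cost + J(x, u) * DT
                if cand < H_new.get(x_next, float("inf")):
                    H_new[x_next] = cand
        H = H_new
    # add the terminal term of Equation 4 and minimize over reachable final states
    return min(cost + Phi(x) for x, cost in H.items())

print(backward(0), forward(0))   # both print the same optimal value of Equation 4
```

The second pass that recovers the explicit optimal policy is omitted here for brevity; it is illustrated for the numerical example below.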

Numerical Example

Development of Algorithm. As an example of the application of the forward dynamic programming algorithm we consider a first-order exothermic reversible catalytic reaction carried out in a batch reactor. The catalyst involved is known to decay with time. Often such a catalyst deterioration can be accounted for by an exponential time term that multiplies the frequency factor of the rate constant for the forward reaction. Thus, we assume that the course of the reaction can be described by

\frac{dx}{dt} = k_1^0 e^{-\alpha t} e^{-E_1/RT} (1 - x) - k_2^0 e^{-E_2/RT} x     (7)

where x is the dimensionless extent of reaction and α is an empirical factor determined by the rate of catalyst decay. It is desired to determine the temperature progression T(t) such that, starting from the pure reactant [x(0) = 0], we achieve a desired conversion β in the least possible time t_f. We note that the system is time-dependent with an unknown final time t_f. If we let Φ = 0 and J = 1 in Equation 2 or 4, I represents the minimum time. Since t_f is unknown, the backward algorithm initially requires a guess of t_f at x(t_f) = β and is subsequently worked backward until x = 0. In general, t will not also equal zero when x = 0, so the procedure must be repeated with a new guess for t_f. However, the forward algorithm will handle this problem with one pass, significantly reducing the over-all time of computation. If we denote the right-hand side of Equation 7 as R(x, T, t), the performance index can be stated as

t_f = \int_0^{\beta} \frac{dx}{R(x, T, t)}     (8)
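For orientation, the sketch below (an addition, not from the paper) estimates the time required by a constant-temperature policy by marching in x with small steps, a crude evaluation of the integral in Equation 8 along the corresponding trajectory. The parameter values are those listed later for the example, with two assumptions carried through all of these sketches: k_2^0 is read as 10^{10}, and E_1 and E_2 are used directly as E/R in kelvins. Policies for which the net rate R falls to zero before x reaches β cannot meet the conversion target under this model; the code reports None in that case.

```python
import math

# Assumed parameter readings (k20 as 1e10; E1, E2 as E/R in kelvins).
K10, K20 = 1.0e5, 1.0e10
E1, E2 = 6000.0, 12000.0
ALPHA = 2.0
BETA = 0.1

def R(x, T, t):
    """Right-hand side of Equation 7."""
    return (K10 * math.exp(-ALPHA * t) * math.exp(-E1 / T) * (1.0 - x)
            - K20 * math.exp(-E2 / T) * x)

def batch_time(T_const, dx=0.001):
    """Approximate Equation 8 for a constant temperature by marching in x."""
    t, x = 0.0, 0.0
    while x < BETA - 1e-12:
        r = R(x, T_const, t)
        if r <= 0.0:
            return None          # net rate vanished; beta is unreachable at this T
        t += dx / r              # one term of the sum replacing the integral
        x += dx
    return t

for T in (595.0, 620.0, 650.0, 665.0):
    print(T, batch_time(T))
```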

A simple finite difference representation of Equation 7 yields

t_{n+1} - t_n = \frac{\Delta x}{R(x_n, T_n, t_n)}     (9)

with the integral in Equation 8 replaced by the summation

t_N = \sum_{n=0}^{N-1} \frac{\Delta x}{R(x_n, T_n, t_n)}     (10)

The recursion relation for the forward algorithm becomes

H_n(t_n) = \min_{T_{n-1}} \left[ \frac{\Delta x}{R(x_{n-1}, T_{n-1}, t_{n-1})} + H_{n-1}(t_{n-1}) \right]     (11)

H_0(t_0 = 0) = 0

To apply the algorithm, Equation 11, we select a value for Δx and, for each assumed value of t_n, we search the admissible values of T_{n-1}, calculating t_{n-1}. This will yield a value for the argument of the minimum operator. Assuming t_n and calculating t_{n-1} corresponds to calculating the input of a system given the output. This technique, called state inversion (Aris et al., 1964; Wilde, 1965), may become cumbersome for a multidimensional system. In this particular example, however, a simple Newton-Raphson iteration procedure was employed with excellent results.
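As an illustration of the state inversion step just described, the following sketch (added here, not the authors' code) solves Equation 9 for t_{n-1}, given x_{n-1}, a trial T_{n-1}, and an assumed value of t_n, by Newton-Raphson iteration. The same assumed parameter readings as before are used (k_2^0 as 10^{10}; E_1, E_2 as E/R in kelvins, an interpretation that reproduces the first-stage times of Table I).

```python
import math

# Assumed parameter readings (k20 read as 1e10; E1, E2 used as E/R in kelvins).
K10, K20 = 1.0e5, 1.0e10
E1, E2 = 6000.0, 12000.0
ALPHA = 2.0

def R(x, T, t):
    """Right-hand side of Equation 7: net rate with catalyst decay."""
    return (K10 * math.exp(-ALPHA * t) * math.exp(-E1 / T) * (1.0 - x)
            - K20 * math.exp(-E2 / T) * x)

def dR_dt(x, T, t):
    """Partial derivative of R with respect to t (only the decay term depends on t)."""
    return -ALPHA * K10 * math.exp(-ALPHA * t) * math.exp(-E1 / T) * (1.0 - x)

def invert_state(t_n, x_prev, T_prev, dx, tol=1e-12, itmax=50):
    """Solve Equation 9, t_n - t = dx / R(x_prev, T_prev, t), for t = t_{n-1}."""
    t = t_n                                   # initial guess
    for _ in range(itmax):
        r = R(x_prev, T_prev, t)
        if r <= 0.0:
            return None                       # T_prev cannot produce this transition
        g = t_n - t - dx / r                  # residual of Equation 9
        dg = -1.0 + dx * dR_dt(x_prev, T_prev, t) / r**2
        t_new = t - g / dg
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return None

# Example: find t_{n-1} at x_{n-1} = 0.05 such that one step at 650 K with
# dx = 0.01 arrives at t_n = 0.01 hour.
print(invert_state(0.01, 0.05, 650.0, 0.01))
```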

Unconstrained Exact Optimal Solution. The current problem without bounds on the temperature may actually be solved exactly. The procedure used here is to follow the maximum principle formulation (Lapidus and Luus, 1967) for the optimal solution. To eliminate the time dependence we introduce a new state variable, y(t), defined by

\dot{y}(t) = 1, \qquad y(0) = 0     (12)

so that the performance index to be minimized is merely y(t_f). The Hamiltonian, Ĥ, is given by

\hat{H} = z_1 \left[ k_1^0 e^{-\alpha y} e^{-E_1/RT} (1 - x) - k_2^0 e^{-E_2/RT} x \right] - z_2     (13)

where z_1 and z_2 are the customary adjoint variables. To minimize y(t_f) we maximize Ĥ; this may be done by setting ∂Ĥ/∂T = 0 for unconstrained variation. This yields (from Equation 13) the optimal temperature policy

T = \frac{E_2 - E_1}{R \ln \left[ \dfrac{k_2^0 E_2 x}{k_1^0 E_1 e^{-\alpha y} (1 - x)} \right]}     (14)

For the following selected parameters,

k_1^0 = 10^5, E_1 = 6000, k_2^0 = 10^{10}, E_2 = 12000, α = 2, β = 0.1

the optimal unbounded temperature policy as generated by Equation 14 is shown in Figure 1 (the solid line); the temperature is plotted as a function of time with t_f indicated.
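As a numerical check of the stationary condition behind Equation 14, the following sketch (an addition, not from the paper) evaluates the unconstrained optimal temperature at a few conversions and confirms that the temperature derivative of the bracketed term in Equation 13 vanishes there. The parameter readings carry the same assumptions as the earlier sketches, with the gas constant absorbed into E_1 and E_2.

```python
import math

# Assumed parameter readings (see text): E1, E2 already divided by the gas constant.
K10, K20 = 1.0e5, 1.0e10
E1, E2 = 6000.0, 12000.0
ALPHA = 2.0

def T_opt(x, t):
    """Equation 14 with E1, E2 interpreted as E/R (kelvins) and y = t."""
    return (E2 - E1) / math.log(K20 * E2 * x / (K10 * E1 * math.exp(-ALPHA * t) * (1.0 - x)))

def dH_dT(x, t, T, z1=1.0):
    """Derivative of the bracketed term of Equation 13 with respect to T."""
    fwd = K10 * math.exp(-ALPHA * t) * math.exp(-E1 / T) * (1.0 - x)
    rev = K20 * math.exp(-E2 / T) * x
    return z1 * (fwd * E1 - rev * E2) / T**2

for x in (0.02, 0.05, 0.08):
    T = T_opt(x, t=0.01)
    print(f"x = {x:4.2f}  T_opt = {T:6.1f} K  dH/dT = {dH_dT(x, 0.01, T):.2e}")
```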

Computational Considerations. At this point we need to discuss the computational considerations involved in dynamic programming with reference to the particular problem under consideration. For systems of more than a few state variables, classical dynamic programming techniques require excessive storage capacity. The recently published techniques of Larson (1965) and Bard (1967) have been shown effective in reducing the necessary memory requirements. We will now indicate how the ideas of Larson and Bard can be used with the forward dynamic programming algorithm, as well as how the classical scheme of Bellman (1957) is used in computation with the forward algorithm.

Let us first consider the classical computational technique of Bellman as applied to the catalyst decay example. It is most important to point out that in this particular example, time t is taken to be the state of the process and conversion x to be the independent variable (e.g., Equation 9). Thus, we fix a priori a value of Δx and take the time at each value of x to be the state of the system. The first step forward from x = 0 to x = Δx does not involve a minimization, since for fixed Δx a value of t_1 is simply calculated corresponding to each admissible value of T. If we have M values of T, we will have M values of t_1. An interpolation routine is required to determine corresponding temperatures at values of t not identically equal to any of the M values of t_1. A simple linear interpolation has been used here. Each additional step from t_1 on will involve a minimization.

Figure 1. Optimal temperature progression for batch reaction (temperature, °K., vs. time, hours; solid line, unconstrained maximum principle; dashed line, constrained dynamic programming)

Figure 2. Grid representation of forward algorithm

The first operation required is to select a series of t_{n+1} values, from which the M values of T can be used to calculate back to t_n, and to isolate that value of T corresponding to the particular t_{n+1} that yields the lowest cumulative value of the time, H(t_{n+1}). The set of t_{n+1} values must be chosen so that an admissible value of T is able to transfer the system from a t_n that lies between the highest and lowest values of t_n on the previous step. This is necessary to avoid extrapolation. This necessity of making a careful and realizable choice of the state at n + 1 (or n - 1 if the backward algorithm is being used) is often overlooked in discussions of the computational aspects of dynamic programming.

In particular, assume we have just completed the nth recursion step and have stored L (not necessarily equal to M) values of t_n, each with its associated best value of T_{n-1}. Now, we must provide a means of assuming a set of t_{n+1} values for the next step. However, we want to retain only those t_{n+1} values that are reachable from the set of the L t_n values by one of the admissible T_n. We note the smallest and largest t_n and begin assuming t_{n+1} values at the lowest value of t_n (since time must increase). We increment the t_{n+1} values from this point on by an amount dependent in general on the range of the t_n, until values are reached that can no longer be attained from the set of t_n. The key point is that, if this scheme is employed, the increment must be small enough to provide a set of t_{n+1} values not significantly smaller in number than the set of t_n, and large enough not to violate dimensionality restrictions. Thus, after a few steps it may happen that no t_{n+1} chosen is reachable from the t_n or, conversely, that too many t_{n+1} values are reachable. In any case, it is evident that with several state variables the storage requirements will become excessive. A sketch of this bookkeeping for the present example is given below.
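The bookkeeping just described can be sketched as follows; the code is an added illustration, not the authors' program, and it uses a simplified variant of the forward pass: Equation 9 is stepped explicitly from the stored values of t_n rather than assuming t_{n+1} and inverting, and the stored times are pruned into a fixed number of bands, keeping in each band the earliest arrival together with the temperature and predecessor that produced it. With the same assumed parameter readings as before, it will not reproduce Table I exactly, but it reports the same kind of information: the smallest and largest reachable t_n at each stage and the temperature policy recovered by the second pass.

```python
import math

# Assumed parameter readings (k20 as 1e10; E1, E2 as E/R in kelvins); see text.
K10, K20 = 1.0e5, 1.0e10
E1, E2 = 6000.0, 12000.0
ALPHA, BETA, DX = 2.0, 0.1, 0.01
TEMPS = [595.0 + 5.0 * i for i in range(15)]   # admissible T: 595 to 665 K in 5 K steps

def R(x, T, t):
    """Right-hand side of Equation 7."""
    return (K10 * math.exp(-ALPHA * t) * math.exp(-E1 / T) * (1.0 - x)
            - K20 * math.exp(-E2 / T) * x)

def forward_pass(n_store=40):
    """March x from 0 to beta in steps of DX, storing at each stage a pruned set of
    reachable times t_n together with the temperature and predecessor that gave them."""
    stages = [[(0.0, None, None)]]             # entries: (t_n, T_{n-1}, predecessor index)
    n_stages = round(BETA / DX)
    for n in range(n_stages):
        x = n * DX
        candidates = []
        for i, (t, _, _) in enumerate(stages[-1]):
            for T in TEMPS:
                r = R(x, T, t)
                if r > 0.0:                    # Equation 9, explicit in t_{n+1}
                    candidates.append((t + DX / r, T, i))
        # prune: bucket the reachable band of t_{n+1} and keep the earliest arrival
        # (the cumulative time H and the state t coincide in this example)
        lo = min(c[0] for c in candidates)
        hi = max(c[0] for c in candidates)
        width = (hi - lo) / n_store or 1.0
        best = {}
        for c in candidates:
            key = int((c[0] - lo) / width)
            if key not in best or c[0] < best[key][0]:
                best[key] = c
        stages.append(sorted(best.values()))
    return stages

stages = forward_pass()
for n, stored in enumerate(stages):
    print(f"stage {n:2d}: smallest t = {stored[0][0]:.5f}  largest t = {stored[-1][0]:.5f}"
          f"  stored values = {len(stored)}")

# second pass: trace the policy that reaches x = beta in the smallest time
node = stages[-1][0]                           # smallest time at x = beta
t_min = node[0]
policy = []
for stored in reversed(stages[:-1]):
    policy.append(node[1])                     # temperature used on the step into this node
    node = stored[node[2]]
print("minimum time:", t_min, "temperatures (last stage first):", policy)
```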

To utilize Larson's method we form a grid of points x_n = nΔx, n = 0, 1, 2, . . ., t_k = kΔt, k = 0, 1, 2, . . . . Assume we are at (x_n, t_k) and desire to estimate H(x_n, t_k). In general, Equation 9 can be written as

t_k - t' = \frac{\delta x}{R(x_n - \delta x, T', t')}     (15)

where t' is the time at x_n - δx and T' is the control applied at that point. As pointed out previously, with δx and T' fixed, Equation 15 represents an implicit equation for t' which must be solved in general by iteration. Equation 11 may be rewritten as

H(x_n, t_k) = \min_{T'} \left[ \frac{\delta x}{R(x_n - \delta x, T', t')} + H(x_n - \delta x, t') \right]     (16)

We have not as yet specified δx. According to Larson's method it is set

\delta x = \left| \Delta t\, R(x_n - \delta x, T', t') \right|     (17)

where Δt = t_{k+1} - t_k or t_k - t_{k-1} and t' = t_{k+1} or t_{k-1}. Consider the representation in Figure 2 as suggested by Bard. If Δx ≤ δx ≤ 2Δx (Case 1), H(x_n - δx, t') is estimated by interpolation between H(x_{n-1}, t_{k+1}) and H(x_{n-2}, t_{k+1}). If δx > 2Δx (Case 2), we must store values of H(x_{n-j}, t_{k+1}), j > 2. If δx < Δx (Case 3), to determine H(x_n - δx, t') we must extrapolate from H(x_{n-2}, t_{k+1}) and H(x_{n-1}, t_{k+1}).

Obviously, the same reductions in storage requirements obtained by Larson's state increment method are realized by the forward algorithm as well as by the backward algorithm, in that only points immediately adjacent to t_k need be stored at any step. However, one important difference can be seen from Equation 17: it is an implicit equation for δx in the forward formulation and must be solved by iteration.

Bard has suggested a modification of Larson's method that uses interpolation and extrapolation along the t-axis (the state) rather than the x-axis (the independent variable). His technique may also be extended readily to the forward algorithm. In particular, if we set Δx = δx, we have

H(x_n, t_k) = \min_{T'} \left[ \frac{\Delta x}{R(x_{n-1}, T', t')} + H(x_{n-1}, t') \right]     (18)

Now, H(x_{n-1}, t') may be estimated from H(x_{n-1}, t_k) and H(x_{n-1}, t_{k+1}) or H(x_{n-1}, t_{k-1}). Cases 1 and 2 reduce to interpolation between H(x_{n-1}, t_k) and H(x_{n-1}, t_{k+1}), while Case 3 results in extrapolation from H(x_{n-1}, t_k) and H(x_{n-1}, t_{k+1}). To carry out the interpolation or extrapolation we define δx from Equation 17. We see that δx/Δx = Δt/(t' - t_k), so that for the situation depicted in Figure 2

H(x_{n-1}, t') = H(x_{n-1}, t_k) + \frac{\Delta x}{\delta x} \left[ H(x_{n-1}, t_{k+1}) - H(x_{n-1}, t_k) \right]     (19)

Thus, we always carry out an interpolation (Cases 1 and 2) or an extrapolation (Case 3) along the x_{n-1} grid line, rather than along, say, the t_{k+1} line as in Larson's method. A sketch of this t-axis interpolation is given below.
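The following fragment (an added sketch, not from the paper) shows the t-axis estimate of Equations 18 and 19 for a single grid point: with Δx = δx fixed, t' is obtained from Equation 15 by successive substitution, and H(x_{n-1}, t') is then estimated linearly from two stored values on the x_{n-1} grid line. The kinetics and parameter readings are the assumed ones used in the earlier sketches, and the stored column of H values is invented purely for illustration; because H and t coincide in this minimum-time example, the printed estimate should agree closely with t' itself.

```python
import math

# Same assumed parameter readings as in the earlier sketches.
K10, K20 = 1.0e5, 1.0e10
E1, E2 = 6000.0, 12000.0
ALPHA = 2.0

def R(x, T, t):
    """Right-hand side of Equation 7."""
    return (K10 * math.exp(-ALPHA * t) * math.exp(-E1 / T) * (1.0 - x)
            - K20 * math.exp(-E2 / T) * x)

def t_prime(x_n, t_k, T, dx, iters=30):
    """Solve Equation 15 for t' by successive substitution, with delta-x = dx fixed."""
    t = t_k
    for _ in range(iters):
        t = t_k - dx / R(x_n - dx, T, t)
    return t

def h_estimate(h_col, t_grid, k, t_p):
    """Equation 19: estimate H(x_{n-1}, t') from the stored column of H values
    along the x_{n-1} grid line; the weight (t' - t_k)/dt equals dx/delta-x."""
    dt = t_grid[k + 1] - t_grid[k]
    w = (t_p - t_grid[k]) / dt
    return h_col[k] + w * (h_col[k + 1] - h_col[k])

# illustrative stored column at x_{n-1} = 0.04 (values assumed for the example only)
t_grid = [0.004, 0.006, 0.008, 0.010]
h_col = [0.004, 0.006, 0.008, 0.010]        # in this problem H and t coincide
tp = t_prime(x_n=0.05, t_k=0.008, T=650.0, dx=0.01)
print("t' =", tp, " estimated H(x_{n-1}, t') =", h_estimate(h_col, t_grid, k=2, t_p=tp))
```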

We have chosen to employ the classical method as outlined above in solving the catalyst decay problem, mainly to avoid extrapolation, which has been shown by Bard to be a main source of error propagation in dynamic programming. In addition, this particular example is not faced with the "curse of dimensionality," and thus the classical method is adequate.

Equation 14 predicts that an infinite temperature should be employed at the beginning of the reaction. Obviously this cannot be done with any numerical procedure; instead, lower and upper bounds were set for T(t), with a finite number of values intermediate between the limits. More specifically, an upper temperature bound of T* = 665° K. and a lower temperature bound of T_* = 595° K. were chosen as realistic. Discrete values of T(t) at 5° intervals between the limits were used, and Δx = 0.01 was chosen. Figure 1 presents a comparison of the optimal profile as generated from the maximum principle (solid line) with the result from the forward algorithm (dashed line). The computation required 1 minute and 16 seconds on an IBM 7094.

The particular advantage of the forward algorithm is plainly evident. This total time would have to be multiplied by the over-all number of iterations on an original guess of t_f to give the time required to solve a comparable problem by the backward routine. For a multidimensional problem (which can be fitted into computer storage) the increase in time associated with the extra iterations on t_f is prohibitive. The backward algorithm in this case appears equivalent to guessing the missing boundary conditions for the two-point boundary-value problem arising from the variational calculus. The dynamic programming value of t_f = 0.0254 is sufficiently close to the true t_f = 0.0240. Of course, the necessity for an upper limit on T(t), which exists in any practical situation, will cause the discrete dynamic programming t_f to exceed the unconstrained continuous maximum principle result.

The information developed in the forward pass is shown partially in Table I, where for each stage number, n, the smallest and largest values of t_n and the corresponding optimal T_{n-1} are indicated. In each case, the number of intermediate values between the limits varied. The smallest value of the time at x = β is 0.0254. If, however, a different method of choosing the t_{n+1} were employed, a slightly lower value of t_{10} might have been obtained. The results shown in Figure 1 were developed by simply working back through the results indicated in Table I.


Table I. Largest and Smallest Values of t_n with Corresponding T_{n-1}

Stage n                      0        1        2        3        4        5
Smallest t_n                 0.0      0.00083  0.00187  0.00300  0.00453  0.00636
  T_{n-1}                    ...      665      660      665      665      655
Largest t_n                  0.0      0.00240  0.00490  0.00752  0.0102   0.0133
  T_{n-1}                    ...      595      595      595      595      595
No. of intermediate values   0        13       28       31       29       26

Stage n                      6        7        8        9        10
Smallest t_n                 0.00881  0.0116   0.0153   0.0196   0.0254
  T_{n-1}                    650      635      620      600      595
Largest t_n                  0.0165   0.0205   0.0373   0.0511   0.0706
  T_{n-1}                    595      665      665      650      640
No. of intermediate values   21       18       34       36       38

Nomenclature

E_1, E_2 = activation energies
F = cumulative value of performance index from backward algorithm
f = p-dimensional function
H = cumulative value of performance index from forward algorithm
Ĥ = Hamiltonian
I = performance index
J = integrand of performance index
k_1^0, k_2^0 = frequency factors
N = total number of stages
R = right-hand side of Equation 7
t = time variable
T = temperature
u = r-dimensional control vector
x = extent of reaction
x = p-dimensional state vector
y = auxiliary variable
z_1, z_2 = adjoint variables

GREEK LETTERS

α = constant dependent on rate of catalyst decay
β = desired extent of reaction
Φ = performance index term

Literature Cited

Aris, R., "Discrete Dynamic Programming," Blaisdell Publishing Co., Waltham, Mass., 1964.
Aris, R., Nemhauser, G. L., Wilde, D. J., A.I.Ch.E. J. 10 (6), 913 (1964).
Bard, Y., IEEE Trans. Automatic Control AC-12, 97 (1967).
Bellman, R., "Dynamic Programming," Princeton University Press, Princeton, N. J., 1957.
Bhavnani, K. H., Chen, K., "Optimization of Time-Dependent Systems by Dynamic Programming," Joint Automatic Control Conference Proceedings, 1966.
Dreyfus, S. E., "Dynamic Programming and the Calculus of Variations," Academic Press, New York, 1965.
Lapidus, L., Luus, R., "Optimal Control of Engineering Processes," Blaisdell Publishing Co., Waltham, Mass., 1967.
Larson, R. E., IEEE Trans. Automatic Control AC-10, 135 (1965).
Nemhauser, G. L., "Introduction to Dynamic Programming," Wiley, New York, 1966.
Wilde, D. J., Chem. Eng. Progr. 61 (3), 86 (1965).

RECEIVED for review December 21, 1967
ACCEPTED March 4, 1968

This work made use of computer facilities supported in part by the National Science Foundation Grant NSF GP-579. The National Science Foundation supported part of this work under Grant NSF GK-460.