Ind. Eng. Chem. Res. 1999, 38, 2513-2514
2513
Reply to Comments on “Enhancement of the Global Convergence of Using Iterative Dynamic Programming To Solve Optimal Control Problems” Chyi Hwang* Department of Chemical Engineering, National Chung Cheng University, Chia-Yi 621, Taiwan
Jeng-Shaw Lin† Department of Chemical Engineering, National Cheng Kung University, Taina 700, Taiwan
Sir: The iterative dynamic programming (IDP) method initiated by Luus and co-workers is indeed a powerful technique for finding optimal controls for continuoustime dynamic systems described by nonlinear differential equations. In the literature, it has been applied to solve various optimal control problems with a great possibility of obtaining the global optimum solutions. However, as is well-known, IDP is a heuristic optimization method whose performance depends on several parameters, such as Nx, Nt, Nc, Np, Ni, R, and η. Moreover, the possibility of using IDP to obtain global optimal solutions also depends on the way of generating candidate controls and the problem type. How to choose proper parameters in using an IDP scheme to solve a specific optimal control problem in order to ensure the global optimum is always difficult. Hence, any suggestions that can lead to improving the computational efficiency and the solution convergence of IDP are helpful to interested readers. One of the advantages of IDP lies in the fact that the scheme of dynamic programming can be implemented in a parallel computing environment so that the optimization computation can be accomplished in a prespecified time interval. This is very crucial in such a real-time application as the nonlinear model predictive control, in which the optimization computation must be accomplished in a specified time interval. The IDP scheme with a single-state grid point at each time stage cannot generally get a satisfactory result without using multipass computations. However, multipass IDP computations are sequential operations which cannot take full advantage of a parallel computer, in which a computing job can be finished in a much shorter time interval if the underlying computation scheme can be implemented in a parallel manner. Hence, single-pass IDP computation with multiple-state grid points has its great potential use in real-time applications, even if the solution of the problem to be solved can be obtained offline via the multipass single-grid IDP scheme. From the above viewpoint, we think it is not necessary to make a criticism of the examples illustrated in our paper. There is almost always a tradeoff between computational time and global convergence in solving an optimization problem which has more than one local optimum. As mentioned in our paper, the convergence of the IDP solution depends on the contraction factor R. It is generally true that the larger the contraction factor used, the more computation time that is required and the greater the possibility to obtain a globally converged †
This author passed away in July 1998, 1 month after he received his Ph.D. degree from National Cheng Kung University.
solution. It is therefore not surprising that Luus demonstrated a converged solution by IDP with allowable controls chosen randomly and with R ) 0.95 and 100 iterations, which requires more than triple the computer time required by Sobol’s systematic approach of assigning allowable controls and with R ) 0.85 and 30 iterations. It is noted that the use of R ) 0.85 with 30 iterations reduces the control resolution by the factor 0.8530 ) 0.007 63, which is comparable to the factor of 0.95100 ) 0.005 92 in the use of R ) 0.95 with 100 iterations. Hence, the contraction factor of R ) 0.85 and the 30 iterations taken in example 2 are quite reasonable for a compromise between the computation time required and the global optimal solution expected. This comparison demonstrates the advantage of using Sobol’s systematic approach to generate control candidates, allowing the use of a smaller region contraction factor, which in turn reduces the number of iterations and computing time, for obtaining a global optimal solution. By the way, we did not conclude that the single-pass IDP with randomly generated controls fails to obtain a converged solution for any region contraction factor. In our paper, we gave the formula NcNt(1 + Nx(Nt 1)/2) for estimating the number of state shootings required in one cycle of usual backward dynamic programming (DP) computation. In the following, we explain how this formula was derived. Suppose that the number of state grid points for each time stage is Nx and the number of allowable controls for each state grid point is Nc. The time interval of interest [0, tf] is partitioned into Nt equal-length time stages. The kth time stage spans the time interval [(k - 1)∆, k∆], where ∆ ) tf/Nt. Let the Nx state grid points of the kth time stage be represented by xˆ (k-1, i), i ) 1, 2, ..., Nx and the Nc allowable controls for each state grid points by u(k, i), i ) 1, 2, ..., Nc. The backward DP starts the control evaluation from the state grid points of the last time stage, or time stage Nt, which spans the time interval [(Nt - 1)∆, Nt∆]. At time stage Nt, the number of state shootings required to evaluate the optimal controls associated with the Nx state grid points xˆ (Nt-1, k), k ) 1, 2, ..., Nx, is NcNx. At time stage Nt 1, the performance index INt-2(k, i) associated with the kth state grid point xˆ (Nt-2, k) and the ith candidate control u(Nt-2, i) is evaluated by first performing time integration of augmented dynamic equations with the initial condition [xˆ (Nt-2, k), 0] and the control u(Nt-2, i) from t ) (Nt - 2)∆ to (Nt - 1)∆ to obtain an augmented state, say [x((Nt-1)∆), xn+1((Nt-1)∆)]. In general, the state x((Nt-1)∆) may not coincide with one of the state grid points xˆ (Nt-1, j), j ) 1, 2, ..., Nx, generated for time stage Nt. Suppose the nearest state grid point to the state x((Nt-1)∆) is given by x(Nt-1, l)
10.1021/ie991076a CCC: $18.00 © 1999 American Chemical Society Published on Web 05/07/1999
2514 Ind. Eng. Chem. Res., Vol. 38, No. 6, 1999
whose optimal control has been evaluated to be u(Nt1, j). With the control u(Nt-1, j) and the initial condition [x((Nt-1)∆), xn+1((Nt-1)∆)], we then integrate the augmented state equations from t ) (Nt-1)∆ to Nt∆ to obtain the performance index INt-2(k, i) ) xn+1(Nt∆). Hence, at time stage Nt - 1, it is required to perform two state shootings for each control evaluation associated with a state grid point and 2NcNx state shootings for control evaluations of Nx state grid points. Similarly, a control evaluation for a state grid point at time stage Nt - 2 requires performing three state shootings. Therefore, it is required to perform 3NxNc state shootings for Nx state grid points with Nc allowable controls for each state grid point. By induction we know that the control evaluation for Nx state grid points at time stage Nt - j, 0 e j e Nt - 2 requires performing (j + 1)NxNc state shootings. Because there is only one state grid point for the first time stage, the control evaluation requires performing NcNt state shootings. On the basis of the above description, we know that the total number of state shootings performed in one cycle of backward dynamic programming is given by
NcNt + NcNx + 2NcNx + ... + (Nt - 1)NcNx ) NcNt + NcNxNt(Nt - 1)/2 This does not include the number of state shootings for generating state grid points.
It is noted that there is an alternative approach to the control evaluation. At time stage j, 0 e j < Nt, the performance index Ij-1(k, i) associated with the kth state grid point xˆ (j-1, k) and the ith allowable control u(j1, i) is evaluated by integrating the augmented state equations with the initial condition [xˆ (j-1, k), 0] and the control u(j-1, i) from t ) (j - 1)∆ to j∆ to reach an augmented state [x((j∆)), xn+1(j∆)]. Let the nearest state grid point at time stage j + 1 to the state x(j∆) be xˆ (j, l), of which the optimal control and the associated performance index are given by u(j, ν) and I*(j, l). Then the performance index Ij-1(k, i) is evaluated as
Ij-1(k, i) ) xn+1(j∆) + I*(j, l) This way of evaluating the performance index can reduce a great number of state shootings. However, the computed performance index does not represent the true performance index because of the discrepancies between the shooted states and state grid points at time instants t ) j∆, (j + 1)∆, ..., (Nt - 1)∆. For example, there is a discrepancy between the shooted state x(j∆) and the state grid point x(j, l). In authors’ experience, this method of evaluating the performance index often leads to poor convergence in IDP optimization. Acknowledgment We thank Dr. Rein Luus for his interest in our paper. IE991076A