Using Sparse-Grid Methods To Improve Computation Efficiency in Solving Dynamic Nonlinear Chance-Constrained Optimization Problems

Michael Klöppel, Abebe Geletu, Armin Hoffmann, and Pu Li*

Simulation and Optimal Processes Group, Institute of Automation & Systems Engineering, Ilmenau University of Technology, POB 10 05 65, 98684 Ilmenau, Germany

ABSTRACT: Chance-constrained programming is known as a suitable approach to optimization under uncertainty. However, a serious difficulty is the requirement of evaluating the probability of holding inequality constraints through the numerical computation of multidimensional integrals. If a nonlinear system with many uncertain variables is considered, the computational load will be prohibitive when using a full-grid integration method. Thus, our aim is to investigate a method to decrease the computational expense of solving nonlinear chance-constrained optimization problems with many uncertain variables. In particular, we consider dynamic nonlinear process optimization under uncertainty, which is transformed into a nonlinear chance-constrained optimization problem by a discretization scheme. To solve this problem, we propose to use sparse-grid methods for the evaluation of the objective function, the probability of constraint satisfaction, and their gradients. These components are implemented in a nonlinear programming framework. A dynamic mixing process is taken to illustrate the computational efficiency. It is shown that the computation time is significantly reduced using the sparse-grid method, in comparison to full-grid methods.
1. INTRODUCTION

It is a well-accepted fact that uncertainties exist in almost all industrial processes. In many situations, these uncertainties have a considerable impact on process design and operation. Therefore, optimization of such processes must be considered under these uncertainties. Several approaches have been proposed to carry out this task. These include (approximated) robust counterparts,1,2 approximate polyhedral dynamic programming,3 nominal solutions,4 measurement-based optimization,5 extended Kalman filter-based nonlinear model predictive control approaches,6 worst-case and distributional robustness analysis,7,8 and chance-constrained optimization.9,10

In this study, we concentrate on the chance-constrained optimization (CCOPT) approach. The main idea of CCOPT is to require the satisfaction of process restrictions with a predefined probability level. Another advantage of CCOPT is that a relationship between the optimality, in the sense of achieving an optimal objective function value, and the reliability, in the sense of satisfying process restrictions, can be obtained. Based on this relationship, a compromise decision that balances profitability and reliability can be made.

The last two decades have shown a rising interest in the numerical solution of CCOPT problems. This is partly due to the increased capability of computer hardware and software, as well as the availability of efficient computational approaches to large-scale deterministic optimization problems. Recently, the demand for high reliability, fault tolerance, and risk minimization in structural design,11,12 and for financial risk metrics,13 has led to wider areas of application of CCOPT. Most importantly, the introduction of CCOPT in process engineering problems has facilitated the design of robust control strategies and reliable performance in the presence of model and parameter uncertainties.14−22
There exist several available approaches to the solution of CCOPT problems. The main difference between these approaches is the way statistical moments, probabilities, and their gradients are evaluated. Until recently, most of the models considered were linear. Analytical solutions can be obtained for linear systems with single chance constraints.23 A linear CCOPT problem with a joint chance constraint is inherently nonlinear, and the probability evaluation can be made using the inclusion−exclusion method proposed by Szantai and Prekopa.24 Based on this method, chance-constrained model predictive control was studied.19,20 Cannon et al.25 analyzed a chance-constrained control approach for linear time-dependent systems with certain quadratic objective functions.

The solution of nonlinear CCOPT problems is usually carried out in a nonlinear programming (NLP) framework. To do this, the values and gradients of the objective function and probability constraints must be computed. For nonlinear systems, a direct evaluation of the probabilities of holding output restrictions is not promising, since the probability distribution of the outputs can hardly be described explicitly, because of the propagation of uncertainties through the nonlinear model equations. Monte Carlo sampling was used for computing average sums of function values to approximate chance constraints.18,26 Although this approach is applicable irrespective of the type of distribution function of the uncertain variables, it requires a very large sample size to yield accurate estimates of the probability values. Importance sampling may provide some improvements over Monte
Carlo sampling.27 For high-dimensional systems, however, it is well-known that sampling methods are not efficient for evaluating gradient values. Based on a monotonic relationship between the constrained output and an uncertain input, a back-mapping approach to evaluate the probability of output constraint satisfaction in the space of the uncertain inputs was proposed by Wendt et al.28 Collocation on finite elements (a full-grid method) was used for the numerical multivariate integration. This method was further studied by Arellano-Garcia and Wozny,14 for monotonicity analysis, by Flemming et al.,15 for optimization of closed loops under uncertainty, and by Xie et al.,22 for nonlinear model predictive control. The back-mapping approach requires intensive computation, because of its demand for repeatedly solving the nonlinear model equations at the grid points. For CCOPT of large-scale systems, these full-grid integrations will be prohibitive, in particular when an online implementation is needed. Therefore, efficient computation of high-dimensional integration remains a challenge for solving large-scale nonlinear CCOPT problems.

High-dimensional integration approaches use interpolation properties of one-dimensional polynomials to generate integration nodes (grid points) and weighting parameters (weights). These are either a tensor product of one-dimensional quadrature rules or recursive dimension-wise integration techniques, which are collectively known as full-grid integration rules. Full-grid integration techniques are known to be ineffective for integrals of high dimension.29 In contrast, sparse-grid approaches, based on fully symmetric integration formulas, require very few integration nodes. They are found to provide an efficient evaluation of high-dimensional integrals by reducing computation time significantly.30 Sparse grids were first proposed by Smolyak in 1963,31 and they have recently been applied to many fields of numerical computation, such as stochastic partial differential equations,32,33 microelectromechanical systems,34 and data mining,35 as well as quantum mechanics.36 However, to the best of our knowledge, sparse-grid methods have only been applied to nonlinear chance-constrained process engineering problems for the steady-state case.17

Therefore, the aim of this study is to investigate an efficient approach to multivariate integration for nonlinear CCOPT problems with a large number of uncertain variables. In particular, we consider dynamic nonlinear optimization problems under uncertainty. While steady-state processes always possess time-independent uncertain variables, dynamic processes may have time-dependent uncertain variables, which is the case considered in this study. We transform the dynamic CCOPT problem to a nonlinear CCOPT problem through a discretization scheme. In this way, a large number of uncertain variables must be treated if time-dependent uncertain variables are considered. In addition, since constraints must be satisfied at each discrete time point, the discretization leads to many chance constraints. To solve this problem in a NLP framework, we employ the sparse-grid method for the computation of the probability values in the chance constraints and the statistical moments (expectation, variance) in the objective function, as well as their gradients.
Because many fewer grid points are needed to achieve the same numerical accuracy as a full-grid method, the computational expense of the multivariate integration is considerably reduced. In our case study, we apply both a full-grid method and a sparse-grid method to a dynamic mixing process and compare the results
regarding computation time and numerical accuracy. It can be seen that, indeed, the computational load can be significantly decreased using sparse-grid methods without compromising the numerical accuracy.

The remainder of the paper is organized as follows. Section 2 gives a general formulation of dynamic nonlinear CCOPT problems. We use a simple discretization scheme to transfer this problem to a discrete nonlinear CCOPT problem. Section 3 presents the back-mapping approach, through which output constraints are projected to an input space upon which the output probability is evaluated. Section 4 introduces the sparse-grid method for numerical integration. In section 5, computation results of the case study are presented and compared with those from a full-grid method. Section 6 concludes the paper and addresses some future work in this area.
2. NONLINEAR TIME-DISCRETE CCOPT PROBLEMS

Dynamic processes are usually described by differential equations (time-continuous form) or difference equations (time-discrete form). The time-continuous form can be transformed to the time-discrete form through the use of finite differences. We consider the following model equations in the time-continuous form:

$$\dot{y} = g(u, y, \xi)$$

where

$$u(t) = (u_1(t), \ldots, u_l(t))$$
$$y(t) = (y_1(t), \ldots, y_m(t))$$
$$\xi(t) = (\xi_1(t), \ldots, \xi_n(t))$$

denote decision variables (controls), outputs (states), and uncertain inputs (stochastic variables), respectively. The uncertain variables ξ(t) can describe model parameter and/or input uncertainties. We may have control and state constraints in the time horizon, given as

$$u_{\min} \le u(t) \le u_{\max}$$
$$y_{\min} \le y(t) \le y_{\max}$$

In addition, at least some of the uncertain input variables ξ(t) are time-dependent; i.e., their stochastic distribution changes in time.

To discretize this dynamic system, the time horizon is divided into N time intervals. Using the explicit Euler method, the simplest algorithm for discretizing differential equations, the time-discrete form of the above model equations yields

$$\frac{y(t_{i+1}) - y(t_i)}{t_{i+1} - t_i} = g(u(t_i), y(t_i), \xi(t_i)) \qquad (1)$$

for time points $t_i$, i = 1, ..., N. In this way, the variable restrictions are to be satisfied at the discrete time points:

$$u_{\min} \le u(t_i) \le u_{\max} \qquad (i = 1, \ldots, N)$$
$$y_{\min} \le y(t_i) \le y_{\max} \qquad (i = 1, \ldots, N)$$

The uncertain inputs will be described as piecewise constant random variables.
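For illustration, the following minimal sketch (our illustration in Python, not code from the original work; the linear test system and its parameters are assumptions) propagates one realization of the piecewise-constant uncertain inputs through the explicit-Euler discretization of eq 1:

```python
import numpy as np

def simulate_euler(g, u, xi, y0, t):
    """Explicit-Euler simulation of dy/dt = g(u, y, xi) on the time grid t
    (eq 1): y(t_{i+1}) = y(t_i) + (t_{i+1} - t_i) * g(u(t_i), y(t_i), xi(t_i)).
    u and xi hold the piecewise-constant controls and one realization of the
    uncertain inputs on each interval."""
    y = np.empty((len(t), np.size(y0)))
    y[0] = y0
    for i in range(len(t) - 1):
        y[i + 1] = y[i] + (t[i + 1] - t[i]) * g(u[i], y[i], xi[i])
    return y

# Example: an illustrative linear system y' = -y + u + xi with a sampled disturbance.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 41)
u = np.ones(len(t) - 1)                     # piecewise-constant control
xi = rng.normal(0.0, 0.1, size=len(t) - 1)  # one realization of the uncertainty
y = simulate_euler(lambda u_, y_, xi_: -y_ + u_ + xi_, u, xi, y0=0.0, t=t)
```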
Based on this discretization and taking the uncertainty into account, a dynamic nonlinear optimization problem under uncertainty can be formulated as a nonlinear optimization problem with chance constraints:

Objective function:

$$\min_{\tilde{u} \in U} \; \psi(\tilde{u}) + \gamma_1 E(f_1(\tilde{u}, \tilde{y}, \tilde{\xi})) + \gamma_2 \operatorname{Var}(f_2(\tilde{u}, \tilde{y}, \tilde{\xi})) \qquad (2)$$

Model equations:

$$G(\tilde{u}, \tilde{y}, \tilde{\xi}) = 0 \qquad (3)$$

Single chance constraints:

$$P\{y_{i\min} \le y_i(t) \le y_{i\max}\} \ge \alpha_i \qquad i \in I, \; t \in \{t_1, \ldots, t_N\} \qquad (4)$$

Control constraints:

$$h(\tilde{u}) \le 0 \qquad (5)$$
In this problem formulation, ũ denotes the decision variables, which are piecewise constant for the time intervals between t_1, ..., t_N; i.e., ũ = {u(t_1), ..., u(t_N)}. The vectors ỹ and ξ̃ are defined similarly; i.e., ỹ = {y(t_1), ..., y(t_N)} and ξ̃ = {ξ(t_1), ..., ξ(t_N)}. The parameters γ_1 and γ_2 are weighting factors for balancing the expectation and variance terms. At each time point, output restrictions y_imin ≤ y_i(t) ≤ y_imax with index i from a given index set I ⊂ {1, ..., m} are to be satisfied with a predefined probability level α_i ∈ [1/2, 1]. The model equations described by eq 3 are derived from the difference equation described by eq 1. We assume that the uncertain inputs have a multivariate normal distribution with expectation μ and covariance matrix Σ. The operators E and P are the expectation and the probability measure induced by the joint distribution of the stochastic inputs through the model equations described by eq 3. These operators are necessary, since the objective function and variable restrictions are stochastic.

The meaning of the CCOPT problem can be explained as follows. The statistical moments in the objective function (eq 2) must be evaluated if they contain uncertain variables. Because of the nonlinear propagation of the uncertain variables through the model equations described by eq 3, the output variables are also uncertain. To satisfy the output restrictions, we use the chance constraints described by eq 4; i.e., the restrictions should hold with a predefined probability level. Since the controls are decision variables, the first term in the objective function, as well as the constraints depending solely on the controls, is deterministic. If the expectation and the probability values can be computed, this CCOPT problem becomes a nonlinear programming (NLP) problem, which can be solved by an NLP solver. However, note that the number of uncertain variables and the number of chance constraints are proportional to the number of time intervals used for the discretization, which leads to intensive computations in solving this NLP problem.

We assume that ψ, f_1, f_2, G, and h are twice continuously differentiable on an open set S containing all solutions of eqs 2−5 and that the system described by eq 3 can be uniquely solved, with respect to ỹ, for all (ũ, ỹ, ξ̃) ∈ S. Thus, we have

$$\det \frac{\partial G(\tilde{u}, \tilde{y}, \tilde{\xi})}{\partial \tilde{y}} \neq 0 \qquad (6)$$

for all (ũ, ỹ, ξ̃) ∈ S and the validity of all statements of the implicit function theorem.37 For an easier notation, the tildes will be omitted throughout the remainder of the paper.

3. COMPUTING PROBABILITIES AND STATISTICAL MOMENTS

In order to transform the CCOPT problem described by eqs 2−5 to an NLP, it is necessary to evaluate the expectation and variance terms in eq 2, as well as the chance constraints described by eq 4. In addition, most NLP solvers require the derivatives of the objective and constraints with respect to the control variables u. This section describes how these values and gradients can be obtained.

3.1. Back-Mapping Approach. The back-mapping approach28 presents a useful method to compute the probability values of the output chance constraints. The first step of this approach is to determine whether a monotonic relationship between the constrained output y_i and an uncertain input ξ_j exists. A monotonic relationship between y_i and ξ_j is called positive if a strict increase of the value of y_i leads to a strict increase in the value of ξ_j. This is denoted by y_i ↑ ξ_j. Similarly, a negative monotonic relation is denoted by y_i ↓ ξ_j. For ξ ∈ R^n and y ∈ R^m, ξ_{−j} and y_{−i} denote the vectors ξ and y excluding the component used for back-mapping; thus, if ξ = (ξ_1, ..., ξ_n) and y = (y_1, ..., y_m), we have

$$\xi_{-j} = (\xi_1, \ldots, \xi_{j-1}, \xi_{j+1}, \ldots, \xi_n)$$

and

$$y_{-i} = (y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_m)$$

The chance constraints described by eq 4 then can be transformed by

$$P\{y_{i\min} \le y_i(t) \le y_{i\max}\} = P\{\xi_{j\min}(u, y_{i\min}, \xi_{-j}) \le \xi_j(t) \le \xi_{j\max}(u, y_{i\max}, \xi_{-j})\}$$

In the case of a negative monotonic relation, i.e., y_i ↓ ξ_j, the transformation is as follows:

$$P\{y_{i\min} \le y_i(t) \le y_{i\max}\} = P\{\xi_{j\min}(u, y_{i\max}, \xi_{-j}) \le \xi_j(t) \le \xi_{j\max}(u, y_{i\min}, \xi_{-j})\}$$

Because of the monotonicity and the solvability assumption described by eq 6, the existence of ξ_j(·) can be ensured. To calculate the value of these functions, one must solve the model equations with respect to ξ_j and y_{−i} while the value of y_i is set to y_imin or y_imax, respectively. The corresponding values of ξ_j are the function values of ξ_jmin(u, y_imin, ξ_{−j}) and ξ_jmax(u, y_imax, ξ_{−j}). To determine these values, it is necessary to solve the discretized (possibly large-scale) nonlinear equations described by eq 3. We use a damped Newton's method21 to perform this task.

Based on the values of ξ_jmin and ξ_jmax, we can compute the probability values of the output chance constraints in the space of the uncertain inputs, since the distribution of ξ is given. Here, we derive the computation steps for a positive monotonic relation only (the case of negative monotonicity is similar). The probability value in the chance constraints can be computed by the following multivariate integration:

$$P\{\xi_{j\min}(u, y_{i\min}, \xi_{-j}) \le \xi_j \le \xi_{j\max}(u, y_{i\max}, \xi_{-j})\} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \int_{\xi_{j\min}(u, y_{i\min}, \xi_{-j})}^{\xi_{j\max}(u, y_{i\max}, \xi_{-j})} \varphi(\xi) \, d\xi_j \, d\xi_{-j} \qquad (7)$$
where

$$\varphi(\xi) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\left[-\frac{1}{2}(\xi - \mu)^T \Sigma^{-1} (\xi - \mu)\right]$$

is the density function of the multivariate normally distributed vector ξ with expectation μ and covariance matrix Σ. In order to use sparse-grid methods (which will be introduced later), the uncertain variables must be decorrelated (i.e., transformed such that their covariances are zero) using the Cholesky transformation. The inner integral in eq 7 then can be evaluated separately.

When using gradient-based NLP methods, gradients of the chance constraints with respect to the control variables are needed. To calculate these values, it is necessary to determine the derivatives (∂/∂u)ξ_jmin(u, y_imin, ξ_{−j}) and (∂/∂u)ξ_jmax(u, y_imax, ξ_{−j}). According to the implicit function theorem,37 the gradients can be obtained as part of the solution of the following equations:

$$\frac{\partial G(u, y, \xi)}{\partial (y_{-i}, \xi_j)} \begin{pmatrix} \dfrac{\partial}{\partial u} y_{-i}(u, y_{i\min}, \xi_{-j}) \\[2mm] \dfrac{\partial}{\partial u} \xi_j(u, y_{i\min}, \xi_{-j}) \end{pmatrix} = -\frac{\partial}{\partial u} G(u, y, \xi)$$

The gradient values (∂/∂u)ξ_j(u, y_imax, ξ_{−j}) can be calculated in the same way. Using the results obtained above, the gradients of the probabilities with respect to the control variables can be computed by

$$\frac{\partial}{\partial u} P\{\xi_{j\min}(u, y_{i\min}, \xi_{-j}) \le \xi_j \le \xi_{j\max}(u, y_{i\max}, \xi_{-j})\} = \frac{\partial}{\partial u} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \int_{\xi_{j\min}(u, y_{i\min}, \xi_{-j})}^{\xi_{j\max}(u, y_{i\max}, \xi_{-j})} \varphi(\xi) \, d\xi_j \, d\xi_{-j}$$

$$= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left[ \varphi(\xi_{-j}, \xi_{j\max}) \frac{\partial}{\partial u} \xi_{j\max}(u, y_{i\max}, \xi_{-j}) - \varphi(\xi_{-j}, \xi_{j\min}) \frac{\partial}{\partial u} \xi_{j\min}(u, y_{i\min}, \xi_{-j}) \right] d\xi_{-j}$$

where φ(ξ_{−j}, ξ̄_j) = φ(ξ_1, ..., ξ_{j−1}, ξ̄_j, ξ_{j+1}, ..., ξ_n). The dimension of integration reduces by one, in comparison to the evaluation of the probability values, because of the special structure of the parameter integral (note that φ does not depend on u).
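The back-mapping idea is easiest to see in one dimension. The following sketch (our illustration, not the authors' implementation; the monotonic test model and the root-bracketing interval are assumptions) maps the output bounds back to bounds on a scalar standard-normal uncertainty and evaluates the resulting probability with the normal CDF, which corresponds to the analytically evaluated inner integral of eq 7. In the multivariate case, the paper instead solves the model equations with a damped Newton method at every node of the outer integration.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def backmap_probability(model_output, u, y_min, y_max, bracket=(-10.0, 10.0)):
    """Illustrative 1-D back-mapping: for a model y = model_output(u, xi) that
    is strictly increasing in the scalar uncertainty xi ~ N(0, 1), map the
    output bounds back to bounds on xi and evaluate the probability with the
    standard normal CDF."""
    xi_lo = brentq(lambda xi: model_output(u, xi) - y_min, *bracket)
    xi_hi = brentq(lambda xi: model_output(u, xi) - y_max, *bracket)
    return norm.cdf(xi_hi) - norm.cdf(xi_lo)

# Example with a monotonic test model y = u + exp(0.5 * xi):
p = backmap_probability(lambda u, xi: u + np.exp(0.5 * xi), u=1.0,
                        y_min=1.5, y_max=3.0)
```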
3.2. Statistical Moments and Gradients. As defined in the objective function described by eq 2, expected values and variances need to be computed. In contrast to the evaluation of probabilities, statistical moments can be obtained in a straightforward way using their mathematical definitions; i.e.,

$$E(f_1(u, y, \xi)) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_1(u, y, \xi) \, \varphi(\xi) \, d\xi \qquad (8)$$

and

$$\operatorname{Var}(f_2(u, y, \xi)) = E((f_2(u, y, \xi))^2) - (E(f_2(u, y, \xi)))^2 = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_2(u, y, \xi)^2 \, \varphi(\xi) \, d\xi - \left( \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_2(u, y, \xi) \, \varphi(\xi) \, d\xi \right)^2 \qquad (9)$$

The integrals in eqs 8 and 9 can be calculated using the sparse-grid methods introduced in the next section. If the functions f_1(·) or f_2(·) depend on the uncertain outputs y, these must be evaluated at every grid point of the integration rule by solving the model equations described by eq 3 for y. Their gradients can be computed in a similar way as for the probabilities; i.e.,

$$\frac{\partial}{\partial u} E(f_1(u, y, \xi)) = \frac{\partial}{\partial u} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_1(u, y, \xi) \, \varphi(\xi) \, d\xi = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{\partial}{\partial u} f_1(u, y, \xi) \, \varphi(\xi) \, d\xi = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left( f_{1u}(u, y, \xi) + f_{1y}(u, y, \xi) \frac{\partial}{\partial u} y(u, \xi) \right) \varphi(\xi) \, d\xi$$

and

$$\frac{\partial}{\partial u} \operatorname{Var}(f_2(u, y, \xi)) = \frac{\partial}{\partial u} E((f_2(u, y, \xi))^2) - \frac{\partial}{\partial u} (E(f_2(u, y, \xi)))^2$$

$$= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} 2 f_2(u, y, \xi) \left( f_{2u}(u, y, \xi) + f_{2y}(u, y, \xi) \frac{\partial}{\partial u} y(u, \xi) \right) \varphi(\xi) \, d\xi - 2 E(f_2(u, y, \xi)) \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left( f_{2u}(u, y, \xi) + f_{2y}(u, y, \xi) \frac{\partial}{\partial u} y(u, \xi) \right) \varphi(\xi) \, d\xi$$

where f_{1u}, f_{2u} and f_{1y}, f_{2y} are the partial derivatives of f_1 and f_2 with respect to u and y, respectively. The derivative (∂/∂u)y(u, ξ) can be obtained by solving the following system of equations:

$$\frac{\partial}{\partial y} G(u, y, \xi) \, \frac{\partial}{\partial u} y(u, \xi) = -\frac{\partial}{\partial u} G(u, y, \xi)$$

4. NUMERICAL INTEGRATION USING SPARSE GRIDS

As discussed above, to compute the probability values and gradients in the chance constraints, as well as the statistical moments and their gradients, we must evaluate multidimensional integrals of the form

$$I = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} F(\xi_1, \ldots, \xi_n) \, d\xi_n \cdots d\xi_1$$

where −∞ ≤ a_j < b_j ≤ ∞ for j = 1, ..., n and F is continuous. For systems with a large number of uncertain variables, the computation of such integrals is very time-consuming. Using a numerical scheme, such integrals can be approximated through the use of cubature formulas:

$$\hat{I} = \sum_{i=1}^{N} \omega_i F(\xi_{i1}, \ldots, \xi_{in})$$

where N is the number of grid points, ω_i are weighting factors, and ξ_{i1}, ..., ξ_{in} denote grid points with a_j ≤ ξ_{ij} ≤ b_j for j = 1, ..., n and i = 1, ..., N, respectively.
To illustrate this formula, we consider the one-dimensional integral I = ∫_a^b F(ξ) dξ, where −∞ < a < b < ∞. One method to evaluate this integral is to independently sample N points ξ^i uniformly from the interval [a, b] and to approximate the integral by the scaled average of the function values at these points; i.e., Î = ((b − a)/N) ∑_{i=1}^{N} F(ξ^i). This method is known as Monte Carlo integration and is also used in the sample average approximation.38

Now we consider integrals with standard Gaussian weight and with domain of integration R or R^n. For the one-dimensional case, these are

$$\int_{-\infty}^{\infty} F(\xi) \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}\xi^2\right) d\xi \qquad (10)$$

and for the n-dimensional case,

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} F(\xi_1, \ldots, \xi_n) \frac{1}{\sqrt{(2\pi)^n}} \exp\left(-\frac{1}{2}(\xi_1^2 + \ldots + \xi_n^2)\right) d\xi_n \cdots d\xi_1 \qquad (11)$$
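As a concrete instance of a one-dimensional rule for eq 10, the following sketch (ours, using NumPy's Gauss–Hermite nodes; not code from the paper) rescales the classical Gauss–Hermite rule, which targets the weight exp(−x²), to the standard normal weight:

```python
import numpy as np

def gauss_hermite_normal(n):
    """n-point Gauss-Hermite rule rescaled to the standard normal weight of
    eq 10: int F(x) exp(-x^2/2)/sqrt(2 pi) dx  ~  sum_i w_i F(x_i)."""
    x, w = np.polynomial.hermite.hermgauss(n)  # weight exp(-x^2)
    return x * np.sqrt(2.0), w / np.sqrt(np.pi)

# A 5-point rule integrates polynomials up to degree 9 exactly:
x, w = gauss_hermite_normal(5)
moment4 = np.sum(w * x**4)  # E[xi^4] = 3 for xi ~ N(0, 1)
```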
Let V_j(N) = ∑_{i=1}^{N} ω_i F(ξ^i) denote a one-dimensional integration rule for the integral described by eq 10, with N points ξ^i and weights ω_i. This can be used to construct a rule for the integral described by eq 11, using the tensor-product approach. The idea behind this approach is to employ the one-dimensional rule for every single integral in eq 11. More formally, the product (or full-grid) rule39 over the n-dimensional integral is defined by

$$V_1(N_1) \otimes \cdots \otimes V_n(N_n) = \sum_{i_1=1}^{N_1} \cdots \sum_{i_n=1}^{N_n} \omega_{i_1} \cdots \omega_{i_n} F(\xi^{i_1}, \ldots, \xi^{i_n}) \qquad (12)$$

A graphical explanation of the construction of tensor-product grids is shown in Figure 1. The points on the left-hand side and below the square represent the one-dimensional integration nodes, while the grid points inside the square are the derived points for a two-dimensional rule.

Figure 1. Construction of tensor product grids.

The number of grid points of the product rule is N_1 ⋯ N_n. Assuming that N_1 = ⋯ = N_n = N, the number of points in the product rule is N^n; i.e., the number of grid points grows exponentially with the dimension of the integral. This behavior is called the "curse of dimension", which leads to intensive computation. This is a problem that exists in all numerical methods that have been used in the solution of CCOPT problems so far.

In 1963, Smolyak31 proposed the sparse-grid integration method to overcome this difficulty. The idea is to construct multidimensional rules using tensor products of differences of quadrature rules, in contrast to full-grid integration, where only quadrature rules are used. It can be demonstrated that the sparse-grid method results in cubature rules with fewer grid points but the same polynomial accuracy. The basis of sparse grids is a sequence of one-dimensional quadrature rules V_i(N_i) with increasing polynomial accuracy, i = 1, ..., n. Using these integration rules, the differences of quadrature rules can be defined by

$$V_0(N_0) = 0, \qquad \Delta_0 = 0, \qquad \Delta_i = V_i(N_i) - V_{i-1}(N_{i-1})$$

It is easy to see that V_n(N_n) = ∑_{i=1}^{n} Δ_i, because of the telescoping sum. A second observation is that, for continuous integrands, lim_{n→∞} ∑_{i=1}^{n} Δ_i = I; i.e., the series converges toward the exact integral value. This series is used to derive the multidimensional cubature formula. As in the full-grid case, the series is combined using the tensor-product approach (see eq 12), leading to

$$V_1(N_1) \otimes \cdots \otimes V_n(N_n) = \sum_{0 \le i_1, \ldots, i_n < \infty} \Delta_{i_1} \otimes \cdots \otimes \Delta_{i_n}$$

This series cannot be evaluated practically. Therefore, a truncation is used, such that the sparse-grid integration rule has the form

$$A(n, q) = \sum_{0 \le i_1 + \ldots + i_n \le q} \Delta_{i_1} \otimes \cdots \otimes \Delta_{i_n} \qquad (13)$$
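To make eq 13 concrete, the following sketch (our illustration; it builds V_k from k-point Gauss–Hermite rules with linear growth, which is our assumption, whereas the paper relies on the Genz–Keister extensions discussed below) assembles a sparse rule by accumulating tensor products of the difference rules and merging coincident nodes:

```python
import itertools
import numpy as np

def gauss_hermite_normal(n):
    """n-point Gauss-Hermite rule rescaled to the standard normal weight."""
    x, w = np.polynomial.hermite.hermgauss(n)
    return x * np.sqrt(2.0), w / np.sqrt(np.pi)

def smolyak_rule(dim, q):
    """Sparse rule A(dim, q) of eq 13: sum of tensor products of the
    difference rules Delta_k = V_k - V_{k-1} over all level multi-indices
    with i_1 + ... + i_dim <= q (only levels >= 1 contribute, Delta_0 = 0)."""
    def delta(k):
        xk, wk = gauss_hermite_normal(k)
        pairs = list(zip(xk, wk))                  # +V_k part of Delta_k
        if k > 1:
            xp, wp = gauss_hermite_normal(k - 1)
            pairs += list(zip(xp, -wp))            # -V_{k-1} part of Delta_k
        return pairs
    grid = {}  # rounded node tuple -> accumulated weight (coincident nodes merge)
    for levels in itertools.product(range(1, q + 1), repeat=dim):
        if sum(levels) > q:
            continue
        for combo in itertools.product(*(delta(k) for k in levels)):
            node = tuple(round(float(x), 12) for x, _ in combo)
            weight = float(np.prod([w for _, w in combo]))
            grid[node] = grid.get(node, 0.0) + weight
    items = list(grid.items())
    return np.array([n for n, _ in items]), np.array([w for _, w in items])

# Example: a 4-dimensional sparse rule reproduces E[xi1^2 * xi2^2] = 1 for
# independent standard normal inputs, with fewer nodes than the full tensor
# grid of the same accuracy; the advantage grows quickly with the dimension.
nodes, weights = smolyak_rule(dim=4, q=6)
estimate = float(np.sum(weights * nodes[:, 0]**2 * nodes[:, 1]**2))  # ~1.0
```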
It is possible to express A(n, q) directly in terms of the underlying quadrature rules, as shown by Wasilkowski and Wozniakowski.40 Software for generating sparse grids is available (for example, see the work of Heiss and Winschel41). A comparison of a full grid and a sparse grid of the same polynomial accuracy in the two-dimensional case is shown schematically in Figure 2 and described later in section 5.2.

Sparse-grid rules generated through eq 13 generally contain negative weights. Since these negative weights affect the numerical stability of the rules, not all rules generated by eq 13 are convenient for integration. In fact, one must determine grid points and weights for which the norm of the negative weights is comparatively small. Such grids have already been evaluated for unweighted integrals over hypercubes and for integrals with Gaussian weights. Novak and Ritter30 have shown that, for integration over a hypercube, such formulas have a polynomial accuracy of 2n − 1 if the underlying one-dimensional rules are Gaussian quadrature rules. For the same class of integrals, it was shown that the number of grid points needed to obtain the same polynomial accuracy grows only polynomially with the dimension of the integral, in contrast to the exponential growth for the product rule.42

The disadvantage of Gaussian quadrature rules in constructing sparse grids is that different tensor product sets share only a few points, which leads to more points in the cubature rule. Extensions proposed by Genz and Keister43 have overcome this drawback. In our case study (see section 5), for example, the product rule needs 9^6 (= 531 441) grid points, while the sparse-grid rule needs only 2021 points to achieve the same integration accuracy. A comparison of the necessary points for product rules and sparse grids based on standard Gaussian rules, as well as on the extensions proposed by Genz and Keister,43 was given by Heiss and Winschel.41 More performance results for sparse grids were obtained by Gerstner and Griebel.30

The advantages of sparse-grid integration techniques can be summarized as follows:
• Multidimensional integrals can be accurately computed using only a few grid points.
Figure 2. Comparison between a full grid (left) and a sparse grid (right).
Figure 4. Buffer tank.
Figure 3. General structure of computation.
• Integration of polynomials can be evaluated exactly (depending on the order).
• Grid points and weights only depend on the region of integration, not on the integrand itself.
• They can be easily implemented when using predefined grid points and weights.
• Existing routines for full grids can be used with no or only a few modifications.

Note that the computation of these integrals is the most time-consuming operation in the solution of CCOPT problems. The introduction of sparse grids reduces the number of grid points and, hence, leads to a significant reduction of computation time.

Our computational framework, as shown in Figure 3, has a sequential structure and consists of a nonlinear programming (NLP) solver, a method for multidimensional cubature, and a nonlinear equation solver. IpOpt44 is used as the NLP solver, since it can handle infeasible starting points. We use GotoBLAS2 (ref 45) as the basic linear algebra subprograms (BLAS) implementation. In addition, we use PARDISO (refs 46−48) to solve large systems of linear equations. For the multidimensional integration, the sparse grids generated by Heiss and Winschel41 are used. Functions from the GSL49 are used to solve nonlinear and linear equations.

Using the starting point u0, the NLP solver requests values and gradients of expectations and probabilities for given values of the controls. These values are calculated in the second stage, using
sparse-grid integration. With the controls u from the first stage and the realizations of the uncertain vector ξ from the second stage, the model equations described by eq 3 are solved in the third stage to evaluate the integrals. The third stage returns values of the output variables y and the bounds of the integrals ξmin and ξmax, as well as the gradient values required by the second stage. The procedure of solving the equations is repeated at all grid points until the second stage has all the necessary information to evaluate the integrals. The results of this evaluation are then fed back to the NLP solver, which generates a new iterate of the controls. Note that this framework is similar to the computation structure used by Xie et al.22 The novelty of this work is the employment of the sparse-grid integration method in the second stage, where previously a full-grid integration method was used.
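The following sketch shows the shape of this three-stage loop. It is schematic only: scipy's `minimize` stands in for the IpOpt solver used in the paper, and the trivial model is an assumption chosen so the block is self-contained.

```python
import numpy as np
from scipy.optimize import minimize  # stand-in for the IpOpt solver of the paper

def solve_model(u, xi):
    """Third stage (illustrative stand-in): solve G(u, y, xi) = 0 for y.
    Here G(u, y, xi) = y - u - sum(xi), so y = u + sum(xi)."""
    return u + xi.sum()

def expectation(u, nodes, weights, f):
    """Second stage: weighted sum of f over the cubature nodes (eq 8),
    solving the model equations once per grid point."""
    return sum(w * f(u, solve_model(u, xi)) for xi, w in zip(nodes, weights))

def objective(u, nodes, weights):
    # First-stage callback: e.g., track a setpoint y_ref = 1 in expectation.
    return expectation(float(u[0]), nodes, weights, lambda u_, y: (y - 1.0) ** 2)

# nodes, weights = smolyak_rule(dim=2, q=4)   # from the earlier sketch
# res = minimize(objective, x0=[0.0], args=(nodes, weights))
```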
5. CASE STUDY AND REMARKS

5.1. Case Study: Buffer Tank. We consider a buffer tank, as shown in Figure 4, with a time-discrete process model. The inflow is a mixture of two components A and B. Both the flow rate and the concentration are considered as time-dependent uncertain variables. Under these uncertainties, the volume of the liquid and the concentration in the tank are assumed to be restricted. The outflow u_t is considered to be a control variable, which is taken as piecewise constant. The goal is to minimize the fluctuations of the outflow and the expected deviation of the tank concentration from a predefined level, while the volume and concentration restrictions should be ensured with a certain probability level at each time point. We use a moving horizon approach to the optimal
operation of this process. The following optimal control problem is considered:

$$\min \; \Delta u^T \Delta u + \gamma \sum_{i=0}^{s-1} E((C_{\hat{t}+i} - C_{\mathrm{ref}})^2)$$

$$\text{s.t.} \quad V_t = V_{t-1} + q_t - u_t$$
$$C_t = C_{t-1} + \frac{q_t}{V_t}(C_{0,t} - C_{t-1})$$
$$P(V_{\min} \le V_t \le V_{\max}) \ge \alpha_V$$
$$P(C_{\min} \le C_t \le C_{\max}) \ge \alpha_C$$
$$t \in \{\hat{t}, \ldots, \hat{t} + s - 1\}$$
$$0 \le u_i \le 30$$
$$V(0) = V_0, \qquad C(0) = C_0$$
where t̂ is the actual time interval; Δu = (u_{t̂} − u_{t̂−1}, ..., u_{t̂+s−1} − u_{t̂+s−2}); s is the length of the prediction horizon; C_ref is the desired tank concentration; V_0 and C_0 are the initial tank holdup and tank concentration, respectively; V_t and C_t denote the tank holdup and tank concentration in time interval t (uncertain output variables), respectively; q_t and C_{0,t} are the values of the inlet flow and inlet concentration in time interval t (uncertain input variables); u_t is the outlet flow for the next s time intervals (control variables); V_min and V_max respectively represent the lower and upper bounds of the tank holdup; C_min and C_max respectively represent the lower and upper bounds of the tank concentration; and α_V and α_C represent the desired probability levels for the constraints. The parameters used in the implementation are listed in Table 1.

Table 1. Parameters Used in the Implementation

    parameter   value
    s           3
    Vmin        130 m³
    Vmax        170 m³
    Cmin        49 kmol/m³
    Cmax        51 kmol/m³
    αV          0.8
    αC          0.8
    Δt          1 min
    V0          160 m³
    C0          50 kmol/m³
    γ           100

The term ΔuᵀΔu in the objective function describes the fluctuation in the controls, whereas ∑_{i=0}^{s−1} E((C_{t̂+i} − C_ref)²) describes the deviation of the tank concentration from the desired value C_ref. The constant γ is a weighting factor for balancing the fluctuations of the outflow against the deviation of the concentration from the specified value.

The prediction horizon s has a strong influence on the implementation. Normally, one would choose the horizon as large as possible to gain stability of the control system. In the case of chance-constrained dynamic optimization, however, a long time horizon leads to the following problems. First, the variance of the predicted values of the state variables depends on the length of the prediction horizon; i.e., for the predictions V_{t̂+k} and C_{t̂+k} (k = 1, 2, ...), the relations Var(V_{t̂+k+1}) > Var(V_{t̂+k}) and Var(C_{t̂+k+1}) > Var(C_{t̂+k}) hold. Because of this increasing variance, an index 1 ≤ p_max < ∞ will exist such that P(V_min ≤ V_{t̂+k} ≤ V_max) < α_V or P(C_min ≤ C_{t̂+k} ≤ C_max) < α_C for all k > p_max. In this situation, the CCOPT problem becomes infeasible. In our example, this leads to an upper bound of p_max = 15. Second, a larger prediction horizon leads to a higher computation time, since integrals of dimension up to NumUncertain · s, where NumUncertain is the number of uncertain variables at a distinct time interval, must be evaluated. Since we use sparse-grid integration techniques, this is not a problem for smaller values of s; however, it becomes time-consuming for values s > 20. A time horizon of s = 3 is chosen, leading to a good tradeoff between computation time and feasibility.
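For intuition, the following Monte Carlo sketch simulates the discretized tank model under the uncertainties. It is our illustration, not the paper's implementation: the parameter values come from Table 1, but the fixed outflow policy and the (time-independent) inflow statistics are assumptions, whereas the paper uses time-dependent expected inlet profiles.

```python
import numpy as np

rng = np.random.default_rng(1)
T, V0, C0 = 30, 160.0, 50.0                    # horizon and Table 1 values
u = np.full(T, 15.0)                           # assumed fixed outflow policy
q = rng.normal(15.0, 1.0, size=T)              # inflow rate realizations (assumed stats)
C_in = rng.normal(50.0, 0.5, size=T)           # inflow concentration realizations

V, C = np.empty(T + 1), np.empty(T + 1)
V[0], C[0] = V0, C0
for t in range(T):
    V[t + 1] = V[t] + q[t] - u[t]                        # V_t = V_{t-1} + q_t - u_t
    C[t + 1] = C[t] + q[t] / V[t + 1] * (C_in[t] - C[t]) # concentration balance
violations = np.mean((V[1:] < 130) | (V[1:] > 170))      # empirical violation rate
```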
Figure 5. Realizations of the inflow rate at different time intervals.
Figure 6. Realizations of the inflow concentration at different time intervals.
Figure 7. Realizations of tank volume.
For the purpose of numerical experimentation, the following assumptions are made: (1) The inflow rate is normally distributed with time-dependent expected value E(q_t) and constant variance Var(q). Realizations of the inflow rate are shown in Figure 5. (2) The inflow concentration is normally distributed with time-dependent expected value E(C_{0,t}) and constant variance Var(C_0). Realizations of the inflow concentration are shown in Figure 6. (3) The inflow rate and the inflow concentration are assumed to be independent.

The numerical computations have been performed on a computer with an Intel Core i7-980X hexa-core processor with 3.33 GHz per core (only one core was used) and 6 GB RAM. To test the approach, 100 trajectories were simulated; 20 of these tank volume V_t profiles and tank concentration C_t profiles are plotted in Figures 7 and 8, respectively. Because of the random realizations of the uncertain variables, there is no unique optimal trajectory.

For one of these trajectories (the results for the other trajectories are similar), the fluctuation in the outlet stream, as well as the deviation of C_t from 50 kmol/m³, for different values of the weighting parameter γ but the same realizations of the uncertain input variables, is plotted in Figure 9. It can be seen that, especially for γ = 100 (which was used throughout the case study), both objectives (minimizing the fluctuations in the outlet stream as well as minimizing the deviation of C_t from the predefined value) are achieved. Generally, an increase of γ leads to an increase in the fluctuations in the outlet stream but, at the same time, to a decrease of the deviation from 50 kmol/m³. There are some exceptions to this rule, because of the constraints (fluctuations or deviations must be increased in order to satisfy chance constraints or control constraints; e.g., for γ = 10 000, the deviation between time steps 9 and 14 is increased because of the constraint u_t ≥ 0).

Detailed information on the constraint violations can be found in Figure 10. It can be seen that the specified probability levels (α_V = α_C = 0.8) can be ensured by the optimal control. Most constraint violations occur in the second time interval. Here, 19 out of 100 profiles (19%) violate the criterion 130 ≤ V ≤ 170. It can be seen that some time points are more prone to constraint violations than others. In our example, these are time points 2, 3,
Figure 8. Realizations of tank concentration.
Figure 9. Fluctuations in the outlet stream and deviation of tank concentration from 50 kmol/m3 for different values of γ.
Figure 10. Number of constraint violations in the distinct time intervals.
Figure 11. Comparison of the number of necessary grid points.
and 10−12. Constraint violations also occur for the concentrations in the 16th time interval. For comparison, we solved the MPC problem using the deterministic optimization problem for the same realizations of the uncertain variables. In contrast to the chance-constrained approach, 40 of 100 profiles (40%) violated V_max at the second time point, and 42 profiles (42%) violated V_min at the 11th time point.

5.2. Comparison between Full-Grid and Sparse-Grid Integration. Computation of the integrals is the most time-consuming task in solving CCOPT problems. This is especially true in the dynamic case, since the dimension of the integrals depends on the length of the prediction horizon. More precisely, for the calculation of the expectations and variances in the objective function and their gradients, the dimension of the integrals is given by s · NumUncertain, where s is the length of the prediction horizon and NumUncertain is the number of uncertain variables in a distinct time interval. In our case, NumUncertain = 2 (at every time interval, the inflow volume and the inflow concentration are uncertain). We chose values between 2 and 5 intervals for the prediction horizon, leading to 4- to 10-dimensional integrals.

The dimension of the integrals necessary to evaluate the probabilities and their gradients depends on the constrained output. If the constrained output is a function of k uncertain variables, then (k − 1)-dimensional integrals are needed (this is due to the fact that the inner integral in eq 7 is treated separately); e.g., V_1 depends only on q_1 and, therefore, no integration is necessary. The concentration C_2 depends on q_1, q_2, C_{0,1}, and C_{0,2}; therefore, a three-dimensional integral is to be computed. In Figure 11, a comparison of the necessary number of grid points for different integral dimensions and different integration methods is shown.

Under some assumptions, an estimate of the computation time for optimization over a horizon of length s can be obtained by

$$t = \frac{9}{4} \cdot \mathrm{NumIter} \cdot \mathrm{NumGridPoints}(s) \cdot \mathrm{TimeSol}$$

for full grids and by

$$t = 4 \cdot \mathrm{NumIter} \cdot \mathrm{NumGridPoints}(s) \cdot \mathrm{TimeSol} \qquad (14)$$
for sparse grids. Here, NumIter is the number of iterations of the NLP solver, NumGridPoints(s) is the number of grid points needed to evaluate the integral with the highest dimension (depending on s), and TimeSol is the time required to solve the model equations for a given set of parameters (depending on the number of model equations). The last two factors approximate the computation time for evaluating the expectation term in the objective function. For every point in the grid, the model equations must be solved to obtain the state variables. In the case study, this means the solution of 9^6 = 531 441 sets of equations for full grids, but only 2021 sets of equations for sparse grids. The factors 9/4 (full grids) and 4 (sparse grids) result from the integration rules used in the case study.

The variable NumIter can be influenced by the choice of the NLP solver; NumGridPoints(s) can be influenced by the choice of the integration method (e.g., full-grid, sparse-grid, Monte Carlo, quasi-Monte Carlo); and TimeSol can be influenced by the choice of the nonlinear equation solver (e.g., Newton−Raphson, damped Newton, Powell's hybrid method). Good results can be obtained by improving NumGridPoints(s) and, to a lesser degree, TimeSol. However, improving NumIter is difficult, because it is hard to predict how different optimization algorithms will perform. Nevertheless, some improvement in NumIter is possible by relaxing the optimality conditions.

Following this estimate, one would expect an increase in performance by a factor of ca. 150 when using a sparse-grid method instead of a full-grid method for a prediction horizon of s = 3. Indeed, the simulation results have shown that solving the optimization problem for a horizon of three time intervals takes 6070 s using the full-grid method and 46 s using the sparse-grid method. In Table 2, a comparison of the computation times when using full grids and sparse grids, as well as the predicted and achieved performance improvements for different lengths of s, is listed. Numerical experiments were not completed for full grids with a prediction horizon of s = 5, since the calculation required over 24 h.

5.3. Potential Extensions. In the case study, a simple process is considered. The uncertain variables are assumed to be Gaussian distributed, and a monotonicity relation between the constrained outputs and an uncertain input exists. In addition, a
Table 2. Computation Time for Different Integration Methods and Prediction Horizons, Using a PC with an Intel Core i7-980X Hexa-Core Processor with 3.33 GHz per Core (Only One Core Was Used) and 6 GB RAM

                          Computation Time              Performance
    prediction horizon    full-grid    sparse-grid
    2