
Cite This: Ind. Eng. Chem. Res. 2019, 58, 13599−13610

Experimental Design for Batch-to-Batch Optimization under Model-Plant Mismatch

Rubin Hille and Hector M. Budman*


Department of Chemical Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada

ABSTRACT: Model errors in model-based optimization procedures can result in suboptimal policies. In that regard, structural mismatch is of particular concern since it results in inaccurate model predictions even when a large amount of data is available for model calibration. The method of simultaneous identification and optimization, proposed in our previous work, addresses structural model mismatch by adapting the model parameters and matching the predicted to the measured gradients, thus ensuring progressive convergence to the actual process optimum. In the former implementation of this approach, the gradients were corrected only at the most recent operating point. To achieve better prediction accuracy of the updated model and to obtain faster convergence to the optimum, we therefore propose to use cost measurements from previous batch experiments in combination with additional optimally designed new experiments. The advantages of the presented approach over previous versions of the algorithm are illustrated using two simulated run-to-run optimization case studies.



INTRODUCTION

Mathematical models are an essential tool for model-based optimization of process systems. In general, mathematical models include data-driven models, consisting of empirical functions that are trained with data, and first-principles models that are based on balances of physical quantities such as mass or energy.1 Although first-principles models require significant process knowledge and longer development time, they offer the benefit of superior extrapolation abilities for operating conditions that were not used for model calibration.2 This is particularly critical for batch-to-batch procedures where the input to a new batch run may correspond to operating conditions that were not encountered in earlier batches.

When considering model-based optimization, model mismatch (uncertainty) is a key obstacle for convergence to the process optimum. When considering model error, it is inherently assumed that there exists a perfect (true) model that can describe the system accurately and that such a perfect model involves a particular mathematical structure and a specific set of model parameters. Then, two types of model mismatch may occur: (i) parametric uncertainty, where the estimated parameters are not equal to the values of the perfect model due to noisy or insufficient data, and (ii) structural uncertainty, which arises due to simplifications and assumptions made during the development of first-principles models. For example, incomplete knowledge about the reaction network describing the metabolism of a microorganism will result in structural mismatch where some reactions may be missed or their functional dependencies with respect to concentrations may be inaccurate. While increasing the amount of data can reduce parametric uncertainty, it cannot prevent prediction inaccuracies due to structural mismatch.

A key element for the convergence of a batch-to-batch procedure is that the predicted gradients of the cost function with respect to the decision variables match the corresponding measured gradients, thus ensuring that the cost function is consistently improved from one batch to the next. However, due to a combination of measurement noise and structural mismatch, the parameter values that minimize the errors between measured and predicted process outputs (identification) may not necessarily be equal to the values that result in a correct prediction of the gradients of the cost function and constraints (optimization). Thus, batch-to-batch optimization procedures that involve repeated iterations, each consisting of a model identification step followed by an optimization step, may not converge to the optimum due to the often conflicting results between output and gradient prediction.3

The research on run-to-run optimization in the presence of a structural model error can be grouped into three different approaches: (i) Modifier Adaptation, (ii) Direct Input Adaptation, and (iii) Simultaneous Identification and Optimization. When the model is solely used for optimization purposes and the accuracy of the model is not of interest, Modifier Adaptation4 has been used, where the cost function and the constraints are modified by adding additional terms to compensate for the effect of model error.5−7



A second approach that focuses on achieving optimal operation but does not aim at identifying a model involves transforming the iterative optimization problem into a feedback control problem to implicitly optimize the process. In this approach, functions of the measured variables are controlled at prespecified constant values that result in optimal operation of the process.8,9 However, for some problems, both the accuracy of the model as well as convergence to a global optimum are of interest. For example, beyond its use for optimization, an accurate model in the neighborhood of the optimum can be used for designing soft sensors, for performing robust optimization, or for calculating a set point trajectory for closed loop control. For such problems, an approach referred to as simultaneous identification and optimization10 has been previously developed by our group where all or a subset of model parameters are updated at each iteration of the batch-to-batch procedure to ensure that the model outputs and the gradients of the cost function and constraints fit the respective measured values. In our previous work, only the most recent gradient measurements were used for the gradient correction, thus not making use of potentially valuable gradient information from earlier batches. In addition, while in our previous study the next gradient was solely determined by a fixed input perturbation, there may be a motivation to perform specifically designed experiments that may be more informative.

Optimal Design of Experiments (DoE) methods were first derived for data-driven models to generate experimental conditions that will result in smaller confidence intervals of the estimated parameters.11 Subsequently, these methods have also been used for reducing confidence regions in estimated parameters of nonlinear first-principles models.12 Furthermore, experimental design has been applied in optimal control problems such as in the design of robust Nonlinear Model Predictive Controllers in order to obtain inputs that maximize the information in the feedback while ensuring the satisfaction of the process constraints.13 In the context of batch process optimization, DoE has been used for selecting sampling times in dynamic experiments14 and for finding experiments that will provide information about the most profitable operation.15 A key aspect of these DoE methods is that they are generally based on the minimization of a criterion associated with the parameter covariance matrix, which is typically approximated by the inverse of the Fisher Information Matrix (FIM).16

In the current work, DoE is used to determine future experiments in the batch-to-batch procedure that are informative about gradients and consequently may increase parameter precision when fitting the measured gradients. Toward this goal, a covariance matrix that is derived from the parametric sensitivities of the gradients of the cost function and constraints can be used to conduct an optimal experimental design. The objective of this experimental design is to determine, at each iteration of the run-to-run procedure, new experiments that provide valuable gradient information in addition to the past experiments while progressively approaching the optimum. Thus, the overall goal is to converge reliably to an economic optimum, to which end a better fitting of the gradients is expected to facilitate the convergence.

In summary, the current work improves the originally proposed simultaneous identification and optimization procedure in two aspects. First, information about costs and constraints gathered from past experiments in addition to the most recent one is used when correcting the predicted gradients. Then, future optimal experiments, necessary for obtaining better gradient estimates, are determined based on an optimal design of experiments approach. It is shown that the presented approach leads to the following improvements:

(i) The effect of gradient uncertainty is significantly reduced when cost measurements from past batch runs in addition to the most recent one are considered.
(ii) Convergence in the neighborhood of the optimum is improved due to the use of optimally designed experiments at each iteration for the next batch.
(iii) The improved parameter precision leads to a better prediction of the cost function near the process optimum.

The paper is organized as follows. The first section reviews the original version of the simultaneous identification and optimization algorithm involving three sequential steps: identification, gradient matching, and optimization. Then, in the next section, the key novel contribution involving the reformulation of the gradient matching step using optimal DoE is presented. In the following section, results from run-to-run optimization studies using simulated case studies of a simple synthetic batch process and a more complex CHO cell cultivation process are presented, followed by the conclusions in the last section.



REVIEW OF THE SIMULTANEOUS IDENTIFICATION AND OPTIMIZATION METHODOLOGY

The method for simultaneous identification and optimization10,17 is briefly reviewed since it presents the basis of the DoE methodology proposed in the current study. The method involves three sequential steps that are carried out at each batch (iteration): (1) an identification step where the model outputs are fitted to the measured outputs by adjusting model parameters, (2) a gradient correction step where the modeled gradients are fitted to the measured gradients by performing additional corrections to the model parameters and outputs calculated in step 1, and (3) a model-based optimization step based on the model obtained in step 2 to determine the input for the next batch run. However, before outlining the batch-to-batch optimization methodology in more detail, we first illustrate the effect of model-plant mismatch by means of the two-step approach.18

Two-Step Approach for Batch Optimization. When estimating model parameters from batch process data, a typical model fitting objective is the sum of squared errors (SSE) between process outputs and model predictions, which can be formulated as follows:

θ_k = arg min_θ Σ_{i=1}^{n_t} Σ_{j=1}^{n_y} ((y^p_{k,j}(t_i) − y_{k,j}(θ, t_i))/σ_j)²
s.t. ẋ_k = f(x_k, u_k, θ)
x_k(0) = x_{k,0}
y_k = h(x_k)
θ_lb ≤ θ ≤ θ_ub   (1)


where, given batch index k, y_k^p ∈ ℝ^{n_t × n_y} are the plant measurements, y_k ∈ ℝ^{n_t × n_y} the model outputs, and x_k ∈ ℝ^{n_t × n_x} the corresponding model states. Furthermore, θ ∈ ℝ^{n_θ} is the set of model parameters, while u_k ∈ ℝ^{n_u} represents the vector of batch-specific inputs. Following the identification problem (eq 1) and using the estimated parameter values θ_k, we are interested in finding the optimal input for the following batch run. This can be achieved by performing a model-based optimization as follows:

u_{k+1} = arg min_u ϕ(y(θ_k), u)
s.t. ẋ = f(x, u, θ_k)
x(0) = x_0
y = h(x)
g(y(θ_k), u) ≤ 0
u^L ≤ u ≤ u^U   (2)

where u_{k+1} represents the optimal input minimizing the cost ϕ ∈ ℝ, while satisfying the process-dependent constraints g ∈ ℝ^{n_g} and bounds on the input space u^L and u^U. Following the two-step approach18 by repeating steps (eqs 1 and 2) and given the necessary model-adequacy,19,20 one can achieve convergence to the process optimum over the course of several batch runs. However, in practice, model-adequacy cannot be guaranteed due to structural model-plant mismatch and parametric uncertainty. As a result, operating policies that are solely based on the two-step approach may fail to deliver optimal process performance. This is mainly due to the fact that, in the presence of model-plant mismatch, the gradients of the cost ∇ϕ = ∂ϕ/∂u and constraints ∇g = ∂g/∂u with respect to the decision variables are not predicted correctly:

∇ϕ ≠ ∇ϕ^p   (3)

∇g ≠ ∇g^p   (4)

where the superscript p denotes the measured quantities. These circumstances necessitate some sort of gradient correction to enable a satisfactory convergence to the process optimum. For that reason, Mandur21 proposed an additional gradient correction step to reduce the error between predicted and measured gradients by precisely adjusting the model parameters. To facilitate the effectiveness of this crucial gradient correction step, a set-based identification method exploits the fact that we can find model output trajectories within the set-based bounds that allow for an improved gradient correction.
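For concreteness, the following minimal sketch mimics the two-step iteration of eqs 1 and 2 on a toy scalar batch system in which the plant carries a u² term that the model lacks. The model equations, noise scale, and bounds are illustrative assumptions of this sketch, not the case studies considered later in the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Toy illustration of the two-step approach (eqs 1 and 2). The "plant"
# carries a -0.5*u**2 term that the model lacks (structural mismatch).
# All functions, values, and bounds are assumptions of this sketch.

def simulate(theta, u, plant=False):
    x, dt = 1.0, 0.01
    for _ in range(100):                    # fixed batch horizon t_f = 1
        dxdt = -theta[0] * x + theta[1] * u
        if plant:
            dxdt -= 0.5 * u**2              # term missing from the model
        x += dt * dxdt
    return x                                # terminal output y = x(t_f)

def identify(u, y_meas, theta0):
    # Step 1 (eq 1): least-squares fit of the model to the measured output.
    sse = lambda th: ((y_meas - simulate(th, u)) / 0.01) ** 2
    return minimize(sse, theta0, bounds=[(0.1, 5.0)] * 2).x

def optimize_input(theta):
    # Step 2 (eq 2): model-based optimization of the next batch input.
    res = minimize(lambda u: -simulate(theta, u[0]), [0.5], bounds=[(0.0, 2.0)])
    return res.x[0]

u, theta = 0.5, np.array([1.0, 1.0])
for k in range(5):                          # repeated two-step iterations
    y_meas = simulate([1.0, 1.5], u, plant=True)   # plant batch at input u_k
    theta = identify(u, y_meas, theta)
    u = optimize_input(theta)
    print(f"batch {k}: next input u = {u:.3f}")
```

Because the fitted model cannot represent the u² term, its predicted cost is monotone in u and the iteration settles at the input bound rather than at the plant optimum near u = 1.5, illustrating how structural mismatch corrupts the predicted gradients even when the outputs are fitted well.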

Identification Using Set-Based Bounds. In contrast to standard identification techniques where only model fitting is desired (eq 1), the goal in simultaneous identification and model-based optimization is to find parameter values which provide both fitting of the model outputs as well as of the gradients of the cost function and constraints. Toward this goal, since in the methodology the initial identification step is followed by a second step where the modeled gradients are fitted to the measured ones, it is desirable to identify parameter values in the first step that provide high sensitivity of the gradients with respect to changes in parameters. In this way, fitting gradients in the second step will not necessitate too large changes in parameters with respect to the values calculated in the first step, thus maintaining desirable model accuracy. To quantify the parametric sensitivity of gradients, Hille and Budman22 proposed the following parametric sensitivity measure:

ϕ_S^∇(u, θ) = Σ_{i=1}^{n_u} Σ_{j=1}^{n_θ} s_{ij}^{∇ϕ}(u, θ) + Σ_{i=1}^{n_u × n_g} Σ_{j=1}^{n_θ} s_{ij}^{∇g}(u, θ)   (5)

where s_{ij}^{∇ϕ} are the scaled parametric sensitivities of the gradients of the cost function with respect to each of the decision variables:

s_{ij}^{∇ϕ} = (∂(∇ϕ_i)/∂θ_j)(θ_j/∇ϕ_i)   (6)

while s_{ij}^{∇g} are the scaled parametric sensitivities of the constraint gradients:

s_{ij}^{∇g} = (∂(∇g_i)/∂θ_j)(θ_j/∇g_i)   (7)
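A rough way to evaluate eqs 6 and 7 numerically is with nested finite differences; the helper gradient_fd, the step size h, and the quadratic example cost below are assumptions of this sketch rather than the authors' implementation (only the cost-gradient part is shown).

```python
import numpy as np

# Nested finite-difference approximation of the scaled gradient
# sensitivities in eq 6. Helper names, step size, and the example
# cost function are assumptions of this sketch.

def gradient_fd(phi, u, theta, h=1e-4):
    """Gradient of the cost w.r.t. the decision variables u."""
    g = np.zeros_like(u)
    for m in range(u.size):
        e = np.zeros_like(u); e[m] = h
        g[m] = (phi(u + e, theta) - phi(u - e, theta)) / (2 * h)
    return g

def scaled_gradient_sensitivities(phi, u, theta, h=1e-4):
    """S[i, j] = (d(grad_i phi)/d theta_j) * theta_j / grad_i phi (eq 6)."""
    grad0 = gradient_fd(phi, u, theta)   # assumed nonzero away from the optimum
    S = np.zeros((u.size, theta.size))
    for j in range(theta.size):
        e = np.zeros_like(theta); e[j] = h
        dgrad = (gradient_fd(phi, u, theta + e)
                 - gradient_fd(phi, u, theta - e)) / (2 * h)
        S[:, j] = dgrad * theta[j] / grad0
    return S

phi = lambda u, th: th[0] * (u[0] - th[1]) ** 2 + u[1] ** 2   # toy cost
S = scaled_gradient_sensitivities(phi, np.array([1.0, 0.5]), np.array([2.0, 0.3]))
print(np.abs(S).sum())   # summed magnitude, the cost term of eq 5
```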

Large gradient sensitivities in eq 5 imply that smaller parameter deviations, from the ones obtained in the identification step, will be required for matching the gradients. Smaller deviations between the parameter values required for fitting the outputs and the parameter values required for matching gradients go hand in hand with fewer oscillations and thus a smoother convergence of the run-to-run optimization procedure. Note that the objective described in eq 5 is a local parametric sensitivity measure. In principle, a global sensitivity measure could be obtained from nonlinear simulations over a large range of operating conditions.23 However, global sensitivities require some a priori knowledge of the range of values of each parameter, statistical assumptions about the parameter estimates, and an adequate model. In this study, since model structure error is considered, it is desired to avoid using the available model over a wide range of conditions since a large error is to be expected. Hence, a local parametric sensitivity measure is utilized based on the operating conditions of the last batch (last iteration). Then, given a standard model fitting objective, i.e., ϕ_SSE(u, θ) = Σ_{i=1}^{n_t} Σ_{j=1}^{n_y} ((y^p_{k,j}(t_i) − y_{k,j}(θ, t_i))/σ_j)², an identification objective that combines the minimization of the sum of squares objective (eq 1) and the maximization of the parametric sensitivity measure in eq 5 is defined as follows:

θ_k = arg min_θ (ϕ_SSE(u, θ) / ϕ_S^∇(u, θ))
s.t. ẋ_k = f(x_k, u_k, θ)
x_k(0) = x_{k,0}
y_k = h(x_k) − c_{k−1}
θ ∈ Θ_0
y_k ∈ 𝕐_k   (8)

Here, the correction term c_{k−1} ∈ ℝ^{n_t × n_y} describes the lagged value of a correction term c_k, i.e., the value from iteration k − 1, that is adjusted in the gradient correction step, as further described in the section below.
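A minimal sketch of the ratio objective in eq 8 is given below. For simplicity the set-based output bounds (defined in eqs 9 and 10 below) are handled here as a penalty term, whereas eq 8 imposes them as hard constraints; simulate and sens_measure are assumed placeholders for user-supplied callables.

```python
import numpy as np

# Sketch of the combined identification objective of eq 8: SSE divided by
# the gradient-sensitivity measure, with the set-based output bounds of
# eqs 9-10 handled as a penalty (eq 8 imposes them as constraints).

def identification_objective(theta, u, y_meas, sigma, y_lo, y_hi,
                             simulate, sens_measure, rho=1e3):
    y = simulate(theta, u)                     # model output trajectory
    sse = np.sum(((y_meas - y) / sigma) ** 2)  # phi_SSE (eq 1)
    phi_s = sens_measure(u, theta)             # phi_S_grad (eq 5)
    violation = np.sum(np.maximum(y_lo - y, 0.0) + np.maximum(y - y_hi, 0.0))
    return sse / phi_s + rho * violation
```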


The set Θ_0 is a permissible space for the parameter values. Notice that the minimization of the cost in eq 8 results in a set of parameter values that fit the model predictions to the measured outputs while simultaneously penalizing parameter values which reduce the parametric sensitivities. However, since the optimization cost was modified from a norm of squared errors as in eq 1 to a norm of the errors divided by the sensitivity function as in eq 8, there is a risk that the time trajectories of the predicted outputs may deviate from the corresponding measured values. To address this issue, set-based bounds, represented by the last constraint in eq 8, i.e., y_k ∈ 𝕐_k, are added to force the predicted outputs to remain reasonably close to the measured values. The use of set-based bounds has been proposed before for biochemical processes, and it is motivated as follows. Let us assume that several experiments (batch runs) are performed at a given operating point defined by the decision variables u_k. The set of measurements for all sampling times t_i along each batch can then be defined as24

𝕐_k = {y_k^i ∈ ℝ^{n_y} | i ∈ {1, ..., n_t}}   (9)

where the set-based bounds provide upper and lower bounds for the permissible range of model outputs at each sampling time t_i such that25

y̲_k^i ≤ y_k^i ≤ ȳ_k^i   (10)

where the model outputs are given by y_k^i ∈ ℝ^{n_y}. Set-based bounds such as defined in eq 10 are particularly well suited for describing experimental data in biological systems24,26 for which the variation among batches typically arises from variability in initial concentrations, changes in environmental conditions, or biases in analytical equipment such as HPLC. The use of such bounds in the current approach is further justified by the need to bound the time trajectories of the variables when using nonconventional model fitting criteria as explained above. Furthermore, it has been shown in an earlier study22 that using the output set-based bounds (eq 9) it is possible to define a model-update criterion so that the identification step (eq 8) is only conducted when these bounds are violated, thus avoiding frequent updating of the model parameters which may contribute to oscillatory convergence to the optimum.
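Assuming replicate batches at the operating point u_k are available and stacked row-wise, the bounds of eq 10 and the associated model-update criterion can be sketched as follows; the array shapes and random replicate data are assumptions of this sketch.

```python
import numpy as np

# Set-based bounds (eqs 9 and 10) from replicate batches and the resulting
# model-update criterion. Assumes replicates stacked as (n_rep, n_t, n_y).

def set_based_bounds(replicates):
    """Elementwise lower/upper bounds over replicate trajectories (eq 10)."""
    return replicates.min(axis=0), replicates.max(axis=0)

def update_needed(y_model, y_lo, y_hi):
    """Trigger re-identification (eq 8) only when the bounds are violated."""
    return bool(np.any(y_model < y_lo) or np.any(y_model > y_hi))

reps = np.random.default_rng(0).normal(1.0, 0.05, size=(4, 20, 2))
y_lo, y_hi = set_based_bounds(reps)
print(update_needed(np.ones((20, 2)), y_lo, y_hi))
```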

Gradient Correction. As mentioned in the Introduction, progressive convergence of the batch-to-batch optimization procedure to the process optimum requires that the predicted gradients at each iteration coincide with those of the process. Hence, in each iteration, a gradient matching step is performed as follows:

Δθ_k = arg min_{Δθ} (w_ϕ^T |∇ϕ^p(u_k) − ∇ϕ(y_k(θ_k + Δθ), u_k)| + Σ_{j=1}^{n_g} w_{g,j}^T |∇g_j^p(u_k) − ∇g_j(y_k(θ_k + Δθ), u_k)|)
s.t. ẋ_k = f(x_k, u_k, θ_k + Δθ)
x_k(0) = x_{k,0}
y_k = h(x_k) − c_k
θ_k + Δθ ∈ Θ_0
‖ϵ_k^T‖ ≤ ϵ_max   (11)

where ∇ϕ ∈ ℝ^{n_u} and ∇g_j ∈ ℝ^{n_u} with j = 1, ..., n_g are the cost and constraint gradients. The measured gradients are denoted by the superscript p. It is assumed that ∇ϕ can be measured by performing at least two simultaneous experiments with slightly different values of the decision variables and calculating the gradient from the results of these experiments. The errors in cost and constraint gradients are normalized using the weights w_ϕ ∈ ℝ^{n_u} and w_{g,j} ∈ ℝ^{n_u}, respectively. These weights are selected as the inverses of the noise in the gradients of cost and constraints with respect to the decision variables. Since the gradient correction step requires changing the parameters with respect to the values calculated in the identification step, the model fitting accuracy will be affected by the current gradient correction unless a further correction of the model is conducted to preserve the original accuracy achieved in the identification step. To that purpose, a correction factor c_k is added to the model outputs so as to preserve the fitting accuracy achieved in the identification step while allowing for changes Δθ_k in the parameter values to fit the gradients (eq 11). The corresponding correction term is derived from a first-order Taylor expansion to account for the changes Δθ_k that were introduced to the parameter values to fit the gradients:

c_k(t_i) = c_{k−1}(t_i) + D_{y_k}(θ_k, t_i) Δθ_k   (12)

where D_{y_k}(θ_k, t_i) ∈ ℝ^{n_y × n_θ} is the Jacobian matrix of the model outputs at sampling time t_i whose elements are the partial derivatives of the outputs with respect to the parameters. An upper bound equal to ϵ_max, that is a priori selected by the user, is imposed on a relative truncation error to ensure a desirable degree of model fitting. This upper bound also indirectly determines the allowable amount of change Δθ_k to fit the measured gradients. The relative truncation error is defined as the error introduced by the linear correction term as follows:

ϵ_k(t_i) = [y_k(θ_k + Δθ_k, t_i) − D_{y_k}(θ_k, t_i) Δθ_k − y_k(θ_k, t_i)] × [diag(y_k(θ_k, t_i))]^{−1}   (13)

Then, after the gradient correction step, the updated parameter values at batch run k are given by the sum of the parameter values calculated in eq 8 plus the deviation Δθ_k needed to fit the gradients in eq 11:

θ_k′ = θ_k + Δθ_k   (14)
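The interplay of eqs 11, 12, and 14 can be summarized as: solve for a small parameter deviation that reproduces the measured gradients, then shift the output correction term so that the identification fit is preserved. The sketch below makes several simplifying assumptions: only the cost gradient is matched, box bounds stand in for Θ_0, the truncation-error constraint of eq 13 is omitted, and grad_model and output_jacobian are placeholders for user-supplied callables.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the gradient correction step (eqs 11, 12, 14), cost gradient
# only. grad_model(theta, u) and output_jacobian(theta) are assumed
# placeholders; box bounds stand in for Theta_0.

def gradient_correction(theta_k, c_prev, u_k, grad_meas, grad_model,
                        output_jacobian, w_phi, dtheta_max=0.2):
    def mismatch(dtheta):
        # weighted error between measured and predicted gradients (eq 11)
        return w_phi @ np.abs(grad_meas - grad_model(theta_k + dtheta, u_k))

    bounds = [(-dtheta_max, dtheta_max)] * theta_k.size
    dtheta = minimize(mismatch, np.zeros_like(theta_k), bounds=bounds).x
    # first-order output correction preserving the identification fit (eq 12)
    c_k = c_prev + output_jacobian(theta_k) @ dtheta
    return theta_k + dtheta, c_k            # updated parameters (eq 14)
```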

Model-Based Optimization. Subsequently, based on the identified parameters in eq 8 and the additional parameter corrections and output correction obtained for gradient fitting (eq 11), a model-based optimization is performed with the adequate model as follows:


u_{k+1} = arg min_u ϕ(y(θ_k′), u)
s.t. ẋ = f(x, u, θ_k′)
x(0) = x_0
y = h(x) − c_k
g(y(θ_k′), u) ≤ 0
u^L ≤ u ≤ u^U   (15)

where u_{k+1} represents the optimal input for the next batch run minimizing the cost ϕ ∈ ℝ, while satisfying the process-dependent constraints g ∈ ℝ^{n_g} and bounds on the input space u^L and u^U.

NOVEL REFORMULATION OF GRADIENT CORRECTION BY INTEGRATION OF EXPERIMENTAL DESIGN

One limitation of the simultaneous identification and optimization procedure described above is that the gradients are only corrected to fit the most recent measured gradient. Thus, information from past operating points is not taken into consideration when solely using the last measured gradient in eq 11. A correction of the cost gradient only at the current operating point may be adversely affected by measurement noise and does not guarantee an accurate prediction of the cost function at other operating points around the process optimum. To introduce additional robustness with respect to uncertainty in gradient measurements and to increase parameter precision, it is therefore proposed to match the predicted gradients not only to the most recently measured values but also to match predicted gradients to their corresponding measured values from several past batch runs. Moreover, it is proposed to use optimal experimental design to determine future experiments for which the gradients will be highly sensitive to parameter changes and thus will require smaller parameter changes to achieve fitting between the predicted and measured gradients. Using smaller changes in parameters is advantageous since it is expected to result in smoother convergence to the optimum. To summarize, the objectives of the use of experimental design are reducing the sensitivity of model predictions to current gradient measurement noise and designing future experiments for which the gradients are more sensitive with respect to the parameters so as to facilitate gradient matching with smaller changes in these values.

Gradient Correction: Definitions. In step 2 of the original methodology, gradient predictions are fitted to the measured values as per the gradient correction step described in eq 11. The gradient of the cost is defined as the derivative with respect to the decision variables:

∇_m ϕ(u_k) = (∂ϕ/∂u_m)(u_k)   (16)

This derivative can be approximated as a normalized finite difference between two operating points:

α_{k,m} = (ϕ(u_k + Δu_m e_m) − ϕ(u_k)) / ‖Δu_m e_m‖   (17)

where e_m ∈ ℝ^{n_u} is the identity vector in the direction of the corresponding decision variable.
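Eq 17 amounts to perturbing one decision variable at a time and running one additional experiment; a minimal sketch follows, with the perturbation size du an assumed tuning choice.

```python
import numpy as np

# Normalized finite-difference gradient estimate of eq 17; one perturbed
# experiment per decision variable. The step du is an assumed tuning choice.

def alpha(phi, u_k, du=0.05):
    a = np.zeros_like(u_k)
    for m in range(u_k.size):
        e_m = np.zeros_like(u_k); e_m[m] = 1.0   # identity vector e_m
        a[m] = (phi(u_k + du * e_m) - phi(u_k)) / np.linalg.norm(du * e_m)
    return a

print(alpha(lambda u: float((u ** 2).sum()), np.array([1.0, -0.5])))
```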

Using this definition, the gradient correction problem from our earlier study (eq 11) for the cost function gradients can be reformulated as follows:

Δθ_k = arg min_{Δθ} Σ_{m=1}^{n_u} w_m^α (α_{k,m}^p(u_k) − α_{k,m}(u_k, θ_k + Δθ))²
s.t. ẋ_k = f(x_k, u_k, θ_k + Δθ)
x_k(0) = x_{k,0}
y_k = h(x_k) − c_k
θ_k + Δθ ∈ Θ_0
‖ϵ_k^T‖ ≤ ϵ_max   (18)

where the normalizing weights w_m^α are selected as the inverses of the noise in the gradients with respect to the decision variables. The superscript p denotes the finite difference approximation of the cost function derivative (eq 17) estimated from plant measurements.
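Under this reformulation, the fitting problem of eq 18 reduces to a weighted least-squares match between measured and predicted finite differences; a sketch under the same simplifying assumptions as before (box bounds standing in for Θ_0, truncation-error constraint omitted, alpha_model a placeholder callable).

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the reformulated correction (eq 18): weighted least-squares fit
# of predicted to measured finite differences. alpha_model(u, theta) is an
# assumed placeholder; box bounds stand in for Theta_0.

def match_alphas(theta_k, u_k, alpha_meas, alpha_model, w_alpha,
                 dtheta_max=0.2):
    obj = lambda dth: np.sum(
        w_alpha * (alpha_meas - alpha_model(u_k, theta_k + dth)) ** 2)
    bounds = [(-dtheta_max, dtheta_max)] * theta_k.size
    return theta_k + minimize(obj, np.zeros_like(theta_k), bounds=bounds).x
```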

Consideration of Measured Gradients from Past Experiments. In addition to the most recent gradients obtained at operating point u_k according to eq 17, it is desired to consider additional cost function measurements that are available from past experiments. Let us define a vector of the differences between the measured cost at the current operating point u_k with respect to past measured costs as follows:

ΔΦ_k = [ϕ_k − ϕ_{k−1}, ϕ_k − ϕ_{k−2}, ···, ϕ_k − ϕ_{k−n_b−1}]^T   (19)

where n_b is the number of past operating points available for fitting measured and predicted gradients, and similarly, let us define a matrix containing the differences between the current and past decision variables as Δ