Design of In Silico Experiments as a Tool for Nonlinear Sensitivity Analysis of Knowledge-Driven Models

Alexandros Kiparissides,† Christos Georgakis,*,‡ Athanasios Mantalaris,† and Efstratios N. Pistikopoulos†

†Biological Systems Engineering Laboratory, Centre for Process Systems Engineering, Imperial College, London SW7 2AZ, United Kingdom
‡System Research Institute for Chemical & Biological Processes and Department of Chemical & Biological Engineering, Tufts University, Medford, Massachusetts 02155, United States

ABSTRACT: We propose the in silico use of the Design of Experiments (DoE) methodology in the analysis of the parametric sensitivity of a detailed process model, so as to overcome a substantial limitation of current state-of-the-art Global Sensitivity Analysis (GSA) methods, namely that they require a large and often prohibitive number of model evaluations. This is achieved by calculating an accurate and much simpler response surface meta-model (RSM) that is able to provide all the required nonlinear global sensitivity information. To benchmark the efficiency of the proposed methodology, we utilize an unstructured dynamic model that describes hybridoma cell growth and proliferation in batch cultures [from the work of D. J. Jang and J. P. Barford, Biochem. Eng. J. 2000, 4 (2), 153−168]. We show that using the DoE methodology to generate a response surface meta-model approximation of the full model can yield the same quality of sensitivity information as the established variance-based GSA methods at a significantly lower computational cost. Finally, by applying GSA to the estimated RSM meta-model, we evaluate its ability to capture the nonlinear interaction effects present in the detailed model.
■ INTRODUCTION

Mathematical modeling provides a systematic way to quantitatively study the complex and multilevel interactions that occur in nature. In a way, it can be viewed as the most meaningful way to organize the available information about a process or system. However, a mathematical model is always an approximation of the real process it describes, because of our limited knowledge and the underlying scientific assumptions. The deviation of a model's predictions from reality is often taken as the defining criterion of model quality, and the unknown values of the model parameters are chosen to minimize this deviation. Such a metric, while necessary, is not sufficient to fully gauge a model's adequacy. For example, introducing additional model components and, unavoidably, additional parameters, although not an optimal modeling choice, often further reduces the model's deviation from the available experimental data. However, what is ultimately expected of a good model is the ability to accurately predict the behavior of the process at a new operating point and, most importantly, to do so with a minimal amount of uncertainty. Uncertainty enters the model either through the modeling assumptions (structural uncertainty) or through the model parameters (parametric uncertainty). Adding unnecessary model components degrades any subsequent parameter identification and, most importantly, increases the uncertainty in the model predictions.

Several mathematical tools, collectively labeled model analysis techniques, aid in examining a model's parametric uncertainty. Prominent among these methodologies is sensitivity analysis (SA),1 which apportions the total uncertainty present in the model output to the various sources of variation. In other words, SA provides both qualitative and quantitative insight into the dependence of the model output on its parameters through
what is known as parameter significance ranking. Knowing which parameters have a significant effect on the model output, and to what extent, allows for the targeted reduction of parametric uncertainty through tailor-made experiments.2 Parameters with negligible sensitivity indices (SIs) can be fixed at their literature or approximate values.

SA methods are commonly classified into three broad categories, namely, screening, local, and global methods.1 For the general case of a nonlinear ODE model commonly encountered in engineering applications, global methods, and variance-based global methods in particular, have the edge over their counterparts, as discussed previously.1,3,4 Simply put, the ability to estimate higher-order indices, which quantify the effect of parameter−parameter interactions on the model output, gives global methods the advantage over their local counterparts. Global Sensitivity Analysis (GSA) methods are commonly grouped into two categories, namely, (i) methods that utilize a model approximation in order to generate measures of importance, and (ii) methods that study the total output variance of the model. Variance-based methods, currently considered the state of the art in GSA,4 treat the model parameters as random variables (Xi) within the predetermined parameter space in order to study the total output variance and how it is apportioned to the model parameters. This is achieved by decomposing the model output into summands of increasing dimensionality, a process known as ANOVA decomposition.5,6 This decomposition is shown in eq 1:
f(X_1, \ldots, X_n) = f_0 + \sum_{i=1}^{n} f_i(X_i) + \sum_{1 \le i < j \le n} f_{ij}(X_i, X_j) + \cdots \quad (1)
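To make the variance-based approach concrete, the short Python sketch below estimates first-order and total-effect Sobol indices by Monte Carlo "pick-freeze" sampling. It is only an illustration of the general technique described above: the toy function g, the uniform parameter ranges, and the sample size N are assumptions made for this example and are not part of the original study, which applies the analysis to a detailed hybridoma model.

```python
# Minimal sketch of variance-based GSA: first-order (S1) and total-effect (ST)
# Sobol indices estimated by Monte Carlo "pick-freeze" sampling.
# The toy model g() and the parameter ranges are hypothetical stand-ins for
# an expensive knowledge-driven model.
import numpy as np

rng = np.random.default_rng(0)

def g(theta):
    """Toy nonlinear model; theta is an (N, 3) array of parameter samples."""
    x1, x2, x3 = theta[:, 0], theta[:, 1], theta[:, 2]
    return np.sin(x1) + 7.0 * np.sin(x2) ** 2 + 0.1 * x3 ** 4 * np.sin(x1)

d, N = 3, 20_000
lo, hi = -np.pi, np.pi          # assumed uniform parameter ranges

A = rng.uniform(lo, hi, size=(N, d))   # base sample
B = rng.uniform(lo, hi, size=(N, d))   # independent resample

yA, yB = g(A), g(B)
var_y = np.var(np.concatenate([yA, yB]), ddof=1)

S1 = np.empty(d)   # first-order indices
ST = np.empty(d)   # total-effect indices
for i in range(d):
    ABi = A.copy()
    ABi[:, i] = B[:, i]         # column i taken from B, all other columns kept from A
    yABi = g(ABi)
    S1[i] = np.mean(yB * (yABi - yA)) / var_y          # Saltelli-type estimator
    ST[i] = 0.5 * np.mean((yA - yABi) ** 2) / var_y    # Jansen-type estimator

for i in range(d):
    print(f"parameter {i + 1}:  S1 = {S1[i]:.3f}   ST = {ST[i]:.3f}")
```

In this sketch, S1 measures the fraction of output variance attributable to each parameter alone, while ST additionally captures its interaction effects; the gap between the two is a simple diagnostic of the nonlinear parameter−parameter interactions discussed above. Note that each index requires on the order of N(d + 2) model evaluations, which is precisely the computational burden the RSM-based approach proposed in this work seeks to avoid.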
factor (that is, in almost all real-world problems). To properly uncover how different factors jointly affect the response, one needs to utilize the Design of Experiments (DoE) methodology. DoE refers to a group of techniques that formalize, in a statistically sound way, the attempt to relate the effect of many inputs of interest (X) on a studied response or set of responses (Y) through repeated experimentation. Conventionally, in DoE, the inputs (X) are measured variables such as flow rates, concentrations, pressures, pH values, etc. However, in the present work, we propose the use of DoE to study the effect of the parameters of a mathematical model (the varying inputs X) on the model output (the studied response Y). Methodical design of the conditions of the set of experiments is essential for efficient and effective information gathering. The goal is to select the levels (values) of the input factors (X), following a prespecified design, in order to maximize the amount of information gained while minimizing the number of experiments conducted.

Many of the central concepts of DoE originate from the work of R. A. Fisher8,9 in the early part of the previous century. Fisher's work demonstrated how a systematically designed and executed set of experiments circumvents many problems that are frequently encountered in the subsequent analysis. Furthermore, he established the four central principles of DoE: the factorial principle, randomization, replication, and blocking. An overview of the information provided by DoE is presented in Table 1. The book by Montgomery10 is an excellent introduction to the subject.

A full factorial design explores all combinations of the factors at each possible factor level. Depending on the number of levels at which each factor is studied, such an approach requires, for two levels, 2^n experiments, where n signifies the number of factors and the base (2) denotes the number of levels at which each factor is studied. The total number of experiments therefore scales exponentially with the number of factors. Nevertheless, a full factorial design provides enough information to approximate the dependence of the studied response on the various input factors, using an equation similar to eq 1. Repeated experiments at the base value of all of the factors (Xi = 0) provide an estimate of the normal variability of the process, which is assumed to be the same throughout the design space. In order to reduce the overall number of experiments to be performed, one usually resorts to fractional factorial designs, which enable the estimation of a smaller number of interaction terms.

Based on the notion that quantifiable nonlinear effects are limited to second- (or third-) order interactions, Box and Wilson11 expanded the ideas of Fisher and proposed using a second-order polynomial to approximate the behavior of the response as the input factors change, an approach that has come to be known as Response Surface Methodology (RSM). Since its introduction, RSM has been extended to higher-order polynomials12,13 at the expense of additional computational effort. Table 2 summarizes the information gained from polynomials of various degrees.
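As an illustration of these DoE/RSM ideas, the sketch below builds a 2^n full factorial design in coded units, augments it with axial points and replicated center points (a face-centred central composite design), evaluates a stand-in "detailed model" at each design point, and fits a Box−Wilson second-order polynomial by least squares. The toy model, the factor count, and the number of center replicates are illustrative assumptions, not the hybridoma case study used in the paper.

```python
# Minimal in-silico DoE/RSM sketch: 2^n full factorial in coded units,
# axial points and replicated center points, then a second-order polynomial
# response surface fitted by ordinary least squares.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 3  # number of factors (model parameters being varied)

# 2^n full factorial at coded levels -1/+1
factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))

# face-centred axial points and replicated center points (base value of all factors)
axial = np.vstack([v for i in range(n) for v in (np.eye(n)[i], -np.eye(n)[i])])
center = np.zeros((4, n))  # 4 center-point replicates

X = np.vstack([factorial, axial, center])

def detailed_model(x):
    """Hypothetical stand-in for one expensive simulation at coded values x."""
    return (3.0 + 2.0 * x[0] - 1.5 * x[1] + 0.8 * x[0] * x[2]
            + 1.2 * x[1] ** 2 + rng.normal(scale=0.05))

y = np.array([detailed_model(x) for x in X])

# design matrix of the second-order (Box-Wilson) polynomial:
# intercept, linear, two-factor interaction, and pure quadratic terms
cols = [np.ones(len(X))]
cols += [X[:, i] for i in range(n)]
cols += [X[:, i] * X[:, j] for i in range(n) for j in range(i + 1, n)]
cols += [X[:, i] ** 2 for i in range(n)]
D = np.column_stack(cols)

beta, *_ = np.linalg.lstsq(D, y, rcond=None)
print("fitted RSM coefficients:", np.round(beta, 3))

# replicated center points estimate the normal variability of the "process"
print(f"center-point std: {np.std(y[-4:], ddof=1):.3f}")
```

Two design choices in this sketch mirror the discussion above: the replicated center points supply the estimate of normal variability assumed constant over the design space, and the axial points are what make the pure quadratic terms of the second-order polynomial estimable, since a two-level factorial alone cannot resolve them.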