1294
Ind. Eng. Chem. Res. 2011, 50, 1294–1304
Global Sensitivity Analysis Procedure Accounting for Effect of Available Experimental Data Yunfei Chu, Zuyi Huang, and Juergen Hahn* Artie McFerrin Department of Chemical Engineering, Texas A&M UniVersity, College Station, Texas 77843-3122, United States
Sensitivity analysis of uncertain nonlinear systems plays an important role in analyzing chemical engineering process models. Even though a variety of different techniques exists for analyzing nonlinear systems, these methods often do not update the parameter distribution using available measurement data. This forms the motivation behind this work as a sensitivity analysis procedure is introduced that deals with nonlinear systems and uncertainty in model parameters, and also incorporates the use of available data. The technique computes two covariance matrices: a covariance matrix that describes the effect that changes of the parameters values, sampled from a prior distribution, have on the outputs, and a covariance matrix that determines the effect of available data on the posterior distribution of the parameters using a Markov chain Monte Carlo method. The information contained in these two covariance matrices is then simultaneously evaluated by balancing the two matrices. The results returned by this technique are the directions in parameter space that have the largest impact on the system behavior according to the given parameter uncertainty as well as the contribution of individual parameters to these important directions in parameter space. The presented technique is applied to three examples. 1. Introduction Sensitivity analysis aims to study how model parameter variations can qualitatively or quantitatively influence model behavior.1-6 Sensitivity analysis has become an essential component for mathematical modeling and analysis, which is reflected by the wide range of its applications in chemical engineering.7-10 Sensitivity analysis techniques are broadly classified as local sensitivity analysis methods or global sensitivity analysis techniques. Local sensitivity analysis perturbs one parameter at a time in a neighborhood of a given parameter point,9 and the sensitivities of a nonlinear model are generally dependent on the parameter values, which are not precisely known prior to parameter estimation. Global sensitivity analysis simultaneously varies several parameters according to parameter uncertainty, often over a large range of the parameter values.5,6 As a result, global sensitivity analysis techniques are able to provide a more accurate description of the sensitivity than local analysis if the uncertainties of the parameter values are significant. One drawback of global sensitivity analysis techniques is that they are computationally intensive. However, advances in hardware and software over the last two decades have resulted in a significant number of applications of global sensitivity analysis.11-13 While global sensitivity analysis is not dependent on a single parameter value as is the case for local analysis, global sensitivity analysis is affected by the parameter uncertainty which is often described by a probability density function. The information provided by the density function is critical since it can significantly change the calculated sensitivities.5,6 It would be ideal if the density function can provide an accurate description of the parameter uncertainty; however, the density function is often subjectively assumed to have a simple expression for computational reasons. While these assumptions * To whom correspondence should be addressed. E-mail: hahn@ tamu.edu.
make the global sensitivity easier to compute, they limit the conclusions that can be drawn from the analysis. One problem related to the parameter uncertainty results from correlated uncertain parameters. Even though correlated parameters are commonly encountered in practice, most techniques for global sensitivity analysis require the assumption that the parameters are uncorrelated, e.g., the Latin hypercube sampling method,14 Sobol’s method,15 FAST,16 extended FAST,17 or the method based on the correlation ratio.18 While some advances regarding the correlation of parameters have been made for the original methods,19-22 the computation of the sensitivity value is significantly more expensive since shortcuts available for uncorrelated parameters are no longer viable. Furthermore, it is not uncommon in practice that parameter uncertainty is described by a complex density function, e.g., a posterior density function resulting from estimating the parameters. Computing the global sensitivity becomes difficult in this case as even sampling from such a density function is nontrivial. While advanced sampling techniques, e.g., Markov chain Monte Carlo (MCMC),23 are available, it is still challenging to incorporate the generated sampling points into the calculation of the global sensitivity since the calculation itself needs to be performed according to a sampling strategy. Similar to common variance-based methods, samples are required not only from the joint density function but also from the marginal and conditional density functions which result in expensive computations. The goal of this paper is to present a new technique for global sensitivity analysis of nonlinear dynamic systems which addresses challenges resulting from incorporating available data. The presented approach is closely related to the balancing method used in model reduction which computes a transformation of two covariance matrices describing two different, yet important, system properties. Parameter uncertainty information is characterized by a covariance matrix related to the probability density function. If the density function cannot be sampled directly, then a Markov chain Monte Carlo (MCMC) method is applied to generate the sampling points. A second covariance
10.1021/ie101283g 2011 American Chemical Society Published on Web 11/22/2010
Ind. Eng. Chem. Res., Vol. 50, No. 3, 2011
matrix is computed which characterizes information describing the parameter-output sensitivity of the model. Information from both covariance matrices is used simultaneously as a balancing procedure is applied to the two matrices. The directions in parameter space which have the most significant effects on the outputs can then be identified. One feature of the presented technique is that even though information of parameter uncertainty and information about the parameter-output sensitivity are combined in the analysis, the two types of information can be extracted independently. This feature can considerably lower the computational load associated with the global sensitivity analysis procedure. The paper is organized as follows. A simple example is provided in section 2 to illustrate the role that parameter uncertainty plays in global sensitivity analysis. Essential background information about balancing and MCMC is presented in section 3. The new global sensitivity analysis technique is presented in section 4. Section 5 contains three case studies to illustrate the technique, and conclusions are presented in section 6. 2. Illustrative Example A simple linear model is used to demonstrate the important role that parameter uncertainty plays in global sensitivity analysis and to point out difficulties arising from parameter uncertainty being described by a complex density function. Suppose the model to be analyzed is given by y ) Aθ
[ ]
2 1 where A ) 1 2
(1)
Global sensitivity analysis aims to answer the following question: Which are important uncertain parameters that contribute the most to output variations? Apart from the model that describes the effect of the parameters θ on the outputs y shown in eq 1, information about parameter uncertainty is also required for global sensitivity analysis. The results returned by global sensitivity analysis will depend upon both the provided model and the description of the parameter uncertainty. Parameter uncertainty is often described by a probability density function, denoted by p(θ). The most popular techniques for global sensitivity analysis define the sensitivity index based on analysis of variance (ANOVA).5,6,17-20 The sensitivity of the output yi with respect to the parameter θj is defined as sij ) Var[E(yi |θj)]
(2)
and the conditional variance is calculated as follows Var[E[yi |θj]] ) E[(E[yi |θj] - E[yi])2] )
∫ ( ∫ · · · ∫ y p(θ i
-j |θj)dθ-j
-
∫ · · · ∫ y p(θ)dθ) p(θ )dθ 2
i
j
1295
Figure 1. Illustration of the role that parameter uncertainty plays in global sensitivity analysis. The ellipses denote the confidence regions, and the line segments of the same color inside the ellipses can be transformed into each other using the model. The important parameter causes a large variation in the output value (denoted by the length of the line segment on the output plane) when the parameter value is varied according to its uncertainty (denoted by the line segment on the parameter plane). (a) The parameters θ1 and θ2 are equally important. (b) The parameter θ1 is more important. (c) The parameter θ2 is more important.
tainty description even though the identical parameter-output model is investigated. Several different possible conclusions about the importance of parameters can be drawn: The two parameters are determined to be equally important (Figure 1a), the first parameter is more important (Figure 1b), or the second parameter can be more important (Figure 1c). Therefore, use of an accurate density function is crucial for global sensitivity analysis since it can have a significant effect on the results. While eq 3 can be used to compute the conditional variance for this simple example, it is generally not possible to find a closed-form solution of the conditional variance for more complex models. The reason for this is that there are two nested integrals that need to be evaluated. Using a Monte Carlo type approach, one parameter is first fixed at some value and m sampling points for the other parameters are generated to calculate the inside integral of the conditional expectation. Then r sampling points for this parameter are generated to calculate the outside integral of the variance. The procedure repeats n times to calculate the sensitivity for all n parameters. The total number required to evaluate the model is n × r × m which increases with the number of parameters in the model. Additionally, if the density function is very complex, even sampling using Monte Carlo integration can become prohibitively expensive.
j
(3) where θ-j denotes the parameter vector without the parameter θj . In this simple example, the parameter uncertainty is assumed to be described by a Gaussian distribution θ ∼ N(0, Σ), and the sensitivities can be calculated analytically. The reason for choosing such a simple example is that some properties are easier to illustrate than if a more complex model would be used. The sensitivities for three different covariance matrices Σ are calculated and shown in Figure 1. It can be easily seen that the global sensitivity matrix S is dependent on the parameter uncer-
3. Background This section presents background material that will be used for the presented technique. The technique is based upon simultaneously taking into account two types of information, each of which can be described by a covariance matrix. Since this concept is similar to the balancing approach used for model reduction, a brief presentation of balancing is provided in section 3.1. If the probability density function is complex, sampling becomes nontrivial and sampling based on a Markov chain Monte Carlo (MCMC) technique, as described in section 3.2, can be used. Section 3.3 presents background
1296
Ind. Eng. Chem. Res., Vol. 50, No. 3, 2011
material on uniformly distributed random matrices which form a part of the procedure presented in section 4. 3.1. Balancing of Controllability and Observability Covariance Matrices. The concept of balancing has been introduced first for linear dynamic systems24 and was later expanded to a certain class of nonlinear systems.25 As the analytical balancing algorithms for nonlinear systems present computational difficulties, a class of balancing methods based upon simulation data has also been investigated,26-28 which are presented below. All balancing approaches are based upon the idea of retaining the states of a system that are most important when both controllability and observability are taken into account. The degree of controllability and observability can be characterized by two covariance matrices computed from sample points recorded for the state trajectories or the output trajectories, respectively. To calculate the controllability covariance matrix, a set of input signals is first selected, denoted by U. Then the system is stimulated by each combination of input signals. The controllability covariance matrix WC is computed via 0
WC )
∑ ∑ (x(k, u) - x (k))(x(k, u) - x (k))
T
s
s
(4)
u∈U k)-nt
where x(k, u) denotes the state values at the time point k stimulated by the input vector u and xs(k) denotes a reference state trajectory. The time interval [-nt, 0] is selected to be sufficiently large to contain the most important dynamic behavior. Computation of the observability covariance matrix requires that a set of initial state values is selected via xi(0) ) cVei + x0
(5)
where x0 is a reference value, c∈C a scalar, V∈V an orthogonal matrix, and ei the ith column of the identity matrix.27 The system is repeatedly simulated starting at different initial points while the inputs are kept at their nominal values. The observability covariance matrix WO is computed from the output trajectories as WO )
∑∑
c∈C
(∑ )
1 V 2 V∈V c
nt
ψ(k) VT
(6)
k)0
where the matrix ψ(k) is given by ψ(k) ) [(y(k, xi(0)) - ys(k))T(y(k, xj(0)) - ys(k))]ij
Algorithm I: Compute Balancing Transformation Matrix T. Step 1. Compute WC and WO. Step 2. Perform a Cholesky decomposition of WC, WC ) LLT. Step 3. Compute the eigenvalue decomposition of LTWOL, LTWOL ) UΛUT, where UTU ) Inx and Λ ) [λ1, λ2, ..., λnx]. Step 4. Return the transformation matrix T, T ) LUΛ-1/4. Each state of the transformed system is as observable as it is controllable, and the importance of each state to the input-output behavior is given by the magnitude of the corresponding entries in Λ. While the technique presented in this paper does not directly use the controllability and observability concepts, the procedure will result in two covariance matrices, each of which describes important system properties, that will be balanced to take both types of information into account. This balancing step is performed by the procedure described in algorithm I above. 3.2. Markov Chain Monte Carlo. Performing calculations based on sampling points derived from a probability density function is a common approach in global sensitivity analysis when the closed-form solution is difficult to obtain. However, if the density function is complex, sampling itself becomes nontrivial. A Markov chain Monte Carlo method is an approach for sampling from a complex density function which will be further used in this work. A Markov chain is constructed so that its equilibrium distribution equals the desired distribution, denoted by a density function p(θ). The samples generated from the chain after a large number of transient steps are equivalent to the samples from the desired distribution. One class of methods to construct the Markov chain is given by the Metropolis-Hastings algorithm where an accept-reject procedure is used.29,30 A specific algorithm which generates a random walk chain for sampling is given by algorithm II. Algorithm II: Construct a Random Walk Chain for Sampling from a Given Density Function p(θ). Step 1. Initialize θ(0) ) θ0 and k ) 0. Step 2. Generate a sample η from the normal distribution N(0, σ2Inθ). Step 3. Compute a potential value, θ′ ) θ(k) + η
(7) where [ · ]ij denotes a matrix with the ij-entry displayed in the bracket. The notation y(k, xi(0)) refers to the output values at the time point k resulting from an initial state perturbation of xi(0) according to eq 5. After the two covariance matrices have been computed, a balancing transformation, which turns both matrices into identical diagonal matrices, can be applied x ) Tx˜
(8)
where x is the original state vector, x˜ is the transformed state vector, and T is the transformation matrix. One specific algorithm that computes T for the case where both matrices are full rank is given by algorithm I.
Step 4. Generate a sample R from a uniform distribution in [0,1]. If R e p(θ′)/p(θ(k)) θ(k + 1) ) θ′ else θ(k + 1) ) θ(k) Step 5. Set k ) k + 1 and return to step 2. This algorithm generates a potential value by adding a random value to the current value in step 3. The potential value is accepted with probability min {p(θ′)/p(θ(k)), 1} in step 5. If the potential value is rejected then the current value is kept.
Ind. Eng. Chem. Res., Vol. 50, No. 3, 2011
The acceptance rate is a key factor for controlling the performance of a constructed Markov chain and can be adjusted by choosing the variance σ2 in step 2. A recommended value for the equilibrium acceptance rate is 0.234.31 The generated sampling points can then be used to calculate the quantity defined by an expectation as E[q(θ)] )
∫ q(θ)p(θ)dθ
(9)
where q(θ) is a function of the random vector θ with the probability density function p(θ). The function of q(θ) can be general, but it is often chosen as q(θ) ) θ to calculate the mean vector or chosen as q(θ) ) (θ - E[θ])(θ - E[θ])T to calculate the covariance matrix. The integral is the limit of the average over the sampling points generated from p(θ) as N
E[q(θ)] ) lim
Nf∞
∑
1 q(θ(k)) N k)1
(10)
where q(θ(k)) denotes the function evaluated at the sampling points. It should be noted that MCMC is computationally significantly more expensive than direct sampling techniques, which may limit its use for very large scale problems. However, MCMC is usually applied when the probability density function is too complex for direct sampling methods and thereby allows one to deal with such a situation, even if it is computationally expensive. 3.3. Uniformly Distributed Random Matrices. This subsection presents a sampling procedure that will be used to generate parameter sampling points according to a uniform distribution in a hypersphere. This sampling procedure forms a part of the sensitivity analysis technique presented in the next section. A set of n-by-n orthogonal matrices is given by On ) {U: UTU ) In}. There is a unique distribution, denoted by µ, over the set On which is invariant under multiplication from either side by an orthogonal matrix, i.e., for the random matrix X over O n: µ(UX) ) µ(XU) ) µ(X)
for any U ∈ On
(11)
The distribution µ is called the uniform distribution of random orthogonal matrices.32 A random orthogonal matrix X with the uniform distribution can be sampled using a QR factorization in algorithm III.33 Algorithm III: Generate Uniformly Distributed Orthogonal Matrix X Step 1. Generate a random n-by-n matrix Y where each element of Y is independently sampled from the standard normal distribution. Step 2. Compute the QR factorization of Y, i.e., Y ) QR where the diagonal elements of R are all positive. Step 3. Return the orthogonal matrix. 4. Global Sensitivity Analysis via Balancing As has been discussed in the Introduction, global sensitivity analysis can be problematic when parameter uncertainty is described by a complex probability density function. This forms the motivation for the technique presented in this paper as a complex posterior density function can be incorporated into the sensitivity analysis procedure. 4.1. Problem Statement. Global sensitivity analysis requires two types of information: a model describing the effect of
1297
parameter variations on the outputs and a description of the parameter uncertainty. The model is assumed to be y ) O(θ)
(12)
where y is the output vector and θ is the parameter vector. For the analysis of dynamic systems, the parameter-output relationship is generally not an explicit expression but is usually provided implicitly by a set of differential equations. Parameter uncertainty is assumed to be described by a probability density function p(θ). A poor characterization of p(θ) may result in arbitrary conclusions as was illustrated in section 2. Accordingly, if experimental data are available then it is desired to use the posterior density function to calculate the sensitivity instead of the prior density function as this will result in a more accurate description of p(θ). The posterior density function can be calculated from a data regression model z˜ ) h(θ) + ε
(13)
where z˜ is the data vector, h(θ) is the function that predicts the data from the parameter values, and ε is a noise vector. The posterior density function is given by pO(θ|z˜) ) Cp(z˜ |θ)pI(θ)
(14)
where pI(θ) is the prior density function, pO(θ|z˜) is the posterior density function, and p(z˜|θ) is the likelihood function of data which equals the density function of the noise, i.e. p(z˜ |θ) ) pε(z˜ - h(θ))
(15)
The scaling factor C has no effect on the analysis as only the ratio of two posterior density functions is used in this work. It is important to point out that O(θ) and h(θ) do not have to be of the same form, as, for example, one may have temperature data which can be used to update the parameter uncertainty. However, sensitivity analysis could be performed with regard to the selectivity of a reaction mechanism. To combine the two types of information, the models used for sensitivity analysis will be formulated as a dynamic system. Information about the effects of parameter variations on the outputs is represented by one covariance matrix while parameter uncertainty is characterized by another covariance matrix. The two covariance matrices are then balanced, using the procedure reviewed in section 3.1, to take both types of information into account for evaluating the importance of the parameters. Details about the procedures used for these steps are presented in the following subsections. 4.2. Effect of Parameter Variations on Outputs. The model which describes the effect of variations of the parameters on the outputs given by eq 12 can be reformulated as θ(k + 1) ) θ(k) y(k) ) O(θ(k)) with θ(0) ) θ0
for k g 0
(16)
in order to treat the parameters as “states”, even though their values do not change over time for k g 0. Accordingly, a change of the value of θ in the model given by eq 12 is equivalent to a change of the initial parameter value θ(0) for the systems shown in eq 16. The effect that changes in the initial values have on the outputs can be captured by the covariance matrix WO. The set of sampling points according to eq 5 contains points inside a hypersphere which can be generated by choosing the sets C and V by algorithm IV.
1298
Ind. Eng. Chem. Res., Vol. 50, No. 3, 2011
Algorithm IV: Generate Parameter Sampling Points inside a Hyper-sphere Step 1. Generate a set of uniformly distributed orthogonal matrices V) {Vi:Vi is independently sampled from On according to algorithm III}
Step 2. Generate a set of uniformly distributed value C ) {ci:ci ) rθµ1/nθ where µ is independently uniformly sampled from [0, 1]}
Step 3. Return the set of sampling points P ) {θij:θij ) ciViej + θ¯ } where ej is the j-th column in the identity matrix. This algorithm generates a uniformly distributed vector θij j via a two step inside a hypersphere with radius rθ and center θ procedure: the first step generates a uniformly distributed random vector of unit length, while the second step modifies the length of the vector by multiplying it with a random variable.35 For each superscript i, nθ points θi1,...,θinθ are required in the following calculation. If it is desirable to choose a different distribution than the one described here, then this is also possible without modification of the procedure for computing the covariance matrix. However, for the presentation of this work, the given choice is appropriate as the covariance matrix computed in this subsection only involves prior information about the parameter uncertainty, as the posterior information will be separately characterized in section 4.3. A covariance matrix can be computed that quantifies the effect of the parameters on the outputs. For this task, the system given by eq 16 is simulated at each sampling point of θij and the outputs are recorded. The covariance matrix is then computed WO )
1 N
N
∑ c1 V ψ V i)1
2 i
i
i
T i
(17)
where N is the number of the sampled orthogonal matrices, and the matrix ψi which does not vary with time is given by ψi ) [(y(θij) - y(θ¯ ))T(y(θik) - y(θ¯ ))]jk
(18)
The matrix WO in eq 17 is closely related to the matrix shown in eq 6, apart from the inclusion of a scaling factor. The total number of sampled parameter values is nθ × N, which is also the number of simulations required for computation of the matrix WO. 4.3. Parameter Uncertainty Information. Parameter uncertainty information can be captured by a set of sampling points following the probability density function. If the density function is complex, direct sampling cannot be used and an advanced sampling method such as MCMC presented in section 3.2 has to be adopted. Since MCMC generates samples from the state trajectory of a constructed Markov Chain, a special feature of MCMC is that a sampling point is dependent on the previous one, i.e., the general expression that represents the sampling procedure is a dynamic system θ(k + 1) ) π(θ(k), u(k))
k