Anal. Chem. 1997, 69, 3658-3664
Articles
Propagation of Uncertainty in Aqueous Equilibrium Calculations: Non-Gaussian Output Distributions Stephen E. Cabaniss
Department of Chemistry and Water Resources Research Institute, Kent State University, Kent, Ohio 44242
The propagation of uncertainty in aqueous equilibrium calculations is examined using a derivative method and Monte Carlo simulations. Simulations of 104 trials provide both good reproducibility and reasonably short simulation times ( 1000, and SS < 1% at P ) 104. For P ) 104, jx and S of the input distributions are within 0.1% and 1%, respectively, of the assigned µ and σ. Evaluation of Monte Carlo Algorithm. The reproducibility of output concentration distributions is more complex than that of input constraint distributions and varies from one simulated solution to another. For example, a simulated titration of 1.00 mM malonic acid (H2Mal) with NaOH using σNaOH ) σH2Mal ) 0.10 mM shows very different reproducibilities in the output pH distributions as the titration progresses (Figure 1). The standard deviation of the mean pH (Sx) and the standard deviation of the standard deviation of pH (SS) are calculated for 10 simulations with a given P. For a solution with no added base, only P ) 100 trials are needed to achieve Sx and SS 0.5 for all HCl values except those nearest the equivalence point, 0.45 mM e HCl e 0.55 mM; in fact, the absolute value of the skewness is >1 for 90% of the points in the titration. Figure 6 shows the resulting SpCu for this system, which has a maximum value slightly larger than that of the HAc system with comparable uncertainties (Figure 3a). However, disparity between Monte Carlo and derivative SpCu is larger in magnitude and spread 3662 Analytical Chemistry, Vol. 69, No. 18, September 15, 1997
over a wider range of acid values in the Cu-NTA system than in the HAc system. The addition of excess Ca(II) to the system decreases the observed SpCu (Figure 6), although some bimodality is evident in the distribution surface (not shown) and the disparity between Monte Carlo and derivative methods is still apparent. In this case, the presence of the Ca-NTA complex decreases the average pCu and pH at low HCl levels, making the transition between near 0.5 mM HCl smoother and more nearly linear. DISCUSSION The propagation of uncertainty is only one aspect of the more general problem of interactions between equilibrium system constraints and output concentrations. Others include master variable calculations (e.g., the effect of pH on Cu(II) speciation) and sensitivity analysis (e.g., the effect of changing K on species concentrations).14,30,31 Generally speaking, these interactions can be examined either using derivative methods or by repetitive calculations. Repetitive calculation is straightforward with equilibrium computer codes and provides an equilibrium solution for each value of the input constraint of interest. Derivatives can provide additional, qualitative insight into the interactions among system constraints and species concentrations.32 (30) Stumm, W.; Morgan, J. J. Aquatic Chemistry, 3rd ed.; Wiley Interscience: New York, 1996; 1022 pp. (31) Nordstrom, D. K.; Ball, J. W. 1989, Sci. Geol. Bull. 42, 269-280. (32) Morel, F. M. M.; McDuff, R. E.; Morgan, J. J. In Trace Metals and MetalOrganic Interactions in Natural Waters; Singer, P. C., Ed.; Ann Arbor Science Publishers: Ann Arbor, MI, 1973; pp 157-200.
Propagation of uncertainty differs from master variable calculations and sensitivity analyses because (1) all the input constraints may vary simultaneously and (2) the result for a single set of constraints is a distribution (array) rather than a single concentration (scalar) for each species. The former is a significant complication because the interactions between constraints must be explicitly considered; this greatly increases the information required for equilibria with many components. In practice, most of the studies cited above (and this work) ignore possible interactions and assume the input uncertainties to be mutually independent. Treating the output concentrations as arrays rather than scalars greatly increases the volume of the program output and thus increases the complexity of its interpretation. Typically, the distribution has been assumed to be Gaussian so that a simple pair of parameters (xj, S) could be used to represent the output. These results show that the concentration distributions calculated from normally distributed, mutually independent input uncertainties need not be Gaussian. Depending on the system, input uncertainties of e10% can lead to highly non-Gaussian distributions, for which derivative methods of uncertainty analysis will be inappropriate. This is frequently accompanied by large increases in the calculated concentration uncertainties. Three general types of non-Gaussian distributions are observed in these examples: monomodal skewed, bimodal symmetric, and bimodal skewed. Bimodal distributions typically resulted from “titration” phenomena, in which the concentrations of two strongly interacting components are (stoichiometrically) similar. The separation between the peaks of a bimodal distribution can be very large (e.g., Figure 2a) or small, blending into monomodal (e.g., the CaCu(NTA) system), depending on the strength of the interaction. The bimodal distributions shown in this work are mostly skewed as well. Symmetric bimodal distributions only occur near the stoichiometric equivalence point, and the “true” or average concentration is one of the least likely results (a minimum in the distribution). Monomodal skewed distributions can result from any discontinuous or nonlinear process, and a moderate level of skewness results from simple log/anti-log operations, as noted above. For discontinuous equilibria (e.g., precipitation of an ionic solid), skewed monomodal distributions are observed near the point of discontinuity. A detailed understanding of the propagation of uncertainty is particularly important when either (1) the calculation performed amplifies the input error or (2) the tails of the output distribution are of interest. Error amplification is likely to occur near the equivalence point of highly favorable reactions, e.g., the equivalence point of the HAc or HOCl (Figure 2). While error amplification “near the equivalence point” may seem obvious to a trained observer, the definition of “near the equivalence point” is murky. Monte Carlo analysis provides a way to clarify this phrase; Figures 3b and 6 show elevated standard deviations for most values of total base (acid), even those which might seem far from the equivalence point. Although skewed monomodal distributions typically produce both jx and S similar to those of Gaussian distributions, the differences in the tails of the distributions can be substantial. Tests based on 99% confidence intervals, or outlier frequency, will be strongly affected and should be explicitly designed to avoid assumptions of Gaussian behavior. Monte Carlo methods of uncertainty propagation have significant advantages and disadvantages relative to derivative methods.33,34 Unfortunately, several papers have contrasted them as
Figure 7. SpH calculated by derivatives (eqs 1 and 2) for the base titration of HOCl shown in Figure 2b. Parameter a ) 1.5 (b), 1 (2), 0.5 (as shown in Figure 3b) (9), and 0.25 (1).
almost an either-or choice.35,36 Clearly, derivative methods are much faster than Monte Carlo simulations (>100-fold, in the examples shown here) and can readily account for correlated input uncertainties. However, it is also apparent from the above examples that, even in the favorable case of mutually independent Gaussian uncertainty distributions, the assumptions of the derivative method are not always valid. Monte Carlo simulations must be used in such cases. The principal limitation to the Monte Carlo approach is the time available for the calculation. As microprocessor performance continues to improve, this limitation becomes less restrictive. Extrapolating from Table 1 assuming the solution time depends on the square of the number of components, a 200 MHz Pentium microcomputer should be able to perform a P ) 104 simulation on a 20 component system in less than 6 min. Nonetheless, extensive modeling efforts of environmental phenomena, i.e., coupled transport/reaction codes, will find Monte Carlo simulations impractical in many cases. How can we predict in advance, without performing the simulation, whether a given set of constraints gives rise to a Gaussian distribution? Although no “bullet-proof” method is presented, two suggestions follow: One possible method uses derivatives to provide positive evidence of nonlinear behavior and thus a high probability of a non-Gaussian distribution. A significant difference between derivative estimates of S using different values of the parameter a indicates that the equilibrium functions are not acceptably linear with respect to the constraints over the interval 2aσ. For example, Figure 7 shows SpH calculated by the derivative method with values of a ranging from 0.25 to 1.5 (∆Z ) 0.5σ-3σ). The values are quite similar for NaOH ) 0.3-0.7 mM and for NaOH > 1.3 mM but vary significantly at low NaOH concentration and near the titration equivalence point. The latter behavior indicates that the second derivative with respect to some of the inputs is not zero and the linear approximation, which permits truncation of the higher-order terms in the Taylor series expansion, is inapplicable.21 Unfortunately, this method will not detect all non-Gaussian output distributions. The difference between SpH calculated with the maximum and minimum values of a is zero at two points near the equivalence point and 1 at low NaOH; although the derivative (33) Schwartz, L. M. Anal. Chem. 1975, 47, 963-964. (34) Anderson, G. M. Geochim. Cosmoschim. Acta 1976, 40, 1533-1538. (35) Rees, C. E. Geochim. Cosmochim. Acta 1984, 48, 2309-2311. (36) Roddick, J. C. Geochim. Cosmochim. Acta 1987, 51, 2129-2135.
Analytical Chemistry, Vol. 69, No. 18, September 15, 1997
3663
estimates agree, these are bimodal distributions (Figure 2b). In addition, some of the points with very small differences in SpH are highly skewed; compare Figures 4b and 7 at 0.4 mM NaOH. A second possible method for predicting whether the output concentration distribution will be Gaussian is to perform a “screening” Monte Carlo simulation with a small number of trials, e.g., P ) 100. While this would be less reproducible than a P ) 104 simulation, it would be 100 times faster and should provide at least an indication of bimodality, although the S and skewness values would be imprecise. CONCLUSIONS This work shows the practicality and some of the benefits of Monte Carlo uncertainty propagation in aqueous equilibrium calculations. Using a Pentium-based microcomputer, simulations with 104 trials provide both good reproducibility and reasonably short simulation times (