A Stochastic Regression Approach to Analyzing Thermodynamic

Modeling the Effects and Uncertainties of Contaminated Sediment Remediation Scenarios in a Norwegian Fjord by Markov Chain Monte Carlo Simulation...
0 downloads 0 Views 516KB Size
Environ. Sci. Technol. 2006, 40, 3872-3878

A Stochastic Regression Approach to Analyzing Thermodynamic Uncertainty in Chemical Speciation Modeling CHRISTOPHER L. WEBER, JEANNE M. VANBRIESEN,* AND MITCHELL S. SMALL Department of Civil and Environmental Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

Chemical speciation modeling is a vital tool for assessing the bioavailability of inorganic species, yet significant uncertainties in thermodynamic parameters and model form limit its potential for decision-making. In this paper we present a novel method for the quantification of thermodynamic parameter uncertainty and ionic strength correction model uncertainty using Bayesian Markov Chain Monte Carlo (MCMC) estimation methods. These methods allow for the inclusion of correlation modeling, which has not been present in previous work. The MCMC simulations are used to model a natural river water to determine the uncertainty in the calculated environmental speciation of ethylenediamenetetraacetate, a chelating agent that has attracted considerable environmental interest. The results indicate that incorporating correlation among related thermodynamic parameters into the uncertainty model is necessary to correctly quantify the overall system uncertainty. This result indicates the superiority of MCMC estimation methods over traditional Monte Carlo methods when available data are used to estimate parameter uncertainty in systems with closely related model parameters.

Introduction The bioavailability of metals has been an area of significant recent attention (1, 2). Various modeling techniques (e.g., biotic ligand models and free-ion activity models) are increasingly being utilized in risk assessment studies for metals (3). Because these techniques are based on metal speciation, reliable and accurate methods for measuring and predicting speciation are critically needed as the first step to determining the environmental impact of metals. In general, speciation can be determined in two ways: laboratory analysis can sometimes provide a direct measure of different forms of a metal, while chemical modeling applies known thermodynamic relationships among chemical forms to predict the equilibrium distribution. Because environmental concentrations of most metals of interest are low and because of the dynamic nature of several aquatic equilibria, analytical techniques often are not effective for determining the complete speciation (4). Further, many in situ sensors are in development that will rely on measurement of only one or two forms of a chemical. For these reasons and often for simplicity and cost savings, chemical models * Corresponding author phone: (412)268-4603; fax: (412)268-7813; e-mail: [email protected]. 3872

9

ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 40, NO. 12, 2006

have an important role to play in risk assessment (5). Several codes are available both commercially (6, 7) and publicly (8-10) for such modeling. Despite being easy to use and time-efficient, speciation modeling involves significant uncertainties (11). From Finkel’s framework (12), uncertainty can, in general, be divided into four distinct types: decision rule uncertainty, model uncertainty, parameter uncertainty, and parameter variability (13). In the context of chemical speciation modeling, the latter three of Finkel’s types are important. Model uncertainty includes the difficulties of choosing which species are expected to be present and which processes are expected to be important (i.e., humic complexation, changes in redox states, or adsorption to solids), the assumption of equilibrium conditions, and the models used to correct thermodynamic data for ionic strength, temperature, and medium interactions. Parameter variability refers to how chemical and thermodynamic parameters vary with respect to location and time. Parameter uncertainty has been the most studied, including examinations of the Fe(II)-Fe(III)-SO4 aqueous system (14), actinide speciation in groundwater (15), and U(VI) aqueous speciation (16). These previous studies have relied on several simplifying assumptions, such as a fixed equation for ionic strength extrapolation (17) and, perhaps most critically, the assumption of statistical independence of model parameters, acknowledged as a key limitation of previous efforts (16). In this work we present a new stochastic method specifically aimed at resolving these difficulties. To illustrate this new method, we utilize the case of ethylenediamenetetraacetate (EDTA), a strong anthropogenic chelating agent that has recently attracted significant environmental attention (18, 19). The European Union (EU) recently completed a risk assessment for EDTA (20), which considered speciation only qualitatively, despite general agreement that the major potential environmental effects of EDTA are highly dependent on speciation (21-23). While consideration of the total EDTA concentration for risk assessment is conservative, the degree to which it is overly conservative is unresolved when the full speciation of EDTA is ignored. This general statement is also true for the assessment of metal concentrations in the environment, which can be significantly affected by their complexation with EDTA and other organic ligands. EDTA is also a wellstudied ligand, providing a large set of thermodynamic data. Finally, it has been very challenging to measure analytically until recently (21) and has thus been modeled extensively in past work. The method is implemented in three steps: First, a review of relevant thermodynamic data and model structure is conducted. Second, a statistical model is created utilizing the relevant data to produce uncertainty and correlation estimates for the distribution of the thermodynamic parameters of interest. Finally, these distributions are used in a batch execution of the speciation model. Results from both the statistical model and from the speciation model are then analyzed.

Method Formulation Thermodynamic Data Sources and Programs for Chemical Modeling. For chemical thermodynamic data, there are generally two types of databases: limited databases affiliated with speciation programs and more comprehensive critical reviews of literature citations. Most mainstream aqueous speciation programs, such as MINTEQA2 (8), PHREEQC (10), etc., come supplied with a standard thermodynamic database 10.1021/es0523035 CCC: $33.50

 2006 American Chemical Society Published on Web 05/12/2006

including reaction constants, enthalpy values, and parameters for ionic strength correction (8). These program databases offer a single value for the overall formation reaction of each aqueous component with no estimate of the constant’s uncertainty. This value, in turn, has generally been taken from more comprehensive databases such as the International Union of Pure and Applied Chemistry (IUPAC) Critical Database (24), the National Institute of Standards and Technology (NIST) standard reference database (25), or any of a number of other critical compilations of thermodynamic data. These broader thermodynamic sources have been recently reviewed for chelate-associated thermodynamic data (26). Each of these databases has its advantages and disadvantages for use in analysis. The NIST and IUPAC databases are widely accepted and comprise all primary data but provide only qualitative assessments of data quality and have problems of internal consistency (32). The Joint Expert Speciation System (JESS) thermodynamic database (27, 28), which was chosen for the present analysis, offers several advantages over the NIST and IUPAC databases. The JESS database provides all data in a constant format, while retaining internal consistency by reporting all data at cited experimental conditions (i.e., without further extrapolation to a standard state). Additionally, it provides quantitative expert assessment of the quality of each cited constant, which is especially valuable for uncertainty assessments. However, the JESS database differs from the others in that it includes both primary and secondary data, including, for example, critical values cited in the NIST database. Although this feature of JESS is not ideal due to concerns of statistical double-counting, it is nearly impossible to trace some literature citations back to their primary sources because different authors have often adjusted the data for their specific use and the correction functions have not always been described (28). Because of this inherent problem, correction functions for temperature and ionic strength are expected to comprise a significant fraction of the parameter uncertainty for thermodynamic data (27). Further, because thermodynamic constants are corrected inside speciation models as new ionic strength values are computed iteratively (8), it is imperative that the correction functions used within the model and those used on the initial data set be consistent. This is only possible if data at relevant experimental conditions are fit to a specific correction function and the same correction function is utilized inside the speciation program. Furthermore, because thermodynamic data are presented in a variety of forms, the proper relations among different data can only be maintained by a simultaneous correction function regression for all parameters of interest. The method presented here resolves several of these challenges by simultaneously fitting a stochastic ionic strength correction function to the thermodynamic data for each species of interest. Further, by use of this simultaneous regression to link the original thermodynamic data with the internal correction function in the speciation model, thermodynamic consistency is achieved. Statistical Modeling Methods. To achieve this simultaneous correction function regression, a statistical model was constructed using Markov chain Monte Carlo (MCMC) methods. MCMC using Gibbs sampling is an approach that allows direct simulation of joint posterior distributions that are informed by the available data. In contrast, traditional Monte Carlo (MC) approaches simulate only predetermined (e.g., prior) distributions that have not been updated with observed data (29). The MCMC technique has gained in popularity recently due to its ease of use and its ability to identify correlations between parameters in statistical models (30). Although the method, to the authors’ knowledge, has

not been used to analyze thermodynamic data previously, it has been used for several other types of parameter estimation studies including pharmacokinetics (31), food web modeling (30), and the estimation of fugitive air emissions (32). WinBUGS 1.4 (33) was utilized for the MCMC coding. It was assumed that the prior distributions for the standard errors of each of the sets of regression parameters (log βio and bi , see below) followed a bivariate normal distribution with a standard Wishart prior distribution on the precision matrix. For all regression parameters, pseudo-informationless priors (with very large variances) were utilized to allow the data to play an overriding role in determining the posterior parameter distributions (29, 32). The data points were weighted using their JESS-supplied expert rankings by multiplying the regression error precision by the numerical rank given in the JESS database. For example, a datum with expert rank 6 would have twice the precision (or half the variance) as a datum with expert rank 3. The overall WinBUGS code contained approximately 140 lines and took just under 3 min on a 1.5 GHz Pentium M processor to complete 10 000 iterations for three simultaneous Markov Chains. Model convergence was monitored by several methods: by checking the autocorrelation of each chain, by comparing history and trace plots of the three chains, and by using the GelmanRubin convergence statistic (34). Speciation Modeling Methods. Ten environmentally important metals and their EDTA chelates (Cu2+, Zn2+, Ni2+, Ca2+, Mn2+, Mg2+, Pb2+, Co2+, Al3+, and Fe3+) were chosen for the analysis, in addition to the protonated EDTA form, HEDTA3-. Several previous analyses have shown that these metals are the most important chelates of EDTA in wastewaters and natural streams (19, 21). Each of the metal-EDTA combinations had a different set of reactions and complexes to model, which ranged from the set {ML, MHL} only to {ML, MHL, M(OH)L, and M(OH)2L}, where M represents the metal and L is the ligand of interest. In total, the thermodynamic parameters of 27 different EDTA species were fit simultaneously by the model. Because this analysis focused on thermodynamic formation constant uncertainty, it was assumed that the redox state of each of the metals was fixed at the state shown above. Data were filtered for relevance to freshwater environmental situations, so that data reported at high temperatures (>30 °C) or very high ionic strength (>1 M) were not utilized in the model. Additionally, for simplification, all MH2EDTA complexes were ignored in the uncertainty model (they were included in the speciation modeling), as their formation is only significant at extremely low pH values (21). Two types of correction are necessary to translate experimentally measured thermodynamic parameters to overall conditional constants: temperature correction and ionic strength correction. For temperature correction, it was assumed that eliminating data outside of the 20-30 °C range would make the overall uncertainty in temperature correction small compared to ionic strength correction. While this is a substantial assumption, it was substantiated by a simple uncertainty analysis (see the Supporting Information for further details). This temperature correction uncertainty will likely become more important when simulations are run at temperatures below 20 °C, but because all simulations for this study were run at 25 °C, our analysis shows that the assumption is valid. The ionic strength correction function was modeled after the extended Debye-Huckel equation (Truesdell-Jones equation) (35), similarly to the approach of May (27) in the formulation of the JESS program. The model equation is shown below as eq 1, where i sums over all reaction species, j denotes different EDTA chelates, and all other parameters are similar to May except j, representing the regression error, VOL. 40, NO. 12, 2006 / ENVIRONMENTAL SCIENCE & TECHNOLOGY

9

3873

FIGURE 1. Comparison of ionic strength correction methods showing the MCMC approach and its 95% Bayesian interval, the JESS approach, and the Visual MINTEQ/Davies approach. Data are from the JESS database. The second axis in panel a shows the size of 90% credible interval (X95-X5). Panel b shows an enlargement of ionic strengths of environmental significance in freshwater systems. and the Debye-Huckel (bDHai) value, which was assumed to be equal to 1.5 except for free metal ions, which were taken from ref 35. By summing over each reactant i as opposed to fitting an overall reaction bi value as in May, the direct modeling of correlation between the regression parameters for each chelate (log βio and bi) was permitted. The TruesdellJones extension was chosen instead of electrolyte-specific approaches such as specific interaction theory (36) for several reasons, including its simplicity and applicability to the typically dilute natural waters of interest in environmental modeling.

log10 β′j ) log10 βj° +

∑i

(

|Zi+ Zi-|AxI 1 + 1.5xI

)

+ biI + j (1)

Two types of statistical correlation were accounted for by this simultaneous model: correlation between the regression parameters for each species (correlation between log βjo and bj) and data-induced correlation between the parameters for one species and parameters for related species (i.e., correlation between log βjo and log βko, where j and k are two related aqueous species). The model treated the first kind of correlation directly through the use of bivariate distributions, while the second kind of correlation was treated indirectly by fitting each primary constant (ML and HL) before fitting the secondary constants (e.g., MHL, MOHL), as described in detail in the Supporting Information. Including this datainduced correlation, which has not been done in previous studies, is important to avoid creating highly unlikely (very low values of the joint likelihood function) sets of different log βio values for two related species. This could happen, for instance, by randomly sampling a relatively high value for the formation constant of HEDTA3- and relatively low values for any of the MHEDTA constants, which are correlated with the HEDTA3- value. Visual MINTEQ, a Windows version of the U.S. EPA code MINTEQA2, was used for the chemical speciation calculations (9). A batch version of Visual MINTEQ was written in Microsoft Visual Basic. NET to vary the thermodynamic parameters in the Visual MINTEQ database and to store the results of each equilibrium calculation. The example chemical system examined was a simulated natural river water constructed using average metal concentrations from the U. S. EPA’s STORET database for Allegheny County, PA (37). (see Figure 3 for chemical system details) The simulated natural river was examined first by assuming a fully aqueous system and then by assuming a dissolution-limited system for the important metals. To isolate the effects of thermo3874

9

ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 40, NO. 12, 2006

dynamic parameter uncertainty, the system was assumed to be very simple, ignoring potential complexation by natural organic matter, sorption to solid phases, and redox reactions.

Results and Discussion Statistical Model. Convergence of the MCMC model was generally achieved in less than 100 iterations, though autocorrelation remained in some species’ chains for over 20 000 iterations. These tended to be species with fewer data. To ensure that this autocorrelation did not affect the speciation uncertainty analysis, the final chains were thinned (29) by a factor of 5, substantially reducing the autocorrelation. One-thousand iterations were then kept from this thinned joint posterior distribution. Initial conditions were varied to ensure stable posterior distributions for all model parameters. Sample statistics for the converged chains are shown for the interested reader in Table SI-1 in the Supporting Information. The values listed in the Visual MINTEQ and JESS thermodynamic databases are shown for comparison. The obtained mean values at infinite dilution were generally close to the values cited in the JESS and VMINTEQ databases, with standard deviations ranging from 0.05 log units to 0.2 log units. However, for some of the constants the fitted mean stability constant was far from the Visual MINTEQ and JESS values, emphasizing the difficult nature of extrapolating thermodynamic data to a standard state. Because no data are measured at the standard state of infinite dilution, the value obtained from the extrapolation is highly dependent on the data and the method used for extrapolation as well as medium effects and experimental error (38). Sample moments were taken to determine the symmetry of the posterior distributions, and the skewness coefficients were close to zero (|γ| < 0.2) for all but 7 of the 54 marginal posterior distributions. Curve fits using Kolmogorov-Smirnov statistics showed that symmetric distributions such as normal and logistic were the best fitting distributions (in log space) for all 54 marginal distributions. The differences in correction methodology used in the three speciation programs are illustrated in Figure 1, which shows the data and the three ionic strength correction methods for the formation constant of CuEDTA2-. Several things can be noted from this figure, which is representative of the results for all the considered constants. First, in the range of quite dilute solutions (