Environ. Sci. Technol. 2005, 39, 4929-4937
Estimation of Fugitive Lead Emission Rates from Secondary Lead Facilities using Hierarchical Bayesian Models AMIT GOYAL AND MITCHELL J. SMALL* Civil & Environmental Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213 KATHERINE VON STACKELBERG AND DMITRIY BURMISTROV Menzie-Cura & Associates, 8 Winchester Place, Suite 202, Winchester, Massachusetts 01890 NANCY JONES EC/R Inc., 6330 Quadrangle Drive, Suite 325, Chapel Hill, North Carolina 27517
Fugitive emissions from secondary lead recovery facilities are difficult to estimate and can vary significantly from site to site. A methodology is presented for estimating fugitive emissions using back inference from observed ambient concentrations at nearby monitors, in conjunction with an atmospheric transport and dispersion model. Observed concentrations are regressed against unit source-monitor transfer terms computed by the model, and the fitted parameters of the regression equation include the background ambient lead concentration, the fugitive lead emission rate, and (when stack emissions are assumed to be unknown) the stack lead emission rate. The methodology is implemented at three sites, one each in Florida, Texas, and New York. A hierarchical Bayesian method is used to estimate the parameters of the model, allowing inferences to be made for both site-specific values and multisite (national) distributions of fugitive emissions and background concentrations. Informed prior distributions must be specified for the background lead concentrations and for fugitive and stack emission rates in order to obtain stable estimates. Sensitivity analyses with alternative priors indicate that posterior estimates of background concentrations and fugitive emission rates are relatively insensitive to the assumed priors, although estimated stack emission rates can vary with alternative priors, especially for the New York facility, where the stack emission rate is highly uncertain and poorly resolved by the model. The fugitive lead emission rates estimated for the sites are comparable to, or in some cases (especially Texas and New York) likely larger than the stack emissions that are determined for these facilities. An aggregate predictive distribution is derived for the average fugitive lead emission rate from secondary lead smelting facilities, with a median value of 9.2 × 10-7 g Pb/m2/sec, and a 90% credible interval from 2.1 × 10-7-5.3 × 10-6 g Pb/m2/sec. This wide range reflects both the variation in fugitive lead emissions * Corresponding author phone: (412)268-8782; fax:(412) 268-7813; e-mail:
[email protected]. 10.1021/es035465e CCC: $30.25 Published on Web 05/20/2005
2005 American Chemical Society
from site to site and the high degree of uncertainty resulting from an estimate based on only a very small sample of sites. As such, the primary contribution of this study is methodological, demonstrating how information from multiple sites can be combined and considered simultaneously for the estimation of fugitive emission rates, but recognizing that additional sites must be included to obtain a more precise characterization.
Introduction It is very difficult to characterize uncontrolled, or “fugitive”, emission rates from industrial sites. This characterization becomes increasingly important as significant reductions from controlled stack emissions are achieved at many facilities. The US clean air act amendments (CAAA) of 1990 target reductions of process/stack emissions by mandating the use of maximum achievable control technology (MACT) (1). However, Section 112(f) of the CAAA also directs the EPA to determine the residual risk to public health and the environment following implementation of the MACT standards. These residual risks are expected to be of particular concern for complex industrial sites where fugitive emissions are prevalent, such as petroleum refineries, chemical plants, gasoline distribution operations, and secondary metals recovery facilities. In a first residual risk report to congress (2), the EPA delineates its plan for characterizing and addressing post-MACT residual risks, recognizing that the emissions, ambient concentrations, exposures, and risks associated with air toxics are highly variable from site to site and over time and that the models and data needed to estimate these impacts are quite limited. The research presented in this paper is motivated by the need to estimate the residual risks from emissions from secondary lead smelters following the implementation of MACT standards. These facilities include various operations needed to crack, drain, and isolate lead from automotive batteries and other scrap products, followed by smelting, refining, and alloying of the recovered lead. Many of these facilities were managed in the past without the necessary procedures to ensure full containment and capture of the recycled lead. Potential sources of release to the surrounding environment include leakage from the smelter; contaminated soils and dust that become airborne during dry, windy periods; foot and vehicular traffic; and erosion and runoff during storms. As a result, many historic secondary lead recovery facilities are currently Superfund sites (3, 4). In recent years, secondary lead recovery has been consolidated into fewer, larger operations that are easier to manage and regulate but could still provide a potentially large source of lead to the surrounding environment. The MACT standards for secondary lead smelters were promulgated in June 1995 (5) and amended in June 1997 (6). The rules include requirements for smelting furnaces, kettles, dryers, and some fugitivesource generating activities, such as material handling. The rules are projected to yield a ∼70% reduction in total known toxic emissions from the 23 secondary lead smelters in the United States (7). However, this estimate is highly uncertain because uncontrolled fugitive emissions, especially from residual windblown dust and soil, could result in continuing emissions of unknown magnitude. Methods for estimating fugitive emissions from industrial facilities include (8, 9) the following: (i) direct measurement at the point of release; (ii) the application of mass balance calculations to the industrial process; (iii) emission factors VOL. 39, NO. 13, 2005 / ENVIRONMENTAL SCIENCE & TECHNOLOGY
9
4929
based on throughput or on-site activity; and (iv) back inference from observed ambient air concentrations or those measured in the surrounding soil or biota. Direct measurements of emissions are often precluded by the spatially dispersed and temporally intermittent nature of the source. Mass balance methods are ineffective for determining most industrial fugitive emissions because these emissions are (hopefully) of too small a magnitude to be resolved, given the estimation uncertainties for the much larger input and output flows that must be subtracted from each other. Emission factor models have been developed for a number of processes and facility types (10-13) and may ultimately be sought for general application to operations such as secondary metals recovery, but are likely to be subject to significant site-to-site variability and error, and still require good local estimates for their initial development and testing. For this, approach iv is often used: back inference from measurements of nearby ambient concentrations, analyzed in the context of a source-receptor pollutant transport model for the site. This approach is adopted in this study. Back inference of air pollution emission rates from ambient concentration measurements is central to the micrometeorological method for estimating absorption or deposition rates to, and volatilization or resuspension rates from, water or land surfaces and the atmosphere (14-18) and has been used at various scales to back calculate emission rates from a number of difficult-to-characterize sources (19-21). The estimation procedure is complicated when the facility emissions “signal” must be separated from the “noise” of other “background” sources of the contaminant. This is especially the case when estimating fugitive emissions of lead because ambient levels are also affected by contributions from roadside urban soils contaminated by historic leaded gasoline emissions, the weathering of leaded paint, and other residential or industrial sources (4). In this study, fugitive lead emissions from three large secondary lead recovery and smelting facilities are estimated from a statistical analysis of ambient lead concentration data at nearby air monitoring stations. The ambient measurements are combined with source-receptor estimates from each site to its monitors, as determined by an atmospheric dispersion model. The model is used to compute separate sourcereceptor transfer factors for stack and fugitive emissions at each site, and these factors serve as the explanatory variables for the statistical models. The estimated coefficients of these variables correspond to the unknown emission rates. Alternative models are considered and compared: (i) assuming that the stack emission rates are unknown (or, as described below, characterized with a high degree of prior uncertainty) and estimated along with the fugitive emission rates; and (ii) assuming that the stack emissions are known (based on reported values) so that only the fugitive emission rates are estimated by the model. The fitted intercepts of the statistical models correspond to the estimated ambient background concentrations for the sites. A Bayesian hierarchical approach is used to characterize the variability in the estimates between the sites, as well as the overall uncertainty of the model parameters. In the hierarchical model, the fugitive emission rates of lead at multiple sites are assumed to be sampled from an aggregate distribution characterizing such sites. This aggregate distribution provides a first, preliminary estimate for the national distribution of emission rates from secondary lead facilities. Likewise, background ambient lead concentrations at monitoring locations are related hierarchically both within a site (locally) and between sites (i.e., nationally). Inferences are made simultaneously for the aggregate distributions and the site-specific values of the emission rates and background concentrations. Bayesian estimation methods are especially well suited for fitting hierarchical models (22-25), and these 4930
9
ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 39, NO. 13, 2005
have been applied to a variety of environmental problems (26-30). The Bayesian method allows consideration of exogenous prior information for the model parameters from previous studies or related data, to help better inform or constrain the estimates. The paper begins with a review of the available site data and the atmospheric transport model applied to each. Alternative statistical models are then presented and the Bayesian method for fitting the models is described. The results for each model are presented, and inferences are drawn concerning the ambient background concentrations and fugitive emission rates. The sensitivity of these estimates to alternative prior distributions for key model parameters is then explored. Three sites are considered in this study: one each from Florida, Texas and New York. Although a considerable amount of data and a substantial modeling effort is needed for each, the small number of studied sites does limit the strength of the inference that can be drawn concerning the aggregate distribution of similar sites. However, the availability of a limited number of studies is not unusual when addressing exposure and risk assessment problems with both national and site-specific features, and this limitation is reflected in the high residual uncertainty of the fitted distributions. In this sense, the primary contribution of this study is methodological, demonstrating how information from multiple sites can be combined and considered simultaneously for the estimation of fugitive emission rates, but recognizing that additional sites must be included to obtain a more precise characterization.
Methods Site and Data Descriptions. For the Florida and Texas facilities considered in this study, data collected over two years (1999-2000) at three ambient air monitors are used to fit the models, whereas for the New York facility, data collected over two years (1994-95) at three monitors and at a fourth monitor for part of 1995 are used for model fitting. Because a single hierarchical model is used to model all three sites, the national distributions of the background ambient lead concentration and fugitive emission rates from sites of this type are implicitly assumed to be unchanged and stationary over this six-year period (1994-2000). The monitors sampled total suspended particulate matter (TSP), defined as particles with diameters of 30 microns and below. The amount of lead was estimated from the TSP samples using standard EPA analytical methods used for the national ambient air quality standards (NAAQS) (31, 32). All of the monitoring data used in this study were obtained from the EPA. The ambient concentration data were generally reported as daily averages every 6 days, yielding a total of 314 observations for Florida, 328 for Texas, and 490 for New York. At the Florida facility, a significant fraction of the ambient lead measurements were reported as being below the detection limit (BDL) of 0.01 µg/m3. These BDL observations were assigned a fixed value of
( )
0.007 µg/m3 )
DL (33, 34) x2
As shown in Appendix 5 in the Supporting Information, the value assigned to the BDL measurements had very little influence on the estimated aggregate distribution of the fugitive lead emission rates. Preliminary, very uncertain estimates of the total stack emission rates of lead are available at each facility for the corresponding monitoring periods: Florida, 6.9 × 10-3 g/sec; Texas, 3.3 × 10-2 g/sec; and New York, 1.1 × 10-2 g/sec. Fugitive emission rates are estimated on a per-unit-area basis, and the total fugitive emission rates are then estimated for
each site based on their areas: Florida, 20 800 m2; Texas, 227 500 m2; and New York, 41 300 m2. New York is a closed facility with a negative pressure building to enclose the lead processing operations, whereas the Texas and Florida facilities are unenclosed. Further details on the facilities and sites are provided in Appendix 1, included in the Supporting Information. Atmospheric Dispersion Model. The effects of stack and fugitive emissions from the facilities on ambient concentrations at the respective monitoring locations are modeled using the U.S EPA’s industrial source complex short term (ISCST3) atmospheric dispersion model (35). The ISCST3 model determines downwind concentrations based on a Gaussian plume calculation, using hourly meteorological records obtained from nearby weather stations. The model assumes a linear relationship between emission rates and ambient concentrations so that superposition can be employed for the effects from different sources. The model was implemented using the EPA default model option with wet and dry plume depletion selected. An urban land use classification was used for Florida, whereas rural was selected for New York and Texas. Terrain elevations and a gridded terrain file were also used for New York and Texas. Further details on the parametrization and implementation of the ISCST3 model runs for each site are provided in Appendix 2 of the Supporting Information. The ISCST3 model was implemented using unit emission rates, yielding unit source-receptor transfer coefficients from each source to each receptor monitor for each hour simulated by the model. The coefficients for each monitor are averaged for the days on which observed data are available and then averaged again for the selected averaging time for the model application. It is important to note that the regression model determines a single average emission rate for the period to which it is applied. Because the observed ambient concentration data are daily averages, a daily period is the minimum averaging time that can be used for the associated sourcereceptor transfer coefficients. However, regression estimates from the daily ambient concentration source-receptor coefficient pairs exhibited a high degree of variation. This occurs in part because the fugitive emission rates exhibit a high degree of variability from day to day. The variation in the estimate is reduced by averaging over a few days (as occurs when a monthly averaging period is used with data collected every 6 days). As such, monthly averages were used and constructed by matching the 24-hour results from the model with the 24-hour observed results during each month at each monitor. Only those days with a one-to-one correspondence were included in the averaging. The averages of the observed ambient lead concentrations are computed by assuming that individual daily measurements (typically n ≈ 4 per month) are log-normally distributed. The mean and variance of the ln concentrations are first computed as
ln mean ) ξ ) ln(C) )
ln variance ) φ2 )
1
1
n
∑ln(C ) i
n i)1
n
∑(ln(C ) - ξ) n-1 i
2
(1)
i)1
The mean value of the observed concentration for the month is then computed as (36)
1 Co ) C h ) exp ξ + φ 2 2
(
)
(2)
Statistical Models. For a given site and monitor, define Xs [M L-3 / M T-1]and Xf [M L-3 / M L-2 T-1] as the time-
averaged source-monitor transfer coefficients for the stack S [M T-1] and fugitive F [M L-2 T-1] emission rates, respectively. The standard units for the stack emission rate are g/sec, whereas the fugitive emission is characterized as an area source with units of g/m2/sec. The ambient concentration is reported in µg/m3. For known stack and fugitive emission rates, the estimated concentration at the monitor is given by
C ) b + FXf + SXs
(3)
where b is the background concentration at the monitor. When both F and S are unknown, eq 3 forms the basis of a multiple linear regression model, in which b, F, and S are estimated by regressing the observed C values versus the values of Xf and Xs averaged for the same period, from the atmospheric dispersion model. When S is assumed known, a simple linear regression model (with only one explanatory variable) is formulated by regressing the variable Diff ) C - SXs versus Xf. Because concentrations are constrained to be positive and the measurement and modeling errors in concentration are often best described by a log-normal distribution (36, 37), alternative log models are formulated by transforming eq 3 as follows:
ln(C) ) ln(b + FXf + SXs)
(4)
Although eq 4 maintains the linear relationship between C, F, and S (consistent with the underlying physical model and the ISCST calculation), the fitting procedure requires nonlinear regression. For the case where S is assumed to be known, a simple (one-variable) log regression model is formulated by fitting b and F for the equation, ln(Diff) ) ln(b + FXf). There are multiple stacks at each of the three sites, each with an emission rate that has been estimated prior to this study. The sum of the emissions from all of the stacks at a facility is used as the known value of S for the case with known stack emissions. A separate value of Xs is determined using the atmospheric dispersion model for each stack at a site. Then a single Xs value, representative for the entire site, is computed for each period of interest (i.e., for the days on which the ambient lead concentration is measured and used to calculate the monthly average), as a weighted average of the Xs values of individual stacks at the site, where the weighting factor is the emission rate from each stack. The use of a single aggregate stack emission for the site, estimated using the weighted average transfer coefficient for the multiple stacks that are present, is necessary because the ambient data do not have sufficient spatial coverage and detail to resolve the individual stack emission rates. Implicit in our formulation is the assumption that the sources of lead other than stack and fugitive emissions from the facility affect the observed concentrations at the monitors in a manner independent of that by the facility, that is, they vary independently of Xf and Xs. When this is not the case, for example, when a major highway is located between the plant and a monitor so that higher contributions from the highway occur on the same days as higher contributions from the plant, then this source will be confounded with S and F, leading to erroneously high estimates for either or both variables. Similarly, if contributions from a background source are negatively correlated with Xf and/or Xs (e.g., the source is located downwind of the monitors relative to the plant site), then this will cause a negative bias in the estimates of F and/or S. Our knowledge of other sources in the study areas is not sufficient to ascertain whether and how much these biases might be present. However, we note this possible VOL. 39, NO. 13, 2005 / ENVIRONMENTAL SCIENCE & TECHNOLOGY
9
4931
source of error as contributing to the overall uncertainty of the estimates. The models in eqs 3 and 4, and the corresponding singlevariable models (for the case where S is known), can be fit separately for each of the sites with data for each of the site monitors, in a nonhierarchical manner. Such models were implemented as described in Appendix 3 of the Supporting Information and are used as a benchmark for comparison of the results of the hierarchical model, now described. Bayesian Hierarchical Model. Hierarchical models are used to allow derivation of an aggregate distribution of site fugitive lead emission rates and background ambient concentrations and to allow information from one site to help inform estimates at others. The models estimate the background concentrations and fugitive emission rates at each site but assume that these rates are sampled from a “parent” aggregate (e.g., national) distribution. The methodology for the estimation of parameters of the local and aggregate distributions is illustrated by using data from the three sites. The estimated parameters of the aggregate distribution can subsequently serve as a prior model for new sites. Furthermore, new data from the three sites or other secondary lead smelting facilities can be used to derive updated estimates for both the local and the aggregate distributions. The logarithmic form of the model (eq 4) was found to provide superior prediction when used for nonhierarchical estimation, in part because it can be structured to ensure that predicted concentrations and estimated emission rates are nonnegative (see Appendix 3). As such, the logarithmic models were chosen for use in the hierarchical implementation. Separate models were fitted for the cases in which S is assumed known (simple log model) versus unknown (multiple log model). The equations for the hierarchical models for these cases are as follows
ln(Diffi,j,k) ) ln(bi,j + Fi Xfi,j,k) + i (simple log)
(5)
ln(Ci,j,k) ) ln(bi,j + Fi Xfi,j,k + Si Xsi,j,k) + i (multiple log) (6) where Ci,j,k is the average ambient lead concentration at site i, monitor j, and for time period k; bi,j is the average background ambient lead concentration at site i and monitor j (assumed to be time invariant); Si is the average stack lead emission rate at site i (assumed to be time invariant); Xsi,j,k is the average stack-monitor transfer coefficient computed by the atmospheric dispersion model for site i, monitor j, and time period k; Fi is the long-term average fugitive lead emission rate at site i (assumed to be time invariant); Xfi,j,k is the average fugitive emission-monitor transfer coefficient computed by the atmospheric dispersion model for site i, monitor j, and time period k; Diffi,j,k ) Ci,j,k - Si Xsi,j,k (used in the simple log model where the stack emissions, Si, are assumed to be known); and ei is the error term for the regression model. The above expressions can also be written as
Diffi,j,k ∼ log-normal[ln(bi,j + Fi Xfi,j,k),σi] (simple log) (7) Ci,j,k ∼ log-normal[ln(bi,j + Fi Xfi,j,k + Si Xsi,j,k),σi] (multiple log) (8) which denotes that Diffi,j,k and Ci,j,k are log-normally distributed with the indicated parameters, corresponding to the means and standard deviations of their respective ln values. As indicated in eqs 5-8, the background concentrations can vary from monitor to monitor at a given site and between sites, whereas stack and fugitive emission rates and error 4932
9
ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 39, NO. 13, 2005
variances differ only between sites. The full hierarchical model thus includes the following underlying distributions and hyperparameters: (i) A log-normal distribution for the background concentrations across the multiple monitors at a site
bi,j ∼ log-normal[ln(bi), ln(σbi)]
(9)
where ln(bi) and ln(σbi) are the two parameters of the lognormal distribution of the background concentration at site i, corresponding to the mean of the ln concentration and the standard deviation of the ln concentration across monitors. Note that bi is the geometric mean and σbi is the geometric standard deviation of the background concentration for the monitors at site i. As indicated in the Supporting Information, Appendix 4, ln(σbi) is specified by its related precision parameter, τbi ) 1/[ln(σbi)]2. (ii) A log-normal distribution for the ln mean background concentrations across the multiple sites studied
bi ∼ log-normal[ln(b), ln(σb)]
(10)
where ln(b) and ln(σb) are the mean and standard deviation, respectively, of the ln of site geometric mean background concentrations across the multiple sites. (iii) A log-normal distribution for the fugitive emission rates across the multiple sites in the study
Fi ∼ log-normal[ln(F), ln(σF)]
(11)
where ln(F) and ln(σF) are the two parameters of the aggregate log-normal distribution. This model lumps all of the facilities into a single aggregate distribution. An alternative model was also evaluated assuming different distributions for the enclosed (New York) and unenclosed (Texas and Florida) facilities. Further details of the implementation and results of this model are provided in Appendix 6 of the Supporting Information. (iv) Separate values of the total stack emission rate for each site, Si (either assigned in the simple log model or estimated from the data in the multiple log model). These values are not sampled from an aggregate distribution of stack emission rates. (v) Separate values of the model error variance for each site, σi2. Texas and Florida have three monitors each, whereas the New York facility has four monitors. The total number of parameters to be estimated for the model is thus given by the following: (a) (3 + 3 + 4) ) 10 parameter values for the background lead concentration at each monitor, bi,j (b) 6 hyperparameters for the background lead distribution at each site; that is, bi and σbi for each of the three sites; (c) 2 hyperparameters for the multisite background lead distribution, b and σb (d) 3 parameter values for the fugitive emission rates at each site, Fi (e) 2 hyperparameters for the aggregate distribution of fugitive emission rates, F and σF (f) 3 parameter values for the stack emission rates at each site, Si (multiple log model only) (g) 3 parameters values for the error distribution at each site, σi Therefore, 29 parameters are estimated for the multiple log model and 26 parameters for the simple log model. The distribution of F and the particular values of Fi are a key focus of this study. Note that even if the parameters of the aggregate distribution, ln(F) and ln(σF), were to be known with certainty, the value of Fi at a new, randomly selected site would be unknown, because of inherent site-
to-site variability. This variability is compounded by the uncertainty that is present for the values of ln(F) and ln(σF), which implies that the distribution from which each Fi value is sampled is itself uncertain. The overall distribution of Fi that combines the site-to-site variability, and the uncertainty in the underlying distribution that characterizes this variability, is referred to as the predictive distribution for Fi. This predictive distribution is derived and generated as the principal output of this study. Selection of Prior Distributions. Noninformative, highly diffuse prior distributions were first considered for all of the parameters (as implemented for the nonhierarchical models described in Appendix 3) in order to estimate the parameters of the hierarchical models. However, because the hierarchical structure adds a number of new parameters (those for the aggregate distributions of background concentration and fugitive emission rates, and for the site distributions of background concentration), stable results could not be obtained. There are simply too many parameters with too little data for the estimation procedure to succeed unless some of the parameters are constrained (even if only somewhat) with some prior information. This is possible given the available data on observed ambient lead concentrations in the United States (38), which are used to identify a prior distribution for background ambient lead concentrations, and known ranges for fugitive dust emission rates (39) and associated lead concentrations likely to occur for soil and dust particles emitted from secondary lead facilities, which are used to identify a prior distribution for fugitive lead emission rates. The rationale for these selections and the particular prior distributions chosen are summarized in Appendix 4 of the Supporting Information. Although the selected priors are based on some knowledge, they are intentionally chosen to be very diffuse, so as to minimize their affect on the posterior parameter estimates. Appendix 4 includes a sensitivity analysis that demonstrates that the key results of this study are only moderately sensitive to the prior assumptions; these results are discussed further after presentation of the main, base-case results. Parameter Estimation using Markov Chain Monte Carlo. Markov Chain Monte Carlo (MCMC) is a simulation procedure that provides a very flexible methodology for statistical modeling (25, 40-42). It involves Monte Carlo integration through the generation of a sequence of dependent (Markov chain) samples. The sampled points characterize the joint posterior distribution of the uncertain model parameters, and these points can be analyzed to obtain point estimates (e.g., the mean, median, or mode of the parameter distribution), to characterize the uncertainty in the estimated parameters (e.g., using the standard deviation of the distribution, or X% “credible intervals”), and to describe the correlation structure of the parameter uncertainty (using scatter plots of the simulated sample for multiple parameters or by computing their correlation coefficients). The MCMC estimation is implemented in this study using the BUGS (Bayesian inference using Gibbs sampling) software system (43), with the Metropolis-Hastings sampling method. The MCMC method requires initial values for the simulated sequence of parameter values. All of the logarithmic mean parameters of the fugitive emissions and stack emissions parameters were initialized at 1, whereas the background concentrations were initialized at 0.1. All of the logarithmic standard deviations were initialized as 10. The sequence of the sampled variables indicated 1000 to be a sufficient burn-in sample size. The autocorrelation value (up to a sample-number lag of 40) was less than 5% for all of the sampled variables for 10 000 samples, indicating that discarding the initial 1000 simulations was more than sufficient to ensure that the initial values selected for the parameter vectors would not affect the results.
Results The posterior median, 5th and 95th percentile values of the estimated parameters for the hierarchical models are presented in Table 1 and compared to the corresponding results obtained using the nonhierarchical model. Figure 1 provides a graphical comparison of the results. In these and all of the subsequent presentations, the aggregate distribution, from which each of the three individual site distributions is assumed to be sampled, is referred to as the “national” distribution. As indicated, the estimated background concentrations for TX and FL facilities are higher in the hierarchical cases, whereas those for NY are the same. The estimated distributions of background ambient lead concentrations are indicated to range from approximately 0.02 to 0.2 µg/m3, similar to values observed in the US for the period 1994-1999 (see Appendix 4 in the Supporting Information and ref 36). The estimated fugitive emissions are lower in the hierarchical cases as compared to the nonhierarchical estimates. The estimated stack emissions are similar for the TX and NY facilities, whereas the variance of stack emission estimates for TX and FL are much lower in the hierarchical cases. In addition, as shown in Table 1, the error term variance is in general lower for the hierarchical models Comparing the simple log (assumed S) and multiple log (estimated S) hierarchical models, the national fugitive emission estimates are higher for the simple log models than for the multiple log models. This is because the simple log model includes fixed stack emission rates that are are generally lower than those estimated by the multiple log model. In particular, the fugitive emission rates estimated for FL and TX using the multiple log hierarchical model are somewhat lower than those estimated using the simple log hierarchical model because the determination of higher stack emission rates with the former model is compensated for by correspondingly lower estimated fugitive emission rates. Another key output of the analysis is the geometric standard deviation of the site fugitive lead emission rates, σF. This value, shown in the bottom row of Table 1, determines the site-to-site variability in Fi. It is the joint distribution of F and σF (or, equivalently, ln(F) and ln(σF)) that determines the uncertainty in the national distribution of Fi. Figure 2 depicts the MCMC scatter plot of ln(F) and ln(σF) for the multiple log model results for the national distribution. As indicated, both parameters of the national distribution extend over a wide range. Furthermore, some degree of (positive) correlation is present in their joint distribution. In the Implications and Discussion section, values of Fi are sampled from log-normal distributions with each of these ln(F) and ln(σF) parameter pairs to simulate the national predictive distribution of Fi. The credible intervals of the estimated stack and fugitive emissions for all three sites using the hierarchical multiple log model are plotted and compared in Figure 3. It is clear that although significant uncertainty is present in many of the estimates, fugitive emissions are estimated to be comparable to or in some cases (especially TX and NY) likely larger than the stack emissions that remain following the implementation of MACT at these facilities. Sensitivity Analysis. The hierarchical multiple log model was used as the base case to explore the sensitivity of the results to the prior distributions selected for background ambient lead concentrations and fugitive and stack emission rates. The full results of the sensitivity analysis are presented in Appendix 4 in the Supporting Information. There it is noted that major changes in both the location and the spread of the prior national distribution for median background ambient lead concentrations had very little affect on the posterior estimates of the national and site background VOL. 39, NO. 13, 2005 / ENVIRONMENTAL SCIENCE & TECHNOLOGY
9
4933
TABLE 1. Comparison of Background Concentrations, Fugitive Emissions, and Stack Emissions Obtained from a Nonhierarchical Approach with Those Estimated from the Hierarchical Approach nonhierarchical
hierarchical
multiple log
simple log
5% Florida
Texas
New York
national
a
background concentration bi (µg/m3) fugitive emissions Fi (g/m2/s) stack emissions Si (g/s) error standard deviation σi (µg/m3) background concentration bi (µg/m3) fugitive emissions Fi (g/m2/s) stack emissions Si (g/s) error standard deviation σi (µg/m3) background concentration bi (µg/m3) fugitive emissions Fi (g/m2/s) stack emissions Si (g/s) error standard deviation σi (µg/m3) background concentration bi (µg/m3) fugitive emissions GM, F (g/m2/s) fugitive emissions GSD, σF (g/m2/s)
median
95%
5%
median
multiple log 95%
5%
median
95%
0.01
0.02
0.03
0.03
0.07
0.18
0.03
0.08
0.20
3.2E-05
4.4E-05
5.7E-05
9.4E-06
3.1E-05
5.1E-05
2.5E-07
1.0E-06
2.5E-05
5.6E-11
1.4E-06
3.6E-02
2.6E-02
1.4E-01
2.4E-01
0.77
0.90
1.05
0.75
0.87
1.02
0.73
0.85
1.01
5.3E-04
0.03
0.07
0.03
0.07
0.17
0.03
0.07
0.18
1.4E-06
2.0E-06
2.7E-06
9.2E-07
1.4E-06
2.0E-06
5.1E-07
9.4E-07
1.4E-06
1.0E-08
8.4E-02
1.7E-01
4.3E-02
8.6E-02
1.3E-01
0.60
0.69
0.81
0.56
0.65
0.76
0.48
0.55
0.64
0.04
0.05
0.06
0.03
0.05
0.08
0.03
0.05
0.09
2.0E-14
1.1E-06
3.1E-06
6.1E-08
8.8E-07
2.3E-06
1.8E-07
8.0E-07
1.9E-06
3.0E-03
1.0E-02
2.7E-02
2.5E-04
8.4E-03
3.0E-02
0.59
0.66
0.74
0.56
0.62
0.69
0.51
0.57
0.64
NA
NA
NA
0.02
0.06
0.15
0.02
0.06
0.16
NA
NA
NA
5.1E-07
3.3E-06
1.6E-05
3.3E-07
9.4E-07
3.1E-06
NA
NA
NA
3.8E-07
8.1E-07
9.6E-07
5.9E-10
8.4E-08
8.5E-07
(6.9E-3)a
(3.3E-2)a
(1.1E-2)a
Not estimated by the model. The assumed fixed value is shown in the parentheses.
concentrations, the estimated fugitive emission rates, or the estimated stack emission rates. Similarly, major changes in the location and spread of the prior distribution for the median of the national fugitive emission rate yielded only small changes in estimated background concentrations, fugitive emission rates, or stack emission rates (all changes less than or equal to 5%, as compared to the base case). However, major changes in the prior distribution for the unknown stack emission rates did cause moderate changes in the estimated background concentrations and fugitive emission rates (changes of up to 18%, as compared to the base case) and significant changes in the estimated stack emission rate for the NY facility, which is very difficult to specify with the current model. Given the order-of-magnitude spread in the estimated national fugitive lead emission rate distribution, a further uncertainty of 18% due to the use of alternative prior distributions for the unknown stack emission rates is deemed to be relatively small.
Implications and Discussion National Predictive Distribution of Fugitive Emissions. Now that MCMC estimates for the parameters of the Bayesian hierarchical models have been determined, and they have been shown to be reasonably robust to the selection of prior distributions (which are in any case based on a best 4934
9
ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 39, NO. 13, 2005
assessment of information currently available), the national predictive distribution of Fi can be simulated. The predictive distribution is generated in the following manner: (i) The MCMC generated joint distribution of ln(F) and ln(σF) is used to characterize the uncertainty in the national distribution (see Figure 2). (ii) A value of Fi is sampled from a log-normal distribution with each of these parameter values to reflect the site-to-site variability. This yields a predictive distribution for the next randomly sampled value Fi that considers both uncertainty in ln(F) and ln(σF) and variability in Fi. The results for the multiple log model are shown in Figure 4. As indicated, the median estimate is 9.2 × 10-7 g Pb/m2/sec, but there is an approximately two-order-of-magnitude uncertainty in the fugitive lead emission rate (with a 90% credible interval ranging from 2.1 × 10-7 to 5.3 × 10-6 g/m2/sec) estimated for a new (unsampled) site in the US. This high level of uncertainty is due to both the significant variation that is estimated between the sites as well as the small number of sites used to estimate the aggregate national distribution. The adoption of the hierarchical modeling approach is shown in this study to yield a number of advantages. Although the results from the hierarchical models are similar to those
FIGURE 1. Graphical comparison of parameters predicted by hierarchical and nonhierarchical models. The bars indicate the 90% credible intervals. VOL. 39, NO. 13, 2005 / ENVIRONMENTAL SCIENCE & TECHNOLOGY
9
4935
ambient lead at the TX site, the fugitive emission rate at the NY facility, and the stack emission rates at the FL and TX facilities. These inferences could not be made without the informed hierarchical structure. Finally, the hierarchical formulation provides a first estimate for the aggregate national distribution of fugitive emission rates from secondary lead facilities, highlighting the large uncertainty in the estimate and the need to incorporate models at additional sites to better resolve the possible range of current fugitive emissions and, subsequently, their associated residual risk.
Acknowledgments
FIGURE 2. Scatter plot of national fugitive lead emission distribution parameters (multiple log hierarchical model). Here F has units of 10-6 g/m2/s.
Support for this research was provided by the U.S. Environmental Protection Agency, under EPA contract 68-D60008, work assignment no. IV-09. This paper has not been subject to formal EPA peer review, and no official endorsement should be inferred. Additional support for Amit Goyal and Mitchell Small was providedby the Vira Heinz Endowment through the H. John Heinz III Professorship in Environmental Engineering at Carnegie Mellon University, and the National Science Foundation (NSF) program in Materials Use: Science, Engineering, and Society (MUSES), grant no. 0328870. The findings of this paper represent the views of the authors and not those of the NSF. The comments of an anonymous peer reviewer led to substantial improvements in the quality and clarity of the manuscript.
Supporting Information Available
FIGURE 3. Stack and fugitive secondary lead emissions estimated by the multiple log hierarchical model. Median estimate shown for each; the bars indicate 90% credible intervals.
Detailed description of the secondary lead smelting sites and an overview of the ISCST3 model implementation; a comparison of results obtained by classical and Bayesian nonhierarchical models; assumed prior distributions for model parameters in the base case, as well as the sensitivity of the final results of the hierarchical Bayesian model to changes in these assumed prior distributions; the sensitivity of results to a change in the assigned value of BDL observations; an analysis of another hierarchical model with separate national fugitive emissions distributions for “unenclosed” (Texas and Florida) and “enclosed” (New York) facilities, these results are compared to those obtained using a single national distribution, as in the main text. This material is available free of charge via the Internet at http://pubs.acs.org.
Literature Cited
FIGURE 4. CDF plot for the predictive distribution of national fugitive lead emissions (multiple log hierarchical model). of the nonhierarchical models for a number of model parameters, somewhat higher background concentrations and lower fugitive emission rates are obtained using the hierarchical models, along with somewhat higher stack emission rate estimates (using the multiple log model). In addition, the hierarchical modeling approach allows for reasonably constrained estimates for a number of the model parameters that cannot be obtained using the single-site models, in particular, for the background concentrations of 4936
9
ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 39, NO. 13, 2005
(1) U. S. EPA, Clean Air Act, Full Text of 1990 Amendments, http://www.epa.gov/oar/caa/caaa.txt, 2002 (November), (accessed July 17, 2003). (2) U. S. EPA, Residual Risk Report to Congress, EPA-453/R-99001, US EPA Office of Air Quality Planning and Standards, Research Triangle Park, NC, http://www.epa.gov/ttn/oarpg/ t3/reports/risk_rep.pdf, 1999 (March), (accessed July 11, 2003). (3) Royer, M.; Selvakumar, A.; Gaire, R. Control technologies for remediation of contaminated soil and waste deposits at Superfund lead battery recycling sites. J. Air Waste Manage. Assoc. 1992, 42, 970-980. (4) Small, M. J.; Nunn, A. B.; Forslund, B. L.; Daily, D. A. Source attribution of elevated residential soil lead near a battery recycling site. Environ. Sci. Technol. 1995, 24, 883-895. (5) U.S. EPA, Federal Register notices for hazardous air pollutants (Title I, Section 112) - Proposed and final preambles and rules, secondary lead smelting NESHAP - Final rule. Fed. Regist. 1995, 60. (6) U.S. EPA, Federal Register notices for hazardous air pollutants (Title I, Section 112) - Proposed and final preambles and rules, 112(c)(6) Federal Register Notice. Fed. Regist. 1997, 62. (7) U. S. EPA, Taking Toxics Out of the Air, EPA-452/K-00-002, US EPA Office of Air Quality Planning and Standards: Research Triangle Park, NC, http://www.epa.gov/oar/oaqps/takingtoxics/ airtox.pdf, 2000 (August), (accessed July 11, 2003). (8) Heinsohn, R. J.; Kabel, R. L. Sources and Control of Air Pollution. Prentice Hall: Upper Saddle River, NJ, 1999. (9) Frey, H. C.; Small, M. J. Integrated environmental assessment, Part I: Estimating emissions. J. Ind. Ecol. 2003, 7(1), 9-11.
(10) U. S. EPA, User’s Guide: Emission Control Technologies and Emission Factors for Unpaved Road Fugitive Emissions, EPA/625/5-87/022, U. S. EPA: Washington, DC, 1987. (11) Kim, D. Y.; Kim, J. W. Development of a speciated, hourly and gridded air pollutants emission modeling system - A case study on the precursors of photochemical smog in the Seoul metropolitan area, Korea. J. Air Waste Manage. Assoc. 2000, 50, 340347. (12) Lee, C.-H.; Tang, L.-W. Chang, C. T.; Modeling of fugitive dust emission for construction sand and gravel processing plant. Environ. Sci. Technol. 2001, 35, 2073-2077. (13) Massacci, G.; Ricchi, F.; Soddu, A. P.; Usala, S. Proceedings of the First International Conference on Environmental Research and Assessment. Bucharest, Romania, March 23-27, 2003, http://www.portiledefier.ro/icerA3-2003/039.pdf (accessed July 11, 2003). (14) Poissant, L.; Amyot, M.; Pilote, M.; Lean, D. Mercury water-air exchange over the upper St. Lawrence River and Lake Ontario. Environ. Sci. Technol. 2000, 34, 3069-3078. (15) Zhu, T.; Pattey, E.; Desjardins, R. L. Relaxed eddy-accumulation technique for measuring ammonia volatilization. Environ. Sci. Technol. 2000, 34, 199-203. (16) Tsai, P.; Yee, D.; Bamford, H. A.; Baker, J. E.; Hoenicke, R. Atmospheric concentrations and fluxes of organic compounds in the northern San Francisco Estuary. Environ. Sci. Technol. 2002, 36, 4741-4747. (17) Vette, A. F.; Landis, M. S.; Keeler, G. J. Deposition and emission of gaseous Mercury to and from Lake Michigan during the Lake Michigan Mass Balance Study (July, 1994-October, 1995). Environ. Sci. Technol. 2002, 36, 4525-4532. (18) O’Driscoll, N. J.; Beauchamp, S.; Siciliano, S. D.; Rencz, A. N.; Lean, D. R. S. Continuous analysis of dissolved gaseous mercury (DGM) and mercury flux in two freshwater lakes in Kejimkujik Park, Nova Scotia: Evaluating mercury flux models with quantitative data. Environ. Sci. Technol. 2003, 37, 2226-2235. (19) Hashmonay, R. A.; Yost, M. G. Localizing gaseous fugitive emission sources by combining real time optical remote sensing and wind data. J. Air Waste Manage. Assoc. 1999, 49, 174-185. (20) Kaharabata, S. K.; Schuepp, P. H.; Desjardins, R. L. Estimating methane emission from dairy cattle housed in a barn and feedlot using an atmospheric tracer. Environ. Sci. Technol. 2000, 34, 3296-3302. (21) Nemitz, E.; Hargreaves, K. J.; McDonald, A. G.; Dorsey, J. R.; Fowler, D. Micrometeorological measurements of the urban heat budget and CO2 emissions on a city scale. Environ. Sci. Technol. 2002, 36, 3139-3146. (22) Berliner, L. M. Hierarchical Bayesian modeling in the environmental sciences. J. Ger. Stat. Soc. 2000, 84, 141-153. (23) Carlin, B.; Louis, T. Bayes and Empirical Bayes Methods for Data Analysis; Chapman and Hall/CRC Press: Boca Raton, FL, 2000. (24) Ellison, A. M. An introduction to Bayesian inference for ecological research and environmental decision-making. Ecol. Appl. 1996, 6, 1036-1046. (25) Gelman, A.; Carlin, J. B.; Stern, H. S.; Rubin, D. B. Bayesian Data Analysis; Chapman & Hall: London, 1995. (26) Dominici, F.; Daniels, M.; Zeger, S. L.; Samet, J. M. National models for estimating the effect of particulate matter on mortality in U. S. cities. J. Am. Stat. Assoc. 2002, 97, 100-111.
(27) Dominici, F.; Zeger, S. L.; Samet, J. M. Combining evidence on air pollution and daily mortality from the largest 20 US cities; A hierarchical modeling strategy (with discussion). J. R. Stat. Soc., A 2000, 163, 263-302. (28) Post, E.; Hoaglin, D.; Deck, L.; Larntz, K. An empirical Bayes approach to estimating the relation of mortality to exposure to particulate matter. Risk Anal. 2001, 21, 837-837. (29) Borsuk, M. E.; Higdon, D.; Stow, C. A.; Reckhow, K. H. A Bayesian hierarchical model to predict benthic oxygen demand from organic matter loading in estuaries and coastal zones. Ecol. Modell. 2002, 143, 165-181. (30) Lockwood, J. R.; Schervish, M. J.; Gurian, P.; Small, M. J. Characterization of arsenic occurrence in U. S. drinking water treatment facility source waters. J. Am. Stat. Assoc. 2001, 96, 1184-1193. (31) U. S. EPA, Compendium of Methods for the Determination of Inorganic Compounds in Ambient Air, http://www.epa.gov/ ttn/amtic/inorg.html, 1999 (July), (accessed September 14, 2004). (32) U. S. EPA, Lead (Pb) National Ambient Air Quality Standards, http://www.epa.gov/ttn/naaqs/standards/pb/ s_pb_index.html, 1990, (accessed September 14, 2004). (33) Hornung, R. W.; Reed, L. D. Estimation of average concentration in the presence of nondetectable values. Appl. Occup. Environ. Hyg. 1990, 5, 46-51. (34) Galke, W.; Clark, S.; Wilson, J.; Jacobs, D.; Succop, P.; Dixon, S.; Bornschein, B.; McLaine, P.; Chen, M. Evaluation of the HUD lead hazard control grantee program: Early overall findings. Environ. Res. 2001, 86, 149-156. (35) U. S. EPA, Environmental Protection Agency - Dispersion Models, http://www.epa.gov/scram001/tt22.htm, 2003 (May). (36) Gilbert, R. O. Statistical Methods for Environmental Pollution Monitoring; Van Nostrand Reinhold: New York, 1987. (37) Ott, W. R. Environmental Statistics and Data Analysis; CRC Press Inc, Lewis Publishers: New York, 1995. (38) U. S. EPA, EPA 1999 National Air Quality Emissions Trend Report, http://www.epa.gov/air/aqtrnd99/PDF%20Files/tables/ a_1b.pdf, 1999, (accessed July 17, 2003). (39) U. S. ACE, Engineering and Design - Handbook for the Preparation of Storm Water Pollution Prevention Plans for Construction Activities, http://www.usace.army.mil/inet/usacedocs/eng-pamphlets/ep1110-1-16/bmp-39.pdf, 1997 (February), (accessed July 17, 2003). (40) Gilks, W. R.; Richardson, S.; Spiegelhalter, D. J. Markov Chain Monte Carlo in Practice; Chapman & Hall: London, 1996. (41) Roberts, C. P.; Casella, G. Monte Carlo Statistical Methods; Springer: New York, 1999. (42) Gammerman, D. Markov Chain Monte Carlo; Chapman & Hall: London, 1997. (43) Spiegelhalter, D. J.; Thomas, A.; Best, N. G.; Gilks, W. R. BUGS: Bayesian Inference Using Gibbs Sampling: Version 0.5; Technical report; Biostatistics Unit-MRC: Cambridge, UK, 1996.
Received for review December 28, 2003. Revised manuscript received April 1, 2005. Accepted April 19, 2005. ES035465E
VOL. 39, NO. 13, 2005 / ENVIRONMENTAL SCIENCE & TECHNOLOGY
9
4937