Article pubs.acs.org/est
Methane Concentrations in Water Wells Unrelated to Proximity to Existing Oil and Gas Wells in Northeastern Pennsylvania
Donald I. Siegel,*,† Nicholas A. Azzolina,‡ Bert J. Smith,§ A. Elizabeth Perry,∥ and Rikka L. Bothun⊥
†Department of Earth Sciences, Syracuse University, 204 and 314 Heroy Geology Lab, Syracuse, New York 13244, United States
‡The CETER Group, Inc., 1027 Faversham Way, Green Bay, Wisconsin 54313, United States
§Enviro Clean Products and Services, 11717 North Morgan Road, P.O. Box 721090, Yukon, Oklahoma 73172-1090, United States
∥AECOM Technology Corporation, 250 Apollo Drive, Chelmsford, Massachusetts 01824, United States
⊥AECOM Technology Corporation, 1601 Prospect Parkway, Fort Collins, Colorado 80525, United States
S Supporting Information *
ABSTRACT: Recent studies in northeastern Pennsylvania report higher concentrations of dissolved methane in domestic water wells associated with proximity to nearby gas-producing wells [Osborn et al. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 8172] and [Jackson et al. Proc. Natl. Acad. Sci. U. S. A., 2013, 110, 11250]. We test this possible association by using Chesapeake Energy’s baseline data set of over 11,300 dissolved methane analyses from domestic water wells, densely arrayed in Bradford and nearby counties (Pennsylvania), and near 661 pre-existing oil and gas wells. The majority of these, 92%, were unconventional wells, drilled with horizontal legs and hydraulically fractured. Our data set is hundreds of times larger than data sets used in prior studies. In contrast to prior findings, we found no statistically significant relationship between dissolved methane concentrations in groundwater from domestic water wells and proximity to pre-existing oil or gas wells. Previous analyses used small sample sets compared to the population of domestic wells available, which may explain the difference in prior findings compared to ours.
■
INTRODUCTION
Previous investigators found an association between dissolved methane concentrations in drinking water from domestic wells and the proximity of gas wells in Northeastern (NE) Pennsylvania (United States).1,2 They report dissolved methane concentrations up to six times higher in drinking water within 1 km of a gas well, compared to concentrations in groundwater farther than 1 km away. We report the results of an investigation to further test this possible association using a vastly larger data set. Our study area in NE Pennsylvania (Figure 1) is located where Chesapeake Energy Corporation (Chesapeake) and other operators have been active in natural gas development of the Marcellus Shale. As part of its operating procedure, Chesapeake routinely collects samples from domestic and stock water wells near its planned drilling operations and has them analyzed for dissolved methane and multiple other constituents prior to drilling. The data set we used for this study consists of 11,309 individual water well samples analyzed for dissolved methane, a data set two orders of magnitude larger than those used in previous studies examining the relationship between dissolved methane and the proximity of oil and gas wells.
■
METHODS
Site Location. Chesapeake’s baseline database for NE Pennsylvania includes predrill samples of domestic water collected from June 2009 through November 2011. The baseline samples were initially collected from active drinking water wells within a radius of 305 m (1,000 feet) of Chesapeake’s planned gas wells. This radius was subsequently expanded to 762 m (2,500 feet) and then to 1,219 m (4,000 feet). Most samples (89.5%) were collected using the 1,219 m (4,000 foot) radius. We accessed the Pennsylvania Department of Environmental Protection (PADEP) database of oil and gas wells to obtain the locations of the 661 oil and gas wells (Figure 1) drilled between 1929 and 2013 and near the domestic wells in our study area.3 Of these 661 wells, 92% were unconventional wells drilled with horizontal legs, and the remaining 8% were older conventional vertical wells. This data set was used to identify oil and gas wells that were present at the time each of the baseline samples was collected.
© 2015 American Chemical Society
Received: November 26, 2014
Revised: February 24, 2015
Accepted: March 12, 2015
Published: March 12, 2015
DOI: 10.1021/es505775c Environ. Sci. Technol. 2015, 49, 4106−4112
Environmental Science & Technology
Figure 1. Oil and gas wells used in proximity analysis. Each baseline sample was paired with the nearest oil and gas well that existed when the sample was collected. Because of the data density, many samples were sometimes paired with the same oil and gas well. Oil and gas wells that were not designated the most proximate to any of the water samples are also shown for reference (“unpaired” oil and gas wells). Both conventional and unconventional oil and gas wells were included in our analysis.
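The pairing rule in the Figure 1 caption (nearest oil or gas well that already existed when the sample was collected) can be illustrated with a minimal Python sketch. This is not the authors' GIS/VBA implementation. The class names, field names, and coordinates are hypothetical, and the sketch assumes coordinates have already been projected to a planar system in meters.

```python
from dataclasses import dataclass
from datetime import date
from math import hypot
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class GasWell:  # hypothetical record layout
    well_id: str
    x_m: float      # planar easting in meters (assumed already projected)
    y_m: float      # planar northing in meters
    drilled: date


@dataclass(frozen=True)
class WaterSample:  # hypothetical record layout
    sample_id: str
    x_m: float
    y_m: float
    collected: date


def pair_with_nearest_well(
    sample: WaterSample, wells: Sequence[GasWell]
) -> Optional[Tuple[GasWell, float]]:
    """Return (well, distance in m) for the closest well drilled before the
    sample date, or None if no well existed yet."""
    existing = [w for w in wells if w.drilled < sample.collected]
    if not existing:
        return None

    def distance(w: GasWell) -> float:
        return hypot(w.x_m - sample.x_m, w.y_m - sample.y_m)

    nearest = min(existing, key=distance)
    return nearest, distance(nearest)


# Illustrative data only: two wells, one drilled after the first sample.
wells = [
    GasWell("A", 0.0, 0.0, date(2008, 5, 1)),
    GasWell("B", 100.0, 0.0, date(2011, 6, 1)),
]
early = WaterSample("S1", 90.0, 0.0, date(2010, 1, 15))
late = WaterSample("S2", 90.0, 0.0, date(2011, 11, 1))
```

Because the date filter is applied before the distance search, the same location can pair with different wells depending on when the sample was taken, and several samples can share one well.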
Table 1. Four Different but Complementary Statistical Tests Used To Critically Assess Whether There Was or Was Not a Statistically Significant Increase in Dissolved Methane Concentrations in Domestic Water Wells at Closer Distances to an Oil or Gas Well

| data analysis method | explanation | inputs: dissolved methane concentration | inputs: distance |
| --- | --- | --- | --- |
| test of proportions | tests whether the proportion of samples above a threshold concentration is significantly different between two groups of samples collected within and beyond a specified distance | four discrete dissolved methane concentration thresholds: detected >MRL or detected ≥1, ≥5, or ≥10 mg/L | three discrete distances from an oil or gas well: within or beyond 500, 1000, or 1500 m |
| logistic regression | tests whether the proportions of samples above a threshold concentration show a trend with distance from an oil or gas well | four discrete dissolved methane concentration thresholds: detected >MRL or detected ≥1, ≥5, or ≥10 mg/L | continuous distance to an oil or gas well |
| survival analysis | tests whether the statistical distributions are significantly different between two groups of samples collected within and beyond a specified distance | continuous dissolved methane concentration | three discrete distances from an oil or gas well: within or beyond 500, 1000, or 1500 m |
| correlation analysis | tests whether there is a statistically significant correlation between dissolved methane concentrations and distance to an oil or gas well | continuous dissolved methane concentration | continuous distance to an oil or gas well |

Dissolved Gas Sampling and Analytical Protocols. National environmental consulting firms collected groundwater samples from water taps using the inverted bottle method4 (before treatment systems if possible) three to six months before drilling of the nearby gas well, in accordance with Pennsylvania regulatory protocols. The dissolved gas samples were analyzed by a single state-accredited analytical laboratory by EPA test method RSK-175 (ref 4) using a gas chromatographic (GC) headspace equilibration technique. Our sampling approaches have been the industry norm. Recently, some have suggested, from preliminary experiments, that different sampling methods for dissolved methane may lead to different results.5 While the absolute concentrations may therefore carry some systematic inaccuracy, our use of the common inverted bottle method will still produce consistent results with respect to whether concentrations are relatively higher or lower as a function of distance to oil and gas wells.
Over the course of the sampling program, the analytical test method reporting limit (MRL) for dissolved methane improved from 0.026 mg per liter (mg/L) to 0.005 mg/L. Approximately 90% of our samples were measured using the 0.026 mg/L MRL. Our statistical data analysis accommodates the different MRLs, as described in the Statistical Data Analysis section.
Well Pairing Process. We paired each water sample with the closest oil and gas well (from the PADEP database) existing prior to sample collection. Although each water sample was by definition associated with the geographically closest pre-existing oil/gas well, in some cases multiple water samples were paired with the same oil or gas well. To do the pairing, we used geographic information system technology6 and a program written with Visual Basic for Applications (VBA) in Microsoft Excel. The coordinates of the pre-existing oil and gas wells and Chesapeake baseline groundwater samples were projected from geographic coordinates (latitude/longitude) to a planar projection system
cutoffs evaluate distances within and beyond the 1000-m cutoff, respectively. Sixty-seven percent (7,608 samples), 77% (8,691 samples), and 85% (9,625 samples) of the groundwater samples were collected within 500, 1000, and 1500 m of a pre-existing oil or gas well, respectively. We selected the upper value of 10 mg/L since it was the lower bound used by Jackson et al.2 to show the “action level for hazard mitigation”. We used additional threshold values of 1 and 5 mg/L dissolved methane to assess threshold levels between the MRL and 10 mg/L.
Logistic Regression (Discrete y/Continuous x). Jackson et al.2 used linear regression to evaluate whether distance to gas wells related to dissolved methane concentrations. Given the large number of nondetects in our data set, a linear regression model cannot be used because substitution of nondetect measurements with zero, 0.5MRL or the MRL would yield invalid estimated regression coefficients and standard errors. Omitting nondetect measurements would also be inadequate, since nearly 75% of the samples would be discarded, ignoring the majority of information in the data set.8 A logistic regression model can be used instead. Similar to the test of proportions, logistic regression uses a proportion for the response variable, which removes the effect of nondetect measurements. However, unlike the test of proportions, which uses a discrete distance cutoff that might be interpreted as “arbitrary” (i.e., 500, 1000, or 1500 m), logistic regression allows evaluation of proportions along a continuous distance scale and of whether the proportions above each threshold increase closer to an oil or gas well. Logistic regression is widely used in many disciplines to evaluate these types of discrete binary responses.8,9
Survival Analysis (Continuous y/Discrete x). 
Survival analysis methods were originally used in epidemiological studies, such as studies of the survival time of patients receiving different treatments, where outcomes are only partially known; for example, the ultimate survival of patients may be unknown for those who leave a study or when a study is cut short. Survival analysis can also be used with environmental data, where the measured concentrations take the place of the times to events and the partially known outcomes are the nondetect concentrations. As before, we divided the groundwater samples into two groups using the same distance cutoff values as the test of proportions (i.e., 500, 1000, or 1500 m). The null hypothesis for comparing the two groups of samples in this case was that the statistical distribution functions were equal between the two groups. Thus, a p-value greater than 0.05 means that the null hypothesis cannot be rejected; that is, there is no statistically significant difference in dissolved methane concentrations between the two groups.
Correlation (Continuous y/Continuous x). Jackson et al.2 used Pearson’s r and Spearman’s rho in their analysis to assess the correlation between dissolved methane concentrations and distance from gas wells. Correlation measures the strength of association between two variables: in this case continuous dissolved methane concentration and continuous distance. Correlation coefficients range from −1 (perfect negative correlation) to +1 (perfect positive correlation), with a correlation coefficient of 0 meaning there is no correlation between the variables. Pearson’s r is widely used to demonstrate correlation between two variables; however, Pearson’s r cannot be adapted to handle nondetect measurements and is therefore inappropriate for data sets with nondetects. Spearman’s rho may be
(Lambert Conformal Conic for the Contiguous United States) to allow for a calculation of distances on a planar system. We identified 639 pre-existing gas wells and 22 pre-existing oil wells that were closest to the domestic groundwater wells at the time they were sampled. Of these 661 pre-existing oil and gas wells, 56 wells were conventional wells (8%) and 605 were unconventional (92%). In contrast to previous studies, we included all commercial hydrocarbon wells, not just unconventional natural gas wells.1,2
Statistical Data Analysis. The dissolved methane concentration data set contained many measurements less than the two different MRLs, a characteristic common in studies of water resources. We took care to use appropriate statistical tests so that underlying test assumptions were not violated, which would lead to false inferences.7,8 For example, it is not possible to assess the median in our data set, since nondetects occurred nearly 75% of the time. Therefore, tests of central tendency using the median (or mean) prove meaningless, and more robust statistical methods need to be used. Our analysis involved increasingly sophisticated techniques to assess whether measured dissolved methane relates to distance to oil or gas wells. We started by plotting dissolved methane concentrations versus distance and assessing different concentration ranges using boxplots.7 We then applied four different statistical testing combinations: (1) discrete y/discrete x; (2) discrete y/continuous x; (3) continuous y/discrete x; and (4) continuous y/continuous x (Table 1). Detailed descriptions of the statistical methods used in the analysis are provided in the Supporting Information (SI).
Statistical Testing. We used four independent statistical tests (Table 1).
Test of Proportions (Discrete y/Discrete x). 
Jackson et al.2 used the Kruskal−Wallis test (more appropriately called the Wilcoxon Rank Sum [WRS] or Mann−Whitney test when comparing two groups) to compare dissolved methane concentrations between groups of samples within and beyond 1 km. The WRS test is used to test for equality of medians for two or more groups.8 However, since nearly 75% of our measurements were below the MRL in any group, tests comparing medians are not appropriate. Moreover, the WRS test cannot accommodate two MRLs. Instead of this approach, we used the test of proportions. We calculated the proportion (p) of samples above each threshold dissolved methane concentration (MRL, 1, 5, or 10 mg/L) at the three distance cutoff values (500, 1000, or 1500 m) using eq 1
\[
\text{proportion above threshold} = \left( \frac{\text{no. of measurements} > \text{MRL or} \geq 1,\ \geq 5,\ \text{or} \geq 10\ \text{mg/L}}{\text{no. of samples}} \right) \times 100 \tag{1}
\]
and then tested whether the proportions were significantly different between two groups of samples collected within and beyond that distance (e.g., Group 1 ≤ 1000 m and Group 2 > 1000 m). Calculating a proportion removes the effect of nondetect measurements; the test only assesses the proportion above or below threshold concentration values. The null hypothesis for comparing the two groups was that the proportions were equal. A p-value greater than 0.05 means there is no statistically significant difference in dissolved methane concentrations between the two groups. The 1000-m cutoff directly compares to the groupings used by Jackson et al. (2013); the 500- and 1500-m
Figure 2. Scatterplots of distance of water well sample to nearest existing oil/gas well versus dissolved methane concentration. Both graphs show the same data, but part a presents dissolved methane concentrations in original units, while part b displays dissolved methane concentrations using a more appropriate log scale. Samples that plot in a straight line at 0.005 and 0.026 mg/L are samples where dissolved methane was not detected above the MRL (i.e., samples are plotted at their respective MRLs).
used in lieu of Pearson’s r when there is only one MRL, because concentrations are converted to ranks prior to computing Pearson’s r on the ranked data. However, Spearman’s rho cannot handle multiple MRLs. Therefore, the two correlation methods used by Jackson et al.2 would be inappropriate for our data set. However, for comparison purposes with their paper, we calculated the correlation coefficients anyway using Pearson’s r and Spearman’s rho. Alternatively, Kendall’s tau correlation coefficient can be used for data with multiple MRLs.8 Thus, Kendall’s tau is the appropriate statistical measure of correlation for this data set and is preferred over the other correlation coefficients. We therefore present results for all three correlation coefficients − Pearson’s r, Spearman’s rho, and Kendall’s tau − recognizing that the latter is the most valid statistic from which we may draw inferences about the correlation between dissolved methane concentrations and distance to an oil or gas well.
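One common way to apply Kendall's tau to data with more than one reporting limit is to re-censor every value at or below the highest MRL to a single tied value and then compute tau-b, which corrects for ties. The sketch below illustrates that idea in plain Python. It is an assumption about how such a calculation can be set up, not the authors' procedure (their details are in the SI). The function names and sample values are hypothetical, and the O(n²) pair loop is written for clarity rather than speed.

```python
from math import sqrt

HIGHEST_MRL = 0.026  # mg/L, the higher of the two reporting limits in the study


def censor_at_highest_mrl(conc_mg_l, detected, mrl=HIGHEST_MRL):
    """Map nondetects and detections at or below `mrl` to one tied value."""
    if not detected or conc_mg_l <= mrl:
        return mrl
    return conc_mg_l


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((n0 - ties_x) * (n0 - ties_y))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx != 0 and dy != 0:
                if (dx > 0) == (dy > 0):
                    concordant += 1
                else:
                    discordant += 1
    n_pairs = n * (n - 1) // 2
    denom = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denom


# Hypothetical samples: (concentration in mg/L, detected?) and distance in m.
raw = [(0.005, False), (0.026, False), (1.2, True), (0.010, True), (6.0, True)]
distances_m = [120.0, 450.0, 800.0, 1500.0, 2300.0]
methane = [censor_at_highest_mrl(c, d) for c, d in raw]
tau = kendall_tau_b(distances_m, methane)
```

Pairs tied at the censoring value contribute to neither the concordant nor the discordant count, so nondetects inform the ranking without being assigned an arbitrary substituted concentration.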
Figure 3. Five different boxplots of dissolved methane concentration ranges versus distance to nearest oil and gas well. Median values (p50) are shown for reference. The number of samples in each boxplot (n) represents the number of samples out of 11,309 that were either below or above the specified threshold concentration value. For example, 8,569 measurements were less than or equal to the MRL, but only 266 measurements in the data set had reported concentrations greater than or equal to 10 mg/L dissolved methane.
■
RESULTS AND DISCUSSION
Concentration versus Distance. Figure 2 shows scatter plots of dissolved methane concentrations versus distance to nearest oil/gas well in original concentration units (Figure 2a) and log-transformed units (Figure 2b). Samples that plot in a straight line at 0.005 and 0.026 mg/L are samples where dissolved methane was not detected above the MRL (i.e., samples are plotted at the MRL). These two MRLs are more clearly visible on the log-transformed scale (Figure 2b). Figure 2a is similar to Figure 1 in Jackson et al. (2013) with dissolved methane plotted in original units. Both graphs (Jackson et al.’s2 and our Figure 2a) are visually misleading: the high percentage of samples with low dissolved methane concentrations clusters near the x-axis and essentially cannot be seen, which visually suggests an increase in dissolved methane concentration closer to oil/gas wells. On a log-transformed scale, concentration ranges do not appear visibly different for samples located at closer distances to oil/gas wells. Figure 2 also shows one of the key challenges in analyzing our particular data set: many measurements were less than the MRL, and two different MRLs were used during the study period. Side-by-side boxplots of dissolved methane values below and above specified concentration thresholds as a function of distance to the nearest oil or gas well (Figure 3) show no visual relationship between dissolved methane concentration and proximity to oil/gas wells either. Regardless of the threshold concentration (i.e., MRL, 1, 5, or 10 mg/L dissolved methane), the interquartile ranges (IQRs) overlap with distance and suggest that there are comparable percentages of samples in each group regardless of distance from an oil/gas well. If there were a cause-and-effect relationship between distance to an oil/gas well and dissolved methane concentrations in groundwater samples, then the boxplots for the higher threshold concentrations would reflect that impact by plotting lower on the y-axis (i.e., closer distances to oil/gas wells). They do not.
Test of Proportions. Table 2 shows the individual proportions and the tests of proportions results. Group 1 samples consist of groundwater samples collected within 500, 1000, or 1500 m of a pre-existing oil and gas well, whereas Group 2 samples were those groundwater samples that were collected beyond these distance cutoffs. The top row shows the proportion results for Group 1 at the three different distance cutoffs, and the second row shows the proportion results for Group 2. The third row shows the results from the tests of proportions and the p-values. The 1000-m cutoff is directly
regression models. The approximate 95% confidence interval for the slope is equal to the coefficient ± (2 × standard error). If the 95% confidence interval contains zero, then the slope is not significantly different from zero.9 Neither the percent detections above the MRL nor the percent of samples ≥1 mg/L dissolved methane have slope coefficients significantly different from zero. However, the percentage of samples ≥5 or ≥10 mg/L shows a slight increase at farther distances from the nearest oil/gas well (i.e., positive slope) that is small but statistically significant (Table 3). Several factors could be responsible for this result (e.g., seasonal effects, topography, lithology, well depth, etc.). Definitively confirming which factor(s) might be responsible for this result is beyond the scope of this study. However, the logistic regression results are clear: there is no statistically significant increase in the probability of detection above the MRL or the probability of being above 1, 5, or 10 mg/L dissolved methane with decreasing distance to an oil/gas well. Additional details of the logistic regression results are provided in the SI.
Survival Analysis. The survival analysis also shows that the distributions of dissolved methane concentrations do not differ as a function of distance to the nearest oil/gas well. The survival function curves for the two groups (within and beyond a distance cutoff) are virtually indistinguishable and plot on top of one another (Figure 4). In addition to the graphical assessment shown in Figure 4, the log-rank and Wilcoxon p-values are well above 0.05; the data are consistent with the null hypothesis that the population distributions are equal (Table 4). The log-rank test is generally the most appropriate method for comparing two survival curves where some of the measurements have been censored (i.e., nondetects). 
However, the Wilcoxon test can be more sensitive when the ratio between curves is higher at early survival times (lower percentiles) than at late ones.10,11 Both tests yield the same conclusion: there is no significant difference between the two groups.
Correlation Analysis. The results for Pearson’s r show no significant correlation whether original or log-transformed units are used in the analysis (p-values of 0.326 and 0.396, respectively); that is, they indicate no relationship between dissolved methane concentrations and distance from an oil or gas well (Table 5). In contrast, the Spearman’s rho results conflict with one another. For example, one of the results for Spearman’s rho shows a significant negative correlation, suggesting that dissolved methane concentrations increase closer to an oil or gas well (NDs = MRL, rho = −0.025, and p-value = 0.008); another shows a significant positive correlation, suggesting that dissolved methane concentrations decrease closer to an oil or gas well (detected measurements only, rho = 0.043, and p-value = 0.024); and a third shows no significant correlation (censored at 0.026 mg/L, rho = −0.004, p-value = 0.676) (Table 5). These mixed results show the impact of nondetect measurements on correlation results and why alternative methods are necessary. Among the permutations using Pearson’s r and Spearman’s rho, the least biased result would be the scenario where dissolved methane concentrations ≤0.026 mg/L were censored at 0.026 mg/L. This test produces a correlation coefficient of −0.004 that is not statistically significant (p = 0.676). However, a better alternative is to use the correct correlation statistic, Kendall’s tau. The Kendall’s tau result provides a correlation coefficient of −0.002 and a p-value of 0.700 (Table 5). This result would be
Table 2. Results of the Tests of Proportions and the Calculated p-Values Using Fisher’s Exact Testᵃ

Group 1 Proportions (Samples Collected Less than or Equal to the Distance Cutoff)

| distance (m) | n | % Det >MRL | % ≥1 mg/L | % ≥5 mg/L | % ≥10 mg/L |
| --- | --- | --- | --- | --- | --- |
| ≤500 | 7608 | 24.0 | 8.8 | 4.0 | 2.0 |
| ≤1000 | 8691 | 24.1 | 8.8 | 4.1 | 2.1 |
| ≤1500 | 9625 | 24.4 | 9.0 | 4.2 | 2.2 |

Group 2 Proportions (Samples Collected beyond the Distance Cutoff)

| distance (m) | n | % Det >MRL | % ≥1 mg/L | % ≥5 mg/L | % ≥10 mg/L |
| --- | --- | --- | --- | --- | --- |
| >500 | 3701 | 24.7 | 9.5 | 5.2 | 3.0 |
| >1000 | 2618 | 24.7 | 9.8 | 5.4 | 3.1 |
| >1500 | 1684 | 23.2 | 9.0 | 5.2 | 3.3 |

Test of Proportions Results (p-value)

| distance threshold (m) | % Det >MRL | % ≥1 mg/L | % ≥5 mg/L | % ≥10 mg/L |
| --- | --- | --- | --- | --- |
| 500 | 0.798 | 0.878 | 0.999 | 0.999 |
| 1000 | 0.738 | 0.951 | 0.998 | 0.998 |
| 1500 | 0.140 | 0.526 | 0.968 | 0.996 |
a Groups 1 and 2 were defined using three different distance cutoffs: 500 m (0.5 km); 1000 m (1 km); and 1500 m (1.5 km); and four different concentration thresholds: % detected above the MRL; % ≥1 mg/L, % ≥5 mg/L; and % ≥10 mg/L dissolved methane.
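To make the test of proportions concrete, the following Python sketch computes a one-sided Fisher's exact p-value for a 2×2 table from the hypergeometric distribution. Several things here are assumptions rather than facts from the paper: the one-sided alternative (Group 1, nearer to wells, has the higher proportion), the function names, and the counts, which are back-calculated from the rounded percentages in Table 2. The resulting p-value should therefore not be expected to reproduce the published one exactly.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] with fixed margins.

    Rows are groups (1 = near, 2 = far); columns are above / not above
    the concentration threshold. X is the count above threshold in row 1.
    """
    row1 = a + b
    row2 = c + d
    col1 = a + c
    n = row1 + row2
    k_min = max(a, col1 - row2)      # smallest feasible count not below a
    k_max = min(row1, col1)          # largest feasible count
    log_denom = log_comb(n, col1)
    total = 0.0
    for k in range(k_min, k_max + 1):
        total += exp(log_comb(row1, k) + log_comb(row2, col1 - k) - log_denom)
    return min(total, 1.0)


# Approximate counts for "% Det >MRL" at the 1000-m cutoff (Table 2),
# reconstructed from rounded percentages, so they are estimates only.
n_near, n_far = 8691, 2618
det_near = round(n_near * 0.241)
det_far = round(n_far * 0.247)
p_detect_1000m = fisher_exact_one_sided(
    det_near, n_near - det_near, det_far, n_far - det_far
)
```

Working in log space with `lgamma` keeps the binomial coefficients from overflowing at sample sizes in the thousands. A library routine such as SciPy's `fisher_exact` would be the usual choice in practice.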
comparable to the two groupings that were used in Jackson et al.;2 the 500- and 1500-m cutoffs were used to evaluate distances within and beyond the 1000-m cutoff, respectively. The p-values for each of the 12 different test permutations are greater than 0.05; therefore, the null hypothesis of equal population proportions cannot be rejected, and dissolved methane does not differ significantly between samples collected in close proximity to oil/gas wells and those collected farther from the oil/gas wells. This is true regardless of the discrete distance cutoffs. The large p-values (p = 0.738 to 0.998) for the sample groups within 1000 m of an oil/gas well strongly support this conclusion.
Logistic Regression. The logistic regression results produce the same conclusions as the test of proportions; there is no increase in either the percent detections above the MRL or percent of samples ≥1, ≥5, or ≥10 mg/L dissolved methane as a function of proximity to the nearest oil/gas well. Table 3 provides the fitted coefficients, standard errors, and approximate 95% confidence intervals of the four logistic

Table 3. Fitted Parameters of a Logistic Regression for Each of the Four Models As a Function of ln(Distance) from the Nearest Oil/Gas Well in Meters

| model | term | coefficient | SE | approximate 95% CI, lower | approximate 95% CI, upper |
| --- | --- | --- | --- | --- | --- |
| % Detects | β0 (constant) | −1.10 | 0.13 | −1.35 | −0.85 |
| % Detects | β1 (ln[Distance]) | −0.01 | 0.02 | −0.05 | 0.03 |
| % ≥1 mg/L | β0 (constant) | −2.44 | 0.19 | −2.81 | −2.07 |
| % ≥1 mg/L | β1 (ln[Distance]) | 0.02 | 0.03 | −0.04 | 0.08 |
| % ≥5 mg/L | β0 (constant) | −3.74 | 0.26 | −4.26 | −3.22 |
| % ≥5 mg/L | β1 (ln[Distance]) | 0.11 | 0.04 | 0.03 | 0.19 |
| % ≥10 mg/L | β0 (constant) | −4.68 | 0.35 | −5.37 | −3.98 |
| % ≥10 mg/L | β1 (ln[Distance]) | 0.16 | 0.06 | 0.05 | 0.27 |
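The Table 3 values can be checked with a short Python sketch that applies the coefficient ± (2 × standard error) rule to each slope and evaluates a fitted curve at one distance. The model form logit(p) = β0 + β1·ln(distance) is assumed from the table title, the function names are hypothetical, and the coefficients are the rounded published values, so the intervals may differ from the table in the last digit.

```python
from math import exp, log

# (model, slope on ln[distance], standard error), copied from Table 3.
SLOPES = [
    ("% Detects", -0.01, 0.02),
    ("% >=1 mg/L", 0.02, 0.03),
    ("% >=5 mg/L", 0.11, 0.04),
    ("% >=10 mg/L", 0.16, 0.06),
]


def approx_ci95(coef, se):
    """Approximate 95% confidence interval: coefficient +/- 2 * SE."""
    return coef - 2 * se, coef + 2 * se


def slope_is_significant(coef, se):
    """True when the approximate 95% interval excludes zero."""
    lower, upper = approx_ci95(coef, se)
    return not (lower <= 0.0 <= upper)


def predicted_probability(b0, b1, distance_m):
    """Inverse logit of b0 + b1 * ln(distance); assumed model form."""
    eta = b0 + b1 * log(distance_m)
    return 1.0 / (1.0 + exp(-eta))


significance = {name: slope_is_significant(c, s) for name, c, s in SLOPES}

# Fitted probability of a detection above the MRL at 1000 m, using the
# "% Detects" row of Table 3 (b0 = -1.10, b1 = -0.01).
p_detect_at_1000m = predicted_probability(-1.10, -0.01, 1000.0)
```

Under these assumptions only the ≥5 and ≥10 mg/L slopes have intervals that exclude zero, and both are positive, which corresponds to a slightly higher proportion farther from wells rather than closer to them.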
Table 4. Results of the Survival Analysis and the Calculated p-Values for the Log-Rank and Wilcoxon Testsᵃ

| Group 1 distance (m) | Group 1 n | Group 2 distance (m) | Group 2 n | p-value, log-rank | p-value, Wilcoxon |