Article pubs.acs.org/est

Modeling Nitrate at Domestic and Public-Supply Well Depths in the Central Valley, California

Bernard T. Nolan,*,† JoAnn M. Gronberg,‡ Claudia C. Faunt,§ Sandra M. Eberts,∥ and Ken Belitz⊥

† U.S. Geological Survey, MS 413 National Center, Reston, Virginia 20192, United States
‡ U.S. Geological Survey, 345 Middlefield Road, Menlo Park, California 94025, United States
§ U.S. Geological Survey, 4165 Spruance Road, San Diego, California 92101, United States
∥ U.S. Geological Survey, 6480 Doubletree Avenue, Columbus, Ohio 43229, United States
⊥ U.S. Geological Survey, 10 Bearfoot Road, Northborough, Massachusetts 01532, United States


ABSTRACT: Aquifer vulnerability models were developed to map groundwater nitrate concentration at domestic and public-supply well depths in the Central Valley, California. We compared three modeling methods for ability to predict nitrate concentration >4 mg/L: logistic regression (LR), random forest classification (RFC), and random forest regression (RFR). All three models indicated processes of nitrogen fertilizer input at the land surface, transmission through coarse-textured, well-drained soils, and transport in the aquifer to the well screen. The total percent correct predictions were similar among the three models (69−82%), but RFR had greater sensitivity (84% for shallow wells and 51% for deep wells). The results suggest that RFR can better identify areas with high nitrate concentration but that LR and RFC may better describe bulk conditions in the aquifer. A unique aspect of the modeling approach was inclusion of outputs from previous, physically based hydrologic and textural models as predictor variables, which were important to the models. Vertical water fluxes in the aquifer and percent coarse material above the well screen were ranked moderately high-to-high in the RFR models, and the average vertical water flux during the irrigation season was highly significant (p < 0.0001) in logistic regression.



Received: December 18, 2013
Revised: March 31, 2014
Accepted: April 14, 2014
Published: April 29, 2014
© 2014 American Chemical Society
Environmental Science & Technology
dx.doi.org/10.1021/es405452q | Environ. Sci. Technol. 2014, 48, 5643−5651

INTRODUCTION

California’s Central Valley is a highly productive agricultural region that supports over 200 different crops within an area of 51,800 km².¹ Although the Central Valley occupies less than 1% of the total farmland in the U.S., it produces 8% of the U.S. agricultural output in terms of cash value. The high productivity is made possible by irrigation water, which is supplied by surface water diversion and groundwater pumping. The region accounts for about one-sixth of the irrigated land and about one-eighth of the groundwater pumpage in the U.S.¹

The Central Valley aquifer system comprises unconfined, semiconfined, and confined aquifers largely within the upper 300 m of alluvial sediments.² The Sacramento Valley occupies the northern third of the Central Valley, whereas the San Joaquin Valley encompasses the southern two-thirds (Figure 1). In the San Joaquin Valley, unconfined aquifers exist in unconsolidated deposits above the Corcoran Clay, which is a regional confining unit. Aquifer sediments vary in texture depending on location and generally are more fine grained in the Sacramento Valley than in the San Joaquin Valley, not including the Corcoran Clay.²

Figure 1. Locations of (A) shallow (domestic) and (B) deep (public-supply) wells used to develop the logistic regression and random forest models.

Nitrate is one of the most common anthropogenic contaminants in domestic well water³ and exceeds the maximum contaminant level of 10 mg/L as N in many wells in the Central Valley.² Burow et al.² subdivided the Central Valley into eastern and western alluvial fans and the basin subregion to interpret groundwater nitrate trends. They noted increases in groundwater nitrate concentration of 0.2−0.6 mg/L per decade in the eastern alluvial fans and 0.05 mg/L per decade in the shallow part of the basin subregion. Nitrate concentration was elevated in the western fans but was highly variable and showed no distinct trend. The higher rate of nitrate increase in the eastern fans occurred in the shallow depth zone, which corresponds to the depth of domestic wells. Both the eastern and western fans typify more oxic conditions and younger groundwater age compared with the basin subregion.

Population increases, a concomitant increase in groundwater nitrate, and intense competition for groundwater resources in the Central Valley call into question whether groundwater is a sustainable source of drinking water for domestic and public-supply wells.⁴ From a water-quality perspective, knowing the risk of contamination of such wells will benefit resource managers tasked with identifying problem areas. Additionally, better understanding of the effects of land use and other factors on groundwater nitrate will inform practices aimed at protecting groundwater resources.

To address these concerns, we developed models to show the risk of nitrate contamination of groundwater at depths of domestic and public supply. As part of a hybrid multimodeling approach, we used textural data and simulated vertical water fluxes from previous physically based models as predictor variables in the statistical models. Surface-water delivery and groundwater pumpage have the potential to affect vertical water fluxes in the Central Valley aquifer, which in turn can influence groundwater nitrate concentration. Water fluxes depend in part on aquifer sediment texture. Prior researchers developed three-dimensional textural and numerical flow models of the Central Valley hydrologic system to better understand the structure of water-bearing deposits and to quantify aspects of the groundwater flow system, including irrigated agriculture.⁴ Here, we designate the Central Valley textural model as CVTM, and the numerical flow model is referred to as the Central Valley hydrologic model (CVHM). One of CVHM’s outputs is vertical water flux, which is available for each of the 2.6 km² grid cells forming the upper active model layer (median thickness = 30 m). Therefore, a unique aspect of the study was to see if the outputs of the physically based models significantly improved the statistical models.

Another aspect of the study was evaluation of recent statistical methods such as random forest, which has been used primarily in ecology.⁵⁻⁸ Logistic regression (LR) has been used in previous aquifer vulnerability studies because it is straightforward to apply, works well with censored data, and often yields a parsimonious model.⁹⁻¹⁸ A limitation is that LR predicts probabilities in lieu of concentrations. Often, resource managers are interested in contaminant concentrations because these can be compared directly with water-quality standards. Also, the logit is assumed to be a linear function of the explanatory variables, when in fact groundwater nitrate response to any given factor may be highly nonlinear.

Classification trees are a promising alternative that dispenses with assumptions related to statistical hypothesis testing. For example, relations need not be linear and data need not follow a particular distribution or link function. Classification trees can screen large numbers of predictor variables, automatically incorporate interactions among predictor variables, are robust when predictor variables are collinear, and can impute missing data.¹⁹ Classification trees also accommodate either categorical or continuous response and predictor variables. Here we use random forest, which is an ensemble of individual classification trees. Random forest has the highest prediction accuracy among ensemble tree methods.¹⁹ Following prior researchers,²⁰ we refer to random forest with a categorical response variable as random forest classification (RFC) and, alternatively, as random forest regression (RFR) in the case of a continuous response variable.

The goal of the study was to develop aquifer vulnerability models for the Central Valley at both domestic and public-supply well depths (referred to here as “shallow” and “deep,” respectively). Specific objectives were (1) to compare LR, RFC, and RFR for both shallow and deep wells in the Central Valley, (2) to see if CVHM and CVTM outputs were important predictors in the statistical models, and (3) to construct aquifer vulnerability maps at shallow and deep well depths.
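The linearity assumption noted in the Introduction for logistic regression can be made explicit with the standard form of the model. The symbols below ($p$, $\beta_j$, $x_j$, $k$) are generic notation introduced here for illustration and are not taken from the article; $p$ stands for the probability that nitrate exceeds the 4 mg/L threshold at a well, and $x_1, \dots, x_k$ for the explanatory variables.

```latex
\operatorname{logit}(p) \;=\; \ln\!\left(\frac{p}{1-p}\right)
  \;=\; \beta_0 + \sum_{j=1}^{k} \beta_j x_j ,
\qquad
p \;=\; \frac{1}{1 + \exp\!\left[-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)\right]} .
```

Because the right-hand side is linear in each $x_j$, a one-unit change in a predictor shifts the log-odds by the same amount $\beta_j$ everywhere in its range, which is the restriction that tree-based methods avoid.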

$$1 - \frac{\mathrm{MSE}_{\mathrm{OOB}}}{\sigma_y^2} \qquad (2)$$

where $\sigma_y^2$ = the variance of observed groundwater nitrate concentration computed with n in the denominator, rather than n − 1. Additionally, we report an “estimation R²” that is based on the MSE of all observations, including those used to build the models.

We used R’s randomForest package for both RFC and RFR.²⁶ As mentioned above, random forest performs classification if the response variable is categorical (≤4 mg/L nitrate is NON; >4 mg/L is EXC); otherwise, it performs “regression,” but not in the literal sense because there is no hypothesis testing and decision rules replace model parameters. Predictions on new data are obtained by aggregating the predictions from all of the trees in the ensemble. In RFC, the prediction is the class that occurs most frequently among the trees (i.e., by majority vote), and in RFR it is the average of the predictions from all of the trees.²⁰ We used the percent increase in MSE resulting from permutation of predictor variables to determine their importance in the final RFR models.

We evaluated all models using classification table statistics comprising the total percent correct (the rate at which observed EXC and NON cases are correctly predicted), sensitivity (the rate of observed EXC cases correctly predicted as EXC), and specificity (the rate of observed NON cases correctly predicted as NON). RFR predictions of continuous nitrate response were categorized as NON or EXC according to the 4 mg/L threshold to facilitate comparison with LR and RFC. For RFR we report classification statistics corresponding both to OOB observations (“prediction”) and to all observations, including those used in model building (“estimation”). Additionally, for all three modeling methods we report the classification statistics obtained using an evaluation data set (see below). Additional modeling steps and fit criteria are described in the Supporting Information.

We screened all variables in Table S1 (Supporting Information) using exploratory LR, RFC, and RFR models.
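The aggregation rules and fit statistics described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' code (the study used R's randomForest package); the function names and the sample numbers are hypothetical, and the tie-breaking rule in the vote is an assumption of this sketch.

```python
from collections import Counter
from statistics import mean, pvariance

THRESHOLD_MG_L = 4.0  # from the text: <=4 mg/L is NON, >4 mg/L is EXC


def classify(conc):
    """Label a nitrate concentration (mg/L) as EXC or NON."""
    return "EXC" if conc > THRESHOLD_MG_L else "NON"


def rfc_vote(tree_labels):
    """Classification forest: return the label predicted by most trees.
    Ties go to the label seen first (an assumption of this sketch)."""
    return Counter(tree_labels).most_common(1)[0][0]


def rfr_average(tree_values):
    """Regression forest: average the per-tree predictions."""
    return mean(tree_values)


def pseudo_r2(observed, predicted):
    """1 - MSE / variance, with the variance computed using n
    (not n - 1) in the denominator, as stated in the text."""
    mse = mean((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - mse / pvariance(observed)


def classification_table(observed, predicted):
    """Total percent correct, sensitivity, and specificity after
    categorizing continuous values at the 4 mg/L threshold."""
    obs = [classify(c) for c in observed]
    pred = [classify(c) for c in predicted]
    true_exc = sum(o == "EXC" and p == "EXC" for o, p in zip(obs, pred))
    true_non = sum(o == "NON" and p == "NON" for o, p in zip(obs, pred))
    return {
        "percent_correct": 100.0 * (true_exc + true_non) / len(obs),
        "sensitivity": 100.0 * true_exc / obs.count("EXC"),
        "specificity": 100.0 * true_non / obs.count("NON"),
    }


if __name__ == "__main__":
    # Made-up concentrations (mg/L), for illustration only.
    observed = [1.0, 2.0, 5.0, 8.0, 12.0, 3.0]
    predicted = [2.0, 5.0, 6.0, 3.0, 10.0, 1.0]
    print(pseudo_r2(observed, predicted))
    print(classification_table(observed, predicted))
```

Passing out-of-bag predictions to `pseudo_r2` and `classification_table` would correspond to the "prediction" statistics, and passing predictions for all observations to the "estimation" statistics.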
We then excluded variables calculated directly from other variables, such as vel_avg92, which is the average of the monthly water fluxes from the CVHM for the 1992 water year (October 1991−September 1992). For RFR we excluded variables having [...] 0.7 (Table S3, Supporting Information), which is acceptable. Tolerance values [...] 10 mg/L prediction class.

The total numbers of shallow (314) and deep (928) wells in the figure are somewhat fewer than those used in RFR model development because of missing data in the GIS. As the nitrate-prediction class increased, the mean nitrate concentration of corresponding shallow and deep sampled wells increased. In most cases the mean observed nitrate concentration fell within the prediction class, indicating reasonable agreement. Observed mean nitrate concentration was outside of the prediction class in two instances, indicating prediction bias. The shallow model overpredicted nitrate concentration for the 6−8 mg/L prediction class (the observed mean is lower than the range of mapped predictions), but the upper 95% confidence limit approached 6 mg/L. The deep model underpredicted nitrate concentration for the 8−10 mg/L prediction class, but the 95% confidence interval overlapped the 8−10 mg/L prediction class (Figure 4). The overall agreement between mean groundwater nitrate concentration and the prediction classes indicated that the aquifer vulnerability maps were reasonably well supported by the data.
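The comparison of mapped prediction classes with observed concentrations can be illustrated as follows. This is a hedged sketch only: the class edges (2 mg/L steps up to 10 mg/L, then an open class) are inferred from the classes named in the text, the normal-approximation confidence interval (z = 1.96) is an assumption because the article's interval method is not given here, and the function names and data are hypothetical.

```python
from math import inf, sqrt
from statistics import mean, stdev

# Assumed class edges in mg/L; the last class is open-ended (>10 mg/L).
CLASS_EDGES = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, inf]


def class_index(predicted_conc):
    """Return the index of the prediction class containing a value."""
    for i in range(len(CLASS_EDGES) - 1):
        if CLASS_EDGES[i] <= predicted_conc < CLASS_EDGES[i + 1]:
            return i
    raise ValueError("predicted concentration must be non-negative")


def summarize_by_class(predicted, observed, z=1.96):
    """Group wells by the class of their predicted nitrate, then report
    the mean observed nitrate, an approximate 95% confidence interval,
    and whether the observed mean lies inside the class."""
    groups = {}
    for p, o in zip(predicted, observed):
        groups.setdefault(class_index(p), []).append(o)

    rows = []
    for i in sorted(groups):
        obs = groups[i]
        m = mean(obs)
        # A single well gives no spread estimate, so the interval is undefined.
        half = z * stdev(obs) / sqrt(len(obs)) if len(obs) > 1 else float("nan")
        lower, upper = CLASS_EDGES[i], CLASS_EDGES[i + 1]
        rows.append({
            "class": (lower, upper),
            "n": len(obs),
            "mean": m,
            "ci": (m - half, m + half),
            "mean_in_class": lower <= m < upper,
        })
    return rows


if __name__ == "__main__":
    # Made-up values (mg/L), for illustration only.
    predicted = [6.5, 7.0, 7.5, 1.0, 1.5]
    observed = [4.0, 5.0, 6.0, 0.5, 1.5]
    for row in summarize_by_class(predicted, observed):
        print(row)
```

A class whose observed mean falls below its lower edge indicates overprediction for that class, and one whose mean falls above its upper edge indicates underprediction; an interval that still overlaps the class suggests the bias is modest.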

ASSOCIATED CONTENT

Supporting Information

GIS processing of predictor variables; statistical modeling steps and fit criteria; values of random forest variables; logistic regression model coefficients; equal area aquifer proportion analysis; random forest regression partial dependence plots; MODFLOW vertical water fluxes; model fit and sensitivity to groundwater-age proxy variables; analysis of random forest regression model errors; model limitations. This material is available free of charge via the Internet at http://pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Tel: (703) 648-4000. Fax: (703) 648-6693.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

We gratefully acknowledge the field personnel who collected the data used in the study. We also thank Karen Burow, Neil Dubrovsky, Tyler Johnson, Bryant Jurgens, Donna Knifong, and Mike Wieczorek, who contributed data used in the modeling and/or provided valuable insights into the study area. We thank four anonymous reviewers whose comments and suggestions substantially improved the manuscript. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.



REFERENCES

(1) Reilly, T. E.; Dennehy, K. F.; Alley, W. M.; Cunningham, W. L. Ground-water availability in the United States. U.S. Geological Survey Circular 1323 2008 (http://pubs.usgs.gov/circ/1323/).
(2) Burow, K. R.; Jurgens, B. C.; Belitz, K.; Dubrovsky, N. M. Assessment of regional change in nitrate concentrations in groundwater in the Central Valley, California, USA, 1950s−2000s. Environ. Earth Sci. 2013, 69, 2609−2621.
(3) DeSimone, L. A.; Hamilton, P. A.; Gilliom, R. J. The quality of our nation’s waters - Quality of water from domestic wells in principal aquifers of the United States, 1991−2004 - Overview of major findings, U.S. Geological Survey Circular 1332 2009 (http://pubs.usgs.gov/circ/1323/).
(4) Faunt, C. C. Groundwater availability of the Central Valley aquifer, California. U.S. Geological Survey Professional Paper 1766 2009.
(5) Carlisle, D. M.; Falcone, J.; Wolock, D. M.; Meador, M. R.; Norris, R. H. Predicting the natural flow regime: Models for assessing hydrological alteration in streams. River Res. Appl. 2010, 26 (2), 118−136.
(6) Eng, K.; Carlisle, D. M.; Wolock, D. M.; Falcone, J. A. Predicting the likelihood of altered streamflows at ungauged rivers across the conterminous United States. River Res. Appl. 2013, 29 (6), 781−791.
(7) Waite, I. R.; Kennen, J. G.; May, J. T.; Brown, L. R.; Cuffney, T. F.; Jones, K. A.; Orlando, J. L. Comparison of Stream Invertebrate Response Models for Bioassessment Metrics. J. Am. Water Resources Assoc. 2012, 48 (3), 570−583.
(8) Zhang, L.; Wang, L. L.; Zhang, X. D.; Liu, S. R.; Sun, P. S.; Wang, T. L. The basic principle of random forest and its applications in ecology: A case study of pinus yunnanensis. Shengtai Xuebao 2014, 34 (3), 650−659.
(9) Ayotte, J. D.; Nolan, B. T.; Nuckols, J. R.; Cantor, K. P.; Robinson, G. R., Jr; Baris, D.; Hayes, L.; Karagas, M.; Bress, W.; Silverman, D. T.; Lubin, J. H. Modeling the probability of arsenic in groundwater in New England as a tool for exposure assessment. Environ. Sci. Technol. 2006, 40 (11), 3578−3585.
(10) Gurdak, J. J.; Qi, S. L. Vulnerability of recently recharged groundwater in principle aquifers of the United States to nitrate contamination. Environ. Sci. Technol. 2012, 46 (11), 6004−6012.
(11) Nolan, B. T.; Hitt, K. J.; Ruddy, B. C. Probability of nitrate contamination of recently recharged groundwaters in the conterminous United States. Environ. Sci. Technol. 2002, 36 (10), 2138−2145.
(12) Rupert, M. G. Probability of detecting atrazine/desethyl-atrazine and elevated concentrations of nitrate in ground water in Colorado. U.S. Geological Survey Water-Resources Investigations Report 02-4269; 2003.
(13) Warner, K. L.; Arnold, T. L. Relations that affect the probability and prediction of nitrate concentration in private wells in the glacial aquifer system in the United States. U.S. Geological Survey Scientific Investigations Report 2010−5100 2010.
(14) Ahn, J. S.; Cho, Y. C. Predicting natural arsenic contamination of bedrock groundwater for a local region in Korea and its application. Environ. Earth Sci. 2013, 68 (7), 2123−2132.
(15) Fram, M. S.; Belitz, K. Probability of detecting perchlorate under natural conditions in deep groundwater in California and the Southwestern United States. Environ. Sci. Technol. 2011, 45 (4), 1271−1277.
(16) Liu, C. W.; Wang, Y. B.; Jang, C. S. Probability-based nitrate contamination map of groundwater in Kinmen. Environ. Monit. Assess. 2013, 1−10.
(17) Mair, A.; El-Kadi, A. I. Logistic regression modeling to assess groundwater vulnerability to contamination in Hawaii, USA. J. Contaminant Hydrology 2013, 153, 1−23.
(18) Menció, A.; Boy-Roura, M.; Mas-Pla, J. Analysis of vulnerability factors that control nitrate occurrence in natural springs (Osona Region, NE Spain). Sci. Total Environ. 2011, 409 (16), 3049−3058.
(19) Anning, D. W.; Paul, A. P.; McKinney, T. S.; Huntington, J. M.; Bexfield, L. M.; Thiros, S. A. Predicted nitrate and arsenic concentrations in basin-fill aquifers of the southwestern United States. U.S. Geological Survey Scientific Investigations Report 2012-5065 2012.
(20) Liaw, A.; Wiener, M. Classification and regression by random forest. R News 2002, 2 (3), 18−22.
(21) Gronberg, J. M.; Spahr, N. E. County-level estimates of nitrogen and phosphorus from commercial fertilizer for the conterminous United States, 1987−2006. U.S. Geological Survey Scientific Investigations Report 2012-5207 2012.
(22) Helsel, D. R.; Hirsch, R. M. Statistical Methods in Water Resources. U.S. Geological Survey Techniques of Water Resources Investigations; U.S. Geological Survey: Reston, VA, 2002; Book 4, Chapter A3.
(23) Jurgens, B. C.; Burow, K. R.; Dalgish, B. A.; Shelton, J. L. Hydrogeology, water chemistry, and factors affecting the transport of contaminants in the zone of contribution of a public-supply well in Modesto, eastern San Joaquin Valley, California. U.S. Geological Survey Scientific Investigations Report 2008-5156 2008.
(24) Ward, M. H.; deKok, T. M.; Levallois, P.; Brender, J.; Gulis, G.; Nolan, B. T.; VanDerslice, J. Workgroup report: Drinking-water nitrate and health−Recent findings and research needs. Environ. Health Perspect. 2005, 113 (11), 1607−1614.
(25) SAS, SAS Enterprise Guide, 2013 (http://www.sas.com/). Accessed in January 2013.
(26) R, The R project for statistical computing, 2011 (http://www.r-project.org/). Accessed in June 2011.
(27) Eberts, S. M.; Böhlke, J. K.; Kauffman, L. J.; Jurgens, B. C. Comparison of particle-tracking and lumped-parameter age-distribution models for evaluating vulnerability of production wells to contamination. Hydrogeol. J. 2012, 20 (2), 263−282.
(28) Eberts, S. M.; Thomas, M. A.; Jagucki, M. L. The quality of our Nation’s waters - Factors affecting public-supply-well vulnerability to contamination - Understanding observed water quality and anticipating future water quality, U.S. Geological Survey Circular 1385 2013.
(29) Nolan, B. T.; Hitt, K. J. Vulnerability of shallow groundwater and drinking-water wells to nitrate in the United States. Environ. Sci. Technol. 2006, 40 (24), 7834−7840.
(30) Menard, S. Applied Logistic Regression Analysis; Sage Publications: Thousand Oaks, 2002.
(31) Landon, M. K.; Green, C. T.; Belitz, K.; Singleton, M. J.; Esser, B. K. Relations of hydrogeologic factors, groundwater reduction-oxidation conditions, and temporal and spatial distributions of nitrate, Central-Eastside San Joaquin Valley, California, USA. Hydrogeol. J. 2011, 19 (6), 1203−1224.
(32) Boy-Roura, M.; Nolan, B. T.; Menció, A.; Mas-Pla, J. Regression model for aquifer vulnerability assessment of nitrate pollution in the Osona region (NE Spain). J. Hydrol. 2013, 505, 150−162.
(33) Stackelberg, P. E.; Barbash, J. E.; Gilliom, R. J.; Stone, W. W.; Wolock, D. M. Regression models for estimating concentrations of atrazine plus deethylatrazine in shallow groundwater in agricultural areas of the United States. J. Environ. Quality 2012, 41 (2), 479−494.
(34) Belitz, K.; Jurgens, B.; Landon, M. K.; Fram, M. S.; Johnson, T. Estimation of aquifer scale proportion using equal area grids: Assessment of regional scale groundwater quality. Water Resour. Res. 2010, 46 (11), 1−14.
(35) McMahon, P. B.; Chapelle, F. H. Redox processes and water quality of selected principal aquifer systems. Ground Water 2008, 46 (2), 259−271.
(36) Nolan, B. T. Nitrate Behavior in Ground Waters of the Southeastern United States. J. Environ. Quality 1999, 28 (5), 1518−1527.