Comparative Evaluation of Statistical and Mechanistic Models of

Jan 29, 2016 - Copyright © 2016 American Chemical Society. *Phone: 517-432-0851; e-mail: [email protected] (M.S.P.). Cite this:Environ. Sci. Technol. 50,...
Article pubs.acs.org/est

Comparative Evaluation of Statistical and Mechanistic Models of Escherichia coli at Beaches in Southern Lake Michigan

Ammar Safaie,† Aaron Wendzel,† Zhongfu Ge,‡ Meredith B. Nevers,‡ Richard L. Whitman,‡ Steven R. Corsi,§ and Mantha S. Phanikumar*,†

† Department of Civil and Environmental Engineering, Michigan State University, 1449 Engineering Research Court, East Lansing, Michigan 48824, United States
‡ U.S. Geological Survey, Great Lakes Science Center, Lake Michigan Ecological Research Station, 1574 N. County Road, 300 E. Chesterton, Indiana 46304, United States
§ U.S. Geological Survey, Wisconsin Water Science Center, 8505 Research Way, Middleton, Wisconsin 53562, United States

Supporting Information

ABSTRACT: Statistical and mechanistic models are popular tools for predicting the levels of indicator bacteria at recreational beaches. Researchers tend to use one class of model or the other, and it is difficult to generalize statements about their relative performance due to differences in how the models are developed, tested, and used. We describe a cooperative modeling approach for freshwater beaches impacted by point sources in which insights derived from mechanistic modeling were used to further improve the statistical models and vice versa. The statistical models provided a basis for assessing the mechanistic models which were further improved using probability distributions to generate high-resolution time series data at the source, long-term “tracer” transport modeling based on observed electrical conductivity, better assimilation of meteorological data, and the use of unstructured-grids to better resolve nearshore features. This approach resulted in improved models of comparable performance for both classes including a parsimonious statistical model suitable for real-time predictions based on an easily measurable environmental variable (turbidity). The modeling approach outlined here can be used at other sites impacted by point sources and has the potential to improve water quality predictions resulting in more accurate estimates of beach closures.

1. INTRODUCTION

Modeling in nearshore coastal waters can provide insights for source tracking,1,2 dispersion and diffusion,3−7 and persistence of environmental contaminants.8 By quantifying the processes responsible for water quality changes in time and space, risks to wildlife or human health can be estimated. Models for nearshore waters may include mechanistic models or empirical statistical models. Mechanistic models, which are based on conservation principles, have been widely used to track contamination, and in recent years more advanced mechanistic models have included biological processes, such as bacterial inactivation, to account for a broader range of variation in water quality. Statistical models, however, are data-based and rely on the relationships between measurements of hydrometeorology and known concentrations of the target contaminants.9 Both types of models are increasingly being used to predict fecal indicator organisms (FIO, e.g., bacteria and viruses) responsible for degrading beach water quality, resulting in swimming advisories and closed beaches throughout coastal areas. Rarely, however, have the two modeling approaches been subjected to a careful comparison in a given location based on the same data sets.

In the context of bathing water quality and beach closures, significant progress has been made in modeling the levels of FIOs such as E. coli using mechanistic models at both marine10−17 and freshwater beaches.4,5,18−20 Mechanistic models have been used in these environments to estimate the extent of impact of a river or creek point source of FIO contamination on nearby coastal beaches. They have also been used to calculate the effects of coastline breakwaters on the containment and concentration of bacterial contaminants.20 While these models provide detailed information on nearshore dynamics, typically resulting in a three-dimensional visualization of the area, they also require a significant investment in model development, testing, and application. Further, the computational nature of the models makes them useful for understanding the system but less practical for direct applications for daily water quality estimations. Statistical models have also been widely used in marine and freshwater systems, increasingly in applications used directly by beach managers to estimate water quality at their swimming beaches.21,22 In these applications, statistical models are used in place of routine monitoring for culturable FIO. Use of these models has been encouraged by the US EPA because they provide results in a fraction of the time associated with culturing analyses.23,24 While statistical models can be as simple as a relationship between rainfall and FIO,25 they can also incorporate high-intensity automated data collection at multiple beaches.26 Most active statistical models fall somewhere between these two examples, but their development requires the collection of multiple years of data, statistical interpretation of the information, and validation and improvement. Statistical models are a practical solution for beach management, but their specificity and sensitivity are hard to improve due to the highly variable nature of FIO in a variety of beach environments and inadequate predictors.27,28

Few studies have used mechanistic and statistical models together to inform the usefulness of models and to improve the accuracy of the two modeling approaches. Because both types of models use similar performance metrics such as R² or the root-mean-square error, comparisons of the two models could help determine the appropriate model application in a given situation. Further, a comparison of models at an individual location would provide insight into the inner workings of each model. Froelich et al.29 used mechanistic and statistical models for bacteria of the genus Vibrio in the Neuse River Estuary in North Carolina. The two models were compared using data collected at different stations in the estuary, and the mechanistic model generally outperformed their statistical model. Feng et al.17 used a two-dimensional, depth-averaged numerical mass balance model based on advection, dispersion, and reactions as well as a statistical regression model to predict enterococci levels at Hobie Beach, a marine beach on the Atlantic coast of South Florida. Both models correctly predicted approximately 70% of advisories based on data collected at knee-depth, while 90% of advisories were correctly predicted at waist-depth. The authors recommend the mass balance model for more informed management decisions due to its ability to describe the spatiotemporal evolution of enterococci levels.

In this study, we compare mechanistic and statistical models using data collected over a three-month period at three southern Lake Michigan beaches to compare the accuracy and predictive capabilities of the two approaches. Information derived from mechanistic models was used to improve the statistical models and vice versa. This type of cooperative modeling can be used to improve predictions of water quality, resulting in more accurate estimates of beach conditions in the Great Lakes and along marine coasts. Further, the findings can be used for developing remediation activities for sources or conditions that lead to high concentrations of FIO at these and similar beaches.30

Received: November 1, 2015 Revised: January 26, 2016 Accepted: January 29, 2016

DOI: 10.1021/acs.est.5b05378 Environ. Sci. Technol. XXXX, XXX, XXX−XXX

2. MATERIALS AND METHODS

2.1. Study Site. The study area is located near the Portage-Burns waterway (Burns Ditch) in southern Lake Michigan (USGS station number 04095090). The three Ogden Dunes beaches OD1, OD2, and OD3, the focus of the current research, are impacted by the Burns Ditch outfall (BD in Figure 1). To measure nearshore currents for testing the mechanistic hydrodynamic model, bottom-mounted, upward-looking Acoustic Doppler Current Profilers (ADCPs) were deployed in the study region31 from early June through late August 2008. Water samples were collected in knee-deep waters at the beaches. The three beaches OD1, OD2, and OD3 are about 1500, 800, and 500 m west of the Burns Ditch outfall, respectively. Burns Ditch is the outfall of the Little Calumet River, which drains a mixed land-use watershed that includes heavy industry, agricultural, and residential areas.9 Periodically, combined sewer overflows (CSOs) are discharged into Burns Ditch from several municipal wastewater treatment plants. During heavy rain events that result in a CSO, concentrations of E. coli as high as 10 000 CFU/100 mL have been recorded.32 E. coli concentrations fluctuate widely, however, with concentrations often below 100 CFU/100 mL during dry years. The geometric means of E. coli at the BD, OD1, OD2, and OD3 locations during the study period were 222, 12, 19, and 21 CFU/100 mL, respectively. Semidiurnal water samples were usually collected in knee-deep water around 7:00 AM and 2:00 PM from early June to late August 2008, and samples were analyzed within 4 h of collection for E. coli using the IDEXX Colilert-18 reagent and IDEXX Quanti-Tray 2000 method (Standard Methods SM9223B, American Public Health Association). Additional water quality variables monitored in this study include turbidity, specific conductance or electrical conductivity (Econ), water temperature, and outflow of the Burns Ditch. E. coli concentrations, turbidity, and the additional water quality variables were measured twice daily on all days except weekends. In addition, available meteorological observations were obtained from the National Climatic Data Center (NCDC) and National Data Buoy Center (NDBC) weather stations surrounding the lake. Additional details of the site sampling are available in Thupaki et al.5

Figure 1. Map of the study area showing the three Ogden Dunes beaches (OD1, OD2, and OD3) and the source at Burns Ditch (BD). Locations of the ADCPs deployed during summer 2008 are also marked.

2.2. Mechanistic Model. A coupled three-dimensional hydrodynamic and water quality model was used to simulate the temporal and spatial distribution of E. coli in Lake Michigan. The model was based on the unstructured grid Finite Volume Community Ocean Model (FVCOM).33 Details can be found in Chen et al.33,34 The E. coli fate and transport equation appears as shown below.

$$\frac{\partial C}{\partial t} + u\frac{\partial C}{\partial x} + v\frac{\partial C}{\partial y} + w\frac{\partial C}{\partial z} = \frac{\partial}{\partial x}\left(K_H\frac{\partial C}{\partial x}\right) + \frac{\partial}{\partial y}\left(K_H\frac{\partial C}{\partial y}\right) + \frac{\partial}{\partial z}\left(K_V\frac{\partial C}{\partial z}\right) + S \tag{1}$$

$$S = -\left(\frac{\partial (f_p v_s C)}{\partial z} + k_I I_0 e^{-k_e z} C + k_d C\right)\theta^{T-20} \tag{2}$$

where $C$ denotes the concentration of E. coli (CFU/100 mL), and $(u, v, w)$ are the $x$, $y$, and $z$ components of velocity (m/d). $K_H$ and $K_V$ are the horizontal and vertical mixing coefficients (m²/d), respectively. $S$ denotes a sink term for E. coli. $f_p$ is the fraction of E. coli attached to particles, $v_s$ is the settling velocity (m/d), $k_I$ is the inactivation rate of E. coli due to sunlight (m²/(W·d)), $I_0$ denotes short-wave radiation at the water surface (W/m²), $k_e$ is the sunlight extinction coefficient (m⁻¹), $k_d$ is the base mortality rate (d⁻¹), and the effect of temperature ($T$) on the loss rate is modeled by the term $\theta^{T-20}$. The inactivation formulation (eq 2) is essentially similar to the one described in Liu et al.,18 who used a vertically integrated two-dimensional model. This formulation was modified to account for 3D geometry in the Princeton Ocean Model,4 and the same formulation was later adapted to the FVCOM unstructured-grid framework.35,36 This inactivation formulation and transport model are used in this work with several major changes to further improve model performance, including (a) the use of observed Econ to serve as a long-term tracer to improve conservative transport simulations, (b) the use of statistical distributions to generate high-resolution time series of E. coli at the source, (c) the use of LIDAR data for nearshore bathymetry, and (d) an accurate interpolation of meteorological data using a natural neighbor method. All of these improvements were triggered by an initial comparison with results from our statistical modeling, and details are described below.

Econ was used as a conservative tracer37 to evaluate hydrodynamic and mixing parameters by comparing the simulated and observed values at the beaches. The discharge of the Burns Ditch, as a point source of E. coli and Econ, was added at a node located at the USGS Burns Ditch station (BD in Figure 1). The background concentrations of Econ and E. coli were set to 286 μS/cm (lake background value) and zero, respectively. The unstructured mesh used in the mechanistic model had 12 684 nodes and 23 602 triangular elements in the horizontal direction and 20 vertical layers. The horizontal spatial resolution of the unstructured triangular mesh ranged from 40 m near the BD outfall to 2−5 km in the center of the lake, which provided a good representation of complex nearshore geometry and features, especially near the Ogden Dunes beaches (Figure S1 in the Supporting Information, SI). Six arc-second bathymetry data were obtained from the NOAA National Geophysical Data Center (NGDC) and interpolated to the unstructured mesh using the natural neighbor method. Along the Indiana coast, where finer resolution was needed, two-meter resolution bathymetric data from NOAA, based on a 2008 LIDAR data set, were used. Hourly meteorological observations, including wind speed and direction, air temperature, cloud cover, dew point, and relative humidity, were interpolated over the computational grid using a smoothed natural neighbor algorithm to calculate wind and heat flux fields over the lake surface.38 For the heat flux calculations, long-wave radiation was calculated using a model based on air temperature and cloud cover,39 and short-wave solar radiation was calculated using the clear-sky value and the measured cloud cover.40

2.2.1. Electrical Conductivity Modeling. To identify mixing parameters that best describe transport in the nearshore region, conservative solute transport modeling is an important first step. Dye tracer and drifter studies4,40 provide useful data; however, they are expensive and time-consuming, and the data collected tend to be limited (typically a few days). Long time series data (e.g., over the entire summer season) are most helpful; therefore, we have explored the possibility of using natural tracers. A requirement for successful modeling is that there be a clear gradient/contrast between the background lake water and tracer values at the source (river mouth at BD). This requirement was satisfied by Econ, an easily measured water quality variable used here as a tracer with the following caveats. Major ions including Ca, Cl, F, K, Na, NO3, Mg, PO4, and SO4 affect Econ values in lake water, leading to many sources and sinks in a natural environment; therefore, Econ is not conservative in general. However, due to the proximity of the three Ogden Dunes beaches to the Burns Waterway outfall (∼1 km), no sources and sinks were considered significant. Econ measurements are also known to be dependent on water temperature;41 however, the simulated temperature in the nearshore region was relatively constant between the three sample locations, indicating that an Econ-temperature relation was not needed. Despite all the factors that are known to influence Econ values in lake water, Schimmelpfennig et al.37 concluded that Econ can be used as a suitable tracer in Lake Tegel in Germany.

2.2.2. Hourly E. coli at the BD Outfall. In the mechanistic model, observed river discharge and E. coli concentrations at Burns Ditch were used as inputs to the modeling domain. Since E. coli data were measured twice daily, hourly river discharge information was used to approximate the hourly distribution of E. coli at unsampled times using statistical techniques. To do this, logistic distributions were fitted to empirical cumulative distributions of river discharge and observed E. coli to determine parameters of the distribution of each variable. The cumulative distribution function (CDF) of the logistic distribution is given by the following:

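$$F(z) = \frac{1}{1 + e^{-z}} = \frac{1}{2} + \frac{1}{2}\tanh\left(\frac{x - \mu}{2s}\right) = \frac{1}{2} + \frac{1}{2}\tanh\left(\frac{\pi (x - \mu)}{2\sqrt{3}\,\sigma}\right), \quad z = \frac{x - \mu}{s}, \quad -\infty < x < \infty, \quad -\infty < \mu < \infty, \quad s > 0 \tag{3}$$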
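As an illustration of eq 3, the following minimal sketch checks numerically that the three forms of the logistic CDF agree, and shows one plausible reading of how equal CDF values could link an hourly discharge to an E. coli concentration. The function names and the parameter tuples are hypothetical, and the sketch assumes both variables are fitted on the same (untransformed) scale; the fitting procedure actually used is described in the SI and is not reproduced here.

```python
import math

def logistic_cdf(x, mu, s):
    """Logistic CDF in the 1 / (1 + exp(-z)) form of eq 3."""
    z = (x - mu) / s
    return 1.0 / (1.0 + math.exp(-z))

def logistic_cdf_tanh(x, mu, s):
    """Same CDF written with tanh and the scale parameter s."""
    return 0.5 + 0.5 * math.tanh((x - mu) / (2.0 * s))

def logistic_cdf_sigma(x, mu, sigma):
    """Same CDF written with the standard deviation, sigma = s * pi / sqrt(3)."""
    return 0.5 + 0.5 * math.tanh(math.pi * (x - mu) / (2.0 * math.sqrt(3.0) * sigma))

def logistic_quantile(p, mu, s):
    """Inverse of the logistic CDF for 0 < p < 1."""
    return mu + s * math.log(p / (1.0 - p))

def map_discharge_to_ecoli(q, q_params, c_params):
    """Assumed mapping: give E. coli the same cumulative probability as discharge q.

    q_params and c_params are hypothetical (mu, s) pairs from separate fits.
    """
    p = logistic_cdf(q, *q_params)
    return logistic_quantile(p, *c_params)
```

Under these assumptions the mapping reduces to a linear rescaling, C = μ_C + s_C (Q − μ_Q)/s_Q, because both fitted distributions belong to the same family.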

In eq 3, μ is the mean and s is a scale parameter (related to the standard deviation σ by σ = sπ/√3). On the basis of E. coli data from two-day hourly intensive sampling, we found that the hourly river discharge and hourly E. coli distributions at BD had the same empirical CDFs. In other words, the magnitudes of flow and E. coli concentrations in the river had the same frequency of occurrence within the hourly interval. Details are in the SI. As described in the SI, this technique produced better results compared to a model with linearly interpolated hourly E. coli data.

2.3. Statistical Model. In a separate project, we developed multiple linear regression (MLR) models with and without interaction effects based on data collected in the morning. The data collected in the afternoon were excluded because they are correlated with the morning data of the same day and violate the requirement of independent sample points in MLR models. Several explanatory variables were considered for inclusion in the model (SI). Parsimonious models were identified by backward elimination. The following parsimonious model with only four explanatory variables was finally obtained:

$$E[\log_{10} C] = 1.094 - 0.092\ln(Q) + 0.179\ln(H_S) - 1.595\,I + 0.393\ln(\tau) \tag{4}$$

with an adjusted R² = 0.468, where E[] denotes expectation, C is the E. coli concentration, τ is the turbidity at the beaches, Q is the river discharge (15 min average), H_S is the significant wave height, and I is the solar irradiance. Equation 4 was initially believed to be the best model we could obtain following a standard procedure for the development of empirical models. To further improve the model, insights obtained from our mechanistic modeling were used. Mechanistic modeling indicated that E. coli concentrations at the individual beaches are strongly dependent on the dynamics of the plume originating from BD, although a weak correlation often existed with water quality variables such as turbidity or conductivity at the same location. This observation prompted us to look for a relation for source-normalized E. coli concentration at the beaches as a function of normalized turbidity. This resulted in the following regression equation for all three beaches:

$$E\left[\frac{C}{C_0}\right] = 0.2\left(\frac{\tau}{\tau_0}\right)^{0.98}, \quad R^2 = 0.343 \tag{5}$$

or approximately:

$$E\left[\frac{C}{C_0}\right] = 0.2\left(\frac{\tau}{\tau_0}\right) \tag{6}$$

where C and τ are the E. coli concentration and turbidity, respectively, at the beaches, and C0 and τ0 are their corresponding values at BD. Examining the source characteristics at BD alone, we obtained the following relation:

$$E[\log_{10} C_0] = 1.702 + 1.488\,(\ln \tau_0), \quad R^2 = 0.521 \tag{7}$$

Or on the original scale, we have approximately:

$$E[C_0] = 5.485\,\tau_0^{3/2} \tag{8}$$

The 3/2 power in eq 8 indicates that E. coli concentrations at the source (BD) are sensitive to changes in turbidity. Similar relations between C and τ for the beaches indicated that both the exponent of τ and the R² values decrease for the individual beaches. Combining the source characteristic relation (eq 7) with eq 5 for all beaches, we get the following:

$$E[C] = 1.09\,\tau\,\tau_0^{0.488} \approx \tau\sqrt{\tau_0}, \quad R^2 = 0.82 \tag{9}$$

After the significance of the term τ(τ0)^{1/2} had been recognized, a better model was obtained when turbidity was transformed to the natural log space:

$$E[\log_{10} C] = 0.537 + 0.172\,\ln(\tau)\ln(\tau_0), \quad R^2 = 0.749 \; (N = 129) \tag{10}$$

The above relation indicates that a simple and highly parsimonious model based on an interaction term between turbidity at the source and at the individual beaches has high predictive ability. The model is also easy to apply, as it requires only turbidity measurements from a morning sample. Model coefficients were found to be significant based on t tests and an F-test (p < 0.001). The model in eq 10 was based on morning data, and afternoon samples were used to evaluate model performance using the following metrics:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(O_i - P_i)^2}, \qquad R^2 = \frac{\left[\sum_{i=1}^{n}(O_i - \bar{O})(P_i - \bar{P})\right]^2}{\sum_{i=1}^{n}(O_i - \bar{O})^2 \sum_{i=1}^{n}(P_i - \bar{P})^2},$$

$$\mathrm{PBIAS} = \frac{\sum_{i=1}^{n}(O_i - P_i) \times 100}{\sum_{i=1}^{n} O_i}, \qquad F_n = \frac{\lVert O_i, P_i \rVert}{\lVert O_i, 0 \rVert}, \quad \text{where } \lVert O_i, P_i \rVert = \left(\frac{1}{n}\sum_{i=1}^{n}\lvert O_i - P_i \rvert^2\right)^{1/2} \tag{11}$$

Here O_i and P_i denote observed and predicted values of a variable, respectively. For E. coli, all metrics are based on the log10-transformed values. The R² and RMSE are well-known metrics, while PBIAS is a measure of the tendency of the simulated data to be higher or lower relative to the observations.42 The Fourier norm provides an indication of the variance in the observed data that is not captured by the model.5 A Fourier norm of zero indicates perfect agreement between data and model results. Two other metrics described in the SI, the Nash−Sutcliffe efficiency (NSE) and RSR (a standardized version of the RMSE), are also used to compare the performance of the two models.

Figure 2. Comparison of observed (symbols) and simulated (color lines) electrical conductivity at the three beaches.

Figure 3. Comparison of observed (symbols) and simulated (color lines) E. coli concentrations at the three beaches.

Figure 4. Comparison of statistical (diamonds) and mechanistic (box plots) models with observations for the three beaches. Data and models for the morning samples are shown using blue symbols, while red symbols denote afternoon samples. In the box plots, the median is shown using a symbol (⊙) and outliers are denoted using the plus (+) symbol.
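The turbidity-based model (eq 10) and the evaluation metrics (eq 11) can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function names are hypothetical, inputs are plain Python lists of log10-transformed values, R² is taken to be the squared Pearson correlation, and PBIAS follows the observed-minus-predicted sign convention, so a model that overpredicts yields a negative PBIAS.

```python
import math

def predict_log10_ecoli(tau_beach, tau_source):
    """Eq 10: expected log10 E. coli from beach and source turbidity."""
    return 0.537 + 0.172 * math.log(tau_beach) * math.log(tau_source)

def rmse(obs, pred):
    """Root-mean-square error between observations and predictions."""
    n = len(obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)

def r_squared(obs, pred):
    """Squared Pearson correlation (assumed form of R2 in eq 11)."""
    o_bar = sum(obs) / len(obs)
    p_bar = sum(pred) / len(pred)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_p = sum((p - p_bar) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)

def pbias(obs, pred):
    """Percent bias: sum of (observed - predicted), as a percentage of sum observed."""
    return 100.0 * sum(o - p for o, p in zip(obs, pred)) / sum(obs)

def fourier_norm(obs, pred):
    """Fourier norm Fn: RMS error divided by the RMS of the observations."""
    n = len(obs)
    num = math.sqrt(sum(abs(o - p) ** 2 for o, p in zip(obs, pred)) / n)
    den = math.sqrt(sum(abs(o) ** 2 for o in obs) / n)
    return num / den
```

In use, `predict_log10_ecoli` would be applied to paired turbidity readings at a beach and at BD, and the four metric functions to the resulting predictions and the log10 of the measured concentrations.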

3. RESULTS

The model simulated the hydrodynamics accurately except for a short period during an intense storm around Julian Day (JD) 220 (SI). Figure 2 shows the comparison of simulated and observed Econ at the Ogden Dunes beaches, and statistics for the comparison are available in the SI. The results have R² values ranging from 0.54 to 0.62 and RMSE ranging from 41.5 to 55.8 μS/cm. At OD3, the beach nearest the BD source, Econ varied with depth, with a higher concentration near the surface, while the distribution was nearly vertically well-mixed at OD1. Overall, the model described the observed Econ reasonably well. The long time series data allowed us to identify the best mixing parameters (K_H, K_V in eq 1) that described conservative solute transport accurately over the three-month period. This was useful because mixing and inactivation parameters both influence E. coli peaks in the model, leading to considerable uncertainty in model outcomes. For E. coli, comparisons of observations with simulated results based on the mechanistic model are shown in Figure 3. The following parameters were used: $v_s = 1$ m/d, $k_e = 0.55$ m⁻¹, $k_I = 0.003$ m²/(W·d), $f_p = 0.05$, $k_d = 0.777$ d⁻¹, and $\theta = 1.07$.

The results indicate that contamination originating from the discharge of the BD is the key contributor to the E. coli levels at all three beaches. As we get further away from the BD station, E. coli concentration has a lower variation with depth, similar to Econ. Figure 4 shows the observed and predicted E. coli based on the mechanistic and statistical models for morning (blue symbols) and afternoon (red symbols) samples. Since predicted E. coli values from the mechanistic model vary with depth, a box plot with symbols to denote the median (⊙) and outliers (+) was used to show the distribution. For beach closures, the Indiana standard for single sample maximum is 235 CFU/100 mL, and this value is marked using dashed lines in Figure 4 to easily spot the false positives and negatives. Summary statistics of E. coli concentrations from the mechanistic model were calculated for each beach for all sampling times and compared with the results of the statistical model in Table 1 for the morning and afternoon samples separately. Considering all stations and the morning and afternoon samples separately, we found that the mechanistic model outperformed the statistical model for the afternoon samples while the opposite is true for the morning samples (Table 1). Since the statistical model was developed using only the morning data, model performance deteriorated slightly for the afternoon data. The mechanistic model is a three-dimensional transport model, and large data sets corresponding to the spatiotemporal evolution of E. coli plumes are generated. Detailed statistics corresponding to the predicted variability within the water column are summarized in Table S4 based on the metrics in eq 11 and compared with the results from the statistical model.

Overall, although both models produced comparable results, the simple and parsimonious statistical model was found to generally outperform the particular version of the mechanistic model considered in the present study. The mechanistic model itself benefitted from an initial comparison with results from the statistical model, and significant improvements resulted from long-term “tracer” transport modeling using Econ, the use of an unstructured grid model to better resolve nearshore features, the use of statistical methods to generate high-frequency E. coli data at BD, and improved assimilation of meteorological data based on natural neighbor interpolation (earlier versions of the model used the nearest neighbor method). Both models are suitable for real-time and near real-time predictions as discussed below.

Table 1. Summary Statistics for the Mechanistic and Statistical Models for E. coli Based on the Morning and Afternoon Samples

station        sample time   model         NSE      PBIAS     RSR     R2      RMSE    Fn
OD3            morning       statistical   0.628    −4.427    0.610   0.797   0.407   0.291
OD3            morning       mechanistic   0.459    −5.980    0.735   0.744   0.491   0.351
OD3            afternoon     statistical   0.652    −0.965    0.590   0.816   0.337   0.222
OD3            afternoon     mechanistic   0.228    3.411     0.878   0.691   0.502   0.330
OD2            morning       statistical   0.469    1.560     0.729   0.686   0.457   0.324
OD2            morning       mechanistic   0.199    15.021    0.895   0.646   0.561   0.398
OD2            afternoon     statistical   0.517    −9.796    0.695   0.743   0.467   0.323
OD2            afternoon     mechanistic   0.534    6.993     0.683   0.817   0.459   0.318
OD1            morning       statistical   0.544    −9.148    0.675   0.768   0.428   0.334
OD1            morning       mechanistic   −0.319   15.758    1.148   0.426   0.728   0.567
OD1            afternoon     statistical   0.015    −27.829   0.992   0.565   0.560   0.468
OD1            afternoon     mechanistic   −0.100   12.692    1.049   0.563   0.592   0.495
all stations   morning       statistical   0.554    −3.792    0.667   0.749   0.431   0.316
all stations   morning       mechanistic   0.133    8.094     0.931   0.603   0.601   0.440
all stations   afternoon     statistical   0.444    −11.553   0.745   0.710   0.464   0.333
all stations   afternoon     mechanistic   0.299    7.250     0.837   0.722   0.521   0.373
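To give a feel for the magnitudes implied by the reported inactivation parameters, the sketch below evaluates only the first-order part of the sink term in eq 2 (sunlight inactivation plus base mortality, scaled by temperature). It deliberately omits the settling term, which needs the vertical concentration gradient, and it assumes that z is depth measured downward from the surface. The function names are hypothetical, and the irradiance and temperature passed in are illustrative inputs, not values from the study.

```python
import math

# Parameter values reported in the Results section
K_I = 0.003    # sunlight inactivation rate, m^2/(W d)
K_E = 0.55     # sunlight extinction coefficient, 1/m
K_D = 0.777    # base mortality rate, 1/d
THETA = 1.07   # temperature correction factor

def loss_rate(depth_m, irradiance_w_m2, temp_c):
    """First-order loss rate (1/d) from the light and mortality terms of eq 2.

    The settling term of eq 2 is not included in this sketch.
    """
    light = K_I * irradiance_w_m2 * math.exp(-K_E * depth_m)
    return (light + K_D) * THETA ** (temp_c - 20.0)

def half_life_days(rate_per_day):
    """Time for the concentration to halve under first-order decay."""
    return math.log(2.0) / rate_per_day
```

With no sunlight at 20 °C the rate reduces to the base mortality of 0.777 d⁻¹; any positive surface irradiance raises the rate, and the light contribution decays exponentially with depth.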

4. DISCUSSION Our initial development of statistical22,26,28 and mechanistic4,5,18 models for the Ogden Dunes beaches proceeded as separate activities. The RMSE values for log-transformed observed and simulated values of E. coli based on the Princeton Ocean Model (a structured grid model) reported in Thupaki et al.5 were around 1.36 for the same beach sites considered in the present work (although Thupaki et al.5 used a shorter period to test their model). By explicitly modeling waves, sediment transport, and bacteria-sediment interactions, Thupaki et al.5 reported a significant improvement in their model performance as indicated by the reduced values of RMSE (between 0.49 and 0.54 with sediment processes included). Summary statistics provided in Tables 1 and S4 indicate that the current version of the mechanistic model without including sediment-bacteria interactions was able to perform just as well as the model with sediment processes in Thupaki et al.5 From the perspective of health risks, predicting the occurrence of E. coli levels exceeding 235 CFU/100 mL at beaches is crucial. The statistical model correctly captured six out of the eight exceedances while the mechanistic model captured four (Figures 3 and 4). In the mechanistic model, accurately capturing meteorological forcing during intense storm events has been a challenge, and the model suffered both in terms of hydrodynamics (SI) and transport. Improving the meteorological forcing is expected to further improve the mechanistic model. For JD 161.5 (Figure 3) high E. coli (2,419 CFU/100 mL) levels were noted at the beaches OD1 and OD2 (Figure 4), and these were underestimated by both models producing false-negatives. On JD 220, the large storm caused a high fluctuation of E. coli which was well-captured by the mechanistic model. However, the model yielded a false-positive during the afternoon at all beaches. Overall both models predicted the observed trends well as summarized in Tables 1 and S4. 
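The exceedance comparison described above amounts to a simple tally against the 235 CFU/100 mL standard. The sketch below is one way such a tally could be computed; the function name and dictionary keys are hypothetical, and any inputs would be paired observed and predicted concentrations on the original (not log-transformed) scale.

```python
THRESHOLD_CFU = 235.0  # Indiana single-sample maximum, CFU/100 mL

def classify_exceedances(observed, predicted, threshold=THRESHOLD_CFU):
    """Count correct and incorrect advisory decisions for paired samples."""
    counts = {"true_pos": 0, "false_pos": 0, "false_neg": 0, "true_neg": 0}
    for obs, pred in zip(observed, predicted):
        obs_exceeds = obs > threshold
        pred_exceeds = pred > threshold
        if obs_exceeds and pred_exceeds:
            counts["true_pos"] += 1      # exceedance correctly captured
        elif pred_exceeds:
            counts["false_pos"] += 1     # advisory predicted but not needed
        elif obs_exceeds:
            counts["false_neg"] += 1     # exceedance missed by the model
        else:
            counts["true_neg"] += 1      # both below the standard
    return counts
```

In these terms, a model that captures six of eight observed exceedances has six true positives and two false negatives.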
Finally a few comments on the application of the models developed in this paper for beach management are in order. The simple statistical model based on eq 10 requires only two F

DOI: 10.1021/acs.est.5b05378 Environ. Sci. Technol. XXXX, XXX, XXX−XXX

Environmental Science & Technology



ACKNOWLEDGMENTS This research was funded in part through the USGS Oceans Research Priorities Plan and the Great Lakes Restoration Initiative. We thank Muruleedhara Byappanahalli, Dawn Shively, Kasia Przybyla-Kelly, Ashley Spoljaric, Pramod Thupaki, Mark Blouin, and Glen Black for their contributions to this research. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government. This article is Contribution 2014 of the USGS Great Lakes Science Center.

turbidity measurementsone value at the beach of interest (τ) and another at the nearby source (τ0) impacting the beach. Although not examined in this paper, it will be interesting to understand how turbidity from point sources located on either side of a beach impacts water quality at the beach (e.g., the Mt. Baldy beach impacted by Trail Creek and Kintzele Ditch in southern Lake Michigan, reported in Liu et al.18 and Nevers et al.43). The simplicity and ease of application are the major strengths of statistical models in general and eq 10 in particular. Combining statistical models with commercially available sensors, data-logging, telemetry and web scripts opens up exciting possibilities for automating beach management and such systems were recently implemented for nine Chicago beaches.44 However, making mechanistic models operational involves considerable effort, and only a few institutions/ agencies have the infrastructure to make operational forecasts (e.g., NOAA). The statistical models seem to have an advantage here, and using insights gained from more complex mechanistic models to further improve empirical models appears to be a promising avenue for the near future. Mechanistic models have the advantage that they can help make more informed decisions due to their ability to provide detailed information.17 We also note that model testing as reported in the paper is retrospective. Evaluating model performance using additional data (not used for model development) represents another level of model testing, not reported in this paper. While the simpler statistical model can be used to make real-time forecasts using turbidity values from morning samples, the mechanistic model requires both discharge and E. coli values at the source. 
A variety of approaches can be used to make short-term forecasts of source characteristics, including process-based river models,45−47 watershed models,48,49 statistical methods based on the use of probability distributions, as well as approaches based on wavelet and neural network methods.50,51 The watershed models48 can also be used to address questions related to nonpoint source pollution.

In summary, we refined statistical and mechanistic models of indicator bacteria (E. coli) at beaches in southern Lake Michigan using a cooperative modeling approach. Using process-based reasoning derived from observations of simulated plumes from mechanistic models, we were able to identify parsimonious empirical models with considerable predictive power and the ability to generate real-time forecasts. From a mechanistic modeling point of view, the greatest advantage to having the improved statistical models is that they provide a basis for assessment. Without such a basis, it is difficult to know if the model can be improved further and by how much. This cooperative modeling approach is expected to lead to gains in both types of models at other sites impacted by point sources.
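The distinction drawn above between retrospective testing and evaluation on data not used for model development can be illustrated with a minimal sketch. It assumes, purely for illustration, a log-linear regression of E. coli on turbidity at the beach (τ) and at the source (τ0); this form is not a reproduction of eq 10, and the data, coefficients, and function names (`design_matrix`, `fit_loglinear`, `rmse`) are hypothetical.

```python
import numpy as np


def design_matrix(tau_beach, tau_source):
    """Columns: intercept, log10 turbidity at the beach, log10 turbidity at the source."""
    return np.column_stack([
        np.ones(len(tau_beach)),
        np.log10(tau_beach),
        np.log10(tau_source),
    ])


def fit_loglinear(tau_beach, tau_source, ecoli):
    """Ordinary least-squares fit of log10(E. coli) on the two log10 turbidity terms.

    The regression form is an illustrative assumption, not eq 10 of the paper.
    """
    X = design_matrix(tau_beach, tau_source)
    coef, *_ = np.linalg.lstsq(X, np.log10(ecoli), rcond=None)
    return coef


def rmse(predicted, observed):
    """Root-mean-square error, here computed in log10 units."""
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


# Synthetic data only (assumed ranges and coefficients, not observations).
rng = np.random.default_rng(0)
n = 200
tau_beach = rng.uniform(1.0, 50.0, n)     # turbidity at the beach
tau_source = rng.uniform(5.0, 200.0, n)   # turbidity at the source
log_ecoli = (0.5 + 0.8 * np.log10(tau_beach) + 0.4 * np.log10(tau_source)
             + rng.normal(0.0, 0.3, n))
ecoli = 10.0 ** log_ecoli

# Develop the model on the first 150 samples; hold out the remaining 50.
train, test = slice(0, 150), slice(150, None)
coef = fit_loglinear(tau_beach[train], tau_source[train], ecoli[train])

# Retrospective error (same data used for fitting) versus held-out error.
rmse_train = rmse(design_matrix(tau_beach[train], tau_source[train]) @ coef,
                  log_ecoli[train])
rmse_test = rmse(design_matrix(tau_beach[test], tau_source[test]) @ coef,
                 log_ecoli[test])
print(f"log10 RMSE, fitting data: {rmse_train:.2f}; held-out data: {rmse_test:.2f}")
```

With field data, the held-out portion would more naturally be a separate season or year than a random or sequential split, so that the test reflects how the model would be used for forecasting.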





REFERENCES

(1) Sokolova, E.; Åström, J.; Pettersson, T. J. R.; Bergstedt, O.; Hermansson, M. Estimation of pathogen concentrations in a drinking water source using hydrodynamic modelling and microbial source tracking. J. Water Health 2012, 10 (3), 358−370.
(2) Sokolova, E.; Pettersson, T. J. R.; Bergstedt, O.; Hermansson, M. Hydrodynamic modelling of the microbial water quality in a drinking water source as input for risk reduction management. J. Hydrol. 2013, 497, 15−23.
(3) Boehm, A. B.; Sanders, B. F.; Winant, C. D. Cross-shelf transport at Huntington Beach. Implications for the fate of sewage discharged through an offshore ocean outfall. Environ. Sci. Technol. 2002, 36 (9), 1899−1906.
(4) Thupaki, P.; Phanikumar, M. S.; Beletsky, D.; Schwab, D. J.; Nevers, M. B.; Whitman, R. L. Budget analysis of Escherichia coli at a southern Lake Michigan beach. Environ. Sci. Technol. 2010, 44 (3), 1010−1016.
(5) Thupaki, P.; Phanikumar, M. S.; Schwab, D. J.; Nevers, M. B.; Whitman, R. L. Evaluating the role of sediment-bacteria interactions on Escherichia coli concentrations at beaches in southern Lake Michigan. J. Geophys. Res. Oceans 2013, 118 (12), 7049−7065.
(6) Grant, S. B.; Litton-Mueller, R. M.; Ahn, J. H. Measuring and modeling the flux of fecal bacteria across the sediment-water interface in a turbulent stream. Water Resour. Res. 2011, 47 (5), 1−13 W05517.
(7) Rippy, M. A.; Franks, P. J. S.; Feddersen, F.; Guza, R. T.; Moore, D. F. Physical dynamics controlling variability in nearshore fecal pollution: Fecal indicator bacteria as passive particles. Mar. Pollut. Bull. 2013, 66 (1−2), 151−157.
(8) Fries, J. S.; Characklis, G. W.; Noble, R. T. Attachment of fecal indicator bacteria to particles in the Neuse River Estuary, NC. J. Environ. Eng. 2006, 132 (10), 1338−1345.
(9) Nevers, M. B.; Whitman, R. L. Nowcast modeling of Escherichia coli concentrations at multiple urban beaches of southern Lake Michigan. Water Res. 2005, 39 (20), 5250−5260.
(10) McCorquodale, J. A.; Georgiou, I.; Carnelos, S.; Englande, A. J. Modeling coliforms in storm water plumes. J. Environ. Eng. Sci. 2004, 3 (5), 419−431.
(11) Kim, J. H.; Grant, S. B.; McGee, C. D.; Sanders, B. F.; Largier, J. L. Locating Sources of Surf Zone Pollution: A Mass Budget Analysis of Fecal Indicator Bacteria at Huntington Beach, California. Environ. Sci. Technol. 2004, 38 (9), 2626−2636.
(12) Boehm, A. B.; Keymer, D. P.; Shellenbarger, G. G. An analytical model of enterococci inactivation, grazing, and transport in the surf zone of a marine beach. Water Res. 2005, 39 (15), 3565−3578.
(13) Grant, S. B.; Kim, J. H.; Jones, B. H.; Jenkins, S. A.; Wasyl, J.; Cudaback, C. Surf zone entrainment, along-shore transport, and human health implications of pollution from tidal outlets. J. Geophys. Res. 2005, 110 (C10), C10025.
(14) Sanders, B. F.; Arega, F.; Sutula, M. Modeling the dry-weather tidal cycling of fecal indicator bacteria in surface waters of an intertidal wetland. Water Res. 2005, 39 (14), 3394−3408.
(15) de Brauwere, A.; de Brye, B.; Servais, P.; Passerat, J.; Deleersnijder, E. Modelling Escherichia coli concentrations in the tidal Scheldt river and estuary. Water Res. 2011, 45 (9), 2724−2738.
(16) Feng, Z.; Reniers, A.; Haus, B. K.; Solo-Gabriele, H. M. Modeling sediment-related enterococci loading, transport, and

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.est.5b05378. Additional comparisons, details of methods and discussion are available as noted in the text (PDF)




AUTHOR INFORMATION

Corresponding Author

*Phone: 517-432-0851; e-mail: [email protected] (M.S.P.).

Notes

The authors declare no competing financial interest.

DOI: 10.1021/acs.est.5b05378 Environ. Sci. Technol. XXXX, XXX, XXX−XXX


inactivation at an embayed nonpoint source beach. Water Resour. Res. 2013, 49 (2), 693−712.
(17) Feng, Z.; Reniers, A.; Haus, B. K.; Solo-Gabriele, H. M.; Wang, J. D.; Fleming, L. E. A predictive model for microbial counts on beaches where intertidal sand is the primary source. Mar. Pollut. Bull. 2015, 94 (1−2), 37−47.
(18) Liu, L.; Phanikumar, M. S.; Molloy, S. L.; Whitman, R. L.; Shively, D. A.; Nevers, M. B.; Schwab, D. J.; Rose, J. B. Modeling the Transport and Inactivation of E. coli and Enterococci in the Near-Shore Region of Lake Michigan. Environ. Sci. Technol. 2006, 40 (16), 5022−5028.
(19) Ge, Z.; Whitman, R. L.; Nevers, M. B.; Phanikumar, M. S. Wave-Induced Mass Transport Affects Daily Escherichia coli Fluctuations in Nearshore Water. Environ. Sci. Technol. 2012, 46 (4), 2204−2211.
(20) Ge, Z.; Whitman, R. L.; Nevers, M. B.; Phanikumar, M. S.; Byappanahalli, M. N. Nearshore hydrodynamics as loading and forcing factors for Escherichia coli contamination at an embayed beach. Limnol. Oceanogr. 2012, 57 (1), 362−381.
(21) Francy, D. S.; Stelzer, E. A.; Duris, J. W.; Brady, A. M. G.; Harrison, J. H.; Johnson, H. E.; Ware, M. W. Predictive Models for Escherichia coli Concentrations at Inland Lake Beaches and Relationship of Model Variables to Pathogen Detection. Appl. Environ. Microbiol. 2013, 79 (5), 1676−1688.
(22) Nevers, M. B.; Whitman, R. L. Efficacy of monitoring and empirical predictive modeling at improving public health protection at Chicago beaches. Water Res. 2011, 45 (4), 1659−1668.
(23) US EPA. Action plan for beaches and recreational waters; EPA/600/R-98/079; US Environmental Protection Agency: Washington, DC, 1999.
(24) US EPA. Recreational Water Quality Criteria; EPA-820-F-12-058; US Environmental Protection Agency, Office of Water: Washington, DC, 2012.
(25) Stidson, R. T.; Gray, C. A.; McPhail, C. D. Development and use of modelling techniques for real-time bathing water quality predictions. Water Environ. J. 2012, 26 (1), 7−18.
(26) Nevers, M. B.; Byappanahalli, M. N.; Edge, T. A.; Whitman, R. L. Beach science in the Great Lakes. J. Great Lakes Res. 2014, 40 (1), 1−14.
(27) Boehm, A. B.; Whitman, R. L.; Nevers, M. B.; Hou, D.; Weisberg, S. B. Nowcasting Recreational Water Quality. In Statistical Framework for Recreational Water Quality Criteria and Monitoring; John Wiley & Sons, Ltd: New York, 2007; pp 179−210.
(28) Whitman, R. L.; Nevers, M. B. Escherichia coli Sampling Reliability at a Frequently Closed Chicago Beach: Monitoring and Management Implications. Environ. Sci. Technol. 2004, 38 (16), 4241−4246.
(29) Froelich, B.; Bowen, J.; Gonzalez, R.; Snedeker, A.; Noble, R. Mechanistic and statistical models of total Vibrio abundance in the Neuse River Estuary. Water Res. 2013, 47 (15), 5783−5793.
(30) Hampson, D.; Crowther, J.; Bateman, I.; Kay, D.; Posen, P.; Stapleton, C.; Wyer, M.; Fezzi, C.; Jones, P.; Tzanopoulos, J. Predicting microbial pollution concentrations in UK rivers in response to land use change. Water Res. 2010, 44 (16), 4748−4759.
(31) Thupaki, P.; Phanikumar, M. S.; Whitman, R. L. Solute dispersion in the coastal boundary layer of southern Lake Michigan. J. Geophys. Res. Oceans 2013, 118 (3), 1606−1617.
(32) Olyphant, G. A.; Thomas, J.; Whitman, R. L.; Harper, D. Characterization and statistical modeling of bacterial (Escherichia coli) outflows from watersheds that discharge into southern Lake Michigan. Environ. Monit. Assess. 2003, 81, 289−300.
(33) Chen, C.; Beardsley, R.; Cowles, G. An Unstructured Grid, Finite-Volume Coastal Ocean Model (FVCOM) System. Oceanography 2006, 19 (1), 78−89.
(34) Chen, C.; Liu, H.; Beardsley, R. C. An unstructured grid, finite-volume, three-dimensional, primitive equations ocean model: application to coastal ocean and estuaries. J. Atmospheric Ocean. Technol. 2003, 20 (1), 159−186.
(35) Wendzel, A. Constraining mechanistic models of indicator bacteria at recreational beaches in Lake Michigan using easily-measurable environmental variables. M.S. Dissertation, Michigan State University, East Lansing, MI, 2014.
(36) Thupaki, P.; Phanikumar, M. S.; Nevers, M. B.; Whitman, R. L. Modeling the effects of hydrologic separation on the Chicago area waterway system on water quality in Lake Michigan; Great Lakes and Mississippi River Interbasin Study (GLMRIS) Report; US Army Corps of Engineers: Chicago, 2013; Appendix F; p F639−F743.
(37) Schimmelpfennig, S.; Kirillin, G.; Engelhardt, C.; Nützmann, G. Effects of wind-driven circulation on river intrusion in Lake Tegel: modeling study with projection on transport of pollutants. Environ. Fluid Mech. 2012, 12 (4), 321−339.
(38) Schwab, D. J.; Beletsky, D. Lake Michigan Mass Balance Study: Hydrodynamic Modeling Project; NOAA Technical Memorandum ERL GLERL-108; Great Lakes Environmental Research Laboratory: Ann Arbor, MI, 1998.
(39) Parkinson, C. L.; Washington, W. M. A large-scale numerical model of sea ice. J. Geophys. Res. 1979, 84 (C1), 311−337.
(40) Nguyen, T. D.; Thupaki, P.; Anderson, E. J.; Phanikumar, M. S. Summer circulation and exchange in the Saginaw Bay-Lake Huron system. J. Geophys. Res. Oceans 2014, 119 (4), 2713−2734.
(41) Hayashi, M. Temperature-electrical conductivity relation of water for environmental monitoring and geophysical data inversion. Environ. Monit. Assess. 2004, 96 (1−3), 119−128.
(42) Fry, L. M.; Hunter, T. S.; Phanikumar, M. S.; Fortin, V.; Gronewold, A. D. Identifying streamgage networks for maximizing the effectiveness of regional water balance modeling: Identifying Gage Networks. Water Resour. Res. 2013, 49 (5), 2689−2700.
(43) Nevers, M. B.; Whitman, R. L.; Frick, W. E.; Ge, Z. Interaction and Influence of Two Creeks on Concentrations of Nearby Beaches: Exploration of Predictability and Mechanisms. J. Environ. Qual. 2007, 36 (5), 1338.
(44) Shively, D. A.; Nevers, M. B.; Breitenbach, C.; Phanikumar, M. S.; Przybyla-Kelly, K.; Spoljaric, A. M.; Whitman, R. L. Prototypic Automated Continuous Recreational Water Quality Monitoring of Nine Chicago Beaches. J. Environ. Manage. 2016, 166, 285−293.
(45) Anderson, E. J.; Phanikumar, M. S. Surface storage dynamics in large rivers: Comparing three-dimensional particle transport, one-dimensional fractional derivative and multi-rate transient storage models. Water Resour. Res. 2011, 47 (9), W09511.
(46) Shen, C.; Phanikumar, M. S. An efficient space-fractional dispersion approximation for stream solute transport modeling. Adv. Water Resour. 2009, 32 (10), 1482−1494.
(47) Phanikumar, M. S.; Aslam, I.; Shen, C.; Long, D. T.; Voice, T. C. Separating surface storage from hyporheic retention in natural streams using wavelet decomposition of acoustic Doppler current profiles. Water Resour. Res. 2007, 43 (5), W05406.
(48) Niu, J.; Phanikumar, M. S. Modeling watershed-scale solute transport using an integrated, process-based hydrologic model with applications to bacterial fate and transport. J. Hydrol. 2015, 529 (1), 35−48.
(49) Niu, J.; Shen, C.; Li, S. G.; Phanikumar, M. S. Quantifying storage changes in regional Great Lakes watersheds using a coupled subsurface - land surface process model and GRACE, MODIS products. Water Resour. Res. 2014, 50 (9), 7359−7377.
(50) Brion, G. M.; Lingireddy, S. A neural network approach to identifying non-point sources of microbial contamination. Water Res. 1999, 33 (14), 3099−3106.
(51) Chang, F. J.; Chen, Y. C. A counterpropagation fuzzy-neural network modeling approach to real time streamflow prediction. J. Hydrol. 2001, 245 (1−4), 153−164.
