
Environmental Modeling

Real-Time Nowcasting of Microbiological Water Quality at Recreational Beaches: A Wavelet and Artificial Neural Network Based Hybrid Modeling Approach

Juan Zhang, Han Qiu, Xiaoyu Li, Jie Niu, Meredith Becker Nevers, Xiaonong Hu, and Mantha S Phanikumar

Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.8b01022 • Publication Date (Web): 29 Jun 2018

Downloaded from http://pubs.acs.org on July 5, 2018


Environmental Science & Technology is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Environmental Science & Technology

Real-Time Nowcasting of Microbiological Water Quality at Recreational Beaches: A Wavelet and Artificial Neural Network Based Hybrid Modeling Approach

Juan Zhang1, Han Qiu2, Xiaoyu Li3, Jie Niu1*, Meredith Nevers4, Xiaonong Hu1, and Mantha S. Phanikumar2*

1 Institute of Groundwater and Earth Sciences, Jinan University, Guangzhou 510632, China.
2 Department of Civil and Environmental Engineering, Michigan State University, East Lansing, MI
3 Department of Mathematics and Statistics, Auburn University, Auburn, AL
4 USGS Great Lakes Science Center, Lake Michigan Ecological Research Station, Chesterton, IN 46304
*Corresponding Authors e-mails: [email protected], [email protected], Phone: (517) 432-0851

Abstract

The number of beach closings due to bacterial contamination has continued to rise in recent years, putting beachgoers at risk of exposure to contaminated water. Current approaches predict levels of indicator bacteria using regression models containing a number of explanatory variables. Data-based modeling approaches can supplement routine monitoring data and provide highly accurate short-term forecasts of beach water quality. In this paper, we apply the nonlinear autoregressive network with exogenous inputs (NARX) method with explanatory variables to predict Escherichia coli (E. coli) concentrations at four Lake Michigan beach sites. We also apply the nonlinear input-output network (NIO) and nonlinear autoregressive neural network (NAR) methods in addition to a hybrid wavelet-NAR (WA-NAR) model and demonstrate their application. All models were tested using 3 months of observed data. Results revealed that the NARX models provided the best performance and that the WA-NAR model, which requires no explanatory variables, outperformed the NIO and NAR models; therefore, the WA-NAR model is suitable for application to data-scarce regions. The models proposed in this paper were evaluated using multiple performance metrics including sensitivity and specificity measures and produced results comparable or superior to previous mechanistic and statistical models developed for the same beach sites. The relatively high R2 values between data and the NARX models (R2 ~ 0.8 for the beach sites and ~0.9 for the river site) indicate that the new class of models shows promise for beach management.


1. Introduction

Levels of fecal indicator bacteria (FIB) are monitored at coastal and inland recreational beaches to protect the public from exposure to contaminated water.1 According to the NRDC,2 10% of all monitoring samples in 2013 exceeded EPA's benchmark value, and the Great Lakes region, which included 902 coastal beaches from 8 US states in 2013, had the highest exceedance rate (13% for beaches in that region), indicating that swimmers continue to face health risks from microbiological pollution at beaches. Beach water quality is highly dynamic and can change within minutes, depending on a number of environmental factors; therefore, to be useful and protective of public health, notifications should be issued in a timely manner. Traditional culture-based methods, however, require an 18- to 24-hour laboratory assay, during which time beachgoers could potentially be exposed to contaminated water. Alternative methods for beach management, including rapid methods such as quantitative polymerase chain reaction (qPCR) and predictive modeling, have received significant attention in recent years.3-5 Even with rapid methods, real-time or even same-day water quality results may not always be available.1 While faster techniques that provide results in 6 hours or less have been developed and tested,6-10 using qPCR requires significant up-front investment and expertise for analysis. The US EPA has encouraged the use of qPCR for enterococci and the 2012 Recreational Water Quality Criteria provide a Beach Action Value (BAV) for qPCR-based testing of 1,000 CCE (calibrator cell equivalents)/100 mL.


Due to the lag time between the time of sampling and the time when results are available, models are useful tools to supplement monitoring data.1 A variety of models have been used in the past to forecast FIB levels at beaches, and they generally belong to two broad categories: mechanistic or process-based models,4,11-17 which are based on conservation principles, and statistical regression models and other data-based approaches.3,4,18-19 Mechanistic models of beach water quality require detailed information on sources, including flow and FIB levels from tributaries. While mechanistic models are excellent tools for gaining insights into key processes,4 they require considerable resources and training for model setup, running and post-processing. If the objective is to protect public health by making timely and accurate predictions at beach sites, data-based approaches may be attractive since the technology can be transferred from one site to another with relative ease.

While multiple regression (MR) models of beach water quality have received considerable attention in the past due to their simple and robust nature and ease of implementation, other data-based methods such as Artificial Neural Networks (ANN)20 have the potential to improve the accuracy of forecasts.21-24 These methods, however, have not found widespread application in beach management. ANNs were successfully used in the past to forecast many of the factors that directly or indirectly impact FIB levels at beaches including rainfall patterns,25-28 stream flows,29-31 suspended sediment concentrations,32-34 lake water levels,35-37 wave breaking and changes in beach profiles.38-40 The success of these earlier studies provides ample motivation for applying ANN methods to beach management. Zhang et al.41 applied an ANN model to forecast levels of enterococci at Holly Beach in Louisiana. They used 15 environmental variables such as salinity, water temperature and wind speed as inputs to their model. For datasets collected during 2007-2009, the performance of their ANN model was superior (linear correlation coefficient, LCC = 0.857) to that of multiple-regression (MR) models implemented in the US EPA Virtual Beach model42 (e.g., LCC = 0.337 for a non-linear Virtual Beach model for the same dataset). The use of explanatory environmental variables as inputs to beach water quality models is a well-known approach in the development of multiple-regression models. In another study,43 ten environmental variables were used as inputs to forecast the FIB concentrations of an urban waterway in the Chicago River utilizing the ANN method. ANN models have advantages over traditional MR models when the underlying function approximating the data is highly complex and nonlinear. In fact, ANN models are known to work even when the underlying function cannot be expressed in terms of any known mathematical functions. Therefore, the ANN model, as in the approach of Zhang et al.,41 can be viewed as a progression of the MR approach.

In addition to these previous studies, it is also possible to use ANNs in a fundamentally different data-based approach: to generate short-term forecasts that rely on data alone without any explanatory variables.44 The idea is that if a suitable method of time series analysis exists, then the information needed to make short-term forecasts is all contained within the data, and no other explanatory variable is required to provide additional information. Although ANN methods described above have proven extremely useful for making short- and long-term forecasts in many areas of science and engineering, microbiological water quality data are characterized by nonstationarity (i.e., statistical properties such as the mean and variance do not remain constant over time). ANN methods have well-known limitations in dealing with nonstationary datasets,45 while wavelet transforms can be robust tools for handling nonstationary datasets. Wavelet methods have been used in the last two decades for analysis of both time series and spatial data.46 An important feature of wavelet analysis is the ability to decompose the original data into high and low frequency contributions (i.e., fine and coarse features in the data) for further analysis. Wavelets have been used, for example, for analysis of nearshore hydrodynamics47 including longshore currents along a sandy coast48 and to address questions related to solute transport in rivers49 and heterogeneous porous media.50 Therefore, by combining the strengths of wavelet transforms (the ability to deal with nonstationary data) with those of ANN methods (the ability to deal with nonlinearity), a powerful class of hybrid methods can be constructed for reliable, short-term forecasts.

Wavelet-ANN (WANN) methods have been the focus of several recent papers, and the methods produce reliable forecasts of daily river discharge and suspended sediment concentration, rainfall runoff, snow melt-driven floods, and droughts.51-56 The objective of this paper is to explore the possibility of applying ANN and combined wavelet-neural network models for nowcasting FIB levels at recreational beaches in order to support advisory and beach closure decisions. Daily monitoring data are routinely available for many beach sites in the United States; however, as noted earlier, beach management based on water quality sampling alone is inadequate due to a 24-hour delay in running the assays, during which time conditions at the beach can change quickly. Therefore, specific questions that will be addressed in the paper include the following: (1) if daily monitoring data are used in an ANN or WANN modeling framework, is it possible to forecast tomorrow’s FIB levels using yesterday’s monitoring data? (2) is it possible to generate forecasts with a high degree of confidence as quantified by standard metrics such as the coefficient of determination (R2) and the root mean square error (RMSE)? (3) how do ANN and WANN methods perform relative to statistical (MR) and fully three-dimensional mechanistic models developed for the same datasets? The first question is related to data requirements and feasibility of the WANN approach since the method requires sufficiently long time series data to meet the requirement of wavelet decomposition. Since only daily monitoring data are available, a data augmentation method has to be used before applying the WANN model and this leads to consideration of whether the original data follow a certain distribution, which is considered a data feasibility question. The second question is important because previous modeling activities have generally achieved R2 values less than 0.7,57-58 with the exception of Safaie et al.4 For this reason, we set a higher standard for our proposed models to achieve an R2 value of 0.7 or better. Generally, an R2 value greater than 0.7 is considered a strong relationship, according to Moore et al.;59 therefore, for the third question, if the WANN approach can yield forecasts with R2 values comparable to or higher than 0.7 without the use of explanatory variables, then it represents a significant advance over the current model-based approaches for beach management. The datasets used in this research have been selected to facilitate direct comparisons with MR and mechanistic models developed earlier.4 To the best of our knowledge, there is no published work that examines the application of the WANN approach for forecasting FIB levels at recreational beaches and one of our aims in this paper is to fill this gap.


2. Methods

Three types of ANN models, the Nonlinear Input-Output (NIO),60 the Nonlinear Autoregressive neural network (NAR)61 and the Nonlinear Autoregressive Network with eXogenous inputs (NARX)62 network, are applied in this work.20 The details of these three models are described in the Supporting Information (SI), sections S1-S3. In all models, the dependent variable of interest from the point of prediction is the log10-transformed E. coli data (referred to as LGEC herein). Briefly, the NIO model (Model 1) predicts E. coli levels at the three beach sites (OD1, OD2, OD3) using past values of E. coli data at the river mouth (BD) since discharge from the river mouth is known to impact bacteria levels at the three beaches.4 The NAR model (Model 2), on the other hand, predicts E. coli levels at any site using only the past values of E. coli at the same site. Neither the NIO nor the NAR model uses explanatory variables other than the E. coli data, unlike the NARX model, which uses explanatory variables such as turbidity and wave height in addition to the past values of E. coli at the site where predictions are being made. The input parameters used in the NARX models in our work (Models 3 and 4) included (in addition to the E. coli data): (1) 24-hour rainfall, (2) 4-hour water temperature (WTEMP4HR), (3) natural log of turbidity at the beach sites (LNTURB), (4) daily log10(discharge) from the river mouth (LGDISCH24HR), (5) an interaction term that is the product of daily discharge and turbidity at the river mouth (LOAD24), (6) onshore wind speed, which is the speed of wind blowing from the lake towards the land (WNDSPD_ONSHORE) and (7) natural log of the 4-hour significant wave height (LNWHSIG4HR). Model 3 used parameters 1, 2, 3, 4, and 5 in the above list while Model 4 used 1, 2, 3, 6, and 7. The wavelet decomposition technique (described in section S4 in the Supporting Information) is combined with the NAR model to further improve the accuracy of the NAR model prediction. In ANN models, the available data are usually divided into three subsets: training, validation and testing periods (more information is available in the Supporting Information). Since the model used for beach management will be first trained and validated, we report the testing R2 and RMSE values for all models. Moreover, the model sensitivity and specificity metrics58 are also used to assess model performance. Sensitivity (specificity) refers to the probability that the models correctly predict an exceedance (non-exceedance) of the EPA standard (i.e., the BAV). These metrics provide additional information on the usefulness of the models for beach management.
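The sensitivity and specificity metrics described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sensitivity_specificity` is hypothetical, the E. coli threshold of 235 CFU/100 mL is an assumed value for the BAV that is not given in this section, and the inputs are taken to be log10-transformed concentrations (LGEC).

```python
import math
import numpy as np

# Assumed threshold (not stated in this section): 235 CFU/100 mL, on a log10 scale.
ASSUMED_LOG10_BAV = math.log10(235.0)

def sensitivity_specificity(obs_lgec, pred_lgec, threshold=ASSUMED_LOG10_BAV):
    """Return (sensitivity, specificity) for exceedance of a threshold.

    Sensitivity: fraction of observed exceedances that were also predicted.
    Specificity: fraction of observed non-exceedances that were also predicted.
    """
    obs_exceed = np.asarray(obs_lgec, dtype=float) > threshold
    pred_exceed = np.asarray(pred_lgec, dtype=float) > threshold
    tp = int(np.sum(obs_exceed & pred_exceed))    # true positives
    fn = int(np.sum(obs_exceed & ~pred_exceed))   # missed exceedances
    tn = int(np.sum(~obs_exceed & ~pred_exceed))  # true negatives
    fp = int(np.sum(~obs_exceed & pred_exceed))   # false alarms
    # The ratios are undefined when a class is empty; this sketch returns NaN.
    sens = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    spec = tn / (tn + fp) if (tn + fp) > 0 else float("nan")
    return sens, spec

# Made-up concentrations (CFU/100 mL) used only to exercise the function.
obs = np.log10([100.0, 300.0, 500.0, 50.0])
pred = np.log10([120.0, 250.0, 200.0, 40.0])
sens, spec = sensitivity_specificity(obs, pred)
print(sens, spec)
```

In this made-up example, one of the two observed exceedances is predicted and no false alarms occur. How an empty class is handled (NaN here) is a design choice of the sketch.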

The focus of the present paper is on nowcasting bacterial levels at four sites in southern Lake Michigan. Some previous applications performed the wavelet decomposition on the whole time series, including the testing period, a practice that is misleading because it leads to highly inflated values of R2. In the present work, by contrast, the network has no information on the wavelet coefficients for future data while making a forecast since wavelet decomposition for the forecast period has not yet been done.

2.1 The WA-NAR hybrid model

A WA-NAR hybrid model is obtained by combining two methods, the NAR and the discrete wavelet transform (DWT). The wavelet decomposition coefficients of the FIB data are passed into the NAR model to make a short-term forecast. We can examine the correlation between different wavelet components and the original time series data. For the WA-NAR model inputs, the original LGEC time series of each site are decomposed into various detail components (W's) at different resolution levels and one approximation component (C) at the last coarse resolution level using the à trous algorithm.63 A key detail involves the number of wavelet levels required to approximate the original data. Although there is no existing theory to tell how many resolution levels are needed for any given time series, it is generally believed that L = int[log10(N)] levels are needed,64 where L denotes the number of wavelet levels and N is the number of data points in the time series.
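The decomposition into detail components (W) and one approximation component (C), together with the rule L = int[log10(N)], can be illustrated with a short sketch. The text names the à trous algorithm but does not specify the filter, so the causal Haar filter used here is an assumption chosen because it only looks backward in time; the function names `n_levels` and `haar_a_trous` are hypothetical, and the authors' MATLAB implementation may differ.

```python
import math
import numpy as np

def n_levels(n_points):
    """Number of wavelet levels from the rule L = int[log10(N)]."""
    return int(math.log10(n_points))

def haar_a_trous(x, levels):
    """Additive a trous decomposition with a causal Haar filter (an assumption).

    Returns (details, approx) such that approx + sum(details) equals x.
    At level j the smoother averages each point with the point 2**(j-1)
    steps in the past, so no future samples are used.
    """
    approx = np.asarray(x, dtype=float).copy()
    idx = np.arange(approx.size)
    details = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)
        past = approx[np.maximum(idx - shift, 0)]  # repeat the first value at the start
        smoother = 0.5 * (approx + past)
        details.append(approx - smoother)          # detail component W_j
        approx = smoother                          # approximation component C_j
    return details, approx

# Synthetic series used only to exercise the functions.
x = np.sin(np.linspace(0.0, 6.0, 64)) + 0.1 * np.arange(64)
details, approx = haar_a_trous(x, levels=3)
print(n_levels(80), len(details), np.allclose(approx + sum(details), x))
```

Because each level is the difference between successive smoothed series, the components sum back to the original series exactly, and a coefficient at time t depends only on samples at or before t.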


Monitoring programs in the Great Lakes region start with the beginning of the beach season and continue through summer (approximately late May-late August). Assuming the availability of daily data for a 3-month period not counting weekends, we get N ≈ 80 and L = 1. This means that data obtained from a daily monitoring program are not sufficiently dense for a straightforward application of the wavelet theory; however, monitoring data can be augmented / up-sampled in several ways to facilitate the application of the WANN method. The approaches essentially make an assumption about how data vary between sampling times (successive days). One approach is to augment the monitoring data using model hindcasts based on a well-tested mechanistic model that uses small time steps. This represents a process-based method of interpolating data on successive days, allowing the generation of high-resolution time series data (e.g., hourly or every minute) needed for the WA-NAR approach. Alternative approaches include regular interpolation methods, the simplest being linear interpolation, and Markov Chain Monte Carlo65-66 (MCMC) sampling with Kalman filtering67 to remove the noise. The choice of the method should depend on the factors that contribute to elevated bacterial levels at the sites. At the beach sites considered in the present work, loading from nearby tributaries was the primary factor18 contributing to the pollution. The presence or absence of a peak at a beach on any given day depends on the direction of the river plume within the lake environment16,68 (i.e., plume traveling toward the beach or away from it). River plume directions generally shift over the course of several days, although within-day shifts are possible;4,16,68-69 therefore, daily sampling provides a reasonable temporal resolution on most days from the point of capturing the peaks. We note, however, that time scales shorter than the diurnal scale cannot be resolved using daily monitoring data and statistical sampling, and mechanistic model hindcasts have the ability to resolve additional details (e.g., peaks) at the sub-diurnal time scales. After comparing the time series data obtained by the MCMC approach at half-daily resolution with mechanistic modeling results,4 we decided to use half-hourly data based on the MCMC method combined with Kalman filtering for the application of the WA-NAR model. The half-hourly “observed” data prior to the current time of prediction were generated by using the semi-diurnal time series data. More details are provided in the Supporting Information section S5. The WA-NAR method was implemented using the wavelet and neural network toolboxes in MATLAB version 9.1, R2016b (The Mathworks Inc., Natick, MA). We also implemented the above method to decompose the variables that have strong correlation with LGEC. All the wavelet decomposition coefficients of input variables are used as inputs into the NAR model for nowcasting. For the newly generated half-hourly data, the first 70% of E. coli concentration data (1361 hours) were used for training; 15% of data (291 hours) were used for validation and the remaining 15% were used for testing purposes.
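The 70/15/15 division described above can be sketched as a chronological split. This is an illustration under assumptions: the text states that the first 70% of the data are used for training, and the sketch further assumes that the validation block follows the training block and the testing block comes last. The function name `chronological_split` and the placeholder series are hypothetical.

```python
import numpy as np

def chronological_split(series, train_pct=70, val_pct=15):
    """Split a time-ordered series into training, validation and testing blocks.

    Integer percentages avoid floating-point rounding surprises; the testing
    block receives whatever remains after the first two blocks.
    """
    series = np.asarray(series)
    n = series.size
    n_train = (train_pct * n) // 100
    n_val = (val_pct * n) // 100
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test

# Placeholder half-hourly series; the length is arbitrary.
lgec = np.linspace(1.0, 3.0, 200)
train, val, test = chronological_split(lgec)
print(len(train), len(val), len(test))
```

Keeping the blocks in time order, rather than sampling them at random, means the testing block contains only data that come after everything the model was trained and validated on.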


2.2 Site description

To demonstrate the application of the WA-NAR method to nowcast elevated FIB levels at beaches, we use E. coli data collected during the summer of 2008 at four sites in southern Lake Michigan. Out of the four sites, one site (BD) represents the river mouth while the remaining three sites (OD3, OD2, OD1 in increasing distance from BD, see map in Figure 1) are beach sites. The three beach sites are primarily impacted by contamination from the river mouth at BD; therefore, E. coli and turbidity values at the BD site are important inputs that control the dynamics of E. coli at the three beach sites.4 Additional details of the sites and the data can be found in Safaie et al.4


Burns Ditch (BD) is the river mouth of the Little Calumet River in northwest Indiana. The three sampling locations OD1, OD2, and OD3 are located to the west of the outfall in the town of Ogden Dunes. E. coli concentrations in Burns Ditch are historically high, and likely influence shoreline E. coli concentrations, depending on the flow direction of the river plume.18 During 2008, water samples were collected at each location and analyzed for E. coli within 4-6 hours in the laboratory using a defined substrate technology (IDEXX Inc., Westbrook, ME). For modeling purposes, E. coli data were log10-transformed since the values could vary by orders of magnitude. Water samples were also analyzed in the laboratory for turbidity (NTU; 2100N Turbidimeter, Hach Company, Loveland, CO).


3. Results

The R2 and RMSE values for the testing period data sets of all models are listed in Table 1, while the R2 and RMSE values during all three periods (training, validation and testing) are shown in Table S6 (in the Supporting Information). In the following sections, we report the testing-period R2 and RMSE values as (R2, RMSE) pairs.
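For reference, the two metrics reported as (R2, RMSE) pairs can be computed as in the sketch below. The text does not state which formula was used for R2; this sketch assumes the coefficient of determination 1 − SS_res/SS_tot (the squared Pearson correlation is another common choice and generally gives a different number). The function names are hypothetical.

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up values used only to exercise the functions.
obs = np.array([1.0, 2.0, 3.0])
pred = np.array([1.0, 2.0, 4.0])
print(r_squared(obs, pred), rmse(obs, pred))
```

Since the models predict LGEC, the RMSE in such a calculation is in log10 units of concentration.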


3.1 The NIO model

Figures 2(a) and S2 show the results of Model 1, the NIO model. The value of (R2, RMSE) is (0.53, 0.33) during the testing period at the OD1 site, which is the farthest beach site from the waterway (Figure 1). The (R2, RMSE) values for the testing period are (0.43, 0.53) and (0.46, 0.32) at the OD2 and OD3 sites, respectively (Table 1).


3.2 The NAR model

The past values of the original LGEC data were used as inputs to predict the LGEC values at all four sites. The performance metrics for Model 2, the NAR model, are shown in Figures 2(c) and S3. The value of (R2, RMSE) is (0.8, 0.11) during the testing period at the BD site (Figure S3(a)). Figure 2(c) shows model performance during the three periods of training, validation and testing at the OD1 site with the (R2, RMSE) value of (0.38, 0.24) during the testing period. The values are (0.34, 0.41) and (0.59, 0.45) at the OD2 and OD3 sites, respectively (Figure S3).
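Using past values of a series as inputs, as in the NAR model, amounts to building a matrix of lagged values before any network is trained. The sketch below shows only that bookkeeping step; it is not the authors' MATLAB code, the function name `make_lagged_inputs` is hypothetical, and the optional `exog` argument illustrates how exogenous series would be added for a NARX-style input set. Any regressor, including a neural network, could then be fitted to the resulting pairs.

```python
import numpy as np

def make_lagged_inputs(y, n_lags, exog=None):
    """Build (X, target) pairs for one-step-ahead prediction.

    Row t of X holds y[t-1], ..., y[t-n_lags]; if exog is given, the same
    lags of each exogenous column are appended. target holds y[t].
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    columns = [y[n_lags - k:n - k] for k in range(1, n_lags + 1)]
    if exog is not None:
        exog = np.asarray(exog, dtype=float).reshape(n, -1)
        for k in range(1, n_lags + 1):
            columns.append(exog[n_lags - k:n - k, :])
    X = np.column_stack(columns)
    target = y[n_lags:]
    return X, target

# Toy series used only to show the layout of the lag matrix.
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X_nar, t_nar = make_lagged_inputs(y, n_lags=2)
X_narx, t_narx = make_lagged_inputs(y, n_lags=2, exog=y + 10.0)
print(X_nar.shape, X_narx.shape)
```

With two lags, the first usable target is the third sample, so a series of five points yields three training pairs.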


3.3 The NARX model

In order to determine the inputs of the NARX model from the ten environmental factors measured at the sites (see Supporting Information for details and explanation of variable names), cross-correlations between the ten environmental factors and LGEC for different time lags were examined in addition to the auto-correlation of LGEC. The results are listed in Tables S1-S4 for the four sites. Two sets of NARX models were developed based on different combinations of explanatory variables for nowcasting LGEC datasets at the four sites. The NARX models are trained and tested based on different combinations of time series at all four sites such that environmental variables with similar values of correlation coefficients and with a strong correlation with E. coli concentrations fall into similar groups. Four-hour water temperature (WTEMP4HR) and daily rainfall are also considered as input variables for the NARX models; they were found to be important input variables as they affect the FIB concentrations and contribute to elevated contaminant levels in the receiving water.70 Rainfall data come from the National Park Service weather station located in Porter, IN. We used aggregated rainfall data from the previous 24 hours as an explanatory variable. Figures 2(e) and S4 show the Model 3 (NARX model) results based on LGDISCH24HR (log10 of 24-hour discharge from the river), BD_LNTURB (natural log of turbidity at the river mouth BD), LNTURB (natural log of turbidity at the beach site), water temperature (WTEMP4HR), rainfall and LGEC as inputs, which are the parameters with high correlation coefficients with the dependent variable LGEC at the four sites. The R2 and RMSE during the testing period are listed in Table 1, and the (R2, RMSE) values during all periods are shown in Table S6. Figure S4(a) shows the Model 3 result at the BD site. The value of (R2, RMSE) for the testing period is (0.86, 0.16). Figure 2(e) shows the performance of Model 3 at the OD1 site with an (R2, RMSE) value of (0.8, 0.15) during the testing period. The corresponding (R2, RMSE) values are (0.82, 0.31) and (0.80, 0.23) at sites OD2 and OD3, respectively.
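The screening of candidate inputs by cross-correlation at different time lags can be sketched as below. This is a generic illustration, not the procedure behind Tables S1-S4: the function name `lagged_correlation` is hypothetical, and the synthetic series stand in for an environmental variable and LGEC.

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x(t - k) and y(t) for k = 0..max_lag.

    Passing the same series as x and y gives the auto-correlation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    result = {}
    for k in range(max_lag + 1):
        x_past = x[:x.size - k]  # x shifted k steps into the past
        y_now = y[k:]
        result[k] = float(np.corrcoef(x_past, y_now)[0, 1])
    return result

# Synthetic example: y repeats x with a delay of two steps.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.roll(x, 2)
corr = lagged_correlation(x, y, max_lag=4)
best_lag = max(corr, key=lambda k: abs(corr[k]))
print(best_lag)
```

In the synthetic example the correlation peaks at the lag that was built into the data, which is the kind of signal used to decide which variables and delays to feed into a NARX model.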

In another version of the NARX model (Model 4, Table 1), the input variables LNTURB, LNWHSIG4HR, WNDSPD_ONSHORE, WTEMP4HR, rainfall and LGEC are used and tested at all four sites. The results of Model 4 are shown in Figure S5; the R2 and RMSE values during the testing period are shown in Table 1, and the corresponding values for all three periods are listed in Table S6.


3.4 The WA-NAR hybrid model

E. coli data spanning a period of 86 days (from early June to the end of August 2008) are available for developing the models. Details of data augmentation / up-sampling are included in the Supporting Information and the results based on three wavelet levels are reported in Table S5. Since the number of wavelet coefficients used in the model increases with the wavelet levels, we have attempted to use an optimum number of wavelet levels that minimizes forecast errors and avoids over-fitting.

As can be expected, the most important components are those that have a high correlation with the original data. The correlation coefficients between the last level of the previous sub-time series and the original time series (e.g., r(t − 1, t) and r(t − 2, t) etc.) are shown in Table S5 for the detail (W) and approximation (C) components for the E. coli data at all four sites. Based on the strength of these correlation coefficients, the important sub-time series components (W_{i,n}, C_n; i = 2, 3, 4; n = 1, 2, ..., where n is the lag time in hours) were used as the inputs to the WA-NAR model.


The wavelet decomposition coefficients of the past values of the half-hourly LGEC data were used as input to forecast LGEC. The scatterplot and time series comparing the observed and simulated LGEC datasets at the BD site using Model 5, the WA-NAR model, are shown in Figure 3 and Figure S7. The (R2, RMSE) value for the testing period at the BD site is (0.86, 0.05) (Table 1). The corresponding values at the OD1, OD2 and OD3 sites are (0.62, 0.07), (0.57, 0.1) and (0.62, 0.11), respectively. Higher R2 and lower RMSE values were obtained for the BD site relative to the sites at OD1, OD2 and OD3.

As explained in the Methods section and the Supporting Information, after the WA-NAR network was trained, it was used to predict future LGEC at the four sampling sites. The input values are random time series generated by MCMC slice sampling, with the original observations retained at the original sampling times. When the feedback loop of the NAR network is open, the network performs only one-step-ahead predictions. When the loop is closed after training is completed, the network performs multi-step-ahead predictions. The predictions were made at 0.25-, 0.5- and 1-day time scales (i.e., 12, 24 and 48 half-hourly data points, respectively), and the R² and RMSE values were calculated for all three cases (Table 2). The (R², RMSE) values for the 0.25-day-ahead prediction are (0.83, 0.04), (0.51, 0.06), (0.7, 0.08) and (0.53, 0.08) at the BD, OD1, OD2 and OD3 sites, respectively. The corresponding (R², RMSE) values for the 0.5- and 1-day-ahead predictions are listed in Table 2.
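The difference between the open-loop and closed-loop modes described above can be sketched as follows. The trained network is replaced here by a hypothetical stand-in (`toy_model`), so the numbers have no physical meaning; only the feedback logic and the conversion from forecast horizon to half-hourly steps are illustrated.

```python
def steps_for_horizon(days, minutes_per_step=30):
    """Number of half-hourly steps in a forecast horizon given in days."""
    return round(days * 24 * 60 / minutes_per_step)


def predict_open_loop(model, series, n_lags):
    """One-step-ahead: every prediction uses observed past values only."""
    return [model(series[t - n_lags:t]) for t in range(n_lags, len(series))]


def predict_closed_loop(model, history, n_lags, n_steps):
    """Multi-step-ahead: each prediction is fed back as a future input."""
    window = list(history[-n_lags:])
    predictions = []
    for _ in range(n_steps):
        y = model(window)
        predictions.append(y)
        window = window[1:] + [y]  # drop the oldest lag, append the forecast
    return predictions


def toy_model(lags):
    """Hypothetical stand-in for a trained NAR network."""
    return 0.5 * lags[-1] + 1.0
```

For example, `predict_closed_loop(toy_model, history, n_lags, steps_for_horizon(0.25))` would return 12 values, one for each half-hourly step of a 0.25-day forecast.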

The sensitivity and specificity metrics for all models are shown in Table S6 and Figure S8. When there are no exceedances, the models have zero sensitivity. For all the models evaluated, sensitivities are greater than 0.3, and specificities are greater than 0.9.
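A minimal sketch of how sensitivity and specificity can be computed from paired observations and predictions is given here. The exceedance threshold of 235 CFU/100 mL (expressed in log10 units) is an assumption made for illustration, because the threshold is not stated in this section; returning zero sensitivity when no exceedances are observed follows the convention described above.

```python
import math

# Assumed exceedance threshold (235 CFU/100 mL) in log10 units.
THRESHOLD_LOG10 = math.log10(235.0)


def sensitivity_specificity(observed, predicted, threshold=THRESHOLD_LOG10):
    """Return (sensitivity, specificity) for exceedance prediction."""
    tp = fn = tn = fp = 0
    for o, p in zip(observed, predicted):
        if o > threshold:          # an observed exceedance
            if p > threshold:
                tp += 1            # correctly predicted exceedance
            else:
                fn += 1            # missed exceedance
        else:                      # an observed non-exceedance
            if p > threshold:
                fp += 1            # false alarm
            else:
                tn += 1            # correctly predicted non-exceedance
    # With no observed exceedances the ratio is undefined; zero is reported.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

Both inputs are expected in log10 units, matching the LGEC variable used throughout the paper.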


4. Discussion

We found that the WA-NAR model was substantially more accurate than the NAR model at all four sites, as shown by the higher R² values and lower RMSE values. The discrete wavelet transform allowed most of the noise in the data to be removed and facilitated the extraction of quasi-periodic and periodic signals from the original time series. The wavelet coefficients obtained by decomposing the original data contain the important information at different temporal scales and can be used to make short-term predictions. The wavelet transform improved the performance of the NAR nowcasting model by providing useful information at various decomposition levels. Hence the WA-NAR model is a potentially useful new method for nowcasting indicator bacteria originating from river plumes and at river beach/park sites. The NARX models, which use environmental variables as inputs, also produced accurate predictions. Comparing the results of the ANN models between the BD site and the OD sites, the BD site models are superior to the OD site models. This result is mainly determined by the geographical setting of the sampling sites. The lake/beach sites (OD sites) are strongly affected by waves, wind, bacterial loading from shoreline sand and bird inputs, resuspension of bottom sediment and many other processes, making it relatively more difficult to make accurate predictions at the beach sites. Our results indicate that FIB levels at river sites can be predicted more accurately than those at beach sites. The river mouth (BD site) has a significant impact on the dynamics of E. coli at the beaches. Therefore, LGEC data at the BD site bring information that can positively influence predictions at the beach sites. The comparisons shown in this work are encouraging and provide motivation to further examine this class of methods for beach management, especially when combined with automated continuous monitoring and prediction of beach water quality.14
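The decomposition of a time series into detail (W) and approximation (C) components at several levels can be illustrated with a short sketch. The wavelet family and algorithm used in this work are described in the Supporting Information rather than here, so the causal Haar "à trous" scheme used in the sketch is an assumption chosen for simplicity; the function name is illustrative.

```python
def a_trous_haar(series, levels):
    """Additive multilevel decomposition with a causal Haar-type filter.

    Returns (details, approximation), where details[j] is the detail
    component W at level j + 1 and approximation is the coarsest
    component C. At every time t the original value equals
    approximation[t] + sum(d[t] for d in details).
    """
    approx = list(series)
    details = []
    for j in range(levels):
        shift = 2 ** j
        # Average each value with the value `shift` steps in the past;
        # indices before the start of the record are clamped to 0.
        smoother = [
            0.5 * (approx[t] + approx[max(t - shift, 0)])
            for t in range(len(approx))
        ]
        details.append([a - s for a, s in zip(approx, smoother)])
        approx = smoother
    return details, approx
```

Because the filter uses only present and past values, each component at time t can be computed without future data, which is the property needed when the components serve as inputs to a forecasting model.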


With the half-hourly LGEC data as input (a random time series generated using MCMC slice sampling, with the original observations substituted at their sampling times), the WA-NAR model was used to nowcast the E. coli concentrations at different time scales (0.25, 0.5 and 1 day). The results (Table 2) indicate that the predictions are acceptable for beach management, which answers the first question we sought to address in the Introduction. The standard metrics (R² and RMSE) were used to quantify the performance of each model, and some models have R² values greater than 0.7, indicating a high degree of confidence in the forecasts, which answers the second question. Moreover, predictive models with sensitivity greater than 0.3 and specificity greater than 0.9 can also be considered good (Table 1). The ANN models developed in this work perform better than the MLR models reported in Park et al.,34 for example. Compared with the results from MLR and mechanistic models, the R² and RMSE values obtained here, especially the R² values higher than 0.7 in our NARX models and around 0.7 for the WA-NAR model without the use of explanatory variables, indicate that the ANN and wavelet-ANN based approaches considered in this work are promising. The WA-NAR models are particularly appealing for application to data-scarce regions without access to any data on explanatory variables (the third question we sought to address in the Introduction).
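The construction of the half-hourly input series can be sketched as follows. The sketch uses a generic univariate slice sampler with stepping-out and shrinkage, and it assumes, purely for illustration, that the target distribution is a normal distribution fitted to the available observations; the target actually used is described in the Supporting Information. All function names are hypothetical.

```python
import math
import random


def slice_sample(log_density, x0, n_samples, width, rng):
    """Univariate slice sampler with stepping-out and shrinkage."""
    x = x0
    samples = []
    for _ in range(n_samples):
        # Auxiliary level: log of a uniform height under the density at x.
        log_y = log_density(x) + math.log(1.0 - rng.random())
        # Stepping out: place an interval of size `width` randomly around x
        # and widen it until both ends fall outside the slice.
        left = x - width * rng.random()
        right = left + width
        for _step in range(100):
            if log_density(left) <= log_y:
                break
            left -= width
        for _step in range(100):
            if log_density(right) <= log_y:
                break
            right += width
        # Shrinkage: draw uniformly from the interval and shrink it
        # towards x after every rejected proposal.
        while True:
            proposal = left + (right - left) * rng.random()
            if log_density(proposal) >= log_y:
                x = proposal
                break
            if proposal < x:
                left = proposal
            else:
                right = proposal
        samples.append(x)
    return samples


def upsample_with_observations(observations, n_points, seed=0):
    """Random series of length n_points with observations kept in place.

    `observations` maps a half-hourly index to an observed LGEC value.
    Assumption: the remaining values are drawn from a normal distribution
    fitted to the observations.
    """
    values = list(observations.values())
    if len(values) < 2:
        raise ValueError("at least two observations are required")
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    if variance == 0:
        raise ValueError("observations must not all be identical")

    def log_density(x):
        return -0.5 * (x - mean) ** 2 / variance

    rng = random.Random(seed)
    draws = slice_sample(log_density, mean, n_points, math.sqrt(variance), rng)
    return [observations.get(i, draws[i]) for i in range(n_points)]
```

With one observation per day, the indices 0, 48 and 96 would hold the measured values and the 47 half-hourly points between consecutive observations would be filled with sampled values.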

Despite the differences in the input parameters of the two NARX models evaluated in this work (Models 3 and 4), the two models produced comparable performance. Model 4 used the significant wave height and onshore wind speed, two important parameters known to influence E. coli in coastal waters. Onshore winds push river plumes towards the shore, inducing an alongshore current that can carry shore-hugging plumes several kilometers along the shoreline from the river mouth. Wave activity has the potential to resuspend E. coli from bottom sediment. It is encouraging to note that both NARX models (Models 3 and 4) produced high R² values (0.86 and 0.94, respectively) at the river site BD. It can be expected that further network optimization and data exploration (including the use of data transformations and identification of significant interaction terms) will lead to additional improvements in the performance of these models, pushing the R² values further up, to around 0.9.
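The way a NARX model combines past values of the target with past values of explanatory variables can be sketched as below. The function name and the layout of each input row are assumptions made for illustration; the number of lags and the network architecture used in this work are described in the Supporting Information.

```python
def build_narx_samples(target, exogenous, n_lags):
    """Assemble (inputs, outputs) pairs for a NARX-type model.

    Each input row holds the previous n_lags values of the target,
    followed by the previous n_lags values of every exogenous series
    (taken in alphabetical order of their names). The matching output
    is the target value at the current time step.
    """
    names = sorted(exogenous)
    for name in names:
        if len(exogenous[name]) != len(target):
            raise ValueError(f"series '{name}' does not match target length")
    inputs, outputs = [], []
    for t in range(n_lags, len(target)):
        row = list(target[t - n_lags:t])
        for name in names:
            row.extend(exogenous[name][t - n_lags:t])
        inputs.append(row)
        outputs.append(target[t])
    return inputs, outputs
```

For Model 4, `exogenous` would contain series such as LNWHSIG4HR and WNDSPD_ONSHORE (Table 1), while `target` would be the LGEC series.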

The NARX models (e.g., Model 4) described in this paper produced some of the highest R² values in the literature and are comparable to or better than the statistical and mechanistic modeling results reported earlier.4 The MCMC method was used to generate LGEC data at half-hourly intervals so that tomorrow’s values can be predicted using yesterday’s monitoring data. Such an approach to up-sample data is also needed at the beginning of the beach season, when the amount of available data is limited. Using the wavelet decomposition data and the training and validation results from the previous year’s beach season is another promising approach, but this is beyond the scope of the present paper. The ANN or WANN models described in this paper can also be applied to recreational waters with different conditions (e.g., marine beaches with tides and the influence of salinity, or beaches where shoreline sand, bird inputs and the presence of breakwaters modify circulation and FIB fate and transport) and to predict harmful algal blooms71 in lakes, which are influenced by several of the same explanatory variables used in this work. While a majority of the published mechanistic nearshore models use observed data collected in the past to generate model hindcasts, it is possible to link well-tested watershed models72-75 that describe surface and subsurface transport processes76-77 with beach water quality models to generate continuous forecasts in real time. However, very few sites use linked watershed and beach water quality models for making real-time nowcasts of beach water quality, perhaps due to the enormous effort involved in independently testing and linking the two types of models. By including rainfall and other relevant explanatory variables in a NARX or WA-NAR model, the models can represent watershed-scale fate and transport processes that control the fluxes of FIB delivered to downstream receiving water bodies such as lakes. Another class of models that has considerable promise for beach management combines wavelet decomposition with the best NARX models that use explanatory variables such as turbidity as inputs. This class of WA-NARX models is a natural extension of the WA-NAR models evaluated in this work.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (41530316). We acknowledge the use of the IAN symbol libraries in creating the Abstract Art (http://ian.umces.edu/symbols).

Supporting Information

The methods of ANN models, wavelet decomposition, MCMC sampling and Kalman filtering are presented in Supporting Information sections S1-S5. Figures S1-S8 show the architecture of the NAR neural network, plots of the wavelet decomposition, the models’ performance, the random number generation of the logarithm of E. coli using the MCMC method, and the predicted versus observed LGEC using the ANN models at the OD sites. The auto-correlation coefficients of LGEC and the cross-correlation coefficients between LGEC and other input parameters at the four sites are presented in Tables S1-S4. Auto-correlation coefficients for different wavelet components are included in Table S5. The values of R² and RMSE for the training, validation and test periods are listed in Table S6.

References

(1) U.S. Environmental Protection Agency. Predictive Tools for Beach Notification, Volume I: Review and Technical Protocol; 2010, I, 61.
(2) Dorfman, M.; Haren, A. Testing the Waters, Twenty-fourth Edition, Nat. Resour. Def. Counc. Washington, D.C. 2014.
(3) Nevers, M. B.; Whitman, R. L. Efficacy of monitoring and empirical predictive modeling at improving public health protection at Chicago beaches. Water Res. 2011, 45 (4), 1659–1668.
(4) Safaie, A.; Wendzel, A.; Ge, Z.; Nevers, M. B.; Whitman, R. L.; Corsi, S. R.; Phanikumar, M. S. Comparative Evaluation of Statistical and Mechanistic Models of Escherichia coli at Beaches in Southern Lake Michigan. Environ. Sci. Technol. 2016, 50 (5), 2442–2449.
(5) Whitman, R. L.; Ge, Z.; Nevers, M. B.; Boehm, A. B.; Chern, E. C.; Haugland, R. A.; Lukasik, A. M.; Molina, M.; Przybyla-Kelly, K.; Shively, D. A. Relationship and variation of qPCR and culturable Enterococci estimates in ambient surface waters are predictable. Environ. Sci. Technol. 2010, 44 (13), 5049–5054.
(6) Haugland, R. A. Comparison of enterococcus measurements in freshwater at two recreational beaches by quantitative polymerase chain reaction and membrane filter culture analysis. Water Res. 2005, 39, 559–568.
(7) Wade, T. J. High sensitivity of children to swimming-associated gastrointestinal illness: results using a rapid assay of recreational water quality. Epidemiology 2008, 19, 375–383.

(8) Griffith, J. F.; Weisberg, S. B. Challenges in implementing new technology for beach water quality monitoring: lessons from a California demonstration project. Mar. Technol. Soc. J. 2011, 45 (2), 65–73.
(9) Sheth, N.; McDermott, C.; Busse, K.; Kleinheinz, G. Evaluation of Enterococcus concentrations at beaches in Door County, WI (Lake Michigan, USA) by qPCR and defined substrate culture analysis. J. Great Lakes Res. 2016, 42 (4), 768–774.
(10) Dorevitch, S.; Shrestha, A.; DeFlorio-Barker, S.; Breitenbach, C.; Heimler, I. Monitoring urban beaches with qPCR vs. culture measures of fecal indicator bacteria: Implications for public notification. Environ. Health 2017, 16 (1), 45.
(11) Boehm, A. B.; Keymer, D. P.; Shellenbarger, G. G. An analytical model of enterococci inactivation, grazing, and transport in the surf zone of a marine beach. Water Res. 2005, 39 (15), 3565–3578.
(12) Gao, G.; Falconer, R. A.; Lin, B. Numerical modelling of sediment-bacteria interaction processes in surface waters. Water Res. 2011, 45 (5), 1951–1960.
(13) Ge, Z.; Whitman, R. L.; Nevers, M. B.; Phanikumar, M. S. Wave-induced mass transport affects daily Escherichia coli fluctuations in nearshore water. Environ. Sci. Technol. 2012, 46 (4), 2204–2211.
(14) Ge, Z.; Whitman, R. L.; Nevers, M. B.; Phanikumar, M. S.; Byappanahalli, M. N. Nearshore hydrodynamics as loading and forcing factors for Escherichia coli contamination at an embayed beach. Limnol. Oceanogr. 2012, 57 (1), 362–381.
(15) Kashefipour, S. M.; Lin, B.; Harris, E.; Falconer, R. A. Hydro-environmental modelling for bathing water compliance of an estuarine basin. Water Res. 2002, 36 (7), 1854–1868.

(16) Liu, L.; Phanikumar, M. S.; Molloy, S. L.; Whitman, R. L.; Shively, D. A.; Nevers, M. B.; Schwab, D. J.; Rose, J. B. Modeling the transport and inactivation of E. coli and enterococci in the near-shore region of Lake Michigan. Environ. Sci. Technol. 2006, 40 (16), 5022–5028.
(17) Sanders, B. F.; Arega, F.; Sutula, M. Modeling the dry-weather tidal cycling of fecal indicator bacteria in surface waters of an intertidal wetland. Water Res. 2005, 39 (14), 3394–3408.
(18) Nevers, M. B.; Whitman, R. L. Nowcast modeling of Escherichia coli concentrations at multiple urban beaches of southern Lake Michigan. Water Res. 2005, 39 (20), 5250–5260.
(19) Shively, D. A.; Nevers, M. B.; Breitenbach, C.; Phanikumar, M. S.; Przybyla-Kelly, K.; Spoljaric, A. M.; Whitman, R. L. Prototypic automated continuous recreational water quality monitoring of nine Chicago beaches. J. Environ. Manage. 2016, 166, 285–293.
(20) Samarasinghe, S. Neural Networks for Applied Sciences and Engineering: From Fundamentals to Complex Pattern Recognition, 1st ed.; Auerbach Publications: Boca Raton, 2006.
(21) Thoe, W.; Gold, M.; Griesbach, A.; Grimmer, M.; Taggart, M. L.; Boehm, A. B. Sunny with a chance of gastroenteritis: predicting swimmer risk at California beaches. Environ. Sci. Technol. 2014, 49 (1), 423–431.
(22) Tian, W.; Liao, Z.; Zhang, J. An optimization of artificial neural network model for predicting chlorophyll dynamics. Ecol. Model. 2017, 364, 42–52.
(23) Shen, CP. Deep learning: A next-generation big-data approach for hydrology. Eos 2018, 99, https://doi.org/10.1029/2018EO095649.

(24) Fang, K.; Shen, C.; Kifer, D.; Yang, X. Prolongation of SMAP to spatio-temporally seamless coverage of continental US using a deep learning neural network. Geophys. Res. Lett. 2017, 44.
(25) Brion, G. M.; Lingireddy, S. A neural network approach to identifying non-point sources of microbial contamination. Water Res. 1999, 33 (14), 3099–3106.
(26) Chen, C.-S.; Chen, B. P.-T.; Chou, F. N.-F.; Yang, C.-C. Development and application of a decision group Back-Propagation Neural Network for flood forecasting. J. Hydrol. 2010, 385 (1-4), 173–182.
(27) French, M. N.; Krajewski, W. F.; Cuykendall, R. R. Rainfall forecasting in space and time using a neural network. J. Hydrol. 1992, 137 (1-4), 1–31.
(28) Lin, G.-F.; Wu, M.-C. A hybrid neural network model for typhoon-rainfall forecasting. J. Hydrol. 2009, 375 (3-4), 450–458.
(29) Chang, F.-J.; Chen, Y.-C. A counterpropagation fuzzy-neural network modeling approach to real time streamflow prediction. J. Hydrol. 2001, 245 (1-4), 153–164.
(30) Imrie, C. E.; Durucan, S.; Korre, A. River flow prediction using artificial neural networks: generalisation beyond the calibration range. J. Hydrol. 2000, 233 (1-4), 138–153.
(31) Triana, E.; Labadie, J. W.; Gates, T. K.; Anderson, C. W. Neural network approach to stream-aquifer modeling for improved river basin management. J. Hydrol. 2010, 391 (3-4), 235–247.
(32) Kerem Cigizoglu, H.; Kisi, Ö. Methods to improve the neural network performance in suspended sediment estimation. J. Hydrol. 2006, 317 (3-4), 221–238.

(33) Cobaner, M.; Unal, B.; Kisi, O. Suspended sediment concentration estimation by an adaptive neuro-fuzzy and neural network approaches using hydro-meteorological data. J. Hydrol. 2009, 367 (1-2), 52–61.
(34) Park, Y.; Kim, M.; Pachepsky, Y.; Choi, S. H.; Cho, J. G.; Jeon, J. Development of a nowcasting system using machine learning approaches to predict fecal contamination levels at recreational beaches in Korea. J. Environ. Qual. 2018.
(35) Huang, W.; Murray, C.; Kraus, N.; Rosati, J. Development of a regional neural network for coastal water level predictions. Ocean Eng. 2003, 30 (17), 2275–2295.
(36) Khalil, B.; Ouarda, T. B. M. J.; St-Hilaire, A. Estimation of water quality characteristics at ungauged sites using artificial neural networks and canonical correlation analysis. J. Hydrol. 2011, 405 (3-4), 277–287.
(37) Sahoo, G. B.; Ray, C.; Wang, J. Z.; Hubbs, S. A.; Song, R.; Jasperse, J.; Seymour, D. Use of artificial neural networks to evaluate the effectiveness of riverbank filtration. Water Res. 2005, 39 (12), 2505–2516.
(38) Chua, L. H. C.; Wong, T. S. W. Improving event-based rainfall–runoff modeling using a combined artificial neural network–kinematic wave approach. J. Hydrol. 2010, 390 (1-2), 92–107.
(39) Hashemi, M. R.; Ghadampour, Z.; Neill, S. P. Using an artificial neural network to model seasonal changes in beach profiles. Ocean Eng. 2010, 37 (14-15), 1345–1356.
(40) Lee, K.-H.; Mizutani, N.; Fujii, T. Prediction of Wave Breaking on a Gravel Beach by an Artificial Neural Network. J. Coast. Res. 2011, 272, 318–328.
(41) Zhang, Z.; Deng, Z.; Rusch, K. A. Development of predictive models for determining enterococci levels at Gulf Coast beaches. Water Res. 2012, 46 (2), 465–474.

(42) Ge, Z.; Frick, W. E. Time-frequency analysis of beach bacteria variations and its implication for recreational water quality modeling. Environ. Sci. Technol. 2009, 43 (4), 1128–1133.
(43) Vijayashanthar, V.; Qiao, J.; Zhu, Z.; Entwistle, P.; Yu, G. Modeling fecal indicator bacteria in urban waterways using artificial neural networks. J. Environ. Eng. 2018, 144 (6), 05018003.
(44) Wu, C. L.; Chau, K. W.; Li, Y. S. Methods to improve neural network performance in daily flows prediction. J. Hydrol. 2009, 372 (1), 80–93.
(45) Adamowski, J.; Sun, K. Development of a coupled wavelet transform and neural network method for flow forecasting of non-perennial rivers in semi-arid watersheds. J. Hydrol. 2010, 390 (1-2), 85–91.
(46) Torrence, C.; Compo, G. P. A Practical Guide to Wavelet Analysis. Bull. Am. Meteorol. Soc. 1998, 79 (1), 61–78.
(47) Różyński, G.; Reeve, D. Multi-resolution analysis of nearshore hydrodynamics using discrete wavelet transforms. Coast. Eng. 2005, 52 (9), 771–792.
(48) Kaczmarek, J.; Rozynski, G.; Pruszak, Z. Long period oscillations in the longshore current on a sandy, barred coast investigated with singular spectrum analysis. Oceanologia 2005, 47 (1), 5–25.
(49) Phanikumar, M. S.; Aslam, I.; Shen, C.; Long, D. T.; Voice, T. C. Separating surface storage from hyporheic retention in natural streams using wavelet decomposition of acoustic Doppler current profiles. Water Resour. Res. 2007, 43 (5), 576–576.
(50) Qi, X.; Neupauer, R. M. Wavelet analysis of characteristic length scales and orientation of two-dimensional heterogeneous porous media. Adv. Water Resour. 2010, 33 (4), 514–524.

(51) Adamowski, J. F. Development of a short-term river flood forecasting method for snowmelt driven floods based on wavelet and cross-wavelet analysis. J. Hydrol. 2008, 353 (3-4), 247–266.
(52) Anctil, F.; Tape, D. G. An exploration of artificial neural network rainfall-runoff forecasting combined with wavelet decomposition. J. Environ. Eng. Sci. 2004, 3 (S1), S121–S128(8).
(53) Kim, T.-W.; Valdés, J. B. Nonlinear Model for Drought Forecasting Based on a Conjunction of Wavelet Transforms and Neural Networks. J. Hydrol. Eng. 2003, 8 (6), 319–328.
(54) Kişi, Ö. Neural Networks and Wavelet Conjunction Model for Intermittent Streamflow Forecasting. J. Hydrol. Eng. 2009, 14 (8), 773–782.
(55) Partal, T.; Cigizoglu, H. K. Estimation and forecasting of daily suspended sediment data using wavelet–neural networks. J. Hydrol. 2008, 358 (3-4), 317–331.
(56) Shiri, J.; Kisi, O. Short-term and long-term streamflow forecasting using a wavelet and neuro-fuzzy conjunction model. J. Hydrol. 2010, 394 (3-4), 486–493.
(57) Nevers, M. B.; Boehm, A. B.; Sadowsky, M. J.; Whitman, R. L. Modeling fate and transport of fecal bacteria in surface water. Center for Integrated Data Analytics Wisconsin Science Center 2011.
(58) Francy, D. S.; Brady, A. M. G.; Carvin, R. B.; Corsi, S. R.; Fuller, L. M.; Harrison, J. H. Developing and implementing predictive models for estimating recreational water quality at Great Lakes Beaches: U.S. Geological Survey Scientific Investigations Report 2013, 2013–5166.
(59) Moore, D. S.; Notz, W. I.; Fligner, M. A. The basic practice of statistics, 6th edition. New York, New York: W. H. Freeman and Company. 2013.

(60) Kurtulus, B.; Razack, M. Evaluation of the ability of an artificial neural network model to simulate the input-output responses of a large karstic aquifer: the La Rochefoucauld aquifer (Charente, France). Hydrogeol. J. 2007, 15 (2), 241–254.
(61) Benmouiza, K.; Cheknane, A. Forecasting hourly global solar radiation using hybrid k-means and nonlinear autoregressive neural network models. Energ. Convers. Manage. 2013, 75 (5), 561–569.
(62) Li, G.; Wen, C.; Zheng, W. X.; Chen, Y. Identification of a class of nonlinear autoregressive models with exogenous inputs based on kernel machines. IEEE T. Signal Proces. 2011, 59 (5), 2146–2159.
(63) Adamowski, J.; Sun, K. Development of a coupled wavelet transform and neural network method for flow forecasting of non-perennial rivers in semi-arid watersheds. J. Hydrol. 2010, 390 (1–2), 85–91.
(64) Tiwari, M. K.; Chatterjee, C. Development of an accurate and reliable hourly flood forecasting model using wavelet–bootstrap–ANN (WBANN) hybrid approach. J. Hydrol. 2010, 394 (3-4), 458–470.
(65) Zhu, Q.; Riley, W. J.; Tang, J.; Koven, C. D. Multiple soil nutrient competition between plants, microbes, and mineral surfaces: model development, parameterization, and example applications in several tropical forests. Biogeosci. Discuss. 2016, 12 (5), 4057–4106.
(66) Ricciuto, D. M.; Davis, K. J.; Keller, K. A Bayesian calibration of a simple carbon cycle model: the role of observations in estimating and reducing uncertainty. Global Biogeochem. Cy. 2008, 22 (2).
(67) Haykin, S. S. Kalman Filtering and Neural Networks. John Wiley & Sons, Inc. 2001.

(68) Thupaki, P.; Phanikumar, M. S.; Beletsky, D.; Schwab, D. J.; Nevers, M. B.; Whitman, R. L. Budget Analysis of Escherichia coli at a Southern Lake Michigan Beach. Environ. Sci. Technol. 2010, 44 (3), 1010–1016.
(69) Nekouee, N.; Hamidi, S. A.; Roberts, P. J. W.; Schwab, D. J. A coupled empirical-numerical model for a buoyant river plume in Lake Michigan. Water Air Soil Poll. 2015, 226 (12), 1–15.
(70) He, L. M.; He, Z. L. Water quality prediction of marine recreational beaches receiving watershed baseflow and stormwater runoff in southern California, USA. Water Res. 2008, 42 (10-11), 2563–2573.
(71) Walter, M.; Recknagel, F.; Carpenter, C.; Bormans, M. Predicting eutrophication effects in the Burrinjuck Reservoir (Australia) by means of the deterministic model SALMO and the recurrent neural network model ANNA. Ecol. Model. 2001, 146 (1), 97–113.
(72) Dorner, S. M.; Anderson, W. B.; Slawson, R. M.; Kouwen, N.; Huck, P. M. Hydrologic Modeling of Pathogen Fate and Transport. Environ. Sci. Technol. 2006, 40 (15), 4746–4753.
(73) Haydon, S.; Deletic, A. Development of a coupled pathogen-hydrologic catchment model. J. Hydrol. 2006, 328 (3-4), 467–480.
(74) Panday, S.; Huyakorn, P. S. A fully coupled physically-based spatially-distributed model for evaluating surface/subsurface flow. Adv. Water Resour. 2004, 27 (4), 361–382.
(75) Shen, C.; Phanikumar, M. S. A process-based, distributed hydrologic model based on a large-scale method for surface–subsurface coupling. Adv. Water Resour. 2010, 33 (12), 1524–1541.

(76) Safaie, A.; Litchman, E.; Phanikumar, M. S. Evaluating the role of groundwater in circulation and thermal structure within a deep, inland lake. Adv. Water Resour. 2017, 108, 310–327.
(77) Niu, J.; Phanikumar, M. S. Modeling watershed-scale solute transport using an integrated, process-based hydrologic model with applications to bacterial fate and transport. J. Hydrol. 2015, 529 (1), 35–48.

Tables

Table 1. The model results for LGEC during the testing period.

Model    Type    Input parameters                                              Site  R²    RMSE
Model 1  NIO     LGEC at BD site                                               OD1   0.53  0.33
                                                                               OD2   0.43  0.53
                                                                               OD3   0.46  0.32
Model 2  NAR     LGEC at BD site                                               BD    0.8   0.11
                 LGEC at OD1 site                                              OD1   0.38  0.24
                 LGEC at OD2 site                                              OD2   0.34  0.41
                 LGEC at OD3 site                                              OD3   0.59  0.45
Model 3  NARX    LGEC, LGDISCH24HR, BD_LNTURB, LNTURB, WTEMP4HR, Rainfall      BD    0.86  0.16
                                                                               OD1   0.8   0.15
                                                                               OD2   0.82  0.31
                                                                               OD3   0.8   0.23
Model 4  NARX    LGEC, LNTURB, LNWHSIG4HR, WNDSPD_ONSHORE, WTEMP4HR, Rainfall  BD    0.94  0.07
                                                                               OD1   0.77  0.26
                                                                               OD2   0.83  0.23
                                                                               OD3   0.82  0.29
Model 5  WA-NAR  Wavelet-decomposed coefficients of LGEC                       BD    0.86  0.05
                                                                               OD1   0.62  0.07
                                                                               OD2   0.57  0.1
                                                                               OD3   0.62  0.11

Table 2. The model performance metrics of R² and RMSE for different prediction time scales.

Site  Prediction time (d)  R²    RMSE
BD    0.25                 0.83  0.04
      0.5                  0.8   0.04
      1                    0.91  0.07
OD1   0.25                 0.51  0.06
      0.5                  0.34  0.06
      1                    0.77  0.07
OD2   0.25                 0.7   0.08
      0.5                  0.62  0.08
      1                    0.56  0.15
OD3   0.25                 0.53  0.08
      0.5                  0.54  0.08
      1                    0.6   0.13

Figures

Figure 1. Map of southern Lake Michigan showing the beach sites and the tributaries.

Figure 2. (a and b) Performance of the NIO model with LGEC data at BD site as input: (a) Linear regression plot of the simulation results at OD1. (b) Time series of data and simulations at OD1. (c and d) Performance of the NAR model with original LGEC data at the beach site as input: (c) Linear regression plot of the simulation results at OD1. (d) Time series of data and simulations at OD1. (e and f) Performance of the NARX model (Model 3) with LGEC, LGDISCH24HR, LNTURB, WTEMP4HR and rainfall as inputs: (e) Linear regression plot of the simulation results at OD1. (f) Time series of data and simulations at OD1. In the scatterplots (panels on the left), blue, red and green symbols denote data used during the training, validation and test periods, respectively, while lines of the same color represent the best-fit lines. The black dashed line in the left panels represents the perfect fit (1:1) line. The time series of original data (black circles) and simulation results (blue, red and green lines) are shown in the panels on the right.

Figure 3. Performance of the WA-NAR model at BD site based on wavelet decomposition of LGEC as inputs. (a) Linear regression plot of the simulation results. (b) Time series of data and simulations. In the left panel, the x- and y-axes represent the observed and simulated LGEC values.

[Figure 2 graphic not reproduced. Left panels (a), (c), (e): predicted versus observed log10(E. coli) for Models 1, 2 and 3, with the 1:1 line and the training, validation and testing data and fits. Right panels (b), (d), (f): time series of observed and simulated log10(E. coli) versus Julian day (2008). Annotated R² values: Model 1, training 0.58, validation 0.34, testing 0.53; Model 2, training 0.41, validation 0.21, testing 0.38; Model 3, training 0.81, validation 0.85, testing 0.8.]

[Figure 3 graphic not reproduced. Panel (a): predicted versus observed log10(E. coli), with the 1:1 line and the training, validation and testing data and fits; annotated R² values are 0.88 (training), 0.64 (validation) and 0.86 (testing). Panel (b): time series of observed and simulated log10(E. coli) versus Julian day (2008).]