Environmental Modeling
Real-Time Nowcasting of Microbiological Water Quality at Recreational Beaches: A Wavelet and Artificial Neural Network Based Hybrid Modeling Approach
Juan Zhang, Han Qiu, Xiaoyu Li, Jie Niu, Meredith Becker Nevers, Xiaonong Hu, and Mantha S Phanikumar
Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.8b01022 • Publication Date (Web): 29 Jun 2018
Juan Zhang (1), Han Qiu (2), Xiaoyu Li (3), Jie Niu (1)*, Meredith Nevers (4), Xiaonong Hu (1), and Mantha S. Phanikumar (2)*
(1) Institute of Groundwater and Earth Sciences, Jinan University, Guangzhou 510632, China.
(2) Department of Civil and Environmental Engineering, Michigan State University, East Lansing, MI
(3) Department of Mathematics and Statistics, Auburn University, Auburn, AL
(4) USGS Great Lakes Science Center, Lake Michigan Ecological Research Station, Chesterton, IN 46304
*Corresponding Authors e-mails: [email protected], [email protected], Phone: (517) 432-0851
Abstract
The number of beach closings due to bacterial contamination has continued to rise in recent years, putting beachgoers at risk of exposure to contaminated water. Current approaches predict levels of indicator bacteria using regression models containing a number of explanatory variables. Data-based modeling approaches can supplement routine monitoring data and provide highly accurate short-term forecasts of beach water quality. In this paper, we apply the nonlinear autoregressive network with exogenous inputs (NARX) method with explanatory variables to predict Escherichia coli (E. coli) concentrations at four Lake Michigan beach sites. We also apply the nonlinear input-output network (NIO) and nonlinear autoregressive neural network (NAR) methods in addition to a hybrid wavelet-NAR (WA-NAR) model and demonstrate their application. All models were tested using 3 months of observed data. Results revealed that the NARX models provided the best performance and that the WA-NAR model, which requires no explanatory variables, outperformed the NIO and NAR models; therefore, the WA-NAR model is suitable for application to data-scarce regions. The models proposed in this paper were evaluated using multiple performance metrics, including sensitivity and specificity measures, and produced results comparable or superior to previous mechanistic and statistical models developed for the same beach sites. The relatively high R² values between data and the NARX models (R² ~ 0.8 for the beach sites and ~0.9 for the river site) indicate that the new class of models shows promise for beach management.
1. Introduction
Levels of fecal indicator bacteria (FIB) are monitored at coastal and inland recreational beaches to protect the public from exposure to contaminated water.1 According to the NRDC,2 10% of all monitoring samples in 2013 exceeded EPA's benchmark value, and the Great Lakes region, which included 902 coastal beaches from 8 US states in 2013, had the highest exceedance rate (13%) of all beaches in that region, indicating that swimmers continue to face health risks from microbiological pollution at beaches. Beach water quality is highly dynamic and can change within minutes, depending on a number of environmental factors; therefore, to be useful and protective of public health, notifications should be issued in a timely manner. Traditional culture-based methods, however, require an 18- to 24-hour laboratory assay, during which time beach-goers could potentially be exposed to contaminated water. Alternative methods for beach management, including rapid methods such as quantitative polymerase chain reaction (qPCR) and predictive modeling, have continued to receive significant attention in recent years.3-5 Even with rapid methods, real-time or even same-day water quality results may not always be available.1 While faster techniques that provide results in 6 hours or less have been developed and tested,6-10 using qPCR requires significant up-front investment and expertise for analysis. The US EPA has encouraged the use of qPCR for enterococci, and the 2012 Recreational Water Quality Criteria provide a Beach Action Value (BAV) for qPCR-based testing of 1,000 CCE (calibrator cell equivalents)/100 mL.
Due to the lag time between the time of sampling and the time when results are available, models are useful tools to supplement monitoring data.1 A variety of models have been used in the past to forecast FIB levels at beaches, and they generally belong to two broad categories: mechanistic or process-based models,4,11-17 which are based on conservation principles, and statistical regression models and other data-based approaches.3,4,18-19 Mechanistic models of beach water quality require detailed information on sources, including flow and FIB levels from tributaries. While mechanistic models are excellent tools for gaining insights into key processes,4 they require considerable resources and training for model setup, running and post-processing. If the objective is to protect public health by making timely and accurate predictions at beach sites, data-based approaches may be attractive since the technology can be transferred from one site to another with relative ease.
While multiple regression (MR) models of beach water quality have received considerable attention in the past due to their simple and robust nature and ease of implementation, other data-based methods such as Artificial Neural Networks (ANN)20 have the potential to improve the accuracy of forecasts.21-24 These methods, however, have not found widespread application in beach management. ANNs were successfully used in the past to forecast many of the factors that directly or indirectly impact FIB levels at beaches, including rainfall patterns,25-28 stream flows,29-31 suspended sediment concentrations,32-34 lake water levels,35-37 wave breaking and changes in beach profiles.38-40 The success of these earlier studies provides ample motivation for applying ANN methods to beach management. Zhang et al.41 applied an ANN model to forecast levels of enterococci at Holly Beach in Louisiana. They used 15 environmental variables such as salinity, water temperature and wind speed as inputs to their model. For datasets collected during years 2007-2009, the performance of their ANN model was superior (linear correlation coefficient, LCC = 0.857) to that of multiple-regression (MR) models implemented in the US EPA Virtual Beach model42 (e.g., LCC = 0.337 for a nonlinear Virtual Beach model for the same dataset). The use of explanatory environmental variables as inputs to beach water quality models is a well-known approach in the development of multiple-regression models. In another study,43 ten environmental variables were used as inputs to forecast the FIB concentrations of an urban waterway in the Chicago River using the ANN method. ANN models have advantages over traditional MR models when the underlying function approximating the data is highly complex and nonlinear. In fact, ANN models are known to work even when the underlying function cannot be expressed in terms of any known mathematical functions. Therefore, the ANN model, as in the approach of Zhang et al.,41 can be viewed as a progression of the MR approach.
In addition to these previous studies, it is also possible to use ANNs in a fundamentally different data-based approach: to generate short-term forecasts that rely on data alone without any explanatory variables.44 The idea is that if a suitable method of time series analysis exists, then the information needed to make short-term forecasts is all contained within the data, and no other explanatory variable is required to provide additional information. Although the ANN methods described above have proven extremely useful for making short- and long-term forecasts in many areas of science and engineering, microbiological water quality data are characterized by nonstationarity (i.e., statistical properties such as the mean and variance do not remain constant over time). ANN methods have well-known limitations in dealing with nonstationary datasets,45 while wavelet transforms can be robust tools for handling nonstationary datasets. Wavelet methods have been used in the last two decades for analysis of both time series and spatial data.46 An important feature of wavelet analysis is the ability to decompose the original data into high and low frequency contributions (i.e., fine and coarse features in the data) for further analysis. Wavelets have been used, for example, for analysis of nearshore hydrodynamics47 including longshore currents along a sandy coast48 and to address questions related to solute transport in rivers49 and heterogeneous porous media.50 Therefore, by combining the strengths of wavelet transforms (the ability to deal with nonstationary data) with those of ANN methods (the ability to deal with nonlinearity), a powerful class of hybrid methods can be constructed for reliable, short-term forecasts.
Wavelet-ANN (WANN) methods have been the focus of several recent papers, and the methods produce reliable forecasts of daily river discharge and suspended sediment concentration, rainfall runoff, snow melt-driven floods, and droughts.51-56 The objective of this paper is to explore the possibility of applying ANN and combined wavelet-neural network models for nowcasting FIB levels at recreational beaches in order to support advisory and beach closure decisions. Daily monitoring data are routinely available for many beach sites in the United States; however, as noted earlier, beach management based on water quality sampling alone is inadequate due to a 24-hour delay in running the assays, during which time conditions at the beach can change quickly. Therefore, the specific questions addressed in this paper include the following: (1) if daily monitoring data are used in an ANN or WANN modeling framework, is it possible to forecast tomorrow’s FIB levels using yesterday’s monitoring data? (2) is it possible to generate forecasts with a high degree of confidence as quantified by standard metrics such as the coefficient of determination (R²) and the root mean square error (RMSE)? (3) how do ANN and WANN methods perform relative to statistical (MR) and fully three-dimensional mechanistic models developed for the same datasets? The first question is related to data requirements and feasibility of the WANN approach, since the method requires sufficiently long time series data to meet the requirement of wavelet decomposition. Since only daily monitoring data are available, a data augmentation method has to be used before applying the WANN model, and this leads to consideration of whether the original data follow a certain distribution, which is considered a data feasibility question. The second question is important because previous modeling activities have generally achieved R² values less than 0.7,57-58 with the exception of Safaie et al.4 For this reason, we set a higher standard for our proposed models to achieve an R² value of 0.7 or better. Generally, an R² value greater than 0.7 is considered a strong relationship, according to Moore et al.;59 therefore, for the third question, if the WANN approach can yield forecasts with R² values comparable to or higher than 0.7 without the use of explanatory variables, then it represents a significant advance over the current model-based approaches for beach management. The datasets used in this research have been selected to facilitate direct comparisons with MR and mechanistic models developed earlier.4 To the best of our knowledge, there is no published work that examines the application of the WANN approach for forecasting FIB levels at recreational beaches, and one of our aims in this paper is to fill this gap.
2. Methods
Three types of ANN models, the Nonlinear Input-Output (NIO),60 the Nonlinear Autoregressive neural network (NAR),61 and the Nonlinear Autoregressive Network with eXogenous inputs (NARX)62 network, are applied in this work.20 The details of these three models are described in the Supporting Information (SI), sections S1-S3. In all models, the dependent variable of interest from the point of prediction is the log10-transformed E. coli data (referred to as LGEC herein). Briefly, the NIO model (Model 1) predicts E. coli levels at the three beach sites (OD1, OD2, OD3) using past values of E. coli data at the river mouth (BD), since discharge from the river mouth is known to impact bacteria levels at the three beaches.4 The NAR model (Model 2), on the other hand, predicts E. coli levels at any site using only the past values of E. coli at the same site. Neither the NIO nor the NAR model uses explanatory variables other than the E. coli data, unlike the NARX model, which uses explanatory variables such as turbidity and wave height in addition to the past values of E. coli at the site where predictions are being made. The input parameters used in the NARX models in our work (Models 3 and 4) included (in addition to the E. coli data): (1) 24-hour rainfall, (2) 4-hour water temperature (WTEMP4HR), (3) natural log of turbidity at the beach sites (LNTURB), (4) daily log10(discharge) from the river mouth (LGDISCH24HR), (5) an interaction term that is the product of daily discharge and turbidity at the river mouth (LOAD24), (6) onshore wind speed, which is the speed of wind blowing from the lake towards the land (WNDSPD_ONSHORE), and (7) natural log of the 4-hour significant wave height (LNWHSIG4HR). Model 3 used parameters 1, 2, 3, 4, and 5 in the above list while Model 4 used 1, 2, 3, 6, and 7. The wavelet decomposition technique (described in section S4 in the Supporting Information) is combined with the NAR model to further improve the accuracy of the NAR model prediction. In ANN models, the available data are usually divided into three subsets: training, validation and testing periods (more information is available in the Supporting Information). Since the model used for beach management will be first trained and validated, we report the testing R² and RMSE values for all models. Moreover, the model sensitivity and specificity metrics58 are also used to assess model performance. Sensitivity (specificity) refers to the probability that the models correctly predict an exceedance (non-exceedance) of the EPA standard (i.e., the BAV). These metrics provide additional information on the usefulness of the models for beach management.
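The sensitivity and specificity definitions above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `sensitivity_specificity`, the sample values, and the threshold of log10(235) (an assumed E. coli Beach Action Value of 235 CFU/100 mL, which this section does not state) are all assumptions.

```python
import math


def sensitivity_specificity(observed, predicted, threshold):
    """Return (sensitivity, specificity) for exceedance of `threshold`.

    Sensitivity: fraction of observed exceedances that are also predicted.
    Specificity: fraction of observed non-exceedances that are also predicted.
    """
    tp = fn = tn = fp = 0
    for obs, pred in zip(observed, predicted):
        if obs > threshold:          # observed exceedance
            if pred > threshold:
                tp += 1
            else:
                fn += 1
        else:                        # observed non-exceedance
            if pred > threshold:
                fp += 1
            else:
                tn += 1
    # Undefined ratios are returned as NaN (a choice made for this sketch).
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Assumed threshold on the log10 scale (hypothetical value for illustration).
BAV_LOG10 = math.log10(235.0)

# Made-up LGEC values, not data from the study.
obs_lgec = [1.8, 2.5, 2.6, 2.0, 1.5, 2.9]
sim_lgec = [1.9, 2.4, 2.3, 2.1, 1.6, 2.7]
sens, spec = sensitivity_specificity(obs_lgec, sim_lgec, BAV_LOG10)
```

Note that the sketch returns NaN when there are no observed exceedances, whereas the Results section reports zero sensitivity in that situation; either convention can be used as long as it is stated.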
The focus of the present paper is on nowcasting bacterial levels at four sites in Southern Lake Michigan. In some previous applications, the wavelet decomposition was performed on the whole time series, including the testing period, a practice that is misleading because it leads to highly inflated values of R². In the present work, by contrast, the network has no information on the wavelet coefficients for future data while making a forecast, since the wavelet decomposition for the forecast period has not yet been done.
2.1 The WA-NAR hybrid model:
A WA-NAR hybrid model is obtained by combining the two methods, the NAR and the discrete wavelet transform (DWT). The wavelet decomposition coefficients of the FIB data are passed into the NAR model to make a short-term forecast. We can examine the correlation between different wavelet components and the original time series data. For the WA-NAR model inputs, the original LGEC time series of each site are decomposed into various detail components (W's) at different resolution levels and one approximation component (C) at the last coarse resolution level using the à trous algorithm.63 A key detail involves the number of wavelet levels required to approximate the original data. Although there is no existing theory to tell how many resolution levels are needed for any given time series, it is generally believed that $L = \mathrm{int}[\log_{10}(N)]$ levels are needed,64 where $L$ denotes the number of wavelet levels and $N$ is the number of data points in the time series.
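The decomposition into detail components (W) and one approximation component (C), together with the level rule L = int[log10(N)], can be illustrated with a minimal sketch. The smoothing filter below (a causal two-point Haar average) is an assumption made for illustration; the filter actually used is described in the Supporting Information and is not given in this section. The function names `n_levels` and `a_trous_haar` are hypothetical.

```python
import math

import numpy as np


def n_levels(n_points):
    """Number of wavelet levels from the rule L = int[log10(N)]."""
    return int(math.log10(n_points))


def a_trous_haar(x, levels):
    """Additive a trous decomposition with a causal Haar smoothing filter.

    Returns (details, approximation) with x == sum(details) + approximation.
    Each smoothed value uses only the current and earlier samples, so no
    future information leaks into the coefficients.
    """
    c = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        step = 2 ** j                                   # filter "holes" widen per level
        idx = np.maximum(np.arange(c.size) - step, 0)   # clamp at the series start
        c_next = 0.5 * (c + c[idx])                     # coarser approximation C
        details.append(c - c_next)                      # detail component W
        c = c_next
    return details, c


# Synthetic series for illustration only (not study data).
signal = np.sin(np.linspace(0.0, 20.0, 200)) + 0.01 * np.arange(200)
W, C = a_trous_haar(signal, levels=3)
```

Because each detail is the difference between successive approximations, the components sum back to the original series exactly, which is what allows the sub-series to be fed to a forecasting model and recombined afterwards.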
Monitoring programs in the Great Lakes region start with the beginning of the beach season and continue through summer (approximately late May-late August). Assuming the availability of daily data for a 3-month period not counting weekends, we get $N \approx 80$ and $L = 1$. This means that data obtained from a daily monitoring program are not sufficiently dense for a straightforward application of the wavelet theory; however, monitoring data can be augmented / up-sampled in several ways to facilitate the application of the WANN method. The approaches essentially make an assumption about how the data vary between sampling times (successive days). One approach is to augment the monitoring data using model hindcasts based on a well-tested mechanistic model that uses small time steps. This represents a process-based method of interpolating data on successive days, allowing the generation of high-resolution time series data (e.g., hourly or every minute) needed for the WA-NAR approach. Alternative approaches include standard interpolation methods, the simplest being linear interpolation, and Markov Chain Monte Carlo65-66 (MCMC) sampling with Kalman filtering67 to remove the noise. The choice of the method should depend on the factors that contribute to elevated bacterial levels at the sites. At the beach sites considered in the present work, loading from nearby tributaries was the primary factor18 contributing to the pollution. The presence or absence of a peak at a beach on any given day depends on the direction of the river plume within the lake environment16,68 (i.e., plume traveling toward the beach or away from it). River plume directions generally shift over the course of several days, although within-day shifts are possible;4,16,68-69 therefore, daily sampling provides a reasonable temporal resolution on most days from the point of capturing the peaks. We note, however, that time scales shorter than the diurnal scale cannot be resolved using daily monitoring data and statistical sampling, whereas mechanistic model hindcasts have the ability to resolve additional details (e.g., peaks) at the sub-diurnal time scales.
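As a point of reference for the up-sampling options listed above, the simplest one (linear interpolation) can be sketched as follows. This is not the method adopted in the paper, which uses MCMC sampling with Kalman filtering; the function name `upsample_linear`, the half-hourly step, and the sample values are assumptions for illustration.

```python
import numpy as np


def upsample_linear(t_days, values, step_days=1.0 / 48.0):
    """Linearly interpolate irregular daily samples onto a regular fine grid.

    t_days: sampling times in days (increasing; gaps such as weekends allowed).
    values: measurements at those times (e.g., log10 E. coli).
    step_days: output resolution; 1/48 day corresponds to half-hourly data.
    """
    t_days = np.asarray(t_days, dtype=float)
    values = np.asarray(values, dtype=float)
    n_steps = int(round((t_days[-1] - t_days[0]) / step_days))
    t_fine = t_days[0] + step_days * np.arange(n_steps + 1)
    return t_fine, np.interp(t_fine, t_days, values)


# Three made-up daily LGEC samples (not study data).
t_obs = [0.0, 1.0, 2.0]
lgec_obs = [2.0, 3.0, 2.5]
t_fine, lgec_fine = upsample_linear(t_obs, lgec_obs)
```

Linear interpolation assumes concentrations change at a constant rate between samples, so it cannot introduce sub-daily peaks; that limitation is one reason a stochastic or process-based augmentation may be preferred.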
After comparing the time series data obtained by the Markov Chain Monte Carlo (MCMC) approach at half-daily resolution with mechanistic modeling results,4 we decided to use half-hourly data based on the MCMC method combined with Kalman filtering for the application of the WA-NAR model. The half-hourly “observed” data prior to the current time of prediction were generated by using the semi-diurnal time series data. More details are provided in the Supporting Information section S5. The WA-NAR method was implemented using the wavelet and neural network toolboxes in MATLAB version 9.1, R2016b (The Mathworks Inc., Natick, MA). We also implemented the above method to decompose the variables that have strong correlation with LGEC. All the wavelet decomposition coefficients of input variables are used as inputs into the NAR model for nowcasting. For the newly generated half-hourly data, the first 70% of E. coli concentration data (1361 hours) were used for training; 15% of data (291 hours) were used for validation and the remaining 15% were used for testing.
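The 70% / 15% / 15% division described above can be sketched as a simple index split. The sketch assumes the three subsets are contiguous blocks in time order (the text states that the first 70% is used for training); the function name `chronological_split` and the rounding rule are assumptions.

```python
def chronological_split(n_points, train_frac=0.70, val_frac=0.15):
    """Split indices 0..n_points-1 into contiguous train/validation/test ranges."""
    n_train = int(round(train_frac * n_points))
    n_val = int(round(val_frac * n_points))
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, n_points)   # whatever remains is the test block
    return train, val, test


# Illustrative series length, not the length used in the study.
train_idx, val_idx, test_idx = chronological_split(1000)
```

Keeping the test block at the end of the record means that test-period performance reflects genuine out-of-sample forecasting rather than interpolation between training points.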
2.2 Site description
To demonstrate the application of the WA-NAR method to nowcast elevated FIB levels at beaches, we use E. coli data collected during the summer of 2008 at four sites in southern Lake Michigan. Of the four sites, one site (BD) represents the river mouth while the remaining three sites (OD3, OD2, OD1 in increasing distance from BD, see map in Figure 1) are beach sites. The three beach sites are primarily impacted by contamination from the river mouth at BD; therefore, E. coli and turbidity values at the BD site are important inputs that control the dynamics of E. coli at the three beach sites.4 Additional details of the sites and the data can be found in Safaie et al.4
Burns Ditch (BD) is the river mouth of the Little Calumet River in northwest Indiana. The three sampling locations OD1, OD2, and OD3 are located to the west of the outfall in the town of Ogden Dunes. E. coli concentrations in Burns Ditch are historically high, and likely influence shoreline E. coli concentrations, depending on the flow direction of the river plume.18 During 2008, water samples were collected at each location and analyzed for E. coli within 4-6 hours in the laboratory using a defined substrate technology (IDEXX Inc., Westbrook, ME). For modeling purposes, E. coli data were log10-transformed since the values can vary by orders of magnitude. Water samples were also analyzed in the laboratory for turbidity (NTU; 2100N Turbidimeter, Hach Company, Loveland, CO).
3. Results
The R² and RMSE values for the testing period data sets of all models are listed in Table 1 while the results of R² and RMSE during all three periods (training, validation and testing) are shown in Table S6 (in the Supporting Information). We now report the testing period R² and RMSE values as pairs in the following sections.
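For readers who wish to reproduce the (R², RMSE) pairs reported below on their own data, a minimal sketch follows. It assumes R² is computed as the squared Pearson correlation between observed and simulated LGEC, which this section does not specify; the function name `r2_rmse` and the sample values are hypothetical.

```python
import numpy as np


def r2_rmse(observed, simulated):
    """Return (R^2, RMSE), with R^2 taken as the squared Pearson correlation."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    return float(r ** 2), float(rmse)


# Made-up values: the simulation tracks the data perfectly but with a 0.5 offset.
obs_demo = np.array([1.0, 2.0, 3.0, 4.0])
sim_demo = obs_demo + 0.5
r2_demo, rmse_demo = r2_rmse(obs_demo, sim_demo)
```

The biased example illustrates why the two metrics are reported together: a correlation-based R² is insensitive to a constant offset, while RMSE captures it.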
3.1 The NIO model
Figures 2(a) and S2 show the results of Model 1, the NIO model. The value of (R², RMSE) is (0.53, 0.33) during the testing period at the OD1 site, which is the farthest beach site from the waterway (Figure 1). The (R², RMSE) values for the testing period are (0.43, 0.53) and (0.46, 0.32) at the OD2 and OD3 sites, respectively (Table 1).
3.2 The NAR model
The past values of the original LGEC data were used as inputs to predict the LGEC values at all four sites. The performance metrics for Model 2, the NAR model, are shown in Figures 2(c) and S3. The value of (R², RMSE) is (0.8, 0.11) during the testing period at the BD site (Figure S3(a)). Figure 2(c) shows model performance during the three periods of training, validation and testing at the OD1 site, with an (R², RMSE) value of (0.38, 0.24) during the testing period. The values are (0.34, 0.41) and (0.59, 0.45) at the OD2 and OD3 sites (Figure S3), respectively.
3.3 The NARX model
In order to determine the inputs of the NARX model from the ten environmental factors measured at the sites (see Supporting Information for details and explanation of variable names), cross-correlations between the ten environmental factors and LGEC for different time lags were examined in addition to the auto-correlation of LGEC. The results are listed in Tables S1-S4 for the four sites. Two sets of NARX models were developed based on different combinations of explanatory variables for nowcasting LGEC datasets at the four sites. The NARX models are trained and tested based on different combinations of time series at all four sites such that environmental variables with similar values of correlation coefficients and with a strong correlation with E. coli concentrations fall into similar groups. Four-hour water temperature (WTEMP4HR) and daily rainfall are also used as input variables for the NARX models; they were found to be important because they affect the FIB concentrations and contribute to elevated contaminant levels in the receiving water.70 Rainfall data come from the National Park Service weather station located in Porter, IN. We used aggregated rainfall data from the previous 24 hours as an explanatory variable. Figures 2(e) and S4 show the Model 3 (NARX model) results based on LGDISCH24HR (log10 of 24-hour discharge from the river), BD_LNTURB (natural log of turbidity at the river mouth BD), LNTURB (natural log of turbidity at the beach site), water temperature (WTEMP4HR), rainfall and LGEC as inputs, which are the parameters with high correlation coefficients with the dependent variable LGEC at the four sites. The R² and RMSE during the testing period are listed in Table 1, and the (R², RMSE) values during all periods are shown in Table S6. Figure S4(a) shows the Model 3 result at the BD site. The value of (R², RMSE) for the testing period is (0.86, 0.16). Figure 2(e) shows the performance of Model 3 at the OD1 site with an (R², RMSE) value of (0.8, 0.15) during the testing period. The corresponding (R², RMSE) values are (0.82, 0.31) and (0.80, 0.23) at sites OD2 and OD3, respectively.
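The lagged cross-correlation screening used above to choose NARX inputs can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `lagged_correlation` is hypothetical, the Pearson coefficient is assumed as the correlation measure, and the series are synthetic rather than the environmental data of the study.

```python
import numpy as np


def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x(t - lag) and y(t) for lag = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    result = {}
    for lag in range(max_lag + 1):
        if lag == 0:
            xs, ys = x, y
        else:
            xs, ys = x[:-lag], y[lag:]   # pair x at time t-lag with y at time t
        result[lag] = float(np.corrcoef(xs, ys)[0, 1])
    return result


# Synthetic example: the response equals the driver delayed by two steps.
rng = np.random.default_rng(0)
driver = rng.normal(size=200)
response = np.roll(driver, 2)
corr_by_lag = lagged_correlation(driver, response, max_lag=5)
best_lag = max(corr_by_lag, key=corr_by_lag.get)
```

Applying the same function with `x` and `y` set to the same series gives the auto-correlation at each lag, which is the quantity examined for LGEC itself.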
In another version of the NARX model (Model 4, Table 1), the input variables LNTURB, LNWHSIG4HR, WNDSPD_ONSHORE, WTEMP4HR, rainfall and LGEC are used and tested at all four sites. The results of Model 4 are shown in Figure S5. The R² and RMSE values during the testing period are shown in Table 1, and the corresponding values for all three periods are listed in Table S6.
3.4 The WA-NAR hybrid model
E. coli data spanning a period of 86 days (from early June to the end of August 2008) are available for developing the models. Details of data augmentation / up-sampling are included in the Supporting Information and the results based on three wavelet levels are reported in Table S5. Since the number of wavelet coefficients used in the model increases with the number of wavelet levels, we have attempted to use an optimum number of wavelet levels that minimizes forecast errors and avoids over-fitting.
As can be expected, the most important components are those that have a high correlation with the original data. The correlation coefficients between the last level of the previous sub-time series and the original time series (e.g., $r(t-1, t)$ and $r(t-2, t)$ etc.) are shown in Table S5 for the detail (W) and approximation (C) components for the E. coli data at all four sites. Based on the strength of these correlation coefficients, the important sub-time series components ($W_{i,n}$, $C_n$; $i = 2, 3, 4$; $n = 1, 2, \ldots$, where $n$ is the lag time in hours) were used as the inputs to the WA-NAR model.
The wavelet decomposition coefficients of the past values of the half-hourly LGEC data were used as input to forecast LGEC. The scatterplot and time series comparing the observed and simulated LGEC datasets at the BD site using Model 5, the WA-NAR model, are shown in Figure 3 and Figure S7. The (R², RMSE) value for the testing period at the BD site is (0.86, 0.05) (Table 1). The corresponding values at the OD1, OD2 and OD3 sites are (0.62, 0.07), (0.57, 0.1) and (0.62, 0.11), respectively. Higher R² and lower RMSE values were obtained for the BD site relative to the sites at OD1, OD2 and OD3.
As explained in the methods section and the Supporting Information, after training the WA-NAR network, the network was used to predict the future LGEC at the four sampling sites. The input values are the random time series that were generated by MCMC slice sampling with the original observations retained at the original sampling times. When the NAR network is “open”, the feedback loop performs only a one-step-ahead prediction. When the loop of the NAR network is closed after the training of the model is completed, the network performs multi-step-ahead predictions. The predictions were done at 0.25-, 0.5- and 1-day (i.e., 12, 24, and 48 half-hourly data points, respectively) time scales, and the R² and RMSE values were calculated for all three cases (Table 2). The (R², RMSE) values for 0.25-day ahead prediction are (0.83, 0.04), (0.51, 0.06), (0.7, 0.08) and (0.53, 0.08) at the four sites, respectively. Similarly, the (R², RMSE) results for the 0.5- and 1-day ahead predictions are listed in Table 2.
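The difference between open-loop (one-step-ahead) and closed-loop (multi-step-ahead) prediction can be illustrated with a small sketch. To keep the example self-contained, a linear autoregression fitted by least squares stands in for the NAR neural network; this substitution, the function names, and the synthetic series are all assumptions, and only the loop mechanics correspond to the description above.

```python
import numpy as np


def fit_ar(series, n_lags):
    """Least-squares fit of y(t) = sum_k coef[k] * y(t - k - 1)."""
    y = np.asarray(series, dtype=float)
    # Column k holds y(t - k - 1) for every target time t = n_lags .. len(y) - 1.
    X = np.column_stack(
        [y[n_lags - k - 1: len(y) - k - 1] for k in range(n_lags)]
    )
    coef, *_ = np.linalg.lstsq(X, y[n_lags:], rcond=None)
    return coef


def predict_open_loop(series, coef):
    """One-step-ahead forecasts: every forecast is fed the true past values."""
    y = np.asarray(series, dtype=float)
    p = len(coef)
    return np.array([coef @ y[t - p:t][::-1] for t in range(p, len(y))])


def predict_closed_loop(history, coef, n_steps):
    """Multi-step forecasts: each forecast is fed back as an input."""
    buf = list(np.asarray(history, dtype=float)[-len(coef):])
    out = []
    for _ in range(n_steps):
        nxt = float(coef @ np.array(buf[::-1]))
        out.append(nxt)
        buf = buf[1:] + [nxt]        # slide the window over the model's own output
    return np.array(out)


# Synthetic decaying series (not study data): y(t) = 0.9 * y(t - 1).
demo = 0.9 ** np.arange(50)
ar_coef = fit_ar(demo[:40], n_lags=1)
one_step = predict_open_loop(demo, ar_coef)
multi_step = predict_closed_loop(demo[:40], ar_coef, n_steps=10)
```

With half-hourly data, `n_steps` values of 12, 24 and 48 would correspond to the 0.25-, 0.5- and 1-day horizons. In closed-loop mode errors can accumulate because forecasts are reused as inputs, which is consistent with accuracy generally declining as the horizon grows.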
The sensitivity and specificity metrics for all models are shown in Table S6 and Figure S8. It can be seen that when there are no exceedances the models have zero sensitivity. For all the models evaluated, sensitivities are greater than 0.3, and specificities are greater than 0.9.
4. Discussion
We found that the WA-NAR model was substantially more accurate than the NAR model at all four sites, as shown by the higher R² values and lower RMSE values. The discrete wavelet transform allowed most of the noise in the data to be removed and facilitated the extraction of quasi-periodic and periodic signals from the original time series. The wavelet coefficients obtained by decomposing the original data contain the important information at different temporal scales and can be used to make short-term predictions. The wavelet transform improved the performance of the NAR nowcasting model by providing useful information at various decomposition levels. Hence the WA-NAR model is a potentially useful new method for nowcasting indicator bacteria originating from river plumes and at river beach/park sites. The NARX models also produced accurate predictions when environmental variables were used as inputs. Comparing the ANN model results between the BD site and the OD sites, the BD site models are superior to the OD site models. This result is mainly explained by the geographical setting of the sampling sites. The lake/beach sites (OD sites) are strongly affected by waves, wind, bacterial loading from shoreline sand and bird inputs, resuspension of bottom sediment and many other processes, making it relatively more difficult to make accurate predictions at the beach sites. Our results indicate that FIB levels at river sites can be predicted more accurately than those at beach sites. The river mouth (BD site) has a significant impact on the dynamics of E. coli at the beaches. Therefore, LGEC data at the BD site carry information that can positively influence predictions at the beach sites. The comparisons shown in this work are encouraging and provide motivation to further examine this class of methods for beach management, especially when combined with automated continuous monitoring and prediction of beach water quality.14
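To illustrate how a discrete wavelet transform splits a time series into components at different temporal scales, the sketch below implements a multi-level Haar decomposition and its inverse. The Haar wavelet is used here only because it is the simplest case; it is an assumption and not necessarily the mother wavelet used in this work. The function names and the sample series are hypothetical, and the series length must be divisible by 2 raised to the number of levels.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)


def haar_step(signal):
    """One level of the Haar DWT: pairwise scaled sums and differences."""
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return the coarsest approximation and the details, finest level first."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, rebuilding from the coarsest level upwards."""
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.append((a + d) * SCALE)
            rebuilt.append((a - d) * SCALE)
        current = rebuilt
    return current


# Hypothetical LGEC series of length 8, for illustration only.
series = [2.4, 2.6, 2.5, 2.9, 3.1, 3.0, 2.7, 2.5]
approx, details = haar_decompose(series, levels=3)

# Crude denoising: discard the finest-scale detail coefficients.
smoothed = haar_reconstruct(approx, [[0.0] * len(details[0])] + details[1:])
print([round(v, 2) for v in smoothed])
```

In a WA-NAR style workflow, the approximation and detail components (rather than the raw series) would serve as the network inputs.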
Using half-hourly LGEC data as input (a random time series generated by MCMC slice sampling, with the original observations substituted at the original sampling times), the WA-NAR model was used to nowcast E. coli concentrations at different time scales (0.25, 0.5 and 1 day). The results (Table 2) indicate that the predictions are acceptable for beach management, which answers the first question posed in the Introduction. The standard metrics (R² and RMSE) were used to quantify the performance of each model, and some models have R² values greater than 0.7, indicating a high degree of confidence in the forecast, which answers the second question. Moreover, predictive models with sensitivity greater than 0.3 and specificity greater than 0.9 can also be considered good (Table 1). The ANN models developed in this work are better than the MLR models reported in Park et al.,34 for example. Compared with those from MLR and mechanistic models, the R² and RMSE values obtained here, especially R² values higher than 0.7 for our NARX models and around 0.7 for the WA-NAR model without the use of explanatory variables, indicate that the ANN and wavelet-ANN based approaches considered in this work are promising. The WA-NAR models are particularly appealing for application to data-scarce regions without access to any data on explanatory variables (the third question posed in the Introduction).
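The up-sampling step described above can be sketched as follows. This is a simplified illustration rather than the procedure used in the study: it assumes a normal target distribution fitted to the observed LGEC values, uses a basic univariate slice sampler with stepping-out and shrinkage, and then overwrites the generated values with the observations at their sampling times. All function names, the sampling indices and the sample values are hypothetical.

```python
import math
import random


def slice_sample(log_density, x0, n_samples, width=1.0, rng=None):
    """Univariate slice sampler with stepping-out and shrinkage."""
    rng = rng or random.Random(0)
    x, samples = x0, []
    for _ in range(n_samples):
        # Auxiliary height under the density at x, in log space.
        log_y = log_density(x) + math.log(1.0 - rng.random())
        # Step out until both ends of the interval lie outside the slice.
        left = x - width * rng.random()
        right = left + width
        while log_density(left) > log_y:
            left -= width
        while log_density(right) > log_y:
            right += width
        # Shrink the interval until a point inside the slice is drawn.
        while True:
            candidate = rng.uniform(left, right)
            if log_density(candidate) > log_y:
                x = candidate
                break
            if candidate < x:
                left = candidate
            else:
                right = candidate
        samples.append(x)
    return samples


def upsample_with_observations(obs_index, obs_values, n_steps, rng=None):
    """Generate a half-hourly series and keep observations at their time steps.

    Assumption: the target is a normal distribution fitted to obs_values.
    """
    mean = sum(obs_values) / len(obs_values)
    var = sum((v - mean) ** 2 for v in obs_values) / (len(obs_values) - 1)

    def log_density(x):
        return -0.5 * (x - mean) ** 2 / var

    series = slice_sample(log_density, mean, n_steps, width=math.sqrt(var), rng=rng)
    for t, v in zip(obs_index, obs_values):
        series[t] = v  # retain the original observation
    return series


# Hypothetical daily samples at half-hourly indices 0, 48 and 96.
series = upsample_with_observations([0, 48, 96], [2.4, 2.9, 2.6], n_steps=97)
print(len(series), series[48])
```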
Despite the differences in the input parameters of the two NARX models evaluated in this work (Models 3 and 4), the two models produced comparable performance. Model 4 used the significant wave height and onshore wind speed, two important parameters known to influence E. coli in coastal waters. Onshore winds push river plumes towards the shore, inducing an alongshore current that can carry shore-hugging plumes several kilometers along the shoreline from the river mouth. Wave activity has the potential to resuspend E. coli from bottom sediment. It is encouraging to note that both NARX models (Models 3 and 4) produced high R² values (0.86 and 0.94) at the river site BD. It can be expected that further network optimization and data exploration (including the use of data transformations and identification of significant interaction terms) will lead to additional improvements in the performance of these models, pushing R² values further up, to around 0.9.
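The way a NARX model combines past values of the target with past values of explanatory variables can be sketched by building the lagged input rows explicitly. The helper below is hypothetical; the variable names LNTURB and WTEMP4HR follow Table 1, while the number of delays and the sample values are assumptions chosen for illustration.

```python
def build_narx_rows(target, exogenous, delays):
    """Build (inputs, outputs) pairs for a NARX-style model.

    Each input row holds the previous `delays` values of the target followed
    by the previous `delays` values of every exogenous variable (sorted by
    name); the output is the target value at the current step.
    """
    names = sorted(exogenous)
    inputs, outputs = [], []
    for t in range(delays, len(target)):
        row = list(target[t - delays:t])
        for name in names:
            row.extend(exogenous[name][t - delays:t])
        inputs.append(row)
        outputs.append(target[t])
    return inputs, outputs


# Hypothetical values, for illustration only.
lgec = [2.4, 2.6, 2.5, 2.9]
drivers = {
    "LNTURB": [1.0, 1.2, 1.1, 1.5],
    "WTEMP4HR": [18.0, 18.5, 19.0, 19.2],
}
rows, targets = build_narx_rows(lgec, drivers, delays=2)
print(rows[0], targets[0])
```

A NAR model corresponds to the special case in which `exogenous` is empty, so that each row contains only lagged values of the target.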
The NARX models (e.g., Model 4) described in this paper produced some of the highest values of R² in the literature and are comparable to or better than the statistical and mechanistic modeling results reported earlier.4 The MCMC method was used to generate LGEC data at half-hourly intervals so that tomorrow’s values can be predicted using yesterday’s monitoring data. Such an approach to up-sampling data is also needed at the beginning of the beach season, when the amount of available data is limited. Using the wavelet decomposition data and the training and validation results from the previous year’s beach season is another promising approach, but this is beyond the scope of the present paper. The ANN or WANN models described in this paper can also be applied to recreational waters with different conditions (e.g., marine beaches with tides and the influence of salinity, or beaches where shoreline sand, bird inputs and the presence of breakwaters modify circulation and FIB fate and transport) and to predict harmful algal blooms71 in lakes, which are influenced by several of the same explanatory variables used in this work.
While a majority of the published mechanistic nearshore models use observed data collected in the past to generate model hindcasts, it is possible to link well-tested watershed models72-75 that describe surface and subsurface transport processes76-77 with beach water quality models to generate continuous forecasts in real time. However, very few sites use linked watershed-beach water quality models for making real-time nowcasts of beach water quality, perhaps due to the enormous effort involved in independently testing and linking the two types of models. By including rainfall and other relevant explanatory variables in a NARX or WA-NAR model, the models can represent watershed-scale fate and transport processes that control the fluxes of FIB delivered to downstream receiving water bodies such as lakes. Another class of models with considerable promise for beach management combines wavelet decomposition with the best NARX models, which use explanatory variables such as turbidity as input. This class of WA-NARX models is a natural extension of the WA-NAR models evaluated in this work.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (41530316). We acknowledge the use of the IAN symbol libraries in creating the Abstract Art (http://ian.umces.edu/symbols).
Supporting Information
The methods of ANN models, wavelet decomposition, MCMC sampling and Kalman filtering are presented in Supporting Information sections S1-S5. Figures S1-S8 show the architecture of the NAR neural network, plots of the wavelet decomposition, the models’ performance, the random number generation of the logarithm of E. coli using the MCMC method, and the predicted versus observed LGEC using the ANN models at the OD sites. The auto-correlation coefficients of LGEC and the cross-correlation coefficients between LGEC and other input parameters at the four sites are presented in Tables S1-S4. Auto-correlation coefficients for different wavelet components are included in Table S5. The values of R² and RMSE for the training, validation and test periods are listed in Table S6.
References
(1) U.S. Environmental Protection Agency. Predictive Tools for Beach Notification, Volume I: Review and Technical Protocol; 2010, I, 61.
(2) Dorfman, M.; Haren, A. Testing the Waters, Twenty-fourth Edition, Nat. Resour. Def. Counc. Washington, D.C. 2014.
(3) Nevers, M. B.; Whitman, R. L. Efficacy of monitoring and empirical predictive modeling at improving public health protection at Chicago beaches. Water Res. 2011, 45 (4), 1659–1668.
(4) Safaie, A.; Wendzel, A.; Ge, Z.; Nevers, M. B.; Whitman, R. L.; Corsi, S. R.; Phanikumar, M. S. Comparative Evaluation of Statistical and Mechanistic Models of Escherichia coli at Beaches in Southern Lake Michigan. Environ. Sci. Technol. 2016, 50 (5), 2442–2449.
(5) Whitman, R. L.; Ge, Z.; Nevers, M. B.; Boehm, A. B.; Chern, E. C.; Haugland, R. A.; Lukasik, A. M.; Molina, M.; Przybyla-Kelly, K.; Shively, D. A. Relationship and variation of qPCR and culturable Enterococci estimates in ambient surface waters are predictable. Environ. Sci. Technol. 2010, 44 (13), 5049–5054.
(6) Haugland, R. A. Comparison of enterococcus measurements in freshwater at two recreational beaches by quantitative polymerase chain reaction and membrane filter culture analysis. Water Res. 2005, 39, 559–568.
(7) Wade, T. J. High sensitivity of children to swimming-associated gastrointestinal illness: results using a rapid assay of recreational water quality. Epidemiology 2008, 19, 375–383.
(8) Griffith, J. F.; Weisberg, S. B. Challenges in implementing new technology for beach water quality monitoring: lessons from a California demonstration project. Mar. Technol. Soc. J. 2011, 45 (2), 65–73.
(9) Sheth, N.; McDermott, C.; Busse, K.; Kleinheinz, G. Evaluation of Enterococcus concentrations at beaches in Door County, WI (Lake Michigan, USA) by qPCR and defined substrate culture analysis. J. Great Lakes Res. 2016, 42 (4), 768–774.
(10) Dorevitch, S.; Shrestha, A.; DeFlorio-Barker, S.; Breitenbach, C.; Heimler, I. Monitoring urban beaches with qPCR vs. culture measures of fecal indicator bacteria: Implications for public notification. Environ. Health 2017, 16 (1), 45.
(11) Boehm, A. B.; Keymer, D. P.; Shellenbarger, G. G. An analytical model of enterococci inactivation, grazing, and transport in the surf zone of a marine beach. Water Res. 2005, 39 (15), 3565–3578.
(12) Gao, G.; Falconer, R. A.; Lin, B. Numerical modelling of sediment-bacteria interaction processes in surface waters. Water Res. 2011, 45 (5), 1951–1960.
(13) Ge, Z.; Whitman, R. L.; Nevers, M. B.; Phanikumar, M. S. Wave-induced mass transport affects daily Escherichia coli fluctuations in nearshore water. Environ. Sci. Technol. 2012, 46 (4), 2204–2211.
(14) Ge, Z.; Whitman, R. L.; Nevers, M. B.; Phanikumar, M. S.; Byappanahalli, M. N. Nearshore hydrodynamics as loading and forcing factors for Escherichia coli contamination at an embayed beach. Limnol. Oceanogr. 2012, 57 (1), 362–381.
(15) Kashefipour, S. M.; Lin, B.; Harris, E.; Falconer, R. A. Hydro-environmental modelling for bathing water compliance of an estuarine basin. Water Res. 2002, 36 (7), 1854–1868.
(16) Liu, L.; Phanikumar, M. S.; Molloy, S. L.; Whitman, R. L.; Shively, D. A.; Nevers, M. B.; Schwab, D. J.; Rose, J. B. Modeling the transport and inactivation of E. coli and enterococci in the near-shore region of Lake Michigan. Environ. Sci. Technol. 2006, 40 (16), 5022–5028.
(17) Sanders, B. F.; Arega, F.; Sutula, M. Modeling the dry-weather tidal cycling of fecal indicator bacteria in surface waters of an intertidal wetland. Water Res. 2005, 39 (14), 3394–3408.
(18) Nevers, M. B.; Whitman, R. L. Nowcast modeling of Escherichia coli concentrations at multiple urban beaches of southern Lake Michigan. Water Res. 2005, 39 (20), 5250–5260.
(19) Shively, D. A.; Nevers, M. B.; Breitenbach, C.; Phanikumar, M. S.; Przybyla-Kelly, K.; Spoljaric, A. M.; Whitman, R. L. Prototypic automated continuous recreational water quality monitoring of nine Chicago beaches. J. Environ. Manage. 2016, 166, 285–293.
(20) Samarasinghe, S. Neural Networks for Applied Sciences and Engineering: From Fundamentals to Complex Pattern Recognition, 1st ed.; Auerbach Publications: Boca Raton, 2006.
(21) Thoe, W.; Gold, M.; Griesbach, A.; Grimmer, M.; Taggart, M. L.; Boehm, A. B. Sunny with a chance of gastroenteritis: predicting swimmer risk at California beaches. Environ. Sci. Technol. 2014, 49 (1), 423–431.
(22) Tian, W.; Liao, Z.; Zhang, J. An optimization of artificial neural network model for predicting chlorophyll dynamics. Ecol. Model. 2017, 364, 42–52.
(23) Shen, CP. Deep learning: A next-generation big-data approach for hydrology, Eos, 2018, 99, https://doi.org/10.1029/2018EO095649.
(24) Fang, K.; Shen, C.; Kifer, D.; Yang, X. Prolongation of SMAP to spatio-temporally seamless coverage of continental US using a deep learning neural network. Geophys. Res. Lett. 2017, 44.
(25) Brion, G. M.; Lingireddy, S. A neural network approach to identifying non-point sources of microbial contamination. Water Res. 1999, 33 (14), 3099–3106.
(26) Chen, C.-S.; Chen, B. P.-T.; Chou, F. N.-F.; Yang, C.-C. Development and application of a decision group Back-Propagation Neural Network for flood forecasting. J. Hydrol. 2010, 385 (1-4), 173–182.
(27) French, M. N.; Krajewski, W. F.; Cuykendall, R. R. Rainfall forecasting in space and time using a neural network. J. Hydrol. 1992, 137 (1-4), 1–31.
(28) Lin, G.-F.; Wu, M.-C. A hybrid neural network model for typhoon-rainfall forecasting. J. Hydrol. 2009, 375 (3-4), 450–458.
(29) Chang, F.-J.; Chen, Y.-C. A counterpropagation fuzzy-neural network modeling approach to real time streamflow prediction. J. Hydrol. 2001, 245 (1-4), 153–164.
(30) Imrie, C. E.; Durucan, S.; Korre, A. River flow prediction using artificial neural networks: generalisation beyond the calibration range. J. Hydrol. 2000, 233 (1-4), 138–153.
(31) Triana, E.; Labadie, J. W.; Gates, T. K.; Anderson, C. W. Neural network approach to stream-aquifer modeling for improved river basin management. J. Hydrol. 2010, 391 (3-4), 235–247.
(32) Kerem Cigizoglu, H.; Kisi, Ö. Methods to improve the neural network performance in suspended sediment estimation. J. Hydrol. 2006, 317 (3-4), 221–238.
(33) Cobaner, M.; Unal, B.; Kisi, O. Suspended sediment concentration estimation by an adaptive neuro-fuzzy and neural network approaches using hydro-meteorological data. J. Hydrol. 2009, 367 (1-2), 52–61.
(34) Park, Y.; Kim, M.; Pachepsky, Y.; Choi, S. H.; Cho, J. G.; Jeon, J. Development of a nowcasting system using machine learning approaches to predict fecal contamination levels at recreational beaches in Korea. J. Environ. Qual. 2018.
(35) Huang, W.; Murray, C.; Kraus, N.; Rosati, J. Development of a regional neural network for coastal water level predictions. Ocean Eng. 2003, 30 (17), 2275–2295.
(36) Khalil, B.; Ouarda, T. B. M. J.; St-Hilaire, A. Estimation of water quality characteristics at ungauged sites using artificial neural networks and canonical correlation analysis. J. Hydrol. 2011, 405 (3-4), 277–287.
(37) Sahoo, G. B.; Ray, C.; Wang, J. Z.; Hubbs, S. A.; Song, R.; Jasperse, J.; Seymour, D. Use of artificial neural networks to evaluate the effectiveness of riverbank filtration. Water Res. 2005, 39 (12), 2505–2516.
(38) Chua, L. H. C.; Wong, T. S. W. Improving event-based rainfall–runoff modeling using a combined artificial neural network–kinematic wave approach. J. Hydrol. 2010, 390 (1-2), 92–107.
(39) Hashemi, M. R.; Ghadampour, Z.; Neill, S. P. Using an artificial neural network to model seasonal changes in beach profiles. Ocean Eng. 2010, 37 (14-15), 1345–1356.
(40) Lee, K.-H.; Mizutani, N.; Fujii, T. Prediction of Wave Breaking on a Gravel Beach by an Artificial Neural Network. J. Coast. Res. 2011, 272, 318–328.
(41) Zhang, Z.; Deng, Z.; Rusch, K. A. Development of predictive models for determining enterococci levels at Gulf Coast beaches. Water Res. 2012, 46 (2), 465–474.
(42) Ge, Z.; Frick, W. E. Time-frequency analysis of beach bacteria variations and its implication for recreational water quality modeling. Environ. Sci. Technol. 2009, 43 (4), 1128–1133.
(43) Vijayashanthar, V.; Qiao, J.; Zhu, Z.; Entwistle, P.; Yu, G. Modeling fecal indicator bacteria in urban waterways using artificial neural networks. J. Environ. Eng. 2018, 144 (6), 05018003.
(44) Wu, C. L.; Chau, K. W.; Li, Y. S. Methods to improve neural network performance in daily flows prediction. J. Hydrol. 2009, 372 (1), 80–93.
(45) Adamowski, J.; Sun, K. Development of a coupled wavelet transform and neural network method for flow forecasting of non-perennial rivers in semi-arid watersheds. J. Hydrol. 2010, 390 (1-2), 85–91.
(46) Torrence, C.; Compo, G. P. A Practical Guide to Wavelet Analysis. Bull. Am. Meteorol. Soc. 1998, 79 (1), 61–78.
(47) Różyński, G.; Reeve, D. Multi-resolution analysis of nearshore hydrodynamics using discrete wavelet transforms. Coast. Eng. 2005, 52 (9), 771–792.
(48) Kaczmarek, J.; Rozynski, G.; Pruszak, Z. Long period oscillations in the longshore current on a sandy, barred coast investigated with singular spectrum analysis. Oceanologia 2005, 47 (1), 5–25.
(49) Phanikumar, M. S.; Aslam, I.; Shen, C.; Long, D. T.; Voice, T. C. Separating surface storage from hyporheic retention in natural streams using wavelet decomposition of acoustic Doppler current profiles. Water Resour. Res. 2007, 43 (5), 576-576.
(50) Qi, X.; Neupauer, R. M. Wavelet analysis of characteristic length scales and orientation of two-dimensional heterogeneous porous media. Adv. Water Resour. 2010, 33 (4), 514–524.
(51) Adamowski, J. F. Development of a short-term river flood forecasting method for snowmelt driven floods based on wavelet and cross-wavelet analysis. J. Hydrol. 2008, 353 (3-4), 247–266.
(52) Anctil, F.; Tape, D. G. An exploration of artificial neural network rainfall-runoff forecasting combined with wavelet decomposition. J. Environ. Eng. Sci. 2004, 3 (S1), S121–S128(8).
(53) Kim, T.-W.; Valdés, J. B. Nonlinear Model for Drought Forecasting Based on a Conjunction of Wavelet Transforms and Neural Networks. J. Hydrol. Eng. 2003, 8 (6), 319–328.
(54) Kişi, Ö. Neural Networks and Wavelet Conjunction Model for Intermittent Streamflow Forecasting. J. Hydrol. Eng. 2009, 14 (8), 773–782.
(55) Partal, T.; Cigizoglu, H. K. Estimation and forecasting of daily suspended sediment data using wavelet–neural networks. J. Hydrol. 2008, 358 (3-4), 317–331.
(56) Shiri, J.; Kisi, O. Short-term and long-term streamflow forecasting using a wavelet and neuro-fuzzy conjunction model. J. Hydrol. 2010, 394 (3-4), 486–493.
(57) Nevers, M. B.; Boehm, A. B.; Sadowsky, M. J.; Whitman, R. L. Modeling fate and transport of fecal bacteria in surface water. Center for Integrated Data Analytics Wisconsin Science Center 2011.
(58) Francy, D. S.; Brady, A. M. G.; Carvin, R. B.; Corsi, S. R.; Fuller, L. M.; Harrison, J. H. Developing and implementing predictive models for estimating recreational water quality at Great Lakes Beaches: U.S. Geological Survey Scientific Investigations Report 2013, 2013–5166.
(59) Moore, D. S.; Notz, W. I.; Flinger, M. A. The basic practice of statistics, 6th edition. New York, New York: W. H. Freeman and Company. 2013.
(60) Kurtulus, B.; Razack, M. Evaluation of the ability of an artificial neural network model to simulate the input-output responses of a large karstic aquifer: the La Rochefoucauld aquifer (Charente, France). Hydrogeol. J. 2007, 15 (2), 241–254.
(61) Benmouiza, K.; Cheknane, A. Forecasting hourly global solar radiation using hybrid k-means and nonlinear autoregressive neural network models. Energ. Convers. Manage. 2013, 75 (5), 561–569.
(62) Li, G.; Wen, C.; Zheng, W. X.; Chen, Y. Identification of a class of nonlinear autoregressive models with exogenous inputs based on kernel machines. IEEE T. Signal Proces. 2011, 59 (5), 2146–2159.
(63) Adamowski, J.; Sun, K. Development of a coupled wavelet transform and neural network method for flow forecasting of non-perennial rivers in semi-arid watersheds. J. Hydrol. 2010, 390 (1-2), 85–91.
(64) Tiwari, M. K.; Chatterjee, C. Development of an accurate and reliable hourly flood forecasting model using wavelet–bootstrap–ANN (WBANN) hybrid approach. J. Hydrol. 2010, 394 (3-4), 458–470.
(65) Zhu, Q.; Riley, W. J.; Tang, J.; Koven, C. D. Multiple soil nutrient competition between plants, microbes, and mineral surfaces: model development, parameterization, and example applications in several tropical forests. Biogeosci. Discuss. 2016, 12 (5), 4057–4106.
(66) Ricciuto, D. M.; Davis, K. J.; Keller, K. A Bayesian calibration of a simple carbon cycle model: the role of observations in estimating and reducing uncertainty. Global Biogeochem. Cy. 2008, 22 (2).
(67) Haykin, S. S. Kalman Filtering and Neural Networks. John Wiley & Sons, Inc. 2001.
(68) Thupaki, P.; Phanikumar, M. S.; Beletsky, D.; Schwab, D. J.; Nevers, M. B.; Whitman, R. L. Budget Analysis of Escherichia coli at a Southern Lake Michigan Beach. Environ. Sci. Technol. 2010, 44 (3), 1010–1016.
(69) Nekouee, N.; Hamidi, S. A.; Roberts, P. J. W.; Schwab, D. J. A coupled empirical-numerical model for a buoyant river plume in Lake Michigan. Water Air Soil Poll. 2015, 226 (12), 1–15.
(70) He, L. M.; He, Z. L. Water quality prediction of marine recreational beaches receiving watershed baseflow and stormwater runoff in southern California, USA. Water Res. 2008, 42 (10-11), 2563–2573.
(71) Walter, M.; Recknagel, F.; Carpenter, C.; Bormans, M. Predicting eutrophication effects in the Burrinjuck Reservoir (Australia) by means of the deterministic model SALMO and the recurrent neural network model ANNA. Ecol. Model. 2001, 146 (1), 97–113.
(72) Dorner, S. M.; Anderson, W. B.; Slawson, R. M.; Kouwen, N.; Huck, P. M. Hydrologic Modeling of Pathogen Fate and Transport. Environ. Sci. Technol. 2006, 40 (15), 4746–4753.
(73) Haydon, S.; Deletic, A. Development of a coupled pathogen-hydrologic catchment model. J. Hydrol. 2006, 328 (3-4), 467–480.
(74) Panday, S.; Huyakorn, P. S. A fully coupled physically-based spatially-distributed model for evaluating surface/subsurface flow. Adv. Water Resour. 2004, 27 (4), 361–382.
(75) Shen, C.; Phanikumar, M. S. A process-based, distributed hydrologic model based on a large-scale method for surface–subsurface coupling. Adv. Water Resour. 2010, 33 (12), 1524–1541.
(76) Safaie, A.; Litchman, E.; Phanikumar, M. S. Evaluating the role of groundwater in circulation and thermal structure within a deep, inland lake. Adv. Water Resour. 2017, 108, 310–327.
(77) Niu, J.; Phanikumar, M. S. Modeling watershed-scale solute transport using an integrated, process-based hydrologic model with applications to bacterial fate and transport. J. Hydrol. 2015, 529 (1), 35–48.
Tables
Table 1. The model results for LGEC during the testing period.

| Models | Input Parameters | Sites | R² | RMSE |
|---|---|---|---|---|
| Model 1 (NIO) | LGEC at BD site | OD1 | 0.53 | 0.33 |
| | | OD2 | 0.43 | 0.53 |
| | | OD3 | 0.46 | 0.32 |
| Model 2 (NAR) | LGEC at BD site | BD | 0.8 | 0.11 |
| | LGEC at OD1 site | OD1 | 0.38 | 0.24 |
| | LGEC at OD2 site | OD2 | 0.34 | 0.41 |
| | LGEC at OD3 site | OD3 | 0.59 | 0.45 |
| Model 3 (NARX) | LGEC, LGDISCH24HR, BD_LNTURB, LNTURB, WTEMP4HR, Rainfall | BD | 0.86 | 0.16 |
| | | OD1 | 0.8 | 0.15 |
| | | OD2 | 0.82 | 0.31 |
| | | OD3 | 0.8 | 0.23 |
| Model 4 (NARX) | LGEC, LNTURB, LNWHSIG4HR, WNDSPD_ONSHORE, WTEMP4HR, Rainfall | BD | 0.94 | 0.07 |
| | | OD1 | 0.77 | 0.26 |
| | | OD2 | 0.83 | 0.23 |
| | | OD3 | 0.82 | 0.29 |
| Model 5 (WA-NAR) | Wavelet-decomposed coefficients of LGEC | BD | 0.86 | 0.05 |
| | | OD1 | 0.62 | 0.07 |
| | | OD2 | 0.57 | 0.1 |
| | | OD3 | 0.62 | 0.11 |
Table 2. The model performance metrics of R² and RMSE for different prediction time scales.

| Sites | Prediction time (d) | R² | RMSE |
|---|---|---|---|
| BD | 0.25 | 0.83 | 0.04 |
| | 0.5 | 0.8 | 0.04 |
| | 1 | 0.91 | 0.07 |
| OD1 | 0.25 | 0.51 | 0.06 |
| | 0.5 | 0.34 | 0.06 |
| | 1 | 0.77 | 0.07 |
| OD2 | 0.25 | 0.7 | 0.08 |
| | 0.5 | 0.62 | 0.08 |
| | 1 | 0.56 | 0.15 |
| OD3 | 0.25 | 0.53 | 0.08 |
| | 0.5 | 0.54 | 0.08 |
| | 1 | 0.6 | 0.13 |
Figures
Figure 1. Map of southern Lake Michigan showing the beach sites and the tributaries.
Figure 2. (a and b) Performance of the NIO model with LGEC data at the BD site as input: (a) Linear regression plot of the simulation results at OD1. (b) Time series of data and simulations at OD1. (c and d) Performance of the NAR model with original LGEC data at the beach site as input: (c) Linear regression plot of the simulation results at OD1. (d) Time series of data and simulations at OD1. (e and f) Performance of the NARX model (Model 3) with LGEC, LGDISCH24HR, LNTURB, WTEMP4HR and rainfall as inputs: (e) Linear regression plot of the simulation results at OD1. (f) Time series of data and simulations at OD1. In the scatterplots (panels on the left), blue, red and green symbols denote data used during the training, validation and test periods, respectively, while lines of the same color represent the best-fit lines. The black dashed line in the left panels represents the perfect fit (1:1) line. The time series of original data (black circles) and simulation results (blue, red and green lines) are shown in the panels on the right.
Figure 3. Performance of the WA-NAR model at the BD site based on wavelet decomposition of LGEC as inputs. (a) Linear regression plot of the simulation results. (b) Time series of data and simulations. In the left panel, the x- and y-axes represent the observed and simulated LGEC values.
[Figure 2 graphic not reproduced. Left panels (a, c, e): predicted versus observed log10(E. coli), showing the 1:1 line, the training, validation and testing data, and their best-fit lines. Right panels (b, d, f): time series of observed and simulated log10(E. coli) against Julian Day (2008). Panel annotations: Model 1, R² train = 0.58, validation = 0.34, test = 0.53; Model 2, R² train = 0.41, validation = 0.21, test = 0.38; Model 3, R² train = 0.81, validation = 0.85, test = 0.8.]
[Figure 3 graphic not reproduced. Panel (a): predicted versus observed log10(E. coli), showing the 1:1 line, the training, validation and testing data, and their best-fit lines. Panel (b): time series of observed and simulated log10(E. coli) against Julian Day (2008). Panel annotation: R² train = 0.88, validation = 0.64, test = 0.86.]