Article
Who smells? Forecasting taste and odor in a drinking water reservoir Michael J. Kehoe, Kwok P. Chun, and Helen M. Baulch Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.5b00979 • Publication Date (Web): 12 Aug 2015 Downloaded from http://pubs.acs.org on August 13, 2015
Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Environmental Science & Technology

Who smells? Forecasting taste and odor in a drinking water reservoir

Michael J. Kehoe*1,2, Kwok P. Chun2, Helen M. Baulch1,2

1. School of Environment and Sustainability, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
2. Global Institute for Water Security, University of Saskatchewan, Saskatoon, Saskatchewan, Canada

KEYWORDS taste and odor, forecasting, random forest, water treatment

* Phone: 1306 966 7226, Fax: 1306 966 1193, Email: [email protected].

ACS Paragon Plus Environment

ABSTRACT

Taste and odor problems can impede public trust in drinking water and impose major costs on water utilities. The ability to forecast taste and odor events in source waters in advance is shown for the first time in this paper. This could allow water utilities to adapt treatment and, where effective treatment is not available, to warn consumers. A unique 24-year time series from an important drinking water reservoir in Saskatchewan, Canada, is used to develop forecasting models of odor using chlorophyll a, turbidity, total phosphorus, temperature, and the following odor-producing algal taxa: Anabaena spp., Aphanizomenon spp., Oscillatoria spp., Chlorophyta, Cyclotella spp. and Asterionella spp. We demonstrate, using linear regression and random forest models, that odor events can be forecast at 0-26 week time lags, and that the models are able to capture a significant increase in threshold odor number (TON) in the mid-1990s. Models with a fortnight time lag show high predictive capacity (R² = 0.71 for random forest; 0.52 for linear regression). Predictive skill declines for time lags from 0 to 15 weeks, then increases again, to R² values of 0.61 (random forest) and 0.48 (linear regression) at a 26-week lag. The random forest model is also able to accurately forecast TON levels requiring treatment 12 weeks in advance, with a 93% true positive rate and a 0% false positive rate. Results of the random forest model demonstrate that phytoplankton taxonomic data outperform chlorophyll a in terms of predictive importance.

INTRODUCTION

Smell and taste are the primary ways people assess the quality of their drinking water. The occurrence of taste and odor (T&O) compounds in treated water can erode public confidence in drinking water safety 1. Furthermore, T&Os may reflect anthropogenic degradation of water quality 2. It is not surprising that taste and odor compounds in drinking water have long had significant social and economic effects 3. The costs associated with treatment and avoidance of these problems are substantial 4. In the US, it is estimated that consumers now spend more than $813 million annually on bottled water to avoid tastes and odors associated with algal metabolites 5. At the Buffalo Pound Water Treatment Plant, in Saskatchewan, Canada, regeneration of the approximately 1000 tonnes of granular activated carbon used each year for removal of T&O compounds amounts to 8% of the annual plant operating costs (personal communication from Ryan Johnson 6). Besides affecting drinking water, T&O compounds also spoil the taste of fish, creating major issues in the aquaculture industry 7.

Taste and odor compounds are produced by a number of different phytoplankton and bacterial genera. Cyanobacteria and actinobacteria produce two of the most problematic T&Os: geosmin (trans-1,10-dimethyl-trans-9-decalol) and MIB (2-methylisoborneol). Both compounds are recalcitrant to common treatment options 4 and can be detected at extremely low concentrations: 1.3 ng/L for geosmin and 6.3 ng/L for MIB 8. Amongst the cyanobacteria, only filamentous genera have been found to produce geosmin and MIB 9. As with cyanobacteria, only a subset of actinobacteria produce geosmin and MIB 10. Actinobacteria are commonly assumed to contribute to aquatic taste and odor via runoff from soils into surface waters 9. However, actinobacteria can also function within aquatic environments, producing MIB from the metabolism of phytoplankton as a carbon source 11. Diatoms and chrysophytes can also produce taste and odor compounds. In their case, the T&Os are produced after death, through enzymatic degradation of their polyunsaturated fatty acids by bacteria 12. These T&Os are more easily degraded than geosmin and MIB and, as a result, tend to be a lesser issue for drinking water treatment. As with cyanobacteria and actinobacteria, only particular species of diatoms and chrysophytes are associated with T&Os. One notable T&O-producing diatom genus is Cyclotella, a common constituent of spring blooms in temperate lakes, which can produce sulfur-based T&Os 13. Beyond these microbial sources, a number of other compounds can cause taste and odor issues. These include pesticides and other pollutants, and chemicals used in treatment 8.

The development of models for predicting or forecasting taste and odor events has been hampered by a lack of long-term time series. To date, most studies have used linear regression models that contain common parameters associated with phytoplankton productivity (e.g., chlorophyll a, turbidity/water transparency, and total phosphorus) to predict concentrations of taste and odor compounds 14-18. This linear modelling approach has been extended to non-linear models, which include a broader range of parameters, including microbial abundance data 19, 20. Most recently, non-linear models have been developed that include detailed measurements of hydrodynamics and phytoplankton data with a view to incorporation in hydrodynamic models 21. However, all of these models are based on short-term datasets, ranging from single time points up to three years 21, which can make assessment of model performance and of relationships among variables difficult.

In this paper, we develop, for the first time, forecasting models of odor in a drinking water source using routine water quality inputs and algal numbers. Our models are tested for their ability to predict taste and odor at future times, not just under current conditions, something not explicitly explored in previous studies. We apply the longest known time series of odor dynamics to calibrate and validate random forest and linear regression models. The primary modelling objective was to predict threshold odor number a fortnight in advance, as this timescale allows preparation of treatment options and public warning if needed. We also tested the predictive ability of models forecasting at a range of time delays in order to understand at what time scales odor can be predicted. The uncertainty of the model predictions is quantified, and receiver operating characteristic (ROC) curves are constructed for selected models.

MATERIALS AND METHODS

Study site and data description

Buffalo Pound Lake (Saskatchewan, Canada) (S1) is a eutrophic reservoir that supplies drinking water to approximately one quarter of the population of Saskatchewan (approximately 230,000 people). The lake is shallow (3 m average depth), narrow (1 km), and long (29 km). Originally a very shallow natural lake, its depth has been increased by the installation of a dam (the first being constructed in 1939). Motivated in part by persistent issues with taste and odor, the Buffalo Pound water treatment plant has been monitoring a range of water quality parameters since 1977. Collected weekly, the data include threshold odor number (TON) along with standard water quality parameters such as chlorophyll a, total phosphorus, temperature and turbidity (methods summarized in S2). The data also include weekly phytoplankton counts. Algal genus identification was conducted by trained staff, but not by trained taxonomists. These data afforded an opportunity to identify whether phytoplankton abundance (by genus as well as division) provides more useful predictors than the more commonly used biogeochemical parameters. In this study, we restrict our analysis to periods where complete weekly data for the parameters of interest were available. This meant that two possible predictors, nitrate and ammonia, were excluded from the models (due to variation in measurement frequency). Some phytoplankton data have been misplaced, so a continuous data set was not available. Despite these shortcomings, 1251 weeks (24 years) of data, spanning 1982-2011, were left with which to calibrate and validate forecasting models.

Threshold odor number

Threshold odor number (TON) is an indicator of water odor persistence (not intensity), determined using serial dilutions with odor-free water. The water treatment plant uses a modified version of the 2150B method of the American Public Health Association 22. Beginning with the most diluted samples, multiple trained individuals (an odor panel) are asked to report the first dilution at which the odor can be detected. Despite analytical advances that now allow the detection of individual T&O compounds at very low concentrations, TON remains a common method for determining the magnitude of taste and odor compounds 23, 24. Advantages of this method include low cost, simplicity (with no complex instrumentation requirements), and generality: all compounds perceptible to the odor panel are reported, rather than having to test for the tens of compounds that can cause taste and odor 8. Furthermore, if water is from a common source and proper standardized procedures are followed, variation in human perception is similar to the variation of direct chemical analysis 25. However, critiques of the method include the fact that it is not a direct measure of odor intensity and may not correlate with consumer complaints or reflect treatment options (26; but see 24). It is also a measure of odor persistence and not absolute intensity. While more recent sensory methods (e.g., flavor profile analyses 22) may ultimately supersede TON, TON data constitute a valuable long-term source of information on odor problems where records and consistent methodology have been maintained.

Model Development

The dataset was filtered to exclude parameters and time periods for which weekly data were not available. Due to missing phytoplankton data, the periods 1985-1987 and 1993-1995 were omitted from the model. This left 1275 weeks with which to construct the models. The following 9 predictors were then chosen for our model-based analyses: chlorophyll a, turbidity, total phosphorus, temperature, and the following phytoplankton taxa: Anabaena spp., Aphanizomenon spp./Oscillatoria spp., Chlorophyta, Cyclotella spp. and Asterionella spp. Aphanizomenon spp. and Oscillatoria spp. data were combined because the data record had them sometimes recorded separately and sometimes together. Chlorophyll a, turbidity, total phosphorus and temperature have all featured in previous odor modelling studies (Table S3), and we wanted to see how they compared against the algal data as predictors. Predictors were used in linear and non-linear models with the objective of predicting natural logarithm transformed TON. Linear (regression) and non-linear (random forest) models were calibrated and validated (90-10% split) on randomized subsets of the total dataset. This calibration/validation process was repeated 100 times in order to quantify the uncertainty resulting from the choice of calibration and validation dataset.

The point of any forecasting model is to predict events in advance. If water treatment plants can be warned of taste and odor problems before they happen, they can make preparations to treat and manage the problem. It also affords managers time to reassure the community that, while unpleasant, common taste and odor compounds are safe for consumption. By comparing the predictive performance of models at different time lags, it is possible to determine which forecasting time lags are feasible. Models were calibrated for 27 different forecasting time lags, ranging from the current levels (i.e., no time lag) up to 26 weeks in advance, in one-week increments. More formally, models were calibrated to predict taste and odor $y(t+n)$ from the predictor variables $\mathbf{x}(t)$,

$$y(t+n) = f(\mathbf{x}(t)) \quad \text{(Equation 1)}$$

where $f$ is the specific model (random forest or linear regression), $t$ is the sampling week, and $n$ is the time lag in weeks (n = 0-26). This allowed quantification of how the predictive importance of the predictor variables changes with the forecasting time lag.
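
The pairing of predictors at week t with the response at week t + n can be sketched as follows. This is a minimal illustration in Python rather than the authors' R workflow; the function name `make_lagged_pairs`, the two-predictor records, and all numeric values are hypothetical, and the sketch assumes a gap-free weekly series (the real record has omitted periods that would need separate handling).

```python
import math


def make_lagged_pairs(predictors, ton, lag):
    """Pair the predictor row at week t with ln(TON) at week t + lag.

    predictors: list of per-week predictor tuples (hypothetical layout)
    ton:        list of per-week threshold odor numbers (all > 0)
    lag:        forecasting time lag in weeks (0 means no lag)
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if len(predictors) != len(ton):
        raise ValueError("predictors and ton must have the same length")
    n_pairs = max(0, len(ton) - lag)
    X = [predictors[t] for t in range(n_pairs)]
    # The response is natural-log transformed, as in the modelling objective.
    y = [math.log(ton[t + lag]) for t in range(n_pairs)]
    return X, y


# Hypothetical weekly records: (chlorophyll a, temperature).
weekly_predictors = [(12.0, 4.0), (15.0, 6.5), (30.0, 11.0), (42.0, 16.0), (38.0, 19.5)]
weekly_ton = [2.0, 3.0, 4.0, 8.0, 12.0]

# Two-week (fortnight) lag: week 0 predictors are paired with week 2 TON, etc.
X, y = make_lagged_pairs(weekly_predictors, weekly_ton, lag=2)
```

Repeating the call for each lag from 0 to 26 would give one calibration set per forecasting horizon, each shorter than the last by one week.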

The model construction methodology contained some elements common to both the linear regression and random forest models, and some that differed. In what follows, the linear regression and random forest models are described first; the general procedure used to calibrate, validate and measure model performance is then explained in detail.

The linear regression model was constructed according to Equation 2:

$$\hat{y}(t) = \sum_{i=1}^{n} \beta_i x_i(t) + \beta_0 \quad \text{(Equation 2)}$$

where $\hat{y}(t)$ is the predicted log-transformed TON value, $\beta_0$ is the bias, and $\beta_i$ are the respective coefficients of each of the n = 9 predictor variables $x_i(t)$. Uncertainty in model predictions was calculated at the 95% confidence level. All standard assumptions of the linear regression model were tested. The primary purpose of using a linear model was to provide a baseline against which to compare the non-linear random forest model. The R function ‘lm’ 27 was used for all linear regression modelling.

Random forests are an ensemble machine learning method which constructs a non-linear function based on the mean response of an ensemble of simpler decision tree models 28. Random forests are increasingly being used to model and understand environmental systems 29. The random forest is an application of bagging to decision trees. Bagging is an ensemble modelling method designed to avoid over-fitting of models 30. A large number of simple or weak models are constructed on random sub-samples of a dataset and then aggregated in some way, usually by averaging in the case of regression and by taking the mode in the case of decision making. Bagging can be applied to any model type (for example, a linear regression model), but when applied to decision trees it is known as a random forest 28. Construction of a random forest proceeds as follows. First, a random subset of the whole data set is selected and a decision tree constructed 31. This random model construction process is then repeated a number of times until an ensemble of decision trees exists. Each member of the random forest ensemble is a simple decision tree biased towards predicting its own particular training data, and is a poor predictor of the total dataset. However, provided the trees are not correlated (i.e., not calibrated on similar data sets), the mean prediction (in the case of regression) of a large number of these randomly constructed simple decision trees (the forest) is a low variance and unbiased prediction 28. Besides the mean response, higher order statistics, such as standard deviation or skewness, can also be useful. The standard deviation of the ensemble predictions provides information on the degree of disagreement amongst trees on a particular prediction. Here we report uncertainty as plus and minus two standard deviations from the mean ensemble prediction. A further advantage of random forests is that they also provide information on the relative importance of different predictors. This is done by considering how prediction accuracy changes when a given parameter is excluded from the model. Here this is calculated as the average reduction in mean square error. All random forest models developed here were built with the ‘randomForest’ package 32. Here 500 trees were constructed for each ensemble and bagging was carried out with replacement. This means that for each iteration, the data used to calibrate a member decision tree was returned to the total dataset for possible subsequent selection.
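
The bagging idea described above (resample with replacement, fit a weak tree, average, and report the spread across trees) can be illustrated with a deliberately reduced sketch. It is written in stdlib Python, not with the ‘randomForest’ package, and it bags one-split regression trees (stumps) on a single predictor, so it is not a full random forest. The names `fit_stump` and `bagged_predict` and all data values are hypothetical.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) on a single predictor."""
    best = None
    # Candidate thresholds exclude the largest x so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = statistics.fmean(left)
        right_mean = statistics.fmean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values identical in this resample: predict the mean response.
        overall = statistics.fmean(ys)
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagged_predict(xs, ys, x_new, n_trees=500, seed=0):
    """Return (mean, mean - 2 SD, mean + 2 SD) of the ensemble prediction."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        tree = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(tree(x_new))
    mean = statistics.fmean(predictions)
    sd = statistics.stdev(predictions)
    return mean, mean - 2 * sd, mean + 2 * sd


# Hypothetical data: water temperature (deg C) against ln(TON).
temperature = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
log_ton = [0.7, 0.8, 0.7, 0.9, 1.0, 1.9, 2.1, 2.0, 2.3, 2.2]

mean, lower, upper = bagged_predict(temperature, log_ton, x_new=15.0)
```

A wide interval between `lower` and `upper` indicates that the individual trees disagree about that prediction, which is the sense in which the ensemble standard deviation is used as an uncertainty measure here.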

The same general procedure was followed for the development of both the linear regression and random forest models. For each model type, a random 90% subset of the available dataset was chosen for calibration, with the remaining 10% reserved for validation. The performance of the model at predicting log-transformed TON on both the calibration and validation data sets was quantified using the coefficient of determination, R²:

$$R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2} \quad \text{(Equation 3)}$$

where $y_i$ are the observed log-transformed TON values, $\hat{y}_i$ are the corresponding model predictions, and $\bar{y}$ is the mean of the observed values.

The choice of which subset of data to use for calibration and which to use for validation can affect model identification, particularly if the processes underlying the data set are not continuously present. For example, different choices of calibration set could result in different parameter estimates for the linear regression model. Here we wanted to estimate the uncertainty involved in this choice. To do this, the calibration/validation procedure was repeated 100 times on different randomly selected subsets. The 95% confidence levels of R², random forest variable importance and linear regression model coefficients were then all estimated as their mean value plus and minus two standard deviations.
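
The repeated calibration/validation procedure can be sketched as below, under clearly stated simplifications: a single-predictor least-squares line stands in for the 9-predictor models, the data are synthetic, and the code is stdlib Python rather than the authors' R scripts. The names `fit_line`, `r_squared` and `repeated_split_r2` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def repeated_split_r2(xs, ys, n_repeats=100, calib_fraction=0.9, seed=1):
    """Validation R2 over repeated random splits: (mean, mean-2SD, mean+2SD)."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_calib = int(round(calib_fraction * len(xs)))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(indices)  # new random 90/10 partition each repeat
        calib, valid = indices[:n_calib], indices[n_calib:]
        slope, intercept = fit_line([xs[i] for i in calib], [ys[i] for i in calib])
        predicted = [slope * xs[i] + intercept for i in valid]
        scores.append(r_squared([ys[i] for i in valid], predicted))
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return mean, mean - 2 * sd, mean + 2 * sd


# Synthetic stand-in data: a linear trend plus Gaussian noise.
data_rng = random.Random(0)
xs = [float(i) for i in range(200)]
ys = [0.01 * x + data_rng.gauss(0.0, 0.2) for x in xs]

mean_r2, lower_r2, upper_r2 = repeated_split_r2(xs, ys)
```

The same loop could collect fitted coefficients or variable importances instead of R² to obtain the other intervals mentioned above.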

The performance of the random forest was assessed further. The two-week time lag model performance was assessed by correlation analyses between the observed and predicted values. Then the ability of the random forest model to correctly classify exceedance of a TON of 10 at 2-week, 12-week and 26-week time lags was assessed using receiver operator curve plots. Exceedance of TON 10 is sufficient to trigger extra treatment procedures at the water treatment plant, so it is operationally important that the models are good at predicting this.

Receiver operator curves are a method for determining a corresponding threshold in model predictions which best balances correctly predicting exceedance of TON 10 (true positives) against incorrectly predicting exceedance of TON 10 (false positives). First, the transformed observed TON data were converted to a Boolean variable based on whether or not the individual values exceeded 10. Receiver operator curves were then constructed using the R package ‘ROCR’ 33. The method implemented in the package calculates the true positive and false positive rates for different choices of threshold which cover the range of model predictions. For example, a threshold greater than the highest model prediction means all the model predictions are mapped to FALSE (no exceedance of TON 10). This leads to a true positive rate and false positive rate of zero by definition. The threshold is then progressively decreased and the corresponding true positive and false positive rates calculated. Eventually the model predictions are all mapped to TRUE, and the true positive and false positive rates are both one by definition. By analyzing a plot of the true positive rate against the false positive rate, a threshold in the model predictions which best predicts exceedance of TON 10 while minimizing incorrect predictions can be identified. By calculating curves for random forest models of short (2 week), mid (12 week) and long (26 week) time lags, it is possible to provide the water treatment plant with predictions of the likelihood of TON exceeding the treatment threshold (true positive rate) well ahead of time, and also to quantify the likelihood of these predictions being wrong (false positive rate). With ROC analysis, the greater the area under the curve, the better the performance of the model at classifying the observed data at all thresholds. The optimal threshold is a somewhat arbitrary choice which depends on the relative benefits and costs of true positives and false positives. However, the value which best balances true positive predictions against false positives is the point closest to the top left of the graph, and it was this measure we chose to use to compare the different models. All the ROC analysis was carried out on model predictions of the validation data set (10% of total data).
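
The threshold sweep and the "closest to the top left" rule can be written out in a few lines. The sketch below is stdlib Python, not the ‘ROCR’ package; the function names `roc_points` and `closest_to_top_left` and the eight prediction/observation pairs are hypothetical, chosen only to make the mechanics visible.

```python
import math


def roc_points(scores, labels):
    """Return (threshold, false positive rate, true positive rate) triples.

    Thresholds sweep from above the highest score (everything FALSE)
    down to the lowest score (everything TRUE).
    """
    n_pos = sum(1 for label in labels if label)
    n_neg = len(labels) - n_pos
    thresholds = [math.inf] + sorted(set(scores), reverse=True)
    points = []
    for thr in thresholds:
        tp = sum(1 for s, label in zip(scores, labels) if s >= thr and label)
        fp = sum(1 for s, label in zip(scores, labels) if s >= thr and not label)
        points.append((thr, fp / n_neg, tp / n_pos))
    return points


def closest_to_top_left(points):
    """Pick the point nearest (FPR = 0, TPR = 1) in Euclidean distance."""
    return min(points, key=lambda p: math.hypot(p[1], 1.0 - p[2]))


# Hypothetical validation data: model predictions of ln(TON) and observed TON.
predicted_log_ton = [1.2, 1.8, 2.0, 2.2, 2.5, 2.6, 2.9, 3.1]
observed_ton = [3, 5, 12, 6, 11, 14, 18, 25]

# Boolean response: does the observed TON exceed the treatment trigger of 10?
exceeds = [ton > 10 for ton in observed_ton]

points = roc_points(predicted_log_ton, exceeds)
best_threshold, best_fpr, best_tpr = closest_to_top_left(points)
```

With these made-up values the selected threshold is 2.5 on the log scale, giving a true positive rate of 0.8 at a false positive rate of 0. Other selection rules (for example, weighting false negatives more heavily) would be equally easy to substitute, which is the sense in which the optimal threshold is a judgment call.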

Annual TON values in Buffalo Pound appear to have increased since approximately the mid-1990s. The hypothesis that this might represent a regime shift in TON was tested using a method developed for climate data 34. This method works by conducting sequential t-tests. New values in time are compared to historical values and
if they are significantly different (p