Who Smells? Forecasting Taste and Odor in a ... - ACS Publications

Aug 12, 2015 - Who Smells? Forecasting Taste and Odor in a Drinking Water Reservoir ... Environmental Science & Technology 2017 51 (12), 7254-7262...
0 downloads 0 Views 2MB Size
Subscriber access provided by UNSW Library

Article

Who smells? Forecasting taste and odor in a drinking water reservoir Michael J. Kehoe, Kwok P. Chun, and Helen M. Baulch Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.5b00979 • Publication Date (Web): 12 Aug 2015 Downloaded from http://pubs.acs.org on August 13, 2015

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 32

Environmental Science & Technology

1

Who smells? Forecasting taste and odor in a

2

drinking water reservoir

3

Michael J. Kehoe*1,2, Kwok P. Chun2, Helen M. Baulch,1,2

4

1. School of Environment and Sustainability, University of Saskatchewan, Saskatoon,

5

Saskatchewan, Canada

6

2. Global Institute for Water Security, University of Saskatchewan, Saskatoon, Saskatchewan,

7

Canada

8

KEYWORDS taste and odor, forecasting, random forest, water treatment

9 10

* Phone: 1306 966 7226, Fax: 1306 966 1193, Email: [email protected].

11

ACS Paragon Plus Environment

1

Environmental Science & Technology

Page 2 of 32

12 13

ABSTRACT

14

Taste and odor problems can impede public trust in drinking water, and impose major costs on

15

water utilities. The ability to forecast taste and odor events in source waters, in advance, is

16

shown for the first time in this paper. This could allow water utilities to adapt treatment, and

17

where effective treatment is not available, consumers could be warned. A unique 24-year time

18

series, from an important drinking water reservoir in Saskatchewan, Canada, is used to develop

19

forecasting models of odor using chlorophyll a, turbidity, total phosphorous, temperature, and

20

the following odor producing algae taxa: Anabaena spp., Aphanizemenon spp., Oscillatoria spp.,

21

Chlorophyta, Cyclotella spp. and Asterionella spp.. We demonstrate, using linear regression and

22

random forest models, that odor events can be forecast at 0-26 week time lags, and that the

23

models are able to capture a significant increase in threshold odor number in the mid-1990s.

24

Models with a fortnight time-lag show high predictive capacity (R2 = 0.71 for random forest;

25

0.52 for linear regression). Predictive skill declines for time lags from 0 to 15 weeks, then

26

increases again, to R2 values of 0.61 (random forest) and 0.48 (linear regression) at a 26-week

27

lag. The random forest model is also able to provide accurate forecasting of TON levels

28

requiring treatment 12 weeks in advance - 93% true positive rate with 0% false positive rate.

29

Results of the random forest model demonstrate that phytoplankton taxonomic data outperform

30

chlorophyll a in terms of predictive importance.

31

ACS Paragon Plus Environment

2

Page 3 of 32

32

Environmental Science & Technology

INTRODUCTION

33 34

Smell and taste are the primary ways people assess the quality of their drinking water. The

35

occurrence of taste and odor (T&O) compounds in treated water can erode public confidence in

36

drinking water safety 1. Furthermore, T&O’s may reflect anthropogenic degradation of water

37

quality 2. It is not surprising that taste and odor compounds in drinking water have long had

38

significant social and economic effects 3. The costs associated with treatment and avoidance of

39

these problems are substantial 4. In the US, it is estimated that consumers now spend more than

40

$813 million annually on bottled water, to avoid tastes and odors associated with algal

41

metabolites 5. At the Buffalo Pound Water Treatment Plant, in Saskatchewan Canada,

42

regeneration of the approximate 1000 tonnes of granular activated carbon, used each year for

43

removal of T&O compounds, amounts to 8% of the annual plant operating costs (Personal

44

communication from Ryan Johnson 6). Besides affecting drinking water, T&O compounds also

45

spoil the taste of fish – creating major issues in the aquaculture industry 7.

46 47

Taste and odor compounds are produced by a number of different phytoplankton and bacteria

48

genera. Cyanobacteria and actinobacteria produce two of the most problematic T&Os: geosmin

49

(trans-1, 10-dimethyl-trans-9-decalol) and MIB (2-methylisoborneol). Both compounds are

50

recalcitrant to common treatment options 4 and can be detected at extremely low concentrations –

51

1.3 ng/L for geosmin and 6.3 ng/L for MIB 8. Amongst the cyanobacteria, only filamentous

52

genera have been found to produce geosmin and MIB 9. Like cyanobacteria, only a subset of

53

actinobacteria produce geosmin and MIB 10. Actinobacteria are commonly assumed to contribute

ACS Paragon Plus Environment

3

Environmental Science & Technology

Page 4 of 32

54

to aquatic taste and odor via runoff from soils into surface waters 9. However, actinobacteria can

55

also function within aquatic environments producing MIB from the metabolism of phytoplankton

56

as a carbon source 11. Diatoms and chrysophytes can also produce taste and odor compounds. In

57

their case, the T&Os are produced after death due to enzymatic degradation of their

58

polyunsaturated fatty acids by bacteria 12. These T&Os are more easily degraded than geosmin

59

and MIB and as a result, tend to be a lesser issue for drinking water treatment. As with

60

cyanobacteria and actinobacteria, only particular species of diatoms and chrysophytes are

61

associated with T&Os. One notable species of diatom producing T&Os is Cyclotella - a common

62

constituent of spring blooms in temperate lakes which can produce sulfur-based T&O’s 13.

63

Beyond these microbial sources there are a number of other compounds which can cause taste

64

and odor issues. These include pesticides and other pollutants, and chemicals used in treatment 8.

65 66

The development of models for predicting or forecasting taste and odor events has been

67

hampered by a lack of long-term time series. To date most studies have used linear regression

68

models that contain common parameters associated with phytoplankton productivity (e.g.,

69

chlorophyll a, turbidity/water transparency, and total phosphorus) to predict concentrations of

70

taste and odor compounds 14-18. This linear modelling approach has been extended to non-linear

71

models, which include a broader range of parameters, including microbial abundance data 19, 20.

72

Most recently, non-linear models have been developed that include detailed measurements of

73

hydrodynamics and phytoplankton data with a view to incorporation in hydrodynamic models 21.

74

However, all of these models are based on short-term datasets – ranging from single time points

75

up to three years 21- which can make assessment of model performance and relationships among

76

variables difficult.

ACS Paragon Plus Environment

4

Page 5 of 32

Environmental Science & Technology

77 78

In this paper, we develop, for the first time, forecasting models of odor in a drinking water

79

source using routine water quality inputs and algal numbers. Our models are tested for their

80

ability to predict taste and odor at future times, not just current conditions. This is something not

81

explicitly explored in previous studies. We apply the longest known time series of odor

82

dynamics to calibrate and validate random forest and linear regression models. The primary

83

modelling objective was to predict threshold odor number a fortnight in advance as this is a

84

timescale which allows preparation of treatment options and public warning if needed. But we

85

also tested the predictive ability of models forecasting at a range of time delays in order to

86

understand at what time scale odor can be predicted. The uncertainty of the models predictions is

87

quantified and receiver operator curves are constructed for selected models.

88 89

MATERIALS AND METHODS

90 91

Study site and data description

92 93

Buffalo Pound Lake (Saskatchewan, Canada) (S1) is a eutrophic reservoir that supplies drinking

94

water to approximately 1/4 of the population of Saskatchewan – approximately 230,000 people.

95

The lake is shallow (3 m average depth), narrow (1 km), and long (29 km). Originally a very

96

shallow natural lake, the lake depth has been increased by the installation of a dam (the first

97

being constructed in 1939). Motivated in part by persistent issues with taste and odor, the

ACS Paragon Plus Environment

5

Environmental Science & Technology

Page 6 of 32

98

Buffalo Pound water treatment plant has been monitoring a range of water quality parameters

99

since 1977. Collected weekly, the data includes threshold odor number (TON) along with

100

standard water quality parameters such as chlorophyll a, total phosphorus, temperature and

101

turbidity (methods summarized in S2). The data also includes weekly phytoplankton count. Algal

102

genus identification was conducted by trained staff but not trained taxonomists. This data

103

afforded an opportunity to identify whether phytoplankton abundance (by genera as well as

104

division) provides more useful predictors than the more commonly used biogeochemical

105

parameters. In this study, we restrict our analysis to periods where complete weekly data for the

106

parameters of interest were available. This meant that possible predictors, nitrate and ammonia,

107

were excluded from the models (due to variation in measurement frequency). Some

108

phytoplankton data has been misplaced and so a continuous data set was not available.

109

Nonetheless, despite these shortcomings, 1251 weeks (24 years) of data, spanning 1982-2011,

110

were left with which to calibrate and validate forecasting models.

111 112

Threshold odor number

113 114

Threshold odor number (TON) is an indicator of water odor persistence -not intensity-

115

determined using serial dilutions with odor-free water. The water treatment plant uses a modified

116

version of the 2150B method of the American Public Health Association 22. Beginning with the

117

most diluted samples first, multiple trained individuals (an odor panel) are asked to report the

118

first dilution at which the odor can be detected. Despite analytical advances that now allow the

119

detection of individual T&O compounds at very low concentrations, TON remains a common

ACS Paragon Plus Environment

6

Page 7 of 32

Environmental Science & Technology

23, 24

120

method for determining the magnitude of taste and odor compounds

121

method include low cost, simplicity (with no complex instrumentation requirements), and

122

generality – that is -- all compounds perceptible to the odor panel are reported, rather than having

123

to test for the tens of compounds that can cause taste and odor 8. Furthermore, if water is from a

124

common source and proper standardized procedures are followed, variation in human perception

125

is similar to the variation of direct chemical analysis 25. However, critiques of the method include

126

the fact it is not a direct measure of odor intensity and may not correlate with consumer

127

complaints or reflect treatment options (26 ; but see

128

and not absolute intensity. While more recent sensory methods (e.g., flavor profile analyses

129

may ultimately supersede TON, TON data constitute a valuable long-term source of information

130

on odor problems where records and consistent methodology have been maintained.

24

. Advantages of this

). It is also a measure of odor persistence 22

)

131 132

Model Development

133 134

The dataset was filtered to exclude parameters, and time periods for which weekly data were

135

not available. Due to missing phytoplankton data, the periods 1985-1987 and 1993-1995 were

136

omitted from the model. This left 1275 weeks with which to construct the models.

137

following 9 predictors were then chosen for our model-based analyses: chlorophyll a, turbidity,

138

total phosphorous, temperature, and the following phytoplankton taxa: Anabaena spp.,

139

Aphanizemenon spp./Oscillatoria spp., Chlorophyta, Cyclotella spp. and Asterionella spp.

140

Aphanizemenon spp. and Oscillatoria spp. data were combined because the data record had them

141

sometimes recorded separately and sometimes together.

142

phosphorous and temperature have all featured in previous odor modelling studies (Table S3)

The

Chlorophyll a, turbidity, total

ACS Paragon Plus Environment

7

Environmental Science & Technology

Page 8 of 32

143

and we wanted to see how they compared against the algal data as predictors. Predictors were

144

used in linear and non-linear models with the objective of predicting natural logarithm

145

transformed TON. Linear (regression) and non-linear models (random forests) were calibrated

146

and

147

calibration/validation process was repeated 100 times in order to quantify the uncertainty

148

resulting from the choice of calibration and validation dataset.

validated

(90-10%

split)

on

randomized

subsets

of

the total

dataset.

This

149 150

The point of any forecasting model is to predict events in advance. If water treatment plants can

151

be warned of taste and odour problems before they happen then they can make preparations to

152

treat and manage the problem. It also affords managers time to reassure the community that,

153

while unpleasant, common taste and odour compounds are safe for consumption. By comparing

154

the predictive performance of models at different time lags it is possible determine which

155

forecasting time lags are feasible. Models were calibrated for 27 different forecasting time-lags

156

ranging from the current levels (i.e. no time to lag) up to 26 weeks in advance – in one week

157

increments. More formally, models were calibrated to predict taste and odour ( ) for the

158

predictor variables  ,

159 160

 = ( ) (Equation 1)

161 162

Where  is the specific model (random forest or linear regression), and n is a range of different

163

time lags (n=0-26). This allowed quantification of how the predictive importance of the predictor

164

variables changes with the forecasting time-lag.

165

ACS Paragon Plus Environment

8

Page 9 of 32

Environmental Science & Technology

166

The model construction methodology contained elements which were similar for both the linear

167

regression and random forest models, as well as some which were different. In what follows the

168

linear regression and random forest models are described, then the general procedure used to

169

calibrate, validate and measure model performance is explained in detail.

170

The linear regression model constructed according to equation 2:

171

( ) = ∑

 ( )  +  (Equation 2)

172

Where ( ) is the predicted log-transformed TON values,  is bias, and  are the respective

173

coefficients of each of the n=9 predictor variables ( ). Uncertainty in model predictions was

174

calculated at the 95% confidence level. All standard assumptions of the linear regression model

175

were tested. The primary purpose of using a linear model was to provide a baseline against

176

which to compare the non-linear random forest model. The R function ‘lm’

177

linear regression modelling.

27

was used for all

178 179

Random forests are an ensemble machine learning method which constructs a non-linear

180

function based on the mean response of an ensemble of simpler decision tree models 28. Random

181

forests are increasingly being used to model and understand environmental systems 29. The

182

random forest is an application of bagging to decisions trees. Bagging is an ensemble modelling

183

method designed to avoid over-fitting of models 30. A large number of simple or weak models

184

are constructed on random sub-samples of a dataset, and then aggregated in some way - usually

185

averaging in the case of regression, and mode in the case of decision making. Bagging can be

186

applied to any model type, for example, a linear regression model, but when applied to decision

ACS Paragon Plus Environment

9

Environmental Science & Technology

Page 10 of 32

187

trees it is known as a random forest 28. Construction of a random forest proceeds as follows.

188

First, a random subset of the whole data set is selected and a decision tree constructed 31. This

189

random model construction process is then repeated a number of times until an ensemble of

190

decision trees exists. Each member of the random forest ensemble is a simple decision tree

191

biased towards predicting their own particular training data, and they make poor predictors of the

192

total dataset. However, provided the trees are not correlated – i.e. not calibrated on similar data

193

sets - when the mean prediction (in the case of regression) of a large number of these randomly

194

constructed simple decisions trees (forest) is calculated, they produce low variance and unbiased

195

predictions 28. Besides the mean response, higher order statistics- such as standard deviation or

196

skewness- can also be useful. The standard deviation of the ensemble predictions provides

197

information on the degree of disagreement amongst trees on a particular prediction. Here we

198

report uncertainty as plus and minus two standard deviations from the mean ensemble prediction.

199

A further advantage of random forests is that they are able to also provide information on the

200

relative importance of different predictors. This is done by considering how prediction accuracy

201

changes when a given parameter is excluded from the model. Here this is calculated as the

202

average reduction in mean square error. All random forest models developed here were with the

203

‘randomForest’ package 32. Here 500 trees were constructed for each ensemble and bagging was

204

carried out with replacement. This means that for each iteration, the data used to calibrate a

205

member decision tree was returned to the total dataset for possible subsequent selection.

206 207

The same general procedure was followed for the development of both the linear regression and

208

random forest models. For each model type a random 90% subset of the available dataset was

209

chosen to be for calibration with the remaining 10% reserved for validation. The performance of

ACS Paragon Plus Environment

10

Page 11 of 32

Environmental Science & Technology

210

model at predicting log-transformed TON on both the calibration and validation data sets was

211

quantified using the coefficient of determination or R2

212

    = 1 − ∑ (Equation 2)  ( )

∑ (  ) 



213 214

The choice of which subset of data to use for calibration and which to use for validation can

215

affect model identification. If processes underlying the data set are not continuously present then

216

there is potential for the decision of how to break up the data set in to calibration and validation

217

datasets to affect model identification. For example, different choices of calibration set could

218

result in different parameter estimations for the linear regression model. Here we wanted to

219

estimate the uncertainty involved in this choice. To do this the calibration/validation procedure

220

was repeated, on different randomly selected subsets, 100 times. The 95% confidence level of

221

R2, random forest variable importance and linear regression model coefficients were all then

222

estimated as their mean value plus and minus two standard deviations.

223 224

The performance of the random forest was assessed further. The two-week time lag model

225

performance was assessed by correlation analyses between the observed and predictive values.

226

Then the ability of the random forest model to correctly classify exceedance of TON of 10 at 2

227

week, 12 week and 26 week time lag was assessed using receiver operator curve plots.

228

Exceedance of TON 10 is sufficient to trigger extra treatment procedure at the water treatment

229

plant so it is operationally important that the models are good at predicting this. Receiver

230

operator curves are a method for determining a corresponding threshold in model predictions

ACS Paragon Plus Environment

11

Environmental Science & Technology

Page 12 of 32

231

which best balance correctly predicting exceedance of TON 10 (true positives) versus incorrectly

232

predicting exceedance of TON 10 (false positives). First the transformed observed TON data was

233

converted a boolean variable based on whether or not the individual values exceeded 10.

234

Receiver operator curves were then constructed using the R package ‘ROCR’ 33. The method

235

implemented in the package calculates the true positive and false positive rates for different

236

choices of thresholds which cover the range of model predictions. For example, a threshold

237

greater than the highest model prediction means all the model predictions are mapped to FALSE

238

(no exceedance of TON 10). This leads to a true positive rate and false positive rate of zero by

239

definition. The threshold is then progressively increased and corresponding true positive and

240

false positive values calculated. Eventually the model predictions are all mapped to TRUE and

241

the true positive and false positive rate are both one by definition. By analyzing a plot of the true

242

positive rate against the false positive rate, a threshold in the model predictions which best

243

predict exceedance of TON 10 but minimize incorrect prediction of this value can be identified.

244

By calculating curves for random forest models of short (2 week), mid (12 week) and long (26

245

week) time lags it is possible to provide the water treatment plant with predictions of the

246

likelihood of TON exceeding the treatment threshold (true positive rate) well ahead of time and

247

also quantify the likelihood of these predictions being wrong (false positive rate). With ROC

248

analysis the greater the area under the curve the better the performance of the model at

249

classifying the observed data at all thresholds. The optimal threshold is somewhat an arbitrary

250

choice which depends on the relative benefits and costs of true positive and false positives.

251

However, the value which best balances true positive predictions versus false positives is the

252

point on the graph closest to the top left of the graph, and it was this measure we chose to use to

ACS Paragon Plus Environment

12

Page 13 of 32

Environmental Science & Technology

253

compare the different models. All the ROC analysis was carried out on model predictions of the

254

validation data set (10% of total data).

255 256

Annual TON values in Buffalo Pound appear to have increased since the approximately the mid

257

1990’s. A hypothesis that this might represent a regime shift in the TON variables was tested. A

258

test of regime shift was conducted using a method developed for climate data 34. This method

259

works by conducting sequential t-tests. New values in time are compared to historical values and

260

if they are significantly different (p