Variation of Bacterial Communities with Water ... - ACS Publications

Jean Pierre Nshimyimana†‡∥⊥ , Adam Joshua Ehrich Freedman‡∥, Peter Shanahan‡∥, Lloyd C. H. Chua§, and Janelle R. Thompson‡∥. † ...

Article

Variation of Bacterial Communities with Water Quality in an Urban Tropical Catchment Jean Pierre Nshimyimana, Adam Joshua Ehrich Freedman, Peter Shanahan, Lloyd C.H. Chua, and Janelle Renee Thompson Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.6b04737 • Publication Date (Web): 17 Apr 2017 Downloaded from http://pubs.acs.org on April 18, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Variation of Bacterial Communities with Water Quality in an Urban Tropical Catchment

Jean Pierre Nshimyimana1,2,4,5, Adam Joshua Ehrich Freedman2,4, Peter Shanahan2,4, Lloyd C. H. Chua3, and Janelle R. Thompson2,4,*

1. School of Civil and Environmental Engineering, Nanyang Technological University (NTU), 50 Nanyang Avenue, Singapore 639798, Singapore

2. Department of Civil and Environmental Engineering, Massachusetts Institute of Technology (MIT), 77 Massachusetts Avenue, Cambridge, MA 02139, USA

3. School of Engineering, Deakin University, Waurn Ponds, Geelong, Victoria 3216, Australia

4. Centre for Environmental Sensing and Modeling (CENSAM), Singapore-MIT Alliance for Research and Technology (SMART), 1 Create Way, Singapore 138602, Singapore

5. Singapore Center on Environmental Life Sciences Engineering (SCELSE), NTU, 60 Nanyang Drive, Singapore 637551

*Corresponding author: Janelle Thompson: [email protected]


Abstract

A major challenge for assessment of water quality in tropical environments is the natural occurrence and potential growth of Fecal Indicator Bacteria (FIB). To gain a better understanding of the relationship between measured levels of FIB and the distribution of sewage-associated bacteria, including potential pathogens, in the tropics, this study compared the abundance of FIB (Total coliforms and E. coli) and the Bacteroidales (HF183 marker) with bacterial community structure determined by next-generation amplicon sequencing. Water was sampled twice over 6 months from 18 sites within a tropical urban catchment and reservoir, followed by extraction of DNA from microorganisms, and sequencing targeting the V3-V4 region of the 16S rRNA gene. Multivariate statistical analyses indicated that bacterial community composition (BCC) varied between reservoir and catchment, within catchment land-uses, and with E. coli concentration. Beta-regression indicated that the proportion of sequences from sewage-associated taxa (SAT) or pathogen-like sequences (PLS) was predicted most significantly by measured levels of E. coli (log MPN/100ml) (χ2>8.7; p […]

[…] 400bp which enable more confident taxonomic assignment. NGS now provides the opportunity to examine how microbial diversity at the genus and species level varies with water quality predicted by the abundance of FIB and source tracking markers, especially in tropical environments. While numerous studies have attempted to relate concentrations of specific pathogen targets to FIB measured in a water body (e.g. 35-38), such studies generally showed poor to moderate correlation, likely due to highly variable dynamics of a specific pathogen target in a complex environment. A survey-based approach such as NGS provides the opportunity to simultaneously evaluate a diversity of microorganisms and identify potential risk agents without pre-defining targeted groups.

The tropical urban island of Singapore has an advanced and reputable water and wastewater management infrastructure where stormwater is collected through engineered drains and stored in reservoirs. Studies of surface microbial water quality in Singapore have noted elevated levels of FIB (E. coli, total coliforms, and Enterococci) at sites from various catchments during dry weather (defined as >48 hours after a rain event) 39-42.

The goal of this study was to apply Illumina™ MiSeq 16S rRNA gene amplitag sequencing to characterize the bacterial community composition (BCC), including sequences related to sewage-associated taxa (SAT) and human pathogens (i.e. pathogen-like sequences, PLS), in water samples collected from an urban reservoir and catchment in Singapore. We hypothesized that bacterial communities in samples would vary with site, land-use, and sample date, reflecting seasonal and spatial ecology. We also hypothesized that sites with high measured levels of E. coli and/or HF183 marker would harbor bacterial communities enriched in pathogen-like bacteria or SAT across land-uses and sample dates. Our findings will be useful for evaluating the utility of next-generation sequencing to identify impaired tropical waters and to identify specific bacterial targets that may be relevant for further monitoring using quantitative methods.

2. Methodology

2.1 Study design and sample collection

2.1.1 Sampling and Site Characterization

Water samples from 18 sites in an urban reservoir and catchment in the northwest of Singapore Island were collected during dry weather in January and July 2009 (Figure S5) 39. Dry weather was defined as >48 hours following a rainfall event based on rain gauges distributed around the watershed and monitored by the Public Utilities Board 41. Two additional samples were obtained from municipal sewers in a high-density residential area within the catchment in January 2010. The catchment covers 61 square kilometers with mixed land uses where the residential areas (R) (19%) are distinguished by a high-density population in high-rise buildings and farming areas (F) (5%) are characterized by horticultural and agricultural activities including small-scale production of flowers, vegetables, ornamental fish, and chicken eggs 39, 40. The undeveloped area (U), the largest of the land-use categories covering 76% of the catchment, is maintained as limited-access land dominated by native vegetation (Figure S5). Land use data were provided by the Singapore Public Utilities Board in the form of a GIS shapefile 43.

Storm water and sewage are transported in the catchment via separate conveyance systems. An underground system conveys sewage to wastewater treatment plants, while storm water and surface runoff are drained by open concrete-lined channels that discharge into rivers and reservoirs. Some of the farming areas are served by on-site sewage and wastewater treatment systems, while the rest of the farming areas and the residential area are served by the underground sewerage system. Catchment water samples were collected from open concrete-lined channels conveying water to the reservoir. The majority of catchment collection sites were drains in small upstream watersheds with uniform land use. Exceptions were sites F9, F10, and R2 classified as residential or horticultural, which also drained a minor proportion of undeveloped lands. Reservoir water samples were collected at four stations approximately 800 m and 1,200 m apart to provide spatial coverage of the reservoir surface.

The climate is tropical with seasons defined by prevailing wind directions and weather patterns corresponding to the Northeast Monsoon (December to March), Southwest Monsoon (June to September), and two inter-monsoon seasons, with year-round temperatures ranging from highs of 29 to 31°C during the day to lows of 23 to 24°C at night. Water temperatures vary between 27 and 29°C throughout the year 39.

2.1.2 Quantification of HF183 and IDEXX-based enumeration of E. coli and Total coliforms

Analysis of samples by DNA extraction and qPCR-based quantification of the HF183 marker according to MIQE standards has been reported in a prior publication 40. In brief, particulates from water samples were concentrated onto 0.2-µm-pore-size cartridge filters (Millipore, Billerica, MA, USA) and subjected to extraction of environmental DNA, and the Bacteroidales HF183 marker was quantified in units of genome equivalents (GE) by qPCR 40. Enumeration of FIB (E. coli, Total Coliforms) by the most-probable-number (MPN) method (IDEXX Laboratories, Inc., Westbrook, ME, USA) was carried out with 100 ml volumes of undiluted sample, or with 1:10 or 1:100 sample dilutions in sterile deionized water 40. The detection limit for HF183 was 150 GE/100ml while for E. coli the detection limit was 1 MPN/100ml 40.
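As a minimal sketch of how a tray reading from a diluted sample might be expressed in the reporting units used here, the following Python function scales an MPN result by its dilution factor and log10-transforms it. The function name, the treatment of non-detects (substituting the dilution-scaled detection limit), and the example readings are illustrative assumptions and are not taken from the cited protocol.

```python
import math

# Detection limit for undiluted sample, as stated in the text (MPN/100 ml).
E_COLI_DETECTION_LIMIT = 1.0


def log10_mpn_per_100ml(tray_mpn, dilution_factor=1,
                        detection_limit=E_COLI_DETECTION_LIMIT):
    """Return log10(MPN/100 ml) for a tray reading.

    dilution_factor is 1 for undiluted sample, 10 for a 1:10 dilution,
    and 100 for a 1:100 dilution.
    Assumption (not from the source): readings below the detection limit
    are replaced by the detection limit scaled by the dilution factor.
    """
    if tray_mpn < 0 or dilution_factor < 1:
        raise ValueError("tray_mpn must be >= 0 and dilution_factor >= 1")
    concentration = tray_mpn * dilution_factor
    floor = detection_limit * dilution_factor
    return math.log10(max(concentration, floor))


# Hypothetical reading: 20 MPN on a tray prepared from a 1:100 dilution.
example = log10_mpn_per_100ml(20, dilution_factor=100)
```

Other conventions for non-detects (for example, half the detection limit) would change the low end of the log scale, so the choice should be stated alongside any reported statistics.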

2.2 Illumina sequencing

2.2.1 Library preparation

Environmental DNA was used as template for PCRs for Illumina library preparation targeting the V3 to V4 16S rRNA region as described in Preheim et al. 44 with modification of primers (Table 1). Briefly, the 16S rRNA gene was amplified using Taq polymerase (New England BioLabs® Inc, Ipswich, MA, USA) in 20 µl of reaction volume containing 100 µM each of primers 357F 45 and 806R 46, 10 mM dNTPs, 50 mM MgCl2, and bovine serum albumin (BSA). To avoid cycling templates past the mid-log phase and to normalize template concentration, samples were subjected to real-time qPCR to determine the optimum number of PCR cycles for library construction (15 to 27 cycles). A no-sample DNA extraction control was included as template; it did not amplify during qPCR or during library construction and was therefore not included in sequencing. Illumina adaptors and barcodes were added as previously described 44 (Table 1). Barcoded PCR products at the predicted size of 550-650 bp were gel purified (QIAQuick Gel extraction kit, QIAGEN®, Valencia, CA, USA) and sequenced using the Illumina MiSeq platform at the MIT BioMicro Center (Cambridge, MA, USA) (Table S5). All DNA sequences generated in this study have been deposited in GenBank (accession numbers KX967493-KX976459).

2.2.2 Next generation 16S rRNA gene sequencing

Base-calling and quality filtering were implemented by Illumina MiSeq software to generate FASTQ files containing sequences and quality scores. Resulting FASTQ files were demultiplexed based on barcode sequence and were processed through the UPARSE pipeline for additional quality control and identification of operational taxonomic units (OTUs) at 97% nucleotide identity 47. Overlapping regions of each paired-end sequence were merged to create a single read. Sequences were then quality filtered by adjusted Q score, globally trimmed to 400 bp (sequences shorter than 400 bp were discarded), and de-replicated. Following OTU clustering all singleton sequences were discarded per recommended settings and chimeric sequences were identified using UCHIME 47. OTUs were taxonomically classified based on representative sequences (cluster centroids) from kingdom to species using Silva ARB software 48 with a bootstrap value of 60% as assignment cut off. Trimmed and filtered sequences were mapped to OTUs to create a matrix of sequence abundance.

2.3 Identification of pathogen-like sequences (PLS) and sewage-associated taxa (SAT)

OTUs were screened to identify genera and species corresponding to human etiological agents as indicated by the US National Institutes of Health (US NIH) 49, the Pathosystems Resource Integration Center (PATRIC) in collaboration with the National Institute of Allergy and Infectious Diseases (NIAID) 50, and a database of emerging infectious diseases 51. In addition, all OTUs assigned to pathogen-bearing genera were screened for species-level relatedness to potential bacterial pathogens obtained from clinical specimens associated with human disease using BlastN with the criterion of ≥99% sequence identity where the best-hit sequence was confirmed via BLAST distance-based clustering. Sewage-associated taxa (SAT) were identified by one of two criteria: 1) as OTUs annotated to a genus previously determined as sewage-associated by McLellan and co-workers 1, or 2) OTUs shared by two municipal sewage samples from Singapore with annotations indicating that they were derived from sewage or the human gut (Table S2).
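The ≥99% identity criterion can be illustrated with a short Python sketch that filters BLAST results in the standard 12-column tabular layout (query, subject, percent identity, ..., bit score). This is an illustration under stated assumptions rather than the authors' workflow: the function name, the OTU and reference labels, and the use of bit score to choose the best hit are hypothetical, and the confirmation step by distance-based clustering is not reproduced.

```python
def screen_pathogen_like(blast_lines, min_identity=99.0):
    """Return {otu: (best_subject, percent_identity)} for OTUs whose
    best hit (highest bit score) meets the identity threshold.

    Each line is expected in 12-column tabular BLAST format:
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        otu, subject = fields[0], fields[1]
        identity, bitscore = float(fields[2]), float(fields[11])
        # Keep only the highest-scoring hit per OTU.
        if otu not in best or bitscore > best[otu][2]:
            best[otu] = (subject, identity, bitscore)
    return {
        otu: (subject, identity)
        for otu, (subject, identity, _) in best.items()
        if identity >= min_identity
    }


# Hypothetical hits: OTU_1 passes at 99.5%, OTU_2 fails at 98.0%.
hits = [
    "OTU_1\trefA\t99.5\t400\t2\t0\t1\t400\t1\t400\t0.0\t720",
    "OTU_1\trefB\t97.0\t400\t12\t0\t1\t400\t1\t400\t0.0\t650",
    "OTU_2\trefC\t98.0\t400\t8\t0\t1\t400\t1\t400\t0.0\t690",
]
pathogen_like = screen_pathogen_like(hits)
```

Applying the threshold to the best hit only, rather than to any hit, keeps an OTU from being flagged on the strength of a weaker secondary alignment.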

2.4 Data Analysis

Multivariate analysis of bacterial community composition (BCC) and the diversity of pathogen-like sequences (PLS) and sewage-associated taxa (SAT) was conducted in PERMANOVA+ for Plymouth Routines In Multivariate Ecological Research (PRIMER) V7 52. Principal Coordinate Analysis (PCO) and ANOSIM (analysis of similarity) of Bray-Curtis similarity indices were used to identify samples with similar bacterial community composition. Permutational multivariate analysis of variance (PERMANOVA) was used to explore how BCC, SAT, and PLS varied with land use or sampling dates and was implemented for OTUs (BCC and PLSs) or genera (SAT, PLSg). Concentrations of fecal indicator bacteria (Total coliforms and E. coli, MPN/100 ml) or HF183 marker (GE/100 ml) were log-transformed prior to all statistical analyses and modeling. The relationship between BCC and log E. coli concentration or log HF183 GE/100ml was determined using the BIOENV best selection procedure routine with AIC (Akaike information criterion) as the selection criterion based on 999 permutations in PERMANOVA+. The variation in BCC, SAT, or PLS explained by the abundance of the log HF183 marker or log E. coli concentration was determined by the distance-based linear modeling (DistLM) routine implementing the AIC selection criterion and Best procedure followed by application of the marginal test 53. Similarity Percentages (SIMPER), calculated by decomposing average Bray-Curtis dissimilarity between all pairs of samples into percentage contributions from each taxon, were used to identify taxa contributing to the similarity or dissimilarity of bacterial communities sampled in the catchment and reservoir.
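Because PCO, ANOSIM, PERMANOVA, and SIMPER all operate on Bray-Curtis resemblances, a small Python sketch of the index may help fix ideas. It computes the dissimilarity between two samples as the sum of absolute abundance differences divided by the total abundance in both samples. The function names and the toy counts are illustrative assumptions; any data transformation applied before the calculation in the original analysis is not specified here and is not reproduced.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two abundance vectors
    listed in the same taxon order (0 = identical, 1 = no shared taxa)."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must list the same taxa")
    total = sum(sample_a) + sum(sample_b)
    if total == 0:
        raise ValueError("both samples are empty")
    difference = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    return difference / total


def dissimilarity_matrix(samples):
    """Pairwise Bray-Curtis dissimilarities for a list of samples."""
    return [[bray_curtis(a, b) for b in samples] for a in samples]


# Toy OTU counts for three hypothetical samples.
toy_samples = [[10, 0, 5], [5, 5, 5], [0, 8, 0]]
toy_matrix = dissimilarity_matrix(toy_samples)
```

Similarity is often reported as 100 × (1 − dissimilarity), so a dissimilarity of 1/3 between the first two toy samples corresponds to a similarity of about 67%.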

The sequence diversity in samples was compared at different sampling efforts by rarefaction analysis through the permute, lattice, and vegan packages in R version 3.2.4 54-57. The distribution of log10-transformed indicator bacteria (E. coli and total coliforms) and HF183 marker levels across samples was examined by Pearson's correlation and hierarchical clustering using Ward's method on standardized data (JMP Pro v.12). The extent to which log E. coli concentration, log HF183, land use, and sample date accounted for variability in the proportion of either PLS or SAT sequences observed across all samples in the dataset was modeled using beta-regression implemented via maximum likelihood in JMP Pro v.12 (SAS Institute Inc., Cary, NC, USA). To confirm the robustness of observed trends, models were also run for catchment samples only. Beta-regression was selected as it models a continuous dependent variable restricted to the interval (0, 1) with respect to continuous and/or categorical predictor variables through a regression structure 58. The statistical significance of individual predictors was assessed via the Wald Chi Squared test.
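To make the structure of a beta-regression concrete, the sketch below writes out the log-likelihood that a maximum-likelihood fit would maximize. It assumes the common mean-precision parameterization, in which a proportion y follows Beta(μφ, (1 − μ)φ) and the mean μ is tied to a single predictor through a logit link. The parameterization, the single-predictor form, and all function names are assumptions made for illustration; the exact model specification used in JMP Pro is not described in the text.

```python
import math


def inv_logit(eta):
    """Map a linear predictor onto the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_log_density(y, mu, phi):
    """Log density of Beta(mu*phi, (1-mu)*phi) at y, with 0 < y < 1."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def beta_regression_loglik(proportions, predictor, intercept, slope, phi):
    """Log-likelihood of a one-predictor beta-regression with logit link.

    proportions: observed fractions (e.g. share of SAT reads per sample)
    predictor:   matching covariate values (e.g. log E. coli MPN/100 ml)
    """
    return sum(
        beta_log_density(y, inv_logit(intercept + slope * x), phi)
        for y, x in zip(proportions, predictor)
    )
```

With μ = 0.5 and φ = 2 the distribution reduces to the uniform density on (0, 1), so the log density is zero for any y; this gives a quick sanity check on the implementation. A fit would search over the intercept, slope, and φ for the values that maximize this function.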

3. Results

3.1 Bacterial Community Composition (BCC) in an Urban Reservoir and Catchment

A total of 3,810,864 paired-end Illumina MiSeq reads were quality filtered and overlapping paired ends were merged into 1,189,972 sequences ranging from 17,986 to 67,583 sequences per sample (also referred to herein as “reads”). Sequences were mapped onto 9,205 OTUs using the UPARSE pipeline (Table 2). All OTUs classified as bacterial (8,967) were classified according to phylum and 96.4%, 94.0%, 89.3%, and 67.0% were classified to class, order, family, and genus, respectively. Overall, sequences from the phylum Proteobacteria dominated most samples (57% of sequences) followed by Bacteroidetes (16%), Cyanobacteria (9%), Firmicutes (6%), and Actinobacteria (4%) (Figure 1A). Rarefaction analysis of OTU richness indicated that, despite >17,000 reads per sample, most sites were not sampled to saturation, suggesting undiscovered diversity (Figure 1B).
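A rarefaction curve of the kind summarized above can be sketched in Python using the standard expectation for richness in a random subsample drawn without replacement: for a sample of N reads in which OTU i has N_i reads, the expected number of OTUs in a subsample of n reads is the sum over OTUs of 1 − C(N − N_i, n)/C(N, n). The function names and toy counts are illustrative assumptions, and whether the original analysis used this closed form or repeated random subsampling is not stated in the text.

```python
from math import comb


def expected_richness(otu_counts, subsample_size):
    """Expected number of OTUs observed in a random subsample of reads
    drawn without replacement from one sample."""
    counts = [c for c in otu_counts if c > 0]
    total = sum(counts)
    if not 0 <= subsample_size <= total:
        raise ValueError("subsample_size must be between 0 and total reads")
    all_draws = comb(total, subsample_size)
    # For each OTU: 1 minus the probability that no read of it is drawn.
    return sum(1 - comb(total - c, subsample_size) / all_draws
               for c in counts)


def rarefaction_curve(otu_counts, depths):
    """(depth, expected richness) pairs for plotting a rarefaction curve."""
    return [(n, expected_richness(otu_counts, n)) for n in depths]


# Toy sample with 4 reads spread over 3 OTUs.
toy_curve = rarefaction_curve([2, 1, 1], depths=[0, 2, 4])
```

A curve that is still rising at the full sequencing depth, as reported for most sites here, indicates that additional reads would be expected to reveal further OTUs.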

To evaluate the potential role of spatial and seasonal ecology in structuring bacterial communities in the reservoir and catchment samples, the effects of land-use, sample site, and sample month were examined. Catchment samples collected from the same site on two different dates harbored bacterial communities that were significantly correlated (ANOSIM R=0.32, p=0.03). The distribution of bacterial OTUs varied significantly between reservoir and catchment samples (PERMANOVA p=0.001, F=8.7) (Figure 2A-B) and among catchment land uses (PERMANOVA p=0.009, F=2) but did not vary significantly between months of sample collection (PERMANOVA p=0.16, F=1.2). Reservoir samples clustered away from samples collected from the catchment by PCO (Figure 2A) and were enriched in sequences from Cyanobacteria, Planctomycetes, Chlorobi, Bacteroidetes, and Chloroflexi (Spearman R=0.65 to 0.9 to PCO1). Horticultural and residential samples were enriched in Proteobacteria (Spearman R=0.50 to PCO2) and Firmicutes (Spearman R=-0.98 to PCO1) (Figure 2A). Taken together, these results suggest that characteristics of the sampling locations played a stronger role in influencing the overall bacterial community composition than temporal variation.

3.2 Relationship between BCC, Fecal Indicator Bacteria (FIB), and HF183

To examine the hypothesis that sites with elevated FIB would harbor distinct bacterial communities, the relationship between BCC, FIB (E. coli, total coliform), and HF183 was examined in the reservoir and catchment samples. Total coliform was highly co-linear with E. coli (R=0.84), thus E. coli was used to represent both in subsequent analysis. As previously reported 40, the majority of catchment samples were associated with E. coli levels greater than the US EPA single grab sample threshold of 235 MPN/100ml and E. coli concentrations were significantly related to land-use. The range of E. coli in samples considered in this study was below detection to 2.0 × 10^5 MPN/100ml, while HF183 concentrations ranged from 4.6 × 10^2 to 9.5 × 10^5 GE/100ml. The composition and diversity of bacterial communities in the samples was correlated to a combination of log E. coli (MPN/100ml) and log HF183 (GE/100ml) (BIOENV Spearman R=0.48), which together explained a combined cumulative variance of the bacterial community structure of 25.3% (DistLM, R²=0.25). E. coli concentrations explained more variation in the composition of bacterial communities than HF183 (E. coli, variation of 12%, p=0.001 compared to HF183, variation of 5.5%, p=0.016) (Table 3).

3.3 Distribution and Composition of Sewage-Associated Taxa (SAT)

To examine the hypothesis that sites with elevated levels of FIB would harbor signatures of sewage-associated taxa (SAT), we identified 30,087 reads (n=16 genera) that corresponded to OTUs shared by sewage samples analyzed as part of this study (n=2) or to bacterial groups proposed by McLellan et al. (2010) as associated with human fecal pollution. Sewage-associated sequences from this study shared substantial overlap with SAT described by McLellan et al. (2010) including the shared genera Bifidobacterium, Bacteroides, Parabacteroides, Faecalibacterium, Roseburia, Ruminococcus, Akkermansia, Subdoligranulum, Papillibacter, and Sutterella (Table S2) 1.

The composition of SAT sequences varied with catchment land use (PERMANOVA, p = 0.004) but not sample month (Figure 2C, Table S1) with the genera Prevotella, Faecalibacterium, and Bifidobacterium enriched in horticultural areas and Papillibacter enriched in residential areas (Table S4). The composition of SAT was moderately correlated to measured levels of E. coli and weakly correlated with HF183 levels (BIOENV: E. coli, R=0.55 and HF183, R=0.1). E. coli and HF183 explained a combined cumulative variance of 37.6% (R²=0.37) in SAT composition with E. coli explaining more variation than HF183 (E. coli: 22.3%, p=0.001 compared to HF183: 14.4%, p=0.001) (Table 3).

The proportion of SAT sequences was highest in horticultural areas (0.05% to 16.5% of total reads; n=14) and lowest in the reservoir ( […] 0.2).

3.4 Distribution and Composition of Pathogen-like Sequences

To examine the hypothesis that samples with elevated levels of FIB would also harbor signatures of potential human pathogens, we classified sequences as pathogen-like based on named genus or species-level identity to known or emerging pathogens by BlastN. Out of 75,687 sequences, 6.3% were classified to 33 genera harboring known pathogens (PLSg) (Table 2) while 2.3% of sequences matched pathogens at the species level (PLSs). The most highly represented PLSg were Acinetobacter (38%), Arcobacter (22%), Pseudomonas (8.2%), Aeromonas (7.4%), and Clostridium (7%). Samples with the highest and lowest contribution from PLSg sequences were respectively F10_7 (33%) and K4_1 (0.43%). The composition of PLSg and PLSs in samples clustered distinctly with reservoir or catchment origin (Figure 2D and Figure S3) and varied with catchment land use and collection month (PERMANOVA, p […] 0.1

χ2=3.35, p=0.067

SAT

All Samples

0.66

χ2=14.0, p=0.0002

p>0.1

χ2=7.6, p=0.0057

SAT

Catchment only

0.61

χ2=12.0, p=0.0005

p>0.1 χ2=13.1, p=0.0003 χ2=12.8, p=0.0003

p>0.1

χ2=6.0, p=0.014

*Datasets considered were all samples (n=36) or catchment-only samples (n=30).


Figure 1: (A) Highly represented phyla across samples collected from the reservoir and catchment. Sample codes indicate land use (R=Residential, U=Undeveloped, K=Reservoir, F=Horticultural/Farming), sample number, and collection date: “_1” identifies samples collected in January 2009, and the rest were collected in July 2009. (B) Rarefaction analysis of species richness in individual samples. Line color corresponds to land use: Red=Reservoir, Blue=Residential, Cyan=Undeveloped, Green=Horticultural, and Purple=Reference samples. Reference samples 114_Sw and 115_Sw were collected in January 2010 from sewage infrastructure.


Figure 2. Multivariate analysis of bacterial community composition based on Principal Coordinate Analysis (PCO) of Bray-Curtis resemblance between samples. Sample codes indicate land use: R=Residential, U=Undeveloped, K=Reservoir, and F=Horticultural/Farming. Bacterial communities are distinguished by (A) bacterial phyla, (B) OTUs, (C) SAT OTUs, and (D) PLS OTUs in the catchment (horticultural, residential, and undeveloped) and reservoir sites. In panel (A), individual bacterial phyla contributing to variation were determined by Spearman correlation (R>0.65) to the first two PCO axes and are represented by vectors.


Figure 3. Draftsman plot of quantities considered in this study: Log HF183 (GE/100ml and Copies/ng), E. coli (MPN/100ml), and proportions of sequences corresponding to SAT, PLSg, PLSs, and B. dorei OTU45. Significant Pearson correlations (p