Article
Variation of Bacterial Communities with Water Quality in an Urban Tropical Catchment Jean Pierre Nshimyimana, Adam Joshua Ehrich Freedman, Peter Shanahan, Lloyd C.H. Chua, and Janelle Renee Thompson Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.6b04737 • Publication Date (Web): 17 Apr 2017 Downloaded from http://pubs.acs.org on April 18, 2017
Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Environmental Science & Technology
Variation of Bacterial Communities with Water Quality in an Urban Tropical Catchment

Jean Pierre Nshimyimana1,2,4,5, Adam Joshua Ehrich Freedman2,4, Peter Shanahan2,4, Lloyd C. H. Chua3, and Janelle R. Thompson2,4,*

1. School of Civil and Environmental Engineering, Nanyang Technological University (NTU), 50 Nanyang Avenue, Singapore 639798, Singapore
2. Department of Civil and Environmental Engineering, Massachusetts Institute of Technology (MIT), 77 Massachusetts Avenue, Cambridge, MA 02139, USA
3. School of Engineering, Deakin University, Waurn Ponds, Geelong, Victoria 3216, Australia
4. Centre for Environmental Sensing and Modeling (CENSAM), Singapore-MIT Alliance for Research and Technology (SMART), 1 Create Way, Singapore 138602, Singapore
5. Singapore Center on Environmental Life Sciences Engineering (SCELSE), NTU, 60 Nanyang Drive, Singapore 637551

*Corresponding author: Janelle Thompson: [email protected]
Abstract

A major challenge for assessment of water quality in tropical environments is the natural occurrence and potential growth of Fecal Indicator Bacteria (FIB). To gain a better understanding of the relationship between measured levels of FIB and the distribution of sewage-associated bacteria, including potential pathogens, in the tropics, this study compared the abundance of FIB (total coliforms and E. coli) and the Bacteroidales (HF183 marker) with bacterial community structure determined by next-generation amplicon sequencing. Water was sampled twice over 6 months from 18 sites within a tropical urban catchment and reservoir, followed by extraction of DNA from microorganisms and sequencing targeting the V3-V4 region of the 16S rRNA gene. Multivariate statistical analyses indicated that bacterial community composition (BCC) varied between reservoir and catchment, within catchment land-uses, and with E. coli concentration. Beta-regression indicated that the proportion of sequences from sewage-associated taxa (SAT) or pathogen-like sequences (PLS) was predicted most significantly by measured levels of E. coli (log MPN/100 ml) (χ2 > 8.7; p [...]

[...] 400 bp, which enable more confident taxonomic assignment. NGS now provides the opportunity to examine how microbial diversity at the genus and species level varies with water quality predicted by the abundance of FIB and source tracking markers, especially in tropical environments. While numerous studies have attempted to relate concentrations of specific pathogen targets to FIB measured in a water body (e.g., [35-38]), such studies generally showed poor to moderate correlation, likely due to highly variable dynamics of a specific pathogen target in a complex environment. A survey-based approach such as NGS provides the opportunity to simultaneously evaluate a diversity of microorganisms and identify potential risk agents without pre-defining targeted groups.
The tropical urban island of Singapore has an advanced and reputable water and wastewater management infrastructure in which stormwater is collected through engineered drains and stored in reservoirs. Studies of surface microbial water quality in Singapore have noted elevated levels of FIB (E. coli, total coliforms, and Enterococci) at sites from various catchments during dry weather (defined as >48 hours after a rain event) [39-42].
The goal of this study was to apply Illumina™ MiSeq 16S rRNA gene amplicon sequencing to characterize the bacterial community composition (BCC), including sequences related to sewage-associated taxa (SAT) and human pathogens (i.e., pathogen-like sequences, PLS), in water samples collected from an urban reservoir and catchment in Singapore. We hypothesized that bacterial communities in samples would vary with site, land-use, and sample date, reflecting seasonal and spatial ecology. We also hypothesized that sites with high measured levels of E. coli and/or HF183 marker would harbor bacterial communities enriched in pathogen-like bacteria or SAT across land-uses and sample dates. Our findings will be useful for evaluating the utility of next-generation sequencing to identify impaired tropical waters and to identify specific bacterial targets that may be relevant for further monitoring using quantitative methods.
2. Methodology

2.1 Study design and sample collection

2.1.1 Sampling and Site Characterization

Water samples from 18 sites in an urban reservoir and catchment in the northwest of Singapore Island were collected during dry weather in January and July 2009 (Figure S5) [39]. Dry weather was defined as >48 hours following a rainfall event based on rain gauges distributed around the watershed and monitored by the Public Utilities Board [41]. Two additional samples were obtained from municipal sewers in a high-density residential area within the catchment in January 2010. The catchment covers 61 square kilometers with mixed land uses: residential areas (R) (19%) are distinguished by a high-density population in high-rise buildings, and farming areas (F) (5%) are characterized by horticultural and agricultural activities including small-scale production of flowers, vegetables, ornamental fish, and chicken eggs [39, 40]. The undeveloped area (U), the largest of the land-use categories covering 76% of the catchment, is maintained as limited-access land dominated by native vegetation (Figure S5). Land use data were provided by the Singapore Public Utilities Board in the form of a GIS shapefile [43].
Storm water and sewage are transported in the catchment via separate conveyance systems. An underground system conveys sewage to wastewater treatment plants, while storm water and surface runoff are drained by open concrete-lined channels that discharge into rivers and reservoirs. Some of the farming areas are served by on-site sewage and wastewater treatment systems, while the rest of the farming areas and the residential area are served by the underground sewerage system. Catchment water samples were collected from open concrete-lined channels conveying water to the reservoir. The majority of catchment collection sites were drains in small upstream watersheds with uniform land use. Exceptions were sites F9, F10, and R2, classified as residential or horticultural, which also drained a minor proportion of undeveloped lands. Reservoir water samples were collected at four stations approximately 800 m and 1,200 m apart to provide spatial coverage of the reservoir surface.
The climate is tropical, with seasons defined by prevailing wind directions and weather patterns corresponding to the Northeast Monsoon (December to March), the Southwest Monsoon (June to September), and two inter-monsoon seasons, with year-round temperatures ranging from highs of 29 to 31°C during the day to lows of 23 to 24°C at night. Water temperatures vary between 27 and 29°C throughout the year [39].
2.1.2 Quantification of HF183 and IDEXX-based enumeration of E. coli and Total coliforms

Analysis of samples by DNA extraction and qPCR-based quantification of the HF183 marker according to MIQE standards has been reported in a prior publication [40]. In brief, particulates from water samples were concentrated onto 0.2-µm-pore-size cartridge filters (Millipore, Billerica, MA, USA) and subjected to extraction of environmental DNA, and the Bacteroidales HF183 marker was quantified in units of genome equivalents (GE) by qPCR [40]. Enumeration of FIB (E. coli, total coliforms) by the most-probable-number (MPN) method (IDEXX Laboratories, Inc., Westbrook, ME, USA) was carried out with 100 ml volumes of undiluted sample, or with 1:10 or 1:100 sample dilutions in sterile deionized water [40]. The detection limit for HF183 was 150 GE/100 ml, while for E. coli the detection limit was 1 MPN/100 ml [40].
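As a minimal illustration of how an MPN reading obtained from a diluted sample relates to the log-transformed concentration used in later analyses, the sketch below scales the reading by the dilution factor and takes log10. The function name and its arguments are hypothetical, and substituting the detection limit for non-detects is an assumption made only for illustration; the text does not state how below-detection values were treated.

```python
import math

def log10_mpn_per_100ml(reading, dilution_factor=1, detection_limit=1.0):
    """Return log10(MPN/100 ml) in the original (undiluted) water sample.

    reading: MPN/100 ml measured on the (possibly diluted) sample,
             or None when nothing was detected.
    dilution_factor: 1 for undiluted, 10 for 1:10, 100 for 1:100.
    detection_limit: stand-in value for non-detects (illustrative assumption).
    """
    if reading is None:
        # Assumption: a non-detect is replaced by the detection limit,
        # which also scales with the dilution that was applied.
        concentration = detection_limit * dilution_factor
    else:
        concentration = reading * dilution_factor
    return math.log10(concentration)

# Hypothetical readings: 200 MPN/100 ml on a 1:100 dilution, and a non-detect.
diluted = log10_mpn_per_100ml(200, dilution_factor=100)
non_detect = log10_mpn_per_100ml(None)
```

Working on the log scale keeps samples that span several orders of magnitude comparable in downstream statistics.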
2.2 Illumina sequencing

2.2.1 Library preparation

Environmental DNA was used as template for PCRs for Illumina library preparation targeting the V3 to V4 16S rRNA region as described in Preheim et al. [44] with modification of primers (Table 1). Briefly, the 16S rRNA gene was amplified using Taq polymerase (New England BioLabs® Inc, Ipswich, MA, USA) in 20 µl of reaction volume containing 100 µM each of primers 357F [45] and 806R [46], 10 mM dNTPs, 50 mM MgCl2, and bovine serum albumin (BSA). To avoid cycling templates past the mid-log phase and to normalize template concentration, samples were subjected to real-time qPCR to determine the optimum number of PCR cycles for library construction (15 to 27 cycles). A no-sample DNA extraction control was included as template; it did not amplify during qPCR or during library construction and was therefore not included in sequencing. Illumina adaptors and barcodes were added as previously described [44] (Table 1). Barcoded PCR products at the predicted size of 550-650 bp were gel purified (QIAQuick Gel extraction kit, QIAGEN®, Valencia, CA, USA) and sequenced using the Illumina MiSeq platform at the MIT BioMicro Center (Cambridge, MA, USA) (Table S5). All DNA sequences generated in this study have been deposited in GenBank (accession numbers KX967493-KX976459).
2.2.2 Next generation 16S rRNA gene sequencing

Base-calling and quality filtering were implemented by Illumina MiSeq software to generate FASTQ files containing sequences and quality scores. Resulting FASTQ files were demultiplexed based on barcode sequence and were processed through the UPARSE pipeline for additional quality control and identification of operational taxonomic units (OTUs) at 97% nucleotide identity [47]. Overlapping regions of each paired-end sequence were merged to create a single read. Sequences were then quality filtered by adjusted Q score, globally trimmed to 400 bp (sequences shorter than 400 bp were discarded), and de-replicated. Following OTU clustering, all singleton sequences were discarded per recommended settings and chimeric sequences were identified using UCHIME [47]. OTUs were taxonomically classified based on representative sequences (cluster centroids) from kingdom to species using Silva ARB software [48] with a bootstrap value of 60% as the assignment cut-off. Trimmed and filtered sequences were mapped to OTUs to create a matrix of sequence abundance.

2.3 Identification of pathogen-like sequences (PLS) and sewage-associated taxa (SAT)
OTUs were screened to identify genera and species corresponding to human etiological agents as indicated by the US National Institutes of Health (US NIH) [49], the Pathosystems Resource Integration Center (PATRIC) in collaboration with the National Institute of Allergy and Infectious Diseases (NIAID) [50], and a database of emerging infectious diseases [51]. In addition, all OTUs assigned to pathogen-bearing genera were screened for species-level relatedness to potential bacterial pathogens obtained from clinical specimens associated with human disease using BlastN with the criterion of ≥99% sequence identity, where the best-hit sequence was confirmed via BLAST distance-based clustering. Sewage-associated taxa (SAT) were identified by one of two criteria: 1) OTUs annotated to a genus previously determined as sewage-associated by McLellan and co-workers [1], or 2) OTUs shared by two municipal sewage samples from Singapore with annotations indicating that they were derived from sewage or the human gut (Table S2).
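The two SAT criteria described above can be read as a simple either/or rule. The following sketch is one way to express that rule, not the authors' implementation; the `Otu` record, its field names, and the example identifiers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Otu:
    otu_id: str
    genus: str
    sewage_or_gut_annotation: bool  # annotation points to sewage or the human gut

def is_sewage_associated(otu, reference_genera, sewage_sample_a, sewage_sample_b):
    """Apply the two criteria: reference genus, or shared by both sewage samples."""
    # Criterion 1: genus previously reported as sewage-associated.
    in_reference = otu.genus in reference_genera
    # Criterion 2: OTU present in both sewage samples AND annotated as
    # derived from sewage or the human gut.
    shared = otu.otu_id in sewage_sample_a and otu.otu_id in sewage_sample_b
    return in_reference or (shared and otu.sewage_or_gut_annotation)

# Hypothetical inputs for illustration only.
reference_genera = {"Bacteroides"}
sewage_a = {"OTU1", "OTU2"}
sewage_b = {"OTU2", "OTU3"}
```

An OTU seen in only one of the two sewage samples fails the second criterion, which keeps sample-specific taxa out of the SAT set.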
2.4 Data Analysis

Multivariate analysis of bacterial community composition (BCC) and the diversity of pathogen-like sequences (PLS) and sewage-associated taxa (SAT) was conducted in PERMANOVA+ for Plymouth Routines In Multivariate Ecological Research (PRIMER) V7 [52]. Principal Coordinate Analysis (PCO) and ANOSIM (analysis of similarity) of Bray-Curtis similarity indices were used to identify samples with similar bacterial community composition. Permutational multivariate analysis of variance (PERMANOVA) was used to explore how BCC, SAT, and PLS varied with land use or sampling dates and was implemented for OTUs (BCC and PLSs) or genera (SAT, PLSg). Concentrations of fecal indicator bacteria (total coliforms and E. coli, MPN/100 ml) or HF183 marker (GE/100 ml) were log-transformed prior to all statistical analyses and modeling. The relationship between bacterial community composition (BCC) and log E. coli concentration or log HF183 GE/100 ml was determined using the BIOENV best selection procedure routine with AIC (Akaike information criterion) as the selection criterion based on 999 permutations in PERMANOVA+. The variation in BCC, SAT, or PLS explained by the abundance of the log HF183 marker or log E. coli concentration was determined by the distance-based linear modeling (DistLM) routine implementing the AIC selection criterion and Best procedure, followed by application of the marginal test [53]. Similarity Percentages (SIMPER), calculated by decomposing average Bray-Curtis dissimilarity between all pairs of samples into percentage contributions from each taxon, were used to identify taxa contributing to the similarity or dissimilarity of bacterial communities sampled in the catchment and reservoir.
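To make the Bray-Curtis and SIMPER ideas concrete, the sketch below computes the Bray-Curtis dissimilarity for a single pair of samples and splits it into per-taxon contributions. It is a simplified illustration with hypothetical counts: PRIMER's SIMPER routine averages such contributions over all sample pairs, which this sketch does not do.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0

def taxon_contributions(a, b):
    """Per-taxon terms whose sum equals the Bray-Curtis dissimilarity."""
    denominator = sum(x + y for x, y in zip(a, b))
    if not denominator:
        return [0.0 for _ in a]
    return [abs(x - y) / denominator for x, y in zip(a, b)]

# Hypothetical read counts for three taxa in two samples.
sample_1 = [6, 0, 4]
sample_2 = [2, 4, 4]
dissimilarity = bray_curtis(sample_1, sample_2)
contributions = taxon_contributions(sample_1, sample_2)
percentages = [100 * c / dissimilarity for c in contributions]
```

In this toy case the first two taxa each account for half of the dissimilarity, while the third, which is equally abundant in both samples, contributes nothing.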
The sequence diversity in samples was compared at different sampling efforts by rarefaction analysis through the permute, lattice, and vegan packages in R version 3.2.4 [54-57]. The distribution of log10-transformed indicator bacteria (E. coli and total coliforms) and HF183 marker levels across samples was examined by Pearson's correlation and hierarchical clustering using Ward's method on standardized data (JMP Pro v.12). The extent to which log E. coli concentration, log HF183, land use, and sample date accounted for variability in the proportion of either PLS or SAT sequences observed across all samples in the dataset was modeled using beta-regression implemented via maximum likelihood in JMP Pro v.12 (SAS Institute Inc., Cary, NC, USA). To confirm the robustness of observed trends, models were also run for catchment samples only. Beta-regression was selected as it models a continuous dependent variable restricted to the interval (0, 1) with respect to continuous and/or categorical predictor variables through a regression structure [58]. The statistical significance of individual predictors was assessed via the Wald chi-squared test.
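For readers unfamiliar with beta-regression, the sketch below writes out the log-likelihood that a maximum-likelihood fit would maximize for one continuous predictor. It assumes the common mean-precision parameterization with a logit link; the parameterization used internally by JMP Pro is not stated in the text, and the function and argument names are hypothetical.

```python
import math

def beta_log_likelihood(proportions, predictor, intercept, slope, precision):
    """Log-likelihood of a beta-regression with a logit link (assumed form).

    proportions: observed responses strictly inside (0, 1).
    predictor: one continuous predictor, e.g. log E. coli concentration.
    precision: dispersion parameter; larger values mean less scatter.
    """
    total = 0.0
    for y, x in zip(proportions, predictor):
        # The logit link maps the linear predictor to a mean inside (0, 1).
        mu = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
        a = mu * precision
        b = (1.0 - mu) * precision
        total += (math.lgamma(precision) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))
    return total
```

With a mean of 0.5 and a precision of 2 the beta density is uniform, so the log-likelihood is zero for any data; this gives a convenient sanity check for the formula.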
3. Results

3.1 Bacterial Community Composition (BCC) in an Urban Reservoir and Catchment

A total of 3,810,864 paired-end Illumina MiSeq reads were quality filtered and overlapping paired ends were merged into 1,189,972 sequences ranging from 17,986 to 67,583 sequences per sample (also referred to herein as “reads”). Sequences were mapped onto 9,205 OTUs using the UPARSE pipeline (Table 2). All OTUs classified as bacterial (8,967) were classified according to phylum, and 96.4%, 94.0%, 89.3%, and 67.0% were classified to class, order, family, and genus, respectively. Overall, sequences from the phylum Proteobacteria dominated most samples (57% of sequences), followed by Bacteroidetes (16%), Cyanobacteria (9%), Firmicutes (6%), and Actinobacteria (4%) (Figure 1A). Rarefaction analysis of OTU richness indicated that, despite >17,000 reads per sample, most sites were not sampled to saturation, suggesting undiscovered diversity (Figure 1B).
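The notion of "sampling to saturation" can be illustrated with the classical rarefaction formula, which gives the expected number of OTUs in a random subsample of reads drawn without replacement. The sketch below is a standalone illustration with hypothetical counts, not a reproduction of the R workflow used in the study.

```python
from math import comb

def expected_richness(otu_counts, depth):
    """Expected OTU richness in a random subsample of `depth` reads."""
    total = sum(otu_counts)
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    denominator = comb(total, depth)
    # For each OTU, 1 minus the probability that none of its reads are drawn.
    return sum(1 - comb(total - count, depth) / denominator
               for count in otu_counts if count > 0)

# Hypothetical sample: one dominant OTU and two rare ones.
counts = [8, 1, 1]
curve = [expected_richness(counts, depth) for depth in range(1, sum(counts) + 1)]
```

A curve that is still rising at the full sequencing depth, as in this toy sample, is the pattern described as not sampled to saturation.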
To evaluate the potential role of spatial and seasonal ecology in structuring bacterial communities in the reservoir and catchment samples, the effects of land-use, sample site, and sample month were examined. Catchment samples collected from the same site on two different dates harbored bacterial communities that were significantly correlated (ANOSIM R=0.32, p=0.03). The distribution of bacterial OTUs varied significantly between reservoir and catchment samples (PERMANOVA p=0.001, F=8.7) (Figure 2A-B) and among catchment land uses (PERMANOVA p=0.009, F=2) but did not vary significantly between months of sample collection (PERMANOVA p=0.16, F=1.2). Reservoir samples clustered away from samples collected from the catchment by PCO (Figure 2A) and were enriched in sequences from Cyanobacteria, Planctomycetes, Chlorobi, Bacteroidetes, and Chloroflexi (Spearman R=0.65 to 0.9 to PCO1). Horticultural and residential samples were enriched in Proteobacteria (Spearman R=0.50 to PCO2) and Firmicutes (Spearman R=-0.98 to PCO1) (Figure 2A). Taken together, these results suggest that characteristics of the sampling locations played a stronger role in influencing the overall bacterial community composition than temporal variation.
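The PERMANOVA F and p values reported above come from a pseudo-F statistic computed on a distance matrix and compared against label permutations. The sketch below shows the one-factor form of that calculation on hypothetical data; it is a simplified illustration and does not reproduce the PERMANOVA+ routine or its options.

```python
import random

def pseudo_f(dist, groups):
    """One-factor pseudo-F from a square distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for label in labels:
        members = [i for i in range(n) if groups[i] == label]
        pairs = sum(dist[i][j] ** 2
                    for k, i in enumerate(members) for j in members[k + 1:])
        ss_within += pairs / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))

def permutation_p_value(dist, groups, permutations=999, seed=0):
    """Share of label shuffles with pseudo-F at least as large as observed."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)

# Hypothetical example: four samples on a line, two per group.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(a - b) for b in points] for a in points]
groups = ["reservoir", "reservoir", "catchment", "catchment"]
```

Because significance is judged by reshuffling labels, the test makes no assumption that the underlying abundances are normally distributed.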
3.2 Relationship between BCC, Fecal Indicator Bacteria (FIB), and HF183

To examine the hypothesis that sites with elevated FIB would harbor distinct bacterial communities, the relationship between BCC, FIB (E. coli, total coliforms), and HF183 was examined in the reservoir and catchment samples. Total coliform concentration was highly collinear with E. coli (R=0.84); thus, E. coli was used to represent both in subsequent analyses. As previously reported [40], the majority of catchment samples were associated with E. coli levels greater than the US EPA single grab sample threshold of 235 MPN/100 ml, and E. coli concentrations were significantly related to land-use. The range of E. coli in samples considered in this study was below detection to 2.0 × 10^5 MPN/100 ml, while HF183 concentrations ranged from 4.6 × 10^2 to 9.5 × 10^5 GE/100 ml. The composition and diversity of bacterial communities in the samples were correlated with a combination of log E. coli (MPN/100 ml) and log HF183 (GE/100 ml) (BIOENV Spearman R=0.48), which together explained a cumulative 25.3% of the variance in bacterial community structure (DistLM, R2=0.25). E. coli concentrations explained more variation in the composition of bacterial communities than HF183 (E. coli, variation of 12%, p=0.001 compared to HF183, variation of 5.5%, p=0.016) (Table 3).
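The collinearity check between total coliforms and E. coli amounts to a Pearson correlation on log10-transformed concentrations. A minimal sketch is given below; the concentrations are hypothetical and were chosen to be perfectly proportional, so they do not reproduce the R=0.84 reported for the study data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical concentrations in MPN/100 ml.
e_coli = [10, 100, 1000, 10000]
total_coliforms = [100, 1000, 10000, 100000]
r = pearson_r([math.log10(v) for v in e_coli],
              [math.log10(v) for v in total_coliforms])
```

When two predictors are this strongly correlated, keeping only one of them avoids unstable coefficient estimates in later models.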
3.3 Distribution and Composition of Sewage-Associated Taxa (SAT)

To examine the hypothesis that sites with elevated levels of FIB would harbor signatures of sewage-associated taxa (SAT), we identified 30,087 reads (n=16 genera) that corresponded to OTUs shared by sewage samples analyzed as part of this study (n=2) or to bacterial groups proposed by McLellan et al. (2010) as associated with human fecal pollution. Sewage-associated sequences from this study shared substantial overlap with SAT described by McLellan et al. (2010), including the shared genera Bifidobacterium, Bacteroides, Parabacteroides, Faecalibacterium, Roseburia, Ruminococcus, Akkermansia, Subdoligranulum, Papillibacter, and Sutterella (Table S2) [1].

The composition of SAT sequences varied with catchment land use (PERMANOVA, p=0.004) but not sample month (Figure 2C, Table S1), with the genera Prevotella, Faecalibacterium, and Bifidobacterium enriched in horticultural areas and Papillibacter enriched in residential areas (Table S4). The composition of SAT was moderately correlated with measured levels of E. coli and weakly correlated with HF183 levels (BIOENV: E. coli, R=0.55 and HF183, R=0.1). E. coli and HF183 explained a combined cumulative variance of 37.6% (R2=0.37) in SAT composition, with E. coli explaining more variation than HF183 (E. coli: 22.3%, p=0.001 compared to HF183: 14.4%, p=0.001) (Table 3).
The proportion of SAT sequences was highest in horticultural areas (0.05% to 16.5% of total reads; n=14) and lowest in the reservoir ([...] 0.2).
3.4 Distribution and Composition of Pathogen-like Sequences

To examine the hypothesis that samples with elevated levels of FIB would also harbor signatures of potential human pathogens, we classified sequences as pathogen-like based on named genus or species-level identity to known or emerging pathogens by BlastN. Out of 75,687 sequences, 6.3% were classified to 33 genera harboring known pathogens (PLSg) (Table 2) while 2.3% of sequences matched pathogens at the species level (PLSs). The most highly represented PLSg were Acinetobacter (38%), Arcobacter (22%), Pseudomonas (8.2%), Aeromonas (7.4%), and Clostridium (7%). Samples with the highest and lowest contribution from PLSg sequences were F10_7 (33%) and K4_1 (0.43%), respectively. The composition of PLSg and PLSs in samples clustered distinctly with reservoir or catchment origin (Figure 2D and Figure S3) and varied with catchment land use and collection month (PERMANOVA, p [...]

[...] 0.1
χ2=3.35, p=0.067
SAT
All Samples
0.66
χ2=14.0, p=0.0002
p>0.1
χ2=7.6, p=0.0057
SAT
Catchment only
0.61
χ2=12.0, p=0.0005
p>0.1 χ2=13.1, p=0.0003 χ2=12.8, p=0.0003
p>0.1
χ2=6.0, p=0.014
*Datasets considered were all samples (n=36) or catchment-only samples (n=30).
Figure 1: (A) Highly represented phyla across samples collected from the reservoir and catchment. Sample codes indicate land use (R=Residential, U=Undeveloped, K=Reservoir, F=Horticultural/Farming), sample number, and collection date: “_1” identifies samples collected in January 2009, and the rest were collected in July 2009. (B) Rarefaction analysis of species richness in individual samples. Line color corresponds to land use: Red=Reservoir, Blue=Residential, Cyan=Undeveloped, Green=Horticultural, and Purple=Reference samples. Reference samples 114_Sw and 115_Sw were collected in January 2010 from sewage infrastructure.
Figure 2. Multivariate analysis of bacterial community composition based on Principal Coordinate Analysis (PCO) of Bray-Curtis resemblance between samples. Sample codes indicate land use: R=Residential, U=Undeveloped, K=Reservoir, and F=Horticultural/Farming. Bacterial communities are distinguished by (A) bacterial phyla, (B) OTUs, (C) SAT OTUs, and (D) PLS OTUs in the catchment (horticultural, residential, and undeveloped) and reservoir sites. In (A), individual bacterial phyla contributing to variation were determined by Spearman correlation (R>0.65) with the first two PCO axes and are represented by vectors.
Figure 3. Draftsman plot of quantities considered in this study: Log HF183 (GE/100ml and Copies/ng), E. coli (MPN/100ml), and proportions of sequences corresponding to SAT, PLSg, PLSs, and B. dorei OTU45. Significant Pearson correlations (p