Subscriber access provided by University of Newcastle, Australia
Article
A High-Throughput DNA Sequencing Approach to Determine Sources of Fecal Bacteria in a Lake Superior Estuary Clairessa M Brown, Christopher Staley, Ping Wang, Brent Dalzell, Chan Lan Chun, and Michael J. Sadowsky Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.7b01353 • Publication Date (Web): 22 Jun 2017 Downloaded from http://pubs.acs.org on June 25, 2017
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 30
Environmental Science & Technology
A High-Throughput DNA Sequencing Approach to Determine Sources of Fecal Bacteria in a Lake Superior Estuary
1 2 3 4 5
Clairessa M. Brown1, Christopher Staley1, Ping Wang1, Brent Dalzell2, Chan Lan Chun3, and
6
Michael J. Sadowsky1,*
7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
1
BioTechnology Institute, University of Minnesota, St. Paul, MN, 2Department of Soil, Water, Climate, University of Minnesota, St. Paul, MN, and 3Natural Resources Research Institute and Department of Civil Engineering, University of Minnesota Duluth, Duluth MN
*Address Correspondences to: Dr. Michael J. Sadowsky, 1479 Gortner Ave., 140 Gortner Labs, BioTechnology Institute, University of Minnesota. St. Paul, MN 55108 USA; Email:
[email protected]; Phone (612) 624-2706
Keywords: next-generation sequencing, high-throughput sequencing, microbial source tracking, fecal pollution, Lake Superior, bacterial community structure
ACS Paragon Plus Environment
Environmental Science & Technology
29 30
Page 2 of 30
Abstract
31
Current microbial source tracking (MST) methods, employed to determine sources of fecal
32
contamination in waterways, use molecular markers targeting host-associated bacteria in animal
33
or human feces. However, there is a lack of knowledge about fecal microbiome composition in
34
several animals and imperfect marker specificity and sensitivity. To overcome these issues, a
35
community-based MST method has been developed. Here, we describe a study done in the Lake
36
Superior-St. Louis River estuary using SourceTracker, a program that calculates the source
37
contribution to an environment. High-throughput DNA sequencing of microbiota from a diverse
38
collection of fecal samples obtained from 11 types of animals (wild, agricultural, and
39
domesticated) and treated effluent (n=233), was used to generate a fecal library to perform
40
community-based MST. Analysis of 319 fecal and environmental samples revealed that the
41
community compositions in water and fecal samples were significantly different, allowing for
42
determination of the presence of fecal inputs and identification of specific sources.
43
SourceTracker results indicated that fecal bacterial inputs into the Lake Superior estuary were
44
primarily attributed to wastewater effluent, and to a lesser extent geese and gull wastes. These
45
results suggest that a community-based MST method may be another useful tool to determine
46
sources of aquatic fecal bacteria.
47
ACS Paragon Plus Environment
2
Page 3 of 30
48 49 50 51
Environmental Science & Technology
1. Introduction Microbial source tracking methods (MST) rely on the use of specific microorganisms or
52
bacterial profiles, thought to be host-associated (e.g. human, ruminant, bird), to determine
53
sources of fecal bacteria in the environment 1. The intestinal tracts of different animals harbor
54
distinct bacterial communities that vary in the presence and abundance of specific taxa 2. This
55
allows for the determination of different fecal sources by using unique bacterial profiles seen
56
within different animals’ fecal microbiomes.
57
Currently, the most common MST methods involve the use of quantitative PCR (qPCR)
58
and molecular markers (primers) that mainly target the 16S rRNA genes of presumptively host-
59
associated microorganisms. The majority of these markers target members of the genus
60
Bacteroides. However, developing qPCR markers that are both sensitive and specific has proven
61
to be challenging in several cases 3–6.
62
The introduction of high-throughput DNA sequencing (HTS) technology has allowed
63
increased use of culture-independent methods to rapidly assess bacterial community structure in
64
several different environments 2,7–10. Community-based MST methods involve the creation of a
65
library of operational taxonomic units (OTUs) present in feces from different animal types,
66
featuring source-specific microbial profiles. These libraries, referred to as fecal taxon libraries
67
(FTL), are compared with bacterial community profiles from environmental samples to find
68
shared taxa or OTUs between the two sample types.
69
Several community-based MST studies 7,8,11–13,14 have utilized the SourceTracker
70
program 15 which accepts quality-controlled sequence data, along with a list of samples either
71
identified as a potential fecal source or an environmental sink. SourceTracker uses a Bayesian
ACS Paragon Plus Environment
3
Environmental Science & Technology
72
statistical approach to determine the percentage of the bacterial community, and the probability,
73
that a potential fecal source contributed to an environment 15.
Page 4 of 30
74
While SourceTracker results include the standard deviation for an estimation of error,
75
higher confidence in the Bayesian model used to predict sources can be achieved by running
76
SourceTracker multiple times, in a bootstrap-like manner, and measuring the uncertainty of the
77
model using the relative standard deviation (RSD) 13. An important consideration with
78
community-based MST methods is determining the appropriate library size of samples for each
79
source. This can be established by using statistical power analysis, which determines whether the
80
sample size is large enough to be able to detect significant differences between sample groups.
81
The Dirichlet multinomial distribution can be used to model data during power analysis, such
82
that Type II error due to overdispersion is alleviated 16. This approach has been successfully used
83
to model microbial community analysis data 17.
84
Lake Superior, the largest freshwater lake in the world by area, is used for recreational
85
and commercial purposes. Fecal bacterial inputs onto beaches in the Lake Superior harbor area,
86
Park Point, and the St. Louis River estuary, measured by E. coli concentrations, have been found
87
to be seasonally impacted by waterfowl 18. Results obtained using qPCR in previous research
88
have also indicated that the largest point source contributor of fecal-associated bacteria to the
89
Lake Superior-Duluth Harbor was treated wastewater effluent, likely originating from two local
90
wastewater treatment plants (WWTP) 19,20.
91
In this study, we examined potential sources of fecal bacterial inputs into the Duluth-
92
Superior Harbor and St. Louis River in the Lake Superior watershed employing a community-
93
based MST approach using Illumina DNA sequencing data and SourceTracker analyses. To
94
achieve this goal, water samples were collected over two years, from seven different sites in the
ACS Paragon Plus Environment
4
Page 5 of 30
Environmental Science & Technology
95
Duluth-Superior Harbor and St. Louis River estuary in Duluth, MN, USA along with fecal
96
samples from 11 different types of animal sources and treated wastewater effluent. Samples were
97
sequenced and high-quality HTS data from animal and effluent samples were used to create a
98
FTL for community-based MST. Power analyses, done using a Dirichlet multinomial distribution
99
to model 16S community data, were used to determine whether an appropriate library size had
100
been obtained for the entire FTL. Multiple SourceTracker runs allowed fecal sources to be
101
identified, along with RSD values which allowed for an estimation of confidence to be assigned
102
to the predicted proportions of fecal sources. Results of this study show that a community-based
103
MST approach, with an appropriately-sized FTL, could potentially be used as a tool to allow
104
watershed managers to confidently predict sources of fecal inputs into waterways.
105 106 107 108
2.1 Sample collection. Triplicate, two-liter water samples were collected just below the
109
surface from seven sites in the Lake Superior-St. Louis estuary during summer and fall 2014 and
110
the summer of 2015 (Fig. 1). These sites included the: St. Louis River site 1 in 2014 and St.
111
Louis River site 2 in 2015, Western Lake Superior Sanitary District (WLSSD) outfall, Rice’s
112
Point, Brewery Creek storm drain, Southworth Marsh storm drain, and Minnesota Point Beach
113
sites in both years (Table S1). The St. Louis River site in 2014 changed due to accessibility.
114
Water was sampled on July 23, August 20, and November 5 in 2014. In 2015, water was
115
sampled on June 16 and July 27. All water samples were transported on ice to the lab, and stored
116
at 4°C for less than 24 hours prior to filtration.
117 118
2. Methods
Fecal samples were collected from 11 different animal types from a variety of sources including farms, via wildlife managers, and from personal pet donations across Minnesota.
ACS Paragon Plus Environment
5
Environmental Science & Technology
119
Several grams of feces were collected from individual chickens, cows, production turkeys,
120
swine, beavers, gulls, Canada geese, wild and domesticated rabbits, deer, cats, and dogs.
121
Triplicate, two-liter samples of treated wastewater effluent were collected from WLSSD during
122
every sampling event.
123
Page 6 of 30
2.2 Sample processing and DNA extraction. Fecal samples were transported at 4°C (on
124
ice) and stored at -20°C until DNA could be extracted. All triplicate 2L environmental water
125
samples were pre-filtered through 5µm nitrocellulose filters (Millipore-Sigma, St. Louis, MO) to
126
remove debris and larger microorganisms and processed as previously described 21. The
127
environmental water was subsequently filtered through 0.45µm, and then 0.22µm, nitrocellulose
128
filters (Millipore-Sigma, St. Louis, MO) to capture all bacteria. Filters were transferred into 50ml
129
conical tubes containing 2ml of 0.01% sodium pyrophosphate buffer, pH 7.0 containing 0.2%
130
Tween 20 (polyethylene glycol sorbitan monolaurate) and vortexed, twice, for 3 min at room
131
temperature. Each conical tube held up to five filters from the same environmental site, and some
132
of the 0.45µm and 0.22µm filters were placed in the same tube if space allowed. The supernatant,
133
containing re-suspended cells from the filters, were transferred from the conical tubes to 1.5ml
134
Eppendorf tubes and centrifuged for 3 min at 13,300×g. Cell pellets from 0.45µm and 0.22µm
135
filters from the same sample were ultimately combined if the filters from the same sample were
136
placed in different 50ml conical tubes. Samples were stored at -20°C until DNA was extracted.
137
DNA from fecal and environmental water samples was extracted using the DNeasy PowerSoil
138
DNA extraction kit (Qiagen, CA, USA), as per kit directions.
139
2.3 PCR and DNA sequencing. PCR and DNA sequencing were performed at the
140
University of Minnesota Genomics Center (St. Paul, MN, USA) using primers F784 (5’
141
RGGATTAGATACCC 3’) 22 and 1046R (5’ CGACRRCCATGCANCACCT 3’) 23 , targeting
ACS Paragon Plus Environment
6
Page 7 of 30
Environmental Science & Technology
142
the V5 and V6 regions of the 16S rRNA gene. Sequencing was done using the dual indexing
143
method as previously described 24. Amplicons were paired-end sequenced on the Illumina HiSeq
144
2000, HiSeq 2500 (150bp), and MiSeq (300bp) platforms (Illumina, San Diego, CA).
145
Sequencing results can be accessed from GenBank under BioProject PRJNA377760.
146
2.4 Processing of sequence data. All sequencing data obtained from the Illumina MiSeq
147
platform runs were trimmed to 150 bp to match the run length obtained from Illumina HiSeq
148
runs. All sequence processing was performed using QIIME versions 1.8.0 and 1.9.1 software 25.
149
Illumina adapter contamination and low quality base regions were removed using Trimmomatic
150
v. 3.2 26. Primers, homopolymers >8, and reads smaller than 75% of the amplicon length were
151
removed using Pandaseq 27. This program was also used to concatenate reads using the fastq-join
152
script 28. Chimeras were removed using UCHIME 6.1 29. Open reference OTUs were grouped
153
using uclust, at 97% identity, and compared to the SILVA ver. 119 reference database 30,31 using
154
the PyNAST alignment algorithm 25. Taxonomy was assigned utilizing the RDP Classifier, with
155
an 80% bootstrap value 32. Singletons were removed from the dataset and all data that passed
156
quality control were used for statistical analysis. The final dataset was comprised of DNA
157
sequences from 20 beavers, 14 cats, 16 chickens, 32 cows, 19 deer, 17 dogs, 25 geese, 14 gulls,
158
18 wild and domesticated rabbits (treated as a single source), 18 swine, 18 turkeys, and 22
159
effluent samples.
160
2.5 Statistical analyses. Statistical analyses were done using QIIME v. 1.8.0 25, RStudio
161
v. 0.99.896, R v. 3.2.1 33 and mothur v. 1.34.0 34. After sequence processing, multiple rarefaction
162
depths were evaluated by random sampling of sequences so that the number of observed taxa
163
were close to the number of expected taxa. A final sequence depth of 25,000 was chosen for all
164
subsequent statistical analysis. Bray Curtis dissimilarity 35 was calculated in mothur and was
ACS Paragon Plus Environment
7
Environmental Science & Technology
Page 8 of 30
165
used in principal coordinates analysis (PCoA) and analysis of molecular variance (AMOVA) 36.
166
Hierarchical clustering was calculated in the R package pvclust in RStudio 37 using the UPGMA
167
method. Clustering was performed on a Bray Curtis dissimilarity matrix with 1,000 bootstrap
168
iterations containing the averages of family-level taxa abundances from all sample types.
169
Community-based MST was done using SourceTracker v. 1.0 15 through QIIME v. 1.9.0 25.
170
While family-level taxa tables rarefied to a sequencing depth of 25,000 were used,
171
SourceTracker was run with default parameters five independent times on the same FTL.
172
Spearman rank correlations were performed in RStudio to determine the correlation between
173
predicted source proportions obtained via SourceTracker and RSD values determined from the
174
five independent runs of the SourceTracker program.
175
RSD analysis was performed to estimate confidence in SourceTracker proportions and
176
was calculated by using the average standard deviations of a sample across the five independent
177
SourceTracker runs. This value was divided by the average predicted source proportions of that
178
same sample obtained from the five independent SourceTracker runs.
179
Power analyses were performed on all animal fecal samples to determine whether
180
sufficient samples from each source type were present in the library to avoid increased statistical
181
Type II error. Library size analysis was done by reducing the number of individuals within each
182
source type (ranging from 14 to 32) and then running power analysis using the R HMP package
183
to determine a minimum size of the library 16. If the source type did not have as many samples as
184
required, all samples were used.
185 186 187 188 189 190
ACS Paragon Plus Environment
8
Page 9 of 30
191 192 193 194
Environmental Science & Technology
3. Results 3.1 Bacterial community structure among water and fecal samples Water samples from seven different locations in the Duluth-Superior Harbor and St.
195
Louis River estuary in Duluth, MN and animal fecal samples were collected during 2014 and
196
2015. The V5 and V6 regions of the 16S rRNA gene were sequenced from 319 environmental
197
lake water, fecal, and treated wastewater effluent samples yielding nearly 60 million reads. The
198
233 fecal samples were obtained from 12 different domesticated, agricultural, and wild animal
199
types, as well as wastewater effluent, and together the source sequences comprised the FTL used
200
for community-based MST.
201
The average sequencing coverage for water and feces was 99%, and ranged from 97% to
202
100% (Table S2). Feces, on average, had lower diversity than did lake water (Table S2). Feces
203
had, on average, a Shannon index of 4.2 and 1,672 OTUs that clustered at 97% similarity, while
204
freshwater samples had an average Shannon index of 4.8 and 3,272 OTUs clustering at 97%
205
similarity (Table S2).
206
Hierarchical clustering showed that the environmental water and animal samples
207
clustered separately, indicating the microbiota in feces is different from that found in water (Fig.
208
2A). The most common microbiota found in environmental water communities included
209
members of the families Sporichthyaceae, LD12, and Comamonadaceae (Fig. 2B). In contrast,
210
members of the families Erysipelotrichaceae, Peptostreptococcaceae, and Ruminococcaceae
211
were prevalent in the treated wastewater effluent samples, and animal fecal samples were
212
dominated by members of the families Ruminococcaceae, Lachnospiraceae, and
213
Peptostreptococcaceae (Fig. 2B). Nearly 30% of the sequencing reads from rabbits, swine, deer,
214
cows, and beavers were classified as Ruminococcaceae and Lachnospiraceae. Cats, dogs, and
ACS Paragon Plus Environment
9
Environmental Science & Technology
Page 10 of 30
215
chickens had nearly equal amounts of those families, as well as several other families, which
216
explains why their samples clustered together (Fig. 2A). While geese had nearly equal amounts
217
of Ruminococcaceae and Lachnospiraceae, which was similar to cats, dogs, and chickens,
218
several of the families that make up the community structure were relatively different (Fig. 2A).
219
Similar to what was found with hierarchical clustering, PCoA revealed that fecal and
220
environmental water samples clustered separately (Fig. 3). AMOVA pairwise-comparisons
221
identified significantly different (p