
Article

A High-Throughput DNA Sequencing Approach to Determine Sources of Fecal Bacteria in a Lake Superior Estuary Clairessa M Brown, Christopher Staley, Ping Wang, Brent Dalzell, Chan Lan Chun, and Michael J. Sadowsky Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.7b01353 • Publication Date (Web): 22 Jun 2017 Downloaded from http://pubs.acs.org on June 25, 2017


Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Environmental Science & Technology

A High-Throughput DNA Sequencing Approach to Determine Sources of Fecal Bacteria in a Lake Superior Estuary


Clairessa M. Brown1, Christopher Staley1, Ping Wang1, Brent Dalzell2, Chan Lan Chun3, and Michael J. Sadowsky1,*

1BioTechnology Institute, University of Minnesota, St. Paul, MN; 2Department of Soil, Water, and Climate, University of Minnesota, St. Paul, MN; and 3Natural Resources Research Institute and Department of Civil Engineering, University of Minnesota Duluth, Duluth, MN

*Address correspondence to: Dr. Michael J. Sadowsky, 1479 Gortner Ave., 140 Gortner Labs, BioTechnology Institute, University of Minnesota, St. Paul, MN 55108 USA; Email: [email protected]; Phone: (612) 624-2706

Keywords: next-generation sequencing, high-throughput sequencing, microbial source tracking, fecal pollution, Lake Superior, bacterial community structure

ACS Paragon Plus Environment


Abstract

Current microbial source tracking (MST) methods, employed to determine sources of fecal contamination in waterways, use molecular markers targeting host-associated bacteria in animal or human feces. However, there is a lack of knowledge about fecal microbiome composition in several animals, and marker specificity and sensitivity are imperfect. To overcome these issues, a community-based MST method has been developed. Here, we describe a study done in the Lake Superior-St. Louis River estuary using SourceTracker, a program that calculates the source contribution to an environment. High-throughput DNA sequencing of microbiota from a diverse collection of fecal samples obtained from 11 types of animals (wild, agricultural, and domesticated) and treated effluent (n=233) was used to generate a fecal library to perform community-based MST. Analysis of 319 fecal and environmental samples revealed that the community compositions in water and fecal samples were significantly different, allowing for determination of the presence of fecal inputs and identification of specific sources. SourceTracker results indicated that fecal bacterial inputs into the Lake Superior estuary were primarily attributed to wastewater effluent and, to a lesser extent, goose and gull waste. These results suggest that a community-based MST method may be another useful tool to determine sources of aquatic fecal bacteria.


1. Introduction

Microbial source tracking (MST) methods rely on the use of specific microorganisms or bacterial profiles, thought to be host-associated (e.g., human, ruminant, bird), to determine sources of fecal bacteria in the environment 1. The intestinal tracts of different animals harbor distinct bacterial communities that vary in the presence and abundance of specific taxa 2. This allows different fecal sources to be distinguished by the unique bacterial profiles seen within different animals’ fecal microbiomes.

Currently, the most common MST methods involve the use of quantitative PCR (qPCR) and molecular markers (primers) that mainly target the 16S rRNA genes of presumptively host-associated microorganisms. The majority of these markers target members of the genus Bacteroides. However, developing qPCR markers that are both sensitive and specific has proven to be challenging in several cases 3–6.

The introduction of high-throughput DNA sequencing (HTS) technology has allowed increased use of culture-independent methods to rapidly assess bacterial community structure in several different environments 2,7–10. Community-based MST methods involve the creation of a library of operational taxonomic units (OTUs) present in feces from different animal types, featuring source-specific microbial profiles. These libraries, referred to as fecal taxon libraries (FTL), are compared with bacterial community profiles from environmental samples to find shared taxa or OTUs between the two sample types.

Several community-based MST studies 7,8,11–14 have utilized the SourceTracker program 15, which accepts quality-controlled sequence data along with a list of samples, each identified as either a potential fecal source or an environmental sink. SourceTracker uses a Bayesian statistical approach to determine the percentage of the bacterial community, and the probability, that a potential fecal source contributed to an environment 15.

While SourceTracker results include the standard deviation for an estimation of error, higher confidence in the Bayesian model used to predict sources can be achieved by running SourceTracker multiple times, in a bootstrap-like manner, and measuring the uncertainty of the model using the relative standard deviation (RSD) 13. An important consideration with community-based MST methods is determining the appropriate library size of samples for each source. This can be established by using statistical power analysis, which determines whether the sample size is large enough to be able to detect significant differences between sample groups. The Dirichlet multinomial distribution can be used to model data during power analysis, such that Type II error due to overdispersion is alleviated 16. This approach has been successfully used to model microbial community analysis data 17.
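
The role of overdispersion in the paragraph above can be illustrated with a minimal simulation sketch. It is not taken from the study: the function name `simulate_counts`, the example proportions, and the parameterization of the Dirichlet concentration as proportions × (1 − theta) / theta are assumptions made for illustration only.

```python
import numpy as np

def simulate_counts(proportions, depth, n_samples, theta=None, seed=0):
    """Draw taxon count vectors for n_samples samples (hypothetical helper).

    theta=None -> plain multinomial: every sample shares the same proportions.
    theta>0    -> Dirichlet multinomial: per-sample proportions are first drawn
                  from a Dirichlet, which adds between-sample overdispersion.
    """
    rng = np.random.default_rng(seed)
    proportions = np.asarray(proportions, dtype=float)
    if theta is None:
        return rng.multinomial(depth, proportions, size=n_samples)
    # Assumed parameterization: a larger theta gives smaller concentration
    # parameters and therefore more variation between samples.
    alpha = proportions * (1.0 - theta) / theta
    per_sample_p = rng.dirichlet(alpha, size=n_samples)
    return np.array([rng.multinomial(depth, p) for p in per_sample_p])

# Compare the spread of the first taxon under the two models.
mn = simulate_counts([0.5, 0.3, 0.2], depth=1000, n_samples=500)
dm = simulate_counts([0.5, 0.3, 0.2], depth=1000, n_samples=500, theta=0.05)
print(mn[:, 0].var(), dm[:, 0].var())
```

Under these assumptions the Dirichlet multinomial counts vary far more between samples than the plain multinomial counts do, which is the kind of extra variability a power analysis on community data has to account for.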

Lake Superior, the largest freshwater lake in the world by area, is used for recreational and commercial purposes. Fecal bacterial inputs onto beaches in the Lake Superior harbor area, Park Point, and the St. Louis River estuary, measured by E. coli concentrations, have been found to be seasonally impacted by waterfowl 18. Results obtained using qPCR in previous research have also indicated that the largest point source contributor of fecal-associated bacteria to the Lake Superior-Duluth Harbor was treated wastewater effluent, likely originating from two local wastewater treatment plants (WWTP) 19,20.

In this study, we examined potential sources of fecal bacterial inputs into the Duluth-Superior Harbor and St. Louis River in the Lake Superior watershed, employing a community-based MST approach using Illumina DNA sequencing data and SourceTracker analyses. To achieve this goal, water samples were collected over two years from seven different sites in the Duluth-Superior Harbor and St. Louis River estuary in Duluth, MN, USA, along with fecal samples from 11 different types of animal sources and treated wastewater effluent. Samples were sequenced, and high-quality HTS data from animal and effluent samples were used to create an FTL for community-based MST. Power analyses, done using a Dirichlet multinomial distribution to model 16S community data, were used to determine whether an appropriate library size had been obtained for the entire FTL. Multiple SourceTracker runs allowed fecal sources to be identified, along with RSD values, which allowed an estimate of confidence to be assigned to the predicted proportions of fecal sources. Results of this study show that a community-based MST approach, with an appropriately sized FTL, could potentially be used as a tool to allow watershed managers to confidently predict sources of fecal inputs into waterways.

2. Methods

2.1 Sample collection. Triplicate, two-liter water samples were collected just below the surface from seven sites in the Lake Superior-St. Louis estuary during summer and fall 2014 and the summer of 2015 (Fig. 1). These sites included St. Louis River site 1 in 2014 and St. Louis River site 2 in 2015, and the Western Lake Superior Sanitary District (WLSSD) outfall, Rice’s Point, Brewery Creek storm drain, Southworth Marsh storm drain, and Minnesota Point Beach sites in both years (Table S1). The St. Louis River site used in 2014 was changed in 2015 due to accessibility. Water was sampled on July 23, August 20, and November 5 in 2014. In 2015, water was sampled on June 16 and July 27. All water samples were transported on ice to the lab and stored at 4°C for less than 24 hours prior to filtration.

Fecal samples were collected from 11 different animal types from a variety of sources across Minnesota, including farms, wildlife managers, and personal pet donations. Several grams of feces were collected from individual chickens, cows, production turkeys, swine, beavers, gulls, Canada geese, wild and domesticated rabbits, deer, cats, and dogs. Triplicate, two-liter samples of treated wastewater effluent were collected from WLSSD during every sampling event.

2.2 Sample processing and DNA extraction. Fecal samples were transported at 4°C (on ice) and stored at -20°C until DNA could be extracted. All triplicate 2L environmental water samples were pre-filtered through 5µm nitrocellulose filters (Millipore-Sigma, St. Louis, MO) to remove debris and larger microorganisms and processed as previously described 21. The environmental water was subsequently filtered through 0.45µm, and then 0.22µm, nitrocellulose filters (Millipore-Sigma, St. Louis, MO) to capture all bacteria. Filters were transferred into 50ml conical tubes containing 2ml of 0.01% sodium pyrophosphate buffer (pH 7.0) containing 0.2% Tween 20 (polyethylene glycol sorbitan monolaurate) and vortexed twice for 3 min at room temperature. Each conical tube held up to five filters from the same environmental site, and some of the 0.45µm and 0.22µm filters were placed in the same tube if space allowed. The supernatant, containing re-suspended cells from the filters, was transferred from the conical tubes to 1.5ml Eppendorf tubes and centrifuged for 3 min at 13,300×g. When filters from the same sample had been placed in different 50ml conical tubes, the cell pellets from the 0.45µm and 0.22µm filters were ultimately combined. Samples were stored at -20°C until DNA was extracted. DNA from fecal and environmental water samples was extracted using the DNeasy PowerSoil DNA extraction kit (Qiagen, CA, USA), as per kit directions.

2.3 PCR and DNA sequencing. PCR and DNA sequencing were performed at the University of Minnesota Genomics Center (St. Paul, MN, USA) using primers F784 (5’ RGGATTAGATACCC 3’) 22 and 1046R (5’ CGACRRCCATGCANCACCT 3’) 23, targeting the V5 and V6 regions of the 16S rRNA gene. Sequencing was done using the dual indexing method as previously described 24. Amplicons were paired-end sequenced on the Illumina HiSeq 2000, HiSeq 2500 (150bp), and MiSeq (300bp) platforms (Illumina, San Diego, CA). Sequencing results can be accessed from GenBank under BioProject PRJNA377760.

2.4 Processing of sequence data. All sequencing data obtained from the Illumina MiSeq platform runs were trimmed to 150 bp to match the run length obtained from Illumina HiSeq runs. All sequence processing was performed using QIIME versions 1.8.0 and 1.9.1 software 25. Illumina adapter contamination and low-quality base regions were removed using Trimmomatic v. 3.2 26. Primers, homopolymers >8, and reads smaller than 75% of the amplicon length were removed using Pandaseq 27. This program was also used to concatenate reads using the fastq-join script 28. Chimeras were removed using UCHIME 6.1 29. Open reference OTUs were grouped using uclust, at 97% identity, and compared to the SILVA ver. 119 reference database 30,31 using the PyNAST alignment algorithm 25. Taxonomy was assigned utilizing the RDP Classifier, with an 80% bootstrap value 32. Singletons were removed from the dataset and all data that passed quality control were used for statistical analysis. The final dataset comprised DNA sequences from 20 beavers, 14 cats, 16 chickens, 32 cows, 19 deer, 17 dogs, 25 geese, 14 gulls, 18 wild and domesticated rabbits (treated as a single source), 18 swine, 18 turkeys, and 22 effluent samples.
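
As a rough illustration of the homopolymer and minimum-length criteria described above, the following sketch applies both checks to a single read. It is not the implementation used by any of the tools named in the text: the function `passes_filters`, its parameter names, and the reading of "homopolymers >8" as a run of more than eight identical bases are assumptions.

```python
import re

def passes_filters(read, amplicon_length, max_homopolymer=8, min_fraction=0.75):
    """Return True if a read survives both filters (hypothetical helper).

    amplicon_length is the expected full amplicon length in bases; it is an
    illustrative parameter, not a value reported in the text.
    """
    # Discard reads shorter than the chosen fraction of the amplicon length.
    if len(read) < min_fraction * amplicon_length:
        return False
    # A run of more than max_homopolymer identical bases fails the filter:
    # one captured base followed by at least max_homopolymer repeats of it.
    pattern = r"(.)\1{%d,}" % max_homopolymer
    return re.search(pattern, read) is None

reads = ["ACGT" * 25, "A" * 9 + "ACGT" * 25, "ACGT" * 10]
kept = [r for r in reads if passes_filters(r, amplicon_length=100)]
print(len(kept))
```

Only the first read is kept in this toy input: the second contains a long run of A bases and the third is shorter than 75% of the assumed 100-base amplicon.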

2.5 Statistical analyses. Statistical analyses were done using QIIME v. 1.8.0 25, RStudio v. 0.99.896, R v. 3.2.1 33, and mothur v. 1.34.0 34. After sequence processing, multiple rarefaction depths were evaluated by random sampling of sequences so that the number of observed taxa was close to the number of expected taxa. A final sequence depth of 25,000 was chosen for all subsequent statistical analysis. Bray-Curtis dissimilarity 35 was calculated in mothur and was used in principal coordinates analysis (PCoA) and analysis of molecular variance (AMOVA) 36. Hierarchical clustering was calculated in the R package pvclust in RStudio 37 using the UPGMA method. Clustering was performed with 1,000 bootstrap iterations on a Bray-Curtis dissimilarity matrix calculated from the averages of family-level taxa abundances from all sample types. Community-based MST was done using SourceTracker v. 1.0 15 through QIIME v. 1.9.0 25. Family-level taxa tables rarefied to a sequencing depth of 25,000 were used, and SourceTracker was run with default parameters five independent times on the same FTL. Spearman rank correlations were performed in RStudio to determine the correlation between predicted source proportions obtained via SourceTracker and RSD values determined from the five independent runs of the SourceTracker program.
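
Two of the steps described above, rarefaction to a fixed depth and Bray-Curtis dissimilarity, can be sketched in a few lines. This is an illustrative sketch only, not the mothur or QIIME implementation; the helper names `rarefy` and `bray_curtis` and the toy count vectors are assumptions.

```python
import numpy as np

def rarefy(counts, depth, seed=0):
    """Subsample a taxon count vector to exactly `depth` reads without replacement."""
    counts = np.asarray(counts, dtype=int)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    rng = np.random.default_rng(seed)
    # Expand counts into one taxon label per read, then draw `depth` of them.
    reads = np.repeat(np.arange(counts.size), counts)
    picked = rng.choice(reads, size=depth, replace=False)
    return np.bincount(picked, minlength=counts.size)

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors.

    0 means identical composition; 1 means no taxa are shared.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.abs(u - v).sum() / (u + v).sum()

# Toy example: rarefy two samples to the same depth, then compare them.
a = rarefy([100, 50, 0, 850], depth=500)
b = rarefy([400, 300, 300, 0], depth=500, seed=1)
print(bray_curtis(a, b))
```

Rarefying both samples to the same depth before computing the dissimilarity keeps differences in sequencing effort from being mistaken for differences in community composition.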

RSD analysis was performed to estimate confidence in SourceTracker proportions. For each sample, the RSD was calculated by dividing the average of the standard deviations reported across the five independent SourceTracker runs by the average predicted source proportion for that same sample across the same five runs.
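
The RSD calculation described above amounts to a ratio of two averages. A minimal sketch follows; the function name and the example numbers are hypothetical, and the result is returned as a plain ratio (multiply by 100 for a percentage).

```python
import numpy as np

def relative_standard_deviation(proportions, std_devs):
    """RSD for one sample and source across repeated SourceTracker runs.

    proportions: predicted source proportion from each run.
    std_devs:    standard deviation reported by each run.
    """
    mean_prop = np.mean(proportions)
    if mean_prop == 0:
        # The ratio is undefined when the source is never detected.
        return float("nan")
    return float(np.mean(std_devs) / mean_prop)

# Hypothetical values for five independent runs.
props = [0.10, 0.12, 0.11, 0.09, 0.08]
sds = [0.010, 0.020, 0.010, 0.020, 0.015]
print(relative_standard_deviation(props, sds))
```

A small ratio means the run-to-run uncertainty is small relative to the predicted proportion, so low-abundance sources will tend to show larger RSD values for the same absolute standard deviation.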

Power analyses were performed on all animal fecal samples to determine whether sufficient samples from each source type were present in the library to avoid increased statistical Type II error. Library size analysis was done by reducing the number of individuals within each source type (ranging from 14 to 32) and then running power analysis using the R HMP package to determine a minimum size of the library 16. If the source type did not have as many samples as required, all samples were used.

3. Results

3.1 Bacterial community structure among water and fecal samples

Water samples from seven different locations in the Duluth-Superior Harbor and St. Louis River estuary in Duluth, MN, and animal fecal samples were collected during 2014 and 2015. The V5 and V6 regions of the 16S rRNA gene were sequenced from 319 environmental lake water, fecal, and treated wastewater effluent samples, yielding nearly 60 million reads. The 233 source samples were obtained from 11 different domesticated, agricultural, and wild animal types and from treated wastewater effluent, and together the source sequences comprised the FTL used for community-based MST.

The average sequencing coverage for water and feces was 99%, and ranged from 97% to 100% (Table S2). Feces, on average, had lower diversity than did lake water (Table S2). Feces had, on average, a Shannon index of 4.2 and 1,672 OTUs that clustered at 97% similarity, while freshwater samples had an average Shannon index of 4.8 and 3,272 OTUs clustering at 97% similarity (Table S2).
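
For readers unfamiliar with the Shannon index reported above, a minimal sketch of the calculation from OTU counts is given here. The use of the natural logarithm is an assumption (the text does not state the base), and the function name and toy counts are illustrative.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Four equally abundant OTUs give the maximum value for four OTUs, ln(4).
print(shannon_index([25, 25, 25, 25]))
# A community dominated by one OTU scores lower.
print(shannon_index([97, 1, 1, 1]))
```

The index rises both with the number of OTUs and with how evenly reads are spread among them, which is consistent with the lake water samples having both more OTUs and a higher index than feces.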

Hierarchical clustering showed that the environmental water and animal samples clustered separately, indicating that the microbiota in feces is different from that found in water (Fig. 2A). The most common microbiota found in environmental water communities included members of the families Sporichthyaceae, LD12, and Comamonadaceae (Fig. 2B). In contrast, members of the families Erysipelotrichaceae, Peptostreptococcaceae, and Ruminococcaceae were prevalent in the treated wastewater effluent samples, and animal fecal samples were dominated by members of the families Ruminococcaceae, Lachnospiraceae, and Peptostreptococcaceae (Fig. 2B). Nearly 30% of the sequencing reads from rabbits, swine, deer, cows, and beavers were classified as Ruminococcaceae and Lachnospiraceae. Cats, dogs, and chickens had nearly equal amounts of those families, as well as several other families, which explains why their samples clustered together (Fig. 2A). While geese had nearly equal amounts of Ruminococcaceae and Lachnospiraceae, which was similar to cats, dogs, and chickens, several of the families that make up their community structure were relatively different (Fig. 2A).

Similar to what was found with hierarchical clustering, PCoA revealed that fecal and environmental water samples clustered separately (Fig. 3). AMOVA pairwise comparisons identified significantly different (p