Integrated Online System for a Pyrosequencing-Based Microbial

ARTICLE pubs.acs.org/est

Integrated Online System for a Pyrosequencing-Based Microbial Source Tracking Method that Targets Bacteroidetes 16S rDNA

Tatsuya Unno,† Doris Y. W. Di,† Jeonghwan Jang,† Yae Seul Suh,† Michael J. Sadowsky,§,|| and Hor-Gil Hur*,†,‡

†Department of Environmental Science and Engineering and ‡International Environmental Research Center, Gwangju Institute of Science and Technology, Gwangju 500-712, Republic of Korea
§Department of Soil, Water, and Climate, and ||BioTechnology Institute, University of Minnesota, St. Paul, Minnesota 55108, United States



Supporting Information

ABSTRACT: Genotypic microbial source tracking (MST) methods are now routinely used to determine sources of fecal contamination impacting waterways. We previously reported the development of a pyrosequencing-based MST method that assigns contamination sources based on shared operational taxonomic units (OTUs) between fecal and environmental bacterial communities. Despite decreasing sequencing costs, pyrosequencing-based MST approaches are not used in routine water quality monitoring studies due in large part to difficulties in handling massive data sets and difficulties in determining sources of fecal contamination. In the studies presented here we describe the development of an online MST tool, PyroMiST (http://env1.gist.ac.kr/~aeml/MST.html) that uses total bacterial or Bacteroidetes 16S rDNA pyrosequencing reads to determine fecal contamination of waterways. The program cd-hit was used for OTU assignment and a Perl script was used to calculate the number of shared OTUs. The analyses require only a small number of pyrosequencing reads from environmental samples. Our results indicate that PyroMiST provides a user-friendly web interface for pyrosequence data that significantly reduces analysis time required to determine potential sources of fecal contamination in the environment.

INTRODUCTION
Microbial source tracking (MST) tools are increasingly used to determine sources of fecal bacteria in a variety of aquatic environments.1 Members of the order Bacteroidales have been widely used as targets for microbial source tracking.2 Primers specific for the phylum Bacteroidetes, designed using the ARB software program (Ludwig and Strunk, Munich, Germany),3 have been found useful for detecting Bacteroidetes species originating from human and animal feces.4 Due to issues associated with cross-reactivity and sensitivity, several studies have suggested using a toolbox approach for MST studies.5,6 Recently, we suggested that the simultaneous use of multiple source tracking taxa, defined by using pyrosequencing-derived shared operational taxonomic units (OTUs),7 would prove more useful in identifying sources of fecal contamination than studies done using a single microbial taxon, such as Bacteroides and Prevotella. This internal, shared-OTU toolbox approach is based, in large part, on our results from pyrosequencing analyses of whole bacterial communities in fecal and environmental samples done using a 454 GS FLX Titanium Sequencing System (Roche, Basel, Switzerland).7 Most of the fecal samples comprised bacteria belonging to the phyla Bacteroidetes and Firmicutes, whereas the majority of bacteria identified in environmental samples were members of the phylum Proteobacteria.

© 2011 American Chemical Society

Table 1. Number of Reads Assigned to Each Phylum by the Mothur Program

                 no. of reads classified to phylum
source           bacteroidetes   proteobacteria   unclassified
human            3795            22               66
chicken          2843            0                0
duck             848             74               0
goose            286             0                0
beef cattle      4041            0                0
dairy cattle     7859            0                0
swine            3738            0                0
freshwater       720             2                2
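The read counts in Table 1 can be turned into the fraction of reads assigned to Bacteroidetes for each source. The following is a minimal sketch of that arithmetic, not part of the published pipeline; the dictionary layout and function names are illustrative assumptions, and only the counts come from Table 1.

```python
# Read counts copied from Table 1, in the order
# (Bacteroidetes, Proteobacteria, unclassified).
# The data structure and names are illustrative, not from the paper.
TABLE1_READS = {
    "human": (3795, 22, 66),
    "chicken": (2843, 0, 0),
    "duck": (848, 74, 0),
    "goose": (286, 0, 0),
    "beef cattle": (4041, 0, 0),
    "dairy cattle": (7859, 0, 0),
    "swine": (3738, 0, 0),
    "freshwater": (720, 2, 2),
}


def bacteroidetes_fraction(counts):
    """Fraction of one source's reads that were assigned to Bacteroidetes."""
    bacteroidetes, proteobacteria, unclassified = counts
    return bacteroidetes / (bacteroidetes + proteobacteria + unclassified)


def overall_fraction(table):
    """Fraction of all reads, pooled over sources, assigned to Bacteroidetes."""
    bacteroidetes_total = sum(counts[0] for counts in table.values())
    grand_total = sum(sum(counts) for counts in table.values())
    return bacteroidetes_total / grand_total


if __name__ == "__main__":
    for source, counts in TABLE1_READS.items():
        print(f"{source:13s} {100 * bacteroidetes_fraction(counts):5.1f}%")
    print(f"{'all sources':13s} {100 * overall_fraction(TABLE1_READS):5.1f}%")
```

Pooling the reads before dividing weights each source by its sequencing depth; averaging the per-source fractions instead would weight every source equally.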

Special Issue: Ecogenomics: Environmental
Received: April 21, 2011
Revised: June 23, 2011
Accepted: July 22, 2011
Published: July 22, 2011

dx.doi.org/10.1021/es201380c | Environ. Sci. Technol. 2012, 46, 93–98

Environmental Science & Technology

Figure 1. Calculation flow of the batch program used in this study.

In our previous OTU-based MST study,7 we used the mothur program8 for trimming, aligning, screening, filtering, and clustering pyrosequencing reads. Although mothur offers a wide array of useful, sophisticated functions, it lacks a user-friendly interface and requires long processing times on most personal computers. In contrast, cd-hit9 provides an ultrafast sequence clustering method, using a short word filtering algorithm, and is especially suitable for sequence comparisons in large data sets. Cd-hit has been widely used to create gene and protein clustering data sets, including UniRef10 and SMART.11 Recently, cd-hit has proven to be useful for the analysis of metagenomes.12,13 Although PANGEA14 and QIIME15 also offer sets of user-defined batch programs to process large pyrosequencing-based metagenomic data sets, they are cumbersome to use because they are operated from the command line. In the study described here we report on the development of an easy-to-use, practical, online tool for pyrosequencing-based MST studies that uses ultrafast DNA sequence clustering software and fecal Bacteroidetes-specific primers.
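The short word filtering idea mentioned above can be illustrated with a small sketch. It is not cd-hit's implementation; it only shows the general principle that two sequences differing by a limited number of substitutions must share a minimum number of length-k words, so most dissimilar pairs can be rejected without computing an alignment. All function names and the toy sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k words (k-mers) found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_word_count(query, target, k):
    """Count the positions in query whose k-mer also occurs somewhere in target."""
    target_words = kmers(target, k)
    return sum(1 for i in range(len(query) - k + 1)
               if query[i:i + k] in target_words)


def passes_word_filter(query, target, k, max_mismatches):
    """Cheap pre-filter applied before any alignment.

    A substitution can destroy at most k overlapping words, so a query with
    at most max_mismatches substitutions relative to target must still share
    at least (len(query) - k + 1) - k * max_mismatches words with it.
    Pairs below that bound cannot meet the identity threshold.
    """
    required = max(0, (len(query) - k + 1) - k * max_mismatches)
    return shared_word_count(query, target, k) >= required


if __name__ == "__main__":
    read = "ACGTACGTAGCTAGCTAGGA"
    near = "ACGTACGTAGTTAGCTAGGA"   # one substitution relative to read
    far = "T" * 20                  # unrelated toy sequence
    print(passes_word_filter(read, near, k=4, max_mismatches=1))
    print(passes_word_filter(read, far, k=4, max_mismatches=1))
```

A pair that passes the filter is only a candidate; a real clustering program would still need to align it to confirm the identity threshold.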

MATERIALS AND METHODS
Sample Preparation and Pyrosequencing. Barcoded pyrosequencing of animal and human fecal samples was done using the Bacteroidetes-specific primers 5′-AACGCTAGCTACAGGCTTAACA-3′ (ref 16) and 5′-CAATATTCCTCACTGCTGCCTCCCGTA-3′ (ref 17). Briefly, fecal swabs were obtained from 30 healthy human individuals (H), and 30 chickens (C), ducks (D), geese (G), beef cattle (BC), dairy cattle (DC), and swine (S) (10 individuals from 3 farms) that were raised in the Yeongsan River basin in South Korea. Fecal samples from the same source animals were pooled, and fecal DNA was extracted from 250 mg of each pooled fecal sample. DNA was also extracted from three 50-mL aliquots of each of the following: nonfecal contaminated freshwater (NF) from the Gwangju River, a tributary of the Yeongsan River with low E. coli counts (42 CFU/100 mL); human feces (HF) (10 mg) dissolved in 50 mL of NF water; and mixed feces (MF) consisting of 100 mg of human feces and 10 mg of feces from chicken, duck, goose, beef cattle, dairy cattle, and swine dissolved in 50 mL of an NF water sample. DNA sequencing was performed by Macrogen Incorporation (Seoul, Korea) using a 454 GS FLX Titanium Sequencing System (Roche, Basel, Switzerland), according to the manufacturer's instructions.

Processing Pyrosequencing Data with Mothur. Mothur version 1.17.0 (ref 8) was used to obtain the number of reads, OTUs, and species richness/evenness indices (Table S1). Taxonomic classification was done prior to sequence trimming in order to confirm the specificity of the Bacteroidetes primers (Table 1). The Ribosomal Database Project (http://pyro.cme.msu.edu/) was used to obtain a sequence length histogram (Figure S1), and pyrosequence data were processed using the mothur subroutines trim.seqs (with a minimum length of 350 nucleotides, no ambiguous nucleotides, and a maximum homopolymer length of 8), align.seqs, and filter.seqs. Chimeric sequences were also removed by using mothur.

Batch Program and Online Tool Development. The online tool package described here was developed by integrating three


Table 2. Comparison of Percentages of Mothur-Defined Non-Shared Operational Taxonomic Units (OTUs) between Bacteroidetes 16S rDNA and Total 16S rDNA Pyrosequencing Data

                   percentage of nonshared OTUs among
source             bacteroidetes 16S rDNA   total 16S rDNA
human              85.0                     90.1
chicken            94.8                     89.6
duck               78.2                     80.0
goose              76.9                     82.2
beef cattle        70.7                     78.2
dairy cattle       82.1                     78.7
swine              88.1                     92.5
mean/deviation     82.2/7.9                 84.5/6.1
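The summary row of Table 2 can be checked with a few lines of arithmetic. This sketch assumes that "deviation" means the sample standard deviation (n - 1 in the denominator), which the paper does not state; under that assumption the recomputed values agree with the table to within about 0.1. The variable and function names are illustrative.

```python
import statistics

# Per-source percentages of nonshared OTUs, copied from Table 2 in the order
# human, chicken, duck, goose, beef cattle, dairy cattle, swine.
NONSHARED_PCT = {
    "bacteroidetes 16S rDNA": [85.0, 94.8, 78.2, 76.9, 70.7, 82.1, 88.1],
    "total 16S rDNA": [90.1, 89.6, 80.0, 82.2, 78.2, 78.7, 92.5],
}


def summarize(values):
    """Return (mean, sample standard deviation) of a list of percentages.

    Assumption: the table's "deviation" is the sample standard deviation.
    """
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    for label, values in NONSHARED_PCT.items():
        mean, deviation = summarize(values)
        print(f"{label}: mean {mean:.2f}, deviation {deviation:.2f}")
```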

Figure 2. Comparison of rarefaction curves between universal primer and Bacteroidetes specific primer for environmental samples. Legend: NF, freshwater sample, and inset for “Urban”, “Open”, and “Agricultural” is calculated using data obtained from our previous study.7
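For readers unfamiliar with rarefaction curves such as those in Figure 2, the sketch below computes one point on such a curve: the expected number of OTUs observed when n reads are drawn at random, without replacement, from a sample. It uses the standard analytical rarefaction formula rather than mothur's own routine, and the OTU abundances in the usage lines are invented for illustration.

```python
from math import comb


def rarefy(otu_counts, n):
    """Expected number of OTUs seen in a random subsample of n reads.

    otu_counts holds the number of reads in each OTU. For an OTU with c reads
    out of N in total, the probability that it is missed entirely by a
    subsample of n reads is C(N - c, n) / C(N, n).
    """
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of reads")
    return sum(1 - comb(total - c, n) / comb(total, n) for c in otu_counts)


if __name__ == "__main__":
    # Hypothetical OTU abundances, not data from this study.
    counts = [50, 30, 15, 4, 1]
    for depth in (0, 1, 10, 50, 100):
        print(depth, round(rarefy(counts, depth), 2))
```

A curve is said to saturate when increasing n no longer raises the expected OTU count appreciably, which is the sense in which the text compares the number of reads required.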

Table 3. Outputs Obtained from the PyroMiST Online Tool for Raw Pyrosequencing Data Obtained from Non-Fecal Contaminated Sample (NF), Human Fecal Contaminated Sample (HF), and Mixed Fecal Contaminated Sample (MF)

site: NF; total OTUs: 208; fecal OTUs: 4%
sources               % specificity   % contamination
human (157)a          0               0
chicken (388)         0               0
duck (47)             0               0
goose (58)            0               0
beef cattle (517)     15              2
dairy cattle (615)    0               1
swine (275)           85              2

site: HF; total OTUs: 79; fecal OTUs: 67%
sources               % specificity   % contamination
human (157)           99              66
chicken (388)         0               5
duck (47)             0               3
goose (58)            0               6
beef cattle (517)     0               9
dairy cattle (615)    1               5
swine (275)           0               14

site: MF; total OTUs: 390; fecal OTUs: 63%
sources               % specificity   % contamination
human (157)           27              24
chicken (388)         2               4
duck (47)             0               2
goose (58)            0               2
beef cattle (517)     13              23
dairy cattle (615)    10              18
swine (275)           49              30
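The three quantities reported in Table 3 can be sketched with set operations on OTU identifiers. This is a hypothetical reconstruction, not the authors' Perl script, and the exact formulas are assumptions: % specificity is taken as the number of source-specific shared OTUs divided by the number of OTUs in that fecal source, rescaled so that the values sum to 100 across sources (as the table's columns appear to do); % contamination is taken as the share of the environmental sample's OTUs found in a given source, whether source-specific or not; and fecal OTUs is the share of environmental OTUs found in any source. All names and the toy OTU identifiers are invented.

```python
def shared_otu_summary(env_otus, fecal_otus_by_source):
    """Summarize OTU sharing between one environmental sample and fecal sources.

    The formulas are assumptions made for illustration (see the lead-in),
    not the published definitions.
    """
    env = set(env_otus)
    sources = {name: set(otus) for name, otus in fecal_otus_by_source.items()}

    # OTUs of the environmental sample that also occur in each fecal source.
    shared = {name: env & otus for name, otus in sources.items()}
    all_shared = set().union(*shared.values())

    # Source-specific shared OTUs: shared with exactly one fecal source.
    specific = {}
    for name in sources:
        others = set().union(*(s for other, s in shared.items() if other != name))
        specific[name] = shared[name] - others

    # Assumed normalization: per-source ratio, rescaled to sum to 100.
    ratios = {name: (len(specific[name]) / len(sources[name]) if sources[name] else 0.0)
              for name in sources}
    ratio_sum = sum(ratios.values())
    pct_specificity = {name: (100 * r / ratio_sum if ratio_sum else 0.0)
                       for name, r in ratios.items()}

    # Assumed definition: shared OTUs (specific or not) over environmental OTUs.
    pct_contamination = {name: (100 * len(shared[name]) / len(env) if env else 0.0)
                         for name in sources}
    pct_fecal_otus = 100 * len(all_shared) / len(env) if env else 0.0

    return {
        "specific": specific,
        "pct_specificity": pct_specificity,
        "pct_contamination": pct_contamination,
        "pct_fecal_otus": pct_fecal_otus,
    }


if __name__ == "__main__":
    # Toy OTU identifiers, invented for this example.
    env_sample = {f"o{i}" for i in range(1, 11)}
    fecal = {
        "human": {"o1", "o2", "o3", "h1"},
        "swine": {"o3", "o4", "s1", "s2", "s3"},
    }
    print(shared_otu_summary(env_sample, fecal))
```

In the toy data, OTU o3 occurs in both fecal sources, so it counts toward % contamination for each of them but toward % specificity for neither.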

a Numbers in parentheses refer to the number of OTUs.

major functions: sequence trimming, clustering, and defining shared OTUs (Figure 1). This requires the primer and barcode sequence information used for pyrosequencing and FASTA-formatted pyrosequencing results obtained from the environmental samples. All sequences were grouped by the sample names given by the barcode information, and the barcode and primer sequences were removed. Trimmed sequences were combined with a total 16S rDNA database from the sequence read archive (http://www.ncbi.nlm.nih.gov/sra) (SRA ERP000189) or a database of Bacteroidetes 16S rDNA (ERP000580) obtained from human and animal feces. Chimeric sequences in both databases were removed by using mothur, and 16S rDNA sequences obtained from geese feces were removed from the database as they potentially contained large amounts of 16S rDNA from environmental bacteria.7 The program cd-hit-est9 was used to define OTUs and to identify OTUs shared between environmental samples and fecal samples originating from humans, chickens, ducks, beef cattle, dairy cattle, and swine. Shared OTUs were further subdivided into source-specific and non-source-specific categories. Source-specific shared OTUs are defined as OTUs strictly shared between an environmental sample and one fecal source. In contrast, non-source-specific shared OTUs are shared by >1 fecal source. The number of source-specific shared OTUs was normalized by the number of OTUs in each fecal sample and expressed as % specificity. In addition, the % contamination was determined by including non-source-specific shared OTUs. The percentage of fecal OTUs in an environmental sample was also calculated for quantification of fecal contamination (Table 3).

RESULTS AND DISCUSSION
Diversity of Bacteroidetes in Fecal and Environmental Samples. Results in Figure 2 show that when 16S rDNA of Bacteroidetes species was targeted in pyrosequencing runs, fewer reads were required to saturate the rarefaction curve of a nonfecal contaminated freshwater sample (NF) in comparison to freshwater samples obtained from urban, open, and agricultural areas.7 Similarly, the saturation of rarefaction curves for each fecal source required far fewer reads of 16S rDNA from Bacteroidetes species (Figure S2) in comparison to our previous study using total 16S rDNA.7 For instance, more than 10 000 reads were required to saturate the rarefaction curve for a human fecal sample when total 16S rDNA was targeted,7 whereas only approximately


Figure 3. Relative abundance of operational taxonomic units classified to each taxon at the Family level. Legend: H, human; C, chicken; D, duck; G, goose; BC, beef cattle; DC, dairy cattle; and S, swine.

Figure 4. Comparison of mothur and cd-hit in determining shared OTUs in nonfecal contaminated (NF), human fecal contaminated (HF), and mixed fecal contaminated (MF) environmental samples.

3000 reads were required to saturate the rarefaction curve for the human fecal sample in this study. The smaller number of pyrosequencing reads required for correct assignment of potential fecal sources, coupled with rapidly declining sequencing costs, makes this method a practical reality for MST studies. Results in Table 1 show the specificity of the Bacteroidetes primers used in this study: more than 95% of OTUs were correctly assigned to the phylum Bacteroidetes. Pyrosequence analyses also indicated that taxa other than members of the genera Bacteroides and Prevotella were present in the data sets (Figure 3).

Analysis of Operational Taxonomic Units (OTUs). Amend et al.18 suggested that within-species read abundances obtained by 454 pyrosequencing may be biased due to problems associated with PCR amplification. Therefore, rather than using differences in the number of reads between samples as a deterministic measure of the relatedness of samples, it is more appropriate to examine the number of shared OTUs corresponding to the species level (at a 97% cutoff value) to determine if metagenomic sequences in two samples are similar.19 Results in Table 2 show that the percentages of fecal source-specific OTUs obtained from the Bacteroidetes 16S rDNA pyrosequencing done in this study were similar to those obtained from the total bacterial 16S rDNA in fecal and environmental samples (107 394 reads) in the short read archive ERP000189, suggesting that the use of Bacteroidetes-specific primers did not decrease discrimination power among fecal sources, although the total number of OTUs was decreased. We previously used the mothur program in an MST method to define shared OTUs between fecal and environmental samples.7 However, with the CPU we used (AMD Athlon Dual Core Processor 3.1 GHz), mothur took more than 9 hours to trim, align, screen, and filter pyrosequencing reads to calculate the percentage of shared OTUs in SRA ERP000189. In contrast, cd-hit, an ultrafast clustering program, took 5 min to determine the number of OTUs shared between the Bacteroidetes 16S rDNA pyrosequencing data from animal feces and those from the environmental samples. The percentages of OTUs shared with each fecal sample in Bacteroidetes 16S rDNA pyrosequencing data from nonfecal contaminated freshwater (NF), obtained using mothur and cd-hit, are very similar (Figure 4). Analyses done using the two programs also showed that while almost no OTUs were shared between the nonfecal (NF) and fecal source samples, a large number of shared OTUs were observed between 16S rDNA in human feces and those present in freshwater samples contaminated with human fecal material (HF) (Figure 4). In contrast, more fecal OTUs were detected by cd-hit than by mothur in HF and mixed fecal contaminated samples (MF), although the proportions of source-specific shared OTUs in each environmental sample detected by mothur and cd-hit were similar. Our results also suggest that there is no significant correlation between the number of shared OTUs and the amount of fecal material. These differences, however, are not critical for MST studies, and the cd-hit method described here is useful for comparative fecal contamination studies and routine water quality monitoring.

Online User Interface Application. Due to the difficulty in processing massive amounts of pyrosequencing data, web user interfaces, such as the Ribosomal Database Project (http://pyro.cme.msu.edu/) and the Greengenes database project (http://greengenes.lbl.gov/cgi-bin/nph-index.cgi), have been developed to trim, align, cluster, and statistically analyze pyrosequencing data. These web applications, however, require the input and use of various parameters for each step of the analysis, which may prove difficult for casual users. To more easily apply pyrosequencing technology for automatic MST analysis, we developed a simple, user-friendly online package, PyroMiST (http://env1.gist.ac.kr/~aeml/MST.html), which requires only FASTA files containing pyrosequencing results and the sequence information for the primers and barcodes used. PyroMiST requires only a short processing time (typically