Fecal Source Tracking in Water by Next-Generation Sequencing

Article pubs.acs.org/est

Fecal Source Tracking in Water by Next-Generation Sequencing Technologies Using Host-Specific Escherichia coli Genetic Markers

Ryota Gomi,† Tomonari Matsuda,‡ Yasuto Matsui,† and Minoru Yoneda*,†

† Department of Environmental Engineering, Graduate School of Engineering, Kyoto University, Katsura, Nishikyo-ku, 615-8540, Kyoto, Japan
‡ Research Center for Environmental Quality Management, Kyoto University, 1-2 Yumihama, Otsu, 520-0811, Shiga, Japan

ABSTRACT: High levels of fecal bacteria are a concern for the aquatic environment, and identifying sources of those bacteria is important for mitigating fecal pollution and preventing waterborne disease. Escherichia coli has been used as an indicator of fecal pollution; however, this organism has been used less successfully for library-independent microbial source tracking. In this study, using next-generation sequencing technology, we sequenced the whole genomes of 22 E. coli isolates from known sources (9 from humans, 2 from cows, 6 from pigs, and 5 from chickens) and identified candidate host-specific genomic regions. Specificity testing on the candidate regions was performed using 30 E. coli isolates from each source. Finally, we identified 4 human-, 2 cow-, 3 pig-, and 4 chicken-specific genetic markers useful for source tracking. We also found that a combination of multiplex PCR and dual index sequencing is effective for detecting multiple genetic markers in multiple isolates at one time. This technique was applied to investigate the identified genetic markers in 549 E. coli isolates obtained from the Yamato River, Japan. The results indicate that humans constitute a major source of water contamination in the river. However, further work must include isolates obtained from geographically diverse animal hosts to make this method more reliable.



(e.g., DNA fingerprint patterns) characteristics are used in library-dependent methods.3,12−14 In contrast, library-independent methods do not require construction of libraries and target a particular feature, such as the sequence of a certain DNA fragment, of a specific bacterial species. Library-independent methods are currently popular because they are more rapid and less costly than library-dependent methods.15 Moreover, library-independent methods are known to be both sensitive and specific for detecting fecal contamination of various host species including human, cow, gull, ruminant, pig, and horse.16,17 Among library-independent methods, those targeting the Bacteroidales 16S rRNA gene using molecular techniques such as polymerase chain reaction (PCR) or quantitative PCR have been intensively investigated because members of the Bacteroidales group are host-specific, abundant in animal feces, and survive for only short periods of time after release from their hosts.4,18−20 The Bacteroides−Prevotella group is also known to have a tendency to coevolve with its hosts,21 which is the reason the group has host-specific markers. However, Harwood et al. reported that there is no library-independent method in use that is based on host-specific E. coli markers even though E. coli has been used and studied as an indicator of fecal pollution for many years.4

INTRODUCTION

Fecal contamination of water increases the risk of waterborne disease, and it is a serious problem in many countries.1−3 However, monitoring for all pathogens in environmental waters is unrealistic due to the great diversity of pathogens.4 The approach to this problem has been to monitor for fecal indicator bacteria such as fecal coliforms, Escherichia coli, and enterococci.5−7 For many years, these bacteria have been detected and counted to assess water quality, but their presence does not provide any information as to the originating host source.8 The human risk from domestic/agricultural animal feces is assumed to be less than that from human feces. On the other hand, certain waterborne bacteria from nonhuman sources are known to infect humans.9 Therefore, accurate and reliable fecal source identification methods are essential to better understand the source of fecal contamination and to reduce the risk of waterborne disease. A number of methods for identifying the source of fecal pollution have been developed and evaluated.4,10,11 The basic premise of the methods is that certain fecal microorganisms are associated with particular hosts and that identified genetic or phenotypic characteristics of these microorganisms can be used as markers for identifying fecal contamination from the host.4 These methods can be divided into library-dependent and library-independent methods.9 Library-dependent methods require construction of known source libraries to differentiate environmental microorganisms from different hosts. Both phenotypic (e.g., antibiotic resistance profile) and genotypic

© 2014 American Chemical Society

Received: April 24, 2014
Revised: July 15, 2014
Accepted: July 23, 2014
Published: July 23, 2014

dx.doi.org/10.1021/es501944c | Environ. Sci. Technol. 2014, 48, 9616−9623

Environmental Science & Technology

Article

the dark at 4 °C and processed within 12 h. Septic tank samples were processed using the membrane filter method with XM-G agar (Nissui, Tokyo, Japan). Fecal samples and rectal swabs were directly streaked onto XM-G agar. After overnight incubation at 36 °C, colonies with an E. coli profile (blue colonies) were selected and restreaked on fresh XM-G agar, and incubated at 36 °C. Streaking on XM-G agar and incubation were repeated until pure isolates were obtained. Obtained isolates were cultured in LB broth and stored at −85 °C in 35% glycerol. Up to three isolates were obtained from a single individual. The environmental E. coli strains used in this study were isolated from water samples from the Yamato River flowing from Nara prefecture to Osaka prefecture. In total, 27 water samples were collected from 10 sites in this river in 2011, 2012, and 2013 (Supporting Information (SI) Figure S1). The collected samples were processed by the membrane filter method to obtain E. coli isolates as described above. E. coli isolates were stored at −85 °C in 35% glycerol. E. coli Selection and DNA Extraction for Sequencing. The rep-PCR analyses were performed with 148 E. coli isolates from known sources using the procedure described by Dombek et al.12 In short, E. coli cells were grown overnight in LB broth and then gently treated with 0.05 N NaOH to release their total DNA. The total DNA in solution was separated from cell debris by centrifugation. The supernatants containing total DNA were used as templates in PCR amplification for rep-PCR DNA fingerprinting. The rep-PCR fingerprints were obtained by using primer BOX A1R (5′-CTACGGCAAGGCGACGCTGACG-3′). Following amplification, the PCR amplicons were electrophoresed, and the gel images were obtained using a Molecular Imager Gel Doc XR+ (Bio-Rad, Hercules, CA). E. 
coli isolates whose band patterns were different from the others within the source were randomly chosen for DNA sequencing to avoid sequencing isolates that had the same genome sequences. Nine isolates from humans, two isolates from cows, six isolates from pigs, and five isolates from chickens were selected for whole genome sequencing. DNA was extracted from each E. coli isolate using the DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany). DNA Sequencing. Nextera DNA libraries were prepared for DNA sequencing on an Illumina MiSeq sequencer according to the Nextera DNA Sample Preparation Guide. Sequencing was performed according to the MiSeq System Quick Reference Guide. Each library was sequenced for 151 cycles on the MiSeq. Comparative Genome Analysis. Sequence reads from each sample were analyzed with CLC Genomics Workbench (CLC Bio, Aarhus, Denmark). First, reads were trimmed to remove low-quality reads and mapped against the K-12 W3110 genome to remove the core genome. Second, unmapped reads were assembled into contigs for each sample. Contigs more than 1000 bp long were aligned and compared between samples by BLAST searches in CLC Genomics Workbench, and sequences that were found in samples from only one source were defined as candidate host-specific genomic regions. Specificity of these regions was tested in subsequent analyses. Specificity Testing. For specificity testing, 30 E. coli isolates from each source were randomly selected and mixed within the source. DNA was extracted from each mixture sample using the DNeasy Blood and Tissue Kit. Nextera XT DNA libraries were prepared according to the Nextera XT DNA Sample Preparation Guide. Each library was sequenced for 251

This may be because E. coli is not an ideal target for microbial source tracking, as suggested by some previous studies. Studies based on rep-PCR, denaturing gradient gel electrophoresis (DGGE), and other molecular methods could not reliably pinpoint the source of fecal contamination using E. coli.10 Other studies have suggested there is within-household transmission of E. coli clones between humans and pets.22,23 The latter studies are important because the results mean that those E. coli strains can colonize multiple host species. However, Harwood et al. also reported that this may change because molecular techniques provide increasing power for examining genetic differences in bacterial groups. Indeed, some studies have shown promise regarding this point. Clermont et al. found evidence for a human-specific E. coli clone,24 and Escobar-Paramo et al. identified a group of animal-specific strains.25 Khatib et al. found evidence of a biomarker specific to cattle26 and swine27 enterotoxigenic E. coli. Luo et al. found evidence of several candidate genomic islands specific to E. coli of either environmental or enteric origin.28 Although there are some drawbacks in using E. coli as a target for microbial source tracking, we selected this bacterium because E. coli is still widely used as an indicator of fecal pollution and identifying the source of E. coli will support efficient water quality management. In the present study, we sequenced the whole genomes of 22 E. coli isolates from different sources (9 from humans, 2 from cows, 6 from pigs, and 5 from chickens) using next-generation sequencing technology to identify host-specific genomic regions. In the process of identifying host-specific genomic regions, we focused on the E. coli accessory genes, which are the set of genes that do not belong to the core genome. The core genome is defined as orthologous genes that are conserved in all strains of a species29 (fewer than 2000 genes in the case of E. coli30−33). 
We compared the accessory genes in isolates from different sources and identified the respective host-specific genomic regions. Furthermore, we designed primer sets targeting parts of identified regions and conducted specificity testing on the designed primer sets (we defined primer target sites as markers). A method for analyzing multiple genetic markers in multiple isolates at one time was also developed for investigating identified host-specific genetic markers in environmental E. coli.



MATERIALS AND METHODS

E. coli Strains. E. coli strains were isolated from various sources (Table 1). The human isolates were obtained from a septic tank in an official residence in Nara prefecture, Japan. There were 50 people living in the residence, and pets were not allowed. Cow fecal samples were collected on a farm in Nara prefecture, and rectal swabs of pigs and chickens were obtained on another farm in Nara prefecture. All samples were stored in

Table 1. E. coli Isolates Used in This Study
source | number of individuals sampled | number of isolates | year
human | 1 | 54 | 2011
cow | 11 | 2 | 2011
 | | 31 | 2013
pig | 10 | 30 | 2011
chicken | 11 | 31 | 2011
river water | | 251 | 2011
 | | 140 | 2012
 | | 158 | 2013


Figure 1. Protocol for the first and second PCR. An example of H8 amplification is shown in this figure. This procedure yields a library of amplification products that contain the indexes and adapters. Amplification products after the second PCR were then sequenced on the MiSeq.

cycles on the MiSeq, and sequence reads from each sample were analyzed with CLC Genomics Workbench. Reads were trimmed to remove low-quality or short sequence reads and mapped against the K-12 W3110 genome. Then, unmapped reads were mapped against the genomic regions identified by the comparative genome analysis. Analysis of Marker Possession Patterns of E. coli Isolates from Known Sources. Primers were designed for genomic regions that were found specific by specificity testing. Real-time PCR assay was performed on each E. coli isolate selected for specificity testing using each primer set designed above to analyze the marker possession patterns of each isolate. For the real-time PCR assays, PCR mixture (15 μL) was composed of 7.5 μL of 2× QuantiFast SYBR green PCR Master Mix (Qiagen), 0.3 μL each of forward and reverse primers (50 μM), and 6.9 μL of cell suspension. All PCR reactions were performed in a 96-well Hi-Plate for Real Time (Takara, Otsu, Japan) with Thermal Cycler Dice Real Time System 2 (Takara). The reactions were carried out by incubation at 95 °C for 5 min, followed by 40 cycles consisting of 95 °C for 10 s and 60 °C for 30 s. A melting curve analysis was performed after the amplification. Analysis of Genetic Markers of Environmental E. coli Isolates. To analyze multiple markers in multiple strains isolated from environmental water, a method employing multiplex PCR and dual index sequencing was developed (Figure 1). Primers for multiplex PCR were designed to amplify 2 housekeeping genes (adk and trpA) and 13 host-specific markers. The primer sequences to amplify and add adapter sequences to target markers were 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-(forward primer sequence in Table 2)-3′ and 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-(reverse primer sequence in Table 2)-3′, and these primers were named inner primers. The primer sequences to amplify and add P5 amplification primer sequence (P7 amplification primer sequence) and index sequences to

adaptered amplicons were 5′-AATGATACGGCGACCACCGAGATCTACAC-(Index1)-TCGTCGGCAGCGTCAGATGT-3′ and 5′-CAAGCAGAAGACGGCATACGAGAT(Index2)-GTCTCGTGGGCTCGGAGATG-3′, and these primers were named outer primers (note that the design of the outer primers was actually not optimum. It would be better to remove AGATGT and AGATG from the 3′ ends of outer primers in order to reduce nonspecific amplicons). To differentiate more than 600 samples, 28 kinds of primer containing different sequences for Index1 and 24 kinds of primer containing different sequences for Index2 were designed. The length of each index was 8 bp. Multiplex PCR was performed on each environmental E. coli isolate using the multiplex mixture of 15 sets of inner primers. PCR mixture (14.4 μL) was composed of 7.5 μL of 2× QuantiFast SYBR green PCR Master Mix (Qiagen), 4.9 μL of the multiplex mixture of inner primers (500 nM), and 2 μL of cell suspension. All PCR reactions were performed in a 96-well Hi-Plate for Real Time (Takara) with Thermal Cycler Dice Real Time System 2 (Takara). The reactions were initiated by incubation at 95 °C for 5 min, and this was followed by 25 cycles of 95 °C for 10 s and 60 °C for 30 s. After the reactions, 0.3 μL each of the outer primers (50 μM) was added into each well, and next reactions were performed. A different combination of outer primers was used for each well to differentiate samples. The reactions were carried out by incubation at 95 °C for 5 min, followed by 25 cycles of 95 °C for 10 s and 60 °C for 30 s. After the reactions, 3 μL of each PCR product was transferred to one tube and mixed well. Mixture of PCR products was purified using AMPure XP beads (Beckman Coulter Inc., Brea, CA). Seven microliters of the purified sample was then electrophoresed in a 1.5% agarose gel, and agarose containing DNA fragments between 200 and 500 bp in length was excised with a clean razor blade to remove nonspecific PCR products. 
DNA fragments in agarose were purified using Quantum Prep Freeze ‘N Squeeze DNA Gel


Table 2. Primer Sets Designed in This Study
target region | sequence (5′-3′) | product size (bp) | gene or feature (a)
H1 | F-ATATCGTTGGTGTTTACTGC; R-CTCTGATTTGTGCGGCAT | 278 | putative phage-related DNA recombination protein
H4 | F-CCTGGCAATTCTCATTTTCT; R-ACGAACTACACCTGAGATTA | 244 | Tn7-like transposition protein TnsC
H5 | F-TTCCTGATGACGCGTACC; R-AAAGCGGGTGATCATGTC | 153 | ATP-dependent metalloprotease FtsH
H6 | F-GGCTACAACAATCACCGTT; R-TGCTGATGACGATGAGTGG | 260 | copper resistance protein D
H7 | F-TTACGGTGCTTGTTCTGCT; R-TTGGTAACCGGCATAGACT | 274 | heavy metal translocating P-type ATPase
H8 | F-ACAGTCAGCGAGATTCTTC; R-GAACGTCAGCACCACCAA | 177 | sodium/hydrogen exchanger precursor
H9 | F-GGGTATTTTAACGCTGTTGA; R-AAGTCCTTTCACTGCATTCA | 237 | hypothetical protein
H12 | F-GTAAAAGGACTGCCGGGAAA; R-TCAGATCGTCCTTTACCAG | 213 | hypothetical phage protein
H13 | F-CTCTGCAATCTCACTCGC; R-TGCTGCGATCATGAGTTTG | 115 | phospholipase D/transphosphatidylase
H14 | F-CAGCCTGAGCGTCTTTTAC; R-CGGTGGGAAAAGAAGTTGAA | 271 | ATP/GTP-binding protein
H15 | F-GTATTCACCACAGCTTCCAC; R-GGGTAATGGCTTTGATTTTG | 165 | PTS system enzyme II
H24 | F-CTGGTCTGGCTTTATAACAC; R-ATCATTTCCACTTGTCGGG | 229 | methyl-thioribulose-1-phosphate dehydratase
H26 | F-CCTGGAAGCGGATTTACC; R-GAAAAATGGAACGACATAGG | 152 | mercuric transport protein
H33 | F-CTCGAGCAAGTCGTAACAT; R-CTCCAGGCATTCAATTTTCT | 165 | hypothetical protein
Co1 | F-TTTCTCTACACGTTTCCG; R-TCAAAAACCATCGCTCGCAA | 118 | phage antiterminator Q protein
Co2 | F-CTAATGTGAGTCCCGGAGA; R-AATTTACCGTGGTGGGGA | 171 | hypothetical protein (similar to TibA; enterotoxigenic E. coli adhesion/invasion protein)
Co3 | F-CGTGTTAGATCGAATGGTTA; R-TGTGGAGTTTTGCTTTTTCC | 136 | putative integrase
P1 | F-CACAGATGAGTTAAAGCCAG; R-AGTGATACAGATACAGCCGA | 190 | F1C fimbrial usher
P3 | F-TCCGTTGTTAGCCAGTACA; R-CCATTTAATCCGCCAATCAT | 208 | NF
P4 | F-GTGTATATAGCATTCCTCCT; R-TATTCATTATTCCGCGATGG | 137 | hypothetical protein
Ch2 | F-CATTCTTTGAGCGCATCATC; R-TCCTGTCGCCATTCTTAACC | 203 | hypothetical protein and toxin−antitoxin system
Ch3 | F-ATCATGAGTTTATCCGACAC; R-CCTTACAGCACAATTCCATT | 176 | hypothetical protein
Ch5 | F-GATCGTGGTCATTTTGGCA; R-CAAAGTTAGCGTTCCAGGTA | 190 | flagellin structural protein
Ch7 | F-ACTAACAGCAATTCCACCAT; R-TTAATATCAGCGTGCCGTA | 191 | autotransporter adhesin
Ch9 | F-GGGAGCCAAAAAAAACCTT; R-TTGACCGTGTTGAGTTACT | 193 | hypothetical protein
Ch10 | F-TAAATCTCAGACCACACAAC; R-GAATATGCTTGTCATCCTGC | 188 | putative invertase
Ch12 | F-CGTCTGCTGGATGTAATTAA; R-GCGTTAAAAATTGCATGGTG | 211 | type I restriction-modification system subunit R
Ch13 | F-GTAGTAATCGCCAGCCTT; R-CGAGAGCAGGAAAAAGCAT | 113 | putative minor fimbrial subunit protein
Ch14 | F-TGGCTGTCTGTATAACTTCT; R-CTGCACAATCACAGAACTCA | 171 | NF
adk | F-ACATTATGGATGCTGGCAA; R-CTTCTTTCATCGCGTCTG | 130 |
trpA | F-GTTGCGAAGCTGAAAGAGTA; R-GCATTTTCTCTGGCTCATT | 166 |

(a) Target gene or feature was determined using the BLAST tool at the National Center for Biotechnology Information (NCBI) Web site (http://blast.ncbi.nlm.nih.gov/Blast.cgi). NF = no significant similarity was found.

Extraction Spin Columns (Bio-Rad). The product was purified using AMPure XP beads (Beckman Coulter) again. The final product was sequenced on the MiSeq according to the MiSeq System Quick Reference Guide. Sequence reads were analyzed with CLC Genomics Workbench. Reads from each sample were trimmed to remove low-quality or short sequence reads and mapped against 2 housekeeping genes and 13 host-specific markers. Real-Time PCR Assays on DNA Extracts from Fecal and Water Samples. Real-time PCR assays were conducted to test the applicability of the developed markers on total DNA extracted from fecal and water samples. Detailed descriptions of the assays are shown in the SI. Nucleotide Sequence Accession Number. The DDBJ accession number for the sequences of 22 E. coli isolates is DRP002307.
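
The inner and outer primer design described above can be sketched in a few lines of Python. This is only an illustration of how the quoted adapter, index, and target sequences concatenate, and of why 28 Index1 primers and 24 Index2 primers suffice for more than 600 samples. The function names and the two 8-bp index sequences are hypothetical, and the H8 primer pair is taken from Table 2 on the assumption that primer pairs follow the row order of the target regions.

```python
# Adapter and primer fragments quoted in the Methods text.
INNER_FWD_ADAPTER = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
INNER_REV_ADAPTER = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"
OUTER_FWD_HEAD = "AATGATACGGCGACCACCGAGATCTACAC"  # P5 side, precedes Index1
OUTER_FWD_TAIL = "TCGTCGGCAGCGTCAGATGT"           # anneals to the inner forward adapter
OUTER_REV_HEAD = "CAAGCAGAAGACGGCATACGAGAT"       # P7 side, precedes Index2
OUTER_REV_TAIL = "GTCTCGTGGGCTCGGAGATG"           # anneals to the inner reverse adapter
INDEX_LENGTH = 8  # each index was 8 bp long


def inner_primers(forward, reverse):
    """Prefix target-specific primers with the adapter sequences (first PCR)."""
    return INNER_FWD_ADAPTER + forward, INNER_REV_ADAPTER + reverse


def outer_primers(index1, index2):
    """Build the indexed outer primer pair used in the second PCR."""
    for index in (index1, index2):
        if len(index) != INDEX_LENGTH or set(index) - set("ACGT"):
            raise ValueError(f"index must be {INDEX_LENGTH} bases of A/C/G/T: {index!r}")
    return (OUTER_FWD_HEAD + index1 + OUTER_FWD_TAIL,
            OUTER_REV_HEAD + index2 + OUTER_REV_TAIL)


def index_combinations(n_index1, n_index2):
    """Number of wells that can be told apart by an (Index1, Index2) pair."""
    return n_index1 * n_index2


# H8 primer pair as listed in Table 2 (assumed row alignment).
h8_fwd, h8_rev = inner_primers("ACAGTCAGCGAGATTCTTC", "GAACGTCAGCACCACCAA")

# Hypothetical index sequences, for illustration only.
outer_fwd, outer_rev = outer_primers("ACGTACGT", "TGCATGCA")

print(index_combinations(28, 24))  # 672 distinguishable samples
```

With 28 × 24 = 672 index pairs, each of the 549 river isolates can be given a unique combination in a single sequencing run.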

Some of the markers (primer target sites) were associated with adhesins; this is in agreement with previous reports that E. coli’s ability to establish itself within different niches mainly relies on its adherence to host surfaces,34 and specific adhesive mechanisms play an important role in the attachment of E. coli to various epithelial cells.35 Moreover, we found that sequences that had low E-value BLAST hits to the identified markers tended to be from E. coli isolated from the same host. For example, the only sequence that had a BLAST hit to Co2 was part of a plasmid sequence from a bovine necrotoxigenic E. coli strain.36 The sequences that had the lowest E-value BLAST hits to Ch7 were both from avian pathogenic E. coli.37,38 These facts indicate that the identified markers may be applicable to library-independent microbial source tracking methods because those E. coli isolates were obtained from geographically diverse hosts. Marker Possession Patterns of E. coli Isolates from Known Sources. Marker possession patterns of E. coli isolates from known sources were analyzed using host-specific primers developed in this study (Figure 2; also see SI Table S3). Similar



RESULTS AND DISCUSSION

Identification of Markers. In total, 22 E. coli isolates were selected and sequenced on the MiSeq. We removed sequence reads that were generated from the core genome by mapping reads against the K-12 W3110 genome. SI Table S1 summarizes the number of reads and contigs obtained from each E. coli isolate sequenced. The number of reads decreased substantially after the mapping, which means that the core genome was effectively removed. By comparative genome analysis of sequenced isolates, 36 human-specific (H1−H36), 3 cow-specific (Co1−Co3), 4 pig-specific (P1−P4), and 18 chicken-specific (Ch1−Ch18) genomic regions were identified. SI Table S2 shows the identified regions and the results of specificity testing conducted on these genomic regions. The result of the mapping against the K-12 W3110 genome is also shown in the table for reference. Genomic regions satisfying the following conditions were judged to be host-specific (note that the precise specificity was checked by the analysis of marker possession patterns of each E. coli isolate selected for this specificity testing): (a) the average coverage of reads that were mapped against the genomic region of the target host is more than 5 times as high as those of the nontarget hosts; (b) the average coverage of reads that were mapped against the genomic region of the target host is more than 10-fold; (c) the average coverage of reads that were mapped against the genomic region of the nontarget hosts is less than 10-fold. We determined these conditions considering the average coverage of reads mapped against the K-12 W3110 genome. However, cow-targeting regions were not subjected to these conditions because the number of cow-targeting regions was limited. Table 2 shows primers designed against parts of these host-specific regions and 2 housekeeping genes (adk and trpA). Each primer was designed against a sequence within one specific gene for each region if possible. 
The sequences of identified genomic regions and target sequences for amplification are listed in the SI.
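
Conditions (a)−(c) can be read as a simple predicate over per-host average coverage. The sketch below is one possible reading, not the procedure actually run in CLC Genomics Workbench: it assumes condition (a) must hold against each nontarget host individually, and the function name and the coverage values in the example are hypothetical.

```python
def is_host_specific(coverage, target):
    """Apply conditions (a)-(c) to one candidate genomic region.

    coverage: dict mapping host name -> average read coverage (fold)
              for the pooled isolates of that host.
    target:   the host the region is supposed to be specific to.
    """
    target_cov = coverage[target]
    nontarget = [cov for host, cov in coverage.items() if host != target]

    # (a) target coverage is more than 5 times that of every nontarget host
    #     (assumed interpretation: checked host by host).
    cond_a = all(target_cov > 5 * cov for cov in nontarget)
    # (b) target coverage is more than 10-fold.
    cond_b = target_cov > 10
    # (c) nontarget coverage is less than 10-fold.
    cond_c = all(cov < 10 for cov in nontarget)
    return cond_a and cond_b and cond_c


# Hypothetical coverage values, for illustration only.
region = {"human": 60.0, "cow": 2.0, "pig": 0.0, "chicken": 8.0}
print(is_host_specific(region, "human"))  # True
```

A region with high target coverage can still fail: with human coverage of 40 and cow coverage of 9, condition (a) is not met because 40 is not more than 5 × 9.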

Figure 2. Heatmap of gene presence (blue) and absence (gray) in genomes of 120 E. coli isolates from known sources. The selected host-specific markers for each source are listed in Table 2. Detailed marker possession patterns are shown in SI Table S3.

marker possession patterns were observed among some markers (e.g., H5, H6, H7, H8, H13, and H15). Therefore, one marker was selected from each group of markers that showed similar possession patterns for further analysis. In total, 13 markers (H8, H12, H14, H24, Co2, Co3, P1, P3, P4, Ch7, Ch9, Ch12, and Ch13) were chosen for application to environmental E. coli isolates. Co1 was excluded from further analysis because the reverse primer targeting this marker was found to form primer-dimers with adapter sequences. The percentages of isolates from known sources that possess the selected markers are shown in Table 3. Note that it is possible to increase the specificity of host-specific marker sets by selecting different markers from Table 2 and SI Table S3; this, however, decreases the sensitivity.


Table 3. Percentages of Isolates from Known Sources That Possess the Selected Markersa

human isolates (n = 30) cow isolates (n = 30) pig isolates (n = 30) chicken isolates (n = 30)

human-specific markers (4 markers)

cow-specific markers (2 markers)

pig-specific markers (3 markers)

chicken-specific markers (4 markers)

66.7

0

0

3.3

6.7

53.3

0

6.7

3.3

0

30.0

16.7

0

0

0

80.0

Figure 3. Proportion of 549 isolates obtained from the Yamato River in each source group.

applicable to DNA directly extracted from water samples. Therefore, real-time PCR assays were performed using DNA extracts obtained from the septic tank, cows (n = 8), pigs (n = 7), chickens (n = 7), and 6 water samples. Copy numbers of host-specific markers and the uidA gene in each sample analyzed are shown in SI Table S6 (the results are discussed in detail in the SI). The results indicate that the human-specific and chicken-specific markers developed in this study are applicable to total DNA extracted from environmental waters. Among them, H8 and Ch13 are the most promising markers due to their host-sensitivity and host-specificity. However, concentrations of E. coli in environmental waters were not high enough for the quantification of those markers. This indicates that the applicability of the developed markers to DNA directly extracted from environmental waters is limited to significantly polluted waters. Therefore, we recommend using E. coli isolates for analyzing the developed markers. In conclusion, we identified 4 human-, 2 cow-, 3 pig-, and 4 chicken-specific genetic markers useful for source tracking, and developed a multiplex PCR and dual index sequencing-based method to quantify these markers in environmental E. coli isolates. The results of BLAST searches of the identified markers indicated that these markers may be applicable to library-independent microbial source tracking methods. However, the percentages of environmental isolates that did not have any host-specific markers or that had genetic markers specific to more than one host were not low. It is especially important to point out that a considerable number of E. coli strains are able to colonize multiple host species. Further studies including isolates obtained from geographically diverse animal hosts are needed to make this method more reliable.

a An isolate that had at least one host-specific marker was counted as positive.

Quantification of Genetic Markers in Environmental E. coli Isolates. Preliminary tests of the developed method (the combination of multiplex PCR and dual index sequencing) were performed with a negative control (no DNA) and 6 E. coli isolates whose marker possession patterns were already known by the previous analysis (SI Table S4). Results show that the markers amplified by multiplex PCR were precisely detected as sequence reads because the positively mapped markers in SI Table S4 are consistent with the results in SI Table S3. The developed method was applied for investigating identified genetic markers in environmental E. coli isolates (SI Table S5). An average coverage of more than 20-fold was determined to be positive. The isolates that had genetic markers specific to only one host were determined to be from the host. For example, an isolate that had genetic markers of H8 and H12 was determined to be from a human. However, 4.4% of the isolates had genetic markers specific to more than one host. This may be because these isolates were a type that could colonize multiple host species, which is consistent with the previous studies mentioned above.22,23 We also observed isolates with markers specific to more than one host in samples from known sources (Figure 2 and SI Table S3), which enabled us to classify nonclassified environmental isolates using a support vector machine (SVM). For example, an isolate that had genetic markers of H8 and Co2 was analyzed by SVM. SVM is a robust and efficient machine learning methodology for solving classification problems and has been used in various fields.39−42 A detailed description of SVM analysis is given in the SI. In total, 549 environmental isolates were analyzed, and 148 isolates (27.0%) were classified as human while a relatively small number of isolates were classified as being from other sources (3.1% for cow, 6.7% for pig, and 11.1% for chicken) (Figure 3). 
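
The calling rule described above (a marker is positive when its average coverage exceeds 20-fold, and an isolate is assigned to a host when all of its positive markers belong to that host) can be sketched as follows. This is an illustrative reading rather than the authors' pipeline: the function name and the coverage values are hypothetical, the marker-to-host mapping uses the 13 markers selected for environmental isolates, and the SVM step for multi-host isolates is not implemented here and is only flagged.

```python
# The 13 host-specific markers chosen for environmental isolates.
MARKER_HOST = {
    "H8": "human", "H12": "human", "H14": "human", "H24": "human",
    "Co2": "cow", "Co3": "cow",
    "P1": "pig", "P3": "pig", "P4": "pig",
    "Ch7": "chicken", "Ch9": "chicken", "Ch12": "chicken", "Ch13": "chicken",
}
POSITIVE_COVERAGE = 20.0  # average coverage above 20-fold counts as positive


def call_source(coverage):
    """Assign an isolate to a source from its per-marker average coverage.

    coverage: dict mapping marker (or housekeeping gene) name -> coverage.
    Housekeeping genes (adk, trpA) are ignored because they are not in
    MARKER_HOST.
    """
    hosts = {
        MARKER_HOST[marker]
        for marker, cov in coverage.items()
        if marker in MARKER_HOST and cov > POSITIVE_COVERAGE
    }
    if not hosts:
        return "unclassified"
    if len(hosts) == 1:
        return next(iter(hosts))
    # Markers from more than one host: the paper passes these to an SVM.
    return "multiple hosts"


# Hypothetical coverage profiles, for illustration only.
print(call_source({"adk": 200.0, "H8": 150.0, "H12": 35.0}))  # human
print(call_source({"adk": 180.0, "H8": 90.0, "Co2": 40.0}))   # multiple hosts
print(call_source({"adk": 120.0, "trpA": 80.0, "P1": 5.0}))   # unclassified
```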
However, 286 isolates (52.1%) did not have any host-specific markers and were considered unclassified. These results may be because of (a) the fact that a limited fraction of the diversity within each source was sampled; (b) the existence of environment-naturalized E. coli strains;28,43 (c) the contribution of other sources that were not considered in this study; (d) the possibility that host-specific strains comprise only a small proportion of the E. coli population. Applicability of the Developed Markers to Total DNA Extracted from Fecal and Water Samples. Analysis of genetic markers of environmental E. coli isolates is somewhat time-consuming. It will be useful if the developed markers are



ASSOCIATED CONTENT

Supporting Information

Detailed descriptions of real-time PCR assays on DNA extracts and SVM analysis, the sequences of identified genomic regions, target sequences for amplification, Figure S1, and Tables S1−S7. This material is available free of charge via the Internet at http://pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*Phone: +81-75-383-3355; fax: +81-75-383-3358; e-mail: [email protected].

Notes

The authors declare no competing financial interest.


ACKNOWLEDGMENTS

This research was supported by KAKENHI (23221006) and Kyoto University’s Global Survivability Studies (GSS) program. We thank Kenji Yonetani for assistance with TOC art.

REFERENCES

(1) Bachoon, D. S.; Markand, S.; Otero, E.; Perry, G.; Ramsubhag, A. Assessment of non-point sources of fecal pollution in coastal waters of Puerto Rico and Trinidad. Mar. Pollut. Bull. 2010, 60 (7), 1117−21.
(2) Furukawa, T.; Suzuki, Y. A proposal for source tracking of fecal pollution in recreational waters by pulsed-field gel electrophoresis. Microb. Environ. 2013, 28 (4), 444−449.
(3) Araujo, S.; Henriques, I. S.; Leandro, S. M.; Alves, A.; Pereira, A.; Correia, A. Gulls identified as major source of fecal pollution in coastal waters: A microbial source tracking study. Sci. Total Environ. 2014, 470−471, 84−91.
(4) Harwood, V. J.; Staley, C.; Badgley, B. D.; Borges, K.; Korajkic, A. Microbial source tracking markers for detection of fecal contamination in environmental waters: Relationships between pathogens and human health outcomes. FEMS Microbiol. Rev. 2014, 38 (1), 1−40.
(5) Mote, B. L.; Turner, J. W.; Lipp, E. K. Persistence and growth of the fecal indicator bacteria enterococci in detritus and natural estuarine plankton communities. Appl. Environ. Microbiol. 2012, 78 (8), 2569−77.
(6) Jeong, Y.; Grant, S. B.; Ritter, S.; Pednekar, A.; Candelaria, L.; Winant, C. Identifying pollutant sources in tidally mixed systems: Case study of fecal indicator bacteria from marinas in Newport Bay, southern California. Environ. Sci. Technol. 2005, 39 (23), 9083−93.
(7) Anderson, K. L.; Whitlock, J. E.; Harwood, V. J. Persistence and differential survival of fecal indicator bacteria in subtropical waters and sediments. Appl. Environ. Microbiol. 2005, 71 (6), 3041−8.
(8) McLellan, S. L. Genetic diversity of Escherichia coli isolated from urban rivers and beach water. Appl. Environ. Microbiol. 2004, 70 (8), 4658−65.
(9) Field, K. G.; Samadpour, M. Fecal source tracking, the indicator paradigm, and managing water quality. Water Res. 2007, 41 (16), 3517−38.
(10) Meays, C. L.; Broersma, K.; Nordin, R.; Mazumder, A. Source tracking fecal bacteria in water: A critical review of current methods. J. Environ. Manage. 2004, 73 (1), 71−9.
(11) Scott, T. M.; Rose, J. B.; Jenkins, T. M.; Farrah, S. R.; Lukasik, J. Microbial source tracking: Current methodology and future directions. Appl. Environ. Microbiol. 2002, 68 (12), 5796−5803.
(12) Dombek, P. E.; Johnson, L. K.; Zimmerley, S. T.; Sadowsky, M. J. Use of repetitive DNA sequences and the PCR to differentiate Escherichia coli isolates from human and animal sources. Appl. Environ. Microbiol. 2000, 66 (6), 2572−7.
(13) Krumperman, P. H. Multiple antibiotic resistance indexing of Escherichia coli to identify high-risk sources of fecal contamination of foods. Appl. Environ. Microbiol. 1983, 46 (1), 165−70.
(14) Harwood, V. J.; Whitlock, J.; Withington, V. Classification of antibiotic resistance patterns of indicator bacteria by discriminant analysis: Use in predicting the source of fecal contamination in subtropical waters. Appl. Environ. Microbiol. 2000, 66 (9), 3698−704.
(15) Shen, Z.; Duan, C.; Zhang, C.; Carson, A.; Xu, D.; Zheng, G. Using an intervening sequence of Faecalibacterium 16S rDNA to identify poultry feces. Water Res. 2013, 47 (16), 6415−22.
(16) Boehm, A. B.; Van De Werfhorst, L. C.; Griffith, J. F.; Holden, P. A.; Jay, J. A.; Shanks, O. C.; Wang, D.; Weisberg, S. B. Performance of forty-one microbial source tracking methods: A twenty-seven lab evaluation study. Water Res. 2013, 47 (18), 6812−28.
(17) Griffith, J. F.; Weisberg, S. B.; McGee, C. D. Evaluation of microbial source tracking methods using mixed fecal sources in aqueous test samples. J. Water Health 2003, 1 (4), 141−51.
(18) Kapoor, V.; Smith, C.; Santo Domingo, J. W.; Lu, T.; Wendell, D. Correlative assessment of fecal indicators using human mitochondrial DNA as a direct marker. Environ. Sci. Technol. 2013, 47 (18), 10485−93.
(19) Kobayashi, A.; Sano, D.; Hatori, J.; Ishii, S.; Okabe, S. Chicken- and duck-associated Bacteroides-Prevotella genetic markers for detecting fecal contamination in environmental water. Appl. Microbiol. Biotechnol. 2013, 97 (16), 7427−37.
(20) Shanks, O. C.; White, K.; Kelty, C. A.; Sivaganesan, M.; Blannon, J.; Meckes, M.; Varma, M.; Haugland, R. A. Performance of PCR-based assays targeting Bacteroidales genetic markers of human fecal pollution in sewage and fecal samples. Environ. Sci. Technol. 2010, 44 (16), 6281−8.
(21) Bernhard, A. E.; Field, K. G. A PCR assay to discriminate human and ruminant feces on the basis of host differences in Bacteroides-Prevotella genes encoding 16S rRNA. Appl. Environ. Microbiol. 2000, 66 (10), 4571−4.
(22) Murray, A. C.; Kuskowski, M. A.; Johnson, J. R. Virulence factors predict Escherichia coli colonization patterns among human and animal household members. Ann. Intern. Med. 2004, 140 (10), 848−9.
(23) Johnson, J. R.; Clabots, C. Sharing of virulent Escherichia coli clones among household members of a woman with acute cystitis. Clin. Infect. Dis. 2006, 43 (10), e101−8.
(24) Clermont, O.; Lescat, M.; O’Brien, C. L.; Gordon, D. M.; Tenaillon, O.; Denamur, E. Evidence for a human-specific Escherichia coli clone. Environ. Microbiol. 2008, 10 (4), 1000−6.
(25) Escobar-Paramo, P.; Le Menac’h, A.; Le Gall, T.; Amorin, C.; Gouriou, S.; Picard, B.; Skurnik, D.; Denamur, E. Identification of forces shaping the commensal Escherichia coli genetic structure by comparing animal and human isolates. Environ. Microbiol. 2006, 8 (11), 1975−84.
(26) Khatib, L. A.; Tsai, Y. L.; Olson, B. H. A biomarker for the identification of cattle fecal pollution in water using the LTIIa toxin gene from enterotoxigenic Escherichia coli. Appl. Microbiol. Biotechnol. 2002, 59 (1), 97−104.
(27) Khatib, L. A.; Tsai, Y. L.; Olson, B. H. A biomarker for the identification of swine fecal pollution in water, using the STII toxin gene from enterotoxigenic Escherichia coli. Appl. Microbiol. Biotechnol. 2003, 63 (2), 231−8.
(28) Luo, C.; Walk, S. T.; Gordon, D. M.; Feldgarden, M.; Tiedje, J. M.; Konstantinidis, K. T. Genome sequencing of environmental Escherichia coli expands understanding of the ecology and speciation of the model bacterial species. Proc. Natl. Acad. Sci. U.S.A. 2011, 108 (17), 7200−5.
(29) Smokvina, T.; Wels, M.; Polka, J.; Chervaux, C.; Brisse, S.; Boekhorst, J.; van Hylckama Vlieg, J. E.; Siezen, R. J. Lactobacillus paracasei comparative genomics: Towards species pan-genome definition and exploitation of diversity. PLoS One 2013, 8 (7), e68731.
(30) Lukjancenko, O.; Wassenaar, T. M.; Ussery, D. W. Comparison of 61 sequenced Escherichia coli genomes. Microb. Ecol. 2010, 60 (4), 708−20.
(31) Touchon, M.; Hoede, C.; Tenaillon, O.; Barbe, V.; Baeriswyl, S.; Bidet, P.; Bingen, E.; Bonacorsi, S.; Bouchier, C.; Bouvet, O.; Calteau, A.; Chiapello, H.; Clermont, O.; Cruveiller, S.; Danchin, A.; Diard, M.; Dossat, C.; Karoui, M. E.; Frapy, E.; Garry, L.; Ghigo, J. M.; Gilles, A. M.; Johnson, J.; Le Bouguenec, C.; Lescat, M.; Mangenot, S.; Martinez-Jehanne, V.; Matic, I.; Nassif, X.; Oztas, S.; Petit, M. A.; Pichon, C.; Rouy, Z.; Ruf, C. S.; Schneider, D.; Tourret, J.; Vacherie, B.; Vallenet, D.; Medigue, C.; Rocha, E. P.; Denamur, E. Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet. 2009, 5 (1), e1000344.
(32) Rasko, D. A.; Rosovitz, M. J.; Myers, G. S.; Mongodin, E. F.; Fricke, W. F.; Gajer, P.; Crabtree, J.; Sebaihia, M.; Thomson, N. R.; Chaudhuri, R.; Henderson, I. R.; Sperandio, V.; Ravel, J. The pangenome structure of Escherichia coli: Comparative genomic analysis of E. coli commensal and pathogenic isolates. J. Bacteriol. 2008, 190 (20), 6881−93.
(33) Fukiya, S.; Mizoguchi, H.; Tobe, T.; Mori, H. Extensive genomic diversity in pathogenic Escherichia coli and Shigella strains revealed by comparative genomic hybridization microarray. J. Bacteriol. 2004, 186 (12), 3911−21.
(34) Korea, C. G.; Badouraly, R.; Prevost, M. C.; Ghigo, J. M.; Beloin, C. Escherichia coli K-12 possesses multiple cryptic but functional chaperone-usher fimbriae with distinct surface specificities. Environ. Microbiol. 2010, 12 (7), 1957−77.
(35) Gaastra, W.; de Graaf, F. K. Host-specific fimbrial adhesins of noninvasive enterotoxigenic Escherichia coli strains. Microbiol. Rev. 1982, 46 (2), 129−61.
(36) Johnson, T. J.; DebRoy, C.; Belton, S.; Williams, M. L.; Lawrence, M.; Nolan, L. K.; Thorsness, J. L. Pyrosequencing of the Vir plasmid of necrotoxigenic Escherichia coli. Vet. Microbiol. 2010, 144 (1−2), 100−9.
(37) Wang, S.; Xia, Y.; Dai, J.; Shi, Z.; Kou, Y.; Li, H.; Bao, Y.; Lu, C. Novel roles for autotransporter adhesin AatA of avian pathogenic Escherichia coli: Colonization during infection and cell aggregation. FEMS Immunol. Med. Microbiol. 2011, 63 (3), 328−38.
(38) Dai, J.; Wang, S.; Guerlebeck, D.; Laturnus, C.; Guenther, S.; Shi, Z.; Lu, C.; Ewers, C. Suppression subtractive hybridization identifies an autotransporter adhesin gene of E. coli IMT5155 specifically associated with avian pathogenic Escherichia coli (APEC). BMC Microbiol. 2010, 10, 236.
(39) Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20 (3), 273−297.
(40) Byvatov, E.; Schneider, G. Support vector machine applications in bioinformatics. Applied Bioinform. 2003, 2 (2), 67−77.
(41) Sujay Raghavendra, N.; Deka, P. C. Support vector machine applications in the field of hydrology: A review. Appl. Soft Comput. 2014, 19, 372−386.
(42) Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2011, 66 (3), 247−259.
(43) Ishii, S.; Ksoll, W. B.; Hicks, R. E.; Sadowsky, M. J. Presence and growth of naturalized Escherichia coli in temperate soils from Lake Superior watersheds. Appl. Environ. Microbiol. 2006, 72 (1), 612−21.
