Microbial Structures, Functions, and Metabolic Pathways in

Nov 14, 2012 - Only the predicted ORFs longer than 100 nt were translated into protein sequences using the NCBI Genetic Codes 11.(17) The ..... As sho...
6 downloads 10 Views 1MB Size
Article pubs.acs.org/est

Microbial Structures, Functions, and Metabolic Pathways in Wastewater Treatment Bioreactors Revealed Using High-Throughput Sequencing Lin Ye,† Tong Zhang,*,† Taitao Wang,‡ and Zhiwei Fang‡ †

Environmental Biotechnology Laboratory, The University of Hong Kong, Pokfulam Road, Hong Kong Beijing Genomics Institute, Shenzhen, Guangdong, China



S Supporting Information *

ABSTRACT: The objective of this study was to explore microbial community structures, functional profiles, and metabolic pathways in a lab-scale and a full-scale wastewater treatment bioreactors. In order to do this, over 12 gigabases of metagenomic sequence data and 600,000 paired-end sequences of bacterial 16S rRNA gene were generated with the Illumina HiSeq 2000 platform, using DNA extracted from activated sludge in the two bioreactors. Three kinds of sequences (16S rRNA gene amplicons, 16S rRNA gene sequences obtained from metagenomic sequencing, and predicted proteins) were used to conduct taxonomic assignments. Specially, relative abundances of ammonia-oxidizing archaea (AOA) and ammoniaoxidizing bacteria (AOB) were analyzed. Compared with quantitative real-time PCR (qPCR), metagenomic sequencing was demonstrated to be a better approach to quantify AOA and AOB in activated sludge samples. It was found that AOB were more abundant than AOA in both reactors. Furthermore, the analysis of the metabolic profiles indicated that the overall patterns of metabolic pathways in the two reactors were quite similar (73.3% of functions shared). However, for some pathways (such as carbohydrate metabolism and membrane transport), the two reactors differed in the number of pathway-specific genes.



tems15,16 at extremely high resolutions. However, this approach still cannot eliminate PCR bias. In contrast, high-throughput metagenomic sequencing, i.e. direct sequencing of the genomic DNA extracted from the environmental samples, provides a comprehensive approach without PCR bias and can be used to explore both taxonomic and functional diversity of a microbial community simultaneously. It has been used to analyze microbial communities in human gut,17 lake water,18 and food.19 So far, both 454 pyrosequencing and Illumina sequencing have been applied for metagenomic sequencing. Illumina sequencing can achieve much higher throughput thus more cost-effective for samples with high microbial diversities. 454 pyrosequencing was used to analyze activated sludge previously.20 However, due to the low throughput, the information in the activated sludge was not resolved adequately. Recently, a metagenome of activated sludge from a full-scale wastewater treatment plant (WWTP) generated by Illumina sequencing was also reported.21 The results showed that high diversities of microbial species and functional genes exist in activated sludge. In the present study, we conducted two genotyping methodologies (sequencing of bacterial 16S rRNA gene PCR

INTRODUCTION Activated sludge systems have been regarded as the highly efficient and widely used biological wastewater treatment process and are an important approach for treating both municipal and industrial wastewater.1,2 Microorganisms in the activated sludge are the main contributors for degradation and removal of these pollutants.1 Many methods have been developed to investigate and characterize the microbes in activated sludge. The cultivation-based methods were traditionally used to isolate and investigate the microorganisms in activated sludge.3,4 However, due to the high selectivity of the culture medium, most microorganisms cannot be successfully cultivated or isolated using these methods.5 Thus cultivationbased methods may introduce a large bias when being used to explore microbial community structure and function. About two decades ago, molecular methods6−9 were introduced to analyze microbial communities in a culture-independent way. These methods greatly increased our understanding of the microbial communities in activated sludge. However, these molecular methods are limited due to the PCR bias or low throughput. The next generation high-throughput sequencing technologies developed recently introduced several new approaches to analyze the microbial communities. Sequencing a huge number of 16S rRNA gene PCR amplicons has allowed exploration of the diversities and abundances of various microbial populations in activated sludge,10−12 ocean water,13,14 and other sys© 2012 American Chemical Society

Received: Revised: Accepted: Published: 13244

September 2, 2012 November 5, 2012 November 14, 2012 November 14, 2012 dx.doi.org/10.1021/es303454k | Environ. Sci. Technol. 2012, 46, 13244−13252

Environmental Science & Technology

Article

Table 1. Illumina Sequencing Data Analyzed in This Study SL category reads tagsb contigs

a

16S rRNA gene PCR amplicon reads (100 bp) metagenomic reads (100 bp) tags (∼180 bp) percentage of 16S rRNA gene in tags assembled contigs (>500 bp) percentage of reads used for assembly (%)c genes predicted from the contigs

R

batch 1

batch 2

batch 3

150,000 16,350,599 × 2d 13,996,197 0.097% 54,114 54.7 104,496

150,000 16,663,946 × 2 14,363,752 0.167% 67,759 16.8 106,885

-15,000,000 × 2e 11,048,559 0.072% 71,708 28.2 126,676

-15,000,000 × 2 11,057,977 0.066% 71,760 28.2 126,596

a

The actual reads numbers of R and SL were 213,720 and 152,448, respectively. In order to facilitate the comparison between the two samples, the numbers of reads were normalized to the same value (sequencing depth). bTags mean the sequences obtained from overlapping the paired-end metagenomic reads. cThe “percentage of reads used for assembly” is only counted for contigs but not for tags. d“ × 2” means paired-end reads. eFor batch 2 and batch 3, we used the same sequencing depth for comparison.

S1 summarizes the general parameters of Stanley WWTP, which has nitrification process. DNA Library Construction, PCR and Sequencing. For the metagenomic sequencing, libraries with insert size of 180 bp were constructed according to the manufacturer’s instructions (Illumina) for the two samples. For the 16S rRNA gene PCR amplicon sequencing, the extracted DNA was amplified by 16S rRNA gene V6 region primer set 985F (5′-CAA CGC GAA GAA CCT TAC C-3′) and 1046R (5′-CGA CAG CCA TGC ANC ACC T-3′).14 Sixnucleotide barcodes were added to the 5′ end of 958F to allow multiplexing. The PCR was conducted in a 30 μL mixture containing 0.2 μL of Ex Taq (TaKaRa, Dalian, China), 3 μL of 10× Ex Taq Buffer, 3 μL of dNTP mixture, 0.2 μM of each primer, and 20−50 ng of DNA under the following conditions: 94 °C for 2 min; 25 cycles of 94 °C 30 s, 57 °C 30 s, and 72 °C 30 s; and a final extension at 72 °C for 5 min. PCR products were purified using PCRquick-spin PCR Product Purification Kit (iNtRON Biotechnology, Korea). After purification, the PCR products were quantified by Nanodrop spectrophotometer, and the PCR products concentrations of R and SL were 46.2 ng/μL and 40.3 ng/μL, respectively. Both of the metagenomic sequencing and the 16S rRNA gene PCR amplicon sequencing were conducted using Illumina Hiseq 2000 platform by applying the 101bp paired-end (PE) strategy. A base-calling pipeline (Sequencing Control Software, Illumina) was used to process the raw fluorescent images and call sequences. As shown in Table 1, one batch of metagenomic sequencing data were generated with Sample R, and three batches of metagenomic sequencing data were generated with Sample SL. All the metagenomic sequencing data have been deposited into NCBI’s Short Read Achieve (Accession Number: SRA060680). Validation of Repeatability of DNA Extraction, PCR and Sequencing. To investigate the repeatability of DNA extraction, PCR and sequencing used in the present study, three DNA samples (A, B, and C) obtained from the activated sludge of Stanley WWTP were sequenced on the Illumina platform as described above. DNA of samples A and B were extracted as duplicates from the same activated sludge, while C was a PCR duplicate for sample A using the same DNA extract. Quality Filtering of the Raw Reads. For all the data sets, reads containing one or more uncalled bases and reads containing bases with quality score lower than 30 were removed. All the PE reads of 16S rRNA gene PCR amplicons were overlapped to decrease the sequencing errors.14 The operational taxonomic unit (OTU) analysis for the 16S rRNA

amplicons, and metagenomic sequencing) on the Illumina HiSeq2000 platform to analyze the microbial community structures and functional profiles in a lab-scale and a full-scale wastewater treatment reactors. The two genotyping approaches were compared for their applicability to reveal the taxonomic complexity of activated sludge. The functions and metabolic pathways of the two reactors were compared. In addition, metagenomic sequencing and qPCR were used to quantify ammonia-oxidizing archaea (AOA) and ammonia-oxidizing bacteria (AOB) in activated sludge samples. The overall results in our study demonstrated the application of high-throughput sequencing was a powerful tool for comprehensive characterization of taxonomic and functional profiles of the microbial community in activated sludge.



MATERIALS AND METHODS Sample Collection and DNA Extraction. In the present study, DNA was extracted using FastDNA SPIN Kit for Soil (Qbiogene, Carlsbad, CA, USA) from activated sludge samples collected from a lab-scale reactor and a full-scale WWTP at Stanley, Hong Kong, represented by R and SL, respectively. The biomass concentrations in both of the reactors were about 2000 mg/L. For each sample, 2 mL of activated sludge was used to conduct DNA extraction. The DNA concentrations, which were quantified by Nanodrop spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA), of samples R and SL were 167.4 ng/μL and 87.7 ng/μL, respectively. The most obvious differences of the two reactors were the concentrations of ammonium nitrogen and organic matter. The lab-scale reactor,22 with a working volume of 2.6 L, was operated in continuous mode to investigate nitrification process. The influent was made with deionized water (67%) and seawater (33%). NH4Cl was added to the influent to get the ammonia nitrogen concentration of 200−400 mg/L. In addition, 20 mg/L of KH2PO4 was added into the influent to provide sufficient phosphorus for the growth of microorganisms. The reactor in the WWTP applied the combined biofilm and activated sludge systems. The flowchart of Stanley WWTP was shown in Figure S1. The activated sludge sample analyzed in the present study was taken from the aerobic zone. The ammonium concentration in the influent of reactor SL was about 12−22 mg NH4+-N/L. The organic matter was about 196−481 mg/L (measured by chemical oxygen demand). Nitrification rate in the reactors R and SL were about 0.14 kg(NH4+-N)/kg(Biomass)/day and 0.02 kg(NH4+-N)/kg(Biomass)/day. Figure S2 shows the performance of the labscale reactor at the sampling point for the present study. Table 13245

dx.doi.org/10.1021/es303454k | Environ. Sci. Technol. 2012, 46, 13244−13252

Environmental Science & Technology

Article

gene sequences was conducted using uclust23 embedded in Qiime.52 For the metagenomic reads, both overlapping and assembly were conducted. The parameters adopted for overlapping were as follows: at least 20 nt length of the overlap region was required, and at most two mismatches were allowed. The sequences generated by overlapping of the PE metagenomic reads were hereinafter referred to as “tags” in the present study. The reads that cannot be overlapped were not included in the analysis of “tags”. Illumina Paired-End Reads De Novo Assembly. After quality filtering, clean reads of the metagenomic data set were assembled into contigs by using SOAPdenovo assembler.24 A range of values (27−61) for the parameter K (k-mer size) were investigated, and it was found that the optimal values were ‘35’ and ‘33’ for samples R and SL, respectively. An optimal value of K was that which yielded the maximum contig number and the maximum value of N50, which is the length of the smallest contig in the set that contains the fewest contigs whose combined length represents at least 50% of the assembly.25 Only the contigs longer than 500 nt were used for further analysis.17 SOAPaligner in the SOAP package26 was used to align all metagenomic sequencing reads to the contigs. The value of the number of reads successfully aligned to the contigs divided by the total number of reads was referred to as “percentage of reads used for assembly” in the present study. Identification of 16S rRNA Genes. To identify 16S rRNA genes, all assembled tags and contigs were aligned with sequences in the RDP database (Release 10.26)27,28 using BLASTN.27 Tags and contigs with an E-value 50 nt were extracted as 16S rRNA gene sequences and used for further analysis.29 The DNA sequences identified from the “tags” were referred to as “16S rRNA gene tags” in the present study. Gene Prediction. In the present study, MetaGene30 was used to find open reading frames (ORFs) from the contigs of each sample. Only the predicted ORFs longer than 100 nt were translated into protein sequences using the NCBI Genetic Codes 11.17 The obtained protein sequences were referred to as “predicted proteins” in the present study. Taxonomic Assignment. The 16S rRNA gene PCR amplicon reads and the 16S rRNA gene tags were aligned with the Silva database31 using the BLASTN tool.27 Then the results were imported into the program MEGAN32 to conduct the taxonomic assignment with the lowest common ancestor algorithm using the program default parameters (i.e., min score 50, top percent 10, win score 0.0, and min complexity 0.3). We specified that MEGAN used the synonyms file (silva2ncbi.map) downloaded from MEGAN Web site (http://ab.inf.unituebingen.de/software/megan/) to map Silva accession numbers to NCBI taxa. The predicted proteins were aligned with protein sequences in the NCBI-NR database using the BLASTP tool.27 The data generated by BLASTP were also imported into MEGAN for taxonomic assignment as described above. AOA and AOB Abundance Analysis. Following the methodology published previously, 33 two approaches were applied to quantify the abundances of AOA and AOB using 16S rRNA genes. Method 1: BLASTN was used to align all the reads to four complete crenarchaeal AOA 16S rRNA genes (uncultured Crenarchaeote 54d9 [GI: 42557759], Candidatus Nitrosopumilus maritimus [GI:71668096], Uncultured Crenarchaeote 4B7 [GI:14548123], Candidatus Nitrosocaldus yellow-

stonii [GI:166164468]), and six AOB 16S rRNA genes (Nitrosomonas europaea [GI:30248031], Nitrosomonas communis [GI:1592803], Nitrosomonas oligotropha [GI:11545282], Nitrosomonas multiformis [GI:82701135], Nitrosospira sp B6 [GI:1236775], and Nitrosococcus oceani [GI:77163561]). Then, reads were assigned as AOA-like or AOB-like at 97% similarity and 90 bp alignment length cutoff referring to the above reference. Method 2: The reads that matched (allowing one or two mismatches) the probes specific for AOA (Cren1.1b_53733) and AOB (CTO189,34 CTO 654,34 Nsm156,35 Nso1225,35 and Nsp43636) 16S rRNA gene were counted as AOA-like and AOB-like sequences, respectively. This in silico search was conducted using BLASTN. Similarly, potential amoA gene sequences were also quantified by the methods as described above. The references are amoA genes of AOA (Nitrosopumilus maritimus SCM1(GI:71668108), Candidatus Nitrosocaldus yellowstonii (GI:166164469)), and AOB (Nitrosomonas europaea (GI:3252912), Nitrosococcus oceanus (GI:2104719), Nitrosomonas communis (GI:11545288), Nitrosomonas oligotropha (GI:11545302), Nitrosolobus multiformis (GI:1085094), and Nitrosospira sp. B6 (GI:18996142)). The cut-offs applied for amoA genes were 80% identity and 70bp alignment length, because the amoA gene is shorter and more diverse than 16S rRNA genes of AOA and AOB.37 The probes used for searching are Arch-amoAF/Arch-amoAR for AOA38 and amoA1F/amoA-2R for AOB.39 Gene Functional Classification. The contigs were annotated using MG-RAST server40 with the SEED Subsystem database. The maximum E-value cutoff and the minimum alignment length cutoff were set as 10−5 and 50 bp,41 respectively. Most of the genes were successfully classified into the hierarchical metabolic categories. Metabolic Pathway Analysis. BLASTP was used to align the protein sequences translated from ORFs to the NCBI-NR database with E-value 300 bp) was 16%, which was comparable to Sample SL in the present study. The percentage of sequences (54.7%) assembled into contigs in the Sample R was higher than that of SL and even higher than a recent study on human gut microbes using the same approach,17 which assembled 42.7% of Illumina reads. That should be attributed to the low microbial diversity in Sample R. According to Table 1, there were 0.066%−0.167% potential 16S rRNA gene sequences obtained from the assembled gene tags, which were generally in agreement with a previous study analyzing microbes in a biogas reactor using metagenomic approach.29 Taxonomic Complexity of the Microbial Diversity. As described in the Materials and Methods section, three kinds of sequences (16S rRNA amplicons, 16S rRNA gene tags, and predicted proteins) obtained by different approaches were used to conduct taxonomic assignments for bacteria. However, it was found that even at the phylum level the results of these methods were not consistent with each other (Figure 2). For Sample SL, the results of 16S rRNA gene PCR amplicons had a good linear relationship with the results of 16S rRNA gene tags. However, in other cases (the subplots in Figure 2), no good linear relationships were observed. Based on current knowledge, due to the methodological limitation, it is difficult to accurately reveal the ratio of each kind of bacteria in the activated sludge systems for their extremely high diversities. Albertsen et al.21 compared results from metagenomic sequencing and FISH analyses, which indicated that some phyla (such as Actinobacteria, Chlorof lexi, TM7, and Bacteroidetes) varied greatly when comparing the results of these two methods. The method based on 16S rRNA gene amplification suffers from PCR bias,43 which may lead to underestimation or overestimation of species abundance or diversity. Besides 16S rRNA genes, in some studies21,44 the predicted proteins were used to analyze bacterial abundance, the results were biased because 1) the assembly process generates nonredundant contigs; and, 2) the bacterial protein databases are far from completed, 3) species with a lot of strain variations will not be represented well in the resulting contigs. Theoretically, the results of 16S rRNA gene tags were more reliable owning to the lack of PCR bias and a relatively complete 16S rRNA sequence database. Furthermore, in the future it can be expected that the reliability will improve with the increase of sequencing read length. In addition, it is easy to understand that the severity of the PCR bias in different systems may not be the same.45 If PCR bias was not so severe, the result of 16S RNA gene amplicons would be close to the result of the 16S RNA gene tags, which was probably the reason why the 16S rRNA gene PCR amplicons correlated well with the results of 16S rRNA gene tags in Sample SL in Figure 2. We have also used 454 pyrosequencing for the two bioreactors in our previous studies.10,46 For the steadily

sequencing depth, 82,519 sequences (16S V6 region) were randomly selected from each of the three samples and combined together (247,557 sequences in total) for OTU analysis. As shown in Figure 1-I, most of the sequences (over

Figure 1. Repeatability of DNA extraction, PCR and sequencing for activated sludge samples. I: Number of sequences in shared and unique OTUs. A, B, and C represent three samples sent for Illumina sequencing, DNA of samples A and B were DNA extraction duplicates from the same activated sludge, while sample C was the PCR duplicate of the sample A. For each sample, 82,519 sequences were extracted to conduct OTU analysis at 3% distance cutoff. The number in the red circle represents the number of sequences that are present in the core OTUs shared by the three samples. The numbers in the green circles represent the sequences that are present in the OTUs shared by two samples. The numbers in the blue circles represent sequences that are present in the OTUs observed in only one sample. II: The percentages of the top-ten abundant OTUs in the three samples. The numbers on the bars indicate the standard deviations of percentages of the corresponding OTUs in the three samples.

97%) were classified into the OTUs shared by three samples. Only very few sequences (lower than 0.5%) were classified into the OTUs shared by two samples or only in one sample, which were probably caused by sequencing errors or some other random factors introduced during the DNA extraction and PCR. Figure 1-II showed that the percentages of major OTUs obtained from the three parallel samples were quite consistent, with low standard deviations (0.09−0.85). Both Figure 1-I and 1-II indicated that the repeatability of DNA extraction, PCR and Illumina sequencing applied in this study was pretty acceptable. Sequence Assembly Overview. In this study, 600,000 16S rRNA gene reads and more than 120 million metagenomic reads were obtained for the taxonomic and functional analyses. 13247

dx.doi.org/10.1021/es303454k | Environ. Sci. Technol. 2012, 46, 13244−13252

Environmental Science & Technology

Article

Figure 2. Abundances of bacterial phyla in the two samples determined by three different data sets. The subplots show correlation between different data sets. The “Genes”, “16S Tags”, and “16S Amplicons” represent genes predicted from ORFs, 16S rRNA gene tags identified from metagenomic sequencing, and 16S rRNA gene PCR amplicon sequencing reads, respectively.

operated full-scale reactor (Sample SL), it was confirmed that the most dominant phylum was Proteobacteria, which accounted for about 40−50% of the total bacteria. This finding is generally consistent with the activated sludge of WWTP in other studies.20,21 However, the percentages of other phyla were different between the results of metagenomic sequencing (the present study) and pyrosequencing.10,46 The similar differences were also observed between FISH and metagenomic sequencing.21 For the lab-scale reactor (Sample R), large amounts of Nitrospirae were observed (about 30% of the sequences, according to results based on gene tags; Figure 2), which was quite different from that in the previous study (about 7%)10 when the reactor was operated in the partial nitrification state. This indicates that in autotrophic nitrification reactors the abundance of Nitrospirae may be comparable with (or even outnumber) the abundance of Proteobacteria. To investigate and compare the microbial complexity in the two reactors of this study, rarefaction analysis was conducted

on the data sets of 16S rRNA gene tags and 16S rRNA gene PCR amplicon sequences using MEGAN by repeatedly sampling subsets of sequences. Then, at each sampling depth, the number of nodes (genus level) obtained based on the subsets of sequences was counted. The comparative rarefaction analysis results of 16S rRNA gene tags and 16S rRNA amplicon sequences (Figure S3) showed that the bacterial diversity in Sample SL was much higher than in Sample R. The number of nodes at genus level obtained by using 16S rRNA gene tags was about twice more than that using 16S rRNA PCR amplicon sequences (Figure S3). This was probably due to two reasons: 1) the primer pair for 16S rRNA gene V6 region is not conserved enough to amplify products from all bacteria in the activated sludge; and 2) a 16S rRNA gene tag is about two times longer than a 16S rRNA gene PCR amplicon sequence in this study and could be assigned to a genus with more confidence. 13248

dx.doi.org/10.1021/es303454k | Environ. Sci. Technol. 2012, 46, 13244−13252

Environmental Science & Technology

Article

failures were also encountered in the cases of other full-scale wastewater treatment bioreactors in our previous study.37 This may be attributed to the extremely higher microbial diversities and PCR inhibitors in these full-scale reactors. The results obtained from metagenomic data (Figure 3-I) on both amoA gene and 16S rRNA gene suggested that AOB were much more abundant than AOA in both of the two samples. The amoA gene copy number of AOA and AOB were 147 and 2.5,48 respectively, and most species of AOA and AOB have only one copy of 16S rRNA gene.49 Therefore, even these copy numbers were taken into account, AOB still dominated over AOA in the two reactors. These results were consistent with our previous report22 and others’ results50 of activated sludge systems. Although the quantification of AOA amoA gene in Sample SL failed, the qPCR results in Figure 3-II confirmed that AOB amoA gene dominated over AOA amoA gene in Sample R. However, in other environments, such as soil,33 seawater,51 and sediments,52 AOA are usually more abundant than AOB. The fact that AOB exceeded AOA in the activated sludge systems was probably due to the high ammonia concentration and high ammonia loading in the wastewater treatment bioreactors. As reported by Martens-Habbena et al.,53 AOA’s specific affinity for ammonia was higher than that of AOB. Usually AOA are abundantly present in environments with low ammonia concentrations.47,53 Based on our above results and others reports, it could be inferred that AOB may be more competitive than AOA in activated sludge system and function as the main contributors for nitrification. Figure 3 also indicates that the absolute abundance of the AOB amoA sequences and 16S rRNA sequences were not consistent. This could be caused by a number of reasons, including the different copy numbers of amoA gene and 16S rRNA in various species of AOB, the different lengths of AOB amoA gene and 16S rRNA gene, etc. However, both amoA gene and 16S rRNA gene demonstrated that AOB were more abundant than AOA in the two reactors. Gene Functional Classification and Metabolic Pathway. All the contigs of the two samples were annotated by MG-RAST40 against the SEED Subsystem database. Using the maximum E-value cutoff of 10−5 and the minimum alignment length cutoff of 50 bp, about 87.6% of the ORFs on contigs of Sample R and 78.7% of the ORFs on contigs of Sample SL were successfully assigned to hierarchical categories. This was higher than the percentage of sequences successfully assigned (40%) in a previous study conducted 4 years ago using 454 pyrosequencing.20 The possible reasons for the higher assignment percentages could be the more updated databases, different annotation methodologies used, and different kinds of sequences used for annotation. For example, reads were used for annotation by the previous study,20 while the genes predicted from assembled contigs were used for annotation in the present study. 73.3% of the functions annotated from Sample R and SL based on SEED subsystems were shared by two samples, indicating the similar metabolic pathways present in the two bioreactors. Figure 4 illustrates the abundance differences for the most common categories of functional genes for the two samples. The genes involved in carbohydrate metabolism were lower in Sample R than in Sample SL. This may be due to the low concentration of organic matter (TOC < 0.5 mg/L) in the lab-scale nitrification reactor. Besides, the genes assigned into Fatty Acids, Lipids, and Isoprenoids and Metabolism of Aromatic Compounds categories had significantly higher abundances in Sample SL than Sample R, indicating that

Table S2 illustrates the taxonomic assignment results of orders in the two samples, which was based on the taxonomic assignment results of 16S rRNA gene tags. The dominant orders in Samples R and SL were Nitrospirae and Actinomycetales, accounting for about 40% and 20% of the sequences, respectively. Among the ten most abundant orders of each sample, five orders were shared by the two samples, which were Flavobacteriales, Rhizobiales, Enterobacteriales, Actinomycetales, and Planctomycetales. This indicated that these bacteria are adaptable to both of the reactors. Abundances of AOA and AOB. In this study, the abundances of potential AOA and AOB were quantified based on the amoA gene and 16S rRNA gene using both metagenomic (alignment and probe search) and qPCR approaches (Figure 3). Due to the nonspecific amplification and/or the formation of primer dimer, the quantification of AOA amoA gene in Sample SL was not successful. The same

Figure 3. Abundances of AOA and AOB in Sample R and Sample SL. I: AOA and AOB sequences identified from metagenomic sequencing data. A: The number of reads that aligned to AOA/AOB 16S rRNA reference sequences with similarity cutoff 97% and alignment length cutoff 90 bps. B: Number of reads that matched AOA/AOB 16S rRNA specific probes, allowing up to one mismatch. C: The number of reads that aligned to AOA/AOB amoA gene reference sequences with identity cutoff 80% and alignment length cutoff 70 bps. D: Number of reads that matched AOA/AOB amoA gene specific probes, allowing up to two mismatches. II: Abundances of AOA and AOB amoA gene copy numbers quantified by qPCR. The quantification of AOA amoA gene in sample SL failed due to the serious smear and nonspecific PCR amplification. 13249

dx.doi.org/10.1021/es303454k | Environ. Sci. Technol. 2012, 46, 13244−13252

Environmental Science & Technology

Article

In addition to the metabolic categories, the gene projections of the two samples on KEGG pathways were shown in Figure S4, which illustrates an integrated picture of the potential metabolic pathways that were present in the two activated sludge systems. As can be seen in Figure S4, there was a complex assemblage of metabolic pathways, which was consistent with a high biodiversity and complexity of activated sludge systems. Overall, most of the pathways were common to both reactors (e.g., fatty acid metabolism, urea cycle, citrate cycle, etc.), which was consistent with the high percentage (73.3%) of functions shared by the two reactors. The main differences between the reactors were the abundances of genes involved in some specific pathways. It is also obvious that there were quite a few pathways in full-scale reactor not detected in the lab-scale reactor (such as steroid biosynthesis, dioxin degradation, xylene degradation, etc.), consistent with the higher microbial diversity in Sample SL than in Sample R. The nitrogen metabolism pathways, which contain the important nitrification and denitrification processes in wastewater treatment, were a focus of the study. Figure 5 shows the metabolic pathways of nitrification and denitrification. The enzyme commission (EC) numbers in Figure 5 are defined in Table S3. Figure 5 and Table S3 showed that most enzymes existed in both of the reactors. One exception is enzyme 1.7.7.2 (ferredoxin-nitrate reductase), which is an enzyme catalyzing nitrite oxidation to nitrate. The main difference between the two samples is the different abundance of each gene. As shown in Table S3, the numbers of genes coding enzyme 1.13.12.-(ammonia monooxygenase) and 1.7.3.4 (hydroxylamine oxidase) in Sample R were about 3−4 times higher than those in Sample SL, indicating the high nitrification capability in the nitrification reactor. This was consistent with the higher influent ammonium concentration and higher nitrification rate in R than SL. The genes coding enzymes (such as 1.7.99.4, 1.7.2.1, and 1.7.99.7) involved in denitrification were more abundant in Sample SL than in Sample R. Although almost no denitrification was observed in the lab-scale nitrification reactor (Figure S1), there were still quite a lot of genes related to denitrification in Sample R. It seems that these denitrification genes were not expressed in the lab-scale nitrification reactor because of the low concentration of organic matter and high concentration of dissolved oxygen. Although it is still at the very beginning, high-throughput sequencing has provided a powerful tool to explore complicated ecosystems like the activated sludge. In this study, we compared different kinds of sequences (16S rRNA gene amplicons, 16S rRNA gene tags, and predicted proteins) for taxonomic assignments and found that 16S rRNA gene tags had less bias for investigating the microbial community. Also, metagenomic sequencing was demonstrated to be a better approach to quantify AOA and AOB in activated sludge samples than qPCR which often leads to nonspecific PCR amplification and even unsuccessful PCR. Based on both metagenomic sequencing and qPCR results, it was concluded that AOB were more abundant than AOA in the lab-scale nitrification reactor and the full-scale municipal wastewater treatment reactor. Furthermore, the analysis of the metabolic profiles indicated that the overall patterns of metabolic pathways were quite similar (73.3% of functions shared) in the two different reactors. It is important to note that the metabolic pathways analyzed in the present study were only potential ones because the current analyses were based on DNA instead of RNA. Further studies based on RNA are

Figure 4. Functional categories of Sample R and Sample SL according to the SEED Subsystems database. The data of this figure were generated based on the ORFs predicted from the contigs of the two samples. The percentage refers to the percent of ORFs assigned into each category in the total ORFs.

chemical composition of wastewater in the municipal WWTP were much more complicated than that in the lab-scale reactor. It was also found that Sample SL harbored slightly more genes related to the whole nitrogen metabolism, while the total number of nitrification genes in Sample R was more than that in Sample SL (Figure 5, from ammonia to nitrate). Another

Figure 5. The nitrification and denitrification KEGG pathway. The red and blue numbers represent the abundances of the corresponding genes predicted from ORFs in Sample R and Sample SL, respectively. The blue numbers before and after “/” indicate the minimum and max values in the repeated sequencing batches. The numbers in the box represent EC numbers for each of the enzymes, whose definitions are listed in Table S3.

difference between the two samples was genes associated with membrane transport. The fact that the nitrification reactor contained about 1% salinity could be a reason for the high abundance of membrane transport genes in Sample R. Furthermore, a finding of few photosynthesis-related genes and low abundance of genes involved in phosphorus metabolism (compared with nitrogen metabolism) are similar to results reported by Sanapareddy et al.20 13250

dx.doi.org/10.1021/es303454k | Environ. Sci. Technol. 2012, 46, 13244−13252

Environmental Science & Technology

Article

(12) Ye, L.; Zhang, T. Pathogenic bacteria in sewage treatment plants as revealed by 454 pyrosequencing. Environ. Sci. Technol. 2011, 45 (17), 7173−7179. (13) Qian, P.; Wang, Y.; Lee, O.; Lau, S.; Yang, J.; Lafi, F.; AlSuwailem, A.; Wong, T. Vertical stratification of microbial communities in the Red Sea revealed by 16S rDNA pyrosequencing. ISME J. 2010, 5, 507−518. (14) Zhou, H.; Li, D.; Tam, N.; Jiang, X.; Zhang, H.; Sheng, H.; Qin, J.; Liu, X.; Zou, F. BIPES, a cost-effective high-throughput method for assessing microbial diversity. ISME J. 2010, 5, 741−749. (15) Unno, T.; Di, D. Y. W.; Jeonghwan, J.; Suh, Y.; Sadowsky, M. J.; Hur, H. G. Integrated online system for a pyrosequencing-based microbial source tracking method that targets bacteroidetes 16S rDNA. Environ. Sci. Technol. 2011, 46 (1), 93−98. (16) Claesson, M.; O’Sullivan, O.; Wang, Q.; Nikkila, J.; Marchesi, J.; Smidt, H.; De Vos, W.; Ross, R.; O’Toole, P. Comparative analysis of pyrosequencing and a phylogenetic microarray for exploring microbial community structures in the human distal intestine. PloS one 2009, 4 (8), e6669. (17) Qin, J.; Li, R.; Raes, J.; Arumugam, M.; Burgdorf, K. S.; Manichanh, C.; Nielsen, T.; Pons, N.; Levenez, F.; Yamada, T. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 2010, 464 (7285), 59−65. (18) Oh, S.; Caro-Quintero, A.; Tsementzi, D.; DeLeon-Rodriguez, N.; Luo, C.; Poretsky, R.; Konstantinidis, K. T. Metagenomic insights into the evolution, function, and complexity of the planktonic microbial community of Lake Lanier, a temperate freshwater ecosystem. Appl. Environ. Microbiol. 2011, 77 (17), 6000−6011. (19) Jung, J. Y.; Lee, S. H.; Kim, J. M.; Park, M. S.; Bae, J. W.; Hahn, Y.; Madsen, E. L.; Jeon, C. O. Metagenomic analysis of kimchi, a traditional Korean fermented food. Appl. Environ. Microbiol. 2011, 77 (7), 2264−2274. (20) Sanapareddy, N.; Hamp, T. J.; Gonzalez, L. C.; Hilger, H. A.; Fodor, A. A.; Clinton, S. M. Molecular diversity of a North Carolina wastewater treatment plant as revealed by pyrosequencing. Appl. Environ. Microbiol. 2009, 75 (6), 1688−1696. (21) Albertsen, M.; Hansen, L. B. S.; Saunders, A. M.; Nielsen, P. H.; Nielsen, K. L. A metagenome of a full-scale microbial community carrying out enhanced biological phosphorus removal. ISME J. 2012, 6, 1094−1106. (22) Ye, L.; Zhang, T. Ammonia-oxidizing bacteria dominates over ammonia-oxidizing archaea in a saline nitrification reactor under low DO and high nitrogen loading. Biotechnol. Bioeng. 2011, 108 (11), 2544−2552. (23) Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 2010, 26 (19), 2460−2461. (24) Li, R.; Zhu, H.; Ruan, J.; Qian, W.; Fang, X.; Shi, Z.; Li, Y.; Li, S.; Shan, G.; Kristiansen, K. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 2010, 20 (2), 265−272. (25) Miller, J. R.; Koren, S.; Sutton, G. Assembly algorithms for nextgeneration sequencing data. Genomics 2010, 95 (6), 315−327. (26) Li, R.; Li, Y.; Kristiansen, K.; Wang, J. SOAP: short oligonucleotide alignment program. Bioinformatics 2008, 24 (5), 713−714. (27) Altschul, S.; Gish, W.; Miller, W.; Myers, E.; Lipman, D. Basic local alignment search tool. J. Mol. Biol. 1990, 215 (3), 403−410. (28) Cole, J.; Chai, B.; Marsh, T. L.; Farris, R. J.; Wang, Q.; Kulam, S.; Chandra, S.; McGarrell, D.; Schmidt, T. M.; Garrity, G. M. The Ribosomal Database Project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy. Nucleic Acids Res. 2003, 31 (1), 442−443. (29) Jaenicke, S.; Ander, C.; Bekel, T.; Bisdorf, R.; Dröge, M.; Gartemann, K. H.; Jünemann, S.; Kaiser, O.; Krause, L.; Tille, F. Comparative and joint analysis of two metagenomic datasets from a biogas fermenter obtained by 454-pyrosequencing. PloS One 2011, 6 (1), e14519.

deserved to explore the active functions in different activated sludge systems.



ASSOCIATED CONTENT

S Supporting Information *

Reactor flowchart and performance, rarefaction curves, bacterial taxonomic assignment at the order level, enzymes related to nitrification and denitrification, and KEGG pathway. This material is available free of charge via the Internet at http:// pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*Phone: +852-28578551. Fax: +852-25595337. E-mail: [email protected]. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS The authors thank the Hong Kong General Research Fund for the financial support of this study (7197/08E), and Lin Ye thanks The University of Hong Kong for the postgraduate studentship. Some of the computation work in this research was conducted using the HKU Computer Centre research computing facilities that are supported in part by the Hong Kong UGC Special Equipment Grant (SEG HKU09).



REFERENCES

(1) Wagner, M.; Loy, A.; Nogueira, R.; Purkhold, U.; Lee, N.; Daims, H. Microbial community composition and function in wastewater treatment plants. Antonie Van Leeuwenhoek 2002, 81 (1), 665−680. (2) Tong, R.; Beck, M.; Latten, A. Fuzzy control of the activated sludge wastewater treatment process. Automatica 1980, 16 (6), 695− 701. (3) Amann, R.; Lemmer, H.; Wagner, M. Monitoring the community structure of wastewater treatment plants: a comparison of old and new techniques. FEMS Microbiol. Ecol. 1998, 25 (3), 205−215. (4) Benedict, R.; Carlson, D. Aerobic heterotrophic bacteria in activated sludge. Water Res. 1971, 5 (11), 1023−1030. (5) Hugenholtz, P.; Goebel, B. M.; Pace, N. R. Impact of cultureindependent studies on the emerging phylogenetic view of bacterial diversity. J. Bacteriol. 1998, 180 (18), 4765−4774. (6) Muyzer, G.; De Waal, E.; Uitterlinden, A. Profiling of complex microbial populations by denaturing gradient gel electrophoresis analysis of polymerase chain reaction-amplified genes coding for 16S rRNA. Appl. Environ. Microbiol. 1993, 59 (3), 695−700. (7) Liu, W. T.; Marsh, T. L.; Cheng, H.; Forney, L. J. Characterization of microbial diversity by determining terminal restriction fragment length polymorphisms of genes encoding 16S rRNA. Appl. Environ. Microbiol. 1997, 63 (11), 4516−4522. (8) Schuppler, M.; Mertens, F.; Schön, G.; Göbel, U. B. Molecular characterization of nocardioform actinomycetes in activated sludge by 16S rRNA analysis. Microbiology 1995, 141 (2), 513−521. (9) Erhart, R.; Bradford, D.; Seviour, R.; Amann, R.; Blackall, L. Development and use of fluorescent in situ hybridization probes for the detection and identification of Microthrix parvicella in activated sludge. Syst. Appl. Microbiol. 1997, 20 (2), 310−318. (10) Ye, L.; Shao, M. F.; Zhang, T.; Tong, A. H. Y.; Lok, S. Analysis of the bacterial community in a laboratory-scale nitrification reactor and a wastewater treatment plant by 454-pyrosequencing. Water Res. 2011, 45 (15), 4390−4398. (11) Kim, T.; Kim, H.; Kwon, S.; Park, H. Nitrifying bacterial community structure of a full-scale integrated fixed-film activated sludge process as investigated by pyrosequencing. J. Microbiol. Biotechnol. 2011, 21 (3), 293−298. 13251

dx.doi.org/10.1021/es303454k | Environ. Sci. Technol. 2012, 46, 13244−13252

Environmental Science & Technology

Article

(30) Noguchi, H.; Park, J.; Takagi, T. MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res. 2006, 34 (19), 5623−5630. (31) Pruesse, E.; Quast, C.; Knittel, K.; Fuchs, B. M.; Ludwig, W.; Peplies, J.; Glöckner, F. O. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 2007, 35 (21), 7188−7196. (32) Huson, D. H.; Mitra, S.; Ruscheweyh, H. J.; Weber, N.; Schuster, S. C. Integrative analysis of environmental sequences using MEGAN4. Genome Res. 2011, 21, 1552−1560. (33) Leininger, S.; Urich, T.; Schloter, M.; Schwark, L.; Qi, J.; Nicol, G.; Prosser, J.; Schuster, S.; Schleper, C. Archaea predominate among ammonia-oxidizing prokaryotes in soils. Nature 2006, 442 (7104), 806−809. (34) Kowalchuk, G. A.; Stephen, J. R.; De Boer, W.; Prosser, J. I.; Embley, T. M.; Woldendorp, J. W. Analysis of ammonia-oxidizing bacteria of the beta subdivision of the class Proteobacteria in coastal sand dunes by denaturing gradient gel electrophoresis and sequencing of PCR-amplified 16S ribosomal DNA fragments. Appl. Environ. Microbiol. 1997, 63 (4), 1489−1497. (35) Mobarry, B. K.; Wagner, M.; Urbain, V.; Rittmann, B. E.; Stahl, D. A. Phylogenetic probes for analyzing abundance and spatial organization of nitrifying bacteria. Appl. Environ. Microbiol. 1996, 62 (6), 2156−2162. (36) Stephen, J. R.; Kowalchuk, G. A.; Bruns, M. A. V.; McCaig, A. E.; Phillips, C. J.; Embley, T. M.; Prosser, J. I. Analysis of β-subgroup proteobacterial ammonia oxidizer populations in soil by denaturing gradient gel electrophoresis analysis and hierarchical phylogenetic probing. Appl. Environ. Microbiol. 1998, 64 (8), 2958−2965. (37) Zhang, T.; Ye, L.; Tong, A. H. Y.; Shao, M. F.; Lok, S. Ammonia-oxidizing archaea and ammonia-oxidizing bacteria in six fullscale wastewater treatment bioreactors. Appl. Microbiol. Biotechnol. 2011, 91, 1215−1225. (38) Francis, C. A.; Roberts, K. J.; Beman, J. M.; Santoro, A. E.; Oakley, B. B. Ubiquity and diversity of ammonia-oxidizing archaea in water columns and sediments of the ocean. Proc. Natl. Acad. Sci. 2005, 102 (41), 14683−14688. (39) Rotthauwe, J. H.; Witzel, K. P.; Liesack, W. The ammonia monooxygenase structural gene amoA as a functional marker: molecular fine-scale analysis of natural ammonia-oxidizing populations. Appl. Environ. Microbiol. 1997, 63 (12), 4704−4712. (40) Glass, E.; Wilkening, J.; Wilke, A.; Antonopoulos, D.; Meyer, F. Using the metagenomics RAST server (MG-RAST) for analyzing shotgun metagenomes. Cold Spring Harbor Protocols 2010, 2010 (1), pdb. prot5368. (41) Lamendella, R.; Santo Domingo, J.; Ghosh, S.; Martinson, J.; Oerther, D. Comparative fecal metagenomics unveils unique functional capacity of the swine gut. BMC Microbiol. 2011, 11 (1), 103− 120. (42) Suparna, M.; Paul, R.; Daniel, R.; Tim, U.; Jack, G.; Daniel, H. Functional analysis of metagenomes and metatranscriptomes using SEED and KEGG. BMC Bioinf. 2011, 12, 21−28. (43) Urich, T.; Lanzén, A.; Qi, J.; Huson, D.; Schleper, C.; Schuster, S. Simultaneous assessment of soil microbial community structure and function through analysis of the meta-transcriptome. PloS One 2008, 3 (6), e2527. (44) Qin, Y.; Li, D.; Yang, H. Investigation of total bacterial and ammonia oxidizing bacterial community composition in a full scale aerated submerged biofilm reactor for drinking water pretreatment in China. FEMS Microbiol. Lett. 2007, 268 (1), 126−134. (45) Lueders, T.; Friedrich, M. W. Evaluation of PCR amplification bias by terminal restriction fragment length polymorphism analysis of small-subunit rRNA and mcrA genes by using defined template mixtures of methanogenic pure cultures and soil DNA extracts. Appl. Environ. Microbiol. 2003, 69 (1), 320−326. (46) Zhang, T.; Shao, M. F.; Ye, L. 454 Pyrosequencing reveals bacterial diversity of activated sludge from 14 sewage treatment plants. ISME J. 2012, 6, 1137−1147.

(47) Konneke, M.; Bernhard, A.; De la Torre, J.; Walker, C.; Waterbury, J.; Stahl, D. Isolation of an autotrophic ammonia-oxidizing marine archaeon. Nature 2005, 437 (7058), 543−546. (48) Norton, J.; Alzerreca, J.; Suwa, Y.; Klotz, M. Diversity of ammonia monooxygenase operon in autotrophic ammonia-oxidizing bacteria. Arch. Microbiol. 2002, 177 (2), 139−149. (49) Lee, Z.; Bussema, C., III; Schmidt, T. rrnDB: documenting the number of rRNA and tRNA genes in bacteria and archaea. Nucleic Acids Res. 2009, 37, D489−D493. (50) Wells, G.; Park, H.; Yeung, C.; Eggleston, B.; Francis, C.; Criddle, C. Ammonia-oxidizing communities in a highly aerated fullscale activated sludge bioreactor: betaproteobacterial dynamics and low relative abundance of Crenarchaea. Environ. Microbiol. 2009, 11 (9), 2310−2328. (51) Wuchter, C.; Abbas, B.; Coolen, M. J. L.; Herfort, L.; van Bleijswijk, J.; Timmers, P.; Strous, M.; Teira, E.; Herndl, G. J.; Middelburg, J. J. Archaeal nitrification in the ocean. Proc. Natl. Acad. Sci. 2006, 103 (33), 12317−12322. (52) Park, S.; Park, B.; Rhee, S. Comparative analysis of archaeal 16S rRNA and amoA genes to estimate the abundance and diversity of ammonia-oxidizing archaea in marine sediments. Extremophiles 2008, 12 (4), 605−615. (53) Martens-Habbena, W.; Berube, P.; Urakawa, H.; de La Torre, J.; Stahl, D. Ammonia oxidation kinetics determine niche separation of nitrifying archaea and bacteria. Nature 2009, 461, 976−979.

13252

dx.doi.org/10.1021/es303454k | Environ. Sci. Technol. 2012, 46, 13244−13252