Characterization of Natural published by the and Affected isEnvironments
American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Subscriber accessPublished providedby byAmerican Karolinska Chemical Society. Library Institutet, University Copyright © American Chemical Society. However, no copyright
High-Resolution Temporal and Spatial
Patterns of Virome in Wastewater is published by the American Chemical 1155 Sixteenth TreatmentSociety. Systems Street N.W., Washington,
Yulin Wang, Xiao-Tao DC 20036Jiang, Lei Published American Subscriber access provided Karolinska Liu, Bing Li, and TongbybyZhang Chemical Society. Institutet, University Library Copyright © American Chemical Society. However, no copyright
Environ. Sci. Technol., Just Accepted Manuscript • Publication Date (Web): 27 Aug 2018 is published by the American Chemical
DownloadedSociety. from http:// 1155 Sixteenth pubs.acs.org on Street August 2018 N.W.,27, Washington, DC 20036
Subscriber accessPublished providedby byAmerican Karolinska Chemical Society. Library Institutet, University Copyright © American Chemical Society. However, no copyright
Just Accepted
is published by the “Just Accepted” manuscripts have been pee American Chemical online prior to technical editing, formatting for Society. 1155 Sixteenth Street N.W., Washington, Society provides “Just Accepted” as a service DC 20036 of scientific material as soon as possible a Published by American Subscriber access provided by Karolinska Chemical Society. Library Institutet, University Copyright © American Chemical Society. However, no copyright
full in PDF format accompanied by an HTM peer reviewed, but should not be considere is published by the “Just Accep Digital Object Identifier (DOI®). American Chemical the “Just Accepted” Web may not inclu Society. 1155site Sixteenth Street N.W., Washington, a manuscript is technically edited and form DC 20036 siteSubscriber and published as anbyAmerican ASAP article. No accessPublished providedby Karolinska Chemical Society. Library Institutet, University Copyright © American Chemical Society. However, no copyright
to the manuscript text and/or graphics wh ethical guidelines that apply to the journal is published by the consequences arising from the use of infor American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Subscriber accessPublished providedby byAmerican Karolinska Chemical Society. Library Institutet, University Copyright © American Chemical Society. However, no copyright
Page 1 ofEnvironmental 32 Science & Technology Phylogeny based
Metagenomic sequencing
vHMM based
Metagenomic viral contigs (mVCs) IMG/VR
Temporal dynamic ST-79 ST-5704
ST-297 ST-5762
ST-372 ST-6919
ST-503 ST-11052
ST-1927 ST-12397
ST-2135 ST-15196
Spatial distribution
ST-5602
Normalized coverage
80
60
Mar4
Jul 5 Oct 19 Nov 9 Feb 20 May 21 Aug 18 Oct 18 Dec 15 Feb 13 Apr 17 Jul 9 Aug 24 Oct 19 Dec 15 Feb 25 Apr 20 Jun 21 Aug 17 Oct 12 Dec 14 Feb 16 Apr 20 Jun 21 Aug 16 Oct 21 Dec 17 Feb 15 May 30 Jul 23 Sep 24 Nov 27 Jan 22 Mar 21 May 30 Jul 22 Oct 3 Nov 27 Jan 27 Apr 1 Jun 3 Jul 28 Oct 3 Dec 3 Feb 3 Mar 25 May 29 Jul 30 Oct 7 Dec 3
Jan 4
Jan 12
0
Feb21
20
Jan 11
ACS Paragon Plus Environment
40
2007
2008
2009
2010
2011
Date
2012
2013
2014
2015
500 shared VCs 200 shared VCs 100 shared VCs 10 shared VCs
Environmental Science & Technology
Page 2 of 32
ase
uct
ed er
27
ge
gp2
in
lys
h
180 kb
e llag e lik n−
ST-724
200 kb
yla cos
se
ose
tein
pro
c glu P− 75 TD se gp1 e d rata ativ hyd put 6−de 4,
pu
160 kb
Co
140 kb
A
y sgl ran nt yca ogl r tid nso e Pepive sekinas at e put stidin hi
120 kb
ra ste hoe osp oph se tall iga Me Al 36 DN gp2tative
100 kb
e e has ativ ynt put e s ge ylat pha mid thy C1 ase tid
6
20 kb
e− tos hep nno a se −m ase lea −D ase cle nuc ero fer onu ndo d lycltrans e g n e e − −L osy 65 ativ 78 ing DP glyc gp1 put gp1 om
d oti elta cle it idbonu tein e n u r pro eas ub II ucl ike II slass cD s] n l e I o − e eas se t c se ex lu rot eranden ase R acil 5'−3' ina dp rm ym e bin e [B ive l a e p t e o e s t h ge 4 A p2−d com Pa puta pro p27 pha gp1 DN B1 Re AT g
pe p
80 kb
56
gp
20 kb
pha
se
era
pim
e 6−
se
tor fac ma g i e s s e tein rasecleas uclea 82 e nu on pro C:7e o d ke olym i d n l EG n Ae gas − p i e K l 5 0 I ANH ns 4 L A gp2 RNH T gp5 T4− DN
120 kb
140 kb
160 kb
180 kb
200 kb
220 kb
240 kb
260 kb
Figure 1. Genome maps of the 2 longest mVCs in the ST and SWH WWTPs. All the predicted genes with functions that could not be assigned are colored in gray, and confirmed viral genes are labeled and colored in blue. If a gene had hits with viral and bacterial proteins simultaneously, the gene was assigned based on e-value and coverage. Genes with high coverage and e-value with bacterial proteins are labeled and colored in burly wood. Length scales are shown at the bottom left.
ACS Paragon Plus Environment
SWH-4941
ase rat hyd ase e r ras −de me sfe 4,6 epi ran se ylt no se nt mo an era nde rba −mnsf pe Ca P−yDl tra D de GDcos NA gly tive a put se lea nuc ein ndo rot G e tion p YI ple Y− GI com ad He 1.95 e A s gp2 la DN ein t dro se Hy mina x pro r e Te ert n lv ein tei rta rot pro Po d p ion hea let jor omp Maad c He
cyt
or
50 kb
100 kb
tein pro er th m hea mono 9 il s 1 Ta eath in gp h il s prote Ta ube il t Ta e tas lA Xh duc sin in ase ein e re oly rote ucleprot phat heemcA p exoncase phos R tive heli e−di se ase a e/ d a le puitmas leosi e kinonuc pr nuc otid end o e rib cl NH ynu H ion polative rvat es se I put A starolas uclea III e e DN yd on as r h d e n das mi Ntnage e olym II e I roteinine a Ph A p n ras DN me like p−alase oly − −L ra A p rase oyl ime DN oesteuramnt ep sphtylm nde pho ace epe N− D−d se ola NA 19 ydr H2 gp3 Ah Fts RN in e l−t rase rot p acy me ino oly III ion am A p leasedivis se tha DNonuc cell e syn endativ ate put L sph oE pho Gr oyl se bam era car nsf tra lyl idy
60 kb
t fac ing ycl rec me oso Rib e S eas GR rel −P in PE e cha r ter tid po pep trans ase S lig MF NA 5' R 2'−
20 kb
se lea s nuc rase exaonsfe e k r −li ylt cB os ReGlyc
e tas ein mu rot dis ep e d lik i − x T 1 t ro Mu supe 04g8p73HAM p
Page 3 of 32
Environmental Science & Technology
b 100
ST mVCs SWH mVCs High−novelty
75
High−novelty
50
Medium−novelty Low−novetly
Medium−novelty Low−novelty
25
0 0
1e+05
2e+05
Length of metagenomic viral contigs (bp)
Genes covered by IMG viral protein (%)
Genes covered by NCBI viral protein (%)
a
100
ST mVCs
Medium−novelty
SWH mVCs
75
Low−novelty High−novelty Medium−novelty Low−novelty
50
High−novelty
25
0 0
1e+05
2e+05
Length of metagenomic viral contigs (bp)
Figure 2. Distribution of ST (blue) and SWH (orange) mVCs by length and coverage with the NCBI RefSeq (a) and IMG viral protein databases (b). The mVCs with >70%, 35-70% and 1.0) in ST AS samples over 9 years. This visualization represents the average normalized coverage (areas) of different mVCs (color). The sectional areas at specific time points represent the average normalized coverage of mVCs. The normalized coverage of all samples is summarized in this figure, but only 53 samples are labeled on the x-axis to make the figure easier to read.
ACS Paragon Plus Environment
Page 7 of 32
Environmental Science & Technology
500 shared Viral Clusters 200 shared Viral Clusters 100 shared Viral Clusters 10 shared Viral Clusters
Figure 6. Distribution of shared viral clusters among wastewater treatment systems. The connecting lines indicate shared viral clusters that are found between two connected samples, and the colors of the lines represent the number of shared viral clusters.
ACS Paragon Plus Environment
Environmental Science & Technology
1
High-Resolution Temporal and Spatial Patterns of Virome in
2
Wastewater Treatment Systems
3
Yulin Wang1, Xiaotao Jiang1, Lei Liu1, Bing Li1, 2, Tong Zhang1, *
4
1
Environmental Biotechnology Laboratory, Department of Civil Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong
5 6
2
Division of Energy and Environment, Graduate School at Shenzhen, Tsinghua
7
University, China
8
E-mail
[email protected]; Tel. 852-28578551; Fax 852-25595337.
9
1
ACS Paragon Plus Environment
Page 8 of 32
Page 9 of 32
Environmental Science & Technology
10
ABSTRACT:
11
Wastewater treatment plants (WWTPs) are considered reservoirs of viruses, but the
12
diversity and dynamic changes of viruses are not well understood. In this study, we
13
recovered 8,478 metagenomic viral contigs (mVCs; >5 kb) from two WWTPs (Shatin,
14
2,806; Shek Wu Hui, 5,672) in Hong Kong. Approximately 60% of the mVCs were
15
poorly covered (96% of DNA viruses in AS were reported to be
38
bacterial phages,8 these viruses may have an important impact on bacterial diversity,11-
39
13
40
Meanwhile, viral lysis influences the nutrient cycle in wastewater treatment systems.14
41
Taken together, these observations indicate that there is an urgent need for
42
comprehensive studies on the diversity and distribution of viruses in WWTPs.
43
It is impossible to develop universal culturing approaches and primers for naturally
44
occurring viruses.17, 18 As a result, special emphasis has been placed on the use of
45
cultivation-independent approaches for the identification of viruses. Shotgun
46
metagenomic approaches are now widely used to retrieve genomic information from
47
dsDNA viruses in environmental samples. Paez-Espino et al. (2016) reanalyzed more
48
than 5 Tb of 3,042 metagenomes in the Integrated Microbial Genomes with
49
Microbiome Samples (IMG/M) system and recovered more than 125,000 partial DNA
50
viral sequences, which provided great insight into viral diversity and habitat distribution.
51
Roux et al. (2016) analyzed the viral diversity and distribution in the oceans on a global
52
scale using metagenomics and suggested that viruses may play a key role in nutrient
53
cycling and trophic networks. These studies suggested that metagenomics could
54
provide an unprecedented method to mine viral populations in the sequenced
55
metagenomes of environmental samples with or without target viral enrichment.
56
Microbial viruses normally have diameter on the nanometer scale (i.e., 20 to 200-400
57
nm),15 and the virions of giant viruses (e.g., Tupanvirus) can reach up to 1.2 μm in
58
size.16 Luef et al. (2015) used 0.1-μm membrane filters to enrich ultrasmall microbes
59
in groundwater,19 indicating that filtration is a valid approach for the study of ultrasmall
60
microbes and may be used for the enrichment of viruses in environmental samples.
61
To gain insights into WWTPs, we used metagenomics to retrieve genomic information
62
regarding DNA viruses in AS samples. Because viral variation is a highly dynamic
63
process,20 short-term studies cannot reflect the shifts in the viral populations in complex
and may further affect the performance of biological treatment processes in WWTPs.
3
ACS Paragon Plus Environment
Page 10 of 32
Page 11 of 32
Environmental Science & Technology
64
environmental samples. In the present study, we assessed the temporal dynamics of the
65
newly identified viruses in multiyear samples and evaluated the spatial pattern of the
66
identified viruses in wastewater treatment systems. The objectives of this study were to
67
1) identify the viral populations in the AS systems of two WWTPs in Hong Kong,
68
namely, the Shatin (ST) and Shek Wu Hui (SWH) WWTPs; 2) track the temporal
69
distribution of identified viruses in AS samples collected monthly from the ST WWTP
70
over a period of ~9 years; and 3) profile the spatial distribution patterns of the virome
71
in different wastewater treatment systems. Taken together, the results of this study will
72
expand the current understanding of viral dynamics and distribution patterns in
73
wastewater treatment systems.
74
MATERIALS AND METHODS
75
Sampling and Sample Processing
76
Supernatants (3 L) of well-settled AS (30 min of settling) were individually collected
77
from the ST and SWH WWTPs (April 8, 2016). The collected samples were transferred
78
to the laboratory immediately, and subsequently filtered through 0.45-μm
79
polycarbonate membranes (Merck Millipore, USA). The filtrate was sequentially
80
filtered with 0.2-μm and 0.1-μm Isopore polycarbonate membrane filters (Merck
81
Millipore, USA) within 4 hours. The biomass that passed through the 0.2-μm
82
membranes and was retained on the 0.1-μm membranes was then used for DNA
83
extraction. Briefly, the 0.1-μm membranes for each sample were equally divided into
84
two groups (ten membranes for each group) and used as technical replicates, termed
85
ST1, ST2, SWH1, and SWH2. Considering the sample pretreatment approach, the
86
present study enriched microbes with sizes between 0.1 and 0.2 μm. For the viral
87
community, viruses with sizes >0.2 or 0.2 μm were not included in the present study. The genomic DNA was extracted
89
from these four samples using the MoBio PowerSoil DNA Isolation Kit (MO BIO
90
Laboratories, USA) following the manufacturer's instructions. DNA concentrations 4
ACS Paragon Plus Environment
Environmental Science & Technology
91
were measured using a NanoDrop spectrophotometer (ND-1000, Wilmington, US), and
92
the DNA extracts were stored at -20°C for metagenomic sequencing to recover the viral
93
sequences.
94
In total, 98 AS samples, collected monthly (from June 2, 2007 to December 3, 2015;
95
Supplementary Table S1), were obtained from the same aeration tank of the ST WWTP
96
to investigate the temporal changes in the viral populations in the AS. All the AS
97
samples were collected by mixing 1:1 with absolute ethanol (VAS:Vabsolute) and stored
98
at -20°C.21 A 1-mL aliquot of an AS sample that was not filtered through membrane
99
was used for total DNA extraction.
100
Metagenomic Sequencing and De Novo Assembly
101
Metagenomic DNA samples from the biomass (ST1, ST2, SWH1, and SWH2) that
102
passed through the 0.2-μm membrane but was retained by the 0.1-μm membrane and
103
from the AS collected monthly were sequenced on Illumina HiSeq 4000 platform (150-
104
bp paired-end reads, 350-bp insert size). The sequencing amounts for ST1, ST2, SWH1,
105
and SWH2 were 14.8, 16.7, 15.1 and 16.9 Gb, respectively. For the AS samples
106
collected monthly, the sequencing amounts ranged from 3.5 to 7.7 Gb with an average
107
data size of 5.6 Gb (Supplementary Table S1). All the raw sequencing data were
108
trimmed using the CLC Genomic Workbench (version 6.04, QIAGEN Bioinformatics,
109
Denmark) with a quality score limit of 0.01 and the number of ambiguities set to 2. The
110
clean shotgun sequencing data for the ST samples (ST1 and ST2) and SWH samples
111
(SWH1 and SWH2) were separately de novo coassembled using CLC’s de novo
112
assembly algorithm with a k-mer of 35 and minimum scaffold length of 1 kb. The
113
assembled contigs were used for downstream viral sequence identification.
114
Identification of Metagenomic Viral Contigs
115
Two different methods (i.e., a modified DNA vHMM virus detection pipeline 22 and
116
phylogeny-based pipeline23) were used to identify the metagenomic viral contigs 5
ACS Paragon Plus Environment
Page 12 of 32
Page 13 of 32
Environmental Science & Technology
117
(mVCs) in the samples from the ST and SWH WWTPs. For the DNA vHMM virus
118
detection pipeline, genes in assembled contigs were annotated using hmmsearch24
119
against the viral protein families22 and Pfam HMMs (31.0, -cut_nc mode)25. To check
120
for genes that could be annotated with KEGG Orthology (KO) clusters, all the predicted
121
genes were searched against the KEGG gene database (Release 80.1) using
122
DIAMOND.26 The contigs that had at least 5 hits with viral protein families and with
123
lengths >5 kb were extracted and further used to identify mVCs. The criteria proposed
124
by Paez-Espino et al. were modified and used in the present study (Supplementary
125
Methods, S1). For phylogeny-based identification of mVCs, all the assembled contigs
126
were searched against the NCBI nr database using DIAMOND with an e-value of 1e-
127
5.26 MEGAN627 was then used to assign assembled contigs to taxa, 23 and contigs
128
assigned to viruses were identified as mVCs. The mVCs identified by these two
129
pipelines were combined and only contigs longer than 5 kb were used for further study.
130
The accuracy of the methods used in this study was evaluated with a mock community
131
(Supplementary Methods, S1, Table S2).
132
Putative viral sequences in the metagenomes of ST and SWH were also predicted by
133
using CyVerse28 implementation of VirSorter.29 All the mVCs identified in the present
134
study were searched against the Silva ribosomal SSU database (release 119)30 using
135
BLASTn31 with an e-value of 1e-5 to check whether these newly identified mVCs carry
136
16S rRNA genes. In addition, the mVCs were annotated using hmmsearch24 against the
137
HMM database of 107 conserved, single-copy, bacterial genes.32 A survey of 16S rRNA
138
gene sequences and conserved bacterial proteins was used to further check microbial
139
contamination of the identified mVCs.
140
Similarity Comparison and Clustering
141
The sequence similarity between the identified mVCs in this study and two viral protein
142
databases (i.e., the NCBI RefSeq and IMG/VR virus databases) were computed using
143
BLASTp31 with an e-value of 1e-5. Only the best hits from the BLAST results, with >80% 6
ACS Paragon Plus Environment
Environmental Science & Technology
144
alignment of the shorter of the two sequences, were selected as valid homologous
145
proteins.22 The proportion of the identified proteins encoded by the mVCs represented
146
the novelty level, and the mVCs were grouped into three categories, high novelty (75% of the genes were covered by the
149
database). In addition, pairwise average amino acid identity (AAI) comparison of the
150
mVCs was performed using CompareM (https://github.com/dparks1134/CompareM).
151
The R package and genoPlotR33 were used to generate genome comparison maps to
152
visualize the comparisons of mVCs (Supplementary Methods, S2).
153
Gene-content based network analysis was used for mVC clustering. Complete viral
154
genomes in the NCBI viral database that shared >30% AAIs and had >20% similarity
155
coverage (shared homologous gene ratio between genomes from the NCBI viral
156
database and newly identified mVCs) with newly identified mVCs were selected as
157
reference viral genomes. The merged reference viral genomes and the newly identified
158
mVCs were then used for clustering.17, 34 The protein clusters were defined using MCL
159
(the Markov cluster algorithm) based on the results of the all-vs-all BLASTp gene
160
comparison.34 The VCs were then defined using vConTact with the predefined protein
161
clusters.34 The mVCs that were not assigned to any clusters were classified as singletons
162
(Supplementary Methods, S2).
163
Relative Abundances of mVCs and Viral Clusters in Filtered ST and SWH
164
Samples
165
The relative abundances of identified ST and SWH mVCs were estimated using the
166
mapping results. The metagenomic reads of samples from ST and SWH were mapped
167
to the identified ST and SWH mVCs, respectively, with a similarity fraction of 90%
168
over a 70% read length.17 Mapping was performed with random nonspecific matches if
169
a read aligned to more than one position with equally good scores. The mapping results
170
were then used to calculate the average coverage of mVCs (the number of nucleotides 7
ACS Paragon Plus Environment
Page 14 of 32
Page 15 of 32
Environmental Science & Technology
171
mapped to the mVCs divided by the mVC length). The average coverage was further
172
normalized to the size of the dataset and recorded as the average normalized coverage,17
173
and this value used to determine the relative abundance against the cumulative coverage
174
of all the identified mVCs. The relative abundances of viral clusters were calculated
175
based on the relative abundances of the viral members of the clusters.
176
Temporal and Spatial Distribution Patterns of Viruses in WWTPs
177
The metagenomic datasets for the 98 (~9 years) ST AS samples collected monthly were
178
used to profile the temporal dynamics of the identified ST mVCs. The shotgun reads of
179
these AS samples were mapped to the identified ST mVCs using the strategy mentioned
180
in the previous section. An mVC was considered to have been detected if more than 75%
181
of the length was covered by the mapped shotgun reads.17 The average normalized
182
coverage values of the identified mVCs were used to profile relative abundances among
183
different AS samples and viewed as stacked graphs.35
184
To study the viral biogeographic traits, sequences of viruses recovered from wastewater
185
treatment systems, full-scale WWTPs, and wastewater treatment bioreactors were
186
retrieved from the IMG/VR database.36 A total of 5,868 mVCs (Supplementary Table
187
S3) were collected from 101 metagenomes that originated from nine different projects
188
(locations). The public mVCs36 were combined with the mVCs obtained in this study
189
and with 283 reference viral genomes from the NCBI database37 (selected based on the
190
criteria mentioned in the section in viral clustering) for viral clustering. The number of
191
shared clusters between the different wastewater treatment systems was visualized
192
using the R packages “maps” and “geosphere”.
193
Nucleotide Sequence Accession Numbers
194
The nucleotide sequence data used in the present study have been deposited in the NCBI
195
database under the project ID PRJNA432264.
196
RESULTS 8
ACS Paragon Plus Environment
Environmental Science & Technology
197
Viral Populations
198
The integrated phylogeny-based and DNA vHMM virus detection pipeline achieved a
199
precision of 100% and a recall rate of 91.7% based on the mock community in this
200
study. Using this integrated pipeline, we identified 2,806 and 5,672 mVCs (>5 kb) from
201
the ST and SWH assembled contigs (Supplementary Table S4), respectively, which
202
increased the total number of mVCs from engineered systems by 21.2% (IMG/VR,
203
January, 2018). Approximately 45 and 42% of the assembled contigs with length >5 kb
204
in ST and SWH were identified as viral sequences. The maximum lengths of the mVCs
205
in the ST and SWH samples were 200,733 bp and 269,614 bp, respectively (Figure 1).
206
The number of mVCs identified by the vHMM pipeline were approximately 10- and 7-
207
fold greater than that identified by the phylogeny-based pipeline (Supplementary Table
208
S4) for the ST and SWH samples, respectively.
209
None of the newly identified mVCs carried the 16S rRNA gene sequence. 4 and 3
210
mVCs from the ST and SWH samples, respectively, carried genes encoding conserved
211
bacterial proteins, which were confirmed to be viral sequences by VirSorter29 and
212
PHAST38. In terms of the novelty of these recovered mVCs (Figure 2), only
213
approximately 1 and 5% of the mVCs exhibited high coverage (>75% of the genes of
214
the mVCs) with the viral RefSeq and vHMM viral protein databases, respectively.
215
Approximately 60% of the newly recovered mVCs were high-novelty sequences that
216
exhibited poor coverage (2% (Supplementary Figure S5). Additionally, the calculated Shannon-
257
Wiener diversity indices based on all the identified viral clusters from ST (5.09) and
258
SWH (5.76) indicated that the viral diversity in the SWH samples was higher than that
259
in the ST samples (Mann-Whitney P-value = 2.4 × 10−20). Furthermore, the Pielou's
260
evenness indices for ST (0.80) and SWH (0.86) indicated that the viral evenness of the
261
SWH samples was also higher than that of the ST samples.
262
Temporal Dynamics of the Identified Viral Community
263
We obtained 545 Gb of metagenomic sequence data from the 98 AS samples collected
264
monthly from the ST WWTP. The reads from these AS samples were mapped to the
265
identified ST mVCs to profile the temporal dynamics of the identified viral community
266
in the ST WWTP (Table S6). The recruited reads from these newly identified ST mVCs
267
accounted for only 0.02 to 0.35% among the different sequenced ST AS samples. Based
268
on the mapping results, 481 mVCs were presented at least once with a normalized
269
coverage of more than 1.0 in the monthly collected AS samples (Supplementary Table
270
S7). Although the complex AS samples were sequenced at a relatively low sequencing
271
depth, 17 mVCs were identified from more than 50% of the AS samples
272
(Supplementary Table S8). This result suggests that some of the identified viruses were
273
persistent viral populations in the ST WWTP.
274
The sums of the average normalized coverage values (Supplementary Figures S6 and
275
S7) were used to compare the changes in the identified viral populations among the
276
different AS samples. Marked fluctuations were observed from June 2007 to December
277
2015, and the relative abundances of the identified viral populations exhibited an annual 11
ACS Paragon Plus Environment
Page 18 of 32
Page 19 of 32
Environmental Science & Technology
278
increase and decrease. In general, these identified viruses with sizes between 0.1 and
279
0.2 μm thrived during the winter season in Hong Kong (i.e., at the end or beginning of
280
each year), and collapse was detected following the transient increase (Supplementary
281
Figure S6, Table S9). The average interval between the peaks of these identified mVCs
282
in the AS systems was 341 days (Supplementary Figure S7), which was generally
283
consistent with the length of a calendar year. The dynamics of the 13 core mVCs that
284
had average normalized coverage values >1.0 in all 98 AS samples demonstrated a
285
highly traceable cyclical development process in the AS system (Figure 5). Most of the
286
abundant peaks for these core viral populations were observed in January or February,
287
which was consistent with the variation patterns of viruses with an average coverage of
288
greater than 0.5 and viral clusters (Supplementary Figure S8, Figure S9).
289
Although the identified mVC pattern had minor variations in the first 3 years (2007-
290
2009), the relative abundance pattern of the identified mVCs changed drastically in the
291
subsequent years. Only 138 mVCs were detected in the first 3 years, whereas a total of
292
543 mVCs were detected at the beginning of 2010, which was almost four times the
293
previous count. In addition to the relatively low abundance of the identified mVCs/viral
294
clusters in the first 3 years (Figure 5 and Supplementary Figures 7, 9 and 10), following
295
a rapid increase and decrease in February 2009, the relative abundances of the detected
296
viruses remained constant until the beginning of 2010. Among the 2,806 identified
297
mVCs in the ST WWTP, 33 mVCs were once found to be the most dominant viruses in
298
the 98 AS samples (Table S10), indicating that the identified mVCs changed over time.
299
Among the dominant viruses identified, ST-6919 and ST-79 were two widely
300
distributed viruses, dominating 13 and 10 AS samples, respectively. As shown in Figure
301
5, ST-6919 was first detected in 2010 and was the most dominant virus among 13 of
302
the subsequent AS samples. ST-79, in contrast, was detectable in the early samples and
303
gradually became a less dominant viral population after 2010.
304
Biogeographic Pattern of Viruses in Wastewater Treatment Systems 12
ACS Paragon Plus Environment
Environmental Science & Technology
305
With the addition of public mVCs that were retrieved from different wastewater
306
treatment systems, a total of 51,678 protein clusters and 1,791 viral clusters were
307
defined from the 14,629 mVCs and complete reference viral genomes. The number of
308
identified viral clusters varied considerably in these studied wastewater treatment
309
systems, ranging from 8 to 1,029. As a result, only one viral cluster could be found in
310
over 70% of the samples. However, this widespread viral cluster (present in >70% of
311
the samples) was not an abundant viral population in the SWH samples (0.045% of the
312
total viral population) and could not be identified in the ST samples.
313
Although none of the newly identified viral clusters were shared by all 10 studied
314
wastewater treatment systems, shared viral clusters were observed between similar
315
sections/niches of any two wastewater treatment systems (Figure 6, Supplementary
316
Table S11 and S12). Notably, mVCs derived from the ST and SWH AS samples shared
317
more than 200 viral clusters with those retrieved from the effluent of a WWTP (in
318
Wisconsin, USA) and an enhanced biological phosphorus removal reactor (in
319
Queensland, Australia), accounting for more than 40% of the identified viral cluster in
320
these two public datasets. In contrast, only 1.4 to 25.7% of the viral clusters in the
321
anaerobic digester samples were shared with the AS or effluent samples. Only 15 and
322
18 viral clusters of the ST and SWH WWTPs, respectively, were shared with viral
323
clusters (70) from a sludge sample from an anaerobic digester (Illinois, USA).
324
DISCUSSION
325
A total of 8,478 novel mVCs were recovered in this study, expanding the current viral
326
database (IMG/VR) with new viral sequences from WWTPs. The proportion (~60%)
327
of low-coverage mVCs in the studied WWTPs was higher than that of the mVCs (~24%)
328
mainly identified from soil, plant-associated, and engineered system.22 In addition, the
329
low sequence similarity between the newly identified mVCs from this study and the
330
reference viral genomes was consistent with a previous study on viruses in WWTPs,8
331
indicating that the viral communities in WWTPs are still poorly covered by current 13
ACS Paragon Plus Environment
Page 20 of 32
Page 21 of 32
Environmental Science & Technology
332
databases. This observation may be explained by the result that the sequence similarity-
333
based (phylogeny-based) pipeline recovered less mVCs than the vHMM pipeline,
334
indicating that the vHMM pipeline could capture highly divergent viral sequences.22
335
According to our previous study, the identified bacterial OTUs and diversity in the
336
SWH AS sample was higher than that in the ST samples,39 which was consistent with
337
the high alpha diversity and evenness of the identified viruses and viral clusters in the
338
SWH AS system. The gene comparisons of the ST and SWH mVCs demonstrated that
339
geographically isolated systems may harbor virome with high AAIs, while the
340
differences in the genetic arrangements or inserted gene fragments identified from those
341
viral sequences with high similarities provided evidence that viruses in WWTPs may
342
exhibit high divergence and intense horizontal gene transfer. The identified large
343
connected component of the order Caudovirales confirmed homologous gene sharing
344
among these newly identified mVCs,34 which may further indicate the occurrence of
345
horizontal gene transfer among viruses of the order Caudovirales in WWTPs. However,
346
how these DNA viruses influence microbial diversity and evolution remains to be
347
studied.
348
Although many time-series studies of the virome in the human gut,40, 41 marine water,42-
349
44
350
might be the longest temporal dynamic study of the virome (0.1-0.2 μm) in AS samples.
351
This study provided unparalleled insight into the dynamics of viruses in an engineered
352
system. The regularly increasing and decreasing trends were consistent with the
353
previous conclusion that viruses undergo cyclical development.47 Studies have shown
354
that viruses are more abundant under conditions that favor bacterial or archaeal growth
355
20, 48, 49
356
with periodic changes in microbial hosts in the AS system. For the studied ST WWTP,
357
notably, the operational problems of bulking and foaming occurred in the winter. Our
358
previous study that investigated the changes of bulking and foaming bacteria in 58 AS
soils,45 bioreactors,45 and WWTPs46 have been reported in recent decades, our study
, indicating that this cyclical pattern of the identified viruses may be associated
14
ACS Paragon Plus Environment
Environmental Science & Technology
359
samples (same AS samples as those the present study) over five years (2007-2012)
360
revealed seasonal variations among the bulking and foaming bacteria.50 Generally, the
361
overall abundances of bulking and foaming bacteria increased in the early winter of
362
each year, and was followed by a bloom of identified viral populations. Therefore, the
363
cyclical patterns of the identified viruses in the present study might correlate with the
364
seasonal dynamics of bulking and foaming bacteria.
365
In addition to the cyclical development of the identified viral community, the viral
366
community structure changed significantly over time, as most of the identified mVCs
367
or viral clusters were transiently detected among the temporal AS samples. Considering
368
the same sampling point and protocol for the temporal AS samples, the continuously
369
alternating viral populations of the identified viruses might indicate changes in the
370
microbial communities in the AS systems. Although the community structures of the
371
virome varied significantly among different cycles, the relative abundances (average
372
normalized coverage) of the identified mVCs at the high-abundance points showed
373
similar values, except during the first 3 years, indicating that the ecological pattern of
374
viruses in AS tends to be consistent. A change of hydraulic retention time (HRT) in the
375
ST WWTP was recorded in 2011, which could not explain the significant shifts in the
376
viral community between 2009 and 2010. Given the dramatic changes in viral
377
abundance, the dominant populations in the first 3 years might have become rare
378
populations that were not recovered from the datasets during virome mining, which
379
might explain why the relative abundances of the viruses in the first 3 years were much
380
lower than those found in other samples.
381
Saunders et al. studied spatially and temporally distributed AS samples and discovered
382
the existence of a core community among AS ecosystems.51 This finding may be
383
consistent with our finding that a similar sample types from different wastewater
384
treatment systems tends to harbor high numbers of shared viral clusters, suggesting the
385
existence of globally generalizable viral community in different parts of wastewater 15
ACS Paragon Plus Environment
Page 22 of 32
Page 23 of 32
Environmental Science & Technology
386
treatment systems. However, none of the identified viral clusters were shared by all the
387
studied wastewater treatment systems, indicating that differences in samples types, e.g.,
388
aerobic vs. anaerobic, affected both viral and microbial communities. As the first report
389
of spatial distribution of viruses in wastewater treatment systems, our study was
390
significantly limited by the current public databases. The number of identified mVCs
391
in different samples varied from hundreds to thousands, which might be another reason
392
why globally shared viral clusters were not found in wastewater treatment systems.
393
Considering the limitation of the databases and the incomplete viral populations
394
recovered in the present study, the features associated with viral biogeography in
395
wastewater treatment systems need further investigation to decipher the comprehensive
396
spatial distribution of the virome.
397
In summary, this study provided new insights into the temporal and spatial patterns of
398
viruses in wastewater treatment systems. The identified annual dynamics and
399
transitions of dominant viral populations may support the idea that viruses in AS
400
systems have significant effects on the structure of the microbial community. The core
401
bacterial community among geographically isolated wastewater treatment systems may
402
be related to widespread viral clusters.
403
However, it should be noted that the discussed mVCs were predicted using a
404
bioinformatic approach. Despite the high recall rate of viral sequences in the synthetic
405
metagenome, the accuracy rate for complex environmental metagenome remains
406
unknown. The sample volumes used to assess the temporal variations were ~1 ml,
407
which might not represent the whole viral community in the AS system. In addition, the
408
method used in the present study recovered only viruses with sizes between 0.1 and 0.2
409
μm, and the shifts in viral community were studied based on the relative abundances of
410
identified viral sequences. Therefore, the present study provided a partial scope of the
411
total viruses in the studied WWTPs, and the viral dynamics and distributions in
412
wastewater treatment systems should be further explored in future studies. 16
ACS Paragon Plus Environment
Environmental Science & Technology
413
ASSOCIATED CONTENT
414
Supporting Information. Methodologies for identification of metagenomic viral
415
contigs (mVCs) and similarity comparison and clustering. Figures showing size
416
distribution of identified viral cluster; genome comparison between two mVCs (ST-100
417
and SWH-60036); protein-sharing network of all identified mVCs and reference viral
418
sequences; relative abundance of mVCs and top 30 abundant viral clusters; temporal
419
dynamics of all identified mVCs, core viral populations (average coverage > 0.5) and
420
all viral clusters. Tables (xlsx) showing brief information of monthly collected samples;
421
summary of mock community; summary of the collected viral sequences from WWTPs
422
and wastewater treatment systems; summary and relative abundance of identified
423
mVCs from ST and SWH WWTPs; widespread mVCs and dynamics changes of the
424
identified mVCs in ST samples collected monthly; summary of the viral clusters
425
identified in the studied wastewater treatment systems and ratio of the shared viral
426
clusters among these wastewater treatment systems.
427
AUTHOR INFORMATION
428
Corresponding Author
429
*Tel: 852-28578551; Fax 852-25595337; E-mail:
[email protected] 430
ORCID
431
Tong Zhang: 0000-0003-1148-4322
432
Notes
433
The authors declare no conflict of interest.
434
ACKNOWLEDGMENTS
435
This work was supported by the Hong Kong General Research Fund (172057/15E). Mr.
436
Yulin Wang, Dr. Xiaotao Jiang and Mr. Lei Liu wish to thank the University of Hong 17
ACS Paragon Plus Environment
Page 24 of 32
Page 25 of 32
Environmental Science & Technology
437
Kong for postgraduate scholarships. Technical support from Ms. Vicky Fung is greatly
438
appreciated. We also thank the reviewers for their constructive comments.
439
18
ACS Paragon Plus Environment
Environmental Science & Technology
440
REFERENCES
441 442
1. Breitbart, M.; Rohwer, F. Here a virus, there a virus, everywhere the same virus? Trends Microbiol. 2005, 13, (6), 278-284.
443 444
2. Rohwer, F.; Thurber, R. V. Viruses manipulate the marine environment. Nature 2009, 459, (7244), 207.
445 446 447
3. Joo, J.; Gunny, M.; Cases, M.; Hudson, P.; Albert, R.; Harvill, E. Bacteriophagemediated competition in Bordetella bacteria. Proc. R. Soc. Lond., B, Biol. Sci. 2006, 273, (1595), 1843-1848.
448 449 450
4. Koskella, B.; Brockhurst, M. A. Bacteria-phage coevolution as a driver of ecological and evolutionary processes in microbial communities. FEMS Microbiol. Rev. 2014, 38, (5), 916-31.
451 452 453 454
5. Nielsen, P. H.; Saunders, A. M.; Hansen, A. A.; Larsen, P.; Nielsen, J. L. Microbial communities involved in enhanced biological phosphorus removal from wastewater— a model system in environmental biotechnology. Curr. Opin. Biotechnol. 2012, 23, (3), 452-459.
455 456
6. Ju, F.; Zhang, T. Bacterial assembly and temporal dynamics in activated sludge of a full-scale municipal wastewater treatment plant. ISME J. 2015, 9, (3), 683-695.
457 458
7. Daims, H.; Taylor, M. W.; Wagner, M. Wastewater treatment: a model system for microbial ecology. Trends Biotechnol. 2006, 24, (11), 483-489.
459 460 461
8. Tamaki, H.; Zhang, R.; Angly, F. E.; Nakamura, S.; Hong, P. Y.; Yasunaga, T.; Kamagata, Y.; Liu, W. T. Metagenomic analysis of DNA viruses in a wastewater treatment plant in tropical climate. Environ. Microbiol. 2012, 14, (2), 441-52.
462 463
9. Wommack, K. E.; Colwell, R. R. Virioplankton: viruses in aquatic ecosystems. Microbiol. Mol. Biol. Rev. 2000, 64, (1), 69-114.
464 465 466
10. Otawa, K.; Lee, S. H.; Yamazoe, A.; Onuki, M.; Satoh, H.; Mino, T. Abundance, diversity, and dynamics of viruses on microorganisms in activated sludge processes. Microb. Ecol. 2007, 53, (1), 143-152.
467 468
11. Wu, Q.; Liu, W.-T. Determination of virus abundance, diversity and distribution in a municipal wastewater treatment plant. Water Res. 2009, 43, (4), 1101-1109.
469 470 471 472
12. Sedmak, G.; Bina, D.; MacDonald, J.; Couillard, L. Nine-year study of the occurrence of culturable viruses in source water for two drinking water treatment plants and the influent and effluent of a wastewater treatment plant in Milwaukee, Wisconsin (August 1994 through July 2003). Appl. Environ. Microbiol. 2005, 71, (2), 1042-1050.
473 474
13. Carducci, A.; Morici, P.; Pizzi, F.; Battistini, R.; Rovini, E.; Verani, M. Study of the viral removal efficiency in a urban wastewater treatment plant. Water Sci. Technol. 2008, 19
ACS Paragon Plus Environment
Page 26 of 32
Page 27 of 32
Environmental Science & Technology
475
58, (4), 893-897.
476 477
14. Suttle, C. A. Environmental microbiology: Viral diversity on the global stage. Nat. Microbiol. 2016, 1, (11), 16205.
478 479
15. Seeman, N. C.; Belcher, A. M. Emulating biology: building nanostructures from the bottom up. Proc. Natl. Acad. Sci. U.S. A. 2002, 99 Suppl 2, 6451-5.
480 481 482 483 484
16. Abrahao, J.; Silva, L.; Silva, L. S.; Khalil, J. Y. B.; Rodrigues, R.; Arantes, T.; Assis, F.; Boratto, P.; Andrade, M.; Kroon, E. G.; Ribeiro, B.; Bergier, I.; Seligmann, H.; Ghigo, E.; Colson, P.; Levasseur, A.; Kroemer, G.; Raoult, D.; La Scola, B. Tailed giant Tupanvirus possesses the most complete translational apparatus of the known virosphere. Nat. Commun. 2018, 9, (1), 749.
485 486 487 488 489 490
17. Roux, S.; Brum, J. R.; Dutilh, B. E.; Sunagawa, S.; Duhaime, M. B.; Loy, A.; Poulos, B. T.; Solonenko, N.; Lara, E.; Poulain, J.; Pesant, S.; Kandels-Lewis, S.; Dimier, C.; Picheral, M.; Searson, S.; Cruaud, C.; Alberti, A.; Duarte, C. M.; Gasol, J. M.; Vaque, D.; Tara Oceans, C.; Bork, P.; Acinas, S. G.; Wincker, P.; Sullivan, M. B. Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature 2016, 537, (7622), 689-693.
491 492
18. Rohwer, F.; Edwards, R. The Phage Proteomic Tree: a genome-based taxonomy for phage. J. Bacteriol. 2002, 184, (16), 4529-4535.
493 494 495 496
19. Luef, B.; Frischkorn, K. R.; Wrighton, K. C.; Holman, H. Y.; Birarda, G.; Thomas, B. C.; Singh, A.; Williams, K. H.; Siegerist, C. E.; Tringe, S. G.; Downing, K. H.; Comolli, L. R.; Banfield, J. F. Diverse uncultivated ultra-small bacterial cells in groundwater. Nat. Commun. 2015, 6, 6372.
497 498
20. Chibani-Chennoufi, S.; Bruttin, A.; Dillmann, M. L.; Brussow, H. Phage-host interaction: an ecological perspective. J. Bacteriol. 2004, 186, (12), 3677-86.
499 500
21. Guo, F.; Zhang, T. Profiling bulking and foaming bacteria in activated sludge by high throughput sequencing. Water Res. 2012, 46, (8), 2772-2782.
501 502 503
22. Paez-Espino, D.; Eloe-Fadrosh, E. A.; Pavlopoulos, G. A.; Thomas, A. D.; Huntemann, M.; Mikhailova, N.; Rubin, E.; Ivanova, N. N.; Kyrpides, N. C. Uncovering Earth's virome. Nature 2016, 536, (7617), 425-30.
504 505 506 507
23. Gudbergsdottir, S. R.; Menzel, P.; Krogh, A.; Young, M.; Peng, X. Novel viral genomes identified from six metagenomes reveal wide distribution of archaeal viruses and high viral diversity in terrestrial hot springs. Environ. Microbiol. 2016, 18, (3), 86374.
508 509
24. Finn, R. D.; Clements, J.; Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011, 39, (suppl_2), W29-W37.
510
25. Finn, R. D.; Coggill, P.; Eberhardt, R. Y.; Eddy, S. R.; Mistry, J.; Mitchell, A. L.; 20
ACS Paragon Plus Environment
Environmental Science & Technology
511 512 513
Potter, S. C.; Punta, M.; Qureshi, M.; Sangrador-Vegas, A. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016, 44, (D1), D279D285.
514 515
26. Buchfink, B.; Xie, C.; Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Meth. 2015, 12, (1), 59-60.
516 517
27. Huson, D. H.; Auch, A. F.; Qi, J.; Schuster, S. C. MEGAN analysis of metagenomic data. Genome Res. 2007, 17, (3), 377-386.
518 519 520
28. Merchant, N.; Lyons, E.; Goff, S.; Vaughn, M.; Ware, D.; Micklos, D.; Antin, P. The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences. PLoS Biol. 2016, 14, (1), e1002342.
521 522
29. Roux, S.; Enault, F.; Hurwitz, B. L.; Sullivan, M. B. VirSorter: mining viral signal from microbial genomic data. PeerJ 2015, 3, e985.
523 524 525 526
30. Quast, C.; Pruesse, E.; Yilmaz, P.; Gerken, J.; Schweer, T.; Yarza, P.; Peplies, J.; Glockner, F. O. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013, 41, (Database issue), D5906.
527 528 529
31. Camacho, C.; Coulouris, G.; Avagyan, V.; Ma, N.; Papadopoulos, J.; Bealer, K.; Madden, T. L. BLAST+: architecture and applications. BMC Bioinformatics 2009, 10, (1), 421.
530 531 532
32. Albertsen, M.; Hugenholtz, P.; Skarshewski, A.; Nielsen, K. L.; Tyson, G. W.; Nielsen, P. H. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 2013, 31, (6), 533-8.
533 534
33. Guy, L.; Roat Kultima, J.; Andersson, S. G. genoPlotR: comparative gene and genome visualization in R. Bioinformatics 2010, 26, (18), 2334-2335.
535 536 537
34. Bolduc, B.; Jang, H. B.; Doulcier, G.; You, Z.-Q.; Roux, S.; Sullivan, M. B. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ 2017, 5, e3243.
538 539
35. Byron, L.; Wattenberg, M. Stacked graphs--geometry & aesthetics. IEEE Trans. Vis. Comput. Graph. 2008, 14, (6), 1245-52.
540 541 542 543 544 545 546
36. Paez-Espino, D.; Chen, I. A.; Palaniappan, K.; Ratner, A.; Chu, K.; Szeto, E.; Pillay, M.; Huang, J.; Markowitz, V. M.; Nielsen, T.; Huntemann, M.; TB, K. R.; Pavlopoulos, G. A.; Sullivan, M. B.; Campbell, B. J.; Chen, F.; McMahon, K.; Hallam, S. J.; Denef, V.; Cavicchioli, R.; Caffrey, S. M.; Streit, W. R.; Webster, J.; Handley, K. M.; Salekdeh, G. H.; Tsesmetzis, N.; Setubal, J. C.; Pope, P. B.; Liu, W. T.; Rivers, A. R.; Ivanova, N. N.; Kyrpides, N. C. IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses. Nucleic Acids Res. 2017, 45, (D1), D457-D465. 21
ACS Paragon Plus Environment
Page 28 of 32
Page 29 of 32
Environmental Science & Technology
547 548
37. Brister, J. R.; Ako-adjei, D.; Bao, Y.; Blinkova, O. NCBI Viral Genomes Resource. Nucleic Acids Res. 2015, 43, (D1), D571-D577.
549 550
38. Zhou, Y.; Liang, Y.; Lynch, K. H.; Dennis, J. J.; Wishart, D. S. PHAST: a fast phage search tool. Nucleic Acids Res. 2011, 39, (Web Server issue), W347-52.
551 552
39. Zhang, T.; Shao, M. F.; Ye, L. 454 pyrosequencing reveals bacterial diversity of activated sludge from 14 sewage treatment plants. ISME J. 2012, 6, (6), 1137-47.
553 554
40. Wang, J.; Gao, Y.; Zhao, F. Phage-bacteria interaction network in human oral microbiome. Environ. Microbiol. 2016, 18, (7), 2143-58.
555 556 557
41. Minot, S.; Bryson, A.; Chehoud, C.; Wu, G. D.; Lewis, J. D.; Bushman, F. D. Rapid evolution of the human gut virome. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, (30), 12450-5.
558 559 560
42. Needham, D. M.; Chow, C.-E. T.; Cram, J. A.; Sachdeva, R.; Parada, A.; Fuhrman, J. A. Short-term observations of marine bacterial and viral communities: patterns, connections and resilience. ISME J. 2013, 7, (7), 1274-1285.
561 562 563
43. Bouvier, T.; del Giorgio, P. A. Key role of selective viral-induced mortality in determining marine bacterial community composition. Environ. Microbiol. 2007, 9, (2), 287-97.
564 565 566
44. Brum, J. R.; Hurwitz, B. L.; Schofield, O.; Ducklow, H. W.; Sullivan, M. B. Seasonal time bombs: dominant temperate viruses affect Southern Ocean microbial dynamics. ISME J. 2016, 10, (2), 437-49.
567 568
45. Gomez, P.; Buckling, A. Coevolution with phages does not influence the evolution of bacterial mutation rates in soil. ISME J. 2013, 7, (11), 2242-4.
569 570 571 572
46. Katayama, H.; Haramoto, E.; Oguma, K.; Yamashita, H.; Tajima, A.; Nakajima, H.; Ohgaki, S. One-year monthly quantitative survey of noroviruses, enteroviruses, and adenoviruses in wastewater collected from six plants in Japan. Water Res. 2008, 42, (67), 1441-8.
573 574 575
47. Maurice, C. F.; Bouvier, T.; Comte, J.; Guillemette, F.; Del Giorgio, P. A. Seasonal variations of phage life strategies and bacterial physiological states in three northern temperate lakes. Environ. Microbiol. 2010, 12, (3), 628-41.
576 577 578 579
48. Mookerjee, S.; Jaiswal, A.; Batabyal, P.; Einsporn, M. H.; Lara, R. J.; Sarkar, B.; Neogi, S. B.; Palit, A. Seasonal dynamics of Vibrio cholerae and its phages in riverine ecosystem of Gangetic West Bengal: cholera paradigm. Environ. Monit. Assess. 2014, 186, (10), 6241-50.
580 581 582
49. Mookerjee, S.; Batabyal, P.; Sarkar, M. H.; Palit, A. Seasonal Prevalence of Enteropathogenic Vibrio and Their Phages in the Riverine Estuarine Ecosystem of South Bengal. PLoS ONE 2015, 10, (9), e0137338. 22
ACS Paragon Plus Environment
Environmental Science & Technology
583 584 585
50. Jiang, X. T.; Guo, F.; Zhang, T. Population Dynamics of Bulking and Foaming Bacteria in a Full-scale Wastewater Treatment Plant over Five Years. Sci. Rep. 2016, 6, 24180.
586 587 588
51. Saunders, A. M.; Albertsen, M.; Vollertsen, J.; Nielsen, P. H. The activated sludge ecosystem contains a core community of abundant organisms. ISME J. 2016, 10, (1), 11-20.
589
23
ACS Paragon Plus Environment
Page 30 of 32
Page 31 of 32
Environmental Science & Technology
590
FIGURE LEGENDS
591
Figure 1. Genome maps of the 2 longest mVCs in the ST and SWH WWTPs. All the
592
predicted genes with functions that could not be assigned are colored in gray, and
593
confirmed viral genes are labeled and colored in blue. If a gene had hits with viral and
594
bacterial proteins simultaneously, the gene was assigned based on e-value and coverage.
595
Genes with high coverage and e-value with bacterial proteins are labeled and colored
596
in burly wood. Length scales are shown at the bottom left.
597
Figure 2. Distribution of ST (blue) and SWH (orange) mVCs by length and coverage
598
with the NCBI RefSeq (a) and IMG viral protein databases (b). The mVCs with >70%,
599
35-70% and 1.0) in ST AS samples over 9 years. This visualization represents the average
613
normalized coverage (areas) of different mVCs (color). The sectional areas at specific
614
time points represent the average normalized coverage of mVCs. The normalized
615
coverage of all samples is summarized in this figure, but only 53 samples are labeled 24
ACS Paragon Plus Environment
Environmental Science & Technology
616
on the x-axis to make the figure easier to read.
617
Figure 6. Distribution of shared viral clusters among wastewater treatment systems. The
618
connecting lines indicate shared viral clusters that are found between two connected
619
samples, and the colors of the lines represent the number of shared viral clusters.
25
ACS Paragon Plus Environment
Page 32 of 32