High-Resolution Temporal and Spatial Patterns of Virome in

Aug 27, 2018 - *Phone: 852-28578551; fax 852-25595337; e-mail: [email protected]. ... the virome in the wastewater treatment systems showed that shared vi...
0 downloads 0 Views 19MB Size
Characterization of Natural published by the and Affected isEnvironments

American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Subscriber accessPublished providedby byAmerican Karolinska Chemical Society. Library Institutet, University Copyright © American Chemical Society. However, no copyright

High-Resolution Temporal and Spatial

Patterns of Virome in Wastewater is published by the American Chemical 1155 Sixteenth TreatmentSociety. Systems Street N.W., Washington,

Yulin Wang, Xiao-Tao DC 20036Jiang, Lei Published American Subscriber access provided Karolinska Liu, Bing Li, and TongbybyZhang Chemical Society. Institutet, University Library Copyright © American Chemical Society. However, no copyright

Environ. Sci. Technol., Just Accepted Manuscript • Publication Date (Web): 27 Aug 2018 is published by the American Chemical

DownloadedSociety. from http:// 1155 Sixteenth pubs.acs.org on Street August 2018 N.W.,27, Washington, DC 20036

Subscriber accessPublished providedby byAmerican Karolinska Chemical Society. Library Institutet, University Copyright © American Chemical Society. However, no copyright

Just Accepted

is published by the “Just Accepted” manuscripts have been pee American Chemical online prior to technical editing, formatting for Society. 1155 Sixteenth Street N.W., Washington, Society provides “Just Accepted” as a service DC 20036 of scientific material as soon as possible a Published by American Subscriber access provided by Karolinska Chemical Society. Library Institutet, University Copyright © American Chemical Society. However, no copyright

full in PDF format accompanied by an HTM peer reviewed, but should not be considere is published by the “Just Accep Digital Object Identifier (DOI®). American Chemical the “Just Accepted” Web may not inclu Society. 1155site Sixteenth Street N.W., Washington, a manuscript is technically edited and form DC 20036 siteSubscriber and published as anbyAmerican ASAP article. No accessPublished providedby Karolinska Chemical Society. Library Institutet, University Copyright © American Chemical Society. However, no copyright

to the manuscript text and/or graphics wh ethical guidelines that apply to the journal is published by the consequences arising from the use of infor American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Subscriber accessPublished providedby byAmerican Karolinska Chemical Society. Library Institutet, University Copyright © American Chemical Society. However, no copyright

Page 1 ofEnvironmental 32 Science & Technology Phylogeny based

Metagenomic sequencing

vHMM based

Metagenomic viral contigs (mVCs) IMG/VR

Temporal dynamic ST-79 ST-5704

ST-297 ST-5762

ST-372 ST-6919

ST-503 ST-11052

ST-1927 ST-12397

ST-2135 ST-15196

Spatial distribution

ST-5602

Normalized coverage

80

60

Mar4

Jul 5 Oct 19 Nov 9 Feb 20 May 21 Aug 18 Oct 18 Dec 15 Feb 13 Apr 17 Jul 9 Aug 24 Oct 19 Dec 15 Feb 25 Apr 20 Jun 21 Aug 17 Oct 12 Dec 14 Feb 16 Apr 20 Jun 21 Aug 16 Oct 21 Dec 17 Feb 15 May 30 Jul 23 Sep 24 Nov 27 Jan 22 Mar 21 May 30 Jul 22 Oct 3 Nov 27 Jan 27 Apr 1 Jun 3 Jul 28 Oct 3 Dec 3 Feb 3 Mar 25 May 29 Jul 30 Oct 7 Dec 3

Jan 4

Jan 12

0

Feb21

20

Jan 11

ACS Paragon Plus Environment

40

2007

2008

2009

2010

2011

Date

2012

2013

2014

2015

500 shared VCs 200 shared VCs 100 shared VCs 10 shared VCs

Environmental Science & Technology

Page 2 of 32

ase

uct

ed er

27

ge

gp2

in

lys

h

180 kb

e llag e lik n−

ST-724

200 kb

yla cos

se

ose

tein

pro

c glu P− 75 TD se gp1 e d rata ativ hyd put 6−de 4,

pu

160 kb

Co

140 kb

A

y sgl ran nt yca ogl r tid nso e Pepive sekinas at e put stidin hi

120 kb

ra ste hoe osp oph se tall iga Me Al 36 DN gp2tative

100 kb

e e has ativ ynt put e s ge ylat pha mid thy C1 ase tid

6

20 kb

e− tos hep nno a se −m ase lea −D ase cle nuc ero fer onu ndo d lycltrans e g n e e − −L osy 65 ativ 78 ing DP glyc gp1 put gp1 om

d oti elta cle it idbonu tein e n u r pro eas ub II ucl ike II slass cD s] n l e I o − e eas se t c se ex lu rot eranden ase R acil 5'−3' ina dp rm ym e bin e [B ive l a e p t e o e s t h ge 4 A p2−d com Pa puta pro p27 pha gp1 DN B1 Re AT g

pe p

80 kb

56

gp

20 kb

pha

se

era

pim

e 6−

se

tor fac ma g i e s s e tein rasecleas uclea 82 e nu on pro C:7e o d ke olym i d n l EG n Ae gas − p i e K l 5 0 I ANH ns 4 L A gp2 RNH T gp5 T4− DN

120 kb

140 kb

160 kb

180 kb

200 kb

220 kb

240 kb

260 kb

Figure 1. Genome maps of the 2 longest mVCs in the ST and SWH WWTPs. All the predicted genes with functions that could not be assigned are colored in gray, and confirmed viral genes are labeled and colored in blue. If a gene had hits with viral and bacterial proteins simultaneously, the gene was assigned based on e-value and coverage. Genes with high coverage and e-value with bacterial proteins are labeled and colored in burly wood. Length scales are shown at the bottom left.

ACS Paragon Plus Environment

SWH-4941

ase rat hyd ase e r ras −de me sfe 4,6 epi ran se ylt no se nt mo an era nde rba −mnsf pe Ca P−yDl tra D de GDcos NA gly tive a put se lea nuc ein ndo rot G e tion p YI ple Y− GI com ad He 1.95 e A s gp2 la DN ein t dro se Hy mina x pro r e Te ert n lv ein tei rta rot pro Po d p ion hea let jor omp Maad c He

cyt

or

50 kb

100 kb

tein pro er th m hea mono 9 il s 1 Ta eath in gp h il s prote Ta ube il t Ta e tas lA Xh duc sin in ase ein e re oly rote ucleprot phat heemcA p exoncase phos R tive heli e−di se ase a e/ d a le puitmas leosi e kinonuc pr nuc otid end o e rib cl NH ynu H ion polative rvat es se I put A starolas uclea III e e DN yd on as r h d e n das mi Ntnage e olym II e I roteinine a Ph A p n ras DN me like p−alase oly − −L ra A p rase oyl ime DN oesteuramnt ep sphtylm nde pho ace epe N− D−d se ola NA 19 ydr H2 gp3 Ah Fts RN in e l−t rase rot p acy me ino oly III ion am A p leasedivis se tha DNonuc cell e syn endativ ate put L sph oE pho Gr oyl se bam era car nsf tra lyl idy

60 kb

t fac ing ycl rec me oso Rib e S eas GR rel −P in PE e cha r ter tid po pep trans ase S lig MF NA 5' R 2'−

20 kb

se lea s nuc rase exaonsfe e k r −li ylt cB os ReGlyc

e tas ein mu rot dis ep e d lik i − x T 1 t ro Mu supe 04g8p73HAM p

Page 3 of 32

Environmental Science & Technology

b 100

ST mVCs SWH mVCs High−novelty

75

High−novelty

50

Medium−novelty Low−novetly

Medium−novelty Low−novelty

25

0 0

1e+05

2e+05

Length of metagenomic viral contigs (bp)

Genes covered by IMG viral protein (%)

Genes covered by NCBI viral protein (%)

a

100

ST mVCs

Medium−novelty

SWH mVCs

75

Low−novelty High−novelty Medium−novelty Low−novelty

50

High−novelty

25

0 0

1e+05

2e+05

Length of metagenomic viral contigs (bp)

Figure 2. Distribution of ST (blue) and SWH (orange) mVCs by length and coverage with the NCBI RefSeq (a) and IMG viral protein databases (b). The mVCs with >70%, 35-70% and 1.0) in ST AS samples over 9 years. This visualization represents the average normalized coverage (areas) of different mVCs (color). The sectional areas at specific time points represent the average normalized coverage of mVCs. The normalized coverage of all samples is summarized in this figure, but only 53 samples are labeled on the x-axis to make the figure easier to read.

ACS Paragon Plus Environment

Page 7 of 32

Environmental Science & Technology

500 shared Viral Clusters 200 shared Viral Clusters 100 shared Viral Clusters 10 shared Viral Clusters

Figure 6. Distribution of shared viral clusters among wastewater treatment systems. The connecting lines indicate shared viral clusters that are found between two connected samples, and the colors of the lines represent the number of shared viral clusters.

ACS Paragon Plus Environment

Environmental Science & Technology

1

High-Resolution Temporal and Spatial Patterns of Virome in

2

Wastewater Treatment Systems

3

Yulin Wang1, Xiaotao Jiang1, Lei Liu1, Bing Li1, 2, Tong Zhang1, *

4

1

Environmental Biotechnology Laboratory, Department of Civil Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong

5 6

2

Division of Energy and Environment, Graduate School at Shenzhen, Tsinghua

7

University, China

8

E-mail [email protected]; Tel. 852-28578551; Fax 852-25595337.

9

1

ACS Paragon Plus Environment

Page 8 of 32

Page 9 of 32

Environmental Science & Technology

10

ABSTRACT:

11

Wastewater treatment plants (WWTPs) are considered reservoirs of viruses, but the

12

diversity and dynamic changes of viruses are not well understood. In this study, we

13

recovered 8,478 metagenomic viral contigs (mVCs; >5 kb) from two WWTPs (Shatin,

14

2,806; Shek Wu Hui, 5,672) in Hong Kong. Approximately 60% of the mVCs were

15

poorly covered (96% of DNA viruses in AS were reported to be

38

bacterial phages,8 these viruses may have an important impact on bacterial diversity,11-

39

13

40

Meanwhile, viral lysis influences the nutrient cycle in wastewater treatment systems.14

41

Taken together, these observations indicate that there is an urgent need for

42

comprehensive studies on the diversity and distribution of viruses in WWTPs.

43

It is impossible to develop universal culturing approaches and primers for naturally

44

occurring viruses.17, 18 As a result, special emphasis has been placed on the use of

45

cultivation-independent approaches for the identification of viruses. Shotgun

46

metagenomic approaches are now widely used to retrieve genomic information from

47

dsDNA viruses in environmental samples. Paez-Espino et al. (2016) reanalyzed more

48

than 5 Tb of 3,042 metagenomes in the Integrated Microbial Genomes with

49

Microbiome Samples (IMG/M) system and recovered more than 125,000 partial DNA

50

viral sequences, which provided great insight into viral diversity and habitat distribution.

51

Roux et al. (2016) analyzed the viral diversity and distribution in the oceans on a global

52

scale using metagenomics and suggested that viruses may play a key role in nutrient

53

cycling and trophic networks. These studies suggested that metagenomics could

54

provide an unprecedented method to mine viral populations in the sequenced

55

metagenomes of environmental samples with or without target viral enrichment.

56

Microbial viruses normally have diameter on the nanometer scale (i.e., 20 to 200-400

57

nm),15 and the virions of giant viruses (e.g., Tupanvirus) can reach up to 1.2 μm in

58

size.16 Luef et al. (2015) used 0.1-μm membrane filters to enrich ultrasmall microbes

59

in groundwater,19 indicating that filtration is a valid approach for the study of ultrasmall

60

microbes and may be used for the enrichment of viruses in environmental samples.

61

To gain insights into WWTPs, we used metagenomics to retrieve genomic information

62

regarding DNA viruses in AS samples. Because viral variation is a highly dynamic

63

process,20 short-term studies cannot reflect the shifts in the viral populations in complex

and may further affect the performance of biological treatment processes in WWTPs.

3

ACS Paragon Plus Environment

Page 10 of 32

Page 11 of 32

Environmental Science & Technology

64

environmental samples. In the present study, we assessed the temporal dynamics of the

65

newly identified viruses in multiyear samples and evaluated the spatial pattern of the

66

identified viruses in wastewater treatment systems. The objectives of this study were to

67

1) identify the viral populations in the AS systems of two WWTPs in Hong Kong,

68

namely, the Shatin (ST) and Shek Wu Hui (SWH) WWTPs; 2) track the temporal

69

distribution of identified viruses in AS samples collected monthly from the ST WWTP

70

over a period of ~9 years; and 3) profile the spatial distribution patterns of the virome

71

in different wastewater treatment systems. Taken together, the results of this study will

72

expand the current understanding of viral dynamics and distribution patterns in

73

wastewater treatment systems.

74

MATERIALS AND METHODS

75

Sampling and Sample Processing

76

Supernatants (3 L) of well-settled AS (30 min of settling) were individually collected

77

from the ST and SWH WWTPs (April 8, 2016). The collected samples were transferred

78

to the laboratory immediately, and subsequently filtered through 0.45-μm

79

polycarbonate membranes (Merck Millipore, USA). The filtrate was sequentially

80

filtered with 0.2-μm and 0.1-μm Isopore polycarbonate membrane filters (Merck

81

Millipore, USA) within 4 hours. The biomass that passed through the 0.2-μm

82

membranes and was retained on the 0.1-μm membranes was then used for DNA

83

extraction. Briefly, the 0.1-μm membranes for each sample were equally divided into

84

two groups (ten membranes for each group) and used as technical replicates, termed

85

ST1, ST2, SWH1, and SWH2. Considering the sample pretreatment approach, the

86

present study enriched microbes with sizes between 0.1 and 0.2 μm. For the viral

87

community, viruses with sizes >0.2 or 0.2 μm were not included in the present study. The genomic DNA was extracted

89

from these four samples using the MoBio PowerSoil DNA Isolation Kit (MO BIO

90

Laboratories, USA) following the manufacturer's instructions. DNA concentrations 4

ACS Paragon Plus Environment

Environmental Science & Technology

91

were measured using a NanoDrop spectrophotometer (ND-1000, Wilmington, US), and

92

the DNA extracts were stored at -20°C for metagenomic sequencing to recover the viral

93

sequences.

94

In total, 98 AS samples, collected monthly (from June 2, 2007 to December 3, 2015;

95

Supplementary Table S1), were obtained from the same aeration tank of the ST WWTP

96

to investigate the temporal changes in the viral populations in the AS. All the AS

97

samples were collected by mixing 1:1 with absolute ethanol (VAS:Vabsolute) and stored

98

at -20°C.21 A 1-mL aliquot of an AS sample that was not filtered through membrane

99

was used for total DNA extraction.

100

Metagenomic Sequencing and De Novo Assembly

101

Metagenomic DNA samples from the biomass (ST1, ST2, SWH1, and SWH2) that

102

passed through the 0.2-μm membrane but was retained by the 0.1-μm membrane and

103

from the AS collected monthly were sequenced on Illumina HiSeq 4000 platform (150-

104

bp paired-end reads, 350-bp insert size). The sequencing amounts for ST1, ST2, SWH1,

105

and SWH2 were 14.8, 16.7, 15.1 and 16.9 Gb, respectively. For the AS samples

106

collected monthly, the sequencing amounts ranged from 3.5 to 7.7 Gb with an average

107

data size of 5.6 Gb (Supplementary Table S1). All the raw sequencing data were

108

trimmed using the CLC Genomic Workbench (version 6.04, QIAGEN Bioinformatics,

109

Denmark) with a quality score limit of 0.01 and the number of ambiguities set to 2. The

110

clean shotgun sequencing data for the ST samples (ST1 and ST2) and SWH samples

111

(SWH1 and SWH2) were separately de novo coassembled using CLC’s de novo

112

assembly algorithm with a k-mer of 35 and minimum scaffold length of 1 kb. The

113

assembled contigs were used for downstream viral sequence identification.

114

Identification of Metagenomic Viral Contigs

115

Two different methods (i.e., a modified DNA vHMM virus detection pipeline 22 and

116

phylogeny-based pipeline23) were used to identify the metagenomic viral contigs 5

ACS Paragon Plus Environment

Page 12 of 32

Page 13 of 32

Environmental Science & Technology

117

(mVCs) in the samples from the ST and SWH WWTPs. For the DNA vHMM virus

118

detection pipeline, genes in assembled contigs were annotated using hmmsearch24

119

against the viral protein families22 and Pfam HMMs (31.0, -cut_nc mode)25. To check

120

for genes that could be annotated with KEGG Orthology (KO) clusters, all the predicted

121

genes were searched against the KEGG gene database (Release 80.1) using

122

DIAMOND.26 The contigs that had at least 5 hits with viral protein families and with

123

lengths >5 kb were extracted and further used to identify mVCs. The criteria proposed

124

by Paez-Espino et al. were modified and used in the present study (Supplementary

125

Methods, S1). For phylogeny-based identification of mVCs, all the assembled contigs

126

were searched against the NCBI nr database using DIAMOND with an e-value of 1e-

127

5.26 MEGAN627 was then used to assign assembled contigs to taxa, 23 and contigs

128

assigned to viruses were identified as mVCs. The mVCs identified by these two

129

pipelines were combined and only contigs longer than 5 kb were used for further study.

130

The accuracy of the methods used in this study was evaluated with a mock community

131

(Supplementary Methods, S1, Table S2).

132

Putative viral sequences in the metagenomes of ST and SWH were also predicted by

133

using CyVerse28 implementation of VirSorter.29 All the mVCs identified in the present

134

study were searched against the Silva ribosomal SSU database (release 119)30 using

135

BLASTn31 with an e-value of 1e-5 to check whether these newly identified mVCs carry

136

16S rRNA genes. In addition, the mVCs were annotated using hmmsearch24 against the

137

HMM database of 107 conserved, single-copy, bacterial genes.32 A survey of 16S rRNA

138

gene sequences and conserved bacterial proteins was used to further check microbial

139

contamination of the identified mVCs.

140

Similarity Comparison and Clustering

141

The sequence similarity between the identified mVCs in this study and two viral protein

142

databases (i.e., the NCBI RefSeq and IMG/VR virus databases) were computed using

143

BLASTp31 with an e-value of 1e-5. Only the best hits from the BLAST results, with >80% 6

ACS Paragon Plus Environment

Environmental Science & Technology

144

alignment of the shorter of the two sequences, were selected as valid homologous

145

proteins.22 The proportion of the identified proteins encoded by the mVCs represented

146

the novelty level, and the mVCs were grouped into three categories, high novelty (75% of the genes were covered by the

149

database). In addition, pairwise average amino acid identity (AAI) comparison of the

150

mVCs was performed using CompareM (https://github.com/dparks1134/CompareM).

151

The R package and genoPlotR33 were used to generate genome comparison maps to

152

visualize the comparisons of mVCs (Supplementary Methods, S2).

153

Gene-content based network analysis was used for mVC clustering. Complete viral

154

genomes in the NCBI viral database that shared >30% AAIs and had >20% similarity

155

coverage (shared homologous gene ratio between genomes from the NCBI viral

156

database and newly identified mVCs) with newly identified mVCs were selected as

157

reference viral genomes. The merged reference viral genomes and the newly identified

158

mVCs were then used for clustering.17, 34 The protein clusters were defined using MCL

159

(the Markov cluster algorithm) based on the results of the all-vs-all BLASTp gene

160

comparison.34 The VCs were then defined using vConTact with the predefined protein

161

clusters.34 The mVCs that were not assigned to any clusters were classified as singletons

162

(Supplementary Methods, S2).

163

Relative Abundances of mVCs and Viral Clusters in Filtered ST and SWH

164

Samples

165

The relative abundances of identified ST and SWH mVCs were estimated using the

166

mapping results. The metagenomic reads of samples from ST and SWH were mapped

167

to the identified ST and SWH mVCs, respectively, with a similarity fraction of 90%

168

over a 70% read length.17 Mapping was performed with random nonspecific matches if

169

a read aligned to more than one position with equally good scores. The mapping results

170

were then used to calculate the average coverage of mVCs (the number of nucleotides 7

ACS Paragon Plus Environment

Page 14 of 32

Page 15 of 32

Environmental Science & Technology

171

mapped to the mVCs divided by the mVC length). The average coverage was further

172

normalized to the size of the dataset and recorded as the average normalized coverage,17

173

and this value used to determine the relative abundance against the cumulative coverage

174

of all the identified mVCs. The relative abundances of viral clusters were calculated

175

based on the relative abundances of the viral members of the clusters.

176

Temporal and Spatial Distribution Patterns of Viruses in WWTPs

177

The metagenomic datasets for the 98 (~9 years) ST AS samples collected monthly were

178

used to profile the temporal dynamics of the identified ST mVCs. The shotgun reads of

179

these AS samples were mapped to the identified ST mVCs using the strategy mentioned

180

in the previous section. An mVC was considered to have been detected if more than 75%

181

of the length was covered by the mapped shotgun reads.17 The average normalized

182

coverage values of the identified mVCs were used to profile relative abundances among

183

different AS samples and viewed as stacked graphs.35

184

To study the viral biogeographic traits, sequences of viruses recovered from wastewater

185

treatment systems, full-scale WWTPs, and wastewater treatment bioreactors were

186

retrieved from the IMG/VR database.36 A total of 5,868 mVCs (Supplementary Table

187

S3) were collected from 101 metagenomes that originated from nine different projects

188

(locations). The public mVCs36 were combined with the mVCs obtained in this study

189

and with 283 reference viral genomes from the NCBI database37 (selected based on the

190

criteria mentioned in the section in viral clustering) for viral clustering. The number of

191

shared clusters between the different wastewater treatment systems was visualized

192

using the R packages “maps” and “geosphere”.

193

Nucleotide Sequence Accession Numbers

194

The nucleotide sequence data used in the present study have been deposited in the NCBI

195

database under the project ID PRJNA432264.

196

RESULTS 8

ACS Paragon Plus Environment

Environmental Science & Technology

197

Viral Populations

198

The integrated phylogeny-based and DNA vHMM virus detection pipeline achieved a

199

precision of 100% and a recall rate of 91.7% based on the mock community in this

200

study. Using this integrated pipeline, we identified 2,806 and 5,672 mVCs (>5 kb) from

201

the ST and SWH assembled contigs (Supplementary Table S4), respectively, which

202

increased the total number of mVCs from engineered systems by 21.2% (IMG/VR,

203

January, 2018). Approximately 45 and 42% of the assembled contigs with length >5 kb

204

in ST and SWH were identified as viral sequences. The maximum lengths of the mVCs

205

in the ST and SWH samples were 200,733 bp and 269,614 bp, respectively (Figure 1).

206

The number of mVCs identified by the vHMM pipeline were approximately 10- and 7-

207

fold greater than that identified by the phylogeny-based pipeline (Supplementary Table

208

S4) for the ST and SWH samples, respectively.

209

None of the newly identified mVCs carried the 16S rRNA gene sequence. 4 and 3

210

mVCs from the ST and SWH samples, respectively, carried genes encoding conserved

211

bacterial proteins, which were confirmed to be viral sequences by VirSorter29 and

212

PHAST38. In terms of the novelty of these recovered mVCs (Figure 2), only

213

approximately 1 and 5% of the mVCs exhibited high coverage (>75% of the genes of

214

the mVCs) with the viral RefSeq and vHMM viral protein databases, respectively.

215

Approximately 60% of the newly recovered mVCs were high-novelty sequences that

216

exhibited poor coverage (2% (Supplementary Figure S5). Additionally, the calculated Shannon-

257

Wiener diversity indices based on all the identified viral clusters from ST (5.09) and

258

SWH (5.76) indicated that the viral diversity in the SWH samples was higher than that

259

in the ST samples (Mann-Whitney P-value = 2.4 × 10−20). Furthermore, the Pielou's

260

evenness indices for ST (0.80) and SWH (0.86) indicated that the viral evenness of the

261

SWH samples was also higher than that of the ST samples.

262

Temporal Dynamics of the Identified Viral Community

263

We obtained 545 Gb of metagenomic sequence data from the 98 AS samples collected

264

monthly from the ST WWTP. The reads from these AS samples were mapped to the

265

identified ST mVCs to profile the temporal dynamics of the identified viral community

266

in the ST WWTP (Table S6). The recruited reads from these newly identified ST mVCs

267

accounted for only 0.02 to 0.35% among the different sequenced ST AS samples. Based

268

on the mapping results, 481 mVCs were presented at least once with a normalized

269

coverage of more than 1.0 in the monthly collected AS samples (Supplementary Table

270

S7). Although the complex AS samples were sequenced at a relatively low sequencing

271

depth, 17 mVCs were identified from more than 50% of the AS samples

272

(Supplementary Table S8). This result suggests that some of the identified viruses were

273

persistent viral populations in the ST WWTP.

274

The sums of the average normalized coverage values (Supplementary Figures S6 and

275

S7) were used to compare the changes in the identified viral populations among the

276

different AS samples. Marked fluctuations were observed from June 2007 to December

277

2015, and the relative abundances of the identified viral populations exhibited an annual 11

ACS Paragon Plus Environment

Page 18 of 32

Page 19 of 32

Environmental Science & Technology

278

increase and decrease. In general, these identified viruses with sizes between 0.1 and

279

0.2 μm thrived during the winter season in Hong Kong (i.e., at the end or beginning of

280

each year), and collapse was detected following the transient increase (Supplementary

281

Figure S6, Table S9). The average interval between the peaks of these identified mVCs

282

in the AS systems was 341 days (Supplementary Figure S7), which was generally

283

consistent with the length of a calendar year. The dynamics of the 13 core mVCs that

284

had average normalized coverage values >1.0 in all 98 AS samples demonstrated a

285

highly traceable cyclical development process in the AS system (Figure 5). Most of the

286

abundant peaks for these core viral populations were observed in January or February,

287

which was consistent with the variation patterns of viruses with an average coverage of

288

greater than 0.5 and viral clusters (Supplementary Figure S8, Figure S9).

289

Although the identified mVC pattern had minor variations in the first 3 years (2007-

290

2009), the relative abundance pattern of the identified mVCs changed drastically in the

291

subsequent years. Only 138 mVCs were detected in the first 3 years, whereas a total of

292

543 mVCs were detected at the beginning of 2010, which was almost four times the

293

previous count. In addition to the relatively low abundance of the identified mVCs/viral

294

clusters in the first 3 years (Figure 5 and Supplementary Figures 7, 9 and 10), following

295

a rapid increase and decrease in February 2009, the relative abundances of the detected

296

viruses remained constant until the beginning of 2010. Among the 2,806 identified

297

mVCs in the ST WWTP, 33 mVCs were once found to be the most dominant viruses in

298

the 98 AS samples (Table S10), indicating that the identified mVCs changed over time.

299

Among the dominant viruses identified, ST-6919 and ST-79 were two widely

300

distributed viruses, dominating 13 and 10 AS samples, respectively. As shown in Figure

301

5, ST-6919 was first detected in 2010 and was the most dominant virus among 13 of

302

the subsequent AS samples. ST-79, in contrast, was detectable in the early samples and

303

gradually became a less dominant viral population after 2010.

304

Biogeographic Pattern of Viruses in Wastewater Treatment Systems 12

ACS Paragon Plus Environment

Environmental Science & Technology

305

With the addition of public mVCs that were retrieved from different wastewater

306

treatment systems, a total of 51,678 protein clusters and 1,791 viral clusters were

307

defined from the 14,629 mVCs and complete reference viral genomes. The number of

308

identified viral clusters varied considerably in these studied wastewater treatment

309

systems, ranging from 8 to 1,029. As a result, only one viral cluster could be found in

310

over 70% of the samples. However, this widespread viral cluster (present in >70% of

311

the samples) was not an abundant viral population in the SWH samples (0.045% of the

312

total viral population) and could not be identified in the ST samples.

313

Although none of the newly identified viral clusters were shared by all 10 studied

314

wastewater treatment systems, shared viral clusters were observed between similar

315

sections/niches of any two wastewater treatment systems (Figure 6, Supplementary

316

Table S11 and S12). Notably, mVCs derived from the ST and SWH AS samples shared

317

more than 200 viral clusters with those retrieved from the effluent of a WWTP (in

318

Wisconsin, USA) and an enhanced biological phosphorus removal reactor (in

319

Queensland, Australia), accounting for more than 40% of the identified viral cluster in

320

these two public datasets. In contrast, only 1.4 to 25.7% of the viral clusters in the

321

anaerobic digester samples were shared with the AS or effluent samples. Only 15 and

322

18 viral clusters of the ST and SWH WWTPs, respectively, were shared with viral

323

clusters (70) from a sludge sample from an anaerobic digester (Illinois, USA).

324

DISCUSSION

325

A total of 8,478 novel mVCs were recovered in this study, expanding the current viral

326

database (IMG/VR) with new viral sequences from WWTPs. The proportion (~60%)

327

of low-coverage mVCs in the studied WWTPs was higher than that of the mVCs (~24%)

328

mainly identified from soil, plant-associated, and engineered system.22 In addition, the

329

low sequence similarity between the newly identified mVCs from this study and the

330

reference viral genomes was consistent with a previous study on viruses in WWTPs,8

331

indicating that the viral communities in WWTPs are still poorly covered by current 13

ACS Paragon Plus Environment

Page 20 of 32

Page 21 of 32

Environmental Science & Technology

332

databases. This observation may be explained by the result that the sequence similarity-

333

based (phylogeny-based) pipeline recovered less mVCs than the vHMM pipeline,

334

indicating that the vHMM pipeline could capture highly divergent viral sequences.22

335

According to our previous study, the identified bacterial OTUs and diversity in the

336

SWH AS sample was higher than that in the ST samples,39 which was consistent with

337

the high alpha diversity and evenness of the identified viruses and viral clusters in the

338

SWH AS system. The gene comparisons of the ST and SWH mVCs demonstrated that

339

geographically isolated systems may harbor virome with high AAIs, while the

340

differences in the genetic arrangements or inserted gene fragments identified from those

341

viral sequences with high similarities provided evidence that viruses in WWTPs may

342

exhibit high divergence and intense horizontal gene transfer. The identified large

343

connected component of the order Caudovirales confirmed homologous gene sharing

344

among these newly identified mVCs,34 which may further indicate the occurrence of

345

horizontal gene transfer among viruses of the order Caudovirales in WWTPs. However,

346

how these DNA viruses influence microbial diversity and evolution remains to be

347

studied.

348

Although many time-series studies of the virome in the human gut,40, 41 marine water,42-

349

44

350

might be the longest temporal dynamic study of the virome (0.1-0.2 μm) in AS samples.

351

This study provided unparalleled insight into the dynamics of viruses in an engineered

352

system. The regularly increasing and decreasing trends were consistent with the

353

previous conclusion that viruses undergo cyclical development.47 Studies have shown

354

that viruses are more abundant under conditions that favor bacterial or archaeal growth

355

20, 48, 49

356

with periodic changes in microbial hosts in the AS system. For the studied ST WWTP,

357

notably, the operational problems of bulking and foaming occurred in the winter. Our

358

previous study that investigated the changes of bulking and foaming bacteria in 58 AS

soils,45 bioreactors,45 and WWTPs46 have been reported in recent decades, our study

, indicating that this cyclical pattern of the identified viruses may be associated

14

ACS Paragon Plus Environment

Environmental Science & Technology

359

samples (same AS samples as those the present study) over five years (2007-2012)

360

revealed seasonal variations among the bulking and foaming bacteria.50 Generally, the

361

overall abundances of bulking and foaming bacteria increased in the early winter of

362

each year, and was followed by a bloom of identified viral populations. Therefore, the

363

cyclical patterns of the identified viruses in the present study might correlate with the

364

seasonal dynamics of bulking and foaming bacteria.

365

In addition to the cyclical development of the identified viral community, the viral

366

community structure changed significantly over time, as most of the identified mVCs

367

or viral clusters were transiently detected among the temporal AS samples. Considering

368

the same sampling point and protocol for the temporal AS samples, the continuously

369

alternating viral populations of the identified viruses might indicate changes in the

370

microbial communities in the AS systems. Although the community structures of the

371

virome varied significantly among different cycles, the relative abundances (average

372

normalized coverage) of the identified mVCs at the high-abundance points showed

373

similar values, except during the first 3 years, indicating that the ecological pattern of

374

viruses in AS tends to be consistent. A change of hydraulic retention time (HRT) in the

375

ST WWTP was recorded in 2011, which could not explain the significant shifts in the

376

viral community between 2009 and 2010. Given the dramatic changes in viral

377

abundance, the dominant populations in the first 3 years might have become rare

378

populations that were not recovered from the datasets during virome mining, which

379

might explain why the relative abundances of the viruses in the first 3 years were much

380

lower than those found in other samples.

381

Saunders et al. studied spatially and temporally distributed AS samples and discovered

382

the existence of a core community among AS ecosystems.51 This finding may be

383

consistent with our finding that a similar sample types from different wastewater

384

treatment systems tends to harbor high numbers of shared viral clusters, suggesting the

385

existence of globally generalizable viral community in different parts of wastewater 15

ACS Paragon Plus Environment

Page 22 of 32

Page 23 of 32

Environmental Science & Technology

386

treatment systems. However, none of the identified viral clusters were shared by all the

387

studied wastewater treatment systems, indicating that differences in samples types, e.g.,

388

aerobic vs. anaerobic, affected both viral and microbial communities. As the first report

389

of spatial distribution of viruses in wastewater treatment systems, our study was

390

significantly limited by the current public databases. The number of identified mVCs

391

in different samples varied from hundreds to thousands, which might be another reason

392

why globally shared viral clusters were not found in wastewater treatment systems.

393

Considering the limitation of the databases and the incomplete viral populations

394

recovered in the present study, the features associated with viral biogeography in

395

wastewater treatment systems need further investigation to decipher the comprehensive

396

spatial distribution of the virome.

397

In summary, this study provided new insights into the temporal and spatial patterns of

398

viruses in wastewater treatment systems. The identified annual dynamics and

399

transitions of dominant viral populations may support the idea that viruses in AS

400

systems have significant effects on the structure of the microbial community. The core

401

bacterial community among geographically isolated wastewater treatment systems may

402

be related to widespread viral clusters.

403

However, it should be noted that the discussed mVCs were predicted using a

404

bioinformatic approach. Despite the high recall rate of viral sequences in the synthetic

405

metagenome, the accuracy rate for complex environmental metagenome remains

406

unknown. The sample volumes used to assess the temporal variations were ~1 ml,

407

which might not represent the whole viral community in the AS system. In addition, the

408

method used in the present study recovered only viruses with sizes between 0.1 and 0.2

409

μm, and the shifts in viral community were studied based on the relative abundances of

410

identified viral sequences. Therefore, the present study provided a partial scope of the

411

total viruses in the studied WWTPs, and the viral dynamics and distributions in

412

wastewater treatment systems should be further explored in future studies. 16

ACS Paragon Plus Environment

Environmental Science & Technology

413

ASSOCIATED CONTENT

414

Supporting Information. Methodologies for identification of metagenomic viral

415

contigs (mVCs) and similarity comparison and clustering. Figures showing size

416

distribution of identified viral cluster; genome comparison between two mVCs (ST-100

417

and SWH-60036); protein-sharing network of all identified mVCs and reference viral

418

sequences; relative abundance of mVCs and top 30 abundant viral clusters; temporal

419

dynamics of all identified mVCs, core viral populations (average coverage > 0.5) and

420

all viral clusters. Tables (xlsx) showing brief information of monthly collected samples;

421

summary of mock community; summary of the collected viral sequences from WWTPs

422

and wastewater treatment systems; summary and relative abundance of identified

423

mVCs from ST and SWH WWTPs; widespread mVCs and dynamics changes of the

424

identified mVCs in ST samples collected monthly; summary of the viral clusters

425

identified in the studied wastewater treatment systems and ratio of the shared viral

426

clusters among these wastewater treatment systems.

427

AUTHOR INFORMATION

428

Corresponding Author

429

*Tel: 852-28578551; Fax 852-25595337; E-mail: [email protected]

430

ORCID

431

Tong Zhang: 0000-0003-1148-4322

432

Notes

433

The authors declare no conflict of interest.

434

ACKNOWLEDGMENTS

435

This work was supported by the Hong Kong General Research Fund (172057/15E). Mr.

436

Yulin Wang, Dr. Xiaotao Jiang and Mr. Lei Liu wish to thank the University of Hong 17

ACS Paragon Plus Environment

Page 24 of 32

Page 25 of 32

Environmental Science & Technology

437

Kong for postgraduate scholarships. Technical support from Ms. Vicky Fung is greatly

438

appreciated. We also thank the reviewers for their constructive comments.

439

18

ACS Paragon Plus Environment

Environmental Science & Technology

440

REFERENCES

441 442

1. Breitbart, M.; Rohwer, F. Here a virus, there a virus, everywhere the same virus? Trends Microbiol. 2005, 13, (6), 278-284.

443 444

2. Rohwer, F.; Thurber, R. V. Viruses manipulate the marine environment. Nature 2009, 459, (7244), 207.

445 446 447

3. Joo, J.; Gunny, M.; Cases, M.; Hudson, P.; Albert, R.; Harvill, E. Bacteriophagemediated competition in Bordetella bacteria. Proc. R. Soc. Lond., B, Biol. Sci. 2006, 273, (1595), 1843-1848.

448 449 450

4. Koskella, B.; Brockhurst, M. A. Bacteria-phage coevolution as a driver of ecological and evolutionary processes in microbial communities. FEMS Microbiol. Rev. 2014, 38, (5), 916-31.

451 452 453 454

5. Nielsen, P. H.; Saunders, A. M.; Hansen, A. A.; Larsen, P.; Nielsen, J. L. Microbial communities involved in enhanced biological phosphorus removal from wastewater— a model system in environmental biotechnology. Curr. Opin. Biotechnol. 2012, 23, (3), 452-459.

455 456

6. Ju, F.; Zhang, T. Bacterial assembly and temporal dynamics in activated sludge of a full-scale municipal wastewater treatment plant. ISME J. 2015, 9, (3), 683-695.

457 458

7. Daims, H.; Taylor, M. W.; Wagner, M. Wastewater treatment: a model system for microbial ecology. Trends Biotechnol. 2006, 24, (11), 483-489.

459 460 461

8. Tamaki, H.; Zhang, R.; Angly, F. E.; Nakamura, S.; Hong, P. Y.; Yasunaga, T.; Kamagata, Y.; Liu, W. T. Metagenomic analysis of DNA viruses in a wastewater treatment plant in tropical climate. Environ. Microbiol. 2012, 14, (2), 441-52.

462 463

9. Wommack, K. E.; Colwell, R. R. Virioplankton: viruses in aquatic ecosystems. Microbiol. Mol. Biol. Rev. 2000, 64, (1), 69-114.

464 465 466

10. Otawa, K.; Lee, S. H.; Yamazoe, A.; Onuki, M.; Satoh, H.; Mino, T. Abundance, diversity, and dynamics of viruses on microorganisms in activated sludge processes. Microb. Ecol. 2007, 53, (1), 143-152.

467 468

11. Wu, Q.; Liu, W.-T. Determination of virus abundance, diversity and distribution in a municipal wastewater treatment plant. Water Res. 2009, 43, (4), 1101-1109.

469 470 471 472

12. Sedmak, G.; Bina, D.; MacDonald, J.; Couillard, L. Nine-year study of the occurrence of culturable viruses in source water for two drinking water treatment plants and the influent and effluent of a wastewater treatment plant in Milwaukee, Wisconsin (August 1994 through July 2003). Appl. Environ. Microbiol. 2005, 71, (2), 1042-1050.

473 474

13. Carducci, A.; Morici, P.; Pizzi, F.; Battistini, R.; Rovini, E.; Verani, M. Study of the viral removal efficiency in a urban wastewater treatment plant. Water Sci. Technol. 2008, 19

ACS Paragon Plus Environment

Page 26 of 32

Page 27 of 32

Environmental Science & Technology

475

58, (4), 893-897.

476 477

14. Suttle, C. A. Environmental microbiology: Viral diversity on the global stage. Nat. Microbiol. 2016, 1, (11), 16205.

478 479

15. Seeman, N. C.; Belcher, A. M. Emulating biology: building nanostructures from the bottom up. Proc. Natl. Acad. Sci. U.S. A. 2002, 99 Suppl 2, 6451-5.

480 481 482 483 484

16. Abrahao, J.; Silva, L.; Silva, L. S.; Khalil, J. Y. B.; Rodrigues, R.; Arantes, T.; Assis, F.; Boratto, P.; Andrade, M.; Kroon, E. G.; Ribeiro, B.; Bergier, I.; Seligmann, H.; Ghigo, E.; Colson, P.; Levasseur, A.; Kroemer, G.; Raoult, D.; La Scola, B. Tailed giant Tupanvirus possesses the most complete translational apparatus of the known virosphere. Nat. Commun. 2018, 9, (1), 749.

485 486 487 488 489 490

17. Roux, S.; Brum, J. R.; Dutilh, B. E.; Sunagawa, S.; Duhaime, M. B.; Loy, A.; Poulos, B. T.; Solonenko, N.; Lara, E.; Poulain, J.; Pesant, S.; Kandels-Lewis, S.; Dimier, C.; Picheral, M.; Searson, S.; Cruaud, C.; Alberti, A.; Duarte, C. M.; Gasol, J. M.; Vaque, D.; Tara Oceans, C.; Bork, P.; Acinas, S. G.; Wincker, P.; Sullivan, M. B. Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature 2016, 537, (7622), 689-693.

491 492

18. Rohwer, F.; Edwards, R. The Phage Proteomic Tree: a genome-based taxonomy for phage. J. Bacteriol. 2002, 184, (16), 4529-4535.

493 494 495 496

19. Luef, B.; Frischkorn, K. R.; Wrighton, K. C.; Holman, H. Y.; Birarda, G.; Thomas, B. C.; Singh, A.; Williams, K. H.; Siegerist, C. E.; Tringe, S. G.; Downing, K. H.; Comolli, L. R.; Banfield, J. F. Diverse uncultivated ultra-small bacterial cells in groundwater. Nat. Commun. 2015, 6, 6372.

497 498

20. Chibani-Chennoufi, S.; Bruttin, A.; Dillmann, M. L.; Brussow, H. Phage-host interaction: an ecological perspective. J. Bacteriol. 2004, 186, (12), 3677-86.

499 500

21. Guo, F.; Zhang, T. Profiling bulking and foaming bacteria in activated sludge by high throughput sequencing. Water Res. 2012, 46, (8), 2772-2782.

501 502 503

22. Paez-Espino, D.; Eloe-Fadrosh, E. A.; Pavlopoulos, G. A.; Thomas, A. D.; Huntemann, M.; Mikhailova, N.; Rubin, E.; Ivanova, N. N.; Kyrpides, N. C. Uncovering Earth's virome. Nature 2016, 536, (7617), 425-30.

504 505 506 507

23. Gudbergsdottir, S. R.; Menzel, P.; Krogh, A.; Young, M.; Peng, X. Novel viral genomes identified from six metagenomes reveal wide distribution of archaeal viruses and high viral diversity in terrestrial hot springs. Environ. Microbiol. 2016, 18, (3), 86374.

508 509

24. Finn, R. D.; Clements, J.; Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011, 39, (suppl_2), W29-W37.

510

25. Finn, R. D.; Coggill, P.; Eberhardt, R. Y.; Eddy, S. R.; Mistry, J.; Mitchell, A. L.; 20

ACS Paragon Plus Environment

Environmental Science & Technology

511 512 513

Potter, S. C.; Punta, M.; Qureshi, M.; Sangrador-Vegas, A. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016, 44, (D1), D279D285.

514 515

26. Buchfink, B.; Xie, C.; Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Meth. 2015, 12, (1), 59-60.

516 517

27. Huson, D. H.; Auch, A. F.; Qi, J.; Schuster, S. C. MEGAN analysis of metagenomic data. Genome Res. 2007, 17, (3), 377-386.

518 519 520

28. Merchant, N.; Lyons, E.; Goff, S.; Vaughn, M.; Ware, D.; Micklos, D.; Antin, P. The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences. PLoS Biol. 2016, 14, (1), e1002342.

521 522

29. Roux, S.; Enault, F.; Hurwitz, B. L.; Sullivan, M. B. VirSorter: mining viral signal from microbial genomic data. PeerJ 2015, 3, e985.

523 524 525 526

30. Quast, C.; Pruesse, E.; Yilmaz, P.; Gerken, J.; Schweer, T.; Yarza, P.; Peplies, J.; Glockner, F. O. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013, 41, (Database issue), D5906.

527 528 529

31. Camacho, C.; Coulouris, G.; Avagyan, V.; Ma, N.; Papadopoulos, J.; Bealer, K.; Madden, T. L. BLAST+: architecture and applications. BMC Bioinformatics 2009, 10, (1), 421.

530 531 532

32. Albertsen, M.; Hugenholtz, P.; Skarshewski, A.; Nielsen, K. L.; Tyson, G. W.; Nielsen, P. H. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 2013, 31, (6), 533-8.

533 534

33. Guy, L.; Roat Kultima, J.; Andersson, S. G. genoPlotR: comparative gene and genome visualization in R. Bioinformatics 2010, 26, (18), 2334-2335.

535 536 537

34. Bolduc, B.; Jang, H. B.; Doulcier, G.; You, Z.-Q.; Roux, S.; Sullivan, M. B. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ 2017, 5, e3243.

538 539

35. Byron, L.; Wattenberg, M. Stacked graphs--geometry & aesthetics. IEEE Trans. Vis. Comput. Graph. 2008, 14, (6), 1245-52.

540 541 542 543 544 545 546

36. Paez-Espino, D.; Chen, I. A.; Palaniappan, K.; Ratner, A.; Chu, K.; Szeto, E.; Pillay, M.; Huang, J.; Markowitz, V. M.; Nielsen, T.; Huntemann, M.; TB, K. R.; Pavlopoulos, G. A.; Sullivan, M. B.; Campbell, B. J.; Chen, F.; McMahon, K.; Hallam, S. J.; Denef, V.; Cavicchioli, R.; Caffrey, S. M.; Streit, W. R.; Webster, J.; Handley, K. M.; Salekdeh, G. H.; Tsesmetzis, N.; Setubal, J. C.; Pope, P. B.; Liu, W. T.; Rivers, A. R.; Ivanova, N. N.; Kyrpides, N. C. IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses. Nucleic Acids Res. 2017, 45, (D1), D457-D465. 21

ACS Paragon Plus Environment

Page 28 of 32

Page 29 of 32

Environmental Science & Technology

547 548

37. Brister, J. R.; Ako-adjei, D.; Bao, Y.; Blinkova, O. NCBI Viral Genomes Resource. Nucleic Acids Res. 2015, 43, (D1), D571-D577.

549 550

38. Zhou, Y.; Liang, Y.; Lynch, K. H.; Dennis, J. J.; Wishart, D. S. PHAST: a fast phage search tool. Nucleic Acids Res. 2011, 39, (Web Server issue), W347-52.

551 552

39. Zhang, T.; Shao, M. F.; Ye, L. 454 pyrosequencing reveals bacterial diversity of activated sludge from 14 sewage treatment plants. ISME J. 2012, 6, (6), 1137-47.

553 554

40. Wang, J.; Gao, Y.; Zhao, F. Phage-bacteria interaction network in human oral microbiome. Environ. Microbiol. 2016, 18, (7), 2143-58.

555 556 557

41. Minot, S.; Bryson, A.; Chehoud, C.; Wu, G. D.; Lewis, J. D.; Bushman, F. D. Rapid evolution of the human gut virome. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, (30), 12450-5.

558 559 560

42. Needham, D. M.; Chow, C.-E. T.; Cram, J. A.; Sachdeva, R.; Parada, A.; Fuhrman, J. A. Short-term observations of marine bacterial and viral communities: patterns, connections and resilience. ISME J. 2013, 7, (7), 1274-1285.

561 562 563

43. Bouvier, T.; del Giorgio, P. A. Key role of selective viral-induced mortality in determining marine bacterial community composition. Environ. Microbiol. 2007, 9, (2), 287-97.

564 565 566

44. Brum, J. R.; Hurwitz, B. L.; Schofield, O.; Ducklow, H. W.; Sullivan, M. B. Seasonal time bombs: dominant temperate viruses affect Southern Ocean microbial dynamics. ISME J. 2016, 10, (2), 437-49.

567 568

45. Gomez, P.; Buckling, A. Coevolution with phages does not influence the evolution of bacterial mutation rates in soil. ISME J. 2013, 7, (11), 2242-4.

569 570 571 572

46. Katayama, H.; Haramoto, E.; Oguma, K.; Yamashita, H.; Tajima, A.; Nakajima, H.; Ohgaki, S. One-year monthly quantitative survey of noroviruses, enteroviruses, and adenoviruses in wastewater collected from six plants in Japan. Water Res. 2008, 42, (67), 1441-8.

573 574 575

47. Maurice, C. F.; Bouvier, T.; Comte, J.; Guillemette, F.; Del Giorgio, P. A. Seasonal variations of phage life strategies and bacterial physiological states in three northern temperate lakes. Environ. Microbiol. 2010, 12, (3), 628-41.

576 577 578 579

48. Mookerjee, S.; Jaiswal, A.; Batabyal, P.; Einsporn, M. H.; Lara, R. J.; Sarkar, B.; Neogi, S. B.; Palit, A. Seasonal dynamics of Vibrio cholerae and its phages in riverine ecosystem of Gangetic West Bengal: cholera paradigm. Environ. Monit. Assess. 2014, 186, (10), 6241-50.

580 581 582

49. Mookerjee, S.; Batabyal, P.; Sarkar, M. H.; Palit, A. Seasonal Prevalence of Enteropathogenic Vibrio and Their Phages in the Riverine Estuarine Ecosystem of South Bengal. PLoS ONE 2015, 10, (9), e0137338. 22

ACS Paragon Plus Environment

Environmental Science & Technology

583 584 585

50. Jiang, X. T.; Guo, F.; Zhang, T. Population Dynamics of Bulking and Foaming Bacteria in a Full-scale Wastewater Treatment Plant over Five Years. Sci. Rep. 2016, 6, 24180.

586 587 588

51. Saunders, A. M.; Albertsen, M.; Vollertsen, J.; Nielsen, P. H. The activated sludge ecosystem contains a core community of abundant organisms. ISME J. 2016, 10, (1), 11-20.

589

23

ACS Paragon Plus Environment

Page 30 of 32

Page 31 of 32

Environmental Science & Technology

590

FIGURE LEGENDS

591

Figure 1. Genome maps of the 2 longest mVCs in the ST and SWH WWTPs. All the

592

predicted genes with functions that could not be assigned are colored in gray, and

593

confirmed viral genes are labeled and colored in blue. If a gene had hits with viral and

594

bacterial proteins simultaneously, the gene was assigned based on e-value and coverage.

595

Genes with high coverage and e-value with bacterial proteins are labeled and colored

596

in burly wood. Length scales are shown at the bottom left.

597

Figure 2. Distribution of ST (blue) and SWH (orange) mVCs by length and coverage

598

with the NCBI RefSeq (a) and IMG viral protein databases (b). The mVCs with >70%,

599

35-70% and 1.0) in ST AS samples over 9 years. This visualization represents the average

613

normalized coverage (areas) of different mVCs (color). The sectional areas at specific

614

time points represent the average normalized coverage of mVCs. The normalized

615

coverage of all samples is summarized in this figure, but only 53 samples are labeled 24

ACS Paragon Plus Environment

Environmental Science & Technology

616

on the x-axis to make the figure easier to read.

617

Figure 6. Distribution of shared viral clusters among wastewater treatment systems. The

618

connecting lines indicate shared viral clusters that are found between two connected

619

samples, and the colors of the lines represent the number of shared viral clusters.

25

ACS Paragon Plus Environment

Page 32 of 32