Reduced Zebrafish Transcriptome Atlas toward Understanding the

May 21, 2018 - Here, we described a Reduced Transcriptome Atlas (RTA) approach which integrates transcriptomic datasets and a comprehensive panel of ...
1 downloads 0 Views 2MB Size
Subscriber access provided by UNIVERSITY OF TOLEDO LIBRARIES

Ecotoxicology and Human Environmental Health

Reduced Zebrafish Transcriptome Atlas toward Understanding the Environmental Neurotoxicants Kun Zhang, and Yanbin Zhao Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.8b01350 • Publication Date (Web): 21 May 2018 Downloaded from http://pubs.acs.org on May 21, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 36

Environmental Science & Technology

1

Reduced Zebrafish Transcriptome Atlas toward Understanding the Environmental

2

Neurotoxicants

3

Kun Zhanga, and Yanbin Zhaoa,*

4 5

a

6

02115, USA

Brigham and Women's Hospital, Harvard Medical School, 60 Fenwood Road, Boston, MA

7 8

*Corresponding author:

9

Yanbin Zhao

10

Tel.: +1 857 307 0318; E-mail: [email protected]

ACS Paragon Plus Environment

Environmental Science & Technology

11

Abstract

12

Transcriptomic approaches monitoring gene responses at genome-scale are increasingly used in

13

toxicological research and help to clarify the molecular mechanisms of adverse effects caused by

14

environmental toxicants. However, their applications for chemical assessment are hampered due

15

to high expenses required and more importantly the lack of in-depth data mining and mechanistic

16

perspectives. Here, we described a Reduced Transcriptome Atlas (RTA) approach which

17

integrates transcriptomic datasets and a comprehensive panel of genes generated to represent

18

neurogenesis and the early neuronal development of zebrafish, to determine the potential

19

neurodevelopmental toxicities of environmental chemicals. Transcriptomic datasets of 74

20

chemicals and 736 related gene expression profiles were integrated resulting in 135 exposure

21

signatures. Chemical prioritization demonstrated four sets of hits to be neurotoxic: neuro-active

22

chemicals (representatively, Valproic acid, VPA and Carbamazepine, CAR), xenoestrogens

23

(Bisphenol A, BPA; Genistein, GEN; 17-α Ethinylestradiol, EE2), microcystins (Cyanopeptolin,

24

CP1020; Microcystin-LR, MCLR) and heavy metals (AgNO3, AgNPs). The enriched biological

25

pathways and processes were distinct among the four sets, while the overlapping functional

26

enrichments were observed within each set, e.g. over 25% differentially expressed genes and

27

four of top five KEGG pathways were shared between VPA and CAR. Furthermore, gene

28

expression index (GEI) analysis demonstrated that a gene panel with 300 genes was sufficient to

29

effectively characterize and cluster chemicals and therefore offer an efficient and cost-effective

30

tool for the prioritization of neurotoxicants. Thus, the RTA approach provides novel insights into

31

the understanding of the in-depth molecular mechanisms of environmental neurotoxicants and

32

can be used as indication for potential adverse outcomes.

ACS Paragon Plus Environment

Page 2 of 36

Page 3 of 36

33

Environmental Science & Technology

Introduction

34

Transcriptomics technologies can characterize and quantify the entirety of gene transcripts

35

simultaneously and consequently are useful to identify changes in molecular patterns related to

36

environmental stressors, such as toxic chemicals. Transcriptomics have increasingly been applied

37

in toxicological research. They help to clarify the molecular mechanisms of adverse effects and

38

predict the potential toxic effects caused by environmental chemicals.1-3 The integration of

39

transcriptomics and high-throughput screening contribute to the adverse outcome pathway

40

analysis for chemical risk assessments4,5 and assist with the classification and prioritization in

41

large toxicity screening program, for instance, Tox21 and the human toxome project.6-7

42

High throughput transcriptomic approaches have received more and more attention recently,

43

however, several issues still limit their extensive utilizations in general toxicology. One major

44

concern that significantly hampered the progress is the lack of data mining tools and advanced

45

knowledge discovery.8 Due to the data complexity, most of the transcriptomics-based toxicology

46

studies still focus on the signaling pathways and biomarker genes related to the already known

47

adverse effects of the toxicants, which helps to explain the molecular mode of actions of the

48

toxic outcomes observed at higher organizational levels (e.g. the histological and physiological

49

toxicities),9,10 while other meaningful molecular responses were either overlooked or ignored as

50

not directly relevant to the expected mode of actions. For example, bisphenol A (BPA), with

51

weak estrogenic activities, has been classified as an endocrine disruptor and a “substance of very

52

high concern”.11 Transcriptomics highlighted its dysregulations on endocrine systems such as on

53

the receptor binding and estrogen stimulus, steroidogenesis and cell proliferation.12,13 However,

54

the significant changes were also observed for other pathways, such as the DNA metabolic

55

process, oxidative stress, apoptosis, neurogenesis and nervous system development. These

ACS Paragon Plus Environment

Environmental Science & Technology

56

dysregulations indicate that BPA would have unanticipated toxicities on DNA repair process and

57

zebrafish neurodevelopment that are beyond the adverse outcomes on the endocrine system.

58

Meanwhile, due to the high biological complexity, many adverse outcomes in animals under

59

chemical exposure cannot be easily detected. The lack of advanced knowledge and technologies

60

to qualitatively and quantitatively characterize the traits of different phenotypes hampered the

61

progress to characterize and prioritize toxic chemicals. Especially for the nervous system, there

62

are limited strategies that were commonly applied to assess and prioritize the neuronal disruptive

63

chemicals during neurogenesis and early neuronal development, e.g. by immunohistochemistry,

64

in situ hybridization and the emerging behavioral testing (e.g. locomotor activities in teleost).14,15

65

Transcriptomics approaches can help to predict these unanticipated toxic effects, however, an

66

effective data interpretation is still hard to perform due to the limited knowledge of gene-

67

function relationships, regulatory mechanisms and the immature data mining tools.8,14 For

68

instance, functional enrichment analysis of transcriptome can clarify the dysregulated biological

69

pathways and processes, such as the altered mRNA metabolic process, cellar protein catabolic

70

process and chromosome organization caused by silver ions.9 Nevertheless, limited perspectives

71

can be obtained based on these general biochemical alterations and it is still difficult to uncover

72

the related toxic outcomes at higher organizational levels. Thus, the challenges on data analysis

73

and interpretation call for novel methodological strategies, especially for the complex nervous

74

system where the toxicity outcomes were hard to determine qualitatively and quantitatively.

75

Another obstacle confronting the utilization of transcriptomics is the relatively high cost. In

76

spite of falling prices, transcriptomic profiling remains expensive to perform, especially for a

77

large scale toxicological research, such as the high throughput screening for chemical clustering

78

and prioritization. In recent years, alternative strategies were developed with substantially

ACS Paragon Plus Environment

Page 4 of 36

Page 5 of 36

Environmental Science & Technology

79

reduced costs.16-19 Of them, a reduced transcriptome approach was increasingly used. It includes

80

a small number of genes as a gene panel which can effectively represent the whole-genome

81

expressions, such as the L1000 landmark genes18 and 917 human pathway reporter gene panel.19

82

In this way, the activity of entire signaling networks can be assessed based on these established

83

key regulators/targets rather than the whole transcriptome.19 Especially, the LINCS (NIH library

84

of integrated network-based cellular signatures) program as well as the related L1000 expression

85

profiling assay has generated a large-scale public dataset that indicates how human cells respond

86

to various genetic and environmental stressors, which already included over 27,000 perturbagens

87

and produced over 1.3 million L1000 profiles.20,21 As a powerful and effective tool, the LINCS

88

program greatly accelerated the drug discovery. For toxicology, a human S1500 sentinel gene set

89

were generated from 3339 gene expression series by Tox21 program (Toxicology Testing in the

90

21st Century) recently with full pathway coverages. It can be used to accurately predict pathway

91

perturbations and biological relationships for samples under study.22 Similarly, a reduced human

92

transcriptome (RHT) approach and a reduced zebrafish transcriptome (RZT) approach were

93

sequentially developed via the integration of 1200 human Entrez genes and 1637 zebrafish

94

Entrez genes, respectively, to comprehensively represent the biological pathways and

95

toxicologically relevant processes.23,24 They were proven to be promising for profiling the

96

molecular pathways that modulated by reference chemicals and the water samples ranging from

97

wastewater to drinking water.23,24 The enriched pathways in RHT and RZT analysis could be

98

used to evaluate and prioritize chemicals for future assessment. Specifically, the most sensitive

99

biological pathways could be linked with adverse outcomes at higher organizational levels and

100

could identify the early molecular responses for adverse outcome pathway (AOP) construction

101

and assess the hazard potencies of environmental toxicants in early developmental stages.23,24

ACS Paragon Plus Environment

Environmental Science & Technology

102

On this basis, the aim of present study was to develop a Reduced Transcriptome Atlas (RTA)

103

by integrating a large number of transcriptomic datasets with a specific gene panel (L2000) to

104

comprehensively represent the neurogenesis and early neuronal development in zebrafish. This

105

approach was aimed to uncover and prioritize the potential neurodevelopmental toxicants by in-

106

depth transcriptomic data mining and interpretation. Meanwhile, a specific gene set was aimed to

107

be generated to effectively characterize and prioritize chemicals as L2000 and therefore serves to

108

establish a cost-effective reduced zebrafish transcriptome approach that specifically targets the

109

nervous system.

110 111

Materials and Methods

112

RTA gene panel development. The RTA gene panel was aimed to comprehensively represent

113

the biological pathways and processes involved in neurogenesis and early neuronal development

114

in zebrafish (Danio rerio). Thus, we propose three criteria to get a maximum coverage in the

115

present study: Representing the entire neural networks, maximal coverage of biological pathways

116

and processes and integrating novel neural factors (Figure 1). Our current knowledge and large-

117

scale signaling pathway databases provide us the opportunity to retrieve most of the currently

118

known neural factors involved in neural networks and signaling pathways. With a comprehensive

119

survey of the up-to-date literatures, novel annotated neural genes can be further integrated.

120

Therefore, at first, we constructed a general framework for the neural signaling pathways and

121

related genes based on the current knowledge,25-27 which included most of the classic extrinsic

122

and intrinsic factors, such as neurotransmitters and growth factors. Then, the associated genes

123

within each signaling pathway were curated and integrated from two databases: Kyoto

124

Encyclopedia of Genes and Genomes (KEGG) database and Molecular Signatures Database

ACS Paragon Plus Environment

Page 6 of 36

Page 7 of 36

Environmental Science & Technology

125

(MSigDB). For the pathways generated for other species, like Homo sapiens and Mus musculus,

126

the corresponding zebrafish homologous genes were retrieved from Ensembl zebrafish genome

127

database (http://useast.ensembl.org/Danio_rerio/Info/Index).

128

The Gene Ontology (GO) dataset from the European Bioinformatics Institute (EMBL-EBI)

129

and MSigDB helps to uncover more neural factors. Nervous system development (GO Biological

130

Process, GO: 0007399) and the child GO terms, such as neurogenesis (GO: 0022008) and neural

131

tube development (GO:0021915), provide a large set of genes involved in neurogenesis and early

132

neuronal development. This dataset was further reviewed and clustered. The neural genes beyond

133

the classic pathways described above were further integrated manually. Meanwhile, the gene sets

134

from Mammalian Neurogenesis Gene Ontology (MANGO), a database of genes mapped to adult

135

neurogenesis via a comprehensive survey of the literatures,28 were also checked to include some

136

novel annotated neural genes. At last, all genes were integrated and clustered manually and the

137

related detailed annotations were extracted from Ensembl database and Genebank database using

138

R (version 3.4.3). This information was stored in FASTA format. In total, the gene panel in the

139

present study contained 2015 genes (termed L2000) (Table S1), which were divided into six

140

groups: extrinsic morphogens, growth factors, neurotransmitters and synaptic vesicle cycle,

141

intrinsic transcription factors, cytoskeletal proteins and epigenetic regulators. At last, the

142

representation of RTA gene panel on neural development was further validated by the enrichment

143

analysis of KEGG_Pathway and GO_Biological Process on the online bioinformatics resources

144

database: The Database for Annotation, Visualization and Integrated Discovery (DAVID).

145

Datasets collection. The basic criteria for data inclusion were as follows: (I) Exposure durations

146

representing the neurogenesis processes/early neuronal development. In zebrafish embryos, the

147

brain ventricles have formed by 48 hours post-fertilization (hpf). Neurogenesis peaks between 18

ACS Paragon Plus Environment

Environmental Science & Technology

148

hpf and 36 hpf.29 Therefore, the exposures covered the 0-48 hpf neurogenesis or exposures after

149

the first 48 hpf to represent the early neuronal development were selected for the present study.

150

(II) Studies have appropriate experimental setups. Treatment groups and vehicle controls should

151

have sufficient replicates for the statistical analysis. (III) Complete raw/normalized dataset with

152

sufficient annotation. The raw data or normalized dataset should be completed and can be

153

retrieved from public repositories or related literature. Annotation repository should be supplied

154

with sufficient information to cover all gene probes (Figure 1). The criteria for data inclusion

155

basically followed the approaches proposed by Ramasamy et al.30 and the MIAME-Guidelines.31

156

A database query was then conducted in the publicly accessible NCBI Gene Expression

157

Omnibus (GEO) and EMBL-EBI ArrayExpress online repositories. Datasets satisfying the above

158

restrictions were manually selected and integrated. Besides, the PubMed literature database was

159

also employed to search for relevant studies. Dataset was included if the criteria met. In total, the

160

searching resulted in 27 studies, which contain 74 environmental chemicals and 736 related gene

161

expression profiles (SI Table S1).

162

Data normalization and analysis. The raw data and platform information were downloaded

163

from GEO or Array Express. For data normalization, if available, the processed signal intensity

164

was employed. The raw data were log2 transformed and then the sample qualities were checked

165

using R package arrayQualityMetrics. The differentially expressed gene (DEG) was identified

166

with a moderated t-test using R-package “limma”. Cutoff criterium for DEGs was an adjusted P-

167

value of < 0.05. Because the analyzed studies are usually highly variable in regard to the

168

experimental setup, such as the exposure time points and the applied concentrations, the Log2

169

fold change (LogFC) values were not applied for DEGs identification to avoid a biased impact

170

on the analysis and to obtain sufficient gene expression information. From each platform file, the

ACS Paragon Plus Environment

Page 8 of 36

Page 9 of 36

Environmental Science & Technology

171

full annotation information, such as the probe IDs, Ensembl IDs, Genebank associate numbers,

172

gene names and symbols, chromosomal locations were extracted and stored in FASTA format.

173

The annotation repository was then mapped against the DEGs generated in each related dataset.

174

The final dataset with DEGs annotations was then stored for the RTA construction.

175

RTA Platform Construction. After obtaining the RTA gene panel (L2000), the full annotations

176

for each gene were retrieved from Ensembl database and Genebank database, which include the

177

Ensembl gene ID, Ensembl transcript ID, Entrez Gene ID, Genebank associate number, official

178

symbol, official full name, synonyms and chromosome location. This annotation repository was

179

then imported into R (version 3.4.3), which was used to compare with the annotations of DEGs

180

in each transcriptomic dataset to characterize and cluster the hit neural genes. The complete atlas

181

was assembled based on 135 gene sets with differentially expressed neural genes. Afterwards, it

182

was further reviewed and corrected manually to avoid the potential mismatched or overlapped

183

genes in each dataset.

184

Chemical Prioritization and Functional Enrichment Analysis. To discriminate the most likely

185

neurotoxic substances, several criteria were set up for the chemical prioritization analysis. It

186

included: (I) The ratio of differentially expressed neural genes (DEG-N) compared to the whole

187

set of DEGs, (II) Gene expression intensity (expression levels) of differentially expressed neural

188

genes responses to each chemical, (III) Gene expression patterns to cluster the different types of

189

chemicals. Chemicals with higher DEG-N proportions, higher gene expression levels or

190

displaying similar gene expression patterns indicated a higher neurotoxic potential. This strategy

191

was similar to those reported in previous large-scale transcriptome analysis, in which gene

192

expression intensity and hierarchical clustering were employed to characterize and group the

193

toxic chemicals.32,33

ACS Paragon Plus Environment

Environmental Science & Technology

194

For functional enrichment analysis, we divided the transcriptomes into two sets with respect

195

to the effect concentrations to derive biologically meaningful information as described below.

196

Only the exposures with reported visible effects, sublethal concentrations or pharmacological

197

concentrations were included. We performed the functional annotation and gene ontology (GO)

198

clustering by use of DAVID (The Database for Annotation, Visualization and Integrated

199

Discovery) Bioinformatics Resources 6.8. The enriched KEGG pathways and Gene Ontology

200

terms (Biological Process (BP), Cellular Component (CC) and Molecular Function (MF)) were

201

demonstrated by analyzing the differentially expressed neural genes generated in RTA gene set.

202

Hierarchical clustering map was constructed by use of the MultiExperimental Viewer v4.9

203

(Dana-Farber Cancer Institute, Boston, MA, USA) with the default distance metric (Pearson

204

Correlation) and average linkage clustering. Bootstrapping with 100 replicates was used to

205

generate support trees. Venn diagrams were generated using Venny 2.1.0. Chord diagram and

206

bubble plot were implemented using R (version 3.4.3) together with the package GOplot (version

207

1.0.2).

208

Gene Expression Index. Considering that the overlapping and gene expression intensities were

209

the major contributors for the development of reduced transcriptome atlas, as described in the

210

S1500 platform construction,22 a similar approach was performed for reduced zebrafish

211

transcriptome construction in the present study. The gene expression index (GEI) was developed

212

to score and rank each gene by multiplying the overlapping frequency by the averaged gene

213

expression intensity: GEI Score = (N/81)*(S/81). N represents the number of occurrence of each

214

gene as a DEG; S represents the summed absolute value (Log2FC); 81 represents the number of

215

exposure signatures.

216

ACS Paragon Plus Environment

Page 10 of 36

Page 11 of 36

Environmental Science & Technology

217

Results and Discussion

218

RTA Approach Development. Following the workflow diagram (Figure 1a), the RTA approach

219

was developed by integrating a large number of transcriptomic datasets and a specific gene panel

220

to comprehensively represent the neurogenesis and early neuronal development in zebrafish

221

(Danio rerio). The developed RTA gene set (termed L2000) consists of 2015 zebrafish Entrez ID

222

genes, which include 1236 extrinsic genes and 779 intrinsic genes. The extrinsic factors include

223

the vast majority of biological pathways and processes involved in neuronal development, such

224

as the extrinsic morphogens (e.g. Wnt, Notch, SHH, BMP and Ephrin signaling pathways),

225

growth factors (e.g. VEGF, IGF, FGF, Neurotrophin and Neuregulin), neurotransmitters (e.g.

226

dopamine, GABA, glutamate, acetylcholine, serotonin) and synaptic vesicle cycle process. The

227

cell-intrinsic factors include most known transcription factors, cytoskeletal proteins and other

228

regulators clarified in recent literature. Besides, some epigenetic regulators have recently been

229

shown to be crucially involved in modulating neurogenesis were also added, e.g. genes involved

230

in DNA methylation, demethylation and histone modifications (Figure 1b).

231

This RTA approach aggregated transcriptomic datasets from 27 studies, which contain 74

232

environmental chemicals and in total 736 related gene expression profiles. The gene expression

233

signatures for different exposure conditions, generated as the genes within the L2000 RTA gene

234

panel that expressed differentially in exposure fish samples compared to control group, were then

235

obtained. In total, 135 gene expression signatures were generated and each signature contained a

236

number of differentially expressed neural genes (DEG-N) range from 1 to 1316 with a specific

237

expression pattern. Chemicals included in the present study can be roughly divided into eight

238

categories: environmental hormones, polycyclic aromatic hydrocarbons, pharmaceuticals, food

239

additives, biocides (insecticides, herbicides, fungicides and antibiotics), industrial compounds,

ACS Paragon Plus Environment

Environmental Science & Technology

240

microcystins, heavy metals and related nanoparticles. The duration of exposure for each

241

chemical was depicted in Figure S1. In total, 91 exposures were integrated, including 46

242

exposures covered the fertilization to 48 hpf embryo developmental stage and 45 other exposures

243

range from fertilization to 7 days post-fertilization.

244

Chemical Prioritization. The DEG-N proportions were first checked. As shown in Figure 2a,

245

the ratios of DEG-N compared to the L2000 gene set were varied among chemicals, range from

246

65.3% to < 0.2%, while the ratios of DEG-N compared to the whole set of DEGs were relatively

247

stable, ranging from 12.6% to 1.2%. The high variations when compared to L2000 were mainly

248

due to the chemical types and the applied concentrations. The surprisingly stable ratios when

249

compared to whole DEGs set, mostly at about 5% to 8%, indicate that most of the chemicals had

250

background dysregulations on neural genes and were possibly due to their general adverse effects

251

on embryo development. The prioritized chemicals with higher DEG-N proportions (>8%) were

252

found to be estrogenic hormones (GEN, BPA), PAHs-like (TCDD, Dibenzothiophene),

253

microcystins (MCLR, CP1020), a neuro-drug (Fluoxetine) and a heavy metal (CoSO4).

254

Similarly, the gene expression intensity analysis based on Log2 FC value identified less than 6%

255

(8/135) chemicals with Log2 FC>3, i.e. xenoestrogens (EE2, GEN), microcystins (MCLR), a

256

neuro-drug (VPA) and heavy metals (AgNO3, AgNPs) (Figure 2b).

257

Hierarchical clustering map depicts 135 exposure signatures for the L2000 gene set (Figure

258

2c). Of these signatures, only a small part displayed clear similar gene expression patterns. They

259

can be roughly divided into three sets: (a) the highest prioritization occurred for pharmaceuticals,

260

i.e. VPA, CAR, retinoic acid (RA) and caffeine (CAFF). They are all neuro-drugs or neuroactive

261

compounds. (b) the second prioritized group consisted of extrinsic environmental hormones,

262

including the classic xenoestrogens, like EE2, GEN, BPA, 17β-estradiol (E2), and others such as

ACS Paragon Plus Environment

Page 12 of 36

Page 13 of 36

Environmental Science & Technology

263

pentachlorophenol, triclosan, propiconazole and diisobutyl_phthalate. (c) the third group

264

consisted of microcystins and heavy metals, including CP1020, AgNO3 and the related silver

265

nanoparticles (Ag50, Ag150 and PVP) (Figure 2d).

266

Therefore, the chemical prioritization analyses provides a list of hit chemicals that displayed

267

remarkable dysregulations on genes involved in neurogenesis and early neuronal development in

268

zebrafish embryos. The most frequently occurred substances were: neuro-drugs and neuroactive

269

compounds (representatively, VPA and CAR), xenoestrogens (representatively, BPA, GEN and

270

EE2), microcystins (CP1020 and MCLR) and heavy metal forms (AgNO3 and AgNPs). VPA is a

271

broad-spectrum antiepileptic drug that has been used in the clinic for more than 30 years. It is

272

effective for many different forms of partial and generalized epileptic seizure and is prescribed to

273

treat bipolar disorders, schizoaffective disorders, social phobias, as well as the neuropathic pain

274

and migraine headache.34 Similarly, CAR was first marketed in 1963 and is typically used for the

275

treatment of seizure disorders and neuropathic pain. It is also very important as off-label for a

276

second-line treatment for bipolar disorder.35 These compounds have been frequently detected in

277

the aquatic environment and have been non-surprisingly proven to be neurotoxic to animals like

278

zebrafish. They disrupted behavioral activities with similar patterns such as the acceleration of

279

spontaneous movement, touch response and light stimulation response in zebrafish eleuthero-

280

embryos and even showed the persisting behavioral alterations later in life.36,37 Xenoestrogens

281

like BPA, EE2 and GEN exhibit estrogen-mimicking, hormone-like properties that raise concern

282

about their endocrine disrupting activities.38,39 In recent years, the neurotoxicity of endocrine

283

disruptive chemicals, especially the xenoestrogens, has been extensively discussed.38,39 Animal

284

studies demonstrated that xenoestrogens, such as BPA and EE2, can produce adverse outcomes

285

in rearing behavior, locomotive activity, anxiety, learning and memory, as well as the neuronal

ACS Paragon Plus Environment

Environmental Science & Technology

286

abnormalities.39 Some evidence further clarified that BPA exerts effects on neurogenesis within

287

zebrafish hypothalamus and therefore caused the later hyperactive behaviors in zebrafish

288

larvae.40 In contrast, fewer reports discussed the neurotoxicity of GEN, while its neuroprotective

289

effects against the neurotoxic amyloid β-peptide and heavy metals have been demonstrated

290

widely.41 These results were consistent with the transcriptomic data interpretations in the present

291

study that xenoestrogens like BPA, EE2 and GEN would most likely to be neurotoxic substances.

292

Microcystins as the secondary metabolites are produced via cyanobacterial blooms. Their toxic

293

targets are not only the liver but also other organs, i.e., the brain.42 The serious neurotoxicity

294

effects caused by microcystins can lead to various symptoms, such as dizziness, headache, visual

295

disturbances and even blindness. In zebrafish, microcystins like MCLR and CP1020 can cross

296

the blood brain barrier via the uptake transporters43 and cause neurostructural and behavioral

297

changes mainly through oxidative stress and the inhibition of protein phosphatases.44 Similarly,

298

the neurotoxicity effects of AgNO3 and silver nanoparticles have also been investigated widely.

299

The common phenomenon includes the reduced neurite formation and outgrowth, impaired cell

300

membrane integrity, blood-brain barrier dysfunction, cell necrosis, neurodegeneration, oxidative

301

stress and apoptosis.45,46 The induced oxidative stress and cell apoptosis by increasing reactive

302

oxygen species and caspase activity were generally considered to be the primary target of AgNO3

303

and silver nanoparticles exposure.45,46 This molecular mode of actions were similar to that for

304

microcystins, which was also in agreement and consistent with the hierarchical clustering where

305

microcystins (CP1020) and AgNO3 and AgNPs built the third group (Figure 2d).

306

It should be noted that during the transcriptomic data analysis, we observed the chemical

307

concentration was a key factor to effectively characterize and cluster chemicals. The exposures

308

with low concentrations usually dysregulated less DEGs and were not be able to show clear

ACS Paragon Plus Environment

Page 14 of 36

Page 15 of 36

Environmental Science & Technology

309

molecular mode of actions and the expression patterns during the hierarchical clustering. On the

310

contrary, the exposures with relatively high concentrations (such as pharmacological or lethal

311

concentration) generally did well. This phenomenon was quite consistent with the observations

312

in a similar report previously on zebrafish transcriptome meta-analysis.32 Thus, to be able to

313

derive biologically meaningful information of different substances via transcriptome analysis,

314

only the exposures with reported inducing visible effects, sublethal concentrations or

315

pharmacological concentrations were included in the following functional enrichment analysis,

316

as described above.

317

Functional Enrichment Analysis. To systemically characterize the hit chemicals, the DEGs

318

were subjected to DAVID for KEGG pathway and Gene Ontology enrichment analysis. The top

319

five KEGG pathways and GO Biological Processes enriched for each of the 12 hit chemicals

320

(VPA, CAR, Fluoxetine (FLO), BPA, EE2, GEN, CP1020, MC-LR, AgNO3, Ag150, Cobalt and

321

TCDD) were depicted (Figure 3a, S2). VPA and CAR, the commonly administered antiepileptic

322

drugs, showed very similar dysregulations on the KEGG pathways, four of the top five pathways

323

were identical between them. They were Wnt signaling, GnRH signaling, MAPK signaling and

324

Melanogenesis. In contrast, the antidepressant drug of the selective serotonin reuptake inhibitor

325

(SSRI) class, FLO, displayed a distinct dysregulation. Only the weak dysregulated Wnt signaling

326

and GnRH signaling were identical with others. BPA and EE2 also showed similar patterns, with

327

three of the top five pathways identical, i.e. Wnt signaling, GnRH signaling, and Notch signaling.

328

Interestingly, the enriched pathways of BPA and EE2 also resembled the neuro-drugs (VPA and

329

CAR). On the contrary, the microcystins and heavy metal forms showed quite different patterns

330

compared to the neuro-drugs and xenoestrogens. The enriched and identical Wnt signaling and

331

TGF-beta signaling was observed for CP1020 and MCLR and the enriched and identical Wnt

ACS Paragon Plus Environment

Environmental Science & Technology

332

signaling, VEGF signaling and Adrenergic signaling was observed for AgNO3 and AgNP-150. In

333

comparison, the GO_ Biological Processes were varied among the hit chemicals (Figure S2).

334

It should be noted that Wnt signaling was the most enriched pathway and biological process

335

among the hit chemicals, which was not surprising because it is indispensable in virtually every

336

aspect of embryonic development.47 However, the in-depth analysis of L2000 displayed quite

337

distinct dysregulations on Wnt signaling components as well as other pathways among different

338

types of chemicals. The transcriptional changes of the representative extrinsic signaling (i.e.

339

morphogens, growth factors, neurotransmitters and synaptic vesicle cycle) response to four types

340

of chemicals (VPA, BPA, MCLR and Ag150) is depicted in Figure 3b. For VPA (Figure 3b),

341

almost all signals were remarkably dysregulated; the ratio ranges from 54.2% (13/24, BMP

342

signaling) to 86.4% (19/22, FGF signaling). For example, 71.1% (59/83) genes involved in Wnt

343

signaling were significantly changed, with 31 genes up-regulated (representatively, sfrp2, wif1

344

and nlk2) and 28 genes down-regulated (representatively, apc2, wnt4b, gsk3b and ctnnd2b). This

345

result indicated that VPA had comprehensive adverse effects on the signals of neuronal

346

development in zebrafish embryos. In contrast, BPA had a specific enrichment on the extrinsic

347

signals. Compared to the enriched Wnt signal (15.6%, 13/83), the enrichment on

348

neurotransmitters (Glutamate signaling, 48.9%, 22/45) and synapse vesicle cycle process (52.3%,

349

46/88) was much stronger. For MCLR, it displayed less effect on most of the extrinsic signals

350

such as growth factors and neurotransmitters, the weak enrichments were only found for Wnt

351

signaling (26.5%, 22/83) and synapse vesicle cycle (22.7%, 22/88). For AgNP-150, multiple

352

signals were targeted. The highest enrichments were found for Neurotrophin signaling (53.2%,

353

25/47), BMP signaling (45.8%, 11/24) and Dopamine signaling (43.3%, 13/30). It should be

354

noted that the enriched signaling was sometimes identical among chemicals (as shown in KEGG

ACS Paragon Plus Environment

Page 16 of 36

Page 17 of 36

Environmental Science & Technology

355

pathways), such as the Wnt signaling and the synapse vesicle cycle process; however, the

356

detailed regulation patterns of targeted genes within each signaling were quite distinct (Figure

357

3b).

358

VPA, as a representative chemical, was further analyzed to better understand its potential

359

adverse effects on neurogenesis and early neuronal development (Figure 3c). The bioactivity

360

profile of VPA was shown as the radial pie diagram where the length of the slice indicates the

361

percentage of DEG-N/RTA gene panel for each signaling and the angle of the slice indicates the

362

total number of changed genes for a specific category term in the annotation level (e.g. Wnt

363

signaling, Glutamate signaling and synapse vesicle cycle). In total, 28 extrinsic neural signaling

364

pathways, cell-intrinsic transcription factors and epigenetic regulators were integrated. As shown

365

in Figure 3c, VPA had a significant effect on neural signaling pathways. Almost all signals were

366

remarkably dysregulated. The top five enriched signaling pathways were FGF signaling (86.4%),

367

GABA signaling (73.3%), Neurotrophin signaling (72.3%), Wnt signaling (71.1%) and Ephrin

368

signaling (69.7%) (Figure 3c). In-depth data analysis on the Wnt signaling demonstrated that

369

VPA is a potent inhibitor. Besides the up-regulations on the key inhibitory factors (e.g. wif1,

370

cer1, nlk, sfrp1a and sfrp2), VPA suppressed most essential components involved in canonical

371

Wnt signaling, such as wnt1, wnt2b, lrp5, gsk3aa, ctnnb1, tcf7l1b, tcf7l2, axin1 and apc2 (Figure

372

S3). Especially for apc2, the fold-changes were found to be four times higher compared with the

373

control. In contrast, the high enriched FGF signaling pathway was apparently activated by VPA

374

(Figure S4). Accompanied with the up-regulation on multiple FGF factors like fgf3, fgf8a and

375

fgf14, the transcriptional levels of FGF receptors (fgfr1a, fgfr1b and fgfr1bl) and downstream

376

components such as grb2a, grb2b, sos1, rras2, mras and mapk1 were significantly increased

377

(Figure S4). Wnt signaling and FGF signaling serve many functions in embryonic developmental

ACS Paragon Plus Environment

Environmental Science & Technology

378

processes such as cellular proliferation, differentiation and morphogenesis,47 therefore, the

379

significant alterations on these signals indicated that VPA had multiple adverse effects on

380

embryonic development that was beyond the nervous system. We further analyzed the

381

regulations for GABA signaling (Figure S4). Similar to Wnt signaling, most of the essential

382

components participated in GABAergic synaptic process were inhibited, especially the GABAA

383

and GABAC receptors such as gabra1, gabrb1, gabrg2, gabrr1 and gabrr2b. Because GABAA

384

and GABAC receptors are ionotropic and mediate fast GABA responses by triggering chloride

385

channel openings,48 the significant inhibitions indicated that VPA may affect the fast GABA

386

signal transductions by the blockage of chloride channels (Figure S4).

387

In addition, the differentially expressed neural genes generated in RTA approach were also

388

used to analyze the overlap within each type of chemical (Figure 4, S5 and S6). Venn diagram

389

shows the overlapping genes among the four neuroactive chemicals (VPA, CAR, RA and CAFF)

390

(Figure 4a) and three xenoestrogens (BPA, EE2 and GEN) (Figure S5a). For neuroactive

391

chemicals, there were 74 overlapping genes showing significant alterations. The responses of

392

these genes appeared very similar in terms of directionality among these chemicals with only a

393

few exceptions. The top four shared biological pathways generated by DAVID were Wnt

394

signaling, MAPK signaling, Calcium signaling and VEGF signaling (Figure 4b). Shared GO

395

terms turned to be involved in early embryonic development, such as anterior/posterior pattern

396

specification and multicellular organism development (Figure 4c) and therefore indicated that

397

these chemicals have broadly adverse effects on zebrafish embryogenesis. For the three

398

xenoestrogens, there were 87 overlapping genes showing significant changes (Figure S5a). The

399

three overlapped pathways generated were Wnt signaling, cell adhesion molecules and Notch

400

signaling (Figure S5b). The shared GO terms were specific on neuronal development, such as the

ACS Paragon Plus Environment

Page 18 of 36

Page 19 of 36

Environmental Science & Technology

401

nervous system development, neuron projection, synapse and neuron projection development

402

(Figure S5c). Similarly, for microcystins, 110 overlapping genes were observed between MCLR

403

and CP1020 (Figure S6a). They were involved in Wnt signaling, TGF-beta signaling, FoxO

404

signaling and Focal adhesion and the overlapped GO terms were specific on neuronal

405

development, such as the axon guidance and neuron projection morphogenesis (Figure S6b). For

406

heavy metals, there were 99 overlapping genes among AgNO3, AgNP-150 and PVP (Figure S6c).

407

The top four overlapped biological pathways were Hedgehog signaling, TGF-beta signaling, Wnt

408

signaling and Cytokine-cytokine receptor interaction. The shared GO terms were enriched in

409

nervous system development, cytoplasm and sprouting angiogenesis (Figure S6d).

410

Construction of L300 RZT Approach. The utilization of “omics” for chemical assessment is

411

hampered by to the high expenses and the lack of “standardized” toxicogenomic methods,22,23

412

therefore, alternative strategies, such as the reduced transcriptome approach with substantially

413

reduced costs, were developed in recent years.18-24 Similar to the L1000 landmark gene panel for

414

drug discovery and the S1500 sentinel gene set for toxicological studies, here, based on the

415

L2000 gene set developed to comprehensively represent biological pathways involved in

416

neurogenesis and early neuronal development in zebrafish, we aimed to generate an efficient and

417

cost-effective neural gene panel with similar approaches for high throughput developmental

418

neurotoxicology analysis. As described above and previous study,32 exposure concentrations and

419

observation time points were the major factors that influence the gene expressions. To reduce the

420

study bias and be able to derive more biologically meaningful information from the large set of

421

different treatments, in each exposure, only the signatures with the highest concentration and/or

422

latest time point were included. They were believed to represent the treatments showing the

423

strongest and most significant neural toxicities.

ACS Paragon Plus Environment

Environmental Science & Technology

424

In total, 81 exposure signatures containing 72 environmental chemicals were included in the

425

RZT construction. This broadly representative chemical set includes eight large categories, such

426

as the environmental hormones, PAHs, pharmaceuticals, biocides and heavy metals. To capture

427

maximal variations in gene expressions and dynamics and the maximal pathway coverage, a

428

principal component analysis were employed as a first analysis step, while not surprisingly, it

429

failed to cluster the L2000 neural genes due to the large variations among exposure signatures.

430

Therefore we employed the GEI to score and rank each gene as described above. Based on this

431

GEI score, the L2000 genes were completely evaluated to generate a representative gene panel

432

with a minimum number of genes. We selected the top 100, 200, 300 and 400 genes (as L100,

433

L200, L300 and L400, respectively) and compared their representativeness on neural signaling

434

pathways and whether they can be used to cluster and prioritize chemicals in the same manner as

435

L2000. The average percentage of covered genes within each signaling pathway increased

436

steadily from L100 to L400, while the percentage of covered pathways increased rapidly at L300

437

and L400 compared to L100 and L200. We demonstrated that when the gene numbers increase to

438

300 (L300, Table S1), it can cover most neural signaling pathways (90.3%, 28/31) with a gene

439

coverage percentage of about 10%-30% among pathways (Figure S7, S8). This was similar with

440

L400 (93.5%, 29/31), which covered one more signaling pathway, neuropeptide Y. To generate a

441

representative gene panel with a minimum number of genes, L300 was therefore employed for

442

the further analysis. In L300, we found that many key regulators involved in each pathway were

443

included, such as the neurotransmitter receptors (bmpr2a, gria1a, gabrg2 and fgfr1a) and

444

transporters (slc6a1a, slc6a9 and slc32a1) and the neurogenic differentiation factors (neurod1,

445

neurod2 and neurod6a). At the same time, it was sufficient to group the chemicals and display

446

the different expression patterns among them. The sorted three clusters closely resembled those

ACS Paragon Plus Environment

Page 20 of 36

Page 21 of 36

Environmental Science & Technology

447

from L2000 clustering, i.e. the neuro-active chemicals, xenoestrogens, microcystins and heavy

448

metals (Figure 2d, 5a).

449

Chord diagram further characterized the expression patterns via L300 for four neuro-active

450

chemicals (VPA, CAR, RA and CAFF) (Figure 5b). Thirty-seven genes involved in four enriched

451

pathways (FGF signaling, GABA signaling, Neurotrophin signaling and Wnt signaling) were

452

compared. Genes participated in GABAergic synaptic process were overlapped significantly and

453

were remarkable inhibited, such as the neurotransmitter transporters slc6a1a and slc32a1 and the

454

G proteins like gng3. Genes involved in FGF signaling were also similarly among chemicals and

455

showed upregulation, such as fgf8a and the related receptor fgfr1a. In contrast, the expression

456

levels of genes involved in Wnt signaling were distinct though they were highly overlapped as

457

well. For example, the protein kinase, prkcba, was upregulated by VPA and CAR, while it was

458

downregulated by RA and CAFF (Figure 5b). Similarly, the gene expression characterizations

459

can also be efficiently clarified for the cluster of microcystins and heavy metals. Forty-eight

460

genes involved in Wnt signaling, BMP signaling, Neurotrophin signaling or Dopamine signaling

461

were compared. Most genes involved in Wnt, BMP and Neurotrophin signaling were overlapped

462

among these four chemicals, especially for AgNO3 and Ag150, and significantly upregulated

463

such as SMAD family member smad7 and the C-terminal binding protein ctbp2a. The different

464

expression patterns among chemicals were observed for genes involved in dopamine signaling,

465

such as the dopa decarboxylase ddc and the G proteins like gng3 (Figure 5c).

466

In conclusion, we developed a Reduced Transcriptome Atlas approach to characterize and

467

prioritize the potential neurotoxicants using zebrafish. We successfully demonstrated four sets of

468

hits to be neurotoxic, i.e. neuro-active chemicals, xenoestrogens, microcystins and heavy metals,

469

from the transcriptomic dataset analysis and in-depth interpretation. These results were

ACS Paragon Plus Environment

Environmental Science & Technology

470

consistent with the previous reports on their neurotoxicological effects observed at higher

471

organizational levels (e.g. histological and physiological toxicities). We further generated a gene

472

panel with 300 neural genes, which could effectively characterize and cluster the chemicals as

473

L2000 and therefore would help to establish a cost-effective RZT approach specific targeting on

474

the nervous system. The RTA approach generated in the present study would be applied for

475

prioritization of environmental neurotoxicants and discovery of the in-depth molecular mode of

476

actions. It provides novel insights into the understanding of gene-function relationships and the

477

regulatory mechanisms and would contribute to the early molecular response identification,

478

adverse outcome pathway (AOP) construction and the further ecological risk assessments. The

479

strategy employed in the present study can also be applied to prioritize other types of toxicants,

480

such as the endocrine disrupting chemicals (EDCs) and cardiovascular disrupting chemicals.

481 482

ASSOCIATED CONTENT

483

Supporting information

484

Details of chemical abbreviation; transcriptomic datasets information (SI Table S1); descriptions

485

of the duration of exposure (Figure S1); gene ontology enrichment analysis (Figure S2); striped

486

view of the enriched Wnt signaling pathway (Figure S3) and FGF signaling pathway and GABA

487

signaling pathway (Figure S4); overlapping functional enrichment for xenoestrogens (Figure S5),

488

microcystins and heavy metals (Figure S6); gene expression index ranking (Figure S7); pathway

489

coverage of reduced L300 (Figure S8). This material is available free of charge via the internet at

490

http://pubs.acs.org.

491

Table S1. L2000 and L300 gene set generated in the present study.

492

ACS Paragon Plus Environment

Page 22 of 36

Page 23 of 36

Environmental Science & Technology

493

AUTHOR INFORMATION

494

Corresponding author

495

Tel.: +1 857 307 0318; E-mail: [email protected]

496 497

Notes

498

The authors declare no competing financial interest.

499 500

ACKNOWLEDGMENTS

501

We thank Nadja Brun (Woods Hole Oceanographic Institution, MA), Karl Fent (University of

502

Applied Sciences and Arts Northwestern Switzerland and Swiss Federal Institute of Technology

503

(ETH Zürich)) and Zaniar Ghazizadeh (Harvard Medical School, MA) for critically reading the

504

manuscript and making valuable suggestions.

ACS Paragon Plus Environment

Environmental Science & Technology

505

Figure Legends

506

Figure 1. Construction of the Reduced Transcriptome Atlas (RTA) Approach in Zebrafish.

507

(A) Design and the workflow of the RTA approach development. (B) L2000 gene set represents

508

the extrinsic and intrinsic factors involved in neurogenesis and early neuronal development of

509

zebrafish.

510 511

Figure 2. Chemical Prioritization through RTA Approach. (A) Ratio of RTA genes compared

512

to the L2000 gene set (left y-axis, red bar graph) and ratio of RTA genes compared to the whole

513

number of differentially expressed genes (right y-axis, blue dots) for each exposure signature.

514

Top 9 chemicals with the highest ratio of RTA genes/total genes are displayed on the right. (B)

515

Total gene expressions (Log2 fold change (Log2FC) of the different expressed gene) for each

516

signature. Top 8 chemicals with highest gene expression levels are displayed on the right. (C)

517

Hierarchical clustering of L2000 gene transcriptional responses for 135 exposure signatures.

518

White boxes in the figure indicate the similar expression patterns among chemicals. The colored

519

boxes on the right (C1, Yellow; C2, Green and C3, Blue) indicate the chemical groups displayed

520

on the subfigure (D). (D) Detailed information on the chemical clusters (Cluster 1, yellow:

521

neuro-active chemicals; Cluster 2, green: xenoestrogens; Cluster 3, blue: heavy metals and

522

microcystins) sorted by the hierarchical clustering.

523 524

Figure 3. Functional Enrichment Analysis of the Twelve Hit Chemicals. (A) KEGG pathway

525

enrichment. Top five KEGG pathways are displayed for each of the twelve chemicals (Valproic

526

acid (VPA), Carbamazepine (CAR), Fluoxetine (FLO), Bisphenol A (BPA), 17-α Ethinylestradiol

527

(EE2), Genistein (GEN), Cyanopeptolin (CP1020), Microcystin-LR (MCLR), AgNO3, Ag150,

ACS Paragon Plus Environment

Page 24 of 36

Page 25 of 36

Environmental Science & Technology

528

Cobalt

and

2,3,7,8-Tetrachlorodibenzo-p-dioxin

529

transcriptional

530

neurotransmitters and synaptic vesicle cycle) under four types of chemicals exposure (VPA,

531

BPA, MCLR and Ag150). (C) Polar area diagram shows the enriched neural signaling pathways

532

in zebrafish embryos under VPA exposure. The top 10 enriched pathways are described on the

533

right. The percentage value shows the proportion of different expressed genes compared to the

534

related gene panel for each signaling pathway recruited for the present study.

responses

of

key

extrinsic

(TCDD)). signaling

(B)

Heatmap

(morphogens,

depicts

growth

the

factors,

535 536

Figure 4. Overlapping Functional Enrichments of the Neuro-active Chemicals. (A) Venn

537

diagram shows the overlap of different expressed RTA genes relative to control among the four

538

neuro-active chemicals (VPA, CAR, RA and CAFF). (B) The top four shared KEGG pathways

539

are shown at below. (C) Overlapped gene ontology terms (BP: Biological Process; CC: Cellular

540

Component; MF: Molecular Function) among the four neuroactive chemicals shown in a bubble

541

plot using the Goplot package. Top five GO terms with adjusted p-value