Subscriber access provided by Nottingham Trent University
Omics Technologies Applied to Agriculture and Food
Gene Coexpression Networks Reveal Key Drivers of Flavonoid Variation in Eleven Tea Cultivars (Camellia sinensis) Chao Zheng, Jian-Qiang Ma, Jie-Dan Chen, Chun-Lei Ma, Wei Chen, Ming-Zhe Yao, and Liang Chen J. Agric. Food Chem., Just Accepted Manuscript • DOI: 10.1021/acs.jafc.9b04422 • Publication Date (Web): 12 Aug 2019 Downloaded from pubs.acs.org on August 13, 2019
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 39
Journal of Agricultural and Food Chemistry
Gene Coexpression Networks Reveal Key Drivers of Flavonoid Variation in Eleven Tea Cultivars (Camellia sinensis) Chao Zheng, Jian-Qiang Ma, Jie-Dan Chen, Chun-Lei Ma, Wei Chen, Ming-Zhe Yao*, Liang Chen* Key Laboratory of Tea Biology and Resources Utilization, Ministry of Agriculture and Rural Affairs, Tea Research Institute of the Chinese Academy of Agricultural Sciences, Hangzhou, China * Correspondence: Liang Chen,
[email protected]; Ming-Zhe Yao,
[email protected] Tel: +86 571 86652835, Fax: +86 571 86653866
Chao Zheng:
[email protected]; Jian-Qiang Ma:
[email protected]; Jie-Dan Chen:
[email protected] Chun-Lei Ma:
[email protected]; Wei Chen:
[email protected].
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 2 of 39
1
Abstract
2
Following the recent completion of the draft genome sequence of the tea plant, high-throughput
3
decoding of gene function, especially for those involved in complex secondary metabolism pathways
4
has become a major challenge. Here, we profiled the metabolome and transcriptome of eleven tea
5
cultivars, and then illustrated a weighted gene coexpression network analysis (WGCNA) based system
6
biology strategy to interpret metabolomic flux, predict gene functions, and mining key regulators
7
involved in the flavonoid biosynthesis pathway. We constructed a multilayered regulatory network,
8
which integrated gene coexpression relationship with the microRNA target and promoter cis-
9
regulatory element information. This allowed revealing new uncharacterized TFs (e.g., MADSs,
10
WRKYs, and SBPs) and microRNAs (including 17 conserved and 15 novel microRNAs) that
11
potentially implicate in different steps of the catechin biosynthesis. Furthermore, we applied
12
metabolic-signature-based association method to capture additional key regulators involved in
13
catechin pathway. This provides important clues for the functional characterization of five SCPL1A
14
acyltransferase family members, which might implicate in the production balance of anthocyanins,
15
galloylated catechins, and proanthocyanins. Application of ‘omics’-based system biology strategy
16
should facilitate germplasm utilization and provide valuable resources for tea quality improvement.
17 18
KEYWORDS: Camellia sinensis, catechin biosynthesis, metabolomic flux, weighted gene
19
coexpression network (WGCNA), widely targeted metabolomics, functional genomics
ACS Paragon Plus Environment
Page 3 of 39
Journal of Agricultural and Food Chemistry
20
Introduction
21
Plants including tea are genetically very diverse groups and present numerous types of species,
22
cultivars, genotypes and accessions, occurring worldwide. The food industry is currently searching for
23
new functional foods and nutraceuticals to help meet the demand presented by the consumers of natural,
24
immunity-boosting and health-promoting plant-based food products.1-2 Tea plant (Camellia sinensis
25
(L.) O. Kuntze), an important beverage crop, produces numerous characteristic components such as
26
theanine, caffeine, and catechins that are estimated to have various beneficial effects.3 Recently, several
27
special albino (e.g., ‘Anji Baicha’ and ‘Huangjinya’) and purple-leaf (e.g., ‘Zijuan’ and ‘Sunrouge’)
28
tea cultivars have been discovered and developed in many tea-growing countries.4-5 In comparison to
29
green-leaf tea cultivars, the concentration of anthocyanins is significantly higher in the new shoots of
30
purple-leaf tea cultivars;4 while the albino tea cultivars accumulate a higher quantity of amino acids
31
and lower levels of caffeine and catechins in their new shoots.5 The change in the intricate balance of
32
these secondary metabolites will greatly influence the flavor of tea. However, the characterization of
33
genes involved in the production of these secondary metabolites is still lagging behind, which
34
constitutes a major obstacle for metabolic engineering in tea plants. Most recently, one of the greatest
35
achievements of tea plant biology is the completion of the draft genome sequence, which will greatly
36
facilitate understanding of tea metabolite pathways and accelerate tea breeding progress.6-7 In tea
37
plants, most of the genes were functionally annotated based on sequence similarity analysis against
38
the known genes in public databases, with a few of them has been validated with direct experimental
39
evidence. Currently, the functional elucidation of unknown genes in tea plants, especially those
40
involved in secondary metabolism, is therefore, a critical challenge. Based on the sequence annotation
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 4 of 39
41
of the C. sinensis genome, a substantial number of genes are assigned to large enzyme families,
42
including 22 SCPL1A acyltransferases genes, 30 terpene synthase genes, 148 ABC protein genes, 226
43
cytochrome P450 genes, and 267 glycosyltransferase genes, which are thought to play vital roles in
44
the transport, degradation, synthesis, and modification of particular metabolites in tea plants. 7
45
Although extensive efforts have been made to reveal the molecular mechanism of the production of
46
three major characteristic secondary metabolites (i.e., flavonoids, caffeine, and theanine) in tea plants
47
for germplasm selection and tea quality improvement, the function of the great majority of genes
48
involved in these pathways remains to be determined.8-14 Thus, a high-throughput functional genomics
49
analysis pipeline is essential.
50
Weighted gene coexpression network analysis (WGCNA) is a highly robust method for classifying
51
genes via hierarchical clustering of gene coexpression network (GCN).15 GCNs are particularly useful
52
for analyzing high-dimensionality expression datasets as they provide an intuitive framework for
53
describing changes in expression driven by cellular processes acting within or across multiple
54
conditions.15 Consequently, WGCNA is now commonly employed to analyze whole transcriptome,
55
proteomic, and metabolomic data to identify coexpression modules that relate either to a trait or
56
condition of interest.15-16 In our previous study, we used WGCNA to investigate the potential signaling
57
interactions between light quality and other environmental cues (e.g., low temperature, drought, and
58
blister blight disease) in tea plants.9 We revealed the molecular events that triggered by blue and green
59
light might partly overlap with stress responses, which suggest the possibility of a targeted use of blue
60
and green light to induce defense responses in tea plants. The WGCNA can also be extended to
61
investigate the gene modules that responsible for the variation of metabolites. Xu et al. (2018) profiled
ACS Paragon Plus Environment
Page 5 of 39
Journal of Agricultural and Food Chemistry
62
metabolome and transcriptome of the tea leaves harvested in different seasons, and by using WGCNA
63
based integration strategy, key metabolite–specific modules and regulators involved in monoterpenes
64
and sesquiterpenes were successfully delimited.17 MicroRNA (miRNA) target and promoter cis-
65
regulatory element (CRE) information can be easily integrated into plant GCNs. This strategy has been
66
employed to comprehensively identify target genes of the entire R2R3-MYB family in grape.18
67
Enrichment for miRNA targets within GCNs has demonstrated a vital role of four classes of miRNA–
68
transcription factor (TF) pairs in regulating terpenoid biosynthesis in tea plants.19 Thus, aggregating
69
genome-wide coexpression network and several regulatory networks into a community network can
70
be advantageous to provide supportive evidence and effectively reveal discrepancies between
71
individual networks.
72
To date, challenges in tea plant biology are lack of information on the complicated genetic
73
background of various cultivars, and the transformation system has not yet been established. The
74
integration of transcriptome and metabolome profiles can provide clues for the characterization of
75
functions of key metabolic genes and regulators in tea plants. Here, we employed the WGCNA based
76
system biology strategy for dissecting the complexity of 11 excellent tea cultivars’ secondary metabolic
77
variation. Special emphasis was placed on the practical application of the bait genes- and metabolites-
78
guide coexpression networks to interpret metabolomic flux, predict gene functions, and mining key
79
regulators involved in the catechin biosynthesis pathway.
80 81
Materials and Methods
82
Plant Materials
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 6 of 39
83
Several studies have shown that there is considerable variability in metabolite composition underlying
84
the leaf-color mutations in tea plants.4-5 To obtain a highly informative (variable) dataset for WGCNA
85
analysis, we selected 11 tea cultivars with different shoot colors for transcriptome and metabolome
86
analysis. This should facilitate the implementation of network construction and module detection, as
87
well as module-metabolite association analysis. All tea cultivars (ten-year-old) used in this study were
88
grown at the China National Germplasm Tea Repository (Hangzhou, Zhejiang, China; latitude: 30.27°
89
N, longitude: 120.13° E). The cultivars ‘Fuding Dabaicha’ (GA), ‘TRI15’ (GB), ‘Longjing 43’ (GC),
90
‘Longjing Ziya’ (PA), ‘Liercha’ (PB), ‘Zijuan’ (PC), ‘Jingning Baicha’ (WA), ‘Anji Baicha’ (WB),
91
‘Huaye’ (YA), ‘Huangjinya’ (YB), and ‘Zhonghuang 3’ (YC) belongs to Camellia sinensis (Figure 1).
92
For each cultivar, first-flush young shoots (two leaves and a bud) in the spring were harvested from 30
93
individuals for metabolomics (three biological replications * five individuals for each replicate) and
94
transcriptomics (three biological replications * five individuals for each replicate) studies.
95
Metabolite Profiling and Statistical Analysis
96
The freeze-dried leaf sample was crushed using a mixer mill (MM 400, Retsch) for 1.5 min at 30 Hz
97
before extraction, 100 mg dried powder was extracted with 1.0 ml pure methanol spiked with 0.1 mg/L
98
lidocaine. Following centrifugation at 10000 g for 10 min, the lipid-solubility extracts were absorbed
99
and 0.4 ml of each extract was mixed and filtrated before LC–MS analysis. The quality control samples
100
(i.e., a mix of all samples) were injected at regular intervals (every ten samples) throughout the
101
analytical run to provide a set of data from which repeatability could be assessed. By comparing the
102
retention time, m/z values, and the fragmentation patterns with the standards, 462 metabolites were
103
identified (Figure S1). For the metabolites for which no commercial standards are available, peaks in
ACS Paragon Plus Environment
Page 7 of 39
Journal of Agricultural and Food Chemistry
104
the MS2T library were used to query the by comparing the MS/MS spectral data with literature
105
references and online metabolite databases (METLIN, HMDB, KNApSAcK, MoTo DB and
106
MassBank). Relative quantification was achieved by normalization of integrated extracted ion
107
chromatograms (XICs) using LC–ESI–MS/MS system (HPLC, Shim-pack UFLC SHIMADZU
108
CBM20A system; MS, Applied Biosystems 4000 Q-TRAP) by a large-scale multiple-reaction
109
monitoring (MRM) on a triple quadrupole (QQQ) MS (Figure S2).9,
110
detection window of 80 s and a target scan time of 1.5 s, the quadrupole filters the precursor ions
111
(parent ions) of the target substance and excludes the ions corresponding to other molecular weights
112
to prevent interference. After obtaining metabolite data from the different samples, the peak area of
113
the mass spectra of all substances was integrated using MultiQuant (v2.0) software (AB SCIEX). Row-
114
wise normalization was performed by the quality control sample, and data were log2-transformed and
115
auto-scaled to reduce any systematic bias within metabolite data. Principal component analysis (PCA)
116
and sparse Independent PCA (sIPCA) were implemented in R. The intra-cultivar Euclidean distances
117
(EDs) were calculated by the average of ED crosswise replicate samples within each cultivar, and inter-
118
cultivar EDs were calculated by first computing the average of replicates for each cultivar and then
119
computing the average ED of remaining cultivars calculated with this cultivar.
120
RNA-Seq and Data Processing
121
The total RNA was isolated using the RNeasy plant mini kit (Tiangen Bio, Beijing, China) according
122
to the manufacturer’s instructions. The cDNA libraries were sequenced on an Illumina HiSeq Xten
123
platform (Illumina, San Diego, CA, USA) to produce 150 bp paired-end reads at Novogene (Novogene,
124
Beijing, China). The raw reads were submitted to the BIGD (BIG Data Center, http://bigd.big.ac.cn/)
ACS Paragon Plus Environment
20
In the MRM mode with
Journal of Agricultural and Food Chemistry
Page 8 of 39
125
under the accession number PRJCA001447. Low-quality reads and adapters were filtered and trimmed
126
using Trimmomatic (v0.36). High-quality clean reads were then mapped to the C. sinensis genome
127
(http://tpia.teaplant.org/) using HISAT2 (v.2.1.0)21 and were counted using HTseq (v.0.6.1p1).22 TMM
128
normalization method was used to calculate gene expression in Bioconductor edgeR package.23 Gene
129
Ontology (GO) enrichment analysis of the differentially expressed genes (DEGs) was performed using
130
the GOseq R packages (v1.32.0).24
131
Gene and Metabolite Coexpression Network Construction
132
Weighted gene coexpression network analysis (WGCNA) was implemented using the WGCNA R
133
package (v1.63).15 Network construction and module detection were performed by ‘blockwiseModules’
134
function with adjusted parameters for transcriptome and metabolome datasets, which allows automatic
135
and unsupervised network construction for the input data. For transcriptome dataset, ‘minModuleSize’
136
= 30, ‘minKMEtoStay’ = 0.7, ‘mergeCutHeight’ = 0.05, and ‘minCoreKME’ = 0.8 were applied to
137
construct a ‘signed’ weighted correlation network, while ‘minModuleSize’ = 10, ‘minKMEtoStay’ =
138
0.6, ‘mergeCutHeight’ = 0.05, and ‘minCoreKME’= 0.7 were chosen for the analysis of metabolomic
139
datasets. The pattern of gene expression and metabolite level in a given module was summarized using
140
modules eigengene and eigenmetabolite values, respectively, which represent the first principal
141
component of the scaled data for genes/metabolites in each module.15 The module membership (kME)
142
was then calculated to assess the gene connectivity based on Pearson correlation. The module networks
143
were visualized using Cytoscape (v3.7.1).25
144
Quantitative Real-Time PCR (qRT-PCR) Validation
ACS Paragon Plus Environment
Page 9 of 39
Journal of Agricultural and Food Chemistry
145
The qRT-PCR analysis was carried out using SYBR Premix Ex Taq II kit (Takara, Japan) with a
146
LightCycler 480 Real-Time PCR system (Roche Applied Science) according to the manufacturer’s
147
instructions. Three biological replicates were performed for each cultivar, and GAPDH was used as an
148
endogenous control. The relative transcript abundance was calculated using the 2−ΔΔCT method.26
149
Primers used are shown in Table S1.
150
MicroRNA Target and Transcription Factor Binding Sites Prediction
151
Conserved and novel tea plant miRNAs were collected from our previous study.14 The psRNATarget
152
and psRobot were used to determine whether genes involved in the catechin biosynthesis pathway are
153
regulated by miRNAs.27-28 The intersection of psRNATarget and psRobot outputs was selected to
154
reduce the false-positive rate. Potential TF binding sites in the promoter region (1 kbp upstream from
155
the transcription start site) of the entire genome were identified using the TFBSTools R package and
156
JASPAR database (http://jaspar.genereg.net).29-30
157
Orthogonality of Regulatory Network
158
To compare ‘guilt-by-association’ based GCN with known functional networks of Arabidopsis
159
thaliana, structure or TF genes involved in the catechin pathway were subjected to BLASTX searching
160
against the Arabidopsis protein (TAIR, Version 11, http://www.arabidopsis.org) with a cutoff E-value
161
of 10-5. We then downloaded multiple Arabidopsis functional annotation networks from STRING v11
162
(http://string-db.org) database and compared whether edges based on expression similarity were found
163
in any known functional network.
164
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 10 of 39
165
Results and Discussion
166
Highly Variable Metabolic Profiles of Eleven Tea Cultivars
167
To explore metabolic differences existing in 11 tea cultivars, we used a widely targeted metabolomic
168
approach to quantify the metabolite levels in tea leaves in a high-throughput manner. Of the 776
169
metabolites detected in tea leaves, 462 were identified by using authentic standards, while 314 were
170
putatively annotated by comparing the MS/MS spectral data with literature references and online
171
metabolite databases (Table S2). We then calculated the intra- and inter-cultivar EDs to measure the
172
“metabolic distance” within and among cultivars using the complete normalized metabolic dataset of
173
each sample pair (Figure 2B). As shown in the scatterplot, two purple-leaf cultivars, PB (EDintracultivar
174
= 46.3) and PC (EDintracultivar = 46.8) showed highest inter-cultivar variations among the cultivars, while
175
replicate samples exhibited relatively low intra-cultivar variations (EDintracultivar = 17.3–24.1). In
176
addition, we employed PCA to investigate the variance structure of metabolic composition in an
177
unsupervised way (Figure 2A). The first (PC1) and the second (PC2) principal components explained
178
19% and 14% of the metabolite variation, respectively. In agreement with the results of ED analysis,
179
we observed a large variation exist in the relative metabolic profiles between PB/PC and other cultivars
180
analyzed.
181
To gain a better clustering and to investigate major differential metabolites of these cultivars, we
182
employed a sparse version of Independent PCA (sIPCA), which performs an internal variable selection
183
and combines the advantages of both PCA and Independent Component Analysis (ICA) on metabolic
184
data (Figure 2C). Several important flavonoids (e.g., delphinidin 3-O-rutinoside, (+)-gallocatechin
185
(GC), cyanidin 3-O-glucosyl-malonylglucoside, chrysoeriol O-rhamnosyl-O-glucuronic acid,
ACS Paragon Plus Environment
Page 11 of 39
Journal of Agricultural and Food Chemistry
186
myricetin, and fustin) were identified as key metabolites responsible for cultivar discrimination based
187
on sIPCA (Figure 2D). Notably, the accumulation level of two anthocyanins (i.e., cyanidin 3-O-
188
glucosyl-malonylglucoside and delphinidin 3-O-rutinoside) in PB and PC was much lower than other
189
cultivars (Figure 2D and Table S2). However, the total amount of anthocyanins in PA, PB and PC were
190
significantly higher than other tea cultivars, which is consistent with previous studies showing that
191
anthocyanin overproduction is responsible for the purple coloration in tea plants (Table S2).13, 31-32 In
192
addition, it has been suggested that the main coloration components in the leaves of ‘Zijuan’ were
193
delphinidin-3-O-galactoside and cyanidin-3-O-galactoside.32 In another study, delphinidin-3-O-
194
coumaroylglucoside and cyanidin-3-O-coumaroylgalactoside were identified as the two most
195
abundant anthocyanins in ‘Mooma1’, which may responsible for the bright red coloration of new
196
shoots in this novel purple-leaf tea variety.13 Consistent with these studies, we observed that
197
anthocyanins in different purple-leaf cultivars might be modified by different anthocyanin
198
acyltransferases (ATs) and anthocyanidin 3-O-glycosyltransferases (UGTs) to give differently colored
199
anthocyanins (e.g., peonidin and pelargonidin) (Figure S3 and Table S2). Thus, the understanding of
200
the modification processes to produce those anthocyanins would help us to elucidate the complex
201
metabolic flux within the anthocyanidin pathways.
202
To reveal metabolic modules and possible metabolic flux responsible for the biosynthesis of
203
secondary metabolites in tea plants, we employed WGCNA on the metabolomic dataset. As shown in
204
Figure 3A, the WGCNA allowed identification of 19 metabolic modules. Obvious precursor-product
205
relationships were recognized within several modules, such as amino acid and its derivatives (MM11),
206
and nucleotide derivates and glycerophospholipids (MM12, MM17, and MM18) (Table S2). By
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 12 of 39
207
calculating the correlation efficiency between eigenmetabolite networks, we found that the
208
eigenmetabolites of MM1 were negatively correlated with MM7 (r = -0.95, P = 4 × 10-17) (Figure 3B,
209
C, D). In MM1, a number of flavone glycosides (e.g., 6-C-hexosyl-luteolin O-hexoside, selgin 5-O-
210
hexoside, C-hexosyl-isorhamnetin O-hexoside, apigenin C-glucoside, and luteolin 3',7-di-O-glucoside)
211
and metabolites that located in the downstream flavonoid biosynthesis pathway such as
212
proanthocyanidins (e.g., procyanidin A1/A2), anthocyanins (e.g., delphinidin and cyanidin), and
213
catechin derivatives (e.g., catechin trimer) showed a significant lower production level in PB (Figure
214
3C, E and Table S2). On the other hand, PB accumulated a higher level of flavonols (e.g., kaempferol
215
3,7-dirhamnoside, isorhamnetin, quercetin 7-O-malonylhexosyl-hexoside, isorhamnetin O-acetyl-
216
hexoside, quercetin O-acetylhexoside, and kaempferol 3-O-rhamnoside) and flavones (e.g., velutin,
217
acacetin, and apigenin 4-O-rhamnoside) than other cultivars (MM7; Figure 3D, F and Table S2).
218
Metabolites in these two modules are mostly derived from the flavonoid biosynthetic pathway, and the
219
competition between these metabolites for the same substrates has also been reported.31, 33 For example,
220
the concentration of proanthocyanins could be greatly improved by redirecting anthocyanidins flux
221
into the proanthocyanin pathway and direct repression of isoflavone biosynthesis pathway.33
222
Furthermore, the significantly higher accumulation level of anthocyanin in the new shoot of purple-
223
leaf cultivar ‘Zijuan’ was proposed due to direct repression of biosynthesis of lignin.31 These results
224
indicated that the WGCNA protocol can be modified to construct weighted metabolic networks for
225
meaningful pathway inferring and biological interpretation. Together, information on the varietal
226
differences in the accumulation levels of these characteristic secondary metabolites (e.g., flavonols,
227
catechins, and anthocyanins), provides valuable insights to further investigate the sensory qualities and
ACS Paragon Plus Environment
Page 13 of 39
Journal of Agricultural and Food Chemistry
228
nutritional values of tea plants.
229
Multilevel Regulation of Catechin Biosynthesis
230
As an example of the so-called ‘guilt-by-association’ principle, genes that display similar expression
231
pattern across different cultivars have an increased likelihood of being within the same metabolic
232
pathway or biological process, as well as under the control of a shared regulatory mechanism. To
233
understand the coexpression relationships between genes at a systems level, we performed WGCNA
234
on the transcriptome dataset. This unbiased and unsupervised analysis identified 279 distinct
235
coexpression modules corresponding to clusters of correlated transcripts (Figure S4 and Table S3).
236
Flavonoids such as catechins, anthocyanins, and flavonols are important determinant of tea quality and
237
flavor.3 Some of them (e.g., delphinidin 3-O-rutinoside, (+)-gallocatechin (GC), cyanidin 3-O-
238
glucosyl-malonylglucoside, myricetin, and fustin) were identified as key metabolites responsible for
239
cultivar discrimination (Figure 2D). Characterizing genes involved in the regulation of these
240
metabolites is crucial for improving tea quality. Taking catechin pathway as an example, we selected
241
74 known genes encoding 15 enzymes involved in catechin biosynthesis as ‘bait-genes’ or ‘guide-
242
genes’ to construct a GCN based on the results of WGCNA (Figure 4 and Table S4). Keeping in mind
243
that GCN constructed based on the ‘guilt-by-association’ principle is by nature different from the
244
physical interactions.34 We then questioned to which extent WGCNA based GCN would provide
245
orthogonal and new information compared to existing functional genomic networks. We, therefore,
246
compared GCN to the genetic and physical interaction networks of Arabidopsis thaliana that stored in
247
the STRING database. We found that WGCNA based GCN could provide highly orthogonal
248
information (Figure S5). The maximum overlap in gene interaction was 11%, which was obtained by
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 14 of 39
249
combining all evidences in the STRING database (Figure S5). The accuracy of RNA-seq results and
250
coexpression relationships was further verified by qRT-PCR (Figure S6).
251
Since miRNA and TF have been known to play a critical role in the regulation of flavonoid
252
biosynthesis.7, 10 We therefore also integrated promoter CRE and miRNA target information into GCN.
253
Our results reveal a much more complex regulatory network (including 45 enzyme genes, 155 TFs, 17
254
conserved and 15 novel miRNAs) for catechin biosynthesis (Figure 4 and Table S4). Promoters of a
255
considerable number of enzyme-coding genes (e.g., CHSs, PALs, DFRs, and SCPLs) were enriched
256
for CREs such as those for MADS, MYB, NAC, AP2/ERF, bZIP, WRKY, SBP, and bHLH TF binding
257
(Figure 4 and Table S4). In particular, the MYB binding motif was present in most of the early
258
biosynthesis genes (e.g., C4H, PAL, CHS, CHI, and F3H,) and late biosynthesis genes (e.g., ANS, LAR,
259
DFR, and SCPL1A) in the catechin pathway. The potential regulation of these genes by MYB and
260
bHLH TFs is supported by the coexpression analysis and several recent studies in tea plants.32, 35-36 For
261
example, MYB4 and MYB308 are known to function as transcriptional repressors of C4H, 4CL, LAR,
262
CHS and ANR to mediate anthocyanin biosynthesis in A. thaliana, tobacco, and tea plant.32, 36-37
263
Ectopic expression of these repressors could result in the loss of flowers pigmentation in tobacco due
264
to the decrease of anthocyanin levels.37 In our study, MYB4 (TEA032503.1) and MYB308
265
(TEA033191.1)) were shown a higher expression level in PC and coexpressed with PAL
266
(TEA003137.1, TEA023243.1, and TEA024587.1), ANS (TEA010322.1), F3H (TEA023790.1), and
267
LAR (TEA027582.1) (Figure 4 and Table S4). Hence, these MYB repressors were likely to mediate a
268
transcriptional feedback inhibition to prevent the deleterious effects of anthocyanin overproduction on
269
photosynthesis and other biological processed in PC. GL3 (TEA008168.1; GM131), a bHLH TF that
ACS Paragon Plus Environment
Page 15 of 39
Journal of Agricultural and Food Chemistry
270
isolated from the ‘Zijuan’ (PC), was coexpressed with CHI (TEA034003.1; GM131), CHS
271
(TEA034042.1; GM131) and DFR (TEA032730.1; GM131). In addition, GL3 displayed a higher
272
expression level in three purple (PA, PB, and PC) cultivars, suggesting its potential role in the
273
regulation of anthocyanin accumulation (Figure 4 and Table S4). This hypothesis is supported by a
274
recent study in ‘Zijuan’, demonstrating that GL3 might interact with CsAN1 (MYB TF) and CsTTG1
275
(WD repeat protein, WDR) to form the MYB-bHLH-WDR (MBW) transcriptional activation complex,
276
thus promoting the accumulation of anthocyanin.32
277
To date, the transcriptional control of flavonoid biosynthesis by MYB and bHLH TFs is well
278
described in many plant species; however, little is known about the role of other TFs in the regulation
279
of flavonoid accumulation.38 The GCN we generated is novel in suggesting the potential regulatory
280
roles by those TF families such as AP2/ERF, SBP, and MADS in catechin biosynthesis (Figure 4 and
281
Table S4). For instance, the presence of MADS CREs in DFRs and PALs coincided with the
282
coregulation of MADS TFs (e.g., TT16, SEP3, and AP3) to these genes. Among them, TT16 has been
283
reported to play a vital role in transcriptional regulation of proanthocyanin biosynthesis.39 Interestingly,
284
we observed a strong coregulation of several TF genes (e.g., 1 bHLH, 3 MYB, 2 NAC, 7 AP2/ERF,
285
and 6 WRKY), to C4H (TEA016772.1); however, no corresponding binding site of NAC, AP2/ERF,
286
and WRKY was found in the promoter region (Figure 4 and Table S4). This might be due to the fact
287
that several TFs, such as NAC, SPL, and WRKY could regulate the flavonoid biosynthesis through
288
interaction with the MBW complex.38 Therefore, cis-regulatory element-driven networks that
289
integrated gene coexpression relationship with promoter CRE structure information are powerful tools
290
for interfering new uncharacterized regulators that involved in the catechin biosynthesis pathway.
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 16 of 39
291
Recently, scientific endeavors are directed toward understanding the posttranscriptional regulation
292
of the flavonoid pathway involving miRNAs.10, 14, 40-41 In the current study, 51 out of 200 catechin
293
biosynthesis-related genes/TFs were potentially targeted by 17 conserved and 15 novel miRNAs in tea
294
plants (Figure 4 and Table S4). Several of conserved miRNA (e.g., miR156, miR172, and miR858)
295
identified in the coexpression network have known to play a pivotal role in flavonoid biosynthesis
296
regulation.40-41 For example, miR858 has been demonstrated to act as a negative regulator to control
297
the anthocyanin accumulation in tomato.41 miRNA156-SPL9 pair influences anthocyanin
298
accumulation by destabilizing MBW complex and/or targeting C4H in Arabidopsis.40 Thus, the
299
enrichment of miRNA targets within GCN indicated an important role of these molecules and revealed
300
an additional layer of control of the expression of "switch" genes involved in the catechin biosynthesis.
301
Finally, our high-confidence GCN can be advantageous to effectively reveal a set of potential key
302
regulators of catechin biosynthesis by aggregating different kinds of regulatory information into a
303
community network.
304
305
Metabolic-Signature-Based Association Captures Additional Key Regulators Involved in the
306
Catechin Pathway
307
Since the prediction of gene function by ‘guilt-by-association’ principle is based on the coregulation
308
of a set of genes in the same biological process or pathway, only coregulated genes can be predicted
309
by this approach. However, genes involved in the same biological process or pathway are not
310
necessarily coexpressed when regulation occurs at posttranscriptional (e.g., miRNA and lncRNA
311
regulation) or posttranslational (e.g., ubiquitination and phosphorylation) levels, and some key
ACS Paragon Plus Environment
Page 17 of 39
Journal of Agricultural and Food Chemistry
312
regulators cannot be captured by this approach.34 Thus, we employed metabolic-signature-based
313
association method to capture additional key regulators involved in the catechin pathway.
314
To analyze the catechin pathway in more detail, error plots of metabolites within the pathway were
315
generated, and MAD scores were calculated to directly compare the variation level across samples
316
(Figure 5A, B). Our previous study demonstrated that albino cultivars (i.e., ‘Anji Baicha’, ‘Baijiguan’,
317
and ‘Huangjinya’) contained higher amounts of amino acids and relatively low levels of GC and EGCG
318
compared to the green-leaf tea cultivars (i.e., ‘Fuding Dabaicha’ and ‘Longjing 43’).4 In accordance
319
with this result, we observed that galloylated catechins had a lower abundance in albino tea cultivars
320
(e.g., WA and WB) than in most green- or purple-leaf tea cultivars, hence green tea processed from the
321
albino tea cultivars taste more refreshing and less bitter. We then related 279 modules to 776
322
metabolites using eigengene network methodology. Among them, 37 out of 279 coexpression modules
323
showed catechins biosynthesis-related expression values (r > 0.75, P < 10-3), that is, these modules
324
probably represent core gene networks operating in each catalysis and modification reaction (Figure
325
5C, D). Genes functioning in “protein/nucleotide binding”, “carbohydrate derivative/heterocyclic
326
compound/organic
327
“catalytic/transmembrane transporter activity” were observed to be responsible for most metabolites’
328
concentration variations in catechin pathway (Figure 6A). Genes participating in “chromosome
329
organization/protein localization”, “stress/hormone response”, “macromolecule modification”, “signal
330
transduction”, and “transport” altered catechin biosynthesis and modification most frequently (Figure
331
6B). The GO term association analysis reveals a close association of catechin biosynthesis, signaling,
332
and various abiotic/biotic stress responses, suggesting that catechins might play a pivotal role in plant-
cyclic
compound
binding”,
“hydrolase/transferase
ACS Paragon Plus Environment
activity”,
and
Journal of Agricultural and Food Chemistry
Page 18 of 39
333
environment interactions, in agreement with previous studies indicating antioxidant and antiinsect
334
activities of these important flavan-3-ol products.8, 42
335
Galloylated catechins (ECG and EGCG) are characteristic astringent taste compounds and comprise
336
up to 80% (ranged from 81.2 to 88.9%) of total catechins in tea leaves, and therefore have a large
337
impact on tea quality (Table S2). Epicatechin:1-O-galloyl-β-D-glucose O-galloyltransferase (ECGT),
338
an enzyme that belongs to subclade 1A of SCPL acyltransferase has been proposed to play vital roles
339
in galloylation of catechins.7 Comparative genomic analysis revealed a total of 22 SCPL1A genes in
340
the tea plant genome, and most of them (15) were arose from tandem duplications.7 The duplication of
341
these genes could largely promote the neo- and sub-functionalization to generate new enzymatic
342
activities and drive the evolution of plant metabolic complexity. Thus, additional efforts are necessary
343
to precisely identify whether some or all of these SCPL1A genes are involved in catechin galloylation
344
in tea plants.
345
Four modules (GM111, GM124, GM74, and GM264) are comprised of genes that are significantly
346
correlated the variation of CG, ECG, GCG, and EGCG (r > 0.75, P < 10-3; Figure 5C, D). In GM74,
347
an SCPL1A acyltransferase gene (TEA016463.1) was coexpressed with two WDR genes
348
(TEA026107.1 and TEA032249.1) (Figure 6C and Table S3). Proteins containing WDRs are known
349
to act as a docking platform to interact with MYB and bHLH TFs in regulating the anthocyanin and
350
proanthocyanidin biosynthesis in tea plants, thus highlighting WDRs as a possible key regulator of the
351
production of galloylated catechins.38, 43 Notably, many of the coexpressed genes, such as DFL1
352
(auxin), IAA16 (auxin), STE1 (brassinosteroid), ACS2 (ethylene), GID1C (gibberellin), and RLK
353
(salicylic acid), are involved in hormone metabolism and signaling (Figure 6C and Table S3).44 These
ACS Paragon Plus Environment
Page 19 of 39
Journal of Agricultural and Food Chemistry
354
results are consistent with several lines of evidence of crosstalk between hormones and flavonoids in
355
the control of various facets of plant developmental processes.45
356
In GM124, two SCPL1A acyltransferases (i.e., TEA034027.1 and TEA034034.1) were coexpressed
357
with several key positive regulators of flavonoids biosynthesis such as WDR protein (BUB3.2) and
358
bHLH TF (MYC2) (Figure 6D and Table S3). These regulators may not only be related to anthocyanin
359
or flavonol biosynthesis but also to stressor/stimuli such as MYC2.46 In addition, several stress
360
response genes (e.g., HOS1, GRP7, and PIP1B) and protein kinases (e.g., MPK9, dyrk2, and CRCK2)
361
were found to highly coexpressed with these regulators (Figure 6D and Table S3). Various
362
environmental stress signals, such as low temperature, high light, and UV irradiation could result in
363
oxidative stress and lead to flavonoid overproduction through the regulation expression of R2R3-MYBs
364
and bHLHs, as well as the activity of MBW complexes.8-9, 38 The activity of MBW complexes can be
365
regulated through various posttranslational modifications (e.g., ubiquitination and phosphorylation).38,
366
47-48
367
PAP1 and PAP2, and thus result in the repression of anthocyanin in Arabidopsis.48 In addition, TTG1
368
(WDR family) and TT8 (bHLH042) are also likely to be degraded by the E3 ubiquitin ligase, which
369
has not yet been identified.47 The E3 ubiquitin ligase HOS1 acts as cold signaling attenuator to
370
compromise the harmful effect of extreme cold responses by targeting ICE1 for proteasome
371
degradation.49 It is probable, therefore, that HOS1 (TEA015363.1) could trigger negative regulatory
372
feedback of catechin overproduction by control the expression of bHLH (MYC2, TEA009193.1) and
373
WDR (BUB3.2, TEA004851.1) genes.
374
For example, COP1/SPA E3 ubiquitin ligase could mediate the degradation of the MYB proteins
To date, a number of metabolic quantitative trait locus (QTL) studies have been conducted in tea
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 20 of 39
375
plants to identify QTL affecting the production of characteristic secondary metabolites (e.g.,
376
theobromine, caffeine, and catechins).12, 50-51 Integration of QTL confidence intervals/markers with
377
expression based functional networks are therefore useful and needed to narrow the pool of candidates
378
and interpret the QTL results. HOS15 (TEA014884.1, GM160), another crucial repressor of cold
379
tolerance induced genes, was found to negatively correlated with the accumulation of dihydroquercetin
380
(precursor of catechins) (Table S3).52 Notably, HOS15 has also been identified as a putative QLT
381
related to catechin production in a recent study.51 Therefore, these repressors might play a key role in
382
the crosstalk between cold signaling and anthocyanin/catechin metabolic pathways.
383
Of particular interest is the identification of another two modules (GM77 and GM201) that comprise
384
SCPL1A (TEA010715.1 and TEA027270.1) were negatively correlated to the production of cyanidin
385
(Figure 5C and Table S3). Key regulators involved in the anthocyanin biosynthesis pathway, such as
386
genes encoding MYB75, WDR proteins (DWA1 and AT3G49400) and anthocyanin regulatory C1
387
protein, were found to coexpress with several glycosyltransferase (e.g., UGT92A1 and UGT80B1) and
388
acyltransferase (e.g., PEL3, GNA1, and SCPL1A) genes in these two modules (Figure 6E, F and Table
389
S3). In addition, we observed an inverse accumulation pattern exists between catechins (e.g., GCG,
390
and EGCG) and cyanidin in PB (Figure 5B). Therefore, we propose that anthocyanidins are either
391
reduced to generate catechins by anthocyanidin reductase (ANR) or immediately modified by those
392
glycosyltransferases and acyltransferases to produce anthocyanins in PB (Figure S3). Furthermore, an
393
inverse accumulation pattern was also observed between catechins and proanthocyanins (PAs, e.g.,
394
procyanidin A1/A2) in PB (Figure 5B). Catechins such as (-)-epicatechin and (+)-catechin are the
395
monomers and extension units of PAs. Therefore, the galloylation of catechins by SCPL1A may lead
ACS Paragon Plus Environment
Page 21 of 39
Journal of Agricultural and Food Chemistry
396
to the repression of the PAs biosynthesis. Together, we proposed that at least five SCPL1A
397
acyltransferase family members (TEA016463.1, TEA034027.1, TEA034034.1, TEA010715.1, and
398
TEA027270.1) could implicate in the production balance of anthocyanins, galloylated catechins, and
399
proanthocyanins in tea plants.
400
Here, we shown an ‘omics’-based data integration approaches to facilitate the identification of
401
functions of unknown metabolic genes and regulators in tea plants. These results will bring new tools
402
and valuable resources to understand how to improve and modify tea quality. Furthermore, additional
403
efforts are required to further account for features such as chromatin structure, TF combinations, and
404
protein-protein interactions by Chip-seq and ATAC-seq.
405 406
Abbreviations Used
407
GA, ‘Fuding Dabaicha’; GB, ‘TRI15’; GC, ‘Longjing 43’; PA, ‘Longjing Ziya’; PB, ‘Liercha’; PC,
408
‘Zijuan’; WA, ‘Jingning Baicha’; WB, ‘Anji Baicha’; YA, ‘Huaye’; YB, ‘Huangjinya’; YC,
409
‘Zhonghuang 3’; TF, transcription factor; miRNA, microRNA; CRE, cis-regulatory element; GO,
410
Gene Ontology; ED, Euclidean distance; MAD, Median absolute deviation; qRT-PCR, quantitative
411
real-time PCR; WGCNA, weighted gene coexpression network analysis; GCN, gene coexpression
412
network; PCA, principal component analysis; sIPCA, sparse Independent PCA; C, catechin; GC,
413
gallocatechin; EC, epicatechin; EGC, epigallocatechin; CG, catechin gallate; ECG, epicatechin gallate;
414
GCG, gallocatechin gallate; EGCG, epigallocatechin gallate; PAs, proanthocyanidins
415 416
Funding
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 22 of 39
417
This work was supported by the Ministry of Agriculture of China through the Earmarked Fund for
418
China Agriculture Research System (CARS-019), and the Chinese Academy of Agricultural Sciences
419
through the Agricultural Science and Technology Innovation Program (CAAS-ASTIP-2017-
420
TRICAAS) to Liang Chen.
421 422
Supporting Information
423
Figure S1. Representative MS/MS spectra for metabolites identification.
424
Figure S2. Extracted ion chromatograms (XICs) for detection of multimodal maps.
425
Figure S3. Relative accumulation levels of anthocyanins across 11 tea cultivars. Variations in relative
426
accumulation levels of anthocyanins (Z-score, scale shown in the right) are indicated in color, where
427
red regions indicate higher accumulation level, and blue regions indicate lower accumulation.
428
Figure S4. Hierarchical clustering dendrogram showing gene coexpression modules identified using
429
WGCNA. Modules correspond to branches and are labeled by colors as indicated by the first color
430
band underneath the tree. Remaining color bands reveal highly correlated (red) or anti-correlated (blue)
431
genes for the particular cultivar.
432
Figure S5. Orthogonality of ‘guilt-by-association’ principle-based gene coexpression network to other
433
molecular data in the STRING database.
434
Figure S6. qRT-PCR validation of the RNA-seq results and coexpression relationships.
435
Table S1. Primers used for the qPCR assay.
436
Table S2. WGCNA results of metabolomic dataset and analysis the levels of catechin, anthocyanins,
437
and procyanidins across samples.
ACS Paragon Plus Environment
Page 23 of 39
Journal of Agricultural and Food Chemistry
438
Table S3. WGCNA results of transcriptome dataset.
439
Table S4. Gene coexpression relationship, miRNA target and promoter cis-regulatory element
440
information of integrated network for catechin regulation in tea plants.
441 442
References
443
(1) Ercisli, S.; Akbulut, M.; Ozdemir, O.; Sengul, M.; Orhan, E., Phenolic and antioxidant diversity
444
among persimmon (Diospyrus kaki L.) genotypes in Turkey. Int. J. Food Sci. Nutr. 2008, 59 (6), 477-
445
82.
446
(2) Serce, S.; Ercisli, S.; Sengul, M.; Gunduz, K.; Orhan, E., Antioxidant activities and fatty acid
447
composition of wild grown myrtle (Myrtus communis L.) fruits. Pharmacogn. Mag. 2010, 6 (21), 9-
448
12.
449
(3) Namita, P.; Mukesh, R.; Vijay, K. J., Camellia sinensis (green tea): A review. Global journal of
450
pharmacology 2012, 6 (2), 52-59.
451
(4) Li, C. F.; Ma, J. Q.; Huang, D. J.; Ma, C. L.; Jin, J. Q.; Yao, M. Z.; Chen, L., Comprehensive
452
Dissection of Metabolic Changes in Albino and Green Tea Cultivars. J. Agric. Food Chem. 2018, 66
453
(8), 2040-2048.
454
(5) Shen, J.; Zou, Z.; Zhang, X.; Zhou, L.; Wang, Y.; Fang, W.; Zhu, X., Metabolic analyses reveal
455
different mechanisms of leaf color change in two purple-leaf tea plant (Camellia sinensis L.) cultivars.
456
Hortic Res 2018, 5, 7.
457
(6) Xia, E. H.; Zhang, H. B.; Sheng, J.; Li, K.; Zhang, Q. J.; Kim, C.; Zhang, Y.; Liu, Y.; Zhu, T.; Li,
458
W.; Huang, H.; Tong, Y.; Nan, H.; Shi, C.; Shi, C.; Jiang, J. J.; Mao, S. Y.; Jiao, J. Y.; Zhang, D.; Zhao,
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 24 of 39
459
Y.; Zhao, Y. J.; Zhang, L. P.; Liu, Y. L.; Liu, B. Y.; Yu, Y.; Shao, S. F.; Ni, D. J.; Eichler, E. E.; Gao, L.
460
Z., The Tea Tree Genome Provides Insights into Tea Flavor and Independent Evolution of Caffeine
461
Biosynthesis. Mol Plant 2017, 10 (6), 866-877.
462
(7) Wei, C.; Yang, H.; Wang, S.; Zhao, J.; Liu, C.; Gao, L.; Xia, E.; Lu, Y.; Tai, Y.; She, G.; Sun, J.;
463
Cao, H.; Tong, W.; Gao, Q.; Li, Y.; Deng, W.; Jiang, X.; Wang, W.; Chen, Q.; Zhang, S.; Li, H.; Wu,
464
J.; Wang, P.; Li, P.; Shi, C.; Zheng, F.; Jian, J.; Huang, B.; Shan, D.; Shi, M.; Fang, C.; Yue, Y.; Li, F.;
465
Li, D.; Wei, S.; Han, B.; Jiang, C.; Yin, Y.; Xia, T.; Zhang, Z.; Bennetzen, J. L.; Zhao, S.; Wan, X.,
466
Draft genome sequence of Camellia sinensis var. sinensis provides insights into the evolution of the
467
tea genome and tea quality. Proc. Natl. Acad. Sci. U. S. A. 2018, 115 (18), E4151-E4158.
468
(8) Zheng, C.; Wang, Y.; Ding, Z.; Zhao, L., Global Transcriptional Analysis Reveals the Complex
469
Relationship between Tea Quality, Leaf Senescence and the Responses to Cold-Drought Combined
470
Stress in Camellia sinensis. Front Plant Sci 2016, 7, 1858.
471
(9) Zheng, C.; Ma, J. Q.; Ma, C. L.; Shen, S. Y.; Liu, Y. F.; Chen, L., Regulation of Growth and
472
Flavonoid Formation of Tea Plants (Camellia sinensis) by Blue and Green Light. J. Agric. Food Chem.
473
2019, 67 (8), 2408-2419.
474
(10)Zhao, L.; Chen, C.; Wang, Y.; Shen, J.; Ding, Z., Conserved MicroRNA Act Boldly During Sprout
475
Development and Quality Formation in Pingyang Tezaocha (Camellia sinensis). Frontiers in genetics
476
2019, 10, 237.
477
(11) Xu, L. Y.; Wang, L. Y.; Wei, K.; Tan, L. Q.; Su, J. J.; Cheng, H., High-density SNP linkage map
478
construction and QTL mapping for flavonoid-related traits in a tea plant (Camellia sinensis) using 2b-
479
RAD sequencing. BMC Genomics 2018, 19 (1), 955.
ACS Paragon Plus Environment
Page 25 of 39
Journal of Agricultural and Food Chemistry
480
(12)Ma, J. Q.; Jin, J. Q.; Yao, M. Z.; Ma, C. L.; Xu, Y. X.; Hao, W. J.; Chen, L., Quantitative Trait Loci
481
Mapping for Theobromine and Caffeine Contents in Tea Plant (Camellia sinensis). J. Agric. Food
482
Chem. 2018, 66 (50), 13321-13327.
483
(13)He, X.; Zhao, X.; Gao, L.; Shi, X.; Dai, X.; Liu, Y.; Xia, T.; Wang, Y., Isolation and
484
Characterization of Key Genes that Promote Flavonoid Accumulation in Purple-leaf Tea (Camellia
485
sinensis L.). Sci. Rep. 2018, 8 (1), 130.
486
(14)Liu, S. C.; Xu, Y. X.; Ma, J. Q.; Wang, W. W.; Chen, W.; Huang, D. J.; Fang, J.; Li, X. J.; Chen,
487
L., Small RNA and degradome profiling reveals important roles for microRNAs and their targets in
488
tea plant response to drought stress. Physiol. Plant. 2016, 158 (4), 435-451.
489
(15)Langfelder, P.; Horvath, S., WGCNA: an R package for weighted correlation network analysis.
490
BMC Bioinformatics 2008, 9 (1), 559.
491
(16)Pei, G.; Chen, L.; Zhang, W., WGCNA Application to Proteomic and Metabolomic Data Analysis.
492
In Method. Enzymol., Elsevier: 2017; Vol. 585, pp 135-158.
493
(17)Xu, Q.; He, Y.; Yan, X.; Zhao, S.; Zhu, J.; Wei, C., Unraveling a crosstalk regulatory network of
494
temporal aroma accumulation in tea plant (Camellia sinensis) leaves by integration of metabolomics
495
and transcriptomics. Environ. Exp. Bot. 2018, 149, 81-94.
496
(18)Wong, D. C.; Schlechter, R.; Vannozzi, A.; Holl, J.; Hmmam, I.; Bogs, J.; Tornielli, G. B.;
497
Castellarin, S. D.; Matus, J. T., A systems-oriented analysis of the grapevine R2R3-MYB transcription
498
factor family uncovers new insights into the regulation of stilbene accumulation. DNA Res. 2016.
499
(19)Zhao, S.; Wang, X.; Yan, X.; Guo, L.; Mi, X.; Xu, Q.; Zhu, J.; Wu, A.; Liu, L.; Wei, C., Revealing
500
of MicroRNA Involved Regulatory Gene Networks on Terpenoid Biosynthesis in Camellia sinensis in
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 26 of 39
501
Different Growing Time Points. J. Agric. Food Chem. 2018, 66 (47), 12604-12616.
502
(20)Chen, W.; Gong, L.; Guo, Z.; Wang, W.; Zhang, H.; Liu, X.; Yu, S.; Xiong, L.; Luo, J., A novel
503
integrated method for large-scale detection, identification, and quantification of widely targeted
504
metabolites: application in the study of rice metabolomics. Mol Plant 2013, 6 (6), 1769-80.
505
(21)Kim, D.; Langmead, B.; Salzberg, S. L., HISAT: a fast spliced aligner with low memory
506
requirements. Nat. Methods 2015, 12 (4), 357-60.
507
(22)Anders, S.; Pyl, P. T.; Huber, W., HTSeq--a Python framework to work with high-throughput
508
sequencing data. Bioinformatics 2015, 31 (2), 166-9.
509
(23)Robinson, M. D.; McCarthy, D. J.; Smyth, G. K., edgeR: a Bioconductor package for differential
510
expression analysis of digital gene expression data. Bioinformatics 2010, 26 (1), 139-40.
511
(24)Young, M. D.; Wakefield, M. J.; Smyth, G. K.; Oshlack, A., Gene ontology analysis for RNA-seq:
512
accounting for selection bias. Genome Biol. 2010, 11 (2), R14.
513
(25)Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N. S.; Wang, J. T.; Ramage, D.; Amin, N.;
514
Schwikowski, B.; Ideker, T., Cytoscape: a software environment for integrated models of biomolecular
515
interaction networks. Genome Res. 2003, 13 (11), 2498-504.
516
(26)Livak, K. J.; Schmittgen, T. D., Analysis of relative gene expression data using real-time
517
quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 2001, 25 (4), 402-8.
518
(27)Dai, X.; Zhuang, Z.; Zhao, P. X., psRNATarget: a plant small RNA target analysis server (2017
519
release). Nucleic. Acids. Res. 2018, 46 (W1), W49-W54.
520
(28)Wu, H. J.; Ma, Y. K.; Chen, T.; Wang, M.; Wang, X. J., PsRobot: a web-based plant small RNA
521
meta-analysis toolbox. Nucleic. Acids. Res. 2012, 40 (Web Server issue), W22-8.
ACS Paragon Plus Environment
Page 27 of 39
Journal of Agricultural and Food Chemistry
522
(29)Tan, G.; Lenhard, B., TFBSTools: an R/bioconductor package for transcription factor binding site
523
analysis. Bioinformatics 2016, 32 (10), 1555-6.
524
(30)Khan, A.; Fornes, O.; Stigliani, A.; Gheorghe, M.; Castro-Mondragon, J. A.; van der Lee, R.; Bessy,
525
A.; Cheneby, J.; Kulkarni, S. R.; Tan, G.; Baranasic, D.; Arenillas, D. J.; Sandelin, A.; Vandepoele, K.;
526
Lenhard, B.; Ballester, B.; Wasserman, W. W.; Parcy, F.; Mathelier, A., JASPAR 2018: update of the
527
open-access database of transcription factor binding profiles and its web framework. Nucleic. Acids.
528
Res. 2018, 46 (D1), D260-D266.
529
(31)Wang, L.; Pan, D.; Liang, M.; Abubakar, Y. S.; Li, J.; Lin, J.; Chen, S.; Chen, W., Regulation of
530
Anthocyanin Biosynthesis in Purple Leaves of Zijuan Tea (Camellia sinensis var. kitamura). Int. J.
531
Mol. Sci. 2017, 18 (4).
532
(32)Sun, B.; Zhu, Z.; Cao, P.; Chen, H.; Chen, C.; Zhou, X.; Mao, Y.; Lei, J.; Jiang, Y.; Meng, W.;
533
Wang, Y.; Liu, S., Purple foliage coloration in tea (Camellia sinensis L.) arises from activation of the
534
R2R3-MYB transcription factor CsAN1. Sci. Rep. 2016, 6, 32534.
535
(33)Li, P.; Dong, Q.; Ge, S.; He, X.; Verdier, J.; Li, D.; Zhao, J., Metabolic engineering of
536
proanthocyanidin production by repressing the isoflavone pathways and redirecting anthocyanidin
537
precursor flux in legume. Plant Biotechnol. J. 2016, 14 (7), 1604-18.
538
(34)Saito, K.; Hirai, M. Y.; Yonekura-Sakakibara, K., Decoding genes with coexpression networks and
539
metabolomics - 'majority report by precogs'. Trends Plant Sci. 2008, 13 (1), 36-43.
540
(35)Zhao, L.; Gao, L.; Wang, H.; Chen, X.; Wang, Y.; Yang, H.; Wei, C.; Wan, X.; Xia, T., The R2R3-
541
MYB, bHLH, WD40, and related transcription factors in flavonoid biosynthesis. Funct. Integr.
542
Genomics 2013, 13 (1), 75-98.
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 28 of 39
543
(36)Li, M.; Li, Y.; Guo, L.; Gong, N.; Pang, Y.; Jiang, W.; Liu, Y.; Jiang, X.; Zhao, L.; Wang, Y.; Xie,
544
D. Y.; Gao, L.; Xia, T., Functional Characterization of Tea (Camellia sinensis) MYB4a Transcription
545
Factor Using an Integrative Approach. Front Plant Sci 2017, 8, 943.
546
(37)Chen, L.; Hu, B.; Qin, Y.; Hu, G.; Zhao, J., Advance of the negative regulation of anthocyanin
547
biosynthesis by MYB transcription factors. Plant Physiol. Biochem. 2019, 136, 178-187.
548
(38)Xu, W.; Dubos, C.; Lepiniec, L., Transcriptional control of flavonoid biosynthesis by MYB-
549
bHLH-WDR complexes. Trends Plant Sci. 2015, 20 (3), 176-85.
550
(39)Nesi, N.; Debeaujon, I.; Jond, C.; Stewart, A. J.; Jenkins, G. I.; Caboche, M.; Lepiniec, L., The
551
TRANSPARENT TESTA16 locus encodes the ARABIDOPSIS BSISTER MADS domain protein and
552
is required for proper development and pigmentation of the seed coat. Plant Cell 2002, 14 (10), 2463-
553
79.
554
(40)Xing, S.; Salinas, M.; Hohmann, S.; Berndtgen, R.; Huijser, P., miR156-targeted and nontargeted
555
SBP-box transcription factors act in concert to secure male fertility in Arabidopsis. Plant Cell 2010,
556
22 (12), 3935-50.
557
(41)Jia, X.; Shen, J.; Liu, H.; Li, F.; Ding, N.; Gao, C.; Pattanaik, S.; Patra, B.; Li, R.; Yuan, L., Small
558
tandem target mimic-mediated blockage of microRNA858 induces anthocyanin accumulation in
559
tomato. Planta 2015, 242 (1), 283-93.
560
(42)War, A. R.; Paulraj, M. G.; Ahmad, T.; Buhroo, A. A.; Hussain, B.; Ignacimuthu, S.; Sharma, H.
561
C., Mechanisms of plant defense against insect herbivores. Plant signaling & behavior 2012, 7 (10),
562
1306-20.
563
(43)Liu, Y.; Hou, H.; Jiang, X.; Wang, P.; Dai, X.; Chen, W.; Gao, L.; Xia, T., A WD40 Repeat Protein
ACS Paragon Plus Environment
Page 29 of 39
Journal of Agricultural and Food Chemistry
564
from Camellia sinensis Regulates Anthocyanin and Proanthocyanidin Accumulation through the
565
Formation of MYB-bHLH-WD40 Ternary Complexes. Int. J. Mol. Sci. 2018, 19 (6).
566
(44)Ohri, P.; Bhardwaj, R.; Bali, S.; Kaur, R.; Jasrotia, S.; Khajuria, A.; Parihar, R. D., The common
567
molecular players in plant hormone crosstalk and signaling. Curr Protein Pept Sci 2015, 16 (5), 369-
568
88.
569
(45)Falcone Ferreyra, M. L.; Rius, S. P.; Casati, P., Flavonoids: biosynthesis, biological functions, and
570
biotechnological applications. Front Plant Sci 2012, 3, 222.
571
(46)Abe, H.; Urao, T.; Ito, T.; Seki, M.; Shinozaki, K.; Yamaguchi-Shinozaki, K., Arabidopsis
572
AtMYC2 (bHLH) and AtMYB2 (MYB) function as transcriptional activators in abscisic acid signaling.
573
Plant Cell 2003, 15 (1), 63-78.
574
(47)Patra, B.; Pattanaik, S.; Yuan, L., Proteolytic degradation of the flavonoid regulators,
575
TRANSPARENT TESTA8 and TRANSPARENT TESTA GLABRA1, in Arabidopsis is mediated by
576
the ubiquitin/26Sproteasome system. Plant signaling & behavior 2013, 8 (10), doi: 10 4161/psb 25901.
577
(48)Maier, A.; Schrader, A.; Kokkelink, L.; Falke, C.; Welter, B.; Iniesto, E.; Rubio, V.; Uhrig, J. F.;
578
Hulskamp, M.; Hoecker, U., Light and the E3 ubiquitin ligase COP1/SPA control the protein stability
579
of the MYB transcription factors PAP1 and PAP2 involved in anthocyanin accumulation in Arabidopsis.
580
Plant J. 2013, 74 (4), 638-51.
581
(49)Miura, K.; Furumoto, T., Cold signaling and cold response in plants. Int. J. Mol. Sci. 2013, 14 (3),
582
5312-37.
583
(50)Ma, J. Q.; Yao, M. Z.; Ma, C. L.; Wang, X. C.; Jin, J. Q.; Wang, X. M.; Chen, L., Construction of
584
a SSR-based genetic map and identification of QTLs for catechins content in tea plant (Camellia
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 30 of 39
585
sinensis). PLoS One 2014, 9 (3), e93131.
586
(51)Koech, R. K.; Malebe, P. M.; Nyarukowa, C.; Mose, R.; Kamunya, S. M.; Joubert, F.; Apostolides,
587
Z., Functional annotation of putative QTL associated with black tea quality and drought tolerance traits.
588
Sci. Rep. 2019, 9 (1), 1465.
589
(52)Park, J.; Lim, C. J.; Shen, M.; Park, H. J.; Cha, J. Y.; Iniesto, E.; Rubio, V.; Mengiste, T.; Zhu, J.
590
K.; Bressan, R. A.; Lee, S. Y.; Lee, B. H.; Jin, J. B.; Pardo, J. M.; Kim, W. Y.; Yun, D. J., Epigenetic
591
switch from repressive to permissive chromatin in response to cold stress. Proc. Natl. Acad. Sci. U. S.
592
A. 2018, 115 (23), E5400-E5409.
593 594
Figure Captions
595
Figure 1. Leaf phenotypes of 11 tea cultivars. ‘Fuding Dabaicha’ (GA); ‘TRI15’ (GB); ‘Longjing 43’
596
(GC); ‘Longjing Ziya’ (PA); ‘Liercha’ (PB); ‘Zijuan’ (PC); ‘Huaye’ (YA); ‘Huangjinya’ (YB);
597
‘Zhonghuang 3’ (YC); ‘Jingning Baicha’ (WA); ‘Anji Baicha’ (WB).
598
Figure 2. Metabolic distance and variation between cultivars. (A) Principal component analysis (PCA)
599
and (C) sparse Independent PCA (sIPCA) sample representation using the first two components. (B)
600
Scatterplot of intra- and inter-cultivar Euclidean distances (EDs) using the complete normalized
601
metabolic profiles of a sample pair. (D) Heatmap of normalized accumulation level for the top 30
602
metabolites that were selected based on ranked sIPCA loading values.
603
Figure 3. Metabolite coexpression network analysis. (A) Hierarchical clustering dendrogram showing
604
metabolite coexpression modules identified using WGCNA. Modules correspond to branches and are
605
labeled by colors as indicated by the first color band underneath the tree. Remaining color bands reveal
ACS Paragon Plus Environment
Page 31 of 39
Journal of Agricultural and Food Chemistry
606
highly correlated (red) or anti-correlated (blue) metabolites for the particular cultivar. (B) Correlation
607
analysis of module eigenmetabolite that summarizes the modules identified in the clustering analysis.
608
Blue color represents the negative correlation, while red represents the positive correlation.
609
Eigenmetabolite accumulation for the (C) MM1 and (D) MM7. Bar plots present module
610
eigenmetabolite values. Error bars represent the standard errors among three biological replicates for
611
each cultivar. Heatmap of normalized accumulation level for top 20 metabolites in (E) MM1 and (F)
612
MM7 that were selected based on the WGCNA measure of intramodular gene connectivity (kME).
613
Figure 4. An integrated network for catechin regulation in the tea plant. Octagon, circle, and triangle
614
nodes represent enzyme-coding genes, transcription factors (TFs), and microRNAs (miRNAs),
615
respectively. Coexpression relationships between different TF family members (colored solid circles)
616
and enzyme-coding genes are depicted by corresponding colored edges. Dashed red lines with arrows
617
depict predicted miRNA-gene interaction. Pie chart colors represent the presence of selected TF-
618
binding sites in promoter regions (1kbp upstream from the transcription start site) of the corresponding
619
enzyme-coding genes.
620
Figure 5. The variation of metabolite levels and module gene expression involved in the catechin
621
pathway. (A) Biosynthetic pathway of the principal catechins. CHS, CHI, F3H, F3′H, F3′5′H, DFR,
622
ANS, LAR, ANR, and SCPL1A represent chalcone synthase, chalcone isomerase, flavanone 3-
623
hydroxylase, flavonoid 3′-hydroxylase, flavonoid 3′,5′-hydroxylase, dihydroflavonol 4-reductase,
624
anthocyanidin synthase, leucoanthocyanidin reductase, anthocyanidin reductase, and type 1A serine
625
carboxypeptidase-like acyltransferases, respectively. (B) The relative intensity of metabolites in the
626
catechin pathway. Mean expression values of metabolite intensities with their standard error bars from
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 32 of 39
627
three biological replicates are represented. The asterisks indicate significant differences across all
628
samples (Kruskal-Wallis, "*", P < 0.05; "**", P < 0.01; "***", P < 0.001). Color stripe represents
629
metabolite-specific variation coefficients calculated as the relative median absolute distance (relative
630
MAD); the degree of MAD is indicated by the node color from green (low) to red (high). (C) The
631
heatmap on the bottom left showing the significant correlation between metabolite variation and
632
module eigenmetabolite. Blue color represents negative correlation (r < -0.75, P < 10-3), while red
633
represents positive correlation (r > 0.75, P < 10-3). (D) The heatmap on the bottom right showing
634
relative expression of genes in 37 catechin pathway-related modules across all samples.
635
Figure 6. GO enrichment and network analysis for catechin-related modules. GO (A) molecular
636
function and (B) biological processes analysis for catechin-related modules. The color of the dots in
637
the scatterplot represents the range of the -log10 (P-value). The correlation network of (C) GM74, (D)
638
GM124, (E) GM77, and (F) GM201. Genes with a higher degree are indicated by red. Potential key
639
genes and regulators discussed in the Results and Discussion section are indicated by larger circles.
ACS Paragon Plus Environment
Page 33 of 39
Journal of Agricultural and Food Chemistry
Figure Graphics Figure 1
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Figure 2
ACS Paragon Plus Environment
Page 34 of 39
Page 35 of 39
Journal of Agricultural and Food Chemistry
Figure 3
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Figure 4
ACS Paragon Plus Environment
Page 36 of 39
Page 37 of 39
Journal of Agricultural and Food Chemistry
Figure 5
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Figure 6
ACS Paragon Plus Environment
Page 38 of 39
Page 39 of 39
Journal of Agricultural and Food Chemistry
Graphic for Table of Contents
ACS Paragon Plus Environment