Gene Coexpression Networks Reveal Key Drivers of Flavonoid


23 hours ago - Here, we profiled the metabolome and transcriptome of eleven tea cultivars, and then illustrated a weighted gene coexpression network ...
0 downloads 0 Views 2MB Size


Subscriber access provided by Nottingham Trent University

Omics Technologies Applied to Agriculture and Food

Gene Coexpression Networks Reveal Key Drivers of Flavonoid Variation in Eleven Tea Cultivars (Camellia sinensis) Chao Zheng, Jian-Qiang Ma, Jie-Dan Chen, Chun-Lei Ma, Wei Chen, Ming-Zhe Yao, and Liang Chen J. Agric. Food Chem., Just Accepted Manuscript • DOI: 10.1021/acs.jafc.9b04422 • Publication Date (Web): 12 Aug 2019 Downloaded from pubs.acs.org on August 13, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 39

Journal of Agricultural and Food Chemistry

Gene Coexpression Networks Reveal Key Drivers of Flavonoid Variation in Eleven Tea Cultivars (Camellia sinensis) Chao Zheng, Jian-Qiang Ma, Jie-Dan Chen, Chun-Lei Ma, Wei Chen, Ming-Zhe Yao*, Liang Chen* Key Laboratory of Tea Biology and Resources Utilization, Ministry of Agriculture and Rural Affairs, Tea Research Institute of the Chinese Academy of Agricultural Sciences, Hangzhou, China * Correspondence: Liang Chen, [email protected]; Ming-Zhe Yao, [email protected] Tel: +86 571 86652835, Fax: +86 571 86653866

Chao Zheng: [email protected]; Jian-Qiang Ma: [email protected]; Jie-Dan Chen: [email protected] Chun-Lei Ma: [email protected]; Wei Chen: [email protected]

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Page 2 of 39

1

Abstract

2

Following the recent completion of the draft genome sequence of the tea plant, high-throughput

3

decoding of gene function, especially for those involved in complex secondary metabolism pathways

4

has become a major challenge. Here, we profiled the metabolome and transcriptome of eleven tea

5

cultivars, and then illustrated a weighted gene coexpression network analysis (WGCNA) based system

6

biology strategy to interpret metabolomic flux, predict gene functions, and mining key regulators

7

involved in the flavonoid biosynthesis pathway. We constructed a multilayered regulatory network,

8

which integrated gene coexpression relationship with the microRNA target and promoter cis-

9

regulatory element information. This allowed revealing new uncharacterized TFs (e.g., MADSs,

10

WRKYs, and SBPs) and microRNAs (including 17 conserved and 15 novel microRNAs) that

11

potentially implicate in different steps of the catechin biosynthesis. Furthermore, we applied

12

metabolic-signature-based association method to capture additional key regulators involved in

13

catechin pathway. This provides important clues for the functional characterization of five SCPL1A

14

acyltransferase family members, which might implicate in the production balance of anthocyanins,

15

galloylated catechins, and proanthocyanins. Application of ‘omics’-based system biology strategy

16

should facilitate germplasm utilization and provide valuable resources for tea quality improvement.

17 18

KEYWORDS: Camellia sinensis, catechin biosynthesis, metabolomic flux, weighted gene

19

coexpression network (WGCNA), widely targeted metabolomics, functional genomics

ACS Paragon Plus Environment

Page 3 of 39

Journal of Agricultural and Food Chemistry

20

Introduction

21

Plants including tea are genetically very diverse groups and present numerous types of species,

22

cultivars, genotypes and accessions, occurring worldwide. The food industry is currently searching for

23

new functional foods and nutraceuticals to help meet the demand presented by the consumers of natural,

24

immunity-boosting and health-promoting plant-based food products.1-2 Tea plant (Camellia sinensis

25

(L.) O. Kuntze), an important beverage crop, produces numerous characteristic components such as

26

theanine, caffeine, and catechins that are estimated to have various beneficial effects.3 Recently, several

27

special albino (e.g., ‘Anji Baicha’ and ‘Huangjinya’) and purple-leaf (e.g., ‘Zijuan’ and ‘Sunrouge’)

28

tea cultivars have been discovered and developed in many tea-growing countries.4-5 In comparison to

29

green-leaf tea cultivars, the concentration of anthocyanins is significantly higher in the new shoots of

30

purple-leaf tea cultivars;4 while the albino tea cultivars accumulate a higher quantity of amino acids

31

and lower levels of caffeine and catechins in their new shoots.5 The change in the intricate balance of

32

these secondary metabolites will greatly influence the flavor of tea. However, the characterization of

33

genes involved in the production of these secondary metabolites is still lagging behind, which

34

constitutes a major obstacle for metabolic engineering in tea plants. Most recently, one of the greatest

35

achievements of tea plant biology is the completion of the draft genome sequence, which will greatly

36

facilitate understanding of tea metabolite pathways and accelerate tea breeding progress.6-7 In tea

37

plants, most of the genes were functionally annotated based on sequence similarity analysis against

38

the known genes in public databases, with a few of them has been validated with direct experimental

39

evidence. Currently, the functional elucidation of unknown genes in tea plants, especially those

40

involved in secondary metabolism, is therefore, a critical challenge. Based on the sequence annotation

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Page 4 of 39

41

of the C. sinensis genome, a substantial number of genes are assigned to large enzyme families,

42

including 22 SCPL1A acyltransferases genes, 30 terpene synthase genes, 148 ABC protein genes, 226

43

cytochrome P450 genes, and 267 glycosyltransferase genes, which are thought to play vital roles in

44

the transport, degradation, synthesis, and modification of particular metabolites in tea plants. 7

45

Although extensive efforts have been made to reveal the molecular mechanism of the production of

46

three major characteristic secondary metabolites (i.e., flavonoids, caffeine, and theanine) in tea plants

47

for germplasm selection and tea quality improvement, the function of the great majority of genes

48

involved in these pathways remains to be determined.8-14 Thus, a high-throughput functional genomics

49

analysis pipeline is essential.

50

Weighted gene coexpression network analysis (WGCNA) is a highly robust method for classifying

51

genes via hierarchical clustering of gene coexpression network (GCN).15 GCNs are particularly useful

52

for analyzing high-dimensionality expression datasets as they provide an intuitive framework for

53

describing changes in expression driven by cellular processes acting within or across multiple

54

conditions.15 Consequently, WGCNA is now commonly employed to analyze whole transcriptome,

55

proteomic, and metabolomic data to identify coexpression modules that relate either to a trait or

56

condition of interest.15-16 In our previous study, we used WGCNA to investigate the potential signaling

57

interactions between light quality and other environmental cues (e.g., low temperature, drought, and

58

blister blight disease) in tea plants.9 We revealed the molecular events that triggered by blue and green

59

light might partly overlap with stress responses, which suggest the possibility of a targeted use of blue

60

and green light to induce defense responses in tea plants. The WGCNA can also be extended to

61

investigate the gene modules that responsible for the variation of metabolites. Xu et al. (2018) profiled

ACS Paragon Plus Environment

Page 5 of 39

Journal of Agricultural and Food Chemistry

62

metabolome and transcriptome of the tea leaves harvested in different seasons, and by using WGCNA

63

based integration strategy, key metabolite–specific modules and regulators involved in monoterpenes

64

and sesquiterpenes were successfully delimited.17 MicroRNA (miRNA) target and promoter cis-

65

regulatory element (CRE) information can be easily integrated into plant GCNs. This strategy has been

66

employed to comprehensively identify target genes of the entire R2R3-MYB family in grape.18

67

Enrichment for miRNA targets within GCNs has demonstrated a vital role of four classes of miRNA–

68

transcription factor (TF) pairs in regulating terpenoid biosynthesis in tea plants.19 Thus, aggregating

69

genome-wide coexpression network and several regulatory networks into a community network can

70

be advantageous to provide supportive evidence and effectively reveal discrepancies between

71

individual networks.

72

To date, challenges in tea plant biology are lack of information on the complicated genetic

73

background of various cultivars, and the transformation system has not yet been established. The

74

integration of transcriptome and metabolome profiles can provide clues for the characterization of

75

functions of key metabolic genes and regulators in tea plants. Here, we employed the WGCNA based

76

system biology strategy for dissecting the complexity of 11 excellent tea cultivars’ secondary metabolic

77

variation. Special emphasis was placed on the practical application of the bait genes- and metabolites-

78

guide coexpression networks to interpret metabolomic flux, predict gene functions, and mining key

79

regulators involved in the catechin biosynthesis pathway.

80 81

Materials and Methods

82

Plant Materials

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Page 6 of 39

83

Several studies have shown that there is considerable variability in metabolite composition underlying

84

the leaf-color mutations in tea plants.4-5 To obtain a highly informative (variable) dataset for WGCNA

85

analysis, we selected 11 tea cultivars with different shoot colors for transcriptome and metabolome

86

analysis. This should facilitate the implementation of network construction and module detection, as

87

well as module-metabolite association analysis. All tea cultivars (ten-year-old) used in this study were

88

grown at the China National Germplasm Tea Repository (Hangzhou, Zhejiang, China; latitude: 30.27°

89

N, longitude: 120.13° E). The cultivars ‘Fuding Dabaicha’ (GA), ‘TRI15’ (GB), ‘Longjing 43’ (GC),

90

‘Longjing Ziya’ (PA), ‘Liercha’ (PB), ‘Zijuan’ (PC), ‘Jingning Baicha’ (WA), ‘Anji Baicha’ (WB),

91

‘Huaye’ (YA), ‘Huangjinya’ (YB), and ‘Zhonghuang 3’ (YC) belongs to Camellia sinensis (Figure 1).

92

For each cultivar, first-flush young shoots (two leaves and a bud) in the spring were harvested from 30

93

individuals for metabolomics (three biological replications * five individuals for each replicate) and

94

transcriptomics (three biological replications * five individuals for each replicate) studies.

95

Metabolite Profiling and Statistical Analysis

96

The freeze-dried leaf sample was crushed using a mixer mill (MM 400, Retsch) for 1.5 min at 30 Hz

97

before extraction, 100 mg dried powder was extracted with 1.0 ml pure methanol spiked with 0.1 mg/L

98

lidocaine. Following centrifugation at 10000 g for 10 min, the lipid-solubility extracts were absorbed

99

and 0.4 ml of each extract was mixed and filtrated before LC–MS analysis. The quality control samples

100

(i.e., a mix of all samples) were injected at regular intervals (every ten samples) throughout the

101

analytical run to provide a set of data from which repeatability could be assessed. By comparing the

102

retention time, m/z values, and the fragmentation patterns with the standards, 462 metabolites were

103

identified (Figure S1). For the metabolites for which no commercial standards are available, peaks in

ACS Paragon Plus Environment

Page 7 of 39

Journal of Agricultural and Food Chemistry

104

the MS2T library were used to query the by comparing the MS/MS spectral data with literature

105

references and online metabolite databases (METLIN, HMDB, KNApSAcK, MoTo DB and

106

MassBank). Relative quantification was achieved by normalization of integrated extracted ion

107

chromatograms (XICs) using LC–ESI–MS/MS system (HPLC, Shim-pack UFLC SHIMADZU

108

CBM20A system; MS, Applied Biosystems 4000 Q-TRAP) by a large-scale multiple-reaction

109

monitoring (MRM) on a triple quadrupole (QQQ) MS (Figure S2).9,

110

detection window of 80 s and a target scan time of 1.5 s, the quadrupole filters the precursor ions

111

(parent ions) of the target substance and excludes the ions corresponding to other molecular weights

112

to prevent interference. After obtaining metabolite data from the different samples, the peak area of

113

the mass spectra of all substances was integrated using MultiQuant (v2.0) software (AB SCIEX). Row-

114

wise normalization was performed by the quality control sample, and data were log2-transformed and

115

auto-scaled to reduce any systematic bias within metabolite data. Principal component analysis (PCA)

116

and sparse Independent PCA (sIPCA) were implemented in R. The intra-cultivar Euclidean distances

117

(EDs) were calculated by the average of ED crosswise replicate samples within each cultivar, and inter-

118

cultivar EDs were calculated by first computing the average of replicates for each cultivar and then

119

computing the average ED of remaining cultivars calculated with this cultivar.

120

RNA-Seq and Data Processing

121

The total RNA was isolated using the RNeasy plant mini kit (Tiangen Bio, Beijing, China) according

122

to the manufacturer’s instructions. The cDNA libraries were sequenced on an Illumina HiSeq Xten

123

platform (Illumina, San Diego, CA, USA) to produce 150 bp paired-end reads at Novogene (Novogene,

124

Beijing, China). The raw reads were submitted to the BIGD (BIG Data Center, http://bigd.big.ac.cn/)

ACS Paragon Plus Environment

20

In the MRM mode with

Journal of Agricultural and Food Chemistry

Page 8 of 39

125

under the accession number PRJCA001447. Low-quality reads and adapters were filtered and trimmed

126

using Trimmomatic (v0.36). High-quality clean reads were then mapped to the C. sinensis genome

127

(http://tpia.teaplant.org/) using HISAT2 (v.2.1.0)21 and were counted using HTseq (v.0.6.1p1).22 TMM

128

normalization method was used to calculate gene expression in Bioconductor edgeR package.23 Gene

129

Ontology (GO) enrichment analysis of the differentially expressed genes (DEGs) was performed using

130

the GOseq R packages (v1.32.0).24

131

Gene and Metabolite Coexpression Network Construction

132

Weighted gene coexpression network analysis (WGCNA) was implemented using the WGCNA R

133

package (v1.63).15 Network construction and module detection were performed by ‘blockwiseModules’

134

function with adjusted parameters for transcriptome and metabolome datasets, which allows automatic

135

and unsupervised network construction for the input data. For transcriptome dataset, ‘minModuleSize’

136

= 30, ‘minKMEtoStay’ = 0.7, ‘mergeCutHeight’ = 0.05, and ‘minCoreKME’ = 0.8 were applied to

137

construct a ‘signed’ weighted correlation network, while ‘minModuleSize’ = 10, ‘minKMEtoStay’ =

138

0.6, ‘mergeCutHeight’ = 0.05, and ‘minCoreKME’= 0.7 were chosen for the analysis of metabolomic

139

datasets. The pattern of gene expression and metabolite level in a given module was summarized using

140

modules eigengene and eigenmetabolite values, respectively, which represent the first principal

141

component of the scaled data for genes/metabolites in each module.15 The module membership (kME)

142

was then calculated to assess the gene connectivity based on Pearson correlation. The module networks

143

were visualized using Cytoscape (v3.7.1).25

144

Quantitative Real-Time PCR (qRT-PCR) Validation

ACS Paragon Plus Environment

Page 9 of 39

Journal of Agricultural and Food Chemistry

145

The qRT-PCR analysis was carried out using SYBR Premix Ex Taq II kit (Takara, Japan) with a

146

LightCycler 480 Real-Time PCR system (Roche Applied Science) according to the manufacturer’s

147

instructions. Three biological replicates were performed for each cultivar, and GAPDH was used as an

148

endogenous control. The relative transcript abundance was calculated using the 2−ΔΔCT method.26

149

Primers used are shown in Table S1.

150

MicroRNA Target and Transcription Factor Binding Sites Prediction

151

Conserved and novel tea plant miRNAs were collected from our previous study.14 The psRNATarget

152

and psRobot were used to determine whether genes involved in the catechin biosynthesis pathway are

153

regulated by miRNAs.27-28 The intersection of psRNATarget and psRobot outputs was selected to

154

reduce the false-positive rate. Potential TF binding sites in the promoter region (1 kbp upstream from

155

the transcription start site) of the entire genome were identified using the TFBSTools R package and

156

JASPAR database (http://jaspar.genereg.net).29-30

157

Orthogonality of Regulatory Network

158

To compare ‘guilt-by-association’ based GCN with known functional networks of Arabidopsis

159

thaliana, structure or TF genes involved in the catechin pathway were subjected to BLASTX searching

160

against the Arabidopsis protein (TAIR, Version 11, http://www.arabidopsis.org) with a cutoff E-value

161

of 10-5. We then downloaded multiple Arabidopsis functional annotation networks from STRING v11

162

(http://string-db.org) database and compared whether edges based on expression similarity were found

163

in any known functional network.

164

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Page 10 of 39

165

Results and Discussion

166

Highly Variable Metabolic Profiles of Eleven Tea Cultivars

167

To explore metabolic differences existing in 11 tea cultivars, we used a widely targeted metabolomic

168

approach to quantify the metabolite levels in tea leaves in a high-throughput manner. Of the 776

169

metabolites detected in tea leaves, 462 were identified by using authentic standards, while 314 were

170

putatively annotated by comparing the MS/MS spectral data with literature references and online

171

metabolite databases (Table S2). We then calculated the intra- and inter-cultivar EDs to measure the

172

“metabolic distance” within and among cultivars using the complete normalized metabolic dataset of

173

each sample pair (Figure 2B). As shown in the scatterplot, two purple-leaf cultivars, PB (EDintracultivar

174

= 46.3) and PC (EDintracultivar = 46.8) showed highest inter-cultivar variations among the cultivars, while

175

replicate samples exhibited relatively low intra-cultivar variations (EDintracultivar = 17.3–24.1). In

176

addition, we employed PCA to investigate the variance structure of metabolic composition in an

177

unsupervised way (Figure 2A). The first (PC1) and the second (PC2) principal components explained

178

19% and 14% of the metabolite variation, respectively. In agreement with the results of ED analysis,

179

we observed a large variation exist in the relative metabolic profiles between PB/PC and other cultivars

180

analyzed.

181

To gain a better clustering and to investigate major differential metabolites of these cultivars, we

182

employed a sparse version of Independent PCA (sIPCA), which performs an internal variable selection

183

and combines the advantages of both PCA and Independent Component Analysis (ICA) on metabolic

184

data (Figure 2C). Several important flavonoids (e.g., delphinidin 3-O-rutinoside, (+)-gallocatechin

185

(GC), cyanidin 3-O-glucosyl-malonylglucoside, chrysoeriol O-rhamnosyl-O-glucuronic acid,

ACS Paragon Plus Environment

Page 11 of 39

Journal of Agricultural and Food Chemistry

186

myricetin, and fustin) were identified as key metabolites responsible for cultivar discrimination based

187

on sIPCA (Figure 2D). Notably, the accumulation level of two anthocyanins (i.e., cyanidin 3-O-

188

glucosyl-malonylglucoside and delphinidin 3-O-rutinoside) in PB and PC was much lower than other

189

cultivars (Figure 2D and Table S2). However, the total amount of anthocyanins in PA, PB and PC were

190

significantly higher than other tea cultivars, which is consistent with previous studies showing that

191

anthocyanin overproduction is responsible for the purple coloration in tea plants (Table S2).13, 31-32 In

192

addition, it has been suggested that the main coloration components in the leaves of ‘Zijuan’ were

193

delphinidin-3-O-galactoside and cyanidin-3-O-galactoside.32 In another study, delphinidin-3-O-

194

coumaroylglucoside and cyanidin-3-O-coumaroylgalactoside were identified as the two most

195

abundant anthocyanins in ‘Mooma1’, which may responsible for the bright red coloration of new

196

shoots in this novel purple-leaf tea variety.13 Consistent with these studies, we observed that

197

anthocyanins in different purple-leaf cultivars might be modified by different anthocyanin

198

acyltransferases (ATs) and anthocyanidin 3-O-glycosyltransferases (UGTs) to give differently colored

199

anthocyanins (e.g., peonidin and pelargonidin) (Figure S3 and Table S2). Thus, the understanding of

200

the modification processes to produce those anthocyanins would help us to elucidate the complex

201

metabolic flux within the anthocyanidin pathways.

202

To reveal metabolic modules and possible metabolic flux responsible for the biosynthesis of

203

secondary metabolites in tea plants, we employed WGCNA on the metabolomic dataset. As shown in

204

Figure 3A, the WGCNA allowed identification of 19 metabolic modules. Obvious precursor-product

205

relationships were recognized within several modules, such as amino acid and its derivatives (MM11),

206

and nucleotide derivates and glycerophospholipids (MM12, MM17, and MM18) (Table S2). By

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Page 12 of 39

207

calculating the correlation efficiency between eigenmetabolite networks, we found that the

208

eigenmetabolites of MM1 were negatively correlated with MM7 (r = -0.95, P = 4 × 10-17) (Figure 3B,

209

C, D). In MM1, a number of flavone glycosides (e.g., 6-C-hexosyl-luteolin O-hexoside, selgin 5-O-

210

hexoside, C-hexosyl-isorhamnetin O-hexoside, apigenin C-glucoside, and luteolin 3',7-di-O-glucoside)

211

and metabolites that located in the downstream flavonoid biosynthesis pathway such as

212

proanthocyanidins (e.g., procyanidin A1/A2), anthocyanins (e.g., delphinidin and cyanidin), and

213

catechin derivatives (e.g., catechin trimer) showed a significant lower production level in PB (Figure

214

3C, E and Table S2). On the other hand, PB accumulated a higher level of flavonols (e.g., kaempferol

215

3,7-dirhamnoside, isorhamnetin, quercetin 7-O-malonylhexosyl-hexoside, isorhamnetin O-acetyl-

216

hexoside, quercetin O-acetylhexoside, and kaempferol 3-O-rhamnoside) and flavones (e.g., velutin,

217

acacetin, and apigenin 4-O-rhamnoside) than other cultivars (MM7; Figure 3D, F and Table S2).

218

Metabolites in these two modules are mostly derived from the flavonoid biosynthetic pathway, and the

219

competition between these metabolites for the same substrates has also been reported.31, 33 For example,

220

the concentration of proanthocyanins could be greatly improved by redirecting anthocyanidins flux

221

into the proanthocyanin pathway and direct repression of isoflavone biosynthesis pathway.33

222

Furthermore, the significantly higher accumulation level of anthocyanin in the new shoot of purple-

223

leaf cultivar ‘Zijuan’ was proposed due to direct repression of biosynthesis of lignin.31 These results

224

indicated that the WGCNA protocol can be modified to construct weighted metabolic networks for

225

meaningful pathway inferring and biological interpretation. Together, information on the varietal

226

differences in the accumulation levels of these characteristic secondary metabolites (e.g., flavonols,

227

catechins, and anthocyanins), provides valuable insights to further investigate the sensory qualities and

ACS Paragon Plus Environment

Page 13 of 39

Journal of Agricultural and Food Chemistry

228

nutritional values of tea plants.

229

Multilevel Regulation of Catechin Biosynthesis

230

As an example of the so-called ‘guilt-by-association’ principle, genes that display similar expression

231

pattern across different cultivars have an increased likelihood of being within the same metabolic

232

pathway or biological process, as well as under the control of a shared regulatory mechanism. To

233

understand the coexpression relationships between genes at a systems level, we performed WGCNA

234

on the transcriptome dataset. This unbiased and unsupervised analysis identified 279 distinct

235

coexpression modules corresponding to clusters of correlated transcripts (Figure S4 and Table S3).

236

Flavonoids such as catechins, anthocyanins, and flavonols are important determinant of tea quality and

237

flavor.3 Some of them (e.g., delphinidin 3-O-rutinoside, (+)-gallocatechin (GC), cyanidin 3-O-

238

glucosyl-malonylglucoside, myricetin, and fustin) were identified as key metabolites responsible for

239

cultivar discrimination (Figure 2D). Characterizing genes involved in the regulation of these

240

metabolites is crucial for improving tea quality. Taking catechin pathway as an example, we selected

241

74 known genes encoding 15 enzymes involved in catechin biosynthesis as ‘bait-genes’ or ‘guide-

242

genes’ to construct a GCN based on the results of WGCNA (Figure 4 and Table S4). Keeping in mind

243

that GCN constructed based on the ‘guilt-by-association’ principle is by nature different from the

244

physical interactions.34 We then questioned to which extent WGCNA based GCN would provide

245

orthogonal and new information compared to existing functional genomic networks. We, therefore,

246

compared GCN to the genetic and physical interaction networks of Arabidopsis thaliana that stored in

247

the STRING database. We found that WGCNA based GCN could provide highly orthogonal

248

information (Figure S5). The maximum overlap in gene interaction was 11%, which was obtained by

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Page 14 of 39

249

combining all evidences in the STRING database (Figure S5). The accuracy of RNA-seq results and

250

coexpression relationships was further verified by qRT-PCR (Figure S6).

251

Since miRNA and TF have been known to play a critical role in the regulation of flavonoid

252

biosynthesis.7, 10 We therefore also integrated promoter CRE and miRNA target information into GCN.

253

Our results reveal a much more complex regulatory network (including 45 enzyme genes, 155 TFs, 17

254

conserved and 15 novel miRNAs) for catechin biosynthesis (Figure 4 and Table S4). Promoters of a

255

considerable number of enzyme-coding genes (e.g., CHSs, PALs, DFRs, and SCPLs) were enriched

256

for CREs such as those for MADS, MYB, NAC, AP2/ERF, bZIP, WRKY, SBP, and bHLH TF binding

257

(Figure 4 and Table S4). In particular, the MYB binding motif was present in most of the early

258

biosynthesis genes (e.g., C4H, PAL, CHS, CHI, and F3H,) and late biosynthesis genes (e.g., ANS, LAR,

259

DFR, and SCPL1A) in the catechin pathway. The potential regulation of these genes by MYB and

260

bHLH TFs is supported by the coexpression analysis and several recent studies in tea plants.32, 35-36 For

261

example, MYB4 and MYB308 are known to function as transcriptional repressors of C4H, 4CL, LAR,

262

CHS and ANR to mediate anthocyanin biosynthesis in A. thaliana, tobacco, and tea plant.32, 36-37

263

Ectopic expression of these repressors could result in the loss of flowers pigmentation in tobacco due

264

to the decrease of anthocyanin levels.37 In our study, MYB4 (TEA032503.1) and MYB308

265

(TEA033191.1)) were shown a higher expression level in PC and coexpressed with PAL

266

(TEA003137.1, TEA023243.1, and TEA024587.1), ANS (TEA010322.1), F3H (TEA023790.1), and

267

LAR (TEA027582.1) (Figure 4 and Table S4). Hence, these MYB repressors were likely to mediate a

268

transcriptional feedback inhibition to prevent the deleterious effects of anthocyanin overproduction on

269

photosynthesis and other biological processed in PC. GL3 (TEA008168.1; GM131), a bHLH TF that

ACS Paragon Plus Environment

Page 15 of 39

Journal of Agricultural and Food Chemistry

270

isolated from the ‘Zijuan’ (PC), was coexpressed with CHI (TEA034003.1; GM131), CHS

271

(TEA034042.1; GM131) and DFR (TEA032730.1; GM131). In addition, GL3 displayed a higher

272

expression level in three purple (PA, PB, and PC) cultivars, suggesting its potential role in the

273

regulation of anthocyanin accumulation (Figure 4 and Table S4). This hypothesis is supported by a

274

recent study in ‘Zijuan’, demonstrating that GL3 might interact with CsAN1 (MYB TF) and CsTTG1

275

(WD repeat protein, WDR) to form the MYB-bHLH-WDR (MBW) transcriptional activation complex,

276

thus promoting the accumulation of anthocyanin.32

277

To date, the transcriptional control of flavonoid biosynthesis by MYB and bHLH TFs is well

278

described in many plant species; however, little is known about the role of other TFs in the regulation

279

of flavonoid accumulation.38 The GCN we generated is novel in suggesting the potential regulatory

280

roles by those TF families such as AP2/ERF, SBP, and MADS in catechin biosynthesis (Figure 4 and

281

Table S4). For instance, the presence of MADS CREs in DFRs and PALs coincided with the

282

coregulation of MADS TFs (e.g., TT16, SEP3, and AP3) to these genes. Among them, TT16 has been

283

reported to play a vital role in transcriptional regulation of proanthocyanin biosynthesis.39 Interestingly,

284

we observed a strong coregulation of several TF genes (e.g., 1 bHLH, 3 MYB, 2 NAC, 7 AP2/ERF,

285

and 6 WRKY), to C4H (TEA016772.1); however, no corresponding binding site of NAC, AP2/ERF,

286

and WRKY was found in the promoter region (Figure 4 and Table S4). This might be due to the fact

287

that several TFs, such as NAC, SPL, and WRKY could regulate the flavonoid biosynthesis through

288

interaction with the MBW complex.38 Therefore, cis-regulatory element-driven networks that

289

integrated gene coexpression relationship with promoter CRE structure information are powerful tools

290

for interfering new uncharacterized regulators that involved in the catechin biosynthesis pathway.

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Page 16 of 39

291

Recently, scientific endeavors are directed toward understanding the posttranscriptional regulation

292

of the flavonoid pathway involving miRNAs.10, 14, 40-41 In the current study, 51 out of 200 catechin

293

biosynthesis-related genes/TFs were potentially targeted by 17 conserved and 15 novel miRNAs in tea

294

plants (Figure 4 and Table S4). Several of conserved miRNA (e.g., miR156, miR172, and miR858)

295

identified in the coexpression network have known to play a pivotal role in flavonoid biosynthesis

296

regulation.40-41 For example, miR858 has been demonstrated to act as a negative regulator to control

297

the anthocyanin accumulation in tomato.41 miRNA156-SPL9 pair influences anthocyanin

298

accumulation by destabilizing MBW complex and/or targeting C4H in Arabidopsis.40 Thus, the

299

enrichment of miRNA targets within GCN indicated an important role of these molecules and revealed

300

an additional layer of control of the expression of "switch" genes involved in the catechin biosynthesis.

301

Finally, our high-confidence GCN can be advantageous to effectively reveal a set of potential key

302

regulators of catechin biosynthesis by aggregating different kinds of regulatory information into a

303

community network.

304

305

Metabolic-Signature-Based Association Captures Additional Key Regulators Involved in the

306

Catechin Pathway

307

Since the prediction of gene function by ‘guilt-by-association’ principle is based on the coregulation

308

of a set of genes in the same biological process or pathway, only coregulated genes can be predicted

309

by this approach. However, genes involved in the same biological process or pathway are not

310

necessarily coexpressed when regulation occurs at posttranscriptional (e.g., miRNA and lncRNA

311

regulation) or posttranslational (e.g., ubiquitination and phosphorylation) levels, and some key

ACS Paragon Plus Environment

Page 17 of 39

Journal of Agricultural and Food Chemistry

312

regulators cannot be captured by this approach.34 Thus, we employed metabolic-signature-based

313

association method to capture additional key regulators involved in the catechin pathway.

314

To analyze the catechin pathway in more detail, error plots of metabolites within the pathway were

315

generated, and MAD scores were calculated to directly compare the variation level across samples

316

(Figure 5A, B). Our previous study demonstrated that albino cultivars (i.e., ‘Anji Baicha’, ‘Baijiguan’,

317

and ‘Huangjinya’) contained higher amounts of amino acids and relatively low levels of GC and EGCG

318

compared to the green-leaf tea cultivars (i.e., ‘Fuding Dabaicha’ and ‘Longjing 43’).4 In accordance

319

with this result, we observed that galloylated catechins had a lower abundance in albino tea cultivars

320

(e.g., WA and WB) than in most green- or purple-leaf tea cultivars, hence green tea processed from the

321

albino tea cultivars taste more refreshing and less bitter. We then related 279 modules to 776

322

metabolites using eigengene network methodology. Among them, 37 out of 279 coexpression modules

323

showed catechins biosynthesis-related expression values (r > 0.75, P < 10-3), that is, these modules

324

probably represent core gene networks operating in each catalysis and modification reaction (Figure

325

5C, D). Genes functioning in “protein/nucleotide binding”, “carbohydrate derivative/heterocyclic

326

compound/organic

327

“catalytic/transmembrane transporter activity” were observed to be responsible for most metabolites’

328

concentration variations in catechin pathway (Figure 6A). Genes participating in “chromosome

329

organization/protein localization”, “stress/hormone response”, “macromolecule modification”, “signal

330

transduction”, and “transport” altered catechin biosynthesis and modification most frequently (Figure

331

6B). The GO term association analysis reveals a close association of catechin biosynthesis, signaling,

332

and various abiotic/biotic stress responses, suggesting that catechins might play a pivotal role in plant-

cyclic

compound

binding”,

“hydrolase/transferase

ACS Paragon Plus Environment

activity”,

and

Journal of Agricultural and Food Chemistry

Page 18 of 39

333

environment interactions, in agreement with previous studies indicating antioxidant and antiinsect

334

activities of these important flavan-3-ol products.8, 42

335

Galloylated catechins (ECG and EGCG) are characteristic astringent taste compounds and comprise

336

up to 80% (ranged from 81.2 to 88.9%) of total catechins in tea leaves, and therefore have a large

337

impact on tea quality (Table S2). Epicatechin:1-O-galloyl-β-D-glucose O-galloyltransferase (ECGT),

338

an enzyme that belongs to subclade 1A of SCPL acyltransferase has been proposed to play vital roles

339

in galloylation of catechins.7 Comparative genomic analysis revealed a total of 22 SCPL1A genes in

340

the tea plant genome, and most of them (15) were arose from tandem duplications.7 The duplication of

341

these genes could largely promote the neo- and sub-functionalization to generate new enzymatic

342

activities and drive the evolution of plant metabolic complexity. Thus, additional efforts are necessary

343

to precisely identify whether some or all of these SCPL1A genes are involved in catechin galloylation

344

in tea plants.

345

Four modules (GM111, GM124, GM74, and GM264) are comprised of genes that are significantly

346

correlated the variation of CG, ECG, GCG, and EGCG (r > 0.75, P < 10-3; Figure 5C, D). In GM74,

347

an SCPL1A acyltransferase gene (TEA016463.1) was coexpressed with two WDR genes

348

(TEA026107.1 and TEA032249.1) (Figure 6C and Table S3). Proteins containing WDRs are known

349

to act as a docking platform to interact with MYB and bHLH TFs in regulating the anthocyanin and

350

proanthocyanidin biosynthesis in tea plants, thus highlighting WDRs as a possible key regulator of the

351

production of galloylated catechins.38, 43 Notably, many of the coexpressed genes, such as DFL1

352

(auxin), IAA16 (auxin), STE1 (brassinosteroid), ACS2 (ethylene), GID1C (gibberellin), and RLK

353

(salicylic acid), are involved in hormone metabolism and signaling (Figure 6C and Table S3).44 These

ACS Paragon Plus Environment

Page 19 of 39

Journal of Agricultural and Food Chemistry

354

results are consistent with several lines of evidence of crosstalk between hormones and flavonoids in

355

the control of various facets of plant developmental processes.45

356

In GM124, two SCPL1A acyltransferases (i.e., TEA034027.1 and TEA034034.1) were coexpressed

357

with several key positive regulators of flavonoids biosynthesis such as WDR protein (BUB3.2) and

358

bHLH TF (MYC2) (Figure 6D and Table S3). These regulators may not only be related to anthocyanin

359

or flavonol biosynthesis but also to stressor/stimuli such as MYC2.46 In addition, several stress

360

response genes (e.g., HOS1, GRP7, and PIP1B) and protein kinases (e.g., MPK9, dyrk2, and CRCK2)

361

were found to highly coexpressed with these regulators (Figure 6D and Table S3). Various

362

environmental stress signals, such as low temperature, high light, and UV irradiation could result in

363

oxidative stress and lead to flavonoid overproduction through the regulation expression of R2R3-MYBs

364

and bHLHs, as well as the activity of MBW complexes.8-9, 38 The activity of MBW complexes can be

365

regulated through various posttranslational modifications (e.g., ubiquitination and phosphorylation).38,

366

47-48

367

PAP1 and PAP2, and thus result in the repression of anthocyanin in Arabidopsis.48 In addition, TTG1

368

(WDR family) and TT8 (bHLH042) are also likely to be degraded by the E3 ubiquitin ligase, which

369

has not yet been identified.47 The E3 ubiquitin ligase HOS1 acts as cold signaling attenuator to

370

compromise the harmful effect of extreme cold responses by targeting ICE1 for proteasome

371

degradation.49 It is probable, therefore, that HOS1 (TEA015363.1) could trigger negative regulatory

372

feedback of catechin overproduction by control the expression of bHLH (MYC2, TEA009193.1) and

373

WDR (BUB3.2, TEA004851.1) genes.

374

For example, COP1/SPA E3 ubiquitin ligase could mediate the degradation of the MYB proteins

To date, a number of metabolic quantitative trait locus (QTL) studies have been conducted in tea

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Page 20 of 39

375

plants to identify QTL affecting the production of characteristic secondary metabolites (e.g.,

376

theobromine, caffeine, and catechins).12, 50-51 Integration of QTL confidence intervals/markers with

377

expression based functional networks are therefore useful and needed to narrow the pool of candidates

378

and interpret the QTL results. HOS15 (TEA014884.1, GM160), another crucial repressor of cold

379

tolerance induced genes, was found to negatively correlated with the accumulation of dihydroquercetin

380

(precursor of catechins) (Table S3).52 Notably, HOS15 has also been identified as a putative QLT

381

related to catechin production in a recent study.51 Therefore, these repressors might play a key role in

382

the crosstalk between cold signaling and anthocyanin/catechin metabolic pathways.

383

Of particular interest is the identification of another two modules (GM77 and GM201) that comprise

384

SCPL1A (TEA010715.1 and TEA027270.1) were negatively correlated to the production of cyanidin

385

(Figure 5C and Table S3). Key regulators involved in the anthocyanin biosynthesis pathway, such as

386

genes encoding MYB75, WDR proteins (DWA1 and AT3G49400) and anthocyanin regulatory C1

387

protein, were found to coexpress with several glycosyltransferase (e.g., UGT92A1 and UGT80B1) and

388

acyltransferase (e.g., PEL3, GNA1, and SCPL1A) genes in these two modules (Figure 6E, F and Table

389

S3). In addition, we observed an inverse accumulation pattern exists between catechins (e.g., GCG,

390

and EGCG) and cyanidin in PB (Figure 5B). Therefore, we propose that anthocyanidins are either

391

reduced to generate catechins by anthocyanidin reductase (ANR) or immediately modified by those

392

glycosyltransferases and acyltransferases to produce anthocyanins in PB (Figure S3). Furthermore, an

393

inverse accumulation pattern was also observed between catechins and proanthocyanins (PAs, e.g.,

394

procyanidin A1/A2) in PB (Figure 5B). Catechins such as (-)-epicatechin and (+)-catechin are the

395

monomers and extension units of PAs. Therefore, the galloylation of catechins by SCPL1A may lead

ACS Paragon Plus Environment

Page 21 of 39

Journal of Agricultural and Food Chemistry

396

to the repression of the PAs biosynthesis. Together, we proposed that at least five SCPL1A

397

acyltransferase family members (TEA016463.1, TEA034027.1, TEA034034.1, TEA010715.1, and

398

TEA027270.1) could implicate in the production balance of anthocyanins, galloylated catechins, and

399

proanthocyanins in tea plants.

400

Here, we shown an ‘omics’-based data integration approaches to facilitate the identification of

401

functions of unknown metabolic genes and regulators in tea plants. These results will bring new tools

402

and valuable resources to understand how to improve and modify tea quality. Furthermore, additional

403

efforts are required to further account for features such as chromatin structure, TF combinations, and

404

protein-protein interactions by Chip-seq and ATAC-seq.

405 406

Abbreviations Used

407

GA, ‘Fuding Dabaicha’; GB, ‘TRI15’; GC, ‘Longjing 43’; PA, ‘Longjing Ziya’; PB, ‘Liercha’; PC,

408

‘Zijuan’; WA, ‘Jingning Baicha’; WB, ‘Anji Baicha’; YA, ‘Huaye’; YB, ‘Huangjinya’; YC,

409

‘Zhonghuang 3’; TF, transcription factor; miRNA, microRNA; CRE, cis-regulatory element; GO,

410

Gene Ontology; ED, Euclidean distance; MAD, Median absolute deviation; qRT-PCR, quantitative

411

real-time PCR; WGCNA, weighted gene coexpression network analysis; GCN, gene coexpression

412

network; PCA, principal component analysis; sIPCA, sparse Independent PCA; C, catechin; GC,

413

gallocatechin; EC, epicatechin; EGC, epigallocatechin; CG, catechin gallate; ECG, epicatechin gallate;

414

GCG, gallocatechin gallate; EGCG, epigallocatechin gallate; PAs, proanthocyanidins

415 416

Funding

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Page 22 of 39

417

This work was supported by the Ministry of Agriculture of China through the Earmarked Fund for

418

China Agriculture Research System (CARS-019), and the Chinese Academy of Agricultural Sciences

419

through the Agricultural Science and Technology Innovation Program (CAAS-ASTIP-2017-

420

TRICAAS) to Liang Chen.

421 422

Supporting Information

423

Figure S1. Representative MS/MS spectra for metabolites identification.

424

Figure S2. Extracted ion chromatograms (XICs) for detection of multimodal maps.

425

Figure S3. Relative accumulation levels of anthocyanins across 11 tea cultivars. Variations in relative

426

accumulation levels of anthocyanins (Z-score, scale shown in the right) are indicated in color, where

427

red regions indicate higher accumulation level, and blue regions indicate lower accumulation.

428

Figure S4. Hierarchical clustering dendrogram showing gene coexpression modules identified using

429

WGCNA. Modules correspond to branches and are labeled by colors as indicated by the first color

430

band underneath the tree. Remaining color bands reveal highly correlated (red) or anti-correlated (blue)

431

genes for the particular cultivar.

432

Figure S5. Orthogonality of ‘guilt-by-association’ principle-based gene coexpression network to other

433

molecular data in the STRING database.

434

Figure S6. qRT-PCR validation of the RNA-seq results and coexpression relationships.

435

Table S1. Primers used for the qPCR assay.

436

Table S2. WGCNA results of metabolomic dataset and analysis the levels of catechin, anthocyanins,

437

and procyanidins across samples.

ACS Paragon Plus Environment

Page 23 of 39

Journal of Agricultural and Food Chemistry

438

Table S3. WGCNA results of transcriptome dataset.

439

Table S4. Gene coexpression relationship, miRNA target and promoter cis-regulatory element

440

information of integrated network for catechin regulation in tea plants.

441 442

References

443

(1) Ercisli, S.; Akbulut, M.; Ozdemir, O.; Sengul, M.; Orhan, E., Phenolic and antioxidant diversity

444

among persimmon (Diospyrus kaki L.) genotypes in Turkey. Int. J. Food Sci. Nutr. 2008, 59 (6), 477-

445

82.

446

(2) Serce, S.; Ercisli, S.; Sengul, M.; Gunduz, K.; Orhan, E., Antioxidant activities and fatty acid

447

composition of wild grown myrtle (Myrtus communis L.) fruits. Pharmacogn. Mag. 2010, 6 (21), 9-

448

12.

449

(3) Namita, P.; Mukesh, R.; Vijay, K. J., Camellia sinensis (green tea): A review. Global journal of

450

pharmacology 2012, 6 (2), 52-59.

451

(4) Li, C. F.; Ma, J. Q.; Huang, D. J.; Ma, C. L.; Jin, J. Q.; Yao, M. Z.; Chen, L., Comprehensive

452

Dissection of Metabolic Changes in Albino and Green Tea Cultivars. J. Agric. Food Chem. 2018, 66

453

(8), 2040-2048.

454

(5) Shen, J.; Zou, Z.; Zhang, X.; Zhou, L.; Wang, Y.; Fang, W.; Zhu, X., Metabolic analyses reveal

455

different mechanisms of leaf color change in two purple-leaf tea plant (Camellia sinensis L.) cultivars.

456

Hortic Res 2018, 5, 7.

457

(6) Xia, E. H.; Zhang, H. B.; Sheng, J.; Li, K.; Zhang, Q. J.; Kim, C.; Zhang, Y.; Liu, Y.; Zhu, T.; Li,

458

W.; Huang, H.; Tong, Y.; Nan, H.; Shi, C.; Shi, C.; Jiang, J. J.; Mao, S. Y.; Jiao, J. Y.; Zhang, D.; Zhao,

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Page 24 of 39

459

Y.; Zhao, Y. J.; Zhang, L. P.; Liu, Y. L.; Liu, B. Y.; Yu, Y.; Shao, S. F.; Ni, D. J.; Eichler, E. E.; Gao, L.

460

Z., The Tea Tree Genome Provides Insights into Tea Flavor and Independent Evolution of Caffeine

461

Biosynthesis. Mol Plant 2017, 10 (6), 866-877.

462

(7) Wei, C.; Yang, H.; Wang, S.; Zhao, J.; Liu, C.; Gao, L.; Xia, E.; Lu, Y.; Tai, Y.; She, G.; Sun, J.;

463

Cao, H.; Tong, W.; Gao, Q.; Li, Y.; Deng, W.; Jiang, X.; Wang, W.; Chen, Q.; Zhang, S.; Li, H.; Wu,

464

J.; Wang, P.; Li, P.; Shi, C.; Zheng, F.; Jian, J.; Huang, B.; Shan, D.; Shi, M.; Fang, C.; Yue, Y.; Li, F.;

465

Li, D.; Wei, S.; Han, B.; Jiang, C.; Yin, Y.; Xia, T.; Zhang, Z.; Bennetzen, J. L.; Zhao, S.; Wan, X.,

466

Draft genome sequence of Camellia sinensis var. sinensis provides insights into the evolution of the

467

tea genome and tea quality. Proc. Natl. Acad. Sci. U. S. A. 2018, 115 (18), E4151-E4158.

468

(8) Zheng, C.; Wang, Y.; Ding, Z.; Zhao, L., Global Transcriptional Analysis Reveals the Complex

469

Relationship between Tea Quality, Leaf Senescence and the Responses to Cold-Drought Combined

470

Stress in Camellia sinensis. Front Plant Sci 2016, 7, 1858.

471

(9) Zheng, C.; Ma, J. Q.; Ma, C. L.; Shen, S. Y.; Liu, Y. F.; Chen, L., Regulation of Growth and

472

Flavonoid Formation of Tea Plants (Camellia sinensis) by Blue and Green Light. J. Agric. Food Chem.

473

2019, 67 (8), 2408-2419.

474

(10)Zhao, L.; Chen, C.; Wang, Y.; Shen, J.; Ding, Z., Conserved MicroRNA Act Boldly During Sprout

475

Development and Quality Formation in Pingyang Tezaocha (Camellia sinensis). Frontiers in genetics

476

2019, 10, 237.

477

(11) Xu, L. Y.; Wang, L. Y.; Wei, K.; Tan, L. Q.; Su, J. J.; Cheng, H., High-density SNP linkage map

478

construction and QTL mapping for flavonoid-related traits in a tea plant (Camellia sinensis) using 2b-

479

RAD sequencing. BMC Genomics 2018, 19 (1), 955.

ACS Paragon Plus Environment

Page 25 of 39

Journal of Agricultural and Food Chemistry

480

(12)Ma, J. Q.; Jin, J. Q.; Yao, M. Z.; Ma, C. L.; Xu, Y. X.; Hao, W. J.; Chen, L., Quantitative Trait Loci

481

Mapping for Theobromine and Caffeine Contents in Tea Plant (Camellia sinensis). J. Agric. Food

482

Chem. 2018, 66 (50), 13321-13327.

483

(13)He, X.; Zhao, X.; Gao, L.; Shi, X.; Dai, X.; Liu, Y.; Xia, T.; Wang, Y., Isolation and

484

Characterization of Key Genes that Promote Flavonoid Accumulation in Purple-leaf Tea (Camellia

485

sinensis L.). Sci. Rep. 2018, 8 (1), 130.

486

(14)Liu, S. C.; Xu, Y. X.; Ma, J. Q.; Wang, W. W.; Chen, W.; Huang, D. J.; Fang, J.; Li, X. J.; Chen,

487

L., Small RNA and degradome profiling reveals important roles for microRNAs and their targets in

488

tea plant response to drought stress. Physiol. Plant. 2016, 158 (4), 435-451.

489

(15)Langfelder, P.; Horvath, S., WGCNA: an R package for weighted correlation network analysis.

490

BMC Bioinformatics 2008, 9 (1), 559.

491

(16)Pei, G.; Chen, L.; Zhang, W., WGCNA Application to Proteomic and Metabolomic Data Analysis.

492

In Method. Enzymol., Elsevier: 2017; Vol. 585, pp 135-158.

493

(17)Xu, Q.; He, Y.; Yan, X.; Zhao, S.; Zhu, J.; Wei, C., Unraveling a crosstalk regulatory network of

494

temporal aroma accumulation in tea plant (Camellia sinensis) leaves by integration of metabolomics

495

and transcriptomics. Environ. Exp. Bot. 2018, 149, 81-94.

496

(18)Wong, D. C.; Schlechter, R.; Vannozzi, A.; Holl, J.; Hmmam, I.; Bogs, J.; Tornielli, G. B.;

497

Castellarin, S. D.; Matus, J. T., A systems-oriented analysis of the grapevine R2R3-MYB transcription

498

factor family uncovers new insights into the regulation of stilbene accumulation. DNA Res. 2016.

499

(19)Zhao, S.; Wang, X.; Yan, X.; Guo, L.; Mi, X.; Xu, Q.; Zhu, J.; Wu, A.; Liu, L.; Wei, C., Revealing

500

of MicroRNA Involved Regulatory Gene Networks on Terpenoid Biosynthesis in Camellia sinensis in

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Page 26 of 39

501

Different Growing Time Points. J. Agric. Food Chem. 2018, 66 (47), 12604-12616.

502

(20)Chen, W.; Gong, L.; Guo, Z.; Wang, W.; Zhang, H.; Liu, X.; Yu, S.; Xiong, L.; Luo, J., A novel

503

integrated method for large-scale detection, identification, and quantification of widely targeted

504

metabolites: application in the study of rice metabolomics. Mol Plant 2013, 6 (6), 1769-80.

505

(21)Kim, D.; Langmead, B.; Salzberg, S. L., HISAT: a fast spliced aligner with low memory

506

requirements. Nat. Methods 2015, 12 (4), 357-60.

507

(22)Anders, S.; Pyl, P. T.; Huber, W., HTSeq--a Python framework to work with high-throughput

508

sequencing data. Bioinformatics 2015, 31 (2), 166-9.

509

(23)Robinson, M. D.; McCarthy, D. J.; Smyth, G. K., edgeR: a Bioconductor package for differential

510

expression analysis of digital gene expression data. Bioinformatics 2010, 26 (1), 139-40.

511

(24)Young, M. D.; Wakefield, M. J.; Smyth, G. K.; Oshlack, A., Gene ontology analysis for RNA-seq:

512

accounting for selection bias. Genome Biol. 2010, 11 (2), R14.

513

(25)Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N. S.; Wang, J. T.; Ramage, D.; Amin, N.;

514

Schwikowski, B.; Ideker, T., Cytoscape: a software environment for integrated models of biomolecular

515

interaction networks. Genome Res. 2003, 13 (11), 2498-504.

516

(26)Livak, K. J.; Schmittgen, T. D., Analysis of relative gene expression data using real-time

517

quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 2001, 25 (4), 402-8.

518

(27)Dai, X.; Zhuang, Z.; Zhao, P. X., psRNATarget: a plant small RNA target analysis server (2017

519

release). Nucleic. Acids. Res. 2018, 46 (W1), W49-W54.

520

(28)Wu, H. J.; Ma, Y. K.; Chen, T.; Wang, M.; Wang, X. J., PsRobot: a web-based plant small RNA

521

meta-analysis toolbox. Nucleic. Acids. Res. 2012, 40 (Web Server issue), W22-8.

ACS Paragon Plus Environment

Page 27 of 39

Journal of Agricultural and Food Chemistry

522

(29)Tan, G.; Lenhard, B., TFBSTools: an R/bioconductor package for transcription factor binding site

523

analysis. Bioinformatics 2016, 32 (10), 1555-6.

524

(30)Khan, A.; Fornes, O.; Stigliani, A.; Gheorghe, M.; Castro-Mondragon, J. A.; van der Lee, R.; Bessy,

525

A.; Cheneby, J.; Kulkarni, S. R.; Tan, G.; Baranasic, D.; Arenillas, D. J.; Sandelin, A.; Vandepoele, K.;

526

Lenhard, B.; Ballester, B.; Wasserman, W. W.; Parcy, F.; Mathelier, A., JASPAR 2018: update of the

527

open-access database of transcription factor binding profiles and its web framework. Nucleic. Acids.

528

Res. 2018, 46 (D1), D260-D266.

529

(31)Wang, L.; Pan, D.; Liang, M.; Abubakar, Y. S.; Li, J.; Lin, J.; Chen, S.; Chen, W., Regulation of

530

Anthocyanin Biosynthesis in Purple Leaves of Zijuan Tea (Camellia sinensis var. kitamura). Int. J.

531

Mol. Sci. 2017, 18 (4).

532

(32)Sun, B.; Zhu, Z.; Cao, P.; Chen, H.; Chen, C.; Zhou, X.; Mao, Y.; Lei, J.; Jiang, Y.; Meng, W.;

533

Wang, Y.; Liu, S., Purple foliage coloration in tea (Camellia sinensis L.) arises from activation of the

534

R2R3-MYB transcription factor CsAN1. Sci. Rep. 2016, 6, 32534.

535

(33)Li, P.; Dong, Q.; Ge, S.; He, X.; Verdier, J.; Li, D.; Zhao, J., Metabolic engineering of

536

proanthocyanidin production by repressing the isoflavone pathways and redirecting anthocyanidin

537

precursor flux in legume. Plant Biotechnol. J. 2016, 14 (7), 1604-18.

538

(34)Saito, K.; Hirai, M. Y.; Yonekura-Sakakibara, K., Decoding genes with coexpression networks and

539

metabolomics - 'majority report by precogs'. Trends Plant Sci. 2008, 13 (1), 36-43.

540

(35)Zhao, L.; Gao, L.; Wang, H.; Chen, X.; Wang, Y.; Yang, H.; Wei, C.; Wan, X.; Xia, T., The R2R3-

541

MYB, bHLH, WD40, and related transcription factors in flavonoid biosynthesis. Funct. Integr.

542

Genomics 2013, 13 (1), 75-98.

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Page 28 of 39

543

(36)Li, M.; Li, Y.; Guo, L.; Gong, N.; Pang, Y.; Jiang, W.; Liu, Y.; Jiang, X.; Zhao, L.; Wang, Y.; Xie,

544

D. Y.; Gao, L.; Xia, T., Functional Characterization of Tea (Camellia sinensis) MYB4a Transcription

545

Factor Using an Integrative Approach. Front Plant Sci 2017, 8, 943.

546

(37)Chen, L.; Hu, B.; Qin, Y.; Hu, G.; Zhao, J., Advance of the negative regulation of anthocyanin

547

biosynthesis by MYB transcription factors. Plant Physiol. Biochem. 2019, 136, 178-187.

548

(38)Xu, W.; Dubos, C.; Lepiniec, L., Transcriptional control of flavonoid biosynthesis by MYB-

549

bHLH-WDR complexes. Trends Plant Sci. 2015, 20 (3), 176-85.

550

(39)Nesi, N.; Debeaujon, I.; Jond, C.; Stewart, A. J.; Jenkins, G. I.; Caboche, M.; Lepiniec, L., The

551

TRANSPARENT TESTA16 locus encodes the ARABIDOPSIS BSISTER MADS domain protein and

552

is required for proper development and pigmentation of the seed coat. Plant Cell 2002, 14 (10), 2463-

553

79.

554

(40)Xing, S.; Salinas, M.; Hohmann, S.; Berndtgen, R.; Huijser, P., miR156-targeted and nontargeted

555

SBP-box transcription factors act in concert to secure male fertility in Arabidopsis. Plant Cell 2010,

556

22 (12), 3935-50.

557

(41)Jia, X.; Shen, J.; Liu, H.; Li, F.; Ding, N.; Gao, C.; Pattanaik, S.; Patra, B.; Li, R.; Yuan, L., Small

558

tandem target mimic-mediated blockage of microRNA858 induces anthocyanin accumulation in

559

tomato. Planta 2015, 242 (1), 283-93.

560

(42)War, A. R.; Paulraj, M. G.; Ahmad, T.; Buhroo, A. A.; Hussain, B.; Ignacimuthu, S.; Sharma, H.

561

C., Mechanisms of plant defense against insect herbivores. Plant signaling & behavior 2012, 7 (10),

562

1306-20.

563

(43)Liu, Y.; Hou, H.; Jiang, X.; Wang, P.; Dai, X.; Chen, W.; Gao, L.; Xia, T., A WD40 Repeat Protein

ACS Paragon Plus Environment

Page 29 of 39

Journal of Agricultural and Food Chemistry

564

from Camellia sinensis Regulates Anthocyanin and Proanthocyanidin Accumulation through the

565

Formation of MYB-bHLH-WD40 Ternary Complexes. Int. J. Mol. Sci. 2018, 19 (6).

566

(44)Ohri, P.; Bhardwaj, R.; Bali, S.; Kaur, R.; Jasrotia, S.; Khajuria, A.; Parihar, R. D., The common

567

molecular players in plant hormone crosstalk and signaling. Curr Protein Pept Sci 2015, 16 (5), 369-

568

88.

569

(45)Falcone Ferreyra, M. L.; Rius, S. P.; Casati, P., Flavonoids: biosynthesis, biological functions, and

570

biotechnological applications. Front Plant Sci 2012, 3, 222.

571

(46)Abe, H.; Urao, T.; Ito, T.; Seki, M.; Shinozaki, K.; Yamaguchi-Shinozaki, K., Arabidopsis

572

AtMYC2 (bHLH) and AtMYB2 (MYB) function as transcriptional activators in abscisic acid signaling.

573

Plant Cell 2003, 15 (1), 63-78.

574

(47)Patra, B.; Pattanaik, S.; Yuan, L., Proteolytic degradation of the flavonoid regulators,

575

TRANSPARENT TESTA8 and TRANSPARENT TESTA GLABRA1, in Arabidopsis is mediated by

576

the ubiquitin/26Sproteasome system. Plant signaling & behavior 2013, 8 (10), doi: 10 4161/psb 25901.

577

(48)Maier, A.; Schrader, A.; Kokkelink, L.; Falke, C.; Welter, B.; Iniesto, E.; Rubio, V.; Uhrig, J. F.;

578

Hulskamp, M.; Hoecker, U., Light and the E3 ubiquitin ligase COP1/SPA control the protein stability

579

of the MYB transcription factors PAP1 and PAP2 involved in anthocyanin accumulation in Arabidopsis.

580

Plant J. 2013, 74 (4), 638-51.

581

(49)Miura, K.; Furumoto, T., Cold signaling and cold response in plants. Int. J. Mol. Sci. 2013, 14 (3),

582

5312-37.

583

(50)Ma, J. Q.; Yao, M. Z.; Ma, C. L.; Wang, X. C.; Jin, J. Q.; Wang, X. M.; Chen, L., Construction of

584

a SSR-based genetic map and identification of QTLs for catechins content in tea plant (Camellia

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Page 30 of 39

585

sinensis). PLoS One 2014, 9 (3), e93131.

586

(51)Koech, R. K.; Malebe, P. M.; Nyarukowa, C.; Mose, R.; Kamunya, S. M.; Joubert, F.; Apostolides,

587

Z., Functional annotation of putative QTL associated with black tea quality and drought tolerance traits.

588

Sci. Rep. 2019, 9 (1), 1465.

589

(52)Park, J.; Lim, C. J.; Shen, M.; Park, H. J.; Cha, J. Y.; Iniesto, E.; Rubio, V.; Mengiste, T.; Zhu, J.

590

K.; Bressan, R. A.; Lee, S. Y.; Lee, B. H.; Jin, J. B.; Pardo, J. M.; Kim, W. Y.; Yun, D. J., Epigenetic

591

switch from repressive to permissive chromatin in response to cold stress. Proc. Natl. Acad. Sci. U. S.

592

A. 2018, 115 (23), E5400-E5409.

593 594

Figure Captions

595

Figure 1. Leaf phenotypes of 11 tea cultivars. ‘Fuding Dabaicha’ (GA); ‘TRI15’ (GB); ‘Longjing 43’

596

(GC); ‘Longjing Ziya’ (PA); ‘Liercha’ (PB); ‘Zijuan’ (PC); ‘Huaye’ (YA); ‘Huangjinya’ (YB);

597

‘Zhonghuang 3’ (YC); ‘Jingning Baicha’ (WA); ‘Anji Baicha’ (WB).

598

Figure 2. Metabolic distance and variation between cultivars. (A) Principal component analysis (PCA)

599

and (C) sparse Independent PCA (sIPCA) sample representation using the first two components. (B)

600

Scatterplot of intra- and inter-cultivar Euclidean distances (EDs) using the complete normalized

601

metabolic profiles of a sample pair. (D) Heatmap of normalized accumulation level for the top 30

602

metabolites that were selected based on ranked sIPCA loading values.

603

Figure 3. Metabolite coexpression network analysis. (A) Hierarchical clustering dendrogram showing

604

metabolite coexpression modules identified using WGCNA. Modules correspond to branches and are

605

labeled by colors as indicated by the first color band underneath the tree. Remaining color bands reveal

ACS Paragon Plus Environment

Page 31 of 39

Journal of Agricultural and Food Chemistry

606

highly correlated (red) or anti-correlated (blue) metabolites for the particular cultivar. (B) Correlation

607

analysis of module eigenmetabolite that summarizes the modules identified in the clustering analysis.

608

Blue color represents the negative correlation, while red represents the positive correlation.

609

Eigenmetabolite accumulation for the (C) MM1 and (D) MM7. Bar plots present module

610

eigenmetabolite values. Error bars represent the standard errors among three biological replicates for

611

each cultivar. Heatmap of normalized accumulation level for top 20 metabolites in (E) MM1 and (F)

612

MM7 that were selected based on the WGCNA measure of intramodular gene connectivity (kME).

613

Figure 4. An integrated network for catechin regulation in the tea plant. Octagon, circle, and triangle

614

nodes represent enzyme-coding genes, transcription factors (TFs), and microRNAs (miRNAs),

615

respectively. Coexpression relationships between different TF family members (colored solid circles)

616

and enzyme-coding genes are depicted by corresponding colored edges. Dashed red lines with arrows

617

depict predicted miRNA-gene interaction. Pie chart colors represent the presence of selected TF-

618

binding sites in promoter regions (1kbp upstream from the transcription start site) of the corresponding

619

enzyme-coding genes.

620

Figure 5. The variation of metabolite levels and module gene expression involved in the catechin

621

pathway. (A) Biosynthetic pathway of the principal catechins. CHS, CHI, F3H, F3′H, F3′5′H, DFR,

622

ANS, LAR, ANR, and SCPL1A represent chalcone synthase, chalcone isomerase, flavanone 3-

623

hydroxylase, flavonoid 3′-hydroxylase, flavonoid 3′,5′-hydroxylase, dihydroflavonol 4-reductase,

624

anthocyanidin synthase, leucoanthocyanidin reductase, anthocyanidin reductase, and type 1A serine

625

carboxypeptidase-like acyltransferases, respectively. (B) The relative intensity of metabolites in the

626

catechin pathway. Mean expression values of metabolite intensities with their standard error bars from

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Page 32 of 39

627

three biological replicates are represented. The asterisks indicate significant differences across all

628

samples (Kruskal-Wallis, "*", P < 0.05; "**", P < 0.01; "***", P < 0.001). Color stripe represents

629

metabolite-specific variation coefficients calculated as the relative median absolute distance (relative

630

MAD); the degree of MAD is indicated by the node color from green (low) to red (high). (C) The

631

heatmap on the bottom left showing the significant correlation between metabolite variation and

632

module eigenmetabolite. Blue color represents negative correlation (r < -0.75, P < 10-3), while red

633

represents positive correlation (r > 0.75, P < 10-3). (D) The heatmap on the bottom right showing

634

relative expression of genes in 37 catechin pathway-related modules across all samples.

635

Figure 6. GO enrichment and network analysis for catechin-related modules. GO (A) molecular

636

function and (B) biological processes analysis for catechin-related modules. The color of the dots in

637

the scatterplot represents the range of the -log10 (P-value). The correlation network of (C) GM74, (D)

638

GM124, (E) GM77, and (F) GM201. Genes with a higher degree are indicated by red. Potential key

639

genes and regulators discussed in the Results and Discussion section are indicated by larger circles.

ACS Paragon Plus Environment

Page 33 of 39

Journal of Agricultural and Food Chemistry

Figure Graphics Figure 1

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Figure 2

ACS Paragon Plus Environment

Page 34 of 39

Page 35 of 39

Journal of Agricultural and Food Chemistry

Figure 3

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Figure 4

ACS Paragon Plus Environment

Page 36 of 39

Page 37 of 39

Journal of Agricultural and Food Chemistry

Figure 5

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Figure 6

ACS Paragon Plus Environment

Page 38 of 39

Page 39 of 39

Journal of Agricultural and Food Chemistry

Graphic for Table of Contents

ACS Paragon Plus Environment