
Article

Authentication of Zanthoxylum Species Based on Integrated Analysis of Complete Chloroplast Genome Sequences and Metabolite Profiles Hyeon Ju Lee, Hyun Jo Koo, Jonghoon Lee, Dong Young Lee, Vo Ngoc Linh Giang, Minjung Kim, Hyeonah Shim, Jee Young Park, Ki-Oug Yoo, Sang Hyun Sung, and Tae-Jin Yang J. Agric. Food Chem., Just Accepted Manuscript • DOI: 10.1021/acs.jafc.7b04167 • Publication Date (Web): 23 Oct 2017 Downloaded from http://pubs.acs.org on October 24, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Agricultural and Food Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Agricultural and Food Chemistry

Authentication of Zanthoxylum Species Based on Integrated Analysis of Complete Chloroplast Genome Sequences and Metabolite Profiles

Hyeon Ju Lee1,4, Hyun Jo Koo1,4, Jonghoon Lee1,4, Sang-Choon Lee1, Dong Young Lee2, Vo Ngoc Linh Giang1, Minjung Kim1, Hyeonah Shim1, Jee Young Park1, Ki-Oug Yoo3, Sang Hyun Sung2*, Tae-Jin Yang1*

1 Department of Plant Science, Plant Genomics and Breeding Institute, and Research Institute of Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul, 08826, Republic of Korea

2 College of Pharmacy and Research Institute of Pharmaceutical Science, Seoul National University, Seoul, 08826, Republic of Korea

3 Department of Biological Sciences, Kangwon National University, Chuncheon, Gangwon, 24341, Republic of Korea

4 These authors contributed equally to this work.

*Corresponding authors Email: [email protected] Tel: +82-2-880-4547 Fax: + 82-2-873-2056


Abstract

We performed chloroplast genome sequencing and comparative analysis of two Rutaceae species, Zanthoxylum schinifolium (Korean pepper tree) and Z. piperitum (Japanese pepper tree), which are medicinal and culinary crops in Asia. By comparing the two chloroplast genomes, we identified more than 837 single nucleotide polymorphisms (SNPs) and 103 insertions/deletions (InDels), and we developed seven DNA markers, derived from five tandem repeats and two InDel variations, that discriminate between the Korean Zanthoxylum species. Metabolite profile analysis revealed three metabolic groups: one containing the Korean Z. piperitum samples, one containing the Korean Z. schinifolium samples and a third containing all of the Chinese Zanthoxylum samples tested, which our results indicate are Z. bungeanum. Two markers were capable of distinguishing among these three groups. The chloroplast genome sequences reported in this study represent a valuable genomic resource for exploring diversity in Rutaceae, and the molecular markers will be useful for authenticating dried Zanthoxylum berries in the marketplace.


Key words: Zanthoxylum, Z. schinifolium, Z. piperitum, chloroplast genome, marker


Introduction

The chloroplast is an essential cytoplasmic organelle in plant cells and is the site of photosynthesis, in which CO2 is assimilated.1 Chloroplasts retain an autonomous organellar genome that encodes, among other proteins, the large subunit of the key enzyme in photosynthesis, RuBisCO (ribulose-1,5-bisphosphate carboxylase/oxygenase; rbcL).2 The circular chloroplast genome ranges in size from approximately 120 to 217 kb and is maternally inherited in most angiosperms.3 The chloroplast genome is usually divided into four parts: a large single copy (LSC) region and a small single copy (SSC) region separated by a pair of inverted repeats (IRs).4-6 Compared to nuclear and mitochondrial genomes, chloroplast genomes are highly conserved,7 and there is little variation within a single species. Nevertheless, small nucleotide variations in chloroplast genomes offer enough information to distinguish among different species and sometimes between different variants or cultivars within a species.

Sequence variations in chloroplast genomes can be found in both protein-coding genes (e.g., matK, rpoB, rpoC1 and rbcL) and intergenic regions (e.g., psbK-psbI, trnL-trnF and atpF-atpH). These variations have been used to study plant genetic diversity and evolution and to develop markers for authenticating plant species.8-13 Owing to recent advances in sequencing and assembly technologies, complete chloroplast genome sequences from more than 1800 species have been deposited in GenBank.14 Phylogenetic analyses based on chloroplast genome information have shed light on plant evolution. In phylogenomic studies using chloroplast genes, the selection of appropriate sequence datasets, taxon sampling strategies and methods for phylogenetic analysis (Bayesian analysis, maximum likelihood and so on) is important because different choices can produce different results; therefore, the correct methodology is still under debate.15 To date, comprehensive surveys of genetic diversity using chloroplast genomes have been performed in many plant species, either among close relatives16-19 or at the intraspecific level in plants such as Oryza sativa20 and Panax ginseng.14

The Zanthoxylum genus, which belongs to the Rutaceae family, comprises approximately 250 species of aromatic trees and shrubs.21 In Africa, the Americas and Asia, many Zanthoxylum species are traditionally used as food supplements or drugs because of the valuable aromatic oil compounds obtained from their pericarps and leaves.22-26 Some Asian species, such as Z. piperitum (Japanese pepper), Z. schinifolium (Korean pepper) and Z. bungeanum (Szechuan pepper), are also used as condiments and spices because of their strong taste, especially in East Asian countries including Korea, Japan and China,22-25, 27-29 whereas American and African Zanthoxylum species are not used for culinary purposes.25 Essential oils from Z. piperitum and Z. schinifolium contain beneficial compounds with antimicrobial, anti-inflammatory and antioxidant activities.23, 25, 27, 28, 30 Although Z. schinifolium and Z. piperitum plants appear similar, they can be distinguished by the arrangement of the prickles on their branches, which is alternate in Z. schinifolium and symmetrically opposite in Z. piperitum (Figure 1A, B). However, it is not easy to discriminate between the fruits and seeds harvested from these plants because of their similar morphology, especially when their dried and ground pericarps are distributed in the marketplace. The chemical components of these species differ, including their aromatic compounds (especially their isopulegol contents), but their metabolic profiles sometimes also differ within a single species, for example among samples from different countries.22, 23, 30 It is even more difficult to distinguish between Z. schinifolium and Z. piperitum by their metabolic profiles when pericarp powders from the two species are mixed. Differences in DNA sequences could be used to differentiate and identify these two Zanthoxylum species; however, the limited availability of genetic and genomic resources for both species is an obstacle to establishing a clear molecular authentication system.31 Therefore, a reliable tool is needed for authenticating the pericarps of these Zanthoxylum species at the species level.

Several efforts have focused on developing markers to distinguish Z. schinifolium from Z. piperitum based on sequence variations in the internal transcribed spacer (ITS) regions of nuclear ribosomal DNA (nrDNA),31, 32 a well-known barcoding region.14, 33 However, complete chloroplast genome sequences exhibit less variation than nrDNA within species and can therefore provide important information for comprehensive analysis of genetic diversity and for establishing a clear molecular authentication system.17

In this study, we obtained the complete chloroplast genome sequences of Z. piperitum and Z. schinifolium by de novo assembly of whole-genome sequencing (WGS) data generated with next-generation sequencing (NGS) technology. We also carried out metabolic profiling of Korean and Chinese Zanthoxylum species and developed practical molecular markers that distinguish Z. schinifolium, Z. piperitum and Chinese Zanthoxylum species.

Materials and Methods


Plant materials and DNA preparation

Twenty-three individual samples of Zanthoxylum species collected from Korea and China were used in this study, including 11 Chinese Zanthoxylum species (CZ) samples, eight Korean Z. piperitum (KZP) samples and four Korean Z. schinifolium (KZS) samples; their geographical origins are described in Table 1 and Figure 1C. Total genomic DNA was extracted from fresh leaves of KZP-08, KZS-03 and KZS-04 and from freeze-dried fruits of the other samples using a modified cetyltrimethylammonium bromide (CTAB) method.34 The quality and quantity of the extracted DNA samples were examined using a NanoDrop ND-1000 (Thermo Scientific, Wilmington, MA).


Whole-genome shotgun sequencing

To generate the chloroplast genome sequences, genomic DNA was extracted from the leaves of KZS-03 (Z. schinifolium) and KZP-08 (Z. piperitum) and subjected to whole-genome shotgun sequencing on the Illumina MiSeq and Illumina NextSeq500 platforms (Illumina, San Diego, CA), respectively (Table 1). Paired-end genomic libraries were constructed following the manufacturer's instructions. Library construction and sequencing were carried out by Lab Genomics Co. (Seongnam, Korea).


Chloroplast genome assembly and gene annotation

Sequencing data with Phred quality scores of 20 or less were filtered out, and the remaining data were de novo assembled using the CLC genome assembler (v. beta 4.6, CLC Inc., Aarhus, Denmark) according to the dnaLCW method described by Kim et al.14, 33 Principal contigs representing the chloroplast genome were combined into a draft sequence based on the linkages of overlapping contig sequences. Annotation of protein-coding genes in the chloroplast genome was carried out using the DOGMA program35 and manually confirmed using BLAST searches. Circular gene maps of the complete chloroplast genomes were drawn using OGDRAW (http://ogdraw.mpimp-golm.mpg.de/).36
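The quality cut-off described above can be made concrete with a short Python sketch. It assumes Phred+33-encoded FASTQ qualities and reads "Phred scores of 20 or less were filtered" as discarding reads whose mean score is 20 or less; the CLC assembler's actual trimming rules are not described in the text, so the mean-score criterion and all function names here are illustrative assumptions. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means roughly one miscalled base in 100.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality string)


def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: float) -> float:
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def passes_quality_filter(quality: str, threshold: int = 20) -> bool:
    """Keep a read only if its mean Phred score is above `threshold`.

    Assumption: a mean-score criterion; per-base trimming would be another
    reasonable reading of the Methods.
    """
    scores = phred_scores(quality)
    return bool(scores) and sum(scores) / len(scores) > threshold


def filter_reads(records: Iterable[FastqRecord], threshold: int = 20) -> Iterator[FastqRecord]:
    """Yield only the records whose quality string passes the filter."""
    for header, seq, qual in records:
        if passes_quality_filter(qual, threshold):
            yield header, seq, qual
```

For example, with `[("r1", "ACGT", "IIII"), ("r2", "ACGT", "++++")]` only `r1` survives, because 'I' encodes Q40 and '+' encodes Q10.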


Comparative analysis of the chloroplast genomes of Zanthoxylum species

The assembled chloroplast genome sequence of Z. schinifolium was compared to the complete chloroplast genome sequence of Z. piperitum obtained from sample KZP-08 (GenBank No.: KT153018).37 The two sequences were aligned and compared using MAFFT (http://mafft.cbrc.jp/alignment/server/)38 and mVISTA (http://genome.lbl.gov/vista/mvista/submit.shtml).39 Annotation information for mVISTA was obtained using DOGMA35 and tRNAscan-SE,40 followed by manual curation that also included a comparison with published chloroplast genome sequences. In addition, tandem repeats (TRs) were identified in the chloroplast genomes of the two Zanthoxylum species using the Tandem Repeats Finder program (http://tandem.bu.edu/trf/trf.html)41 and compared to identify the regions that differ between Z. schinifolium and Z. piperitum. The rates of nonsynonymous substitutions per nonsynonymous site (Ka) and synonymous substitutions per synonymous site (Ks), and their ratio (Ka/Ks), were calculated using PAL2NAL (http://www.bork.embl.de/pal2nal/).42
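For reference, the quantities named above can be written in the textbook counting form (Nei–Gojobori site counts with a Jukes–Cantor correction). This is shown only to make Ka, Ks and their ratio concrete; to our understanding, PAL2NAL obtains dN/dS through codeml from the PAML package, which fits a codon substitution model instead, so its values need not match this formula exactly. Here N and S are the numbers of nonsynonymous and synonymous sites in the aligned coding sequence, and N_d and S_d are the observed nonsynonymous and synonymous differences.

```latex
p_N = \frac{N_d}{N}, \qquad p_S = \frac{S_d}{S}

K_a = -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p_N\right), \qquad
K_s = -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p_S\right)

\omega = \frac{K_a}{K_s}
```

Values of ω well below 1 indicate purifying selection, values near 1 are consistent with neutral evolution, and values above 1 are conventionally read as a sign of positive selection, although a ratio above 1 from a short gene with few substitutions should be interpreted cautiously.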

To compare the ndhG sequences of Z. piperitum, Z. schinifolium and Z. bungeanum, the nucleotide sequences and their translations were aligned and compared using MAFFT (http://mafft.cbrc.jp/alignment/server/).38


Molecular marker analysis

To validate the inter-species polymorphisms in the chloroplast genomes and to develop DNA markers for discriminating these Zanthoxylum species, specific primers were designed with the Primer3 program (http://bioinfo.ut.ee/primer3-0.4.0/)43, 44 based on polymorphic regions derived from InDels and from copy number variation of the TRs between Z. piperitum and Z. schinifolium. Seven molecular markers were developed based on the sequence variation between the Z. piperitum and Z. schinifolium chloroplast genomes. PCR amplifications were performed in a total volume of 25 µl containing 20 ng of genomic DNA template, 1× PCR buffer, 10 pM of each primer, 0.2 mM dNTPs and 1 U Taq DNA polymerase (Vivagen, Korea). The amplified PCR fragments were separated by size on 1.5% agarose gels or 9.0% polyacrylamide gels, or by capillary electrophoresis using a Fragment Analyzer (Advanced Analytical Technologies Inc., Ankeny, IA), depending on the sizes of the PCR products.

Principal component analysis based on near-infrared reflectance spectroscopy

The pericarp samples were cleaned, air-dried, placed into a stoppered glass vial and dried for 12 h in an oven at 60 °C to remove moisture prior to near-infrared reflectance (NIR) spectroscopy analysis. NIR spectra were obtained using an NIR system (MPA; Bruker Optics, Ettlingen, Germany) over a wavenumber range of 10,000–4000 cm-1 at a resolution of 8 cm-1, with each spectrum representing the average of 32 scans. Approximately 1 g of sample was placed into a single glass sample vial. The spectra were acquired in reflectance mode using a glass sample vial as a reference standard. Each sample spectrum was measured three times, and the three spectra were averaged. NIR spectra are affected by both the chemical and the physical properties of samples, and the physical properties contribute the majority of the unwanted variance among spectra. Therefore, spectral preprocessing must be performed to reduce systematic noise such as light scattering, path length differences and baseline variation. In this study, several spectral preprocessing methods were compared to obtain the optimum results, including first derivative, second derivative, standard normal variate (SNV) and multiplicative scatter correction (MSC). To avoid the noise enhancement that occurs as a consequence of derivative analysis, a Savitzky-Golay smoothing filter was employed. NIR spectral data acquisition and preprocessing were performed with OPUS 7.0 software (Bruker Optics, Ettlingen, Germany). SIMCA 13 software (Umetrics, Malmö, Sweden) was used for principal component analysis (PCA), and the data sets were Pareto-scaled prior to PCA.
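The preprocessing and PCA steps above can be sketched with NumPy. The authors used OPUS 7.0 and SIMCA 13; the functions below are independent textbook implementations written for illustration, and the window width, polynomial order and preprocessing combination are placeholders because the text does not report which settings gave the optimum results. Rows are samples and columns are wavenumbers. An SVD-based PCA should agree with other PCA implementations up to the sign of each component.

```python
import math

import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra: np.ndarray, reference=None) -> np.ndarray:
    """Multiplicative scatter correction.

    Each spectrum x is fitted as x ~ a + b * ref (ref defaults to the mean
    spectrum) and replaced by (x - a) / b.
    """
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty(spectra.shape, dtype=float)
    for i, x in enumerate(spectra):
        b, a = np.polyfit(ref, x, deg=1)  # polyfit returns [slope, intercept]
        corrected[i] = (x - a) / b
    return corrected


def savgol(spectra: np.ndarray, half_window: int = 5, polyorder: int = 2, deriv: int = 0) -> np.ndarray:
    """Savitzky-Golay smoothing or derivative along each row ('valid' region only).

    A polynomial of degree `polyorder` is least-squares fitted in every window of
    2 * half_window + 1 points; the output is its `deriv`-th derivative at the
    window centre, per data point.
    """
    offsets = np.arange(-half_window, half_window + 1, dtype=float)
    vander = np.vander(offsets, polyorder + 1, increasing=True)  # columns: offsets**j
    coeffs = np.linalg.pinv(vander)[deriv] * math.factorial(deriv)
    # np.convolve flips its kernel, so reverse coeffs to get sum_j c_j * x[k + j]
    return np.array([np.convolve(x, coeffs[::-1], mode="valid") for x in spectra])


def pareto_scale(data: np.ndarray) -> np.ndarray:
    """Mean-centre each variable and divide by the square root of its std.

    Constant columns (std = 0) would divide by zero and should be removed first.
    """
    return (data - data.mean(axis=0)) / np.sqrt(data.std(axis=0, ddof=1))


def pca(data: np.ndarray, n_components: int = 2):
    """PCA of column-centred data via SVD; returns scores and explained variance ratios."""
    u, s, _ = np.linalg.svd(data, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]
```

A typical chain would be `pca(pareto_scale(savgol(snv(raw), 7, 2, deriv=1)))`, where `raw` is a samples × wavenumbers array; the score plot of the first two components corresponds to the kind of grouping shown in a PCA figure.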


Phylogenetic analysis

The whole chloroplast genome sequences of 16 plant species were aligned using ClustalW, and a maximum likelihood tree was generated in MEGA5 (version 5.2.2)45 with the very strong branch swap filter option. To measure clade support, 1000 bootstrap replicates were generated. A Bayesian tree was generated from the same sequence alignment using BEAST (version 1.8.4)46 with the following options: substitution model, HKY; base frequencies, Estimated; site heterogeneity model, None; tree prior, Coalescent: Constant Size; tree model, Random starting tree. Two Markov chain Monte Carlo searches were each run for 10,000,000 generations, with trees sampled every 1000 generations. TreeAnnotator was run with the following options: burn-in (as trees), 100; posterior probability limit, 0; target tree type, Maximum clade credibility tree; node heights, Mean heights. A final tree with posterior probability values at the clade nodes was generated with FigTree version 1.4.3 and edited with MEGA5 (version 5.2.2).45
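The HKY substitution model selected in BEAST is the Hasegawa–Kishino–Yano model, which allows unequal base frequencies and a single transition/transversion rate ratio κ. For reference, its instantaneous rate matrix can be written as follows; with "Estimated" base frequencies, the equilibrium frequencies π_j are free parameters sampled during the MCMC run, and i and j index the nucleotides A, C, G and T.

```latex
q_{ij} =
\begin{cases}
\kappa\,\pi_j & i \neq j,\ \text{transition } (\mathrm{A}\leftrightarrow\mathrm{G} \text{ or } \mathrm{C}\leftrightarrow\mathrm{T}) \\
\pi_j & i \neq j,\ \text{transversion} \\
-\sum_{k \neq i} q_{ik} & i = j
\end{cases}
\qquad \text{scaled so that} \quad -\sum_{i} \pi_i\, q_{ii} = 1
```

Setting κ = 1 reduces this to the F81 model, and additionally setting every π_j = 1/4 gives the Jukes–Cantor model; the "site heterogeneity model, None" option means that no gamma-distributed rates or invariant sites are added on top of this matrix.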


Results and Discussion


Complete chloroplast genome sequence of Z. schinifolium

We obtained approximately 3.26 Gb and 4.23 Gb of paired-end sequences from whole-genome sequencing of KZP-08 (Z. piperitum) and KZS-03 (Z. schinifolium), respectively (Table 2), using low-coverage WGS, an efficient method that has been used to produce complete chloroplast genome sequences in many plant species.14, 19, 20, 33, 37 Compared to the raw sequence data for KZP-08, the sequencing data for KZS-03 contain many more reads derived from the chloroplast genome (164.87x chloroplast coverage from 3.26 Gbp in KZP-08 and 1069.04x chloroplast coverage from 4.23 Gbp in KZS-03) (Table 2). De novo assembly of the KZS-03 data produced five chloroplast contigs, which were ordered based on the complete chloroplast genome sequence of Z. piperitum (GenBank No.: KT153018). The contigs were merged into a single circular draft sequence by combining overlapping sequences. After putative assembly errors were corrected by mapping raw reads onto the draft sequence, we obtained a 158,963 bp complete chloroplast genome sequence with 38.4% GC content. The chloroplast genome of Z. schinifolium has a typical quadripartite structure, with a pair of inverted repeat regions (IRa and IRb, each 27,085 bp) separated by a large single copy (LSC) region (86,528 bp) and a small single copy (SSC) region (18,265 bp) (Table 2). In addition, analysis of GC content (calculated from the GC composition in 100 bp sliding windows) and raw read mapping depth revealed that the parts of the IR regions adjacent to the SSC have relatively high GC content and lower sequencing depth in the chloroplast genomes of both Zanthoxylum species (Figure S1). In total, we identified 111 genes in the Z. piperitum and Z. schinifolium chloroplast genomes: 78 protein-coding genes, 29 tRNA genes and 4 rRNA genes, 18 of which contain introns (Table S1). When counting gene numbers, duplicated genes in IRa and IRb were counted as one gene rather than two. The complete chloroplast genome sequence of Z. schinifolium was deposited in GenBank under accession number KT321318.
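Two calculations in this paragraph can be sketched in Python: the share of sequenced bases that come from the plastome, and GC content in 100 bp windows. Both functions are hypothetical helpers. The first assumes that coverage means bases mapped to the plastome divided by plastome length; the second treats the circular genome as linear and leaves the window step as a parameter, because the step size used for Figure S1 is not stated.

```python
def plastome_read_fraction(coverage: float, plastome_bp: int, total_bases: float) -> float:
    """Approximate fraction of sequenced bases that derive from the plastome.

    Assumes coverage = (bases mapped to the plastome) / plastome length.
    """
    return coverage * plastome_bp / total_bases


def gc_windows(seq: str, window: int = 100, step: int = 100) -> list:
    """GC fraction of every full window; use step < window for overlapping windows.

    The circular genome is treated as linear, so windows spanning the sequence
    start/end are not reported.
    """
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        values.append((chunk.count("G") + chunk.count("C")) / window)
    return values


# Values from the text; 158,154 bp for Z. piperitum is the sum of its reported
# LSC, SSC and two IR lengths (Table 2).
kzp08 = plastome_read_fraction(164.87, 158_154, 3.26e9)
kzs03 = plastome_read_fraction(1069.04, 158_963, 4.23e9)
```

Here `kzp08` evaluates to about 0.008 and `kzs03` to about 0.040, i.e. roughly 0.8% versus 4.0% of the sequenced bases, which is consistent with the KZS-03 library being much richer in chloroplast reads; the exact fractions depend on how reads from the two IR copies were counted during mapping.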

Comparative analysis of the chloroplast genomes of Z. piperitum versus Z. schinifolium

The chloroplast genome sequences of the two Zanthoxylum species are 97.1% identical, and their GC contents are also very similar (38.5% and 38.4% in Z. piperitum and Z. schinifolium, respectively) (Table 2). Compared to the Z. piperitum chloroplast genome, the Z. schinifolium chloroplast genome is 809 bp longer, with shorter IR regions (27,644 bp in Z. piperitum and 27,085 bp in Z. schinifolium) but longer LSC and SSC regions (85,340 bp and 17,526 bp, respectively, in Z. piperitum and 86,528 bp and 18,265 bp, respectively, in Z. schinifolium) (Table 2). Both chloroplast genomes contain the same 111 genes, which are arranged in the same order (Figure 2). The IR regions in both genomes contain completely duplicated genes, including eight protein-coding genes (rps19, rpl2, rpl23, ycf2, ycf15, ndhB, rps7 and rps12), seven tRNA genes (trnI-CAU, trnL-CAA, trnV-GAC, trnI-GAU, trnA-UGC, trnR-ACG and trnN-GUU) and four rRNA genes (rrn16, rrn23, rrn4.5 and rrn5) (Table S1).
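The region lengths reported above can be checked against the total genome sizes with a few lines of Python. The `Plastome` class is a hypothetical helper written for this illustration; it only encodes the quadripartite LSC + SSC + 2 × IR layout described in the Introduction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plastome:
    """Region lengths (bp) of a quadripartite chloroplast genome."""

    lsc: int
    ssc: int
    ir: int  # length of ONE inverted-repeat copy (IRa = IRb)

    @property
    def total(self) -> int:
        # LSC + IRb + SSC + IRa
        return self.lsc + self.ssc + 2 * self.ir


# Region lengths as reported in the text (Table 2)
z_piperitum = Plastome(lsc=85_340, ssc=17_526, ir=27_644)
z_schinifolium = Plastome(lsc=86_528, ssc=18_265, ir=27_085)

size_difference = z_schinifolium.total - z_piperitum.total
per_region = {
    "LSC": z_schinifolium.lsc - z_piperitum.lsc,
    "SSC": z_schinifolium.ssc - z_piperitum.ssc,
    "IR (per copy)": z_schinifolium.ir - z_piperitum.ir,
}
```

`z_schinifolium.total` is 158,963 bp, matching the assembled length, and `z_piperitum.total` is 158,154 bp, so `size_difference` is the reported 809 bp (+1,188 bp in the LSC, +739 bp in the SSC and -559 bp in each IR copy). The 559 bp per-copy IR difference agrees with the approximately 560 bp IR extension in Z. piperitum described for ycf1 (Figure 3).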

We performed comparative analysis using mVISTA to determine the level of sequence divergence and found that intergenic regions are more divergent than genic regions (Figure S2). The nucleotide and amino acid sequences of protein-coding genes are highly similar, with average sequence similarities of 98.6% and 98.5%, respectively (Table S2). Alignment of the two chloroplast sequences revealed two notably large InDels: one in the rps16–trnQ (UUG) intergenic region (473 bp) in the LSC and one in ycf1 (582 bp) in IRa (red arrowheads in Figure 2 and red dashed boxes in Figure S2). The ycf1 genes are located at the borders between the SSC and the two IR regions, and an extension of the IR region in Z. piperitum (approximately 560 bp) caused the latter large InDel (Figure 3).
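The SNP and InDel counts given in the Abstract, and the 97.1% identity reported above, come from the authors' pairwise alignment, but their counting conventions are not stated. A minimal Python sketch of one common convention follows, assuming two sequences already aligned (for example by MAFFT) with '-' as the gap character: each mismatched ungapped column is one SNP, each run of consecutive gaps in the same sequence is one InDel event, and identity is the percentage of identical columns. The function name and these conventions are assumptions, and ambiguity codes such as N are simply counted as mismatches.

```python
def compare_aligned(a: str, b: str) -> dict:
    """Count SNPs, InDel events and percent identity for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    snps = indels = identical = columns = 0
    prev_gap = None  # which sequence ("a" or "b") carried the gap in the previous column
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # column gapped in both sequences: ignore entirely
        columns += 1
        if x == "-" or y == "-":
            gapped = "a" if x == "-" else "b"
            if gapped != prev_gap:
                indels += 1  # a new gap run starts here
            prev_gap = gapped
            continue
        prev_gap = None
        if x == y:
            identical += 1
        else:
            snps += 1
    identity = 100.0 * identical / columns if columns else 0.0
    return {"snps": snps, "indels": indels, "identity": identity}
```

For instance, `compare_aligned("ACGT--ACGT", "ACCTAAAC-T")` reports 1 SNP, 2 InDel events and 60% identity. Under this convention a large InDel such as the 473 bp rps16–trnQ gap counts as a single event.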

We compared the TRs within the chloroplast genomes of Z. piperitum and Z. schinifolium. Nineteen TR regions differ between the two Zanthoxylum species; these TRs range from 15 to 38 bp in length and are repeated 0.5 to 3 times (Table 3). All of the TRs are located in 14 intergenic regions, 13 in the LSC and one in an IR region (blue arrowheads in Figure 2).
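Tandem Repeats Finder detects approximate repeats with a probabilistic model, so the exact-match scan below is only a simplified illustration, not a reimplementation of TRF. It shows how repeats with fractional copy numbers arise and why a change in copy number shifts the PCR product size by a multiple of the unit length, which is the difference the TR markers detect. The default 15–38 bp range mirrors the text, assuming that range refers to the repeat unit; both function names are hypothetical.

```python
def perfect_tandem_repeats(seq: str, min_unit: int = 15, max_unit: int = 38,
                           min_copies: float = 2.0) -> list:
    """Naive scan for exact tandem repeats.

    For each unit length k, a maximal run of positions i with seq[i] == seq[i + k]
    marks a stretch that is periodic with period k. Returns (start, k, copies)
    tuples; copies can be fractional, as in Tandem Repeats Finder output.
    Multiples of a true unit (e.g. 2k) may also be reported.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for k in range(min_unit, max_unit + 1):
        run = 0
        for i in range(n - k + 1):
            # i == n - k acts as a sentinel that closes any open run
            if i < n - k and seq[i] == seq[i + k]:
                run += 1
                continue
            copies = (run + k) / k  # the periodic stretch spans run + k bases
            if run and copies >= min_copies:
                hits.append((i - run, k, round(copies, 1)))
            run = 0
    return hits


def length_difference(unit_bp: int, copies_a: float, copies_b: float) -> int:
    """Expected amplicon length difference caused by a copy-number change."""
    return round((copies_a - copies_b) * unit_bp)
```

With a 15 bp unit, two copies flanked by non-matching bases are reported as `(start, 15, 2.0)`, and adding a partial third copy of 8 bp gives `(start, 15, 2.5)`; one extra copy of that unit lengthens the amplicon by 15 bp.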


Divergence of coding gene sequences

Between the two Zanthoxylum species, 17 genes share identical nucleotide sequences and 34 share identical amino acid sequences (Table S2). Several genes with higher Ka, Ks or Ka/Ks values are indicated in Figure S3. The average Ks values between the two Zanthoxylum species are 0.0185, 0.0250 and 0.0059 in the LSC, SSC and IR regions, respectively, with an overall average of 0.0165 (Table S2, Figure S3). This result agrees with the previous finding that IR regions are more conserved than other regions because the two IR copies frequently compensate for each other.19 However, small variations were observed even in highly conserved coding regions. Genes in the SSC region had higher nonsynonymous substitution rates and higher average Ka/Ks ratios, indicating that genes in the SSC region are relatively more variable between the two Zanthoxylum species than those in other regions. Only one gene had a Ka/Ks ratio >1: ndhG in the SSC region, with a Ka/Ks ratio of 1.4324 (Table S2). NdhG is a subunit of the NADPH dehydrogenase complex, which provides electrons for cyclic electron flow47 and helps protect the plant against photo-oxidative stress.47, 48 NdhG might be structurally similar to the NuoJ subunit of Escherichia coli complex I (NADH:ubiquinone oxidoreductase) and the Nqo10 subunit of Thermus thermophilus complex I; NdhG appears to provide part of a plastoquinone-binding site on its surface and is not involved in electron transport.49 Therefore, NdhG is likely to be more of a structural subunit than a functional subunit of the NADPH dehydrogenase complex, and ZpNdhG and ZsNdhG, which differ by four amino acids, might both be functional. These results imply that the accumulation of nonsynonymous mutations that do not affect protein function can sometimes produce a high Ka/Ks value even without positive selection.


Validation of inter-species polymorphism and development of authentication markers

We performed molecular classification of the two Zanthoxylum species using markers designed from InDels and TR copy number variations. We designed seven primer sets derived from the two large InDels and five intergenic regions harboring TRs, and we validated them by PCR analysis of two accessions each of Chinese Zanthoxylum species (CZP-03, CZP-11), Korean Z. piperitum (KZP-01, KZP-08) and Korean Z. schinifolium (KZS-03, KZS-04) (Table 3 and Table 4). Schematic diagrams of the five TR markers are shown in Figure 4A–E. The lengths of the marker regions vary because of tandem repeats and insertions, and the sizes of the PCR products from these markers matched the sizes expected from their sequences (Figure 4F–J). Although three markers (two TR markers and one InDel marker) produced the same product sizes in Z. piperitum and Z. bungeanum, all seven markers revealed inter-specific polymorphism and clearly discriminated between Z. piperitum and Z. schinifolium (see Figure 4 for TR markers, Figure S4 for InDel markers and Figure S5 for all samples).

Authentication of Zanthoxylum species using the newly developed markers and metabolite profile analysis

The sizes of the PCR products amplified from the KZP and KZS samples using the TR and InDel markers (Figure 4, Figure S4 and Figure S5) were as expected (Table 4). Because the CZ samples were purchased from markets in China, their tentative production areas were known but their scientific names were uncertain, although we thought they might have been obtained from Z. piperitum or Z. schinifolium plants grown in China. When we analyzed metabolite data obtained by near-infrared reflectance spectroscopy, principal component analysis (PCA) indicated that the CZ samples harbored distinct metabolites and therefore might belong to a different species from the KZP and KZS samples (Figure 5). Several Zanthoxylum species grow in China, and the chloroplast genome sequence of one of these species, Z. bungeanum, has been reported (GenBank No. KX497031). Notably, the sizes of the PCR products from the CZ samples matched the sizes expected from the Z. bungeanum chloroplast genome (Figure 4, Figure S4, Figure S5 and Table 4). Several varieties of Z. bungeanum also grow in Szechuan province; the fruits of these varieties, as well as those of Z. armatum, are commonly referred to as Szechuan peppers.29 Three of the 11 CZ samples were obtained from Szechuan province, and we believe that the CZ samples are Szechuan peppers. To date, the chloroplast genomes of Chinese Zanthoxylum species other than Z. bungeanum have not been sequenced, so it remains unclear whether the chloroplast genomes of the various Chinese Zanthoxylum species are highly similar to one another. However, we confirmed that the amplicon lengths from all CZ samples collected in China match those expected from Z. bungeanum. Therefore, our two TR markers, IMZanTR-1 and IMZanTR-3, can be used to distinguish Z. piperitum, Z. schinifolium and Z. bungeanum and to identify Korean pepper, Japanese pepper and some Szechuan peppers in the marketplace.
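The logic of combining two markers can be illustrated with a small Python sketch. The expected product sizes below are HYPOTHETICAL values invented purely for illustration, not the values in Table 4, and "marker_1" and "marker_2" merely stand in for two markers such as IMZanTR-1 and IMZanTR-3; the sizing tolerance is likewise an assumption. A sample is assigned to a species only when every marker matches that species' expected size and exactly one species fits.

```python
from typing import Dict, Optional

# HYPOTHETICAL amplicon sizes (bp) chosen only to illustrate the logic;
# these are NOT the values in Table 4.
EXPECTED_SIZES: Dict[str, Dict[str, int]] = {
    "Z. piperitum": {"marker_1": 300, "marker_2": 250},
    "Z. schinifolium": {"marker_1": 330, "marker_2": 280},
    "Z. bungeanum": {"marker_1": 300, "marker_2": 280},
}


def assign_species(observed: Dict[str, int],
                   expected: Optional[Dict[str, Dict[str, int]]] = None,
                   tolerance: int = 3) -> Optional[str]:
    """Return the single species whose expected sizes all match, else None.

    `observed` maps marker name -> measured product size (bp); `tolerance` is the
    sizing error allowed per marker (an assumed value).
    """
    expected = EXPECTED_SIZES if expected is None else expected
    matches = [
        species
        for species, profile in expected.items()
        if all(abs(observed[marker] - size) <= tolerance for marker, size in profile.items())
    ]
    return matches[0] if len(matches) == 1 else None
```

In this made-up example, marker_1 alone cannot separate Z. piperitum from Z. bungeanum and marker_2 alone cannot separate Z. schinifolium from Z. bungeanum, but together they give three distinct profiles; a sample that fits no profile, or more than one, is left unassigned. The real allele patterns for IMZanTR-1 and IMZanTR-3 should be taken from Table 4.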


Zanthoxylum species in East Asia

Comparative analysis of metabolites has revealed different profiles in Z. piperitum and Z. schinifolium. Whereas the Z. piperitum pericarp produces oleic acid as its major fatty acid, the Z. schinifolium pericarp produces linolenic acid instead.22 Among terpenoids, isopulegol is produced at the highest levels in Z. piperitum pericarp, followed by myrcene, whereas myrcene is produced at the highest levels in Z. schinifolium pericarp, followed by citronellal.22 Since isopulegol can be synthesized from citronellal by squalene-hopene cyclase,50 perhaps Z. piperitum contains high levels of oxidosqualene cyclase for this conversion. However, Z. piperitum fruit from Japan contains high levels of limonene as its major terpene product and produces very little isopulegol,51 and Z. schinifolium from China produces linalool as a major product.52 The metabolite profiles of plants can vary with the environment, developmental stage, post-harvest storage conditions, metabolite extraction method and so on. To identify differences in metabolite profiles among Zanthoxylum species, it is therefore best to analyze all samples using the same method. Unlike metabolite data, genetic information is highly reproducible and digital.

We constructed a maximum likelihood (ML) tree for several Rutaceae species using whole chloroplast genome sequences, with a clade of Anacardiaceae species as an outgroup (Figure 6). In this phylogenetic tree, Z. piperitum and Z. bungeanum are closer to each other than either is to Z. schinifolium. The ndhG sequence data support this result: there are four amino acid differences between Z. piperitum and Z. schinifolium but none between Z. piperitum and Z. bungeanum. The Bayesian (B) tree has the same topology as the ML tree (Figure S6). Interestingly, Korean Z. piperitum is genetically closer to Chinese Z. bungeanum than to Korean Z. schinifolium. Perhaps Z. piperitum and Z. bungeanum diverged after their common ancestor split from Z. schinifolium; this hypothesis is well supported by both the bootstrap values of the ML tree and the posterior probability values of the B tree.

Z. piperitum is preferred over Z. schinifolium as a spice in Korea and Japan, and Z. bungeanum is widely used as a spice in China. There are additional Zanthoxylum species in East Asia, some of which also produce culinary seeds. Sequencing data from more of these species, such as Z. armatum and Z. simulans, will shed light on the evolution of the Zanthoxylum species used as spices in this region.

334

The sizes of the PCR products obtained using the newly developed markers were more similar between the KZP and CZ samples than between either of them and the KZS samples (Figure 4, Table 3). However, before these markers are used to discriminate species other than Z. piperitum, Z. schinifolium and Z. bungeanum, they must be validated more broadly across other Zanthoxylum species. Additional sequencing data from other Zanthoxylum species will also facilitate the development of a system for discriminating all of these species in the marketplace.
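The link between tandem-repeat copy number and product size is simple arithmetic: each additional repeat copy lengthens the amplicon by one repeat unit. A minimal sketch, assuming two alleles differ only in copy number (markers typed "TR & InDel" in Table 4 also carry other length changes, so this model does not cover them). For the trnS(GGA)-rps4 marker, Table 4 lists 210 bp (KZP) and 248 bp (KZS), and Table 3 lists a 38 bp repeat unit for this region; the 38 bp difference is consistent with one extra copy in KZS. The function names are illustrative.

```python
def expected_size(ref_size: int, unit_len: int, ref_copies: float, copies: float) -> int:
    """Predict amplicon length (bp) for an allele that differs from the
    reference allele only in tandem-repeat copy number."""
    delta = (copies - ref_copies) * unit_len
    if delta != int(delta):
        raise ValueError("copy-number change does not give a whole number of bases")
    return ref_size + int(delta)


def copy_difference(size_a: int, size_b: int, unit_len: int) -> float:
    """Number of repeat units separating two amplicon sizes."""
    return (size_b - size_a) / unit_len


# trnS(GGA)-rps4 marker: 210 bp in KZP, 38 bp repeat unit.
# Copy numbers 1 (KZP) and 2 (KZS) are an assumption for illustration.
kzs_pred = expected_size(210, 38, ref_copies=1, copies=2)
print(kzs_pred)                          # 248
print(copy_difference(210, 248, 38))     # 1.0
```

The whole-base check matters because Table 3 includes half copies, which only translate into whole bases for even repeat lengths.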


Abbreviations Used

LSC, large single copy; SSC, small single copy; IR, inverted repeat; TR, tandem repeat; InDel, insertion or deletion; KZP, Korean Zanthoxylum piperitum; KZS, Korean Zanthoxylum schinifolium; CZ, Chinese Zanthoxylum species

Acknowledgments

We thank all members of the Laboratory of Functional Crop Genomics and Biotechnology, Seoul National University and Phyzen Genomics Institute for their technical assistance.

Funding Sources

This research was supported by the Next-Generation BioGreen21 Program for Agriculture and Technology Development (Project No. PJ01103001) of the Rural Development Administration and the Bio and Medical Technology Development Program of the NRF funded by the Korean government, MSIP (NRF-2015M3A9A5030733), Republic of Korea.

Literature cited

1. Sharkey, T. D., Photosynthesis in intact leaves of C3 plants: physics, physiology and rate limitations. The Botanical Review 1985, 51, 53-105.
2. Bedbrook, J. R.; Coen, D. M.; Beaton, A. R.; Bogorad, L.; Rich, A., Location of the single gene for the large subunit of ribulosebisphosphate carboxylase on the maize chloroplast chromosome. J Biol Chem 1979, 254, 905-10.
3. Hagemann, R., The foundation of extranuclear inheritance: plastid and mitochondrial genetics. Molecular Genetics and Genomics 2010, 283, 199-209.
4. CBOL Plant Working Group, A DNA barcode for land plants. Proceedings of the National Academy of Sciences of the United States of America 2009, 106, 12794-7.
5. Palmer, J. D., Contrasting modes and tempos of genome evolution in land plant organelles. Trends in Genetics 1990, 6, 115-120.
6. Palmer, J. D., Cell organelles. Plant Gene Research 1992, 99-122.
7. Taberlet, P.; Gielly, L.; Pautou, G.; Bouvet, J., Universal primers for amplification of three non-coding regions of chloroplast DNA. Plant Molecular Biology 1991, 17, 1105-1109.
8. Bortiri, E.; Oh, S.-H.; Jiang, J.; Baggett, S.; Granger, A.; Weeks, C.; Buckingham, M.; Potter, D.; Parfitt, D. E., Phylogeny and systematics of Prunus (Rosaceae) as determined by sequence analysis of ITS and the chloroplast trnL-trnF spacer DNA. Systematic Botany 2001, 26, 797-807.
9. Samuel, R.; Stuessy, T. F.; Tremetsberger, K.; Baeza, C. M.; Siljak-Yakovlev, S., Phylogenetic relationships among species of Hypochaeris (Asteraceae, Cichorieae) based on ITS, plastid trnL intron, trnL-F spacer, and matK sequences. American Journal of Botany 2003, 90, 496-507.
10. Lee, B.-R.; Kim, S.-H.; Huh, M.-K., Phylogenic Study of Genus Asarum (Aristolochiaceae) in Korea by trnL-trnT Region. Journal of Life Science 2010, 20, 1697-1703.
11. Huh, M.-K.; Yoon, H.-J.; Choi, J.-S., Phylogenic Study of Genus Citrus and Two Relative Genera in Korea by trnL-trnF Sequence. Journal of Life Science 2011, 21, 1452-1459.

12. Yang, J. Y.; Jang, S. Y.; Kim, H.-K.; Park, S. J., Development of a molecular marker to discriminate Korean Rubus species medicinal plants based on the nuclear ribosomal DNA internal transcribed spacer and chloroplast trnL-F intergenic region sequences. Journal of the Korean Society for Applied Biological Chemistry 2012, 55, 281-289.
13. Kim, J. H.; Jung, J.-Y.; Choi, H.-I.; Kim, N.-H.; Park, J. Y.; Lee, Y.; Yang, T.-J., Diversity and evolution of major Panax species revealed by scanning the entire chloroplast intergenic spacer sequences. Genetic Resources and Crop Evolution 2013, 60, 413-425.
14. Kim, K.; Lee, S.-C.; Lee, J.; Lee, H. O.; Joh, H. J.; Kim, N.-H.; Park, H.-S.; Yang, T.-J., Comprehensive Survey of Genetic Diversity in Chloroplast Genomes and 45S nrDNAs within Panax ginseng Species. PLoS One 2015, 10, e0117159.
15. Jansen, R. K.; Kaittanis, C.; Saski, C.; Lee, S. B.; Tomkins, J.; Alverson, A. J.; Daniell, H., Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids. BMC Evolutionary Biology 2006, 6, 32.
16. Terakami, S.; Matsumura, Y.; Kurita, K.; Kanamori, H.; Katayose, Y.; Yamamoto, T.; Katayama, H., Complete sequence of the chloroplast genome from pear (Pyrus pyrifolia): genome structure and comparative analysis. Tree Genetics & Genomes 2012, 8, 841-854.
17. Ku, C.; Chung, W.-C.; Chen, L.-L.; Kuo, C.-H., The complete plastid genome sequence of Madagascar periwinkle Catharanthus roseus (L.) G. Don: plastid genome evolution, molecular marker identification, and phylogenetic implications in asterids. PLoS One 2013, 8, e68518.
18. Su, H. J.; Hogenhout, S. A.; Al-Sadi, A. M.; Kuo, C. H., Complete chloroplast genome sequence of Omani lime (Citrus aurantiifolia) and comparative analysis within the rosids. PLoS One 2014, 9, e113049.
19. Cho, K. S.; Yun, B. K.; Yoon, Y. H.; Hong, S. Y.; Mekapogu, M.; Kim, K. H.; Yang, T. J., Complete Chloroplast Genome Sequence of Tartary Buckwheat (Fagopyrum tataricum) and Comparative Analysis with Common Buckwheat (F. esculentum). PLoS One 2015, 10, e0125332.
20. Tong, W.; He, Q.; Wang, X. Q.; Yoon, M. Y.; Ra, W. H.; Li, F.; Yu, J.; Oo, W. H.; Min, S. K.; Choi, B. W., A chloroplast variation map generated using whole genome re-sequencing of Korean landrace rice reveals phylogenetic relationships among Oryza sativa subspecies. Biological Journal of the Linnean Society 2015, 115, 940-952.

21. Arun, K.; Paridhavi, M., An ethno botanical phytochemical and pharmacological utilization of widely distributed species Zanthoxylum: a comprehensive overview. Int. J. Pharm. Invent. 2012, 2, 24-35.
22. Ko, Y.-S.; Han, H.-J., Chemical Constituents of Korean Chopi (Zanthoxylum piperitum) and Sancho (Zanthoxylum schinifolium). Korean J. Food Sci. Technol. 1996, 28, 19-27.
23. Kim, J.; Jeong, C.-H.; Bae, Y.-I.; Shim, K.-H., Chemical components of Zanthoxylum schinifolium and Zanthoxylum piperitum leaves. Korean J. Postharvest Sci. Technol. 2000, 7, 189-194.
24. Adesina, S., The Nigerian Zanthoxylum; chemical and biological values. African Journal of Traditional, Complementary, and Alternative Medicines 2005, 2, 282-301.
25. Yang, X., Aroma constituents and alkylamides of red and green huajiao (Zanthoxylum bungeanum and Zanthoxylum schinifolium). Journal of Agricultural and Food Chemistry 2008, 56, 1689-1696.
26. Gupta, D. D.; Mandi, S. S., Species Specific AFLP Markers for authentication of Zanthoxylum acanthopodium & Zanthoxylum oxyphyllum. J Med Plants 2013, 1, 1-9.
27. Paik, S.-Y.; Koh, K.-H.; Beak, S.-M.; Paek, S.-H.; Kim, J.-A., The essential oils from Zanthoxylum schinifolium pericarp induce apoptosis of HepG2 human hepatoma cells through increased production of reactive oxygen species. Biological and Pharmaceutical Bulletin 2005, 28, 802-807.
28. Yamazaki, E.; Inagaki, M.; Kurita, O.; Inoue, T., Antioxidant activity of Japanese pepper (Zanthoxylum piperitum DC.) fruit. Food Chemistry 2007, 100, 171-177.
29. Xiang, L.; Liu, Y.; Xie, C.; Li, X.; Yu, Y.; Ye, M.; Chen, S., The Chemical and Genetic Characteristics of Szechuan Pepper (Zanthoxylum bungeanum and Z. armatum) Cultivars and Their Suitable Habitat. Front Plant Sci 2016, 7, 467.
30. Cho, S.-H.; Kwon, E.-H.; Oh, S.-H.; Woo, M.-H., Suppressive Effects of the Extract of Zanthoxylum schinifolium and Essential Oil from Zanthoxylum piperitum on Pacific Saury, Cololabis saira Kwamegi. Journal of the Korean Society of Food Science and Nutrition 2009, 38, 1753-1759.
31. Sun, Y.-L.; Park, W.-G.; Kwon, O.-W.; Hong, S.-K., The internal transcribed spacer rDNA specific markers for identification of Zanthoxylum piperitum. African Journal of Biotechnology 2013, 9, 6027-6039.

32. Kim, Y.-J.; Zhang, D.; Yang, D.-C., Biosynthesis and biotechnological production of ginsenosides. Biotechnology Advances 2015, 33, 717-735.
33. Kim, W. J.; Ji, Y.; Lee, Y. M.; Kang, Y. M.; Choi, G.; Moon, B. C., Development of Molecular Markers for the authentication of Zanthoxyli Pericarpium by the analysis of rDNA-ITS DNA barcode regions. The Korea Journal of Herbology 2015, 30, 41-47.
34. Allen, G.; Flores-Vergara, M.; Krasynanski, S.; Kumar, S.; Thompson, W., A modified protocol for rapid DNA isolation from plant tissues using cetyltrimethylammonium bromide. Nature Protocols 2006, 1, 2320-2325.
35. Wyman, S. K.; Jansen, R. K.; Boore, J. L., Automatic annotation of organellar genomes with DOGMA. Bioinformatics 2004, 20, 3252-3255.
36. Lohse, M.; Drechsel, O.; Bock, R., OrganellarGenomeDRAW (OGDRAW): a tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Current Genetics 2007, 52, 267-274.
37. Kim, K.; Lee, S. C.; Lee, J.; Yu, Y.; Yang, K.; Choi, B. S.; Koh, H. J.; Waminal, N. E.; Choi, H. I.; Kim, N. H.; Jang, W.; Park, H. S.; Lee, J.; Lee, H. O.; Joh, H. J.; Lee, H. J.; Park, J. Y.; Perumal, S.; Jayakodi, M.; Lee, Y. S.; Kim, B.; Copetti, D.; Kim, S.; Kim, S.; Lim, K. B.; Kim, Y. D.; Lee, J.; Cho, K. S.; Park, B. S.; Wing, R. A.; Yang, T. J., Complete chloroplast and ribosomal sequences for 30 accessions elucidate evolution of Oryza AA genome species. Sci Rep 2015, 5, 15655.
38. Katoh, K.; Toh, H., Recent developments in the MAFFT multiple sequence alignment program. Briefings in Bioinformatics 2008, 9, 286-298.
39. Frazer, K. A.; Pachter, L.; Poliakov, A.; Rubin, E. M.; Dubchak, I., VISTA: computational tools for comparative genomics. Nucleic Acids Res 2004, 32, W273-9.
40. Schattner, P.; Brooks, A. N.; Lowe, T. M., The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res 2005, 33, W686-9.
41. Benson, G., Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research 1999, 27, 573.
42. Suyama, M.; Torrents, D.; Bork, P., PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Research 2006, 34, W609-W612.

43. Koressaar, T.; Remm, M., Enhancements and modifications of primer design program Primer3. Bioinformatics 2007, 23, 1289-91.
44. Untergasser, A.; Cutcutache, I.; Koressaar, T.; Ye, J.; Faircloth, B. C.; Remm, M.; Rozen, S. G., Primer3--new capabilities and interfaces. Nucleic Acids Res 2012, 40, e115.
45. Tamura, K.; Peterson, D.; Peterson, N.; Stecher, G.; Nei, M.; Kumar, S., MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 2011, 28, 2731-9.
46. Drummond, A. J.; Suchard, M. A.; Xie, D.; Rambaut, A., Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol 2012, 29, 1969-73.
47. Casano, L. M.; Zapata, J. M.; Martin, M.; Sabater, B., Chlororespiration and poising of cyclic electron transport. Plastoquinone as electron transporter between thylakoid NADH dehydrogenase and peroxidase. J Biol Chem 2000, 275, 942-8.
48. Martin, M.; Casano, L. M.; Sabater, B., Identification of the product of ndhA gene as a thylakoid protein synthesized in response to photooxidative treatment. Plant Cell Physiol 1996, 37, 293-8.
49. Battchikova, N.; Eisenhut, M.; Aro, E. M., Cyanobacterial NDH-1 complexes: novel insights and remaining puzzles. Biochim Biophys Acta 2011, 1807, 935-44.
50. Hammer, S. C.; Marjanovic, A.; Dominicus, J. M.; Nestl, B. M.; Hauer, B., Squalene hopene cyclases are protonases for stereoselective Brønsted acid catalysis. Nat Chem Biol 2015, 11, 121-6.
51. Jiang, L.; Kubota, K., Differences in the volatile components and their odor characteristics of green and ripe fruits and dried pericarp of Japanese pepper (Xanthoxylum piperitum DC.). J Agric Food Chem 2004, 52, 4197-203.
52. Yang, X., Aroma constituents and alkylamides of red and green huajiao (Zanthoxylum bungeanum and Zanthoxylum schinifolium). J Agric Food Chem 2008, 56, 1689-96.
53. Lee, J.; Lee, H. J.; Kim, K.; Lee, S. C.; Sung, S. H.; Yang, T. J., The complete chloroplast genome sequence of Zanthoxylum piperitum. Mitochondrial DNA A DNA Mapp Seq Anal 2016, 27, 3525-6.


Figure captions

Figure 1. Zanthoxylum species used in this study, and areas of origin of leaf samples or seed products collected from various markets. (A, B) Morphological characteristics of Z. piperitum and Z. schinifolium, respectively, used for genome and RNA sequencing. Opposite or alternately arranged spikes are indicated by orange circles. (C) Collection areas of seed products utilized for food or oriental medicine production. Collection areas for Korean Z. piperitum (KZP), Korean Z. schinifolium (KZS) and Chinese Zanthoxylum species (CZ) are indicated by circles on the map.

Figure 2. Circular gene maps of the chloroplast genomes of the two Zanthoxylum species. Genes shown inside the circle are transcribed clockwise, and those outside the circle are transcribed counterclockwise. The genes are colored according to their functions, as shown in the legend. Polymorphic sites between the two Zanthoxylum species arising from copy number variation of tandem repeat sequences are marked with blue arrowheads at 14 locations, and the two InDel sites are marked with red arrows; sites used for marker development are indicated by “*”.

Figure 3. Comparison of the borders of SSC and IR regions between the chloroplast genomes of the two Zanthoxylum species. Compared to the Z. piperitum sequences, the IR regions in Z. schinifolium are shorter, and part of ycf1 in the SSC region (indicated with a triangle) is deleted.

Figure 4. Schematic diagram of TRs and insertions in five TR markers (A–E), and confirmation of these markers for the discrimination of Zanthoxylum species (F–G). PCR products from two individuals each of CZ, KZP and KZS are of the predicted sizes using TR-type markers. The primer sequences and expected sizes are shown in Table 4, and PCR results from all samples (11 CZ, 8 KZP and 4 KZS) are shown in Figure S5.

Figure 5. Comparison of the metabolite profiles of Zanthoxylum seed samples. Principal component analysis (PCA) of pericarp metabolites based on near-infrared reflectance spectroscopy data separated the samples into three groups: Chinese Zanthoxylum species (CZ), Korean Z. piperitum (KZP) and Korean Z. schinifolium (KZS).

Figure 6. Phylogenetic tree including sequences from Z. piperitum, Z. schinifolium and Z. bungeanum. The maximum likelihood tree was generated using whole chloroplast genome sequences with 1000 bootstrap replicates. Bootstrap values are shown in percentages, and posterior probability values from the Bayesian tree (adapted from Figure S6) are in parentheses. A clade of Anacardiaceae species was used as an outgroup.
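Figure 5 summarizes near-infrared reflectance spectra by principal component analysis. A minimal sketch of PCA via singular value decomposition is given below, assuming each row is one sample's spectrum and each column one wavelength. The synthetic three-group data and the mean-centering-only preprocessing are illustrative assumptions, not the spectral pretreatment used in this study.

```python
import numpy as np


def pca_scores(spectra: np.ndarray, n_components: int = 2):
    """Project samples onto the leading principal components.

    spectra: array of shape (n_samples, n_wavelengths).
    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    centered = spectra - spectra.mean(axis=0)        # mean-center each wavelength
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates on the PCs
    var = s**2 / (spectra.shape[0] - 1)              # variance captured by each PC
    ratio = var[:n_components] / var.sum()
    return scores, ratio


# Hypothetical data: three groups of 5 "spectra" over 50 wavelengths,
# differing by a group-specific baseline slope plus small noise.
rng = np.random.default_rng(0)
base = rng.normal(size=50)
groups = []
for shift in (0.0, 3.0, 6.0):
    pattern = np.linspace(0, shift, 50)
    groups.append(base + pattern + rng.normal(scale=0.1, size=(5, 50)))
X = np.vstack(groups)

scores, ratio = pca_scores(X)
print(scores.shape, ratio.round(3))
```

Plotting the first two columns of `scores` gives a PCA score plot of the kind shown in Figure 5; the sign of each component is arbitrary.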


Table 1. List of Zanthoxylum species used in the study

Species                            Sample    Geographical origin
Chinese Zanthoxylum species (CZ)   CZ-01     Szechuan province
                                   CZ-02     Szechuan province
                                   CZ-03     Szechuan province
                                   CZ-04     Shantung province
                                   CZ-05     Shantung province
                                   CZ-06     Shanxi province
                                   CZ-07     Shanxi province
                                   CZ-08     Shanxi province
                                   CZ-09     Shanxi province
                                   CZ-10     Shanxi province
                                   CZ-11     Shanxi province
Korean Z. piperitum (KZP)          KZP-01    Geochang, Gyeongsangnam-do
                                   KZP-02    Mungyeong, Gyeongsangbuk-do
                                   KZP-03    Gurye, Jeollanam-do
                                   KZP-04    Gurye, Jeollanam-do
                                   KZP-05    Gurye, Jeollanam-do
                                   KZP-06    Gimje, Jeollabuk-do
                                   KZP-07    Imsil, Jeollabuk-do
                                   KZP-08*   Geoje, Gyeongsangnam-do
Korean Z. schinifolium (KZS)       KZS-01    Hongcheon, Gangwon-do
                                   KZS-02    Yangpyeong, Gyeonggi-do
                                   KZS-03*   Chuncheon, Gangwon-do
                                   KZS-04    Yongin, Gyeonggi-do

* Plant materials used for complete chloroplast genome sequencing


Table 2. Summary of whole genome sequencing and chloroplast genome assembly in Zanthoxylum species

                           Raw data   GenBank    Cp coverage   Length (bp)                             GC
Species (Sample No.)       (Gbp)      No.        (x)           LSC      SSC      IR       Total        content
Z. piperitum* (KZP-08)     3.26       KT153018   164.87        85,340   17,526   27,644   158,154      38.5%
Z. schinifolium (KZS-03)   4.23       KT321318   1069.04       86,528   18,265   27,085   158,963      38.4%
Sequence identity (%)                                          96.4     89.9     97.1     97.1

* cited from Lee et al.53
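The lengths in Table 2 follow the usual quadripartite layout of a chloroplast genome, in which the inverted repeat occurs twice (IRa and IRb), so the total length is LSC + SSC + 2 × IR. The short check below uses only the values reported in Table 2; the dictionary layout and function name are illustrative.

```python
# Region lengths (bp) as reported in Table 2.
assemblies = {
    "Z. piperitum (KZP-08)": {"LSC": 85_340, "SSC": 17_526, "IR": 27_644, "total": 158_154},
    "Z. schinifolium (KZS-03)": {"LSC": 86_528, "SSC": 18_265, "IR": 27_085, "total": 158_963},
}


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length when the inverted repeat is present in two copies."""
    return lsc + ssc + 2 * ir


for name, r in assemblies.items():
    computed = quadripartite_length(r["LSC"], r["SSC"], r["IR"])
    print(f"{name}: {computed:,} bp (table: {r['total']:,} bp)")
```

Both assemblies add up exactly. The same numbers show that each IR copy of Z. schinifolium is 559 bp shorter than that of Z. piperitum, consistent with the shorter IR regions shown in Figure 3.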


Table 3. Intergenic regions containing tandem repeats with copy number variation between Zanthoxylum piperitum and Z. schinifolium No. 1 2 3 4 5 6

Position trnH(GUG) - psbA psbK - psbI trnS(GCU) trnG(UCC) trnR(UCU) - atpA atpH - atpI petN - psbM

Sequence (length) TAATTTTCTTAGTAGTATTC (20 bp) AGAGCCAACCACAATGT (17 bp) GTTACATTGTTACATTACACA (21 bp)

TTATATATTTATATT (15 bp) AAAGAAAATATTAAG (15 bp) AGTAATTTCATTATA (15 bp) TTTAATTCAGTAATTCAATT (20 bp) CCATTTAGAATTTTTCAGTAATTTAATT (28 bp) 7* psbM - trnD(GUC) AATACTAAAATACTAATA (18 bp) CTTTTTTTTATTTATCATT (17 bp) 8* psbZ - trnG(GCC) AAATAAATATTAATATAATAATT (23 bp) TTATTAATAGAAATATATATTATTTATA (28 bp) 9* trnS(GGA) - rps4 GGTGAAAGGGGAAATTTGTACGAGCCCGTTATTTTAGT (38 bp) 10 trnT(UGU) - trnL(UAA) TCTTAATCTATTCTA (15 bp) 11 ndhC - trnV(UAC) TAGTTTCGTTTGTTTGTTGT (20 bp) TTTTGATTCTATTCTATA (18 bp) 12* rpl33 - rps18 TTATTTCATATATTTAAATAGAAACAA (27 bp) 13 rpl16 - rps3 TTTAGAGATAATCTCAA (17 bp) 14 rrn4.5 - rrn5 ATTGTTCAACTCTTTGACAACATGAAAAAACC (32 bp) * Regions where PCR markers were developed for validation

Copy Number KZP KZS 2 1 1 2 1.5

2

1 2 0.5 2 1 1.5 1 1 1 1 1 1 1 1 1 1

2 1 2 1 2 2 2 2 3 2 2 2 2 3 2 2


Table 4. Newly developed markers for validation of the polymorphic sites among Zanthoxylum species

                                     Primer sequence (5´-3´)          Product size (bp)
Marker name    Region                Forward (F) / Reverse (R)        CZ*   KZP   KZS    Specific to   Type
IMZanTR-1      petN - psbM           F: AATTGAGTTGGGAAATCAAACTGTA     196   214   290    CZ/KZP/KZS    TR & InDel
                                     R: CTCGCTAGAATCCAAGACAATAGAA
IMZanTR-2      psbM - trnD(GUC)      F: GATCTTTTATCCACACACCGAATAC     131   133   163    KZS           TR
                                     R: GAAAAGACAGAATGGAAAAGAATGA
IMZanTR-3      psbZ - trnG(GCC)      F: GGGATCAAACTTCTGGAACTTGA       389   405   520    CZ/KZP/KZS    TR & InDel
                                     R: TTATCCCGAGTTAGGCCAGATAC
IMZanTR-4      trnS(GGA) - rps4      F: CAATTCCCAGTTTCTGTGATACG       210   210   248    KZS           TR
                                     R: CTCGTCAGACTTAAACCTAACTAAAAT
IMZanTR-5      rpl33 - rps18         F: GTGCTTGTGTGTCACCCTTG          132   132   258    KZS           TR & InDel
                                     R: GAGTCGCTTGGTTTTATCCAT
IMZanInDel-1   rps16 - trnQ(UUG)     F: AGTGGTAAGGCAACGGGTTT          594   120   593    KZP           InDel
                                     R: GATACAAAGACAAAAAGTCCCACA
IMZanInDel-2   ycf1                  F: CAAAATCGAGGAAACGGAAGAGA       943   943   361    KZS           InDel
                                     R: TTGATGGAATTACGAATGGGGTC

* Predicted CZ product size based on Zanthoxylum bungeanum (GenBank No. KX497031)
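The "Specific to" column of Table 4 can be read as a lookup: a sample is assigned to the groups whose expected product sizes match all of its observed bands. A minimal sketch, assuming band sizes have already been estimated (for example from a gel) and using a hypothetical ±5 bp sizing tolerance that does not come from this study. Only four of the Table 4 markers are used, the CZ sizes are the predictions based on Z. bungeanum, and `call_group` is an illustrative name.

```python
# Expected product sizes (bp) from Table 4; CZ values are predicted
# from Z. bungeanum (GenBank KX497031).
EXPECTED = {
    "IMZanTR-4":    {"CZ": 210, "KZP": 210, "KZS": 248},
    "IMZanTR-5":    {"CZ": 132, "KZP": 132, "KZS": 258},
    "IMZanInDel-1": {"CZ": 594, "KZP": 120, "KZS": 593},
    "IMZanInDel-2": {"CZ": 943, "KZP": 943, "KZS": 361},
}


def call_group(observed: dict[str, int], tolerance: int = 5) -> list[str]:
    """Return the groups whose expected sizes match every observed band.

    tolerance is a hypothetical sizing error in bp, not a value from the study.
    """
    groups = {"CZ", "KZP", "KZS"}
    for marker, size in observed.items():
        matching = {g for g, exp in EXPECTED[marker].items() if abs(exp - size) <= tolerance}
        groups &= matching  # keep only groups consistent with all markers so far
    return sorted(groups)


print(call_group({"IMZanTR-4": 211, "IMZanInDel-1": 121}))  # ['KZP']
print(call_group({"IMZanTR-4": 209, "IMZanInDel-1": 595}))  # ['CZ']
print(call_group({"IMZanInDel-1": 594}))                    # ['CZ', 'KZS']
```

The last call shows why markers are combined: IMZanInDel-1 alone cannot separate CZ from KZS, while IMZanTR-4 resolves the ambiguity.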
