Subscriber access provided by CORNELL UNIVERSITY LIBRARY
Article
Authentication markers for five major Panax species developed via comparative analysis of complete chloroplast genome sequences Van Binh Nguyen, Hyun-Seung Park, Sang-Choon Lee, Junki Lee, Jee Young Park, and Tae-Jin Yang J. Agric. Food Chem., Just Accepted Manuscript • Publication Date (Web): 22 May 2017 Downloaded from http://pubs.acs.org on May 23, 2017
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
Journal of Agricultural and Food Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 30
Journal of Agricultural and Food Chemistry
Authentication markers for five major Panax species developed via comparative analysis of complete chloroplast genome sequences
Van Binh Nguyena, Hyun-Seung Parka, Sang-Choon Leea, Junki Leea, Jee Young Parka, TaeJin Yanga,b* a
Department of Plant Science, Plant Genomics and Breeding Institute, and Research Institute
of Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul, 151-921, Republic of Korea. b
Crop Biotechnology Institute/GreenBio Science and Technology, Seoul National University,
Pyeongchang 232-916, Republic of Korea. *Corresponding author: Tae-Jin Yang E-mail:
[email protected] Tel: +82-2-880-4547, Fax: +82-2-877-4550
1 ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
1
Abstract
2
Ginseng represents a set of high-value medicinal plants of different species: Panax
3
ginseng (Asian ginseng), P. quinquefolius (American ginseng), P. notoginseng (Chinese
4
ginseng), P. japonicus (Bamboo ginseng), and P. vietnamensis (Vietnamese ginseng). Each
5
species is pharmacologically and economically important, with differences in efficacy and
6
price. Accordingly, an authentication system is needed to combat economically motivated
7
adulteration of Panax products. We conducted comparative analysis of the chloroplast
8
genome sequences of these five species, identifying 34–124 InDels and 141–560 SNPs.
9
Fourteen InDel markers were developed to authenticate the Panax species. Among these,
10
eight were species-unique markers that successfully differentiated one species from the others.
11
We generated at least one species-unique marker for each of the five species, and any of the
12
species can be authenticated by selection among these markers. The markers are reliable,
13
easily detectable, and valuable for applications in the ginseng industry as well as in related
14
research.
15
Keywords: Panax species, Chloroplast genome, Molecular markers, Ginseng authentication
2 ACS Paragon Plus Environment
Page 2 of 30
Page 3 of 30
16
Journal of Agricultural and Food Chemistry
Introduction
17
The Panax genus belongs to the Araliaceae family and contains many important
18
medicinal species, collectively called ‘ginseng’. Of the 14 known species in the Panax genus,
19
five species, Panax ginseng (Asian ginseng), P. quinquefolius (American ginseng), P.
20
notoginseng (Sanchi ginseng; Chinese ginseng), P. japonicus (Bamboo ginseng), and P.
21
vietnamensis (Vietnamese ginseng), are broadly utilized in Korea, the USA, China, Japan,
22
and Vietnam. Each species is well-known as a traditional medicinal plant in oriental countries,
23
and species such as P. ginseng, P. quinquefolius, and P. notoginseng contain
24
protopanaxadiol-type and protopanaxatriol-type saponins1, while other species like P.
25
japonicus and P. vietnamensis contain high quantities of oleanolic acid-type and ocotillol-
26
type saponins, respectively1,2.
27
The high pharmacological and economical value of ginseng means that many
28
economically motivated adulterations (EMAs) of ginseng products have been developed by
29
substituting morphologically similar plant roots, or by mixing different species. Traditionally,
30
the authentication of herb plants was based on morphological and histological inspection.
31
However, these traditional methods are unable to authenticate some Panax species because of
32
their very similar morphological appearances, especially in terms of root shape. For example,
33
P. ginseng and P. quinquefolius, and P. japonicus and P. vietnamensis cannot easily be
34
distinguished from each other. Moreover, many commercial ginseng products are sold in a
35
processed form, such as red ginseng, ginseng powder, liquid extracts, pellets, shredded slices,
36
or even tea, which cannot be authenticated by traditional methods. Ginsenoside profiling
37
methods have been developed to authenticate ginseng samples3. However, the effects of
38
factors such as growth conditions, developmental stage, internal metabolism, storage
39
conditions, and manufacturing processes on secondary metabolite accumulation in ginseng
40
limits the application of such chemical analyses4. Chemical methods are also expensive and
3 ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
41
difficult to utilize in high-throughput analysis. Therefore, reliable and practical methods to
42
authenticate ginseng are in high demand.
43
The chloroplast is a plant-specific organelle containing the entire machinery required for
44
photosynthesis and carbon fixation. Chloroplast genomes are generally highly conserved
45
across land plants at the gene level, with a quadripartite structure comprising two inverted
46
repeat blocks (IRs), one large single copy (LSC) region, and one small single copy (SSC)
47
region. As a result of interspecies sequence divergence and intraspecies sequence
48
conservation, the chloroplast genome is valuable for taxonomic classification and phylogeny
49
reconstruction5,6. Lack of recombination, low nucleotide substitution rates, and usually
50
uniparental inheritance make chloroplast genomes valuable sources of genetic markers for
51
phylogenetic analysis and species identification7,8. Chloroplast sequences such as atpF-atpH,
52
matK, psbK-psbI, rbcL, ropC1, rpoB, and trnH-psbA are commonly used as DNA barcodes
53
for plants9-11. In some cases, these sequences were highly efficient for species identification
54
and phylogenetic studies, but they showed low variation in closely related species10.
55
Recently, DNA markers have been developed to authenticate ginseng, including random
56
amplified polymorphic DNA (RAPD)12, microsatellite13, and expressed sequence tag-simple
57
sequence repeat (EST-SSR)14,15 markers, as well as single nucleotide polymorphism (SNP)
58
and insertion and deletion (InDel) markers derived from chloroplast sequences6,16,17.
59
However, thus far, these markers have been used to identify only P. ginseng cultivars at the
60
intraspecies level, and just a few Panax species at the interspecies level.
61
Recently, we developed an efficient method to obtain complete chloroplast genome and
62
nuclear ribosomal DNA (nrDNA) by de novo assembly using low-coverage whole-genome
63
shotgun next-generation sequencing (dnaLCW)18. Using this method, we obtained complete
64
chloroplast genomes and nrDNA for the five Panax species: P. ginseng, P. quinquefolius, P.
65
notoginseng, P. japonicus, and P. vietnamensis6,19. In the present study, we comparatively
4 ACS Paragon Plus Environment
Page 4 of 30
Page 5 of 30
Journal of Agricultural and Food Chemistry
66
analyzed these five chloroplast genome sequences and developed credible chloroplast
67
genome-derived InDel markers to authenticate these Panax species. These markers are
68
valuable tools for the further study of genetic diversity in the Panax genus. They also may be
69
used to support the ginseng industry, which depends on a number of Panax species, for
70
example, P. ginseng in Korea, China, and Japan, P. quinquefolius in the USA, Canada, and
71
China, P. notoginseng in China, and P. vietnamensis in Vietnam.
72 73
Materials and Methods
74
Plant materials and Genomic DNA extraction
75
P. ginseng (cultivars ‘Chunpoong’ (CP) and ‘Yunpoong’ (YP)) and P. quinquefolius
76
plants were collected from the ginseng farm at Seoul National University in Suwon, Korea. P.
77
notoginseng and P. japonicus plants were collected from Dafang County, Guizhou Province,
78
and Enshi County, Hubei Province, China, respectively. P. vietnamensis plants were collected
79
from Ngoc Linh Mountain, Kon Tum Province, Vietnam. Individual leaves and roots of
80
plants from each species were harvested and stored at −70°C until use. Total genomic DNA
81
was isolated using a modified cetyltrimethylammonium bromide (CTAB) method20. The
82
quality and quantity of extracted genomic DNA was measured using a UV-spectrophotometer.
83
Comparative analysis and characterization of SSRs and large sequence repeats
84
The chloroplast genome sequences of P. ginseng cv. CP (KM088019), P. ginseng cv. YP
85
(KM088020), P. quinquefolius (KM088018), P. notoginseng (KP036468), P. japonicus
86
(KP036469), and P. vietnamensis (KP036471) were obtained from our previous studies6,19.
87
MAFFT
88
(http://genome.lbl.gov/vista/mvista/submit.shtml) programs were used to compare these
89
sequences.
(http://mafft.cbrc.jp/alignment/server/),
MEGA
5 ACS Paragon Plus Environment
621,
and
mVISTA
Journal of Agricultural and Food Chemistry
90
SSRs were predicted using MISA (http://pgrc.ipk-gatersleben.de/misa/misa.html)22 with
91
the parameters set to ≥10 repeat units for mononucleotide SSRs, ≥5 repeat units for
92
dinucleotide SSRs, ≥4 repeat units for trinucleotide SSRs, and ≥3 repeat units for
93
tetranucleotide, pentanucleotide, and hexanucleotide SSRs.
94
The program REPuter23 was used to identify the number and location of repeat sequences,
95
including tandem, dispersed, reverse, and palindromic repeats within chloroplast genomes.
96
For all repeat types, the following constraints were set to a minimum repeat size of 15 bp, and
97
90% minimum cut-off identity between two copies. Overlapping repeats were merged into
98
one repeat motif whenever possible.
99
Development and validation of InDel markers
100
To validate intraspecies and interspecies polymorphism in the chloroplast genomes, and
101
develop DNA markers to authenticate the five major Panax species, specific primers were
102
designed based on InDel polymorphic sites found in Panax chloroplast genomes, including P.
103
ginseng cv. YP (KM088020) as an intraspecies level. Primer pairs were designed using the
104
Primer3 program (http://bioinfo.ut.ee/primer3-0.4.0/).
105
The polymerase chain reaction (PCR) was performed in a 25 µl reaction mixture
106
containing 20 ng of DNA template, 5 pmol of each primer, 1.25 mM deoxynucleotide
107
triphosphate (dNTP), 1.25 units of Taq DNA polymerase (Inclone, Korea), and 2.5 µl of 10×
108
reaction buffer. The PCR reaction was performed in thermocyclers using the following
109
cycling parameters: 94°C (5 min); 35 cycles of 94°C (30 s), 54–58°C (30 s); 72°C (30 s),
110
then 72°C (7 min). PCR products were visualized on agarose gels (1.5–3.0%) after staining
111
with ethidium bromide, and/or analyzed by capillary electrophoresis using a fragment
112
analyzer (Advanced Analytical Technologies Inc., USA).
113 114
Results
6 ACS Paragon Plus Environment
Page 6 of 30
Page 7 of 30
115
Journal of Agricultural and Food Chemistry
Structure of complete chloroplast genomes of five Panax species
116
Complete chloroplast genomes of five Panax species were obtained by dnaLCW and
117
reported in our previous studies6,19. Lengths of the chloroplast genomes ranged from 155,993
118
bp (P. vietnamensis) to 156,466 bp (P. notoginseng). The order, content, and orientation of
119
genes were highly conserved among these five chloroplast genomes, comprising 79 protein-
120
coding genes, 30 tRNA genes, and 4 rRNA genes in common. Each chloroplast genome had
121
the same quadripartite structure with an LSC, an SSC, and two IR regions (Fig. 1).
122
Variations in the copy numbers of SSRs were identified among the five Panax chloroplast
123
genomes. The longest SSRs were 18 nucleotides in length, and the most abundant nucleotides
124
in SSRs were A and T (Table 1). P. vietnamenis and P. notoginseng both contained 42 SSRs;
125
lower than P. japonicus (45), and higher than P. ginseng cv. CP (38) and P. quinquefolius (40)
126
(Table 1). The P. japonicus chloroplast genome has the highest number of homopolymers
127
(24), but no hexapolymers. P. vietnamensis, P. notoginseng, and P. ginseng cv. CP all had 6
128
dipolymers; lower than P. japonicus and P. quinquefolius (7). P. notoginseng had 4
129
tripolymers; more than each of the other Panax species (3). P. vietnamensis had 10
130
tetrapolymers; more than the other Panax species (8). P. japonicus and P. notoginseng had 3
131
pentapolymers each, while the other Panax species contained 2 (Table 1).
132
For comparative analysis, repetitive sequences were grouped into four categories: tandem,
133
dispersed, palindromic, and reverse. To avoid redundancy, repeat sequence analysis of each
134
chloroplast genome was carried out with a single IR region. A total of 45–50 repetitive
135
sequences were identified in each of the five Panax chloroplast genomes (Fig. 2B), including
136
dispersed repeats (44.26%), palindromic repeats (29.79%), tandem repeats (22.13%), and
137
reverse repeats (3.83%). Most repeats were located in intergenic spaces (IGS) (53.83%) and
138
coding sequence (CDS) regions (38.94%); a small number of repeats (7.23%) were found in
139
intron regions (Fig. 2C). Across the five chloroplast genomes, repeat lengths ranged from
7 ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
140
17 bp to 67 bp (Fig. 2A). The 67-bp tandem repeat found in P. japonicus was the longest
141
repeat (Fig. 2A). Palindromic and reverse repeats ranged from 17–35 bp and 17–28 bp,
142
respectively (Fig. 2A). Comparing the length and location, 32 repeats were shared by all five
143
Panax species, and 5 repeats were present in four chloroplast genomes. P. notoginseng had
144
the most unique repeats (9), followed by P. ginseng and P. vietnamensis (6), P. japonicus (3),
145
and P. quinquefolius (1) (Fig. 2D).
146
Divergence of chloroplast genomes among Panax species
147
To investigate chloroplast genome divergence among the five Panax species, multiple
148
alignments of six chloroplast genomes were performed with a single IR region. The sequence
149
identity of six chloroplast genomes was plotted using the mVISTA program, using the Panax
150
ginseng cv. CP annotation as the reference (Fig. 3). Two cultivating varieties of ginseng, CP
151
and YP, were almost identical with 3 InDels and 1 SNP overall (Table 2). Five Panax species
152
also showed high sequence similarity with each other (98.9%) (Fig. 3), suggesting that Panax
153
chloroplast genomes are well conserved. Our results are consistent with earlier research on
154
Araliaceae chloroplast genomes8. As expected, coding regions are more conserved than non-
155
coding regions. The most highly conserved chloroplast genome coding regions are the four
156
rRNA and 30 tRNA genes (Fig. 3). The most divergent coding region is the ycf1 gene, which
157
has low sequence identity because of various InDels and repeat sequences (Fig. 3). This has
158
been reported in previous research on other chloroplast genomes8,24.
159
Although chloroplast genome diversity at the intraspecies level is low, abundant
160
polymorphism was identified between species. At the interspecies level, the number of SNPs
161
ranged between 141 (P. ginseng versus P. quinquefolius) and 560 (P. ginseng versus P.
162
vietnamensis), and the number of InDels ranged between 34 (P. ginseng versus P.
163
quinquefolius) and 124 (P. notoginseng versus P. vietnamensis) (Table 2). A/T SNP
164
substitutions were more frequent than other types, in agreement with previous studies5,25.
8 ACS Paragon Plus Environment
Page 8 of 30
Page 9 of 30
Journal of Agricultural and Food Chemistry
165
Fewer substitutions and InDels were found between P. ginseng and P. quinquefolius than
166
between the other three Panax species. The ratios of nucleotide substitution events to InDel
167
events (S/I) for different pairwise comparisons between species ranged between 4.06 (P.
168
ginseng versus P. quinquefolius) and 5.64 (P. quinquefolius versus P. japonicus) (Table 2).
169
S/I ratios are thought to increase with divergence time between genomes26. Our results
170
indicate that P. ginseng is more closely related to P. quinquefolius (S/I = 4.06), and that P.
171
quinquefolius is more highly divergent from P. japonicus (S/I = 5.64), compared to others.
172
Development of InDel markers to identify Panax species
173
Based on chloroplast genome sequence alignment, the 14 most InDel-variable loci were
174
selected to develop 14 potentially discriminate markers (Table 3; Fig. 1). Each of these 14
175
markers were successfully amplified by PCR, and each showed the expected polymorphic
176
band sizes. Eight of these 14 markers had unique amplicon sizes specific to different Panax
177
species (Table 4).
178
The marker gcpm2 was specific to both P. ginseng cultivars (CP and YP), and was
179
derived from a 33-bp tandem repeat (TR) in the rps16–trnQ-UUG region (Table 4; Fig. 4B).
180
The P. notoginseng-specific markers gcpm3, gcpm8, and gcpm10 were derived from a 25-bp
181
TR in the atpH–atpI region, a 38-bp TR in the petA–psbJ region, and a 25-bp TR in the
182
rpl14–rpl16 region, respectively (Table 4; Fig. 4C, H, K). The gcpm4 marker was derived
183
from a 23-bp TR in the rps2–rpoC2 region, and was specific to P. quinquefolius (Table 4; Fig.
184
4D). The marker gcpm6 was derived from a 67-bp TR and a 19-bp InDel in the trnE–trnT
185
region, and was specific to P. japonicus (Table 4; Fig. 4F). Finally, the markers gcpm9 and
186
gcpm14 were derived from a 25-bp TR and a 6-bp SSR in the clpP–psbB region, and a 30-bp
187
InDel in ycf1, respectively, and were specific to P. vietnamensis (Table 4; Fig. 4I, O).
188
Validation results revealed that six markers, gcpm1, 5, 7, 11, 12, and 13, were able to
189
identify more than two species, of which gcpm12 (derived from a 57-bp TR in the ycf1 gene)
9 ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
190
was the most variable (Table 4; Fig. 4M). In addition, gcpm12 also distinguished between the
191
two P. ginseng cultivars (Table 4; Fig. 4M).
192 193
Discussion
194
Repetitive sequences in Panax chloroplast genomes
195
Repeat structure plays an important role in genomic rearrangement and divergence in
196
chloroplast genomes via illegitimate recombination and slipped-strand mispairing27,28. SSRs
197
are direct tandem repeated DNA sequences consisting of short (1–10 bp) nucleotide motifs.
198
The highly polymorphic, non-recombinant and uniparentally inherited nature of SSRs means
199
they are often used as genetic molecular markers for population genetics29,30, and the study of
200
ecology and evolution. In this study, we found 207 SSRs (Table 1), varying in number and
201
type between five major Panax species.
202
It is considered that species divergence in chloroplast genomes is mostly caused by
203
repetitive sequences. Indeed, we found many repetitive sequences associated with
204
polymorphic chloroplast sites among the five major Panax species studied. Among the many
205
repeats in the ycf3 gene, the most divergent target region was a 57-bp TR. TRs such as this
206
can be used to develop molecular markers for the identification and authentication of Panax
207
species6.
208
Comparative analysis of Panax chloroplast genomes
209
Polymorphism at the intraspecies level polymorphism is very low compared to that at the
210
interspecies level. In P. ginseng, 12 chloroplast genomes derived from different cultivating
211
varieties revealed over 99.9% sequence similarity, and only 6 InDels and 6 SNPs6.
212
Meanwhile, of a total of 201 InDels and 962 SNPs identified among five Panax chloroplast
213
genome sequences, 34–124 InDels and 141–560 SNPs were shared between two Panax
10 ACS Paragon Plus Environment
Page 10 of 30
Page 11 of 30
Journal of Agricultural and Food Chemistry
214
species (Table 2). However, the chloroplast genome is highly conserved within Panax species,
215
and has high similarity (≥98.9%) at the nucleotide sequence level.
216
In chloroplast genomes, nucleotide substitution has been used to examine plant evolution
217
and genome differentiation between species7,17. These InDel events are mainly attributed to
218
the repetition of an adjacent sequence, probably resulting from slipped-strand mispairing in
219
DNA replication31. InDels play a major role in genome size evolution and are increasingly
220
used in phylogenetic studies32,33. S/I ratios have been reported to increase with divergence
221
time between genomes26. In this study, we identified many SNPs and InDels from the
222
complete chloroplast genomes of five major Panax species; S/I ratios ranged between 4.06 (P.
223
ginseng versus P. quinquefolius) and 5.64 (P. quinquefolius versus P. japonicus) (Table 2).
224
Along with their similar morphologies and distributions, this indicates that P. ginseng is
225
closely related to P. quinquefolius (S/I = 4.06), and is consistent with previous reports34,35. P.
226
quinquefolius is highly divergent from P. japonicus compared to others (S/I = 5.64).
227
Use of molecular markers to authenticate ginseng species
228
Biological diversity is a valuable and vulnerable natural resource. The first steps towards
229
protecting and benefiting from national biodiversity are to sample, identify, and study
230
biological specimens. The use of molecular markers (i.e., DNA barcoding) has a powerful
231
role to play in attaining many of the Millennium Development Goals, and reaching the
232
objectives of the Convention on Biological Diversity, by sustaining natural resources,
233
protecting endangered species, and identifying pests and pathogens at any life stage so as to
234
more easily control them. DNA barcoding can also help to monitor food, water and
235
environment quality by studying contaminating organisms.
236
Although relatively new, the use of molecular markers is well on its way to being
237
accepted as a global standard for species identification, and will play a major role in the
238
future of taxonomy36. DNA-based species identification offers enormous potential for the
11 ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
239
biological scientific community, educators, and the interested public. The complete
240
chloroplast genome has a conserved sequence from 110–160 kbp, which far exceeds the
241
length of commonly used molecular markers and provides greater variation to discriminate
242
between closely related species37. The chloroplast genome is smaller in size and hundreds
243
times of higher copy numbers in a cell compared with the nuclear genome and has solid
244
interspecific divergence with lower intraspecific variation, which makes chloroplast genome-
245
based marker is suitable as DNA barcoding target37.
246
It is important to be able to authenticate natural health products (NHPs) from legal,
247
economic, health and conservation viewpoints. NHPs are often trusted by the public to be
248
safe; however, adulterated, counterfeit and low quality products can seriously threaten
249
consumer safety38,39. Ginseng plants have high economic and medicinal value, thus there are
250
many EMA ginseng products. The use of molecular methods to accurately identify the origins
251
of ginseng products is important to the development of the ginseng industry in those countries
252
where ginseng is cultivated, such as Korea, the USA, China, Japan, and Vietnam. Recently,
253
the chloroplast genome sequences rbcL, matK, trnH-psbA, and trnL-trnF, and a nuclear
254
internal transcribed spacer (ITS) of nuclear 45S ribosomal RNA genes, have successfully
255
been used as molecular markers for several plant species9,40, but these loci have little
256
variability in Panax36. Although many studies have sought to authenticate ginseng13-15,41,
257
application remains limited because of a lack of genome information and comprehensive
258
comparative genomic analysis against related Panax species.
259
In this study, we developed chloroplast-derived, species-specific markers for each of five
260
major Panax species. Among 14 molecular markers, three markers were specific to P.
261
notoginseng, and two were specific to P. vietnamensis. P. ginseng, P. quinquefolius and P.
262
japonicus could also each be identified by species-specific markers (Table 4; Fig. 4). We
263
discovered different 57-bp tandem repeats in the ycf1 gene of different Panax chloroplast
12 ACS Paragon Plus Environment
Page 12 of 30
Page 13 of 30
Journal of Agricultural and Food Chemistry
264
genomes, and this polymorphism proved powerful in the identification of Panax species
265
(Table 4; Fig. 4M). However, the gcpm12 marker, derived from the ycf1 gene, unexpectedly
266
produced two amplified bands from P. notoginseng, of which the A allele was the expected
267
allele of the P. ginseng type; allele D was not expected, but was same as the allele of P.
268
Vietnamese, although all other species gave rise to a single band (Table 4; Fig. 4A, M). We
269
assume that these unexpected bands are derived from heteroplasmy of the target sequences
270
caused by different IRA and IRB regions related to ycf1 gene sequences; alternatively, they
271
could be caused by chloroplast DNAs transferred into the nuclear or mitochondrial
272
genomes42-45. In our previous study6, we also found intra-species polymorphism in P. ginseng
273
at this position, indicating that, although highly informative, the marker designed from this
274
region may be confusing for species authentication. Overall, we suggest that intra-species
275
polymorphism and a combination of several markers should be considered for credible
276
authentication between different species.
277 278
Abbreviation Used
279
InDel, insertion or deletion; SNP, single nucleotide polymorphism; DNA, deoxyribonucleic
280
acid; bp, base pair.
281 282
Funding
283
This work was supported by: the Cooperative Research Program for Agriculture Science &
284
Technology Development of the Rural Development Administration (grant number
285
PJ01100801); the Ministry of Food and Drug Safety (grant number 16172MFDS229,
286
awarded in 2016); and the Bio & Medical Technology Development Program of the NRF,
287
MSIP (grant number NRF-2015M3A9A5030733), Republic of Korea.
288
13 ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
289
References
290
1. Zhu, S.; Zou, K.; Fushimi, H.; Cai, S.; Komatsu, K. Comparative study on triterpene
291 292 293
saponins of Ginseng drugs. Planta Med. 2004, 70, 666-677. 2. Yamasaki, K. Bioactive saponins in Vietnamese ginseng, Panax vietnamensis. Pharm Biol. 2000, 38, 16-24.
294
3. Chan, T.; But, P.; Cheng, S.; Kwok, I.; Lau, F.; Xu, H. Differentiation and authentication
295
of Panax ginseng, Panax quinquefolius, and ginseng products by using HPLC/MS. Anal
296
Chem. 2000, 72, 1281-1287.
297 298
4. Ngan, F.; Shaw, P.; But, P.; Wang, J. Molecular authentication of Panax species. Phytochemistry. 1999, 50, 787-791.
299
5. Huang, H.; Shi, C.; Liu, Y.; Mao, S. Y.; Gao, L. Z. Thirteen Camellia chloroplast genome
300
sequences determined by high-throughput sequencing: genome structure and phylogenetic
301
relationships. BMC Evol Biol. 2014, 14, 151.
302
6. Kim, K.; Lee, S. C.; Lee, J.; Lee, H. O.; Joh, H. J.; Kim, N. H.; Park, H. S.; Yang, T. J.
303
Comprehensive survey of genetic diversity in chloroplast genomes and 45S nrDNAs
304
within Panax ginseng species. PloS one. 2015, 10.
305
7. Wolfe, K. H.; Li, W. H.; Sharp, P. M. Rates of nucleotide substitution vary greatly among
306
plant mitochondrial, chloroplast, and nuclear DNAs. Proc Natl Acad Sci. 1987, 84, 9054-
307
9058.
308 309
8. Li, R.; Ma, P. F.; Wen, J.; Yi, T. S. Complete sequencing of five Araliaceae chloroplast genomes and the phylogenetic implications. PloS one. 2013, 8, e78568.
310
9. Hollingsworth, P. M.; Forrest, L. L.; Spouge, J. L.; Hajibabaei, M.; Ratnasingham, S.; van
311
der Bank, M.; Chase, M. W.; Cowan, R. S.; Erickson, D. L.; Fazekas, A. J. A DNA
312
barcode for land plants. Proc Natl Acad Sci. 2009, 106, 12794-12797.
14 ACS Paragon Plus Environment
Page 14 of 30
Page 15 of 30
Journal of Agricultural and Food Chemistry
313
10. Dong, W.; Liu, J.; Yu, J.; Wang, L.; Zhou, S. Highly variable chloroplast markers for
314
evaluating plant phylogeny at low taxonomic levels and for DNA barcoding. PloS one.
315
2012, 7, e35071.
316
11. Elansary, H. O.; Ashfaq, M.; Ali, H. M.; Yessoufou, K. The first initiative of DNA
317
barcoding of ornamental plants from Egypt and potential applications in horticulture
318
industry. PloS one 2017, 12, e0172170.
319
12. Artyukova, E.; Kozyrenko, M.; Reunova, G.; Muzarok, T.; Zhuravlev, Y. N. RAPD
320
analysis of genome variability of planted ginseng, Panax ginseng. Mol Biol. 2000, 34,
321
297-302.
322
13. Ma, K. H.; Dixit, A.; Kim, Y. C.; Lee, D. Y.; Kim, T. S.; Cho, E. G.; Park, Y. J.
323
Development and characterization of new microsatellite markers for ginseng (Panax
324
ginseng CA Meyer). Conservation Genetics. 2007, 8, 1507-1509.
325
14. Kim, N. H.; Choi, H. I.; Ahn, I. O.; Yang, T. J. EST-SSR marker sets for practical
326
authentication of all nine registered ginseng cultivars in Korea. J Ginseng Res. 2012, 36,
327
298.
328
15. Choi, H. I.; Kim, N. H.; Kim, J. H.; Choi, B. S.; Ahn, I. O.; Lee, J. S.; Yang, T. J.
329
Development of reproducible EST-derived SSR markers and assessment of genetic
330
diversity in Panax ginseng cultivars and related species. J Ginseng Res. 2011, 35, 399.
331
16. Jung, J.; Kim, K. H.; Yang, K.; Bang, K. H.; Yang, T. J. Practical application of DNA
332
markers for high-throughput authentication of Panax ginseng and Panax quinquefolius
333
from commercial ginseng products. J Ginseng Res. 2014, 38, 123-129.
334
17. Kim, J. H.; Jung, J. Y.; Choi, H. I.; Kim, N. H.; Park, J. Y.; Lee, Y.; Yang, T. J. Diversity
335
and evolution of major Panax species revealed by scanning the entire chloroplast
336
intergenic spacer sequences. Genet Resour Crop Ev. 2013, 60, 413-425.
15 ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
337
18. Kim, K.; Lee, S. C.; Lee, J.; Yu, Y.; Yang, K.; Choi, B. S.; Koh, H. J.; Waminal, N. E.;
338
Choi, H. I.; Kim, N. H. et al. Complete chloroplast and ribosomal sequences for 30
339
accessions elucidate evolution of Oryza AA genome species. Sci Rep. 2015, 5, 15655.
340
19. Kim, K.; Nguyen, V. B.; Dong, J. Z.; Wang, Y.; Park, J. Y.; Lee, S. C.; Yang, T. J.
341
Evolution of the Araliaceae family inferred from complete chloroplast genomes and 45S
342
nrDNAs of 10 Panax-related species. Sci Rep. 2016 (In press).
343
20. Allen, G.; Flores-Vergara, M.; Krasynanski, S.; Kumar, S.; Thompson, W. A modified
344
protocol for rapid DNA isolation from plant tissues using cetyltrimethylammonium
345
bromide. Nat Protoc. 2006, 1, 2320-2325.
346 347
21. Tamura, K.; Stecher, G.; Peterson, D.; Filipski, A.; Kumar, S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013, 30, 2725-2729.
348
22. Thiel, T.; Michalek, W.; Varshney, R.; Graner, A. Exploiting EST databases for the
349
development and characterization of gene-derived SSR-markers in barley (Hordeum
350
vulgare L.). Theor Appl Genet. 2003, 106, 411-422.
351
23. Kurtz, S.; Choudhuri, J. V.; Ohlebusch, E.; Schleiermacher, C.; Stoye, J.; Giegerich, R.
352
REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic acids
353
Res. 2001, 29, 4633-4642.
354
24. Nie, X.; Lv, S.; Zhang, Y.; Du, X.; Wang, L.; Biradar, S. S.; Tan, X.; Wan, F.; Weining, S.
355
Complete chloroplast genome sequence of a major invasive species, crofton weed
356
(Ageratina adenophora). PloS one. 2012, 7, E36869.
357
25. Xu, D.; Abe, J.; Gai, J.; Shimamoto, Y. Diversity of chloroplast DNA SSRs in wild and
358
cultivated soybeans: evidence for multiple origins of cultivated soybean. Theor Appl
359
Genet. 2002, 105, 645-653.
16 ACS Paragon Plus Environment
Page 16 of 30
Page 17 of 30
Journal of Agricultural and Food Chemistry
360
26. Chen, J. Q.; Wu, Y.; Yang, H.; Bergelson, J.; Kreitman, M.; Tian, D. Variation in the
361
ratio of nucleotide substitution and indel rates across genomes in mammals and bacteria.
362
Mol Biol Evol. 2009, 26, 1523-1531.
363
27. Kim, K. J.; Lee, H. L. Complete chloroplast genome sequences from Korean ginseng
364
(Panax schinseng Nees) and comparative analysis of sequence evolution among 17
365
vascular plants. DNA Res. 2004, 11, 247-261.
366
28. Asano, T.; Tsudzuki, T.; Takahashi, S.; Shimada, H.; Kadowaki, K. I. Complete
367
nucleotide sequence of the sugarcane (Saccharum officinarum) chloroplast genome: a
368
comparative analysis of four monocot chloroplast genomes. DNA Res. 2004, 11, 93-99.
369
29. Doorduin, L.; Gravendeel, B.; Lammers, Y.; Ariyurek, Y.; Chin-A-Woeng, T.; Vrieling,
370
K. The complete chloroplast genome of 17 individuals of pest species Jacobaea vulgaris:
371
SNPs, microsatellites and barcoding markers for population and phylogenetic studies.
372
DNA Res. 2011, dsr002.
373
30. He, S.; Wang, Y.; Volis, S.; Li, D.; Yi, T. Genetic diversity and population structure:
374
implications for conservation of wild soybean (Glycine soja Sieb. et Zucc) based on
375
nuclear and chloroplast microsatellite variation. Int J Mol Sci. 2012, 13, 12608-12628.
376
31. Leseberg, C. H.; Duvall, M. R. The complete chloroplast genome of Coix lacryma-jobi
377
and a comparative molecular evolutionary analysis of plastomes in cereals. J Mol Evol.
378
2009, 69, 311-318.
379 380 381 382
32. Grover, C. E.; Yu, Y.; Wing, R. A.; Paterson, A. H.; Wendel, J. F. A phylogenetic analysis of indel dynamics in the cotton genus. Mol Biol Evol. 2008, 25, 1415-1428. 33. Britten, R. J.; Rowen, L.; Williams, J.; Cameron, R. A. Majority of divergence between closely related DNA samples is due to indels. Proc Natl Acad Sci. 2003, 100, 4661-4665.
17 ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
383
34. Nguyen, B.; Kim, K.; Kim, Y. C.; Lee, S. C.; Shin, J. E.; Lee, J.; Kim, N. H.; Jang, W.;
384
Choi, H. I.; Yang, T. J. The complete chloroplast genome sequence of Panax
385
vietnamensis Ha et Grushv (Araliaceae). Mitochondrial DNA. 2015, 1-2.
386
35. Zhu, S.; Fushimi, H.; Cai, S.; Chen, H.; Komatsu, K. A new variety of the genus Panax
387
from southern Yunnan, China and its nucleotide sequences of 18S ribosomal RNA gene
388
and matK gene. J Jap Bot. 2003, 78, 86-94.
389 390 391 392 393 394
36. Zuo, Y.; Chen, Z.; Kondo, K.; Funamoto, T.; Wen, J.; Zhou, S. DNA barcoding of Panax species. Planta Med. 2011, 77, 182. 37. Li, X.; Yang, Y.; Henry, R. J.; Rossetto, M.; Wang, Y.; Chen, S. Plant DNA barcoding: from gene to genome. Biol Rev. 2015, 90, 157-166. 38. Mine, Y.; Young, D. Regulation of natural health products in Canada. Food Sci Technol Res. 2009, 15, 459-468.
395
39. Wallace, L. J.; Boilard, S. M.; Eagle, S. H.; Spall, J. L.; Shokralla, S.; Hajibabaei, M.
396
DNA barcodes for everyday life: Routine authentication of Natural Health Products. Food
397
Res Int. 2012, 49, 446-452.
398
40. Li, D. Z.; Gao, L. M.; Li, H. T.; Wang, H.; Ge, X. J.; Liu, J. Q.; Chen, Z. D.; Zhou, S. L.;
399
Chen, S. L.; Yang, J. B. Comparative analysis of a large dataset indicates that internal
400
transcribed spacer (ITS) should be incorporated into the core barcode for seed plants.
401
Proc Natl Acad Sci. 2011, 108, 19641-19646.
402
41. Chen, X.; Liao, B.; Song, J.; Pang, X.; Han, J.; Chen, S. A fast SNP identification and
403
analysis of intraspecific variation in the medicinal Panax species based on DNA
404
barcoding. Gene. 2013, 530, 39-43.
405
42. Massouh, A.; Schubert, J.; Yaneva-Roder, L.; Ulbricht-Jones, E. S.; Zupok, A.; Johnson,
406
M. T.; Wright, S.; Pellizzer, T.; Sobanski, J.; Bock, R. Spontaneous chloroplast mutants
18 ACS Paragon Plus Environment
Page 18 of 30
Page 19 of 30
Journal of Agricultural and Food Chemistry
407
mostly occur by replication slippage and show a biased pattern in the plastome of
408
Oenothera. The Plant Cell. 2016, 28, 911-29.
409
43. Park, S.; Ruhlman, T. A.; Sabir, J. S.; Mutwakil, M. H.; Baeshen, M. N.; Sabir, M. J.;
410
Baeshen, N. A.; Jansen, R. K. Complete sequences of organelle genomes from the
411
medicinal plant Rhazya stricta (Apocynaceae) and contrasting patterns of mitochondrial
412
genome evolution across asterids. BMC genomics. 2014, 15, 405.
413
44. Goremykin, V. V.; Salamini, F.; Velasco, R.; Viola, R. Mitochondrial DNA of Vitis
414
vinifera and the issue of rampant horizontal gene transfer. Mol Biol Evol. 2009, 26, 99-
415
110.
416
45. Matsuo, M.; Ito, Y.; Yamauchi, R.; Obokata, J., The rice nuclear genome continuously
417
integrates, shuffles, and eliminates the chloroplast genome to cause chloroplast–nuclear
418
DNA flux. The Plant Cell. 2005, 17, 665-675.
419
420
19 ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
421
FIGURE CAPTIONS
422
Fig. 1. Complete chloroplast genomes of five Panax species.
423
Colored boxes show conserved chloroplast genes, classified based on product function.
424
Genes shown inside the circle are transcribed clockwise, and those outside the circle are
425
transcribed counterclockwise. Genes belonging to different functional groups are color-coded.
426
Dashed area in the inner circle indicates the GC content of the chloroplast genome; blue
427
triangles indicate the positions of 14 markers (gcpm1–gcpm14) on the chloroplast genomes.
428
Fig. 2. Repeat structure analysis of the five Panax species chloroplast genomes.
429
(A) Histogram showing the frequency of repeats by length in the five Panax chloroplast
430
genomes.
431
(B) Histogram showing the number of four repeat types in each Panax chloroplast genome.
432
(C) Location of the 235 repeats on the five Panax species chloroplast genomes.
433
(D) Venn diagram showing the repeats shared among the five Panax species.
434
Fig. 3. Comparison of chloroplast genome sequences of five Panax species.
435
Pair-wise comparison of chloroplast genomes between Panax species using the mVISTA
436
program with P. ginseng cv. CP as the reference. Genome regions are color-coded as protein
437
coding (purple), rRNA or tRNA coding genes (blue), and noncoding sequences (pink).
438
Fig. 4. Validation of 14 molecular markers derived from InDel regions of five Panax
439
chloroplast genomes.
440
Schematic diagrams indicate InDel regions between Panax species. Tandem repeats and
441
inserted sequences are designated by pentagons and diamonds, respectively. Dotted and solid
442
lines indicate deleted and conserved sequences; left and right black arrows indicate forward
443
and reverse primers, respectively. The 14 InDel markers are denoted gcpm1 to gcpm14.
444
Different alleles are shown via capillary electrophoresis (A – F, I, N), and agarose gel
445
electrophoresis (G, H, K – M, O). Abbreviated species names shown on schematic diagrams 20 ACS Paragon Plus Environment
Page 20 of 30
Page 21 of 30
Journal of Agricultural and Food Chemistry
446
and amplicons: PgCP, P. ginseng cv. CP; PgYP, P. ginseng cv. YP; Pq, P. quinquefolius; Pn,
447
P. notoginseng; Pj, P. japonicus; Pv, P. vietnamensis; M, 100-bp DNA ladder.
21 ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 22 of 30
Table 1. SSRs in the five Panax chloroplast genomes. Number of SSRs Motif
Repeat units
PgCP
Pq
Pn
Pj
Pv
A/T
14
16
19
17
17
C/G
4
3
1
7
3
AT/AT
6
7
6
7
6
AAG/CTT
1
1
1
1
1
AAT/ATT
1
1
3
2
2
AGC/CTG
1
1
-
-
-
AAAG/CTTT
3
3
3
3
3
AAAT/ATTT
2
2
2
2
4
AATT/AATT
1
1
1
1
1
ACCT/AGGT
2
2
2
2
2
AATCT/AGATT
2
2
2
2
2
AAAAT/ATTTT
-
-
1
1
-
AGATAT/ATATCT
-
-
-
-
1
ACTATG/AGTCAT
1
1
1
-
-
38
40
42
45
42
Mononucleotide Dinucleotide
Trinucleotide
Tetranucleotide
Pentanucleotide
Hexanucleotide Total SSRs
Abbreviations: PgCP, P. ginseng cv. CP; Pq, P. quinquefolius; Pn, P. notoginseng; Pj, P. japonicus; Pv, P. vietnamensis; SSR, simple sequence repeat.
22 ACS Paragon Plus Environment
Page 23 of 30
Journal of Agricultural and Food Chemistry
Table 2. Numbers and ratios of nucleotide substitutions and InDels among chloroplast genomes of five Panax species. PgCP
PgYP
Pq
Pn
Pj
Pv
PgCP
/
1
141
480
511
559
PgYP
3 (0.33)
/
142
481
512
560
Pq
34 (4.15)
35 (4.06)
/
509
508
542
Pn
101 (4.75)
98 (4.91)
110 (4.63)
/
484
548
Pj
92 (5.55)
93 (5.51)
90 (5.64)
111 (4.36)
/
331
102 (5.48) 103 (5.44) 104 (5.21) 124 (4.42) 74 (4.47) / Pv Note: The upper triangle shows the total nucleotide substitutions, while the lower triangle indicates the number of InDels. Ratios of nucleotide substitutions to InDels (S/I) are given in brackets. Abbreviations: PgCP, Panax ginseng cv. CP; PgYP, P. ginseng cv. YP; Pq, P. quinquefolius; Pn, P. notoginseng; Pj, P. japonicas; Pv, P. vietnamensis.
23 ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Table 3. Molecular markers developed to authenticate Panax species. Melting Expected PCR product size (bp) temperature PgCP PgYP Pq Pn Pj Pv (°C) a F: TGGGGGAATAAATAAGGAAGAA 54.7 gcpm1 392 392 393 391 416 430 R: TTAAAGAAGGCGGAGGTTTTTa 54.0 b F: TGGAAAGGCTGTTGTCACTG 53.2 gcpm2 194 194 161 160 161 161 R: TGCCAAGATTGCAGAAAGATTb 53.8 F: TGGTCAATCGCTGAAGAAAAa 53.2 gcpm3 449 449 449 474 449 449 R: CCGCGGCTTATATAGGTGAAa 57.3 F: CTCTTCCAAATTGATGTTCCAAa 56.6 gcpm4 295 295 318 295 295 295 R: TCCATGATACACCAGAACAATCAa 59.3 F: TCCTGAACCACTAGACGATGGa 53.3 gcpm5 457 457 457 465 469 444 R: TTTCGATAACTTCTTGATCCCTCTa 54.3 F: CTCGCACTAAGCTCGGAAATa 57.3 475 475 475 475 561 477 gcpm6 R: ACCATGGCGTTACTCTACCGa 53.9 F: TGGTGTGTTGAATCCACAATa 53.2 gcpm7 297 297 289 322 322 322 R: CCCCCTTTCCAATAATATCCAa 55.9 a F: TCCAAAGAGGAATCCTTCCA 53.3 gcpm8 237 237 237 313 237 237 R: CCAGCCTCTACTGGGGGTTAa 55.3 F: TCCAGGACTTCGAAAGGGTAa 53.4 gcpm9 492 492 492 492 486 511 R: ACACGATACCAAGGCAAACCa 53.7 F: CCGCTGTTATCCGCTACATTa 54.1 gcpm10 227 227 228 257 225 228 R: TCGTCTAAAATGCCTATACGAACTCa 54.9 F: CTGGACGATCCTATTTGGTCAb 53.7 gcpm11 220 220 220 238 220 202 R: TCCAGCTCCGTATGAAGGTCb 53.8 b F: GCAGAATACCGTCACCCATT 53.6 gcpm12 316 373 259 373 259 202 R: ATTTTGTCCGGATCCTCCTTb 53.8 F: AAGATTTTTATGAAGATACCGAAAATa 52.5 gcpm13 484 484 466 461 456 465 R: CGGTCAAATTCGAGGAAAGAa 55.3 F: TGGTTAGTTTCACCGGATTCAb 54.2 208 208 208 208 208 178 gcpm14 R: TTTTTGAGCCCATTTTTAAGGAb 54.6 Abbreviations: PCR, polymerase chain reaction; PgCP, Panax ginseng cv. CP; PgYP, P. ginseng cv. YP; Pq, P. quinquefolius; Pn, P. notoginseng; Pj, P. japonicus; Pv, P. vietnamensis. a Primers developed in our previous study17, and bprimers developed in this study Marker name
Forward (F)/reverse (R) primers
24 ACS Paragon Plus Environment
Page 24 of 30
Page 25 of 30
Journal of Agricultural and Food Chemistry
Table 4. Marker combinations for each Panax species. Marker
Positions Regions CDS/IGS
PgCP
Allele types for each species PgYP Pq Pn Pj
Pv
trnK-UUU–rps16 IGS C C C D B A rps16–trnQ-UUG IGS A A B B B B atpH-atpI IGS B B B A B B atpH-atpI IGS B B A B B B trnE-trnT IGS B B B B A C trnE-trnT IGS B B B B A B rbcL-accD IGS B B C A A A petA-psbJ IGS B B B A B B clpP-psbB IGS B B B B B A rpl14–rpl16 IGS B B B A B B ycf2 CDS B B B A B C ycf1 CDS B A C AD C D ndhF–rpl32 IGS A A B C D C ycf1 CDS A A A A A B Abbreviations: CDS, coding sequences; IGS, intergenic spaces; PgCP, P. ginseng cv. CP; PgYP, P. ginseng cv. YP; Pq, P. quinquefolius; Pn, P. notoginseng; Pj, P. japonicus; Pv, P. vietnamensis. A, B, C, and D refer to different allele types.
gcpm1 gcpm2 gcpm3 gcpm4 gcpm5 gcpm6 gcpm7 gcpm8 gcpm9 gcpm10 gcpm11 gcpm12 gcpm13 gcpm14
25 ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Figure 1
26 ACS Paragon Plus Environment
Page 26 of 30
Page 27 of 30
Journal of Agricultural and Food Chemistry
Figure 2
27 ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Figure 3
28 ACS Paragon Plus Environment
Page 28 of 30
Page 29 of 30
Journal of Agricultural and Food Chemistry
Figure 4
29 ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Table of Contents Graphic
30 ACS Paragon Plus Environment
Page 30 of 30