Article
NP-StructurePredictor: prediction of unknown natural products in plant mixtures Yeu-Chern Harn, Bo-Han Su, Yuan-Ling Ku, Olivia A. Lin, Cheng-Fu Chou, and Yufeng Jane Tseng J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00565 • Publication Date (Web): 13 Nov 2017 Downloaded from http://pubs.acs.org on November 18, 2017
Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Journal of Chemical Information and Modeling
NP-StructurePredictor: prediction of unknown natural products in plant mixtures

Yeu-Chern Harn1,2#, Bo-Han Su3#, Yuan-Ling Ku4, Olivia A. Lin5, Cheng-Fu Chou3, and Y. Jane Tseng2,3,5,6*

1 Graduate Institute of Networking and Multimedia, National Taiwan University, No. 1 Roosevelt Rd. Sec. 4, Taipei 10617, Taiwan
2 The Metabolomics Core Laboratory, NTU Center of Genomic Medicine, 7F, No. 2, Syujhou Road, Taipei 10055, Taiwan
3 Department of Computer Science and Information Engineering, National Taiwan University, No. 1 Roosevelt Rd. Sec. 4, Taipei 10617, Taiwan
4 Medical and Pharmaceutical Industry Technology and Development Center, 7F, No. 9, Wuquan Rd., Wugu Dist., New Taipei City 24886, Taiwan
5 Graduate Institute of Biomedical Electronic and Bioinformatics, National Taiwan University, No. 1 Roosevelt Rd. Sec. 4, Taipei 10617, Taiwan
6 Drug Research Center, National Taiwan University College of Medicine, No. 1 Jen Ai Rd. Sec. 1, Taipei 10051, Taiwan
# Equal contribution
*Corresponding author (Voice: +886.2.3366.4888#529, Fax: +886.2.23628167, [email protected])
-1ACS Paragon Plus Environment
Abstract

Identifying the individual chemical constituents of a mixture, especially solutions extracted from medicinal plants, is a time-consuming task. The identification results are often limited by challenges such as the development of separation methods and the availability of known reference standards. We present a novel structure elucidation system, NP-StructurePredictor, that accelerates the identification of chemical structures in a mixture by combining a branch and bound algorithm with a large collection of natural product databases. As input, NP-StructurePredictor requires only the targeted molecular weights calculated from a list of m/z values from LC-MS experiments. From these weights, it predicts the chemical structures of the individual components of the mixture. Each predicted structure is reported with a statistically calculated probability, so that the most likely chemical structures of the natural products and their analogs can be proposed. For validation, we selected four datasets of different Chinese herbal mixtures containing known compounds. NP-StructurePredictor correctly identified all of their components and ranked them highly. These highly accurate results across four validation case studies demonstrate the applicability of NP-StructurePredictor for predicting the chemical structures of novel compounds.
Keywords
liquid chromatography-mass spectrometry (LC-MS), computer-aided structure elucidation (CASE), cheminformatics, chemometrics, natural product determination, branch and bound algorithm
Introduction
One of the central themes in medicinal chemistry and chemical biology research is the efficient identification of the small molecules that regulate protein functions.1 Identifying small molecules from plants (natural products) is also important, because these molecules are among the major sources of inspiration for drug discovery.2 However, determining the chemical constituents of plants is a time-consuming task that requires complex and lengthy procedures. For example, identifying the components of Ligusticum chuanxiong requires, first and foremost, sample extraction by steam distillation, followed by gas chromatography and mass spectrometry analyses.3 The process may take even longer when the components to be identified are novel chemical compounds. Powerful chromatographic and spectroscopic analytical methods may help with the elucidation of novel structures, but there is currently no automated method with high-throughput capabilities.4,5

Systems that automatically propose a list of possible chemical structures for unknown compounds in a mixture, based on chromatographic and spectroscopic data, are generally known as computer-assisted structure elucidation methods (also called computer-aided structure elucidation, or CASE). CASE was developed thirty years ago6-9 to elucidate the chemical structures of small organic molecules. A variety of algorithms and forms of chemical knowledge have been used in this field, including heuristic rules,10 stochastic optimization,11 and graph algorithms.12 In the last decade, many advanced algorithms supporting CASE expert systems were developed to realize the dream of many spectroscopists: fully automated structure elucidation.13-17 However, such methods still cannot replace human intelligence18 and still have many limitations.19 CASE systems rely largely on two-dimensional nuclear magnetic resonance spectroscopy (2D NMR) data as inputs, because these data provide abundant structural information.20 However, 2D NMR experiments can be quite time-consuming, and data acquisition typically takes hours. Furthermore, if co-eluting compounds in the mixture of interest cannot be completely separated, the CASE system needs other experimental data to assist with the structure elucidation process.21
Compared with NMR methods, mass spectrometry (MS) is more sensitive and is therefore a good starting point for identification procedures.22 Furthermore, impurities in input mixtures do not usually affect the results of MS experiments. In the last decade, many MS-based computational methods for the automatic identification of small molecules have been developed; they were recently reviewed by Scheubert et al.23 However, no method has yet succeeded in elucidating structures from MS data alone. Most of these MS-based methods are referred to as “in silico fragmentation spectrum prediction” techniques. Current in silico fragmentation strategies can be applied successfully only to a limited number of molecular classes, such as lipids, glycans, and alkenes, because of their structural simplicity and homogeneity. Yetukuri et al.24 proposed an approach that predicts the structures of a specific group of lipids to assist with lipid identification in lipidomics research. They exploited the highly conserved patterns in lipid structures to deduce the structures of other lipids from known lipid scaffolds. Once the lipid scaffold is determined, different fragments can be attached to that scaffold to construct more diverse lipids that match unknown lipid signals in ultra-performance liquid chromatography-mass spectrometry (UPLC-MS) spectra. The method of Yetukuri et al. can identify novel lipid structures from MS data, but its application is limited to lipid scaffolds. Building an accurate and fully automated CASE system that predicts unknown structures in mixtures from mass spectra remains a challenge.

Part of the inspiration for this study comes from work on chemical structure scaffold classification.25,26 Schuffenhauer et al.26 proposed a method that constructs chemical structure trees based on scaffolds, so that structures can be classified by these trees. Our study combines two ideas: classifying related structures by their scaffolds, and predicting novel structures by adding different branches to those scaffolds.
In this study, we developed a computational method, which we named NP-StructurePredictor, that accelerates the identification of chemical structures from liquid chromatography-mass spectrometry (LC-MS) spectra. NP-StructurePredictor identifies compounds in a mixture through a series of analyses that match the compounds of interest to the most likely chemical structures of natural products and their analogs. The method can also predict structures that do not exist in current libraries, by combining different scaffolds and side chains and by inferring structures from similar scaffolds. For each target molecular weight from the input mass spectra, NP-StructurePredictor returns a list of possible structures and their relative probabilities. The highest-ranked structures are the most likely candidates for the unknown compounds in the mixture. Four complex herbal mixtures with known constituents were used as a validation set in this study, and all of their components were successfully predicted using this method.
NP-StructurePredictor differs from previous CASE systems in several major ways. First, its only experimental input is the m/z list from the LC-MS spectra. It does not require additional NMR or tandem mass spectrometry (MS/MS) spectra for further structural information. Second, its prediction model was built from a large collection of natural products, so it is tailored to the structural prediction of natural products. Finally, it ranks predicted unknown structures according to the possible combinations of scaffolds and side chains in our large natural product databases. This ranking ensures that NP-StructurePredictor proposes the closest structural matches to currently known plant-derived natural products.
Materials and Methods
System overview

NP-StructurePredictor is a novel CASE system for accelerating the identification of chemical structures in a mixture. The system architecture is shown in Figure 1. The first step is to extract, from the LC-MS experiments, a peak table of processed and aligned mass peaks with their molecular weights. In our experiments, we used MAVEN27 and XCMS28 for this extraction. The list of targeted molecular weights (targeted MWs) in the peak table is the required input to our system. In addition, if users know which scaffold structures (seed scaffolds) are likely to be present, NP-StructurePredictor can use this information to focus its predictions. When no seed scaffolds are provided for the test material, NP-StructurePredictor searches all scaffolds in our database to select suitable seeds for generating candidates.

NP-StructurePredictor incorporates a large natural products database, from which a side chain database was constructed. During structure elucidation, a branch and bound algorithm systematically searches for structures that match the targeted MWs. It does so by linking appropriate fragments from the side chain database to the input seed scaffolds. A scaffold database was also constructed from the natural products database, and a hierarchical scaffold tree was built to capture the relationships among all the scaffolds. Scaffolds that are closely related to the input seed scaffolds in this tree are also used as candidate scaffolds in the search. For each peak with a specific molecular weight, NP-StructurePredictor returns a list of possible compounds matching that targeted MW, ranked by their calculated probability of occurrence in nature. The following sections describe each module of NP-StructurePredictor.
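As the abstract states, the targeted MWs are calculated from a list of m/z values. The sketch below illustrates that conversion. It assumes that every peak is a singly charged [M+H]+ ion (positive mode) or [M-H]- ion (negative mode) and ignores other adducts and isotopes, which a real MAVEN or XCMS workflow would need to annotate. The function names `neutral_mass` and `targeted_mws` are hypothetical and are not part of NP-StructurePredictor.

```python
# Sketch only: converts singly charged ion m/z values into neutral
# monoisotopic masses, assuming [M+H]+ ions in positive mode and
# [M-H]- ions in negative mode. Other adducts (Na+, NH4+, ...) are ignored.

PROTON_MASS = 1.007276  # mass of a proton in Da


def neutral_mass(mz: float, mode: str) -> float:
    """Neutral mass M of a singly charged ion observed at the given m/z."""
    if mode == "positive":   # [M+H]+  =>  M = m/z - proton
        return mz - PROTON_MASS
    if mode == "negative":   # [M-H]-  =>  M = m/z + proton
        return mz + PROTON_MASS
    raise ValueError(f"unknown ion mode: {mode!r}")


def targeted_mws(peaks, decimals=4):
    """Sorted, de-duplicated neutral masses from (m/z, mode) pairs."""
    return sorted({round(neutral_mass(mz, mode), decimals) for mz, mode in peaks})


# The same compound seen in both polarities collapses to one targeted MW.
print(targeted_mws([(355.1024, "positive"), (353.0878, "negative")]))  # [354.0951]
```

Rounding before de-duplication is a crude stand-in for a proper mass-tolerance merge; it can split two masses that straddle a rounding boundary.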
Collection of natural products
NP-StructurePredictor incorporates a large natural products database (NPDB), from which the scaffold and side chain databases are constructed. The main idea behind NP-StructurePredictor is to use the structural information gleaned from three natural products databases to predict possible structures in each experiment. The three databases are the Dictionary of Natural Products (DNP),29 the “ZINC natural products” subset of ZINC,30 and the Traditional Chinese Medicine Database (TCMD, updated 2010-07-14).31 DNP listed 203615 records, the “ZINC natural products” subset of ZINC listed 89425 records, and TCMD listed 3897 records. The structures collected from these three databases were first standardized with the ChemAxon Application Programming Interfaces (ChemAxon Kft, Máramaros köz 3/a, Budapest, 1037 Hungary). Standardization comprised neutralization, removal of valence errors, and retention of the largest fragment. After standardization, all records from the three databases were pooled and redundant records were removed, leaving a total of 243130 records. Because the majority of compounds used in medicinal chemistry and chemical biology research contain rings,32 we considered only structures with rings in this study (226949 of the 243130 records). We compiled a large NPDB for two reasons: (1) to increase the probability of matching structures from the NPDB in our initial searches, and (2) to learn the diverse structural patterns of the NPDB. The next section describes our pattern analysis of the NPDB.
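The pooling step described above can be sketched as follows. The sketch assumes that standardization (neutralization, valence-error removal, largest-fragment retention) has already been done upstream, for example with the ChemAxon tools, and that each standardized record carries a canonical key and a ring count. The `Record` type and its fields are hypothetical illustrations, not the authors' data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str      # hypothetical canonical ID of the standardized structure (e.g. an InChIKey)
    n_rings: int  # hypothetical precomputed ring count of the standardized structure
    source: str   # originating database, e.g. "DNP", "ZINC-NP" or "TCMD"


def build_npdb(*sources):
    """Pool standardized records, drop duplicates by key, keep ring-containing ones.

    Returns (pooled, with_rings). When a structure appears in several
    databases, the first record encountered is kept.
    """
    unique = {}
    for records in sources:
        for rec in records:
            unique.setdefault(rec.key, rec)
    pooled = list(unique.values())
    with_rings = [rec for rec in pooled if rec.n_rings > 0]
    return pooled, with_rings


dnp = [Record("KEY-A", 2, "DNP"), Record("KEY-B", 0, "DNP")]
zinc = [Record("KEY-A", 2, "ZINC-NP"), Record("KEY-C", 1, "ZINC-NP")]
pooled, with_rings = build_npdb(dnp, zinc)
print(len(pooled), [r.key for r in with_rings])  # 3 ['KEY-A', 'KEY-C']
```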
Construction of scaffold trees and generation of side chains
Because similar structures may possess similar biological functions, we constructed hierarchical scaffold trees from the NPDB. These trees let us find core structures similar to the given seed scaffolds and so help elucidate possible compounds in mixtures. We first constructed a scaffold relationship database by breaking each structure in the NPDB into separate substructure categories: one major chemical scaffold and several side chains. We then used the resulting scaffolds to construct the scaffold trees. Following Bemis et al.,25 a scaffold is defined as the core structure that remains after all terminal side chains have been removed. However, terminal chains linked by double bonds are retained. This double-bond rule ensures that planar sp2 carbon atoms in the scaffolds remain distinguishable from tetrahedral sp3 carbon atoms. We then used the Scaffold Tree Generator26 to build hierarchical scaffold trees that represent the structural relationships among all the scaffolds. Each node in a tree denotes a scaffold, and a parent scaffold is defined as a substructure of its child scaffolds. The thirteen prioritization rules of Schuffenhauer et al.26 were applied when removing rings, to decide which substructure becomes the parent of a scaffold and to preserve the substructures with more chemical character. Scaffolds with the same parent are defined as sibling scaffolds, so all sibling scaffolds have the same number of rings. For natural products in a mixture, we use the hierarchical scaffold trees to retrieve scaffolds similar to the given seed scaffolds. The parent, sibling, and child scaffolds of each seed scaffold are all retrieved as input to the prediction procedure. Together with the seed scaffolds, these candidates are referred to as “targeted scaffolds” in this study. Selecting scaffolds through the sibling, parent, and child relationships in the scaffold trees can improve the accuracy of unknown structure elucidation.
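Retrieving the targeted scaffolds can be sketched as a simple tree lookup. The sketch assumes the hierarchical scaffold tree is stored as a hypothetical child-to-parent map of scaffold IDs. This is an illustrative encoding, not the Scaffold Tree Generator's own data format.

```python
from collections import defaultdict


def targeted_scaffolds(seeds, parent_of):
    """Return the seeds plus their parent, sibling and child scaffolds.

    parent_of maps each scaffold ID to its parent scaffold ID (None for a
    tree root), i.e. an assumed flat encoding of the hierarchical scaffold tree.
    """
    children = defaultdict(set)
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].add(child)

    targeted = set(seeds)
    for seed in seeds:
        parent = parent_of.get(seed)
        if parent is not None:
            targeted.add(parent)               # parent: one ring fewer
            targeted |= children[parent]       # siblings: same parent
        targeted |= children.get(seed, set())  # children: one ring more
    return targeted


# Toy tree with hypothetical IDs: R1 -> {R2a, R2b}, R2a -> {R3}
tree = {"R1": None, "R2a": "R1", "R2b": "R1", "R3": "R2a"}
print(sorted(targeted_scaffolds({"R2a"}, tree)))  # ['R1', 'R2a', 'R2b', 'R3']
```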
To elucidate chemical structures in mixtures that are not already in the NPDB, and to increase the structural diversity of the predicted compounds, we generated a side chain database. Its side chains are combined with the seed scaffolds to construct candidate compounds. Side chains are defined as the parts of a structure other than the scaffold, and only non-hydrogen side chains were considered. We collected from the NPDB all side chains linked at each position of each scaffold, and we calculated the probability of occurrence of each side chain at each position. For a scaffold with atom-positions {1, 2, …, S}, the probability of occurrence of side chain x at atom-position y is defined as

$$P_y(x) = \frac{F_y(x)}{N_y}, \quad y \in \{1, 2, \ldots, S\} \tag{1}$$

where $F_y(x)$ is the frequency with which side chain x occurs at atom-position y of the scaffold in the NPDB, and $N_y$ is the total number of side chains occurring at atom-position y of the scaffold in the NPDB.
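Equation (1) amounts to normalized counting per atom position. The sketch below assumes that the side-chain occurrences for one scaffold have already been extracted from the NPDB as (atom position, side chain) pairs. It takes $N_y$ to be the total number of side-chain occurrences at position y, so that the probabilities at each position sum to 1; this reading of $N_y$ is our assumption. The side-chain labels are illustrative.

```python
from collections import Counter, defaultdict


def side_chain_probabilities(observations):
    """Estimate P_y(x) = F_y(x) / N_y for a single scaffold.

    observations: iterable of (atom_position, side_chain) pairs, one per
    non-hydrogen side chain seen on this scaffold in the NPDB.
    Returns {y: {x: P_y(x)}}.
    """
    freq = defaultdict(Counter)          # freq[y][x] = F_y(x)
    for y, x in observations:
        freq[y][x] += 1

    probs = {}
    for y, counts in freq.items():
        n_y = sum(counts.values())       # N_y: all side-chain occurrences at y
        probs[y] = {x: f / n_y for x, f in counts.items()}
    return probs


obs = [(1, "OH"), (1, "OH"), (1, "OCH3"), (3, "CH3")]
probs = side_chain_probabilities(obs)
print({x: round(p, 3) for x, p in probs[1].items()})  # {'OH': 0.667, 'OCH3': 0.333}
```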
After determining which side chains can be linked to each scaffold in the NPDB and calculating their probabilities of occurrence at each scaffold position, we must also determine which sets of scaffold positions can be linked to side chains. In this study, these sets of positions are called atom-position configurations. We use them to elucidate unknown chemical structures by extending the scaffolds with appropriate side chains.
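Atom-position configurations can be collected in the same pass over the NPDB. This sketch assumes that each compound containing a given scaffold is represented by a hypothetical dict that maps substituted atom positions to side chains. A configuration is then simply the set of keys, and counting these sets shows which configurations occur most often.

```python
from collections import Counter


def atom_position_configurations(compounds):
    """Count the sets of substituted atom positions observed on one scaffold.

    compounds: iterable of dicts {atom_position: side_chain}, one per NPDB
    compound containing the scaffold; hydrogen positions are simply absent.
    """
    return Counter(frozenset(substituents) for substituents in compounds)


def ranked_configurations(compounds):
    """Configurations as sorted position lists, most frequent first."""
    counts = atom_position_configurations(compounds)
    return [sorted(config) for config, _ in counts.most_common()]


examples = [{1: "OH", 3: "CH3"}, {3: "OH", 1: "OCH3"}, {2: "OH"}]
print(ranked_configurations(examples))  # [[1, 3], [2]]
```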
Prediction procedure of NP-StructurePredictor
The main task of the prediction procedure in NP-StructurePredictor is to identify possible structures that match the targeted MWs. The targeted scaffolds, selected by NP-StructurePredictor and by the user, serve as the starting point for structure elucidation. Users can provide seed scaffolds if they have prior knowledge of structural features likely to be found in the compounds. When no seed scaffolds are provided for the test material, NP-StructurePredictor treats all scaffolds in our database as seed scaffolds when generating candidates. This full search takes longer, but NP-StructurePredictor is efficient enough to complete it in a reasonable time. To generate structures with a targeted MW of $W_0$ from the input peak table, NP-StructurePredictor first takes each scaffold in the set of targeted scaffolds as the starting seed. The prediction procedure offers two ways of generating possible chemical structures that contain a targeted scaffold. The first approach directly searches the NPDB for existing structures that contain the relevant targeted scaffolds and meet the MW criterion. The second approach uses a branch and bound algorithm to generate possible chemical structures computationally. It links side chains to the targeted scaffolds according to the atom-position configurations until the targeted MWs are matched. We define a combination of side chains on a given scaffold as $C = (X_1, X_2, \ldots, X_S)$, where $X_n$ is the side chain at atom-position $n$ and $S$ is the number of atom positions of the scaffold. To find the $R$ most likely structures for this scaffold with respect to a targeted MW of $W_0$, we formulate the structure generation problem as follows:
$$\text{Find } R \text{ combinations } (C_1, C_2, \ldots, C_R) \tag{2}$$

$$\text{such that } P(C_1) \text{ is maximum and } P(C_1) > P(C_2) > \cdots > P(C_R) \tag{3}$$

$$\text{and } \forall C \in (C_1, C_2, \ldots, C_R),\quad \sum_{i=1}^{S} \mathrm{MW}(X_i) = w \tag{4}$$

where $w$ is the targeted MW $W_0$ minus the MW of the scaffold, and $\mathrm{MW}(X_i)$ is the molecular weight of side chain $X_i$. The probability of a combination $C$ is defined as

$$P(C) = \prod_{i=1}^{S} P_i(X_i), \quad C = (X_1, X_2, \ldots, X_S) \tag{5}$$
The probability $P_i(X_i)$ of side chain $X_i$ occurring at position $i$ is given by equation (1). Condition (3) ensures that the selected combinations are the best $R$ candidates, that is, those with the highest probability of occurrence in nature. Condition (4) ensures that the total molecular weight of the selected side chains equals $w$. The scaffold MW is excluded from $w$ because only the side chains vary during the prediction process. The probability of a combination of side chains is defined by equation (5). The occurrence of each side chain on a scaffold is treated as an independent event, so the probability of the whole set of side chains is the product of the probabilities of each side chain at its atom position. A brute-force strategy would enumerate every combination of side chains and compute the probability of each, which takes exponential time. With $k$ candidate side chains at each of $S$ positions, for example, there are $k^S$ combinations to evaluate. Because some combinations cannot possibly match the targeted MW, NP-StructurePredictor uses a branch and bound algorithm to speed up the search. The algorithm searches for the best side-chain candidates position by position, from atom-position 1 to position $S$ of the targeted scaffold, and discards impossible combinations at each step. To illustrate the idea, suppose we are searching all combinations of side chains on a scaffold with $S$ atom-positions, and that after processing positions 1 to $y$ ($y < S$) the algorithm has built a partial combination $C_{current} = (X_1, X_2, \ldots, X_y)$. If the MW of this partial combination, $\mathrm{MW}(C_{current})$, already exceeds the target $w$, then $C_{current}$ can be skipped. Adding side chains can only increase the MW, so no extension $(X_1, \ldots, X_y, X_{y+1}, \ldots)$ can have an MW less than or equal to $w$. The algorithm thus saves computation time by branching only into feasible combinations.
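The search defined by equations (2) to (5) can be sketched as a depth-first branch and bound over the positions of one atom-position configuration. The MW bound is the one described above. The probability bound is our own addition: every factor in equation (5) is at most 1, so a partial product that is no larger than the current R-th best can never improve on it. The mass tolerance `tol`, the function name `top_r_combinations`, and the toy side chains with round-number masses are assumptions made for illustration. This is not the authors' Java implementation.

```python
import heapq
from itertools import count


def top_r_combinations(candidates, w, r, tol=0.01):
    """Depth-first branch and bound for the R most probable side-chain combinations.

    candidates[i]: list of (side_chain, mw, prob) tuples allowed at the i-th
        position of one atom-position configuration (prob = P_i(X_i), eq. 1).
    w: target side-chain mass, i.e. targeted MW W0 minus the scaffold MW.
    r: number of combinations to keep (R in eq. 2).
    tol: assumed mass tolerance in Da; eq. (4) states exact equality.
    Returns up to r (probability, combination) pairs, most probable first.
    """
    if r <= 0:
        return []
    best = []         # min-heap of (prob, tiebreak, combo): the r best so far
    tiebreak = count()
    n = len(candidates)

    def search(pos, mw, prob, combo):
        # MW bound: side-chain masses only add up, so an overweight
        # partial combination can never be completed to match w.
        if mw > w + tol:
            return
        # Probability bound: every factor is <= 1, so the product cannot grow;
        # stop if it can no longer beat the current r-th best.
        if len(best) == r and prob <= best[0][0]:
            return
        if pos == n:
            if abs(mw - w) <= tol:
                item = (prob, next(tiebreak), tuple(combo))
                if len(best) < r:
                    heapq.heappush(best, item)
                else:
                    heapq.heapreplace(best, item)  # evict the current r-th best
            return
        for side_chain, sc_mw, sc_prob in candidates[pos]:
            combo.append(side_chain)
            search(pos + 1, mw + sc_mw, prob * sc_prob, combo)
            combo.pop()

    search(0, 0.0, 1.0, [])
    return [(p, c) for p, _, c in sorted(best, reverse=True)]


# Toy input with hypothetical side chains A-C and X-Z (masses in Da).
cands = [
    [("A", 10.0, 0.5), ("B", 20.0, 0.3), ("C", 15.0, 0.2)],
    [("X", 20.0, 0.6), ("Y", 10.0, 0.3), ("Z", 15.0, 0.1)],
]
for p, combo in top_r_combinations(cands, w=30.0, r=2):
    print(combo, round(p, 2))
```

With this toy input, three combinations sum to exactly 30 Da: (A, X), (B, Y) and (C, Z). With r=2, the probability bound prunes (C, Z) before it is completed, and the function returns (A, X) and then (B, Y).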
NP-StructurePredictor finds possible structures for each targeted MW in the user’s peak table, continuing until every targeted MW has been processed. The source code of NP-StructurePredictor can be downloaded from http://npstructurepredictor.cmdm.tw/NPSP.rar. The algorithm was implemented in Java (JDK 7) and tested on a Linux PC with an Intel Xeon(R) 2.40 GHz CPU and 32 GB of memory. Users can build and run NP-StructurePredictor for structure elucidation step by step, following the provided manual.
Validation datasets
Four herbal datasets (Cuscuta chinensis, Ophiopogon japonicus, Polygonum multiflorum, and angelica) were selected to evaluate our system’s performance. All herb datasets were provided by the Natural Product Laboratory of the Taiwan Medical and Pharmaceutical Industry Technology and Development Center (PITDC). The Natural Product Laboratory of PITDC also identified a list of structures for each herb using its own identification procedure. We treated this list of authentic structures as the validated results in our evaluation. The raw MS data in mzML format can be downloaded from http://npstructurepredictor.cmdm.tw/Spectra.rar. The detailed experimental methods are described in the next section.
Experimental methods
Sample powder (200 mg) was transferred to a 2-mL centrifuge tube, and 1.5 mL of methanol/water (7/3) was added. The tube was then placed in an ultrasonic bath (Branson 5510/5210) at maximum power for 15 min at 40°C and centrifuged at 10,000 rpm for 5 min (Hermle Z 323K). The extraction was repeated three more times, and the upper extracts were combined. Then, 70% methanol was added to the combined extracts to bring the sample solution to a total volume of 5 mL. The solutions were filtered through 0.45-µm filters before high-performance liquid chromatography (HPLC) and high-performance liquid chromatography-electrospray ionization-mass spectrometry (HPLC-ESI-MS) analyses. HPLC analyses were carried out on an Agilent 1100 series HPLC system (Santa Clara, CA, USA). The column was a Zorbax SB-C18 column (4.6 mm i.d. × 250 mm, 5 µm; Agilent Company, USA) protected by a guard column (3.9 mm i.d. × 20 mm, 5 µm). The extracts of the four herbs were analyzed under the same HPLC conditions. The mobile phase consisted of solvent A (water/0.1% formic acid) and solvent B (acetonitrile/0.1% formic acid), delivered by a gradient program at a flow rate of 1 mL/min. The gradient elution program was as follows: 0-40 min, linear gradient from 10 to 35% B; 40-50 min, linear gradient from 35 to 50% B; 50-60 min, linear gradient from 50 to 100% B; then hold at 100% B for 5 min. The effluent was monitored at 254, 280, and 312 nm. The MS system was a Bruker Daltonics Esquire 2000 ion trap mass spectrometer (Bremen, Germany) equipped with an orthogonal ESI interface. The ionization parameters were as follows: positive and negative ion modes; capillary voltage, 4000 V; nebulizing gas, nitrogen at 25-30 psi; drying gas flow, 10.0 L/min at 250-300°C. The mass analyzer scanned from 50 to 1000 amu. MS/MS spectra were recorded in auto MS/MS mode. Other instrument parameters were set according to the properties of each compound. The obtained data, including parent and daughter ion patterns, were compared with spectra of compounds from similar medicinal herbs in earlier publications or databases. This step led to the preliminary identification of the top five high-intensity peaks. These sample peaks were then compared with authentic compounds analyzed under the same LC conditions, matching both retention times and MS/MS spectra.
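The gradient program above can be expressed as a piecewise-linear function of time. This sketch encodes only the stated breakpoints, with the final 5-min hold ending at 65 min. Column re-equilibration after the run is not specified in the text and is not modeled. The names `GRADIENT` and `percent_b` are hypothetical.

```python
# Breakpoints (time in min, % solvent B) of the gradient described above;
# %B is linearly interpolated between consecutive breakpoints.
GRADIENT = [(0.0, 10.0), (40.0, 35.0), (50.0, 50.0), (60.0, 100.0), (65.0, 100.0)]


def percent_b(t: float) -> float:
    """Percentage of solvent B (acetonitrile/0.1% formic acid) at time t in minutes."""
    if not GRADIENT[0][0] <= t <= GRADIENT[-1][0]:
        raise ValueError(f"t={t} min is outside the 0-65 min program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return GRADIENT[-1][1]  # not reached: the range check guarantees a match above


print([percent_b(t) for t in (0, 20, 45, 55, 62)])  # [10.0, 22.5, 42.5, 75.0, 100.0]
```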
Results and discussion
The scaffold tree database
A hierarchical scaffold tree database was generated to select appropriate target scaffolds in NP-StructurePredictor. Chemical structures were classified, and the scaffold tree database was constructed, using hierarchical scaffold classification.26 The database contains 83242 distinct scaffolds organized into 4001 trees, derived from the 243130 natural products in our NPDB. Because the number of unique scaffolds is only about one third of the number of natural product structures, scaffold patterns recur frequently across the NPDB, and these patterns can be exploited to generate novel structures for elucidating unknown compounds. Since each scaffold occurs in an average of about three NPDB structures, an adequate number of side chains is available for generating novel candidate structures. A representative tree is shown and discussed in supplementary Additional File 1, available online.
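The "one third" and "about three structures per scaffold" figures are the same ratio read in opposite directions; with the database counts quoted above:

```latex
\frac{N_{\text{structures}}}{N_{\text{scaffolds}}} = \frac{243130}{83242} \approx 2.92,
\qquad
\frac{N_{\text{scaffolds}}}{N_{\text{structures}}} = \frac{83242}{243130} \approx 0.34
```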
The scaffold database
We designed several strategies in the search protocol to improve the efficiency of NP-StructurePredictor. During construction of the scaffold database, we applied structural symmetry principles to ensure that all possible atom positions can be linked to any given side chain. In this way, NP-StructurePredictor can generate structures that are not already available in the current NPDB. However, because the NPDB contains a total of 243130 structures, direct database searching and structure matching cannot be completed within a reasonable time. A threshold strategy and an indexing strategy were therefore applied to increase execution speed. First, we applied the largest molecular weight (LMW) and smallest molecular weight (SMW) thresholds to filter out unsuitable scaffolds: if the targeted molecular weight (MW) for a scaffold is smaller than the SMW threshold or larger than the LMW threshold, the scaffold is bypassed and no further processing is performed. Although this strategy may increase the risk of missing structures that should be identified, our validation experiments showed that the system could still identify all the structures effectively. Second, we assigned each scaffold an index that maps directly onto the structures in the NPDB. In the worst case, the final search protocol reduced the number of structures to be examined from 243130 (the total in the NPDB) to 10214 (the largest number of structures sharing the same scaffold).
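A minimal Python sketch of the threshold-plus-index idea, under stated assumptions: each scaffold is assumed to carry its own SMW and LMW thresholds, and the index is modeled as a plain dictionary from a scaffold label to the IDs of the NPDB structures that contain it. The class and function names, the thresholds, and the IDs are hypothetical; this illustrates the filtering logic, not the NP-StructurePredictor code.

```python
from dataclasses import dataclass, field


@dataclass
class ScaffoldEntry:
    """Hypothetical index entry: MW thresholds plus IDs of NPDB structures sharing the scaffold."""
    smw: float  # smallest molecular weight (SMW) threshold, assumed to be stored per scaffold
    lmw: float  # largest molecular weight (LMW) threshold, assumed to be stored per scaffold
    structure_ids: list = field(default_factory=list)


def candidate_structures(index, target_mw):
    """Map each surviving scaffold to its structures; scaffolds outside [SMW, LMW] are skipped."""
    hits = {}
    for scaffold, entry in index.items():
        # Threshold step: bypass the scaffold without any further processing.
        if target_mw < entry.smw or target_mw > entry.lmw:
            continue
        # Indexing step: the scaffold key points straight at its structures,
        # so the full NPDB never has to be scanned.
        hits[scaffold] = entry.structure_ids
    return hits


# Toy index with made-up thresholds and IDs (not values from the NPDB).
index = {
    "flavone": ScaffoldEntry(222.2, 610.5, ["NP001", "NP002", "NP003"]),
    "coumarin": ScaffoldEntry(146.1, 420.4, ["NP010", "NP011"]),
    "stilbene": ScaffoldEntry(180.2, 600.6, ["NP020"]),
}
print(candidate_structures(index, 500.0))  # coumarin is skipped: 500.0 exceeds its LMW
```

Because the threshold test rejects a scaffold before its structure list is touched, the work per query scales with the surviving scaffolds and their index entries rather than with the full NPDB.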
Evaluation of time performance
This section compares the time performance of NP-StructurePredictor with that of a traditional algorithm. NP-StructurePredictor adopts a branch and bound algorithm to substantially speed up structure elucidation. Two modes of the branch and bound algorithm were implemented to identify unknown structures: 1) using the information from our constructed atom-position configurations (“learned R-group”), and 2) letting users specify atom-position configurations (“added R-group”) to restrict the number of possible atom positions that can be linked to side chains. A traditional brute-force algorithm that generates all combinations of possible structures was compared with our two branch and bound modes. We used the total number of possible structures that could be generated by each method (Nc) to evaluate computational time, because 1) determining Nc is the most time-intensive step in the whole process, and 2) computation time is approximately proportional to Nc when all other conditions are kept the same.
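The effect of restricting atom positions on Nc can be illustrated with a generic branch and bound enumerator in Python. This is a sketch, not the authors' algorithm: the pruning rule used here (abandon a partial structure once its MW exceeds the target plus a tolerance) is our assumption and is valid only because every side-chain mass increment is non-negative; the scaffold MW, position labels, and rounded increments are illustrative. Passing every atom position mimics the unrestricted setting, while passing a short list of positions mimics the learned or added R-group modes.

```python
def enumerate_structures(scaffold_mw, positions, side_chains, target_mw, tol=0.5):
    """Branch and bound sketch: give each allowed position one side chain and keep
    assignments whose total MW is within tol of target_mw.

    side_chains maps a name to its net mass increment when it replaces an H atom;
    all increments are assumed to be >= 0, which is what makes the bound valid.
    """
    names = sorted(side_chains)
    results = []

    def branch(i, mw, chosen):
        # Bound: remaining positions can only add mass, so prune overweight partial structures.
        if mw > target_mw + tol:
            return
        if i == len(positions):
            if abs(mw - target_mw) <= tol:
                results.append(tuple(chosen))
            return
        for name in names:
            chosen.append((positions[i], name))
            branch(i + 1, mw + side_chains[name], chosen)
            chosen.pop()

    branch(0, scaffold_mw, [])
    return results


# Toy numbers for illustration only (rounded increments, hypothetical scaffold MW).
chains = {"H": 0.0, "OH": 16.0, "OCH3": 30.0}
all_positions = ["C1", "C2", "C3", "C4", "C5", "C6"]
print(len(chains) ** len(all_positions))  # brute-force count over every position: 729
print(len(enumerate_structures(200.0, all_positions, chains, 232.0)))  # 15
print(len(enumerate_structures(200.0, ["C3", "C5"], chains, 232.0)))   # 1
```

With all six positions open, the brute-force space holds 3⁶ = 729 assignments and 15 of them hit the target; restricting attachment to two positions leaves 3² = 9 assignments and a single hit, which mirrors on a toy scale why the restricted modes yield fewer false positives.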
In Figure 2, four user-identified scaffolds from the Polygonum multiflorum test case were evaluated with the three algorithms described above. The results show that the branch and bound methods, using either learned R-groups or added R-groups, significantly reduce execution time compared with the brute-force method. For scaffold 2-2, for instance, Nc is 2.38 × 10⁷ in learned R-group mode and 7.68 × 10⁵ in added R-group mode, whereas the brute-force approach gives Nc = 7.36 × 10¹⁶. The brute-force approach therefore generates approximately 10¹⁰ times as many possible structures as the two branch and bound methods. Moreover, the results of the branch and bound approaches were more precise, as discussed further in the subsection titled “Structure elucidation using a combinatorial side chains approach.” With the brute-force approach, some known structures ranked outside the top one hundred candidates, whereas with the branch and bound approach all of them ranked within the top ten. This discrepancy arises because the brute-force algorithm considers all possible atom positions on the scaffolds and therefore includes false-positive chemical structures in its results. Both the learned R-group and added R-group approaches address this problem and empirically generate better structures. Note that the difference in Nc between added R-group mode and learned R-group mode is not significant; however, learned R-group mode is fully automatic and requires no user intervention. In contrast, because added R-group mode allows users to specify R-groups, its results could be biased toward the users’ preexisting knowledge.
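The "approximately 10¹⁰" factor can be checked against the scaffold 2-2 values quoted above; both ratios fall within an order of magnitude of 10¹⁰:

```latex
\frac{7.36\times10^{16}}{2.38\times10^{7}} \approx 3.1\times10^{9}\quad\text{(learned R-group)},
\qquad
\frac{7.36\times10^{16}}{7.68\times10^{5}} \approx 9.6\times10^{10}\quad\text{(added R-group)}
```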
Case studies
The performance of our system was evaluated and validated using four experimental herbs, namely, Cuscuta chinensis, Ophiopogon japonicus, Polygonum multiflorum, and the genus Angelica. To evaluate the accuracy of our system, we compared the structures predicted by NP-StructurePredictor with the known components of the four herbal mixtures.
Two prediction methods were proposed and applied across the experimental cases. The first approach directly searches our database for structures that contain the scaffolds of interest and match the targeted MW. The second approach generates new structures by linking all possible side chains onto the scaffold to match the targeted MW. The detailed algorithms are described in the Methods section. In the four case studies below, we refer to these methods as the “first approach” and the “second approach.” Note that only the learned R-group mode was applied with the second approach.
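The first approach amounts to a filter over NPDB records: keep entries whose scaffold is one of the seed scaffolds and whose MW matches the target within a tolerance. The Python sketch below assumes precomputed scaffold labels and a 0.05 tolerance (both our assumptions) and uses a five-compound stand-in for the NPDB with average MWs. NP-StructurePredictor's ranking step is not reproduced here, so matches are simply sorted by name.

```python
from typing import NamedTuple


class NPRecord(NamedTuple):
    name: str
    scaffold: str  # scaffold label, assumed to be precomputed for every NPDB entry
    mw: float      # average molecular weight


def search_npdb(records, seed_scaffolds, target_mw, tol=0.05):
    """First-approach sketch: keep records whose scaffold is a seed and whose MW matches the target."""
    return sorted(
        r.name for r in records
        if r.scaffold in seed_scaffolds and abs(r.mw - target_mw) <= tol
    )


# A tiny stand-in for the NPDB.
npdb = [
    NPRecord("apigenin", "flavone", 270.24),
    NPRecord("luteolin", "flavone", 286.24),
    NPRecord("kaempferol", "flavone", 286.24),
    NPRecord("quercetin", "flavone", 302.24),
    NPRecord("umbelliferone", "coumarin", 162.14),
]
print(search_npdb(npdb, {"flavone"}, 286.24))  # ['kaempferol', 'luteolin']
```

A search like this can only return compounds already in the database, which is why the second approach is needed when a constituent is missing from the NPDB.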
The following four subsections describe the validation and analysis of the four test herbs with NP-StructurePredictor. The first and second approaches were applied to the first two test herbs, respectively. The third case demonstrates the capability and efficiency of structure elucidation without input seed scaffolds, and the last case illustrates the predictive power of our system for structure elucidation even in a very complex herb. Our evaluation demonstrated the following:
1) The ranking strategy of NP-StructurePredictor is practical. We have shown its practicality in four test cases, in which the compounds ranked highly by NP-StructurePredictor matched the known compounds in the tested herbs.
2) NP-StructurePredictor can generate novel structures that were not already included in the current NPDB. That is, NP-StructurePredictor improves the identification outcome by suggesting novel and correct (physically and chemically feasible) structures.
Structure elucidation using database searching methods
Cuscuta chinensis is known for its anti-cancer,33 immunostimulatory,34 and antiosteoporotic activities.35 The structures of its components were confirmed using liquid chromatography-tandem mass spectrometry (LC-MS/MS), and their respective spectra are shown in supplementary Additional File 2, available online. The MW values extracted from the mass spectra were 286.24, 302.24, 354.31, and 478.41 (Figure 3B). The validated structures matching the targeted MW value of 286.24 are luteolin and kaempferol, while the validated structures matching the targeted MW values of 302.24, 354.31, and 478.41 are quercetin, 3-[3-(3,4-dihydroxy-phenyl)-acryloyloxy]-1,4,5-trihydroxy-cyclohexanecarboxylic acid, and 2-(3-hydroxy-4-methoxyphenyl)-3,5-dihydroxy-7-O-β-D-glucopyranoside-4H-1-benzopyrane-4-one, respectively. These structures were verified by the Natural Product Laboratory of the Pharmaceutical Industry Technology and Development Center (PITDC) and served as the reference structures for our test set.
In this case study, NP-StructurePredictor took the four targeted MW values as input. Two possible known scaffolds, shown in Figure 3A, were used as seed scaffolds. One of them (flavone, 1-2) is a common backbone structure in Cuscuta chinensis.35-37 Because most of the constituents of Cuscuta chinensis were already included in our collected NPDB, we used the database searching approach to search existing compounds in the NPDB directly. Averaged over the four targeted MW values, NP-StructurePredictor returned 3 predicted structures for scaffold 1-1 and 32 for scaffold 1-2. All five validated structures were successfully identified with this approach, and their rankings among all the predicted structures are listed in Figure 3B. Thus, all five structures in the Cuscuta chinensis mixture that had been definitively identified by experimental methods were correctly predicted by our system; more importantly, all were ranked within the top five candidate structures.
In this scenario, because all the validated chemical structures were already available in the NPDB, NP-StructurePredictor simply had to retrieve known structures from the database and rank them. The average ranking of these identified structures was approximately 2, indicating that NP-StructurePredictor consistently recommends chemical structures that closely resemble those of known compounds in our databases. This case study demonstrated that the search functionality of our system is reliable and that our ranking method is reasonable.
Structure elucidation using a combinatorial side chains approach
Polygonum multiflorum, also called he shou wu, is one of the most important traditional Chinese medicines and is frequently used as a strong laxative and blood tonic. We used this herbal mixture to demonstrate the second approach of NP-StructurePredictor, which links all possible side chains onto a scaffold to match the targeted MW. The validated structures and their respective targeted MWs obtained from experiments are shown in supplementary Additional File 3, available online. The corresponding targeted MW values from the mass spectra were 270.24, 284.27, 290.27, 406.39, 406.39, 432.38, and 578.53. Four scaffolds derived from known chemical constituents of Polygonum multiflorum previously published in the literature38-40 were used as seed scaffolds in this case (Figure 4A). We first applied the database searching approach, and NP-StructurePredictor returned a list of more than one hundred structures matching each targeted MW. However, with this method most of the known structures were either not correctly predicted or not highly ranked, because the compounds in the Polygonum multiflorum herbal mixture are complex (e.g., procyanidin B2) and very diverse (a total of seven structures with four scaffolds). To better predict the chemical structures in this mixture, we applied the second prediction approach, which appends appropriate side chains onto the targeted scaffolds based on the atom-position configurations; NP-StructurePredictor uses the five most common atom-position configurations to generate novel structures. All seven confirmed constituents of Polygonum multiflorum were correctly identified, so incorporating this approach into NP-StructurePredictor improved the predictions. Although the second approach iteratively searched all possible combinations of side chains on the targeted scaffolds and generated a huge number of possible structures matching the targeted MW, the validated structures were still ranked highly (Figure 4B), with an average ranking of approximately 4 for the correct structures. A total of 2858 compounds containing the 2-2 scaffold were generated for the targeted MW value of 406.39. The known structures 3,5,3',4'-tetrahydroxystilbene-4'-O-β-D-glucopyranoside and 2,3,5,4'-tetrahydroxystilbene 2-O-β-D-glucopyranoside, both containing the 2-2 scaffold, were ranked 1 and 4, respectively. This case study demonstrated that extending side chains on targeted scaffolds was useful in improving structure rankings (reducing false-positive identifications).
Structure elucidation without inputting seed scaffolds
NP-StructurePredictor contains an option to elucidate “unknown” chemical structures by directly searching through all 83242 possible scaffolds, without inputting any seed scaffold. We took another popular traditional Chinese herb, Ophiopogon japonicus
(also known as Maidong), as an example to compare the prediction results with and without inputting seed scaffolds. Ophiopogon japonicus has been used clinically as a treatment for chronic inflammation and coronary heart disease.41, 42 The seven known structures from the Ophiopogon japonicus mixture and their experimentally obtained mass spectra can be found in supplementary Additional File 4, available online. Their corresponding targeted MW values extracted from the mass spectra are 328.32, 342.35, 356.33, and 370.36. Several chemical constituents of the roots of Ophiopogon japonicus have been elucidated by spectroscopic and chemical analyses,43, 44 and we derived three possible scaffolds (Figure 5A) from those constituents. Using the atom-position configurations approach to combine the possible scaffolds with a weighted list of side chains, NP-StructurePredictor correctly identified all seven compounds in the Ophiopogon japonicus herbal mixture and assigned them relatively high rankings (Figure 5B). Although methylophiopogonanone B had the lowest rank among all the prediction results, it still ranked 5 out of the 638 structures generated for its targeted MW. The other six compounds from Ophiopogon japonicus all ranked in the top 3. However, when we applied the direct NPDB searching approach, two of the seven experimentally confirmed structures, 5,7,2'-trihydroxy-8-methyl-3-(3',4'-methylenedioxybenzyl)chromone and 5,7,2'-trihydroxy-6-methyl-3-(3',4'-methylenedioxybenzyl)chromone, could not be identified or matched because these two natural products do not currently exist in our NPDB. Although we have included a large number of natural products (226949) from three well-known natural product databases, we recognize that our NPDB is not all-inclusive. This demonstrates that the ability of our prediction system to generate novel chemical structures is crucial for the structural elucidation process: for structures not already included in the NPDB, NP-StructurePredictor is well equipped to generate new structures that compensate for unknown natural products.
When users cannot provide seed scaffolds for the test species, NP-StructurePredictor can still elucidate “unknown” chemical structures. Although this process takes longer, NP-StructurePredictor can complete the task in a reasonable timeframe. We took the targeted MW value of 342.35 from Ophiopogon japonicus as a test case to identify unknown chemical structures by searching all scaffolds in NP-StructurePredictor. When the known scaffold was given, NP-StructurePredictor generated only two structural candidates. However, after the second prediction approach was performed on all 83242 scaffolds, the number of generated candidates increased to 17332, and the total execution time was approximately 7 days. All structures of Ophiopogon japonicus with a targeted MW value of 342.35 were correctly identified, and the rankings of the three known structures, methylophiopogonanone A, 5,7,2'-trihydroxy-8-methyl-3-(3',4'-methylenedioxybenzyl)chromone, and 5,7,2'-trihydroxy-6-methyl-3-(3',4'-methylenedioxybenzyl)chromone, were 23, 133, and 159, respectively. This example demonstrated that the validated structures can still be ranked within the top one percent of candidates (159 of 17332 is about 0.9%) even without input seed scaffolds. We recommend that users take the top 200 generated compounds as the likely candidates in the test mixture and then use known mass spectra or structural information to verify these structures.
Structure elucidation for a complex herbal mixture
We chose a complex herbal mixture for our last case study. The mixture contained Chinese angelica, Hualien angelica, and Japanese angelica and was used to illustrate the effectiveness of structure elucidation by NP-StructurePredictor. The root of Angelica (Danggui) has been widely used to treat many diseases because of its antioxidant, antitumor, and anti-inflammatory activities.45 The structures of the bioactive constituents isolated from angelica are very complex.45 The chemical components are mainly different types of coumarins, acetylenic compounds, chalcones, sesquiterpenes, and polysaccharides.46 There were 45 validated compounds in our genus Angelica case. The complex herbal mixture used in this study contains 46 validated compounds, as shown in supplementary Additional File 5, available online. Thirty-seven different targeted MW values from the mass spectra, ranging from 162.03 to 574.29, were reported. In Additional File 5, the 46 validated structures are listed according to their MWs.
A total of six known scaffolds derived from literature reports46 are shown in Figure 6. We directly used the second prediction approach of NP-StructurePredictor to elucidate the structures based on the six given scaffolds. The prediction results are reported in supplementary Additional File 5. In this case, a total of 7079 compounds were generated by NP-StructurePredictor based on the six seed scaffolds and 35 targeted MWs. The average number of generated structures for each targeted MW was 37. As shown in the table in supplementary Additional File 5, the average ranking of the true structures in this herbal mixture was approximately 4, indicating that NP-StructurePredictor can make good predictions even for a complex chemical mixture. For example, in the angelica mixture, the constituent byakangelicin was ranked first out of the thirty-six generated compounds that contain the 4-2 scaffold. The overall prediction rate for this complex mixture was approximately 82% (37 of 45), since only eight of the 45 compounds could not be correctly predicted by our system. These compounds were oxypeucedanin, byakangelicol, japoangelone, angelol G, isoepoxypteryxin, edulisin V, japoangelol B, and japoangelol A. NP-StructurePredictor failed to predict these eight compounds because it can only use side chains learned from the collected NPDB to construct possible structures on the known scaffolds; if the system lacks the specific side chains required to generate these unique structures, it cannot predict them. For example, because the side chain 3-(methoxymethyl)-2,2-dimethyloxirane was not included in our side chain database, NP-StructurePredictor could not link this side chain to the 4-2 scaffold to generate the correct constituent, oxypeucedanin, of the angelica mixture. One solution to this limitation is to manually input extra side chains into the prediction system. To do this, commonly occurring or structurally related side chains need to be added, and the criteria used to select them should be provided as well.
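The side-chain limitation can be seen in a single-attachment-point toy model: a candidate can only be proposed if some library entry supplies the missing mass. In the Python sketch below, psoralen (MW 186.16) stands in for the furanocoumarin scaffold and oxypeucedanin (MW 286.28) is the target. Whether psoralen corresponds to the paper's 4-2 scaffold is our assumption, as are the rounded net mass increments (atoms added minus the H replaced), including 100.12 for the C5H8O2 unit contributed by the epoxide side chain. The function name is hypothetical.

```python
def attach_single_chain(scaffold_mw, library, target_mw, tol=0.05):
    """Return names of library side chains that bring a one-site scaffold to target_mw.

    library maps side-chain name -> net mass increment (added group minus the H it replaces).
    Only chains already present in the library can ever be proposed.
    """
    return sorted(
        name for name, increment in library.items()
        if abs(scaffold_mw + increment - target_mw) <= tol
    )


# Rounded, illustrative increments; psoralen (186.16) as scaffold, oxypeucedanin (286.28) as target.
library = {"OH": 16.00, "OCH3": 30.03, "prenyloxy": 84.12}
print(attach_single_chain(186.16, library, 286.28))  # [] because no entry adds about 100.12

# Manually adding the missing side chain makes the target reachable.
library["3-(methoxymethyl)-2,2-dimethyloxirane"] = 100.12
print(attach_single_chain(186.16, library, 286.28))  # ['3-(methoxymethyl)-2,2-dimethyloxirane']
```

Adding the entry is the manual remedy proposed above; a real extension would also need to record the criteria used to choose each added side chain.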
This case study demonstrated the merit of NP-StructurePredictor: structures that do not already exist in the NPDB can still be generated by our system to identify complex unknown natural products. Structures absent from the NPDB include 4-hydroxyderricin and xanthoangelol E, and both were ranked highly (4-hydroxyderricin, rank 1; xanthoangelol E, rank 5). The ranking strategy is reliable because the predicted structures and their rankings correlate well with the experimentally validated structures. Moreover, the outcomes of case studies 3 and 4 showed that the atom-position configurations approach is an effective strategy for generating new and viable structures that enhance the predictive power of our system for structure elucidation.
Conclusions
In this study, NP-StructurePredictor was developed to efficiently and accurately predict the chemical structures of individual constituents of plant mixtures from LC-MS experiments. The only inputs users need to provide to NP-StructurePredictor are a list of molecular weight (MW) values from the LC-MS spectra of the sample and seed scaffold information from prior knowledge of the potential structural categories. When seed scaffolds are not provided, NP-StructurePredictor can search all of its 83242 scaffolds directly for suitable candidates. The system computationally generates possible chemical structures that match the user-supplied target MWs by combining the most likely scaffolds with a list of side chains from our curated NPDB. NP-StructurePredictor then ranks the predicted structures so that the most likely natural product structures and their analogs can be proposed. Moreover, NP-StructurePredictor can predict novel structures that were not already available in our NPDB. NP-StructurePredictor is superior to previously developed methods that use heuristic rules or chemical structure searches to generate structures because it can automatically elucidate structures based on known side chains and correctly propose the most plausible structures with respect to current experimental results. Our four validation case studies suggest that the system can be used to predict natural products in a wide range of herbal mixtures. NP-StructurePredictor can also serve as a preliminary structure elucidation screening system that reduces large numbers of possible chemical structures, accelerating further identification procedures. The source code can be downloaded from http://npstructurepredictor.cmdm.tw/NPSP.rar.
Acknowledgments
This work was funded by the Ministry of Science and Technology, Taiwan, grant numbers 105-3011-F-002-010 -, 105-2812-8-002-001-MY2, and 106-2622-B-002-008 -, and National Taiwan University, grant number NTU-ERP-106R880803. Resources of the Laboratory of Computational Molecular Design and Metabolomics and the Department of Computer Science and Information Engineering of National Taiwan University were used to perform these studies.
Abbreviations
622
LC-MS: liquid chromatography-mass spectrometry
623
CASE: computer aided structures elucidation
624
2D: two-dimensional
625
NMR: nuclear magnetic resonance
626
MS: mass spectrometry
627
UPLC-MS: ultra performance liquid chromatography-mass spectrometry
628
MS/MS: tandem mass spectrometry
629
NPDB: natural products database
630
LMW: the largest molecular weight
631
SMW: the smallest molecular weight
632
MW: molecular weight
633
LC-MS/MS: liquid chromatography-tandem mass spectrometry
634
PITDC: Pharmaceutical Industry Technology and Development Center
635
DNP: Dictionary of Natural Products
636
TCMD: Traditional Chinese Medicine Database
637
HPLC: high performance liquid chromatography
- 28 ACS Paragon Plus Environment
Page 28 of 39
Page 29 of 39
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
638
HPLC-ESI-MS: high performance liquid chromatography-electrospray ionisation-
639
mass spectrometry
Supporting Information
Additional File 1. Detailed Results and Discussion.
Additional File 2. The spectral data as well as the structures identified from the spectra from the Cuscuta chinensis case study.
Additional File 3. The spectral data as well as the structures identified from the spectra from the Ophiopogon japonicus case study.
Additional File 4. The spectral data as well as the structures identified from the spectra from the Polygonum multiflorum case study.
Additional File 5. A list of the verified structures and prediction results from the genus Angelica case study using NP-StructurePredictor.
References
1. Koch, M. A.; Schuffenhauer, A.; Scheck, M.; Wetzel, S.; Casaulta, M.; Odermatt, A.; Ertl, P.; Waldmann, H., Charting biologically relevant chemical space: a structural classification of natural products (SCONP). Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 17272-17277.
2. Newman, D. J.; Cragg, G. M., Natural products as sources of new drugs over the last 25 years. J. Nat. Prod. 2007, 70, 461-477.
3. Zhang, C.; Qi, M.; Shao, Q.; Zhou, S.; Fu, R., Analysis of the volatile compounds in Ligusticum chuanxiong Hort. using HS-SPME-GC-MS. J. Pharm. Biomed. Anal. 2007, 44, 464-470.
4. Steinbeck, C., Recent developments in automated structure elucidation of natural products. Nat. Prod. Rep. 2004, 21, 512-518.
5. Steinbeck, C., The automation of natural product structure elucidation. Curr. Opin. Drug Discov. Devel. 2001, 4, 338-342.
6. Elyashberg, M. E.; Gribov, L. A., Formal-logical method for interpreting infrared spectra from characteristic frequencies. J. Appl. Spectrosc. 1968, 8, 189-191.
7. Lederberg, J.; Sutherland, G. L.; Buchanan, B. G.; Feigenbaum, E. A.; Robertson, A. V.; Duffield, A. M.; Djerassi, C., Applications of artificial intelligence for chemical inference. I. The number of possible organic compounds. Acyclic structures containing C, H, O, and N. J. Am. Chem. Soc. 1969, 91.
8. Nelson, D. B.; Munk, M. E.; Gash, K. B.; Herald, D. L., Alanylactinobicyclone. An application of computer techniques to structure elucidation. J. Org. Chem. 1969, 34, 3800-3805.
9. Sasaki, S.; Abe, H.; Ouki, T.; Sakamoto, M.; Ochiai, S., Automated structure elucidation of several kinds of aliphatic and alicyclic compounds. Anal. Chem. 1968, 40, 2220-2223.
10. Buchanan, B. G.; Smith, D. H.; White, W. C.; Gritter, R. J.; Feigenbaum, E. A.; Lederberg, J.; Djerassi, C., Applications of artificial intelligence for chemical inference. 22. Automatic rule formation in mass spectrometry by means of the metaDENDRAL program. J. Am. Chem. Soc. 1976, 98, 6168-6178.
11. Steinbeck, C., SENECA: A platform-independent, distributed, and parallel system for computer-assisted structure elucidation in organic chemistry. J. Chem. Inf. Comput. Sci. 2001, 41, 1500-1507.
12. Peironcely, J. E.; Rojas-Cherto, M.; Fichera, D.; Reijmers, T.; Coulier, L.; Faulon, J. L.; Hankemeier, T., OMG: Open Molecule Generator. J. Cheminform. 2012, 4, 21.
13. Christie, B. D.; Munk, M. E., The role of two-dimensional nuclear magnetic resonance spectroscopy in computer-enhanced structure elucidation. J. Am. Chem. Soc. 1991, 113, 3750-3757.
14. Peng, C.; Yuan, S. G.; Zheng, C. Z.; Hui, Y. Z., Efficient application of 2D NMR correlation information in computer-assisted structure elucidation of complex natural products. J. Chem. Inf. Comput. Sci. 1994, 34, 805-813.
15. Lindel, T.; Junker, J.; Kock, M., 2D-NMR-guided constitutional analysis of organic compounds employing the computer program COCON. Eur. J. Org. Chem. 1999, 573-577.
16. Blinov, K. A.; Carlson, D.; Elyashberg, M. E.; Martin, G. E.; Martirosian, E. R.; Molodtsov, S.; Williams, A. J., Computer-assisted structure elucidation of natural products with limited 2D NMR data: application of the StrucEluc system. Magn. Reson. Chem. 2003, 41, 359-372.
17. Elyashberg, M. E.; Blinov, K. A.; Williams, A. J.; Molodtsov, S. G.; Martin, G. E.; Martirosian, E. R., Structure Elucidator: a versatile expert system for molecular structure elucidation from 1D and 2D NMR data and molecular fragments. J. Chem. Inf. Comput. Sci. 2004, 44, 771-792.
18. Elyashberg, M.; Blinov, K.; Molodtsov, S.; Williams, A., Elucidating 'undecipherable' chemical structures using computer-assisted structure elucidation approaches. Magn. Reson. Chem. 2012, 50, 22-27.
19. Elyashberg, M. E.; Williams, A.; Martin, G. E., Computer-assisted structure verification and elucidation tools in NMR-based structure elucidation. Prog. Nucl. Magn. Reson. Spectrosc. 2008, 53, 1-104.
20. Elyashberg, M.; Blinov, K.; Molodtsov, S.; Smurnyy, Y.; Williams, A. J.; Churanova, T., Computer-assisted methods for molecular structure elucidation: realizing a spectroscopist's dream. J. Cheminform. 2009, 1, 3.
21. Kind, T.; Fiehn, O., Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry. BMC Bioinformatics 2007, 8, 105.
22. von Bargen, C.; Hubner, F.; Cramer, B.; Rzeppa, S.; Humpf, H. U., Systematic approach for structure elucidation of polyphenolic compounds using a bottom-up approach combining ion trap experiments and accurate mass measurements. J. Agric. Food Chem. 2012, 60, 11274-11282.
23. Scheubert, K.; Hufsky, F.; Bocker, S., Computational mass spectrometry for small molecules. J. Cheminform. 2013, 5, 12.
24. Yetukuri, L.; Katajamaa, M.; Medina-Gomez, G.; Seppänen-Laakso, T.; Vidal-Puig, A.; Orešič, M., Bioinformatics strategies for lipidomics analysis: characterization of obesity related hepatic steatosis. BMC Syst. Biol. 2007, 1, 12.
25. Bemis, G. W.; Murcko, M. A., The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 1996, 39, 2887-2893.
26. Schuffenhauer, A.; Ertl, P.; Roggo, S.; Wetzel, S.; Koch, M. A.; Waldmann, H., The scaffold tree--visualization of the scaffold universe by hierarchical scaffold classification. J. Chem. Inf. Model. 2007, 47, 47-58.
27. Clasquin, M. F.; Melamud, E.; Rabinowitz, J. D., LC-MS data processing with MAVEN: a metabolomic analysis and visualization engine. Curr. Protoc. Bioinformatics 2012, Unit 14.11, 1-23.
28. Smith, C. A.; Want, E. J.; O'Maille, G.; Abagyan, R.; Siuzdak, G., XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal. Chem. 2006, 78, 779-787.
29. The Dictionary of Natural Products database is available from Chapman & Hall/CRC at http://dnp.chemnetbase.com/ (accessed July 10, 2010).
30. Irwin, J. J.; Shoichet, B. K., ZINC--a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 2005, 45, 177-182.
31. Chen, C. Y.-C., TCM Database@Taiwan: the world's largest traditional Chinese medicine database for drug screening in silico. PLoS One 2011, 6, e15939.
32. Fejzo, J.; Lepre, C. A.; Peng, J. W.; Bemis, G. W.; Ajay; Murcko, M. A.; Moore, J. M., The SHAPES strategy: an NMR-based approach for lead generation in drug discovery. Chem. Biol. 1999, 6, 755-769.
33. Nisa, M.; Akbar, S.; Tariq, M.; Hussain, Z., Effect of Cuscuta chinensis water extract on 7,12-dimethylbenz[a]anthracene-induced skin papillomas and carcinomas in mice. J. Ethnopharmacol. 1986, 18, 21-31.
34. Bao, X.; Wang, Z.; Fang, J.; Li, X., Structural features of an immunostimulating and antioxidant acidic polysaccharide from the seeds of Cuscuta chinensis. Planta Med. 2002, 68, 237-243.
35. Yang, L.; Chen, Q.; Wang, F.; Zhang, G., Antiosteoporotic compounds from seeds of Cuscuta chinensis. J. Ethnopharmacol. 2011, 135, 553-560.
36. Umehara, K.; Nemoto, K.; Ohkubo, T.; Miyase, T.; Degawa, M.; Noguchi, H., Isolation of a new 15-membered macrocyclic glycolipid lactone, Cuscutic Resinoside A from the seeds of Cuscuta chinensis: a stimulator of breast cancer cell proliferation. Planta Med. 2004, 70, 299-304.
37. Hajimehdipoor, H.; Kondori, B. M.; Amin, G. R.; Adib, N.; Rastegar, H.; Shekarchi, M., Development of a validated HPLC method for the simultaneous determination of flavonoids in Cuscuta chinensis Lam. by ultra-violet detection. DARU 2012, 20, 57.
38. Yao, S.; Li, Y.; Kong, L., Preparative isolation and purification of chemical constituents from the root of Polygonum multiflorum by high-speed counter-current chromatography. J. Chromatogr. A 2006, 1115, 64-71.
39. Qiu, X.; Zhang, J.; Huang, Z.; Zhu, D.; Xu, W., Profiling of phenolic constituents in Polygonum multiflorum Thunb. by combination of ultra-high-pressure liquid chromatography with linear ion trap-Orbitrap mass spectrometry. J. Chromatogr. A 2013, 1292, 121-131.
40. Choi, S. G.; Kim, J.; Sung, N. D.; Son, K. H.; Cheon, H. G.; Kim, K. R.; Kwon, B. M., Anthraquinones, Cdc25B phosphatase inhibitors, isolated from the roots of Polygonum multiflorum Thunb. Nat. Prod. Res. 2007, 21, 487-493.
41. Ge, L. L.; Kan, L. D.; Zhuge, Z. B.; Ma, K. E.; Chen, S. Q., Ophiopogon japonicus strains from different cultivation regions exhibit markedly different properties on cytotoxicity, pregnane X receptor activation and cytochrome P450 3A4 induction. Biomed. Rep. 2015, 3, 430-434.
42. Chen, M. H.; Chen, X. J.; Wang, M.; Lin, L. G.; Wang, Y. T., Ophiopogon japonicus--A phytochemical, ethnomedicinal and pharmacological review. J. Ethnopharmacol. 2016, 181, 193-213.
43. Hung, T. M.; Thu, C. V.; Dat, N. T.; Ryoo, S. W.; Lee, J. H.; Kim, J. C.; Na, M.; Jung, H. J.; Bae, K.; Min, B. S., Homoisoflavonoid derivatives from the roots of Ophiopogon japonicus and their in vitro anti-inflammation activity. Bioorg. Med. Chem. Lett. 2010, 20, 2412-2416.
44. Li, N.; Zhang, J. Y.; Zeng, K. W.; Zhang, L.; Che, Y. Y.; Tu, P. F., Antiinflammatory homoisoflavonoids from the tuberous roots of Ophiopogon japonicus. Fitoterapia 2012, 83, 1042-1045.
45. Jin, M.; Zhao, K.; Huang, Q.; Xu, C.; Shang, P., Isolation, structure and bioactivities of the polysaccharides from Angelica sinensis (Oliv.) Diels: a review. Carbohydr. Polym. 2012, 89, 713-722.
46. Sarker, S. D.; Nahar, L., Natural medicine: the genus Angelica. Curr. Med. Chem. 2004, 11, 1479-1500.
Figure 1. NP-StructurePredictor system overview. The modules of the NP-StructurePredictor system are color-coded: red modules represent the raw data input to the system, blue modules represent the computational functions executed by the system, and green modules represent processed data. To emphasize the roles these modules play within the overall system, sub-groups are enclosed in dashed boxes labeled with white text on a black background. The system is described in detail in the Methods section.
Figure 2. Comparison of the combination numbers (Nc) of three approaches. The combination number was used as an index for evaluating the computation time of each approach. Four scaffolds (2-1, 2-2, 2-3, and 2-4) were assessed. The y-axis shows the base-10 logarithm of the combination number, log10(Nc).
Figure 3. Prediction results for Cuscuta chinensis. (A) The two known possible scaffolds for this herbal mixture, which were used as input scaffolds. (B) The confirmed compounds, all of which were correctly identified by NP-StructurePredictor. The predicted ranking of each compound is listed below its structure.
Figure 4. Prediction results for Ophiopogon japonicus. (A) The three input scaffolds. (B) The confirmed compounds, all of which were correctly identified by NP-StructurePredictor. The predicted ranking of each compound is listed below its structure.
Figure 5. Prediction results for Polygonum multiflorum. (A) The four input scaffolds. (B) The confirmed compounds, all of which were correctly identified by NP-StructurePredictor. The predicted ranking of each compound is listed below its structure.
Figure 6. The six input scaffolds for the genus Angelica. These input scaffolds, the known possible scaffolds for this herbal mixture, were compiled from published data.
TOC GRAPH

For Table of Contents Use Only

NP-StructurePredictor: prediction of unknown natural products in plant mixtures

Yeu-Chern Harn, Bo-Han Su, Yuan-Ling Ku, Olivia A. Lin, Cheng-Fu Chou, and Y. Jane Tseng*