Subscriber access provided by University of Newcastle, Australia
Article
Development of a Comprehensive Flavonoid Analysis Computational Tool for Ultra High-Performance Liquid Chromatography-Diode Array Detection-High Resolution Accurate Mass-Mass Spectrometry Data Mengliang Zhang, Jianghao Sun, and Pei Chen Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.7b00771 • Publication Date (Web): 25 Jun 2017 Downloaded from http://pubs.acs.org on June 25, 2017
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
1
Development of a Comprehensive Flavonoid Analysis Computational Tool for Ultra High-
2
Performance Liquid Chromatography-Diode Array Detection-High Resolution Accurate
3
Mass-Mass Spectrometry Data
4
Mengliang Zhang†, Jianghao Sun†, and Pei Chen∗
5
Food Composition and Methods Development Lab, Beltsville Human Nutrition Research Center,
6
Agricultural Research Service, United States Department of Agriculture, Beltsville, Maryland
7
20705-2350, USA
8 9 10 11 12 13
†
Contributed equally to this manuscript.
∗
Corresponding author: Tel.: +1 301 504 8144; fax: +1 301 504 8314. E-mail address:
[email protected] 1 ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
14 15
Page 2 of 32
ABSTRACT Liquid chromatography and mass spectrometry methods, especially ultra-high performance
16
liquid chromatography coupled with diode array detection and high resolution accurate-mass
17
multi-stage mass spectrometry (UHPLC-DAD-HRAM/MSn), have become the tool-of-the-trade
18
for profiling flavonoids in foods. However, manually processing acquired UHPLC-DAD-
19
HRAM/MSn data for flavonoid analysis is very challenging and highly expertise-dependent due
20
to the complexities of the chemical structures of the flavonoids and the food matrices. A
21
computational expert data analysis program, FlavonQ-2.0v, has been developed to facilitate this
22
process. The program firstly uses UV-Vis spectra for an initial step-wise classification of
23
flavonoids into classes and then identifies individual flavonoids in each class based on their mass
24
spectra. Step-wise identification of flavonoid classes is based on a UV-Vis spectral library
25
compiled from 146 flavonoid reference standards and a novel chemometric model that uses step-
26
wise strategy and projected distance resolution (PDR) method. Further identification of the
27
flavonoids in each class is based on an in-house database that contains 5686 flavonoids analyzed
28
in-house or previously reported in the literature. Quantitation is based on the UV-Vis spectra.
29
The step-wise classification strategy to identify classes significantly improved the performance
30
of the program and resulted in more accurate and reliable classification results. The program was
31
validated by analyzing data from a variety of samples, including mixed flavonoid standards,
32
blueberry, mizuna, purple mustard, red cabbage, and red mustard green. Accuracies of
33
identification for all samples were above 88%. FlavonQ-2.0v greatly facilitates the identification
34
and quantitation of flavonoids from UHPLC-HRAM-MSn data. It saves time and resources and
35
allows less experienced people to analyze the data.
36
2 ACS Paragon Plus Environment
Page 3 of 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
37 38
Analytical Chemistry
INTRODUCTION. Flavonoids are a group of phenolic compounds with various bioactivities and are widely
39
distributed in plants. In various in vitro and in vivo models, they have exhibited diverse
40
biological activities including anti-inflammatory, anti-atherosclerotic, antitumor, anti-
41
thrombogenic, anti-osteoporotic, and anti-viral effects.1 Although dietary flavonoids may play an
42
important role in human health, making recommendation on daily flavonoid intakes is very
43
difficult. One of the important issues that limit progress in dietary flavonoid recommendations
44
for consumers is the lack of appropriate analytical methods for the determination of flavonoids in
45
foods and dietary intake levels.2
46
Profiling flavonoids in foods is challenging due to the fact that their structures are complex,
47
their distribution and concentrations in plants vary greatly, and commercially available reference
48
standards are limited.3 Liquid chromatography mass spectrometry (LC/MS) has become the most
49
commonly employed method in flavonoid identification and quantification.2,4 While technical
50
advances such as ultra-high-performance liquid chromatography-diode array detection-high
51
resolution accurate-mass multi-stage mass spectrometry (UHPLC-DAD-HRAM-MSn) can
52
provide much more detailed information for a sample, it also brings us a new challenge: the
53
tremendous amounts of data to be analyzed. In recent years, the emergence of a few “omics”
54
tools such as XCMS,5 MZmine,6,7 MetSign,8 and MET-COFEA9 have greatly facilitated data
55
analysis using automated peak picking, peak alignment, peak integration and database searching.
56
However, they are designed for non-targeted metabolomics or metabolite profiling. They are
57
inadequate for the analysis of a specific class of targeted plant secondary metabolites, such as
58
flavonoids, due to the lack of specificity. Herein, FlavonQ-2.0v, a software program specifically
59
designed for the analysis of flavonoids, has been developed.
3 ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
60
Page 4 of 32
FlavonQ-2.0v has made several important advances compared with its predecessor, FlavonQ.
61
Like FlavonQ,10 FlavonQ-2.0v features all the functions necessary to detect chromatographic
62
peaks, integrate peak areas, interpret MS spectra, and produce qualitative and quantitative results.
63
The important advance of FlavonQ-2.0v are: 1) it is capable of analysis of all the major classes
64
of flavonoids, including flavone/flavonol, flavan/flavanol, flavanone/flavanonol, isoflavone,
65
anthocyanidins, and hydroxycinnamic acids (non-flavonoids). (Figure 1); 2) the program uses a
66
chemometric pattern recognition method to classify the classes of the flavonoids by comparing
67
the UV spectrum of a chromatographic peak to an UV-Vis spectra library of 146 flavonoid and
68
hydroxycinnamic acid standards; 3) the result obtained from the above-mentioned step is
69
correlated with HRAM/MSn spectra of that peak and searched against an in-house flavonoid
70
database for tentative identification.
71
In this study, the step-wise approach of FlavonQ-v2.0 is explained and illustrated. The
72
advantages of step-wise strategy with the projected difference resolution (PDR) method over
73
conventional classification strategy is demonstrated. The program is validated with the analysis
74
of samples spiked with flavonoids, mix standards, and plant extracts. The improved approach
75
used in FlavonQ-2.0v is innovative, efficient, and highly effective.
76
MATERIALS AND METHODS
77
Chemicals and Plant Materials. Formic acid, HPLC grade methanol and acetonitrile were
78
purchased from Fisher Scientific. (Pittsburgh, PA). HPLC grade water was prepared from
79
distilled water using a Milli-Q system (Millipore Laboratory, Bedford, MA). The reference
80
standards for flavonoids and hydroxycinnamic acid derivatives were obtained from Sigma-
81
Aldrich (St. Louis, MO), Chromadex, Inc. (Irvine, CA), Indofine Chemical Co. (Somerville, NJ),
4 ACS Paragon Plus Environment
Page 5 of 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
82
and Extrasynthese (Genay, Cedex, France). A list of 146 reference standards can be found in
83
Supporting Information.
84
Blueberry (Vaccinium corymbosum L.), mizuna (Brassica juncea), purple mustard
85
(Chorispora tenella), red cabbage (Brassica oleracea L.), and red mustard green (Brassica
86
juncea) were purchased from local grocery stores, and lyophilized immediately upon arrival and
87
then ground and powdered.
88
UHPLC-DAD-MS Instrument. The UHPLC coupled with a diode array detector and LTQ
89
Orbitrap XL mass spectrometer (Thermo Fisher Scientific, San Jose, CA) was used. The
90
chromatographic separation was achieved using a UHPLC column (200 mm × 2.1 mm i.d., 1.9
91
µm, Hypersil Gold AQ RP-C18) (Thermo Fisher Scientific, Inc., Waltham, MA) with an
92
HPLC/UHPLC pre-column filter (UltraShield Analytical Scientific Instruments, Richmond, CA)
93
at a flow rate of 0.3 mL/min. UHPLC gradient and MS parameter settings were adapted from a
94
previous study 10 and the details can be found in the Supporting Information.
95
Sample Preparation. Each powdered sample (250 mg) was extracted with 5.00 mL of
96
methanol/water (60:40, v/v) using sonication for 60 min at room temperature and the slurry
97
mixture was centrifuged at 5,000 g for 15 min (IEC Clinical Centrifuge, Damon/IEC Division,
98
Needham, MA). The supernatant was filtered through a 17 mm (0.45 µm) PVDF syringe filter
99
(VWR Scientific, Seattle, WA), and 2 µL of the extract was used for each injection.
100
Data Format. MATLAB R2012b (MathWorks Inc., Natick, MA) was used to develop the
101
program. All the calculations were performed on an Intel Core i7-4770 CPU at 3.4 GHz personal
102
computer with 16 GB RAM running a Microsoft Windows 7 Professional x64 operation system
103
(Microsoft Corp., Redmond, WA).
5 ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
104
The UHPLC-DAD HRAM MS data sets were acquired as RAW files. The DAD data were
105
converted to text files from RAW files by Xcalibur plug-in tool, MSGet.11 With an in-house
106
algorithm, text files were read into MATLAB. For the MS data, the RAW files were first
107
converted to mzXML by an open-source software package, ProteoWizard,12 and then read into
108
MATLAB by the built-in ‘mzxmlread’ function in MATLAB bioinformatics toolbox.
109
RESULTS AND DISCUSSION
110
Page 6 of 32
UV-Vis Spectral Library of 146 Flavonoid Standards. First, 146 flavonoid and
111
hydroxycinnamic acid derivative standards were analyzed using the UHPLC-DAD method and
112
their UV-Vis spectra were compiled into a UV-Vis spectral library after they were normalized to
113
unit vector length.13 The 146 UV-Vis spectra are shown in Figure 2A. As discussed in the
114
previous paper,10 flavonoid identification cannot be solely relied upon MS spectra, and often
115
requires the combination of multiple techniques such as chromatographic behavior, UV-Vis
116
spectrum, and HRAM-MS and MS fragmentation information. Flavonoids have characteristic
117
UV-Vis absorbance profiles which come from different conjugated systems in the structures and
118
can be used to distinguish isomers. For example, pelargonidin 3-O-glucoside (an anthocyanin),
119
genistein 4'-O-glucoside (an isoflavone), and apigenin 7-O-glucoside (a flavonol glycoside) have
120
exactly the same protonated or deprotonated ions in full scan MS spectra, and their
121
fragmentation mass spectra are dominated by one or only a few fragments simply do not contain
122
enough information to distinguish between them. The representative MS/MS spectra for the
123
three flavonoids mentioned above are shown in Figure S1. But they can be differentiated by their
124
UV-Vis spectra since the cinnamoyl structure in flavone, flavonol, and hydroxycinnamic acid
125
derivatives have a strong UV absorbance band between 305-390 nm, however anthocyanins are
126
cations with a strong visible absorbance band at 450-550 nm (Figure 2A).14,15
6 ACS Paragon Plus Environment
Page 7 of 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
127
The assignment of classes for flavonoids based on their UV-Vis spectrum is a crucial step,
128
especially for the identification of flavonoid isomers which belong to different flavonoid classes.
129
In our previous study, UV-Vis spectrum similarity analysis was used to assign the class of
130
flavonoids for each chromatographic peak.10 The UV-Vis spectrum of a reference peak, either a
131
spiked standard or an endogenous flavonoid peak, was selected and compared with that of all
132
other chromatographic peaks. A threshold was set based on a trial-and-error procedure to filter
133
out non-desired peaks. Particular care needed to be taken for that method: 1). the reference peak
134
had to be representative of the class of flavonoid as selection of a reference peak sometimes can
135
be difficult especially for the classes of flavonoids which contain a great variety of substitution
136
groups; 2). plant samples usually contain different classes of flavonoids, therefore multiple
137
reference peaks need to be selected to represent the different classes of flavonoids and multiple
138
calculations are required since only one class of flavonoids could be classified by each
139
calculation; and 3). the threshold for UV-Vis spectral similarity analysis varies case by case. For
140
example, the thresholds ranged from 50% to 90% for leek, curry leaf, chive, giant green onion,
141
and red mustard green samples.10 Thus, although the similarity analysis of FlavonQ worked well
142
for the class of flavonols and their glycosides,10 it is inconvenient to use the approach for
143
identification of multiple classes.
144
Grouping 146 Reference Standards into Four Classes. A new strategy was developed in
145
FlavonQ-2.0v to improve the similarity approach by using chemometric modeling and a UV-Vis
146
spectral library. The UV-Vis spectral library was compiled from the UV-Vis data of 134
147
flavonoids and 12 hydroxycinnamic acid derivatives (HADs) standards. Although HADs do not
148
belong to flavonoid family, they were also included because their structures are similar to
149
flavonoids and they are ubiquitous in plant with various bioactivities.16 The standards were
7 ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
150
divided into four classes on the basis of the structural similarities of the aglycones (Figure 1):
151
flavone/flavonol/HAD (class A), flavan/flavanol/flavanone/flavanonol (class B), anthocyanin
152
(class C), and isoflavone (class D). Chemometric methods were employed to construct models
153
for classification of different flavonoid classes based on the UV-Vis spectral library, and the
154
classifiers were used to predict the class of the flavonoid in unknown chromatographic peaks.
Page 8 of 32
155
Options for Chemometric Models in FlavonQ-2.0v. Two methods, including soft
156
independent modeling of class analogy (SIMCA)17 and fuzzy optimal associative memory
157
(FOAM)18, were evaluated. Classification methods such as partial least-squares discriminant
158
analysis (PLS-DA)19 and the fuzzy rule-building expert system (FuRES)20 were not used because
159
they cannot be applied when only one class is known or present.21 FlavonQ-2.0v was designed
160
to be a versatile program which can classify not only single-class flavonoid, but also multiclass
161
flavonoids. Therefore the classifiers like PLS-DA was not adopted in this study. In this program,
162
it is the user’s decision as to which group(s) of flavonoids will be used to build classification
163
models. There are several advantages to making the flavonoid type selection adjustable.
164
Flavonoids are usually synthesized through the phenylpropanoid metabolic pathway and several
165
enzymes are involved in the biosynthesis. It is rare that a single plant sample contains all the
166
enzymes for synthesis of all the classes of flavonoids. Limiting the flavonoid types in the sample
167
can reduce the complexity of chemometric models and improve the model accuracy and
168
reliability. For example, purple broccoli only contains flavonols and anthocyanins.22 So if the
169
chemometric model is built using only these two classes for the analysis of flavonoids in broccoli,
170
it will simplify the data analysis and reduce the possibility of misclassifying them into other
171
classes of flavonoids. Moreover, in some cases, only one class of flavonoids is the research focus
8 ACS Paragon Plus Environment
Page 9 of 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
172
(e.g., the isoflavones in soybean samples) and a chemometric model targeting the class of
173
interest can be very efficient.
174
SIMICA and FOAM are commonly used as modeling methods, but they can be used in
175
classification mode as well. Modeling methods exploit the similarities of the features within each
176
independent class, therefore the test sample could belongs to none of the existing classes in the
177
training sets. However, in classification mode, the test sample must be assigned to one of the
178
classes in the training sets. When classifying an unknown UV-Vis spectrum by the constructed
179
SIMCA/FOAM models with more than one flavonoid class, three situations could be
180
encountered: (a) it only belongs to one class; (b) it belongs to none of any classes; (c) it belongs
181
to more than one class. The UV-Vis spectra from real sample could be different from the UV-Vis
182
spectra in the library attributed to influence of environment (e.g., temperature, solvent) and
183
possible coeluted compounds. In addition, the accuracy of classification is also highly relied on
184
the quality of the training set: the number and representativeness of flavonoids in the library (The
185
in-house UV-Vis library may not be able to represent all flavonoids in the tested plant materials).
186
If modeling mode was used, some flavonoids that were not included in the library or their UV-
187
Vis spectra were distorted by other background influences could be misclassified as non-
188
flavonoids which resulted in false negatives. Therefore, the winner-takes-all mode (classification
189
mode) was used in both SIMCA and FOAM models to avoid situation (b). Statistic values (i.e.,
190
the combination of X-residuals and Hotelling’s T2 value for SIMCA and F-value for FOAM
191
model) were calculated between the variance of an unknown UV-Vis spectrum in
192
chromatographic peak and each flavonoid class, and the unknown UV-Vis spectrum was
193
assigned to the best fit class of flavonoids (in another word, the most ‘similar’ class of flavonoids
194
with smaller X-residuals and Hotelling’s T2 value for SIMCA model or F-value for FOAM
9 ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 10 of 32
195
model). Although the winner-takes-all mode may result false positives, the result will be refined
196
by using MS spectra. Similarly, for situation (c), only the most ‘similar’ class instead of multiple
197
classes was assigned to an unknown UV-Vis spectrum. When only one class of flavonoids was selected to construct chemometric models, the
198 199
statistic criteria (X-residuals and Hotelling’s T2 with 95% confidence intervals for SIMCA and F
200
0.05
for FOAM model) was used to define the limit of the class and reject non-flavonoids.
201
Projected Difference Resolution Method to Optimize Wavelength Range of UV-Vis
202
Data. UV-Vis spectra contain characteristic regions and non-informative regions. Chemometric
203
models built directly using the UV-Vis spectral data over the full scan range (200-600 nm) were
204
not effective as shown in Figure 2B. Overlaps of the four classes were observed. It can be
205
advantageous to identify and remove the non-informative regions because it improves the
206
predictive ability and reduces complexity for chemometric models.23 For example, dropping off
207
the wavelength range between 200-220 nm in UV region is a common practice to avoid the
208
interferences caused by mobile phase and retain the most obvious features for flavonoids
209
between 220-600 nm.24
210
Selection of the wavelength range used in chemometric models can affect the classification
211
and is a challenging task because the spectra may have imperceptible distinctive features.
212
Therefore, the wavelength range of UV-Vis spectrum needs to be optimized in this study. One
213
straightforward way to achieve this is to build chemometric models for different wavelength
214
ranges, evaluate the models by cross-validation methods such as leave-one-sample-out
215
method25,26 and bootstrapped Latin partition method,21 calculate the classification rates for the
216
different wavelength ranges, and select the optimum range which gives the best classification
217
rate. However this calculation required hours to execute depending on which chemometric 10 ACS Paragon Plus Environment
Page 11 of 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
218
models and validation method were chosen and is not practical to use in the data processing
219
program.
220
In this study, the projected difference resolution (PDR) method27 was applied as an
221
alternative to determine the optimum wavelength range. The PDR method measures the
222
separation of two classes in multivariate data space and has been used successfully for selecting
223
the optimal parameters for baseline correction, wavelet filters, and data transformation.13,27,28
224
The larger the PDR values, the better the separation between two classes in the multivariate data
225
space. For the assessment of multiple classes, the minimum PDR value of all the pairwise
226
combinations was used to optimize the wavelength range.13 The two most similar classes among
227
multiple classes were considered as the most critical pairs for classification, so their PDR values
228
were calculated under different wavelength ranges. For example, when we have four classes, for
229
a specific wavelength range their PDR values in pairs (6 pairs) were measured, and the minimum
230
PDR value of 6 pairs was used to indicate the separation of the two most similar classes among
231
the four classes. Since the UV range is easily influenced by conjugated bonds and the higher
232
range of UV-vis spectra (wavelength ≥ 250 nm) usually represents characteristic information for
233
the structure of each flavonoid class, only the starting wavelengths (WLs) of UV-Vis spectra was
234
optimized in our study. Therefore a series of test UV-Vis spectral data sets were constructed
235
with different starting WLs: Test-set-1 (200-600 nm), Test-Set-2 (201-600 nm), Test-Set-3 (202-
236
600 nm) … Test-Set-301 (500-600 nm). For each wavelength range, the PDR values were
237
calculated for the different classes of flavonoids. The wavelength range with the maximum PDR
238
value represented the optimum wavelength range for the classification of the flavonoid classes.
239
Compared to the optimization of wavelength range with chemometric models which required
240
hours to execute, the PDR method only took seconds which saved considerable time.
11 ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
241
Page 12 of 32
Classification of Flavonoids by Step-wise Strategy. Step-wise classification was devised in
242
this study to classify the UV-Vis spectra of flavonoids in a novel way and the starting WLs of
243
each step was optimized respectively. A flowchart for the two strategies of classification of 4
244
classes of flavonoids and the HADs is shown in Figure 3. Conventional classification strategy
245
optimized universal parameters in data preprocessing and constructed one chemometric model
246
by using all the data for the different classes. Step-wise classification strategy optimized data
247
representation for each pair of classes and constructed multiple chemometric models. In each
248
step, only two classes were defined and one group of flavonoids (class 1) was differentiated from
249
other flavonoids (class 2). It is worth noting that either SIMCA or FOAM model can be selected
250
for the classification, and the same model is used throughout the steps in step-wise classification.
251
It is shown in Figure 4 the dendrograms based on Euclidean distances between spectra for
252
two strategies outlined in Figure 3. Figure 4A shows that for a conventional classification
253
strategy, even after the starting WL was optimized, the four classes were mixed with each other
254
and none of classes was completely separated from others. However, the three dendrograms for a
255
step-wise classification strategy (Figure 4B) demonstrated classification of each group of
256
flavonoids into well-defined clusters. The benefit of step-wise classification strategy was proven
257
by classification rates of SIMCA/FOAM models through leave-one sample out cross validation.
258
With the conventional strategy, the best classification rates for the SIMCA and FOAM models
259
were 99.3% and 95.6% respectively. The classification rates were 100% for both the SIMCA and
260
FOAM models using the step-wise classification strategy.
261
The order of flavonoid classes in step-wise classification process has great impact on the
262
classification. For the four classes of flavonoids in Figure 2, twelve sequences were evaluated
263
and their PDR values in each step were calculated based on the method in ‘Projected Difference 12 ACS Paragon Plus Environment
Page 13 of 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
264
Resolution Method to Optimize Wavelength Range of UV-Vis Data’ section. The results shown
265
in Table S1 demonstrate that the flavonoid classification sequence used in Figure 3 and 4 is the
266
optimal order in step-wise classification process: a relatively larger PDR value was achieved for
267
the two most similar classes which indicates the better separation of the two classes in
268
multivariate data space. Therefore, FlavonQ-2.0v separates anthocyanidins from the rest classes
269
in the first step, then flavan/flavanone, and finally flavone/HAD and isoflavone in the step-wise
270
classification process.
271
The application of the step-wise strategy eliminates some misclassifications of flavonoids in
272
real samples. For example, peak #12 in blueberry sample (Figure S2) was manually identified as
273
petunidin-3-O-arabinoside by the study of its mass spectra in both the positive and negative
274
ionization modes.29 When conventional classification was used, it was misclassified as
275
flavone/HAD group (Figure 5A). The absorption band at 525 nm indicates that it is an
276
anthocyanin instead of a flavone (Figure 5C). Peak #12 was successfully classified as
277
anthocyanin when the step-wise classification strategy was applied (Figure 5B). Higher weight
278
was given to the characteristic UV-Vis band (525 nm) for anthocyanin by this strategy (Figure
279
5D). The step-wise strategy was more effective for classifying flavonoids based on their UV-Vis
280
spectra and, therefore, was adopted in this program. It is worth noting that isoflavones were not
281
included in the chemometric model to study the flavonoids in this example because isoflavones
282
are usually not found in blueberry.
283
Identification of Flavonoids Using In-house Database. After the chromatographic peaks
284
were categorized into different classes of flavonoids, HRAM/MSn data were used for putative
285
identification of flavonoids and HAD. An in-house database was established in our lab which
286
contained 5686 flavonoids and related compounds categorized into the four classes. The 13 ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 14 of 32
287
information for each compound such as chemical name, formula, accurate monoisotopic mass,
288
protonated mass, deprotonated mass and major product ions were included. Major product ions
289
assigned to 4283 compounds were obtained by the observation of fragmentation mass spectra of
290
flavonoids in our lab, METLIN mass spectrum database,30 and the mass spectral library from
291
Sumner’s group31 or by predictions based on experience of experts and in silico fragmentation
292
patterns using commercial software package HighChem Mass Frontier (Thermo Fisher Scientific
293
Inc., San Jose, CA).
294
A selected number (user defined) of the most intense ions from the MS full scan spectrum of
295
unknown chromatographic peaks were screened and matched with ions in the positive or
296
negative mode from the in-house database. If the MSn spectra for the ions in full scan spectrum
297
were available in the data, the major product ions were searched through the MSn spectra for
298
matches. Multiple hits could be found after this searching process and all these candidate
299
compounds were ranked in the result table based on the following priorities: candidate
300
compounds with both precursor ions and product ions matched were ranked higher than others
301
which were then ranked by mass errors in ascending order.
302
The program may provide multiple flavonoid candidates for a chromatographic peak, and
303
expertise in the field of flavonoid research is needed for affirmative identification (see examples
304
in Table S2-3). In the previous version of FlavonQ, the identification of flavonoids was based on
305
a virtual mass spectrum database which was constructed by theoretically combining common
306
aglycones and substitution groups.10 For a single class of flavone/flavonol glycosides, it
307
contained over 1.5 million possible combinations which, in most cases, have never occurred in
308
the real world. In this study all 5686 flavonoids and related compounds in the in-house database
309
have been reported before. With this database, the computation speed of the program was faster 14 ACS Paragon Plus Environment
Page 15 of 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
310
and the identification results were more accurate. Flavonoids not included in this database could
311
not be identified. However, potential flavonoid peaks (based on UV data) are flagged, and can be
312
manually identified and added to the database if needed.
313
Comparison with METLIN and MegFrag using Flavonoid UHPLC-DAD-MS Dataset.
314
Two sets of mix reference standards (25 flavonoids and hydroxycinnamic acids) were analyzed
315
by UHPLC-DAD-MS method as described previously. Their precursor ions and MS/MS spectra
316
were manually input into METLIN database (http://metlin.scripps.edu) and MegFrag Web tool
317
(https://msbi.ipb-halle.de/MetFragBeta). For METLIN, the precursor ions were searched under
318
‘Simple Search’ function with 20 ppm tolerance; fragment search were performed under
319
‘Fragment Search’ function with ‘Precursor M/Z’ selected and up to 3 fragment ions were input
320
for each search. KEGG database was selected in MegFrag Web tool and 20 ppm tolerance was
321
set for ‘Parent Ion’ search and ‘Fragmentation Processing’. Besides of the proposed FlavonQ-
322
2.0v data process pipeline, the dataset were also processed in FlavonQ-2.0v by only searching
323
precursor ions and characteristic product ion in MS spectra through in-house database without
324
flavonoid classification based on UV-Vis spectra. The results, given in Table 1, show that
325
FlavonQ-2.0v generally performed better, indicated by higher number of correct first ranked
326
candidates, a lower number of ‘none of correct candidates available’, and a lower number of
327
output candidates for each search. Take puerarin (an isoflavone) as an example, by searching
328
[M-H]- (m/z 415.1029), METLIN outputs a list of 39 candidate compounds (puerarin ranked 28th)
329
and MetFrag outputs 7 candidate (puerarin ranked 5th). For the 39 candidates from METLIN, 11
330
of them are non-flavonoids, with 22 flavone glucosides, 1 anthocyanidin, and 5 isoflavones. If
331
‘Fragment search’ is applied, puerarin was not found because it has not been analyzed in
332
METLIN. The 7 candidates from MetFrag includes 1 flavone, 2 isoflavones and 4 non-
15 ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 16 of 32
333
flavonoids, and MS/MS spectrum improved the ranking of puerarin from the 5th to the 3rd out of
334
7. For FlavonQ-2.0v, 19 candidates were found without the use of UV-Vis data (12 flavones, 5
335
isoflavones, and 2 anthocyanidins); only 5 candidates were found if UV-Vis spectra were used
336
for flavonoid classification and they were all isoflavones. The results were not unexpected due
337
to the specificity of the FlavonQ program.32 For example, METLIN includes 961,829 molecules
338
among which about 14,000 metabolites have been individually analyzed and another 200,000 has
339
in silico MS/MS data by May, 2017. For the ‘Fragment Search’ in METLIN, about half of the
340
queries (13 of 25) in Table 1 returned ‘0 candidate’ due to the lack of MS/MS data in the
341
database. From Table 1, it has been observed that flavonoid classification based on UV-Vis
342
spectra can effectively narrow down the list of candidate compounds because there are some
343
limitations for compound identification solely relied on MS/MS spectral comparison: for
344
example sometimes mass spectra are dominated by one or only a few fragments (e.g., a glycoside
345
group loss, Figure S1) that can be explained by several candidates. Further examples and
346
limitations of MS spectral library search are discussed extensively by Stephen Stein.33
347
Expansion of UV-Vis Library and In-house Database. As discussed in the previous
348
sections, the accuracy of flavonoid identification based on UV-Vis spectra and MS spectra can
349
be improved by expanding the UV-Vis library and in-house database. In our lab, the number of
350
flavonoid UV-Vis spectra continues to increase from several resources: acquisition of more
351
flavonoid reference standards, isolated chromatographic peaks from plant materials, and reported
352
spectra from peer-reviewed journals. The isolated chromatographic peaks should be pure and be
353
identified and confirmed by mass spectrometric (HRMS, MSn) and/or NMR methods34, and the
354
reported UV-Vis spectra should be validated by other independent labs. In this study, the UV-
16 ACS Paragon Plus Environment
Page 17 of 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
355
Vis spectra in the library were exclusively collected from flavonoid reference standards, and
356
more UV-Vis data from other sources will be updated in the future release.
357
The product ions information in the in-house database can effectively target the correct hit of
358
flavonoids especially when multiple isomers exist in the database for a particular precursor ion.
359
Ideally the mass spectra library should contain MSn spectra in both positive/negative modes and
360
different collision energies. Such a library will be able to provide the most accurate
361
identification of a compound such as Sumner’s plant natural product MS library31 and
362
Compound Discoverer from Thermo Fisher Scientific. However, to construct such a library for
363
over 5,000 flavonoids is not feasible for any single laboratory. Therefore we are enhancing our
364
in-house databases gradually by adding more characteristic product ions based on experiments
365
and literatures. As the UV-Vis library and in-house database expands, it will be effective
366
automatically because FlavonQ-2.0v recalculates its chemometric models and searching results
367
based on the updated library and database every time it executes.
368
Quantitation. The quantitation of flavonoids was performed using an external calibration
369
curve with flavonoid reference standards and molar response factors as previously reported.14,35
370
Ideally separate calibration curves should be used to quantify the flavonoids of each flavonoid
371
class. For example, quercetin 3-O-rutinoside (rutin) for HADs and flavone/flavonol glycosides,
372
catechin for flavan-3-ols and proanthocyanidins, hesperetin for flavanones, cyanidin 3-O-
373
glucoside for anthocyanins, and genistein for isoflavones. The peak area integration method was
374
demonstrated in the sample chromatogram (Figure S2) and different classes of flavonoids are
375
represented by different colors. A brief identification, including major ion and formula, is
376
provided for each flavonoid candidate chromatographic peak. The identification, peak areas, and
17 ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 18 of 32
377
tentative quantitation information were output automatically into spreadsheet which allowed the
378
user to further analyze the results.
379
Performance of FlavonQ-2.0v. The performance of FlavonQ-2.0v was validated on samples
380
spiked with flavonoid mixed standards and samples of plant extracts. FlavonQ-2.0v successfully
381
identified all the flavonoid peaks in the flavonoid spiked mix standard samples. The results are
382
shown in Table S2 and S3. The results demonstrate the effectiveness of flavonoid identification
383
by UV-Vis and MS spectra. For example, apigenin was firstly classified by chemometric models
384
in FlavonQ-2.0v as flavone/flavonol/HAD based on its UV-Vis spectrum, so other isoflavone or
385
anthocyanin isomers were excluded after this step. It was then identified as ‘Apigenin’ in the
386
flavonoid candidate list based on its precursor ion (m/z 269.0450 with error -1.59 ppm) and
387
characteristic product ion (m/z 151, 1,3A-) and it was distinguished from Baicalein which has
388
characteristic product ion of m/z 169, 1,3A-). In some cases, multiple candidate flavonoids were
389
listed for a single peak since some flavonoid isomers with common product ions cannot be
390
differentiated by the program. For example, quercetin, morin, and hieracin (Figure 6) are all
391
flavonols and they have exact the same precursor ion (m/z 301.0348) and common characteristic
392
product ion (m/z 151, 1,3A-), so they were all reported in the result table.
393
FlavonQ-2.0v was also applied to the analysis of flavonoids in blueberry, mizuna, purple
394
mustard, red cabbage, and red mustard green. The data were also analyzed manually. The
395
FlavonQ-2.0v identification results were compared to those identified manually (Table S4).
396
Among the 39 flavonoid candidate peaks, two anthocyanins, petunidin-3-O-arabinoside and
397
petunidin-3-O-glucoside, were misidentified by FlavonQ-2.0v as flavonol glucosides and
398
flavanone glycoside, respectively. They were all small shoulder peaks (Peak #9 and #10 in
399
Figure S2) and their UV-Vis spectra were distorted by the close major peaks which led to the 18 ACS Paragon Plus Environment
Page 19 of 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
400
misclassification. This indicates that the chromatographic separation is critical for the correct
401
identification of flavonoids. Another two peaks were labeled as “uncertain peaks” as the spectral
402
data could not provide enough information for identification (Peak #21 and #24 in Figure S2 and
403
Table S4). The identification accuracy of flavonoids by FlavonQ-2.0v for plant materials is
404
shown in Table 2. Overall, positive identifications was achieved for more than 88% of the
405
flavonoid peaks using FlavonQ-2.0v.
406
The execution time of FlavonQ-2.0v was about 1 min for each sample after data format
407
conversion. Construction of chemometric models using all 146 UV-Vis spectra took about 30
408
seconds and the time was significantly reduced when fewer classes of flavonoids were selected
409
and fewer steps were conducted in the step-wise classification strategy. FlavonQ-2.0v was
410
developed in MATLAB 2012b, but it’s not necessary for the end user to install MATLAB to use
411
FlavonQ-2.0v. MATLAB Compiler Runtime (MCR) is required to run FlavonQ-2.0v standalone
412
application and is freely available at https://www.mathworks.com/products/compiler/mcr.html.
413
The graphic user interface is shown in Figure S3. The UV-Vis spectra of 146 flavonoid and
414
HAD reference standards and in-house flavonoid database were compiled into FlavonQ-2.0v.
415
This database will be continuously expanded and the chemometric models will become more
416
reliable. Other common food constituents, such as simple phenolic compounds, phenyl alcohols,
417
stilbenes, and lignans will also be included in the future. The in-house flavonoid database will be
418
updated regularly as new flavonoids are found and reported.
419
CONCLUSIONS
420
A data processing tool for flavonoid analysis, FlavonQ-2.0v, was developed in this study.
421
The program can classify the flavonoids using a chemometric model based on the UV-Vis
422
reference spectral library. The chemometric model used a novel step-wise classification strategy 19 ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 20 of 32
423
and data representation in each step was optimized by projected distance resolution (PDR)
424
method. The step-wise classification strategy significantly improved the performance of the
425
classifiers which resulted in more accurate and reliable classification of flavonoids. An in-house
426
flavonoid database was implemented in the program for identification of flavonoids. FlavonQ-
427
2.0v was validated by analyzing data from samples spiked with flavonoid mixed standards and
428
blueberry, mizuna, purple mustard, red cabbage, and red mustard green extract samples.
429
Accuracies of identification for all samples were above 88%. FlavonQ-2.0v greatly facilitates the
430
identification and quantitation of flavonoids from UHPLC-HRAM-MS data. The automated
431
computational tool is developed to assist, rather than replace, human expert. The result shows
432
that it not only saves tremendous efforts for human experts, but also allows less-experienced
433
chemists to perform data analysis on flavonoids with reasonable results.
434
ASSOCIATED CONTENT
435
Supporting Information. Additional information as noted in text. This material is available
436
free of charge via the Internet at http://pubs.acs.org.
437
AUTHOR INFORMATION
438 439 440 441 442
Corresponding Author. ∗ E-mail address:
[email protected]. Tel.: +1 301 504 8144; fax: +1 301 504 8314. Notes. The authors declare no competing financial interest. ACKNOWLEDGMENT This research is supported by the Agricultural Research Service of the U.S. Department of
443
Agriculture, an Interagency Agreement Number AOD12026-001-01004 with the Office of
444
Dietary Supplements at the National Institutes of Health. The John A. Milner Fellowship
445
program by USDA Beltsville Human Nutrition Research Center and the NIH Office of Dietary 20 ACS Paragon Plus Environment
Page 21 of 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
446
Supplements is acknowledged for the support to Dr. Mengliang Zhang. We thank Dr. Peter de B.
447
Harrington from Department of Chemistry and Biochemistry at Ohio University for providing
448
Matlab routines for PDR, PCA, SIMCA, and FOAM functions. We thank Dr. Joseph M. Betz
449
from NIH Office of Dietary Supplements and Dr. James M. Harnly from USDA for the careful
450
revision of this article.
451
References
452
(1) Nijveldt, R. J.; van Nood, E.; van Hoorn, D. E. C.; Boelens, P. G.; van Norren, K.; van
453
Leeuwen, P. A. M. Am. J. Clin. Nutr. 2001, 74, 418-425.
454
(2) Balentine, D. A.; Dwyer, J. T.; Erdman, J. W., Jr.; Ferruzzi, M. G.; Gaine, P. C.; Harnly, J.
455
M.; Kwik-Uribe, C. L. Am. J. Clin. Nutr. 2015, 101, 1113-1125.
456
(3) Satterfield, M.; Brodbelt, J. S. Anal. Chem. 2000, 72, 5898-5906.
457
(4) Johnson, A. R.; Carlson, E. E. Anal. Chem. 2015, 87, 10668-10678.
458
(5) Smith, C. A.; Want, E. J.; O'Maille, G.; Abagyan, R.; Siuzdak, G. Anal. Chem. 2006, 78, 779-
459
787.
460
(6) Pluskal, T.; Castillo, S.; Villar-Briones, A.; Oresic, M. BMC Bioinformatics 2010, 11, 395.
461
(7) Katajamaa, M.; Miettinen, J.; Oresic, M. Bioinformatics 2006, 22, 634-636.
462
(8) Wei, X.; Sun, W.; Shi, X.; Koo, I.; Wang, B.; Zhang, J.; Yin, X.; Tang, Y.; Bogdanov, B.;
463
Kim, S.; Zhou, Z.; McClain, C.; Zhang, X. Anal. Chem. 2011, 83, 7668-7675.
464
(9) Zhang, W. C.; Chang, J.; Lei, Z. T.; Huhman, D.; Sumner, L. W.; Zhao, P. X. Anal. Chem.
465
2014, 86, 6245-6253.
466
(10) Zhang, M.; Sun, J.; Chen, P. Anal. Chem. 2015, 87, 9974-9981.
467
(11) Kazusa DNA Research Insititute, C., Japan. Komics Wiki, 2008.
21 ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 22 of 32
468
(12) Kessner, D.; Chambers, M.; Burke, R.; Agus, D.; Mallick, P. Bioinformatics 2008, 24,
469
2534-2536.
470
(13) Zhang, M.; Harrington, P. d. B. Talanta 2013, 117, 483-491.
471
(14) Lin, L. Z.; Harnly, J.; Zhang, R. W.; Fan, X. E.; Chen, H. J. J. Agric. Food. Chem. 2012, 60,
472
544-553.
473
(15) Sun, J. H.; Lin, L. Z.; Chen, P. Curr. Anal. Chem. 2013, 9, 397-416.
474
(16) Chen, J. H.; Ho, C. T. J. Agric. Food. Chem. 1997, 45, 2374-2378.
475
(17) Frank, I. E.; Lanteri, S. Chemometr. Intell. Lab. 1989, 5, 247-256.
476
(18) Wabuyele, B. W.; Harrington, P. D. Appl. Spectrosc. 1996, 50, 35-42.
477
(19) Bylesjo, M.; Rantalainen, M.; Cloarec, O.; Nicholson, J. K.; Holmes, E.; Trygg, J. J.
478
Chemom. 2006, 20, 341-351.
479
(20) Harrington, P. B. J. Chemom. 1991, 5, 467-486.
480
(21) Wang, Z.; Zhang, M.; Harrington Pde, B. Anal. Chem. 2014, 86, 9050-9057.
481
(22) Harnly, J. M.; Doherty, R. F.; Beecher, G. R.; Holden, J. M.; Haytowitz, D. B.; Bhagwat, S.;
482
Gebhardt, S. J. Agric. Food. Chem. 2006, 54, 9966-9977.
483
(23) Anderssen, E.; Dyrstad, K.; Westad, F.; Martens, H. Chemometr. Intell. Lab. 2006, 84, 69-
484
74.
485
(24) Bohm, B. A. In Introduction to Flavonoids; Harwood academic publishers: Amsterdam, The
486
Netherlands, 1999, p 200.
487
(25) Zhang, M.; de B. Harrington, P.; Chen, P. Curr. Chromatogr. 2015, 2, 145-151.
488
(26) Zhang, M.; Zhao, Y.; Harrington, P. d. B.; Chen, P. Anal. Lett. 2016, 49, 711-722.
489
(27) Xu, Z. F.; Sun, X. B.; Harrington, P. D. Anal. Chem. 2011, 83, 7464-7471.
490
(28) Chen, P.; Lu, Y.; Harrington, P. B. Anal. Chem. 2008, 80, 7218-7225.
22 ACS Paragon Plus Environment
Page 23 of 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
491
(29) Sun, J.; Lin, L. Z.; Chen, P. Rapid Commun. Mass Spectrom. 2012, 26, 1123-1133.
492
(30) Smith, C. A.; O'Maille, G.; Want, E. J.; Qin, C.; Trauger, S. A.; Brandon, T. R.; Custodio, D.
493
E.; Abagyan, R.; Siuzdak, G. Ther. Drug Monit. 2005, 27, 747-751.
494
(31) Lei, Z. T.; Jing, L.; Qiu, F.; Zhang, H.; Huhman, D.; Zhou, Z. Q.; Sumner, L. W. Anal.
495
Chem. 2015, 87, 7373-7381.
496
(32) Nishioka, T.; Kasama, T.; Kinumi, T.; Makabe, H.; Matsuda, F.; Miura, D.; Miyashita, M.;
497
Nakamura, T.; Tanaka, K.; Yamamoto, A. Mass Spectrometry 2014, 3, S0039-S0039.
498
(33) Stein, S. Anal. Chem. 2012, 84, 7274-7282.
499
(34) Qiu, F.; Fine, D. D.; Wherritt, D. J.; Lei, Z.; Sumner, L. W. Anal. Chem. 2016, 88, 11373-
500
11383.
501
(35) Lin, L. Z.; Harnly, J. M. J. Agric. Food. Chem. 2012, 60, 5832-5840.
502
23 ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 24 of 32
503
Table 1. Comparison of search results for 25 flavonoid and hydroxycinnamic acid derivatives UHPLCDAD-MS data MetFrag (KEGG)2 FlavonQ-2.0v3 METLIN1 Simple Fragment Parent Ion Fragment MS UV-Vis & Search Search Search Search Search MS Match 4 Top 1 ranks 2 3 5 6 8 18 Top 5 ranks 10 12 15 18 24 25 # of NA5 0 13 2 2 0 0 6 # of candidate compounds 663 54 221 213 162 113 Average # of candidate compounds 26.5 2.2 8.8 8.5 6.5 4.5 1. ‘Simple Search’ and ‘Fragment Search’ are two functions for METLIN database: ‘Simple Search’ matches up precursor ions (20 ppm tolerance); ‘Fragment Search’ matches up both precursor ions and selected fragment ions (up to 5 fragment ions) (http://metlin.scripps.edu). 2. KEGG database was selected for MetFrag search. ‘Parent Ion Search’ and ‘Fragment Search’ are functions for MetFrag Web tool: ‘Parent Ion Search’ matches up precursor ions (20 ppm tolerance); ‘Fragment Search’ matches up both precursor ions and MS/MS spectra (https://msbi.ipbhalle.de/MetFragBeta/). 3. ‘MS Search’ matches up precursor ions and characteristic product ions (up to 1 for each precursor ion) in the MS spectra with in-house database; ‘UV-Vis & MS Match’ uses chemometric methods to determine the type of flavonoids before ‘MS Search’. 4. Number of correct first ranked candidates. 5. Number of ‘none of correct candidates available’. 6. Number of total candidate compound for all queries.
504 505
24 ACS Paragon Plus Environment
Page 25 of 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
Table 2. Flavonoid identification accuracy in different plants by FlavonQ-2.0v # of flavonoids # of # of uncertain Accuracy a b Plant name identified misidentification peaksc (%) Blueberry 39 2 2 89.7 Mizuna 47 1 0 97.9 Purple mustard 45 1 4 88.9 Red cabbage 44 0 0 100.0 Red mustard green 88 1 6 92.0 a Flavonoid peaks were identified by FlavonQ with s/n setting at 10. bNonflavonoid peaks were identified as flavonoids. cIdentity of peaks cannot be verified based on the data given. 506 507 508 509 510 511 512 513 514 515 516
25 ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 26 of 32
A
B
C
D
Group A: Flavone, flavonol, and hydroxycinnamic acid derivatives; Group B: Flavan, flavanol, flavanone, and flavanonol; Group C: Anthocyanidin; Group D: Isoflavone.
Figure 1. Core structures of the main flavonoid classes and hydroxycinnamic acid derivatives. 517 518 519 520 521 522 523
26 ACS Paragon Plus Environment
Page 27 of 32
A
0.35 Flavone/HAD Flavan/flavanone
Normalized Response
0.3
Anthocyanidin Isoflavone
0.25
0.2
0.15
0.1
0.05
0 200
250
300
350 400 450 Wavenumber (nm)
500
550
600
B 0.8 0.6
PC #2 (18%, 0.0927)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
A: Flavone/HAD B: Flavan/flavanone C: Anthocyanidin D: Isoflavone C
C
0.4
A A A A AA AAAA A AA A A A A A A A A A AA A A A A A A A AA A A A A AA A AAAA A AA AA A A A AD A AAAAA A CC AA A A A A A DD C B AA D D B A D A D C A A A C C AA B B A A AA B BB A AA B B B BBB B BBB A A B B B B B B B A B BBBB B
0.2 0 -0.2 -0.4 -0.6 -0.8
C C C C C C CC C CC CC C C C C
-0.6
-0.4
-0.2 0 0.2 PC #1 (40%, 0.2)
0.4
0.6
0.8
Figure 2. One hundred and forty six UV-Vis spectra of flavonoids and HAD (A) and principal component analysis score plot for UV-Vis spectra data of four classes (B).
524 525
27 ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 28 of 32
Figure 3. Flowchart for step-wise classification strategy and conventional classification strategy. 526 527
28 ACS Paragon Plus Environment
Page 29 of 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
528 529
Figure 4. UV-vis spectra after wavelength range optimization (left) and dendrogramatic
530
representations (right) of differentiation for four classes of flavonoids by conventional
531
classification strategy (A) and step-wise classification strategy (B).
532 533
29 ACS Paragon Plus Environment
Analytical Chemistry
0.2 0 -0.2 -0.6 -1
C
C C CC C CC CC
Peak #12 in blueberry sample is anthocyanidin peak, but misclassified as A (flavone/HAD) -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 PC #1 (51%, 0.324) Average UV-vis spectrum of flavone/HAD Average UV-vis spectrum of anthocyanidin UV-vis spectrum of Peak #12
0.12 0.08 0.04 0 280 300
B 0.8 A: Other flavonoids and HAD C: Anthocyanidins A A AAA AA 0.4 A AA A A A A A A A AA A A A A A C A A 0 A C A C C C A CC C A CCC A A A A A AA X A A A A -0.4 A A AA A A AA A A Peak #12 in blueberry sample A -0.8 is anthocyanidin, and classified correctly -1.5 -1 -0.5 0 0.5 PC #1 (62%, 0.397)
PC #2 (22%, 0.138)
B B B B B BBB BB B BB C CC C C C C
A: Flavone/HAD B: Flavan/flavanone C: Anthocyanidins AAAA AAAA A A AA X A A A A A A A A A A A A AA A A A A A A A A A AA
350
400 450 500 550 Wavelength number (nm)
Average UV-vis spectrum of flavone/HAD Average UV-vis spectrum of anthocyanidin UV-vis spectrum of Peak #12
D
Normalized intensity
PC #2 (30%, 0.191)
A 0.6
Normalized intensity
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 30 of 32
600
0.2
0.1
0 350
400 450 500 550 Wavelength number (nm)
600
Figure 5. Principal component analysis score plot for UV-Vis spectra data of three classes by conventional classification strategy (A) and by step-wise classification strategy-step 1 (B). Average UV-Vis spectra of flavone/HAD and anthocyanidin and UV-Vis spectrum of Peak #12 in blueberry sample after starting WL optimization in conventional classification strategy (C) and in step-wise classification strategy-step 1 (D). 534 535
30 ACS Paragon Plus Environment
Page 31 of 32
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
536 537 538
Analytical Chemistry
A
B
C
D
n1+n2+n3 = 5 and 0 ≤ n1, n2, n3 ≤ 5
Figure 6. Chemical structures for quercetin (A), morin (B), hieracin (C), and pentahydroxyflavone (D).
539 540 541
31 ACS Paragon Plus Environment
Analytical Chemistry
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
542
Page 32 of 32
For TOC only
543
544 545 546
32 ACS Paragon Plus Environment