Article pubs.acs.org/JAFC
Food Fingerprinting: Metabolomic Approaches for Geographical Origin Discrimination of Hazelnuts (Corylus avellana) by UPLC-QTOF-MS
Sven Klockmann,† Eva Reiner,† René Bachmann,‡ Thomas Hackl,‡ and Markus Fischer*,†
† Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
‡ Institute of Organic Chemistry, University of Hamburg, Martin-Luther-King-Platz 6, 20146 Hamburg, Germany
ABSTRACT: Ultraperformance liquid chromatography quadrupole time-of-flight mass spectrometry (UPLC-QTOF-MS) was used for geographical origin discrimination of hazelnuts (Corylus avellana L.). Four different LC-MS methods for polar and nonpolar metabolites were evaluated with regard to their discrimination abilities. The most suitable method was used for analysis of 196 authentic samples from harvest years 2014 and 2015 (Germany, France, Italy, Turkey, Georgia), selecting and identifying 20 key metabolites with significant differences in abundance (5 phosphatidylcholines, 3 phosphatidylethanolamines, 4 diacylglycerols, 7 triacylglycerols, and γ-tocopherol). Classification models were created using soft independent modeling of class analogy (SIMCA), linear discriminant analysis based on principal component analysis (PCA-LDA), support vector machine classification (SVM), and a customized statistical model based on confidence intervals of selected metabolite levels, yielding at best 99.5% training accuracy when SVM and SIMCA were combined. Forty nonauthentic hazelnut samples were subsequently used to estimate the prediction capacity of the models as realistically as possible.
KEYWORDS: metabolomics, UPLC-ESI-QTOF, hazelnut, Corylus avellana, geographical origin, chemometrics
■
INTRODUCTION Hazelnut (Corylus avellana L.) is an important commodity in the chocolate, confectionery, and bakery industries (predominantly in its shelled and roasted forms) as well as in the in-shell market for direct consumption. Depending on the harvest year, world production amounts to about 800,000 tons; Turkey is the leading producer (64%), followed by Italy (13%), the United States, Georgia, Azerbaijan, Spain, France, Iran, and China ( [...] 95%, formic acid >99%, and sodium hydroxide >99% were purchased from Carl Roth (Karlsruhe, Germany). Hexakis(2,2-difluoroethoxy)phosphazene, used as lock mass, was purchased from Santa Cruz Biotechnology (Dallas, TX, USA). The reference standards 1,2-dilinoleoyl-sn-glycero-3-phosphocholine (PC(18:2/18:2)), 1-palmitoyl-2-linoleoyl-sn-glycero-3-phosphocholine (PC(16:0/18:2)), 1,2-dilinoleoyl-sn-glycero-3-phosphoethanolamine (PE(18:2/18:2)), and 1-palmitoyl-2-oleoyl-sn-glycerol (DG(16:0/18:1/0:0)) were purchased from Avanti Polar Lipids (Alabaster, AL, USA); 1,3-dilinoleoyl-rac-glycerol (DG(18:2/0:0/18:2)) and 1-palmitoyl-2-linoleoyl-rac-glycerol-3-phosphoethanolamine (PE(16:0/18:2)) from Cayman Chemical (Ann Arbor, MI, USA); and 1,2-dioleoyl-sn-glycerol (DG(18:1/18:1/0:0)) and γ-tocopherol from Sigma-Aldrich (Munich, Germany). 1-Linoleoyl-2-hydroxy-sn-glycero-3-phosphocholine (PC(18:2/0:0)) was from Echelon Biosciences (Salt Lake City, UT, USA).

Hazelnut Samples. Overall, 196 authentic raw hazelnut samples of different varieties, origins, and producers from harvest years 2014 (104) and 2015 (92) were obtained for analysis. The samples were harvested in the commercially relevant regions of each country and comprised 115 French (mainly Midi-Pyrénées and Aquitaine), 35 German (mainly Bavaria), 22 Italian (Piedmont, Campania, and Lazio), 14 Turkish (Ordu, Akçakoca, and Samsun), and 10 Georgian (Guria, Samegrelo, and Imereti) samples (Figure 1).
Additionally, 40 samples of raw hazelnut kernels for the confectionery industry (nonauthentic) were provided by industrial partners, including 6 Georgian, 9 Italian, and 25 Turkish samples. Each sample comprised [...]
DOI: 10.1021/acs.jafc.6b04433 J. Agric. Food Chem. 2016, 64, 9253−9262
Journal of Agricultural and Food Chemistry
Figure 2. PCA scores plots of (A) NTNP-pos, PC1 versus PC3; (B) NTNP-neg, PC1 versus PC2; (C) NTP-pos, PC1 versus PC2; and (D) NTP-neg, PC1 versus PC2, using all detectable features.

PubChem.38 To verify the calculated identities, retention time, exact mass, isotope ratio, and fragment spectra of selected metabolites were compared to those of commercial standard substances.

Data Processing and Chemometrics. Calibration and feature detection of each data set using the molecular feature algorithm were carried out with DataAnalysis 4.1 (Bruker Daltonics). Normalization, time alignment, and peak grouping were applied using ProfileAnalysis 2.1 (Bruker Daltonics). Peak grouping required a signal to be present in at least 90% of all samples and in at least 50% of the samples of each country. The resulting peak list was exported to The Unscrambler X 10.3 (Camo Software, Oslo, Norway) and, after median centering and interquartile range scaling, used for multivariate analysis. Principal component analysis (PCA) and partial least squares regression-discriminant analysis (PLS-DA) were employed to distinguish samples according to their origin. To identify metabolites with a high origin dependency, two different methods were used. First, features with high loadings in the corresponding loadings plot, which displays the influence of each feature on the PCA or PLS calculation, were picked for further identification. Additionally, two-sided t tests for each country pair (for example, Italy vs Turkey) and one-way analysis of variance (ANOVA) were calculated in ProfileAnalysis to test the equality of two (t test) or more (ANOVA) means.
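As an illustration of the two univariate tests named above, the following minimal Python sketch computes a two-sample t statistic and a one-way ANOVA F statistic for a single feature. It is not the ProfileAnalysis implementation: the function names, the pooled-variance (Student) form of the t test, and the toy intensities are assumptions made for this example. p values are omitted because they require the t and F distributions, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student t statistic for two groups, assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def anova_f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical normalized intensities of one feature in three countries.
italy = [1.0, 2.0, 3.0]
turkey = [2.0, 3.0, 4.0]
georgia = [5.0, 6.0, 7.0]

t_value = pooled_t_statistic(italy, turkey)           # country pair test
f_pair = anova_f_statistic([italy, turkey])           # same pair via ANOVA
f_all = anova_f_statistic([italy, turkey, georgia])   # all countries at once
```

For two groups the F statistic equals the square of the pooled t statistic, which is why the pairwise t tests and the ANOVA complement each other: the former locate which country pairs differ, the latter tests all countries in one step.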
Classification models for prediction of unknown samples were applied using soft independent modeling of class analogy (SIMCA), linear discriminant analysis based on PCA scores (PCA-LDA), and support vector machine classification (SVM) in The Unscrambler as well as a customized statistical model based on confidence intervals of metabolite levels in Excel (Microsoft Corp., Redmond, WA, USA).
65% B for 2 min, linearly increased to 75% B in 3 min, and afterward to 100% B in 10 min, which was kept constant for 6 min and brought back to 65% B in 1 min, followed by 3 min of re-equilibration. The injection volume for all samples was 1 μL. For the separation of polar metabolites, a 150 mm × 2.1 mm i.d., 2.2 μm, Cogent Diamond Hydride HPLC column with a 10 mm × 2.0 mm i.d. guard column of the same material (MicroSolv Technology, Leland, NC, USA) was employed at 30 °C with a flow rate of 400 μL/min, using water as mobile phase A and acetonitrile/isopropanol (10:1 v/v) as B, both containing 5 mM ammonium formate buffer at pH 3.5. In this case the gradient elution started with 0% A for 2 min, followed by a linear increase to 15% A in 3 min. After being held constant for 2 min, A was increased to 50% in 2 min, held for 1.5 min, finally raised to 100% in 1 min, and after 3.5 min brought back to 0% in 1 min, terminating with 4 min of re-equilibration. For this method the injection volume was set to 4 μL for all samples. For detection a maXis ESI-QTOF-MS (Bruker Daltonics, Bremen, Germany) was used, operating in either positive or negative ion mode in the mass range of 60−1000 Da with the following mass spectrometer settings: end plate offset = −500 V; capillary = 4500 V; nebulizer gas = 4 bar; drying gas = 9 L/min; drying temperature = 200 °C. The acquisition of fragment spectra for each mass was carried out using MS/MS mode with the collision energy set to 40 eV. Calibrations in the respective mass range were conducted with a mixture of formic acid/1 M NaOH in water/isopropanol (0.1:1:100, v/v/v), introduced to the source with a syringe pump at 0.1 mL/h using a valve switch in each run to forward the flow directly into the source after 24 min in the nonpolar run and 19 min in the polar run, the valve being switched back after 0.5 min in each case.
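The two gradient programs described above can be written as time/composition breakpoints with linear interpolation between them. The sketch below is only an illustration: the names are hypothetical, the nonpolar table assumes that the program begins at 65% B at t = 0, and the polar table assumes that the 1.5 min hold takes place at 50% A.

```python
from bisect import bisect_right

# (time in min, percent) breakpoints read from the text.
# Nonpolar run, %B; assumes the program starts at 65% B at t = 0.
NONPOLAR_PERCENT_B = [(0, 65), (2, 65), (5, 75), (15, 100),
                      (21, 100), (22, 65), (25, 65)]
# Polar run, %A; assumes the 1.5 min hold is at 50% A.
POLAR_PERCENT_A = [(0, 0), (2, 0), (5, 15), (7, 15), (9, 50), (10.5, 50),
                   (11.5, 100), (15, 100), (16, 0), (20, 0)]


def composition_at(program, t):
    """Linearly interpolate the solvent percentage at time t (min)."""
    times = [point[0] for point in program]
    if not times[0] <= t <= times[-1]:
        raise ValueError("time outside the gradient program")
    i = bisect_right(times, t)
    if i == len(times):  # t is exactly the final time point
        return float(program[-1][1])
    (t0, p0), (t1, p1) = program[i - 1], program[i]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)


nonpolar_run_time = NONPOLAR_PERCENT_B[-1][0]  # total run time in min
polar_run_time = POLAR_PERCENT_A[-1][0]
```

Under these assumptions the programs last 25 and 20 min, so the calibrant valve switches at 24 and 19 min fall into the final re-equilibration step of each run.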
Additionally, a lock mass calibration was applied using hexakis(2,2-difluoroethoxy)phosphazene dissolved in isopropanol at a concentration of 1 mg/mL, which was applied to a support material inside the source, giving constant vaporization and, thus, detection during the run. Metabolite identification was carried out using the exact mass and isotope ratio for prediction of chemical formulas. Additionally, MetFrag was used for prediction of structural formulas by comparison of detected fragment ions with databases such as KEGG or
■
RESULTS AND DISCUSSION Analytical Method Evaluation and Validation. To check which method was best suited for origin differentiation of hazelnuts, 86 samples of harvest 2014 (60 France, 12 Germany, 5 Italy, 5 Georgia, and 4 Turkey) were initially analyzed using polar extraction and nontargeted LC-MS analysis in positive and negative ion modes (hereafter referred to as NTP-pos and NTP-neg) as well as nonpolar extraction and nontargeted LC-MS analysis in positive and negative ion modes (hereafter referred to as NTNP-pos and NTNP-neg). Extraction and LC-MS parameters for each method were optimized with regard to high reproducibility at maximum signal intensities. Time alignment, normalization, and peak grouping of data sets were carried out separately for each method. Subsequently, the resulting peak tables were combined into one matrix in The Unscrambler, followed by centering and scaling. The feature detection algorithm was set rather leniently to prevent false-negative results caused by missed detection of less abundant metabolites. In contrast, on the assumption that the metabolomes of hazelnuts from different provenances differ not so much in the presence or absence of single metabolites as in their concentrations, the peak grouping parameters were chosen quite strictly (minimum presence of a signal in 90% of all samples and 50% of each country) to prevent false-positive results, which may arise from increased noise detection. Testing other parameters confirmed this hypothesis, because no signals were found that were present (or absent) only in particular countries. Because they strongly affect the resulting PCA model, the peak grouping parameters have to be chosen very carefully to prevent over- or underfitting. The resulting PCA scores plots are shown in Figure 2. In general, the nonpolar measurements led to better origin discrimination than the polar ones, and the positive ion mode was superior to the negative ion mode. The same held true for both the total number of metabolites and robustness. Whereas the nonpolar methods showed a tight and random clustering among all QC samples, NTP-pos needed an additional set of 10 QCs before the actual batch to equilibrate the system and to obtain a random clustering of the following QCs.
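The peak grouping rule described above (a signal must be present in at least 90% of all samples and in at least 50% of the samples of each country) can be sketched as a simple presence filter. This is a minimal Python illustration, not the ProfileAnalysis implementation; the function name, the use of None for a missing signal, and the toy data are assumptions.

```python
def passes_peak_grouping(intensities, countries, min_all=0.9, min_country=0.5):
    """Return True if a feature is detected often enough to be kept.

    intensities: one value per sample, None where no signal was detected.
    countries:   country label of each sample, in the same order.
    """
    detected = [value is not None for value in intensities]
    if sum(detected) / len(detected) < min_all:
        return False
    for country in set(countries):
        flags = [d for d, c in zip(detected, countries) if c == country]
        if sum(flags) / len(flags) < min_country:
            return False
    return True


# Hypothetical toy data: 17 French and 3 Turkish samples.
countries = ["FRA"] * 17 + ["TUR"] * 3
features = {
    # 95% overall, 2 of 3 Turkish samples: kept
    "feature_a": [1.0] * 17 + [1.0, 1.0, None],
    # 90% overall, but only 1 of 3 Turkish samples: rejected
    "feature_b": [1.0] * 17 + [1.0, None, None],
    # present in every Turkish sample, but only 85% overall: rejected
    "feature_c": [1.0] * 14 + [None] * 3 + [1.0, 1.0, 1.0],
}

kept = [name for name, values in features.items()
        if passes_peak_grouping(values, countries)]
```

The per-country threshold matters when group sizes are unbalanced, as they are here: without it, a feature missing from most samples of a small country group could still pass the overall 90% criterion.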
NTP-neg exhibited a directional shift among the QCs during the whole batch. Furthermore, the coefficient of variation (CV) of each metabolite within the QCs was calculated. According to commonly employed criteria for bioanalytical method validation, the CV should not exceed 15%; >95% of all metabolites met this criterion in the nonpolar methods but only 24 and 31% in the polar ones, respectively.39 Nevertheless, each method provided possible marker substances for origin differentiation, with numbers increasing in the order NTP-neg, NTP-pos, NTNP-neg, and NTNP-pos (Table 1). In this context, a
Thus, the nonpolar methods exhibited 43 and 39 possible marker substances, the polar ones just 3 and 1, respectively. When only samples with cultivars from more than one provenance were considered, the influence of the variety was only slightly noticeable in the polar methods, because some cultivars were grouped together, although overlapping with others. In the nonpolar methods, all cultivars were randomly scattered. It is worth mentioning that PLS-DA gave results similar to those of PCA in the nonpolar methods, even though, as a supervised method, it favors differentiation of sample groups (here countries) by maximizing the covariance between the explanatory variables (here metabolites) and the response (here country groups).42 This indicated that most of the variance of the measurable nonpolar metabolome is affected by geographical origin. Thus, NTNP-pos was chosen as the best method, because it provided the best classification results, yielded the most possible marker substances, and was not influenced by variety; it was therefore used for further analyses.

Analysis of Authentic Hazelnuts from 2014 and 2015 Using the NTNP-pos Method. Overall, 104 authentic hazelnut samples from 2014 and 92 from 2015 as well as 9 (2014) and 31 (2015) hazelnut samples for the confectionery industry were analyzed using the NTNP-pos method. Samples of each harvest year were measured in two batches, each with its own QC sample, exhibiting a good clustering among the QCs. Because of instrumental inconsistencies between batches measured some time apart, the data processing procedure also had to be divided. Peak detection, time alignment, normalization, and peak grouping could be performed together, whereas centering and scaling had to be carried out for each batch separately.
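The batch-wise centering and scaling step can be illustrated with a short Python sketch that applies median centering and interquartile range scaling (the preprocessing named in the methods) to one feature within each batch separately. The function names, the batch labels, the "inclusive" quartile definition, and the toy intensities are assumptions for this example; The Unscrambler may compute quartiles differently.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Median-center a list of intensities and divide by the interquartile range."""
    center = median(values)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; feature cannot be scaled")
    return [(v - center) / iqr for v in values]


def scale_per_batch(intensities, batches):
    """Scale one feature separately within each batch, preserving sample order."""
    scaled = [None] * len(intensities)
    for batch in set(batches):
        idx = [i for i, b in enumerate(batches) if b == batch]
        batch_scaled = robust_scale([intensities[i] for i in idx])
        for position, value in zip(idx, batch_scaled):
            scaled[position] = value
    return scaled


# Hypothetical feature measured in two batches with a tenfold intensity offset.
batches = ["batch_1"] * 5 + ["batch_2"] * 5
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0]
scaled = scale_per_batch(feature, batches)
```

In this toy case both batches end up on the same scale, so the instrumental offset between them no longer dominates the variance. Scaling the ten values together would instead leave the two batches as separate clusters.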
One French sample (FRA017) was identified as a strong outlier because both residual variance and leverage were above the critical limits (critical F residuals limit = 0.49; critical leverage limit = 0.2) and was therefore excluded from further processing. A closer look at the corresponding data revealed an abnormal retention time shift during the whole analysis, leading to missing or false assignments during peak grouping. In general, however, deletion of apparent outliers is a critical step in metabolomics data processing, because samples may be outliers owing to their natural composition, in which case they must not be removed. Therefore, outlier detection and interpretation have to be carried out individually and with special caution. The resulting PCA showed good clustering of samples from both harvest years within the countries, whereas statistical analysis was dominated by batch separation when the raw data of both batches were processed together. However, separation of countries was insufficient, with strong overlap of samples from Turkey, Italy, and Georgia as well as Germany and France and rather poor explained variances (PC-1, 20%; PC-2, 15%). Even with the supervised PLS, the separation of countries was not improved. To differentiate the countries more effectively, a more sophisticated statistical approach for group classification was applied, using only selected key metabolites. Building on the criteria for possible marker substances presented above for the LC-MS method selection, the following criteria were applied here: (a) p value of ANOVA calculation had to be