Article pubs.acs.org/JAFC

Differentiation of Organically and Conventionally Grown Tomatoes by Chemometric Analysis of Combined Data from Proton Nuclear Magnetic Resonance and Mid-infrared Spectroscopy and Stable Isotope Analysis

Monika Hohmann,†,‡ Yulia Monakhova,§,∥ Sarah Erich,⊥ Norbert Christoph,‡ Helmut Wachter,*,‡ and Ulrike Holzgrabe†



† Institute of Pharmacy and Food Chemistry, University of Würzburg, Am Hubland, 97074 Würzburg, Germany
‡ Bavarian Health and Food Safety Authority, Luitpoldstraße 1, 97082 Würzburg, Germany
§ Spectral Service, Emil-Hoffmann-Straße 33, 50996 Cologne, Germany
∥ Department of Chemistry, Saratov State University, Astrakhanskaya Street 83, 410012 Saratov, Russia
⊥ Chemical and Veterinary Investigation Laboratory, Bissierstraße 5, 79114 Freiburg, Germany

ABSTRACT: Because the basic suitability of proton nuclear magnetic resonance spectroscopy (1H NMR) to differentiate organic versus conventional tomatoes was recently proven, the approach to optimize 1H NMR classification models (comprising overall 205 authentic tomato samples) by including additional data of isotope ratio mass spectrometry (IRMS, δ13C, δ15N, and δ18O) and mid-infrared (MIR) spectroscopy was assessed. Both individual and combined analytical methods (1H NMR + MIR, 1H NMR + IRMS, MIR + IRMS, and 1H NMR + MIR + IRMS) were examined using principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), linear discriminant analysis (LDA), and common components and specific weight analysis (ComDim). With regard to classification abilities, fused data of 1H NMR + MIR + IRMS yielded better validation results (ranging between 95.0 and 100.0%) than individual methods (1H NMR, 91.3−100%; MIR, 75.6−91.7%), suggesting that the combined examination of analytical profiles enhances authentication of organically produced tomatoes.

KEYWORDS: organic tomatoes, 1H NMR, MIR, IRMS, chemometrics, data fusion



© 2015 American Chemical Society

INTRODUCTION

The Committee on the Environment, Public Health and Food Safety of the European Parliament has recently published a draft report “On the Food Crisis, Fraud in the Food Chain and the Control Thereof” in which organic food is listed as number three of the top-ten products at particularly high risk of adulteration.1 This certainly derives from the increasing demand for organic food,2 with consumers willing to pay higher prices for organically produced food than for comparable conventionally produced food. Thus, verifying authenticity of organic products is of decisive importance to protect consumers against adulteration and to support the trustworthiness of organic labeling. This study will discuss the use of sophisticated chemometric methods for the differentiation of organic and conventional food, exemplified for tomatoes. Tomatoes and tomato products are consumed on a large scale in Europe3 and are, at present, the most popular vegetable in Germany, with an average annual consumption of 20.6 kg/person.4 Reliable markers to analytically verify the cultivation methods of tomatoes are hardly available, although numerous attempts are described in previous literature.5−11 Up to now, the composition of the nitrogen isotope (δ15N, expressed as the relative difference to the standard of atmospheric nitrogen) has emerged as the most important marker to distinguish organically and conventionally produced tomatoes, but because of an overlap of

results, the cultivation method cannot be assigned in every case.12 We have recently described the approach of proton nuclear magnetic resonance (1H NMR) profiling for the authentication of organically produced tomatoes, and the results confirmed suitability, provided that an appropriate database of authentic tomatoes is available.13 When developing new methods for the authentication of organically produced tomatoes, the currently available analytical methods should not be disregarded. The potential of different techniques should rather be combined to achieve synergies. This approach has proven to be highly useful for the differentiation of organically and conventionally produced milk by combining data of 1H and 13C NMR spectra with stable isotope ratios and fatty acid composition,14 for the verification of the variety and origin of wines by combining data of 1H NMR and stable isotope ratios,15 and for the determination of Sudan dyes in spices by combining 1H NMR and ultraviolet/visible (UV/vis) data.16 Combined multivariate examination of individual results from different analytical methods can be performed by simply concatenating data matrices or by use of multiblock methods,17

Received: August 10, 2015
Revised: October 9, 2015
Accepted: October 12, 2015
Published: October 12, 2015
DOI: 10.1021/acs.jafc.5b03853
J. Agric. Food Chem. 2015, 63, 9666−9675


Journal of Agricultural and Food Chemistry

Table 1. Number of Tomato Samples for Measurements of 1H NMR/MIR/IRMS/Data Fusion Analysis (from Left to Right in Each Cell) with Respect to Harvesting Period, Cultivar, and Greenhouse

Harvesting period 2013: Mecano, Tica, Tastery
Harvesting period 2014: Bocati, Hamlet, Mecano, Savantas, Seviocard, Tica, Sakura, Sunstream, Tastery

BA organic

N organic

6/6/5/5 1/0/0/0 6/6/5/5

6/6/6/6

6/6/0/0 6/6/6/6

HD organic

6/5/6/5

6/6/0/0 3/3/0/0

4/4/4/4 4/4/4/4 4/4/4/4

N conv. 1

N conv. 2

6/6/5/5

8/7/6/6

8/7/6/6

6/6/5/5

7/6/6/5

6/6/6/6

6/6/0/0 6/6/6/6

6/6/0/0

6/6/0/0 1/1/0/0

6/6/0/0 6/6/0/0 7/7/1/1 6/6/6/6

6/6/0/0

5/5/0/0

HD conv.

4/4/4/4

3/3/3/3 6/6/0/0 6/6/0/0 6/5/0/0 6/6/6/6

4/4/4/4 3/3/3/3

3/3/3/3 4/4/4/4

Sample Collection of Authentic Tomato Samples. The normally fruited (average weight of 100 g/fruit) tomato varieties Bocati, Hamlet, Mecano, Savantas, Seviocard, and Tica and the small fruited (average weight of 20 g/fruit) tomato varieties Sakura, Sunstream, and Tastery were grown in overall seven greenhouses in Germany: two greenhouses of the Bavarian State Research Institute of Viticulture and Horticulture in Bamberg, organically and conventionally, referred to as “BA organic” and “BA conv.” in the following text; two greenhouses of the State Horticultural College and Research Institute Heidelberg, organically and conventionally, referred to as “HD organic” and “HD conv.” in the following text; and three greenhouses of trading farms in the growing region “Knoblauchsland” near Nuremberg, one organically and two conventionally, referred to as “N organic”, “N conv. 1”, and “N conv. 2” in the following text. Conventional growing conditions were each carried out as hydroponic culture using perlite substrate and mineral fertilizer, while organic growing conditions were carried out using soil and clover-grass silage, horn shavings, vinasse, Patentkali, sheep wool, or winter rye (previous culture for green manure) as organic fertilizers. Sampling was performed by harvesting tomatoes systematically from different plants in the greenhouses, at regular intervals of ca. 4 weeks between April and October in 2013 and between May and October in 2014. In the harvesting period of 2013, only the greenhouses BA organic, BA conv., N organic, N conv. 1, and N conv. 2 cultivated the varieties Mecano and Tastery (with the exception of one Tica tomato sample of BA organic), while the cultivation was complemented with the varieties Bocati, Hamlet, Savantas, Seviocard, Tica, Sunstream, and Sakura in individual greenhouses, including two further greenhouses (HD organic and HD conv.) in 2014. This yielded overall 205 tomato samples, thereof 66 harvested in 2013 and 139 in 2014. 
The composition of samples for individual measurements with respect to cultivars and greenhouses is illustrated in Table 1 (samples available for 1H NMR/MIR/IRMS/data fusions are described from left to right in each cell). For subsequent analysis, at least 250 g of tomatoes was pooled, pureed, and homogenized, and the puree was stored at −18 °C until measurement.

IRMS. One part of the pureed tomato sample was freeze-dried using a freeze dryer (Alpha 1-4 LSC, Christ, Osterode, Germany), pulverized using a ball mill MM 301 (Retsch GmbH, Haan, Germany), and used for measurement of 13C/12C and 15N/14N isotope ratios. The other part was centrifuged for 10 min [2700 relative centrifugal force (rcf)], and sodium azide was added to the supernatant, which was used for measurement of the 18O/16O isotope ratio. A total of 2.2 mg of the pulverized sample dry mass was weighed into tin capsules, combusted using an elemental analyzer (Euro EA 3000, Euro Vectors SpA, Milano, Italy), and analyzed with an isotope ratio mass spectrometer (ΔPlus XP, Thermo Finnigan, Bremen, Germany) equipped with a ConFlow IV interface (Thermo Fisher Scientific, Bremen, Germany) and an auto sampler (Zero Blank Revolver autosampler, Blisotec GmbH, Jülich, Germany) controlled by

which facilitates the interpretation of models and their reliability concerning the targeted goal.18 Hence, with the aim to develop an optimized analytical approach to verify authenticity of organically produced tomatoes, several analytical methods were combined: isotope ratio mass spectrometry (IRMS, determining δ13C, δ15N, and δ18O), with δ15N as currently the most reliable marker, 1H NMR spectroscopy that we recently proved to be useful,13 and additionally, mid-infrared spectroscopy (MIR), which turned out to be helpful to differentiate organically and conventionally produced wines.19 The individual suitability of each analytical method for the differentiation between organically and conventionally grown tomatoes was analyzed by use of principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and linear discriminant analysis (LDA). Furthermore, LDA and PLS-DA (using concatenated data after variable selection for spectroscopic data) and common components and specific weight analysis (ComDim)20,21 were performed for combined data of 1H NMR spectroscopy, MIR spectroscopy, and IRMS. However, organic and conventional farming cannot be seen as black-and-white definitions, when faced with various possible implementations for each cultivation method.22 Moreover, establishing a database of authentic tomato samples that reflects all conceivable ways of farming is accordingly challenging and almost impossible. However, analyzing test sets of authentically grown tomatoes provides an estimation of the classification power of individual analytical methods to differentiate organically and conventionally grown tomatoes. After that, the applicability of classification models in the future can be assessed by validation studies. 
Therefore, in this study, authentic tomatoes were grown conventionally using hydroponic culture and mineral fertilizer and organically using soil and different organic fertilizers, both in a greenhouse, to keep influences of the weather to a minimum. These cultivation trials by far do not represent all variations in farming conditions but serve as a starting point to generally verify the capabilities of analytical methods to differentiate tomatoes regarding the cultivation method.



BA conv.

MATERIALS AND METHODS

Chemicals. NaOH pellets (for 1 M NaOH) were purchased from VWR (Leuven, Belgium), and HCl (37%, for 1 M HCl) was purchased from Sigma-Aldrich (St. Louis, MO). TSP-d4 [3-(trimethylsilyl)propionic acid-d4 sodium salt, 98 atom % D], D2O (99.9 atom % D), ethylenediaminetetraacetic acid (EDTA), and NaN3 were purchased from Merck (Darmstadt, Germany).


Table 2. Number of Calibration and Validation Samples for Each Validation Step for Classification Models of 1H NMR, MIR, and Data Fusions (Listed from Left to Right in Each Cell)a

                                              number of samples
                           validation set     for calibration    for validation
small fruited tomatoes
  cultivar                 Sakura             81/78/53           12/12/0
                           Sunstream          73/71/45           20/19/8
                           Tastery            32/31/(8)          61/59/(45)
  greenhouse               BA organic         69/67/42           24/23/11
                           N organic          87/85/48           6/5/5
                           HD organic         86/83/46           7/7/7
                           BA conv.           68/65/41           25/25/12
                           N conv. 1          80/78/48           13/12/5
                           N conv. 2          82/79/47           11/11/6
                           HD conv.           86/83/46           7/7/7
normally fruited tomatoes
  cultivar                 Bocati             108/105/55         4/4/4
                           Hamlet             96/93/55           16/16/4
                           Mecano             40/39/(11)         72/70/(48)
                           Savantas           108/105/59         4/4/0
                           Seviocard          109/106/56         3/3/3
                           Tica               99/97/59           13/12/0
  greenhouse               BA organic         87/85/48           25/24/11
                           N organic          97/94/53           15/15/6
                           HD organic         97/94/44           15/15/15
                           BA conv.           88/85/48           24/24/11
                           N conv. 1          98/96/53           14/13/6
                           N conv. 2          97/95/53           15/14/6
                           HD conv.           108/105/55         4/4/4

a Numbers in parentheses are indicated for information purposes only; no validation was performed with these test sets.

Isodat 3.0 software (Thermo Finnigan, Bremen, Germany). Resulting gases, CO2 and N2, were separated by a gas chromatography (GC) column, and isotope ratios were determined simultaneously. The 18O/16O ratio in “tomato water” was measured in 200 μL after equilibration with CO2 using a MultiFlow 07/003 (Elementar, Manchester, U.K.) with a Gilson 222XL sampler (Gilson, Villiers Le Bel, France) interfaced to IRMS (IsoPrime, Manchester, U.K.). The 13C/12C, 15N/14N, and 18O/16O isotope ratios were given in ‰ on a δ scale. The values refer to the international reference standards Vienna Pee Dee Belemnite (VPDB) for δ13C, atmospheric nitrogen for δ15N, and Vienna Standard Mean Ocean Water 2 (VSMOW2) for δ18O.

$$\delta\ (\text{‰}) = \frac{R_{\mathrm{sample}} - R_{\mathrm{standard}}}{R_{\mathrm{standard}}} \times 1000$$

Acetanilide, casein, glutamic acid, and water were calibrated as working standards using the international standards (IAEA-CH6, IAEA-CH7, NBS 22, and USGS 40 for 13C/12C and 15N/14N and VSMOW2, SLAP2, and GISP for 18O/16O). Samples were analyzed twice, and working standards were measured 4 times to control the stability of the series of measurements. The standard deviation for IRMS analysis was ≤0.2 ‰.

1H NMR Spectroscopy. The aqueous tomato phase was analyzed after centrifugation of puree at 3528g for 5 min. A total of 900 μL of clear liquid tomato phase was mixed with 100 μL of a solution of 7 mM TSP-d4, 10 mM EDTA, and 2 mM NaN3 in D2O, and the pH was adjusted to pH 4.00 ± 0.03, using 1 M NaOH or 1 M HCl. Finally, 600 μL of the pH-adjusted solutions was filled into 5 mm NMR tubes for NMR measurement using a 400 MHz 1H NMR spectrometer. Acquisition and processing parameters of 1H NMR measurement were set as previously described.13 For examination, the spectral range from 0 to 10 ppm was used, excluding the regions of the residual water signal from 4.67 to 4.85 ppm and residual ethanol (NMR tubes were reused and washed with ethanol) from 3.60 to 3.70 ppm and from 1.14 to 1.22 ppm.

MIR Spectroscopy. Tomato puree was filtered through a folded filter (4−7 μm). The clear filtrate was used for MIR measurement with a WineScan FT120 instrument (Foss GmbH, Rellingen, Germany) for tomato samples of 2013 and a WineScan FT2 Flex instrument (Foss GmbH, Rellingen, Germany) for tomato samples of 2014. For examinations, the wavenumber range from 964 to 2998 cm−1 (528 acquired data points) was used, excluding the range from 1547 to 1716 cm−1 to eliminate water absorption. For each sample, the averaged spectra of two successive measurements were used.

Multivariate Statistics. Data pre-processing was performed by reducing the dimension of 1H NMR data, bundling spectral regions of 0.02 ppm width into buckets using Amix 3.9.12 software (Bruker Biospin GmbH, Rheinstetten, Germany; each bucket represents the signal intensity related to the spectral region), and MIR transmission spectra were converted into respective absorption spectra. Buckets of 1H NMR spectra, wavenumbers of MIR spectra, and IRMS data (δ13C, δ15N, and δ18O, given in ‰ on a δ scale) served as variables for multivariate data analysis. Multivariate data analysis was performed on the assumption of normally distributed data. For individual analysis of analytical data, PCA and LDA were carried out with SPSS statistics 21 (IBM Corporation, Armonk, NY). LDA was performed with equal “a priori” probabilities for all groups and a stepwise selection procedure23 (chosen method: minimization of Wilks’ lambda;24 selection criterion F statistics with p < 0.005 for inclusion and p > 0.010 for exclusion). During validations of LDA, instead of all variables, only the variables selected for LDA of all samples were taken into account for examination. PLS-DA was performed with Unscrambler X, version 10.0.1 (Camo Software AS, Oslo, Norway), using the nonlinear iterative partial least squares (NIPALS) algorithm. For combined analysis of several analytical methods, MATLAB 2015a (The Math Works, Natick, MA) and SAISIR package for MATLAB25 were used.

For variable selection of 1H NMR and MIR data, clustering of latent variables (CLV) was used.26,27 LDA and PLS-DA were applied to the concatenated spectroscopic data (1H NMR and MIR after CLV variable selection) and IRMS data (δ13C, δ15N, and δ18O). In this study, LDA was applied to the PCA scores, because the number of variables should not be too large.28 The best classification models were constructed when the inverse of the sum of squares (square of Euclidean distance) was used as the block scaling factor (i.e., after applying the block scaling factor, the total variance of each block equals 1). Furthermore, the multiblock method ComDim21,22 was performed on spectroscopic data (1H NMR and MIR after CLV variable selection) and IRMS data (δ13C, δ15N, and δ18O). For model evaluation, a test devised by Tóth et al.29 was performed, which provides information on the prediction performance of classification models by comparing the variance of classification models to the variance of their leave-one-out classification model counterparts using F statistics.
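The block scaling and concatenation described above can be illustrated with a minimal Python sketch. It is not the MATLAB/SAISIR code used in the study. The function names `block_scale` and `fuse_blocks` are hypothetical, the mean-centering step is an assumption, and the scaling factor is interpreted as acting on squared values, so each centered value is divided by the square root of the block's sum of squares and the block's total sum of squares becomes 1.

```python
import math


def block_scale(block):
    """Mean-center a block (list of sample rows) and scale it so that its
    total sum of squares equals 1. Centering is an assumption of this sketch."""
    n_rows, n_cols = len(block), len(block[0])
    means = [sum(row[j] for row in block) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in block]
    total_ss = sum(v * v for row in centered for v in row)
    if total_ss == 0.0:
        raise ValueError("block has no variance and cannot be scaled")
    scale = 1.0 / math.sqrt(total_ss)
    return [[v * scale for v in row] for row in centered]


def fuse_blocks(*blocks):
    """Scale each block separately, then concatenate the blocks sample by sample."""
    scaled = [block_scale(b) for b in blocks]
    n_rows = len(scaled[0])
    if any(len(b) != n_rows for b in scaled):
        raise ValueError("all blocks must contain the same samples")
    return [[v for b in scaled for v in b[i]] for i in range(n_rows)]


if __name__ == "__main__":
    # Made-up numbers: two spectral variables and one isotope ratio per sample.
    spectral_block = [[1.0, 2.0], [3.0, 6.0], [5.0, 1.0]]
    isotope_block = [[-30.0], [-28.0], [-29.0]]
    for fused_row in fuse_blocks(spectral_block, isotope_block):
        print(fused_row)
```

Scaling each block before concatenation keeps a block with many variables (such as spectral buckets) from dominating a block with only three variables (the isotope ratios) purely because of its size.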



RESULTS AND DISCUSSION

Overall, 205 tomato samples of nine different varieties were analyzed. Besides 1H NMR spectra that were recorded for each

Figure 1. Box plot of δ15N values of the aqueous tomato phase (expressed as ‰ versus atmospheric nitrogen) with regard to the cultivation method (organic, light gray; conventional, dark gray) for all tomato samples (on the left side) and tomato samples of individual greenhouses (on the right side). Each box is determined by the 25th and 75th percentiles, and each whisker is determined by the 5th and 95th percentiles.

tomato sample (n = 205), IRMS (n = 114) and MIR spectroscopy (n = 199) measurements were performed for a selection of tomato samples. In the following, the capabilities of individual methods and of combinations of these methods for the differentiation of organically and conventionally grown tomatoes will be described. To obtain an overview of the data structure, PCA was performed using individual data of 1H NMR and MIR spectroscopy and ComDim was applied to combined data of IRMS, 1H NMR, and MIR spectroscopy. Furthermore, for both data of individual methods (1H NMR and MIR) and combined methods (1H NMR + MIR, 1H NMR + IRMS, MIR + IRMS, and 1H NMR + MIR + IRMS), LDA and PLS-DA were tested for their ability to classify the cultivation method of tomatoes. Here, LDA classification models yielded validation results equivalent or superior to those of PLS-DA regarding the percentage of correct classifications, and LDA achieved consistently better comparability of results among different validation steps. Thus, for reasons of simplicity, only the outcomes of LDA are presented in the following.

Validation of Classification Models. The use of supervised classification methods such as LDA or PLS-DA always entails the risk of creating overfitted models, because aiming for an optimized classification can accidentally force the inclusion of insignificant variables. Such models indeed show

Figure 2. PCA of NMR data: (A) scatter plot of PC1 versus PC2 with square symbols for normally fruited tomato samples (Bocati, blue; Hamlet, red; Mecano, yellow; Savantas, cyan; Seviocard, purple; and Tica, pink) and triangular symbols for small fruited tomato samples (Sakura, yellow; Sunstream, light green; and Tastery, purple), (B) scatter plot of PC1 versus PC2 with square yellow symbols for normally fruited tomato samples and blue triangular symbols for small fruited tomato samples, and (C) scatter plot of PC1 versus PC5 with square symbols for normally fruited tomato samples and triangular symbols for small fruited tomato samples, which are colored light gray for organic cultivation methods and dark gray for conventional cultivation methods.


Table 3. Test Set Validation for LDA Using Data of 1H NMR, MIR, 1H NMR + MIR, 1H NMR + IRMS, MIR + IRMS, and 1H NMR + MIR + IRMS, Separately for Small and Normally Fruited Tomatoes with Each Random Validation and Validation of Individual Cultivars and Greenhouses (Correct Classifications in %)

                             validation   1H NMR   MIR    1H NMR   1H NMR   MIR      1H NMR + MIR
                             step                         + MIR    + IRMS   + IRMS   + IRMS
small fruited tomatoes       random       93.5     80.0   96.7     100.0    100.0    100.0
                             cultivar     91.3     82.2   100.0    100.0    83.3     100.0a
                             greenhouse   95.7     75.6   73.6     96.2     84.9     100.0
normally fruited tomatoes    random       100.0    91.7   94.4     95.0     100.0    95.0
                             cultivar     100.0    85.3   90.9     100.0    90.9     100.0a
                             greenhouse   99.1     83.5   72.9     94.9     88.1     98.3

a Cultivar validation for Mecano and Tastery was renounced as a result of an inappropriate ratio between the number of samples for validation and calibration; only the respectively remaining cultivars served as the test set for cultivar validation.

of cultivar and greenhouse cultivation. Because the tomato samples are not evenly distributed on all cultivars/greenhouses and especially Tastery/Mecano are over-represented as cultivars for small/normally fruited tomato samples, the use of these cultivars as the validation test would lead to a relatively small number of remaining calibration samples. Hence, for cultivar validation of fused data, Tastery/Mecano were not used as the cultivar validation test set, because the remaining calibration set would provide 8/11 tomato samples for calibration only, which is not appropriate. For all remaining validation steps, at a minimum, 42 tomato samples were available for calibration. Generally, this validation concept presents a stepwise approach. First, results of random validation indicate the basic ability to differentiate organically and conventionally grown tomatoes, and subsequently, cultivar and greenhouse validation verify if the classification ability is adequately resistant to compositional variations subject to specific cultivars or greenhouses. Thus, good results for random validation coincident with worse outcome for cultivar/greenhouse validation indicate overfitted classification models, while good and comparable results among all validation steps confirm suitability of the classification models. Furthermore, to verify if validations yield significant results, which are not based on random events of meaningless data, a randomization test was performed; variables (1H NMR and MIR data) were replaced by random vectors, and the validation results for classification models thereof were analyzed. Because the random probability of each tomato sample to be organic or conventional is 50%, an objective validation process of classification models based on random data is expected to achieve ca. 50% correct predictions. 
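The cultivar and greenhouse validation steps amount to leaving one complete group out at a time and pooling the predictions over all held-out samples. A minimal Python sketch of that bookkeeping follows. The names `leave_one_group_out`, `pooled_accuracy`, and the `classify` callable are hypothetical, the `min_calibration` threshold is an illustrative parameter for skipping groups that would leave too few calibration samples, and any classifier (LDA in the study) would have to be supplied separately.

```python
from collections import defaultdict


def leave_one_group_out(samples, key, min_calibration=1):
    """Yield (group, calibration, validation) with one whole group held out.

    samples: list of dicts; key: field naming the group, e.g. "cultivar".
    Groups that would leave fewer than min_calibration calibration samples
    are skipped (hypothetical threshold, chosen by the caller).
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key]].append(sample)
    for name, validation in groups.items():
        calibration = [s for s in samples if s[key] != name]
        if len(calibration) < min_calibration:
            continue
        yield name, calibration, validation


def pooled_accuracy(samples, key, classify, min_calibration=1):
    """Percentage of held-out samples whose label is predicted correctly.

    classify(calibration, sample) is a user-supplied function returning a label.
    """
    correct = total = 0
    for _, calibration, validation in leave_one_group_out(samples, key, min_calibration):
        for sample in validation:
            correct += int(classify(calibration, sample) == sample["label"])
            total += 1
    return 100.0 * correct / total if total else float("nan")


if __name__ == "__main__":
    # Made-up samples, only to show the splitting.
    demo = [
        {"cultivar": "A", "label": "organic"},
        {"cultivar": "A", "label": "conventional"},
        {"cultivar": "B", "label": "organic"},
    ]
    for name, calibration, validation in leave_one_group_out(demo, "cultivar"):
        print(name, len(calibration), len(validation))
```

Calling the same functions with `key="greenhouse"` would give the greenhouse validation under the same assumptions.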
In accordance with this, on average 52 ± 7 and 46 ± 7% correct classifications (average of random/cultivar/greenhouse validations) were achieved for the randomization test of LDA classification models for 1H NMR and MIR data, confirming the informative value of the validation approach.

IRMS. Overall, 114 tomato samples were analyzed by IRMS regarding δ15N values of the dry residues of tomatoes. The isotope composition of nitrogen in the applied fertilizers predefines the isotope composition of nitrogen of the fertilized tomatoes, and consequently, higher δ15N values of organic fertilizers12 lead to higher δ15N values of organically produced tomatoes.11 One exception to be mentioned is the use of legumes, which is legitimate in organic fertilization. Legumes can metabolize atmospheric nitrogen (δ15N value around 0‰) to plant-accessible nitrogenous molecules, which leads to noticeably low δ15N values and, thus, hampers differentiation from conventionally grown crops.12 Our results

good classification abilities for model samples but fail in the classification of further samples. Hence, suitable validation studies of classification models are highly important30 to consider both the suitability of multivariate analysis and the representative nature of model samples. With regard to the approach to differentiate organically and conventionally grown tomatoes, several critical influencing factors have to be considered during validation. As a natural product, the composition of tomatoes is subject to unavoidable natural variations, which complicate the aim to designate their cultivation method. To evaluate the influence of natural compositional fluctuations, one-third of the samples was randomly excluded from the creation of LDA and PLS-DA models and used as an independent test set for validation (this validation procedure will be referred to as “random validation” in the following text). However, because tomato samples were collected repeatedly at different harvesting times, randomly selected validation test sets comprise tomato samples of the same cultivars grown in the same greenhouses but simply harvested at another point in time than samples of the calibration set. Thus, although good results for random validation indicate differentiability of organically and conventionally grown tomatoes, the practicality of these classification models for further tomato samples of another cultivar or from another greenhouse (with specific implementations of cultivation) is questionable. For instance, the previous results on the differentiation of organically produced tomatoes using 1H NMR showed that the differentiation between two greenhouses with different growing conditions works well but does not prove to be useful for the classification of tomato samples from different greenhouses, despite basically comparable growing conditions, because the model is overfitted, taking into account two greenhouses only.13 Hence, further validation steps were performed.
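The random validation step described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `random_validation_split`, the fixed `seed`, and the rounding of the test-set size are assumptions and are not taken from the study.

```python
import random


def random_validation_split(samples, validation_fraction=1 / 3, seed=0):
    """Randomly hold out a fraction of the samples as an independent test set.

    Returns (calibration, validation). The seed only makes the split repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


if __name__ == "__main__":
    # Sample identifiers are placeholders.
    calibration, validation = random_validation_split(range(93))
    print(len(calibration), len(validation))
```

Because the split ignores cultivar and greenhouse, samples from the same group can end up on both sides, which is the limitation that motivates the further validation steps.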
Complete test sets of individual cultivars and individual greenhouses were specifically excluded from model calibration and used as the test set, to assess the quality of classifications for tomatoes whose cultivar or specific cultivation method was not taken into account for calibration. Validation test sets consisted of each group of individual cultivars/greenhouses, consecutively, and calibration samples were formed by tomato samples of the remaining cultivars/greenhouses. In this procedure, each tomato sample was excluded once for the corresponding cultivar and once for the greenhouse group, and the average result of all tomato samples yielded the results termed cultivar and greenhouse validation. Table 2 illustrates the respective number of calibration and validation samples for classification models of 1H NMR, MIR, and data fusions (indicated from left to right) during the steps


overlapping region existed in the range from 2 to 4‰. This overlap is mainly due to the use of green manures by the greenhouses N organic and BA organic, while HD organic only applied horn shavings and vinasse as fertilizers and yielded accordingly high δ15N values that were clearly separated from the δ15N range of conventionally grown tomatoes. Besides δ15N values, IRMS included the determination of δ13C (of the dry residue of tomatoes) and δ18O (of the aqueous tomato phase), but these are less relevant in view of the growing regime. δ13C (averaging −30.2 ± 3.5‰ versus VPDB) indicates greenhouse cultivation as a result of strikingly negative values of δ13C caused by supplementation with CO2 from heating with CH4, and δ18O (averaging −4.4 ± 1.5‰ versus VSMOW2) depends upon the source of water.31

1H NMR Spectroscopy. For each tomato sample (n = 205), a 1H NMR spectrum of the aqueous phase was acquired. 1H NMR spectra provided broad information about sugars, organic acids, amino acids, and further minor components at the same time,13 and hence, 1H NMR is an accordingly useful source of data for tomato profiling. To reduce the dimension of data, 1H NMR spectra were transformed into buckets by bundling spectral regions of 0.02 ppm.

PCA. PCA of buckets was performed to obtain an overview of the data clustering. Mean-centered and standardized buckets were used for analysis, because varying concentrations of ingredients resulted in signal intensities that differed highly in scale. The scatter plot of PC1 versus PC2 (Figure 2A) demonstrates that the data clustered mainly according to the cultivar type, and especially data clouds of normally fruited varieties were separated from small fruited tomato varieties (Figure 2B).
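The bucketing and the mean-centering/standardization applied before PCA can be sketched as follows. This is a simplified Python illustration, not the Amix procedure: the function names `bucket_spectrum` and `autoscale` are hypothetical, intensities are simply summed per 0.02 ppm bucket, and how the original software treats bucket edges and excluded regions is not stated in the text, so that handling is an assumption. The excluded ppm regions are those given in the methods.

```python
import math

# Excluded regions (ppm): residual water and residual ethanol signals.
EXCLUDED_PPM = ((4.67, 4.85), (3.60, 3.70), (1.14, 1.22))


def bucket_spectrum(ppm, intensity, width=0.02, lo=0.0, hi=10.0, excluded=EXCLUDED_PPM):
    """Sum intensities into fixed-width buckets between lo and hi (ppm).

    Points inside an excluded region are dropped, so those buckets stay at 0.
    """
    n_buckets = round((hi - lo) / width)
    buckets = [0.0] * n_buckets
    for shift, value in zip(ppm, intensity):
        if not lo <= shift < hi:
            continue
        if any(start <= shift <= end for start, end in excluded):
            continue
        index = min(int((shift - lo) / width), n_buckets - 1)
        buckets[index] += value
    return buckets


def autoscale(rows):
    """Mean-center and standardize each variable (column) across samples.

    Needs at least two samples; constant columns are set to 0.
    """
    n = len(rows)
    scaled_columns = []
    for column in zip(*rows):
        mean = sum(column) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
        scaled_columns.append([(v - mean) / std if std > 0 else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]


if __name__ == "__main__":
    # Made-up data points, only to show the call pattern.
    buckets = bucket_spectrum([0.005, 0.015, 0.025, 4.70], [1.0, 2.0, 4.0, 100.0])
    print(len(buckets), buckets[:2])
```

Standardizing each bucket gives weak and strong signals the same weight in PCA, which matches the stated reason for scaling: signal intensities differed highly in scale.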
Actually, the values of PC1 seem to be predefined by the dry mass of tomatoes, because PC1 was highly correlated with the total spectral intensity (R = 0.967; total spectral intensity was calculated as the sum of all buckets from 0 to 10 ppm, excluding ranges of water and ethanol resonance signals). However, a trend for the separation of respective organically and conventionally grown tomatoes was also achieved along PC5 with significantly higher values for the group of conventionally grown tomatoes (t test: p < 0.001; Figure 2C), but the overlapping data clouds did not enable obvious differentiation. LDA. Hence, the supervised classification algorithm of LDA was used for further examinations. PCA showed that the main variance in NMR data is given by the total spectral intensity, which is probably due to varying dry masses of tomatoes. To reduce this effect, buckets were transformed into their relative values referred to the total spectral intensity (sum of all buckets from 0 to 10 ppm, excluding ranges of water and ethanol resonance signals) prior to LDA. Moreover, because PCA revealed wide differences between normally and small fruited tomatoes, classification models were built for all tomatoes as well as for normally and small fruited tomatoes individually. The Tóth test29 thereof suggested better prediction performances for separate classification models for the groups of normally and small fruited tomatoes (p = 0.48 and 0.20, respectively) than for one overall classification model including all tomato samples (p < 0.05). Thus, separate classification models were used for further examinations. For both classification models of normally and small fruited tomato samples, the individual validation steps showed comparable outcome, and thus, no indication for overfitted models is given (Table 3). In comparison of LDA classification results for normally and small fruited tomatoes among each

Figure 3. Scatter plot of PC1 versus PC2 for PCA of MIR data: (A) square symbols for normally fruited tomato samples (Bocati, blue; Hamlet, red; Mecano, yellow; Savantas, cyan; Seviocard, purple; and Tica, pink) and triangular symbols for small fruited tomato samples (Sakura, yellow; Sunstream, light green; and Tastery, purple), (B) square symbols for normally fruited tomato samples (colored light gray for organic cultivation methods and dark gray for conventional cultivation methods) and triangular colorless symbols for small fruited tomato samples, and (C) square colorless symbols for normally fruited tomato samples and triangular symbols (colored light gray for organic cultivation methods and dark gray for conventional cultivation methods) for small fruited tomato samples.

of IRMS totally comply with these findings (Figure 1). The δ15N value averaged significantly higher results for organically grown tomatoes than conventionally grown tomatoes, but an

DOI: 10.1021/acs.jafc.5b03853 J. Agric. Food Chem. 2015, 63, 9666−9675

Article

Journal of Agricultural and Food Chemistry

Figure 4. Panels A−D are illustrated for normally fruited tomatoes (on the left side) and small fruited tomatoes (on the right side), with square symbols for normally fruited tomato samples and triangular symbols for small fruited tomato samples, each colored light gray for organic cultivation methods and dark gray for conventional cultivation methods: (A) three-dimensional plot of the first three dimensions of ComDim analysis for 1H NMR + MIR + IRMS data, (B) PCA scatter plot (PC1 versus PC5) for 1H NMR data, (C) PCA scatter plot (PC1 versus PC2) for MIR data, and (D) box plot of δ15N of the aqueous tomato phase (expressed as ‰ versus atmospheric nitrogen).

samples is possibly more representative, because six different cultivars were included in comparison to only three different varieties of small fruited tomatoes.

MIR Spectroscopy. MIR spectroscopy is a powerful tool for food analysis, offering simple sample preparation and rapid analysis.³² It can be used for authenticity analysis³³ as well as for quantification purposes after adequate calibration with samples of known composition.³⁴,³⁵ To test the suitability of MIR for differentiating tomatoes of different cultivation methods, the aqueous phase of tomatoes was analyzed by means of MIR spectroscopy. Overall, 199 tomato samples were measured using MIR, and spectra were analyzed using PCA and LDA.

PCA. Mean-centered data of MIR absorption spectra were used for examination. Just as for 1H NMR, PCA of MIR spectra mainly revealed the separation of cultivars (Figure 3A), especially of normally and small fruited tomato samples along PC1 (panels A and B of Figure 3). However, in between these

Figure 5. Salience of 1H NMR, MIR, and IRMS data on the first three dimensions of ComDim analysis for normally fruited tomatoes (on the left side) and small fruited tomatoes (on the right side).

other, normally fruited tomatoes yielded better validation results with 100% correct classifications for cultivar and 99.1% for greenhouse validation compared to 91.3% for cultivar and 95.7% for greenhouse validation of small fruited tomatoes. With regard to cultivar validation, the model for normally fruited


dimensions (D1, D2, and D3) of ComDim analysis. For both normally and small fruited tomatoes, MIR data are dominant for D1, IRMS data are dominant for D2, and 1H NMR data are dominant for D3 (Figure 5), and thus, each analytical method considerably influenced the results of ComDim.

LDA of Concatenated Data. Concatenation of several data matrices is the simplest way of data fusion and was applied to combine data of 1H NMR + MIR, 1H NMR + IRMS, MIR + IRMS, and 1H NMR + MIR + IRMS. The quality of each data combination for a classification of the cultivation method (using LDA and PLS-DA) was again assessed by test set validation (Table 3). For the sake of comparability, data of the same tomato samples (n = 112) were used for all combinations of data, even if generally more samples were available for individual combinations. In comparison of different data fusion models for the classification quality of LDA, the best results were achieved for 1H NMR + MIR + IRMS with 100% correct classifications for random, cultivar, and greenhouse validation for small fruited tomatoes and 95.0% for random, 100% for cultivar, and 98.3% for greenhouse validation for normally fruited tomatoes. The second-best LDA validation results were obtained for the combination of 1H NMR + IRMS (94.9−100.0%), while results for 1H NMR + MIR (72.9−100%) and MIR + IRMS (83.3−100%) differ occasionally for individual validations but are comparable in view of the average quality of all results.

Comparison of Results for Fused Data and Individual Analysis. In comparison of classification results of concatenated data to findings of individual methods, LDA validation of 1H NMR + MIR + IRMS (95.0−100%) yielded better results than LDA validation of individual methods (1H NMR, 91.3−100%; MIR, 75.6−91.7% according to the data presented by Table 3). Hence, this supports the approach to combine these methods to achieve synergies for an optimized differentiation of organically and conventionally grown tomatoes.
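The concatenation step described above can be sketched in a few lines. The following is only an illustration under stated assumptions, not the authors' implementation: the function names `concatenate_blocks` and `hold_out_group` are hypothetical, the numbers are invented toy values, and the hold-out split assumes that cultivar validation means withholding every sample of one cultivar from model building. In the study itself, the 1H NMR and MIR variables were first reduced by CLV before fusion, which is not shown here.

```python
from typing import Dict, List, Sequence, Tuple

Row = List[float]


def concatenate_blocks(blocks: Dict[str, Sequence[Row]], order: Sequence[str]) -> List[Row]:
    """Join the per-sample rows of several data blocks into one fused row per sample."""
    sample_counts = {len(blocks[name]) for name in order}
    if len(sample_counts) != 1:
        raise ValueError("all blocks must describe the same samples")
    fused = []
    # zip(*) walks through the blocks sample by sample, in the given block order.
    for rows in zip(*(blocks[name] for name in order)):
        fused.append([value for row in rows for value in row])
    return fused


def hold_out_group(groups: Sequence[str], held_out: str) -> Tuple[List[int], List[int]]:
    """Return (train, test) indices; the test set holds every sample of one group."""
    train = [i for i, g in enumerate(groups) if g != held_out]
    test = [i for i, g in enumerate(groups) if g == held_out]
    return train, test


# Invented toy values for three samples (not data from the study).
blocks = {
    "nmr": [[0.20, 0.30], [0.25, 0.35], [0.10, 0.40]],   # reduced NMR variables
    "mir": [[1.1], [1.3], [0.9]],                        # reduced MIR variables
    "irms": [[-27.1, 5.2, 1.0], [-26.8, 1.1, 0.7], [-27.5, 6.0, 1.2]],  # d13C, d15N, d18O
}
cultivars = ["Hamlet", "Hamlet", "Tica"]

fused = concatenate_blocks(blocks, ["nmr", "mir", "irms"])
train_idx, test_idx = hold_out_group(cultivars, "Tica")
```

Each fused row can then be passed to any classifier such as LDA. Keeping the split at the level of whole cultivars (or greenhouses) prevents samples from the same group from appearing in both the training and the test set.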
With regard to single analytical methods, the classification results of 1H NMR models are especially promising. However, the quality of 1H NMR models crucially depends upon how representative the model samples are, to avoid overfitting, and generally, more tomato samples differing in cultivar and specific growing conditions need to be measured to further enhance the significance of results. Within the limits of the currently available sample set, test set validation was chosen as the best way to obtain a realistic estimate of the quality of results. In the future, if the database of authentic tomato samples is sufficiently widened, enhanced chemometric classification models can be used as a helpful screening tool to investigate the authenticity of tomatoes. Moreover, additional measurements of MIR and IRMS analyses can improve classification results of individual 1H NMR analysis.

groups, conventionally produced tomatoes yielded significantly higher values for PC1 than organically produced tomatoes (each t test: p < 0.001) but with overlapping regions (panels B and C of Figure 3).

LDA. Classification models were created for all tomato samples as well as for the groups of normally and small fruited tomato samples individually. The Tóth test²⁹ indicated adequate prediction performances of each classification model (p = 0.33, 0.24, and 0.38 for all, normally, and small fruited tomato samples, respectively). For comparability with the classification results of 1H NMR data, individual classification models for normally and small fruited tomatoes were used for further examinations. LDA of MIR data showed quite comparable results for random validation and cultivar/greenhouse validation, with differences between classification results of at most 8.2% (91.7% for random and 83.5% for greenhouse validation of normally fruited samples; Table 3). Hence, no evidence for overfitting of LDA classification models is given. In comparison of LDA classification findings for normally and small fruited samples, results for normally fruited tomato samples (83.5−91.7%) were always better than those for small fruited tomato samples (75.6−82.2%). Overall, LDA classification results for MIR ranged between 75.6 and 91.7% and, thus, are inferior to 1H NMR results (ranging from 91.3 to 100% according to the data presented by Table 3).

Data Fusion of 1H NMR, MIR, and IRMS. Finally, the differentiation of organically and conventionally grown tomatoes was analyzed by fusing data of individual methods (1H NMR, MIR, and IRMS). For this purpose, normalized 1H NMR data were used, scaled to the total spectral intensity (total spectral intensity was calculated as the sum of all buckets from 0 to 10 ppm, excluding ranges of water and ethanol resonance signals).
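The scaling to total spectral intensity described above can be illustrated with a short sketch. This is a minimal example under stated assumptions rather than the procedure actually used: the function name `normalize_to_total_intensity` is hypothetical, the excluded ppm windows for water and ethanol are placeholder values (the exact ranges are not given in this section), and the bucket intensities are invented.

```python
from typing import List, Sequence, Tuple


def normalize_to_total_intensity(
    ppm: Sequence[float],
    buckets: Sequence[float],
    excluded: Sequence[Tuple[float, float]],
) -> List[float]:
    """Express each retained bucket as a fraction of the summed retained intensity."""

    def keep(shift: float) -> bool:
        # Retain buckets inside 0 to 10 ppm that lie outside every excluded window.
        in_window = 0.0 <= shift <= 10.0
        in_excluded = any(lo <= shift <= hi for lo, hi in excluded)
        return in_window and not in_excluded

    total = sum(v for s, v in zip(ppm, buckets) if keep(s))
    if total == 0:
        raise ValueError("total spectral intensity is zero")
    # Excluded buckets are set to 0.0 so the output keeps the input length.
    return [v / total if keep(s) else 0.0 for s, v in zip(ppm, buckets)]


# Assumed exclusion windows in ppm (placeholders, not taken from the source).
EXCLUDED = [(4.7, 5.0), (1.1, 1.25), (3.6, 3.7)]

ppm = [0.9, 1.2, 2.0, 4.8, 7.0]             # bucket centers (invented)
spectrum = [2.0, 50.0, 3.0, 900.0, 5.0]     # bucket intensities (invented)
relative = normalize_to_total_intensity(ppm, spectrum, EXCLUDED)
```

After this step the retained buckets of every spectrum sum to 1, so differences in overall signal strength, attributed in the text to varying dry mass, no longer dominate the variance.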
Because spectroscopic data naturally comprise a high number of variables, in contrast to IRMS with only three variables (δ13C, δ15N, and δ18O), the number of variables of 1H NMR and MIR was reduced prior to data fusion using CLV. The CLV method involves two stages, namely, a hierarchical clustering analysis, followed by a partitioning algorithm. Partitioning is determined by the value of a quality criterion (T), which is the sum of the first eigenvalues of the data matrices of each cluster.²⁶,²⁷

ComDim Analysis. To obtain an overview of the sample grouping regarding data of all analytical methods, ComDim analysis was performed for data of 1H NMR + MIR + IRMS, separately for the groups of normally and small fruited tomatoes. The basic idea of ComDim is the creation of one common space of common components of several variable blocks available for the same samples,²¹,²² which are the variables of several analytical methods in this case. Figure 4 illustrates the results of ComDim analysis (Figure 4A) compared to respective individual results of PCA for 1H NMR (Figure 4B) and MIR (Figure 4C) data as well as the range of δ15N (Figure 4D), separately for normally and small fruited tomato samples. In comparison to PCA of individual methods, ComDim clearly shows an increased separation trend of data points according to the cultivation method. A major advantage of ComDim analysis compared to PCA on concatenated data is that ComDim provides information about the relationship of individual variable blocks and their selectivity on the total variance of common components.³⁶ Figure 5 demonstrates the specific weight (or salience) of 1H NMR, MIR, and IRMS associated with the first three common



AUTHOR INFORMATION

Corresponding Author

*Telephone: +49-9131-68087151. Fax: +49-9131-68087210. Email: [email protected].

Funding

This research project was funded by the Bavarian State Ministry of the Environment and Consumer Protection, and Yulia Monakhova acknowledges funding in the framework of the State Contract 4.1708.2014K of the Russian Ministry of Education.


Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

Special thanks are due to colleagues from the Bavarian State Research Institute of Viticulture and Horticulture (LWG, Bamberg, Germany) and the State Horticultural College and Research Institute Heidelberg (LVG, Heidelberg, Germany) and to producers of the region “Knoblauchsland” for providing authentic tomato samples.

REFERENCES

(1) Committee on the Environment, Public Health and Food Safety, European Parliament. Draft Report on the Food Crisis, Fraud in the Food Chain and the Control Thereof (2013/2091(INI)); http://www.europarl.europa.eu/sides/getDoc.do?pubRef=-//EP//NONSGML+COMPARL+PE-519.759+02+DOC+PDF+V0//EN&language=EN (accessed Aug 5, 2015).
(2) Sahota, A. Global Market for Organic Food & Drink. The World of Organic Agriculture, Statistics and Emerging Trends 2013; https://www.fibl.org/fileadmin/documents/shop/1606-organic-world-2013.pdf (accessed Aug 5, 2015).
(3) Caris-Veyrat, C.; Amiot, M. J.; Tyssandier, V.; Grasselly, D.; Buret, M.; Mikolajczak, M.; Guilland, J. C.; Bouteloup-Demange, C.; Borel, P. Influence of organic versus conventional agricultural practice on the antioxidant microconstituent content of tomatoes and derived purees; Consequences on antioxidant plasma status in humans. J. Agric. Food Chem. 2004, 52, 6503−6509.
(4) Bundesanstalt für Landwirtschaft und Ernährung. 20,6 kg pro Kopf verzehrt: Tomaten sind der Deutschen liebstes Gemüse; http://www.ble.de/SharedDocs/Downloads/08_Service/04_Pressemitteilungen/Archiv2013/130709_Tomaten.pdf;jsessionid=F8552452F0D99F07C45DD6E21B128375.1_cid335?__blob=publicationFile (accessed Aug 5, 2015).
(5) Mitchell, A. E.; Hong, Y. J.; Koh, E.; Barrett, D. M.; Bryant, D. E.; Denison, R. F.; Kaffka, S. Ten-year comparison of the influence of organic and conventional crop management practices on the content of flavonoids in tomatoes. J. Agric. Food Chem. 2007, 55, 6154−6159.
(6) Vallverdú-Queralt, A.; Medina-Remón, A.; Casals-Ribes, I.; Amat, M.; Lamuela-Raventós, R. M. A Metabolomic Approach Differentiates between Conventional and Organic Ketchups. J. Agric. Food Chem. 2011, 59, 11703−11710.
(7) Vallverdú-Queralt, A.; Medina-Remón, A.; Casals-Ribes, I.; Lamuela-Raventós, R. M. Is there any difference between the phenolic content of organic and conventional tomato juice? Food Chem. 2012, 130, 222−227.
(8) Kelly, S. D.; Bateman, A. S. Comparison of mineral concentrations in commercially grown organic and conventional crops - Tomatoes (Lycopersicon esculentum) and lettuces (Lactuca sativa). Food Chem. 2010, 119, 738−745.
(9) Gosling, P.; Hodge, A.; Goodlass, G.; Bending, G. D. Arbuscular mycorrhizal fungi and organic farming. Agric., Ecosyst. Environ. 2006, 113, 17−35.
(10) Bateman, A. S.; Kelly, S. D.; Jickells, T. D. Nitrogen isotope relationships between crops and fertilizer: implications for using nitrogen isotope analysis as an indicator of agricultural regime. J. Agric. Food Chem. 2005, 53, 5760−5765.
(11) Bateman, A. S.; Kelly, S. D.; Woolfe, M. Nitrogen isotope composition of organically and conventionally grown crops. J. Agric. Food Chem. 2007, 55, 2664−2670.
(12) Rogers, K. M. Nitrogen isotopes as a screening tool to determine the growing regimen of some organic and nonorganic supermarket produce from New Zealand. J. Agric. Food Chem. 2008, 56, 4078−4083.
(13) Hohmann, M.; Christoph, N.; Wachter, H.; Holzgrabe, U. 1H NMR profiling as an approach to differentiate conventionally and organically grown tomatoes. J. Agric. Food Chem. 2014, 62, 8530−8540.
(14) Erich, S.; Schill, S.; Annweiler, E.; Waiblinger, H. U.; Kuballa, T.; Lachenmeier, D. W.; Monakhova, Y. B. Combined chemometric analysis of 1H NMR, 13C NMR and stable isotope data to differentiate organic and conventional milk. Food Chem. 2015, 188, 1−7.
(15) Monakhova, Y. B.; Godelmann, R.; Hermann, A.; Kuballa, T.; Cannet, C.; Schäfer, H.; Spraul, M.; Rutledge, D. N. Synergistic effect of the simultaneous chemometric analysis of 1H NMR spectroscopic and stable isotope (SNIF-NMR, 18O, 13C) data: Application to wine analysis. Anal. Chim. Acta 2014, 833, 29−39.
(16) Di Anibal, C. V.; Callao, M. P.; Ruisánchez, I. 1H NMR and UV-visible data fusion for determining Sudan dyes in culinary spices. Talanta 2011, 84, 829−833.
(17) MacGregor, J. F.; Jaeckle, C.; Kiparissides, C.; Koutoudi, M. Process monitoring and diagnosis by multiblock PLS methods. AIChE J. 1994, 40, 826−838.
(18) Westerhuis, J. A.; Smilde, A. K. Deflation in multiblock PLS. J. Chemom. 2001, 15, 485−493.
(19) Cozzolino, D.; Holdstock, M.; Dambergs, R. G.; Cynkar, W. U.; Smith, P. A. Mid infrared spectroscopy and multivariate analysis: A tool to discriminate between organic and non-organic wines grown in Australia. Food Chem. 2009, 116, 761−765.
(20) Qannari, E. M.; Wakeling, I.; Courcoux, P.; MacFie, H. J. H. Defining the underlying sensory dimensions. Food Qual. Prefer. 2000, 11, 151−154.
(21) Qannari, E. M.; Wakeling, I.; MacFie, H. J. H. A hierarchy of models for analysing sensory data. Food Qual. Prefer. 1995, 6, 309−314.
(22) Drinkwater, L. E.; Letourneau, D. K.; Workneh, F.; van Bruggen, A. H. C.; Shennan, C. Fundamental differences between conventional and organic tomato agroecosystems in California. Ecol. Appl. 1995, 5, 1098−1112.
(23) Flury, B.; Riedwyl, H. Angewandte Multivariate Statistik, 1st ed.; Gustav Fischer Verlag: Stuttgart, Germany, 1983.
(24) Marini, F.; Magrì, A. L.; Balestrieri, F.; Fabretti, F.; Marini, D. Supervised pattern recognition applied to the discrimination of the floral origin of six types of Italian honey samples. Anal. Chim. Acta 2004, 515, 117−125.
(25) Cordella, C.; Bertrand, D. SAISIR: A new general chemometric toolbox. TrAC, Trends Anal. Chem. 2014, 54, 75−82.
(26) Vigneau, E.; Qannari, E. M. Clustering of Variables Around Latent Components. Commun. Stat. Simul. Comput. 2003, 32, 1131−1150.
(27) Cuny, M.; Vigneau, E.; Le Gall, G.; Colquhoun, I.; Lees, M.; Rutledge, D. N. Fruit juice authentication by 1H NMR spectroscopy in combination with different chemometrics tools. Anal. Bioanal. Chem. 2008, 390, 419−427.
(28) Monakhova, Y. B.; Godelmann, R.; Kuballa, T.; Mushtakova, S. P.; Rutledge, D. N. Independent components analysis to increase efficiency of discriminant analysis methods (FDA and LDA): Application to NMR fingerprinting of wine. Talanta 2015, 141, 60−65.
(29) Tóth, G.; Bodai, Z.; Heberger, K. Estimation of influential points in any data set from coefficient of determination and its leave-one-out cross-validated counterpart. J. Comput.-Aided Mol. Des. 2013, 27, 837−844.
(30) Riedl, J.; Esslinger, S.; Fauhl-Hassek, C. Review of validation and reporting of non-targeted fingerprinting approaches for food authentication. Anal. Chim. Acta 2015, 885, 17−32.
(31) Schmidt, H. L.; Roßmann, A.; Voerkelius, S.; Schnitzler, W. H.; Georgi, M.; Grassmann, J.; Zimmermann, G.; Winkler, R. Isotope characteristics of vegetables and wheat from conventional and organic production. Isot. Environ. Health Stud. 2005, 41, 223−238.
(32) Vandevoort, F. R. Fourier transform infrared spectroscopy applied to food analysis. Food Res. Int. 1992, 25, 397−403.
(33) Cozzolino, D.; Smyth, H. E.; Gishen, M. Feasibility study on the use of visible and near-infrared spectroscopy together with chemometrics to discriminate between commercial white wines of different varietal origins. J. Agric. Food Chem. 2003, 51, 7703−7708.
(34) Bauer, R.; Nieuwoudt, H.; Bauer, F. F.; Kossmann, J.; Koch, K. R.; Esbensen, K. H. FTIR spectroscopy for grape and wine analysis. Anal. Chem. 2008, 80, 1371−1379.
(35) Lachenmeier, D. W. Rapid quality control of spirit drinks and beer using multivariate data analysis of Fourier transform infrared spectra. Food Chem. 2007, 101, 825−832.
(36) Mazerolles, G.; Hanafi, M.; Dufour, E.; Bertrand, D.; Qannari, E. M. Common components and specific weights analysis: a chemometric method for dealing with complexity of food products. Chemom. Intell. Lab. Syst. 2006, 81, 41−49.