
Article

Differentiation of organically and conventionally grown tomatoes by chemometric analysis of combined data from 1H NMR- and MIR-spectroscopy and stable isotope analysis

Monika Hohmann, Yulia Monakhova, Sarah Erich, Norbert Christoph, Helmut Wachter, and Ulrike Holzgrabe

J. Agric. Food Chem., Just Accepted Manuscript • DOI: 10.1021/acs.jafc.5b03853 • Publication Date (Web): 12 Oct 2015


Journal of Agricultural and Food Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Agricultural and Food Chemistry

Monika Hohmann1,2, Yulia Monakhova3,4, Sarah Erich5, Norbert Christoph2, Helmut Wachter2*, Ulrike Holzgrabe1

1 Institute of Pharmacy and Food Chemistry, University of Würzburg, Am Hubland, 97074 Würzburg, Germany
2 Bavarian Health and Food Safety Authority, Luitpoldstraße 1, 97082 Würzburg, Germany
3 Spectral Service, Emil-Hoffmann-Straße 33, 50996 Cologne, Germany
4 Department of Chemistry, Saratov State University, Astrakhanskaya Street 83, 410012 Saratov, Russia
5 Chemical and Veterinary Investigation Laboratory, Bissierstraße 5, 79114 Freiburg, Germany

* Corresponding author: Helmut Wachter
Phone: +49 9131 68087151
Fax: +49 9131 68087210
Email: [email protected]

1 ACS Paragon Plus Environment


Abstract

Since the basic suitability of proton nuclear magnetic resonance spectroscopy (1H NMR) for differentiating organic vs. conventional tomatoes was recently proven, an approach to optimize 1H NMR classification models (comprising a total of 205 authentic tomato samples) by including additional data from isotope ratio mass spectrometry (IRMS; δ13C, δ15N, δ18O) and Mid Infrared spectroscopy (MIR) was assessed. Both individual and combined analytical methods (1H NMR + MIR, 1H NMR + IRMS, MIR + IRMS, 1H NMR + MIR + IRMS) were examined using Principal Component Analysis (PCA), Partial Least Squares – Discriminant Analysis (PLS-DA), Linear Discriminant Analysis (LDA) and Common Components and Specific Weight Analysis (ComDim). Regarding classification ability, fused data of 1H NMR + MIR + IRMS yielded better validation results (between 95.0% and 100.0%) than the individual methods (1H NMR: 91.3%-100%, MIR: 75.6%-91.7%), suggesting that the combined examination of analytical profiles enhances the authentication of organically produced tomatoes.

Keywords: organic tomatoes, 1H NMR, MIR, IRMS, chemometrics, data fusion


Introduction

The Committee on the Environment, Public Health and Food Safety of the European Parliament recently published a draft report “on the food crisis, fraud in the food chain and the control thereof”, in which organic food is listed third among the top ten products at particularly high risk of adulteration.1 This ranking reflects the increasing demand for organic food2 combined with consumers' willingness to pay higher prices for organically than for comparable conventionally produced food. Verifying the authenticity of organic products is therefore of decisive importance to protect consumers against adulteration and to support the trustworthiness of organic labelling.

This study discusses the use of sophisticated chemometric methods for differentiating organic and conventional food, exemplified by tomatoes. Tomatoes and tomato products are consumed on a large scale in Europe3 and are at present the most popular vegetable in Germany, with an average annual consumption of 20.6 kg per person.4 Reliable markers to analytically verify the cultivation method of tomatoes are scarcely available, although numerous attempts have been described in the literature.5-11 Up to now, the nitrogen isotope composition (δ15N, expressed as the relative deviation from the atmospheric nitrogen standard) has been the most important marker for distinguishing organically and conventionally produced tomatoes, but because the value ranges of the two groups overlap, the cultivation method cannot be assigned in every case.12

We recently described 1H NMR profiling for the authentication of organically produced tomatoes, and the results confirmed its suitability, provided that an appropriate database of authentic tomatoes is available.13 When developing new methods for the authentication of organically produced tomatoes, the analytical methods already available should not be disregarded; rather, the potential of different techniques should be combined to achieve synergies. This approach has proven highly useful for differentiating organically and conventionally produced milk by combining 1H NMR and 13C NMR spectra with stable isotope ratios and fatty acid composition,14 for verifying the variety and origin of wines by combining 1H NMR data and stable isotope ratios,15 and for determining Sudan dyes in spices by combining 1H NMR and UV/vis data.16 Combined multivariate examination of the results from different analytical methods can be performed either by simply concatenating data matrices or by using multiblock methods,17 which facilitate the interpretation of the models and of their reliability with respect to the targeted goal.18

Hence, with the aim of developing an optimized analytical approach to verify the authenticity of organically produced tomatoes, several analytical methods were combined: isotope ratio mass spectrometry (IRMS, determining δ13C, δ15N and δ18O), with δ15N as currently the most reliable marker; 1H NMR spectroscopy, which we recently showed to be useful;13 and, additionally, Mid InfraRed spectroscopy (MIR), which proved helpful for differentiating organically and conventionally produced wines.19 The individual suitability of each analytical method for differentiating organically and conventionally grown tomatoes was analyzed using Principal Component Analysis (PCA), Partial Least Squares – Discriminant Analysis (PLS-DA) and Linear Discriminant Analysis (LDA). Furthermore, LDA and PLS-DA (using concatenated data, after variable selection for the spectroscopic data) and Common Components and Specific Weight Analysis (ComDim)20,21 were applied to the combined data of 1H NMR spectroscopy, MIR spectroscopy and IRMS.

However, organic and conventional farming cannot be seen as black-and-white categories, given the various possible implementations of each cultivation method.22 Moreover, establishing a database of authentic tomato samples that reflects all conceivable ways of farming is correspondingly challenging and almost impossible. Nevertheless, analysing test sets of authentically grown tomatoes provides an estimate of the classification power of individual analytical methods for differentiating organically and conventionally grown tomatoes, and the future applicability of the classification models can subsequently be assessed by validation studies. Therefore, in this study, authentic tomatoes were grown conventionally using hydroponic culture and mineral fertilizer and organically using soil and different organic fertilizers, both in greenhouses to keep the influence of the weather to a minimum. These cultivation trials by no means represent all variations in farming conditions, but serve as a starting point for verifying in general the capability of analytical methods to differentiate tomatoes by cultivation method.

Materials and Methods

Chemicals

NaOH pellets (for 1 M NaOH) were purchased from VWR (Leuven, Belgium) and HCl (37%, for 1 M HCl) from Sigma Aldrich (Saint Louis, USA). TSPd4 (3-(trimethylsilyl)propionic acid-d4 sodium salt, 98 atom% D), D2O (99.9 atom% D), EDTA (ethylenediaminetetraacetic acid) and NaN3 were purchased from Merck (Darmstadt, Germany).

Sample collection of authentic tomato samples

The normally fruited (average weight of 100 g/fruit) tomato varieties Bocati, Hamlet, Mecano, Savantas, Seviocard and Tica and the small fruited (average weight of 20 g/fruit) tomato varieties Sakura, Sunstream and Tastery were grown in a total of seven greenhouses in Germany:

- two greenhouses of the Bavarian State Research Institute of Viticulture and Horticulture in Bamberg, one organic and one conventional, referred to as ‘BA organic’ and ‘BA conv.’ in the following text
- two greenhouses of the State Horticultural College and Research Institute Heidelberg, one organic and one conventional, referred to as ‘HD organic’ and ‘HD conv.’ in the following text
- three greenhouses of trading farms in the growing region ‘Knoblauchsland’ near Nuremberg, one organic and two conventional, referred to as ‘N organic’, ‘N conv. 1’ and ‘N conv. 2’ in the following text

Conventional cultivation was in each case carried out as hydroponic culture using perlite substrate and mineral fertilizer, while organic cultivation used soil and clover-grass silage, horn shavings, vinasse, Patentkali, sheep wool or winter rye (previous culture for green manure) as organic fertilizers.

Sampling was performed by systematically harvesting tomatoes from different plants in the greenhouses at regular intervals of approximately 4 weeks, between April and October 2013 and between May and October 2014. In the 2013 harvesting period, only the varieties Mecano and Tastery were cultivated (with the exception of one Tica sample from BA organic), and only in the greenhouses BA organic, BA conv., N organic, N conv. 1 and N conv. 2. In 2014, cultivation was complemented by the varieties Bocati, Hamlet, Savantas, Seviocard, Tica, Sunstream and Sakura in individual greenhouses, including two further greenhouses (HD organic and HD conv.). This yielded a total of 205 tomato samples, 66 harvested in 2013 and 139 in 2014. The composition of the samples for the individual measurements with respect to cultivars and greenhouses is shown in Table 1 (in each cell, the numbers of samples available for 1H NMR/MIR/IRMS/data fusion are given from left to right). For subsequent analysis, at least 250 g of tomatoes were pooled, pureed and homogenized, and the puree was stored at -18 °C until measurement.

Isotope ratio mass spectrometry (IRMS)

One part of the pureed tomato sample was freeze-dried using a freeze dryer (Alpha 1-4 LSC, Christ, Osterode, Germany), pulverized using a ball mill MM 301 (Retsch GmbH, Haan, Germany) and used for measurement of the 13C/12C and 15N/14N isotope ratios. The other part was centrifuged for 10 min (2700 rcf); sodium azide was added to the supernatant, which was then used for measurement of the 18O/16O isotope ratio.

2.2 mg of the pulverized sample dry mass was weighed into tin capsules, combusted using an elemental analyzer (Euro EA 3000, Euro Vectors SpA, Milano, Italy) and analyzed with an isotope ratio mass spectrometer (ΔPlus XP, Thermo Finnigan, Bremen, Germany) equipped with a ConFlo IV interface (ThermoFisher Scientific, Bremen, Germany) and an autosampler (Zero Blank Revolver Autosampler, Blisotec GmbH, Jülich, Germany), controlled by Isodat 3.0 software (Thermo Finnigan, Bremen, Germany). The resulting gases, CO2 and N2, were separated on a GC column, and their isotope ratios were determined simultaneously. The 18O/16O ratio of the “tomato water” was measured in 200 µL after equilibration with CO2 using a MultiFlow 07/003 (Elementar, Manchester, England) with a Gilson 222XL Sampler (Gilson, Villiers Le Bel, France) interfaced to an IRMS (IsoPrime™, Manchester, England). The 13C/12C, 15N/14N and 18O/16O isotope ratios are given in ‰ on the δ-scale. The values refer to the international reference standards VPDB (Vienna Pee Dee Belemnite) for δ13C, atmospheric nitrogen for δ15N and Vienna Standard Mean Ocean Water (V-SMOW2) for δ18O.
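As an illustration of the δ-scale, the minimal Python sketch below expresses a heavy-to-light isotope ratio R as its per-mil deviation from the ratio of a reference standard, δ = (R_sample - R_standard)/R_standard × 1000. The absolute reference ratios are commonly cited literature values included only for illustration (an assumption, not data from this study); in routine IRMS work, δ values are obtained by calibration against working and international standards such as those listed here rather than from absolute ratios.

```python
# Commonly cited absolute isotope ratios of the reference standards
# (illustrative literature values; assumptions, not taken from this study).
R_VPDB_13C = 0.0111802   # 13C/12C of VPDB
R_AIR_15N = 0.0036765    # 15N/14N of atmospheric N2
R_VSMOW_18O = 0.0020052  # 18O/16O of V-SMOW


def delta_permil(r_sample: float, r_standard: float) -> float:
    """Relative deviation of a sample ratio from a standard ratio, in per mil."""
    return (r_sample - r_standard) / r_standard * 1000.0


def ratio_from_delta(delta: float, r_standard: float) -> float:
    """Inverse conversion: the isotope ratio corresponding to a given delta value."""
    return r_standard * (1.0 + delta / 1000.0)


if __name__ == "__main__":
    # A sample whose 15N/14N ratio is 0.5 % above that of air.
    r = R_AIR_15N * 1.005
    print(round(delta_permil(r, R_AIR_15N), 6))
```

Running the module prints a δ15N of about +5‰; a sample identical to the standard gives δ = 0 by construction.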

$$\delta\ (\text{‰}) = \frac{R_{\mathrm{sample}} - R_{\mathrm{standard}}}{R_{\mathrm{standard}}} \times 1000$$

where R denotes the ratio of the heavy to the light isotope (13C/12C, 15N/14N or 18O/16O) of the sample or of the respective reference standard.

Acetanilide, casein, glutamic acid and water were calibrated as working standards against the international standards (IAEA-CH6, IAEA-CH7, NBS 22 and USGS 40 for 13C/12C and 15N/14N; V-SMOW2, SLAP2 and GISP for 18O/16O). Samples were analyzed twice, and working standards were measured four times to control the stability of each measurement series. The standard deviation of the IRMS analysis was ≤ 0.2‰.

1H NMR spectroscopy

The aqueous tomato phase was analysed after centrifugation of the puree at 3528 g for 5 min. 900 µL of the clear liquid tomato phase was mixed with 100 µL of a solution of 7 mM TSPd4, 10 mM EDTA and 2 mM NaN3 in D2O, and the pH was adjusted to 4.00 ± 0.03 using 1 M NaOH or 1 M HCl. Finally, 600 µL of the pH-adjusted solution was filled into a 5 mm NMR tube for measurement on a 400 MHz 1H NMR spectrometer. Acquisition and processing parameters of the 1H NMR measurement were set as previously described.13 For examination, the spectral range from 0-10 ppm was used, excluding the regions of the residual water signal (4.67-4.85 ppm) and of residual ethanol (3.60-3.70 and 1.14-1.22 ppm; the NMR tubes were reused after washing with ethanol).


MIR spectroscopy

Tomato puree was filtered through a folded filter (4-7 µm). The clear filtrate was used for MIR measurement with a WineScan FT120 instrument (Foss GmbH, Rellingen, Germany) for the 2013 tomato samples and a WineScan FT2 Flex instrument (Foss GmbH, Rellingen, Germany) for the 2014 samples. For the examinations, the wavenumber range from 964 cm-1 to 2998 cm-1 (528 acquired data points) was used, excluding the range from 1547 cm-1 to 1716 cm-1 to eliminate water absorption. For each sample, the average spectrum of two successive measurements was used.
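A minimal NumPy sketch of the MIR spectral handling described above. It assumes, hypothetically, that the 528 points are equally spaced over the full 964-2998 cm-1 window before the water band is removed and that the exported values are fractional transmittances; the actual wavenumber axis and export format of the WineScan instruments may differ. Each replicate is converted to absorbance (A = -log10 T, the transmission-to-absorbance conversion mentioned under Multivariate statistics), the two replicates are averaged, and the 1547-1716 cm-1 water band is dropped. Whether the study averaged before or after conversion is not stated; converting first is an assumption.

```python
import numpy as np

# Hypothetical wavenumber axis: 528 equally spaced points from 964 to 2998 cm-1.
WAVENUMBERS = np.linspace(964.0, 2998.0, 528)
WATER_BAND = (1547.0, 1716.0)  # excluded region, cm-1


def transmittance_to_absorbance(t):
    """Convert fractional transmittance (0 < T <= 1) to absorbance A = -log10(T)."""
    t = np.asarray(t, dtype=float)
    if np.any(t <= 0):
        raise ValueError("transmittance must be positive")
    return -np.log10(t)


def preprocess_mir(replicates, wavenumbers=WAVENUMBERS, band=WATER_BAND):
    """Average replicate scans (as absorbance) and remove the water band.

    replicates: array of shape (n_replicates, n_points) holding transmittance values.
    Returns (kept_wavenumbers, mean_absorbance).
    """
    absorbance = transmittance_to_absorbance(replicates)
    mean_spectrum = absorbance.mean(axis=0)
    keep = (wavenumbers < band[0]) | (wavenumbers > band[1])
    return wavenumbers[keep], mean_spectrum[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scans = rng.uniform(0.2, 0.9, size=(2, WAVENUMBERS.size))  # two synthetic scans
    wn, spectrum = preprocess_mir(scans)
    print(wn.size, spectrum.shape)
```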

Multivariate statistics

Data pre-processing comprised reducing the dimension of the 1H NMR data by bundling spectral regions of 0.02 ppm width into buckets using Amix 3.9.12 software (Bruker Biospin GmbH, Rheinstetten, Germany; each bucket represents the signal intensity of its spectral region) and converting the MIR transmission spectra into the corresponding absorbance spectra. The buckets of the 1H NMR spectra, the wavenumbers of the MIR spectra and the IRMS data (δ13C, δ15N and δ18O, given in ‰ on the δ-scale) served as variables for multivariate data analysis.
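Bucketing itself was done in Amix; the NumPy sketch below only illustrates the idea and is not Amix's algorithm. It sums the intensities of a digitized spectrum (hypothetical `ppm` and `intensity` arrays) into 0.02 ppm buckets over 0-10 ppm and drops every bucket that overlaps the excluded water (4.67-4.85 ppm) and ethanol (3.60-3.70 and 1.14-1.22 ppm) regions. Summing within a bucket and discarding partially overlapping buckets are assumptions.

```python
import numpy as np

BUCKET_WIDTH = 0.02  # ppm
EXCLUDED = [(4.67, 4.85), (3.60, 3.70), (1.14, 1.22)]  # water and ethanol regions, ppm


def bucket_spectrum(ppm, intensity, lo=0.0, hi=10.0, width=BUCKET_WIDTH, excluded=EXCLUDED):
    """Sum intensities into fixed-width buckets and drop buckets overlapping excluded regions.

    Returns (bucket_centres, bucket_sums) for the retained buckets.
    """
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_buckets = int(round((hi - lo) / width))
    edges = lo + width * np.arange(n_buckets + 1)
    # With weights, np.histogram sums the intensities of all points falling in each bucket.
    sums, _ = np.histogram(ppm, bins=edges, weights=intensity)
    left, right = edges[:-1], edges[1:]
    keep = np.ones(n_buckets, dtype=bool)
    for start, stop in excluded:
        # Keep a bucket only if it lies entirely below or entirely above the region.
        keep &= (right <= start) | (left >= stop)
    centres = (left + right) / 2
    return centres[keep], sums[keep]
```

Applied to each spectrum in turn, this yields one row of bucket variables per tomato sample.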

Multivariate data analysis was performed assuming normally distributed data. For the individual analysis of each analytical method, PCA and LDA were carried out with IBM SPSS Statistics 21 (IBM Corporation, Armonk, USA). LDA was performed with equal a priori probabilities for all groups and a stepwise selection procedure23 (method: minimization of Wilks' lambda;24 selection criterion: F-statistics with p < 0.005 for inclusion and p > 0.010 for exclusion). During LDA validation, only the variables selected in the LDA of all samples were considered, rather than all variables. PLS-DA was performed with Unscrambler X version 10.0.1 (Camo Software AS, Oslo, Norway) using the Non-linear Iterative PArtial Least Squares (NIPALS) algorithm.
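SPSS performed the stepwise selection; the NumPy sketch below only illustrates the forward-entry criterion behind it, assuming the textbook partial-F formula applies. Wilks' lambda Λ = det(W)/det(T) (within-group over total sum-of-squares-and-cross-products matrix) is computed for each candidate variable set, and the variable that minimizes Λ is returned with its F-to-enter, F = ((n - g - p)/(g - 1))(Λ_p/Λ_p+1 - 1), where n is the number of samples, g the number of groups and p the number of variables already selected. Converting F into the p-value compared with the 0.005/0.010 thresholds (e.g. with `scipy.stats.f.sf`) and the removal step are omitted.

```python
import numpy as np


def sscp(x):
    """Sum-of-squares-and-cross-products matrix of column-centred data."""
    xc = x - x.mean(axis=0)
    return xc.T @ xc


def wilks_lambda(x, y):
    """Wilks' lambda det(W)/det(T) for data x (n x p) and class labels y."""
    total = sscp(x)
    within = sum(sscp(x[y == label]) for label in np.unique(y))
    return np.linalg.det(within) / np.linalg.det(total)


def forward_step(x, y, selected):
    """Candidate for the next forward-selection step.

    Returns (column index, Wilks' lambda after entry, F-to-enter) for the variable
    that minimizes Wilks' lambda when added to the already `selected` columns.
    """
    n, g, p = len(y), len(np.unique(y)), len(selected)
    lam_p = wilks_lambda(x[:, selected], y) if selected else 1.0
    best = None
    for j in range(x.shape[1]):
        if j in selected:
            continue
        lam = wilks_lambda(x[:, selected + [j]], y)
        if best is None or lam < best[1]:
            # Partial F for adding one variable, df = (g - 1, n - g - p).
            f_enter = (n - g - p) / (g - 1) * (lam_p / lam - 1.0)
            best = (j, lam, f_enter)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 30)       # two classes, e.g. organic vs conventional
    x = rng.normal(size=(60, 5))    # synthetic data, not study data
    x[:, 3] += 2.0 * y              # make column 3 discriminative
    print(forward_step(x, y, []))
```

A full stepwise procedure would repeat this step, stop when the F-to-enter no longer meets the inclusion threshold, and re-test already entered variables against the removal threshold.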

For the combined analysis of several analytical methods, MATLAB 2015a (The MathWorks, Natick, MA, USA) and the SAISIR package for MATLAB25 were used. For variable selection of the 1H NMR and MIR data, clustering of latent variables (CLV) was used.26,27 LDA and PLS-DA were applied to the concatenated spectroscopic data (1H NMR and MIR after CLV variable selection) and IRMS data (δ13C, δ15N, δ18O). In this study, LDA was applied to the PCA scores, since the number of variables entering LDA should not be too large.28 The best classification models were obtained when the inverse of the sum of squares (the squared Euclidean norm of the block) was used as block scaling factor, i.e. after applying the block scaling factor, the total variance of each block equals 1. Furthermore, the multiblock method ComDim20,21 was applied to the spectroscopic data (1H NMR and MIR after CLV variable selection) and IRMS data (δ13C, δ15N, δ18O).
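The fusion itself was carried out in MATLAB with SAISIR; the NumPy sketch below is an independent, simplified illustration under stated assumptions, not the SAISIR code. `block_scale` centres each block and divides it by the square root of its total sum of squares, so that each block's total variance equals 1 (equivalent to weighting the block's cross-products by the inverse sum of squares); `fuse` concatenates the scaled blocks for LDA or PLS-DA. `comdim` follows the commonly described ComDim iteration: a common component q is taken as the dominant eigenvector of the salience-weighted sum of block cross-product matrices, the saliences λk = qᵀXkXkᵀq are updated until convergence, and q is then deflated from every block. The tolerance and iteration limit are assumptions.

```python
import numpy as np


def block_scale(block):
    """Centre a block column-wise and divide it by the square root of its total sum of squares."""
    centred = block - block.mean(axis=0)
    return centred / np.sqrt(np.sum(centred ** 2))


def fuse(blocks):
    """Low-level fusion: block-scale each data block, then concatenate the columns."""
    return np.hstack([block_scale(b) for b in blocks])


def comdim(blocks, n_dims=2, tol=1e-10, max_iter=500):
    """Simplified ComDim sketch.

    Returns global scores (n_samples x n_dims, unit-norm columns) and
    block saliences (n_blocks x n_dims).
    """
    xs = [block_scale(b) for b in blocks]
    n = xs[0].shape[0]
    scores = np.zeros((n, n_dims))
    saliences = np.zeros((len(xs), n_dims))
    for d in range(n_dims):
        cross = [x @ x.T for x in xs]           # block cross-product matrices W_k
        lam = np.ones(len(xs))                  # start with equal saliences
        for _ in range(max_iter):
            w = sum(l * c for l, c in zip(lam, cross))
            _, eigvecs = np.linalg.eigh(w)      # eigenvalues in ascending order
            q = eigvecs[:, -1]                  # dominant eigenvector = common component
            new_lam = np.array([q @ c @ q for c in cross])
            converged = np.max(np.abs(new_lam - lam)) < tol
            lam = new_lam
            if converged:
                break
        scores[:, d] = q
        saliences[:, d] = lam
        projector = np.eye(n) - np.outer(q, q)  # deflation: remove q from every block
        xs = [projector @ x for x in xs]
    return scores, saliences
```

A block with a high salience on a common dimension contributes strongly to that dimension, which is what makes multiblock results easier to interpret than a plain concatenation.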

For model evaluation, a test devised by Tóth et al.29 was performed, which assesses the prediction performance of classification models by comparing, via F-statistics, the variance of each classification model with that of its leave-one-out counterparts.

Results and Discussion

Overall, 205 tomato samples of nine different varieties were analyzed. While 1H NMR spectra were recorded for every tomato sample (n = 205), IRMS (n = 114) and MIR measurements (n = 199) were performed for a subset of samples. In the following, the capabilities of the individual methods and of their combinations for differentiating organically and conventionally grown tomatoes are described.

To obtain an overview of the data structure, PCA was performed on the individual 1H NMR and MIR data, and ComDim was applied to the combined IRMS, 1H NMR and MIR data.

Furthermore, for both individual methods (1H NMR, MIR) and combined methods (1H NMR + MIR, 1H NMR + IRMS, MIR + IRMS, 1H NMR + MIR + IRMS), LDA and PLS-DA were tested for their ability to classify the cultivation method of tomatoes. LDA classification models yielded validation results equivalent or superior to those of PLS-DA in terms of the percentage of correct classifications, and LDA consistently achieved better comparability of results among the different validation steps. Thus, for simplicity, only the LDA results are presented in the following.

Validation of classification models

The use of supervised classification methods such as LDA or PLS-DA always entails the risk of creating overfitted models, since optimizing for classification can inadvertently force the inclusion of insignificant variables. Such models indeed classify the model samples well but fail to classify further samples. Hence, suitable validation studies of classification models are highly important30 to assess both the suitability of the multivariate analysis and the representativeness of the model samples.

Regarding the differentiation of organically and conventionally grown tomatoes, several critical influencing factors have to be considered during validation. As a natural product, tomatoes are subject to unavoidable natural compositional variations, which complicate the determination of their cultivation method. To evaluate the influence of natural compositional fluctuations, one third of the samples were randomly excluded from the creation of the LDA and PLS-DA models and used as an independent test set for validation (referred to as “random validation” in the following text).

However, since tomato samples were collected repeatedly at different harvesting times, randomly selected validation test sets comprise tomato samples of the same cultivars grown in the same greenhouses as the calibration samples, merely harvested at a different time. Thus, although good random validation results indicate that organically and conventionally grown tomatoes can be differentiated, the applicability of these classification models to further tomato samples of another cultivar or from another greenhouse (with its specific implementation of cultivation) is questionable. For instance, our previous results on the differentiation of organically produced tomatoes using 1H NMR showed that differentiating between two greenhouses with different growing conditions works well, but the model does not prove useful for classifying tomato samples from other greenhouses, despite basically comparable growing conditions, since it is overfitted to only two greenhouses.13 Hence, further validation steps were performed. Complete sets of individual cultivars and of individual greenhouses were specifically excluded from model calibration and used as test sets, in order to assess the quality of classification for tomatoes whose cultivar or specific cultivation method had not been taken into account for calibration. The validation test sets consisted of each individual cultivar/greenhouse group in turn, and the calibration set was formed by the tomato samples of the remaining cultivars/greenhouses. In this way, each tomato sample was excluded once with its cultivar group and once with its greenhouse group, and the average result over all tomato samples gave the cultivar and greenhouse validation results, respectively.

Table 2 shows the respective numbers of calibration and validation samples for the classification models of 1H NMR, MIR and data fusions (indicated from left to right) during the cultivar and greenhouse validation steps. As the tomato samples are not evenly distributed across all cultivars/greenhouses, and Tastery and Mecano in particular are over-represented among the small and normally fruited samples, respectively, using these cultivars as validation test sets would leave a relatively small number of calibration samples. Hence, Tastery and Mecano were not used as cultivar validation test sets for the fused data, because the remaining calibration sets would comprise only 8 and 11 tomato samples, respectively, which is not appropriate. For all other validation steps, at least 42 tomato samples were available for calibration.

Overall, this validation concept is a stepwise approach. First, the random validation results indicate the basic ability to differentiate organically and conventionally grown tomatoes; subsequently, cultivar and greenhouse validation verify whether the classification ability is adequately robust to compositional variations specific to individual cultivars or greenhouses. Thus, good results for random validation combined with worse outcomes for cultivar/greenhouse validation indicate overfitted classification models, while good and comparable results across all validation steps confirm the suitability of the classification models.
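The three validation schemes can be sketched in Python with NumPy as follows. The nearest-centroid classifier is a hypothetical stand-in for the LDA models of the study (which were built in SPSS), and the group labels are placeholders. Random validation holds out a random third of the samples once; cultivar and greenhouse validation hold out each group in turn (leave-one-group-out), so that every sample is predicted exactly once per scheme and the percentage of correct classifications is pooled over all samples. The optional `skip` argument mirrors the exclusion of groups whose removal would leave too few calibration samples.

```python
import numpy as np


def nearest_centroid_fit_predict(x_train, y_train, x_test):
    """Hypothetical stand-in classifier: assign each test sample to the closest class mean."""
    labels = np.unique(y_train)
    centroids = np.array([x_train[y_train == lab].mean(axis=0) for lab in labels])
    dist = ((x_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[np.argmin(dist, axis=1)]


def random_validation(x, y, rng, fraction=1 / 3):
    """Hold out a random fraction of samples once; return % correct on the held-out set."""
    n_test = int(round(fraction * len(y)))
    test = rng.permutation(len(y))[:n_test]
    train = np.setdiff1d(np.arange(len(y)), test)
    pred = nearest_centroid_fit_predict(x[train], y[train], x[test])
    return 100.0 * np.mean(pred == y[test])


def group_validation(x, y, groups, skip=()):
    """Leave-one-group-out (e.g. per cultivar or greenhouse); % correct pooled over all test samples."""
    correct, total = 0, 0
    for g in np.unique(groups):
        if g in skip:
            continue  # group not used as a test set (too few calibration samples would remain)
        test = groups == g
        pred = nearest_centroid_fit_predict(x[~test], y[~test], x[test])
        correct += int(np.sum(pred == y[test]))
        total += int(np.sum(test))
    return 100.0 * correct / total
```

Comparing the three percentages directly reflects the stepwise logic above: a large drop from random to group validation would point to an overfitted model.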

Furthermore, to verify that the validations yield significant results that are not based on chance effects in meaningless data, a randomization test was performed: the variables (1H NMR data and MIR data, respectively) were replaced by random vectors, and the validation results of the resulting classification models were analyzed. Since the chance probability of each tomato sample being organic or conventional is 50%, an objective validation of classification models based on random data is expected to achieve about 50% correct predictions. In accordance with this, on average 52 ± 7% and 46 ± 7% correct classifications (mean of random/cultivar/greenhouse validation) were achieved in the randomization test of the LDA classification models for the 1H NMR and MIR data, respectively, confirming the informative value of the validation approach.
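The randomization test can be illustrated with a short NumPy sketch: the measured variables are replaced by random vectors of the same shape and the validation is re-run many times; with two balanced classes, accuracy should scatter around 50%. The nearest-centroid classifier and the repeated one-third hold-out are hypothetical stand-ins for the LDA models and validation steps of the study.

```python
import numpy as np


def nearest_centroid_accuracy(x_train, y_train, x_test, y_test):
    """Percent correct of a nearest-class-mean classifier (hypothetical stand-in for LDA)."""
    labels = np.unique(y_train)
    centroids = np.array([x_train[y_train == lab].mean(axis=0) for lab in labels])
    dist = ((x_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return 100.0 * np.mean(labels[np.argmin(dist, axis=1)] == y_test)


def randomization_test(n_samples, n_variables, y, n_repeats=200, seed=0):
    """Replace the data by random vectors; return mean and SD of hold-out accuracy (%)."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_repeats):
        x_random = rng.normal(size=(n_samples, n_variables))  # meaningless 'spectra'
        order = rng.permutation(n_samples)
        test, train = order[: n_samples // 3], order[n_samples // 3:]
        accuracies.append(nearest_centroid_accuracy(x_random[train], y[train],
                                                    x_random[test], y[test]))
    return float(np.mean(accuracies)), float(np.std(accuracies))
```

A mean far above 50% on random data would indicate that the validation procedure itself leaks information.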

Isotope ratio mass spectrometry (IRMS)

A total of 114 tomato samples were analysed by IRMS for the δ15N values of their dry residues. The nitrogen isotope composition of the applied fertilizers predetermines that of the fertilized tomatoes; consequently, the higher δ15N values of organic fertilizers12 lead to higher δ15N values of organically produced tomatoes.11 One exception is the use of legumes, which is permitted in organic fertilization. Legumes can fix atmospheric nitrogen (δ15N around 0‰) into plant-available nitrogen compounds, which leads to noticeably low δ15N values and thus hampers differentiation from conventionally grown crops.12 Our IRMS results fully comply with these findings (Figure 1). On average, δ15N values were significantly higher for organically than for conventionally grown tomatoes, but an overlapping region existed in the range from 2‰ to 4‰. This overlap is mainly due to the use of green manures in the greenhouses N organic and BA organic, whereas HD organic applied only horn shavings and vinasse as fertilizers and accordingly yielded high δ15N values that were clearly separated from the δ15N range of conventionally grown tomatoes.

Besides δ15N, IRMS included the determination of δ13C (in the dry residue of the tomatoes) and δ18O (in the aqueous tomato phase), but these are less relevant with regard to the growing regime. δ13C (average -30.2 ± 3.5‰ vs. VPDB) indicates greenhouse cultivation, owing to the strikingly negative δ13C values caused by CO2 supplementation from heating systems burning CH4, while δ18O (average -4.4 ± 1.5‰ vs. V-SMOW) depends on the water source.31

1H NMR spectroscopy

For each tomato sample (n = 205), a 1H NMR spectrum of the aqueous phase was acquired. 1H NMR spectra simultaneously provide comprehensive information on sugars, organic acids, amino acids and further minor components,13 which makes 1H NMR a useful data source for tomato profiling. To reduce the data dimension, the 1H NMR spectra were transformed into buckets by bundling spectral regions of 0.02 ppm width.

PCA

PCA of the buckets was performed to obtain an overview of the data clustering. Mean-centred and standardized buckets were used, because the varying concentrations of the constituents resulted in signal intensities of very different magnitudes. The scatter plot of PC1 vs. PC2 (Figure 2A) shows that the data clustered mainly according to cultivar type; in particular, the data clouds of normally fruited varieties were separated from those of small fruited varieties (Figure 2B). In fact, PC1 values appear to be determined by the dry mass of the tomatoes, as PC1 correlated highly with the total spectral intensity (R = 0.967; total spectral intensity was calculated as the sum of all buckets from 0-10 ppm, excluding the ranges of the water and ethanol resonance signals). However, a trend towards separation of organically and conventionally grown tomatoes was also observed along PC5, with significantly higher values for the group of conventionally grown tomatoes (t-test: p < 0.001; Figure 2C), but the overlapping data clouds did not allow clear differentiation.
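A NumPy sketch of the PCA set-up described above: autoscaling (mean-centring each bucket and dividing it by its standard deviation), PCA via singular value decomposition, and the check of how closely PC1 follows the total spectral intensity. The study used SPSS; these functions are illustrative only, and because the sign of a principal component is arbitrary, the absolute correlation is returned.

```python
import numpy as np


def autoscale(x):
    """Mean-centre each column and divide by its standard deviation (zero-SD columns become 0)."""
    sd = x.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    return (x - x.mean(axis=0)) / sd


def pca_scores(x, n_components):
    """Principal component scores of an already centred data matrix, via SVD."""
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def pc1_vs_total_intensity(buckets):
    """Absolute Pearson correlation between PC1 of autoscaled buckets and total intensity."""
    scores = pca_scores(autoscale(buckets), 1)
    total = buckets.sum(axis=1)
    return abs(np.corrcoef(scores[:, 0], total)[0, 1])
```

When every bucket scales with a common factor such as dry mass, that factor dominates PC1, which is the behaviour reported for the tomato data.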

LDA

Hence, the supervised classification method LDA was used for further examination. PCA showed that the main variance in the NMR data is due to the total spectral intensity, which probably reflects the varying dry masses of the tomatoes. To reduce this effect, the buckets were converted into values relative to the total spectral intensity (sum of all buckets from 0-10 ppm, excluding the ranges of the water and ethanol resonance signals) prior to LDA.
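The normalization to total spectral intensity amounts to dividing each bucket by the row sum of the retained buckets. A minimal NumPy sketch, assuming the excluded water and ethanol buckets have already been removed:

```python
import numpy as np


def normalize_to_total(buckets):
    """Express each bucket as a fraction of the sample's total intensity over the retained buckets."""
    buckets = np.asarray(buckets, dtype=float)
    totals = buckets.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("a sample has zero total intensity")
    return buckets / totals


if __name__ == "__main__":
    sample = np.array([[2.0, 1.0, 1.0]])
    concentrated = 2.5 * sample  # same composition, higher dry mass
    print(normalize_to_total(np.vstack([sample, concentrated])))
```

Both rows come out identical, showing how a purely multiplicative dry-mass effect is removed while relative composition is retained.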

Moreover, as PCA revealed wide differences between normally and small fruited tomatoes, classification models were built for all tomatoes as well as separately for normally and small fruited tomatoes. The Tóth test29 suggested better prediction performance for the separate classification models for normally and small fruited tomatoes (p = 0.48 and p = 0.20) than for one overall classification model including all tomato samples (p < 0.05). Thus, separate classification models were used for further examination.

For both the normally and the small fruited classification models, the individual validation steps gave comparable results, so there is no indication of overfitting (Table 3). Comparing the LDA results for normally and small fruited tomatoes, normally fruited ones yielded better validation results, with 100% correct classifications for cultivar and 99.1% for greenhouse validation, compared with 91.3% for cultivar and 95.7% for greenhouse validation of small fruited tomatoes. Regarding cultivar validation, the model for normally fruited samples is possibly more representative, as it includes six different cultivars compared with only three varieties of small fruited tomatoes.

323
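Cultivar and greenhouse validation, as described here, hold out all samples of one cultivar or one greenhouse at a time. A minimal sketch of such a grouped test-set scheme with a plain two-class LDA (pooled covariance, class priors from the calibration set) is shown below. The function names, the small ridge term, the simulated data, and the pooling of all held-out predictions into one percentage are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def fit_lda(X, y):
    """Two-class LDA with pooled covariance; y holds 0 (organic) or 1 (conventional)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(y) - 2)
    S += 1e-6 * np.eye(X.shape[1])               # small ridge for numerical stability (assumed)
    w = np.linalg.solve(S, m1 - m0)
    p1 = y.mean()                                # prior of class 1 in the calibration set
    b = -w @ (m0 + m1) / 2 + np.log(p1 / (1 - p1))
    return w, b

def predict_lda(model, X):
    w, b = model
    return (X @ w + b > 0).astype(int)

def group_holdout_accuracy(X, y, groups):
    """Leave each group (one cultivar or one greenhouse) out in turn, calibrate on
    the remaining samples and return the overall % of correct predictions."""
    correct = 0
    for g in np.unique(groups):
        test = groups == g
        model = fit_lda(X[~test], y[~test])
        correct += int(np.sum(predict_lda(model, X[test]) == y[test]))
    return 100.0 * correct / len(y)

rng = np.random.default_rng(2)
greenhouse = np.repeat(np.arange(6), 10)            # six hypothetical greenhouses, 10 samples each
y = (greenhouse >= 3).astype(int)                   # greenhouses 0-2 organic, 3-5 conventional
X = rng.normal(0.0, 0.5, size=(60, 3))
X[:, 0] += 2.0 * y                                  # hypothetical class difference in one variable
X += rng.normal(0.0, 0.2, size=(6, 3))[greenhouse]  # small greenhouse-specific offsets
acc = group_holdout_accuracy(X, y, greenhouse)
print(f"greenhouse validation: {acc:.1f}% correct")
```

Holding out whole groups rather than random samples is what makes this a stricter test: the model never sees the held-out cultivar or greenhouse during calibration.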

MIR spectroscopy

MIR spectroscopy is a powerful tool for food analysis, offering simple sample preparation and rapid measurement.32 It can be used for authenticity analysis33 as well as for quantification after adequate calibration with samples of known composition.34,35 To test the suitability of MIR spectroscopy for differentiating tomatoes grown by different cultivation methods, the aqueous phase of the tomatoes was analyzed by MIR spectroscopy. Overall, 199 tomato samples were measured, and the spectra were evaluated by PCA and LDA.

PCA

Mean-centered MIR absorption spectra were used for the analysis. As for 1H NMR, PCA of the MIR spectra mainly revealed a separation of cultivars (Figure 3A), especially of normally and small fruited tomato samples along PC1 (Figures 3B and 3C). However, within each of these groups, conventionally produced tomatoes had significantly higher PC1 values than organically produced tomatoes (t-test: p < 0.001 in each case), albeit with overlapping regions (Figures 3B and 3C).
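The text does not specify which t-test variant was used. Assuming Welch's unequal-variance form, the statistic and its degrees of freedom can be computed as in this sketch; the PC1 scores are invented for illustration. A p-value then follows from the t distribution with df degrees of freedom, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`, which returns the same statistic together with its p-value.

```python
import numpy as np

def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    va, vb = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
    return t, df

# Hypothetical PC1 scores of conventionally and organically grown samples
pc1_conv = np.array([1.8, 2.4, 1.1, 2.9, 2.0, 1.6])
pc1_org = np.array([0.4, -0.3, 1.2, 0.1, 0.8, -0.6])
t, df = welch_t(pc1_conv, pc1_org)
print(f"t = {t:.2f}, df = {df:.1f}")
```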

LDA

Classification models were created for all tomato samples as well as for the groups of normally and small fruited tomato samples separately. The Tóth test29 indicated adequate prediction performance for each classification model (p = 0.33, p = 0.24, and p = 0.38 for all, normally fruited, and small fruited tomato samples, respectively). For comparability with the classification results of the 1H NMR data, separate classification models for normally and small fruited tomatoes were used for further analysis.

LDA of MIR data gave quite comparable results for random validation and cultivar/greenhouse validation, with a maximum difference of 8.2 percentage points between classification results (91.7% for random and 83.5% for greenhouse validation of normally fruited samples; Table 3). Hence, there is no evidence of overfitting of the LDA classification models. Comparing the LDA results for normally and small fruited samples, the results for normally fruited tomato samples (83.5%-91.7%) were always better than those for small fruited ones (75.6%-82.2%). Overall, LDA classification results for MIR ranged between 75.6% and 91.7% and are thus inferior to the 1H NMR results (ranging from 91.3% to 100%).
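The overfitting argument in this paragraph rests on a simple calculation: the spread, in percentage points, between the best and worst validation step of each MIR model. Using the MIR values from Table 3:

```python
# Percentages of correct LDA classifications for MIR data, taken from Table 3
mir = {
    "normally fruited": {"random": 91.7, "cultivar": 85.3, "greenhouse": 83.5},
    "small fruited": {"random": 80.0, "cultivar": 82.2, "greenhouse": 75.6},
}

# Spread between the best and worst validation step of each model, in percentage points
spread = {group: round(max(r.values()) - min(r.values()), 1) for group, r in mir.items()}
print(spread)  # {'normally fruited': 8.2, 'small fruited': 6.6}
```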

Data fusion of 1H NMR, MIR, and IRMS

Finally, the differentiation of organically and conventionally grown tomatoes was examined by fusing the data of the individual methods (1H NMR, MIR, and IRMS). For this purpose, normalized 1H NMR data were used, scaled to the total spectral intensity (calculated as the sum of all buckets from 0-10 ppm, excluding the regions of the water and ethanol resonance signals). Because spectroscopic data naturally contain a large number of variables, whereas IRMS provides only three (δ13C, δ15N, and δ18O), the number of 1H NMR and MIR variables was reduced prior to data fusion by clustering of latent variables (CLV). The CLV method involves two stages: a hierarchical cluster analysis followed by a partitioning algorithm. Partitioning is governed by a quality criterion T, the sum of the first (largest) eigenvalues of the data matrices of the individual clusters.26,27
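A minimal NumPy sketch of the partitioning stage of CLV and of the criterion T is given below, assuming T is computed from the covariance matrix of each mean-centred variable cluster. The hierarchical first stage is not reproduced; a deliberately mixed starting partition stands in for it. Using each cluster's latent component as a reduced variable reflects the general idea of CLV, but the function names and simulated data are hypothetical and not the implementation used in the study.

```python
import numpy as np

def cluster_components(Xc, labels):
    """Unit-norm latent component (first left singular vector) of each variable cluster."""
    ks = np.unique(labels)
    comps = [np.linalg.svd(Xc[:, labels == k], full_matrices=False)[0][:, 0] for k in ks]
    return ks, np.column_stack(comps)

def clv_criterion(X, labels):
    """T = sum over clusters of the largest eigenvalue of the cluster's covariance matrix."""
    Xc = X - X.mean(axis=0)
    T = 0.0
    for k in np.unique(labels):
        Xk = Xc[:, labels == k]
        T += np.linalg.eigvalsh(Xk.T @ Xk / (len(X) - 1))[-1]   # eigvalsh sorts ascending
    return T

def clv_partition(X, labels, max_iter=100):
    """Partitioning stage: move each variable to the cluster whose latent component it
    covaries with most (squared covariance) until the partition no longer changes."""
    Xc = X - X.mean(axis=0)
    labels = np.asarray(labels).copy()
    for _ in range(max_iter):
        ks, C = cluster_components(Xc, labels)
        new = ks[np.argmax((Xc.T @ C) ** 2, axis=1)]
        if np.array_equal(new, labels):
            break
        labels = new
    _, C = cluster_components(Xc, labels)
    return labels, C                             # columns of C: candidate reduced variables

rng = np.random.default_rng(3)
f = rng.normal(size=(100, 2))                    # two hypothetical latent factors
X = np.hstack([f[:, [0]] + 0.3 * rng.normal(size=(100, 5)),
               f[:, [1]] + 0.3 * rng.normal(size=(100, 5))])
start = np.tile([0, 1], 5)                       # deliberately mixed initial partition
labels, C = clv_partition(X, start)
print(labels, round(clv_criterion(X, start), 2), round(clv_criterion(X, labels), 2))
```

Each reassignment can only increase T, which is why the partitioning stage refines, rather than replaces, the hierarchical clustering.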

ComDim analysis

To obtain an overview of the sample grouping across all analytical methods, ComDim analysis was performed on the 1H NMR + MIR + IRMS data, separately for the groups of normally and small fruited tomatoes. The basic idea of ComDim is to construct one common space of common components from several blocks of variables measured on the same samples;21,22 in this case, each block contains the variables of one analytical method. Figure 4 compares the results of ComDim analysis (Figure 4A) with the corresponding individual results of PCA for 1H NMR (Figure 4B) and MIR (Figure 4C) data and with the range of δ15N values (Figure 4D), separately for normally and small fruited tomato samples. Compared to the PCA of the individual methods, ComDim clearly shows a stronger separation of the data points according to cultivation method.

A major advantage of ComDim analysis over PCA on concatenated data is that ComDim provides information about the relationships between the individual variable blocks and their contribution to the total variance of the common components.36 Figure 5 shows the specific weights (or saliences) of 1H NMR, MIR, and IRMS associated with the first three common dimensions (D1, D2, D3) of the ComDim analysis. For both normally and small fruited tomatoes, MIR data dominate D1, IRMS data D2, and 1H NMR data D3 (Figure 5); thus, each analytical method considerably influenced the ComDim results.
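The core of ComDim can be sketched in a few lines of NumPy, following the commonly described common components and specific weights scheme: each block contributes a sample-by-sample association matrix, the common component is the dominant eigenvector of their salience-weighted sum, the saliences are updated until they stabilize, and the component is then removed from all blocks before the next dimension is extracted. Scaling every block to unit Frobenius norm is an assumption, as are the simulated blocks; this is not necessarily the implementation used in the study.

```python
import numpy as np

def comdim(blocks, n_dims=3, tol=1e-10, max_iter=500):
    """Minimal ComDim sketch. blocks: list of (n_samples x p_k) arrays on the same samples.
    Returns common scores (n_samples x n_dims) and saliences (n_blocks x n_dims)."""
    Xs = []
    for X in blocks:
        Xc = X - X.mean(axis=0)                  # centre each variable
        Xs.append(Xc / np.linalg.norm(Xc))       # unit Frobenius norm per block (assumed)
    n = Xs[0].shape[0]
    scores = np.zeros((n, n_dims))
    saliences = np.zeros((len(Xs), n_dims))
    for d in range(n_dims):
        W = [X @ X.T for X in Xs]                # n x n association matrix of each block
        lam = np.ones(len(W))
        for _ in range(max_iter):
            M = sum(l * Wk for l, Wk in zip(lam, W))
            q = np.linalg.eigh(M)[1][:, -1]      # dominant eigenvector = common component
            new_lam = np.array([q @ Wk @ q for Wk in W])
            converged = np.max(np.abs(new_lam - lam)) < tol
            lam = new_lam
            if converged:
                break
        scores[:, d] = q
        saliences[:, d] = lam
        P = np.eye(n) - np.outer(q, q)           # deflation: remove q from every block
        Xs = [P @ X for X in Xs]
    return scores, saliences

rng = np.random.default_rng(4)
n = 60
factors = rng.normal(size=(n, 3))                # one hypothetical latent factor per block
blocks = [np.outer(factors[:, k], rng.normal(size=p)) + 0.1 * rng.normal(size=(n, p))
          for k, p in enumerate((50, 30, 3))]    # hypothetical block sizes
scores, saliences = comdim(blocks, n_dims=3)
print(np.round(saliences, 2))                    # rows: blocks, columns: D1-D3
```

In this simulated case each dimension is dominated by a different block, which is the kind of pattern Figure 5 summarizes for MIR, IRMS, and 1H NMR.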

LDA of concatenated data

Concatenation of several data matrices is the simplest way of data fusion and was applied to the combinations 1H NMR + MIR, 1H NMR + IRMS, MIR + IRMS, and 1H NMR + MIR + IRMS. The quality of each data combination for classifying the cultivation method (using LDA and PLS-DA) was again assessed by test set validation (Table 3). For comparability, data of the same tomato samples (n = 112) were used for all data combinations, even though more samples were generally available for individual combinations.
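Concatenation itself is a column-wise stacking of the blocks, but the relative weight of each block then depends on its number of variables. The sketch below assumes block scaling (autoscaling followed by division by the square root of the number of variables, so that every block has a total variance of 1); the text does not state which scaling was applied, and the block sizes and values are hypothetical.

```python
import numpy as np

def block_scale(X):
    """Autoscale each variable, then divide by sqrt(p) so the block's total variance is 1."""
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                            # leave constant variables unscaled
    return Xc / sd / np.sqrt(X.shape[1])

def concatenate_blocks(*blocks):
    """Low-level data fusion: column-wise concatenation of block-scaled matrices."""
    n = blocks[0].shape[0]
    if any(B.shape[0] != n for B in blocks):
        raise ValueError("all blocks must contain the same samples in the same order")
    return np.hstack([block_scale(B) for B in blocks])

rng = np.random.default_rng(5)
nmr = rng.normal(size=(112, 40))    # hypothetical CLV-reduced 1H NMR variables
mir = rng.normal(size=(112, 25))    # hypothetical CLV-reduced MIR variables
irms = rng.normal(size=(112, 3))    # stand-ins for δ13C, δ15N, δ18O (simulated values)
fused = concatenate_blocks(nmr, mir, irms)
print(fused.shape)                  # (112, 68)
```

Without some form of block weighting, the three IRMS variables would carry almost no weight next to dozens of spectroscopic variables.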

Comparing the different data fusion models with respect to LDA classification quality, the best results were achieved for 1H NMR + MIR + IRMS, with 100% correct classifications for random, cultivar, and greenhouse validation of small fruited tomatoes, and 95.0% for random, 100% for cultivar, and 98.3% for greenhouse validation of normally fruited tomatoes. The second-best LDA validation results were obtained for the combination 1H NMR + IRMS (94.9%-100.0%), while the results for 1H NMR + MIR (72.9%-100%) and MIR + IRMS (83.3%-100%) differ for some individual validations but are comparable in terms of the average quality of all results.

Comparison of results for fused data and individual analysis

Comparing the classification results of the concatenated data with those of the individual methods, LDA validation of 1H NMR + MIR + IRMS (95.0% - 100%) yielded better results than LDA validation of the individual methods (1H NMR: 91.3% - 100%; MIR: 75.6% - 91.7%). This supports the approach of combining these methods to achieve synergies for an optimized differentiation of organically and conventionally grown tomatoes.

Among the single analytical methods, the classification results of the 1H NMR models are especially promising. However, the quality of 1H NMR models depends crucially on how representative the model samples are, in order to avoid overfitting; in general, more tomato samples differing in cultivar and specific growing conditions need to be measured to further increase the significance of the results. Within the limits of the currently available sample set, test set validation was chosen as the best way to obtain a realistic estimate of the quality of the results. In the future, once the database of authentic tomato samples has been sufficiently extended, enhanced chemometric classification models could serve as a helpful screening tool for investigating the authenticity of tomatoes. Moreover, additional MIR and IRMS measurements can improve the classification results obtained with 1H NMR analysis alone.

Acknowledgement

Special thanks are due to colleagues from the Bavarian State Research Institute of Viticulture and Horticulture (LWG, Bamberg, Germany), the State Horticultural College and Research Institute Heidelberg (LVG, Heidelberg, Germany) and to producers of the region “Knoblauchsland” for providing authentic tomato samples.

References

1. European Parliament, Committee on the Environment, Public Health and Food Safety. Draft report on the food crisis, fraud in the food chain and the control thereof (2013/2091(INI)). http://www.europarl.europa.eu/sides/getDoc.do?pubRef=-//EP//NONSGML+COMPARL+PE-519.759+02+DOC+PDF+V0//EN&language=EN (as from 05.08.2015).

2. Sahota, A. The World of Organic Agriculture, Statistics and Emerging Trends 2013. The Global Market for Organic Food & Drink. https://www.fibl.org/fileadmin/documents/shop/1606-organic-world-2013.pdf (as from 05.08.2015).

3. Caris-Veyrat, C.; Amiot, M. J.; Tyssandier, V.; Grasselly, D.; Buret, M.; Mikolajczak, M.; Guilland, J. C.; Bouteloup-Demange, C.; Borel, P. Influence of organic versus conventional agricultural practice on the antioxidant microconstituent content of tomatoes and derived purees; consequences on antioxidant plasma status in humans. J. Agric. Food Chem. 2004, 52, 6503-6509.

4. Bundesanstalt für Landwirtschaft und Ernährung [German Federal Office for Agriculture and Food]. Press release of 09.07.2013: 20,6 kg pro Kopf verzehrt: Tomaten sind der Deutschen liebstes Gemüse [20.6 kg consumed per capita: tomatoes are the Germans' favourite vegetable]. http://www.ble.de/SharedDocs/Downloads/08_Service/04_Pressemitteilungen/Archiv2013/130709_Tomaten.pdf;jsessionid=F8552452F0D99F07C45DD6E21B128375.1_cid335?__blob=publicationFile (as from 05.08.2015).

5. Mitchell, A. E.; Hong, Y. J.; Koh, E.; Barrett, D. M.; Bryant, D. E.; Denison, R. F.; Kaffka, S. Ten-year comparison of the influence of organic and conventional crop management practices on the content of flavonoids in tomatoes. J. Agric. Food Chem. 2007, 55, 6154-6159.

6. Vallverdú-Queralt, A.; Medina-Remón, A.; Casals-Ribes, I.; Amat, M.; Lamuela-Raventós, R. M. A metabolomic approach differentiates between conventional and organic ketchups. J. Agric. Food Chem. 2011, 59, 11703-11710.

7. Vallverdú-Queralt, A.; Medina-Remón, A.; Casals-Ribes, I.; Amat, M.; Lamuela-Raventós, R. M. Is there any difference between the phenolic content of organic and conventional tomato juice? Food Chem. 2012, 130, 222-227.

8. Kelly, S. D.; Bateman, A. S. Comparison of mineral concentrations in commercially grown organic and conventional crops - Tomatoes (Lycopersicon esculentum) and lettuces (Lactuca sativa). Food Chem. 2010, 119, 738-745.

9. Gosling, P.; Hodge, A.; Goodlass, G.; Bending, G. D. Arbuscular mycorrhizal fungi and organic farming. Agric. Ecosyst. Environ. 2006, 113, 17-35.

10. Bateman, A. S.; Kelly, S. D.; Jickells, T. D. Nitrogen isotope relationships between crops and fertilizer: implications for using nitrogen isotope analysis as an indicator of agricultural regime. J. Agric. Food Chem. 2005, 53, 5760-5765.

11. Bateman, A. S.; Kelly, S. D.; Woolfe, M. Nitrogen isotope composition of organically and conventionally grown crops. J. Agric. Food Chem. 2007, 55, 2664-2670.

12. Rogers, K. M. Nitrogen isotopes as a screening tool to determine the growing regimen of some organic and nonorganic supermarket produce from New Zealand. J. Agric. Food Chem. 2008, 56, 4078-4083.

13. Hohmann, M.; Christoph, N.; Wachter, H.; Holzgrabe, U. 1H NMR profiling as an approach to differentiate conventionally and organically grown tomatoes. J. Agric. Food Chem. 2014, 62, 8530-8540.

14. Erich, S.; Schill, S.; Annweiler, E.; Waiblinger, H. U.; Kuballa, T.; Lachenmeier, D. W.; Monakhova, Y. B. Combined chemometric analysis of 1H NMR, 13C NMR and stable isotope data to differentiate organic and conventional milk. Food Chem. 2015, 188, 1-7.

15. Monakhova, Y. B.; Godelmann, R.; Hermann, A.; Kuballa, T.; Cannet, C.; Schäfer, H.; Spraul, M.; Rutledge, D. N. Synergistic effect of the simultaneous chemometric analysis of 1H NMR spectroscopic and stable isotope (SNIF-NMR, 18O, 13C) data: Application to wine analysis. Anal. Chim. Acta 2014, 833, 29-39.

16. Di Anibal, C. V.; Callao, M. P.; Ruisánchez, I. 1H NMR and UV-visible data fusion for determining Sudan dyes in culinary spices. Talanta 2011, 84, 829-833.

17. MacGregor, J. F.; Jaeckle, C.; Kiparissides, C.; Koutoudi, M. Process monitoring and diagnosis by multiblock PLS methods. AIChE J. 1994, 40, 826-838.

18. Westerhuis, J. A.; Smilde, A. K. Deflation in multiblock PLS. J. Chemometr. 2001, 15, 485-493.

19. Cozzolino, D.; Holdstock, M.; Dambergs, R. G.; Cynkar, W. U.; Smith, P. A. Mid infrared spectroscopy and multivariate analysis: A tool to discriminate between organic and non-organic wines grown in Australia. Food Chem. 2009, 116, 761-765.

20. Qannari, E. M.; Wakeling, I.; Courcoux, P.; MacFie, H. J. H. Defining the underlying sensory dimensions. Food Qual. Prefer. 2000, 11, 151-154.

21. Qannari, E. M.; Wakeling, I.; MacFie, H. J. H. A hierarchy of models for analysing sensory data. Food Qual. Prefer. 1995, 6, 309-314.

22. Drinkwater, L. E.; Letourneau, D. K.; Workneh, F.; van Bruggen, A. H. C.; Shennan, C. Fundamental differences between conventional and organic tomato agroecosystems in California. Ecol. Appl. 1995, 5, 1098-1112.

23. Flury, B.; Riedwyl, H. Angewandte multivariate Statistik [Applied Multivariate Statistics], 1st ed.; Gustav Fischer Verlag: Stuttgart, Germany, 1983.

24. Marini, F.; Magrí, A. L.; Balestrieri, F.; Fabretti, F.; Marini, D. Supervised pattern recognition applied to the discrimination of the floral origin of six types of Italian honey samples. Anal. Chim. Acta 2004, 515, 117-125.

25. Cordella, C.; Bertrand, D. SAISIR: A new general chemometric toolbox. TrAC, Trends Anal. Chem. 2014, 54, 75-82.

26. Vigneau, E.; Qannari, E. M. Clustering of variables around latent components. Commun. Stat. Simul. Comput. 2003, 32, 1131-1150.

27. Cuny, M.; Vigneau, E.; Le Gall, G.; Colquhoun, I.; Lees, M.; Rutledge, D. N. Fruit juice authentication by 1H NMR spectroscopy in combination with different chemometrics tools. Anal. Bioanal. Chem. 2008, 390, 419-427.

28. Monakhova, Y. B.; Godelmann, R.; Kuballa, T.; Mushtakova, S. P.; Rutledge, D. N. Independent components analysis to increase efficiency of discriminant analysis methods (FDA and LDA): Application to NMR fingerprinting of wine. Talanta 2015, 141, 60-65.

29. Tóth, G.; Bodai, Z.; Héberger, K. Estimation of influential points in any data set from coefficient of determination and its leave-one-out cross-validated counterpart. J. Comput.-Aided Mol. Des. 2013, 27, 837-844.

30. Riedl, J.; Esslinger, S.; Fauhl-Hassek, C. Review of validation and reporting of non-targeted fingerprinting approaches for food authentication. Anal. Chim. Acta 2015, 885, 17-32.

31. Schmidt, H. L.; Roßmann, A.; Voerkelius, S.; Schnitzler, W. H.; Georgi, M.; Grassmann, J.; Zimmermann, G.; Winkler, R. Isotope characteristics of vegetables and wheat from conventional and organic production. Isot. Environ. Health Stud. 2005, 41, 223-238.

32. Vandevoort, F. R. Fourier transform infrared spectroscopy applied to food analysis. Food Res. Int. 1992, 25, 397-403.

33. Cozzolino, D.; Smyth, H. E.; Gishen, M. Feasibility study on the use of visible and near-infrared spectroscopy together with chemometrics to discriminate between commercial white wines of different varietal origins. J. Agric. Food Chem. 2003, 51, 7703-7708.

34. Bauer, R.; Nieuwoudt, H.; Bauer, F. F.; Kossmann, J.; Koch, K. R.; Esbensen, K. H. FTIR spectroscopy for grape and wine analysis. Anal. Chem. 2008, 80, 1371-1379.

35. Lachenmeier, D. W. Rapid quality control of spirit drinks and beer using multivariate data analysis of Fourier transform infrared spectra. Food Chem. 2007, 101, 825-832.

36. Mazerolles, G.; Hanafi, M.; Dufour, E.; Bertrand, D.; Qannari, E. M. Common components and specific weights analysis: A chemometric method for dealing with complexity of food products. Chemometr. Intell. Lab. Syst. 2006, 81, 41-49.

Notes

This research project was funded by the Bavarian State Ministry of the Environment and Consumer Protection. Y. Monakhova acknowledges funding within the framework of state contract 4.1708.2014K of the Russian Ministry of Education.

Figure Captions

Figure 1: Box plot of δ15N values of the aqueous tomato phase (expressed in ‰ vs. atmospheric nitrogen) by cultivation method (organic: light grey; conventional: dark grey) for all tomato samples (left) and for tomato samples from individual greenhouses (right); each box spans the 25th to 75th percentiles and the whiskers extend to the 5th and 95th percentiles.

Figure 2: PCA of NMR data. A: scatter plot of PC1 vs. PC2 with square symbols for normally fruited (Bocati blue, Hamlet red, Mecano yellow, Savantas cyan, Seviocard purple, and Tica pink) and triangular symbols for small fruited tomato samples (Sakura yellow, Sunstream light green, and Tastery purple); B: scatter plot of PC1 vs. PC2 with yellow square symbols for normally fruited and blue triangular symbols for small fruited tomato samples; C: scatter plot of PC1 vs. PC5 with square symbols for normally fruited and triangular symbols for small fruited tomato samples, colored light grey for organic and dark grey for conventional cultivation.

Figure 3: Scatter plot of PC1 vs. PC2 for PCA of MIR data. A: square symbols for normally fruited (Bocati blue, Hamlet red, Mecano yellow, Savantas cyan, Seviocard purple, and Tica pink) and triangular symbols for small fruited tomato samples (Sakura yellow, Sunstream light green, and Tastery purple); B: square symbols for normally fruited tomato samples (colored light grey for organic and dark grey for conventional cultivation) and colorless triangular symbols for small fruited tomato samples; C: colorless square symbols for normally fruited tomato samples and triangular symbols for small fruited tomato samples (colored light grey for organic and dark grey for conventional cultivation).

Figure 4: Panels A-D are shown for normally fruited (left) and small fruited tomatoes (right), with square symbols for normally fruited and triangular symbols for small fruited tomato samples (each colored light grey for organic and dark grey for conventional cultivation). A: three-dimensional plot of the first three dimensions of ComDim analysis of 1H NMR + MIR + IRMS data; B: PCA scatter plot (PC1 vs. PC5) of 1H NMR data; C: PCA scatter plot (PC1 vs. PC2) of MIR data; D: box plot of δ15N of the aqueous tomato phase (expressed in ‰ vs. atmospheric nitrogen).

Figure 5: Salience of 1H NMR, MIR, and IRMS data on the first three dimensions of ComDim analysis for normally fruited (left) and small fruited tomatoes (right).

Table 1: Number of tomato samples for measurements of 1H NMR/MIR/IRMS/data fusion analysis (from left to right in each cell) with respect to harvesting period, cultivar, and greenhouse.

2013 2014

Mecano Tica Tastery Bocati Hamlet Mecano Savantas Seviocard Tica Sakura Sunstream Tastery

BA org. 6/6/5/5 1/0/0/0 6/6/5/5 6/6/0/0 6/6/6/6

6/6/0/0 6/6/0/0 6/5/0/0 6/6/6/6

N org. 6/6/6/6

HD org.

6/5/6/5 4/4/4/4 4/4/4/4 6/6/0/0 4/4/4/4 3/3/0/0 3/3/3/3

4/4/4/4 3/3/3/3

BA conv. 6/6/5/5

N conv.1 8/7/6/6

N conv.2 8/7/6/6

6/6/5/5

7/6/6/5

6/6/6/6

6/6/0/0 6/6/6/6

6/6/0/0

6/6/0/0 1/1/0/0

6/6/0/0 6/6/0/0 7/7/1/1 6/6/6/6

6/6/0/0

5/5/0/0


HD conv.

4/4/4/4

3/3/3/3 4/4/4/4

Table 2: Number of calibration and validation samples for each validation step of the classification models for 1H NMR, MIR, and data fusion (given as 1H NMR/MIR/data fusion in each cell); numbers in brackets are given for information only, as no validation was performed with these test sets.

Small fruited tomatoes
Validation step   Validation set   Calibration   Validation
cultivar          Sakura           81/78/53      12/12/0
cultivar          Sunstream        73/71/45      20/19/8
cultivar          Tastery          32/31/(8)     61/59/(45)
greenhouse        BA organic       69/67/42      24/23/11
greenhouse        N organic        87/85/48      6/5/5
greenhouse        HD organic       86/83/46      7/7/7
greenhouse        BA conv.         68/65/41      25/25/12
greenhouse        N conv. 1        80/78/48      13/12/5
greenhouse        N conv. 2        82/79/47      11/11/6
greenhouse        HD conv.         86/83/46      7/7/7

Normally fruited tomatoes
Validation step   Validation set   Calibration   Validation
cultivar          Bocati           108/105/55    4/4/4
cultivar          Hamlet           96/93/55      16/16/4
cultivar          Mecano           40/39/(11)    72/70/(48)
cultivar          Savantas         108/105/59    4/4/0
cultivar          Seviocard        109/106/56    3/3/3
cultivar          Tica             99/97/59      13/12/0
greenhouse        BA organic       87/85/48      25/24/11
greenhouse        N organic        97/94/53      15/15/6
greenhouse        HD organic       97/94/44      15/15/15
greenhouse        BA conv.         88/85/48      24/24/11
greenhouse        N conv. 1        98/96/53      14/13/6
greenhouse        N conv. 2        97/95/53      15/14/6
greenhouse        HD conv.         108/105/55    4/4/4

Table 3: Test set validation for LDA (% correct classifications) using data of 1H NMR, MIR, 1H NMR + MIR, 1H NMR + IRMS, MIR + IRMS, and 1H NMR + MIR + IRMS, separately for small and normally fruited tomatoes, each with random validation and validation of individual cultivars and greenhouses.

Tomatoes          Validation step   1H NMR   MIR    1H NMR + MIR   1H NMR + IRMS   MIR + IRMS   1H NMR + MIR + IRMS
Small fruited     random            93.5     80.0   96.7           100.0           100.0        100.0
Small fruited     cultivar          91.3     82.2   100.0          100.0           83.3         100.0*
Small fruited     greenhouse        95.7     75.6   73.6           96.2            84.9         100.0
Normally fruited  random            100.0    91.7   94.4           95.0            100.0        95.0
Normally fruited  cultivar          100.0    85.3   90.9           100.0           90.9         100.0*
Normally fruited  greenhouse        99.1     83.5   72.9           94.9            88.1         98.3

* Cultivar validation for Mecano and Tastery was omitted because of an inappropriate ratio between the numbers of validation and calibration samples; only the remaining cultivars served as test sets for cultivar validation.

Figure 1.


Figure 2.


Figure 3.


Figure 4.


Figure 5.


TOC
