Untangling the Chemistry of Port Wine Aging with ... - ACS Publications

Feb 18, 2013 - Chromatography separates the different components of complex mixtures and generates a fingerprint representing the chemical composition...
1 downloads 0 Views 2MB Size
Subscriber access provided by RYERSON UNIV

Article

Untangling the chemistry of Port wine aging with the use of GC-FID, multivariate statistics and network reconstruction Dan Jacobson, Ana Rita Monforte, and Antonio Cesar Silva Ferreira J. Agric. Food Chem., Just Accepted Manuscript • DOI: 10.1021/jf3046544 • Publication Date (Web): 18 Feb 2013 Downloaded from http://pubs.acs.org on February 21, 2013

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Agricultural and Food Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 36

Journal of Agricultural and Food Chemistry

 

ACS Paragon Plus Environment

Journal of Agricultural and Food Chemistry

Page 2 of 36

1

Title: Untangling the chemistry of Port wine aging with the use of GC-FID,

2

multivariate statistics and network reconstruction

3

4

Dan Jacobsona, Ana Rita Monforteb, António César Silva Ferreiraa,b

5

a

6

South Africa

IWBT – DVO University of Stellenbosch, Private Bag XI, Matieland 7602,

7

8

b

9

Dr.António Bernardino de Almeida, 4200-072 Porto, Portugal.

Escola Superior de Biotecnologia, Universidade Católica Portuguesa, Rua

10

11

12

13

Corresponding author: António Silva Ferreira

14

Phone: +351 22 558 0001

15

Fax: +351 22 509 0351

16

E-mail address: [email protected]

17

Running title: Untangling the chemistry of Port wine aging with the use of

18

GC-FID, multivariate statistics and network reconstruction

19

20

ACS Paragon Plus Environment

1

Page 3 of 36

Journal of Agricultural and Food Chemistry

21

Abstract

22

Chromatography separates the different components of complex mixtures

23

and generates a fingerprint representing the chemical composition of the

24

sample. The resulting data structure will depend on the characteristics of the

25

detector used, univariate for devices such as a flame ionization detector (FID)

26

or multivariate for mass spectroscopy (MS). This study addresses the potential

27

use of a univariate signal for a non-targeted approach in order to (i) classify

28

samples according to a given process or perturbation, (ii) evaluate the feasibility

29

of developing a screening procedure to select candidates related to the process

30

and (iii) provide insight into the chemical mechanisms which are affected by the

31

perturbation. In order to achieve this it was necessary to use and develop

32

methods for data pre-processing and visualization tools to assist an analytical

33

chemist to view and interpret complex multidimensional data sets.

34

Dichloromethane Port wine extracts were collected using GC-FID; the

35

chromatograms were then aligned with correlation optimized warping (COW)

36

and subsequently analyzed with multivariate statistics (MVA) by principal

37

component analysis

38

Furthermore, wavelets were used for peak calling and alignment refinement and

39

the resulting matrix used to perform kinetic network reconstruction via

40

correlation networks and maximum spanning trees. Network-target correlation

41

projections were used to screen for potential chromatographic regions/peaks

42

related

43

chromatograms and target molecules showed a high X to Y correlation of 0.91,

44

092 and 0.89; with 5-hydroxymethylfurfural (HMF) (Maillard), acetaldehyde

45

(oxidation) and 4,5-dimethyl-(5H)-3-hydroxy-2-furanone, respectively. The

to

aging

(PCA) and partial least squares regression (PLS-R).

mechanisms.

Results

from

PLS

ACS Paragon Plus Environment

between

aligned

2

Journal of Agricultural and Food Chemistry

Page 4 of 36

46

context of the correlation (and therefore likely kinetic) relationships amongst

47

compounds detected by GC-FID and the relationships between target

48

compounds within different regions of the network can be clearly seen.

49

Keywords: Univariate signal, Aging, Mechanisms, Preprocessing, GC-FID,

50

PCA, PLS, Network Theory, Kinetic Network Reconstruction

51

ACS Paragon Plus Environment

3

Page 5 of 36

Journal of Agricultural and Food Chemistry

Introduction

52

53

Port wine is a fortified wine produced in the Douro region of Portugal. After

54

the vinification process wines are exclusively barrel aged (Tawnys) or matured

55

for two years in a cask and then bottled (Vintage).

56

The aromatic profile of port wine changes during aging as the result of

57

several underlying mechanisms. Therefore, if one wants to understand or

58

modulate the sensory attributes of Port it is important to understand these

59

mechanisms and the inter-connections amongst them. Several of the

60

mechanisms are to a large extent already described as Maillard1-7 or oxidation8-

61

12

, nevertheless the overlaps between these two mechanisms is not well known.

62

In Port wines sotolon was recognized as a key molecule in the “perceived

63

age” of barrel storage Port wine and consequently in the aroma quality of the

64

final product. Its concentration can range from a few dozen µg/L in a young

65

wine to 1 mg/L in wines older than 50 years. The odor threshold has been

66

estimated to be 19 µg/L19,20.

67

The Maillard reaction has been suggested by several authors to be

68

responsible for the formation of sotolon as a product of a reaction involving

69

hexoses and pentoses in the presence of cysteine4 and from the aldol

70

condensation of butane-2,3-dione and hydroxyacetaldehyde21. On the other

71

hand, several papers have linked sotolon formation with oxidation19, 22-25. Both

72

oxygen and temperature influence sotolon concentration, which suggests that

73

its origin involves a connection between oxidation and Maillard mechanisms26.

74

Therefore, wine aging is a complex system, which requires more information

75

to be analysed in order to better understand the mechanisms at play. Given

76

this, techniques that are able to capture information about a broader range of

ACS Paragon Plus Environment

4

Journal of Agricultural and Food Chemistry

Page 6 of 36

77

compounds participating in the aging process are necessary in order to achieve

78

a better understanding thereof.

79

Metabolomics is defined as the study of “as many-small-metabolites-as-

80

possible” in a system27. In this paper we attempt to describe an example of

81

chemiomics which we define to be the study of the relationships between as

82

many chemical compounds as possible in a complex chemical (non-enzymatic)

83

system. In order to accomplish this by chemical profiling two strategies can be

84

employed: (i) “targeted analysis”, using a priori knowledge of which compounds

85

to analyse, which requires their identification and manual quantification or (ii)

86

“untargeted analysis” where one tries to detect as many compounds as possible

87

in order to acquire sample fingerprints which will be submitted to multivariate

88

analyses and network analysis for further contextualisation. The identification

89

and quantitation is then performed on the variables that are found to be

90

associated with the principal components/correlation vectors as determined by

91

multivariate and network analysis.

92

Spectroscopic detectors, such as those based on UV-Vis (ultraviolet-visible),

93

FTIR (Fourier transform infrared) or NMR (nuclear magnetic resonance)

94

spectroscopy, are largely employed to obtain chemical fingerprints which can be

95

used for sample classification as well for chemical quantitation28-37. The

96

extremely convoluted resulting spectra can be further processed with MVA

97

techniques that compensate, to a certain extent, for the absence of structural

98

information in complex chemical mixtures.

99

In spite of the versatility of these detectors the absence of structural

100

information, due to the extremely convoluted signal, constitutes a major

ACS Paragon Plus Environment

5

Page 7 of 36

Journal of Agricultural and Food Chemistry

101

drawback in obtaining molecular identifications if there is a need to study

102

complex systems that require kinetic contextualization.

103

Chromatography has long been used for the separation of molecules

104

enabling both the quantitative and qualitative analysis of constituents in

105

complex mixtures. The separation of the different components of a complex

106

mixture generates a fingerprint representing the chemical composition of the

107

sample.

108

experimental conditions can then be used to explain the changes caused by a

109

perturbation38. Due to the separation performed by the column it is possible to

110

identify structures based on the elution time in a given chromatographic profile.

Chromatographic fingerprints taken from samples under different

111

Chromatographic data has a huge number of variables and PCA and PLS

112

are MVA visualization techniques that allow for the interpretation of

113

multidimensional data sets. When multivariate analysis involves large datasets,

114

variable selection processes play an important role because they eliminate the

115

less significant or non-informative variables. The overall aim of any variable

116

selection technique is to capture variables from the original dataset that are

117

most specifically related to the problem of interest and to exclude those

118

variables that are affected by other sources of variation.

119

PCA is a non-supervised technique that decomposes the original variables of

120

a data set into two matrices: the score and the loading matrices. The scores

121

matrix contains information about the samples, which are described in terms of

122

their projection onto the principal components. The loading matrix contains

123

information about the variables which are also described in terms of their

124

projection onto the principal components. The loadings can also be interpreted

125

as the contribution of the variables for the observed scores distribution.

ACS Paragon Plus Environment

6

Journal of Agricultural and Food Chemistry

Page 8 of 36

126

Consequently, the use of GC fingerprints with MVA should make it possible to

127

extract considerable amounts of information from complex mixtures. The

128

tandem of GC-MVA is, in our perspective, a middle ground between a rich

129

detector, which provides structural information (NMR), and detectors like FTIR

130

and UV-Vis.

131

Therefore the aim of this study is to validate the feasibility of using univariate

132

chromatographic data, in particular gas chromatography with a flame ionization

133

detector (GC-FID), as a screening procedure to classify complex chemical

134

mixtures, such as wine samples, to identify which compounds are responsible

135

for differences and to perform network reconstructions that may indicate

136

underlying kinetic relationships and mechanisms.

137

Materials and Methods

138

Reagents. The chemicals, 3-octanol (97%) 3-hydroxy-4,5-dimethyl-2(5H)-

139

furanone

140

(≥99.5%), acetaldehyde (≥99.5%), ethyl lactate (98%), 5-(ethoxymethyl)furfural

141

(≥99.5%), acetic acid (≥99.5%), 2,3-butanediol (98%), diethyl succinate

142

(≥99.5%), 2-phenylethanol (≥99.5%), diethyl malate (>97%), succinic monoethyl

143

ester (≥99.5%), benzaldehyde (≥99.5%), octanoic acid (≥99.5%), hexanoic acid

144

(≥99.5%) aspartame (>99%), glutamine

145

(>99%), glycine (>99%), arginine (>99%), gamma-aminobutyric acid (>99%)

146

alanine (>99%), tyrosine (>99%), valine (>99%), phenylalanine (>99%), leucine

147

(>99%), ornitin (>99%), lysine (>99%), homoserine (>98%), norvaline (>98%),

148

homocysteine (>99%), 2-sulfanylethanol (98%), tetraphenylborate (>99.5%),

149

iodoacetic acid (>99%), o-phtaldialdehyde (>99%), and n-alkanes (C11-C22)

150

were obtained from Sigma-Aldrich, USA. The cis-5-hydroxy-2-methyl-1,3-

(≥99.5%),

5-methylfurfural

(≥99.5%),

5-hydroxymethylfurfural

(>99%), cysteine (>99%), serine

ACS Paragon Plus Environment

7

Page 9 of 36

Journal of Agricultural and Food Chemistry

151

dioxane

(cis-dioxane),

152

dioxolane),

153

trans-5-hydroxy-2-methyl-1,3-dioxane

154

according to Maillard39.

155

Dicloromethane (HPLC grade) was purchased from LabScan, Sowinskiego,

156

Gliwice. Anhydrous sodium sulfate and methanol (HPLC grade) were obtained

157

from Merck, Darmstadt, Germany.

cis-4-hydroxymethyl-2-methyl-1,3-dioxolane

trans-4-hydroxymethyl-2-methyl-1,3-dioxolane (trans-dioxane)

(cis-

(trans-dioxolane), were

synthetized

158 159

Port Wine samples. The 37 samples used in this study were between 2 and 60

160

years of age: one sample for 2, 14, 19, 23, 35, 40, 42, 48, 54, 57 and 60 years

161

of age, two samples for 7 and 20 years of age, three samples for 4 and 5 years

162

of age and ten samples of 10 years of age. All wines were matured in oak

163

barrels. These samples were supplied by the “Instituto do Vinho do Porto e do

164

Douro”. The wines were made following standard traditional winemaking

165

procedures for Port wine and have been certified.

166

Analytical procedure.

167

Volatiles Extraction. A liquid-liquid extraction was performed to extract the

168

volatile fraction from each sample. The procedure used was as follows: 5 g of

169

anhydrous sodium sulphate and 50 µL of internal standard (3-octanol) were

170

added to 50 mL of sample and were extracted twice with 5 mL of

171

dicloromethane using a magnetic stir bar for 5 minutes per extraction and 2 mL

172

of the resulting organic phase were concentrated under a nitrogen stream 4

173

times. The extract was then analysed by GC (Agilent 5980, USA) with FID

174

detection. Two microliters of the extract were injected. Chromatographic

175

conditions were the following; column BP-21 (50 m x 0,25 mm x 0,25 µm) fused

ACS Paragon Plus Environment

8

Journal of Agricultural and Food Chemistry

Page 10 of 36

176

silica (SGE, Portugal); hydrogen (5.0, Air-liquide, Portugal); 1.2 mL/min flow

177

rate; injector temperature, 220ºC; oven temperature, 40 ºC for 1 min

178

programmed at a rate of 2ºC/min to 220 ºC, maintained during 30 min; splitless

179

time, 0.5 min; split flow, 30 mL/min.

180

In order to facilitate identification, the Kovat’s index for each peak was 40

181

calculated as described by Van den Dool and Kratz

. This determination was

182

performed on polar phase columns, BP21 (50 m x 0.25 mm x 0.25 µm).

183 184

Amino Acids Analysis. Twenty one amino acids were analysed in the Port

185

wine samples: aspartic acid, glutamic acid, cysteine, asparagine, histidine,

186

serine, glycine, arginine, threonine, alanine, g-aminobutyric acid, tyrosine,

187

ethanolamine, valine, methionine, tryptophan, phenylalanine, isoleucine,

188

leucine, ornithine and lysine. The methodology used was that described by

189

Pipris-Nicolau et at41.

190 191

Acetaldehyde, furanic compounds and sotolon analysis. These analyses

192

were done as described by Silva Ferreira et al19.

193 194

Data pre-processing. The ASCII file of chromatographic data obtained from

195

each sample was extracted and a matrix created containing all of the

196

chromatograms. The intensities were normalized by dividing each value by the

197

intensity of the internal standard (3-octanol). The raw dataset (GC-FID) was

198

then imported into The Unscrambler®X 10.1 (Camo, Sweden), where the first

199

stage of the alignment of chromatograms was performed using Correlation

200

Optimized Warping (GC-FID-COW).

This algorithm aligns chromatograms by

ACS Paragon Plus Environment

9

Page 11 of 36

Journal of Agricultural and Food Chemistry

201

means of sectional linear stretching and compression, which shifts the peaks of

202

one chromatogram to correlate with those of the other chromatograms in the

203

dataset42. The saturated peaks were then removed and the baseline corrected

204

(GC-FID-COW-saturated removed and baseline correction). The resulting

205

matrix (GC-FID-X) was then used for multivariate data analysis as described in

206

the statistical analysis section.

207 208

Statistical analysis. The data was analysed with PCA and PLS-R using either

209

Qlucore (Lund, Sweden) or SIMCA-P+ 12.0.1 (Umetrics, Norway). PCA shows

210

similarities between samples projected on a plane and makes it possible to

211

determine which variables determine these similarities and in what way. PLS is

212

used to extract factors related to one or more response values. PLS validation

213

was performed by cross-validation method.

214 215

Kinetic Network Reconstruction. In order to attempt to reconstruct the

216

underlying kinetic network, the fingerprint needed to be further compressed to a

217

single value for each putative molecule detected by GC-FID. Thus, each

218

chromatographic peak needed to be replaced with a single value for the

219

intensity and retention-time, at the apex of each peak and therefore a more

220

refined alignment procedure was required. This was achieved as follows: An

221

“average chromatogram” was created by taking the mean of the values at each

222

elution point in the GC-FID-X matrix. The average chromatogram and the

223

sample chromatograms were smoothed with the Savitzky-Golay method

224

(settings: left=15, right=15, polynomial degree=0)

225

Unscrambler

43

as implemented in The

®

X 10.1 (Camo, Sweden). The wavelet method of Du et al.44,

ACS Paragon Plus Environment

10

Journal of Agricultural and Food Chemistry

Page 12 of 36

226

which was originally developed for peak calling in peptide mass-spec data, was

227

adapted for finding peak centres in chromatographic data and a Mexican hat

228

wavelet used to determine the location of all of the peaks in the average and

229

sample chromatograms. A custom built Perl program was integrated with the

230

R-based wavelet method to achieve this. The distances between the locations

231

of all of the peaks in each sample chromatogram and the locations of the peaks

232

in the average chromatogram were calculated. It was observed that there was

233

a correlation between peak height and the amount of peak centre shift that

234

occurred across chromatograms and we thus devised a two-step process for

235

aligning sample peaks to those of the average chromatogram. If the (internal-

236

standard-normalized) height of the average peak was greater than 2 and the

237

distance to the nearest sample peak was less than 0.3 minutes then the

238

intensity value of the sample peak was assigned as the sample value at the

239

average peak location. However, if the (internal-standard-normalized) height of

240

the average peak was less than 2 and the distance to the nearest sample peak

241

was less than 0.15 minutes then the intensity value of the sample peak was

242

assigned as the sample value at the average peak retention time.

243

algorithm was implemented in Perl.

This

244

As a result of this process a new matrix was created that contained the

245

retention time of all peaks in the average chromatogram and the intensity

246

values of all of the peak centres from each sample aligned to these average

247

retention times. Thus a vector was created for each peak (presumably

248

representing a compound) across all samples. An all-against-all comparison

249

was done calculating the Pearson correlation between each and every peak

250

vector. As such, one is able to track the increase or decrease of compounds

ACS Paragon Plus Environment

11

Page 13 of 36

Journal of Agricultural and Food Chemistry

251

(peaks) during the aging process and determine the correlative relationships

252

amongst them.

253

represented the remaining relationships as a mathematical graph in order to

254

form a correlation network with the nodes representing peaks and the edges

255

weighted with the Pearson correlations between the peak vectors. In order to

256

reconstruct the most likely kinetic network underlying the set of chemical

257

reactions involved in the aging process, a maximum spanning tree was created

258

by transforming the edge weights into inverse correlations (by taking the

259

difference between the number 1 and the absolute correlation values) and the

260

subsequent use of a minimum spanning tree (mst) algorithm45 on the this

261

inverse correlation network. A minimum spanning tree represents the shortest

262

possible path through a graph and, as such, selects for the smallest inverse

263

correlation (i.e highest correlation) pairs between all nodes in the network. The

264

resulting networks were visualized in Cytoscape 46.

We applied a Pearson correlation threshold of 0.8 and

265 266

Results / Discussion

267

Principal Component Analysis. Our initial goal for the use of PCA was to

268

examine the intrinsic variation in the data set prior to alignment in order to

269

determine if the volatile fraction of the samples followed a trend related to age.

270

However, when using the GC-FID matrix described above some samples did

271

not follow the latent age variable described by PC1, namely the 4 and 60 year

272

old samples (score plot not shown). The analyses global workflow is described

273

in Figure 1.

274

The loading plot in Figure 2 shows higher levels of acetic acid, 2,3-

275

butanediol, diethyl succinate, diethyl malate, phenylethanol and succinic mono

ACS Paragon Plus Environment

12

Journal of Agricultural and Food Chemistry

Page 14 of 36

276

ethyl ester present in the older samples. The esterification process appears to

277

be the most prevalent reaction amongst the compounds apparent in the loading

278

plot. The organic acids naturally present in grape must, such as malic acid and

279

those present as the result of fermentation, such as lactic, succinic and acetic

280

acids all react with ethanol to yield the esters seen in the loading plot47.

281

However, these molecules are out of the detector’s linear response range, so

282

they needed to be eliminated.

283

aligned because an unavoidable characteristic of all chromatographic data is

284

that the retention times for the peaks in the chromatograms shift slightly from

285

one analysis to another. To address this problem correlation optimized warping

286

(COW) was used to align all of the chromatograms.

Furthermore, the chromatograms must be

287

A new fingerprint was created (GC-FID-COW), by the removal of saturated

288

peaks and baseline correction to yield the GC-FID-X matrix, which was

289

subsequently analysed by PCA. The new score plot shows the same latent age

290

variable described by PC1, but the explained variance of the first two

291

components is 74% (Figure 3) and the samples that did not previously follow

292

the age vector now do so after alignment.

293

The score-plots in Figure 3A and 3B show a clear trend related to wine age,

294

suggesting that the chemical mechanisms are correlated with time, across the

295

first principal component with the first two components explaining 74% of the

296

variance. It appears that the latent age vector remains intact whether the data

297

is mathematically normalised or not.

298

It is important to note that the data for both the PCA and PLS analyses was

299

not mathematically centered or normalized as is commonly done to give all

300

variables equal impact on the model. Centering is usually used as a matter of

ACS Paragon Plus Environment

13

Page 15 of 36

Journal of Agricultural and Food Chemistry

301

convenience for display and mathematically has no impact on the multivariate

302

model. When we view the loadings we chose not to center the data in order to

303

not have negative scores along PC1 and thus to have no negative peaks on the

304

loading plot in order to keep the loading plot looking as much like a normal

305

chromatogram as possible. We are aware that our choice not to normalize

306

means that peaks with higher intensities will have a larger impact on the model

307

and be more apparent in the loading plots shown in Figures 2 and 3. However,

308

this allows the patterns visible in the loading plots to be recognizable to an

309

analytical chemist and therefore is easily read and interpreted as a normal

310

chromatogram. Mathematical normalization unfortunately makes the standard

311

chromatographic patterns unrecognizable as it rescales every peak to the same

312

amount of variance.

313

normalization does not change the fact that PC1 comprises the age vector with

314

the biggest affect of normalization changing the sample distribution along PC2,

315

which is likely to represent vintage and vinification technology effects. This

316

suggests that the aging of wine largely overwhelms the differences between

317

wines that are present due to the season they were made in or variations

318

amongst the approaches used to make them (different yeast strains,

319

temperatures, crushing mechanisms, fermentation tanks, barrels used for

320

maturation, etc.). The primary purpose of PCA and PLS in our pipeline is as a

321

graphical user interface for analytical chemists to use as a screening step for

322

univariate data sets. As such, we strove to present the analytical chemist with a

323

multivariate interpretative environment with which they would be as familiar as

324

possible, namely chromatographic fingerprints with which they have large

325

experience.

In addition, as can be seen from Figure 3A and 3B,

Thus, we feel that, for this part of the analysis, the visual

ACS Paragon Plus Environment

14

Journal of Agricultural and Food Chemistry

Page 16 of 36

326

representation of the chromatographic loading plots outweighs the assignment

327

of equal weights to every variable in the model. Furthermore, we address this

328

variable normalization issue with the use of network reconstruction via Pearson

329

correlation networks and maximum spanning trees. In the network analysis,

330

every peak is analyzed and has an equal opportunity to form a part of the

331

network and Pearson correlation includes vector normalization.

332

During aging there are likely to be several different mechanisms involved,

333

including oxidation and Maillard reactions. In PC1 the samples correlate with

334

the age of the wine, which points out that the overall kinetic system overrides

335

any one specific mechanism.

336

mechanisms at play are more relevant to sample classification than the

337

contributions of any individual mechanism.

As such, the connections between the

338

After alignment, compounds which appear to correlate with port wine aging

339

as found in the loading plots were: cis-dioxane, cis-dioxolane, trans-dioxolane

340

trans-dioxane, octanoic acid and HMF as shown in Figure 3C.

341

The cis and trans dioxane and dioxolane are formed by the condensation

342

reaction between glycerol and acetaldehyde. These molecules were identified in

343

Port wine by Silva Ferreira et al48 who noted that they increased with age, and,

344

as such, could be used as age markers for port wine kept under oxidative

345

conditions. Furanic compounds, HMF, 5MF and furfural are thought to be

346

products from the Maillard reaction, formed by the fragmentation or cyclization

347

of 3-deoxyosone, a highly reactive intermediary of the reaction49. Barrel oak can

348

also be a source of HMF and furfural50.

349

ACS Paragon Plus Environment

15

Page 17 of 36

Journal of Agricultural and Food Chemistry

350

Partial Least Squares Analysis. Partial Least Squares (PLS) analysis was

351

used on the GC-FID-X matrix in an effort to associate specific peaks/fingerprint

352

regions with mechanisms known to be involved in aging. It is worth noting that

353

PLS was not used in its traditional role as a method with which to build

354

calibrated, predictive models (that would therefore be built with training sets and

355

validated with independent test sets). Rather, the goal of our use of PLS-1 was

356

simply as a method with which to perform principal-component-based

357

regressions in an effort to identify sets of peaks that were associated with

358

known mechanistic markers or potential precursors for volatile compounds.

359

Molecules that are thought to be associated with different mechanisms were

360

selected and quantified from each sample and used as markers to try and find

361

other compounds in GC-FID-X that may be related with the same

362

mechanism. Acetaldehyde and HMF were used as markers for oxidation and

363

the Maillard reaction respectively. Sotolon was also used in an effort to gather

364

more information about its origin. The concentration of each of the marker

365

molecules was determined for each sample and the resulting vectors used as a

366

second data block in PLS.

367

The resulting PLS coefficient plots show the variables which correlate with

368

each mechanism-marker. Some variables have a positive value, which means

369

that these have kinetic vectors which correlate with that of the mechanism-

370

marker, and some have negative values, which indicate that they have an

371

inverse correlation with the kinetic vector of the mechanism-marker.

372

For sotolon the correlation is 0.89 (over 7 components) for molecules such as

373

furfural, 5-MF, cis dioxane, cis-dioxolane, trans-dioxane, trans-dioxolane and

374

HMF. We also found some organic acids with negative correlations, which

ACS Paragon Plus Environment

16

Journal of Agricultural and Food Chemistry

Page 18 of 36

375

indicate that they were being consumed as sotolon was being formed (Figure

376

4). The model had a correlation of 0.91 for HMF (a Maillard reaction marker)

377

and 0.92 to acetaldehyde (an oxidation marker) for 7 latent variables. The PLS

378

loading plot for sotolon was very similar to those seen for HMF and

379

acetaldehyde which means that the mechanisms are correlated, and during

380

aging contribute in the same way to the dynamics of the overall process.

381

Some amino acids, namely, valine, alanine, arginine, glutamine and aspartate,

382

had relatively high (0.69-0.83) inverse correlations with a number of peaks in

383

the volatile profile which were themselves correlated to Maillard reaction

384

markers such as such as HMF and furfural. Thus it seems likely that these

385

amino acids are major Maillard aroma precursors.

386 387

Network Reconstruction. Figure 5 shows the maximum spanning tree derived

388

from the correlation network between all peaks.

389

center of a peak (Kovats index) and each edge represents the best correlation

390

between the peaks. Fold changes between 2 year old and 60 year old wines

391

were calculated for each peak and the nodes colored accordingly with shades

392

of red representing increasing concentration and blue representing decreasing

393

concentration. The thickness of each edge has been scaled to represent the

394

level of correlation (thicker lines mean higher correlation values). Furthermore,

395

the size of each node has been scaled to represent the number of correlations it

396

had with other peaks above a threshold of 0.8.

Each node represents the

397

Using pure standards as markers for Maillard and Oxidation together with the

398

Kovats index it is possible to explore and extract more information from the

399

proposed network. In fact the cyclic acetals of glycerol and ethanal (oxidation

ACS Paragon Plus Environment

17

Page 19 of 36

Journal of Agricultural and Food Chemistry

400

products) cluster together on the upper part of Figure 5.

401

ethoxymethylfurfural, an ester of a major Amadori product (HMF), links two

402

major branches of the network.

403

process with the continuous formation of substances absent in young wines,

404

which explains the aging character of wine. The volatile compounds that relate

405

to 5-ethoxymethylfurfural are co-expressed during aging, thus presenting similar

406

kinetics, and future work will focus on their identification using the respective

407

Kovats

408

representation captures some of the dynamics of the aging process based on

409

the underlying kinetics. In fact those compounds that are highly correlated to

410

one another (>0.9) are likely to have the same kinetic order and the network

411

thus enables one to screen molecules according to their kinetic parameters.

index

and

rich

In addition, 5-

As such, the network illustrates the aging

information

detectors

like

MS.

The

network

412

We propose that this network is an approximate representation of the

413

underlying chemical reaction network during aging. The higher level of

414

correlation (and therefore the nearer any two peaks are to each other in the

415

network) the higher the probability is that they participate in the same or

416

neighboring reactions. The correlation between compounds drops as you move

417

further away in a chemical reaction network, as the intervening kinetics of each

418

reaction will cause differences at each step. There are no doubt intermediate

419

compounds for some reactions that were not detected by FID. The network is

420

robust to this missing data as the intervening steps will simply be represented

421

by a lower correlation value of an edge between compounds that were

422

detected. This correlative approach of course is not proof of causation but

423

rather serves as a useful tool for hypothesis generation in order to prioritise the

424

identification of unkown compounds represented by the peaks.

ACS Paragon Plus Environment

18

Journal of Agricultural and Food Chemistry

Page 20 of 36

425

By using correlation values to targeted compounds (or other variables such

426

as age) we found that we could highlight the regions of the network that are

427

closely associated with them and therefore likely involved in their formation or

428

consumption. In order to explore regions of the network that may be related to

429

age and particular mechanisms, Pearson correlation values between the peaks

430

and each of the target vectors (sample age, sotolon, acetaldehyde, HMF,

431

glutamate and alanine) were loaded into cytoscape as node annotations.

432

Alanine and glutamate were selected as target vectors because they were the

433

amino acids best correlated with the GC-FID-X matrix. By sorting the nodes by

434

correlation values and selecting the nodes corresponding to a correlation value

435

with a target above some threshold, portions of the network which correlated

436

with each target vector could be visualized in aqua as shown in Figure 6.

437

Figure 6A shows the nodes with a 0.86 Pearson correlation to the age of the

438

wines. There is a clearly defined subnetwork that corresponds to age and

439

represents compounds involved in the aging process. It was clear in the PCA

440

diagrams that there are a group of compounds that correspond to aging which

441

are responsible for the first principal component. It is likely that the compounds

442

responsible for the second principle component are due to differences between

443

the vintages of the starting wines. We can see the same pattern in this network

444

where there are a number of compounds that do not correlate well with age and

445

are likely reflecting vintage differences amongst the wines.

446

Figures 6 B, C and D show the regions of the network (colored as aqua) that

447

correlate (Pearson threshold 0.86) with sotolon, HMF and acetaldehyde,

448

respectively.

449

subnetworks correlated with these three compounds and as such it is possible

It is clear that there is considerable overlap between the

ACS Paragon Plus Environment

19

Page 21 of 36

Journal of Agricultural and Food Chemistry

450

that there is a mixture of oxidation and maillard reactions at play in forming

451

these compounds.

452

threshold of 0.86 in Figure 6A clearly overlaps to a very large degree with the

453

subnetworks defined by the correlations to acetaldehyde, HMF and sotolon. The

454

arrow in Figure 6A shows the node that negatively correlates to both alanine

455

and glutamate (-0.8 Pearson threshold) and as such likely represents the entry

456

point to the volatile network. The fact that the anti-correlation is relatively low (-

457

0.8 to -0.83) probably indicates that there are one to several intermediates

458

between the amino acids and their products’ entry into the volatile network.

The subnetwork that correlates with age at a Pearson

459

We believe that we have demonstrated that the approach described here

460

using GC-FID univariate data, when used as sample fingerprints, can be used

461

to classify the age of Port wine and to predict potential molecules involved in

462

this process by the deconvolution of time and the kinetics of different aging

463

mechanisms. We began this analysis with the hope that multivariate analysis

464

and network reconstruction would be useful tools with which to study

465

mechanisms related to a perturbation. The PCA score plots allow for sample

466

classification and the visualization of larger peaks that correspond with aging.

467

The PLS loading plots provide the analytical chemist with a familiar set of

468

patterns, namely virtual chromatograms which point out the larger peaks that

469

appear to be associated with marker compounds for oxidation and Maillard

470

mechanisms.

471

relationships between all the compounds detected via GC-FID and their

472

changes in concentration over time.

473

considerably more information in an effort to understand the probable kinetic

474

contexts of the molecules represented by peaks in each chromatogram.

The network reconstruction is very useful in visualizing the

This view of the data should provide

ACS Paragon Plus Environment

20

Journal of Agricultural and Food Chemistry

Page 22 of 36

475

Furthermore, it is possible to identify regions of the network that appear to be

476

involved in the formation or consumption of target compounds. As such, the

477

approach described here should indeed be a very powerful tool for the further

478

study of mechanisms and kinetic networks in complex mixtures.

479

In conclusion, univariate chromatographic signals are less expensive

480

compared to NMR or MS and therefore constitute a valuable tool in a bio-

481

analytical pipeline.

482

approach required the development and use of new data processing methods

483

and graphical user interfaces in order to extract the maximum amount of

484

information from the data. The approach reported here enables us to: 1)

485

minimize the cost of analysis; 2) perform sample classification and

486

contextualization; 3) perform process monitoring of aging or other time series;

487

4) consider the prospect of building databases from the large amounts of

488

univariate data already available; 5) screen for correlations with known

489

mechanism markers; 6) select biomarkers for identification and further study

490

and 7) explore the putative kinetic network for a greater understanding of the

491

process being studied.

However, the use this type of data in an untargeted

492 493

Acknowledgments

494

The authors would like to thank Piet Jones, Guy Emerton and Debbie

495

Weighill for useful discussions about the networks presented in this paper.

496

This research was funded by the project “Wine Metrics: Revealing the Volatile

497

Molecular Feature Responsible for the Wine Like Aroma a Critical Task Toward

498

the Wine Quality Definition.” (PTDC/AGR-ALI/121062/2010), partially supported

499

by ESB/UCP plurianual funds through the POS-Conhecimento Program that

ACS Paragon Plus Environment

21

Page 23 of 36

Journal of Agricultural and Food Chemistry

500

includes FEDER funds through the program COMPETE (Programa Operacional

501

Factores de Competitividade) by Portuguese national funds through FCT

502

(Fundação para a Ciência e a Tecnologia), Winetech, the Technology and

503

Human Resources Programme and the South African National Science

504

Foundation.

ACS Paragon Plus Environment

22

Journal of Agricultural and Food Chemistry

Page 24 of 36

505

506

References

507

(1) Maillard, L. C. Action des acides amines sur les sucres formation des

508

melanoidines par voie methodique. Council of Royal Academy Science Series

509

2. 1912 154, 66-68.

510

(2) Hodge, J. E. Dehydrated foods, chemistry of browning reactions in model

511

systems. J. Agric. Food Chem. 1953, 1, 928-943.

512

(3) Hoffman, T.; & Schieberle, P. Evaluation of the key odorants in a thermally

513

treated solution of ribose and cysteine by aroma extract dilution techniques. J.

514

Agric. Food Chem. 1995, 43, 2187-2194.

515

(4) Hoffman, T.; Schieberle, P. Identification of potent aroma compounds in

516

thermally treated mixtures of glucose/cysteine and rhamnose/cysteine using

517

aroma extract dilution. J. Agric. Food Chem. 1997, 45, 898-906.

518

(5) Pripis-Nicolau, L.; Revel, G.; Bertrand, A.; Maujean, A. Formation of Flavor

519

Components by the Reaction of Amino Acid and Carbonyl Compounds in Mild

520

Conditions. J. Agric. Food Chem. 2000, 48, 3761-3766.

521

(6) Marchand, S.; Revel, G. (2000). Approaches to wine aroma: release of

522

aroma compounds from reactions between cysteine and carbonyl compounds in

523

wine. J. Agric. Food Chem. 2000, 48, 4890-4895.

524

(7) Silva, H. O.; Machado, B. P.; Hogg, T.; Marques, J. C.; Câmara, J. S.;

525

Albuquerque, F.; Silva Ferreira, A. C. Impact of Forced-Aging Process on

526

Madeira Wine Flavor. J. Agric. Food Chem. 2008, 56, 11989-11996.

527

(8) Singleton, V. Oxygen with phenols and related reactions in musts, wines and

528

model systems:Observations and pratical implications. Am. J. Enol. Vitic. 1987 ,

529

38, 69-77.

530

(9) Danilewicz, J. C. Review of reaction mechanisms of oxygen and proposed

531

intermediates reduction products in wine: central role of iron and copper. Am. J.

532

Enol. Vitic. 2003, 54, 73-85.

533

(10) Waterhouse, A. L.; Laurie, V. F. Oxidation of wine phenolics: a critical

534

evaluation and hypotheses. Am. J. Enol. Vitic. 2006, 57, 306-313.

535

(11) du Toit, W.; Marais, J.; Pretorius, I.; du Toit, M. Oxygen in Must and Wine:

536

A review. S. Afr. J. Enol. Vitic. 2006, 27, 76-94.

ACS Paragon Plus Environment

23

Page 25 of 36

Journal of Agricultural and Food Chemistry

537

(12) Kilmartin, P. The oxidation of red and white wines and its impact on wine

538

aroma. Chemistry in New Zeland. 2009, 18-22.

539

(13) Sulser, H.; Depizzol, J.; Buchi, W.A probable flavoring principle in

540

vegetable-protein hydrolysates. J. Food Sci. 1967, 32, 611-615.

541

(14)

542

dimethyltetrahydrofuran-2,3.dione in a Flor sherry wine. Lebensm. Wiss.

543

Technol. 1976, 9, 366-368.

544

(15) Masuda, M.; Okawa, E.; Nishimura, K.; Yunome, H. Identification of 4,5-

545

dimethyl-3-hydroxy-2(5H)-furanone (Sotolon) and ethyl 9-hydroxynonanoate in

546

Botrytised wine and evaluation of the roles of compounds characteristics. Agric.

547

Biol. Chem. 1984, 48, 2707-2010.

548

(16) Silva Ferreira, A. C. Caractérisation du vieillissement du vin de Porto.

549

Approche chimique et statistique. Rôle aromatique du sotolon. Ph.D. Thesis,

550

Université Victor Segalen Bordeaux 2, 1998.

551

(17) Câmara, J. S.; Marques, J. C.; Alves, M. A.; Silva Ferreira, A. C. 3-

552

Hydroxy-4,5-dimethyl-2(5H)-furanone. J. Agric. Food Chem. 2004, 52, 6765-

553

6769.

554

(18) Lavigne, V.; Pons, A.; Darriet, P.; Dubourdieu, D. (2008). Changes in the

555

sotolon content od dry white wines during barrels and bottle aging. J. Agric.

556

Food Chem. 2008, 56, 2688-2693.

557

(19) Silva Ferreira, A. C.; Barbe, J.; Bertrand, A. 3-Hydroxy-4,5-dimethyl-2(5H)-

558

furanone: A key odorant of the typical aroma of oxidative aged Port wine. J.

559

Agric. Food Chem. 2003, 51, 4356-4363.

560

(20) Silva Ferreira, A.; Avila, I.; Guedes de Pinho, P. Sensorial Impact of

561

Sotolon as the "Perceived Age" of Aged Port Wine. In Natural Flavors and

562

Fragrances, First Edition; Frey, C., Rouseff, R., Eds; ACS Symposium Series;

563

A. C. Society: Washington, DC, 2005, 10, pp. 141-159.

564

(21) Hofman, T.; Schieberle, P. Identification of the key odorants in processed

565

ribose-cysteine Maillard mixtures by instrumental analysis and sensory studies.

566

Spec. Publ. - R. Soc. Chem. 1996, 197, 175-81.

567

(22) Pham, T. T.; Guichard, E.; Schlich, P.; Charpentier, C. Optimal conditions

568

for the formation of sotolon from alfa-ketobutyric acid in the French Vin Jaune.

569

J. Agric. Food Chem. 1995, 43, 2616-2619.

Dubois,

P.;

Rigaud,

J.;

Dekimpe,

J.

Identification

of

4,5-

570

ACS Paragon Plus Environment

24

Journal of Agricultural and Food Chemistry

Page 26 of 36

571

(23) Cutzach, I., Chatonnet, P., & Dubourdieu, D. Rôle du sotolon dans l'arôme

572

des vins doux naturels, influence des conditions d'élevage et de vieillissement.

573

J. Int. Sci. Vigne Vin 1998, 32, 223-233.

574

(24) Silva Ferreira, A. C., Hogg, T., & Guedes de Pinho, P. Identification of key

575

odorants related to the typical aroma of oxidation-spoiled white wines. J. Agric.

576

Food Chem. 2003, 51, 1377-1381.

577

(25) Escudero, A., Cacho, J., & Ferreira, V. Isolation and identification of

578

odorants generated in wine during its oxidation: a gas chromatography-

579

olfactometric study. Eur. Food Res. Technol. 2011, 211, 105-110.

580

(26) Martins, R. C.; Lopes, V. V.; Silva Ferreira, A. C. Port Wine Oxidation

581

Management: A ChemoInformatics Approach. 60th American Society for

582

Enology and Viticulture Meeting. 2006

583

(27) Cevallos-Cevallos, J.; Reyes-De-Corcuera, J.; Etxeberria, E.; Danyluk, M.;

584

Rodrick, G. Metabolomics analysis in food science: a review. Trends Food Sci.

585

Technol. 2009, 20, 557-566.

586

(28) Rohman, A.; Che Man, Y. B.; 2010. Fourier transform infrared (FTIR)

587

spectroscopy for analysis of extra virgin olive oil adulterated with palm oil. Food

588

Res. Int. 2010, 43, 886-892.

589

(29) Coimbra, M.; Alves, F. G.; Barros, A.; & Delgadillo, I. Fourier Transform

590

Infrared

591

Polysaccharide Extracts. J. Agric. Food Chem. 2002, 50, 3405-3411.

592

(30) Zhang, J.; Cui, M.; Yun, H.; Yu, H.; Guo, D.; Chemical fingerprint and

593

metabolic fingerprint analysis of Danshen injection by HPLC-UV and HPLC-MS

594

methods. J. Pharm. and Biomed. Anal. 2005, 36, 1029-1035.

595

(31) Shen, D.; Wu, Q.; Sciarappa, W.; Simon, J. Chromatographic fingerprints

596

and quantitative analysis of isoflavones in Tofu-type soybeans. Food Chem.

597

2012, 130, 1003-1009.

598

(32) Ding, Y.; Wu, E.; Liang, C.; Chen, J.; Tran, M.; Hong, C. Discrimination of

599

cinnamon barl and cinnamon twig samples sourced from various countries

600

using HPLC-based fingerprint. Food Chem. 2011, 127, 755-760.

601

(33). Gu, H. NMR and MS-based metabolomics: Development and applications.

602

Ph. D. Thesis Purdue University. 2008.

Spectroscopy

and

Chemometric

Analysis

ACS Paragon Plus Environment

of

White

Wine

25

Page 27 of 36

Journal of Agricultural and Food Chemistry

603

(34) Lee, J.-E.; Hwang, G.-S.; Van Den Berg, F.; Lee, C.-H.; Hong, Y. Evidence

604

of vintage effects on grape wines using H-NMR-based metabolomic study. Anal.

605

Chim. Acta. 2009, 19, 71-76.

606

(35) Cuadros-Inostroza, A.; Giavalisco, P.; Hummel, J.; Eackard, A.; Willmitzer,

607

L.; Peña-Cortés, H. Discrimination of wine attributes by metabolome analysis.

608

Anal. Chem. 2010, 82, 3573-3580.

609

(36) Consonni, R.; Cagliani, L.; Guantieri, V.; Simonati, B. (2011). Identification

610

of metabolic content of selected Amarone wine. Food Chem. 2011, 129, 693-

611

699.

612

(37) Son, H.; Hwang, G.; Ahn, H.; Park, W.; Lee, C.; Hong, Y. Characterization

613

of wines from grape varieties through multivariate statistical analysis of H NMR

614

spectroscopic data. Food Res. Int. 2009, 42, 1483-1491.

615

(38) Gong, F.; Wang B.; Chau, F.; Liang, Y. Data preprocessing for

616

chromatographic fingerprint of herbal medicine with chemometric approaches.

617

Anal. Lett. 2005, 38, 2475-2492.

618

(39) Maillard, B. Additions radicalaires de diols et de leurs dérivés: diesters et

619

acétals cycliques. Ph.D. Thesis (nº. 325); University Bordeaux 1; Bordeaux,

620

France, 1971.

621

(40). Van den Dool, H.; Kratz, P. D. A generalization of the retention index

622

system

623

chromatography. J. Chromatogr., A. 1963, 11, 463-471.

624

(41) Pipris-Nicolau, L.; Revel, G.; Marchand, S.; Beloqui, A. A.; Bertrand, A.

625

Automated HPLC method for the measurement of free amino acids including

626

cysteine in musts and wines; first applications. J. Sci. Food Agric. 2001, 81,

627

731-738.

including

linear

temperature

programmed

gas-liquid

partition

628

(42) Fan, G.; Liang, Y.-Z.; Ying-Sing, F.; Chau, F. Correction of retention time

629

for chromatographic fingerprints of herbal medicines. J. Chromatogr. A. 2004,

630

1029, 173-183.

631

(43) Savitzky, A.; Golay, M.J.E. Smoothing and Differentiation of Data by

632

Simplified Least Squares Procedures. Anal. Chem.1964, 36, 1627–1639.

633

(44) Du, P.; Kibbe, W. A.; Lin, S. M. Improved peak detection in mass spectrum

634

by incorporating continuous wavelet transform-based pattern matching.

635

Bioinformatics. 2006, 22, 2059-2065.

ACS Paragon Plus Environment

26

Journal of Agricultural and Food Chemistry

Page 28 of 36

636

(45) Dijkstra, E. W. A note on two problems in connection with graphs.

637

Numerische Mathematik 1959, 1, 269–271.

638

(46) Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N. S.; Wang, J. T.; Ramage,

639

D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: a software environment for

640

integrated models of biomolecular interaction networks. Genome Res. 2003, 13,

641

2498-2504.

642

(47) Ribéreau-Gayon, P.; Glories, Y.; Maujean, A.; Dubourdieu, D. (2006).

643

Handbook of Enology: The Chemistry of Wine Stabilization and Treatments.

644

John Wiley & Sons Ltd, Chichester, England, 2006; 2, pp. 51-64

645

(48) Silva Ferreira, A. C.; Barbe, J.; Bertrand, A. Heterocyclic acetals from

646

glycerol and acetaldehyde in Port wines: Evolution with aging. J. Agric. Food

647

Chem. 2000, 50, 2560-2564.

648

(49) Martins, S.; Jongen, W.; Boekel van, M. A review of Maillard reaction in

649

food and implications to kinetic modelling. Trends Food Sci. Technol. 2001, 11,

650

364-373.

651

(50) Moutounet, M.; Rabier, P.; Puech, J. L.; Verette, E.; Barillere, M. Analysis

652

by HPLC of extractable substances of oak wood- application to a chardonnay

653

wine. Sci. Aliments. 1989, 9, 35- 51.

654

ACS Paragon Plus Environment

27

Page 29 of 36

Journal of Agricultural and Food Chemistry

Figure captions Figure 1. Proposed workflow for univariate (chromatographic) signal processing Figure 2. Raw chromatogram overlay of all samples (n=31) and Loading plot (PC1) representing the average chromatogram GC-FID chromatogram. (1) ethyl lactate, (2) acetic acid, (3) 2,3-butanediol, (4) diethyl succinate, (5) phenylethanol, (6) diethyl malate and (7) succinic monoethyl ester. Figure 3. PCA score plots of cleaned and COW-aligned chromatograms: (A) un-normalized (B) normalized.

Colours denote wines of age 2 to 7 years

(yellow), 10 to 42 years (blue) and 48 to 60 years (pink). (C) Loading plot of PC1 with 9 of the peaks identified as (1) furfural, (2) cis dioxane, (3) benzaldehyde, (4) 5MF, (5) cis dioxolane, (6) trans dioxolane, (7) octanoic acid, (8) unknown and (9) HMF. Figure 4. Figure 4. PLS Loading Plot with Sotolon as the Y vector for the first latent variable. Figure 5. Putative Kinetic Network.

Nodes are coloured in shades of red

based on the fold change from 2 to 60 years. Node sizes are scaled by the number of other nodes (peaks) that are correlated to them above a Pearson threshold of 0.8. Edge thickness is scale by the degree of correlation between its two nodes. (Dioxanes in the network are labelled as follows: cisdioxane: Diox1; cisdioxolane : Diox2-; transdioxolane: Diox3 and transdioxane: Diox4). Figure 6. Subnetworks correlating to A) Age, B) Sotolon, C) HMF and D) Acetaldehyde. Nodes (compounds) with strong Pearson correlations to these target vectors are colored with aqua.

ACS Paragon Plus Environment

28

Journal of Agricultural and Food Chemistry

Page 30 of 36

Figure 1. Proposed workflow for univariate (chromatographic) signal processing.

ACS Paragon Plus Environment

29

Page 31 of 36

Journal of Agricultural and Food Chemistry

Figure 2. Raw chromatogram overlay of all samples (n=31) and Loading plot (PC1) representing the average chromatogram GC-FID chromatogram. (1) ethyl lactate, (2) acetic acid, (3) 2,3-butanediol, (4) diethyl succinate, (5) phenylethanol, (6) diethyl malate and (7) succinic monoethyl ester.

ACS Paragon Plus Environment

30

Journal of Agricultural and Food Chemistry

Page 32 of 36

Figure 3. PCA score plots of cleaned and COW-aligned chromatograms: (A) un-normalized (B) normalized.

Colours denote wines of age 2 to 7 years

(yellow), 10 to 42 years (blue) and 48 to 60 years (pink). (C) Loading plot of PC1 with 9 of the peaks identified as (1) furfural, (2) cis dioxane, (3) benzaldehyde, (4) 5MF, (5) cis dioxolane, (6) trans dioxolane, (7) octanoic acid, (8) unknown and (9) HMF.

ACS Paragon Plus Environment

31

Page 33 of 36

Journal of Agricultural and Food Chemistry

Figure 4. PLS b coefficients for Sotolon as the Y vector with 7 latent variables.

ACS Paragon Plus Environment

32

Journal of Agricultural and Food Chemistry

Page 34 of 36

Figure 5. Putative Kinetic Network. Nodes are coloured in shades of red based on the fold change from 2 to 60 years. Node sizes are scaled by the number of other nodes (peaks) that are correlated to them above a Pearson threshold of 0.8. Edge thickness is scale by the degree of correlation between its two nodes. (Dioxanes in the network are labelled as follows: cisdioxane: Diox 1cisdioxane;, cisdioxolane : Diox 2-cisdioxolane;, transdioxolane: Diox 3transdioxolane and, transdioxane: Diox 4-transdioxane).

ACS Paragon Plus Environment

33

Page 35 of 36

Journal of Agricultural and Food Chemistry

Figure 6. Subnetworks correlating to A) Age, B) Sotolon, C) HMF and D) Acetaldehyde. Nodes (compounds) with strong Pearson correlations to these target vectors are colored with aqua.

ACS Paragon Plus Environment

34

Journal of Agricultural and Food Chemistry

Page 36 of 36

TOC Graphic

ACS Paragon Plus Environment

35