Subscriber access provided by RYERSON UNIV
Article
Untangling the chemistry of Port wine aging with the use of GC-FID, multivariate statistics and network reconstruction Dan Jacobson, Ana Rita Monforte, and Antonio Cesar Silva Ferreira J. Agric. Food Chem., Just Accepted Manuscript • DOI: 10.1021/jf3046544 • Publication Date (Web): 18 Feb 2013 Downloaded from http://pubs.acs.org on February 21, 2013
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
Journal of Agricultural and Food Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 36
Journal of Agricultural and Food Chemistry
ACS Paragon Plus Environment
Journal of Agricultural and Food Chemistry
Page 2 of 36
1
Title: Untangling the chemistry of Port wine aging with the use of GC-FID,
2
multivariate statistics and network reconstruction
3
4
Dan Jacobsona, Ana Rita Monforteb, António César Silva Ferreiraa,b
5
a
6
South Africa
IWBT – DVO University of Stellenbosch, Private Bag XI, Matieland 7602,
7
8
b
9
Dr.António Bernardino de Almeida, 4200-072 Porto, Portugal.
Escola Superior de Biotecnologia, Universidade Católica Portuguesa, Rua
10
11
12
13
Corresponding author: António Silva Ferreira
14
Phone: +351 22 558 0001
15
Fax: +351 22 509 0351
16
E-mail address:
[email protected] 17
Running title: Untangling the chemistry of Port wine aging with the use of
18
GC-FID, multivariate statistics and network reconstruction
19
20
ACS Paragon Plus Environment
1
Page 3 of 36
Journal of Agricultural and Food Chemistry
21
Abstract
22
Chromatography separates the different components of complex mixtures
23
and generates a fingerprint representing the chemical composition of the
24
sample. The resulting data structure will depend on the characteristics of the
25
detector used, univariate for devices such as a flame ionization detector (FID)
26
or multivariate for mass spectroscopy (MS). This study addresses the potential
27
use of a univariate signal for a non-targeted approach in order to (i) classify
28
samples according to a given process or perturbation, (ii) evaluate the feasibility
29
of developing a screening procedure to select candidates related to the process
30
and (iii) provide insight into the chemical mechanisms which are affected by the
31
perturbation. In order to achieve this it was necessary to use and develop
32
methods for data pre-processing and visualization tools to assist an analytical
33
chemist to view and interpret complex multidimensional data sets.
34
Dichloromethane Port wine extracts were collected using GC-FID; the
35
chromatograms were then aligned with correlation optimized warping (COW)
36
and subsequently analyzed with multivariate statistics (MVA) by principal
37
component analysis
38
Furthermore, wavelets were used for peak calling and alignment refinement and
39
the resulting matrix used to perform kinetic network reconstruction via
40
correlation networks and maximum spanning trees. Network-target correlation
41
projections were used to screen for potential chromatographic regions/peaks
42
related
43
chromatograms and target molecules showed a high X to Y correlation of 0.91,
44
092 and 0.89; with 5-hydroxymethylfurfural (HMF) (Maillard), acetaldehyde
45
(oxidation) and 4,5-dimethyl-(5H)-3-hydroxy-2-furanone, respectively. The
to
aging
(PCA) and partial least squares regression (PLS-R).
mechanisms.
Results
from
PLS
ACS Paragon Plus Environment
between
aligned
2
Journal of Agricultural and Food Chemistry
Page 4 of 36
46
context of the correlation (and therefore likely kinetic) relationships amongst
47
compounds detected by GC-FID and the relationships between target
48
compounds within different regions of the network can be clearly seen.
49
Keywords: Univariate signal, Aging, Mechanisms, Preprocessing, GC-FID,
50
PCA, PLS, Network Theory, Kinetic Network Reconstruction
51
ACS Paragon Plus Environment
3
Page 5 of 36
Journal of Agricultural and Food Chemistry
Introduction
52
53
Port wine is a fortified wine produced in the Douro region of Portugal. After
54
the vinification process wines are exclusively barrel aged (Tawnys) or matured
55
for two years in a cask and then bottled (Vintage).
56
The aromatic profile of port wine changes during aging as the result of
57
several underlying mechanisms. Therefore, if one wants to understand or
58
modulate the sensory attributes of Port it is important to understand these
59
mechanisms and the inter-connections amongst them. Several of the
60
mechanisms are to a large extent already described as Maillard1-7 or oxidation8-
61
12
, nevertheless the overlaps between these two mechanisms is not well known.
62
In Port wines sotolon was recognized as a key molecule in the “perceived
63
age” of barrel storage Port wine and consequently in the aroma quality of the
64
final product. Its concentration can range from a few dozen µg/L in a young
65
wine to 1 mg/L in wines older than 50 years. The odor threshold has been
66
estimated to be 19 µg/L19,20.
67
The Maillard reaction has been suggested by several authors to be
68
responsible for the formation of sotolon as a product of a reaction involving
69
hexoses and pentoses in the presence of cysteine4 and from the aldol
70
condensation of butane-2,3-dione and hydroxyacetaldehyde21. On the other
71
hand, several papers have linked sotolon formation with oxidation19, 22-25. Both
72
oxygen and temperature influence sotolon concentration, which suggests that
73
its origin involves a connection between oxidation and Maillard mechanisms26.
74
Therefore, wine aging is a complex system, which requires more information
75
to be analysed in order to better understand the mechanisms at play. Given
76
this, techniques that are able to capture information about a broader range of
ACS Paragon Plus Environment
4
Journal of Agricultural and Food Chemistry
Page 6 of 36
77
compounds participating in the aging process are necessary in order to achieve
78
a better understanding thereof.
79
Metabolomics is defined as the study of “as many-small-metabolites-as-
80
possible” in a system27. In this paper we attempt to describe an example of
81
chemiomics which we define to be the study of the relationships between as
82
many chemical compounds as possible in a complex chemical (non-enzymatic)
83
system. In order to accomplish this by chemical profiling two strategies can be
84
employed: (i) “targeted analysis”, using a priori knowledge of which compounds
85
to analyse, which requires their identification and manual quantification or (ii)
86
“untargeted analysis” where one tries to detect as many compounds as possible
87
in order to acquire sample fingerprints which will be submitted to multivariate
88
analyses and network analysis for further contextualisation. The identification
89
and quantitation is then performed on the variables that are found to be
90
associated with the principal components/correlation vectors as determined by
91
multivariate and network analysis.
92
Spectroscopic detectors, such as those based on UV-Vis (ultraviolet-visible),
93
FTIR (Fourier transform infrared) or NMR (nuclear magnetic resonance)
94
spectroscopy, are largely employed to obtain chemical fingerprints which can be
95
used for sample classification as well for chemical quantitation28-37. The
96
extremely convoluted resulting spectra can be further processed with MVA
97
techniques that compensate, to a certain extent, for the absence of structural
98
information in complex chemical mixtures.
99
In spite of the versatility of these detectors the absence of structural
100
information, due to the extremely convoluted signal, constitutes a major
ACS Paragon Plus Environment
5
Page 7 of 36
Journal of Agricultural and Food Chemistry
101
drawback in obtaining molecular identifications if there is a need to study
102
complex systems that require kinetic contextualization.
103
Chromatography has long been used for the separation of molecules
104
enabling both the quantitative and qualitative analysis of constituents in
105
complex mixtures. The separation of the different components of a complex
106
mixture generates a fingerprint representing the chemical composition of the
107
sample.
108
experimental conditions can then be used to explain the changes caused by a
109
perturbation38. Due to the separation performed by the column it is possible to
110
identify structures based on the elution time in a given chromatographic profile.
Chromatographic fingerprints taken from samples under different
111
Chromatographic data has a huge number of variables and PCA and PLS
112
are MVA visualization techniques that allow for the interpretation of
113
multidimensional data sets. When multivariate analysis involves large datasets,
114
variable selection processes play an important role because they eliminate the
115
less significant or non-informative variables. The overall aim of any variable
116
selection technique is to capture variables from the original dataset that are
117
most specifically related to the problem of interest and to exclude those
118
variables that are affected by other sources of variation.
119
PCA is a non-supervised technique that decomposes the original variables of
120
a data set into two matrices: the score and the loading matrices. The scores
121
matrix contains information about the samples, which are described in terms of
122
their projection onto the principal components. The loading matrix contains
123
information about the variables which are also described in terms of their
124
projection onto the principal components. The loadings can also be interpreted
125
as the contribution of the variables for the observed scores distribution.
ACS Paragon Plus Environment
6
Journal of Agricultural and Food Chemistry
Page 8 of 36
126
Consequently, the use of GC fingerprints with MVA should make it possible to
127
extract considerable amounts of information from complex mixtures. The
128
tandem of GC-MVA is, in our perspective, a middle ground between a rich
129
detector, which provides structural information (NMR), and detectors like FTIR
130
and UV-Vis.
131
Therefore the aim of this study is to validate the feasibility of using univariate
132
chromatographic data, in particular gas chromatography with a flame ionization
133
detector (GC-FID), as a screening procedure to classify complex chemical
134
mixtures, such as wine samples, to identify which compounds are responsible
135
for differences and to perform network reconstructions that may indicate
136
underlying kinetic relationships and mechanisms.
137
Materials and Methods
138
Reagents. The chemicals, 3-octanol (97%) 3-hydroxy-4,5-dimethyl-2(5H)-
139
furanone
140
(≥99.5%), acetaldehyde (≥99.5%), ethyl lactate (98%), 5-(ethoxymethyl)furfural
141
(≥99.5%), acetic acid (≥99.5%), 2,3-butanediol (98%), diethyl succinate
142
(≥99.5%), 2-phenylethanol (≥99.5%), diethyl malate (>97%), succinic monoethyl
143
ester (≥99.5%), benzaldehyde (≥99.5%), octanoic acid (≥99.5%), hexanoic acid
144
(≥99.5%) aspartame (>99%), glutamine
145
(>99%), glycine (>99%), arginine (>99%), gamma-aminobutyric acid (>99%)
146
alanine (>99%), tyrosine (>99%), valine (>99%), phenylalanine (>99%), leucine
147
(>99%), ornitin (>99%), lysine (>99%), homoserine (>98%), norvaline (>98%),
148
homocysteine (>99%), 2-sulfanylethanol (98%), tetraphenylborate (>99.5%),
149
iodoacetic acid (>99%), o-phtaldialdehyde (>99%), and n-alkanes (C11-C22)
150
were obtained from Sigma-Aldrich, USA. The cis-5-hydroxy-2-methyl-1,3-
(≥99.5%),
5-methylfurfural
(≥99.5%),
5-hydroxymethylfurfural
(>99%), cysteine (>99%), serine
ACS Paragon Plus Environment
7
Page 9 of 36
Journal of Agricultural and Food Chemistry
151
dioxane
(cis-dioxane),
152
dioxolane),
153
trans-5-hydroxy-2-methyl-1,3-dioxane
154
according to Maillard39.
155
Dicloromethane (HPLC grade) was purchased from LabScan, Sowinskiego,
156
Gliwice. Anhydrous sodium sulfate and methanol (HPLC grade) were obtained
157
from Merck, Darmstadt, Germany.
cis-4-hydroxymethyl-2-methyl-1,3-dioxolane
trans-4-hydroxymethyl-2-methyl-1,3-dioxolane (trans-dioxane)
(cis-
(trans-dioxolane), were
synthetized
158 159
Port Wine samples. The 37 samples used in this study were between 2 and 60
160
years of age: one sample for 2, 14, 19, 23, 35, 40, 42, 48, 54, 57 and 60 years
161
of age, two samples for 7 and 20 years of age, three samples for 4 and 5 years
162
of age and ten samples of 10 years of age. All wines were matured in oak
163
barrels. These samples were supplied by the “Instituto do Vinho do Porto e do
164
Douro”. The wines were made following standard traditional winemaking
165
procedures for Port wine and have been certified.
166
Analytical procedure.
167
Volatiles Extraction. A liquid-liquid extraction was performed to extract the
168
volatile fraction from each sample. The procedure used was as follows: 5 g of
169
anhydrous sodium sulphate and 50 µL of internal standard (3-octanol) were
170
added to 50 mL of sample and were extracted twice with 5 mL of
171
dicloromethane using a magnetic stir bar for 5 minutes per extraction and 2 mL
172
of the resulting organic phase were concentrated under a nitrogen stream 4
173
times. The extract was then analysed by GC (Agilent 5980, USA) with FID
174
detection. Two microliters of the extract were injected. Chromatographic
175
conditions were the following; column BP-21 (50 m x 0,25 mm x 0,25 µm) fused
ACS Paragon Plus Environment
8
Journal of Agricultural and Food Chemistry
Page 10 of 36
176
silica (SGE, Portugal); hydrogen (5.0, Air-liquide, Portugal); 1.2 mL/min flow
177
rate; injector temperature, 220ºC; oven temperature, 40 ºC for 1 min
178
programmed at a rate of 2ºC/min to 220 ºC, maintained during 30 min; splitless
179
time, 0.5 min; split flow, 30 mL/min.
180
In order to facilitate identification, the Kovat’s index for each peak was 40
181
calculated as described by Van den Dool and Kratz
. This determination was
182
performed on polar phase columns, BP21 (50 m x 0.25 mm x 0.25 µm).
183 184
Amino Acids Analysis. Twenty one amino acids were analysed in the Port
185
wine samples: aspartic acid, glutamic acid, cysteine, asparagine, histidine,
186
serine, glycine, arginine, threonine, alanine, g-aminobutyric acid, tyrosine,
187
ethanolamine, valine, methionine, tryptophan, phenylalanine, isoleucine,
188
leucine, ornithine and lysine. The methodology used was that described by
189
Pipris-Nicolau et at41.
190 191
Acetaldehyde, furanic compounds and sotolon analysis. These analyses
192
were done as described by Silva Ferreira et al19.
193 194
Data pre-processing. The ASCII file of chromatographic data obtained from
195
each sample was extracted and a matrix created containing all of the
196
chromatograms. The intensities were normalized by dividing each value by the
197
intensity of the internal standard (3-octanol). The raw dataset (GC-FID) was
198
then imported into The Unscrambler®X 10.1 (Camo, Sweden), where the first
199
stage of the alignment of chromatograms was performed using Correlation
200
Optimized Warping (GC-FID-COW).
This algorithm aligns chromatograms by
ACS Paragon Plus Environment
9
Page 11 of 36
Journal of Agricultural and Food Chemistry
201
means of sectional linear stretching and compression, which shifts the peaks of
202
one chromatogram to correlate with those of the other chromatograms in the
203
dataset42. The saturated peaks were then removed and the baseline corrected
204
(GC-FID-COW-saturated removed and baseline correction). The resulting
205
matrix (GC-FID-X) was then used for multivariate data analysis as described in
206
the statistical analysis section.
207 208
Statistical analysis. The data was analysed with PCA and PLS-R using either
209
Qlucore (Lund, Sweden) or SIMCA-P+ 12.0.1 (Umetrics, Norway). PCA shows
210
similarities between samples projected on a plane and makes it possible to
211
determine which variables determine these similarities and in what way. PLS is
212
used to extract factors related to one or more response values. PLS validation
213
was performed by cross-validation method.
214 215
Kinetic Network Reconstruction. In order to attempt to reconstruct the
216
underlying kinetic network, the fingerprint needed to be further compressed to a
217
single value for each putative molecule detected by GC-FID. Thus, each
218
chromatographic peak needed to be replaced with a single value for the
219
intensity and retention-time, at the apex of each peak and therefore a more
220
refined alignment procedure was required. This was achieved as follows: An
221
“average chromatogram” was created by taking the mean of the values at each
222
elution point in the GC-FID-X matrix. The average chromatogram and the
223
sample chromatograms were smoothed with the Savitzky-Golay method
224
(settings: left=15, right=15, polynomial degree=0)
225
Unscrambler
43
as implemented in The
®
X 10.1 (Camo, Sweden). The wavelet method of Du et al.44,
ACS Paragon Plus Environment
10
Journal of Agricultural and Food Chemistry
Page 12 of 36
226
which was originally developed for peak calling in peptide mass-spec data, was
227
adapted for finding peak centres in chromatographic data and a Mexican hat
228
wavelet used to determine the location of all of the peaks in the average and
229
sample chromatograms. A custom built Perl program was integrated with the
230
R-based wavelet method to achieve this. The distances between the locations
231
of all of the peaks in each sample chromatogram and the locations of the peaks
232
in the average chromatogram were calculated. It was observed that there was
233
a correlation between peak height and the amount of peak centre shift that
234
occurred across chromatograms and we thus devised a two-step process for
235
aligning sample peaks to those of the average chromatogram. If the (internal-
236
standard-normalized) height of the average peak was greater than 2 and the
237
distance to the nearest sample peak was less than 0.3 minutes then the
238
intensity value of the sample peak was assigned as the sample value at the
239
average peak location. However, if the (internal-standard-normalized) height of
240
the average peak was less than 2 and the distance to the nearest sample peak
241
was less than 0.15 minutes then the intensity value of the sample peak was
242
assigned as the sample value at the average peak retention time.
243
algorithm was implemented in Perl.
This
244
As a result of this process a new matrix was created that contained the
245
retention time of all peaks in the average chromatogram and the intensity
246
values of all of the peak centres from each sample aligned to these average
247
retention times. Thus a vector was created for each peak (presumably
248
representing a compound) across all samples. An all-against-all comparison
249
was done calculating the Pearson correlation between each and every peak
250
vector. As such, one is able to track the increase or decrease of compounds
ACS Paragon Plus Environment
11
Page 13 of 36
Journal of Agricultural and Food Chemistry
251
(peaks) during the aging process and determine the correlative relationships
252
amongst them.
253
represented the remaining relationships as a mathematical graph in order to
254
form a correlation network with the nodes representing peaks and the edges
255
weighted with the Pearson correlations between the peak vectors. In order to
256
reconstruct the most likely kinetic network underlying the set of chemical
257
reactions involved in the aging process, a maximum spanning tree was created
258
by transforming the edge weights into inverse correlations (by taking the
259
difference between the number 1 and the absolute correlation values) and the
260
subsequent use of a minimum spanning tree (mst) algorithm45 on the this
261
inverse correlation network. A minimum spanning tree represents the shortest
262
possible path through a graph and, as such, selects for the smallest inverse
263
correlation (i.e highest correlation) pairs between all nodes in the network. The
264
resulting networks were visualized in Cytoscape 46.
We applied a Pearson correlation threshold of 0.8 and
265 266
Results / Discussion
267
Principal Component Analysis. Our initial goal for the use of PCA was to
268
examine the intrinsic variation in the data set prior to alignment in order to
269
determine if the volatile fraction of the samples followed a trend related to age.
270
However, when using the GC-FID matrix described above some samples did
271
not follow the latent age variable described by PC1, namely the 4 and 60 year
272
old samples (score plot not shown). The analyses global workflow is described
273
in Figure 1.
274
The loading plot in Figure 2 shows higher levels of acetic acid, 2,3-
275
butanediol, diethyl succinate, diethyl malate, phenylethanol and succinic mono
ACS Paragon Plus Environment
12
Journal of Agricultural and Food Chemistry
Page 14 of 36
276
ethyl ester present in the older samples. The esterification process appears to
277
be the most prevalent reaction amongst the compounds apparent in the loading
278
plot. The organic acids naturally present in grape must, such as malic acid and
279
those present as the result of fermentation, such as lactic, succinic and acetic
280
acids all react with ethanol to yield the esters seen in the loading plot47.
281
However, these molecules are out of the detector’s linear response range, so
282
they needed to be eliminated.
283
aligned because an unavoidable characteristic of all chromatographic data is
284
that the retention times for the peaks in the chromatograms shift slightly from
285
one analysis to another. To address this problem correlation optimized warping
286
(COW) was used to align all of the chromatograms.
Furthermore, the chromatograms must be
287
A new fingerprint was created (GC-FID-COW), by the removal of saturated
288
peaks and baseline correction to yield the GC-FID-X matrix, which was
289
subsequently analysed by PCA. The new score plot shows the same latent age
290
variable described by PC1, but the explained variance of the first two
291
components is 74% (Figure 3) and the samples that did not previously follow
292
the age vector now do so after alignment.
293
The score-plots in Figure 3A and 3B show a clear trend related to wine age,
294
suggesting that the chemical mechanisms are correlated with time, across the
295
first principal component with the first two components explaining 74% of the
296
variance. It appears that the latent age vector remains intact whether the data
297
is mathematically normalised or not.
298
It is important to note that the data for both the PCA and PLS analyses was
299
not mathematically centered or normalized as is commonly done to give all
300
variables equal impact on the model. Centering is usually used as a matter of
ACS Paragon Plus Environment
13
Page 15 of 36
Journal of Agricultural and Food Chemistry
301
convenience for display and mathematically has no impact on the multivariate
302
model. When we view the loadings we chose not to center the data in order to
303
not have negative scores along PC1 and thus to have no negative peaks on the
304
loading plot in order to keep the loading plot looking as much like a normal
305
chromatogram as possible. We are aware that our choice not to normalize
306
means that peaks with higher intensities will have a larger impact on the model
307
and be more apparent in the loading plots shown in Figures 2 and 3. However,
308
this allows the patterns visible in the loading plots to be recognizable to an
309
analytical chemist and therefore is easily read and interpreted as a normal
310
chromatogram. Mathematical normalization unfortunately makes the standard
311
chromatographic patterns unrecognizable as it rescales every peak to the same
312
amount of variance.
313
normalization does not change the fact that PC1 comprises the age vector with
314
the biggest affect of normalization changing the sample distribution along PC2,
315
which is likely to represent vintage and vinification technology effects. This
316
suggests that the aging of wine largely overwhelms the differences between
317
wines that are present due to the season they were made in or variations
318
amongst the approaches used to make them (different yeast strains,
319
temperatures, crushing mechanisms, fermentation tanks, barrels used for
320
maturation, etc.). The primary purpose of PCA and PLS in our pipeline is as a
321
graphical user interface for analytical chemists to use as a screening step for
322
univariate data sets. As such, we strove to present the analytical chemist with a
323
multivariate interpretative environment with which they would be as familiar as
324
possible, namely chromatographic fingerprints with which they have large
325
experience.
In addition, as can be seen from Figure 3A and 3B,
Thus, we feel that, for this part of the analysis, the visual
ACS Paragon Plus Environment
14
Journal of Agricultural and Food Chemistry
Page 16 of 36
326
representation of the chromatographic loading plots outweighs the assignment
327
of equal weights to every variable in the model. Furthermore, we address this
328
variable normalization issue with the use of network reconstruction via Pearson
329
correlation networks and maximum spanning trees. In the network analysis,
330
every peak is analyzed and has an equal opportunity to form a part of the
331
network and Pearson correlation includes vector normalization.
332
During aging there are likely to be several different mechanisms involved,
333
including oxidation and Maillard reactions. In PC1 the samples correlate with
334
the age of the wine, which points out that the overall kinetic system overrides
335
any one specific mechanism.
336
mechanisms at play are more relevant to sample classification than the
337
contributions of any individual mechanism.
As such, the connections between the
338
After alignment, compounds which appear to correlate with port wine aging
339
as found in the loading plots were: cis-dioxane, cis-dioxolane, trans-dioxolane
340
trans-dioxane, octanoic acid and HMF as shown in Figure 3C.
341
The cis and trans dioxane and dioxolane are formed by the condensation
342
reaction between glycerol and acetaldehyde. These molecules were identified in
343
Port wine by Silva Ferreira et al48 who noted that they increased with age, and,
344
as such, could be used as age markers for port wine kept under oxidative
345
conditions. Furanic compounds, HMF, 5MF and furfural are thought to be
346
products from the Maillard reaction, formed by the fragmentation or cyclization
347
of 3-deoxyosone, a highly reactive intermediary of the reaction49. Barrel oak can
348
also be a source of HMF and furfural50.
349
ACS Paragon Plus Environment
15
Page 17 of 36
Journal of Agricultural and Food Chemistry
350
Partial Least Squares Analysis. Partial Least Squares (PLS) analysis was
351
used on the GC-FID-X matrix in an effort to associate specific peaks/fingerprint
352
regions with mechanisms known to be involved in aging. It is worth noting that
353
PLS was not used in its traditional role as a method with which to build
354
calibrated, predictive models (that would therefore be built with training sets and
355
validated with independent test sets). Rather, the goal of our use of PLS-1 was
356
simply as a method with which to perform principal-component-based
357
regressions in an effort to identify sets of peaks that were associated with
358
known mechanistic markers or potential precursors for volatile compounds.
359
Molecules that are thought to be associated with different mechanisms were
360
selected and quantified from each sample and used as markers to try and find
361
other compounds in GC-FID-X that may be related with the same
362
mechanism. Acetaldehyde and HMF were used as markers for oxidation and
363
the Maillard reaction respectively. Sotolon was also used in an effort to gather
364
more information about its origin. The concentration of each of the marker
365
molecules was determined for each sample and the resulting vectors used as a
366
second data block in PLS.
367
The resulting PLS coefficient plots show the variables which correlate with
368
each mechanism-marker. Some variables have a positive value, which means
369
that these have kinetic vectors which correlate with that of the mechanism-
370
marker, and some have negative values, which indicate that they have an
371
inverse correlation with the kinetic vector of the mechanism-marker.
372
For sotolon the correlation is 0.89 (over 7 components) for molecules such as
373
furfural, 5-MF, cis dioxane, cis-dioxolane, trans-dioxane, trans-dioxolane and
374
HMF. We also found some organic acids with negative correlations, which
ACS Paragon Plus Environment
16
Journal of Agricultural and Food Chemistry
Page 18 of 36
375
indicate that they were being consumed as sotolon was being formed (Figure
376
4). The model had a correlation of 0.91 for HMF (a Maillard reaction marker)
377
and 0.92 to acetaldehyde (an oxidation marker) for 7 latent variables. The PLS
378
loading plot for sotolon was very similar to those seen for HMF and
379
acetaldehyde which means that the mechanisms are correlated, and during
380
aging contribute in the same way to the dynamics of the overall process.
381
Some amino acids, namely, valine, alanine, arginine, glutamine and aspartate,
382
had relatively high (0.69-0.83) inverse correlations with a number of peaks in
383
the volatile profile which were themselves correlated to Maillard reaction
384
markers such as such as HMF and furfural. Thus it seems likely that these
385
amino acids are major Maillard aroma precursors.
386 387
Network Reconstruction. Figure 5 shows the maximum spanning tree derived
388
from the correlation network between all peaks.
389
center of a peak (Kovats index) and each edge represents the best correlation
390
between the peaks. Fold changes between 2 year old and 60 year old wines
391
were calculated for each peak and the nodes colored accordingly with shades
392
of red representing increasing concentration and blue representing decreasing
393
concentration. The thickness of each edge has been scaled to represent the
394
level of correlation (thicker lines mean higher correlation values). Furthermore,
395
the size of each node has been scaled to represent the number of correlations it
396
had with other peaks above a threshold of 0.8.
Each node represents the
397
Using pure standards as markers for Maillard and Oxidation together with the
398
Kovats index it is possible to explore and extract more information from the
399
proposed network. In fact the cyclic acetals of glycerol and ethanal (oxidation
ACS Paragon Plus Environment
17
Page 19 of 36
Journal of Agricultural and Food Chemistry
400
products) cluster together on the upper part of Figure 5.
401
ethoxymethylfurfural, an ester of a major Amadori product (HMF), links two
402
major branches of the network.
403
process with the continuous formation of substances absent in young wines,
404
which explains the aging character of wine. The volatile compounds that relate
405
to 5-ethoxymethylfurfural are co-expressed during aging, thus presenting similar
406
kinetics, and future work will focus on their identification using the respective
407
Kovats
408
representation captures some of the dynamics of the aging process based on
409
the underlying kinetics. In fact those compounds that are highly correlated to
410
one another (>0.9) are likely to have the same kinetic order and the network
411
thus enables one to screen molecules according to their kinetic parameters.
index
and
rich
In addition, 5-
As such, the network illustrates the aging
information
detectors
like
MS.
The
network
412
We propose that this network is an approximate representation of the
413
underlying chemical reaction network during aging. The higher level of
414
correlation (and therefore the nearer any two peaks are to each other in the
415
network) the higher the probability is that they participate in the same or
416
neighboring reactions. The correlation between compounds drops as you move
417
further away in a chemical reaction network, as the intervening kinetics of each
418
reaction will cause differences at each step. There are no doubt intermediate
419
compounds for some reactions that were not detected by FID. The network is
420
robust to this missing data as the intervening steps will simply be represented
421
by a lower correlation value of an edge between compounds that were
422
detected. This correlative approach of course is not proof of causation but
423
rather serves as a useful tool for hypothesis generation in order to prioritise the
424
identification of unkown compounds represented by the peaks.
ACS Paragon Plus Environment
18
Journal of Agricultural and Food Chemistry
Page 20 of 36
425
By using correlation values to targeted compounds (or other variables such
426
as age) we found that we could highlight the regions of the network that are
427
closely associated with them and therefore likely involved in their formation or
428
consumption. In order to explore regions of the network that may be related to
429
age and particular mechanisms, Pearson correlation values between the peaks
430
and each of the target vectors (sample age, sotolon, acetaldehyde, HMF,
431
glutamate and alanine) were loaded into cytoscape as node annotations.
432
Alanine and glutamate were selected as target vectors because they were the
433
amino acids best correlated with the GC-FID-X matrix. By sorting the nodes by
434
correlation values and selecting the nodes corresponding to a correlation value
435
with a target above some threshold, portions of the network which correlated
436
with each target vector could be visualized in aqua as shown in Figure 6.
437
Figure 6A shows the nodes with a 0.86 Pearson correlation to the age of the
438
wines. There is a clearly defined subnetwork that corresponds to age and
439
represents compounds involved in the aging process. It was clear in the PCA
440
diagrams that there are a group of compounds that correspond to aging which
441
are responsible for the first principal component. It is likely that the compounds
442
responsible for the second principle component are due to differences between
443
the vintages of the starting wines. We can see the same pattern in this network
444
where there are a number of compounds that do not correlate well with age and
445
are likely reflecting vintage differences amongst the wines.
446
Figures 6 B, C and D show the regions of the network (colored as aqua) that
447
correlate (Pearson threshold 0.86) with sotolon, HMF and acetaldehyde,
448
respectively.
449
subnetworks correlated with these three compounds and as such it is possible
It is clear that there is considerable overlap between the
ACS Paragon Plus Environment
19
Page 21 of 36
Journal of Agricultural and Food Chemistry
450
that there is a mixture of oxidation and maillard reactions at play in forming
451
these compounds.
452
threshold of 0.86 in Figure 6A clearly overlaps to a very large degree with the
453
subnetworks defined by the correlations to acetaldehyde, HMF and sotolon. The
454
arrow in Figure 6A shows the node that negatively correlates to both alanine
455
and glutamate (-0.8 Pearson threshold) and as such likely represents the entry
456
point to the volatile network. The fact that the anti-correlation is relatively low (-
457
0.8 to -0.83) probably indicates that there are one to several intermediates
458
between the amino acids and their products’ entry into the volatile network.
The subnetwork that correlates with age at a Pearson
459
We believe that we have demonstrated that the approach described here
460
using GC-FID univariate data, when used as sample fingerprints, can be used
461
to classify the age of Port wine and to predict potential molecules involved in
462
this process by the deconvolution of time and the kinetics of different aging
463
mechanisms. We began this analysis with the hope that multivariate analysis
464
and network reconstruction would be useful tools with which to study
465
mechanisms related to a perturbation. The PCA score plots allow for sample
466
classification and the visualization of larger peaks that correspond with aging.
467
The PLS loading plots provide the analytical chemist with a familiar set of
468
patterns, namely virtual chromatograms which point out the larger peaks that
469
appear to be associated with marker compounds for oxidation and Maillard
470
mechanisms.
471
relationships between all the compounds detected via GC-FID and their
472
changes in concentration over time.
473
considerably more information in an effort to understand the probable kinetic
474
contexts of the molecules represented by peaks in each chromatogram.
The network reconstruction is very useful in visualizing the
This view of the data should provide
ACS Paragon Plus Environment
20
Journal of Agricultural and Food Chemistry
Page 22 of 36
475
Furthermore, it is possible to identify regions of the network that appear to be
476
involved in the formation or consumption of target compounds. As such, the
477
approach described here should indeed be a very powerful tool for the further
478
study of mechanisms and kinetic networks in complex mixtures.
479
In conclusion, univariate chromatographic signals are less expensive
480
compared to NMR or MS and therefore constitute a valuable tool in a bio-
481
analytical pipeline.
482
approach required the development and use of new data processing methods
483
and graphical user interfaces in order to extract the maximum amount of
484
information from the data. The approach reported here enables us to: 1)
485
minimize the cost of analysis; 2) perform sample classification and
486
contextualization; 3) perform process monitoring of aging or other time series;
487
4) consider the prospect of building databases from the large amounts of
488
univariate data already available; 5) screen for correlations with known
489
mechanism markers; 6) select biomarkers for identification and further study
490
and 7) explore the putative kinetic network for a greater understanding of the
491
process being studied.
However, the use this type of data in an untargeted
492 493
Acknowledgments
494
The authors would like to thank Piet Jones, Guy Emerton and Debbie
495
Weighill for useful discussions about the networks presented in this paper.
496
This research was funded by the project “Wine Metrics: Revealing the Volatile
497
Molecular Feature Responsible for the Wine Like Aroma a Critical Task Toward
498
the Wine Quality Definition.” (PTDC/AGR-ALI/121062/2010), partially supported
499
by ESB/UCP plurianual funds through the POS-Conhecimento Program that
ACS Paragon Plus Environment
21
Page 23 of 36
Journal of Agricultural and Food Chemistry
500
includes FEDER funds through the program COMPETE (Programa Operacional
501
Factores de Competitividade) by Portuguese national funds through FCT
502
(Fundação para a Ciência e a Tecnologia), Winetech, the Technology and
503
Human Resources Programme and the South African National Science
504
Foundation.
ACS Paragon Plus Environment
22
Journal of Agricultural and Food Chemistry
Page 24 of 36
505
506
References
507
(1) Maillard, L. C. Action des acides amines sur les sucres formation des
508
melanoidines par voie methodique. Council of Royal Academy Science Series
509
2. 1912 154, 66-68.
510
(2) Hodge, J. E. Dehydrated foods, chemistry of browning reactions in model
511
systems. J. Agric. Food Chem. 1953, 1, 928-943.
512
(3) Hoffman, T.; & Schieberle, P. Evaluation of the key odorants in a thermally
513
treated solution of ribose and cysteine by aroma extract dilution techniques. J.
514
Agric. Food Chem. 1995, 43, 2187-2194.
515
(4) Hoffman, T.; Schieberle, P. Identification of potent aroma compounds in
516
thermally treated mixtures of glucose/cysteine and rhamnose/cysteine using
517
aroma extract dilution. J. Agric. Food Chem. 1997, 45, 898-906.
518
(5) Pripis-Nicolau, L.; Revel, G.; Bertrand, A.; Maujean, A. Formation of Flavor
519
Components by the Reaction of Amino Acid and Carbonyl Compounds in Mild
520
Conditions. J. Agric. Food Chem. 2000, 48, 3761-3766.
521
(6) Marchand, S.; Revel, G. (2000). Approaches to wine aroma: release of
522
aroma compounds from reactions between cysteine and carbonyl compounds in
523
wine. J. Agric. Food Chem. 2000, 48, 4890-4895.
524
(7) Silva, H. O.; Machado, B. P.; Hogg, T.; Marques, J. C.; Câmara, J. S.;
525
Albuquerque, F.; Silva Ferreira, A. C. Impact of Forced-Aging Process on
526
Madeira Wine Flavor. J. Agric. Food Chem. 2008, 56, 11989-11996.
527
(8) Singleton, V. Oxygen with phenols and related reactions in musts, wines and
528
model systems:Observations and pratical implications. Am. J. Enol. Vitic. 1987 ,
529
38, 69-77.
530
(9) Danilewicz, J. C. Review of reaction mechanisms of oxygen and proposed
531
intermediates reduction products in wine: central role of iron and copper. Am. J.
532
Enol. Vitic. 2003, 54, 73-85.
533
(10) Waterhouse, A. L.; Laurie, V. F. Oxidation of wine phenolics: a critical
534
evaluation and hypotheses. Am. J. Enol. Vitic. 2006, 57, 306-313.
535
(11) du Toit, W.; Marais, J.; Pretorius, I.; du Toit, M. Oxygen in Must and Wine:
536
A review. S. Afr. J. Enol. Vitic. 2006, 27, 76-94.
ACS Paragon Plus Environment
23
Page 25 of 36
Journal of Agricultural and Food Chemistry
537
(12) Kilmartin, P. The oxidation of red and white wines and its impact on wine
538
aroma. Chemistry in New Zeland. 2009, 18-22.
539
(13) Sulser, H.; Depizzol, J.; Buchi, W.A probable flavoring principle in
540
vegetable-protein hydrolysates. J. Food Sci. 1967, 32, 611-615.
541
(14)
542
dimethyltetrahydrofuran-2,3.dione in a Flor sherry wine. Lebensm. Wiss.
543
Technol. 1976, 9, 366-368.
544
(15) Masuda, M.; Okawa, E.; Nishimura, K.; Yunome, H. Identification of 4,5-
545
dimethyl-3-hydroxy-2(5H)-furanone (Sotolon) and ethyl 9-hydroxynonanoate in
546
Botrytised wine and evaluation of the roles of compounds characteristics. Agric.
547
Biol. Chem. 1984, 48, 2707-2010.
548
(16) Silva Ferreira, A. C. Caractérisation du vieillissement du vin de Porto.
549
Approche chimique et statistique. Rôle aromatique du sotolon. Ph.D. Thesis,
550
Université Victor Segalen Bordeaux 2, 1998.
551
(17) Câmara, J. S.; Marques, J. C.; Alves, M. A.; Silva Ferreira, A. C. 3-
552
Hydroxy-4,5-dimethyl-2(5H)-furanone. J. Agric. Food Chem. 2004, 52, 6765-
553
6769.
554
(18) Lavigne, V.; Pons, A.; Darriet, P.; Dubourdieu, D. (2008). Changes in the
555
sotolon content od dry white wines during barrels and bottle aging. J. Agric.
556
Food Chem. 2008, 56, 2688-2693.
557
(19) Silva Ferreira, A. C.; Barbe, J.; Bertrand, A. 3-Hydroxy-4,5-dimethyl-2(5H)-
558
furanone: A key odorant of the typical aroma of oxidative aged Port wine. J.
559
Agric. Food Chem. 2003, 51, 4356-4363.
560
(20) Silva Ferreira, A.; Avila, I.; Guedes de Pinho, P. Sensorial Impact of
561
Sotolon as the "Perceived Age" of Aged Port Wine. In Natural Flavors and
562
Fragrances, First Edition; Frey, C., Rouseff, R., Eds; ACS Symposium Series;
563
A. C. Society: Washington, DC, 2005, 10, pp. 141-159.
564
(21) Hofman, T.; Schieberle, P. Identification of the key odorants in processed
565
ribose-cysteine Maillard mixtures by instrumental analysis and sensory studies.
566
Spec. Publ. - R. Soc. Chem. 1996, 197, 175-81.
567
(22) Pham, T. T.; Guichard, E.; Schlich, P.; Charpentier, C. Optimal conditions
568
for the formation of sotolon from alfa-ketobutyric acid in the French Vin Jaune.
569
J. Agric. Food Chem. 1995, 43, 2616-2619.
Dubois,
P.;
Rigaud,
J.;
Dekimpe,
J.
Identification
of
4,5-
570
ACS Paragon Plus Environment
24
Journal of Agricultural and Food Chemistry
Page 26 of 36
571
(23) Cutzach, I., Chatonnet, P., & Dubourdieu, D. Rôle du sotolon dans l'arôme
572
des vins doux naturels, influence des conditions d'élevage et de vieillissement.
573
J. Int. Sci. Vigne Vin 1998, 32, 223-233.
574
(24) Silva Ferreira, A. C., Hogg, T., & Guedes de Pinho, P. Identification of key
575
odorants related to the typical aroma of oxidation-spoiled white wines. J. Agric.
576
Food Chem. 2003, 51, 1377-1381.
577
(25) Escudero, A., Cacho, J., & Ferreira, V. Isolation and identification of
578
odorants generated in wine during its oxidation: a gas chromatography-
579
olfactometric study. Eur. Food Res. Technol. 2011, 211, 105-110.
580
(26) Martins, R. C.; Lopes, V. V.; Silva Ferreira, A. C. Port Wine Oxidation
581
Management: A ChemoInformatics Approach. 60th American Society for
582
Enology and Viticulture Meeting. 2006
583
(27) Cevallos-Cevallos, J.; Reyes-De-Corcuera, J.; Etxeberria, E.; Danyluk, M.;
584
Rodrick, G. Metabolomics analysis in food science: a review. Trends Food Sci.
585
Technol. 2009, 20, 557-566.
586
(28) Rohman, A.; Che Man, Y. B.; 2010. Fourier transform infrared (FTIR)
587
spectroscopy for analysis of extra virgin olive oil adulterated with palm oil. Food
588
Res. Int. 2010, 43, 886-892.
589
(29) Coimbra, M.; Alves, F. G.; Barros, A.; & Delgadillo, I. Fourier Transform
590
Infrared
591
Polysaccharide Extracts. J. Agric. Food Chem. 2002, 50, 3405-3411.
592
(30) Zhang, J.; Cui, M.; Yun, H.; Yu, H.; Guo, D.; Chemical fingerprint and
593
metabolic fingerprint analysis of Danshen injection by HPLC-UV and HPLC-MS
594
methods. J. Pharm. and Biomed. Anal. 2005, 36, 1029-1035.
595
(31) Shen, D.; Wu, Q.; Sciarappa, W.; Simon, J. Chromatographic fingerprints
596
and quantitative analysis of isoflavones in Tofu-type soybeans. Food Chem.
597
2012, 130, 1003-1009.
598
(32) Ding, Y.; Wu, E.; Liang, C.; Chen, J.; Tran, M.; Hong, C. Discrimination of
599
cinnamon barl and cinnamon twig samples sourced from various countries
600
using HPLC-based fingerprint. Food Chem. 2011, 127, 755-760.
601
(33). Gu, H. NMR and MS-based metabolomics: Development and applications.
602
Ph. D. Thesis Purdue University. 2008.
Spectroscopy
and
Chemometric
Analysis
ACS Paragon Plus Environment
of
White
Wine
25
Page 27 of 36
Journal of Agricultural and Food Chemistry
603
(34) Lee, J.-E.; Hwang, G.-S.; Van Den Berg, F.; Lee, C.-H.; Hong, Y. Evidence
604
of vintage effects on grape wines using H-NMR-based metabolomic study. Anal.
605
Chim. Acta. 2009, 19, 71-76.
606
(35) Cuadros-Inostroza, A.; Giavalisco, P.; Hummel, J.; Eackard, A.; Willmitzer,
607
L.; Peña-Cortés, H. Discrimination of wine attributes by metabolome analysis.
608
Anal. Chem. 2010, 82, 3573-3580.
609
(36) Consonni, R.; Cagliani, L.; Guantieri, V.; Simonati, B. (2011). Identification
610
of metabolic content of selected Amarone wine. Food Chem. 2011, 129, 693-
611
699.
612
(37) Son, H.; Hwang, G.; Ahn, H.; Park, W.; Lee, C.; Hong, Y. Characterization
613
of wines from grape varieties through multivariate statistical analysis of H NMR
614
spectroscopic data. Food Res. Int. 2009, 42, 1483-1491.
615
(38) Gong, F.; Wang B.; Chau, F.; Liang, Y. Data preprocessing for
616
chromatographic fingerprint of herbal medicine with chemometric approaches.
617
Anal. Lett. 2005, 38, 2475-2492.
618
(39) Maillard, B. Additions radicalaires de diols et de leurs dérivés: diesters et
619
acétals cycliques. Ph.D. Thesis (nº. 325); University Bordeaux 1; Bordeaux,
620
France, 1971.
621
(40). Van den Dool, H.; Kratz, P. D. A generalization of the retention index
622
system
623
chromatography. J. Chromatogr., A. 1963, 11, 463-471.
624
(41) Pipris-Nicolau, L.; Revel, G.; Marchand, S.; Beloqui, A. A.; Bertrand, A.
625
Automated HPLC method for the measurement of free amino acids including
626
cysteine in musts and wines; first applications. J. Sci. Food Agric. 2001, 81,
627
731-738.
including
linear
temperature
programmed
gas-liquid
partition
628
(42) Fan, G.; Liang, Y.-Z.; Ying-Sing, F.; Chau, F. Correction of retention time
629
for chromatographic fingerprints of herbal medicines. J. Chromatogr. A. 2004,
630
1029, 173-183.
631
(43) Savitzky, A.; Golay, M.J.E. Smoothing and Differentiation of Data by
632
Simplified Least Squares Procedures. Anal. Chem.1964, 36, 1627–1639.
633
(44) Du, P.; Kibbe, W. A.; Lin, S. M. Improved peak detection in mass spectrum
634
by incorporating continuous wavelet transform-based pattern matching.
635
Bioinformatics. 2006, 22, 2059-2065.
ACS Paragon Plus Environment
26
Journal of Agricultural and Food Chemistry
Page 28 of 36
636
(45) Dijkstra, E. W. A note on two problems in connection with graphs.
637
Numerische Mathematik 1959, 1, 269–271.
638
(46) Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N. S.; Wang, J. T.; Ramage,
639
D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: a software environment for
640
integrated models of biomolecular interaction networks. Genome Res. 2003, 13,
641
2498-2504.
642
(47) Ribéreau-Gayon, P.; Glories, Y.; Maujean, A.; Dubourdieu, D. (2006).
643
Handbook of Enology: The Chemistry of Wine Stabilization and Treatments.
644
John Wiley & Sons Ltd, Chichester, England, 2006; 2, pp. 51-64
645
(48) Silva Ferreira, A. C.; Barbe, J.; Bertrand, A. Heterocyclic acetals from
646
glycerol and acetaldehyde in Port wines: Evolution with aging. J. Agric. Food
647
Chem. 2000, 50, 2560-2564.
648
(49) Martins, S.; Jongen, W.; Boekel van, M. A review of Maillard reaction in
649
food and implications to kinetic modelling. Trends Food Sci. Technol. 2001, 11,
650
364-373.
651
(50) Moutounet, M.; Rabier, P.; Puech, J. L.; Verette, E.; Barillere, M. Analysis
652
by HPLC of extractable substances of oak wood- application to a chardonnay
653
wine. Sci. Aliments. 1989, 9, 35- 51.
654
ACS Paragon Plus Environment
27
Page 29 of 36
Journal of Agricultural and Food Chemistry
Figure captions Figure 1. Proposed workflow for univariate (chromatographic) signal processing Figure 2. Raw chromatogram overlay of all samples (n=31) and Loading plot (PC1) representing the average chromatogram GC-FID chromatogram. (1) ethyl lactate, (2) acetic acid, (3) 2,3-butanediol, (4) diethyl succinate, (5) phenylethanol, (6) diethyl malate and (7) succinic monoethyl ester. Figure 3. PCA score plots of cleaned and COW-aligned chromatograms: (A) un-normalized (B) normalized.
Colours denote wines of age 2 to 7 years
(yellow), 10 to 42 years (blue) and 48 to 60 years (pink). (C) Loading plot of PC1 with 9 of the peaks identified as (1) furfural, (2) cis dioxane, (3) benzaldehyde, (4) 5MF, (5) cis dioxolane, (6) trans dioxolane, (7) octanoic acid, (8) unknown and (9) HMF. Figure 4. Figure 4. PLS Loading Plot with Sotolon as the Y vector for the first latent variable. Figure 5. Putative Kinetic Network.
Nodes are coloured in shades of red
based on the fold change from 2 to 60 years. Node sizes are scaled by the number of other nodes (peaks) that are correlated to them above a Pearson threshold of 0.8. Edge thickness is scale by the degree of correlation between its two nodes. (Dioxanes in the network are labelled as follows: cisdioxane: Diox1; cisdioxolane : Diox2-; transdioxolane: Diox3 and transdioxane: Diox4). Figure 6. Subnetworks correlating to A) Age, B) Sotolon, C) HMF and D) Acetaldehyde. Nodes (compounds) with strong Pearson correlations to these target vectors are colored with aqua.
ACS Paragon Plus Environment
28
Journal of Agricultural and Food Chemistry
Page 30 of 36
Figure 1. Proposed workflow for univariate (chromatographic) signal processing.
ACS Paragon Plus Environment
29
Page 31 of 36
Journal of Agricultural and Food Chemistry
Figure 2. Raw chromatogram overlay of all samples (n=31) and Loading plot (PC1) representing the average chromatogram GC-FID chromatogram. (1) ethyl lactate, (2) acetic acid, (3) 2,3-butanediol, (4) diethyl succinate, (5) phenylethanol, (6) diethyl malate and (7) succinic monoethyl ester.
ACS Paragon Plus Environment
30
Journal of Agricultural and Food Chemistry
Page 32 of 36
Figure 3. PCA score plots of cleaned and COW-aligned chromatograms: (A) un-normalized (B) normalized.
Colours denote wines of age 2 to 7 years
(yellow), 10 to 42 years (blue) and 48 to 60 years (pink). (C) Loading plot of PC1 with 9 of the peaks identified as (1) furfural, (2) cis dioxane, (3) benzaldehyde, (4) 5MF, (5) cis dioxolane, (6) trans dioxolane, (7) octanoic acid, (8) unknown and (9) HMF.
ACS Paragon Plus Environment
31
Page 33 of 36
Journal of Agricultural and Food Chemistry
Figure 4. PLS b coefficients for Sotolon as the Y vector with 7 latent variables.
ACS Paragon Plus Environment
32
Journal of Agricultural and Food Chemistry
Page 34 of 36
Figure 5. Putative Kinetic Network. Nodes are coloured in shades of red based on the fold change from 2 to 60 years. Node sizes are scaled by the number of other nodes (peaks) that are correlated to them above a Pearson threshold of 0.8. Edge thickness is scale by the degree of correlation between its two nodes. (Dioxanes in the network are labelled as follows: cisdioxane: Diox 1cisdioxane;, cisdioxolane : Diox 2-cisdioxolane;, transdioxolane: Diox 3transdioxolane and, transdioxane: Diox 4-transdioxane).
ACS Paragon Plus Environment
33
Page 35 of 36
Journal of Agricultural and Food Chemistry
Figure 6. Subnetworks correlating to A) Age, B) Sotolon, C) HMF and D) Acetaldehyde. Nodes (compounds) with strong Pearson correlations to these target vectors are colored with aqua.
ACS Paragon Plus Environment
34
Journal of Agricultural and Food Chemistry
Page 36 of 36
TOC Graphic
ACS Paragon Plus Environment
35