Joint NMR and Solid-Phase Microextraction–Gas Chromatography


Article

A joint NMR and SPME-GC chemometric approach for very complex mixtures: Grape and zone identification in wines
Manuel Martin-Pastor, Esteban Guitian, and Ricardo Riguera
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.5b04505 • Publication Date (Web): 01 Jun 2016



Analytical Chemistry

Manuel Martin-Pastor a, Esteban Guitian b and Ricardo Riguera c*

a Unidade de Resonancia Magnética, RIAIDT, Edif. CACTUS, Universidad de Santiago, Campus Vida, Santiago de Compostela, 15782, SPAIN.

b Unidade de Espectrometría de Masas, RIAIDT, Edif. CACTUS, Universidad de Santiago, Campus Vida, Santiago de Compostela, 15782, SPAIN.

c Centro Singular de Investigacion en Química Biológica y Materiales Moleculares (CIQUS), Universidad de Santiago, Santiago de Compostela, 15782, SPAIN.

Author Information *Corresponding Author E-mail: [email protected]; Phone +34 881815728; FAX: +34-881815704

Notes The authors declare no competing financial interest.


Abstract. In very complex mixtures, classification by chemometric methods may be limited by the difficulty of extracting from the NMR or GC-MS experimental data the information needed for a reliable classification. The joint analysis of both data sets has shown its superiority in the biomedical field, but it has scarcely been used for foodstuffs and never for wine, in spite of the complexity of their spectra and classification. In this article we show that univariate and multivariate PCA-DA statistics applied to the combined 1H NMR and SPME-GC data of a collection of 270 wines from Galicia (NW Spain) allow a discrimination and classification not attainable from the separate data, distinguishing wines from autochthonous and non-autochthonous grapes and monovarietals from plurivarietals, and identifying, in part, the geographical subzone of origin of the Albariño wines. A general and automatable protocol based on the signal integration of selected Regions of Interest (ROIs) is proposed that allows the fast and reliable identification of the grape in Galician wines.

Keywords: PCA-DA; classification; 1H NMR; SPME-GC-MS; chemometrics; Galician wines.


Introduction

Metabolic profiling using chemometrics is a very useful and widely employed approach for the classification of mixtures such as biological fluids1 and for the authentication of foodstuffs2 and beverages, including wine.3 Most chemometric approaches are based on NMR or GC-MS experimental data because these two techniques not only provide information useful to define a fingerprint but also allow the identification of the metabolites associated with those signals. Naturally, the complexity of the sample, the properties of the components relevant for the classification, their relative importance and the similarity among the samples of the collection are critical for deciding on the most adequate spectroscopic technique. Thus, while GC-MS may be ideal for volatile and minor components, and NMR for very polar, non-volatile samples, the lower sensitivity of NMR may be a limitation for minor components. In fact, uncertainties in the assignment caused by peak overlap, and in the integration of the NMR signals, are very common.4-5 These limitations have been overcome in the biomedical field by carrying out the chemometric analysis on the combined NMR and SPME-GC-MS data6-10 instead of using the partial data separately. Surprisingly, this combined approach has scarcely been used with foodstuffs11 and has never been reported for the classification of wines (for a recent example of the combination of data from different analytical platforms for wine classification, see reference 12), which in general has been addressed using either NMR4,5,13-16 or GC-MS17-19 data.
Galicia, in NW Spain, is a small region known for the quality and variety of its fresh and fruity white mono- and plurivarietal wines, made basically from the autochthonous varieties Albariño, Godello and Treixadura, which give them a certain similarity in taste but a clear difference with respect to the most abundant Spanish white wines, made from Palomino, Verdejo or Viura grapes. A few studies on the authentication of monovarietal wines made from autochthonous Galician grapes have recently been carried out20-21 by GC/MS. The most recent21 was found to require the analysis of the complete families of volatile compounds in order to get a good classification.


Nevertheless, when we explored the use of SPME-GC data for the classification of a much larger collection (271 samples), including mono- and plurivarietals made from autochthonous and non-autochthonous grapes, we arrived at an incomplete classification. The same result was obtained when the 1H NMR data of intact wine samples were employed for the chemometric calculations. This failure is probably related to the very diverse properties and concentrations of wine components, which can hardly be determined accurately with a single technique, and it moved us to examine the potential of the combined 1H NMR (chemical shifts and integration) and SPME-GC (elution time and integration) data for the classification of those wines. Here, we present our results demonstrating that the application of multivariate statistics to the combined 1H NMR and SPME-GC experimental data allows a perfect chemometric discrimination and classification of the wines according to the type of grape or, in the case of the most appreciated Albariño wines, the distinction of those coming from different geographical subzones. The potential of this approach is further demonstrated with the presentation of a fast and automatable protocol for the classification of wines based on the 1H NMR and SPME-GC data of the unknown samples.

Experimental Section

Wine samples

A collection of 271 commercial white wines, formed by 260 wines from the Galician PDOs Rias Baixas, Monterrey, Ribeiro and Valdeorras and 11 from other regions, was used in this study. The wines come from two consecutive years (vintages 2011 and 2012) and were obtained by the Dirección Xeral de Innovación e Industrias Agrarias e Forestais directly from the producers, just before they were sent to market. According to their composition, 151 wines are monovarietals from the autochthonous Albariño, Godello and Treixadura grapes; 9 are monovarietals from the non-autochthonous, Galician-cultivated Palomino; 69 wines are plurivarietals from those four grapes; and 31 samples are plurivarietals prepared in the laboratory by mixing Albariño, Godello, Treixadura and Palomino monovarietals in known ratios. In addition, 11 monovarietals from the non-Galician Verdejo and Viura grapes were also included in the study. The distribution of the samples in groups according to grape variety and geographical origin is shown in the Results and discussion section. The wine samples were measured immediately after opening the bottle.

NMR analysis

The NMR spectra of the collection of wines were measured at 300 K in an 11.7 T Bruker Avance I spectrometer (proton frequency 500 MHz). The spectrometer was coupled to a sample changer with capacity for 60 samples, and the shimming and acquisition of spectra were fully automated, which provided high throughput and reproducibility. The quality of each NMR spectrum was checked according to the following three criteria: i) the internal reference peak of TSP has a symmetric shape and narrow linewidth (good quality of the auto-shimming); ii) the residual peaks of water and ethanol have small intensity (good quality of the water and ethanol signal suppression); and iii) all the visible peaks of the metabolites have fully positive phase (good quality of the bucket integration). The preparation of a wine sample for NMR measurement involved the transfer of 0.5 mL of the wine from a recently opened bottle to a standard 5 mm NMR tube, followed by the addition of 0.1 mL of a stock solution containing 1 M deuterated sodium acetate buffer (pH 4.5) with 0.5 mM of the sodium salt of 3-(trimethylsilyl)propionic-2,2,3,3-d4 acid (TSP) as internal chemical shift reference (δTSP = 0 ppm).22,23

One-dimensional (1D) 1H NMR spectra of the wine samples were acquired (total acquisition time: approximately 10 min) using a 1D NOESY presaturation pulse sequence modified for triple signal presaturation. The following acquisition conditions were used: 256 scans, 8 dummy scans, 16384 data points, 10 kHz spectral width, 1.63 s acquisition time and 150 ms mixing time. The resonances of water (~4.7 ppm) and ethanol (3.63 and 1.19 ppm) were selectively presaturated during the inter-scan relaxation delay (d1) of 2 s with a train of low-power, frequency-selective pulses that reduced their intensity. The presaturation was carried out with 10 ms Gaussian-shaped pulses applied at the positions of water and ethanol. The power of each shaped pulse was adjusted individually between 20 and 24 dB to achieve the best possible suppression of each peak. The parameters were optimized manually for the first sample to be measured and then used for the remaining ones.

NMR spectra of the wine samples were processed with the software MestreNova 8.1 (Mestrelab Research). Free induction decays (FIDs) were Fourier transformed with 0.3 Hz line broadening, manually phased and baseline corrected. The spectra were referenced to the TSP signal (δTSP = 0 ppm), ordered by wine type and represented in stacked mode for the subsequent chemometric analysis. Identification of wine components was carried out by comparison with literature NMR data.5,14,24-25 The results for the Albariño wine sample ALB-75, taken here as representative of the collection, are reported in Table S1 and Figure S1 of the Supporting Information.
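As a quick consistency check of the acquisition parameters quoted above, the following Python sketch relates the number of data points and the spectral width to the acquisition time, and converts the spectral width to ppm. It assumes that one data point is sampled every 1/(spectral width) seconds; the variable names are illustrative and are not taken from the spectrometer software.

```python
# Acquisition parameters quoted in the text.
n_points = 16384            # data points per FID
spectral_width_hz = 10_000  # 10 kHz spectral width
proton_freq_mhz = 500.0     # proton frequency of the 11.7 T spectrometer

# Assumption: one point is sampled every 1/spectral_width seconds.
acquisition_time_s = n_points / spectral_width_hz

# A frequency in Hz divided by the spectrometer frequency in MHz gives ppm.
spectral_width_ppm = spectral_width_hz / proton_freq_mhz

# Spacing between points of the spectrum before any zero filling.
hz_per_point = spectral_width_hz / n_points

print(f"acquisition time: {acquisition_time_s:.2f} s")
print(f"spectral width:   {spectral_width_ppm:.1f} ppm")
print(f"point spacing:    {hz_per_point:.2f} Hz/point")
```

Under this assumption the acquisition time comes out at about 1.64 s, in line with the 1.63 s reported, and the 10 kHz window corresponds to 20 ppm at 500 MHz.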

SPME-GC and MS analysis

SPME-GC chromatograms of the collection were measured in a Bruker 450-GC instrument coupled with a Combipal system, using a BR-5ms column (30 m × 0.25 mm i.d., 0.25 µm film thickness). Splitless injections were performed with the injector maintained at 250 ºC for 20 min. The temperature of the column oven was programmed as follows: 40 ºC (held for 2 min), increased to 225 ºC (0 min hold) at a rate of 5 ºC/min, and then to 280 ºC (held for 4 min) at a rate of 20 ºC/min. The helium carrier gas was maintained at a constant column flow of 1.0 mL/min. Detection was performed by a Bruker 320-MS mass spectrometer in positive electron impact (EI+) and full scan mode in the 50-350 m/z mass range. The Combipal system has capacity for 32 SPME samples, and the process was fully automated, providing high throughput and reproducibility. The preparation of the wine samples for SPME-GC-MS involved the following: i) 5 mL of wine were transferred to a 20 mL amber glass vial (headspace volume: 15 mL), and 2 g of salt were added to increase the concentration of the volatile components in the headspace vapour; ii) the vial was closed and headspace extraction was performed with a 100 µm PDMS fibre (Supelco) under continuous stirring for 40 min at 35 ºC; and iii) the compounds were desorbed by inserting the fibre into the gas chromatograph injector for 20 min at 250 ºC. The SPME-GC chromatogram of each wine sample was acquired in ca. 45 min. The chromatogram of each sample was stored as an ASCII text file (.csv format) and imported into the software MestreNova 8.1 (Mestrelab Research). GC chromatograms of the full collection of wine samples were ordered by wine type and represented in stacked form for the subsequent chemometric analysis. Identification of the components was carried out by comparison of the Kovats index and MS spectral fragments with literature information19,26-33 and databases (Wiley, Mainlib NIST, and Aromas). The results for an Albariño wine (sample ALB-75) are reported in Table S2 and in Figure S2 as a representative example.
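The duration of the oven programme described above can be checked with a few lines of Python. This is only an arithmetic sketch; the function name and the tuple layout are illustrative choices.

```python
# Oven programme from the text.
initial_hold_min = 2.0  # isothermal hold at 40 °C
# Each ramp: (start °C, end °C, rate in °C/min, hold at the end in min).
ramps = [
    (40.0, 225.0, 5.0, 0.0),
    (225.0, 280.0, 20.0, 4.0),
]

def run_time_min(initial_hold, segments):
    """Total programme time: initial hold plus every ramp and final hold."""
    total = initial_hold
    for start, end, rate, hold in segments:
        total += (end - start) / rate + hold
    return total

total_min = run_time_min(initial_hold_min, ramps)
print(f"oven programme: {total_min:.2f} min")
```

The result, 45.75 min, agrees with the acquisition time of ca. 45 min per chromatogram stated in the text.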

Chemometric data

The signal area in a number of Regions of Interest (ROIs) of the NMR spectra and GC chromatograms, also referred to as “buckets”, was integrated using the SUM algorithm of the MestreNova 8.1 software. The ROIs were selected by visual inspection of the stacked NMR spectra and stacked GC chromatograms of the wines, according to the following criteria: i) a ROI can contain either one or several peaks that should be common in at least one type of wine, which prevents the selection of spurious peaks; ii) a ROI should contain full peaks, so that the precise integration of the ROI area, without border effects, is possible; iii) signals identified as solvents in the NMR spectrum of the wine (i.e. water or ethanol) and peaks in the GC generated by the column are excluded; and iv) small time misalignments observed for some peaks in the SPME-GC were corrected before ROI selection. The reproducibility of the bucket areas was checked for a random selection of wines of all classes. In this test, the measurement of the spectrum and the ROI analysis were repeated five times for each sample. For a given wine, the variability (percentage) of each single ROI area was calculated with respect to the mean ROI area obtained from the five measurements. Considering the complete set of ROIs and wines, a mean value of 12 ± 9% (on a per-wine basis) was obtained for the variability of the ROI areas. In total, 75 NMR-ROIs and 59 GC-ROIs were selected for each wine, and the signals in the ROIs were integrated using the SUM algorithm. For the statistical analysis, each individual NMR-ROI area was first normalized by the total area of the 75 ROIs selected. In the case of the SPME-GC, the ROI area values were used without further normalization. All these data were finally combined in a single ROI data table that represents the full collection of wines.

Univariate analysis, ANOVA tests and Principal Component Analysis-Linear Discriminant Analysis (PCA-DA) calculations were applied to the ROI data table according to a given classification criterion of the wines. PCA-DA is a two-step method that applies Linear Discriminant Analysis to the previously calculated PCA scores. It improves the performance of standard LDA in distinguishing between a number of classes (i.e. groups of wines) when the number of training samples of each class is limited. In each PCA-DA calculation reported, the Wilks' lambda test (λ) and the p-value based on Rao's approximation were evaluated at α = 0.05, and the null hypothesis of samples with equal means was rejected. The significance of the principal components of the PCA-DA was measured by Bartlett's test for eigenvalue significance, and the eigenvalues, Bartlett's statistic and p-values are included in Tables S3-S9. The PCA-DA results were fully cross-validated, and the accuracy rates are represented as a confusion matrix, provided in the Supporting Information, which measures the differences between prior and posterior classification. Univariate statistical analysis and the representation of box plots were performed with the MUMA package included in the R statistical software (http://www.R-project.org). Univariate ANOVA and multivariate PCA-DA calculations were carried out with XLSTAT software (Addinsoft Inc.). Score plots of the PCA-DA results were represented using the RGL package included in the R statistical software.
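The ROI handling described above (integration by summation, normalization of the NMR ROIs to their total area, and the replicate variability test) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the MestreNova implementation: the SUM option is assumed to be a plain sum of the points inside each ROI, the variability is taken here as the mean absolute deviation from the replicate mean (the text does not give the exact formula), and all data values are invented.

```python
from statistics import mean

def integrate_roi(x, y, lo, hi):
    """Sum the intensities of all points whose x value lies inside [lo, hi]."""
    return sum(yi for xi, yi in zip(x, y) if lo <= xi <= hi)

def normalize_to_total(areas):
    """Divide each ROI area by the total area of all selected ROIs."""
    total = sum(areas)
    return [a / total for a in areas]

def percent_variability(replicates):
    """Mean absolute deviation from the replicate mean, in percent of that mean."""
    m = mean(replicates)
    return 100.0 * mean(abs(r - m) for r in replicates) / m

# Invented toy spectrum: chemical shifts (ppm) and intensities.
ppm = [3.10, 3.12, 3.14, 3.16, 4.14, 4.16, 4.18]
intensity = [1.0, 4.0, 2.0, 0.5, 3.0, 6.0, 1.0]
rois = [(3.110, 3.151), (4.130, 4.190)]

areas = [integrate_roi(ppm, intensity, lo, hi) for lo, hi in rois]
normalized = normalize_to_total(areas)

# Invented areas of one ROI measured five times on the same wine.
replicate_areas = [10.0, 11.0, 9.0, 10.5, 9.5]
variability = percent_variability(replicate_areas)

print(areas, normalized, round(variability, 1))
```

In the procedure described in the text, only the NMR ROI areas would pass through the normalization step; the SPME-GC ROI areas are used as integrated.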

Results and discussion

Wine samples, representative ROIs and statistics

A collection of 271 commercial white wines from Galician PDOs Rias Baixas, Monterrey, Ribeiro and Valdeorras was analysed by 1H NMR and SPME-GC-MS and chemometric methods. According to the grape variety, the wines of the collection are distributed into the following groups (the acronyms and the number of samples of each group are shown in parentheses):


a) Autochthonous monovarietal wines: Formed by monovarietal commercial wines of the following autochthonous Galician grapes: Albariño, (ALB, N=89); Godello, (GOD, N=43), and Treixadura, (TRE, N=19).

b) Non-autochthonous monovarietal wines: Formed by commercial monovarietal wines made from the non-autochthonous Galician cultivated grape Palomino, (PAL, N=9).

c) Autochthonous plurivarietal wines: Commercial plurivarietal wines produced from several of the aforementioned autochthonous Galician grapes (MULT, N=69).

d) Laboratory prepared plurivarietals: This group of samples was prepared by us and is formed by mixtures of the monovarietal ALB, GOD, TRE and PAL wines, in all combinations (binary, ternary and quaternary mixtures) where the minor component(s) represent at least 25% of the total (MULT*, N=31). Two subgroups were considered here depending on the quantitative composition of those mixtures: subgroup MULT1* (N=21), where the Galician autochthonous grapes ALB, GOD and/or TRE represent ≥ 75% of the total in the wine, and subgroup MULT2* (N=10), where the autochthonous grapes ALB, GOD and/or TRE represent ≤ 50% of the total.

e) Non-Galician white wines: For comparative purposes, non-Galician monovarietal white wines produced from the well-known Verdejo (VER, N=7) and Viura (VIU, N=4) grapes, from PDOs Rueda and Rioja respectively, were included in this group.

The selection of the ROIs is illustrated in Figure S3. Part a shows all the 1H NMR spectra stacked together according to the grape variety. Visual inspection allows the selection of the ROIs relevant for the characterization of the samples and the elimination from the study of spectral regions that are not useful, such as noise or inconsistent peaks. The same process was carried out for the SPME-GC chromatograms (Fig. S3, part b), leading to a set of 75 NMR-ROIs and 59 GC-ROIs that are representative of the different wines in the collection.


Once the ROIs had been determined for each wine, the area of the signals in the ROIs was integrated, so that each wine sample is described by a ROI data vector formed by the combination of the 75 + 59 area values measured with the two techniques. Univariate and multivariate PCA-DA statistical methods were applied to the NMR and SPME-GC ROI data. PCA-DA is a multivariate classification method that combines some of the advantages of both PCA and DA. PCA is an unsupervised method that finds the directions of maximum variance of the variables (ROIs), and DA is a supervised method that incorporates known information on the group type and calculates coefficients for a linear combination of the measured ROIs that maximizes the differences between the groups. Sections 3.2 to 3.6 show the results of the univariate and PCA-DA calculations to discriminate several of those groups of wines, using the combined NMR and SPME-GC data. The comparison between that approach and the classification based on the individual NMR or SPME-GC data is shown in Section 3.7. Finally, in Section 3.8 we present a protocol based on those results for the rapid and reliable identification of the grape or geographical zone in a wine.
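The two-step idea behind PCA-DA (PCA scores first, linear discriminant analysis on those scores afterwards) can be illustrated with the NumPy sketch below. It is not the XLSTAT procedure used in this work: the data are synthetic stand-ins for a ROI table, the number of retained components is arbitrary, equal class priors are assumed, and the leave-one-out loop is only one possible reading of "full cross-validation". All function names are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project mean-centred data onto its first principal components."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]
    return (X - mean) @ components.T, mean, components

def lda_fit(scores, labels):
    """Discriminant axes and class centroids computed on the PCA scores."""
    classes = np.unique(labels)
    overall = scores.mean(axis=0)
    dim = scores.shape[1]
    s_within = np.zeros((dim, dim))
    s_between = np.zeros((dim, dim))
    for c in classes:
        group = scores[labels == c]
        centred = group - group.mean(axis=0)
        s_within += centred.T @ centred
        diff = (group.mean(axis=0) - overall)[:, None]
        s_between += len(group) * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_within, s_between))
    order = np.argsort(eigvals.real)[::-1][: len(classes) - 1]
    axes = eigvecs.real[:, order]
    # Scale each axis to unit pooled within-class variance, so that plain
    # Euclidean distance to the centroids can be used for classification.
    pooled = s_within / (len(scores) - len(classes))
    axes = axes / np.sqrt(np.diag(axes.T @ pooled @ axes))
    centroids = np.array([(scores[labels == c] @ axes).mean(axis=0) for c in classes])
    return classes, axes, centroids

def lda_predict(scores, classes, axes, centroids):
    """Assign each sample to the nearest class centroid in discriminant space."""
    projected = scores @ axes
    dists = np.linalg.norm(projected[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

def leave_one_out_accuracy(X, labels, n_components):
    """Refit PCA and LDA without sample i, then classify sample i."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        scores, mean, components = pca_scores(X[keep], n_components)
        model = lda_fit(scores, labels[keep])
        held_out = (X[i : i + 1] - mean) @ components.T
        hits += int(lda_predict(held_out, *model)[0] == labels[i])
    return hits / len(X)

# Synthetic stand-in for a ROI table: 3 groups, 20 samples each, 10 "ROIs".
rng = np.random.default_rng(0)
labels = np.repeat(np.array(["A", "B", "C"]), 20)
centres = {"A": (0.0, 0.0), "B": (6.0, 0.0), "C": (0.0, 6.0)}
X = rng.normal(size=(labels.size, 10))
X[:, :2] += np.array([centres[lab] for lab in labels])

accuracy = leave_one_out_accuracy(X, labels, n_components=4)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Reducing the data to a few PCA scores before the discriminant step keeps the within-class scatter matrix invertible even when the number of variables (here 75 + 59 ROIs) is comparable to the number of samples per class.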

Distinction between wines obtained from grapes autochthonous to Galicia (ALB, GOD, TRE) and those produced from non-autochthonous grapes (PAL, VIU and VER)

The wine samples considered in this study constitute two groups: one formed by wines (mono- and plurivarietals) originating from the autochthonous Galician grapes, ALB, GOD, TRE, MULT and MULT1* (N=241), and another formed by mono- and plurivarietals originating from the non-autochthonous grapes, PAL, VER, VIU and MULT2* (N=30). Univariate analysis applied to the NMR and SPME-GC ROIs of the two groups identified 8 ROIs with area values (mean and distribution) significantly different between the two groups (Table 1). The robustness of the selected ROIs for distinguishing the two groups was confirmed by a univariate ANOVA test.
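The sizes of the two groups follow directly from the group sizes listed earlier, as the short Python tally below shows. The dictionary keys are the acronyms used in the text; the variable names are illustrative.

```python
# Group sizes as listed in the text.
group_sizes = {
    "ALB": 89, "GOD": 43, "TRE": 19,  # autochthonous monovarietals
    "PAL": 9,                          # non-autochthonous, Galician cultivated
    "MULT": 69,                        # commercial plurivarietals
    "MULT1*": 21, "MULT2*": 10,        # laboratory prepared mixtures
    "VER": 7, "VIU": 4,                # non-Galician monovarietals
}

autochthonous = ["ALB", "GOD", "TRE", "MULT", "MULT1*"]
non_autochthonous = ["PAL", "VER", "VIU", "MULT2*"]

n_auto = sum(group_sizes[g] for g in autochthonous)
n_non = sum(group_sizes[g] for g in non_autochthonous)

print(n_auto, n_non, n_auto + n_non)
```

The tally reproduces N=241 and N=30, which together make up the 271 wines of the collection.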

Insert Table 1

Table 1. Selected ROIs of the NMR (chemical shift interval, ppm) and SPME-GC (retention time interval, min.) data that best distinguish wines produced from autochthonous Galician grapes (N=241) from wines from non-autochthonous grapes or their minoritarian mixtures (N=30). The ANOVA test shows that these ROIs present significantly different mean values, (Pr > F) < 0.01.

Autochthonous Galician grapes

NMR (ppm): (3.110 - 3.151)↓; (4.130 - 4.190)↑; (7.620 - 7.651)↑; (8.280 - 8.306)↑

SPME-GC (min.): (14.0 - 14.1)↑ Hotrienol (1,5,7-octatrien-3-ol, 3,7-dimethyl)

↑ The ROI concentration is significantly higher in the group of autochthonous Galician grapes.
↓ The ROI concentration is significantly smaller in the group of autochthonous Galician grapes.
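The univariate test behind Table 1 compares the areas of a single ROI between groups. A minimal sketch of the one-way ANOVA F statistic is given below in plain Python; the area values are invented for illustration and the function name is hypothetical. The p-value, reported as Pr > F in the table, would then be read from the F distribution with (k - 1, n - k) degrees of freedom, for example with a statistics package.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of measurements."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Invented areas of one ROI in two groups of wines.
autochthonous_areas = [0.8, 1.0, 1.2, 0.9, 1.1]
other_areas = [2.0, 2.2, 1.8, 2.1, 1.9]

f_value = anova_f([autochthonous_areas, other_areas])
print(f"F = {f_value:.1f}")
```

A large F value means that the difference between the group means is large compared with the scatter inside each group, which is the property required of a ROI that is useful for discrimination.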

As a complement to the univariate analysis, the multivariate PCA-DA method, which is able to discern different intensity patterns occurring simultaneously in several ROIs, was applied. Figure 1 presents the score plot obtained in the PCA-DA calculation performed with the combined NMR and SPME-GC ROI data. Clearly, the monovarietal wines (Galician and non-Galician) of these two groups are notably well distinguished by PCA-DA, as each group appears well clustered and located on opposite sides of the score plot, along the axis of maximum variability (F1). In contrast, wines of the types MULT1* and MULT2*, which contain relevant fractions of products derived from both autochthonous and non-autochthonous grapes, tend to appear in the central region of the plot, much closer to the limit of separation of the two groups. Figure 1 also shows that the MULT* wines are notably well distinguished according to the majority or minority proportion of autochthonous Galician grapes (MULT1* and MULT2*). Only one sample in these subgroups is misclassified: a MULT1* sample, constituted by the ternary mixture ALB:TRE:PAL in the ratio 0.50:0.25:0.25, which appears very close to the central region of the score plot of Figure 1, where the discrimination between the two groups is more likely to fail. Overall, PCA-DA provided an excellent discrimination between the two groups: 171 out of 171 samples for the monovarietals and 99 out of 100 for the plurivarietal wines.
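The subgroup definitions for the laboratory mixtures and the overall success rate quoted above can be expressed in a few lines of Python. The thresholds (≥ 75% and ≤ 50% of autochthonous grapes) come from the group definitions in the text; the function name is hypothetical, and mixtures falling between the two thresholds are simply left unassigned in this sketch.

```python
AUTOCHTHONOUS = {"ALB", "GOD", "TRE"}

def mixture_subgroup(fractions):
    """Return 'MULT1*' or 'MULT2*' for a laboratory mixture of grape fractions."""
    auto = sum(f for grape, f in fractions.items() if grape in AUTOCHTHONOUS)
    if auto >= 0.75:
        return "MULT1*"
    if auto <= 0.50:
        return "MULT2*"
    return None  # between 50% and 75%: not covered by the definitions in the text

# The misclassified ternary mixture mentioned in the text.
sample = {"ALB": 0.50, "TRE": 0.25, "PAL": 0.25}
print(mixture_subgroup(sample))

# Overall success rate from the counts given in the text.
correct = 171 + 99
total = 171 + 100
overall_rate = correct / total
print(f"{100 * overall_rate:.1f}% of {total} wines")
```

The mixture ALB:TRE:PAL 0.50:0.25:0.25 sits exactly on the 75% threshold of the MULT1* subgroup, which is consistent with its position near the limit of separation in Figure 1.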

Insert Figure 1

Figure 1.- PCA-DA score plot of the combined NMR and SPME-GC ROI data, for the distinction of two groups in 271 samples represented along the axis of maximum variability. Red bars correspond to wines made from autochthonous Galician grapes (N=241). Blue bars correspond to wines from non-autochthonous grapes (N=30). In the plot, axis F1 accounts for 100% of the samples variability (pF) < 0.05.

ALB

GOD

TRE

PAL

NMR (ppm)

NMR (ppm)

NMR (ppm)

NMR (ppm)

(7.655 - 7.718)=,a (7.850 - 7.892)↑

(7.522 - 7.570)↓ (8.395 - 8.462)↓

(3.110 - 3.151)↑ (3.467 - 3.513)↓ (5.267 - 5.317)↓

(2.678 - 2.877)↑ (3.199 - 3.269)↑ (6.900 - 6.952)↑ p-coumaric acid

(7.115 - 7.149)↑

(7.391 - 7.438)↓

Tyrosine

(7.471 - 7.513)↑ (7.620 - 7.651) ↑,b (7.720 - 7.756)↑

(7.620 - 7.651)↓,a (7.655 - 7.718)↓,b

SPME-GC (min.)

(1.8 - 1.9)↓ 1-Propanol

(1.9 - 2.0)↓

↑ The ROI concentration is significantly higher in the type of wine with respect to the other three.
= The ROI concentration is intermediate in the type of wine with respect to the other three.
↓ The ROI concentration is significantly smaller in the type of wine with respect to the other three.
a,b ROI simultaneously selective for two types of wine.

The results of the multivariate PCA-DA analysis of the NMR and SPME-GC ROIs are represented, along the three axes of maximum variability, in the score plot of Fig. 2a. In this system of coordinates, the points representing samples of the same group appear clustered together in compact areas around a centroid, and are represented in Figure 2 as ellipsoids. The size of each ellipsoid is defined by giving its three axes, along F1, F2 and F3, lengths equal to one standard deviation of the variability of the samples with respect to the centroid.
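The construction of the ellipsoids described above can be sketched as follows: the centre is the mean of the group scores and each semi-axis is one standard deviation along F1, F2 or F3. The scores are invented, the function names are hypothetical, and the sample standard deviation is used here as an assumption, since the text does not say which estimator was applied.

```python
from statistics import mean, stdev

def ellipsoid(scores):
    """Centroid and one-standard-deviation semi-axes of (F1, F2, F3) scores."""
    columns = list(zip(*scores))
    centre = tuple(mean(c) for c in columns)
    semi_axes = tuple(stdev(c) for c in columns)
    return centre, semi_axes

def inside(point, centre, semi_axes):
    """True if the point lies inside the axis-aligned ellipsoid."""
    return sum(((p - c) / s) ** 2 for p, c, s in zip(point, centre, semi_axes)) <= 1.0

# Invented (F1, F2, F3) scores of one group of wines.
group_scores = [
    (1.0, 2.0, 3.0),
    (2.0, 3.0, 4.0),
    (3.0, 4.0, 5.0),
    (2.0, 2.0, 4.0),
    (2.0, 4.0, 4.0),
]

centre, semi_axes = ellipsoid(group_scores)
print(centre, semi_axes)
print(inside((2.2, 3.1, 4.0), centre, semi_axes))
```

Two groups whose ellipsoids do not intersect are separated by more than one standard deviation along the discriminant axes, which is how the separation of wine types is read from the score plots.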

Insert Figure 2

Figure 2.- PCA-DA score plot along the three axes of maximum variability F1, F2 and F3 of the combined NMR and SPME-GC ROI data for selected groups of wine samples. a) four groups: ALB, GOD, TRE and PAL, b) five groups: ALB, GOD, TRE, PAL, and the combined group of MULT and MULT*. Axes F1, F2 and F3 are those that represent the greatest variability of the samples, and their dimensions correspond to one standard deviation along each F1, F2 and F3 axis. Overall, the three axes account for the following percentages of variability: a) 100% and b) 89.7%.

The results show that the ellipsoids corresponding to the four types of wine considered in this study are well separated from one another. The three monovarietals from the autochthonous grapes ALB, GOD and TRE are distributed close together in the score plot of Fig. 2a, while the ellipsoid of the PAL wines is considerably isolated from the rest. The PCA-DA cross-validation of this data set of 160 samples correctly identified the group of every sample, thus proving the excellent discrimination of these four monovarietal wines.

Distinction of mono and plurivarietal wines: Types ALB, GOD, TRE, PAL, MULT and MULT*

At first, an attempt was made to distinguish among the four Galician monovarietal wines (ALB, GOD, TRE, PAL) and another group formed by the combination of commercial and prepared Galician plurivarietals (MULT and MULT*), using a collection of 251 wines. The univariate analysis applied to the corresponding NMR and SPME-GC ROIs was unable to identify any ROI with sufficiently different area values (mean and distribution) to distinguish among these five groups, evidently because of the complexity of the samples, which requires the use of multivariate methods. The results of the multivariate PCA-DA calculation of the NMR ROI and SPME-GC ROI data for the five groups of wines are represented along the three axes of maximum variability in the score plot of Fig. 2b. The points representing the wine samples appear well clustered in compact areas around the centroids corresponding to each type of wine. There is a partial overlap between the ellipsoids of GOD and TRE, while those of ALB, PAL and MULT are clearly separated from the rest. The PCA-DA cross-validation analysis indicated no error for the classification into the three types ALB, PAL and MULT; however, there were some errors in the identification of wines of the types GOD and TRE, which could be interchanged.


The comparison of the results shown in Fig. 2a and 2b indicates that the inclusion of the commercial plurivarietal wines MULT in the PCA-DA calculation is the reason for the partial overlap of the GOD and TRE wines seen in Fig. 2b. In this sense, it should be noted that the exact grape composition of many of the wines in the group MULT is not specified on the label, which greatly limits our ability to check the reliability of the PCA-DA results on these samples. As a solution, we repeated the study, replacing the plurivarietal MULT wines of uncertain composition with the MULT* group of wines, which had been prepared as binary, ternary and quaternary mixtures of the monovarietals ALB, GOD, TRE and PAL with known composition. In this way, a collection of 191 wines (types ALB, GOD, TRE, PAL and MULT*) was analysed by PCA-DA, and the score plot of this calculation is shown in Fig. 3a. The samples appear very well clustered according to the type of wine in well-separated compact areas. The monovarietal Galician wines of types ALB, GOD and TRE and the plurivarietals MULT* are relatively close in the score plot of Fig. 3a, while the group PAL is considerably further away and isolated from them. The PCA-DA cross-validation on this set of 191 wines indicated that the grape composition of every sample of this set is correctly identified. Thus, PCA-DA provided an excellent discrimination among these five groups of wine.

Insert Figure 3

Figure 3.- PCA-DA score plot along the three axes of maximum variability F1, F2 and F3 of the combined NMR and SPME-GC ROI data for selected groups of wine samples. a) five groups: ALB, GOD, TRE, PAL, MULT*, b) six groups: ALB, GOD, TRE, PAL, MULT, MULT* and c) eight groups: ALB, GOD, TRE, PAL, MULT, MULT*, VER, VIU. Axes F1, F2 and F3 are those that represent the greatest variability of the samples. In each plot, the ellipsoids are centered at the mean coordinate value of the group, and their dimensions correspond to one standard deviation along each of the F1, F2 and F3 axes. Overall, the three axes account for the following percentages of variability: a) 86.8%, b) 85.4% and c) 77.2%.
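The ellipsoid construction described in the caption can be sketched as follows: the centre is the mean of the group scores and each semi-axis is one standard deviation along F1, F2 or F3. This is a minimal sketch under stated assumptions. The score values are hypothetical, the use of the sample (n - 1) standard deviation is an assumption because the caption does not specify the estimator, and the `inside` helper is only an illustrative way of asking whether a point falls within an axis-aligned ellipsoid.

```python
from statistics import mean, stdev


def ellipsoid(points):
    """Centre (mean) and semi-axes (one sample standard deviation per axis)."""
    columns = list(zip(*points))
    centre = [mean(c) for c in columns]
    semi_axes = [stdev(c) for c in columns]  # assumption: n - 1 estimator
    return centre, semi_axes


def inside(point, centre, semi_axes):
    """True if sum(((p - c) / s) ** 2) <= 1 for an axis-aligned ellipsoid."""
    return sum(((p - c) / s) ** 2
               for p, c, s in zip(point, centre, semi_axes)) <= 1.0


# Hypothetical (F1, F2, F3) scores for one group of wines.
group = [
    (1.0, 2.0, 0.0),
    (3.0, 2.0, 2.0),
    (2.0, 4.0, 1.0),
    (2.0, 0.0, 1.0),
]
centre, semi_axes = ellipsoid(group)
print("centre:", centre)
print("semi-axes:", [round(s, 3) for s in semi_axes])
print("far point inside?", inside((5.0, 2.0, 1.0), centre, semi_axes))
```

Since the semi-axes span only one standard deviation, a sizeable fraction of the samples of a group is expected to lie outside its own ellipsoid, so the ellipsoids summarise the position and spread of each group rather than enclose it.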

Distinction of wines from groups ALB, GOD, TRE, PAL, MULT*, MULT, VIU and VER. A PCA-DA calculation was carried out for a collection of 261 wines of the ALB, GOD, TRE, PAL, MULT* and MULT groups. The score plot in Fig. 3b shows the samples of each type of wine well clustered together. There is a partial overlap between the ellipsoids of types GOD and TRE, while the ellipsoids of the other four groups are well separated. In the distribution of Fig. 3b, the ellipsoids of the three monovarietal autochthonous Galician grapes ALB, GOD and TRE and of groups MULT* and MULT are relatively close to each other compared with type PAL, which is notably isolated from the rest. As a complement, we repeated the PCA-DA calculation after adding a few VIU and VER wines to the set. The results for the complete set of 271 wines of types ALB, GOD, TRE, PAL, MULT*, MULT, VIU and VER are given in Fig. 3c and show the samples of each type well clustered around their centroid. There is a partial overlap between the ellipsoids of types GOD and TRE and between the ellipsoids of types ALB and MULT*. As in Fig. 3b, the ellipsoids of the three autochthonous monovarietals ALB, GOD and TRE and their mixtures MULT* and MULT in Fig. 3c are relatively close to each other compared with those of the non-autochthonous VIU, VER and PAL grapes. The confusion matrix of the PCA-DA calculation (see Table S8) showed better than 79% correct identification for six of the eight wine types considered, PAL and VIU being the two exceptions.

Overall, PCA-DA was in this case not as efficient as in the previous examples, probably due to the higher complexity of the collection involved.
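For readers less familiar with confusion matrices, the following minimal sketch shows how per-group and overall percentages of correct identification are derived from true and predicted group labels. The labels are hypothetical and do not reproduce Table S8; the function names are illustrative.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) pairs; keys are tuples of group names."""
    return Counter(zip(true_labels, predicted_labels))


def per_group_rate(matrix, groups):
    """Fraction of each true group that was assigned to the correct group."""
    rates = {}
    for g in groups:
        total = sum(n for (t, _), n in matrix.items() if t == g)
        rates[g] = matrix[(g, g)] / total if total else float("nan")
    return rates


# Hypothetical cross-validation outcome (not the data of Table S8).
truth = ["GOD"] * 5 + ["TRE"] * 5 + ["PAL"] * 4
guess = ["GOD"] * 4 + ["TRE"] + ["TRE"] * 4 + ["GOD"] + ["PAL"] * 4

matrix = confusion_matrix(truth, guess)
rates = per_group_rate(matrix, ["GOD", "TRE", "PAL"])
overall = sum(matrix[(g, g)] for g in rates) / len(truth)

for g, r in rates.items():
    print(f"{g}: {100 * r:.0f}% correct")
print(f"overall: {100 * overall:.1f}% correct")
```

Reporting the rate per group, as done in the text, is more informative than the overall rate when the groups differ strongly in size, because a small group that is poorly identified barely changes the overall figure.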

Distinction of ALB wines according to their geographical origin (subzones Rosal, Ribeiro, Tea, Salnés and Ulla). Monovarietal wines made from Albariño grapes (ALB) are produced in five different but very close geographical zones of Galicia, which are associated with variations in the details of taste and aroma. The quality, production and export value of these wines justify the interest in a method for their rapid classification by geographical subzone. Univariate analysis applied to the ROIs of 89 ALB wines identified four ROIs with significantly different area values (mean value and distribution) that distinguish some of the five subzones considered (Table 3 and Fig. S9). The robustness of the selected ROIs for distinguishing the subzones was confirmed by a univariate ANOVA test. Figure S10 shows, as an example, the MS fragments obtained for some peaks in the SPME-GC ROIs of Table 3, and the identification of the components by comparison with databases.

Insert Table 3

Table 3. Selected ROIs of the NMR (chemical shift, ppm) and SPME-GC (retention time, min) data that best distinguish the geographical subzone in 89 ALB wines: Tea (N = 20), Rosal (N = 15), Salnés (N = 47), Ribeiro (N = 4) and Ulla (N = 3). ANOVA tests show that these ROIs have significantly different mean values, (Pr > F) < 0.05. No regions satisfying this criterion were found for subzones Ulla and Ribeiro.

Subzone    Technique        ROI                  Compound
Tea        NMR (ppm)        (2.914 - 2.940)↑
Rosal      NMR (ppm)        (3.110 - 3.151)↓
Salnés     SPME-GC (min)    (26.9 - 27.3)↓       Dodecanoic acid, ethyl ester
Salnés     SPME-GC (min)    (35.4 - 35.7)↓       Hexadecanoic acid, ethyl ester
Ulla       No ROIs found
Ribeiro    No ROIs found

↑ The ROI concentration is significantly higher in this subzone than in the other four.
↓ The ROI concentration is significantly lower in this subzone than in the other four.
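The univariate ANOVA criterion used to select the ROIs of Table 3 rests on the one-way F statistic, the ratio of the between-group to the within-group mean square of the ROI areas. The following minimal sketch computes that statistic for one ROI; the area values and group sizes are hypothetical. Converting F into the probability (Pr > F) requires the F distribution, which is not in the Python standard library; a routine such as scipy.stats.f_oneway returns both values.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of lists of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: values around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical areas of a single ROI in wines from three subzones.
roi_areas = [
    [1.0, 2.0, 3.0],   # subzone A
    [2.0, 3.0, 4.0],   # subzone B
    [5.0, 6.0, 7.0],   # subzone C
]
f_stat, df_b, df_w = one_way_anova_f(roi_areas)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F indicates that the subzone means differ by more than the scatter within each subzone would explain. With very small groups, such as Ribeiro (N = 4) and Ulla (N = 3), the within-group variance is poorly estimated, which may contribute to the absence of significant ROIs for those subzones.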

The multivariate PCA-DA analysis of the NMR and SPME-GC ROI data of the 89 ALB wines from the five subzones is represented in the histogram score plot of Fig. 4 along the single axis of maximum variability (F1). Overall, the confusion matrix for the cross-validation (Table S9) gave only ~43% correct subzone assignments. This means that, contrary to grape classification, the current NMR and SPME-GC data are not fully satisfactory for a complete and reliable subzone assignment of Albariño wines, and are useful for a partial identification only. It is important to mention here that the use of stable isotope data (SNIF-NMR, 18O, 13C) has recently been shown (ref 12) to be particularly effective for predicting the geographical origin of wines. This could perhaps become the method of choice to solve this problem in the future.

Insert Figure 4

Figure 4.- PCA-DA score plot histogram for the distinction of the five geographical subzones of 89 ALB wines, represented along the axis of maximum variability of the combined NMR and SPME-GC ROI data (λ