Analysis of Volatile Compounds by Advanced Analytical Techniques

Review pubs.acs.org/CR

Analysis of Volatile Compounds by Advanced Analytical Techniques and Multivariate Chemometrics

Giuseppe Lubes† and Mohammad Goodarzi*,‡

† Laboratorio de Química en Solución, Universidad Simón Bolívar (USB), Apartado 89000, Caracas 1080 A, Venezuela
‡ Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390, United States



ABSTRACT: Smell is one of the five senses and plays an important role in our everyday lives. Volatile compounds are, for example, characteristic of foods, and some of them are perceived by humans through their aroma. They strongly influence consumers' decisions on whether or not to use a product. When a product has a strong, offensive aroma, many consumers will not appreciate it; soft, fresh, natural aromas, on the contrary, increase the acceptance of a given product. These properties can have a large economic impact; thus, it is of great importance to manufacturers that the aroma of their food products is characterized by analytical means to provide a basis for further optimization. A lot of research has been devoted to linking the quality of, e.g., a food to its aroma. Knowing the aromatic profile of a food helps in understanding the nature of a product and in developing new products that are more acceptable to consumers. There are two ways to analyze volatiles: one uses human senses and/or sensory instruments, and the other is based on advanced analytical techniques. This work focuses on the latter. Although simple, low-cost technology is an attractive research target in this domain, most of the data are generated with very high-resolution analytical instruments. Data gathered with different analytical instruments normally have broad, overlapping sensitivity profiles and require substantial data analysis. In this review, we address not only the application of chemometrics to aroma analysis but also the use of different analytical instruments in this field, highlighting the research needed in the future.

CONTENTS
1. Introduction
2. Why Volatile Metabolomics?
2.1. Application in Industry, Biotechnology, Food Safety/Quality, and Human Health
3. Instrumental Technique for Analysis of Volatiles: Gas Chromatography
3.1. Basic Concepts of GC
3.1.1. Sample Introduction
3.1.2. Columns
3.1.3. Detectors
3.2. Spectral Basis for Volatile Measurement
3.3. Analysis of Complex Mixtures: Revealing What Is Hidden
3.4. Comprehensive GC × GC
3.4.1. Modulation of the Effluent
3.4.2. Detection System in GC × GC
3.5. Mass Spectral Databases Used in GC-MS Volatile Metabolomics
3.5.1. NIST Library
3.5.2. Golm Metabolome Database (GMD)
3.5.3. Fiehn Library
3.5.4. Wiley Library
4. Volatile Analysis through Proton Transfer Reaction Mass Spectrometry (PTR-MS)
4.1. Basic Concepts of PTR-MS
4.2. Spectral Basis for Volatile Measurement
4.3. VOCs Quantification in PTR-MS
4.4. Applications
5. Sample Preparation
5.1. Gas Chromatography
5.1.1. Sample Extraction
5.1.2. Derivatization Step in VOCs Analysis
5.2. Sample Preparation in PTR-MS
6. Processing of the Data
6.1. Preprocessing of the Data
6.1.1. Data Preprocessing Techniques
6.1.2. Baseline Correction
6.1.3. Noise Reduction
6.1.4. Normalization of the Data
6.1.5. Retention Time Alignment
6.2. Statistical Modeling and Multivariate Data Analysis
6.2.1. Unsupervised Methods of Pattern Recognition
6.2.2. Supervised Pattern Recognition Techniques
7. Conclusion and Perspective
Author Information

© 2017 American Chemical Society


Received: October 11, 2016 Published: March 17, 2017

DOI: 10.1021/acs.chemrev.6b00698 Chem. Rev. 2017, 117, 6399−6422

Chemical Reviews

Review

Figure 1. Typical workflow in VOCs analysis.

Corresponding Author ORCID Notes Biographies References

1. INTRODUCTION

Volatile organic compounds (VOCs) from biological systems are molecules of different chemical classes originating from primary and secondary metabolism.1 Generally, they are lipophilic substances with low molecular weight, low boiling point, and high vapor pressure under natural conditions.2 Most of these volatile metabolites allow, for example, plants to communicate with their surrounding environment. These chemical signals are nowadays defined as semiochemicals and can be divided into two classes: those that mediate communication between individuals of the same species (intraspecific) and those that mediate communication between different species (interspecific).3,4 By releasing volatiles, plants can defend themselves from the attack of pathogenic organisms or herbivores by intoxicating or repelling them or even by attracting their natural enemies. For example, the parasitic wasp Cardiochiles nigriceps is able to distinguish plants, e.g., cotton or tobacco, infested with Heliothis virescens (its host) from plants infested with Helicoverpa zea (nonhost) according to the type of volatile compounds emitted by the plant.3,5,6 Usually the composition and/or intensity of those “defensive” volatiles depends on the type of damage and is induced by the same herbivore species as a result of the contact of its saliva or oral secretion with the leaf or cortex of the plant.7 The emission of volatile compounds not only has a defensive function but also takes part in the reproduction of plants by attracting pollinators or seed dispersers.8−10 New plant volatiles are identified continually: to date, more than 1700 have been identified in more than 90 plant families.11 They belong to different chemical classes, and the nature of these compounds depends on different biosynthetic pathways.

For example, sesquiterpenes are biosynthesized in plants via the isopentenyl pyrophosphate (IPP) intermediate, while an alternative IPP pathway synthesizes monoterpenes and diterpenes.12 On the other hand, the autolytic oxidative breakdown of lipids, such as oleic, linoleic, and linolenic acids, generates a series of saturated and unsaturated six-carbon alcohols and aldehydes that provide the characteristic green leaf odor of plants.3 Although there has been a lot of progress in this area, there are always new mechanisms to discover, such as the recently proposed pathway for the biosynthesis of monoterpenes in roses.13 Normally, in undamaged plants, there is a base-level scent, which could be called the “standard level of volatiles”, and compounds like monoterpenes, sesquiterpenes, and many other aromatic compounds are released from specific storage sites or glands.3 Some of the VOCs present in fruits and vegetables, either fresh or processed, are associated with aroma and flavor at concentrations that can be perceived by the human nose. These compounds serve as a measure of the quality and of the nutrient availability of these products prior to consumption.1 To date, more than 700 compounds have been reported as aroma or flavor compounds in fruits and vegetables.14 The complex blend of these volatiles gives characteristic attributes to the whole plant, flower, or fruit.15 The main analytical technique applied to investigate VOCs as well as aroma/flavor in fruits and vegetables is gas chromatography (GC) coupled to mass spectrometry (MS).16 The isolation of all metabolites from an organism of interest, followed by their identification and quantification combined with chemometric analysis, is called metabolomics.17 Applying these techniques to volatiles drives research in food control and quality; for example, they can be used to detect food adulteration and to determine origin, applied treatments, etc.18−20 On the basis of Alonso et al. (2015),21 Figure 1 represents a typical workflow for the untargeted metabolomics-based study of VOCs in plants or related organisms.

The first step in this workflow is to choose the material of interest and the right moment for analysis; for example, if the main interest is the aromatic composition of fruits after conservation, the choice will depend on the type of fruit, the cultivar, and of course the conservation system. The second step, sample preparation, is one of the most important: if done well, it saves time and money in the analysis. Sample preparation must be efficient (extracting all metabolites without destroying them, etc.) and preferably not time-consuming. The techniques used in VOC analysis are discussed in more depth later. The third step, technical analysis, is preferably done by GC, which separates the volatile mixture and introduces the analytes to the mass analyzer, yielding the spectral data. Once the chromatographic and spectral data have been obtained, it is important to preprocess them with the aim of (I) determining the presence of false peaks, e.g., artifacts from the chromatographic columns or from the SPME fiber (noise


Figure 2. Number of publications found in the Web of Science for each 5-year period, separated by research topic.
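The growth in publication counts for the topic “volatile GC” can be recomputed from the three counts quoted in the text (3065, 4173, and 6076). The sketch below is only an illustration; the function names are hypothetical. It shows that the result depends on the reference value: relative to the earlier count the increases are about 36% and 46%, whereas the increment expressed as a share of the later count gives about 27% and 31%, which is close to the percentages quoted in the text.

```python
# Publication counts for the topic "volatile GC" as quoted in the text.
COUNTS = [3065, 4173, 6076]


def growth_vs_start(old: int, new: int) -> float:
    """Percentage increase measured against the earlier count."""
    return 100.0 * (new - old) / old


def growth_vs_end(old: int, new: int) -> float:
    """Increment expressed as a percentage of the later count."""
    return 100.0 * (new - old) / new


# Compare consecutive counts under both conventions.
for old, new in zip(COUNTS, COUNTS[1:]):
    print(
        f"{old} -> {new}: "
        f"{growth_vs_start(old, new):.1f}% relative to the earlier count, "
        f"{growth_vs_end(old, new):.1f}% relative to the later count"
    )
```

Stating which convention is used avoids ambiguity when such percentages are compared across periods.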

filtering), (II) resolving coeluting peaks through deconvolution algorithms, (III) aligning chromatographic peaks through retention time algorithms, and then (IV) normalizing the quantitative data. The fifth step in the workflow is data analysis, which is done once metabolites are robustly identified and quantified. Peak identification can be performed beforehand with compound libraries, which are either already included in the instrument software or created by injecting pure standards. Afterward, chemometric techniques (either univariate or multivariate) are applied to unravel patterns in the data that help in the biological interpretation with models, pathways, etc. (step VI).

The number of papers published about VOCs has been increasing over the last 15 years. According to Web of Science22 the number of publications with the topic “volatile GC” (Figure 2) increased by 26% from 2005 to 2010 (from 3065 to 4173) and 31% from 2011 to 2016 (from 4173 to 6076). This growing interest is also associated with the number of new sampling techniques combined with more sensitive or specialized GC detectors as well as with the increasing interest in food quality control. In this review, we describe and discuss the potential and the different possibilities of using analytical techniques combined with chemometric tools for VOC analysis in food samples.

2. WHY VOLATILE METABOLOMICS?

As discussed above, there has been increasing interest in the study of the small molecules present in cells, tissues, or complete organisms that constitute the metabolome because it gives a rapid and reliable measure of the biochemical activity of the object under study. For instance, Fiehn et al. (2000)23 stated that the qualitative or quantitative study of intracellular metabolites can be used to monitor and assess gene function. In general, metabolomics is the field of study that focuses on small metabolites, volatile or not, in a complex system and is categorized as targeted or untargeted. Targeted metabolomics focuses on the analysis of defined groups of chemical compounds that are already characterized and classified as biochemical metabolites.24,25 Untargeted metabolomics, on the other hand, is a more holistic approach: a comprehensive analysis of metabolites focused on revealing the chemical fingerprint of a complex system without necessarily identifying or quantifying specific compounds.25,26 GC-MS was introduced in plant research at the beginning of the 1980s for profile analysis of volatile metabolites in plants.27 Since then many studies have shown that the emission of VOCs in plants is species specific, reflecting not only environmental suitability but also the quality or health of a plant.28 The interest in studying the plant “volatilome” stems from its simplicity and the huge impact it could have. Modifying the metabolism of an organism by biotechnological techniques is often used to enhance the production of metabolites directly related to human health or plant growth. For example, the combination of metabolomics and transcriptomics in the analysis of volatile terpenoids emitted from tomato plants (Solanum lycopersicum) infested with Tetranychus urticae demonstrated how plants can adapt to defend themselves from the attack of herbivores.29 This kind of study can be applied successfully in ecology and/or chemical ecology: knowing the aromatic profile of a plant or fruit makes it possible to understand the preference of a herbivore for that plant. Researchers can reduce the volatile blend by modifying existing pathways, by introducing a particular gene or genes involved in VOC synthesis, or even mechanically by installing filters with absorbents, while testing the behavioral response of that specific insect.10,30 However, due to the complexity of the regulatory system controlling plant metabolism, manipulating gene expression does not necessarily bring the expected results; for example, the overexpression of foreign S-linalool synthase in transgenic petunia led to accumulation of S-linalyl-β-D-glucoside instead of free linalool.31 Metabolomics, in this context, can provide more precise information on the metabolism of the organism, playing a key role in the field of molecular biotechnology. In combination with other “omics” sciences like genomics, transcriptomics, and/or proteomics, it is a powerful tool for interpreting and understanding many complex biological processes and is nowadays recognized as a cornerstone of systems biology.32


2.1. Application in Industry, Biotechnology, Food Safety/Quality, and Human Health

As we have seen above, VOC analysis in combination with multivariate data analysis is becoming more popular not only in the scientific community but also in industry. Its noninvasive nature and close link to the phenotype are becoming important for the pharmaceutical, agricultural, health, and food industries, etc. According to a recent economic report from a global market research consulting company,33 the current value of the global metabolomics market is around $565 million and is expected to grow at a compound annual growth rate (CAGR) of 30% to $2100 million by 2019. The two biggest markets are North America and Europe (∼42% and ∼38% of the global market, respectively), with an expected growth of ∼23% in Asia. The main industries investing in these technologies are the food and flavor industries as well as the pharmaceutical and healthcare industries. Since ancient times, human beings have used aromatic herbs and essences for both cosmetic and medicinal purposes. Nevertheless, it was not until the late 19th century that the flavor and perfumery industry underwent a revolution and became the industry we know today.34 This development was partially the result of advances in organic and analytical chemistry. Analytical chemistry helped to reveal the main scent or component of some important botanical materials, while organic chemistry played an important role in synthesizing these components in large quantities and in a profitable way. Since then the food and beverage, fragrance, cosmetics, household products, and toiletries industries, etc., have grown exponentially in this business niche (Figure 3),35 and the global consumption of flavors and fragrances is expected to grow by ∼4% by 2018 (Figure 3, inset).36

Figure 3. Estimated flavor and fragrance sales by year.35,36 Growth of the total consumption of flavors and fragrances by 2018 (inset).36

This business is nowadays a multibillion-dollar market, in which the top five flavor companies alone have an estimated market value of over $15 billion. Global corporations like Givaudan, Firmenich, International Flavors and Fragrances (IFF), Symrise, and Takasago (with approximate revenues of $4.6, $3.3, $3.02, $2.8, and $1.1 billion, respectively)34,36 are mainly responsible for the discovery of new ingredients, flavors, and fragrances. These new ingredients are obtained in two ways: (1) by synthesis and (2) by extraction from nature. The synthesis of compounds is sometimes aimed at finding molecules with odor properties similar to a given target aroma in an easier and cheaper way than natural extraction. For instance, the production cost of naturally extracted jasmine is around $4500−7000 per kilogram, whereas the “nature-identical” products, jasmone and methyl jasmonate, cost around $450−700 per kilogram. Interestingly, an even simpler version, the cyclopentanone derivatives, costs around $15−70 per kilogram.35 However, in this century there is a growing trend toward the consumption of organic, “green” food. In the United States, organic sales are estimated to have increased by around 20% over the past 2 decades (since 1990), reaching $13.8 billion in 2005. The reasons for consumer choice vary;37 the main reason is the avoidance of pesticides (70%), followed by freshness (68%) and health and nutrition benefits (67%). The flavor and cosmetic industries cannot ignore this trend and are therefore shifting from chemical synthesis toward more natural production. Most of the flavors used in these industries come from essential oils extracted from plants by hydrodistillation. Through VOC metabolomics, gene function in the biosynthesis of specialized metabolites can be elucidated by modifying the biosynthetic pathways through the introduction of a particular gene or genes.

This might increase the production of a target metabolite with aroma−flavor applications.10,38 This strategy has been successfully applied in diverse plants, comparing different varieties/cultivars and different mutant lines under different physiological conditions such as age, developmental stage, light, and even herbivore infestation.38 One good example is the recent identification of the O-methyltransferase enzyme (CTOMT1) in tomato (Solanum lycopersicum) involved in the synthesis of guaiacol, an important precursor of various commercial flavorants such as eugenol and vanillin.39 Volatile metabolomics also plays an important role in agriculture. Through multivariate data analysis of the VOC composition, we can understand the differences in metabolite composition between a tasty (or more aromatic) and a nontasty (or less aromatic) sample; e.g., VOC profiles used in combination with unsupervised methods like hierarchical clustering analysis (HCA) and principal component analysis (PCA) allow discrimination of fruit cultivars such as mandarins, apples, and tomatoes and even of different citrus cultivars.40−43 Applying the information obtained through metabolite profiling to quantitative genetics provides a strong resource for studying the genetic regulation of quantitative traits;32 e.g., quantitative trait loci (QTL) have been used to map the aromatic composition of apples with the aim of finding valuable molecular markers to be employed in marker-assisted breeding (MAB) programs for selecting aromatic cultivars according to consumer acceptance.44,45 Such improvement of crop properties can be achieved through two different strategies: traditional breeding and genetic modification (GM).32 Nevertheless, crops such as barley, maize, pea, rice, soybean, and wheat and a growing number of products can be GM organisms, which are still forbidden in many countries.

In this case, the volatile metabolome can also be used in food safety. For example, volatile analysis of potatoes through untargeted discriminative metabolomics was able to differentiate GM potatoes from non-GM ones.46 Metabolomics has not only been used in breeding programs. In the area of food quality, analysis of targeted volatile compounds has shown great potential for assessing preharvest issues such as identifying cultivars infected by fungus or other


diseases47,48 as well as for determining the best postharvest treatment for different cultivars.49 These targeted approaches have proven effective in food regulation by establishing a baseline of regional and varietal profiles for determining the origin or authenticity of food, as shown in studies on olive oil, tomatoes, and honey.50−52 Even though the major focus of this review is plant and food volatile metabolites, the impact of this area on human health and nutrition should not be underestimated. One of the most promising and challenging applications of VOC analysis is the discovery of possible biomarkers for the detection of cancer. Abaffy et al. (2013)53 compared volatile metabolomic signatures of melanoma and non-neoplastic skin from the same patient and found increased levels of lauric acid and palmitic acid in melanoma as a consequence of upregulated de novo lipid synthesis, which is characteristic of cancer. Robroeks et al. (2010),54 on the other hand, using only 22 of 1099 VOCs from the exhaled breath of patients with cystic fibrosis (CF), obtained 100% correct identification not only of patients with this disease but also of CF patients with or without Pseudomonas colonization. As we have seen, the use of VOC profiles as biomarkers is becoming more and more important for detecting certain infectious diseases in humans.55 Table 1 lists some applications of VOC metabolomics in different matrices together with the chemometric approaches used. Given these and many other possible, not yet discovered applications, we want to stress the importance of metabolomics in VOC analysis.
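The market projection quoted in Section 2.1 (about $565 million growing at a CAGR of 30% to $2100 million by 2019) can be checked with the standard compound-growth relation, value × (1 + rate)^years. The review does not state the base year of the $565 million estimate, so the sketch below solves for the time horizon implied by the three quoted figures instead of assuming one; the function names are illustrative.

```python
import math


def project(value: float, cagr: float, years: float) -> float:
    """Value after compounding at the annual rate `cagr` for `years` years."""
    return value * (1.0 + cagr) ** years


def years_to_reach(start: float, target: float, cagr: float) -> float:
    """Number of years needed to grow from `start` to `target` at rate `cagr`."""
    return math.log(target / start) / math.log(1.0 + cagr)


START_MUSD = 565.0    # quoted current market value, million USD
TARGET_MUSD = 2100.0  # quoted projected value, million USD
CAGR = 0.30           # quoted compound annual growth rate

horizon = years_to_reach(START_MUSD, TARGET_MUSD, CAGR)
print(f"implied horizon: {horizon:.2f} years")
print(f"value after 5 years: {project(START_MUSD, CAGR, 5):.0f} million USD")
```

Under these figures the projection is internally consistent with a horizon of roughly five years.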

3. INSTRUMENTAL TECHNIQUE FOR ANALYSIS OF VOLATILES: GAS CHROMATOGRAPHY

Gas chromatography, a very powerful separation technique, was first described in 1952,76 and since then its use has grown exponentially in many areas, including food chemistry, forensics, metabolomics, and environmental studies. It has become an essential instrument in research laboratories where the focus is on metabolite identification. In the following sections, we briefly describe some basic concepts of GC, focusing on new technologies.

3.1. Basic Concepts of GC

Gas chromatography separates molecules according to their volatility and their affinity for a temperature-controlled column situated in an oven, which can warm up or cool down the column in a few minutes. Before this happens, however, the liquid sample must be vaporized in the injector system. The most used injection mode in gas chromatography is split/splitless because of its simple mechanism and because it covers most analysis requirements; an alternative on the market is the programmed temperature vaporizer (PTV)77 interface, which provides similar features but a higher sample capacity.

3.1.1. Sample Introduction. 3.1.1.1. Split/Splitless (S/SL) Injectors. The split mode is used to prevent overloading of the column. Overloading occurs when large sample volumes are injected and, consequently, the active sites of the column become occupied, affecting the separation and peak shape. This injection mode uses a sample splitter, which lets one part of the sample go to the column while the rest goes to waste. It is controlled by the operating software and can be set to an optimized value after calibration through trial injections. Normally, the injection is performed with a glass syringe, and split ratios typically range between 1:5 and 1:500, depending on the concentration of the sample. The main disadvantage is that less abundant analytes in a sample could suffer discrimination. On the other hand, the splitless mode is ideal for trace analysis (down to 0.5 ppm with a FID). In this mode, the split/splitless valve remains closed for a time set by the operator, usually 10−40 s (the splitless time), allowing the major part of the sample to be transferred into the chromatographic column. Afterward, the split line opens automatically and the liner is flushed with the carrier gas. Because the sample is introduced onto the column during the whole splitless time, peaks could broaden unless the sample is reconcentrated in the column. Generally, to overcome this issue, the oven is set at least 20 °C below the boiling point of the solvent, promoting the so-called solvent effect. By doing so, lighter components recondense in the column together with the solvent and evaporate when the column reaches the vaporizing temperature. This mode is also used with solid-phase microextraction (SPME), which cannot extract large quantities of sample. Unfortunately, the need to optimize factors such as oven temperature, solvent, and splitless time is a disadvantage of the splitless mode in comparison with the split mode. Moreover, thermal degradation is generally more pronounced in splitless than in split injection because the sample spends a longer time in the injection system.

3.1.1.2. Programmed Temperature Vaporizer (PTV). The programmed temperature vaporizer was developed as an alternative to the split/splitless injection mode. In this mode the sample is injected into a cool PTV system (generally set at 40−60 °C) in order to preconcentrate it. Afterward, the inlet is rapidly heated to approximately 300 °C, transferring the sample to the column. The PTV is preferred over the conventional S/SL system in, for example, the analysis of thermolabile compounds, because the low injection temperature prevents degradation of such molecules.77 Moreover, the PTV injector has a higher volume capacity (∼250 μL), which allows a larger quantity of sample to be loaded; this is ideal for trace analysis. In addition, newer PTV systems can also operate in S/SL mode, which makes them more suitable for analysis with SPME. Additionally, in a conventional system, when the SPME fiber passes through the septum of the inlet, it leaves material that increases the chromatographic background; this problem is avoided by using a septumless head (SLH) on the PTV.

3.1.2. Columns. Columns can be considered the heart of the chromatography system: not only does the separation of mixtures occur in them, but they also permit the eventual identification or quantification of sample components once these reach the detection system. The most used columns for analysis of volatile compounds have a nonpolar stationary phase such as polydimethylsiloxane. Commercial names for nonpolar stationary phases tend to end in the number 1, e.g., OV-1, DB-1, and HP-1, or, for columns with 5% phenyl groups, in the number 5, e.g., DB-5, HP-5, Mega 5, etc. Moderately polar columns, for instance, polyethylene glycols, have commercial names such as DB-Wax, HP-Wax, CW-20, etc.78 The length of columns can vary, but the columns most often selected for the analysis of volatiles are capillary columns of 25−30 m, which provide good resolution and a reasonable analysis time. Although shorter columns of 15−20 m provide faster analysis, peak resolution may not be as good as with a 25−30 m column. In some cases, when very complex matrices must be analyzed, it is better to use longer columns of 60−100 m. Such a selection increases the analysis time, but better peak resolution is obtained.

As a recommendation, running a couple of blanks or setting the column at 20 °C above the final
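The split-ratio range and the solvent-effect rule of thumb described for the S/SL injector can be turned into quick arithmetic. The sketch below assumes that a split ratio written as 1:R means that one part of the vaporized sample enters the column for every R parts vented, so the fraction on column is 1/(1 + R); instrument vendors may define the ratio slightly differently. The function names and the hexane example (boiling point of about 69 °C) are illustrative and do not come from the review.

```python
def fraction_on_column(split_ratio: float) -> float:
    """Fraction of the vaporized sample reaching the column for a 1:split_ratio split.

    Assumption: 1 part goes to the column for every `split_ratio` parts vented.
    """
    if split_ratio < 0:
        raise ValueError("split_ratio must be non-negative")
    return 1.0 / (1.0 + split_ratio)


def mass_on_column_ng(injected_ng: float, split_ratio: float) -> float:
    """Analyte mass (ng) transferred to the column in split mode."""
    return injected_ng * fraction_on_column(split_ratio)


def max_initial_oven_temp_c(solvent_bp_c: float, margin_c: float = 20.0) -> float:
    """Highest initial oven temperature that stays `margin_c` below the solvent
    boiling point, following the 20 degC rule of thumb for the solvent effect."""
    return solvent_bp_c - margin_c


# The two ends of the split-ratio range mentioned in the text, for 1000 ng injected.
for ratio in (5, 500):
    print(f"1:{ratio} -> {mass_on_column_ng(1000.0, ratio):.1f} ng on column")

# Splitless injection in hexane (boiling point assumed to be about 69 degC).
print(f"initial oven temperature <= {max_initial_oven_temp_c(69.0):.0f} degC")
```

At a 1:500 split only about 0.2% of the injected analyte reaches the column, which illustrates why less abundant analytes can be lost in split mode and why splitless injection is preferred for trace analysis.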


bay leaves

Capsicum spp.

persimmon fruit human breath

essential oils

sesame and peanut oils

Pu-Erh green tea

balsamic vinegars seeds of Nigella

tobacco leaf

human breath

essential oils

olive oils

HS-SPME GC × GC-TOFMS

HS-SPME GC-MS

GC-MS

HS-GC GC-TOF/MS

HS-SPME GC-MS

HS-SPME GC-MS

6404

GC-MS

HS-SPME GC-TOF/MS

GC-MS

PTR-MS

GC-MS

HS-SPME GC-MS

SPME GC-MS

HDGC-MS and GC-FID

matrix

pineapple (Ananas comosus (L.) Merr.) fruit rice bran

HS-SPME GC × GC-qMS

techniques

identification of tentative marker compounds and their contribution to differentiate postharvest maturity stages of greenripe sea-freighted pineapples and fully ripe air-freighted fruits qualitative and quantitative analysis of the flavor components of rice bran identification of 71 constituents of essential oil (EO) of bay leaves increased to 131 constituents with the help of different chemometric resolution techniques identification and quantification of 184 volatiles of different chemical classes: alkanes (26), alcohols (20), aldehydes (17), ketones (8), 160 esters (68), ethers (3), terpenes (40), pyrazine (1), and sulfur compound (1) untargeted approach for the identification of volatile compounds responsible for loss of astringency in persimmon fruit identification of VOCs in breath samples of patients with H. pylori identification of Eugenol as a biomarker attributing to the good antibacterial activity identified 30 common VOCs including aldehydes, alcohols, ketones, acids, esters, and hydrocarbons that play a key in the flavor total of 77 volatile compounds were identified in 18 green teas, including linalool, linalool oxides, phytol, caffeine, geraniol, and dihydroactinidiolide classification of balsamic vinegars of Modena of different maturation and aging profiling of primary metabolites in Nigella seed extracts where fatty acids were the major annotated metabolites followed by sugars and amino acids 20 metabolites, including primary metabolites (sucrose, D-fructose, D-mannose, D-glucose, inositol, maleic acid, citric acid, malic acid, L-threonic acid, L-proline, L-phenylalanine) and secondary metabolites (chlorogenic acid, α- and β-4,8,13duvatriene-1,3-diol, nicotine, quinic acid) that contributed to the discrimination were screened identification of 55 metabolites in the human breath using GC-TOF/MS where isopropyl alcohol was an indicator into the group of persons with lung cancer identification of 21 volatile compounds in Sch.t.Briq, in which 5 of them 
including pulegone, schizonal, methone, cis-pulegone oxide, and 2-hydroxy-2-isopropenyl-5-methylcyclohexanone were selected as marker compounds three different PLS-DA models were fitted to the data to classify samples into “country”, “region”, and “district” of origin; correct classification rates were assessed by cross-validation; the first fitted model produced an 86% success rate in classifying the samples into their country of origin; the second model, which was fitted to the Italian oils only, also demonstrated satisfactory results, with 74% of samples successfully classified into region of origin; the third model, classifying the Italian samples into district of origin, yielded a success rate of only 52%

objetive reached

chemometrics

partial least-squares discriminant analysis (PLS-DA) was carried out using the PLS Toolbox v.4.0 for Matlab

PCA and HCA were performed using the singular value decomposition method by the Multivariate Statistical Package program (MVSP, Kovach Computing Service, Anglesey, Wales)

70

69

68

67

HCA was performed with SPSS (SPSS 17.0, SPSS Inc., USA); orthogonal partial least-squares discriminant analysis (OPLS-DA) was performed with SIMCA-P 13.0 (Umetrics, Umeå, Sweden)

chemometric calculations (dimensionality reduction discriminant analysis (DFA) and factor analysis (FA)) were performed in Statistica 7.1 Data Miner (Statsoft, Krakow, Poland) software

66

65

64

63

62

61

60

59

58

57

56

ref

PCA was performed using Unscrambler 9.7 (CAMO SA, Oslo, Norway)

PCA and classification trees (CT) were performed under the software package SPSS Statistics 16.0

CA and PCA were performed by SIMCA-P software (version 12.0, Umetrics, Umeå, Sweden)

chemometric calculations (discriminant analysis (DA) followed by canonical analysis and factor analysis (FA)) were performed using Statistica 7.1 Data Miner (Statsoft, Polska) software

PCA and orthogonal projection to latent structures discriminant analysis (OPLS-DA) were performed under SIMCA-P+ 13.0 software (Umetrics, Umeå, Sweden)

statistical analysis including PCA and CA was performed with the software MATLAB 7.1 (MathWorks, USA)

PCA, HCA, and Pearson correlation analysis were performed with the program Acuity 4.0 (Axon Instruments)

PCA was carried out using the Statistica v. 7.1 software (Statsoft Inc., Tulsa, OK, USA)

peak detection and deconvolution were performed with AMDIS software (National Institute of Standards and Technology, Gaithersburg, MD)

MCR analysis executed with MCRC software presented by the Jalali−Heravi group; other programs for chemometric resolution methods were coded in MATLAB 7 by the author

PLS-DA and PLS regression models using XLSTAT version 2008.7.03 (Addinsoft, Andernach, Germany); subsequent multivariate data analysis was performed with Solo software version 7.5 (Eigenvector Research, Wenatchee, WA, USA)

Table 1. Some Applications of VOCs Metabolomics Applied in Different Matrixes and Chemometrics Approaches

Chemical Reviews Review

DOI: 10.1021/acs.chemrev.6b00698 Chem. Rev. 2017, 117, 6399−6422

75 principal components analysis (PCA) and multivariate analysis of variance (MANOVA) followed by Duncan’s post hoc means comparison (p < 0.05) test were performed using STATISTICA (StatSoft Inc., Tulsa, OK)

74

73

evaluation of statistical significance of differences was performed using one-way analysis of variance (ANOVA) followed by Duncan’s multiple range test with the aid of the SPSS 14.0 for Windows (SPSS Inc., Chicago, IL); multivariate analysis (PCA) of the data matrices was carried out with the same program

PCA was performed in Pirouette 4.0 software (Infometrix, Seattle, WA)

72 PLS-DA was performed with PLS-Toolbox in Matlab

71

honeys

yogurt

PTR-MS

PTR-TOF-MS

PTR-MS

butter and butter oil saffron PTR-MS

chemometrics

PCA and PLS-DA were performed under Pirouette 4.01 (Infometrix, Seattle, WA)

temperature of analysis for at least 30 min must be done before starting a sequence in order to calibrate the conditions and enhance the performance of a given column.79 Although more information on column selection and performance can be obtained from vendors, we recommend using an experimental design to find the best column and conditions for a specific separation system.

3.1.3. Detectors. After a sample is separated on a column, the separated compounds reach the detection system. For many years the most commonly used detector has been the flame ionization detector (FID). This detector is quite stable and highly sensitive, with detection limits from picograms to nanograms of analyte, which is why it is commonly used for quantitative analysis. Its main disadvantage in volatile metabolomics is the lack of spectral information on the analyte, which is needed for annotation/identification of compounds. However, it can be used to quantify targeted compounds in matrices where the chromatographic conditions are optimal (i.e., good resolution of peaks, no coelution, and well-defined retention times). The other popular and frequently used detectors are mass spectrometry (MS) detectors. They have pushed the metabolomics field toward a new era because of their high sensitivity and because they provide spectral information that leads to identification of analytes even in untargeted metabolomics. Briefly, when using an MS detector, compounds eluting from the column are ionized, and the mode of ionization can be “hard” or “soft”, by electron impact (EI) or chemical ionization (CI), respectively. In positive mode, the positively charged molecules and their possible fragments are selected according to their m/z ratio by different mass analyzers, e.g., single and/or tandem quadrupoles (Q), ion trap, or time of flight (TOF).

3.1.3.1. Single Quadrupole and Tandem Mass Spectrometry. 3.1.3.1.1. Single Quadrupole. This technology is relatively low priced, and therefore, it is widely used for analysis of volatile compounds. It consists of four precisely parallel metal rods, with diagonally opposed rods connected electrically. A radiofrequency (rf) voltage is applied between one pair of rods and the other. Ions accelerated electrically from the ion source enter the quadrupole along the z axis and are sorted depending on the rf and dc fields imposed on these diagonally opposed rods. Sweeping the rf and dc voltages allows ions within given m/z ranges to be transmitted through the mass analyzer (quadrupole) and reach the detector. The ions that do not reach the detector have unstable trajectories and are, therefore, deflected into the rods or pumped out by the vacuum system. The main disadvantage of the single-quadrupole system is its low scan rate, which makes it difficult to acquire several spectra across the width of a typical chromatographic peak.

3.1.3.1.2. Tandem Mass Spectrometry. Tandem mass spectrometers are referred to as MS/MS instruments. The most common variation is the triple quadrupole, sometimes abbreviated as QqQ. As shown in Figure 4, it consists of three consecutive stages: (1) a quadrupole where only ions with a selected m/z (precursor ion) pass through to the second stage in the approximate expected retention time for each selected component, (2) a collision chamber denoted as “q” (here the selected ion is broken into fragments by collision with an inert gas such as He at a pressure of approximately 0.2 Pa; note that if the energy is sufficient, it can be broken into more characteristic fragments), and (3) a second mass filter stage, which those fragments pass through before

espresso coffees, Kopi Luwak coffee, and organic coffees could be distinguished by their profiles of volatile compounds with the help of chemometrics; cross validation showed correct prediction of 42 out of the 43 (98%) organic coffee samples and 63 out of the 67 (95%) regular coffee samples

PLS-DA models were fitted to predict the matrix (butter/butter oil) and the sensory grades of the samples from their PTR-MS data

PTR-MS VOCs analysis, in combination with chemometrics, was used to screen for the presence of lower quality saffron in a commercial product in a few minutes

PTR-MS was able to distinguish the floral origins, being useful for a fast online screening of buckwheat honey

identification and quantification of targeted flavor compounds that have a major impact on the development and perception of flavor of fermented milks

objective reached

matrix

coffees

techniques

Table 1. continued


PTR-MS

ref


Figure 4. Configuration of quadrupoles in a tandem mass spectrometer. Different colors represent different molecular ions formed from the ionization source; then a selected ion (precursor ion) with specific m/z passes to the collision chamber where more fragments are created and directed to the second quadrupole to filter into some specific masses and reach the detector generating the corresponding MRM signal.

Figure 5. Time of flight (TOF) mass analyzer scheme. Ions with different masses are electrically accelerated from the ion source and allowed to drift in the vacuum tube. They separate according to their mass; lighter ions reach the detector first.

reaching the detector. This step can vary depending on the analysis desired; e.g., it can be set to scan all of the ions produced in the collision chamber, to jump from peak to peak, or to select a specified product ion. A triple quadrupole has some advantages over a single quadrupole: it has a higher selectivity, which results in less interference from coeluting compounds and matrix, so less chromatographic separation and/or sample preparation is required. Moreover, it gives a better signal-to-noise (S/N) ratio and better accuracy and reproducibility, especially at low concentration levels. Since this technology offers both sensitivity and specificity, it is mainly applied in the fields of drug metabolism, targeted metabolomics, pharmacokinetics, and environmental studies, where samples are very complex.

3.1.3.2. Quadrupole Ion Trap Spectrometer. In principle, the ion trap is a quadrupole that, instead of a vacuum, works with helium gas at a pressure of 0.133 Pa. Its main function is to store gaseous ions, confining them for a determined period; at the same time it can work as a mass spectrometer of variable mass resolution able to filter ions with different masses. The difference from linear quadrupoles is that in the ion trap ions are subjected to forces applied by an rf field in all three spatial dimensions instead of two. That is, in linear quadrupoles ions are free to move in one of the three dimensions (z direction), but in the ion trap there is no such degree of freedom.80 Once the ions are introduced into an ion trap they are dynamically trapped by application of rf potentials. The rf can be varied, allowing ions of a selected mass to reach the detector. This kind of system is able to provide MS/MS to MSn analysis and offers a high sensitivity for targeted analysis.

3.1.3.3. Time of Flight (TOF). In the most basic time of flight (TOF) system, ions are accelerated from the ion source by an electrical field. They gain energy in proportion to their charge ze, meaning that ions with the same charge have the same kinetic energy; once they are allowed to drift in a vacuum tube (flight tube), they separate into groups called “isomass packets” according to their m/z ratio. In other words, heavier ions take more time to reach the detector located at the end of the flight tube (Figure 5). Note that in a TOF system, although a lot of information is gained, the noise is higher than in a QqQ system.
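The mass dependence of the drift time can be illustrated with a short calculation. The sketch below assumes an idealized linear TOF analyzer (ions starting at rest, a single acceleration stage of negligible length, no reflectron); the flight-tube length and acceleration voltage are hypothetical values chosen only for illustration and do not describe any particular instrument.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
AMU = 1.66053906660e-27      # unified atomic mass unit, kg

def flight_time(mz, tube_length_m=1.0, accel_voltage_v=2000.0, charge=1):
    """Drift time (s) of an ion in an idealized linear TOF analyzer.

    The ion gains kinetic energy z*e*V = 0.5*m*v**2, so v = sqrt(2*z*e*V/m)
    and t = L / v. `mz` is the m/z value, so the ion mass is mz * charge (amu).
    Tube length and voltage defaults are hypothetical.
    """
    mass_kg = mz * charge * AMU
    velocity = math.sqrt(2.0 * charge * E_CHARGE * accel_voltage_v / mass_kg)
    return tube_length_m / velocity

# Heavier ions arrive later; the flight time scales with sqrt(m/z).
for mz in (19, 47, 93, 139):
    print(f"m/z {mz:>3}: {flight_time(mz) * 1e6:.2f} microseconds")
```

Because the flight time scales with the square root of m/z, an ion with four times the m/z of another takes twice as long to reach the detector under these assumptions.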

highly eluting profile, a minimum of 20 data points are required.81 The maximum acquisition rate in gas chromatography coupled with a mass analyzer is obtained with TOF detectors; for comparison, the maximum acquisition rate for quadrupoles is 5−10 spectra/s, i.e., intervals of 0.2−0.1 s. In capillary chromatography coupled with scanning instruments such as quadrupoles or double-focusing mass spectrometers, spectral skewing can occur. It happens when the scanning time is long relative to the changes in the analyte concentration passing through the mass analyzer. If those instruments are set to faster data acquisition rates this effect can be minimized, but at the cost of poorer mass spectral quality, because little time is left to measure each individual value.81 Ion trap quadrupoles and TOF mass spectrometers are less susceptible to spectral skewing. In the former, the complete spectrum is acquired in sequential segments, with each segment comprising an array. The acquisition rate, ranging from 10 to 15 spectra/s, provides a resolution of 0.1−0.0067 s. In TOF analyzers the principle is different: all ions present in the source are simultaneously extracted and measured. They can operate in full-scan mode with scan rates up to 500 spectra/s with a mass resolution and linearity of 4 orders of magnitude. A second mode of operation, high-resolution TOF (HRTOF), can evaluate mass spectral data at 5000−15 000 full width at half-maximum (fwhm) mass resolution; the mass accuracy obtained is about 5−10 ppm, with an extended mass range up to 1500 amu.82

3.3. Analysis of Complex Mixtures: Revealing What Is Hidden

In complex matrices, it is easy to find two or more compounds that elute at the same time. Fortunately, they rarely present the same ion fragmentation pattern. Their separation and identification can be achieved by changing either the polarity and length of the chromatographic column or the temperature program of the oven. However, as simple as it sounds, for an already validated chromatographic method this implies an investment of time and effort by researchers. Newer computational tools have been developed to reveal what is hidden in a chromatogram. Deconvolution of a peak in a chromatogram, for example, is a computational process that calculates the contribution of each component to the peak, allowing their separation and creating a pure spectrum for identification. It can also detect low-abundance peaks in the presence of analytes at much higher concentrations.83,84
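The idea of calculating the contribution of each component to an overlapped peak can be sketched with a small least-squares example. This is only an illustration under strong assumptions: the two pure spectra are taken as already known, all intensities are hypothetical, and the approach is not the algorithm used by any particular deconvolution software, which must also estimate the pure spectra themselves.

```python
def dot(u, v):
    """Dot product of two equally long intensity lists."""
    return sum(x * y for x, y in zip(u, v))

def unmix_two(mixed, spec_a, spec_b):
    """Least-squares contributions (c_a, c_b) so that
    mixed ~ c_a * spec_a + c_b * spec_b, via the 2x2 normal equations."""
    aa, ab, bb = dot(spec_a, spec_a), dot(spec_a, spec_b), dot(spec_b, spec_b)
    am, bm = dot(spec_a, mixed), dot(spec_b, mixed)
    det = aa * bb - ab * ab
    if abs(det) < 1e-12:
        raise ValueError("pure spectra are too similar to be separated")
    c_a = (am * bb - bm * ab) / det
    c_b = (bm * aa - am * ab) / det
    return c_a, c_b

# Hypothetical pure spectra over five m/z channels (relative intensities).
spectrum_a = [1.0, 0.6, 0.0, 0.2, 0.0]
spectrum_b = [0.0, 0.3, 1.0, 0.0, 0.5]

# One scan taken inside the overlapped peak: 3 parts of A plus 2 parts of B.
mixed_scan = [3 * a + 2 * b for a, b in zip(spectrum_a, spectrum_b)]

print(unmix_two(mixed_scan, spectrum_a, spectrum_b))
```

Repeating the calculation for every scan across the overlapped peak would give one elution profile per compound, which is the separation shown schematically in Figure 6. The method only works because the two compounds have different fragmentation patterns; identical spectra make the determinant vanish.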

3.2. Spectral Basis for Volatile Measurement

Mass spectrometers employed alongside GC have a typical mass range between 1 and 1000 amu. For an isolated peak, at least 10−12 data points are needed in order to get a precise time measurement and an accurate area under the curve. In TOF instruments, aiming to get an accurate area, molecular mass, and


Figure 6. Deconvolution example: overlapping peaks were separated into two compounds with their corresponding mass spectrum.

hydrocarbons eluting immediately before and after the compound of interest and x to the compound of interest. Alongside EI mass spectral library matching, the Kováts retention index (RI) greatly improves metabolite identification. Moreover, the combination of AMDIS and mass spectral databases allows users to perform deconvolution, RI calculation, and spectral database searching of an entire sample in one single run.87,88
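The nonisothermal retention index equation, Ix = 100n + 100(tx − tn)/(tn+1 − tn), can be applied directly once the n-alkane ladder has been measured. The following is a minimal sketch; the function name and the alkane retention times are hypothetical and serve only to reproduce the two examples given in the text (indices of 1500 and 1650).

```python
def retention_index(t_x, alkanes):
    """Linear (temperature-programmed) retention index.

    `alkanes` maps carbon number -> retention time measured under the same
    chromatographic conditions. Implements
    I_x = 100*n + 100*(t_x - t_n) / (t_{n+1} - t_n),
    where n and n+1 bracket the compound of interest.
    """
    carbons = sorted(alkanes)
    for n, n_next in zip(carbons, carbons[1:]):
        t_n, t_next = alkanes[n], alkanes[n_next]
        if n_next == n + 1 and t_n <= t_x <= t_next:
            return 100 * n + 100 * (t_x - t_n) / (t_next - t_n)
    raise ValueError("retention time is not bracketed by consecutive n-alkanes")

# Hypothetical n-alkane retention times in minutes.
alkane_times = {15: 20.0, 16: 22.4, 17: 24.6}

print(retention_index(20.0, alkane_times))  # peak coeluting with n-alkane C15
print(retention_index(23.5, alkane_times))  # peak halfway between C16 and C17
```

A compound eluting outside the measured alkane range raises an error rather than being extrapolated, since an index is only meaningful between two bracketing n-alkanes.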

Deconvolution became widely known with the publication of the automated mass spectrometry deconvolution and identification system (AMDIS) algorithm in 1999, which has since been developed further and implemented in chromatographic software.85 With a deconvolution algorithm, a full chromatographic separation is not required, which allows the analysis time to be reduced. An example of deconvolution in GC-MS can be seen in Figure 6, where two overlapping peaks are separated into two independent compounds. The classical workflow for GC/MS metabolomic analysis implies working with deconvoluted mass spectra and identified compounds instead of unidentified ones. Identification of peaks can be achieved through pure authentic references or by matching their mass spectra with libraries (e.g., the installed libraries of instrumental software). Nevertheless, because pure standards are often lacking and many compounds have very similar EI mass spectra, the GC retention time becomes particularly important. Retention times vary with the column length, type of stationary phase, and temperature, but a universal solution was proposed by Ervin Kováts in 1958,86 who related the retention time of the unknown peak to the retention times of n-alkanes analyzed under the same chromatographic conditions. For example, a peak having the same retention time as the linear n-alkane C15 is assigned an index of 1500, a peak appearing halfway between n-alkanes C16 and C17 is assigned an index of 1650, and so on. Since isothermal analysis is rarely performed in complex matrices, the nonisothermal Kováts equation is more commonly used: $I_x = 100n + 100\,(t_x - t_n)/(t_{n+1} - t_n)$, where n and n + 1 refer to the

3.4. Comprehensive GC × GC

The number of metabolites varies between organisms; for example, in eukaryotic cells like yeast, it is around 1100, while in more evolved organisms like plants and/or fungi it can sometimes reach hundreds of thousands.89 One of the biggest challenges that researchers face is being able to detect and quantify all metabolites of a given organism under study. The problem can be divided into two parts: one is database related, and the other is instrumental. To date, the existing metabolic databases contain only a fraction of all known metabolites. Although conventional GC provides an acceptable resolution, it does not allow the detection of all components present in complex matrices. In order to deal with this problem, comprehensive two-dimensional gas chromatography (GC × GC) was developed approximately 25 years ago by Liu and Phillips.90 Since then it has been considered the most powerful multidimensional chromatographic technique for metabolomics. This technique combines two columns with orthogonal separation characteristics installed in the same GC oven. The complete effluent from the first dimension is focused (mainly by a thermal-based modulator) and transferred to the second column in small


related to volatiles use NIST libraries, followed by Wiley with 33%.97 Table 2 lists some of the best-known commercially available libraries for GC analysis.

concentrated segments, being therefore separated in slices. This focusing effect creates narrow second-dimension (2D) peaks, enhancing the detection sensitivity. Most metabolomics-related studies apply a conventional (30 m × 0.25 mm i.d. × 0.25 μm) nonpolar stationary phase column (PDMS or 5% phenyl-substituted PDMS) and a second column (1−2 m × 0.1 mm i.d. × 0.1 μm) with a semipolar or polar stationary phase (e.g., polyethylene glycol).

3.4.1. Modulation of the Effluent. The key point in GC × GC is the capacity to trap and modulate the effluent of the first column (referred to as first dimension or 1D) into the second column (referred to as second dimension or 2D). This process should be fast enough to retain the first-dimension peak resolution and to obtain multiple second-dimension peaks (slices).91 By trapping the effluent from the first column in a short band inside the modulator, the signal-to-noise ratio increases because of peak compression. Modulation can be achieved in two ways: by trapping on a thicker stationary-phase film and applying pulsed heating to release the substances, or by means of a cryogenic zone placed over a modulator capillary. Nowadays, the availability of commercial controlled modulators like the dual-jet cryo modulator allows the process of collection from the first-dimension column to be decoupled from sampling into the second dimension.92

3.4.2. Detection System in GC × GC. In this type of chromatography, the peaks that enter the detector often have widths on the order of 100 ms, which implies the need for a very fast detection system. The current options are fast scanning quadrupoles (working in a restricted mass range) and TOF-MS. The latter is preferred because of its capacity to produce full mass scans at rates greater than 100 Hz. Nevertheless, both systems are fully capable of handling unresolved trace components, providing details for peak deconvolution and generating pure mass spectra. The sensitivity and selectivity of, for example, GC × GC/TOF-MS systems are comparable with those of classical high-resolution gas chromatography and high-resolution mass spectrometry (HR-GC and HRMS) methods.91 Applications of GC × GC-MS in metabolomics are increasing; for example, it has been used in a nontargeted solid-phase microextraction (SPME) approach for discriminative analysis of fermented cucumber volatiles before and after anaerobic spoilage.93 In that study, over 300 peaks were resolved, of which nearly 10% showed significant variation during the anaerobic spoilage process. Its use has been extended to wine to determine, for example, age markers,94 to VOC determination from specific organisms like bacteria,95 to the interaction of the mint bug (Chrysolina herbacea) with different mint plants (Mentha species), and to yeast with a focus on metabolite profiling.96
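The need for a fast detector follows from simple arithmetic on the peak width and the acquisition rate. The sketch below uses figures quoted in this review (second-dimension peaks about 100 ms wide, quadrupoles at up to 10 spectra/s, TOF analyzers at up to 500 spectra/s, and at least 10 data points per peak); the function names are hypothetical.

```python
def points_per_peak(peak_width_s, acquisition_rate_hz):
    """Number of spectra acquired across a peak of the given width."""
    return peak_width_s * acquisition_rate_hz

def min_rate_for(peak_width_s, required_points):
    """Acquisition rate (spectra/s) needed to sample a peak with enough points."""
    return required_points / peak_width_s

peak_width = 0.100  # s, typical second-dimension peak width in GC x GC

print(points_per_peak(peak_width, 10))    # scanning quadrupole at 10 spectra/s
print(points_per_peak(peak_width, 500))   # TOF analyzer at 500 spectra/s
print(min_rate_for(peak_width, 10))       # rate needed for 10 points per peak
```

Under these figures a quadrupole at its maximum rate records about one spectrum per second-dimension peak, whereas about 100 spectra/s are needed to describe the peak with 10 points, which is consistent with the preference for TOF-MS stated above.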

Table 2. Commercial Mass Spectra Libraries Used for GC-MS Identification

database | no. of spectra | web site
Wiley | 775 500 | http://eu.wiley.com/WileyCDA/WileyTitle/productCd-1119171016.html
NIST02 | 123 434 | http://www.nist.gov/srd/nist1a.cfm
NIST 14 | 276 248 | http://nistmassspeclibrary.com/
Golm Metabolome Database | 3500 | http://gmd.mpimp-golm.mpg.de/
Fiehn library | 2200 | http://fiehnlab.ucdavis.edu/MetaboliteLibrary-2007/
Adams | 1606 | www.allured.com
Wiley FFNSC Library | 3462 | http://www.sisweb.com/software/wileyffnsc.htm
Wiley/Yarkov | 37 055 | http://www.sisweb.com/software/wileyorganic-compounds.htm
Wiley/Zeist | 1620 |
Wiley/MPW Drug Library | 8650 | http://www.mswil.com/software/massspectrometry-libraries/maurer-pflegerweber
Wiley/POS of Physiologically Active Substances | 4182 | http://www.mswil.com/software/massspectrometry-libraries/parropfermann-schaenzer

Briefly, the databases most used in GC/MS volatile metabolomics are the NIST Library (NIST, http://nistmassspeclibrary.com/), the Golm Metabolome Database (GMD, http://gmd.mpimp-golm.mpg.de), and the Fiehn/Binbase library (http://fiehnlab.ucdavis.edu/db), which also includes trimethylsilyl (TMS)-derivatized metabolites.

3.5.1. NIST Library. This is the most widely used and comprehensive mass spectral library. The latest version, NIST 14, was released in 2014. It consists of a collection of 276 248 EI mass spectra from 242 477 unique compounds, evaluated in detail by professional evaluators before inclusion.98 In addition, NIST 14 includes 387 463 measured Kovats indexes for 82 868 compounds, together with the GC methods, column conditions, and literature citations.87 The library also includes name, formula, molecular structure, molecular weight, CAS number, contributor name, list of peaks, synonyms, and measured retention index when available, and it comes with an updated version of the AMDIS software for deconvolution.

3.5.2. Golm Metabolome Database (GMD). The Golm Metabolome Database (GMD) is a freely available mass spectral library developed by the Max Planck Institute for Molecular Plant Physiology in Germany.99 It includes valid mass spectral information on more than 3500 analytes and 2000 metabolites organized in metabolite report cards. This information was obtained either with a single quadrupole or with a TOF mass detector. GMD is continuously being updated with newly identified metabolites. The database also contains more than 3100 mass spectra of compounds that are not yet identified, including their respective retention time behavior. All these data are used by the GMD algorithm to facilitate compound identification by matching against GC/MS reference spectra and retention indices. In GMD, users can also generate customized libraries. All chemical compounds found during the search are displayed in

3.5. Mass Spectral Databases Used in GC-MS Volatile Metabolomics

Thanks to improvements in hardware as well as software, mass spectral libraries/databases have become an indispensable tool for chemical and biochemical annotation.97 The strategy consists of matching the acquired mass spectrum with those stored in the database. Better results are obtained by combining different libraries or different identification methods. The number of mass spectral libraries and algorithms for peak annotation is increasing. According to recent research, 78% of the articles


such as dry cured ham, virgin olive oils, and coffee blends.71,102−104

the result window, together with their chemical and/or physical property information and links to external resources.

3.5.3. Fiehn Library. This library was created by the Fiehn Lab group and comprises more than 2200 mass spectra of different compounds obtained by electron impact at 70 eV. It also contains Kovats retention indices for over 1000 primary metabolites below a molecular weight of 550 amu. This database includes different classes of compounds such as lipids, amino acids, fatty acids, amines, alcohols, sugars, aminosugars, sugar alcohols, sugar acids, organic phosphates, hydroxyl acids, aromatics, purines, and sterols as methoximated and trimethylsilylated compounds. The database holds two kinds of mass spectral information: spectra obtained with a single-quadrupole system from Agilent (model 5973 MSD) operated in full-scan mode covering m/z 50 to 650 at a scan rate of 2 spectra/s, and spectra obtained with a LECO Pegasus IV TOF instrument covering m/z 85 to 500 at 20 spectra/s. The main disadvantage of this library is that fatty acid methyl esters (FAMEs) were used as internal retention markers, so a complex calculation is needed to convert those indexes back into the more common and widely reported Kovats RI.87

3.5.4. Wiley Library. The Wiley Registry of Mass Spectral Data has been published in its 11th edition.100 It is considered the largest and most comprehensive mass spectral library commercially available. It is provided with the same searching software as NIST 14 and is compatible with most manufacturer data systems, including Agilent, Bruker, DANI, Leco, PerkinElmer, Thermo XCalibur, Chromeleon, and Waters. It contains over 775 500 mass spectra and 741 000 compounds; most spectra are accompanied by the structure and trivial names, molecular formula, molecular weight, nominal mass, and base peak. The large Wiley Registry is also available in combination with the current NIST database. The Wiley Registry 11th edition/NIST 2014 (W11/N14, ISBN: 978-1-119-28422-2 of December 2016) currently provides the most extensive mass spectral library. It contains more than 1.2 million EI mass spectra, chemical formulas, and more than 45 298 MS/MS ions. Despite the advances made in algorithms and software technologies, and in order to avoid misidentification, the data obtained in GC-MS must be checked and confirmed by an operator in a quality control process.
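The matching strategy described in this section, comparing an acquired spectrum with those stored in a library, can be sketched with a plain cosine similarity. This is only an illustration: commercial search software uses its own, typically intensity- and mass-weighted, match factors, and the spectra and compound names below are hypothetical toy data rather than library entries.

```python
import math

def cosine_similarity(spec_a, spec_b):
    """Cosine of the angle between two spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(query, library):
    """Return (name, score) of the library entry most similar to the query."""
    scored = ((name, cosine_similarity(query, spec)) for name, spec in library.items())
    return max(scored, key=lambda item: item[1])

# Hypothetical toy library and query spectrum (relative intensities).
toy_library = {
    "compound_A": {41: 60, 69: 100, 93: 45},
    "compound_B": {43: 100, 58: 30, 71: 20},
}
query_spectrum = {41: 55, 69: 100, 93: 50, 43: 5}

name, score = best_match(query_spectrum, toy_library)
print(name, round(score, 3))
```

A high score alone does not confirm identity, since compounds with very similar EI spectra score almost equally; this is why the retention index and the operator check mentioned above remain part of the workflow.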

4.1. Basic Concepts of PTR-MS

In principle, PTR-MS combines the fundamentals of chemical ionization with the swarm technique presented by Ferguson and co-workers.101 The principal source of organic ions in PTR-MS is proton transfer from H3O+ to organic molecules. This is, in general, an exothermic process, shown in eq 1

$$\mathrm{H_3O^+ + VOC \rightarrow VOCH^+ + H_2O} \qquad (1)$$

A hollow cathode (Figure 7) produces H3O+ from pure water vapor. The purity of H3O+ ions exiting the ion source usually

Figure 7. Proton transfer reaction mass spectrometer (PTR-MS) scheme. In the first step, pure H3O+ ions are generated from H2O vapor; they are then electrically driven into the next chamber (drift tube), where they collide with VOC molecules from the sample and transfer a proton to them. The protonated molecules are subsequently driven to the mass analyzer and detector, generating the corresponding signal.

exceeds 99%. The most significant impurity (1%) is the O2+ ion, produced either directly by electron impact or by charge transfer from H3O+ ions.101 The hydronium ions enter the drift tube, which is flushed with air, where they collide with N2, O2, Ar, etc. Because VOCs have a proton affinity higher than that of water, about 1% of the primary H3O+ ions transfer their protons to them, forming protonated molecules. Because protonation is a soft ionization method, molecules are not highly fragmented and their quantification is far simpler than in the case of electron impact. The ions are then transported through the drift tube by an electrical field. Finally, both primary and product ions enter a chamber where the air is pumped away, allowing them to pass into a quadrupole mass spectrometer (or another type of analyzer such as an ion trap or time-of-flight analyzer).105,106 The limit of detection (LOD) of this technique is around 10 ppt for a 1 s integration time.16 A TOF analyzer was first coupled with a PTR ion source in 2004 by Blake et al.107 Although this coupling was challenging, ultrasensitive and high-resolution instruments are now available on the market. PTR-TOF instruments provide a higher mass resolution than quadrupole mass spectrometers, a practically unlimited mass range, and the capacity to acquire the entire mass spectrum for every package of ions injected into the drift tube.107

4. VOLATILE ANALYSIS THROUGH PROTON TRANSFER REACTION MASS SPECTROMETRY (PTR-MS)

GC-MS has been positioned as the pillar technique for identification and quantification of VOCs in different matrices because of its high precision. In combination with good sample preparation, detection limits of around 0.1 ppt can be reached. Two of the main disadvantages of this technique are a relatively low time resolution and the presence of artifacts in the chromatogram.101 In order to overcome these issues, direct injection mass spectrometry (DI-MS) can be employed. Proton transfer reaction mass spectrometry (PTR-MS) was introduced in the 1990s and has since become one of the most established DI-MS methods.102 This technique is a nonseparative analysis of VOCs which does not require laborious sample preparation or preconcentration. In addition, it provides rapid real-time monitoring with high sensitivity (at ppt levels).74 It was initially developed for the detection of gaseous VOCs in air103 but has since been applied to the analysis of VOCs from plants, polluted environments, in medicine, and in food matrices

4.2. Spectral Basis for Volatile Measurement

Identification and quantification of VOCs by PTR-MS is possible based on two premises: (1) the chemical ionization process is an exothermic proton transfer reaction from H3O+ ions to


VOCs to produce a quasimolecular VOCH+ ions which do not undergo secondary ion−molecule reactions and (2) the H3O+ ion signal is not depleted by reactions with organic molecules.108 For instance, the mass spectra of low molecular weight alcohols (with proton affinities ranging from 181.9 to 191.7 kcal/mol) was analyzed by Maleknia et al. (2007).109 The hollow cathode ion source, produces ionization of water vapor, generating a high concentration of H3O+ ions and protonated dimers ((H2O)2H+) and trimers ((H2O)3H+) with m/z of 19, 37, and 55, respectively;109 if ethanol is present, it undergoes a series of reactions converting the molecule to a protonated ethanol and cluster ions like in the following equations

$$\mathrm{H_3O^+ + C_2H_5OH \rightarrow C_2H_5OHH^+\ [m/z\ 47] + H_2O} \tag{2}$$

$$\mathrm{C_2H_5OHH^+ + C_2H_5OH \rightarrow C_2H_5OHH^+(C_2H_5OH)\ [m/z\ 93]} \tag{3}$$

$$\mathrm{C_2H_5OHH^+(C_2H_5OH) + C_2H_5OH \rightarrow C_2H_5OHH^+(C_2H_5OH)_2\ [m/z\ 139]} \tag{4}$$

4.3. VOCs Quantification in PTR-MS

As mentioned above, VOCs quantification in PTR-TOF is not only possible but also even easier than in GC and can be done in real time without the need for a continuous calibration gas feed. For example, the mass analyzing and detection system (either quadrupole mass filter or TOF) delivers count rates proportional to [VOCH+] and [H3O+]; then VOCs can be quantified through eq 5

$$[\mathrm{VOC}] = \frac{1}{k\tau}\,\frac{[\mathrm{VOCH^+}]}{[\mathrm{H_3O^+}]} \tag{5}$$

where k is the rate coefficient of the proton transfer reaction and can be found in the literature for many substances (alternatively it can be calculated or experimentally determined), τ is the residence time of the ion in the drift tube and can be calculated from system parameters (drift voltage, pressure, temperature, etc.), and [VOCH+] and [H3O+] are the ion concentrations. A newer version of PTR-MS software can automatically acquire and calculate all these parameters from the equation, allowing the users to quantify VOCs in real time.111
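As a minimal sketch of how eq 5 is applied, the following Python function computes the VOC number density from the two ion signals. The function name and all numerical values (rate coefficient, residence time, and count rates) are hypothetical and chosen only for illustration; the count rates are assumed to be already corrected for mass-dependent transmission, a step that is not discussed here.

```python
def voc_number_density(voch_signal, h3o_signal, k, tau):
    """Apply eq 5: [VOC] = (1 / (k * tau)) * [VOCH+] / [H3O+].

    voch_signal, h3o_signal: ion signals in the same (arbitrary) units,
    k: proton transfer rate coefficient in cm^3 s^-1,
    tau: residence time of the ions in the drift tube in s.
    Returns the VOC number density in molecules per cm^3.
    """
    if h3o_signal <= 0 or k <= 0 or tau <= 0:
        raise ValueError("h3o_signal, k, and tau must be positive")
    return (voch_signal / h3o_signal) / (k * tau)


# Hypothetical example values (assumptions, not taken from the review)
k_example = 2.0e-9      # cm^3 s^-1
tau_example = 1.0e-4    # s
example_density = voc_number_density(50.0, 1.0e6, k_example, tau_example)
print(f"{example_density:.3e} molecules per cm^3")
```

Because only the ratio of the two ion signals enters the calculation, no external calibration gas is needed once k and τ are known, which is the point made in the text.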

4.4. Applications

The characteristics mentioned above suggest that PTR-MS is a valuable tool for volatile metabolomics. As a VOC fingerprinting technique, it has been used for discriminating different treatments, origins, and samples based on the chemodiversity of cultivars. For example, it has been employed successfully in the classification of nine strawberry cultivars from different locations.112,113 Moreover, it was used for the classification of apple cultivars: five different clones belonging to three apple cultivars (“Fuji”, “Golden Delicious”, and “Gala”) were accurately profiled, and in the VOC emission from Gala two compounds, estragole and hexyl 2-methyl butanoate, contributed to clone characterization.114 It has also been applied in enology, differentiating the headspace VOCs of wines produced from different grape varieties.115 In addition, it was used to characterize dry-cured hams with protected designations of origin (PDO) from Iberia and Italy, achieving 100% correct sample classification based on fast headspace VOC analysis. Compounds such as dimethyl sulfide and methanethiol differed between these two origins, and compounds linked to the processing conditions, like acetonitrile and hexanenitrile, were identified.103 The applications of PTR-MS are growing in many areas, and some reviews and books detailing this topic have been written. For a deeper understanding we recommend refs 102 and 116.

The intensity of the cluster signal depends on the concentration of the analyte present.108 Analysis of mass spectra obtained by PTR-MS is more complicated than that of spectra obtained by GC-MS because reliable mass spectral libraries are lacking for these instruments. Moreover, in PTR-MS a manual elucidation of the spectrum is required. For example, based on eqs 2, 3, and 4 one can explain the signals observed in the analysis of ethanol at m/z 47, 93 (dimer), and 139 (trimer) (Figure 8). The signals at m/z 75 and 121 can also be deduced as fragment ions arising from the loss of water from the dimer and trimer.
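The signal assignments discussed here can be checked with simple arithmetic on nominal masses. The sketch below is illustrative only: the function names are hypothetical, integer (nominal) masses are used, and all ions are assumed to be singly charged so that m/z equals the ion mass.

```python
# Nominal (integer) masses in Da
ETHANOL = 46   # C2H5OH
WATER = 18     # H2O
PROTON = 1     # H+


def protonated_cluster_mz(monomer_mass, n):
    """m/z of a singly protonated cluster made of n identical molecules."""
    return n * monomer_mass + PROTON


def dehydrated_cluster_mz(monomer_mass, n):
    """m/z of the same cluster after the loss of one water molecule."""
    return protonated_cluster_mz(monomer_mass, n) - WATER


# Protonated ethanol monomer, dimer, and trimer (eqs 2-4)
ethanol_clusters = [protonated_cluster_mz(ETHANOL, n) for n in (1, 2, 3)]
# Fragments formed by loss of water from each of them
ethanol_fragments = [dehydrated_cluster_mz(ETHANOL, n) for n in (1, 2, 3)]
# Reagent ions H3O+, (H2O)2H+, and (H2O)3H+
water_clusters = [protonated_cluster_mz(WATER, n) for n in (1, 2, 3)]

print(ethanol_clusters, ethanol_fragments, water_clusters)
```

The three lists reproduce the ethanol signals at m/z 47, 93, and 139, the water-loss fragments at m/z 29, 75, and 121, and the reagent ions at m/z 19, 37, and 55.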

Figure 8. Mass spectrum of ethanol obtained by PTR-MS.109 Signals at m/z 75, 93, 121, and 139 correspond to ethanol dimers and trimers. The signal at m/z 29 corresponds to the loss of water from protonated ethanol.

The major and very critical drawback of PTR-MS is its limited ability to identify unknown compounds. Therefore, it is mainly used in fingerprinting analysis or to monitor real-time changes of known compounds, and GC-MS is still needed for compound annotation. Nevertheless, PTR-TOF reaches a mass accuracy of around 5 ppm for a wide range of m/z values.110 The molecular formula can therefore be determined with high precision, which strongly enhances the possibility of identification.

5. SAMPLE PREPARATION

Sample preparation is the most important step prior to analysis of metabolites. It is a part of the experimental design, and certain conditions must be taken into consideration. In selecting the best method for sample preparation, it is first necessary to consider the class of compounds to be analyzed. Other factors, such as the phase in which those components are contained, are also important. Below, a brief strategic approach to sample preparation for the analysis of VOCs by gas chromatography and PTR-MS is given.

5.1. Gas Chromatography

5.1.1. Sample Extraction. Analysis of VOCs can be performed on either freeze-dried or fresh samples. One of the main advantages of freeze drying is the preservation of the sample for a longer time. Elimination of water before storage is desirable for quenching enzymatic activity.117 However, it has been demonstrated that this process sometimes increases the risk of losing volatiles, altering the chromatographic profile of the sample.118,119 For example, in the metabolite profiling of four different varieties of apples (“Golden Delicious”, “Granny Smith”, “Pinova”, and “Stark Delicious”) it was found that the sum of peak areas decreased by approximately 33% compared to the analysis performed using freshly prepared samples.117 In any case, samples in either fresh or freeze-dried form must be (pre)treated in order to extract and/or preconcentrate VOCs from their matrices, considering also the chemical differences between compounds. There are four main techniques for preparing samples for VOC analysis: liquid extraction (LE), solid-phase extraction (SPE), solid-phase microextraction (SPME), and, most recently, stir bar sorptive extraction (SBSE). Other techniques may be useful in selected circumstances, depending on the research focus.

5.1.1.1. Liquid Extraction (LE). LE is a reference technique for extraction of volatile components from diverse matrices. In this method, and depending on the scope of the analysis, solvents with different polarities such as methanol, ethanol, hexane, dichloromethane, etc., can be tested in order to determine which one is most effective. For example, for analysis of polar metabolites from a homogenized sample, the extraction is generally performed with a water−methanol or water−ethanol solution. After extraction, centrifugation or decantation steps are sometimes required to separate the insoluble material from the liquid phase.120 In the case of nonpolar metabolites, chloroform or hexane is also used as extraction solvent.121 After extraction and separation of the liquid phase, a concentration step is carried out using a vacuum rotary evaporator or simply by blowing a stream of N2 or air into the vial containing the sample. Temperature and extraction time are parameters to be optimized for effective extraction. Note that room temperature is preferred to avoid recombination of molecules or thermal degradation; when higher temperatures are required to inactivate a majority of enzymes in several systems,122 another analytical technique should be considered. A disadvantage of this technique is the exposure to large volumes of organic solvents such as hexane, dichloromethane, etc. Another technical issue can be the low recovery of analytes, because extraction depends on the partition coefficient of the VOC between the matrix and the solvent. Some of the VOCs may not be totally extracted and remain in the matrix. Therefore, it is advisable to perform the extraction with different solvents or solvent mixtures and then compare the results chromatographically.

In order to overcome the problem of using a high volume of solvent, miniaturized liquid-extraction techniques such as liquid−liquid microextraction and dispersive liquid−liquid microextraction were developed. The main advantage of liquid−liquid microextraction (μ-LLE) is that it does not require special equipment, while its main disadvantage is that its extraction efficiency can be lower than that of conventional liquid extraction. In order to increase the extraction efficiency, a mixture of two solvents (extraction solvent and disperser) is injected by syringe into the aqueous sample (dispersive liquid−liquid microextraction). This kind of extraction can be assisted by ultrasound, improving its efficacy. These techniques are mostly applied to the analysis of organic compounds in water; nevertheless, one can find some implementations in metabolomics approaches.123−125 For example, liquid extraction (LE) has been used to discriminate the aromas of eight varieties of apricots (Prunus armeniaca); the recovered organic phase (CH2Cl2 + volatile components) was microdistilled and analyzed by GC-MS. By comparing different extraction techniques such as LE and SPME alongside multivariate data analysis, the authors were able to differentiate and identify the apricot varieties without any error.126 Dispersive liquid−liquid microextraction (DLLME) coupled with GC-MS, on the other hand, has been used in the pattern recognition or fingerprinting analysis of 17 Iranian saffron samples (Crocus sativus L.) in order to discriminate different geographical regions. The results showed that the saffron samples could be categorized into five different classes, revealing 11 compounds as biomarkers contributing to this characterization. These 11 biomarkers include nine secondary metabolites of saffron (safranal, α- and β-isophorone, phenylethyl alcohol, ketoisophorone, 2,2,6-trimethyl-1,4-cyclohexanedione, 2,6,6-trimethyl-4-oxo-2-cyclohexen-1-carbaldehyde, 2,4,4-trimethyl-3-carboxaldehyde-5-hydroxy-2,5-cyclohexadien-1-one, and 2,6,6-trimethyl-4-hydroxy-1-cyclohexene-1-carboxaldehyde (HTCC)), a primary metabolite (linoleic acid), and a long-chain fatty alcohol (nonacosanol).127

5.1.1.2. Solid-Phase Extraction (SPE). This is a simple preparation technique based on the same principle as liquid chromatography, where the partition coefficient of an analyte between a liquid phase and an adsorbent material depends on its solubility and on the interactions produced by the functional groups present in the molecule. Note that a wide range of chemically modified adsorbent materials based on either silica gel or synthetic resins, such as reversed-phase materials (C2, C8, C18), are commercially available. SPE is also useful in the treatment of sample matrices with high water content.128 Even though SPE has proved to be an effective method for isolation of VOCs from different types of samples, miniaturized solid-phase extraction techniques such as solid-phase microextraction and stir bar sorptive extraction are nowadays the trend in metabolomics analysis. The main advantages of these miniaturized techniques are the reduced quantity of sample and the simpler instrumentation required for analysis.

5.1.1.3. Solid-Phase Microextraction (SPME). SPME was introduced in 1989; it is a fast, sensitive, solventless, and economical method used in sample preparation for GC analysis.129 According to the Web of Science, 1989 documents were found with the topic SPME + Volatile from 2010 to 2015. The fiber SPME device consists of a fiber holder provided with a spring-loaded plunger, a stainless-steel barrel, and an adjustable depth gauge with a needle.129 A fused-silica fiber coated with a thin film of one or several polymeric stationary phases, like polydimethylsiloxane (PDMS), carboxen (CAR), or divinylbenzene (DVB), absorbs or adsorbs VOCs from the sample matrix. Analytes in the sample are extracted and concentrated in the fiber coating and can be directly desorbed in the injection port of the gas chromatograph. SPME is frequently used either in direct sampling or in headspace (HS) analysis. Because it is simple to use and fully adapted to automatic sampling systems, it does not require an experienced operator and is widely employed in different areas, especially in food quality and control, agriculture, and plant sciences. As an example, the high-throughput screening of volatiles from 94 different tomato genotypes was demonstrated by HS-SPME coupled to GC-MS, detecting 322 distinct plant-derived compounds.130 This technique has also been used in a nontargeted approach to discriminate several Citrus species.


Forty-four metabolites were detected across the species of Citrus monstruosa; monoterpenes were the most abundant among them. Conversely, some VOCs were found to be specific for only a few of the samples.131 SPME has also been used in a multiplatform metabolomics analysis to determine the biochemical differences among 31 rice varieties from a diverse range of genetic backgrounds and origins. Comparing fragrant rice varieties, the authors observed differences in the metabolic profiles of jasmine and basmati varieties with no consistent separation of the germplasm class.132 Even though most of the studies done with SPME are oriented toward analysis of plant VOCs or food metabolites, some applications can be found in the area of human metabolomics or human health. It is known that VOCs present in human faeces have the potential to help diagnose different human diseases. Therefore, Dixon et al. (2011)133 evaluated eight different commercially available SPME fibers, identifying 50/30 μm CAR-DVB-PDMS, 85 μm CAR-PDMS, 65 μm DVB-PDMS, 7 μm PDMS, and 60 μm PEG as appropriate fibers for human faecal VOC metabolomics. In the area of cancer research, recent investigations aim to find possible biomarkers for discriminating cancer cells from normal ones. For example, the study conducted by Huang et al. (2016)134 found potential VOC biomarkers of MCF-7, MDA-MB-231, and CCD-1095Sk cell lines. Fingerprinting analysis showed that each kind of cell line provides a unique chromatographic profile. Applying PCA and partial least-squares discriminant analysis (PLS-DA), four volatiles identified as 2-ethyl-1-hexanol, 2,4-dimethylbenzaldehyde, cyclohexanol, and p-xylene were found to be potential biomarkers for discriminating breast cancer cell lines from a normal mammary cell line. Finally, SPME was shown to be useful in targeted analysis as well.
For example, in the analysis of 16 volatiles present in the urine of 19 healthy volunteers, ten ketones (acetone, 2-butanone, 3-methyl-2-butanone, 2-pentanone, 3-methyl-2-pentanone, 4-methyl-2-pentanone, 2-hexanone, 3-hexanone, 2-heptanone, and 4-heptanone), three sulfides (dimethyl sulfide, allyl methyl sulfide, and methyl propyl sulfide), and three heterocyclic compounds (furan, 2-methylfuran, and 3-methylfuran) were identified and quantified, supporting the concept of hybrid volatolomics, which improves and complements the chemical information on the physiological status of an individual.135

5.1.1.4. Stir Bar Sorptive Extraction (SBSE). SBSE is based on the same technical principle as SPME. It was developed at the end of the 1990s and introduced for the trace analysis of organic compounds extracted from aqueous food, biological, and environmental samples. It consists of a bar coated with a sorbent polymer (polydimethylsiloxane) that can be immersed in the sample to extract the analytes from the solution according to their chemical affinities. The analytes are afterward thermally desorbed in the injection port of a GC. SBSE has some advantages in comparison to SPME; for example, since the SBSE coating is between 50 and 250 times thicker than that of an SPME fiber, sample enrichment is improved and lower detection limits are obtained.136 SBSE is not as widely accepted as SPME because protocols are still not fully automated and only a limited number of coatings are commercially available.137 SBSE approaches have been employed in the VOC composition analysis of raspberry cultivars. As a result, 29 volatile compounds, such as α-ionone, β-ionone, geraniol, linalool, and (Z)-3-hexenol, were quantified. The authors found good correlation coefficients for most aroma-active compounds, with quantification limits of 1 μg/kg.138 It has also been employed in the quantitative analysis of sesquiterpenes in the headspace of mycelial cultures of Alternaria alternata and Fusarium oxysporum under different conditions (nutrient rich and poor, single cultures, and cocultivation) and at different mycelial ages.139 In the field of plant defense or crop protection, some studies have been performed with SBSE. In one of them, VOC profiling is presented as an alternative method of disease detection. In studies conducted on healthy trees and on trees infected with the Citrus tristeza virus (CTV) (genus Closterovirus), a plant pathogen that infects important citrus crops, a total of 383 VOCs were found. Putative biomarkers of CTV were identified as terpenoids (myrcene, carene, ocimene, bulnesene), alcohols (n-undecanol, Surfynol), and two further compounds (geranyl acetone and neryl acetate), allowing efficient discrimination between trees (infected, healthy, and coinfected by another pathogen).140 Finally, tomatoes infested with spider mites and aphids were studied with a metabolomics approach employing SBSE coupled with GC-MS. Errard et al. (2015)141 studied the effects of single- versus multiple-pest infestation by Tetranychus urticae and Myzus persicae on the tomato fruit (Solanum lycopersicum). They observed that the plant response differed according to the pest infestation: the volatile emission differed between the adaxial and the abaxial leaf epidermis, and the compounds cyclohexadecane, dodecane, aromadendrene, and β-elemene were emitted as indicators of multiple infestation.

5.1.2. Derivatization Step in VOCs Analysis. Even though this review focuses on the analysis of free volatile organic compounds or thermally stable volatile compounds that can be run through GC-MS, it is important to mention that there are metabolites that cannot be efficiently determined because of their physicochemical properties or the composition of the matrix. It is common in GC-MS-based metabolomics to make them react with a derivatizing agent before the analytical run. The derivatizing reagent is selected according to the class of molecules that the researcher wants to analyze accurately; thus, there are different types of derivatizing agents. Modification of the analytes by the derivatizing agent (usually to reduce the polarity of the functional groups) may improve their extraction from the sample and their desorption and may also improve the separation of compounds that otherwise show poor chromatographic behavior.142 Trimethylsilylation (TMS) and tert-butyldimethylsilylation (TBDMS) reactions are very popular and widely used in many applications, the latter being the preferred one in most metabolite profiling applications.143 The main advantage of the TBDMS derivatives is their lower sensitivity to hydrolytic effects compared to the TMS derivatives, which makes them suitable for structural assignments of plant metabolites.144 However, derivatization significantly increases the molecular mass of the analyte. Some considerations must therefore be taken into account: first, it is necessary to be sure that the molecular mass of the derivatized analyte is still inside the mass range of the GC detector, and second, the formation of multiple products may complicate the chromatogram. It is also worth mentioning that the derivatizing agent should be added in excess in order to guarantee the reaction with all compounds of interest present in the sample. When the analysis is carried out through SPME, derivatization can be performed simply by exposing the enriched fiber to the derivatization agent solution after extraction.145
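The first consideration mentioned for derivatization, the mass range of the detector, can be checked before the analysis with a short calculation. The sketch below assumes nominal mass increments of 72 Da per TMS group and 114 Da per TBDMS group (the mass of the silyl group minus the hydrogen it replaces); the analyte mass, the number of derivatized groups, and the detector limit are hypothetical values chosen only for illustration.

```python
# Nominal mass gained per derivatized active hydrogen (silyl group minus H), in Da
MASS_SHIFT = {"TMS": 72, "TBDMS": 114}


def derivatized_mass(analyte_mass, n_groups, reagent):
    """Nominal mass of an analyte after silylation of n_groups active hydrogens."""
    return analyte_mass + n_groups * MASS_SHIFT[reagent]


def within_mass_range(mass, upper_limit):
    """True if the derivatized analyte still falls inside the scanned mass range."""
    return mass <= upper_limit


# Hypothetical analyte: nominal mass 180 Da with five derivatizable groups,
# measured on a detector assumed to scan up to m/z 600.
tms_mass = derivatized_mass(180, 5, "TMS")
tbdms_mass = derivatized_mass(180, 5, "TBDMS")
print(tms_mass, within_mass_range(tms_mass, 600))
print(tbdms_mass, within_mass_range(tbdms_mass, 600))
```

In this hypothetical case the TMS derivative stays inside the assumed range while the TBDMS derivative does not, which illustrates why the larger mass increase of TBDMS has to be weighed against its better hydrolytic stability.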


6.1.2. Baseline Correction. The first step that the analyst performs in chromatographic data analysis is baseline correction. The simplest procedure uses a “blank” injection, which consists of the chromatographic injection of an empty vial (e.g., in HS-SPME analysis) or of the solvent (or mixture of solvents) used in the metabolite extraction. The baseline is corrected by subtracting, either algorithmically or manually, the “blank” chromatogram from each sample chromatogram. This procedure corrects baseline drift by reducing the low-frequency variations due to column or SPME fiber bleed, background ionization, and low-frequency detector variations. Afterward, the baseline noise signal should be numerically centered at zero. Nevertheless, as instrumental sensitivity improves, baseline artifacts become more prevalent and severe; for that reason, the algorithms must also be improved. Wang et al. (2014)151 compared three preprocessing algorithms: orthogonal basis (OB), fuzzy optimal associative memory (FOAM), and polynomial fitting (PF). OB and FOAM are so-called two-way algorithms (they operate on both the chromatographic and the mass spectral data objects), while PF is a one-way algorithm that does not require a blank data object but does require optimization of the polynomial order and fitting threshold. Some of the GC-MS baseline correction methods are listed in Table 3.
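The two simplest ideas in this section, blank subtraction and a one-way polynomial baseline, can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function names are hypothetical, the chromatogram is synthetic, and the polynomial routine is a generic iterative clipping scheme, not the thresholded PF algorithm evaluated in ref 151.

```python
import numpy as np


def subtract_blank(sample, blank):
    """Subtract a blank chromatogram acquired under the same conditions."""
    return np.asarray(sample, dtype=float) - np.asarray(blank, dtype=float)


def polynomial_baseline(signal, order=1, n_iter=100):
    """Estimate a baseline by repeatedly fitting a polynomial and clipping
    every point that lies above the fit, so peaks stop pulling the fit upward."""
    y = np.asarray(signal, dtype=float)
    x = np.linspace(-1.0, 1.0, y.size)  # scaled axis keeps the fit well conditioned
    work = y.copy()
    for _ in range(n_iter):
        coeffs = np.polynomial.polynomial.polyfit(x, work, order)
        fit = np.polynomial.polynomial.polyval(x, coeffs)
        work = np.minimum(work, fit)
    return fit


# Synthetic chromatogram (illustrative only): linear drift plus one Gaussian peak
t = np.arange(500)
drift = 5.0 + 0.004 * t
peak = 10.0 * np.exp(-0.5 * ((t - 250) / 5.0) ** 2)
chromatogram = drift + peak

corrected = chromatogram - polynomial_baseline(chromatogram, order=1)
print(corrected[:100].max(), corrected.max())
```

The polynomial order plays the role of the tuning parameter mentioned in the text: an order that is too low leaves residual drift, while an order that is too high starts to follow broad peaks. Peaks close to the edges of the chromatogram can also bias this simple scheme.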

5.2. Sample Preparation in PTR-MS

As mentioned before, PTR-MS does not require sample preparation, and samples in any state or form can be introduced into the instrument. In principle, samples are contained in a jar in the presence of a stream of carrier gas (N2 or air). VOCs are introduced alongside this carrier gas, which carries them through the probe to the hollow cathode for formation of protonated ions. The analysis is similar to that done with a headspace autosampler system coupled to a GC. For example, to monitor the VOCs emitted by a fruit (whole or after crushing), it is only necessary to place the sample in a sealed glass vessel large enough to keep the sample intact inside (Figure 9). The sample is kept at room temperature or heated for some time to enrich the headspace of the chamber. Then the valve connected to the carrier gas, set to a defined flow, is opened, starting the analysis.146,147

Figure 9. Sample introduction of VOCs emitted by an apple fruit to a PTR-MS system.

6. PROCESSING OF THE DATA

The analytical instruments described in this review, including GC-MS and PTR-MS, generate extremely large volumes of data, which often contain artifacts, noise, and unwanted signals. In order to extract chemically relevant information from such complex data sets, automated software and chemometric tools are needed. The International Chemometrics Society (ICS) has defined chemometrics as the science of relating measurements made on a chemical system or process to the state of the system via application of mathematical or statistical methods.148 Practical steps in chemometric analysis include design of experiment, data preprocessing, classification, or calibration.149

Table 3. Some of the Algorithms and Methods Employed in Baseline Correction

method                                                      abbreviation   ref
orthogonal basis                                            OB             151
fuzzy optimal associative memory                            FOAM           151
polynomial fitting                                          PF             151
noise median method                                         NMM            152
first-derivative method                                     FDM            152
threshold-based classification                              TBC            152
signal removal methods                                      SRM            152
composite (linear-sine-cosine) baseline method              CBM            152
manual methods                                              MM             152
maximum entropy method                                      MEM            152
Fourier transform method                                    FTM            152
quantile polynomial regression                              QPR            153
adaptive iteratively reweighted penalized least squares     airPLS         154

6.1.3. Noise Reduction. The previous step is usually effective in correcting low-frequency noise. However, reducing high-frequency variations may require smoothing techniques, which improve the signal-to-noise ratio (SNR) in the chromatogram. Fortunately, instrument software includes some smoothing options, making this work easier. Classical methods used for smoothing and denoising chromatographic data are the Fourier transformation;155 the least-squares method proposed by Savitzky and Golay in 1964, which has since been widely used in analytical chemistry;156 the wavelet transform for Gaussian peaks,157 which allows the screening of signals by low-pass and high-pass filters; and finally the roughness penalty approach, a recently developed statistical procedure158 that overcomes white noise in hyphenated chromatographic data.159

6.1.4. Normalization of the Data. During a set of analyses it is common to face slight unavoidable variations between samples, which arise during sample preparation and/or injection into the chromatographic system. Therefore, chromatograms are often normalized by introduction of an internal standard (IS).160
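To make the least-squares smoothing of Savitzky and Golay mentioned under noise reduction more concrete, the following NumPy sketch derives the smoothing weights from a local polynomial fit and applies them as a moving window. The function names, the window length, and the polynomial order are illustrative assumptions, and the edge handling (repeating the first and last points) is one simple choice among several.

```python
import numpy as np


def savgol_weights(window, order):
    """Weights that return the value at the window center of a least-squares
    polynomial of the given order fitted to `window` equally spaced points."""
    if window % 2 == 0 or order >= window:
        raise ValueError("window must be odd and larger than the polynomial order")
    half = window // 2
    x = np.arange(-half, half + 1)
    design = np.vander(x, order + 1, increasing=True)  # columns: 1, x, x^2, ...
    # The first row of the pseudo-inverse gives the fitted constant term,
    # i.e., the smoothed value at x = 0.
    return np.linalg.pinv(design)[0]


def savgol_smooth(signal, window=11, order=3):
    """Smooth a 1D signal; the edges are padded by repeating the end points."""
    weights = savgol_weights(window, order)
    padded = np.pad(np.asarray(signal, dtype=float), window // 2, mode="edge")
    return np.convolve(padded, weights[::-1], mode="valid")


# Illustrative use on a noisy synthetic peak
rng = np.random.default_rng(0)
t = np.arange(200)
clean = np.exp(-0.5 * ((t - 100) / 8.0) ** 2)
noisy = clean + rng.normal(scale=0.05, size=t.size)
smoothed = savgol_smooth(noisy, window=11, order=3)
```

A wider window removes more noise but broadens and lowers narrow peaks, so the window should be chosen in relation to the peak width of the chromatogram.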

6.1. Preprocessing of the Data

6.1.1. Data Preprocessing Techniques. Preprocessing techniques are applied with the objective of improving the accuracy and precision of qualitative and quantitative analysis. They are applied to the raw data obtained from the measurement, reducing the chemically irrelevant information. This step can be controversial because manipulating the values of the observed signal, if done wrongly, can diminish the data integrity. The most common steps are baseline correction (not applicable to PTR-MS), noise reduction, normalization of the data, and retention time alignment.150 The order of these steps may be changed, but one should be aware of the risk of introducing noise when performing preprocessing techniques, because some techniques are data dependent and cannot be applied to all types of data.


By normalizing, variations that come from experimental sources are removed while those originating from the systems under study are kept. However, in a complex matrix where different classes of compounds with different detector response factors coexist, it is difficult to find a proper IS. To overcome this issue, it is sometimes necessary to add at least two IS of different chemical classes. When the IS method is no longer practical, the researcher can normalize the chromatogram by area normalization, which is done by dividing the area of every single compound by the sum of the areas present in the chromatogram.161 Nevertheless, it is important to consider that this procedure loses validity in heterogeneous data sets that present marked differences in the number and intensities of the chromatographic peaks. Additionally, it is worth considering the influence of the quantitative data in multivariate data analysis. For example, in data obtained from an aroma analysis of fruits, a few components may be the main contributors (from a quantitative point of view) to the aroma blend. If a PCA is performed using the raw data, the discrimination of the samples is based merely on these components. This might lead to a wrong conclusion, because in aroma analysis the major compounds are not necessarily responsible for the characteristic odor of the fruit; their contribution depends on the odor threshold. This means that even molecules present at a low concentration but with a low odor threshold could contribute more to the odor than the major ones. To avoid such influences in the quantitative data, it is important to normalize the data around zero by centering, scaling, or logarithmic transformation. Centering makes all concentration data fluctuate around zero, by calculating the average of each variable and subtracting it from each observation.
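The normalization and transformation steps discussed in this section can be summarized in a short NumPy sketch. The peak table is hypothetical (rows are samples, columns are compounds), the function names are assumptions made for illustration, and autoscaling (division by the standard deviation) is shown as one common choice of scaling factor.

```python
import numpy as np


def internal_standard_normalize(areas, is_column):
    """Divide every peak area of a sample by the area of its internal standard."""
    areas = np.asarray(areas, dtype=float)
    return areas / areas[:, [is_column]]


def area_normalize(areas):
    """Divide every peak area by the total peak area of the same sample."""
    areas = np.asarray(areas, dtype=float)
    return areas / areas.sum(axis=1, keepdims=True)


def mean_center(data):
    """Subtract the mean of each variable (column) from every observation."""
    data = np.asarray(data, dtype=float)
    return data - data.mean(axis=0)


def autoscale(data):
    """Mean-center and divide each variable by its standard deviation."""
    data = np.asarray(data, dtype=float)
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)


def log_transform(data):
    """Apply log10; only valid for strictly positive values."""
    return np.log10(np.asarray(data, dtype=float))


# Hypothetical peak areas: one dominant compound and two minor ones
peak_table = np.array([[100.0, 10.0, 1.0],
                       [200.0, 30.0, 3.0],
                       [400.0, 20.0, 2.0]])

area_normalized = area_normalize(peak_table)
autoscaled = autoscale(peak_table)
logged = log_transform(peak_table)
```

After autoscaling, every compound has unit variance, so the dominant first column no longer drives a subsequent PCA simply because of its magnitude.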
Centering permits one to adjust differences between high and low levels of compound concentrations in samples while maintaining their original relationship. The scaling procedure, on the other hand, is done by dividing each variable by a function related to its standard deviation (the scaling factor) to adjust for the variation in fold differences between detected VOCs. Note that it only modifies the weights of the different features of the data without changing its structure. Finally, the logarithmic transformation consists in applying, e.g., log10 to the raw data, which reduces large numbers to values closer to zero and thereby reduces their influence in a PCA.162 Logarithmic scaling differs from other normalization techniques such as autoscaling or mean centering: it neither adds values to nor subtracts values from the original data. It transforms the data into a smaller range without masking the effect of small values. In simple words, such scaling allows a large range to be normalized without small values being compressed (i.e., the large values will not dominate the effect of small values). This scaling method is very useful in this case because mass data cannot contain negative values.

6.1.5. Retention Time Alignment. GC systems are subject to uncontrollable fluctuations in pressure, carrier flow, and oven temperature, leading to retention time variations between chromatograms of different samples or even between replicate runs of the same sample. For this reason, retention time alignment is the most important preprocessing step. Many algorithms, as shown in Table 4, have been designed to shift peak positions, making retention time a more precise parameter. Some of these algorithms are designed for pixel-level chromatograms (or image data), while others are designed for aligning peak tables.150

Table 4. Some Algorithms Used for Peak Alignment in GC-MS

alignment algorithm                                                            abbreviation              ref
correlation optimized warping                                                  COW                       167
dynamic time warping                                                           DTW                       168
peak alignment by genetic algorithm                                            PAGA                      169
peak alignment by fast Fourier transform                                       PAFFT                     170
recursive alignment by fast Fourier transform                                  RAFFT                     170
interval correlation-optimized shifting algorithm                              Icoshift                  171
piecewise alignment                                                            PA                        172
peak alignment using reduced set mapping                                       PARS                      173
peak alignment by beam search                                                  PABS                      174
automated peak alignment by beam search                                        Auto-PABS                 175
parametric time warping                                                        PTW                       166
semiparametric time warping                                                    STW                       176
peak matching algorithm                                                        PMA                       177
bidirectional best hits peak assignment and cluster extension                  BIPACE                    178
center-star multiple alignment by pairwise-partitioned dynamic time warping    CEMAPP-DTW                178
variable penalty dynamic warping                                               VPdtw                     179
landmark selection                                                             LS                        180
distance and spectrum correlation optimization                                 DISCO (GC × GC/TOF-MS)    181
dynamic programming approach                                                   DPA                       182
automated curve resolution                                                     ACR                       183
multivariate curve resolution-correlation optimized warping                    MCR-COW                   184

Pixel-level alignment algorithms are more comprehensive and work on the detector data points; they can be grouped into four major categories: simple scalar shift, alignment to selected target peaks, local alignment algorithms, and globally optimized alignment algorithms. The scalar shift algorithms are the simplest ones and use a simple similarity metric. They calculate the time shift between the sample and a target chromatogram and reduce the differences between them. It is important to mention that they work mostly with homogeneous data sets, where one can find the same number of peaks with the same intensities, because the similarity metrics are influenced by the number of peaks and their magnitude. For heterogeneous data sets or data sets with severe shifting, it is necessary to employ more sophisticated and powerful algorithms in order to obtain a proper peak alignment.150 The select target peaks method for retention time alignment, as its name says, requires the selection of a peak (called the standard or target peak). The alignment is made by shifting the retention time of the standard peak in order to match targeted retention times in the chromatogram.163 The local alignment algorithms, on the other hand, are based on shifting subregions of the sample chromatogram relative to a target chromatogram (using a manually selected window size). This process is repeated iteratively until a similarity metric is maximized. Finally, the globally optimized algorithms are considered the most sophisticated alignment algorithms so far, because they make use of dynamic programming in order to find the locally and globally optimized shift for every region in the chromatogram with very little user input. Two of the most popular globally optimized alignment algorithms are correlation-optimized warping (COW) for pixel-level chromatograms and dynamic time warping (DTW).
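The simplest of the four categories described in this section, the scalar shift, can be sketched as follows. This is a minimal illustration rather than any of the published algorithms in Table 4: the function names are hypothetical, cross-correlation is assumed as the similarity metric, and the chromatograms are synthetic single-peak signals.

```python
import numpy as np


def estimate_shift(sample, target):
    """Lag (in data points) that maximizes the cross-correlation between the
    sample and the target chromatogram; positive means the sample elutes later."""
    s = np.asarray(sample, dtype=float)
    r = np.asarray(target, dtype=float)
    xcorr = np.correlate(s - s.mean(), r - r.mean(), mode="full")
    return int(np.argmax(xcorr)) - (r.size - 1)


def apply_shift(signal, lag):
    """Move the whole chromatogram by -lag points, padding with zeros."""
    signal = np.asarray(signal, dtype=float)
    shifted = np.zeros_like(signal)
    if lag > 0:
        shifted[:-lag] = signal[lag:]
    elif lag < 0:
        shifted[-lag:] = signal[:lag]
    else:
        shifted[:] = signal
    return shifted


# Synthetic example: the sample peak elutes 7 points later than the target peak
t = np.arange(300)
target = np.exp(-0.5 * ((t - 100) / 4.0) ** 2)
sample = np.exp(-0.5 * ((t - 107) / 4.0) ** 2)

lag = estimate_shift(sample, target)
aligned = apply_shift(sample, lag)
print(lag, int(np.argmax(aligned)))
```

Because a single lag is applied to the whole chromatogram, this approach cannot correct shifts that vary along the run, which is why the local and globally optimized algorithms divide the chromatogram into regions.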
The first one works by dividing the chromatogram into a specific number of segments or regions, which are stretched and compressed until the correlation between the sample chromatogram and the target (the defined reference chromatogram) is maximized.
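The scalar shift category described above can be illustrated with a minimal sketch. It assumes two chromatograms sampled on the same time grid and uses an unnormalized dot product as the similarity metric; the function names `best_scalar_shift` and `apply_shift` and the toy signals are hypothetical and are not taken from any of the cited packages.

```python
def best_scalar_shift(sample, target, max_shift):
    """Return the integer shift (in data points) that best matches sample to target.

    A positive shift moves the sample toward later retention times.
    """
    def similarity(shift):
        # Dot product between the target and the shifted sample (overlap only).
        total = 0.0
        for i, t in enumerate(target):
            j = i - shift
            if 0 <= j < len(sample):
                total += t * sample[j]
        return total

    return max(range(-max_shift, max_shift + 1), key=similarity)


def apply_shift(sample, shift):
    """Shift the sample by a whole number of points, padding with zeros."""
    n = len(sample)
    return [sample[i - shift] if 0 <= i - shift < n else 0.0 for i in range(n)]


# Hypothetical toy data: the same single peak, eluting two points later in the sample.
target = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 0.0]
sample = [0.0, 0.0, 0.0, 0.0, 1.0, 5.0, 1.0, 0.0]

shift = best_scalar_shift(sample, target, max_shift=3)
aligned = apply_shift(sample, shift)
```

Because the similarity here is driven by peak count and magnitude, the sketch also shows why this category is best suited to homogeneous data: a single rigid shift cannot correct peaks that drift by different amounts.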

A score plot can show, for example, trends, groupings, and outliers. The X and Y axes of a score plot are usually PC1 and PC2, respectively, and illustrate the variation within and between groups. The first principal component (PC1) captures the largest share of the total variance in the data, while the second (PC2) is uncorrelated (orthogonal) with PC1 and captures the largest share of the remaining variance. In a PC score plot, it is sufficient to retain only those components that account for a large percentage of the total variance.162 Note that sometimes it is necessary to look at several PCs together in a three-dimensional space, because sample overlap in a plot of only two PCs can lead to a subjective decision. The loading plot, on the other hand, helps to detect the variables responsible for cluster formation in a given data set. It gives a numerical value that reflects how each original variable contributes to the score plot and how much it has in common with a specific component.185 Another important unsupervised pattern recognition technique used for the preliminary evaluation of a data set is cluster analysis (CA).186 In this case, samples are grouped according to a similarity metric that describes their relationship, such as distance, correlation, or a combination of both, without considering information about their class membership.187 The distances calculated between samples include the Euclidean distance, squared Euclidean distance, Manhattan distance, etc., while the samples are grouped by a clustering algorithm (linkage rule): single (nearest neighbor), complete (furthest neighbor), or average linkage, the centroid method, Ward’s method, etc. Generally, there are no rules for choosing a specific distance metric; the choice relies on the criteria the researcher uses to define the distance between groups.
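As a rough illustration of how a distance metric and a linkage rule combine in cluster analysis, the following sketch repeatedly merges the two closest clusters until a requested number of groups remains. The function names, the two-feature toy samples, and the stopping criterion (a fixed number of clusters) are assumptions made for the example; dedicated statistical packages would normally be used in practice.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def single_linkage(cluster_a, cluster_b, points, dist):
    # Nearest-neighbor rule: distance between the two closest members.
    return min(dist(points[i], points[j]) for i in cluster_a for j in cluster_b)


def complete_linkage(cluster_a, cluster_b, points, dist):
    # Furthest-neighbor rule: distance between the two most distant members.
    return max(dist(points[i], points[j]) for i in cluster_a for j in cluster_b)


def agglomerate(points, n_clusters, dist=euclidean, linkage=single_linkage):
    """Merge clusters of sample indices until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # every sample starts alone
    while len(clusters) > n_clusters:
        pairs = [
            (linkage(clusters[a], clusters[b], points, dist), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        _, a, b = min(pairs)      # closest pair of clusters
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters


# Hypothetical samples described by two features (e.g., two peak areas).
samples = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (0.9, 1.1)]

groups = agglomerate(samples, n_clusters=2)
groups_alt = agglomerate(samples, 2, dist=manhattan, linkage=complete_linkage)
```

On well-separated data such as this, different metrics and linkage rules give the same grouping; on real VOC profiles they can disagree, which is why the choice is left to the analyst.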
Hierarchical cluster analysis (HCA) is based on the creation of branched structures with a defined hierarchy (called dendrograms), which permit a qualitative visualization (in a two-dimensional space) of the grouping among samples.188 There are two major strategies for comparing samples in HCA. The first is the agglomerative (bottom-up) strategy, in which each observation starts in its own cluster and is afterward merged with others while moving up the hierarchy. The second is the divisive (top-down) strategy, which starts with all samples in a single cluster and splits them up while moving down the hierarchy. It is important to mention that these unsupervised classification algorithms are subject to overfitting, and they are informative only when the analyst can assume that the clustering is correct. An internal validation method is always required to assess the validity of such models. Nevertheless, the overfitting problem can be avoided with ANOVA simultaneous component analysis (ASCA). This method is a semiunsupervised technique that limits the rotational ambiguity in a PCA.150

6.2.2. Supervised Pattern Recognition Techniques. Supervised pattern recognition techniques, in contrast to the unsupervised ones, require “training data” (e.g., observations). Their application has been extended to a wide variety of chemical data sets with diverse objectives such as profiling, fingerprinting, authentication, detection of adulteration, food quality assessment, data interpretation, etc.162 In general, supervised techniques allow classification of new data by building a classification model based on prior known information; the predictive properties of this model are tested and validated using an independent sample set (test set) before it is applied to unknown samples.185
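The idea of building a model on known samples and checking it on an independent set can be sketched as follows. A nearest-centroid classifier is used here only as a simple stand-in for the supervised techniques discussed in this section; the function names, the class labels "A" and "B", the toy data, and the deterministic split (every third sample is held out) are all assumptions made for illustration.

```python
import math


def fit_centroids(X, y):
    """Return the mean feature vector of each class in the training data."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], row))


def holdout_accuracy(X, y, test_every=3):
    """Train on most samples and report accuracy on the held-out ones."""
    train = [(r, l) for i, (r, l) in enumerate(zip(X, y)) if i % test_every != 0]
    test = [(r, l) for i, (r, l) in enumerate(zip(X, y)) if i % test_every == 0]
    centroids = fit_centroids([r for r, _ in train], [l for _, l in train])
    hits = sum(predict(centroids, r) == l for r, l in test)
    return hits / len(test)


# Hypothetical two-feature samples with known class membership.
X = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0], [5.0, 5.2], [5.1, 4.8], [4.9, 5.0]]
y = ["A", "A", "A", "B", "B", "B"]

accuracy = holdout_accuracy(X, y)
```

The held-out samples never contribute to the centroids, so the reported accuracy reflects how the model behaves on data it has not seen.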

The second one, DTW, calculates an optimal match by determining the warped retention time vector that maximizes the similarity between the sample and target chromatograms.164,165 Another algorithm similar to DTW is parametric time warping (PTW). In this case, instead of using a linear fit for warping, it uses a higher order polynomial fit to correct nonlinear shifting.166 As mentioned before, chromatograms can be analyzed either at the pixel level or through peak table lists (raw data) exported from the instrument software. As long as appropriate parameters and preprocessing methods are used, analyses at the pixel level are expected to be as accurate as those based on peak table lists. However, for proteomic or metabolomic experiments it is preferable to obtain aligned peak table-level data rather than aligned pixel-level data.150 Usually the data are exported as raw data in, for instance, .csv or .txt format and subsequently imported into statistical data analysis platforms such as MATLAB, SPSS, R, or Excel, which avoids manual manipulation and preserves data integrity. When mass spectral data are available, it becomes more important to use alignment algorithms that exploit mass spectral information to match peaks correctly. The number of true peaks assigned increases with the use of this kind of algorithm. Spectral similarity metrics ensure that peaks are within a single given list and that they are aligned with the peaks from a different list among samples.

6.2. Statistical Modeling and Multivariate Data Analysis

The simplest way to represent VOC data is a pivot table, that is, a rectangular table (matrix) of n rows and m columns in which each cell contains a numerical value of the measurement. Each row corresponds to a sample, while each column corresponds to a particular feature (e.g., peak area) of the sample.185 These large and complex metabolomic data sets can afterward be analyzed chemometrically using unsupervised or supervised methods. Unsupervised methods focus on the intrinsic structure, relations, and interconnectedness of the data and are sometimes referred to as descriptive models. Supervised methods, on the other hand, are often called predictive models; their objective is to transform the multivariate data from VOC profiles into a representation of biological interest under the guidance of a “supervisor” (that is, a property of interest, e.g., the concentration of a specific VOC, that we try to relate to the VOC profiles).

6.2.1. Unsupervised Methods of Pattern Recognition. PCA is one of the most used unsupervised methods in the analysis of VOCs. PCA is often the first step in multivariate data analysis and is used to detect patterns in the measured data.162 PCA modeling makes no assumption that the data set follows any particular distribution; the dimensionality of the data is algebraically reduced to orthogonal variables, called principal components (PCs), that retain as much of the information present in the original data as possible. Each PC is a linear combination of the original variables and is described by two parts: scores and loadings. This reduction allows a multidimensional matrix to be projected into a low-dimensional space (most commonly a 2D space), providing an interpretable visualization of the complex data and thereby highlighting possible similarities or differences.
However, PCA is not a classification method; rather, it allows users to understand the structure and patterns in a given data set. The score plot is a graphical representation that provides information about the relationships between the objects under study.
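To make the roles of scores and loadings concrete, the following sketch extracts only the first principal component of a small samples-by-features matrix by power iteration on the covariance matrix. This is a simplified stand-in for a full PCA (which would normally come from a statistics package); the function name, the toy peak-area table, the fixed number of iterations, and the absence of any scaling step are assumptions of the example.

```python
import math


def first_principal_component(X, n_iter=100):
    """Return (loadings, scores, explained variance ratio) for PC1.

    X is a list of samples (rows), each a list of feature values (columns).
    """
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in X]

    # Sample covariance matrix of the features (m x m).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)]
           for a in range(m)]

    # Power iteration; assumes the start vector is not orthogonal to PC1.
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m))
                     for a in range(m))
    total_variance = sum(cov[j][j] for j in range(m))
    scores = [sum(r[j] * v[j] for j in range(m)) for r in centered]
    return v, scores, eigenvalue / total_variance


# Hypothetical peak-area table: 4 samples (rows) x 3 compounds (columns).
X = [
    [10.0, 2.0, 1.0],
    [12.0, 2.5, 0.9],
    [30.0, 6.1, 1.1],
    [33.0, 6.4, 1.0],
]

loadings, scores, explained = first_principal_component(X)
```

In this toy table the first two samples receive scores of one sign and the last two of the opposite sign, which is the kind of grouping a score plot reveals, while the loadings show that the first compound dominates PC1 because the data were not scaled.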

In other words, the information in the response matrix Y is used to identify systematic variation in the orthogonal X matrix. This separation of the predictive and nonpredictive data facilitates interpretation.189 Finally, artificial neural networks (ANN) were historically developed to imitate the operation of neurons in the brain and, since the early 1990s, have been used as a powerful chemometric data-modeling tool that captures the complex relationships between the inputs and outputs of a data set. This supervised method is trained with a data set to adjust the internal parameters of the network. It is effective in modeling nonlinear systems and comprises three interacting parts: the input, data processing, and output layers. Because it does not assume an initial mathematical relationship, it is particularly useful when analytes interfere strongly with each other.190,191 Optimizing an ANN model requires tuning different parameters depending on the activation function used, among them the number of neurons in the middle layer, scale functions, learning rate factors, momentum factors, and initial weights.190 Table 5 summarizes some of the chemometric algorithms that are applied to multivariate VOC data sets.

The most popular techniques used in VOC analysis include linear discriminant analysis (LDA), k-nearest neighbor (k-NN), soft independent modeling of class analogy (SIMCA), artificial neural network (ANN), partial least-squares-discriminant analysis (PLS-DA), and orthogonal projections to latent structures-discriminant analysis (OPLS-DA).189 LDA is one of the most frequently used supervised pattern recognition methods; it reduces the dimensionality of a data set while preserving the interclass separation. In LDA, the first assumption is that the data follow a normal distribution and that the variance−covariance matrices of the classes are equal. It then builds linear combinations of the variables, called canonical variates (CV) or linear discriminant functions (LDF), which maximize the ratio of between-class variance to within-class variance. This technique tries to find the optimal boundaries between objects through the linear combination of features to achieve a maximum separation between the different classes.185 LDA performs a reduction of the dimensionality similar to PCA; it determines a lower-dimensional hyperplane onto which the points are projected from the higher dimension. However, PCA selects the directions that retain a maximal structure among the data in a reduced dimension, while LDA selects the direction that achieves a maximum separation among a group of classes to avoid overfitting. One of the pitfalls of LDA models is that it is not always clear which variables should be included in the analysis.185 Another powerful technique in pattern recognition that does not require complicated statistical computation is the k-nearest neighbor method. With this nonparametric method, the probability that an object (sample) x belongs to a class Cj is estimated from its neighborhood, and the sample is classified according to the majority vote of its k nearest neighbors, where k is usually an odd number.
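The majority-vote rule of k-NN described above can be sketched in a few lines. The function name `knn_classify`, the class labels, the Euclidean distance, and the toy training set are assumptions chosen for the example; an odd k is used so that a two-class vote cannot tie.

```python
import math
from collections import Counter


def knn_classify(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    # Sort training samples by Euclidean distance to x.
    neighbors = sorted(
        (math.dist(point, x), label) for point, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training samples with known classes.
train_X = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.2), (5.1, 4.8), (4.9, 5.0)]
train_y = ["A", "A", "A", "B", "B", "B"]

label_near_a = knn_classify(train_X, train_y, (1.1, 1.0), k=3)
label_near_b = knn_classify(train_X, train_y, (4.8, 5.1), k=3)
```

No model is fitted beforehand: every prediction is made directly from the stored training samples, which is also why a few outlying samples can change the vote.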
With this method there are no assumptions about the data distribution or the variance−covariance matrices of any of the classes.148 Note that the presence of outliers in the data set greatly influences the result of the k-NN analysis. SIMCA (soft independent modeling of class analogy) also requires a training data set of objects (in this case samples), and it is considered a form of soft modeling used in chemical pattern recognition. In most other areas of statistics an object belongs to a discrete class. In contrast, with SIMCA one object can belong to multiple classes rather than being forced into a single class. This situation arises often in chemistry, where an object can fit into more than one class simultaneously. Therefore, SIMCA is a more useful technique for this kind of system.188 However, the main problem with SIMCA is that the directions showing the largest variation can be different from the directions separating the classes. This problem is overcome with partial least-squares-discriminant analysis (PLS-DA), which suits both linear and collinear data, but as the number of classes increases, the interpretation of the model becomes progressively more complicated. OPLS-DA, on the other hand, is an extension of a linear regression method based on PLS-DA with integration of an orthogonal signal correction (OSC) filter. These filters aim at finding predictive components while maximizing the covariance and correlation between the matrices X (e.g., VOC profiles) and Y (e.g., the property of interest to be modeled). The model uses information in the response matrix Y to decompose the variable matrix X into correlated, orthogonal, and residual structures of information or blocks, respectively.

Table 5. Summary of Some of the Chemometric Algorithms Used in Multivariate VOCs Data Sets

model           name
unsupervised    canonical correlation analysis
                clustering and disjoint principal component analysis
                hierarchical clustering analysis
                principal component analysis
                independent component analysis
                redundancy analysis
                kernel-principal component analysis
                factor analysis
                kernel-canonical correlation analysis
                multilevel simultaneous component analysis
                ANOVA simultaneous component analysis
supervised      discriminant analysis
                linear discriminant analysis
                partial least squares
                soft independent modeling of class analogy
                partial least-squares discriminant analysis
                orthogonal partial least-squares-DA
                kernel-orthogonal partial least-squares-DA
                kernel-partial least-squares-DA
                artificial neural network
                principal component discriminant analysis

Many of the algorithms used to perform complex data analyses are included in commercial software packages, among them SPSS (http://www.ibm.com/analytics/us/en/technology/spss/), Unscrambler X (http://www.camo.com/rt/Products/Unscrambler/unscrambler.html), SIRIUS (http://www.prs.no/Sirius/Sirius.html), SIMCA (http://umetrics.com/products/simca), and Pirouette (https://infometrix.com/pirouette/), which include the most standard methods for multivariate data analysis such as PCA, PCR, PLS, and SIMCA. Each software package has its own advantages and disadvantages, but the main weakness is the restricted capacity for writing personal programs or code. In that case the analyst should employ software designed to accommodate user-written algorithms. Among them we mention R (https://www.r-project.org/), MATLAB (http://mathworks.com/), and Minitab (https://www.minitab.com).

aiming to develop innovative optical sensing techniques for the nondestructive characterization of agricultural and food products. He also visited a computational biology lab at K.U. Leuven, Belgium, for a few months, where he focused on microarray analysis of metastatic cancer. He was a bioinformatics group leader at the Laimburg Research Centre, Bolzano, Italy, before he joined the University of Texas. He holds three patents, two of which won international scientific medals in Geneva (Switzerland) and Kuala Lumpur (Malaysia). He is author and coauthor of over 100 scientific articles and two book chapters. He has been a direct and indirect cosupervisor of 10 doctoral/master theses and an invited lecturer at 40 international meetings. His main focus is the development and application of bioinformatics in spectroscopy, hyperspectral imaging, and omics data analysis.

7. CONCLUSION AND PERSPECTIVE

In this “-omics” era, the combination of different approaches such as metabolomics, proteomics, and transcriptomics contributes to an understanding of the interconnection between the genome and the phenotype. The metabolome is very dynamic, and the most advanced analytical techniques combined with chemometrics capture, like a snapshot, the physiological state of an organism, further closing the gap between the genotype and the phenotype. VOC metabolomics offers distinct advantages compared to other -omics technologies. First, the measurement (including sample preparation and analysis) as well as the annotation of volatile compounds is still easier than the measurement of less volatile metabolites. Second, changes occurring at the transcriptome or proteome level do not always correlate with biochemical phenotypes. For these reasons, the future of volatile metabolomics is bright; it is increasing in industrial application as well as in the research literature as a result of the diversification in the use of new technologies. The main pitfall in this area is perhaps still the limited identification or structural elucidation of molecules due to the lack of universal metabolite-specific libraries. Fortunately, newer technologies are now emerging, such as GC × GC-TOF-MS, highly improved mass spectrometer instrumentation, open online metabolomic databases, and newer and improved algorithms for peak annotation and identification.

REFERENCES (1) Goff, S. A.; Klee, H. J. Plant volatile compounds: sensory cues for health and nutritional value? Science 2006, 311, 815−819. (2) Pichersky, E.; Noel, J. P.; Dudareva, N. Biosynthesis of plant volatiles: nature’s diversity and ingenuity. Science 2006, 311, 808−811. (3) Paré, P. W.; Tumlinson, J. H. Plant volatiles as a defense against insect herbivores. Plant Physiol. 1999, 121, 325−332. (4) Law, J. H.; Regnier, F. E. Pheromones. Annu. Rev. Biochem. 1971, 40, 533−548. (5) De Moraes, C. M.; Lewis, W. J.; Pare, P. W.; Alborn, H. T.; Tumlinson, J. H. Herbivore-infested plants selectively attract parasitoids. Nature 1998, 393, 570−573. (6) Vancanneyt, G.; Sanz, C.; Farmaki, T.; Paneque, M.; Ortego, F.; Castanera, P.; Sanchez-Serrano, J. J. Hydroperoxide lyase depletion in transgenic potato plants leads to an increase in aphid performance. Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 8139−8144. (7) Arimura, G. I.; Kost, C.; Boland, W. Herbivore-induced, indirect plant defences. Biochim. Biophys. Acta, Mol. Cell Biol. Lipids 2005, 1734, 91−111. (8) Reinhard, J.; Srivivasan, M. V.; Zhang, S. Scent-triggered navigation in honeybees. Nature 2004, 427, 411. (9) Pichersky, E.; Gershenzon, J. The formation and function of plant volatiles: Perfumes for pollinator attraction and defense. Curr. Opin. Plant Biol. 2002, 5, 237−243. (10) Dudareva, N.; Negre, F.; Nagegowda, D. A.; Orlova, I. Plant volatiles: recent advances and future perspectives. Crit. Rev. Plant Sci. 2006, 25, 417−440. (11) Dudareva, N.; Klempien, A.; Muhlemann, J. K.; Kaplan, I. Biosynthesis, function and metabolic engineering of plant volatile organic compounds. New Phytol. 2013, 198, 16−32. (12) Lichtenthaler, H. K.; Rohmer, M.; Schwender, J. Two independent biochemical pathways for isopentenyl diphosphate and isoprenoid biosynthesis in higher plants. Physiol. Plant. 1997, 101, 643− 652. (13) Magnard, J. L.; Roccia, A.; Caissard, J. C.; Vergne, P.; Sun, P.; Hecquet, R.; et al. 
Biosynthesis of monoterpene scent compounds in roses. Science 2015, 349, 81−83. (14) Qualley, A. V.; Dudareva, N. Metabolomics of plant volatiles. In Plant Systems Biology; Humana Press, 2009; p 329. (15) El Hadi, M. A. M.; Zhang, F. J.; Wu, F. F.; Zhou, C. H.; Tao, J. Advances in fruit aroma volatile research. Molecules 2013, 18, 8200− 8229. (16) Tholl, D.; Boland, W.; Hansel, A.; Loreto, F.; Röse, U. S.; Schnitzler, J. P. Practical approaches to plant volatile analysis. Plant J. 2006, 45, 540−560. (17) Tikunov, Y.; Lommen, A.; de Vos, C. R.; Verhoeven, H. A.; Bino, R. J.; Hall, R. D.; Bovy, A. G. A novel approach for nontargeted data analysis for metabolomics. Large-scale profiling of tomato fruit volatiles. Plant Physiol. 2005, 139, 1125−1137. (18) Braga, C. M.; Zielinski, A. A. F.; da Silva, K. M.; de Souza, F. K. F.; Pietrowski, G. D. A. M.; Couto, M.; Granato, D.; Nogueira, A. Classification of juices and fermented beverages made from unripe, ripe

AUTHOR INFORMATION

Corresponding Author

*E-mail: email:[email protected]; mohammad. [email protected].

ORCID

Mohammad Goodarzi: 0000-0003-2891-9329

Notes

The authors declare no competing financial interest.

Biographies

Giuseppe Lubes is currently working as a Research Scientist at the Nestlé Product Technology Center in Germany, where he supports the flavor group and performs lipid-oxidation analysis through GC-MS techniques. He was a postdoctoral research fellow at the Laimburg Research Centre working on aroma analysis. He was born in Caracas (Venezuela). He obtained his Ph.D. degree in Chemistry at the University Simón Bolívar in 2012. He has been working on the application of liquid chromatography and mass spectroscopy in the fields of chemical ecology and plant metabolomics. He also has experience in the analysis of metal complexes in aqueous solution and molecular modeling. He is (co)author of 17 peer-reviewed papers.

Mohammad Goodarzi is currently a faculty member of the Biochemistry Department at the University of Texas Southwestern Medical Center. He obtained his Ph.D. degree on feature selection and modeling techniques in drug design and development at the Department of Analytical Chemistry and Pharmaceutical Technology, Vrije Universiteit Brussel, Belgium. As a Ph.D. student he visited the Institute of Chemistry Timisoara of the Romanian Academy, Romania, where he studied conformational analysis and protein−ligand interactions. He was a postdoctoral researcher at BIOSYST-MeBioS, Faculty of Bioscience Engineering, K.U. Leuven, Belgium, for a few years. The main research focus was to investigate light with biological material

and senescent apples based on the aromatic profile using chemometrics. Food Chem. 2013, 141, 967−974. (19) Yang, Y.; Ferro, M. D.; Cavaco, I.; Liang, Y. Detection and identification of extra virgin olive oil adulteration by GC-MS combined with chemometrics. J. Agric. Food Chem. 2013, 61, 3693−3702. (20) Cheng, H.; Qin, Z. H.; Guo, X. F.; Hu, X. S.; Wu, J. H. Geographical origin identification of propolis using GC−MS and electronic nose combined with principal component analysis. Food Res. Int. 2013, 51, 813−822. (21) Alonso, A.; Marsal, S.; Julià, A. Analytical methods in untargeted metabolomics: state of the art in 2015. Front. Bioeng. Biotechnol. 2015, 3, 23. (22) Reuters, T. ISI web of science; http://apps.webofknowledge.com, 2015 (accessed Dec 10, 2016). (23) Fiehn, O.; Kopka, J.; Dörmann, P.; Altmann, T.; Trethewey, R. N.; Willmitzer, L. Metabolite profiling for plant functional genomics. Nat. Biotechnol. 2000, 18, 1157−1161. (24) Roberts, L. D.; Souza, A. L.; Gerszten, R. E.; Clish, C. B. Targeted metabolomics.Current Protocols in Molecular Biology; John Wiley & Sons, Inc.: Hoboken, NJ, 2012. (25) Cevallos-Cevallos, J. M.; Reyes-De-Corcuera, J. I.; Etxeberria, E.; Danyluk, M. D.; Rodrick, G. E. Metabolomic analysis in food science: a review. Trends Food Sci. Technol. 2009, 20, 557−566. (26) Monton, M. R. N.; Soga, T. Metabolome analysis by capillary electrophoresis−mass spectrometry. J. Chromatogr. A 2007, 1168, 237− 246. (27) Sally-Ann, F.; Rumpel, K. GC-MS-based metabolomics. Biomarker methods in Drug Discovery and Development; Humana Press, 2008; pp 317−340. (28) Niederbacher, B.; Winkler, J. B.; Schnitzler, J. P. Volatile organic compounds as non-invasive markers for plant phenotyping. J. Exp. Bot. 2015, 66, 5403−5416. (29) Kant, M. R.; Ament, K.; Sabelis, M. W.; Haring, M. A.; Schuurink, R. C. Differential timing of spider mite-induced direct and indirect defenses in tomato plants. Plant Physiol. 2004, 135, 483−495. (30) Van Dam, N. M.; Poppy, G. M. 
Why plant volatile analysis needs bioinformatics−detecting signal from noise in increasingly complex profiles. Plant Biol. 2008, 10, 29−37. (31) Lücker, J.; Bouwmeester, H. J.; Schwab, W.; Blaas, J.; Van Der Plas, L. H.; Verhoeven, H. A. Expression of Clarkia S-linalool synthase in transgenic petunia plants results in the accumulation of S-linalyl-β-dglucopyranoside. Plant J. 2001, 27, 315−324. (32) Okazaki, Y.; Saito, K. Recent advances of metabolomics in plant biotechnology. Plant Biotechnol. Rep. 2012, 6, 1−15. (33) Metabolomics Market worth $2,100 Million by 2019. http:// www.marketsandmarkets.com/ (accessed Dec 10, 2016). (34) Berger, R. G. Flavours and fragrances: chemistry, bioprocessing and sustainability; Springer Science & Business Media, 2007; pp 1−14. (35) Mata, V. G.; Gomes, P. B.; Rodrigues, A. E. Engineering perfumes. AIChE J. 2005, 51, 2834−2852. (36) Flavor & Fragrance Industry Leaders. http://www.leffingwell. com/top_10.htm (accessed Dec 10, 2016). (37) Winter, C. K.; Davis, S. F. Organic foods. J. Food Sci. 2006, 71, R117−R124. (38) Iijima, Y. Recent advances in the application of metabolomics to studies of biogenic volatile organic compounds (BVOC) produced by plant. Metabolites 2014, 4, 699−721. (39) Mageroy, M. H.; Tieman, D. M.; Floystad, A.; Taylor, M. G.; Klee, H. J. A Solanum lycopersicum catechol-O-methyltransferase involved in synthesis of the flavor molecule guaiacol. Plant J. 2012, 69, 1043−1051. (40) Goldenberg, L.; Yaniv, Y.; Doron-Faigenboim, A.; Carmi, N.; Porat, R. Diversity among mandarin varieties and natural sub-groups in aroma volatiles compositions. J. Sci. Food Agric. 2016, 96, 57−65. (41) Farneti, B.; Khomenko, I.; Cappellin, L.; Ting, V.; Romano, A.; Biasioli, F.; Costa, F. Comprehensive VOC profiling of an apple germplasm collection by PTR-ToF-MS. Metabolomics 2015, 11, 838− 850. (42) Socaci, S. A.; Socaciu, C.; Mureşan, C.; Fărcaş, A.; Tofană, M.; Vicaş, S.; Pintea, A. Chemometric discrimination of different tomato

cultivars based on their volatile fingerprint in relation to lycopene and total phenolics content. Phytochem. Anal. 2014, 25, 161−169. (43) Lin, S. Y.; Roan, S. F.; Lee, C. L.; Chen, I. Z. Volatile organic components of fresh leaves as indicators of indigenous and cultivated citrus species in Taiwan. Biosci., Biotechnol., Biochem. 2010, 74, 806−811. (44) Dunemann, F.; Ulrich, D.; Boudichevskaia, A.; Grafe, C.; Weber, W. E. QTL mapping of aroma compounds analysed by headspace solidphase microextraction gas chromatography in the apple progeny ‘Discovery’בPrima’. Mol. Breed. 2009, 23, 501−521. (45) Costa, F.; Cappellin, L.; Zini, E.; Patocchi, A.; Kellerhals, M.; Komjanc, M.; Gessler, C.; Biasioli, F. QTL validation and stability for volatile organic compounds (VOCs) in apple. Plant Sci. 2013, 211, 1−7. (46) Catchpole, G. S.; Beckmann, M.; Enot, D. P.; Mondhe, M.; Zywicki, B.; Taylor, J.; et al. Hierarchical metabolomics demonstrates substantial compositional similarity between genetically modified and conventional potato crops. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 14458−14462. (47) Moalemiyan, M.; Vikram, A.; Kushalappa, A. C. Detection and discrimination of two fungal diseases of mango (cv. Keitt) fruits based on volatile metabolite profiles using GC/MS. Postharvest Biol. Technol. 2007, 45, 117−125. (48) Vikram, A.; Lui, L. H.; Hossain, A.; Kushalappa, A. C. Metabolic fingerprinting to discriminate diseases of stored carrots. Ann. Appl. Biol. 2006, 148, 17−26. (49) Ciesa, F.; Dalla Via, J.; Wisthaler, A.; Zanella, A.; Guerra, W.; Mikoviny, T.; Märk, T.; Oberhuber, M. Discrimination of four different postharvest treatments of ‘Red Delicious’ apples based on their volatile organic compound (VOC) emissions during shelf-life measured by proton transfer reaction mass spectrometry (PTR-MS). Postharvest Biol. Technol. 2013, 86, 329−336. (50) Cavaliere, B.; De Nino, A.; Hayet, F.; Lazez, A.; Macchione, B.; Moncef, C.; Perri, E.; Sindona, G.; Tagarelli, A. 
A metabolomic approach to the evaluation of the origin of extra virgin olive oil: a convenient statistical treatment of mass spectrometric analytical data. J. Agric. Food Chem. 2007, 55, 1454−1462. (51) Feudo, G. L.; Macchione, B.; Naccarato, A.; Sindona, G.; Tagarelli, A. The volatile fraction profiling of fresh tomatoes and triple concentrate tomato pastes as parameter for the determination of geographical origin. Food Res. Int. 2011, 44, 781−788. (52) Donarski, J. A.; Jones, S. A.; Charlton, A. J. Application of cryoprobe H-1 nuclear magnetic resonance spectroscopy and multivariate analysis for the verification of Corsican honey. J. Agric. Food Chem. 2008, 56, 5451−5456. (53) Abaffy, T.; Möller, M. G.; Riemer, D. D.; Milikowski, C.; DeFazio, R. A. Comparative analysis of volatile metabolomics signals from melanoma and benign skin: a pilot study. Metabolomics 2013, 9, 998− 1008. (54) Robroeks, C. M.; et al. Metabolomics of volatile organic compounds in cystic fibrosis patients and controls. Pediatr. Res. 2010, 68, 75−80. (55) Sethi, S.; Nanda, R.; Chakraborty, T. Clinical application of volatile organic compound analysis for detecting infectious diseases. Clin. Microbiol. Rev. 2013, 26, 462−475. (56) Steingass, C. B.; Jutzi, M.; Müller, J.; Carle, R.; Schmarr, H. G. Ripening-dependent metabolic changes in the volatiles of pineapple (Ananas comosus (L.) Merr.) fruit: II. Multivariate statistical profiling of pineapple aroma compounds based on comprehensive two-dimensional gas chromatography-mass spectrometry. Anal. Bioanal. Chem. 2015, 407, 2609−2624. (57) Zeng, M.; Zhang, L.; He, Z.; Qin, F.; Tang, X.; Huang, X.; Qu, H.; Chen, J. Determination of flavor components of rice bran by GC-MS and chemometrics. Anal. Methods 2012, 4, 539−545. (58) Asadollahi-Baboli, M. Chemometric resolution techniques combined with GC-MS to enhance determination of the volatile chemical constituents of bay leaves. Anal. Methods 2013, 5, 6368−6375. (59) Junior, S. B.; Março, P. 
H.; Valderrama, P.; Damasceno, F. C.; Aranda, M. S.; Zini, C. A.; Caramao, E. C.; Melo, A. M.; Filho, J.; Godoy, H. T. Analysis of volatile compounds in Capsicum spp. by headspace

