Review pubs.acs.org/ac
Multidimensional Gas Chromatography: Advances in Instrumentation, Chemometrics, and Applications
Sarah E. Prebihalo,† Kelsey L. Berrier,† Chris E. Freye,† H. Daniel Bahaghighat,†,‡ Nicholas R. Moore,† David K. Pinkerton,† and Robert E. Synovec*,†
†Department of Chemistry, University of Washington, Box 351700, Seattle, Washington 98195, United States
‡Department of Chemistry and Life Science, United States Military Academy, West Point, New York 10996, United States
■
CONTENTS
Introduction
  Scope of Review
  Overview of GC × GC Basic Principles
Instrumental Advances
  Modulators
    Thermal Modulation
    Valve-Based Modulation
    Flow Modulation
    Modulator Comparisons
  Detectors
    Time-of-Flight Mass Spectrometry (TOFMS)
    Vacuum Ultraviolet Absorption
    Other Detector Methods
  Additional GC × GC Components and Implementations
Data Analysis
  Commercial Software
  Raw Data Formats for Chemometrics
  Deconvolution
    PARAFAC
    MCR-ALS
    Additional Deconvolution Methods
  Pattern Recognition
    Principal Component Analysis (PCA)
    Fisher Ratio Analysis
    Additional Pattern Recognition Methods
  Property Prediction
    Partial Least Squares (PLS) Regression Analysis
  Additional Data Analysis Methods
Applications
  Forensic
    Illicit Drug Analysis
    Decomposition Profiling
    Ignitable Liquids and Explosives
  Environmental
    Soil and Sediments
    Water
    Biological Matrixes
    Additional Environmental
  Fuels
  Food, Flavors, and Fragrances
    Food and Fragrance Oils
    Beer and Wine Analysis
    GC × GC with Olfactory Data
    Cocoa
  Biological
    Metabolomics
    VOC Profiling of Human Samples
    VOC Profiles of Specific Organisms
Conclusion
Author Information
  Corresponding Author
  ORCID
  Notes
Biographies
References
© 2017 American Chemical Society
■
INTRODUCTION Scope of Review. Analysis of volatile and semivolatile analytes by gas chromatography (GC) is an indispensable tool in the analytical chemist’s toolbox. Many fields of study rely upon GC methods to meet an ever-growing demand for useful chemical information from GC data. As the realm of GC application has expanded, more powerful instrumental and data analysis approaches have been developed to keep pace with the wealth of complex samples that require analysis. To address this challenge, GC instrumentation has evolved from one-dimensional gas chromatography (1D-GC) and heart-cutting approaches (GC-GC) to a variety of instrument designs referred to broadly as multidimensional gas chromatography (MDGC). The principal form of MDGC that has gained wide implementation is comprehensive two-dimensional (2D) gas chromatography (GC × GC), shown in Figure 1A and pioneered nearly 26 years ago by Liu and Phillips.1 When comparing 1D-GC (Figure 1B) to GC × GC (Figure 1C), the benefits of a secondary separation become evident. Blumberg and co-workers have theoretically determined that the 2D peak capacity provided by GC × GC is approximately an order of magnitude higher than the peak capacity of 1D-GC when the run times are held constant.2 This benefit is illustrated using a relatively complex sample of coffee. Adding a multivariate detector such as a time-of-flight mass spectrometer (TOFMS) provides another selective dimension of data, which may allow identification of analytes (Figure 1D). This review focuses mainly on GC × GC, with selected developments in other forms of MDGC also covered. In this regard, we focus
Special Issue: Fundamental and Applied Reviews in Analytical Chemistry 2018
Published: October 31, 2017
DOI: 10.1021/acs.analchem.7b04226 Anal. Chem. 2018, 90, 505−532
Figure 1. (A) Schematic of a basic GC × GC instrument. The injector introduces sample onto the 1D column, and then 1D eluate is transferred to the 2D column by the modulator. After a rapid 2D separation, analytes are detected. (B) 1D-GC separation of coffee. (C) GC × GC separation of coffee. (D) Mass spectrum of caffeine.
food, flavors and fragrances,26−28 and (5) biological, including metabolomics and biological volatile organic compound (VOC) profiling.29,30 Overview of GC × GC Basic Principles. From a chromatographic perspective, the goal of a separation is to resolve as many analytes as possible while keeping the separation run time sufficiently short. GC × GC facilitates this goal while concurrently providing a comprehensive separation, in which the separations on the two dimensions are complementary. Overlapped analyte peaks on the 1D column have the opportunity to be resolved on the 2D column (Figure 1). In turn, the ability to perform meaningful data analyses on the GC × GC data (single sample runs and data sets) depends greatly on the instrumental (separation and detection) design and performance. Implementation of chemometric data analysis approaches relies upon the GC × GC separations being suitably optimized from a variety of perspectives. For instance, the analyst generally aims to fully utilize the 2D peak capacity. The larger the 2D peak capacity, the more chemical information can be obtained. Generating narrow peak widths increases the peak capacity while also improving other peak characteristics, such as signal-to-noise ratio (S/N). However, there are other considerations when attempting to optimize the 1D and 2D peak widths, which are tied to the appropriate sampling density of the 1D separation via the modulator. The sampling density31 (also referred to as the modulation ratio, MR)32 is the 1D peak width divided by the duration of the 2D separation, i.e., the modulation period, PM. Appropriate PM selection relative to the 1D peak width is critical to provide an adequate sampling density while maximizing the 2D peak capacity. Generally, the PM should be chosen to provide a sampling density of ∼2−4. A smaller sampling density results in undersampling of the 1D separation, which causes broadening
primarily on research published since the last Fundamental Review published by Seeley and Seeley in 2013,3 with older publications covered as deemed necessary to provide additional insight into addressing the current challenges and to put the more recent developments into historical context. This review is organized into the following broad categories: instrumental advances, data analysis, and applications. Within the realm of instrumental advances, there has been significant progress in the areas of modulators and detectors. The modulator is often referred to as the “heart” of the GC × GC instrument (Figure 1A) as it transfers eluate from the primary 1D column to the secondary 2D column, facilitating comprehensive separations. Modulators are often classified into three broad categories: (1) thermal,4,5 (2) valve-based,6 and (3) flow.7,8 There have also been significant instrumental advances in the area of detectors, such as high-resolution time-of-flight mass spectrometry (HR-TOFMS),9,10 which provides significant gains in chemical selectivity relative to MS with unit mass resolution, as well as the vacuum ultraviolet (VUV) absorption detector.11,12 As GC × GC instrumentation has evolved to provide superior data, the desire by researchers to study more complex systems (and more samples per study) has led to the need to develop more powerful data analysis approaches while still addressing some of the obstacles of GC × GC, such as random and systematic retention time shifting13 and variation of peak intensities. Data analysis methods can be broadly divided into four categories: (1) deconvolution,14−16 (2) pattern recognition,17,18 (3) property prediction,19 and (4) retention time/index modeling.20,21 Advances in these methods have been provided by both commercial sources and in-house developed software.
Finally, the ultimate goal of applying GC × GC and other MDGC technologies is to provide useful chemical information to address the needs of applications of interest. Several application categories are reviewed in some detail: (1) forensic,10,22 (2) environmental,23,24 (3) fuels,25 (4)
Figure 2. Overview of the various types of modulators, including recent innovations. (A) Thermal modulator based upon thermoelectric cooling (left), with GC × GC separations of C8 to C16 mix (middle) and Canadian Winter Diesel Fuel (right). Reproduced from Luong, J., Guan, X., Xu, S., Gras, R., Shellie, R. A. Anal. Chem. 2016, 88 (17), 8428−8432 (ref 4). Copyright 2016 American Chemical Society. (B) Single-stage jet trap modulator (left). Doping the carrier gas with propane shows a significant drop in flow during the desorption stage (middle). Separation of diesel fuel (right). Reproduced from Mostafa, A., Górecki, T. Anal. Chem. 2016, 88 (10), 5414−5423 (ref 5). Copyright 2016 American Chemical Society. (C) Loop-based thermal modulator (left), with 2D peak widths obtained for various capillary inside diameters (middle), and separation comparisons of agarwood essential oil (upper right, 250 μm i.d., lower right, 100 μm i.d.). Reprinted from J. Chromatogr. A., Vol. 1314, Tranchida, P.Q., Zoccali, M., Franchina, F., Cotroneo, A., Dugo, P., Mondello, L. Gas velocity at the point of reinjection: An additional parameter in comprehensive two-dimensional gas chromatography optimization, pp. 216−223 (ref 72). Copyright 2013, with permission from Elsevier. (D) High-temperature diaphragm valve modulator (left) produces narrow 2D peaks due to the zone compression, with higher sensitivity than 1D-GC (middle), applied to essential orange oil (right). Left figure reprinted from J. Chromatogr. A., Vol. 1255, Seeley, J.V., Recent advances in flow-controlled multidimensional gas chromatography, pp. 14−24 (ref 59). Copyright 2012, with permission from Elsevier. Middle and right figures reprinted from J. Chromatogr. A., Vol. 1424, Freye, C.E., Mu, L., Synovec, R.E. High temperature diaphragm valve-based comprehensive two-dimensional gas chromatography, pp. 127−133 (ref 6). Copyright 2015, with permission from Elsevier.
(E) Reverse fill/flush flow modulator (left), with 2D peaks produced (middle), and separation of SPME headspace of urine (right). Left portion reprinted from J. Chromatogr. A., Vol. 1501, Dubois, L.M., Perrault, K.A., Stefanuto, P.H., Koschinski, S., Edwards, E., McGregor, L., Focant, J.F. Thermal desorption comprehensive two-dimensional gas chromatography coupled to variable-energy electron ionization time-of-flight mass spectrometry for monitoring subtle changes in volatile organic compound profiles of human blood, pp. 117−127 (ref 7). Copyright 2017, with permission from Elsevier.
of the 1D peak widths33−35 with a concurrent reduction in 1D resolution and potential loss of quantitative precision. One approach to simultaneously optimize the 2D peak capacity and sampling density is to initially optimize the peak capacity of the 1D separation, generating narrow 1D peak widths of ∼2−6 s, and then use a relatively short PM of ∼1 to 2 s to achieve the optimal sampling density of ∼2 to 4.14,36,37 Many applications, however, may still benefit from increasing the 2D peak capacity by using a relatively long PM (∼5−8 s), which is commonly practiced.25,26,28 With this longer PM, the 1D peaks are typically either undersampled or relatively broad (10−20 s), resulting in a trade-off by reducing the 1D peak capacity and the overall 2D peak capacity. Column selection, carrier gas flow rate, and temperature programming are all critical aspects for the successful application of GC × GC and play a major role in optimizing the 2D peak capacity. Stationary phase composition, column dimensions, and phase thickness are important factors to consider.2,38−43 A wide variety of stationary phases is available, and recent advances have added numerous column types, such as ionic liquids44 and application-specific columns.45−47 While there are many stationary phase combinations available, applications generally employ a sufficiently orthogonal column set in order to provide complementary separations on both dimensions. The most commonly used column combination has been a nonpolar 1D column and polar 2D column,36,48 while for certain applications, a polar 1D and nonpolar 2D column set49,50 may be beneficial. For GC × GC separations, column inner diameter in conjunction with phase thickness greatly affects the separation efficiency, N, and loading capacity. Normally an analyst strives to use optimum flow rates, which are dependent on column dimensions (length and inside diameter).
Temperature programming rate and flow rate should be complementary creating optimal separations.42,51,52 Several reviews have covered optimization of GC × GC separations.40,53,54
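The relationship between 1D peak width, modulation period, and sampling density described above can be illustrated with a short calculation. The following is a minimal sketch, not a method taken from the cited work: the function names `sampling_density` and `assess`, the label strings, and the example width/PM pairs are hypothetical, and the ∼2−4 target range is simply the guideline quoted in the text.

```python
def sampling_density(peak_width_1d_s, modulation_period_s):
    """Return the sampling density (modulation ratio): 1D peak width / PM."""
    if peak_width_1d_s <= 0 or modulation_period_s <= 0:
        raise ValueError("peak width and modulation period must be positive")
    return peak_width_1d_s / modulation_period_s


def assess(density, low=2.0, high=4.0):
    """Label a sampling density against the ~2-4 guideline quoted in the text."""
    if density < low:
        return "undersampled"
    if density > high:
        return "above target range"
    return "within target range"


# Hypothetical (1D peak width, PM) pairs in seconds, chosen from the ranges
# mentioned in the text; they are illustrative, not measured values.
cases = [(4.0, 1.5), (6.0, 2.0), (10.0, 8.0), (20.0, 5.0)]
for width, pm in cases:
    d = sampling_density(width, pm)
    print(f"1D width={width} s, PM={pm} s -> density={d:.2f} ({assess(d)})")
```

Under these assumptions, a 10 s wide 1D peak sampled with an 8 s PM falls below the guideline, whereas a 20 s wide peak with a 5 s PM sits at its upper edge, which mirrors the trade-off between long modulation periods and 1D peak capacity discussed above.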
are defined because each mode is inherently different. Thermal modulation uses temperature control to trap 1D eluate followed by desorption onto the 2D column. Valve-based and flow modulation are often grouped together since both use a sample loop (or channel) to collect 1D eluate before reinjection onto the 2D column using a relatively high flow rate and both the 1D and 2D flows can be independently controlled; however, the two modes are significantly different. With valve-based modulation, the flow rates for 1D and 2D are not coupled, so the two columns are operated separately. Because the flows are decoupled, method development should be inherently easier with valve-based modulation. In flow modulation, the two column flows are coupled, resulting in more challenging method development. Each of the three modulator designs has distinct advantages and disadvantages, with the common goal to isolate and transfer eluate from 1D to 2D as efficiently and quickly as possible. The short and fast 2D separations produce very narrow peaks relative to the 1D separation, so detectors for GC × GC must have the ability to rapidly collect data in order to have an adequate number of detection intervals across the peak. In the area of detection advances, recent innovations have introduced high-resolution mass spectrometers, including variable ionization to allow for more confident analyte identification, as well as vacuum ultraviolet (VUV) absorption detection. To that end, there has been a large commercial expansion into the field of GC × GC detectors. Modulators. The modulator transfers eluate from the 1D separation to the 2D separation at a user-defined time interval, termed the modulation period, PM. The modulator must repeatedly and precisely trap or collect eluate before reinjecting it onto the 2D column while still preserving the integrity of the 1D separation.
There are many excellent review articles on modulators specifically discussing the history and development of this portion of the GC × GC field.3,56−61 The three modulator categories will now be presented: thermal, valve-based, and flow. Thermal Modulation. Thermal modulation is the most commonly applied technique; it relies on low temperatures to trap and focus analytes as they elute from the 1D column and introduces them to the 2D column through rapid heating. There are three general types of thermal modulators: resistively heated trap, heated sweeper, and cryogenic focus, which is often divided into longitudinally movable trap and jet trap. The most frequently employed is the jet trap, which uses strategically placed and timed jets of cryogenic gas or a combination of heated and cooled jets. Commercially available thermal modulators for GC × GC have been soundly demonstrated to be highly reliable; however, recent innovations have been directed toward providing simpler and more cost-effective thermal modulator designs. A recent trend in both commercial and research sectors is the development of cryogen-free thermal modulators. Removing the need for liquid nitrogen (LN2) or carbon dioxide (CO2) simplifies cryogenic modulation and enables thermal modulation to be used for a broader range of applications. To this end, both ZOEX and LECO Corporations manufacture closed
■
INSTRUMENTAL ADVANCES As the GC × GC field has evolved, many instrumental technologies, both commercial and academic, have been developed to further enhance the separation power and detection of chemical species, and many excellent reviews have been written on GC × GC.3,55,56 As previously described with regard to Figure 1, a GC × GC instrument uses an injection interface to introduce analytes in a sample of interest onto the 1D column, and then at the end of the 1D separation, a modulator transfers analyte from the 1D column to the 2D column, and after a relatively rapid 2D separation, the analytes are detected. Most of the innovation described in this section focuses on modulators and detectors or a combination of both. Modulators, often termed the “heart” of the instrument, can be broadly classified into three categories: thermal, valve-based, or flow. For this review, we adhere to this trend and also distinguish between commercially available versus those still only used in academic settings. While other modulator design categories can be rationalized, herein three different categories
onto the 2D column. An important parameter for modulation in GC × GC is the carrier gas velocity at the point of reinjection (desorption).72,73 For loop-based thermal modulation, similar to other types of thermal modulation, the 1D and 2D separations are serially coupled (Figure 2C). This means that the 2D flow rate is directly influenced by the length, diameter, and flow rate of 1D. To optimize the reinjection conditions, a bleed line can be added using a Y connector at the end of 1D between the outlet of the modulator loop and 2D column. Others have used this technique to improve separation on the 2D column.74,75 Valve-Based Modulation. Instead of using thermal zones to trap eluate from the 1D separation, valve-based modulators collect the 1D eluate in a short collection loop which is then flushed and directed onto the 2D column. Valve-based modulators excel at modulating compounds across a wide range of boiling points (e.g., C1 to C40+), require minimal consumables, and have a relatively simple design. However, GC × GC instruments with a valve-based modulator are not as sensitive as those with a thermal modulator and require high flow rates on the 2D column to quickly flush the collection loop. Valve-based modulation for GC × GC was first performed with a diaphragm valve.76 Relatively narrow 2D peak widths were obtained with excellent 2tR reproducibility, but only 10% of the eluate reached the detector, resulting in a loss in sensitivity, and the valve had a temperature limit of 175 °C. Improved performance was obtained when a sample loop was implemented and higher flow rates for the 2D separation were used to improve the modulation performance.77 Diaphragm valve technology for modulation in GC × GC has continued to improve.
Recently, the temperature limit of 175 °C was overcome by a commercially available valve in which the temperature-sensitive O-ring was replaced with a perfluoroelastomer-based O-ring, allowing reliable function up to 325 °C.6 Narrow, reproducible 2D peak widths and 2tRs were obtained. Under the conditions used, it was demonstrated that only ∼30% of the injected material reached the FID, yet the detection sensitivity was ∼8-fold higher than 1D-GC due to zone compression (Figure 2D). Using flow rates of 3 mL/min on the 2D separation, GC × GC with high temperature diaphragm valve modulation was demonstrated to be compatible with TOFMS detection.78 These high temperature valves have also been recently implemented in comprehensive three-dimensional GC coupled with TOFMS detection (GC × GC × GC-TOFMS).79 This GC3 instrument combined a high temperature diaphragm valve and quad jet LN2 as modulators. Flow Modulation. Flow modulation has gained increasing acceptance as a modulation method in GC × GC. Flow modulation is similar to valve-based modulation, but the two column flows are coupled. In 2006, Seeley pioneered a flow modulation design in which 100% of the 1D eluate was transferred to the 2D column and on to detection,80 and many subsequent fluidic modulator designs would be based on this original design. A comprehensive review summarizing flow modulation was published in 2011.59 The key event that precipitated the wider adoption of flow modulation was Agilent Technologies’ introduction of the Capillary Flow Technology (CFT), and many companies have subsequently introduced their own commercial version of the flow modulator.56 SepSolve has recently introduced the INSIGHT modulator, a valve-based reverse fill/flush modulator (Figure 2E).7 Many academic groups have also implemented
cycle refrigerator type modulators. For example, ZOEX Corporation has introduced the ZX2 thermal modulator, which employs a closed cycle refrigerator/heat exchanger to create a two-stage loop modulator capable of modulating C7 and above. This design has been used to study petroleum products in several studies.17,62−64 Likewise, J&X Technologies has introduced the first thermal modulator based upon commercial thermoelectric (TE) cooling; modulation can be tailored to a specific range of volatilities, with the largest range being C8−C40. This modulator is relatively new, but its use in a potential industrial application has been demonstrated (Figure 2A).4 A single-stage consumable-free modulator has also been developed using a coated stainless steel capillary trap.65 The desorption step is completed using a capacitive discharge power supply to resistively heat the trap, and the cooling function is accomplished using cooled ceramic pads. This modulator has been used to study honeybush tea66 and has been demonstrated to provide similar data relative to a commercially available LN2 quad jet thermal modulator in the study of environmental pollutants.67,68 A cost-effective approach for thermal modulation was introduced as a “Do-It-Yourself” interface that can be built using two low-cost components that are commercially available.69 Indeed, this segmented loop-based thermal modulator is simple, requires no cryogens, and was shown to produce GC × GC-FID chromatograms of petroleum samples and hop oils that compared favorably to those obtained by state-of-the-art, consumable-free thermal modulators.
Similarly, improvements to micro thermal modulators (μTM), used in the context of μGC × μGC, have been focused on replacing the cryogenic fluids with a solid-state TE cooler to further allow for a portable GC × GC system.70,71 To improve upon the trapping and desorption of a μTM using a TE cooler, an air-gap spacer has been added to enhance the temperature uniformity across the device’s channels.70 It was demonstrated that this improvement resulted in a 25% increase in peak intensity of a test analyte. Zellers and co-workers improved the performance of a μTM using a TE cooler with temperature programming, combined with an ionic liquid stationary phase to assist in trapping the 1D eluate, resulting in narrower 2D peaks at the point of reinjection.71 Cryogen-free modulation produces 2D peaks that are similar to those obtained by cryogen (either LN2 or CO2) modulation and thus similar peak capacities on 2D, but the drawback of cryogen-free modulators is an inability to trap highly volatile species under ∼C7. Nevertheless, cryogen-free modulators should be generally suitable for a wide variety of academic research and private sector applications. Thermal modulators using cryogens remain extremely popular. A single-stage jet trap modulator is the simplest form of thermal modulation as it does not require any moving parts; however, single-stage thermal modulation is often plagued with breakthrough (i.e., not all the 1D eluate is trapped). Górecki and co-workers demonstrated that introducing silica wool as a restriction into the column at the point of the jet can increase the trapping efficiency by slowing the gas flow during the desorption stage to prevent breakthrough (Figure 2B).5 The analytical performance of the restricted modulator was similar to a quad jet modulator. Loop-based thermal modulation was introduced as a viable alternative to single-stage modulation due to the issues normally associated with using a single-stage system.
A loop-based system uses one set of jets to trap and then desorb the eluate, which travels around the loop where it is trapped again and then desorbed
their own version of the flow modulator. The use of flow modulation raises challenges when coupled with MS detection due to the relatively high 2D flow rates applied. Mondello and co-workers developed a flow modulator using a seven port plate in conjunction with a quadrupole MS.81 A proof-of-principle study was performed using this flow modulator with GC × GC coupled with HR-TOFMS detection.82 Lower carrier gas flow rates (6−8 mL/min) compared to the typically high gas flows (∼20 mL/min) still provided efficient reinjection of the 1D eluate but required a longer flush time (i.e., reinjection time) to totally clear the loop.83 Additionally, using a larger sample loop volume for collection of the 1D eluate resulted in improved 2D peak shapes.73 Using the original flow modulator design,80 a low flow rate of 4 mL/min on the 2D separations was demonstrated in conjunction with MS detection, while providing increased detection sensitivity relative to using 1D-GC.84 Similarly, using a high speed Deans Switch, constructed from commercially available parts, a low duty cycle modulator was demonstrated using a flow rate of 2 mL/min for the 2D separation.85 Narrow 2D peak widths and reproducible 2tR were achieved, but only ∼10% of the 1D eluate reached the detector due to the low duty cycle, resulting in decreased sensitivity. Using this same high speed Deans Switch, a new approach to GC × GC modulation, termed “pattern modulation,” was conceived, whereby a pattern of primary eluate was transferred to the 2D column instead of a single pulse per modulation.86 The data obtained were readily decomposed into a traditional GC × GC chromatogram using Lucy-Richardson deconvolution, and the resulting peaks were 16−36 times more intense and the 2D peak widths were 40−69% narrower than obtained by traditional flow modulation.
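To make the Lucy-Richardson step more concrete, the sketch below applies the standard Richardson-Lucy iteration to a one-dimensional signal that has been convolved with a known injection pattern. It is an illustration under stated assumptions, not the implementation used in ref 86: the function names, the toy "true" signal, and the pattern values are hypothetical, the pattern is assumed to be known exactly, and the data are noise-free.

```python
def convolve_full(signal, kernel):
    """Full discrete convolution (output length = len(signal) + len(kernel) - 1)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def correlate_adjoint(ratio, kernel, n):
    """Adjoint of convolve_full: correlate the ratio with the kernel."""
    return [sum(k * ratio[i + j] for j, k in enumerate(kernel)) for i in range(n)]


def richardson_lucy(observed, kernel, iterations=200, eps=1e-12):
    """Iteratively estimate the signal that, convolved with kernel, gives observed."""
    total = sum(kernel)
    k = [v / total for v in kernel]           # normalize the pattern to unit area
    n = len(observed) - len(k) + 1            # length of the unknown signal
    estimate = [sum(observed) / n] * n        # flat, strictly positive starting guess
    for _ in range(iterations):
        blurred = convolve_full(estimate, k)
        ratio = [o / max(b, eps) for o, b in zip(observed, blurred)]
        correction = correlate_adjoint(ratio, k, n)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


# Hypothetical example: two peaks and a three-pulse injection pattern (sums to 1).
true_signal = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
pattern = [0.5, 0.0, 0.25, 0.25]
observed = convolve_full(true_signal, pattern)
recovered = richardson_lucy(observed, pattern)
print([round(v, 3) for v in recovered])
```

The multiplicative update keeps the estimate non-negative and preserves the total signal area, which is one reason this family of algorithms is attractive for chromatographic data; real data would additionally require handling of noise and of pattern uncertainty.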
Additionally, a multimode fluidic modulator has been introduced by Seeley and co-workers that is capable of performing heart cutting (GC−GC), low duty cycle GC × GC, and total transfer GC × GC.8 Because of the high flow rates on the 2D separations, often ∼20 mL/min, flow modulation is frequently incompatible with mass spectrometers unless the flow is reduced before detection. This can be done in two ways: either split some of the flow into a bleed column or split the 2D exit flow into two detectors. Performing the latter allows for various combinations of detectors to be used in tandem that can be tailored to meet specific analytical goals. Armstrong, Sandra, and co-workers have shown that simultaneous detection using FID and quadrupole mass spectrometry (qMS) is very advantageous because the FID readily provides a reliable quantitative analysis, while the qMS spectral scan speed is sufficient to enable more confident analyte identification.87 The performance of forward and reverse fill/flush flow modulation has been compared for a wide range of concentrations.88 At low concentrations, the forward and reverse fill/flush were nearly identical, producing similar 2D peak widths and detection sensitivity. However, at high analyte concentrations, the reverse fill/flush was found to produce narrower 2D peak widths and thus a higher detection sensitivity. Thus, it was concluded that a reverse fill/flush is the preferred method because it is capable of handling a wider concentration dynamic range. Additionally, using an Agilent CFT modulator, it has been demonstrated that flow modulated GC × GC can result in an increased detection sensitivity compared to 1D-GC.89 Comparing the peak heights from 1D-GC and GC × GC, a 10-fold to 33-fold increase in signal intensity could be achieved depending on the detection sampling density.
An interesting, yet unconventional, modulation approach, termed “partial modulation”, was introduced by Cai and Stearns in 2004.90 Using a custom built pulse flow modulator, small pulses of carrier gas are repetitively injected at the interface between the 1D and 2D columns, creating either local high or low concentration pulses in the eluate departing the 1D column, which are then separated on the 2D column. A PM of 1 s was applied and 2D peak widths ∼60 ms were achieved. However, additional data processing was required to achieve a conventional appearing 2D chromatogram. Recently, this approach was improved upon by using a commercially available pulsed flow valve.91 Modulation periods as fast as 50 ms were reported with apparent 2D peak widths ranging from 12 to 45 ms. A three-step data processing procedure was introduced to convert the raw data into a format analogous to a typical GC × GC separation. The fast modulation period has potential to open up new directions in GC-based separations. Modulator Comparisons. Comparison of thermal and flow modulation did not appear in the literature until 2011 when the resolving power of a LN2 quad jet thermal modulator was compared to an Agilent Technologies CFT modulator.92 Modulation of compounds < C10 was found to be superior with the differential flow modulator, while compounds > C10 were focused and modulated better by the liquid cryogen modulator. It was concluded that both techniques were able to adequately separate light crude oil, but different instrumental parameters (e.g., flow rates) were required for the two modulators in order to achieve the separation. Likewise, using heavy petroleum cuts as a test mixture, a cryogenic modulator was compared to both a forward and reverse fill/flush differential flow modulator.93 Four different types of flow modulators as well as a CO2 dual-jet modulator were investigated. 
The reverse fill/flush flow modulator was found to reduce band broadening and enhance peak intensity compared to a forward fill/flush modulator, and thermal and flow modulation produced similar chromatograms albeit with different instrumental conditions (e.g., flow rates). More recently, using the SepSolve INSIGHT flow modulator, the GC × GC analysis of volatile organic compounds (VOCs) was achieved with a cryogenically modulated GC × GC-TOFMS method adapted for use with a reverse fill/flush flow modulator.7 Flow modulation of VOCs down to C4 was achieved, producing separations similar to those obtained using cryogenic modulation. With flow modulation, only 20% of the injected sample reached the TOFMS due to flow splitting; however, a limit of detection (LOD) of 1−10 ppb was achieved. Detectors. Significant advances in detector technology for GC × GC have also been made. For proper data collection to support advanced chemometrics, discussed later, a detector should provide ∼10−20 scans per peak,94 which becomes a key challenge for detectors associated with GC × GC systems where the 2D separation peak widths are narrow, typically ∼100−250 ms. Detectors are classified as either univariate or multivariate, depending on whether they produce a single data point at each detected time point (univariate) or a vector of data points (multivariate). An example of a univariate detector commonly used with GC × GC is the flame ionization detector (FID), which detects ions formed from the combustion of organic compounds in a hydrogen flame. This system is simple and robust and can operate at a collection frequency that is more than sufficient, but analyte identification relies solely on matching the 2D retention times between a given analyte
DOI: 10.1021/acs.analchem.7b04226 Anal. Chem. 2018, 90, 505−532
Analytical Chemistry
Review
Figure 3. (A) Mass spectra obtained via HR-TOFMS for the ions C17H19 vs C14H23S. The mass split distinguished is 0.0034 Da (C3 vs SH4). Reproduced from Byer, J.D., Siek, K., Jobst, K. Anal. Chem. 2016, 88 (2), 6101−6104 (ref 9). Copyright 2016 American Chemical Society. (B) Mass spectra of 2-hexanone produced by two different ionization energies: 70 eV (top) and 12 eV (bottom) resulting in significantly different fragmentation patterns. Reprinted from J. Chromatogr. A., Vol. 1501, Dubois, L.M., Perrault, K.A., Stefanuto, P.H., Koschinski, S., Edwards, M., McGregor, L., Focant, J.F. Thermal desorption comprehensive two-dimensional gas chromatography coupled to variable-energy electron ionization time-of-flight mass spectrometry for monitoring subtle changes in volatile organic compound profiles of human blood, pp 117−127 (ref 7). Copyright 2017, with permission from Elsevier.
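As a worked check on the mass split in Figure 3A, the minimum resolving power can be estimated from the two ion masses quoted in the text (223.1481 and 223.1515 Da). The sketch below assumes the simple definition R = m/Δm; the exact criterion used in ref 9 (e.g., fwhm versus valley definition) is not stated here, so the result should be read as an order-of-magnitude estimate.

```latex
R_{\min} \approx \frac{m}{\Delta m}
        = \frac{223.15}{223.1515 - 223.1481}
        = \frac{223.15}{0.0034}
        \approx 6.6 \times 10^{4}
```

Under this assumption the estimate agrees with a minimum resolving power of roughly 66 000 for distinguishing the C3 versus SH4 split by mass alone.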
× GC coupled to a LECO Pegasus GC-HRT+ 4D was used to distinguish the C3 (C17H19) versus SH4 (C14H23S) mass split (Δ = 0.0034 Da).9 The calculated theoretical minimum resolving power to distinguish this mass split is 66 000. With the combination of GC × GC, the EI and CI spectra collected by the HR-TOFMS at 25 000 fwhm addressed this challenging analysis of C17H19 (223.1481 Da) and C14H23S (223.1515 Da) (Figure 3A). The Pegasus GC-HRT+ 4D proved essential to provide the exact mass distinction of 3.4 mDa between the two species that enabled the identification of sulfur compounds in crude oil in order to satisfy regulatory requirements.

The BenchTOF-Select TOFMS was recently introduced by Markes International, featuring Select-eV ionization technology and the capability to conduct Tandem Ionization (TI) within one run. The TI capability collects alternating user-defined ionization energies in the range of 10−70 eV, producing both hard and soft electron ionization (EI) at a scan rate up to 100 Hz (50 Hz per ionization energy). The two ionization energies provide complementary chemical selectivity,97 with a key advantage of soft ionization being the added ability to distinguish and identify large isomeric species. The variable ionization energy feature was recently applied to distinguish isomers using energies of 70 and 14 eV.64 In another example, VOC signatures of human blood were examined at four ionization energies: 12, 14, 16, and 70 eV (with 12 and 70 eV shown in Figure 3B), in which complementary chemical information is provided by the changes in fragmentation patterns.7 The soft ionization (12 eV) demonstrates a dramatic increase in the molecular ion at m/z 100. The variable ionization source allowed for increased confidence when identifying analytes, especially for closely eluting isomers.

Vacuum Ultraviolet Absorption.
As an alternative to MS detection, the feasibility of using Vacuum Ultraviolet (VUV) multivariate detection with GC × GC has been reported.11,12 Recently, VUV Analytics has released two universal benchtop detectors, the VGA-100 and next-generation VGA-101. The former won the 2015 Research and Development Top 100 Award. These benchtop VUV detectors have a maximum 90 Hz spectrum acquisition frequency, which is sufficient to produce the needed data density for typical GC × GC peaks. The key improvements of the VGA-101 are an increased scan range, from 120−240 nm to 120−430 nm, and an increase in maximum operational temperature from 300 to 430 °C, which
peak in a sample relative to a known analyte peak in a standard. Multivariate detectors, like mass spectrometers (MS), not only provide the analyte 2D retention times but also produce spectral data used to “fingerprint” and identify the analyte based on known spectral databases. The most common implementation of MS with GC × GC is GC × GC-TOFMS, due to the high scan rate with TOFMS (up to 500 spectra/s). An appropriate MS for any application is typically judged by four key performance parameters: mass accuracy, mass resolving power, sensitivity, and acquisition speed.95 All of these performance parameters are important; however, for applications with GC × GC, acquisition speed has to be sufficient to produce the required data density for successful data analysis, discussed later, and thus limits many MS instrument types from being used with a GC × GC.

Time-of-Flight Mass Spectrometry (TOFMS). In the latest version of the Pegasus TOF series, LECO introduced a GC × GC instrumental platform (Pegasus GC-HRT+ 4D) in which the TOFMS provides both hard electron ionization (EI) and soft chemical ionization (CI) sources, facilitating the option of comparison with classic library spectra (EI) and preservation of the molecular ion (CI). The Pegasus high resolution TOFMS (HR-TOFMS) 4D, which is part of the Pegasus GC-HRT+ 4D, uses several unique advances to improve the performance of GC × GC experiments. The primary advance is the Encoded Frequent Pushing (EFP) feature, which is a method of pulsing an orthogonal accelerator multiple times per transient to improve the duty cycle of the TOFMS. This advance minimizes the off duty cycle time, reducing the ions lost during that time. A 10-fold increase in detection sensitivity was achieved because of the increase in duty cycle.
Additional benefits of the EFP include the decoding algorithm that removes noncoherent signals detected between the push pulses, which in turn reduces the background noise in the data, while the decoding process increases ion peak intensities by an order of magnitude by real time summing of the multiplexed mass spectra.96 With the combination of EFP and the proven Folded Flight Path (FFP) TOF technology, the 4D HR-TOFMS provides a resolution above 50 000 fwhm, a collection frequency up to 200 Hz and mass accuracies better than 1 ppm. Use of high-resolution MS has been demonstrated to be very important for petroleomics, due to the ability to resolve very narrow mass differences (Δm) between isobars. Indeed, a GC
Figure 4. (A) GC × GC-VUV instrument (left). After the 2D separation, the eluate exits the 2D column and enters a 10 cm flow cell. The VUV-absorption spectrum of benzene with assignment of electronic transitions (right). Reproduced from Gröger, T., Gruber, B., Harrison, D., DarajiBozorgzad, M., Mthembu, M., Sutherland, A., Zimmermann, R. Anal. Chem. 2016, 88 (6), 3031−3039 (ref 11). Copyright 2016 American Chemical Society. (B) GC × GC-VUV chromatogram using flow modulation for 37 FAMEs (left). Six C20 FAMEs (DB:0-5) VUV spectra (right). The different FAMEs have distinctive spectra which were highly reproducible. Reprinted from J. Chromatogr. A., Vol. 1497, Zoccali, M., Schug, K.A., Walsh, P., Smuts, J., Mondello, L. Flow-modulated comprehensive two-dimensional gas chromatography combined with a vacuum ultraviolet detector for the analysis of complex mixtures, pp. 135−143 (ref 12). Copyright 2017, with permission from Elsevier.
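The data-density requirement that governs detector choice in this section (∼10−20 scans across a 2D peak of ∼100−250 ms) reduces to simple arithmetic. The following is a minimal sketch of that arithmetic using the acquisition rates quoted in this review (20 Hz for qMS, 90 Hz for the VGA-100, up to 500 Hz for TOFMS); the function name and the dictionary of detectors are illustrative assumptions, not part of any vendor software.

```python
def scans_per_peak(peak_width_ms: float, acquisition_rate_hz: float) -> float:
    """Number of spectra (or data points) acquired across a peak of the given width."""
    return peak_width_ms * acquisition_rate_hz / 1000.0


# Acquisition rates (Hz) quoted in the text; the labels are only for printing.
detectors = {"qMS": 20, "VUV (VGA-100)": 90, "TOFMS": 500}

# Typical range of 2D peak widths (ms) quoted in the text.
peak_widths_ms = (100, 250)

for name, rate_hz in detectors.items():
    for width_ms in peak_widths_ms:
        n = scans_per_peak(width_ms, rate_hz)
        print(f"{name:14s} {rate_hz:4d} Hz, {width_ms:3d} ms peak: {n:6.1f} scans")
```

For a 100 ms peak this gives 2 scans at 20 Hz, 9 scans at 90 Hz, and 50 scans at 500 Hz, which is the comparison the text draws between qMS, VUV, and TOFMS detection.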
data. However, the GC × GC-TOFMS produced narrower 2D peak widths and a lower LOD; both of these results were largely attributed to the need for a nitrogen makeup gas that is added in the transfer line leading to the VGA-100. To further evaluate the potential of GC × GC-VUV, sample introduction using needle trap micro extraction (NTME) on breath gas samples was conducted. The results were compared to previously collected GC × GC-TOFMS data, with similar results obtained for the test analyte propanoic acid. Flow-modulated GC × GC was recently coupled with a VGA-100 detector to evaluate the detector performance with petrochemical samples and a mixture of 37 FAMEs; the chromatogram of the 37 FAMEs and the individual spectra of six C20 FAMEs (DB:0-5) are shown in Figure 4B.12 Because of a 2D flow rate of 12.0 mL/min, this instrumental platform did not require a makeup gas flow at the transfer line. The research highlighted the ability to use the VUV data to deconvolute coeluting peaks and the ability to achieve pseudoabsolute quantification of analytes based on known absorption cross sections of target analytes, thus eliminating the need for traditional calibration. It was noted that extensive work was dedicated to method optimization and understanding the
facilitates the analysis of higher boiling point compounds. These VUV detectors have several key attributes: a linear response and no need for calibration, excellent isomer differentiation, and claimed robust system performance with minimal maintenance requirements. The basic principles of VUV detection were recently explored by Zimmermann and co-workers,11 including a feasibility study for detection of volatile organic compounds (VOCs) and breath gases with the use of a VGA-100 in a GC × GC-VUV instrumental platform. A VUV-absorption spectrum of benzene with assignment of electronic transitions is provided to illustrate the detector performance (Figure 4A). To compare and evaluate the data, a GC × GC-TOFMS reference instrument was used. Similar experimental conditions with respect to flow rates and scan rates were implemented to achieve an objective comparison. The VGA-100 was used at the maximum scan rate of 90 Hz and the TOFMS was set to 100 Hz to closely match the data density produced. GC × GC with VUV detection for the VOCs and the breath gases was demonstrated to provide similar identification of all study compounds in a rigorous comparison to GC × GC-TOFMS
effects of pressure resistance caused by the transfer line, flow cell, and waste line with VUV detection. Peaks with good tailing factors (1.0−1.2), asymmetry factors (1.0−1.3), and an average 2D peak width of ∼600 ms were reported for FAME standards. Using the VUV database, spectral matching with good similarity was attained, with an average similarity value of 97%. The VGA-100 proved to possess satisfactory analytical performance and, owing to its high flow capability, is well suited to GC × GC.

Other Detector Methods. A very thorough evaluation of quadrupole MS (qMS) for application with GC × GC was recently reported by Armstrong, Sandra, and co-workers.98 Spectral scan rates were evaluated between 5.27 and 25.45 Hz, considerably slower than the scan rates for TOFMS, which can be up to 500 Hz. The qMS with a 20 Hz scan rate could still be used to identify analytes with a good degree of success, albeit with much lower detection sensitivity and spectral quality than with TOFMS detection. The slow scan rate of qMS does have an adverse impact on data density for narrow peaks; for example, a 100 ms wide 2D peak would at best be scanned only two times at 20 Hz, well below the suggested 10−20 scans across the typical “narrow” 2D peak.99 However, these results are encouraging and useful to the researcher who wishes to benefit from the power of GC × GC combined with an MS but does not have access to a TOFMS. Two univariate detectors have been noted in recent literature for potential application with GC × GC: the Shimadzu Barrier Ionization Discharge (BID) detector100 and the Activated Research Company Polyarc System, which is an oxidation-methanation reactor101 working in conjunction with an existing FID. The performance of the BID was directly compared to the performance of an FID operated under the same conditions.
The BID proved to be a suitable replacement for an FID, with its main advantage being higher sensitivity for most if not all compounds in the analysis.100 The main disadvantages were a reduced dynamic range, a high sensitivity to contaminants in the carrier gas, and a lack of linear response to the mass of carbon, requiring the use of internal standards. The Polyarc System is relatively new, and at this time no literature can be found in which it is used with GC × GC; however, the potential is there. In one report in which 1D-GC was used to study the response of sulfur-containing compounds, the performance of a standard FID was compared to that of the Polyarc-equipped FID, dubbed the quantitative carbon detector (QCD).101 Retaining the fast response rate of an FID, the QCD was shown to provide an improved response over the FID by 2 orders of magnitude, attributed to its equimolar carbon response even in the presence of sulfur-containing compounds.

Additional GC × GC Components and Implementations. Pressure tuning the 1D column in GC × GC was demonstrated to change the apparent polarity and selectivity of the columns.102 Two 1D columns (1D1 and 1D2) were linked serially via a microfluidic splitter device, and modulation onto the 2D column was performed by a longitudinally modulated cryogenic system (LMCS). The pressure tuning changes the relative 1D retention times and temperature of elution for the 1D separation, which in turn affects the 2D retention times. As a result of the pressure tuning, the distribution and selectivity of target analytes could be tuned, resulting in a better 2D separation. In a similar way, multiple 2D columns can be utilized, using temperature to adjust the selectivity.103 With two 2D columns (2D1 and 2D2) linked serially together, the
selectivity on the overall 2D separation could be tuned by two different instrumental approaches: by adjusting the length of the 2D1 column, or by installing the 2D1 column in a separate oven from the 2D2 column. Separation of the headspace of a coffee powder demonstrated the added value of tunable GC × GC by solving coelution of specific aroma compounds. By stepwise alteration of the selectivity of the 2D dimension, classes of compounds showing similar retention behavior could be discriminated, improving the overall GC × GC separation. An issue in GC × GC analysis is that some analytes are not volatile enough to be amenable to separation but still migrate onto the 1D column. By installing an inlet back flushing device that turns on at a specified time after injection, it is possible to prevent contaminants from migrating onto the GC column, preserving the life of the 1D column and the efficiency, N, of that column.65 Under certain circumstances, a large source of band broadening for the 1D separation is the inlet and autoinjector. Using an improved injection source can increase the N of a standard GC × GC instrument by ∼10-fold.104 Using a cryogenic trap with a resistively heated column as an injection mechanism, narrow 1D peak widths were created, resulting in an increased 1D peak capacity. While many advancements in modulation have been made in an attempt to improve 2D peak capacity, the pseudoisothermal 2D separations result in 2D peak widths increasing as a function of analyte retention, which can be overcome by temperature programming the 2D column to improve N.105 The shortest duty cycle used a PM of 4 s, which required 1.5 s to cool the column to the GC oven temperature followed by 2.5 s of heating. Temperature programming on the 2D column was demonstrated with diesel fuel, and a better separation was achieved compared to nontemperature-programmed chromatograms.
An issue with flow modulation is the high flow rates on the 2D dimension which are required to efficiently flush the sample loop. This can be overcome either by using two (or more) 2D columns106 and splitting the flow, which can lower detection sensitivity (while providing more selectivity if complementary detectors are used), or by using a multicapillary column, which provides a higher N at high carrier gas velocities.107 Ionic liquid (IL) stationary phases have continued to evolve since their introduction to the commercial market. It has been demonstrated that the chemical selectivity can be tuned by changing the substituted groups on the cationic moiety.44 In addition, it has been demonstrated that the structural features of the ILs can also change the selectivity.108 Indeed, using ILs on 1D and 2D separations, safflower oil constituents were easily separated with a relatively short run time.109 A smart μGC × μGC system has been developed by Fan and co-workers, enhancing the separation power on 2D by using multiple 2D columns with separate detectors.110 The 1D eluate is monitored in real-time and a decision is then made to route the modulated 1D eluate to one of the 2D columns for further separation. All of the 2D columns are independent of each other, and their coating, length, flow rate, and temperature can be customized for desired separation results. This idea was also implemented in a smart GC3 system.111 Continuing research in smart μGC × μGC, a portable GC × GC was reported, with four separate channels for 2D separations, while also incorporating a micropreconcentrator/injector, micro-Deans switches, and microphotoionization detectors.112 Separation times of up to 32 s were applied on the 2D separations. Using a 50 component mixture of various chemical classes, a 14 min
Figure 5. (A) Demonstration of LECO ChromaTOF software. The High Resolution Deconvolution and Nontargeted Deconvolution algorithms extract analytes from the GC × GC data using a multidimensional deconvolution engine called Fast Accurate Robust Adaptive Deconvolution (FARAD), which operates automatically and without preliminary knowledge of chromatographic peak shapes or an estimate of the number of coeluting components. Triadimenol results are displayed for the chemical ionization mass spectrum, while providing the mass accuracy of the molecular ion. (B) Demonstration of GC Image software peak matching function. Use of the interactive template matching interface allows a direct comparison of two chromatograms via a retention time transform function, mass spectral Match Factor, and qualifier CLIC constraints. Here, a flow-modulated chromatogram is compared to a thermal-modulated chromatogram. (C) Demonstration of Markes ChromSpace software. A ylang oil separation (front) is compared to a suite of fragrance allergens (back). The linear cross sections are focused on the caryophyllene peak, with the instant library hit for this compound displayed on the lower bar.
separation was demonstrated, achieving a 2D peak capacity of 430−530. An MDGC instrument has been developed to study reversible molecular interconversion through specific isolation of a diastereo- and enantiopure oxime.113 GC × GC was performed prior to heart cutting with a Deans switch. Individual pure enantiomers were then selectively cut from within the 2D separation space, cryofocused, and eluted on a 3D reactor column for E ⇌ Z isomerization under controlled oven temperature and flow. Heart cuts taken over the resulting interconversion distribution were then cryotrapped at the inlet of a 4D column, on which achiral separation allowed precise quantification of each E and Z isomer of the enantiomer. From peak areas and isomerization time, the forward and backward rate constants (kE→Z and kZ→E) were determined.
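To illustrate how rate constants can follow from peak areas and isomerization time, the block below writes out the textbook model for a reversible first-order interconversion. It is a sketch under stated assumptions, not necessarily the treatment used in ref 113: the symbol x_Z (the Z-isomer fraction obtained from the E and Z peak areas), the starting condition of pure E isomer, and first-order behavior in both directions are all assumed here.

```latex
\frac{dx_Z}{dt} = k_{E \to Z}\,(1 - x_Z) - k_{Z \to E}\,x_Z, \qquad x_Z(0) = 0

x_Z(t) = \frac{k_{E \to Z}}{k_{E \to Z} + k_{Z \to E}}
         \left[ 1 - e^{-(k_{E \to Z} + k_{Z \to E})\,t} \right]

K = \frac{k_{E \to Z}}{k_{Z \to E}} = \frac{x_{Z,\mathrm{eq}}}{1 - x_{Z,\mathrm{eq}}}
```

Under these assumptions, fitting x_Z measured at several isomerization times yields the sum of the two rate constants from the exponential term and their ratio from the plateau, from which the individual values follow.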
■
The common classes of data analysis methods include the following: deconvolution, pattern recognition, property prediction, and retention time/index modeling. All of the recent advances as they pertain to GC × GC data analysis will be presented. Before many of these data analysis methods can be applied, the GC × GC data must undergo preprocessing to facilitate accurate and meaningful implementation. Preprocessing procedures are critical in removing/minimizing artifacts in the data such as noise, baseline effects, and retention time shifts that would otherwise make performing chemometric analyses very difficult and less meaningful. Common preprocessing includes noise reduction procedures (smoothing etc.), baseline correction, normalization, and time alignment.114 There are two main approaches when handling data analysis: working from peak tables as is the case with many commercial software packages and other “user friendly” systems and working from the raw data as has been the mainstay in many of the academic advances in data analysis. For chemometric methods, additional care is often required to ensure the GC × GC instrument generates data that is amenable to many chemometric algorithms. Although the various chemometric methods differ in their applications, data requirements are fairly consistent between them. Many methods require data bilinearity or trilinearity for appropriate implementation. A suitable data density across the 1D and 2D peaks is also required. Many of these chemometric methods require a minimum amount of chemical selectivity, in both the separation (i.e., adequate resolution) and detection dimensions. We continue the section on data analysis with recent advances in commercial software packages, followed by advances in chemometric methods. Commercial Software. Implementation of multivariate data analysis approaches is provided in several commercial software packages, enabling users to process their data in order to gain more information. 
Most of these data analysis packages are integrated into the software required to run the instrument while others are a standalone package. Each software package enables users to gain more information from their chromatograms, with the majority of them using the peak table approach for data analysis. The ChromaTOF software package (LECO) is used to run their GC × GC-TOFMS and GC × GC-HR-TOFMS and also allows the user to process data with many different tools (Figure 5A). Nontarget Deconvolution and High Resolution Deconvolution allow for automated deconvolution and peak detection. After finding peaks of interest, mass spectra can be matched to the NIST database. Enhanced graphics allow the user to display information in a variety of different manners. In ChromaTOF-HRT, a deconvolution method can be applied to the high-resolution data. ChromaTOF-HRT allows for the calculation of the analyte chemical formula, which can be matched to accurate MS libraries. Mass defect plots can be readily created, which can be beneficial when looking for specific types of analytes. LECO has also introduced the Web-based “Simply GC × GC” method development tool, which assists new as well as experienced users to develop or transform existing 1D-GC methods into GC × GC methods. Another major commercial software package is GC Image, which was specifically developed for the analysis of data from GC × GC and other multidimensional separation instruments (Figure 5B). It is compatible with many of the major commercial instrument platforms. GC Image allows for the visualization and analysis of individual chromatograms, or GC
DATA ANALYSIS
GC × GC is generally applied to complex samples and coupled with multichannel detectors such as the TOFMS or VUV, resulting in enormous amounts of data. The enormity of the data is especially true when implemented for routine analyses, when large numbers of samples or replicates are analyzed. Obviously, the task for the analyst is not over once the data is collected. Indeed, the data must be transformed into useful information, via data analysis. This transformation is often provided by commercially available software; however, the analyst may find that commercially available data analysis options do not meet their needs. For these cases, many users turn to chemometric methods that take advantage of the higher order dimensionality of the GC × GC data sets. Broadly speaking, chemometric methods aim to convert chemical data into information using algorithms based upon linear algebra and statistical principles. This review will cover advances in commercial software packages as well as advances in chemometric methods. Once the GC × GC data is collected, analysts ultimately aim to identify and quantify analytes (in either a targeted or nontargeted fashion) to address a variety of goals in the context of the experimental design: to characterize samples based upon the chemical measurements, to identify key compounds indicative of a particular cause and effect stimulus, and to relate chemical measurements to other measured properties of the samples. Targeted and nontargeted analysis refers to when the analytes of interest are either known or unknown beforehand, respectively. However, there are challenges in working with such complex and dense data sets. Some challenges include analyte peak overlap, difficult and time-consuming computations, confident discovery of important (and often subtle) chromatographic differences between samples, and extraction of other meaningful information.
Optimization of the experimental and instrumental conditions is very important in dealing with these issues. However, when instrumental capabilities have been fully optimized and have become a limiting constraint, the analyst must apply other approaches to achieve the desired goals. Data analysis approaches complement and extend the value brought by chromatographic separations, providing a means to achieve the mentioned goals from collected data while simultaneously further addressing the challenges. For example, deconvolution methods are implemented to mathematically resolve coeluting analyte peaks with the end goal of identification and quantification.
Figure 6. Various data formats, analysis requirements, and associated chemometric methods. (A) Ideal data formats for a single sample, where the 2D retention times are aligned. Deconvolution is the primary goal, with common methods listed. The arrows in the unfolded data indicate the 2D retention time of the first modulated 2D peaklet, with modulations indicated by dashed lines. (B) Data formats for a single sample, illustrated (for clarity) with exaggerated 2D misalignment. Trilinear chemometric methods such as PARAFAC require sufficient alignment; otherwise, trilinearity is not required by PARAFAC2 and MCR-ALS. (C) Data formats for cross-sample analysis of multiple samples with or without misalignment on either 1D, 2D, or both dimensions. Analysis goals include deconvolution, pattern recognition, and property prediction. Misalignment may exist within a single run (2D misalignment), across multiple runs (1D misalignment), or both. Retention alignment, or data binning/tiling may be necessary, followed by chemometric analysis of the unfolded data. (D) Analysis of multiple samples can be performed one sample at a time on the separate unfolded two-way arrays. Alternatively, multiple samples can be analyzed simultaneously by unfolding the data and concatenating the samples. This can be performed in a row-wise or column-wise approach.
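The folded and unfolded data formats summarized in Figure 6 can be illustrated with array reshaping. The sketch below is a minimal NumPy example on random numbers, assuming a raw GC × GC-TOFMS run is available as a two-way array of consecutive spectra (rows) by m/z channels (columns); the function names `fold` and `unfold`, the 200 scans per modulation (e.g., a 2 s modulation period sampled at 100 Hz), and the four m/z channels are all hypothetical choices made for illustration.

```python
import numpy as np


def fold(raw: np.ndarray, scans_per_modulation: int) -> np.ndarray:
    """Fold a (total scans x m/z) stream into (modulation, 2D scan, m/z).

    Trailing scans that do not fill a whole modulation are discarded.
    """
    n_mod = raw.shape[0] // scans_per_modulation
    used = raw[: n_mod * scans_per_modulation]
    return used.reshape(n_mod, scans_per_modulation, raw.shape[1])


def unfold(cube: np.ndarray) -> np.ndarray:
    """Concatenate the modulations back into a two-way (scan x m/z) array."""
    n_mod, n_scan, n_mz = cube.shape
    return cube.reshape(n_mod * n_scan, n_mz)


rng = np.random.default_rng(0)
scans_per_modulation = 200  # assumed: 2 s modulation period at 100 Hz
n_mz = 4                    # assumed: a handful of m/z channels

# Two synthetic "samples", each with 5 full modulations plus 3 stray scans.
raw_a = rng.random((5 * scans_per_modulation + 3, n_mz))
raw_b = rng.random((5 * scans_per_modulation + 3, n_mz))

cube_a = fold(raw_a, scans_per_modulation)  # shape (5, 200, 4)
cube_b = fold(raw_b, scans_per_modulation)

# Multi-sample augmentation of the unfolded two-way arrays:
stacked_in_time = np.vstack([unfold(cube_a), unfold(cube_b)])  # shared m/z axis
side_by_side = np.hstack([unfold(cube_a), unfold(cube_b)])     # shared time axis

print(cube_a.shape, stacked_in_time.shape, side_by_side.shape)
```

Stacking along the time axis keeps a common spectral dimension across samples, whereas placing samples side by side keeps a common time axis; which of the two is appropriate depends on the chemometric method and on how well the retention times are aligned.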
Project can be used to manage multiple GC × GC runs. GC Image also comes with an Image Investigator software which allows for multivariate analyses such as sample classification, fingerprinting, and compound discovery. The most recent update allows for large data file support, mass calibration, and centroiding, as well as numerous properties for the investigation of HR-TOFMS data. ChromSpace software package (Markes International) is used to run their BenchTOF TOFMS instruments and also allows the user to further investigate their data (Figure 5C). Overlapped peaks can be deconvoluted if necessary and peak tables can be automatically created. After generation of the peak tables, mass spectra can be matched to the NIST database or to
an in-house library. The BenchTOF can collect mass spectra in Tandem Ionization (TI) mode, creating two different fragmentation patterns. Spectra from the two ionization voltages can be overlaid and the fragmentation patterns of the analyte can be observed at the same time. Each chromatogram has to be processed separately. ChromSpace comes with 14 and 12 eV libraries; however, these are not as populated as the NIST database for 70 eV mass spectra. Before or after peak identification, numerous graphical tools are available to enhance visual analysis of the data. Other commercial and free software options are readily available for advanced chemometric analyses, including script languages with chemometric capabilities and toolboxes, as well
Figure 7. (A) Deconvolution of metabolites in yeast cells using PARAFAC. (i) The raw mass spectrum and (ii) deconvoluted mass spectrum of citric acid are shown to illustrate removal of extraneous mass fragments via PARAFAC deconvolution, with the deconvoluted (iii) 1D and (iv) 2D peak profiles also provided. Reproduced from Mohler, R.E., Dombek, K.M., Hoggard, J.C., Young, E.T., Synovec, R.E. Anal. Chem. 2006, 78 (8), 2700− 2709 (ref 14). Copyright 2006 American Chemical Society. (B) Deconvolution of polycyclic aromatic hydrocarbons using MCR-ALS. (i) Modeled peak profiles from the targeted deconvolution of a chromatographic region of a heavy fuel oil sample containing 3,6-dimethylphenanthrene. The deconvoluted 2D profiles in five modulations are shown by the blue dotted line and black arrow. Shifting in the 2D retention time is estimated at 30 ms between successive modulations. (ii) Deconvoluted and (iii) standard reference mass spectra are shown for comparison. Reproduced from Parastar, H., Radović, J.R., Jalali-Heravi, M., Diez, S., Bayona, J.M., Tauler, R. Anal. Chem. 2011, 83 (24), 9289−9297 (ref 15). Copyright 2011 American Chemical Society.
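The bilinear deconvolution illustrated in Figure 7B can be sketched in a few lines. The example below is a deliberately simplified alternating least-squares loop on synthetic data, not the MCR-ALS toolbox implementation: non-negativity is imposed by clipping rather than by constrained least squares, the initial spectra are taken from two scans assumed to be nearly selective for each analyte, and all names (`mcr_als`, `C_true`, `S_true`, the peak positions and widths) are hypothetical.

```python
import numpy as np


def mcr_als(D, S0, n_iter=50):
    """Resolve D (scans x m/z) into C (scans x k) and S (m/z x k), D ~ C @ S.T.

    Non-negativity is imposed by clipping after each least-squares step,
    which is a simplification of the constraints used in dedicated software.
    """
    S = S0.copy()
    for _ in range(n_iter):
        C = np.clip(D @ np.linalg.pinv(S.T), 0.0, None)  # update elution profiles
        S = np.clip(np.linalg.pinv(C) @ D, 0.0, None).T  # update spectra
        S = S / np.linalg.norm(S, axis=0, keepdims=True)  # fix the scale ambiguity
    C = np.clip(D @ np.linalg.pinv(S.T), 0.0, None)       # profiles for final spectra
    return C, S


# Synthetic two-analyte example: overlapped Gaussian peaks, random spectra.
rng = np.random.default_rng(0)
t = np.arange(60)
C_true = np.column_stack([
    np.exp(-0.5 * ((t - 25) / 4.0) ** 2),
    np.exp(-0.5 * ((t - 35) / 4.0) ** 2),
])
S_true = rng.random((20, 2))
D = C_true @ S_true.T + 1e-3 * rng.standard_normal((60, 20))

# Initial spectral estimates from scans where each analyte dominates (assumed).
S0 = np.clip(D[[20, 40], :].T, 0.0, None)

C_est, S_est = mcr_als(D, S0)
lack_of_fit = 100.0 * np.linalg.norm(D - C_est @ S_est.T) / np.linalg.norm(D)
print(f"lack of fit: {lack_of_fit:.2f} %")
```

Because each row of D is modeled independently of its neighbors, nothing in this bilinear model requires the 2D retention times to be aligned between modulations, which is why the text notes that MCR-ALS tolerates the shifting shown in Figure 7B. The choice of initial estimates matters in practice; selective scans are assumed here only to keep the example short.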
where approaches include peak-table based, pixel-based, and global alignment strategies.117−127 Figure 6 summarizes the common data structures (unfolded versus folded, and single versus multiple samples) and issues (ideal versus misaligned) as well as the methods that can be applied to the various data manifestations. The data is often viewed as folded, either as a contour plot or pixelated. However, most chemometric methods are applied to the unfolded data, where the modulations are concatenated (Figure 6). After analysis, the data can be refolded. The chemometric methods referred to in Figure 6 are discussed in more detail below in the context of the specific analysis goal. Deconvolution. Despite the increased peak capacity and the additional separation dimension, analyte overlap is ubiquitous in GC × GC separations. Chromatographic peak deconvolution methods play an integral role in addressing this challenge to provide confident analyte identification, quantification, and peak purity assessment. Through the application of deconvolution methods, such as parallel factor analysis (PARAFAC) and multivariate curve resolution-alternating least-squares (MCR-ALS), overlapped peaks can be computationally separated on the 1D time, 2D time, and spectral dimensions in order to gain chemical information for specific analytes. With the deconvoluted peak data, analytes that were overlapped can be identified using the deconvoluted spectra and quantified with the peak profiles obtained, thereby facilitating calibration methods.115,128 Deconvolution methods can be applied in a targeted or nontargeted fashion, meaning that the analytes of interest are either known or unknown beforehand. In the application of these methods, the chromato-
as other software packages. A common script language is MATLAB by MathWorks, which is supported by numerous advanced chemometric software toolboxes.115 Three of the more prominent software toolboxes are Partial Least Squares (PLS)_Toolbox by Eigenvector Research Incorporated, the N-way toolbox from the University of Copenhagen, and the Multivariate Curve Resolution-Alternating Least Squares toolbox from the Institute of Environmental Assessment and Water Research. Each toolbox has unique but similar graphical user interfaces (GUI) that help the analyst conduct numerous data analysis methods on multivariate data such as those discussed herein and also provide the ability to create graphics to support the data analysis. Besides MATLAB, there are free, open source script languages available with their own toolboxes, including Python, R (distributed via CRAN), and GNU Octave. All programs have their own benefits and shortcomings. Other software packages include “The Unscrambler X” by CAMO Software, which allows for data input from a variety of platforms.

Raw Data Formats for Chemometrics. Other than taking a peak table approach, analysts can choose to analyze the raw data provided by the GC × GC instrument using one of the mentioned script language packages. For chemometric methods as well as the commercial software packages, the data structure becomes an important consideration for successful implementation. Issues with the data, such as retention time misalignment, must be handled either within commercial software packages or using in-house algorithms in script language packages. Alignment algorithms can be applied within a chromatogram to improve data bilinearity or trilinearity116 or applied across multiple samples and whole chromatograms,
DOI: 10.1021/acs.analchem.7b04226 Anal. Chem. 2018, 90, 505−532
Analytical Chemistry
Review
determine the best model when the number of chemical components is unknown; additionally, multiple constraints are possible in the application of MCR-ALS, including unimodality, nonnegativity, selectivity, and convergence constraints. MCR-ALS operates essentially the same as PARAFAC but is applicable to bilinear data. Specifically, Tauler and co-workers have demonstrated that MCR-ALS can be applied to chromatograms regardless of the retention time shifting in the 2D dimension,15,134 allowing for the use of a high temperature programming rate. This means that MCR-ALS is able to handle the misaligned sample case that is presented in Figure 6B without requiring 2D retention time alignment. For example, in Figure 7B, MCR-ALS is preferred over PARAFAC due to the use of a relatively high temperature programming rate (10 °C/min) and long modulation period (6 s), giving rise to a high k′ range for the 2D separations, resulting in a significant degree of retention time shifting between successive modulations (∼30 ms).15 However, large retention time shifting in both GC × GC dimensions needs to be addressed to restore bilinearity before MCR-ALS is applied.116 MCR-ALS has been demonstrated in applications of complex samples such as petroleum, metabolomics, and environmental samples, where it can be utilized for calibration, classification/prediction,135 and resolution.30,136,137 One limitation to MCR-ALS is the lack of unique solutions, also referred to as the presence of rotational ambiguities, particularly when one sample is analyzed. One way to overcome this issue is to simultaneously analyze multiple samples/data sets through matrix augmentation, resulting in an extended MCR-ALS method. Matrix augmentation can be achieved by unfolding the 2D data and concatenating the individual two-way arrays as either rows or columns (Figure 6D).

Additional Deconvolution Methods. 
Recently, independent component analysis-orthogonal signal deconvolution (ICA-OSD) has been introduced as a potentially computationally faster alternative to MCR-ALS.138 This method is a combination of ICA, a blind source separation algorithm that maximizes the statistical independence between components, and principal component analysis (PCA), discussed in a later section, which replaces the traditionally used least-squares algorithm. This method has also been applied in an automated fashion for the deconvolution of compounds in metabolomics samples.139 As mentioned previously, deconvolution methods can also be applied in an automated fashion, known as global spectral deconvolution. For the purpose of improving nontargeted deconvolution, a spectral deconvolution method based on nonnegative matrix factorization (NMF), described elsewhere,140 has been developed for automated use on high-resolution mass spectral data.141

Pattern Recognition. A major goal in performing GC × GC analyses is to discover relevant features in the data across multiple samples. Many pattern recognition applications fall under this category of cross-sample analysis, including sample classification, class comparison, chemical fingerprinting, and chemical/biomarker discovery. These methods are classified as nontargeted approaches, where the relevant analytes may be unknown beforehand, and the experimental design may be either supervised or unsupervised. Supervised methods are used when information about the samples (i.e., classes) is known beforehand and thus utilized in the experimental design and operation of the method. Unsupervised methods are applied
graphic region of interest is selected and the data is unfolded (Figure 6A). Deconvolution can be applied to a single chromatogram or multiple samples by concatenating the data (Figure 6). In certain cases, there is a data trilinearity or bilinearity requirement. An ideal chromatogram is one in which there is no retention time misalignment on 2D (and/or 1D, depending on whether one or more samples are to be analyzed), meaning that the data has sufficient trilinearity for trilinear data analysis methods (Figure 6A). The common methods and their considerations are discussed in the sections below.

PARAFAC. PARAFAC is a tensor rank decomposition method that has been applied successfully to GC × GC-TOFMS data for deconvolution, noise reduction, and calibration. PARAFAC is based on alternating least-squares decomposition and requires at least three-way data, and thus a multivariate detector that produces sufficiently trilinear data.129 The instrumental design, such as the temperature programming rate and PM selection, impacts the trilinearity of the data, particularly for compounds that are highly retained on 2D separations.13 A noticeable deviation from trilinearity may be observed as a shift in 2tR (2D retention times) between successive modulations, as illustrated in Figure 6B, unless care is taken with the separation conditions. Deviation from trilinearity is observed when a relatively long PM (∼5−8 s) and high temperature programming rates are used.13 If PARAFAC is to be applied to significantly nontrilinear data, then local data alignment must be performed on the data along the 2D dimension to restore sufficient trilinearity. Then, PARAFAC may be applied as in the ideal case of Figure 6A. 
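The trilinear model that PARAFAC assumes can be illustrated with a short numerical sketch. The code below is a minimal, unconstrained alternating least-squares loop written with NumPy and applied to a synthetic region containing two overlapped analytes; it is not the implementation used in the cited studies or in any of the toolboxes named in this review. The function names (`khatri_rao`, `parafac_als`), the peak positions, the random spectra, and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; the row index runs over (u, v) with v fastest."""
    return np.einsum("ir,jr->ijr", U, V).reshape(U.shape[0] * V.shape[0], -1)

def parafac_als(X, rank, n_iter=200, seed=0):
    """Unconstrained ALS fit of X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r]."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = rng.random((I, rank)), rng.random((J, rank)), rng.random((K, rank))
    X1 = X.reshape(I, J * K)                      # unfold along the 1D time mode
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)   # unfold along the 2D time mode
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)   # unfold along the spectral mode
    for _ in range(n_iter):
        # Update each mode in turn by least squares, holding the other two fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

def gaussian(t, center, width):
    return np.exp(-0.5 * ((t - center) / width) ** 2)

# Synthetic, perfectly trilinear data: 12 modulations x 40 points x 30 m/z channels.
t1, t2 = np.arange(12), np.arange(40)
A_true = np.column_stack([gaussian(t1, 5, 1.5), gaussian(t1, 7, 1.5)])
B_true = np.column_stack([gaussian(t2, 18, 3.0), gaussian(t2, 23, 3.0)])
C_true = np.random.default_rng(1).random((30, 2))   # hypothetical mass spectra
X = np.einsum("ir,jr,kr->ijk", A_true, B_true, C_true)

A, B, C = parafac_als(X, rank=2)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.2e}")
```

The recovered columns of `A`, `B`, and `C` correspond to the 1D profiles, 2D profiles, and spectra only up to scaling and component order. If the 2D peak positions were shifted from one modulation to the next, the synthetic array would no longer follow this model, which is the nontrilinear situation described in the text.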
Otherwise, the application of PARAFAC to significantly nontrilinear data can result in considerable errors in quantification.13 Figure 7A shows a graphical representation of PARAFAC deconvolution, demonstrating the resulting loadings plots in the 1D retention time, 2D retention time, and mass spectral dimensions. The application of PARAFAC to obtain a pure mass spectrum can increase confidence in analyte identification when compared to matching the raw mass spectrum to a standard reference spectrum (Figure 7A). PARAFAC was successfully applied in Figure 7A because a short PM (1.5 s) was used in conjunction with a reasonable temperature programming rate (8 °C/min) by Synovec and co-workers, resulting in sufficiently trilinear data.14 Recent advancements involving PARAFAC include application to GC × GC × GC data and for property prediction.130,131 With the application of PARAFAC to GC × GC × GC-TOFMS, the data structure contained four quadrilinear dimensions: 1D retention time, 2D retention time, 3D retention time, and the mass spectral dimension.130 PARAFAC has also been applied in an automated fashion for nontargeted deconvolution of large sections and even whole GC × GC-TOFMS chromatograms.132 A variation of PARAFAC, called PARAFAC2, loosens the data trilinearity requirement and is therefore better able to deal with 2D misalignment. Currently, PARAFAC2 has been applied more commonly to GC/MS data;133 however, this method is promising for application to GC × GC data in the future.16

MCR-ALS. MCR-ALS is an iterative method that initially estimates the pure chromatographic profiles and pure spectra and repetitively alternates and tests these values for convergence. Similar to PARAFAC, the number of components (rank) must be provided, and this value is often varied to
Figure 8. (A) Characterization of petrochemical base oils based on hydrocarbons and physical properties using GC × GC-TOFMS followed by PCA. (i) Total ion current (TIC) chromatogram of typical sample with various chemical classes. (ii) PCA scores and (iii) loadings plots indicate classification of the base oil groups based on carbon number and double bond equivalency. (iv) PCA scores and (v) loadings plots for the physicochemical properties provide subgroup classification of base fuels due to viscosity index, kinematic viscosity (at 40 and 100 °C), and carbon percentage of aromatic, paraffinic, and naphthenic hydrocarbons. Reproduced from Giri, A., Coutriade, M., Racaud, A., Okuda, K., Dane, J., Cody, R., Focant, J.F. Anal. Chem. 2017, 89 (10), 5395−5403 (ref 17). Copyright 2017 American Chemical Society. (B) Discovery analysis for yeast metabolomics using tile-based Fisher ratio analysis. (i) GC × GC-TOFMS separation with F-ratio hits circled. (ii) 200 F-ratio null distributions, with resulting null probability distributions (inset). (iii) F-ratio distribution of true class comparison (black) and distribution of all null probability distributions at 0.1% false discovery rate (blue). (iv) 1D peak profiles and (v) 2D peak profiles of cystathionine for six samples from each class, where the dashed lines represent the 1D and 2D tile sizes, respectively. Reprinted from J. Chromatogr. A., Vol. 1459, Watson, N.E., Parsons, B.A., Synovec, R.E. Performance evaluation of tile-based Fisher Ratio analysis using a benchmark yeast metabolome data set, pp. 101−111 (ref 18). Copyright 2016, with permission from Elsevier. (C) PLS model for prediction of fouling tendency of gas condensate feedstocks for steam cracking based on scaled GC × GC-FID chromatograms. (i) An average GC × GC-FID chromatogram after preprocessing (scaling and logarithmic transformation). (ii) PLS model after feature selection, illustrating the measured versus the cross-validated prediction of fouling. Reprinted from J. Chromatogr. A., Vol. 1501, Abrahamsson, V., Ristic, N., Franz, K., Van Geem, K. Comprehensive two-dimensional gas chromatography in combination with pixel-based analysis for fouling tendency prediction, pp. 89−98 (ref 19). Copyright 2017, with permission from Elsevier.
when there is much less known about the sample classifications a priori and this information is desired (i.e., sample class membership). Classification is used to describe pattern recognition methods that aim to classify samples into groups based on similarity, such as principal component analysis (PCA). These methods can be used to classify unknown
mass spectral dimension. In the latter case, summing to the TIC removes a significant dimension of chemical selectivity, but the number of data points is significantly reduced. Figure 8A illustrates the utilization of PCA by Focant and co-workers for the purpose of characterizing petrochemical base oils based on volatiles and physicochemical properties.17 The scores plots show the separation of the different base oil groups based on the variables (identified analytes) highlighted in the loadings plots. PCA may be implemented for feature selection, due to its powerful abilities as a data reduction method. In many studies, PCA is applied for data preprocessing prior to the application of additional chemometric methods, such as deconvolution or other pattern recognition methods.131,135,142,143 Alternatively, PCA can be applied to reduced data sets to improve class separation.7,22,143−147 PCA can also be used to classify unknown samples based on the sample grouping of known samples. Multiway PCA is the extension of PCA to high order data, such as unfolded GC × GC data.25,135,148,149

Fisher Ratio Analysis. Nontargeted discovery-based analysis is an important focus as it allows the analyst to uncover chemical features of interest that are not known beforehand. Fisher ratio (F-ratio) analysis is a supervised nontargeted approach to discover underlying differences in samples. Supervision refers to prior classification of samples/chromatograms as they relate to the experimental design. The F-ratio approach is an analysis of variance method that provides data reduction to elucidate class distinguishing features. The F-ratio is calculated as the between class variance divided by the sum of the within class variance. 
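As a concrete illustration of the calculation, the sketch below computes an F-ratio for every variable (pixel, bin, or peak-table entry) of an aligned samples × variables array. It assumes the standard one-way ANOVA normalization, in which the between-class and within-class sums of squares are each divided by their degrees of freedom; the exact normalization used in the cited work may differ. The function name `fisher_ratio` and the simulated data are hypothetical.

```python
import numpy as np

def fisher_ratio(X, labels):
    """One-way ANOVA F-ratio for each column of X (samples x variables)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n_samples, n_classes = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[labels == c]
        class_mean = Xc.mean(axis=0)
        ss_between += Xc.shape[0] * (class_mean - grand_mean) ** 2
        ss_within += ((Xc - class_mean) ** 2).sum(axis=0)
    between_var = ss_between / (n_classes - 1)
    within_var = ss_within / (n_samples - n_classes)
    return between_var / within_var

# Hand-checkable case: class means 1.5 and 3.5 give F = (4 / 1) / (1 / 2) = 8.
f_small = fisher_ratio([[1.0], [2.0], [3.0], [4.0]], [0, 0, 1, 1])

# Simulated data: six samples per class, five variables, variable 2 differs by class.
rng = np.random.default_rng(0)
labels = np.array([0] * 6 + [1] * 6)
X = rng.normal(100.0, 5.0, size=(12, 5))
X[labels == 1, 2] += 50.0
f = fisher_ratio(X, labels)
hit = int(np.argmax(f))   # index of the top entry in the hit list
```

Sorting the variables by decreasing F-ratio gives a ranked hit list of the kind discussed in this section.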
Since the F-ratio calculation is based on the variance of the signals in the GC × GC data (raw data approach) or variance in quantified analytes (peak table approach), the F-ratio method prioritizes statistical significance over absolute signal/concentration, with a higher F-ratio indicating a greater difference between the two classes. Similar to PCA, F-ratio analysis can be applied prior to additional chemometric analysis, serving to improve variable selection,7,22,143,145,146 or after other data reduction methods have been applied.142 However, if sample class membership is known a priori, F-ratio analysis is preferred over unsupervised PCA, since within class variance in the samples severely hampers successful implementation of PCA.150 A critical aspect of F-ratio analysis is the alignment of the data prior to the calculation of the F-ratio (Figure 6C). If substantial misalignment is present, this may result in false positives or false negatives. Thus, the implementation of F-ratio analysis can often take many different forms. Pixel-based F-ratio analysis means an F-ratio is calculated at every data point within the chromatogram, i.e., at every point in which a mass spectrum has been collected in the GC × GC separation. In order to maximize the performance of pixel-based F-ratio analysis, the data should be initially aligned. F-ratio analysis can also be performed on peak tables of quantified/identified analytes.7 After the peak tables are calculated, they are aligned so F-ratios can be calculated. While this is a simpler method than dealing with 2D retention time misalignment, peak tables may miss analytes present at low concentrations. Another way to deal with misalignment is binning the data based upon the 1D and 2D peak widths in relation to the observed retention time shifting. However, this will result in splitting analytes into multiple bins and lower F-ratio values. 
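The binning step and the peak splitting it can cause are easy to reproduce on a toy example. The sketch below sums a folded 2D chromatogram over fixed, non-overlapping bins using NumPy; the function name `bin_chromatogram`, the array dimensions, and the bin sizes are assumptions made for illustration, and the fixed grid shown here is plain binning rather than the tiling scheme of the cited work.

```python
import numpy as np

def bin_chromatogram(chrom, bin1, bin2):
    """Sum the signal in non-overlapping bin1 x bin2 regions of a folded 2D chromatogram.

    chrom has shape (number of modulations, points per modulation).
    Incomplete bins at the trailing edges are dropped.
    """
    n1, n2 = chrom.shape
    n1_trim, n2_trim = n1 - n1 % bin1, n2 - n2 % bin2
    trimmed = chrom[:n1_trim, :n2_trim]
    blocks = trimmed.reshape(n1_trim // bin1, bin1, n2_trim // bin2, bin2)
    return blocks.sum(axis=(1, 3))

# Toy chromatogram: one analyte eluting over modulations 2 and 3.
chrom = np.zeros((6, 8))
chrom[2, 1] = 4.0
chrom[3, 1] = 6.0

binned = bin_chromatogram(chrom, bin1=3, bin2=4)
# Bins along 1D cover modulations 0-2 and 3-5, so the analyte is split:
# binned[0, 0] holds 4.0 and binned[1, 0] holds 6.0 instead of one bin holding 10.0.
```

Because the analyte straddles a bin boundary, its signal is divided between two bins, which is the situation that lowers the F-ratio for that analyte.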
In order to address this issue with binning, a method has been introduced to use a tiling scheme which creates bins at different locations enabling the
samples into known classes (origin, etc.) based upon the chromatographic data/features. Chemical fingerprinting falls under this category, in which the unique chemical signature of an unknown sample is used to determine identifying characteristics for that sample. Class comparison methods aim to identify significant differences between samples belonging to different classes, whereby class membership is known a priori, such as Fisher ratio analysis, which can assist/result in the discovery of biomarkers or chemical markers that differentiate sample classes. Pattern recognition methods can be used for feature selection or data reduction, followed by further chemometric analysis. Typically, because these methods are applied to/across (and require) whole chromatograms of multiple samples, data misalignment in both GC × GC dimensions becomes a significant issue in implementation (Figure 6C). This problem can be dealt with using data alignment methods but has also been addressed by data reduction techniques such as binning and tiling. The common pattern recognition methods (PCA, Fisher ratio, PLS-DA, etc.) will be discussed below in terms of method fundamentals, experimental and data requirements, common issues and ways to address them, and recent advances of each. Principal Component Analysis (PCA). PCA is a popular tool utilized in many applications for the statistical separation of sample classes, providing information on the variables according to which the samples are separated (loadings) in addition to the actual separations between the classes (scores). PCA is an unsupervised nontargeted classification method that is based on rotation of the space axes to encompass the greatest chemical variance in the measurements. The new axes are called principal components and are orthogonal to each other. The scores plot shows the degree of sample clustering, shedding information on distinct grouping of samples. 
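A minimal sketch of PCA on unfolded chromatographic data is given below, assuming that each sample's 2D chromatogram is flattened into one row of a samples × variables array and that PCA is computed by singular value decomposition of the mean-centered array. The function name `pca`, the array sizes, the noise level, and the position of the class-distinguishing signal are all hypothetical; real data would first require the preprocessing and alignment steps described in this section.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of a samples x variables array via SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)                      # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T               # variables x components
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained

# Eight simulated chromatograms (5 modulations x 20 points), two classes of four.
rng = np.random.default_rng(0)
n_mod, n_pts = 5, 20
chroms = rng.normal(0.0, 0.1, size=(8, n_mod, n_pts))
chroms[4:, 1, 17] += 10.0                        # signal present only in the second class

X = chroms.reshape(8, -1)                        # unfold: one row per sample
scores, loadings, explained = pca(X)

# The largest PC1 loading points back to a position in the folded chromatogram.
top = int(np.argmax(np.abs(loadings[:, 0])))
top_position = np.unravel_index(top, (n_mod, n_pts))
```

Here the PC1 scores separate the two simulated classes, and refolding the index of the largest loading recovers the modulation and 2D position of the distinguishing signal.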
For many applications, sample classes are known beforehand: origin/source, type/variety, age/year, processing, and many other variables can be reflected in the samples. The loadings plot provides information about the basis of the groupings; in other words, the loadings can be used to determine which regions of the chromatogram (i.e., peaks/analytes) are important in differentiating the various sample classes. Since PCA determines the sources of the greatest variance, preprocessing is necessary to obtain meaningful results. In general, mean-centering and scaling are important steps before applying PCA to a data set; in chromatography, this is achieved through baseline correction and normalization. As seen in Figure 6C, PCA is often applied to the whole chromatogram across multiple samples; therefore, sufficient retention time alignment in both the 1D and 2D dimensions is very important for proper implementation. Various alignment algorithms are available, and binning is another option for dealing with issues caused by misalignment (Figure 6C). Herein, binning (or tiling) refers to summing the area of user-specified regions of 2D separation space, whereby the data density is reduced and misalignment is mitigated. Then, PCA is applied to the unfolded data in the form of a two-way array, where the number of rows is equal to the number of samples and the number of columns is equal to the number of variables (i.e., data points or bins) in the chromatogram. PCA can be applied to the whole chromatogram or a region of the chromatogram. Similarly, the analysis can be performed using all mass channels (m/z), a subset of all m/z, or the total ion current (TIC) chromatogram. In the former cases, the data must be completely unfolded along both time dimensions and the
bin to be centered on the peak resulting in a maximum F-ratio.18,151−153 After the F-ratios are calculated, a hit list is assembled indicating the 2D retention times, with the highest F-ratio indicating the statistically most prominent feature or “hit”. Often, an F-ratio cutoff in the hit list is selected to determine the point at which the analysis becomes unreliable. Although it is sometimes possible to examine the entire hit list to determine which hits are true positives and which are false positives, this is a time-consuming step. Manual selection of the cutoff is possible but may require the user to wade through the hit list until too many false positives are discovered. Two different methods have been proposed to define an F-ratio threshold cutoff. The first is using the F-critical value,22,146 and the second method takes a null distribution approach, which takes advantage of the inherent noise (instrumental and chemical) in the data set.18,152,153 Many of these issues are illustrated in Figure 8B, where the performance of tile-based F-ratio analysis is evaluated for a large yeast metabolomics data set.18 The tile size indicated in Figure 8B is selected to adequately handle the retention time shifting on 1D and 2D dimensions. Additionally, the null distribution approach is presented to assist in choosing an appropriate F-ratio threshold, below which hits are likely to be insignificant (Figure 8B). For example, the F-ratio threshold of 15 determined from the null distribution approach (chosen as the 95% confidence limit) gives a higher F-ratio cutoff value than the F-critical of 5, meaning that there are significantly fewer hits to be evaluated with the null distribution approach.18

Additional Pattern Recognition Methods. 
There are other notable pattern recognition and classification methods well suited for the analysis of GC × GC data, including partial least-squares-discriminant analysis (PLS-DA),29,154−156 linear discriminant analysis (LDA),157 ANOVA-simultaneous component analysis (ASCA),158 random forest models,147,159 cluster analysis,7,160,161 and other supervised classification methods.162 PLS-DA is a very common classification method that is based upon classical PLS where the response is categorical instead of continuous. Automated classification of GC × GC data has also been addressed by the development of advanced scripting methods based on knowledge-based rules.163

Property Prediction. There is a growing need to relate physical and/or chemical property data obtained via a variety of other methods (some analytical, some not) to the chemical information obtained from GC × GC. Often GC × GC analysis is a simpler, more robust, or quicker method than the multitude of other measurement tools and methods used to obtain the sample property data. Property prediction is also important to link the chemical information contained in the chromatographic data to the property data, in order to create a meaningful understanding of the relationship between the two data sets.

Partial Least Squares (PLS) Regression Analysis. Partial least-squares (PLS) regression analysis is the most commonly implemented chemometric method for the prediction of properties. A detailed review of the theory of PLS analysis can be found elsewhere,164 but briefly, PLS analysis attempts to correlate two data matrices (X-block and Y-block) by calculating loadings for a chosen number of latent variables (LVs). PLS analysis aims to model the covariance between these two matrices by finding the multidimensional direction in the X-block which explains the maximum multidimensional variance in the Y-block. 
In other words, PLS analysis uses analytes which have a large range of intensities across the GC × GC samples and attempts to
correlate them to the differences in the physical or chemical properties of the samples (the property data). PLS modeling provides a one-to-one (i.e., linear) correspondence between the measured and predicted property values. While the model is a linear correlation, it is possible to scale the data (e.g., logarithmically, quadratically, etc.). PLS analysis provides the following two valuable outcomes. First, using a training set of samples, a linear correspondence of the previously measured property data is modeled using the GC × GC data, so subsequent analyses of GC × GC data of new samples can be used to predict the property value of the new samples without having to directly measure these properties on the new samples. Second, since the underlying relationship between the chemical compositions of the samples is correlated to the property data, a deeper understanding of the relationship between the two measurement platforms is provided. PLS regression analysis is typically performed on the unfolded data set (Figure 6C). A potential issue for PLS analysis is run-to-run retention time shifting, which is addressed by either alignment or data binning approaches previously described. While both options have merit, binning has the additional benefit of saving computation time. Prior to the calculation of the PLS model, the X-block (GC × GC chromatograms) and Y-block (measured property values) must be mean-centered. After calculation of the PLS model, cross validation is often performed to evaluate the model. There are many cross validation methods available for this purpose. The first outcome of the PLS modeling is the regression plot showing the relationship between the GC × GC chromatograms and the predicted property. This is typically shown as the measured property on the x-axis and predicted property on the y-axis. 
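The workflow described above (mean-centering, model calculation, cross validation, and the measured versus predicted comparison) can be sketched as follows. This is a minimal single-property PLS (PLS1) fit using the NIPALS algorithm with leave-one-out cross validation, written with NumPy on simulated data; it is an illustrative assumption rather than the procedure or software used in the cited studies. The function names (`pls1_fit`, `pls1_predict`, `loo_cv`), the number of latent variables, and the simulated composition and property values are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """PLS1 by NIPALS; returns the regression vector and the means used for centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean              # mean-center X-block and Y-block
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr                            # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                               # scores for this latent variable
        tt = t @ t
        p = Xr.T @ t / tt                        # X loadings
        qk = (yr @ t) / tt                       # y loading
        Xr = Xr - np.outer(t, p)                 # deflate X
        yr = yr - qk * t                         # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)          # regression vector in the original variables
    return b, x_mean, y_mean

def pls1_predict(X, model):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean

def loo_cv(X, y, n_lv):
    """Leave-one-out cross-validated predictions."""
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        preds[i] = pls1_predict(X[i:i + 1], pls1_fit(X[keep], y[keep], n_lv))[0]
    return preds

# Simulated data: 20 samples, 60 binned variables driven by 3 underlying constituents.
rng = np.random.default_rng(0)
conc = rng.random((20, 3))                       # hypothetical constituent levels
profiles = rng.random((3, 60))                   # hypothetical binned signal per constituent
X = conc @ profiles + rng.normal(scale=0.01, size=(20, 60))
y = conc @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.01, size=20)

b, x_mean, y_mean = pls1_fit(X, y, n_lv=3)
y_cv = loo_cv(X, y, n_lv=3)
r = float(np.corrcoef(y, y_cv)[0, 1])            # measured vs cross-validated prediction
```

Plotting `y` against `y_cv` gives the type of regression plot described in the text, and the sign of each element of `b` indicates whether the corresponding variable is positively or negatively associated with the property in this simulated model.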
The other outcome of the PLS modeling is the linear regression vectors (LRVs), which can be examined to determine which compounds are positively correlated (or negatively correlated) with the property being predicted. Analytes that exhibit positive values in the LRVs are correlated with increasing the specified property while negatively correlated analytes will exhibit negative values in the LRV and are correlated with decreasing the specified property. Figure 8C demonstrates the use of PLS by Van Geem and co-workers to predict fouling tendencies of gas condensates.19 Feature selection methods were used prior to generating the PLS models.19 The cross-validated prediction of the model versus the measured fouling is shown in Figure 8C. Recently, PLS analysis was used to compare GC × GC-TOFMS to physical properties of kerosene-based rocket propulsion fuel.165 Because of the huge data density of the data set, and observed retention time shifting, the data was binned in order to mitigate the impact of the retention time shifting and reduce computation time. The resulting PLS models tested the ability to correlate the chemical information obtained from GC × GC-TOFMS to the physical properties of the fuels and also enabled conclusions to be drawn regarding how chemical composition affected those properties. Additionally, extension of PLS to high order data sets through n-way PLS (N-PLS) and unfolded PLS (U-PLS) has been achieved in GC × GC analysis applications.166,167

Additional Data Analysis Methods. Retention time and/or index modeling is an area in which a key focus is to improve the identification of unknown peaks and optimize separation conditions by exploring the interdependent relationships between the separation dimensions and parameters. The continuing prevalence of GC × GC separations in various
instrumental design and data analysis methods that address the challenges. Recent applications that highlight these advancements in several of these specialties are examined below.

Forensic. Forensic analysis encompasses a wide variety of sample types, as each piece of evidence is unique to the circumstances in which the incident occurs. Common matrixes include illicit drugs,10,136,156 human blood and tissue,22,146,179−186 arson,36,187 and explosives.188 Challenges in the field relate to the complexity of the sample, the ability to discern the origin of the evidence, as well as the need for identifying trace level analytes among a complicated background. Novel approaches to solve one or more of these will be described.

Illicit Drug Analysis. Traditionally, the identification and analysis of potential illicit drugs has been completed by 1D-GC or LC coupled to MS. However, several studies have recently applied GC × GC to the characterization of chemical profiles of schedule I drugs.10,136,156 Using a previously optimized sample extraction method, GC × GC-TOFMS was implemented to examine 3,4-methylenedioxymethamphetamine (MDMA) samples that were seized by police, with a goal of providing the MDMA chemical profiles, in an effort to determine differences in impurities that could link the samples to a source or manufacturer.156 The development of a chemometric analysis routine using ANOVA and PLS-DA (PLS_Toolbox, Eigenvector Research, Inc.) allowed for the identification of potential markers which would have been omitted from a traditional MDMA targeted method. Additionally, the method was shown to successfully identify MDMA samples based on their origin. 
Two chemometric methods, MCR-ALS and PCA, were combined in the data analysis workflow to deconvolute and classify Cannabis samples collected from several gardens.136 The goal was to test the ability of these chemometric methods, when using data collected with a GC × GC-qMS instrument, to determine similarities or differences between samples. GC Image software enabled further chemometric analysis of the data set, focused principally on selected regions of interest. PCA of these regions of interest allowed for a direct comparison of peak responses across samples.

Decomposition Profiling. Forensic investigations have relied on the use of human remains detection (HRD) dogs to find locations where a decomposing body may have been present. However, little is known about what compound or series of compounds the dogs identify. There currently exist discrepancies in the profiles reported due to variation in sample collection as well as type of sample and environment in which the decomposition occurred. Recent studies have aimed to improve knowledge about decomposition profiles as well as establish an approach that can be accepted in a court of law.22,146,179−186 A novel study using thermal desorption (TD) coupled to GC × GC-TOFMS compared pig carcass decomposition to human remains to determine whether pig carcasses are an acceptable training tool for HRD dogs.183 HRD dogs are often trained using a synthetic decomposition scent.185 However, these samples often do not contain all identified compounds present during decomposition. A total of 10 000 hits were identified over the set of chromatograms, with Fisher ratio analysis rapidly reducing the number of compounds to those most relevant to the study. Overall, 300 compounds were identified as resulting from decomposition rather than environmental or instrumental factors. Finally, PCA assisted in determining important compound classes for each stage of decomposition.
applications, such as identification of sulfur-containing compounds in crude oil, drives development in this area.168 Recently, the development of new thermodynamic models for precise predictions of retention times has focused on reducing the number of measurements/experiments to establish the model20,51 and reducing the number of parameters required to decrease complexity of the model.21 Additionally, the effects of temperature and pressure on solute partitioning have been studied and incorporated into a new thermodynamic retention model.169 Model-based approaches are also used to estimate the separation performance of particular separation conditions with the goal of designing optimized separations for various samples. Common indicators (or metrics) of separation effectiveness are the number of apparent resolved peaks and 2D separation orthogonality. Systematic optimization of various chromatographic parameters can be evaluated in the context of these separation metrics, including starting oven temperature, flow rate, temperature program ramp rate, modulation period, column lengths, and stationary phase combinations, achieved in a way that is sample-independent.58,170,171 As GC × GC separations are being applied more often for routine analysis, there has been a move toward commercialization (i.e., wide distribution) and automation of data analysis methods to deal with the large data sets. The area of data analysis that most exemplifies this direction is peak detection, identification, and quantification. 
Besides the multitude of commercial software packages with automated peak finding and quantification features, many groups are aiming to develop software to use with popular packages (MATLAB, R, GC Image, and so on) that are available to address data analysis needs, such as targeted and nontargeted analysis, in an automated way.172−176 Other Web-based and independent software packages are available for automated, nontargeted identification of compounds based on mass spectral data.177,178
■
APPLICATIONS GC × GC has gained acceptance as a useful tool to analyze complex mixtures in a variety of fields, such as but not limited to the following application categories: (1) forensic, (2) environmental, (3) fuels, (4) food, flavors, and fragrances, and (5) biological, including metabolomics and biological VOC profiling. While the peak capacity gain over traditional 1D-GC has been demonstrated, analytical challenges remain, often due to the complex matrixes and dense data structure.48 For instance, in complex mixtures, there often exists a wide concentration dynamic range that necessitates an instrumental or data analysis platform capable of identifying trace levels of relevant analyte compounds in the sample matrix. Recent instrumental advances such as HR-TOFMS are capable of improving the resolution of chemically similar analyte compounds. However, discovering these trace level compounds can be further challenged by the addition of sample matrix interferences. As a result, sampling techniques have been developed to reduce the impact of the matrix. In addition to the challenges introduced due to wide dynamic ranges and matrix interferences, GC × GC methods often require significant development to address the variety of compound classes, which is highly sample dependent. Because of the complex nature of samples being studied, identifying statistically significant trace analytes can be difficult in the dense data structure. The use of GC × GC to analyze complex samples has continued to expand to more chemical specialties, due to advancements in
DOI: 10.1021/acs.analchem.7b04226 Anal. Chem. 2018, 90, 505−532
Analytical Chemistry
Review
Figure 9. (A) GC × GC chromatogram of neat white spirits (i). PCA results identifying characteristics for each manufacturing source (ii). Reprinted from Forensic Sci. Int., Vol. 267, Sampat, A.A.S., Lopatka, M., Vivó-Truyols, G., Schoenmakers, P.J., van Asten, A.C. Toward chemical profiling of ignitable liquids with comprehensive two-dimensional gas chromatography: Exploring forensic application to neat white spirits, pp 183−195 (ref 36). Copyright 2016, with permission from Elsevier. (B) Chromatograms from (i) soybean biodiesel, (ii) unprocessed soybean biodiesel, (iii) canola biodiesel, and (iv) bovine fat biodiesel. PCA results, whereby the clustering demonstrates unique characteristics for each sample type (v). Reproduced from Mongollón, N., Ribeiro, F., Poppi, R., Quintana, A., Chávez, J., Agualongo, D., Aleme, H., Augusto, F. J. Braz. Chem. Soc. 2017, 5, 740−746 (ref 25), with permission from Sociedade Brasileira de Química. (C) (i) 70 eV contour plot of human blood and (ii) 14 eV contour plot of blood sample. Inset: zoom of 4 terpene isomers. (iii) PCA results at three time points and (iv) 95% confidence limit scores plot. Reprinted from J. Chromatogr. A., Vol. 1501, Dubois, L.M., Perrault, K.A., Stefanuto, P.H., Koschinski, S., Edwards, M., McGregor, L., Focant, J.F. Thermal desorption comprehensive two-dimensional gas chromatography coupled to variable-energy electron ionization time-of-flight mass spectrometry for monitoring subtle changes in volatile organic compound profiles of human blood, pp 117−127 (ref 7). Copyright 2017, with permission from Elsevier. (D) (i) 1D chromatogram of VOC profile of P. aeruginosa. (ii) 2D contour plot. (iii) Bubble plot indicating different compound classes and their relative intensities. Reprinted from J. Chromatogr. B., Vol. 901, Bean, H.D., Dimandja, J.M.D., Hill, J.E. Bacterial volatile discovery using solid phase microextraction and comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry, pp. 41−46 (ref 37). Copyright 2012, with permission from Elsevier.
to changes in composition over time, rather than differences in manufacturers themselves. A total of 67 analyte peaks were found to be substantial contributors to the first three PCs, prompting an attempt to identify these compounds using the mass spectral data. The results are highlighted in Figure 9A. Only 19 of these analyte peaks were confidently identified using the NIST MS library, for three reasons. First, the FID generally provided lower LODs than MS, so the “discovery” of the significant analyte peaks using the FID data set was more confident than their identification by MS. Second, some coelution remained despite the increased peak capacity of the GC × GC separation. Finally, the existence of multiple congeners of some compounds means that a single structure cannot be associated with some of the analyte peaks. Despite these limitations, the study concluded that using both FID and MS for ignitable liquids analysis has many benefits.
Environmental. With the implementation of laws that limit contamination levels in consumer products, soil, water, and air, the use of analytical chemistry in environmental analyses has trended toward recognition and quantification of toxins in a variety of matrixes.189 Traditional gas chromatographic methods such as 1D-GC remain useful when the contaminant is known but have severe shortcomings when the sample is complex and the contaminant(s) is/are unknown.190 GC × GC has recently been used in environmental applications to identify emerging contaminants in wastewater48 and to characterize carbazoles in lake sediment.190 Further studies will be discussed below, with an emphasis on instrumental and data processing methods that assist in identifying contaminants in complex samples.
Soil and Sediments. 
Lake sediment contaminated with petroleum was characterized by GC × GC-FID with a novel resistively heated modulator developed by Gorecki and coworkers.68 Petroleum hydrocarbon (PHC) contamination is a concern in the Antarctic and sub-Antarctic environments due to the impact on terrestrial and marine habitats.68 Using an eight-level calibration of diesel standards, the LOD of PHCs was determined to be 11 mg/kg, a significant improvement over the LOD of the previous 1D-GC method (64 mg/kg).191 After GC Image was used to generate peak tables, the data were exported to MATLAB for PCA. The results of the PCA suggested the sample sites could be differentiated by the amount of PHCs present as well as the proportion of polar compounds. Snow, sediment, air, and delayed petcoke were analyzed via GC × GC-TOFMS using a liquid crystalline (LC-50) column coupled to a nanostationary phase (NSP-35) column (J&K Scientific) to study the potential of delayed petcoke as a contaminant source and to develop a mass spectral library that can be used as a reference in future studies.192 The use of nontraditional columns allowed for the separation of a variety of polyaromatic hydrocarbons (PAHs). A total of 259 analyte peaks of interest were found in common when soil, snow, and air samples were compared to petcoke. These 259 peaks were classified into 21 isomeric groups. Of those 21 groups, only six were found to be statistically different among snow, sediment, and air collected within 30 km of the reference site. Because of
Ultimately, the study concluded that the porcine decomposition profile is a suitable alternative for HRD dog training, as it is closely related to that of human decomposition. Most decomposition studies have been performed over the course of a year or more, providing valuable information for locating long-deceased remains. However, in the case of mass disasters, little is known about how the scent of living humans differs from that of recently deceased humans. A recent study aimed to uncover some of these differences or similarities by investigating an early post-mortem interval (PMI) of 0−72 h.146 Thermal desorption was again the sampling method of choice, followed by GC × GC-TOFMS analysis. The column set used was previously demonstrated to be beneficial for VOC profiling: Rxi-624Sil MS (30 m × 0.25 mm i.d. × 1.40 μm df) and Stabilwax (2 m × 0.25 mm i.d. × 0.50 μm df).181 Following data collection, Fisher ratio analysis was applied as a feature selection preprocessing step to remove compounds on the hit list with Fisher ratios below the appropriate F-critical value. PCA was then performed on the remaining hits using the CAMO Unscrambler X software. Ultimately, 105 compounds of interest were identified, with nitrogen- and sulfur-containing compounds representing the two highest proportions of VOCs detected across all post-mortem samples. The results reported by Forbes and co-workers indicated that the VOC profile was highly dynamic from hour to hour, an important finding when considering the use of search dogs during a mass disaster. These studies on the use of natural training aids for HRD dogs were expanded by investigating the effect of various textiles on decomposition profiles.22 In this study, Nizio et al. used solid-phase microextraction (SPME) as the sampling method, rather than TD. A DVB/CAR/PDMS fiber was chosen due to the wide compound range collected during optimization. 
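The two-step workflow described above for the early post-mortem study (Fisher ratio analysis as a feature selection step, followed by PCA on the retained features) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the peak table is synthetic, the function names `fisher_ratios` and `pca_scores` are hypothetical, the Fisher ratio is computed as a one-way ANOVA F statistic per feature, and the threshold is the tabulated F-critical value for α = 0.05 with (2, 9) degrees of freedom. The cited study used commercial software (CAMO Unscrambler X) for PCA, so this is not a reproduction of its implementation.

```python
import numpy as np


def fisher_ratios(X, labels):
    """Per-feature Fisher ratio, computed here as the one-way ANOVA F statistic
    (between-class mean square divided by within-class mean square)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n_samples, n_classes = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[labels == c]
        class_mean = Xc.mean(axis=0)
        ss_between += Xc.shape[0] * (class_mean - grand_mean) ** 2
        ss_within += ((Xc - class_mean) ** 2).sum(axis=0)
    ms_between = ss_between / (n_classes - 1)
    ms_within = ss_within / (n_samples - n_classes)
    # Assumes nonzero within-class variance for every feature.
    return ms_between / ms_within


def pca_scores(X, n_components=2):
    """PCA via SVD of the mean-centered data; returns scores and the
    fraction of variance explained by each retained component."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return U[:, :n_components] * s[:n_components], explained[:n_components]


# Synthetic peak table: 3 sample classes (e.g., time points) x 4 replicates,
# 50 features (peak areas). Only the first 5 features vary between classes.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 4)
X = rng.normal(loc=10.0, scale=1.0, size=(12, 50))
X[:, :5] += 5.0 * labels[:, None]

f = fisher_ratios(X, labels)
f_crit = 4.26  # assumed threshold: F-critical for alpha = 0.05, df = (2, 9)
keep = f > f_crit  # feature selection: discard features below F-critical

scores, explained = pca_scores(X[:, keep], n_components=2)
print("features retained:", int(keep.sum()), "of", X.shape[1])
print("variance explained by PC1, PC2:", np.round(explained, 3))
```

In practice the F-critical value depends on the number of classes and replicates, so it would be recomputed for each experimental design rather than hard-coded as it is here.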
Data analysis followed previous studies186 and utilized Fisher ratio analysis to identify possible compounds of interest, followed by PCA. It was concluded that 100% of the cotton textiles are potentially a suitable material for HRD dog training, as they retain a large portion of the decomposition VOC profile. However, the wide variation in the profiles over time suggests that multiple samples may be necessary for training purposes.
Ignitable Liquids and Explosives. Arson and explosives have been extensively studied by forensic professionals as both can lead to significant loss of life and property.36 Although traditional methods for arson analysis often rely upon GC/MS methods,187,188 there has been increased use of GC × GC methods. Explosives analysis is challenging because many of the main constituents are thermally unstable and have low vapor pressures. Targeting the minor constituents often provides insight into the source of the explosive without concern for the thermally unstable constituents. Two instrumental platforms, GC × GC-FID and GC × GC-TOFMS, were recently utilized to study neat white spirits, a common arson accelerant, from an extensive list of distributors in The Netherlands.36 For both platforms, the column set consisted of an Agilent DB-1 (30 m × 0.25 mm i.d. × 0.5 μm df) coupled to a DB-17 (1 m × 0.100 mm i.d. × 0.2 μm df). Using data from the GC × GC-FID analyses, PCA indicated that differences in samples may be due
Figure 10. (A) Congeners of dibenzofurans detected in fire debris using an Rtx-dioxin column. Bar graph shows relative number of brominated dibenzofuran congeners in various household products. Reprinted from J. Chromatogr. A., Vol. 1369, Organtini, K.L., Myers, A.L., Jobst, K.J., Cochran, J., Ross, B., McCarry, B., Reiner, E.J., Dorman, F.L. Comprehensive characterization of the halogenated dibenzo-p-dioxin and dibenzofuran contents of residential fire debris using comprehensive two-dimensional gas chromatography coupled to time-of-flight mass spectrometry, pp 138−146 (ref 46). Copyright 2014, with permission from Elsevier. (B) Top and bottom chromatograms indicate presence of flame retardants in cat hair. (i) Chromatogram includes high-resolution mass spectra. Reprinted from J. Chromatogr. A., Brits, M., Gorst-Allman, P., Rohwer, E.R., De Vos, J., de Boer, J., Weiss, J.M. Comprehensive two-dimensional gas chromatography coupled to high resolution time-of-flight mass spectrometry for screening of organohalogenated components in cat hair (ref 47). Copyright 2017, with permission from Elsevier. (C) Chromatogram of heavy paraffinic fractions separated by high temperature GC × GC. Identified compound classes are labeled and included up to n-C60. Reprinted from J. Chromatogr. A., Vol. 1509, Potgieter, H., Bekker, R., Beigley, J., Rohwer, E. Analysis of oxidized heavy paraffinic products by high temperature comprehensive two-dimensional gas chromatography, pp. 123−131 (ref 50). Copyright 2017, with permission from Elsevier. (D) Kerosene sample analyzed with different secondary IL columns. The best performing columns are those used in chromatograms (iii) and (iv). Reproduced from Hantao, L.W., Najafi, A., Zhang, C., Augusto, F., Anderson, J.L. Anal. Chem. 2014, 86 (8), 3717−3721 (ref 44). Copyright 2014 American Chemical Society. (E) Analysis of orange juice: (i) chromatogram is from FID and indicates regions where olfactory response was reported and (ii) chromatogram represents accurate mass quadrupole time-of-flight mass spectrometry results of orange juice. Reprinted from J. Food Res. Int., Vol. 75, Mastello, R.B., Capobiango, M., Chin, S., Monteiro, M., Marriott, P.J. Identification of odor-active compounds of pasteurized orange juice using multidimensional gas chromatography techniques, pp 281−288 (ref 27). Copyright 2015, with permission from Elsevier.
identified the three main groups of birds, based on location and feeding grounds. Ultimately, 83 PCBs were present in