
Review

Multidimensional Gas Chromatography: Advances in Instrumentation, Chemometrics and Applications Sarah E. Prebihalo, Kelsey L. Berrier, Chris E. Freye, H. Daniel Bahaghighat, Nicholas R. Moore, David K. Pinkerton, and Robert E. Synovec Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.7b04226 • Publication Date (Web): 31 Oct 2017 Downloaded from http://pubs.acs.org on November 2, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 81


Analytical Chemistry

Multidimensional Gas Chromatography: Advances in Instrumentation, Chemometrics and Applications

Sarah E. Prebihalo,a Kelsey L. Berrier,a Chris E. Freye,a H. Daniel Bahaghighat,a,b Nicholas R. Moore,a David K. Pinkerton,a and Robert E. Synoveca,*

(a) Department of Chemistry, University of Washington, Box 351700, Seattle, WA 98195, U.S.A (b) Department of Chemistry and Life Science, United States Military Academy, West Point, NY 10996, U.S.A

Fundamental Review prepared for consideration to publish in Analytical Chemistry

October 13, 2017

* CORRESPONDING AUTHOR: Tel: +1-206-685-2328; fax: +1-206-685-8665 EMAIL: [email protected]

1 ACS Paragon Plus Environment


INTRODUCTION

Scope of Review. Analysis of volatile and semi-volatile analytes by gas chromatography (GC) methods is an indispensable tool in the analytical chemist’s toolbox. A myriad of fields of study rely upon the application of GC methods to address an ever-growing demand to provide useful chemical information from GC data. As the realm of GC application has expanded, there has been an evolution to develop more powerful instrumental and data analysis approaches to keep pace with the wealth of complex samples that require analysis. To address this challenge, GC instrumentation has evolved from one-dimensional gas chromatography (1D-GC) and heart-cutting approaches such as GC-GC, to instrumentation referred to broadly as multidimensional gas chromatography (MDGC), which can take on many forms. The principal form of MDGC that has gained wide implementation is comprehensive two-dimensional (2D) gas chromatography (GC × GC) as shown in Figure 1A, pioneered nearly 26 years ago by Liu and Phillips.1 When comparing 1D-GC (Figure 1B) to GC × GC (Figure 1C), the benefits of a secondary separation become evident. Blumberg and co-workers have theoretically determined that the 2D peak capacity provided by GC × GC is approximately an order of magnitude higher than the peak capacity of 1D-GC when the run times are held constant.2 This benefit is illustrated using a relatively complex sample of coffee. Adding a multivariate detector such as a time-of-flight mass spectrometer (TOFMS) provides another selective dimension of data, which may allow identification of analytes (Figure 1D). This review focuses mainly on GC × GC, with selected developments of other forms of MDGC also covered. In this regard, we focus primarily on research published since the last Fundamental Review published by Seeley and Seeley in 2013,3 with older publications covered as deemed necessary


to provide additional insight into addressing the current challenges, and to put the more recent developments into historical context.

This review is organized into the following broad categories: instrumental advances, data analysis, and applications. Within the realm of instrumental advances, there has been significant progress in the areas of modulators and detectors. The modulator is often referred to as the “heart” of the GC × GC instrument (Figure 1A) as it transfers eluate from the primary 1D column to the secondary 2D column, facilitating comprehensive separations. Modulators are often classified into three broad categories: (1) thermal,4,5 (2) valve-based,6 and (3) flow.7,8 There have also been significant instrumental advances in the area of detectors, such as high resolution time-of-flight mass spectrometry (HR-TOFMS),9,10 which provides significant gains in chemical selectivity relative to MS with unit mass resolution, as well as the vacuum ultraviolet (VUV) absorption detector.11,12 As GC × GC instrumentation has evolved to provide superior data, the desire of researchers to study more complex systems (and more samples per study) has led to the need to develop more powerful data analysis approaches while still addressing some of the obstacles of GC × GC, such as random and systematic retention time shifting13 and variation of peak intensities. Data analysis methods can be broadly divided into four categories: (1) deconvolution,14–16 (2) pattern recognition,17,18 (3) property prediction,19 and (4) retention time/index modeling.20,21 Advances in these methods have been provided by both commercial sources and in-house developed software. Finally, the ultimate goal of applying GC × GC and other MDGC technologies is to provide useful chemical information to address the needs of applications of interest. Several application categories are reviewed in some detail: (1) forensic,10,22 (2) environmental,23,24 (3) fuels,25 (4) food, flavors and fragrances,26–28 and (5)


biological, including metabolomics and biological volatile organic compound (VOC) profiling.29,30

Overview of GC × GC Basic Principles. From a chromatographic perspective, the goal of a separation is to resolve as many analytes as possible while keeping the separation run time sufficiently short. GC × GC facilitates this goal while concurrently providing a comprehensive separation, in which the separations on the two dimensions are complementary. Overlapped analyte peaks on the 1D column have the opportunity to be resolved on the 2D column (Figure 1). In turn, the ability to perform meaningful data analyses on the GC × GC data (single sample runs and data sets) depends greatly on the instrumental (separation and detection) design and performance. Implementation of chemometric data analysis approaches relies upon the GC × GC separations being suitably optimized from a variety of perspectives. For instance, the analyst generally aims to completely utilize the 2D peak capacity. The larger the 2D peak capacity, the more chemical information can be obtained. Generating narrow peak widths increases the peak capacity while also improving other peak characteristics, such as signal-to-noise ratio (S/N). However, there are other considerations when attempting to optimize the 1D and 2D peak widths, which are tied to the appropriate sampling density of the 1D separation via the modulator. The sampling density31 (also referred to as the modulation ratio, MR)32 is the 1D peak width divided by the duration of the 2D separation, i.e., the modulation period, PM. Appropriate PM selection relative to the 1D peak width is critical to provide an adequate sampling density, concurrent with maximizing the 2D peak capacity. Generally, the PM should be chosen to provide a sampling density of ~ 2 to 4. A smaller sampling density results in undersampling of the 1D separation, which causes broadening of the 1D peak widths33–35 with a concurrent reduction in 1D resolution and potential loss of quantitative precision.
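The sampling density defined above is a simple ratio, so it can be checked for any proposed method before an instrument run. The following is a minimal sketch, assuming the 1D peak width and PM are both given in seconds; the function names and the wording of the three labels are illustrative and do not come from the reviewed literature. The default limits of 2 and 4 are the target range quoted in the text.

```python
def sampling_density(peak_width_1d_s: float, modulation_period_s: float) -> float:
    """Sampling density (modulation ratio): 1D peak width divided by PM."""
    if peak_width_1d_s <= 0 or modulation_period_s <= 0:
        raise ValueError("peak width and modulation period must be positive")
    return peak_width_1d_s / modulation_period_s


def classify_sampling(density: float, low: float = 2.0, high: float = 4.0) -> str:
    """Label a sampling density against the ~2 to 4 target range from the text."""
    if density < low:
        return "undersampled"
    if density > high:
        return "above target range"
    return "within target range"


if __name__ == "__main__":
    # Illustrative (hypothetical) method settings: 1D peak width and PM in seconds.
    for width_s, pm_s in [(6.0, 2.0), (6.0, 5.0), (6.0, 1.0)]:
        density = sampling_density(width_s, pm_s)
        print(width_s, pm_s, density, classify_sampling(density))
```

With a 6 s wide 1D peak, a PM of 2 s falls within the target range, while a PM of 5 s undersamples the 1D separation.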


One approach to simultaneously optimize the 2D peak capacity and sampling density is to initially optimize the peak capacity of the 1D separation, generating narrow 1D peak widths of ~ 2 to 6 s, and then use a relatively short PM of ~ 1 to 2 s to achieve the optimal sampling density of ~ 2 to 4.14,36,37 Many applications, however, may still benefit from increasing the 2D peak capacity by using a relatively long PM (~ 5 to 8 s), which is commonly practiced.25,26,28 With this longer PM, the 1D peaks are typically either undersampled or relatively broad (10-20 s), a trade-off that reduces the 1D peak capacity and the overall 2D peak capacity. Column selection, carrier gas flow rate, and temperature programming are all critical aspects for the successful application of GC × GC, and play a major role in optimizing the 2D peak capacity. Stationary phase composition, column dimensions, and phase thickness are important factors to consider.2,38–43 A wide variety of stationary phases is available, and recent advances have added numerous column types such as ionic liquid columns44 and application-specific columns.45–47 While there are many stationary phase combinations available, all applications employ a sufficiently orthogonal column set in order to provide complementary separations on both dimensions. The most commonly used column combination has been a nonpolar 1D column and polar 2D column,36,48 while for certain applications, a polar 1D and nonpolar 2D column set49,50 may be beneficial. For GC × GC separations, column inner diameter in conjunction with phase thickness greatly affects the separation efficiency, N, and loading capacity. Normally an analyst strives to use optimum flow rates, which are dependent on column dimensions (length and inner diameter). Temperature programming rate and flow rate should be complementary to create optimal separations.42,51,52 Several reviews have covered optimization of GC × GC separations.40,53,54
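The trade-off between a short and a long PM can be made concrete with a rough estimate. The sketch below rests on two assumptions that are not taken from the text: the peak capacity of each dimension is approximated as separation time divided by peak width, and the total peak capacity is the ideal product of the two dimensions with the full PM available for the second-dimension separation. The 1800 s run time and 0.25 s second-dimension peak width are illustrative values; the 1D peak widths and PM values are picked from the ranges quoted above.

```python
def peak_capacity(separation_time_s: float, peak_width_s: float) -> float:
    """Approximate peak capacity as separation time divided by peak width."""
    if separation_time_s <= 0 or peak_width_s <= 0:
        raise ValueError("time and peak width must be positive")
    return separation_time_s / peak_width_s


def ideal_2d_peak_capacity(run_time_s: float, width_1d_s: float,
                           pm_s: float, width_2d_s: float) -> float:
    """Idealized total peak capacity: product of the two dimensions.

    Assumes the whole modulation period is used as the second-dimension
    separation window, which is an upper bound rather than a measured value.
    """
    return peak_capacity(run_time_s, width_1d_s) * peak_capacity(pm_s, width_2d_s)


if __name__ == "__main__":
    run_time_s = 1800.0   # illustrative 30 min run
    width_2d_s = 0.25     # illustrative second-dimension peak width
    short_pm = ideal_2d_peak_capacity(run_time_s, 4.0, 2.0, width_2d_s)
    long_pm = ideal_2d_peak_capacity(run_time_s, 20.0, 8.0, width_2d_s)
    print("narrow 1D peaks, short PM:", short_pm)
    print("broad 1D peaks, long PM:", long_pm)
```

Under these assumptions the long PM quadruples the second-dimension peak capacity but the broader 1D peaks cost more than that in the first dimension, so the estimated total is lower. Real separations fall short of the ideal product, so the numbers are only useful for comparing settings.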


INSTRUMENTAL ADVANCES

As the GC × GC field has evolved, many instrumental technologies, both commercial and academic, have been developed to further enhance the separation power and detection of chemical species, and many excellent reviews have been written on GC × GC.3,55,56 As previously described with regard to Figure 1, a GC × GC instrument uses an injection interface to introduce analytes in a sample of interest onto the 1D column, and then at the end of the 1D separation, a modulator transfers analyte from the 1D column to the 2D column, and after a relatively rapid 2D separation, the analytes are detected. Most of the innovation described in this section focuses on modulators and detectors or a combination of both. Modulators, often termed the “heart” of the instrument, can be broadly classified in three categories: thermal, valve-based, or flow. For this review, we adhere to this trend, and also distinguish between modulators that are commercially available and those still used only in academic settings. While other modulator design categories can be rationalized, herein three different categories are defined because each mode is inherently different. Thermal modulation uses temperature control to trap 1D eluate followed by desorption onto the 2D column. Valve-based and flow modulation are often grouped together since both use a sample loop (or channel) to collect 1D eluate before reinjection onto the 2D column using a relatively high flow rate and both the 1D and 2D flows can be independently controlled; however, the two modes are significantly different. With valve-based modulation, the flow rates for 1D and 2D are not coupled, so the two columns are operated separately. Because the flows are decoupled, method development should be inherently easier with valve-based modulation. In flow modulation, the two column flows are coupled, resulting in more challenging method development. Each of the three modulator designs has distinct advantages and disadvantages, with the common goal to isolate and transfer eluate from 1D to 2D as


efficiently and quickly as possible. The short and fast 2D separations produce very narrow peaks relative to the 1D separation, so detectors for GC × GC must have the ability to rapidly collect data in order to have an adequate number of detection intervals across the peak. In the area of detection advances, recent innovations have introduced high resolution mass spectrometers, including variable ionization to allow for more confident analyte identification, and also vacuum ultraviolet (VUV) absorption detection. To that end, there has been a large commercial expansion into the field of GC × GC detectors.

Modulators. The modulator transfers eluate from the 1D separation to the 2D separation at a user-defined time interval, termed the modulation period, PM. The modulator must repeatedly and precisely trap or collect eluate before reinjecting it onto the 2D column while still preserving the integrity of the 1D separation. There are many excellent review articles on modulators that discuss the history and development of this portion of the GC × GC field.3,56–61 The three modulator categories will now be presented: thermal, valve-based, and flow.

Thermal Modulation. Thermal modulation is the most commonly applied technique; it relies on low temperatures to trap and focus analytes as they elute from the 1D column and introduces them to the 2D column through rapid heating. There are three general types of thermal modulators: resistively heated trap, heated sweeper, and cryogenic focus, the last of which is often divided into longitudinally movable trap and jet trap. The most frequently employed is the jet trap, which uses strategically placed and timed jets of cryogenic gas or a combination of heated and cooled jets. Commercially available thermal modulators for GC × GC have been soundly demonstrated to be highly reliable; however, recent innovations have been directed toward providing simpler and more cost-effective thermal modulator designs.


A recent trend in both commercial and research sectors is the development of cryogen-free thermal modulators. Removing the need for liquid nitrogen (LN2) or carbon dioxide (CO2) simplifies thermal modulation and enables its use for a broader range of applications. To this end, both ZOEX and LECO® Corporations manufacture closed-cycle refrigerator-type modulators. For example, ZOEX Corporation has introduced the ZX2 thermal modulator, which employs a closed-cycle refrigerator/heat exchanger to create a two-stage loop modulator capable of modulating C7 and above. This design has been used in several studies of petroleum products.17,62–64 Likewise, J&X Technologies has introduced the first thermal modulator based upon commercial thermoelectric (TE) cooling; modulation can be tailored to a specific range of volatilities with the largest range of C8-C40. This modulator is relatively new but its use in a potential industrial application has been demonstrated (Figure 2A).4 A single-stage consumable-free modulator has also been developed using a coated stainless steel capillary trap.65 The desorption step is completed using a capacitive discharge power supply to resistively heat the trap, and the cooling function is accomplished using cooled ceramic pads. This modulator has been used to study honeybush tea66 and has been demonstrated to provide similar data relative to a commercially available LN2 quad jet thermal modulator in the study of environmental pollutants.67,68 A cost-effective approach for thermal modulation was introduced as a “Do-It-Yourself” interface that can be built using two low-cost components that are commercially available.69 This segmented loop-based thermal modulator is simple and requires no cryogens, and GC × GC – FID chromatograms of petroleum samples and hop oils were shown to compare favorably to those obtained by state-of-the-art, consumable-free thermal modulators.
Similarly, improvements to micro thermal modulators (µTM), used in the context of µGC × µGC, have been focused on replacing the cryogenic fluids with a solid-state TE cooler to


further allow for a portable GC × GC system.70,71 To improve upon the trapping and desorption of a µTM using a TE cooler, an air-gap spacer has been added to enhance the temperature uniformity across the device’s channels.70 It was demonstrated that this improvement resulted in a 25% increase in peak intensity of a test analyte. Zellers and co-workers improved the performance of a µTM using a TE cooler with temperature programming, combined with an ionic liquid as the stationary phase to assist in trapping the 1D eluate, resulting in narrower 2D peaks at the point of reinjection.71 Cryogen-free modulation produces 2D peaks that are similar to those obtained by cryogen (either LN2 or CO2) modulation, and thus similar peak capacities on 2D, but the drawback of cryogen-free modulators is an inability to trap highly volatile species under ~ C7. Nevertheless, cryogen-free modulators should be generally suitable for a wide variety of academic research and private sector applications. Thermal modulators using cryogens remain extremely popular. A single-stage jet trap modulator is the simplest form of thermal modulation as it does not require any moving parts; however, single-stage thermal modulation is often plagued with breakthrough (i.e., not all the 1D eluate is trapped). Gorecki and co-workers demonstrated that introducing silica wool as a restriction into the column at the point of the jet can increase the trapping efficiency by slowing the gas flow during the desorption stage to prevent breakthrough (Figure 2B).5 The analytical performance of the restricted modulator was similar to a quad jet modulator. Loop-based thermal modulation was introduced as a viable alternative to single-stage modulation due to the issues normally associated with using a single-stage system. A loop-based system uses one set of jets to trap and then desorb the eluate, which travels around the loop where it is trapped again, and then desorbed onto the 2D column.
An important parameter for modulation in GC × GC is the carrier gas velocity at the point of reinjection (desorption).72,73 For loop-based thermal modulation,


similar to other types of thermal modulation, the 1D and 2D separations are serially coupled (Figure 2C). This means that the 2D flow rate is directly influenced by the length, diameter, and flow rate of 1D. To optimize the reinjection conditions, a bleed line can be added using a Y connector at the end of 1D between the outlet of the modulator loop and 2D column. Others have used this technique to improve separation on the 2D column.74,75

Valve-Based Modulation. Instead of using thermal zones to trap eluate from the 1D separation, valve-based modulators collect the 1D eluate in a short collection loop which is then flushed and directed onto the 2D column. Valve-based modulators excel at modulating compounds across a wide range of boiling points (e.g., C1 to C40+), require minimal consumables, and have a relatively simple design. However, GC × GC instruments with a valve-based modulator are not as sensitive as those with a thermal modulator, and require high flow rates on the 2D column to quickly flush the collection loop. Valve-based modulation for GC × GC was first performed with a diaphragm valve.76 Relatively narrow 2D peak widths were obtained with excellent 2tR reproducibility, but only 10% of the eluate reached the detector, resulting in a loss in sensitivity, and the valve had a temperature limit of 175 °C. Improved performance was obtained when a sample loop was implemented and higher flow rates for the 2D separation were used to improve the modulation performance.77 Diaphragm valve technology for modulation in GC × GC has continued to improve. Recently, the temperature limit of 175 °C was overcome by a commercially available valve in which the temperature sensitive O-ring was replaced with a perfluoroelastomer-based O-ring, allowing reliable function up to 325 °C.6 Narrow, reproducible 2D peak widths and 2tRs were obtained. Under the conditions used, it was demonstrated that only ~ 30% of the injected material reached the FID, yet the detection sensitivity was ~ 8-fold higher than 1D-GC due to zone compression (Figure 2D). Using flow rates of 3 mL/min on


the 2D separation, GC × GC with high temperature diaphragm valve modulation was demonstrated to be compatible with TOFMS detection.78 These high temperature valves have also been recently implemented in comprehensive three-dimensional GC coupled with TOFMS detection (GC × GC × GC – TOFMS).79 This GC3 instrument combined a high temperature diaphragm valve and a quad jet LN2 modulator.

Flow Modulation. Flow modulation has gained increasing acceptance as a modulation method in GC × GC. Flow modulation is similar to valve-based modulation, but the two column flows are coupled. In 2006, Seeley pioneered a flow modulation design in which 100% of the 1D eluate was transferred to the 2D column and on to detection,80 and many subsequent fluidic modulator designs have been based on this original design. A comprehensive review published in 2011 provided a detailed summary of flow modulation.59 The key event that precipitated the wider adoption of flow modulation was Agilent Technologies’ introduction of the Capillary Flow Technology (CFT), and many companies have subsequently introduced their own commercial version of the flow modulator.56 SepSolve has recently introduced the INSIGHT™ modulator, a valve-based reverse fill/flush modulator (Figure 2E).7 Many academic groups have also implemented their own version of the flow modulator. The use of flow modulation raises challenges when coupled with MS detection due to the relatively high 2D flow rates applied. Mondello and co-workers developed a flow modulator using a seven-port plate in conjunction with a quadrupole MS.81 A proof-of-principle study was performed using this flow modulator with GC × GC coupled with HR-TOFMS detection.82 Lower carrier gas flow rates (6-8 mL/min) compared to the typically high gas flows (~ 20 mL/min) still provided efficient reinjection of the 1D eluate, but required a longer flush time (i.e., reinjection time) to totally clear the loop.83 Additionally, using a larger sample loop volume for


collection of the 1D eluate resulted in improved 2D peak shapes.73 Using the original flow modulator design,80 a low flow rate of 4 mL/min on the 2D separations was demonstrated in conjunction with MS detection, while providing increased detection sensitivity relative to using 1D-GC.84 Similarly, using a high speed Deans Switch, constructed from commercially available parts, a low duty cycle modulator was demonstrated using a flow rate of 2 mL/min for the 2D separation.85 Narrow 2D peak widths and reproducible 2tR were achieved, but only ~ 10% of the 1D eluate reached the detector due to the low duty cycle, resulting in decreased sensitivity. Using this same high speed Deans Switch, a new approach to GC × GC modulation, termed “pattern modulation,” was conceived, whereby a pattern of primary eluate, rather than a single pulse per modulation, was transferred to the 2D column.86 The data obtained were readily decomposed into a traditional GC × GC chromatogram using Lucy-Richardson deconvolution, and the resulting peaks were 16-36 times more intense and the 2D peak widths were 40-69% narrower than those obtained by traditional flow modulation. Additionally, a multi-mode fluidic modulator has been introduced by Seeley and co-workers that is capable of performing heart-cutting (GC – GC), low duty cycle GC × GC, and total transfer GC × GC.8 Because of the high flow rates on the 2D separations, often ~ 20 mL/min, flow modulation is incompatible with many mass spectrometers unless the flow reaching the MS is reduced. This can be done in two ways: either split some of the flow into a bleed column, or split the 2D exit flow into two detectors. The latter allows various combinations of detectors to be used in tandem, tailored to meet specific analytical goals. Armstrong, Sandra, and co-workers have shown that simultaneous detection using FID and quadrupole mass spectrometry (qMS) is very advantageous because the FID readily provides a reliable


quantitative analysis, while the qMS spectral scan speed is sufficient to enable more confident analyte identification.87 The performance of forward and reverse fill/flush flow modulation has been compared for a wide range of concentrations.88 At low concentrations, the forward and reverse fill/flush were nearly identical, producing similar 2D peak widths and detection sensitivity. However, at high analyte concentrations, the reverse fill/flush was found to produce narrower 2D peak widths and thus a higher detection sensitivity. Thus, it was concluded that a reverse fill/flush is the preferred method because it is capable of handling a wider concentration dynamic range. Additionally, using an Agilent CFT modulator, it has been demonstrated that flow modulated GC × GC can result in an increased detection sensitivity compared to 1D-GC.89 Comparing the peak heights from 1D-GC and GC × GC, a 10-fold to 33-fold increase in signal intensity could be achieved depending on the detection sampling density. An interesting, yet unconventional, modulation approach, termed “partial modulation”, was introduced by Cai and Stearns in 2004.90 Using a custom-built pulse flow modulator, small pulses of carrier gas are repetitively injected at the interface between the 1D and 2D columns, creating either local high or low concentration pulses in the eluate departing the 1D column, which are then separated on the 2D column. A PM of 1 s was applied and 2D peak widths of ~ 60 ms were achieved. However, additional data processing was required to achieve a conventional-appearing 2D chromatogram. Recently, this approach was improved upon by using a commercially available pulsed flow valve.91 Modulation periods as fast as 50 ms were reported with apparent 2D peak widths ranging from 12 to 45 ms. A three-step data processing procedure was introduced to convert the raw data into a format analogous to a typical GC × GC separation.
The fast modulation period has potential to open up new directions in GC-based separations.
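Both pattern modulation and partial modulation, described above, depend on post-processing to recover a conventional chromatogram from the raw detector signal. The Lucy-Richardson deconvolution named for pattern modulation can be illustrated in one dimension. The sketch below is a generic textbook form of the algorithm written with NumPy, not the processing used in the cited work, whose details are not given here. The four-pulse injection pattern, the Gaussian peak, and the function names are all hypothetical.

```python
import numpy as np

# Hypothetical injection pattern: four equal pulses inside a 41-point window.
PATTERN = np.zeros(41)
PATTERN[[0, 8, 28, 40]] = 1.0


def richardson_lucy_1d(observed, kernel, iterations=200, eps=1e-12):
    """Richardson-Lucy deconvolution of a non-negative 1D signal.

    `kernel` must have odd length so that mode="same" convolution is centered,
    and `observed` must be at least as long as `kernel`.
    """
    observed = np.asarray(observed, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    if kernel.size % 2 == 0:
        raise ValueError("kernel length must be odd")
    kernel = kernel / kernel.sum()
    mirrored = kernel[::-1]
    # Column sums of the truncated convolution operator (below 1 only near the edges).
    norm = np.convolve(np.ones_like(observed), mirrored, mode="same")
    estimate = np.full_like(observed, max(observed.mean(), eps))
    for _ in range(iterations):
        predicted = np.convolve(estimate, kernel, mode="same")
        ratio = observed / np.maximum(predicted, eps)
        correction = np.convolve(ratio, mirrored, mode="same")
        estimate = estimate * correction / np.maximum(norm, eps)
    return estimate


def demo():
    """Blur one synthetic Gaussian peak with PATTERN, then deconvolve it."""
    t = np.arange(400)
    true_peak = np.exp(-0.5 * ((t - 200) / 3.0) ** 2)
    observed = np.convolve(true_peak, PATTERN / PATTERN.sum(), mode="same")
    recovered = richardson_lucy_1d(observed, PATTERN)
    return true_peak, observed, recovered


if __name__ == "__main__":
    true_peak, observed, recovered = demo()
    print("observed maximum:", observed.max())
    print("recovered maximum:", recovered.max())
```

The multiplicative update keeps the estimate non-negative, which suits detector signals. Real data would also need the true injection pattern and some handling of noise, neither of which is modeled in this sketch.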


Modulator Comparisons. Comparison of thermal and flow modulation did not appear in the literature until 2011 when the resolving power of a LN2 quad jet thermal modulator was compared to an Agilent Technologies CFT modulator.92 Modulation of compounds C10 were focused and modulated better by the liquid cryogen modulator. It was concluded that both techniques were able to adequately separate light crude oil but different instrumental parameters (e.g., flow rates) were required for the two modulators in order to achieve the separation. Likewise, using heavy petroleum cuts as a test mixture, a cryogenic modulator was compared to both a forward and reverse fill/flush differential flow modulator.93 Four different types of flow modulators as well as a CO2 dual-jet modulator were investigated. The reverse fill/flush flow modulator was found to reduce band broadening and enhance peak intensity compared to a forward fill/flush modulator, and thermal and flow modulation produced similar chromatograms albeit with different instrumental conditions (e.g., flow rates). More recently, using the SepSolve INSIGHT™ flow modulator, the GC × GC analysis of volatile organic compounds (VOCs) was achieved with a cryogenically modulated GC × GC – TOFMS method adapted for use with a reverse fill/flush flow modulator.7 Flow modulation of VOCs down to C4 was achieved, producing separations similar to those obtained using cryogenic modulation. With flow modulation, only 20% of the injected sample reached the TOFMS due to flow splitting; however, a limit of detection (LOD) of 1-10 ppb was achieved.

Detectors. Significant advances in detector technology for GC × GC have also been made. For proper data collection to support advanced chemometrics, discussed later, a detector should provide ~ 10-20 scans per peak,94 which becomes a key challenge for detectors associated


with GC × GC systems where the 2D separation peak widths are narrow, typically ~ 100-250 ms. Detectors are classified as either univariate or multivariate, depending on whether they produce a single data point at each detected time point (univariate) or a vector of data points (multivariate). An example of a univariate detector commonly used with GC × GC is the flame ionization detector (FID), which detects ions formed from the combustion of organic compounds in a hydrogen flame. This system is simple, robust, and can operate at a collection frequency that is more than sufficient, but analyte identification relies solely on matching the 2D retention times of a given analyte peak in a sample to those of a known analyte peak in a standard. Multivariate detectors, like mass spectrometers (MS), provide not only the analyte 2D retention times but also spectral data used to “fingerprint” and identify the analyte based on known spectral databases. The most common implementation of MS with GC × GC is GC × GC-TOFMS, due to the high scan rate of TOFMS (up to 500 spectra/s). An appropriate MS for any application is typically judged by four key performance parameters: mass accuracy, mass resolving power, sensitivity, and acquisition speed.95 All of these performance parameters are important; however, for applications with GC × GC, acquisition speed has to be sufficient to produce the required data density for successful data analysis, discussed later, and this requirement excludes many MS instrument types from use with GC × GC.

Time-of-Flight Mass Spectrometry (TOFMS). In the latest version of the Pegasus® TOF series, LECO® introduced a GC × GC instrumental platform (Pegasus® GC-HRT+ 4D) in which the TOFMS provides both hard electron ionization (EI) and soft chemical ionization (CI) sources, facilitating the option of comparison with classic library spectra (EI) and preservation of the molecular ion (CI). The Pegasus® high resolution TOFMS (HR-TOFMS) 4D, which is part of the Pegasus® GC-HRT+ 4D, uses several unique advances to improve the performance of GC ×


GC experiments. The primary advance is the Encoded Frequent Pushing™ (EFP™) feature, which is a method of pulsing an orthogonal accelerator multiple times per transient to improve the duty cycle of the TOFMS. This advance minimizes the off duty cycle time, reducing the number of ions lost during that time. A 10-fold increase in detection sensitivity was achieved because of the increase in duty cycle. Additional benefits of the EFP™ include the decoding algorithm, which removes non-coherent signals detected between the push pulses and thereby reduces the background noise in the data, while the decoding process increases ion peak intensities by an order of magnitude through real time summing of the multiplexed mass spectra.96 With the combination of EFP™ and the proven Folded Flight Path® (FFP®) TOF technology, the 4D HR-TOFMS provides a resolution above 50,000 FWHM, a collection frequency up to 200 Hz, and mass accuracies better than 1 ppm. Use of high resolution MS has been demonstrated to be very important for petroleomics, due to the ability to resolve very narrow mass differences (∆m) between isobars. Indeed, a GC × GC coupled to a LECO® Pegasus® GC-HRT+ 4D was used to distinguish the C3 (C17H19) versus SH4 (C14H23S) mass split (∆m = 0.0034 Da).9 The calculated theoretical minimum resolving power needed to distinguish this mass split is 66,000. In combination with the GC × GC separation, the EI and CI spectra collected by the HR-TOFMS at 25,000 FWHM addressed this challenging analysis of C17H19 (223.1481 Da) and C14H23S (223.1515 Da) (Figure 3A). The Pegasus® GC-HRT+ 4D proved essential to provide the exact mass distinction of 3.4 mDa between the two species, which enabled the identification of sulfur compounds in crude oil in order to satisfy regulatory requirements. The BenchTOF-Select™ TOFMS was recently introduced by Markes International, featuring Select-eV® ionization technology and the capability to conduct Tandem Ionisation® (TI) within one run.
The TI capability collects alternating user-defined ionization energies in the


range of 10-70 eV, producing both hard and soft electron ionization (EI) at a scan rate up to 100 Hz (50 Hz per ionization energy). The two ionization energies provide complementary chemical selectivity,97 with a key advantage of soft ionization being the added ability to distinguish and identify large isomeric species. The variable ionization energy feature was recently applied to distinguish isomers using energies of 70 eV and 14 eV.64 For example, VOC signatures of human blood were examined at four ionization energies: 12, 14, 16 and 70 eV (with 12 and 70 eV shown in Figure 3B), in which complementary chemical information is provided by the changes in fragmentation patterns.7 The soft ionization (12 eV) demonstrates a dramatic increase in the molecular ion at m/z 100. The variable ionization source allowed for increased confidence when identifying analytes, especially for closely eluting isomers.

Vacuum Ultraviolet Absorption. As an alternative to MS detection, the feasibility of using Vacuum Ultraviolet (VUV) multivariate detection with GC × GC has been reported.11,12 VUV Analytics recently released two universal benchtop detectors, the VGA-100 and the next generation VGA-101. The former won the 2015 Research and Development Top 100 Award. These benchtop VUV detectors have a maximum spectrum acquisition frequency of 90 Hz, which is sufficient to produce the needed data density for typical GC × GC peaks. The key improvements of the VGA-101 are an increased scan range, from 120-240 nm to 120-430 nm, and an increase in maximum operational temperature from 300 to 430 °C, which facilitates the analysis of higher boiling point compounds. These VUV detectors have several key attributes: a linear response and no need for calibration, excellent isomer differentiation, and claimed robust system performance with minimal maintenance requirements.
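The data-density guideline quoted above (roughly 10-20 scans across a 2D peak that is typically 100-250 ms wide) comes down to simple arithmetic: scans per peak equals peak width times acquisition rate. The short Python sketch below tabulates this for the acquisition rates mentioned in the text. The function names and the dictionary labels are illustrative assumptions and are not taken from any instrument software.

```python
# Illustrative sketch only: names are hypothetical, not from vendor software.

def scans_per_peak(peak_width_ms: float, rate_hz: float) -> float:
    """Number of spectra collected across a peak of the given width."""
    return peak_width_ms / 1000.0 * rate_hz


def min_rate_hz(peak_width_ms: float, scans_needed: float = 10.0) -> float:
    """Lowest acquisition rate that still yields `scans_needed` scans per peak."""
    return scans_needed / (peak_width_ms / 1000.0)


# Maximum acquisition rates quoted in the text (Hz).
rates_hz = {
    "VUV": 90,
    "TI-TOFMS": 100,   # total rate; 50 Hz per ionization energy
    "HR-TOFMS": 200,
    "TOFMS": 500,
}

for name, rate in rates_hz.items():
    for width_ms in (100, 250):  # typical 2D peak widths from the text
        n = scans_per_peak(width_ms, rate)
        print(f"{name:9s} {rate:4d} Hz, {width_ms:3d} ms peak: {n:6.1f} scans")
```

The inverse helper `min_rate_hz` answers the complementary question of how fast a detector must acquire to reach a chosen number of scans across a peak of known width.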
The basic principles of VUV detection were recently explored by Zimmerman and co-workers,11 including a feasibility study for detection of volatile organic compounds (VOCs) and


breath gases with the use of a VGA-100 in a GC × GC-VUV instrumental platform. A VUV absorption spectrum of benzene with assignment of electronic transitions is provided to illustrate the detector performance (Figure 4A). To compare and evaluate the data, a GC × GC-TOFMS reference instrument was used. Similar experimental conditions with respect to flow rates and scan rates were implemented to achieve an objective comparison. The VGA-100 was used at its maximum scan rate of 90 Hz and the TOFMS was set to 100 Hz to closely match the data density produced. GC × GC with VUV detection for the VOCs and the breath gases was demonstrated to provide similar identification of all study compounds in a rigorous comparison to GC × GC-TOFMS data. However, the GC × GC-TOFMS produced narrower 2D peak widths and a lower LOD; both of these results were largely attributed to the need for a nitrogen makeup gas that is added in the transfer line leading to the VGA-100. To further evaluate the potential of GC × GC-VUV, sample introduction using needle trap micro extraction (NTME) on breath gas samples was conducted. The results were compared to previously collected GC × GC-TOFMS data, with similar results obtained for the test analyte propanoic acid. Flow modulated GC × GC was recently coupled with a VGA-100 detector to evaluate the detector performance with petrochemical samples and 37 FAMEs; individual spectra of six C20 FAMEs with DB:0-5 are shown (Figure 4B).12 Due to a 2D flow rate of 12.0 mL/min, this instrumental platform did not require a makeup gas flow at the transfer line. The research highlighted the ability to use the VUV data to deconvolute co-eluting peaks and the ability to achieve pseudo-absolute quantification of analytes based on known absorption cross-sections of target analytes, thus eliminating the need for traditional calibration.
It was noted that extensive work was dedicated to method optimization and to understanding the effects of pressure resistance caused by the transfer line, flow cell, and waste line with VUV detection. Peaks with good


tailing factors (1.0-1.2), asymmetry factors (1.0-1.3), and an average 2D peak width of ~ 600 ms were reported for FAME standards. Using the VUV database, spectral matching with an average similarity value of 97% was attained. The VGA-100 proved to possess satisfactory analytical performance and, owing to its high flow capability, is well suited to GC × GC.

Other Detector Methods. A very thorough evaluation of quadrupole MS (qMS) for application with GC × GC was recently reported by Armstrong, Sandra and co-workers.98 Spectral scan rates were evaluated between 5.27 and 25.45 Hz, considerably slower than the scan rates for TOFMS, which can be up to 500 Hz. The qMS with a 20 Hz scan rate could still be used to identify analytes with a good degree of success, albeit with much lower detection sensitivity and spectral quality than with TOFMS detection. The slow scan rate of qMS does have an adverse impact on data density for narrow peaks; for example, a 100 ms wide 2D peak would be scanned at best two times at 20 Hz, well below the suggested 10-20 scans across the typical “narrow” 2D peak.99 However, these results are encouraging and useful to the researcher who wishes to benefit from the power of GC × GC combined with MS but does not have access to a TOFMS. Two univariate detectors have been noted in recent literature for potential application with GC × GC: the Shimadzu Barrier Ionization Discharge (BID) detector,100 and the Activated Research Company® Polyarc® System, which is an oxidation-methanation reactor101 working in conjunction with an existing FID. The performance of the BID was directly compared to the performance of an FID operated under the same conditions. The BID proved to be a suitable replacement for an FID, with its main advantage being higher sensitivity for most if not all compounds in the analysis.100 The main disadvantages were a reduced dynamic range, a high


sensitivity to contaminants in the carrier gas, and a lack of linear response to the mass of carbon, requiring the use of internal standards. The Polyarc® System is relatively new, and at this time no literature can be found in which it is used with GC × GC; however, the potential is there. In one report, in which 1D-GC was used to study the response of sulfur-containing compounds, the performance of a standard FID was compared to that of the Polyarc®-equipped FID, dubbed the quantitative carbon detector (QCD).101 Retaining the fast response rate of an FID, the QCD was shown to provide an improved response over the FID by two orders of magnitude, attributed to its equimolar carbon response even in the presence of sulfur-containing compounds.

Additional GC×GC Components and Implementations. Pressure tuning the 1D column in GC × GC was demonstrated to change the apparent polarity and selectivity of the columns.102 Two 1D columns (1D1 and 1D2) were linked serially via a microfluidic splitter device, and modulation onto the 2D column was performed by a longitudinally modulated cryogenic system (LMCS). The pressure tuning changes the relative 1D retention times and temperature of elution for the 1D separation, which in turn affects the 2D retention times. As a result of the pressure tuning, the distribution and selectivity of target analytes could be tuned, resulting in a better 2D separation. In a similar way, multiple 2D columns can be utilized, using temperature to adjust the selectivity.103 With two 2D columns (2D1 and 2D2) linked serially together, the selectivity of the overall 2D separation could be tuned by two different instrumental approaches: by adjusting the length of the 2D1 column, or by installing the 2D1 column in a separate oven from the 2D2 column. Separation of the headspace of a coffee powder demonstrated the added value of tunable GC × GC by resolving the co-elution of specific aroma compounds.
By stepwise alteration of the selectivity of the 2D dimension, classes of compounds showing similar retention behavior could be discriminated, improving the overall GC × GC separation.


An issue in GC × GC analysis is that some analytes are not volatile enough to be amenable to separation but still migrate onto the 1D column. By installing an inlet back flushing device that turns on at a specified time after injection, it is possible to prevent such contaminants from migrating onto the GC column, preserving the lifetime of the 1D column and its efficiency, N.65 Under certain circumstances, a large source of band broadening for the 1D separation is the inlet and autoinjector. Using an improved injection source can increase the N of a standard GC × GC instrument by ~ 10-fold.104 Using a cryogenic trap with a resistively heated column as an injection mechanism, narrow 1D peak widths were created, resulting in an increased 1D peak capacity. While many advancements in modulation have been made in an attempt to improve 2D peak capacity, the pseudo-isothermal 2D separations result in 2D peak widths that increase as a function of analyte retention; this can be overcome by temperature programming the 2D column to improve N.105 The shortest duty cycle used a PM of 4 s, which required 1.5 s to cool the column to the GC oven temperature followed by 2.5 s of heating. Temperature programming on the 2D column was demonstrated with diesel fuel, and a better separation was achieved compared to non-temperature programmed chromatograms. An issue with flow modulation is the high flow rates on the 2D dimension, which are required to efficiently flush the sample loop. This can be overcome either by using two (or more) 2D columns106 and splitting the flow, which can lower detection sensitivity (while providing more selectivity if complementary detectors are used), or by using a multicapillary column, which provides a higher N at high carrier gas velocities.107

Ionic liquid (IL) stationary phases have continued to evolve since their introduction to the commercial market. It has been demonstrated that the chemical selectivity can be tuned by


changing the substituted groups on the cationic moiety.44 In addition, it has been demonstrated that the structural features of the ILs can also change the selectivity.108 Indeed, using ILs on 1D and 2D separations, safflower oil constituents were easily separated with a relatively short run time.109

A smart µGC × µGC system has been developed by Fan and co-workers, enhancing the separation power on 2D by using multiple 2D columns with separate detectors.110 The 1D eluate is monitored in real time and a decision is then made to route the modulated 1D eluate to one of the 2D columns for further separation. All of the 2D columns are independent of each other, and their coating, length, flow rate and temperature can be customized for desired separation results. This idea was also implemented in a smart GC3 system.111 Continuing research in smart µGC × µGC, a portable GC × GC was reported with four separate channels for 2D separations, while also incorporating a micropreconcentrator/injector, micro-Deans switches, and microphotoionization detectors.112 Separation times of up to 32 s were applied on the 2D separations. Using a 50 component mixture of various chemical classes, a 14 min separation was demonstrated, achieving a 2D peak capacity of 430−530.

An MDGC instrument has been developed to study reversible molecular interconversion through specific isolation of a diastereo- and enantiopure oxime.113 GC × GC was performed prior to heart cutting with a Deans switch. Individual pure enantiomers were then selectively cut from within the 2D separation space, cryofocused, and eluted on a 3D reactor column for E ⇌ Z isomerization under controlled oven temperature and flow. Heart cuts taken over the resulting interconversion distribution were then cryotrapped at the inlet of a 4D column, on which achiral separation allowed precise quantification of each E and Z isomer of the


enantiomer. From peak areas and isomerization time, the forward and backward rate constants (kE→Z and kZ→E) were determined.
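As a worked illustration of how rate constants can follow from peak areas and isomerization time, the Python sketch below assumes a simple reversible first-order model for E ⇌ Z, starting from the pure E isomer. This model, the assumption of equal detector response for both isomers, the function names, and the numerical peak areas are all assumptions made for illustration; the cited study may have used a different kinetic treatment.

```python
import math


def z_fraction(area_e, area_z):
    """Fraction of the Z isomer from peak areas (assumes equal detector response)."""
    return area_z / (area_e + area_z)


def rate_constants(x_z_t, x_z_eq, t_s):
    """Return (k_E_to_Z, k_Z_to_E) in 1/s for E <=> Z, starting from pure E.

    Assumed model (reversible first order):
        x_Z(t) = x_Z_eq * (1 - exp(-(k_f + k_b) * t)),  x_Z_eq = k_f / (k_f + k_b)
    """
    if not 0.0 < x_z_t < x_z_eq < 1.0:
        raise ValueError("need 0 < x_Z(t) < x_Z_eq < 1")
    k_sum = -math.log(1.0 - x_z_t / x_z_eq) / t_s   # k_f + k_b
    k_f = x_z_eq * k_sum
    return k_f, k_sum - k_f


# Hypothetical peak areas (arbitrary units); not data from the cited study.
x_t = z_fraction(area_e=80.0, area_z=20.0)    # after 600 s on the reactor column
x_eq = z_fraction(area_e=55.0, area_z=45.0)   # after a long residence time
k_f, k_b = rate_constants(x_t, x_eq, t_s=600.0)
print(f"k(E->Z) = {k_f:.2e} 1/s, k(Z->E) = {k_b:.2e} 1/s")
```

With several isomerization times available, a least-squares fit of the same model to all time points would normally be preferred over this single-point estimate.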

DATA ANALYSIS

GC × GC is generally applied to complex samples and coupled with multichannel detectors such as the TOFMS or VUV, resulting in enormous amounts of data. This is especially true when the technique is implemented for routine analyses, in which large numbers of samples or replicates are analyzed. The task for the analyst is not over once the data is collected: the data must be transformed into useful information via data analysis. This transformation is often provided by commercially available software; however, the analyst may find that commercially available data analysis options do not meet their needs. For these cases, many users turn to chemometric methods that take advantage of the higher order dimensionality of the GC × GC data sets. Broadly speaking, chemometric methods aim to convert chemical data into information using algorithms based upon linear algebra and statistical principles. This review will cover advances in commercial software packages, as well as advances in chemometric methods.

Once the GC × GC data is collected, analysts ultimately aim to identify and quantify analytes (in either a targeted or non-targeted fashion) to address a variety of goals in the context of the experimental design: to characterize samples based upon the chemical measurements, to identify key compounds indicative of a particular cause and effect stimulus, and to relate chemical measurements to other measured properties of the samples. Targeted and non-targeted analysis refer to cases in which the analytes of interest are known or unknown beforehand, respectively. However, there are challenges in working with such complex and dense data sets. Some challenges include analyte peak overlap, difficult and time-consuming computations,


confident discovery of important (and often subtle) chromatographic differences between samples, and extraction of other meaningful information. Optimization of the experimental and instrumental conditions is very important in dealing with these issues. However, when instrumental capabilities have been fully optimized and have become a limiting constraint, the analyst must apply other approaches to achieve the desired goals. Data analysis approaches complement and extend the value brought by chromatographic separations, providing a means to achieve the mentioned goals from collected data while simultaneously further addressing the challenges. For example, deconvolution methods are implemented to mathematically resolve coeluting analyte peaks with the end goal of identification and quantification. The common classes of data analysis methods include the following: deconvolution, pattern recognition, property prediction, and retention time/index modeling. All of the recent advances as they pertain to GC × GC data analysis will be presented. Before many of these data analysis methods can be applied, the GC × GC data must undergo preprocessing to facilitate accurate and meaningful implementation. Preprocessing procedures are critical in removing/minimizing artifacts in the data such as noise, baseline effects, and retention time shifts that would otherwise make performing chemometric analyses very difficult and less meaningful. Common preprocessing includes noise reduction procedures (smoothing etc.), baseline correction, normalization, and time alignment.114 There are two main approaches when handling data analysis: working from peak tables as is the case with many commercial software packages and other “user friendly” systems, and working from the raw data as has been the mainstay in many of the academic advances in data analysis. 
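The preprocessing steps named above (noise reduction, baseline correction, normalization, and time alignment) can be sketched for a single one-dimensional signal as follows. This is a minimal NumPy illustration under simplifying assumptions: a moving-average smoother, a constant baseline estimated from a low percentile, normalization to unit total signal, and alignment by an integer shift that maximizes overlap with a reference. The function names are hypothetical, and published GC × GC alignment and baseline algorithms are considerably more elaborate.

```python
import numpy as np


def smooth(signal, window=5):
    """Moving-average noise reduction; output has the same length as the input."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")


def subtract_baseline(signal, percentile=5.0):
    """Remove a constant offset estimated from a low percentile of the signal."""
    return signal - np.percentile(signal, percentile)


def normalize(signal):
    """Scale the signal to unit total intensity (returned unchanged if the sum is 0)."""
    total = signal.sum()
    return signal / total if total != 0 else signal


def align_to(reference, signal, max_shift=20):
    """Shift `signal` by the integer lag (within +/- max_shift points) that
    maximizes its dot product with `reference`.

    Note: np.roll wraps points around the ends, which is acceptable only when
    the signal is near zero at both edges.
    """
    shifts = range(-max_shift, max_shift + 1)
    best = max(shifts, key=lambda s: float(np.dot(reference, np.roll(signal, s))))
    return np.roll(signal, best), best


# Synthetic demonstration: a peak that elutes 7 points late on an offset baseline.
t = np.arange(200)
reference = np.exp(-0.5 * ((t - 100) / 5.0) ** 2)
sample = np.exp(-0.5 * ((t - 107) / 5.0) ** 2) + 10.0

processed = normalize(subtract_baseline(smooth(sample)))
aligned, shift = align_to(normalize(reference), processed)
print("applied shift (points):", shift)
```

For GC × GC data, the same operations would typically be applied along each time dimension, or to each mass channel, before any chemometric model is fitted.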
For chemometric methods, additional care is often required to ensure the GC × GC instrument generates data that is amenable to many chemometric algorithms. Although the


various chemometric methods differ in their applications, data requirements are fairly consistent between them. Many methods require data bilinearity or trilinearity for appropriate implementation. A suitable data density across the 1D and 2D peaks is also required. Many of these chemometric methods require a minimum amount of chemical selectivity, in both the separation (i.e., adequate resolution) and detection dimensions. We continue the section on Data Analysis with recent advances in commercial software packages, followed by advances in chemometric methods.

Commercial Software. Implementation of multivariate data analysis approaches is provided in several commercial software packages, enabling users to process their data in order to gain more information. Most of these data analysis packages are integrated into the software required to run the instrument, while others are standalone packages. Each software package enables users to gain more information from their chromatograms, with the majority of them using the peak table approach for data analysis.

The ChromaTOF® software package (LECO®) is used to run their GC × GC-TOFMS and GC × GC-HR-TOFMS, and also allows the user to process data with many different tools (Figure 5A). Non-target Deconvolution® and High Resolution Deconvolution® allow for automated deconvolution and peak detection. After finding peaks of interest, mass spectra can be matched to the NIST database. Enhanced graphics allow the user to display information in a variety of ways. In ChromaTOF-HRT®, a deconvolution method can be applied to the high resolution data. ChromaTOF-HRT® allows for the calculation of the analyte chemical formula, which can be matched to accurate MS libraries. Mass defect plots can be readily created, which can be beneficial when looking for specific types of analytes. LECO® has also introduced the web-based “Simply GC × GC™” method development tool which assists new as


well as experienced users to develop or transform existing 1D-GC methods into GC × GC methods.

Another major commercial software package is GC Image™, which was specifically developed for the analysis of data from GC × GC and other multidimensional separation instruments (Figure 5B). It is compatible with many of the major commercial instrument platforms. GC Image™ allows for the visualization and analysis of individual chromatograms, or GC Project can be used to manage multiple GC × GC runs. GC Image™ also comes with Image Investigator software, which allows for multivariate analyses such as sample classification, fingerprinting, and compound discovery. The most recent update allows for large data file support, mass calibration, and centroiding, as well as numerous properties for the investigation of HR-TOFMS data.

The ChromSpace® software package (Markes International) is used to run their BenchTOF™ TOFMS instruments, and also allows the user to further investigate their data (Figure 5C). Overlapped peaks can be deconvoluted if necessary and peak tables can be automatically created. After generation of the peak tables, mass spectra can be matched to the NIST database or to an in-house library. The BenchTOF™ can collect mass spectra in Tandem Ionisation® (TI) mode, creating two different fragmentation patterns. Spectra from the two ionization voltages can be overlaid and the fragmentation patterns of the analyte can be observed at the same time. Each chromatogram has to be processed separately. ChromSpace® comes with 14 and 12 eV libraries; however, these are not as populated as the NIST database for 70 eV mass spectra. Before or after peak identification, numerous graphical tools are available to enhance visual analysis of the data.

Other commercial and free software options are readily available for advanced chemometric analyses, including script languages with chemometric capabilities and toolboxes,


as well as other software packages. A common script language is MATLAB® by MathWorks®, which is supported by numerous advanced chemometric software toolboxes.115 Three of the more prominent software toolboxes are PLS_Toolbox (Partial Least Squares) by Eigenvector Research Incorporated, the N-way toolbox from the University of Copenhagen, and Multivariate Curve Resolution-Alternating Least Squares from the Institute of Environmental Assessment and Water Research. Each toolbox has its own graphical user interface (GUI), although the interfaces are similar; they help the analyst conduct numerous data analysis methods on multivariate data, such as those discussed herein, and also provide the ability to create graphics to support the data analysis. In addition to MATLAB®, there are free, open source script languages available with their own toolboxes, including Python™, R (distributed through CRAN), and GNU Octave©. All programs have their own benefits and shortcomings. Other software packages include “The Unscrambler® X” by CAMO Software, which allows for data input from a variety of platforms.

Raw Data Formats for Chemometrics. Rather than taking a peak table approach, analysts can choose to analyze the raw data provided by the GC × GC instrument using one of the mentioned script language packages. For chemometric methods as well as the commercial software packages, the data structure becomes an important consideration for successful implementation. Issues with the data, such as retention time misalignment, must be handled either within commercial software packages or using in-house algorithms in script language packages.
Alignment algorithms can be applied within a chromatogram to improve data bilinearity or trilinearity116 or applied across multiple samples and whole chromatograms, where approaches include peak-table based, pixel-based, and global alignment strategies.117–127 Figure 6 summarizes the common data structures (unfolded versus folded, and single versus multiple samples) and issues (ideal versus misaligned) as well as the methods that can be applied to the


various data manifestations. The data is often viewed as folded, either as a contour plot or pixelated. However, most chemometric methods are applied to the unfolded data, where the modulations are concatenated (Figure 6). After analysis, the data can be refolded. The chemometric methods referred to in Figure 6 are discussed in more detail below in the context of the specific analysis goal.

Deconvolution. Despite the increased peak capacity and the additional separation dimension, analyte overlap is ubiquitous in GC × GC separations. Chromatographic peak deconvolution methods play an integral role in addressing this challenge to provide confident analyte identification, quantification, and peak purity assessment. Through the application of deconvolution methods, such as parallel factor analysis (PARAFAC) and multivariate curve resolution alternating least squares (MCR-ALS), overlapped peaks can be computationally separated on the 1D time, 2D time, and spectral dimensions in order to gain chemical information for specific analytes. With the deconvoluted peak data, analytes that were overlapped can be identified using the deconvoluted spectra, and quantified with the peak profiles obtained, thereby facilitating calibration methods.115,128 Deconvolution methods can be applied in a targeted or non-targeted fashion, meaning that the analytes of interest are either known or unknown beforehand. In the application of these methods, the chromatographic region of interest is selected and the data is unfolded (Figure 6A). Deconvolution can be applied to a single chromatogram, or multiple samples by concatenating the data (Figure 6). In certain cases, there is a data trilinearity or bilinearity requirement. An ideal chromatogram is one in which there is no retention time misalignment on 2D (and/or 1D, depending on whether one or more samples are to be analyzed), meaning that the data has sufficient trilinearity for trilinear data analysis


methods (Figure 6A). The common methods and their considerations are discussed in the sections below.

PARAFAC. PARAFAC is a tensor rank decomposition method that has been applied successfully to GC × GC –TOFMS data for deconvolution, noise reduction, and calibration. PARAFAC is based on an alternating least squares decomposition and requires at least three-way data, and therefore a multichannel detector that produces sufficiently trilinear data.129 The instrumental design, such as the temperature programming rate and PM selection, impacts the trilinearity of the data, particularly for compounds that are highly retained on 2D separations.13 Unless care is taken with the separation conditions, a noticeable deviation from trilinearity may be observed as a shift in 2tR (2D retention times) between successive modulations, as illustrated in Figure 6B. Deviation from trilinearity is observed when a relatively long PM (~ 5 to 8 s) and high temperature programming rates are used.13 If PARAFAC is to be applied to significantly nontrilinear data, then local data alignment must first be performed along the 2D dimension to restore sufficient trilinearity. Then, PARAFAC may be applied as in the ideal case of Figure 6A. Otherwise, the application of PARAFAC to significantly nontrilinear data can result in considerable errors in quantification.13 Figure 7A shows a graphical representation of PARAFAC deconvolution, demonstrating the resulting loadings plots in the 1D retention time, 2D retention time, and mass spectral dimensions. The application of PARAFAC to obtain a pure mass spectrum can increase confidence in analyte identification when compared to matching the raw mass spectrum to a standard reference spectrum (Figure 7A). PARAFAC was successfully applied in Figure 7A because a short PM (1.5 s) was used in conjunction with a reasonable temperature programming rate (8 °C/min) by Synovec and co-workers, resulting in sufficiently trilinear data.14
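To make the trilinear model concrete, the following Python sketch fits a PARAFAC model by plain alternating least squares to a small synthetic tensor whose three modes stand in for 1D retention time (modulation number), 2D retention time, and m/z. It is a minimal, unconstrained illustration written with NumPy only; the function names, the random initialization, and the synthetic two-analyte data are assumptions for this example and do not reproduce the implementations used in the cited work.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product; row index runs over (row of U, row of V)."""
    return np.einsum("ir,jr->ijr", U, V).reshape(U.shape[0] * V.shape[0], -1)


def reconstruct(A, B, C):
    """Trilinear model: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def parafac_als(X, rank, n_iter=300, seed=0):
    """Unconstrained PARAFAC by alternating least squares (illustrative only)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.random((n, rank)) for n in (I, J, K))
    # Unfold the folded tensor along each mode.
    X0 = X.reshape(I, J * K)                      # columns ordered by (j, k)
    X1 = np.moveaxis(X, 1, 0).reshape(J, I * K)   # columns ordered by (i, k)
    X2 = np.moveaxis(X, 2, 0).reshape(K, I * J)   # columns ordered by (i, j)
    for _ in range(n_iter):
        # Each update is a least-squares solve with the other two modes fixed.
        A = X0 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X1 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X2 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C


def gaussian(n, center, width):
    t = np.arange(n)
    return np.exp(-0.5 * ((t - center) / width) ** 2)


# Synthetic, perfectly trilinear data for two overlapped analytes.
rng = np.random.default_rng(1)
A_true = np.column_stack([gaussian(8, 3.0, 1.0), gaussian(8, 4.5, 1.0)])     # 1D profiles
B_true = np.column_stack([gaussian(60, 25.0, 4.0), gaussian(60, 32.0, 4.0)])  # 2D profiles
C_true = rng.random((40, 2))                                                  # "mass spectra"
X = reconstruct(A_true, B_true, C_true)

A, B, C = parafac_als(X, rank=2)
rel_err = np.linalg.norm(X - reconstruct(A, B, C)) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.2e}")
```

The columns of `C` play the role of the deconvoluted spectra, and the columns of `A` and `B` the peak profiles on each separation dimension. Because the synthetic peaks do not shift between modulations, the data are trilinear by construction; retention shifts of the kind shown in Figure 6B would violate this model.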

Recent advancements involving PARAFAC include its application to GC × GC × GC data and to property prediction.130,131 With the application of PARAFAC to GC × GC × GC – TOFMS, the data structure contained four quadrilinear dimensions: 1D retention time, 2D retention time, 3D retention time, and the mass spectral dimension.130 PARAFAC has also been applied in an automated fashion for non-targeted deconvolution of large sections and even whole GC × GC – TOFMS chromatograms.132 A variation of PARAFAC, called PARAFAC2, loosens the data trilinearity requirement, and is therefore better able to deal with 2D misalignment. Currently, PARAFAC2 has been applied more commonly to GC-MS data;133 however, this method is promising for application to GC × GC data in the future.16

MCR-ALS. MCR-ALS is an iterative method that initially estimates the pure chromatographic profiles and pure spectra, then repeatedly alternates between updating them and tests these values for convergence. Similar to PARAFAC, the number of components (rank) must be provided, and this value is often varied to determine the best model when the number of chemical components is unknown. Additionally, multiple constraints can be applied in MCR-ALS, including unimodality, nonnegativity, selectivity, and convergence constraints. MCR-ALS operates essentially the same as PARAFAC, but is applicable to bilinear data. In other words, Tauler and co-workers have demonstrated that MCR-ALS can be applied to chromatograms regardless of the retention time shifting in the 2D dimension,15,134 allowing for the use of a high temperature programming rate. This means that MCR-ALS is able to handle the misaligned sample case that is presented in Figure 6B without requiring 2D retention time alignment.
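
The alternating scheme described above can be sketched in a few lines of NumPy. This is a simplified illustration rather than the algorithm of the cited authors: non-negativity is imposed by simple clipping (a proper non-negative least squares step would be a more rigorous choice), the initial spectra are random, and the function name `mcr_als`, the Gaussian elution profiles, and the random spectra are assumptions.

```python
import numpy as np

def mcr_als(D, n_components, n_iter=500, tol=1e-8, seed=0):
    """Bilinear resolution D ~ C @ S.T with non-negativity imposed by clipping."""
    rng = np.random.default_rng(seed)
    S = rng.random((D.shape[1], n_components))             # initial spectral estimates
    prev = np.inf
    for _ in range(n_iter):
        C = np.clip(D @ np.linalg.pinv(S).T, 0.0, None)    # update elution profiles
        S = np.clip(D.T @ np.linalg.pinv(C).T, 0.0, None)  # update pure spectra
        lack_of_fit = np.linalg.norm(D - C @ S.T) / np.linalg.norm(D)
        if abs(prev - lack_of_fit) < tol:                  # convergence test
            break
        prev = lack_of_fit
    return C, S, lack_of_fit

# Synthetic two-component data: rows are retention time points, columns are m/z.
t = np.arange(80)
C_true = np.column_stack([np.exp(-0.5 * ((t - 30) / 5.0) ** 2),
                          np.exp(-0.5 * ((t - 42) / 5.0) ** 2)])
S_true = np.random.default_rng(1).random((25, 2))          # made-up mass spectra
D = C_true @ S_true.T

C, S, lack_of_fit = mcr_als(D, n_components=2)
```

Because only a bilinear model is assumed, nothing in this sketch requires the elution profile of an analyte to keep the same shape or position from one modulation to the next. The resolved profiles may still differ from the true ones by a rotation, which is the ambiguity that matrix augmentation is meant to reduce.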
For example, in Figure 7B, MCR-ALS is preferred over PARAFAC due to the use of a relatively high temperature programming rate (10 °C/min) and long modulation period (6 s), which gives rise to a high k’ range for the 2D separations and results in a significant degree of retention time shifting between successive modulations (~30 ms).15 However, large retention time shifting in both GC × GC dimensions needs to be addressed to restore bilinearity before MCR-ALS is applied.116 MCR-ALS has been demonstrated in applications involving complex samples such as petroleum, metabolomics, and environmental samples, where it can be utilized for calibration, classification/prediction,135 and resolution.30,136,137 One limitation to MCR-ALS is the lack of unique solutions, also referred to as the presence of rotational ambiguities, particularly when one sample is analyzed. One way to overcome this issue is to simultaneously analyze multiple samples/data sets through matrix augmentation, resulting in an extended MCR-ALS method. Matrix augmentation can be achieved by unfolding the 2D data and concatenating the individual two-way arrays as either rows or columns (Figure 6D).

Additional Deconvolution Methods. Recently, independent component analysis-orthogonal signal deconvolution (ICA-OSD) has been introduced as a potentially computationally faster alternative to MCR-ALS.138 This method is a combination of ICA, a blind source separation algorithm that maximizes the statistical independence between components, and principal component analysis (PCA), discussed in a later section, which replaces the traditionally used least squares algorithm. This method has also been applied in an automated fashion for the deconvolution of compounds in metabolomics samples.139 As mentioned previously, deconvolution methods can also be applied in an automated fashion, known as global spectral deconvolution. For the purpose of improving non-targeted deconvolution, a spectral deconvolution method based on non-negative matrix factorization (NMF), described elsewhere,140 has been developed for automated use on high resolution mass spectral data.141
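
The matrix augmentation step described above for extended MCR-ALS (unfolding each sample and concatenating the resulting two-way arrays) can be sketched as follows. The array layout (modulations × 2D points × m/z), the function names, and the random test data are assumptions made for illustration only.

```python
import numpy as np

def unfold(sample):
    """(n_modulations, n_2D_points, n_mz) -> (n_modulations * n_2D_points, n_mz)."""
    return sample.reshape(-1, sample.shape[-1])

def augment_sharing_spectra(samples):
    """Stack unfolded samples vertically; all samples share the m/z axis."""
    return np.vstack([unfold(s) for s in samples])

def augment_sharing_retention(samples):
    """Place unfolded samples side by side; all samples share the retention axis."""
    return np.hstack([unfold(s) for s in samples])

rng = np.random.default_rng(0)
samples = [rng.random((4, 5, 6)) for _ in range(3)]   # three made-up samples
D_tall = augment_sharing_spectra(samples)             # shape (60, 6)
D_wide = augment_sharing_retention(samples)           # shape (20, 18)
```

In the vertically stacked arrangement, each sample keeps its own block of elution profiles while the spectra are common to all samples, so this layout does not by itself require retention times to match between samples. The side-by-side arrangement does assume a common retention axis.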

Pattern Recognition. A major goal in performing GC × GC analyses is to discover relevant features in the data across multiple samples. Many pattern recognition applications fall under this category of cross-sample analysis, including sample classification, class comparison, chemical fingerprinting, and chemical/biomarker discovery. These methods are classified as non-targeted approaches, where the relevant analytes may be unknown beforehand, and the experimental design may be either supervised or unsupervised. Supervised methods are used when information about the samples (i.e. classes) is known beforehand and thus utilized in the experimental design and operation of the method. Unsupervised methods are applied when there is much less known about the sample classifications a priori and this information is desired (i.e. sample class membership). Classification is used to describe pattern recognition methods that aim to classify samples into groups based on similarity, such as principal component analysis (PCA). These methods can be used to classify unknown samples into known classes (origin, etc.) based upon the chromatographic data/features. Chemical fingerprinting falls under this category, in which the unique chemical signature of an unknown sample is used to determine identifying characteristics for that sample. Class comparison methods, such as Fisher ratio analysis, aim to identify significant differences between samples belonging to different classes, whereby class membership is known a priori; this can assist in the discovery of biomarkers or chemical markers that differentiate sample classes. Pattern recognition methods can be used for feature selection or data reduction, followed by further chemometric analysis. Typically, because these methods are applied across (and require) whole chromatograms of multiple samples, data misalignment in both GC × GC dimensions becomes a significant issue in implementation (Figure 6C). This problem can be dealt with using data alignment methods, but has also been addressed by data reduction techniques such as binning and tiling. The common pattern recognition methods (PCA, Fisher ratio, PLS-DA, etc.) will be discussed below in terms of method fundamentals, experimental and data requirements, common issues and ways to address them, and recent advances of each.

Principal component analysis (PCA). PCA is a popular tool utilized in many applications for the statistical separation of sample classes, providing information on the variables according to which the samples are separated (loadings) in addition to the actual separations between the classes (scores). PCA is an unsupervised non-targeted classification method that is based on rotation of the space axes to encompass the greatest chemical variance in the measurements. The new axes are called principal components and are orthogonal to each other. The scores plot shows the degree of sample clustering, shedding light on distinct grouping of samples. For many applications, sample classes are known beforehand: origin/source, type/variety, age/year, processing, and many other variables can be reflected in the samples. The loadings plot provides information about the basis of the groupings; in other words, the loadings can be used to determine which regions of the chromatogram (i.e. peaks/analytes) are important in differentiating the various sample classes. Since PCA determines the sources of the greatest variance, preprocessing is necessary to obtain meaningful results. In general, mean-centering and scaling are important steps before applying PCA to a dataset; in chromatography, this is achieved through baseline correction and normalization. As seen in Figure 6C, PCA is often applied to the whole chromatogram across multiple samples; therefore, sufficient retention time alignment in both the 1D and 2D dimensions is very important for proper implementation. Various alignment algorithms are available, and binning is another option for dealing with issues caused by misalignment (Figure 6C).
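
The relationship between scores, loadings, and preprocessing described above can be sketched with a singular value decomposition of the mean-centered data. The following is a minimal illustration and not a prescribed workflow: the function name `pca`, the normalization to unit total signal, and the synthetic two-class "chromatograms" are assumptions.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of a samples x variables array via SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)                        # mean-centering
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T                 # orthogonal principal components
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained

rng = np.random.default_rng(0)
axis = np.arange(50)                               # unfolded retention axis (made up)

def peak(center):
    return np.exp(-0.5 * ((axis - center) / 1.5) ** 2)

class_a = peak(10) + peak(20)                      # peak at 20 is common to both classes
class_b = peak(30) + peak(20)
X = np.vstack([class_a] * 3 + [class_b] * 3) + 0.01 * rng.standard_normal((6, 50))
X = X / X.sum(axis=1, keepdims=True)               # one simple normalization choice

scores, loadings, explained = pca(X)
top_variable = int(np.argmax(np.abs(loadings[:, 0])))
```

In this toy data set the first principal component separates the two classes in the scores, and the largest loadings fall on the class-specific peaks rather than on the peak the classes share.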
Herein, binning (or tiling) refers to summing the area of user-specified regions of 2D separation space, whereby the data density is reduced and misalignment is mitigated. Then, PCA is applied to the unfolded data in the form of a two-way array, where the number of rows is equal to the number of samples and the number of columns is equal to the number of variables (i.e. data points or bins) in the chromatogram. PCA can be applied to the whole chromatogram or a region of the chromatogram. Similarly, the analysis can be performed using all mass channels (m/z), a subset of all m/z, or the total ion current (TIC) chromatogram. In the former cases, the data must be completely unfolded along both time dimensions and the mass spectral dimension. In the latter case, summing to the TIC removes a significant dimension of chemical selectivity, but the number of data points is significantly reduced. Figure 8A illustrates the utilization of PCA by Focant and co-workers for the purpose of characterizing petrochemical base oils based on volatiles and physicochemical properties.17 The scores plots show the separation of the different base oil groups based on the variables (identified analytes) highlighted in the loadings plots. PCA may be implemented for feature selection, due to its powerful abilities as a data reduction method. In many studies, PCA is applied for data preprocessing prior to the application of additional chemometric methods, such as deconvolution or other pattern recognition methods.131,135,142,143 Alternatively, PCA can be applied to reduced data sets to improve class separation.7,22,143–147 PCA can also be used to classify unknown samples based on the sample grouping of known samples. Multiway PCA is the extension of PCA to high order data, such as unfolded GC × GC data.25,135,148,149

Fisher Ratio Analysis. Non-targeted discovery-based analysis is an important focus as it allows the analyst to uncover chemical features of interest that are not known beforehand.
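
Before turning to F-ratio analysis in detail, the binning and unfolding steps described for PCA above can be sketched as follows. The sketch assumes a TIC chromatogram stored as a two-way array (modulations × 2D points) and fixed, non-overlapping rectangular bins; the function names and the single-point "peaks" are illustrative assumptions only.

```python
import numpy as np

def bin_chromatogram(chrom, bin1, bin2):
    """Sum the signal inside non-overlapping bin1 x bin2 regions of a 2D chromatogram."""
    n1 = (chrom.shape[0] // bin1) * bin1           # drop any incomplete edge bins
    n2 = (chrom.shape[1] // bin2) * bin2
    tiles = chrom[:n1, :n2].reshape(n1 // bin1, bin1, n2 // bin2, bin2)
    return tiles.sum(axis=(1, 3))

def unfold_samples(chroms, bin1, bin2):
    """Return a samples x variables array in which each variable is one bin."""
    return np.vstack([bin_chromatogram(c, bin1, bin2).ravel() for c in chroms])

# Two made-up samples containing the same peak, slightly shifted in both dimensions.
sample_a = np.zeros((12, 40))
sample_a[5, 13] = 1.0
sample_b = np.zeros((12, 40))
sample_b[6, 15] = 1.0

X_raw = np.vstack([sample_a.ravel(), sample_b.ravel()])            # 2 x 480
X_binned = unfold_samples([sample_a, sample_b], bin1=4, bin2=10)   # 2 x 12
```

In the raw unfolded data the shifted peak occupies different variables in the two samples, whereas after binning both samples place the signal in the same bin. A peak that straddles a bin boundary would instead be split between bins.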
Fisher ratio (F-ratio) analysis is a supervised non-targeted approach to discover underlying differences in samples. Supervision refers to prior classification of samples/chromatograms as they relate to the experimental design. The F-ratio approach is an analysis of variance method that provides data reduction to elucidate class distinguishing features. The F-ratio is calculated as the between-class variance divided by the sum of the within-class variances. Since the F-ratio calculation is based on the variance of the signals in the GC × GC data (raw data approach) or variance in quantified analytes (peak table approach), the F-ratio method prioritizes statistical significance over absolute signal/concentration, with a higher F-ratio indicating a greater difference between the classes. Similar to PCA, F-ratio analysis can be applied prior to additional chemometric analysis, serving to improve variable selection,7,22,143,145,146 or after other data reduction methods have been applied.142 However, if sample class membership is known a priori, F-ratio analysis is preferred over unsupervised PCA, since within-class variance in the samples severely hampers successful implementation of PCA.150 A critical aspect of F-ratio analysis is the alignment of the data prior to the calculation of the F-ratio (Figure 6C). If substantial misalignment is present, this may result in false positives or false negatives. Thus, the implementation of F-ratio analysis can often take many different forms. Pixel-based F-ratio analysis means an F-ratio is calculated at every data point within the chromatogram, i.e., at every point in which a mass spectrum has been collected in the GC × GC separation. In order to maximize the performance of pixel-based F-ratio analysis, the data should be initially aligned. F-ratio analysis can also be performed on peak tables of quantified/identified analytes.7 After the peak tables are calculated, they are aligned so F-ratios can be calculated. While this is a simpler method than dealing with 2D retention time misalignment, peak tables may miss analytes present at low concentrations.
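
A variable-wise F-ratio calculation of the kind described above can be sketched as follows. The exact normalization used in the cited studies is not specified here, so the sketch adopts the standard one-way ANOVA form (between-class mean square over pooled within-class mean square) as an assumption; the function name `fisher_ratio`, the class labels, and the synthetic data are likewise illustrative.

```python
import numpy as np

def fisher_ratio(X, labels, eps=1e-12):
    """One-way ANOVA F-ratio for every column (pixel, bin, or analyte) of X."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n_total, n_classes = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[labels == c]
        class_mean = Xc.mean(axis=0)
        between += Xc.shape[0] * (class_mean - grand_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    between_ms = between / (n_classes - 1)
    within_ms = within / (n_total - n_classes)
    return between_ms / (within_ms + eps)          # eps guards against division by zero

# Made-up data: 8 aligned samples x 100 variables; only variable 42 differs by class.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 100))
labels = np.array(["control"] * 4 + ["treated"] * 4)
X[labels == "treated", 42] += 20.0

f = fisher_ratio(X, labels)
hit_list = np.argsort(f)[::-1]                     # variables ranked by decreasing F-ratio
```

The same function can be applied to unfolded pixel data or to an aligned peak table, since it only assumes that each column refers to the same feature in every sample.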
Another way to deal with misalignment is binning the data based upon the 1D and 2D peak widths in relation to the observed retention time shifting. However, this can split analytes across multiple bins and lower the F-ratio values. In order to address this issue with binning, a tiling scheme has been introduced that creates bins at different locations, enabling a bin to be centered on the peak and resulting in a maximum F-ratio.18,151–153 After the F-ratios are calculated, a hit list is assembled indicating the 2D retention times, with the highest F-ratio indicative of the statistically most prominent feature or “hit”. Often, an F-ratio cutoff in the hit list is selected to determine the point at which the analysis becomes unreliable. Although it is sometimes possible to examine the entire hit list to determine which hits are true positives and which are false positives, this is a time-consuming step. Manual selection of the cutoff is possible, but may require the user to wade through the hit list until too many false positives are discovered. Two different methods have been proposed to define an F-ratio threshold cutoff. The first uses the F-critical value,22,146 and the second takes a null distribution approach, which takes advantage of the inherent noise (instrumental and chemical) in the data set.18,152,153 Many of these issues are illustrated in Figure 8B, where the performance of tile-based F-ratio analysis is evaluated for a large yeast metabolomics data set.18 The tile size indicated in Figure 8B is selected to adequately handle the retention time shifting in the 1D and 2D dimensions. Additionally, the null distribution approach is presented to assist in choosing an appropriate F-ratio threshold, below which hits are likely to be insignificant (Figure 8B). As seen in Figure 8B, the F-ratio threshold determined from the null distribution approach (chosen as the 95% confidence limit) gives a higher F-ratio cutoff value than the F-critical value, meaning that there are significantly fewer hits to be evaluated with the null distribution approach.18

Additional Pattern Recognition Methods. There are other notable pattern recognition and classification methods well suited for the analysis of GC × GC data, including partial least squares-discriminant analysis (PLS-DA),29,154–156 linear discriminant analysis (LDA),157 ANOVA-simultaneous component analysis (ASCA),158 random forest models,147,159 cluster analysis,7,160,161 and other supervised classification methods.162 PLS-DA is a very common classification method that is based upon classical PLS where the response is categorical instead of continuous. Automated classification of GC × GC data has also been addressed by the development of advanced scripting methods based on knowledge-based rules.163

Property Prediction. There is growing need to relate physical and/or chemical property data obtained via a variety of other methods (some analytical, some not) to the chemical information obtained from GC × GC. Often GC × GC analysis is a simpler, more robust, or quicker method than the multitude of other measurement tools and methods used to obtain the sample property data. Property prediction is also important to link the chemical information contained in the chromatographic data to the property data, in order to create meaningful understandings between the two data sets.

Partial Least Squares (PLS) Regression Analysis. Partial least squares (PLS) regression analysis is the most commonly implemented chemometric method for the prediction of properties. A detailed review of the theory of PLS analysis can be found elsewhere164 but briefly, PLS analysis attempts to correlate two data matrices (X-block and Y-block) by calculating loadings referred to as latent variables (LVs). PLS analysis aims to model the covariance between these two matrices by finding the multidimensional direction in the X-block which explains the maximum multidimensional variance in the Y-block. In other words, PLS analysis uses analytes which have a large range of intensities across the GC × GC samples and attempts to correlate them to the differences in the physical or chemical properties of the samples (the property data).
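
The idea of extracting latent variables from the X-block that covary with the Y-block can be sketched for a single property (PLS1) using the NIPALS algorithm. This is a minimal illustration under stated assumptions, not the software used in the cited studies: the function names `pls1_fit` and `pls1_predict`, the three-analyte synthetic "chromatograms", and the property coefficients are all hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """PLS1 regression by NIPALS; returns regression vector and centering means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean                # mean-center both blocks
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr                              # direction of maximum covariance with y
        w = w / np.linalg.norm(w)
        t = Xr @ w                                 # scores of this latent variable
        tt = t @ t
        p = Xr.T @ t / tt                          # X loadings
        q_a = (yr @ t) / tt                        # y loading
        Xr = Xr - np.outer(t, p)                   # deflate X
        yr = yr - q_a * t                          # deflate y
        W.append(w); P.append(p); q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)            # regression vector in original variables
    return b, x_mean, y_mean

def pls1_predict(X, b, x_mean, y_mean):
    return (X - x_mean) @ b + y_mean

# Made-up data: three analytes whose concentrations determine the property.
rng = np.random.default_rng(0)
axis = np.arange(50)
heights, centers = (1.0, 0.6, 1.5), (10, 25, 40)
profiles = np.stack([h * np.exp(-0.5 * ((axis - c) / 1.5) ** 2)
                     for h, c in zip(heights, centers)])
conc = rng.random((30, 3))
X = conc @ profiles + 0.01 * rng.standard_normal((30, 50))
y = conc @ np.array([1.0, -0.5, 2.0]) + 0.01 * rng.standard_normal(30)

b, x_mean, y_mean = pls1_fit(X[:20], y[:20], n_lv=3)       # training set
y_pred = pls1_predict(X[20:], b, x_mean, y_mean)           # new samples
rmsep = float(np.sqrt(np.mean((y_pred - y[20:]) ** 2)))
```

In this synthetic example the regression vector is positive at the analytes that raise the property and negative at the analyte that lowers it, and the property of the held-out samples is predicted from their chromatograms alone.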
PLS modeling provides a one-to-one (i.e. linear) correspondence between the measured property values and the predicted property values. While the model is a linear correlation, it is possible to scale the data (e.g. logarithmically, quadratically, etc.). PLS analysis provides the following two valuable outcomes. First, using a training set of samples, a linear correspondence of the previously measured property data is modeled using the GC × GC data, so subsequent analyses of GC × GC data of new samples can be used to predict the property value of the new samples without having to directly measure these properties on the new samples. Second, since the underlying relationship between the chemical compositions of the samples is correlated to the property data, a deeper understanding between the two measurement platforms is provided. PLS regression analysis is typically performed on the unfolded data set (Figure 6C). A potential issue for PLS analysis is run-to-run retention time shifting, which is addressed by either alignment or data binning approaches previously described. While both options have merit, binning has the additional benefit of saving computation time. Prior to the calculation of the PLS model, the X-block (GC × GC chromatograms) and Y-block (predicted properties) must be mean-centered. After calculation of the PLS model, cross validation is often performed to evaluate the model. There are many cross validation methods available for this purpose. The first outcome of the PLS modeling is the regression plot showing the relationship between the GC × GC chromatograms and the predicted property. This is typically shown as the measured property on the x-axis and predicted property on the y-axis. The other outcome of the PLS modeling is the linear regression vectors (LRVs), which can be examined to determine which compounds are positively correlated (or negatively correlated) with the property being predicted. Analytes that exhibit positive values in the LRVs are correlated with increasing the specified property, while negatively correlated analytes will exhibit negative values in the LRV and are correlated with decreasing the specified property. Figure 8C demonstrates the use of PLS by Van Geem and co-workers to predict fouling tendencies of gas condensates.19 Feature selection methods were used prior to generating the PLS models.19 The cross-validated prediction of the model versus the measured fouling is shown in Figure 8C. Recently, PLS analysis was used to compare GC × GC – TOFMS to physical properties of kerosene-based rocket propulsion fuel.165 Due to the huge data density of the data set, and observed retention time shifting, the data were binned in order to mitigate the impact of the retention time shifting and reduce computation time. The resulting PLS models tested the ability to correlate the chemical information obtained from GC × GC – TOFMS to the physical properties of the fuels, and also enabled conclusions to be drawn regarding how chemical composition affected those properties. Additionally, extension of PLS to high order data sets through n-way PLS (N-PLS) and unfolded PLS (U-PLS) has been achieved in GC × GC analysis applications.166,167

Additional Data Analysis Methods. Retention time and/or index modeling is an area in which a key focus is to improve the identification of unknown peaks and optimize separation conditions by exploring the interdependent relationships between the separation dimensions and parameters. The continuing prevalence of GC × GC separations in various applications, such as identification of sulfur-containing compounds in crude oil, drives development in this area.168 Recently, the development of new thermodynamic models for precise predictions of retention times has focused on reducing the number of measurements/experiments needed to establish the model20,51 and on reducing the number of parameters required, to decrease the complexity of the model.21 Additionally, the effects of temperature and pressure on solute partitioning have been studied and incorporated into a new thermodynamic retention model.169 Model-based approaches are also used to estimate the separation performance of particular separation conditions with the goal of designing optimized separations for various samples. Common indicators (or metrics) of separation effectiveness are the number of apparent resolved peaks and 2D separation orthogonality. Systematic optimization of various chromatographic parameters can be evaluated in the context of these separation metrics, including starting oven temperature, flow rate, temperature program ramp rate, modulation period, column lengths, and stationary phase combinations, achieved in a way that is sample-independent.58,170,171 As GC × GC separations are being applied more often for routine analysis, there has been a move towards commercialization (i.e. wide distribution) and automation of data analysis methods to deal with the large data sets. The area of data analysis that most exemplifies this direction is peak detection, identification, and quantification. Besides the multitude of commercial software packages with automated peak finding and quantification features, many groups are aiming to develop software to use with popular packages (MATLAB®, R, GC Image™, and so on) that are available to address data analysis needs, such as targeted and non-targeted analysis, in an automated way.172–176 Other web-based and independent software packages are available for automated, non-targeted identification of compounds based on mass spectral data.177,178

APPLICATIONS

GC × GC has gained acceptance as a useful tool to analyze complex mixtures in a variety of fields, including but not limited to the following application categories: (1) forensic, (2) environmental, (3) fuels, (4) food, flavors and fragrances, and (5) biological, including metabolomics and biological VOC profiling. While the peak capacity gain over traditional 1D-GC has been demonstrated, analytical challenges remain, often due to the complex matrices and dense data structure.48 For instance, in complex mixtures, there often exists a wide concentration dynamic range that necessitates an instrumental or data analysis platform capable of identifying trace levels of relevant analyte compounds in the sample matrix. Recent instrumental advances such as HR-TOFMS are capable of improving the resolution of chemically similar analyte compounds. However, discovering these trace level compounds can be further challenged by the addition of sample matrix interferences. As a result, sampling techniques have been developed to reduce the impact of the matrix. In addition to the challenges introduced due to wide dynamic ranges and matrix interferences, GC × GC methods often require significant development to address the variety of compound classes, which is highly sample dependent. Due to the complex nature of samples being studied, identifying statistically significant trace analytes can be difficult in the dense data structure. The use of GC × GC to analyze complex samples has continued to expand to more chemical specialties, due to advancements in instrumental design and data analysis methods that address the challenges. Recent applications that highlight these advancements in several of these specialties are examined below.

Forensic. Forensic analysis encompasses a wide variety of sample types, as each piece of evidence is unique to the circumstances in which the incident occurs. Common matrices include illicit drugs,10,136,156 human blood and tissue,22,146,179–186 arson,36,187 and explosives.188 Challenges in the field relate to the complexity of the sample, and the ability to discern the origin of the evidence, as well as the need for identifying trace level analytes among a complicated background. Novel approaches to solve one or more of these challenges will be described.

Illicit drug analysis. Traditionally, the identification and analysis of potential illicit drugs has been completed by 1D-GC or LC coupled to MS. However, several studies have recently applied GC × GC to the characterization of chemical profiles of schedule I drugs.10,136,156 Using a previously optimized sample extraction method, GC × GC – TOFMS was implemented to examine 3,4-methylenedioxymethamphetamine (MDMA) samples that were seized by police, with a goal of providing the MDMA chemical profiles, in an effort to determine differences in impurity that could link the samples to a source or manufacturer.156 The development of a chemometric analysis routine using ANOVA and PLS-DA (PLS_Toolbox, Eigenvector Research, Inc.) allowed for the identification of potential markers which would have been omitted from a traditional MDMA targeted method. Additionally, the method was shown to successfully identify MDMA samples based on their origin. Two chemometric methods, MCR-ALS and PCA, were combined in the data analysis workflow to deconvolute and classify Cannabis samples collected from several gardens.136 The goal was to test the ability of these chemometric methods, when using data collected with a GC × GC – qMS instrument, to determine similarities or differences between samples. GC Image™ software enabled further chemometric analysis of the data set, focused principally on selected regions of interest. PCA of these regions of interest allowed for a direct comparison of peak responses across samples.

Decomposition Profiling. Forensic investigations have relied on the use of human remains detection (HRD) dogs to locate sites where a decomposing body may have been present. However, little is known about what compound or series of compounds the dogs identify. There currently exist discrepancies in the profiles reported due to variation in sample collection as well as type of sample and environment in which the decomposition occurred.

Recent studies have aimed to improve knowledge about decomposition profiles, as well as establish an approach that can be accepted in a court of law.22,146,179–186 A novel study using thermal desorption (TD) coupled to GC × GC – TOFMS compared pig carcass decomposition to human remains to determine whether porcine is an acceptable training tool for HRD dogs.183 HRD dogs are often trained using a synthetic decomposition scent.185 However, these samples often do not contain all identified compounds present during decomposition. Ten thousand hits were identified over the set of chromatograms, with Fisher ratio analysis rapidly reducing the number of compounds to those most relevant to the study. Overall, 300 compounds were identified as resulting from decomposition rather than environmental or instrumental factors. Finally, PCA assisted in determining important compound classes for each stage of decomposition. Ultimately, the study concluded that the decomposition profile for porcine is a suitable alternative for HRD dog training as it is closely related to that of human decomposition. Most decomposition studies have been performed over the course of a year or years, providing valuable information for the location of long-deceased remains. However, in the case of mass disasters, little is known about the variation between live human scent and recently deceased humans. A recent study aimed to uncover some of these differences or similarities by investigating an early postmortem interval (PMI) of 0-72 hours.146 Thermal desorption was again the sampling method of choice, followed by GC × GC – TOFMS analysis. The column set used was previously demonstrated to be beneficial for VOC profiling: Rxi-624Sil MS (30 m x 0.25 mm ID x 1.40 µm df) and Stabilwax (2 m x 0.25 mm ID x 0.50 µm df).181 Following the data collection, Fisher ratio analysis was applied as a feature selection preprocessing step, to remove compounds on the hit list below the appropriate F-critical value. PCA was then performed on the remaining hits using the CAMO Unscrambler® X software. Ultimately, 105 compounds of interest were identified, with nitrogen and sulfur-containing compounds representing the two highest proportions of VOCs detected across all postmortem samples. The results reported by Forbes and co-workers indicated that there was a highly dynamic VOC profile each hour, an important finding when considering the use of search dogs during a mass disaster. These studies on the use of natural training aids for HRD dogs were expanded by investigating the effect of various textiles on decomposition profiles.22 In this study, Nizio et al. used solid-phase microextraction (SPME) as the sampling method of choice, rather than TD. A DVB/CAR/PDMS fiber was chosen due to the wide compound range collected during optimization. Data analysis followed previous studies186 and utilized Fisher ratio to identify possible compounds of interest, followed by PCA. It was concluded that 100% of the cotton textiles are potentially a suitable material for HRD dog training, as they retain a large portion of the decomposition VOC profile. However, the wide variation in the profiles over time suggests that multiple samples may be necessary for training purposes.

Ignitable Liquids and Explosives. Arson and explosives have been extensively studied by forensic professionals as both can lead to significant loss of life and property.36 Although traditional methods for arson analysis often rely upon GC-MS methods,187,188 there has been increased use of GC × GC methods. Explosives analysis has the challenge that many of the main constituents are thermally unstable with low vapor pressures. Targeting the minor constituents often provides insight into the source of the explosive without concern for the thermally unstable constituents.
Two instrumental platforms, GC × GC – FID and GC × GC – TOFMS, were recently utilized to study neat white spirits, a common arson accelerant, from an extensive list of distributors in the Netherlands.36 For both platforms, the column set consisted of an Agilent DB-1 (30 m x 0.25 mm I.D. x 0.5 µm df) coupled to a DB-17 (1 m x 0.100 mm I.D. x 0.2 µm df). Using data from the GC × GC – FID analyses, PCA indicated that differences among samples may be due to changes in composition over time, rather than differences among the manufacturers themselves. Sixty-seven analyte peaks were found to be substantial contributors to the first three PCs, which prompted an attempt to identify these compounds using the mass spectral data. The results are highlighted in Figure 9A. Only 19 of these analyte peaks were confidently identified using the NIST MS library, for three reasons. First, the FID generally provided lower LODs than the MS, so the “discovery” of the significant analyte peaks using the FID data set was more confident than their identification by MS. Second, some co-elution remained despite the increased peak capacity of the GC × GC separation. Finally, the existence of multiple congeners of some compounds means that a single structure cannot be associated with some of the analyte peaks. Despite these concerns, the conclusion is that the use of both FID and MS for ignitable liquids analysis has many benefits.

Environmental. With the implementation of laws that limit contamination levels in consumer products, soil, water and air, the use of analytical chemistry in environmental analyses has trended towards recognition and quantification of toxins in a variety of matrices.189 Traditional gas chromatographic methods such as 1D-GC remain useful when the contaminant is known, but have severe shortcomings when the sample is complex and the contaminants are unknown.190 GC × GC has recently been used in environmental applications to identify emerging contaminants in waste water48 and to characterize carbazoles in lake sediment.190 Further studies will be discussed below, with an emphasis on instrumental and data processing methods that assist in the identification of complex contamination.

Soil and Sediments.
Lake sediment contaminated with petroleum was characterized by GC × GC – FID using a novel resistively heated modulator developed by Gorecki and co-workers.68 Petroleum hydrocarbon (PHC) contamination is a concern in the Antarctic and sub-Antarctic environments due to its impact on terrestrial and marine habitats.68 Using an eight-level calibration of diesel standards, the LOD of PHCs was determined to be 11 mg/kg, a significant improvement over the LOD of the previous 1D-GC method (64 mg/kg).191 After using GC Image™ to generate peak tables, the data were exported to MATLAB® for PCA. The results of the PCA suggested that the sample sites could be differentiated by the amount of PHCs present as well as the proportion of polar compounds. Snow, sediment, air and delayed petcoke were analyzed via GC × GC – TOFMS using a liquid crystalline (LC-50) column coupled to a nanostationary phase (NSP-35) column (J&K Scientific) to study the potential of delayed petcoke as a contaminant source and to develop a mass spectral library that can be used as a reference in future studies.192 The use of nontraditional columns allowed for the separation of a variety of polyaromatic hydrocarbons (PAHs). Two hundred fifty-nine analyte peaks of interest were found in common when soil, snow and air samples were compared to petcoke. These 259 peaks were classified into 21 isomeric groups. Of those 21 groups, only six were found to be statistically different among snow, sediment and air collected within 30 km of the reference site. Due to the similarity in chemical profiles between the delayed petcoke and the environmental samples, it was concluded that petcoke could be an important source of heterocyclic aromatics in the local environment.

Water. Water samples are often used to determine the presence of contaminants being introduced into the environment via commercial processes. Waste water in particular is studied because water treatment plants are the convergence point of urban pollution.49 Treated water is typically released into natural bodies of water, which can have an impact on marine and


terrestrial ecosystems. Thus, significant research has been devoted to creating methods for water analysis.23,48,49,193,194 Waste water treatment processes determine which compounds are released into the environment as well as the levels present. A rapid scan quadrupole mass spectrometer from Shimadzu (QP 2010 Ultra) was coupled to GC × GC, with the goal of developing an instrumental method capable of detecting estrogenic compounds with limited sample clean-up.49 Results indicated that the LODs were between 1.4 and 2.9 ng/L for the five estrogenic compounds analyzed. While the use of TOFMS has been demonstrated elsewhere for emerging contaminants,48 this study demonstrated that a less expensive MS option is viable for nanogram-level estrogen-containing pollutants. Brominated pollutants in the water of Lake Geneva were studied using GC × GC coupled to an electron capture detector (ECD) from LECO®, as well as GC × GC coupled to a TOFMS capable of both electron ionization and electron capture negative ionization (ENCI) sources (Zoex Corp.).23 Using both detectors provided a sensitive and selective method for the detection and quantification of a wide range of contaminants in complex samples with minimal sample preparation.195 The ECD improves quantification due to its selectivity and exquisite sensitivity for halogenated compounds, resulting in the successful determination of the levels of seven brominated persistent and bioaccumulative pollutants (PBPs) in Lake Geneva.

Biological Matrices. Due to the bioaccumulation of many pollutants, the study of biological matrices can provide valuable information on the fate of marine and terrestrial ecosystems. A non-targeted analysis of common bottlenose dolphins by Hoh and co-workers identified novel DDT-related compounds that may not have been present in current monitoring studies.24 The use of GC × GC – TOFMS identified multiple DDT-related compounds and


discovered degradation products that were not currently monitored. Ultimately, this information provided a more complete knowledge of the transportation and fate of these molecules within a biological system than previously known. Persistent organic pollutants (POPs) such as PCBs in Leach’s storm petrel seabirds have also been studied by Megson et al.196 The GC × GC – TOFMS method utilized was previously developed for environmental fingerprinting.45 An Rtx-PCB (60 m x 0.18 mm I.D. x 0.18 µm df) column was used in the first dimension (1D) to select for PCB congeners. The most dominant PCBs discovered were CB-153, CB-118, CB-138 and CB-180. Further classification was performed using PCA, which identified three main groups of birds based on location and feeding grounds. Ultimately, 83 PCBs were present in