This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes.

Article pubs.acs.org/ac

Cite This: Anal. Chem. 2019, 91, 9129−9137

Performance Evaluation of a Nontargeted Platform Using Two-Dimensional Gas Chromatography Time-of-Flight Mass Spectrometry Integrating Computer-Assisted Structure Identification and Automated Semiquantification for the Comprehensive Chemical Characterization of a Complex Matrix

Arno Knorr,*,† Martin Almstetter,*,† Elyette Martin, Antonio Castellon, Pavel Pospisil, and Mark C. Bentley

Philip Morris International Research and Development, Philip Morris Products S.A., Quai Jeanrenaud 5, CH-2000 Neuchâtel, Switzerland

Supporting Information

ABSTRACT: Nontargeted screening methodologies are powerful approaches for comprehensive chemical characterization of complex matrixes. In order to maximize chemical space coverage, three analytical methods using two-dimensional gas chromatography with time-of-flight mass spectrometry for nonpolar, polar, and volatile compounds have been established. The structural identification process was streamlined with an in-house developed computer-assisted structure identification platform, which facilitated the identification of novel compounds and also delivered semiquantitative concentrations for all compounds. Key performance parameters for this nontargeted platform, including chemical space coverage, confidence for structural identification, accuracy of semiquantification, and performance of differential analysis, were evaluated. The automated structural identification process was assessed using a subset of 243 compounds (out of 2990), which were confirmed to be present in cigarette smoke using reference standards. Consistently high true positive identification rates between 88.2% and 96.2% across the different concentration ranges investigated were demonstrated. Accuracy for semiquantification was assessed by comparison with quantitative data from literature, where a maximum 4-fold deviation from available targeted analysis values was estimated.

In-depth chemical characterization of complex matrixes such as environmental or biological samples is of growing interest for research, and nontargeted screening (NTS) methodologies have become more popular in recent times, particularly in the field of metabolomics.1−6 Also of growing interest is the comparative evaluation of chemical composition between different sample matrixes, sometimes referred to as nontargeted differential screening (NTDS), where significant differences between comprehensive data sets are determined based upon semiquantitative information.7 Comprehensive chemical characterization of cigarette smoke, and more recently aerosol from heated tobacco and e-cigarette products, represents a significant challenge, with cigarette smoke estimated to contain more than 6000 compounds.8,9 Not only should the instrumentation deliver sufficient chromatographic and spectral resolution, but also the methods selected should cover the chemical space as broadly as possible, in order to minimize the risk of overlooking valuable information. In addition, due to the large volumes of data generated, there is also a need for an efficient (automated) data evaluation process for structural identification of compounds and semiquantified estimation of their abundance.

Comprehensive two-dimensional (2D) gas chromatographic separation (GC × GC) coupled with fast-scanning time-of-flight mass spectrometers (GC × GC−TOFMS) represents a well-established advanced field of gas chromatography/mass spectrometry (GC/MS).10 GC × GC−TOFMS is currently recognized as the most powerful separation technique in gas chromatography (GC), and its routine application for the characterization of complex matrixes has been proven and published in numerous scientific reports.10−21 However, despite advantages afforded by the use of comprehensive 2D separation techniques, the large diversity of compounds in complex matrixes such as cigarette smoke cannot be fully resolved using a single analytical method. Most NTS applications require the use of multiple extraction and/or chromatographic methods that maximize the coverage of the chemical space for their application domain and, ideally, have some degree of overlap.

In this publication, we present an approach that combines three analytical methods as the basis of an integrated workflow, from data acquisition to structural identification and semiquantification. The structural identification process is based on an in-house developed computer-assisted structure identification (CASI) platform22 that is designed to increase the confidence in structural proposals by combining mass-spectral database searches with quantitative structure−property relationship (QSPR) models that predict chromatographic properties and by applying a scoring algorithm. A description of the platform, which has been reported in detail previously,22 is given in the Supporting Information. Raw data processing, data consolidation, semiquantification, and reporting have been integrated into the CASI core module as an established common automated workflow.

Received: April 4, 2019 Accepted: June 23, 2019 Published: June 23, 2019

© 2019 American Chemical Society

DOI: 10.1021/acs.analchem.9b01659 Anal. Chem. 2019, 91, 9129−9137

Figure 1. Data processing workflow for NTS and NTDS using GC × GC−TOFMS. The workflow depicts the steps from raw processing to reporting using the in-house developed CASI software platform and algorithms (central box).

APPROACH
In order to achieve comprehensive coverage of the chemical space relevant for smoke or aerosol from tobacco-based products, three analytical methods for GC × GC−TOFMS have been established: Nonpolar, Polar, and Volatile (see the Materials and Methods). Smoke from the 3R4F reference cigarette23 was generated according to a defined smoking regime24 and trapped using a Cambridge glass fiber filter pad for particulate phase constituents [total particulate matter (TPM)], followed by two consecutive solvent traps for collecting the gas-vapor phase (GVP). Different solvents were used to collect GVP for the Nonpolar/Polar methods (dichloromethane/acetone 80:20 v/v) and the Volatile method [N,N-dimethylformamide (N,N-DMF)], each containing a set of method specific stable isotope labeled internal standards (ISTD) and retention index marker (RIM) compounds (see the Materials and Methods and Figure 2). Sample preparation was kept as simple as possible to minimize the possibility for artificial changes in chemical composition during collection/extraction.

The workflow for data processing including raw data processing, data consolidation, compound identification, semiquantification, and testing and reporting is presented in Figure 1. The data processing is described in detail in the Materials and Methods section. For structure identification, electron ionization mass spectra files were submitted to the CASI platform for automated processing. CASI calculated predicted values for retention index (RI), 2D relative retention time (2DrelRT; based on analyte retention times in the second dimension relative to a stable isotope labeled n-alkane reference system22), 2D absolute retention time (2DabsRT; for the Polar method only), and boiling point (BP) for each structural proposal using quantitative structure−property relationship (QSPR) models.22 Predicted values were then compared with experimental values, and the scores for each analytical property (i.e., RI, 2DrelRT, 2DabsRT, and BP) were normalized from 0 (no similarity) to 1 (perfect match) using quadratic equations


Figure 2. Trapping of whole smoke or heated tobacco product aerosol compounds for the Nonpolar and Polar methods (A) and for the Volatile method (B).

applying polynomial factorization. The three analytical property scores and the spectral similarity value were combined to calculate an overall CASI score:

$$\text{CASI score} = (\text{NIST MF})(\text{RIFIT})(\text{2DRTFIT})(\text{BPFIT})$$

where NIST MF is the NIST match factor, RIFIT is the RI score, 2DRTFIT is the 2DrelRT (or 2DabsRT) score, and BPFIT is the BP score. The CASI score was calculated for each structural proposal (HIT), and proposals were ranked in accordance with their calculated CASI score values (Figure 1, center box). Chromatographic files, peak areas, and HIT structure files were then submitted for semiquantification, which was performed using predefined rules to associate ISTDs with relevant compound classes based on their chemical features; a detailed description is given in the Supporting Information (Data Processing). Finally, simple descriptive statistical analysis (average and covariance) of sample groups/replicates (NTS) and, if required, determination of significant differences (Student t test followed by a mathematical ranking model) between samples (NTDS) were carried out.

The ability of the NTDS approach to discriminate compositional differences between samples was assessed using matrix samples fortified with a set of compounds at varying concentrations. The NTDS workflow applied a two-step postevaluation approach. In a first step, semiquantified concentration values were compared groupwise using a Student t test (two-tailed distribution, heteroscedastic) with a minimum group size of three sample replicates per group. Compounds providing p-values above 0.05 were considered as not significantly different and were filtered out. In the second step, compounds passing the filter (p < 0.05) were ranked according to a mathematical model, developed empirically, taking into account the relative differences (“Effect”) as well as the semiquantitatively determined absolute abundance for the respective compounds between the sample groups:

$$\mathrm{RANK} = \frac{(\mathrm{Effect})^{3}}{1000} \times \frac{\left(\dfrac{\sum_{1}^{m} L_y}{m} + \dfrac{\sum_{1}^{n} L_x}{n}\right)}{2}$$

with

$$\mathrm{Effect} = \left(\frac{\dfrac{\sum_{1}^{m} L_y}{m} - \dfrac{\sum_{1}^{n} L_x}{n}}{\dfrac{\sum_{1}^{m} L_y}{m} + \dfrac{\sum_{1}^{n} L_x}{n}}\right) \times 100$$

where Lx and Ly are measured concentration values within groups x and y and m and n are the number of replicate measurements.

To evaluate the performance of the differential screening approach, the TPM extract from a reference cigarette was fortified with standard solutions, each containing the same 10 compounds. Four solutions were prepared using different concentrations (fortification mixtures), which were added to four aliquot portions of the TPM extract. The four fortification levels covered a concentration range of 1−30 μg/cigarette, with x-fold differences in concentration of 2-, 5-, 10-, 20-, and 30-fold. Fortification level 1 (TPM-L1) was compared with fortification level 2 (TPM-L2), and fortification level 3 (TPM-L3) was compared with fortification level 4 (TPM-L4). Every sample was measured in triplicate and underwent the complete workflow. Theoretical RANK values were calculated for the fortification scheme, and compounds were sorted by relevance according to their theoretical rank values and assigned to a HIT number in both directions of comparison (1−5 and −1 to −5), for which HIT number 1 (or −1 in the opposite direction) represented the most important difference (30-fold difference), while HIT number 5 (or −5) represented the least important difference (2-fold difference). HIT numbers derived from the theoretical RANK values were then compared with experimentally determined HIT numbers (Table 11, Supporting Information: Differential Screening).
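The second step of the NTDS evaluation can be illustrated with a short sketch. It is not the CASI implementation. It assumes that Effect is the difference of the two group means divided by their sum, times 100, and that RANK is the cubed Effect divided by 1000 and multiplied by the average of the two group means, which is one reading of the equations given above. The function names and the dictionary layout are hypothetical, and the t test filter (p < 0.05) is assumed to have been applied beforehand.

```python
from statistics import mean


def effect(group_y, group_x):
    """Relative difference between the two group means, in percent.

    Assumes positive concentration values, so the sum of means is nonzero.
    """
    mean_y, mean_x = mean(group_y), mean(group_x)
    return (mean_y - mean_x) / (mean_y + mean_x) * 100.0


def rank(group_y, group_x):
    """Assumed RANK: cubed Effect / 1000, weighted by the average abundance."""
    mean_abundance = (mean(group_y) + mean(group_x)) / 2.0
    return effect(group_y, group_x) ** 3 / 1000.0 * mean_abundance


def sort_by_rank(compounds):
    """Order compounds by decreasing |RANK|.

    `compounds` maps a compound name to a (group_y, group_x) pair of
    replicate concentration lists (hypothetical layout for illustration).
    """
    scored = {name: rank(y, x) for name, (y, x) in compounds.items()}
    return sorted(scored.items(), key=lambda item: abs(item[1]), reverse=True)
```

Because the Effect term is cubed, the sign of RANK keeps the direction of the difference, which matches the use of positive and negative HIT numbers for the two directions of comparison.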




MATERIALS AND METHODS
Trapping and Sample Preparation. Nonpolar and Polar Compounds. For non/midpolar and polar whole smoke or aerosol compounds (determined using the Nonpolar and Polar methods, respectively), a 44 mm Cambridge glass fiber filter pad (purchased in bulk from Hollingsworth & Vose Air Filtration Ltd., Waterford Bridge, Kentmere, LA8 9JJ, U.K., and cut to size by Alfaset, Rue des Terreaux 46-50, 2300, La Chaux-de-Fonds) was used, followed by two microimpingers in series, each containing dichloromethane/acetone (80:20 v/v; 10 mL). The microimpinger solvent contained a set of RIM compounds and stable isotopically labeled ISTDs and was maintained at a temperature of approximately −78 °C using a dry ice/isopropyl alcohol mixture. For the analysis of TPM collected on the Cambridge pad, the pad was extracted using the same solvent mixture, also containing RIM compounds and stable isotopically labeled ISTDs. For the analysis of GVP compounds, the solvent from both microimpingers was combined. Following a liquid−liquid partition with water, the dichloromethane layers were used to assess the non/midpolar compounds and the aqueous layers were used for the analysis of polar compounds (Figure 2A). The aqueous layer was injected directly into the gas chromatograph for the Polar method using a water-resistant ionic liquid-phase precolumn. Direct injection of water without the need for derivatization of (water-soluble) polar compounds represents a new approach to analyze aqueous solutions by GC. The design of the workflow accounts for chemical species that are distributed between both solvent layers by summing up their concentration according to their presence in each layer. In order to prevent degradation of thermolabile constituents, extracts were introduced into the GC via a cool on-column injection technique.

Volatile Compounds. Volatile compounds were trapped by passing whole smoke or aerosol through a Cambridge glass fiber filter pad followed by two microimpingers, each containing N,N-DMF (10 mL) and a set of RIM and ISTD compounds, maintained at approximately −60 °C (Figure 2B). For the analysis of TPM collected on the Cambridge pad, the pad was extracted using the same solvent, also containing RIM compounds and stable isotopically labeled ISTDs. For the analysis of GVP compounds, the solvent from both microimpingers was combined. Samples were injected without further preparation using cool on-column injection into the GC × GC−TOFMS to analyze volatile compounds eluting prior to the solvent N,N-DMF (BP 153 °C).

Chromatography, Mass Spectra, and Chemical Materials. A detailed description is available in the Supporting Information (Analytical Methods), where GC × GC surface plots showing the distribution of the constituents between TPM and GVP fractions as well as RI and 2DrelRT/2DabsRT prediction models for the Nonpolar, Polar, and Volatile methods are presented. GC × GC contour plots show RIMs and isotope-labeled ISTDs, and correlations of predicted RI and 2DabsRT (Polar method) or 2DrelRT (Nonpolar and Volatile methods) versus their experimental values are presented.

Data Processing. Data processing was performed using LECO ChromaTOF software version 4.5 (LECO Corporation, Saint Joseph, MI, US) for automatic peak finding, spectral deconvolution, peak integration, and alignment. For all three GC × GC−TOFMS methods, the chromatographic raw data were first processed for baseline computing (baseline offset: 0.8), for peak finding using method-specific signal-to-noise (S/N) criteria (e.g., S/N 250 for the Nonpolar method), and for spectral deconvolution using the ChromaTOF built-in proprietary algorithm,25 combining second-dimension modulations using spectral match criteria (850), RI calculation, and mass spectral library matching for the ISTDs and RIMs (structural identification for the unknowns was performed as a later step within the CASI workflow). Chromatographic data were aligned by transferring the peak information on a pool sample (in case multiple sample types are in scope) into a calibration template within the ChromaTOF software. After assembling all relevant information, each sample was processed using three different peak width settings for integration (e.g., 0.06, 0.11, and 0.20 s for the Nonpolar method) in order to optimize the assessment of the peak areas for minor, medium, and major peaks (Figure 1, Supporting Information: Data Processing). Solvent peaks and column bleed were excluded using the classification function in ChromaTOF. In addition, nicotine was excluded in the Nonpolar and Polar methods, and triacetin was excluded in the Nonpolar method, as signal intensities for these compounds were above the linear working range of the detector. Result files were exported in .csv format.

Prediction of Chemical Parameters for Nontargeted Screening. Models to predict RI, 2DrelRT, and BP values for the Nonpolar method have been developed and reported previously.22 For the more recently developed Polar and Volatile methods, new models were built using Percepta Batch [Advanced Chemistry Development, Inc. (ACD/Laboratories), Toronto, ON, Canada], BIOVIA Pipeline Pilot (BIOVIA, San Diego, CA, U.S.A.), Dragon (Kode srl, Pisa, Italy), and RapidMiner (RapidMiner, Inc., Boston, MA, U.S.A.) software. Experimental 2DrelRT values could not be used to develop a prediction model for the Polar method, as the series of RIMs used did not exhibit a linear relationship for the selected GC × GC column setup. Instead, 2DabsRT values were used. The descriptors used to create the models were tested with various learning algorithms (k-nearest neighbors, multiple linear regression, and support vector regression). The best combination of descriptors and algorithms, providing the highest predictive correlation (r2), was selected for each method. Details of the prediction calculations, selected algorithms, and results are presented in QSPR Models for Predicting RI and 2Drel/2DabsRT for Volatile and Polar Methods, Supporting Information.



RESULTS AND DISCUSSION
Chemical Space Coverage. The suite of three analytical methods was designed to provide maximal coverage for the chemical space attributable to cigarette smoke or aerosol from heated tobacco products, which was amenable for GC/MS analysis. In order to create a visual representation for the distribution of chemical constituents (i.e., the chemical space) known to be present within tobacco and tobacco smoke, log values for vapor pressure (logVP) and octanol/water partition (logPOW) coefficients were calculated for each constituent. Vapor pressure was calculated using ACD/Laboratories software version 2016 at 25 °C and expressed as mmHg. LogPOW was predicted using ACD/Percepta Batch software based on logPOW contributions of separate atoms, where


Figure 3. Coverage of the tobacco smoke and aerosol chemical space by three methods of nontargeted GC × GC−TOFMS analysis. Known tobacco smoke/aerosol space is colored gray. Structures proposed by CASI platform with CASI score ≥ 700 (generic) are colored: Nonpolar (green), Polar (blue), and Volatile (red-orange). LogPOW: logarithm of octanol/water partition coefficient. LogVP: logarithm of vapor pressure. Confirmed: structures confirmed by reference compound analysis.

structural fragments and intramolecular interactions between different fragments were derived from an internal experimental logP database.26 Compounds related to the tobacco plant and tobacco smoke/aerosol were identified from literature as well as from in-house data accumulated over several years of research and were curated using a chemical reference database.8,27 For a total of more than 10 000 compounds, a logVP and logPOW XY function is presented in Figure 3. The visualization of a data set for 3R4F whole smoke reflects the expected application domain for GC, located on the right side of the logVP axis. The graph shows that the coverage for individual methods overlapped, indicating that potential gaps in the target chemical space had been avoided. It is assumed that this chemical space also includes compounds that are present in the aerosol generated by heated tobacco products.

Performance Limits. A data-driven hypothesis, using the results from the analysis of reference cigarette smoke, was formulated to evaluate the performance limits of the methods applied to the target chemical space, with increasing numbers of compounds as a function of decreasing abundance being anticipated. Extrapolation of the expected number of compounds identified as a function of concentration follows a logic rule; Figure 4A shows the concentration level at which the numbers of compounds identified by the nontargeted platform started to deviate from the expected number of compounds. This was interpreted as being the lower limit of the working range for the platform. The analysis of reconstituted whole smoke (TPM and GVP fractions combined) indicated this concentration level to be 0.1 μg/cigarette. According to this hypothesis, the compounds identified as being present at concentrations below 0.1 μg/cigarette (orange columns, Figure 4A) may not accurately reflect the complexity of cigarette smoke. Indeed, the accumulated masses of compounds within low concentration ranges (in our example derived by 3-fold serial dilution) declined exponentially (Figure 4B). The sum of semiquantified masses for compounds detected below 0.1 μg/cigarette represented 0.19% of the total mass detected for reconstituted whole smoke. Therefore, by setting a 0.1 μg/cigarette cutoff limit, we estimated that at least 99% of the total mass of all compounds amenable to GC/MS were covered. The hypothesis for an exponential growth in the number of compounds moving toward lower concentrations was successfully tested by reprocessing the most complex of our data sets, derived from the Nonpolar method using a more sensitive peak-finding criterion (S/N 150) compared to our standard method criterion (S/N 250). An exponential decline in accumulated mass moving toward lower concentration ranges was also confirmed. A detailed description is given in the Supporting Information (Evidence for the Evolution of Compounds in Cigarette Smoke).

Figure 4. Dotted line on top of the “number of compounds” bars represents the expected evolution of compound identification (A); semiquantified masses of compounds (B) for reconstituted whole smoke (sum of TPM and GVP fraction) of 3R4F identified by GC × GC−TOFMS NTS per concentration range.

Confidence in Compound Identification. The CASI platform for GC × GC−TOFMS (published in this journal in 2013)22 was core to performing high-throughput identification of the 2990 compounds (Supporting Information: Constituent List 3R4F) present in a combined TPM and GVP data set derived from the 3R4F reference cigarette.22 The structural identities for a subset of 253 compounds were confirmed experimentally by the analysis of purchased reference standards. The remaining 2737 compounds were assigned an identification confidence, which was classified according to CASI score: HIGH (CASI score ≥ 795), MEDIUM (700 ≤ CASI score < 795), or NOT IDENTIFIED (CASI score < 700) (Figure 5). A total of 53.5% of the 2990 compounds in

the data set had reliable structural proposals assigned by CASI (either “Confirmed”, “HIGH”, or “MEDIUM”), with CASI scores ≥700. While the probability for generating high-quality mass spectral data for abundant compounds was high, obtaining sufficient mass spectral quality for identifying less-abundant compounds represented a major challenge. As a starting point, the CASI algorithm utilized the comparison of experimental mass spectra with all spectra contained within multiple (seven) commercial mass spectral libraries (total >1.1 million spectra, see Table 4 in the Supporting Information, Principle of CASI). Mass spectral quality was a key parameter for making reliable structural proposals. As can be seen in Figure 6, there were higher numbers of reliably identified compounds compared to unidentified compounds where their abundance exceeded 100 ng/cigarette.

For less-abundant compounds, the number of unidentified constituents increased dramatically in comparison to those with reliable proposals. In addition to lower spectral quality affecting the identification process for less-abundant compounds, other factors may have been attributable, such as the increased presence of less-common structures at lower concentrations that were not available in any commercial mass spectral databases (e.g., representative of unfavorable reaction chemistry, complex thermal degradation, or rearrangement processes). The evaluation of a subset of the 3R4F data focusing only on compounds confirmed by reference standard revealed that 88.5% of HIT 1 (HIT ranked first according to CASI score) proposals for the 243 compound data set were true hits (Figure 7). Ten of the confirmed compounds (10 out of 253) were excluded from the data set, as they were not present in commercial mass spectral libraries. In order to test the performance of the CASI score as a function of peak abundance, the data set was further divided into different concentration ranges. The performance of the CASI score in terms of its ability to propose the true structure as HIT 1 was found to be consistently high across the different concentration ranges, achieving accuracies between 88.2% and 96.2%.

Figure 5. Overview of confidence categories for structural identification of combined TPM and GVP data sets using NTS with GC × GC−TOFMS, overall and per method. Confirmed: structural proposal confirmed by injection of reference compound; HIGH, CASI score ≥ 795; MEDIUM, 700 ≤ CASI score < 795; NOT IDENTIFIED, CASI score < 700. Left section shows the proportion per method (Nonpolar, Polar, and Volatile).

Figure 6. Confidence for CASI score-based structural proposals relative to the semiquantified concentration range for constituents identified during the analysis of 3R4F [TPM and GVP combined (reconstituted whole smoke)] using NTS with GC × GC−TOFMS.

Figure 7. Performance of CASI score-based structural proposals (in terms of HIT 1, 2, and ≥2 proposals) in the 3R4F data set that have been confirmed by reference standard injection as a function of peak abundance.

During the development and characterization of the CASI platform,22 it was determined that a CASI score of 795 or above was sufficient to have “HIGH” confidence in a structural proposal being made. On the basis of experience, a lower threshold of 700 for the CASI score was defined as a practical cutoff, below which structural proposals were not considered reliable enough to warrant further consideration. More than 90% of HIT 1 structural proposals with CASI scores ≥795 (ranges 795−900 and 900−1000) were correct (Figure 8). For HIT 1 proposals with CASI scores falling between 700 and 795, a lower but still substantial success rate of 76.5% for correct identification was achieved.
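The scoring and thresholding logic can be summarized in a minimal sketch. It assumes that the NIST match factor is on a 0 to 1000 scale (inferred from the CASI score ranges quoted above) and that the three property scores have already been normalized to the range 0 to 1. The function names and the tuple layout are hypothetical, and the QSPR predictions and the normalization step are not reproduced here.

```python
def casi_score(nist_mf, ri_fit, rt2d_fit, bp_fit):
    """Product of the spectral match factor and the three property scores."""
    for fit in (ri_fit, rt2d_fit, bp_fit):
        if not 0.0 <= fit <= 1.0:
            raise ValueError("property scores must lie between 0 and 1")
    return nist_mf * ri_fit * rt2d_fit * bp_fit


def confidence(score, high=795.0, medium=700.0):
    """Map a CASI score to the confidence categories used in the text."""
    if score >= high:
        return "HIGH"
    if score >= medium:
        return "MEDIUM"
    return "NOT IDENTIFIED"


def rank_proposals(proposals):
    """Sort proposals by decreasing CASI score; the first entry is HIT 1.

    `proposals` is a list of (name, nist_mf, ri_fit, rt2d_fit, bp_fit)
    tuples (hypothetical layout for illustration).
    """
    scored = [(name, casi_score(mf, ri, rt, bp)) for name, mf, ri, rt, bp in proposals]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Because the score is a product, a poor match on any single property lowers the overall score even when the spectral match factor is high, which is how the chromatographic predictions can separate candidates with similar mass spectra.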

Figure 8. Performance of CASI-based identification for confidence on structural proposals in the 3R4F data set as a function of the CASI score.

Despite this outcome, the accuracy for correct identification was considered sufficient to assign a “MEDIUM” confidence to structural proposals. Considering both HIT 1 and HIT 2 proposals together for CASI scores in the range of 700−795, more than 94% of the correct structures were proposed.

Figure 9. Deviation of compound quantities determined using our semiquantitative approach in 3R4F compared to the published results using quantitative methods for (A) individual compounds and (B) compound classes.

Accuracy for Semiquantification. Due to the nature of nontargeted approaches, there was no practical way for the inclusion of calibration curves (at least within a screening phase). Calibration curves could be included at a later stage for targeting a specific compound, group of compounds, or class of substances. Nevertheless, optimizing the accuracy for semiquantification toward that of a quantitative method would strengthen subsequent decision-making processes, such as the selection of constituents requiring confirmation by authentic reference compound. Currently, there are no approaches reported by the scientific community regarding an assessment for the accuracy of semiquantification of identified tobacco smoke constituents. This is probably due to the fact that the smoke/aerosol matrix is highly complex (thousands of compounds in full scan), and expectations for accuracy using a semiquantification approach in this field are still not defined. The approach presented here used multiple stable isotope labeled ISTD compounds per method that reflected the relevant compound classes covered. Accordingly, analytes were assigned to their class-relevant ISTDs, which increased the accuracy for their semiquantification. In order to investigate the accuracy of the semiquantitative process, results derived from this process were compared with results published using quantitative methods28,29 (Figure 9). Although public data on compound yields derived by quantitative methods were limited, and initial comparisons were restricted to only 1% of the nontargeted data set reported (30 out of 2990 compounds), the deviations from quantitative results observed were compound-specific, and no obvious systematic trends were evident. With the exception of propylene glycol, none of the semiquantified concentrations for compounds exceeded a 4-fold deviation from quantitative values. Propylene glycol showed significant peak tailing, which represented a specific challenge for the generic peak integration process. Individual compounds were clustered into substance classes in order to identify any class-specific trends. As presented in Figure 9B, the class of alcohols and O-heterocycles, which was only represented by two compounds, expressed the largest deviation from quantitative values. For all other compound classes, the observed deviation was less than 2-fold. On the basis of the data comparison performed, it was concluded that


polar, and volatile constituents were successfully developed to provide maximum coverage for the chemical space anticipated for cigarette smoke, which would be amenable for GC. For each of the three analytical methods, prediction models for first and second dimensions (RI and 2DrelRT/2DabsRT, respectively) were established. The CASI platform demonstrated capability to improve the speed and accuracy for compound identification by applying a standardized, automated process. Using the three established confidence categories for generic CASI structural proposals (HIGH, MEDIUM, and NOT IDENTIFIED), for compounds that were subsequently confirmed by reference standard, more than 90% of HIT 1 structural proposals in the “HIGH” category were demonstrated to be correct. A consistently high performance across a broad range of concentrations, as a function of peak abundance (88.2%−96.2% correct HIT 1 proposals), was also demonstrated. Nevertheless, the percentage of compounds for which a meaningful structure could be proposed was observed to decrease with decreasing absolute abundance. Structural proposals for the “HIGH” and “MEDIUM” confidence categories were considered to be meaningful for compounds with an abundance of 0.1 μg/cigarette or higher. While the number of compounds increased exponentially as their concentrations decreased, the mass contribution relative to the total chemical space identified declined exponentially. Accordingly, the methodology as applied (with a threshold concentration set at 0.1 μg/cigarette) was considered to have comprehensively described the matrix, as the accumulated total mass for compounds below 0.1 μg/cigarette was estimated to be