A Comprehensive Two-Dimensional Retention Time Alignment

Anal. Chem. 2005, 77, 7735-7743

A Comprehensive Two-Dimensional Retention Time Alignment Algorithm To Enhance Chemometric Analysis of Comprehensive Two-Dimensional Separation Data

Karisa M. Pierce,† Lianna F. Wood,† Bob W. Wright,‡ and Robert E. Synovec*,†

Department of Chemistry, Box 351700, University of Washington, Seattle, Washington 98195, and Pacific Northwest National Laboratory, Battelle Boulevard, P.O. Box 999, Richland, Washington 99352

A comprehensive two-dimensional (2D) retention time alignment algorithm was developed using a novel indexing scheme. The algorithm is termed comprehensive because it functions to correct the entire chromatogram in both dimensions and it preserves the separation information in both dimensions. Although the algorithm is demonstrated by correcting comprehensive two-dimensional gas chromatography (GC × GC) data, the algorithm is designed to correct shifting in all forms of 2D separations, such as LC × LC, LC × CE, CE × CE, and LC × GC. This 2D alignment algorithm was applied to three different data sets composed of replicate GC × GC separations of (1) three 22-component control mixtures, (2) three gasoline samples, and (3) three diesel samples. The three data sets were collected using slightly different temperature or pressure programs to engender significant retention time shifting in the raw data and then demonstrate subsequent corrections of that shifting upon comprehensive 2D alignment of the data sets. Thirty 12-min GC × GC separations from three 22-component control mixtures were used to evaluate the 2D alignment performance (10 runs/mixture). The average standard deviation of first column retention time improved 5-fold from 0.020 min (before alignment) to 0.004 min (after alignment). Concurrently, the average standard deviation of second column retention time improved 4-fold from 3.5 ms (before alignment) to 0.8 ms (after alignment). Alignment of the 30 control mixture chromatograms took 20 min. The quantitative integrity of the GC × GC data following 2D alignment was also investigated. The mean integrated signal was determined for all components in the three 22-component mixtures for all 30 replicates. The average percent difference in the integrated signal for each component before and after alignment was 2.6%. 
Singular value decomposition (SVD) was applied to the 22-component control mixture data before and after alignment to show the restoration of trilinearity to the data, since trilinearity benefits chemometric analysis. By applying comprehensive 2D retention time alignment to all three data sets (control mixtures, gasoline samples, and diesel samples), classification by principal component analysis (PCA) substantially improved, resulting in 100% accurate scores clustering.

* Corresponding author. E-mail: [email protected].
† University of Washington.
‡ Pacific Northwest National Laboratory.
10.1021/ac0511142 CCC: $30.25. Published on Web 10/22/2005.
© 2005 American Chemical Society

(1) Prazen, B. J.; Synovec, R. E.; Kowalski, B. R. Anal. Chem. 1998, 70, 218.
(2) Wang, C. P.; Isenhour, T. L. Anal. Chem. 1987, 59, 649-654.
(3) Eilers, P. H. C. Anal. Chem. 2004, 76, 404-411.
(4) Forshed, J.; Schuppe-Koistinen, I.; Jacobsson, S. P. Anal. Chim. Acta 2003, 487, 189-199.
(5) Vogels, J. T. W. E.; Tas, A. C.; van der Greef, J. Chemom. Intell. Lab. Syst. 1993, 21, 249.
(6) Booksh, K. S.; Stellman, C. M.; Bell, W. C.; Myrick, M. L. Appl. Spectrosc. 1996, 50, 139.
(7) Nielsen, N.-P. V.; Carstensen, J. M.; Smedsgaard, J. J. Chromatogr., A 1998, 805, 17-35.
(8) Johnson, K. J.; Wright, B. W.; Jarman, K. H.; Synovec, R. E. J. Chromatogr., A 2003, 996, 141-155.
(9) Malmquist, G.; Danielsson, R. J. Chromatogr., A 1994, 687, 71-88.
(10) Torgrip, R. J. O.; Aberg, M.; Karlberg, B.; Jacobsson, S. P. J. Chemom. 2003, 17, 573-582.
(11) Pierce, K. M.; Hope, J. L.; Johnson, K. J.; Wright, B. W.; Synovec, R. E. J. Chromatogr., A 2005, in press.
(12) Prazen, B. J.; Bruckner, C. A.; Synovec, R. E.; Kowalski, B. R. J. Microcolumn Sep. 1999, 11, 97-107.
(13) Bushey, M. M.; Jorgenson, J. W. Anal. Chem. 1990, 62, 161-167.
(14) Holland, L. A.; Jorgenson, J. W. Anal. Chem. 1995, 67, 3275-3283.
(15) Opiteck, G. J.; Lewis, K. C.; Jorgenson, J. W. Anal. Chem. 1997, 69, 1518-1524.

Successful implementation of chemometrics relies on the reproducibility of multidimensional data sets, such as those provided by hyphenated analytical methods and two-dimensional (2D) separation instruments, which yield 2D sample profiles or images for each replicate run. Comprehensive 2D gas chromatography (GC × GC) provides such reproducible second-order separation (trilinear) data sets.1 However, over time, uncontrollable fluctuations in temperature and pressure, as well as matrix effects and stationary phase degradation, can cause retention time shifting along both chromatographic dimensions between chromatographic runs.2-11 When comparing chromatograms, retention time shifting increases the perceived complexity and chemometric “rank” of chromatographic data sets.1,12 Thus, retention time precision is crucial to the application of chemometric pattern recognition algorithms not only to GC × GC data but also to LC × LC,13-17 LC × CE,18,19 CE × CE,20,21 and LC × GC22,23 data, especially for data gathered over a long period of time, such as in

Analytical Chemistry, Vol. 77, No. 23, December 1, 2005 7735

batch-to-batch reproducibility studies or sample class determinations (classifications) and fingerprinting studies.24

Prazen et al.1 and Fraga et al.25 developed objective retention time alignment techniques for 2D separations that were based on minimizing residuals in the generalized rank annihilation method. This type of algorithm improves retention time precision in both dimensions, but it requires a standard as well as an estimation of the rank of a subregion of the data. In addition, this rank-based method only aligns small regions of interest in the 2D data sets.

Mispelaar et al.26 developed an algorithm based on correlation-optimized shifting of local regions of the GC × GC chromatogram. The optimal correction for each subregion of interest was determined on the basis of moving the subregion around on a predefined grid overlaid on a standard target chromatogram and maximizing the inner product correlation of the sample subregion to a standard target subregion. Thus, this method also requires standards, and the corrections were limited to selected subregions of the GC × GC chromatogram.

(16) Chen, X.; Kong, L.; Xingye, S.; Fu, H.; Ni, J.; Zhao, R.; Zou, H. J. Chromatogr., A 2004, 1040, 169-178.
(17) Wang, Y.; Zhang, J.; Liu, C.-L.; Gu, X.; Zhang, X.-M. Anal. Chim. Acta 2005, 530, 227-235.
(18) Bushey, M. M.; Jorgenson, J. W. Anal. Chem. 1990, 62, 978-984.
(19) Zhang, J.; Hu, H.; Gao, M.; Yang, P.; Zhang, X. Electrophoresis 2004, 25, 2374.
(20) Michels, D. A.; Hu, S.; Dambrowitz, K. A.; Eggertson, M. J.; Lauterbach, K.; Dovichi, N. J. Electrophoresis 2004, 25, 3098-3105.
(21) Liu, H.; Yang, C.; Yang, Q.; Zhang, W.; Zhang, Y. J. Chromatogr., B 2005, 817, 119-126.
(22) Quigley, W. C.; Fraga, C. G.; Synovec, R. E. J. Microcolumn Sep. 2000, 12, 160-166.
(23) Koning, S.; Janssen, H.-G.; van Deursen, M.; Brinkman, U. T. J. Sep. Sci. 2004, 27, 397-409.
(24) Mispelaar, V. G. V.; Janssen, H. G.; Tas, A. C.; Schoenmakers, P. J. J. Chromatogr., A 2005, 1071, 229-237.
(25) Fraga, C. G.; Prazen, B. J.; Synovec, R. E. Anal. Chem. 2001, 73, 5833-5840.
(26) Mispelaar, V. G. V.; Tas, A. C.; Smilde, A. K.; Schoenmakers, P. J.; van Asten, A. C. J. Chromatogr., A 2003, 1019, 15-29.
(27) Marengo, E.; Leardi, R.; Robotti, E.; Righetti, P. G.; Antonucci, F.; Cecconi, D. J. Proteome Res. 2003, 2, 351-360.
(28) Chong, C. W.; Raveendran, P.; Mukundan, R. Pattern Recognit. 2004, 37, 119-129.
(29) Kan, C.; Srinath, M. D. Pattern Recognit. 2002, 35, 143-154.

The chromatograms obtained by GC × GC separations of natural samples are similar in complexity to 2D-PAGE images of biological samples. Proteomics researchers are familiar with shifting in their 2D-PAGE images, as well. Common tools used for aligning gel images include the software packages Melanie or PD-Quest,27 which require the user to choose at least two spots of sure identification in all of the images so that the images can then be matched to each other on the basis of the position of these two spots. Within-class variability is ignored by the software if the comparison is performed on replicate images of the same sample because the software produces a “synthetic” image which summarizes the common information and contains only the spots present in all of the within-class images. The synthetic class images are then compared to other synthetic class images to elucidate class-to-class variations. More recently, other proteomics researchers have circumvented the shifting problem in 2D-PAGE images by calculating the mathematical moments (orthogonal Legendre moments,28 Zernike moments,29 etc.) of the images and then selecting the


moments with the highest discriminating power for classification. These moments essentially “filter” nondiscriminating features from the images. The moments are translation-invariant, so when physical shifting occurs in the images, it does not affect the comparison of moments of different images. Instead, the “filtered” images provide a rapid global comparison of images without actually correcting shifting.

Rather than calculating moments to compare “filtered” data or synthesizing summary data for comparisons, the work reported herein describes a fast, objective method of aligning the entire 2D separation data in both dimensions so that the entire aligned data set can be submitted to chemometric data analysis, either target analyte analysis or pattern recognition algorithms. The novel algorithm is called comprehensive 2D retention time alignment. It is an adaptation of the previously published single-dimension piecewise retention time alignment algorithm,11 extended to 2D separations using a novel indexing scheme. We use the term comprehensive in the algorithm title because this algorithm corrects the entire data set in both separation dimensions and it preserves the separation information in both dimensions.

In this report, the 2D alignment algorithm was applied to three different data sets composed of GC × GC separations of (1) three 22-component control mixtures, (2) three gasoline samples, and (3) three diesel samples. The data set of control mixtures provided a platform for quantitative evaluation of the improvement in alignment and maintenance of quantitative information in the data upon submission to the alignment algorithm. The gasoline and diesel data sets were used as progressively more complex samples, providing greater challenges to the alignment algorithm. Each data set was gathered using two different temperature or pressure programs to generate the kind of severe shifting that may be seen in long-term studies.
The retention time shifting was such that some peaks were shifted along both dimensions to varying degrees and other peaks were shifted primarily along one of the dimensions to varying degrees. In this report, the 2D retention time alignment algorithm is demonstrated for peaks shifted up to and just past nearest neighbor peaks. It will be shown that the 2D retention time alignment algorithm objectively, comprehensively, and quickly corrects such retention time shifting. This is the first report of a 2D retention time alignment algorithm that objectively corrects retention time shifting in the entire GC × GC chromatogram.

EXPERIMENTAL SECTION

Algorithm - Alignment and Indexing Scheme. The piecewise alignment algorithm is a robust correlation-based local area alignment algorithm. Piecewise alignment divides the sample vector into windows of a user-specified length, W, and each window is iteratively shifted across the target vector up to a maximum limit, L. In other words, during the shifting sequence, the starting position of the window relative to the target vector is L points before the original window location, whereas the ending position of the window is L points after the original window location. A correlation coefficient is calculated at each possible shift. The shift that yields the greatest correlation coefficient is the correction that is applied to the midpoint of the window. Corrections for areas between window midpoints are linearly interpolated to yield an overall correction function that is applied to the sample vector. Since there are two separation dimensions,

there are four user inputs: W1, L1, W2, and L2, where 1 and 2 indicate the first column and the second column parameters, respectively. In the previous report on one-dimensional piecewise alignment,11 an optimization method for determining the best value of W was described on the basis of maximizing the degree-of-class separation between training set members on a scores plot. It was also determined for one-dimensional alignment that the optimal and most efficient value for L is equal to one unit greater than the worst shifting present in the data.

Piecewise alignment is similar to the Correlation Optimized Warping algorithm7 (known as COW, which was obtained at www.biocentrum.dtu.dk/mycology/analysis/cow), but a major difference is that piecewise alignment maximizes the correlation coefficient for simple scalar shifts of local regions, whereas COW maximizes the correlation coefficient for interpolatively stretched and shrunk local regions. This difference in the two algorithms resulted in piecewise alignment being ∼10 times faster than COW for single-dimension GC data.11 Extrapolating this time savings to a 2D separation algorithm suggests a compounded savings in computation time of ∼100-fold relative to a 2D COW algorithm; however, a 2D COW algorithm has not been reported.

The key to the comprehensive 2D alignment algorithm is the novel indexing scheme combined with the piecewise retention time alignment algorithm. The data for the three sample types were gathered using a modulation period (PM) that provided at least five injections onto the second column per first column eluting peak to maintain comprehensive quantitative accuracy. Thus, this valve-based GC × GC instrument provides data that conform to the definition of comprehensive 2D separations.13,15,25,30-32 The data along the first column dimension were linearly interpolated 5-fold to provide at least 25 data points per first column peak profile.
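As a minimal sketch of the 5-fold linear interpolation along the first column dimension, the following Python fragment densifies a vector of modulation-period samples. The function names `interpolate_linear` and `resolution_percent` are hypothetical, the input is assumed to be non-empty, and the end-point handling (the last original point is kept, giving 5(n - 1) + 1 points) is an assumption rather than the authors' Matlab implementation.

```python
def interpolate_linear(values, factor=5):
    """Insert factor - 1 evenly spaced points between neighbouring values."""
    dense = []
    for a, b in zip(values, values[1:]):
        # k = 0 reproduces the original point a; k = 1..factor-1 are new points.
        dense.extend(a + (b - a) * k / factor for k in range(factor))
    dense.append(values[-1])  # keep the final original point (assumption)
    return dense


def resolution_percent(points_per_peak):
    """Grid spacing expressed as a percentage of the peak width at the base."""
    return 100.0 / points_per_peak


# Five modulations per first-column peak, interpolated 5-fold, give about
# 25 points per peak, i.e. a retention time grid of 4% of the peak width.
print(interpolate_linear([0, 10]))
print(resolution_percent(5 * 5))
```

With at least five second-column injections per first-column peak, 5-fold interpolation yields roughly 25 points across the peak, so the grid spacing is 1/25 of the peak width at the base.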
This means the retention time for the narrowest peak on the first column was definable to the nearest 4% of the peak width at the base, which is adequate since it has been shown (and will be shown in this work) that various chemometric algorithms tolerate an average standard deviation of retention time up to 8% of the peak width at the base, but precision worse than this begins to impede chemometric analyses.11,30,33-35 Increasing data density by linear interpolation is a common practice that is used to improve accurate determination of peak profiles in raw two-dimensional sample profiles without affecting alignment performance.30,36-40 In this work, this practice was followed since it is the first implementation of two-dimensional retention time alignment for the entire two-dimensional separation space. However, since future applications of 2D alignment may involve third-order data where there are two separation dimensions and a spectroscopic dimension, interpolation may not be suitable. Thus, it will be shown in this report that alignment results for the raw (not interpolated) data essentially matched those for the interpolated data.

(30) Fraga, C. G.; Prazen, B. J.; Synovec, R. E. Anal. Chem. 2000, 72, 4154-4162.
(31) Gorecki, T.; Harynuk, J.; Panic, O. J. Sep. Sci. 2004, 27, 359-379.
(32) Bruckner, C.; Prazen, B.; Synovec, R. E. Anal. Chem. 1998, 70, 2796-2804.
(33) Sinha, A. E.; Johnson, K. J.; Prazen, B. J.; Lucas, S. V.; Fraga, C. G.; Synovec, R. E. J. Chromatogr., A 2003, 983, 195-204.
(34) Bahowick, T. J.; Synovec, R. E. Anal. Chem. 1995, 67, 631-640.
(35) de Juan, A.; Rutan, S. C.; Tauler, R.; Massart, D. L. Chemom. Intell. Lab. Syst. 1998, 40, 19-32.
(36) Yu, J.; Smith, V. A.; Wang, P. P.; Hartemink, A. J.; Jarvis, E. D. Bioinformatics 2004, 18, 3594-3603.
(37) Gong, F.; Liang, Y.-Z.; Fung, Y.-S.; Chau, F.-T. J. Chromatogr., A 2004, 1029, 173-183.
(38) Christensen, J. H.; Mortensen, J.; Hansen, A. B.; Andersen, O. J. Chromatogr., A 2005, 1062, 113-123.
(39) Xie, L.; Marriott, P. J.; Adams, M. Anal. Chim. Acta 2003, 500, 211-222.
(40) Fraga, C. G.; Prazen, B. J.; Synovec, R. E. J. High Resolut. Chromatogr. 2000, 23, 215-224.

In the first step of aligning these data, one of the chromatograms was chosen to be the target chromatogram. Then every other chromatogram in the data set was aligned to that target chromatogram in the following manner (also depicted in Figure 1A). The sample chromatogram is an M by N matrix where there are N units along the first column dimension and M units along the second column dimension. N is equal to five times the total run time divided by PM, and M is equal to the data acquisition rate (500 points/s) multiplied by the PM. Thus, the sample chromatogram is indexed along the first column dimension into M vectors, each N units long. One of those vectors in the sample chromatogram is projected onto the target chromatogram (also an M by N matrix). The target chromatogram is indexed along the first column dimension into M vectors, each N units long, and the target vector where the sample vector projection is overlaid is averaged with the L2 target vectors before and after that position. This average target vector is now the new vector to which the sample vector is aligned using the piecewise alignment algorithm. L2 target vectors before and after the projected location are used so that the vector to which the sample vector is aligned is objectively constructed and actually correlates with the sample vector. Refer to the beginning of the Experimental Section for a description of the piecewise alignment algorithm. All of the M vectors in the sample chromatogram are aligned along the first column dimension in the same manner.
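The first-dimension pass described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' Matlab code: all function names are hypothetical, W1, L1, and L2 are taken in data points rather than time units, the target is padded with its edge values so that edge windows can still shift by the full limit, ties are resolved toward the smallest shift, and the correction is applied by linear resampling of the sample vector.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # flat segment: treat as uncorrelated
    return sxy / sqrt(sxx * syy)


def window_shifts(sample, target, W, L):
    """Best shift (in points, within +/-L) for each window of length W."""
    # Assumption: pad the target with its edge values so that windows at the
    # ends of the vector can still be shifted by the full +/-L.
    padded = [target[0]] * L + list(target) + [target[-1]] * L
    points = []
    for start in range(0, len(sample) - W + 1, W):
        segment = sample[start:start + W]
        best_shift, best_r = 0, float("-inf")
        # Try small shifts first so that ties favour the smallest correction.
        for s in sorted(range(-L, L + 1), key=abs):
            lo = start + s + L
            r = pearson(segment, padded[lo:lo + W])
            if r > best_r:
                best_r, best_shift = r, s
        points.append((start + W // 2, best_shift))  # shift at window midpoint
    return points


def interp_shift(points, i):
    """Linearly interpolate the shift at index i between window midpoints."""
    if i <= points[0][0]:
        return float(points[0][1])
    for (x0, s0), (x1, s1) in zip(points, points[1:]):
        if i <= x1:
            return s0 + (s1 - s0) * (i - x0) / (x1 - x0)
    return float(points[-1][1])


def piecewise_align(sample, target, W, L):
    """Align a 1D sample vector to a 1D target vector."""
    points = window_shifts(sample, target, W, L)
    if not points:
        return list(sample)  # vector shorter than one window: leave unchanged
    last = len(sample) - 1
    aligned = []
    for j in range(len(sample)):
        # A feature at sample index p belongs at p + shift in the target
        # frame, so output index j is read from p = j - shift.
        p = min(max(j - interp_shift(points, j), 0.0), float(last))
        k = int(p)
        frac = p - k
        aligned.append(sample[k] * (1 - frac) + sample[min(k + 1, last)] * frac)
    return aligned


def averaged_target_vector(target_rows, i, half_width):
    """Average row i of the target with up to half_width rows on each side."""
    rows = target_rows[max(0, i - half_width):i + half_width + 1]
    return [sum(col) / len(rows) for col in zip(*rows)]


def align_first_dimension(sample_rows, target_rows, W1, L1, L2):
    """Align every length-N row of the sample to its averaged target row."""
    return [
        piecewise_align(row, averaged_target_vector(target_rows, i, L2), W1, L1)
        for i, row in enumerate(sample_rows)
    ]
```

Here a chromatogram is represented as a list of M rows, each of length N. The second-dimension pass would apply the same routine to the transposed matrices with the roles of the two sets of parameters exchanged.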
After that, the sample chromatogram is indexed along the second column dimension into N vectors, each M units long. A vector of length M in the sample chromatogram is projected onto the target chromatogram, and the target vector where the sample vector projection is overlaid is averaged with the L1 target vectors before and after that position. The sample vector is now aligned to this average target vector along the second column dimension using the piecewise alignment algorithm. All of the N vectors in the sample chromatogram are aligned along the second column dimension in this manner.

Instrumentation, Samples, and Data Collection. An Agilent 6890 gas chromatograph equipped with a flame ionization detector (Agilent Technologies, Palo Alto, CA) was modified to a valve-modulated GC × GC by face-mounting a high-speed, six-port micro diaphragm valve (VICI, Valco Instruments Co. Inc, Houston, TX) as described in a previous publication.33 A sample volume of 1 µL was injected and split 100:1 onto the first column using an Agilent 7683 autosampler (Agilent Technologies) and hydrogen carrier gas. The valve diverted effluent from the first column from the vent path onto the second column once each PM for 20 ms, while maintaining comprehensive GC × GC requirements.15,25,30-32 The first column was 12 m × 200 µm i.d. × 0.3 µm SPB-1, and the second column was 0.5 m × 100 µm i.d. × 0.2 µm poly(ethylene glycol) CP-Wax52CB.

In the first study, three 22-component control mixtures (samples 1, 2, and 3 as defined in Table 1) were run 10 times each on the GC × GC using two different constant pressure programs (either 4.00 or 3.95 psi) to introduce substantial shifting.


Figure 1. (A) Schematic of comprehensive 2D retention time alignment algorithm. (B) Contour plot of GC × GC separation of control sample 1 in the first study. The numbered peaks have identities corresponding with Table 1. (C) Overlaid subregion containing peaks 1-5 of samples 1 and 3 replicates before alignment (raw data). Peaks are shifted in both separation dimensions to varying degrees due to the replicates’ being collected under two different pressure programs. (D) Overlaid subregion containing peaks 1-5 of samples 1 and 3 replicates after alignment. Retention time shifting was corrected along both separation dimensions for all of the peaks.

Table 1. Preparation of Three 22-Component Control Mixtures Used in the First Study

peak  component         sample 1 (mL)  sample 2 (mL)  sample 3 (mL)
1     ethanol           1.0            1.0            1.0
2     cyclopentane      1.0            1.8            1.0
3     1-propanol        1.0            0.2            1.0
4     pentane           1.0            1.0            1.0
5     hexane            1.0            1.0            1.0
6     1-butanol         1.0            1.0            1.0
7     2-pentanone       1.0            1.0            1.0
8     2-pentanol        1.0            1.8            1.0
9     1-pentanol        1.0            1.0            1.0
10    2-hexanone        1.0            1.0            0.2
11    octane            1.0            1.0            1.8
12    2-heptanone       1.0            1.0            1.0
13    nonane            1.0            0.2            1.0
14    cyclooctane       1.0            1.0            1.0
15    propylbenzene     1.0            1.0            1.8
16    3-octanone        1.0            0.2            1.0
17    sec-butylbenzene  1.0            1.0            0.2
18    1-octanol         1.0            1.8            1.0
19    undecane          1.0            1.0            1.8
20    dodecane          1.0            1.0            1.0
21    tridecane         1.0            1.0            0.2
22    pentadecane       1.0            1.0            1.0

The oven temperature was initially 40 °C, held for 1 min, then ramped 15 °C/min to 250 °C. Column 2 was held at 20.0 psi. The PM was 600 ms. The 22-component mixtures are described in Table 1. A GC × GC separation of sample 1 is shown in Figure 1B and the component identities are designated by a number corresponding with Table 1.

In a second study, three gasoline samples (types M, S, and T) were obtained at the pump from local gas stations. The gasoline samples were run 10 times each on the GC × GC using two different temperature programs to introduce substantial shifting. One temperature program was initially 40 °C, held for 0.9 min, then ramped at 15 °C/min to 250 °C. The other temperature program was initially 40 °C, held for 1.0 min, then ramped at 15 °C/min to 250 °C. In both cases, the pressure on column 1 was held constant at 4.00 psi, and volumes of 1 µL split 100:1 were injected onto the first column for all of the gasoline replicates. The column 2 pressure was 20.0 psi. The PM was 750 ms.

In a third study, three diesel samples (types 1, 2, and 3) were obtained from different filling stations. The diesel samples were run 10 times each on the GC × GC using two different temperature programs to introduce substantial shifting. One temperature program was initially held at 40 °C for 1.0 min, then ramped at 5 °C/min to 250 °C. The second temperature program was initially held at 40 °C for 1.1 min, then ramped at 5 °C/min to 250 °C. In both cases, the pressure on column 1 was held constant at 4.00 psi, and volumes of 1 µL split 100:1 were injected onto the first column. The column 2 pressure was 20.0 psi. The PM was 750 ms.

Table 2. Results of Comprehensive 2D Alignment in the First Study (a)

      tR,1 (min)                      tR,2 (ms)                     mean integrated signal
peak  before         after           before        after          before  after   difference (%)
1     0.989 ± 0.013  0.980 ± 0.004   216.8 ± 3.0   217.4 ± 0.6    0.0413  0.0415  0.5
2     1.038 ± 0.008  1.031 ± 0.005   206.8 ± 1.0   205.4 ± 0.4    0.0233  0.0231  0.9
3     1.230 ± 0.027  1.203 ± 0.006   316.8 ± 10.0  326.0 ± 1.2    0.0543  0.0538  0.9
4     1.290 ± 0.028  1.255 ± 0.001   107.8 ± 0.8   107.8 ± 0.4    0.0676  0.0654  3.3
5     1.451 ± 0.028  1.414 ± 0.001   105.8 ± 0.6   105.0 ± 0.4    0.0513  0.0504  1.8
6     1.791 ± 0.028  1.767 ± 0.005   405.8 ± 14.0  416.0 ± 1.2    0.0926  0.0939  1.4
7     1.896 ± 0.028  1.869 ± 0.003   191.6 ± 2.8   190.0 ± 2.2    0.0659  0.0634  3.8
8     2.058 ± 0.027  2.031 ± 0.004   309.2 ± 8.8   316.2 ± 8.8    0.1040  0.1080  3.8
9     2.681 ± 0.025  2.663 ± 0.006   425.2 ± 12.8  431.8 ± 1.8    0.1010  0.1030  2.0
10    2.798 ± 0.020  2.780 ± 0.008   209.6 ± 3.4   209.0 ± 0.8    0.0580  0.0587  1.2
11    3.093 ± 0.018  3.067 ± 0.002   116.8 ± 0.6   116.8 ± 0.4    0.0870  0.0920  5.7
12    3.837 ± 0.018  3.822 ± 0.001   215.2 ± 2.6   215.6 ± 0.4    0.0940  0.0949  1.0
13    4.145 ± 0.018  4.125 ± 0.006   122.4 ± 0.6   121.4 ± 0.4    0.0662  0.0683  3.2
14    4.323 ± 0.020  4.311 ± 0.001   143.0 ± 0.6   143.0 ± 0.6    0.1040  0.1070  2.9
15    4.603 ± 0.018  4.592 ± 0.007   196.2 ± 1.6   195.6 ± 0.8    0.1420  0.1430  0.7
16    4.845 ± 0.020  4.836 ± 0.014   204.8 ± 2.4   203.8 ± 0.2    0.0701  0.0702  0.1
17    5.192 ± 0.020  5.185 ± 0.005   191.4 ± 2.0   188.8 ± 2.2    0.0880  0.0885  0.6
18    5.739 ± 0.014  5.728 ± 0.002   372.8 ± 6.4   375.0 ± 1.2    0.1380  0.1420  2.9
19    6.164 ± 0.014  6.158 ± 0.002   133.8 ± 0.4   134.4 ± 0.4    0.1160  0.1280  10.3
20    7.082 ± 0.014  7.080 ± 0.001   140.4 ± 0.4   140.6 ± 0.4    0.1040  0.1100  5.8
21    7.941 ± 0.020  7.940 ± 0.010   147.6 ± 1.6   147.2 ± 0.4    0.0766  0.0800  4.4
22    9.543 ± 0.017  9.540 ± 0.004   162.8 ± 0.6   163.0 ± 0.2    0.1160  0.1170  0.9
mean  ± 0.020        ± 0.004         ± 3.5         ± 0.8                          2.6

(a) See Table 1 for peak identities. Standard deviation of retention time is given as ± values. Subscript 1 refers to the first column, and subscript 2 refers to the second column of the GC × GC instrument.
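The derived quantities quoted alongside Table 2 can be checked with a short Python sketch. The helper names are hypothetical, and the percent-difference formula (absolute change relative to the value before alignment) is an assumption that appears consistent with the tabulated values, for example peak 19.

```python
def percent_difference(before, after):
    """Absolute change in integrated signal relative to the value before alignment."""
    return abs(after - before) / before * 100.0


def percent_of_peak_width(std_dev, width_at_base):
    """Retention time standard deviation as a percentage of peak width at base."""
    return std_dev / width_at_base * 100.0


# Peak 19 integrated signal: 0.1160 before and 0.1280 after alignment.
print(round(percent_difference(0.1160, 0.1280), 1))

# First column: standard deviations of 0.020 and 0.004 min, ~0.067 min width.
print(round(percent_of_peak_width(0.020, 0.067)))
print(round(percent_of_peak_width(0.004, 0.067)))

# Second column: standard deviations of 3.5 and 0.8 ms, ~40 ms width.
print(round(percent_of_peak_width(3.5, 40.0)))
print(round(percent_of_peak_width(0.8, 40.0)))
```

Expressing the standard deviations relative to the peak width at the base makes them directly comparable with the ∼8% tolerance cited for chemometric algorithms.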

An in-house LabVIEW 6i (National Instruments, Austin, TX) program and a data acquisition board (model PCI-6035E, National Instruments) were used to modulate the valve and collect data at a rate of 500 points/s. Matlab 6.5 (The Mathworks, Natick, MA) was used for data processing on a dual channel AMD Athlon 64 bit processor.

RESULTS AND DISCUSSION

In the first study, three 22-component control mixtures, as defined in Table 1, were used to evaluate the comprehensive 2D alignment algorithm. A GC × GC chromatogram of sample 1 is shown in Figure 1B as a contour plot. The 22 components cover a large portion of the available separation space. Retention time shifting occurred along both chromatographic dimensions. The mean retention time and standard deviation of the component locations over all 30 replicates are listed in Table 2 for the unaligned raw data (labeled “before”). The chromatographic peak width at base along the first column was ∼0.067 min (i.e., ∼4 s). For the unaligned raw data, the average standard deviation of first column retention time was 0.020 min, or ∼30% of the peak width at the base, although many peaks were misaligned much more than 30%. The chromatographic peak width at the base along the second column was ∼40 ms. The average standard deviation of the second column retention time was 3.5 ms for the raw data, or ∼9% of the peak width at the base.

Comprehensive 2D retention time alignment improved the retention time precision along both chromatographic axes using W1 = 2.500 min, L1 = 0.094 min, W2 = 0.160 s, and L2 = 0.040 s. Table 2 lists the mean retention time and standard deviation of the component locations over the 30 replicates for the data after alignment (labeled “after”). For the aligned data, the average standard deviation of the first column retention time was 0.004 min (∼6% of the ∼4 s peak width at base); thus, retention time precision was improved by a factor of 5 relative to the unaligned data. For the aligned data, the average standard deviation of the second column retention time was 0.8 ms (∼2% of the ∼40 ms peak width at base); thus, retention time precision was improved by over a factor of 4. Alignment of the 30 control mixture chromatograms took 20 min.

Visual inspection of the raw and aligned data for the 22-component mixtures is instructive. A selected subregion containing peaks 1-5 of two raw GC × GC replicates is overlaid in Figure 1C. The retention time shifting in the raw data was such that peaks were shifted a distance greater than the chromatographic peak width along the first column dimension. Peaks were also shifted along the second column dimension in the raw data. The shifting along the second column dimension was greater for peaks 1, 2, and 3 than for peaks 4 and 5. Peaks 4 and 5 do not shift as much along the second column, since they are alkanes and as such are not sufficiently retained on the second column to engender shifting. The same subregion of the same GC × GC replicates after alignment is overlaid in Figure 1D. Retention time shifting was corrected along both dimensions of the 2D separation plane. Corrections similar to these were accomplished for all 22 peaks in all 30 replicates by 2D alignment.

The quantitative integrity of the data following 2D alignment was also investigated using the 22-component mixtures. The mean integrated signal for all 22 peaks over all 30 replicates is listed in Table 2 for the raw data (before) and the aligned data (after). The average percent difference in the integrated signal before and after alignment was 2.6%, so it is reasonable to conclude that 2D alignment does preserve the quantitative information in the chromatograms.

A more rigorous evaluation of the 2D alignment was undertaken in the context of chemometric criteria, namely, using


Table 3. SVD Evaluation of a Sample 1 Replicate and a Sample 2 Replicate (a)

      singular value 1                singular value 2
peak  before   after    difference   before   after    difference
1, 2  0.00423  0.00521   0.00099     0.00394  0.00219  -0.00175
3     0.00444  0.00592   0.00148     0.00424  0.00144  -0.00280
4     0.00544  0.00693   0.00149     0.00458  0.00067  -0.00392
5     0.00488  0.00640   0.00152     0.00423  0.00028  -0.00396
6     0.00433  0.00534   0.00102     0.00321  0.00103  -0.00217
7     0.00442  0.00561   0.00118     0.00418  0.00159  -0.00259
8     0.00454  0.00583   0.00129     0.00366  0.00080  -0.00285
9     0.00518  0.00524   0.00006     0.00137  0.00107  -0.00030
10    0.00404  0.00408   0.00004     0.00097  0.00076  -0.00021
11    0.00634  0.00882   0.00248     0.00449  0.00099  -0.00350
12    0.00603  0.00659   0.00055     0.00266  0.00047  -0.00220
13    0.00660  0.00703   0.00043     0.00280  0.00146  -0.00134
14    0.00747  0.00849   0.00102     0.00410  0.00057  -0.00353
15    0.00946  0.01022   0.00076     0.00500  0.00310  -0.00190
16    0.00615  0.00660   0.00044     0.00244  0.00066  -0.00178
17    0.00630  0.00619  -0.00011     0.00076  0.00118   0.00042
18    0.00587  0.00607   0.00020     0.00150  0.00027  -0.00123
19    0.00818  0.00921   0.00103     0.00374  0.00152  -0.00222
20    0.00698  0.00745   0.00047     0.00227  0.00048  -0.00179
21    0.00506  0.00503  -0.00003     0.00075  0.00104   0.00029
22    0.00607  0.00659   0.00052     0.00278  0.00091  -0.00187

(a) Results are representative of all 30 control replicates in the first study (see Tables 1 and 2). The difference in a given singular value is the singular value after 2D alignment minus the singular value before 2D alignment.
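The trend in Table 3 can be illustrated with a deliberately simplified one-dimensional analogue in Python. This sketch is not the procedure used in the paper, which augments 2D subregions and was run in Matlab: here two synthetic Gaussian peak profiles are stacked as the rows of a 2 × n matrix, whose singular values follow in closed form from the 2 × 2 Gram matrix. All names and peak parameters are hypothetical.

```python
from math import exp, sqrt


def singular_values_two_rows(a, b):
    """Singular values of the 2 x n matrix whose rows are a and b."""
    saa = sum(x * x for x in a)
    sbb = sum(y * y for y in b)
    sab = sum(x * y for x, y in zip(a, b))
    # Eigenvalues of the 2 x 2 Gram matrix [[saa, sab], [sab, sbb]];
    # their square roots are the singular values.
    mean = (saa + sbb) / 2.0
    disc = sqrt(((saa - sbb) / 2.0) ** 2 + sab ** 2)
    return sqrt(mean + disc), sqrt(max(mean - disc, 0.0))


def gaussian_peak(center, n=60, sigma=4.0):
    """Synthetic peak profile (illustrative values only)."""
    return [exp(-((i - center) ** 2) / (2.0 * sigma ** 2)) for i in range(n)]


reference = gaussian_peak(30)
before = singular_values_two_rows(reference, gaussian_peak(34))  # misaligned
after = singular_values_two_rows(reference, gaussian_peak(30))   # aligned

# Alignment moves variance from the second singular value into the first.
print(before)
print(after)
```

When the two profiles coincide, the stacked matrix has rank one, so the first singular value is at its maximum and the second vanishes; a retention time shift splits the same signal across two components.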

Figure 2. First 10 singular values for selected peaks before (solid line) and after (dashed line) 2D alignment of a sample 1 replicate gathered under pressure program 1 augmented with a sample 3 replicate gathered under pressure program 2 (see Table 1) for the first column dimension. Similar SVD results were obtained for the second column dimension (results not shown for brevity).

singular value decomposition (SVD). For the SVD evaluation, two chromatograms from the 22-component mixture study were analyzed at a time. For brevity, the SVD results of a sample 1 chromatogram and a sample 3 chromatogram are reported herein, and these results are representative of the entire 30-chromatogram data set. Identical subregions containing individual peaks were indexed from the two raw data replicates and were augmented into one matrix. This single matrix was submitted to SVD, and the first two singular values for each of the 22 peaks are listed in Table 3. (A single subregion encompassing peaks 1 and 2 was 7740

Analytical Chemistry, Vol. 77, No. 23, December 1, 2005

indexed and augmented for SVD because the shifting in the raw data caused the neighboring peaks to overlap.) The same subregions were indexed and augmented for each of the peaks in the aligned data, and the first two singular values are also listed in Table 3. Due to alignment, the first singular value increased and the second singular value decreased for 18 of the 22 peaks tested. An increase in the first singular value accompanied by a decrease in the second singular value indicates that 2D alignment reduced the complexity of the data and thus restored trilinearity to the data for these peaks. The remaining

Figure 3. (A) Scores plot of the raw unaligned 30 replicates of control mixtures (samples 1, 2, and 3) in the first study (see Table 1). Two pressure programs applied caused six clusters to form. The three clusters on the left side were run under a constant pressure program of 4.00 psi for column 1, and the three clusters on the right side were run under a pressure of 3.95 psi for column 1, holding the column 2 pressure constant at 20.0 psi in both programs. (B) Scores plot of the aligned 30 replicates of control mixtures (raw data with interpolation as explained in text, samples 1, 2, and 3) in the first study. (C) Scores plot of the aligned 30 replicates of control mixtures (raw data without interpolation, samples 1, 2, and 3).

four peaks (peaks 9, 10, 17, and 21) had singular values that were essentially unchanged by alignment. The first 10 singular values are plotted for peaks 3, 5, 11, and 17 in Figure 2 for the data before and after alignment. The SVD results presented were limited for brevity to a comparison of one replicate to another. On the other hand, PCA compares all of the replicates in the context of each other. Each replicate is matricized or “unfolded” prior to PCA. The scores plot obtained from submitting the 30 raw 22-component control mixture chromatograms to PCA is shown in Figure 3A. The raw

data scores form six clusters based on two pressure programs and three sample classes (there were five replicates per sample class per pressure program). PC1 captured 72.1% of the variance, and PC2 captured 11.5% of the variance in the data. The scores plot obtained from submitting the aligned data to PCA is shown in Figure 3B. Now, the aligned data scores form three tight clusters based on sample class. PC1 captured 63.3% of the variance, and PC2 captured 20.1% of the variance in the data. Thus, comprehensive 2D alignment corrected the retention time variations, and the first two principal components sufficiently captured the chemical class variations.

For future implementations of this 2D alignment algorithm, data interpolation may not be possible if the user wants to align 2D separation data that also has a third spectroscopic dimension. For this reason, the raw (not interpolated) 22-component mixture chromatograms were submitted to 2D alignment and then PCA. The resulting alignment and scores plot (shown in Figure 3C) is very similar to that obtained for the interpolated data.

In the second study, comprehensive 2D alignment was demonstrated with three gasoline samples that serve as more complex samples than the control mixtures in the first study. Three gasoline samples were run on the GC × GC using two different temperature programs to engender sufficient retention time shifting to tax the 2D alignment algorithm (there were five replicates per sample type per temperature program). One of the gasoline separations is shown in Figure 4A. The raw gasoline data (30 separations) was submitted to PCA, and the resulting scores plot is shown in Figure 4B. The type T scores were separate from the other two types of gasoline samples, but the type M and type S scores clustered together. PC1 captured 63.8% of the variance, and PC2 captured 22.9% of the variance in the data.
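The PCA workflow used throughout these studies (unfold each chromatogram, then inspect the scores and the variance captured by PC1 and PC2) can be sketched as follows. This is a minimal sketch on noise-free synthetic data, not the authors' implementation: the helper names, the two-peak sample model, the integer shift standing in for a second instrument program, and the mean-centering step are all assumptions.

```python
import numpy as np


def synthetic_run(amplitudes, shift=0, shape=(40, 30)):
    """Hypothetical two-peak chromatogram; `shift` moves both peaks along column 1."""
    t1 = np.arange(shape[0])[:, None]
    t2 = np.arange(shape[1])[None, :]
    centers = [(12, 10), (26, 20)]
    run = np.zeros(shape)
    for amplitude, (c1, c2) in zip(amplitudes, centers):
        run += amplitude * np.exp(-0.5 * ((t1 - c1 - shift) / 2.0) ** 2
                                  - 0.5 * ((t2 - c2) / 2.0) ** 2)
    return run


def pca_scores(chromatograms, n_components=2):
    """Unfold each chromatogram to a row, mean-center, and run PCA via SVD."""
    unfolded = np.stack([c.ravel() for c in chromatograms])
    centered = unfolded - unfolded.mean(axis=0)   # mean-centering is an assumption
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    variance_captured = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, variance_captured


classes = [(1.0, 0.5), (0.7, 1.0), (0.4, 0.8)]   # three hypothetical sample classes
# Raw data: each class is run under two programs, modeled as shifts of 0 and 3 points.
raw = [synthetic_run(a, shift) for a in classes for shift in (0, 3)]
# Idealized aligned data: the program-dependent shift has been removed.
aligned = [synthetic_run(a, 0) for a in classes for _ in (0, 3)]

raw_scores, raw_variance = pca_scores(raw)
aligned_scores, aligned_variance = pca_scores(aligned)
print("raw:     PC1, PC2 variance captured =", raw_variance)
print("aligned: PC1, PC2 variance captured =", aligned_variance)
```

In this toy case the aligned runs depend only on sample class, so the first two components capture essentially all of the variance, whereas the shifted raw runs spread variance onto additional components. Real chromatograms contain noise and many more peaks, so the percentages reported in the text will not follow this idealized pattern exactly.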
The gasoline data was submitted to 2D alignment using W1 = 7.168 min, L1 = 0.154 min, W2 = 0.180 s, and L2 = 0.030 s. The resulting scores plot is shown in Figure 4C. The aligned data scores cluster tightly into three separate groups based on gasoline type. PC1 captured 84.3% of the variance, and PC2 captured 8.9% of the variance in the data. Comprehensive 2D alignment corrected the retention time variations and allowed PCA to capture the class variations while minimizing the effect of the different temperature programs used. Alignment of the 30 gasoline chromatograms took 21 min.

In the third study, three diesels were run in replicate on the GC × GC using two different temperature programs. The diesels serve as very complex samples that fill a considerable fraction of the 2D peak capacity, thus posing a greater challenge for comprehensive 2D alignment. One of the diesel chromatograms is shown in Figure 5A. Much of the separation space is covered by diesel components. A selected subregion of two of the diesel chromatograms is overlaid in Figure 5B. In this subregion, the peaks are shifted in both dimensions and some peaks are shifted so much that they overlap neighboring peaks between the runs. The same subregion is overlaid in Figure 5C after the data set underwent 2D alignment. The parameters were W1 = 2.503 min, L1 = 0.125 min, W2 = 0.200 s, and L2 = 0.040 s. All of the peaks were correctly aligned. The raw diesel data set was submitted to PCA, and the resulting scores plot is shown in Figure 6A. PC1 captured 80.0% of the variance, and PC2 captured 8.5% of the variance in the raw data. The raw data clusters were based on variations from the different temperature programs, chemical


Figure 4. (A) Contour plot of GC × GC separation of type M gasoline in the second study. (B) Scores plot of the raw unaligned 30 replicates of gasoline (types M, S, and T) in the second study. Two different temperature programs were used. (C) Scores plot of the aligned 30 replicates of gasoline (types M, S, and T) in the second study.

variations, and other uncontrollable sources of variation. The aligned data was submitted to PCA; the resulting scores plot is shown in Figure 6B. PC1 captured 91.8% of the variance, and PC2 captured 2.5% of the variance in the aligned data. The scores for samples 1, 2, and 3 cluster tightly on the basis of class differences on PC1. The sample 1 and sample 2 scores cluster tightly on PC2,

Figure 5. (A) Contour plot of GC × GC separation of type 1 diesel in the third study. (B) Overlaid subregion of type 1 diesel and type 3 diesel before alignment (raw data) in the third study. Peaks are shifted in both separation dimensions due to the replicates’ being collected under two different temperature programs. (C) Overlaid subregion of type 1 diesel and type 3 diesel after alignment in the third study. Retention time shifting was corrected along both separation dimensions for all of the peaks, even for peaks that overlapped neighboring peaks between runs.

but the sample 3 scores are separated on PC2 on the basis of the temperature programs used. However, this separation is minor because PC2 captured only 2.5% of the variance, compared with 91.8% for PC1. That said, 2D alignment was successful at correcting the retention time variations so that PCA could correctly classify the data set. Alignment of the 30 diesel chromatograms took 29 min.

Finally, to investigate the effect of having large class-to-class variations as well as retention time variations in a data set that

Figure 6. (A) Scores plot of the raw 30 replicates of diesel (types 1, 2, and 3) in the third study. See Figure 5A for a typical separation. (B) Scores plot of the aligned 30 replicates of diesel (types 1, 2, and 3) in the third study.

was submitted to PCA, the raw gasoline and diesel replicates were augmented into one data set. This unaligned raw combination data set was submitted to PCA, and PCA did capture the class-to-class variation, but it failed to capture the within-class variations in the scores clustering (not shown for brevity). The data sets from the gasoline and diesel studies were combined into one data set of aligned gasoline and diesel chromatograms. This aligned data set was submitted to PCA. PCA did, indeed, capture the class-to-class variations as well as the within-class variations of the 2D aligned data, yielding six tight, accurate clusters of scores based on sample class (not shown for brevity).

CONCLUSIONS

A comprehensive 2D retention time alignment algorithm was developed using a novel indexing scheme. This algorithm was applied to GC × GC separations of control mixtures, gasoline samples, and diesel samples to correct retention time shifting that was built into the data by using slightly different temperature and

pressure programs during acquisition. The 2D alignment algorithm was built to quickly improve retention time precision for the entire chromatogram and to improve the chemometric analysis of the data sets. More specifically, SVD was applied to the control mixture data before and after alignment to show the restoration of trilinearity to the data due to application of the comprehensive 2D alignment algorithm. In addition, classification by PCA was improved for all of the data sets (control mixtures, gasoline, diesel, and gasoline plus diesel) due to the comprehensive 2D retention time alignment of the chromatographic data. It was also shown that the algorithm preserved the quantitative information using simple integration of individual peaks before and after alignment of the control mixture data set. Aside from GC × GC data, this comprehensive 2D retention time alignment algorithm should in principle be applicable to any 2D separation data, such as LC × LC, LC × CE, CE × CE, and LC × GC. In future work, the comprehensive 2D retention time alignment algorithm will be adapted for 2D separation instrumentation combined with spectral detection in which the algorithm will preserve the spectral information as retention time shifting is corrected. In this report, peaks were shifted in both chromatographic dimensions, and some peaks in the raw data were even shifted past nearest neighbor peaks in successive chromatograms. It was demonstrated that the comprehensive 2D alignment algorithm can correct such shifting. In future work, the limitations of the algorithm in terms of how far peaks were shifted in raw data will be determined. It should be noted that relatively small perturbations in pressure and temperature were used to instigate retention time shifting. The algorithm may not directly apply to wildly different temperature and pressure programs in general where drastic changes in peak order may occur. 
ACKNOWLEDGMENT

This work was supported by the Internal Revenue Service through an Interagency Agreement with the U.S. Department of Energy. The Pacific Northwest National Laboratory is operated by Battelle Memorial Institute for the U.S. Department of Energy under Contract DE-AC05-76RLO 1830. The views, opinions, and findings contained in this report are those of the authors and should not be construed as the official Internal Revenue Service position, policy, or decision unless designated by other documentation.

Received for review June 22, 2005. Accepted September 18, 2005.

AC0511142
