Anal. Chem. 2001, 73, 5833-5840

Objective Data Alignment and Chemometric Analysis of Comprehensive Two-Dimensional Separations with Run-to-Run Peak Shifting on Both Dimensions

Carlos G. Fraga, Bryan J. Prazen, and Robert E. Synovec*

Center for Process Analytical Chemistry, Department of Chemistry, Box 351700, University of Washington, Seattle, Washington 98195

Data from comprehensive two-dimensional (2-D) separation techniques, such as comprehensive 2-D gas chromatography (GC × GC), liquid chromatography/liquid chromatography (LC × LC), and liquid chromatography/capillary electrophoresis (LC × CE), can be readily analyzed by various chemometric methods to increase chemical analysis capabilities. A retention time alignment preprocessing method is presented that objectively corrects for run-to-run retention time variations on both separation dimensions of comprehensive 2-D separations prior to application of chemometric data analysis algorithms. The 2-D alignment method corrects for run-to-run shifting of a sample data matrix relative to a standard data matrix on both separation time axes in an independent, stepwise fashion. After 2-D alignment, the generalized rank annihilation method (GRAM) is successfully applied, substantiating the performance of the alignment method. The alignment method should have important implications, because most 2-D separation techniques exhibit, in the context of chemometric data analysis, considerable run-to-run retention time shifting on both dimensions. Even when there are only three to four points/peak, that is, with three to four separations on the second dimension (column 2) per peak width from the first dimension (column 1), the 2-D alignment coupled with GRAM provides dependable analyte peak identification capabilities and adequate quantitative precision for unresolved analyte peaks. Thus, the 2-D alignment algorithm is applicable to lower data density conditions, which broadens the scope of chemometric analysis to high-speed 2-D separations.

Comprehensive two-dimensional (2-D) separations are ideally suited for the analysis of complex samples and are emerging as powerful tools for chemical analysis.
Even with a large peak capacity, the probability of peak overlap in 2-D separations can become quite severe, especially for highly complex samples.1-3 The probability of peak overlap becomes even more likely if one desires to speed up the analysis by designing a given separation method to provide a reduction in the run time. This is because a reduction in run time generally goes hand-in-hand with a reduction in the resolving power along the first-dimension separation. Thus, traditional methods of chromatographic and electrophoretic data analysis, such as peak height and peak area measurements, become less effective as the analyst moves into the realm of high-speed chemical analysis. The limitations brought on by the likelihood of peak overlap can be overcome, to a large extent, by the implementation of appropriate chemometric methods. Essentially, chemometric methods will effectively enhance the resolving power of 2-D separation methods. Previously, we have explored the use of the chemometric method called the generalized rank annihilation method (GRAM) for the analysis of overlapped peaks in comprehensive GC × GC and related methods.4-9 More recently, we have obtained a better understanding of the overall scope of the advantage a method such as GRAM provides in the analysis of 2-D separation data.10 Using sample and standard matrices of data for sections of the 2-D separations that contain the analyte(s) of interest, GRAM calculates the pure elution profiles of overlapped 2-D peaks.11-13 Each 2-D peak can then be individually reconstructed using its respective two-separation elution profiles. In addition, GRAM provides the concentrations for analytes in the sample relative to the standard. The standard can be simply prepared using the original sample by the standard addition method.5

(1) Davis, J. M. Anal. Chem. 1991, 63, 2141-2152.
(2) Rowe, K.; Davis, J. M. Anal. Chem. 1995, 67, 2981-2993.
(3) Rowe, K.; Bowlin, D.; Zou, M.; Davis, J. M. Anal. Chem. 1995, 67, 2994-3003.
(4) Fraga, C. G.; Prazen, B. J.; Synovec, R. E. J. High Resolut. Chromatogr. 2000, 23, 215-224.
(5) Fraga, C. G.; Prazen, B. J.; Synovec, R. E. Anal. Chem. 2000, 72, 4154-4162.
(6) Prazen, B. J.; Bruckner, C. A.; Synovec, R. E.; Kowalski, B. R. J. Microcolumn Sep. 1999, 11, 97-107.
(7) Prazen, B. J.; Bruckner, C. A.; Synovec, R. E.; Kowalski, B. R. Anal. Chem. 1999, 71, 1093-1099.
(8) Prazen, B. J.; Synovec, R. E.; Kowalski, B. R. Anal. Chem. 1998, 70, 218-225.
(9) Bruckner, C. A.; Prazen, B. J.; Synovec, R. E. Anal. Chem. 1998, 70, 2796-2804.
(10) Fraga, C. G.; Bruckner, C. A.; Synovec, R. E. Anal. Chem. 2001, 73, 675-683.
(11) Sánchez, E.; Kowalski, B. R. Anal. Chem. 1986, 58, 496-499.
(12) Sánchez, E.; Ramos, L. S.; Kowalski, B. R. J. Chromatogr. 1987, 385, 151-64.
(13) Ramos, L. S.; Sánchez, E.; Kowalski, B. R. J. Chromatogr. 1987, 385, 165-180.

10.1021/ac010656q. Published on Web 11/08/2001. © 2001 American Chemical Society
Analytical Chemistry, Vol. 73, No. 24, December 15, 2001 5833

To use chemometric methods such as GRAM, the unwanted shifting of a 2-D peak’s retention time(s) or migration time(s)
between sample and standard runs must be objectively corrected. Indeed, run-to-run retention time shifting has been a severe impediment to the use of chemometric methods on data collected from separation techniques. We are addressing the retention time alignment problem, and have previously developed an objective, one-dimensional (1-D) alignment method to correct retention time shifts along one time axis.8 When used in conjunction with GRAM, this objective retention time alignment method resulted in excellent peak identification and quantification of severely overlapped peak profiles for GC × GC separations.5 Rank alignment, as the alignment method is referred to, has been critically developed and successfully applied to GC × GC separations that benefited from 1-D alignment on the first-column time axis only. For many comprehensive 2-D separation methods, run-to-run retention time variation on both dimensions will be observed, having a significant and detrimental effect on chemometric applications. We have now modified the 1-D alignment preprocessing method to determine and correct the run-to-run peak shifting along both separation time axes. Correction of retention time shifting on both separation dimensions broadens the scope considerably for the subsequent use of chemometric methods, such as GRAM. Thus, data from 2-D separation techniques, such as comprehensive 2-D gas chromatography (GC × GC),4,5,9,10,14-17 liquid chromatography/liquid chromatography (LC × LC),18-20 and liquid chromatography/capillary electrophoresis (LC × CE),21-25 can be readily analyzed by various chemometric methods to increase chemical analysis capabilities.
In this work, we have taken 2-D data collected from a GC × GC instrument and have simulated 2-D data from other separation techniques by adding run-to-run retention time variation that is consistent with the precision given in recent reports for LC × CE.21,22 We report how the new 2-D alignment method, modified from our 1-D alignment method, corrects both dimensions in an independent, stepwise fashion. The 2-D alignment method will be shown to be both easy to apply and robust. We also investigate the performance of GRAM with the 2-D alignment method, especially as the number of runs on the second dimension is reduced to as few as three runs/peak eluting from the first-dimension separation. This last study has considerable implications for high-speed, comprehensive 2-D separations.

(14) Liu, Z.; Phillips, J. B. J. Chromatogr. Sci. 1991, 29, 227-231.
(15) Phillips, J. B.; Gaines, R. B.; Blomberg, J.; van der Wielen, F. W. M.; Dimandja, J. M.; Green, V.; Granger, J.; Patterson, D.; Racovalis, L.; de Geus, H. J.; de Boer, J.; Haglund, P.; Lipsky, J.; Sinha, V.; Ledford, E. B., Jr. J. High Resolut. Chromatogr. 1999, 22, 3-10.
(16) Beens, J.; Blomberg, J.; Schoenmakers, P. J. J. High Resolut. Chromatogr. 2000, 23, 182-188.
(17) Kinghorn, R. M.; Marriott, P. J. J. High Resolut. Chromatogr. 1998, 21, 620-622.
(18) Bushey, M. M.; Jorgenson, J. W. Anal. Chem. 1990, 62, 161-167.
(19) Opiteck, G. J.; Ramirez, S. M.; Jorgenson, J. W.; Moseley, M. A. Anal. Biochem. 1998, 258, 349-361.
(20) Opiteck, G. J.; Lewis, K. C.; Jorgenson, J. W.; Anderegg, R. J. Anal. Chem. 1997, 69, 1518-1524.
(21) Bushey, M. M.; Jorgenson, J. W. Anal. Chem. 1990, 62, 978-984.
(22) Hooker, T. F.; Jorgenson, J. W. Anal. Chem. 1997, 69, 4134-4142.
(23) Moore, A. W. J.; Jorgenson, J. W. Anal. Chem. 1995, 67.
(24) Larmann, J. P. J.; Lemmo, A. V.; Moore, A. W., Jr.; Jorgenson, J. W. Electrophoresis 1993, 14, 439-447.
(25) Lemmo, A. V.; Jorgenson, J. W. Anal. Chem. 1993, 65, 1576-1581.

THEORY

The theory behind 2-D alignment is explained by first describing how the 1-D alignment method determines shifts along the
first-column time axis, and then by discussing how the 1-D alignment method can be modified to determine run-to-run shifts on both column time axes. The 1-D alignment method applies an iterative routine to determine the peak shift (along one time axis) between the 2-D peaks in common between an “unknown” sample data matrix, M, and a calibration standard data matrix, N. The data matrices M and N are usually small regions of 2-D separation data matrices that contain only the overlapped peaks of interest. The alignment method is based on the fact that the data matrix formed by augmenting M and N has a minimum pseudorank when the peaks in M and N are perfectly aligned. The pseudorank is defined as the rank of a data matrix in the absence of noise.26 Several methods exist for estimating the pseudorank.26-29 For a data matrix containing 2-D peaks, the pseudorank of the data matrix ideally equals the number of peaks. The 1-D alignment method produces an augmented data matrix by “stacking” M and N, as illustrated in Figure 1A. In Figure 1A, the augmented matrix is represented by two stacked contour plots, one representing M and the other, N. Both M and N are simulated 30- by 30-point data matrices containing the peaks for the same two components but at different concentrations. M and N are joined such that the augmented matrix has twice the number of data points on the column 2 axis than either M or N. The joining of M and N in this way is done to determine the peak shift on the column 1 time axis. When the peaks between M and N are aligned on the column 1 axis, then the pseudorank of the augmented matrix is at a minimum value. In the case of Figure 1A, the minimum pseudorank is two, which is the number of different chemical components. However, when the peaks between M and N are shifted on the column 1 axis, then the pseudorank of the augmented matrix will be greater than two. 
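The pseudorank argument above can be illustrated with a small numerical sketch. This is not the authors' code: the Gaussian peak shapes, peak centers, concentrations, and the tolerance used to count singular values are all assumptions chosen for illustration, and the data are noise-free so that the numerical rank stands in for the pseudorank.

```python
import numpy as np

def gaussian(n, center, width=2.5):
    """Gaussian elution profile sampled at n points (assumed peak shape)."""
    t = np.arange(n)
    return np.exp(-0.5 * ((t - center) / width) ** 2)

def two_peak_matrix(conc, shift=0.0, n=30):
    """Bilinear 2-D data: rows = column 2 time, columns = column 1 time."""
    centers = [(12.0, 13.0), (17.0, 16.0)]  # hypothetical (column 1, column 2) centers
    data = np.zeros((n, n))
    for c, (c1, c2) in zip(conc, centers):
        data += c * np.outer(gaussian(n, c2), gaussian(n, c1 + shift))
    return data

def pseudorank(matrix, tol=1e-8):
    """Count singular values above a relative tolerance (valid for noise-free data)."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

N = two_peak_matrix([1.0, 0.5])                      # standard
M_aligned = two_peak_matrix([0.6, 0.9])              # sample, no shift
M_shifted = two_peak_matrix([0.6, 0.9], shift=2.5)   # sample, late on column 1

# Stack so the augmented matrix has twice the points on the column 2 axis (rows).
rank_aligned = pseudorank(np.vstack([M_aligned, N]))
rank_shifted = pseudorank(np.vstack([M_shifted, N]))
print(rank_aligned, rank_shifted)
```

With aligned peaks the augmented matrix keeps a rank equal to the number of components; a column 1 shift introduces additional independent column 1 profiles, so the rank rises above two.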
Hence, for shifted peaks, the 1-D alignment method essentially shifts the peaks in the M matrix until the pseudorank of the augmented matrix reaches a minimum value. In Figure 1A, the shifting of the peaks in M is represented by the double arrows.

Figure 1. (A) By joining the sample matrix M to the standard matrix N to form an augmented matrix, retention time alignment along the first dimension (column 1) of the 2-D separation can be achieved by shifting the peaks in M relative to those in N until a minimum pseudorank is achieved. (B) The correct alignment for M to N on the column 1 time axis is illustrated by the minimum in the % Residual Variance plot. Data interpolation was used to find the best time shift and minimum % Residual Variance. (C) Following alignment along the column 1 time axis, alignment along the second dimension (column 2) is achieved by joining the sample matrix M side-by-side with the standard matrix N to form an augmented matrix. Again, retention time alignment is achieved by shifting M relative to N until a minimum pseudorank is achieved. (D) The correct alignment for M relative to N on the column 2 time axis is illustrated in the % Residual Variance plot.

(26) Faber, N. M.; Buydens, L. M. C.; Kateman, G. Anal. Chim. Acta 1994, 296, 1-20.
(27) Faber, N. M.; Buydens, L. M. C.; Kateman, G. Chemom. Intell. Lab. Syst. 1994, 25, 203-226.
(28) Faber, K.; Kowalski, B. R. Anal. Chim. Acta 1997, 337, 57-71.
(29) Malinowski, E. R. Factor Analysis in Chemistry, 2nd ed.; Wiley: New York, 1991.

In practice, the alignment method does not determine the pseudorank of the augmented matrix as a function of peak shift. Instead, the percent residual variance is determined as a function of peak shift. The percent residual variance is defined as the sum of the secondary eigenvalues from the singular value decomposition (SVD) of the augmented matrix divided by the sum of all of the singular values and multiplied by 100 and a term related to the degrees of freedom.8 The secondary eigenvalues are those eigenvalues associated with the data variance of measurement error or noise. The percent residual variance reaches a minimum only when the pseudorank of the augmented matrix reaches a minimum. Figure 1B depicts the percent residual variance plot for the augmented matrix depicted in Figure 1A. The secondary eigenvalues used in generating the percent residual variance plot (see Figure 1B) are the complete sequence of eigenvalues, except the first two, produced by the SVD of the augmented matrix. The first two eigenvalues are called the primary eigenvalues. The number of primary eigenvalues, which is simply the estimated
pseudorank of M, must be determined to initiate the alignment algorithm. The primary eigenvalues completely describe the data variance of all of the peaks in the augmented matrix when the peaks in M are aligned with those in N along the column 1 axis. The column 1 peak shift required for the alignment of M to N is the peak shift producing a minimum in the percent residual variance plot. In Figure 1B, the percent residual variance reaches a minimum value at a data-point shift of -2.5. In other words, the two peaks in M (see Figure 1A) elute 2.5 data points later than the same peaks (at different concentrations) in N. Retention time increases with data point number. A maximum expected peak shift of four was inputted into the alignment algorithm to generate the percent residual variance plot in Figure 1B. The decimal data point shift of -2.5 was obtained using data point interpolation as part of the alignment algorithm to more accurately determine the peak shift. Interpolation is required for accurate peak shift determination in situations where the data density is low (e.g., 3-4 data points/peak at base). Later in this manuscript, interpolated peak alignment is demonstrated on comprehensive 2-D data with a low data density. Up to this point, the theory and steps used by the 1-D alignment method for determining peak shift on one dimension have been described. Now we will discuss how the 1-D alignment algorithm can be used to determine peak shifts on both time axes. First, one must realize that the pseudorank of the augmented matrix that is produced to determine peak shift along one time axis is in no way affected by a peak shift present on the other axis. This means the 1-D alignment method can correct peak shift on the column 1 axis regardless of any peak shift on the column
2 axis, and vice versa. Hence, the 1-D alignment method can be used to correct a peak shift along the column 1 axis, and then it can be used to correct a peak shift along the column 2 axis. Determining the peak shift along the column 2 axis is accomplished by using the augmented matrix obtained by “stacking” M and N “side-by-side”, as depicted in Figure 1C. This new augmented matrix has twice the number of data points on the column 1 axis as either M or N. As implied by the arrows in Figure 1C, the peaks in M are shifted together relative to those in N along the column 2 axis. This is done in order to find the shift resulting in the smallest percent residual variance. As shown in the percent residual variance plot (Figure 1D), a minimum residual variance is obtained with a shift of 2. This means that the two peaks in M (see Figure 1C) elute two data points earlier along the column 2 axis than the two peaks in N. A maximum expected peak shift of six was inputted into the alignment algorithm. Data point interpolation was not needed for this case and, therefore, it was not used.

(30) Faber, N. M.; Buydens, L. M. C.; Kateman, G. J. Chemom. 1994, 8, 147-154.
(31) Li, S.; Hamilton, J. C.; Gemperline, P. J. Anal. Chem. 1992, 64, 599-607.
(32) Wilson, B. E.; Sánchez, E.; Kowalski, B. R. J. Chemom. 1989, 3, 493-498.

Figure 2. Schematic representation of the comprehensive GC × GC instrument used to collect 2-D data for the study.

The need to have properly aligned peaks between the sample matrix M and the standard matrix N is crucial for the successful analysis of unresolved peaks by GRAM. GRAM is a chemometric method that resolves and quantifies overlapped peaks that are common to both M and N. Several papers cover the GRAM algorithm in detail.11,30-32 The specific requirements for GRAM analysis of comprehensive 2-D separation data have been reported.9 To use GRAM, the peaks present in M and N must be
either bilinear or approximately bilinear. A bilinear peak is a 2-D peak that can be mathematically represented by the product of its elution profiles along each column axis. Generally, a bilinear peak in a contour plot appears as an elliptical zone that has its major and minor axes aligned with the two column axes. Contour plots from published GC × GC, LC × LC, and LC × CE papers depict 2-D peaks that appear bilinear.4,9,18,21,33 Using M and N, GRAM calculates the pure elution profiles of overlapped 2-D peaks. Each 2-D peak can then be individually reconstructed using its respective two elution profiles. In addition, GRAM provides the analyte concentrations for M relative to N. As with the alignment method, the GRAM algorithm requires the input of the pseudorank for M or N, whichever is larger. No other inputs are required. Although in the studies presented here, both the sample and standard matrices contain the same number of components with no additional interfering components, it is important to mention that implementation of GRAM does not require the sample and standard matrices to contain the same number of components. Thus, it is not necessary for interfering components to be present in the standard. This is referred to as a “second-order advantage” and has been demonstrated recently for a variety of hyphenated separation methods.6,8,9 Essentially, both rank alignment and GRAM should perform well, irrespective of whether there is an equal number of components in the sample and standard matrices.

EXPERIMENTAL SECTION

Two solutions with heptane (HPLC grade, Fisher Scientific, Fair Lawn, NJ) as the solvent were made. Both solutions contained propylbenzene, PB (98% purity, Aldrich Co., Milwaukee, WI), m-ethyltoluene, MET (99% Aldrich), and 1,3,5-trimethylbenzene, TMB (99% Aldrich). The concentrations (w/w) for the three analytes, in the order listed above, were 0.418, 1.29, and 0.45% in solution 1 and 0.37, 0.399, and 0.842% in solution 2.
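As a quick check on the stated concentrations, the gravimetric sample-to-standard ratios can be worked out directly. The sketch below assumes, consistent with the data matrices described later in this section, that solution 2 serves as the sample and solution 1 as the standard; the variable names are illustrative only.

```python
# Concentrations (% w/w) as listed above, in the order PB, MET, TMB.
solution_1 = {"PB": 0.418, "MET": 1.29, "TMB": 0.45}
solution_2 = {"PB": 0.37, "MET": 0.399, "TMB": 0.842}

# Assumption: solution 2 is the sample and solution 1 is the standard.
ratios = {name: solution_2[name] / solution_1[name] for name in solution_1}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2g}")
```

To two significant figures these ratios agree with the gravimetric values quoted in the caption of Figure 3 and in Table 1.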
Four replicate GC × GC runs were made for each solution. A schematic representation of the diaphragm-valve-based GC × GC system used to analyze the two heptane solutions is depicted in Figure 2. The GC × GC runs were performed using a Varian gas chromatograph (model 3600cx, Varian, Sugar Land, TX) equipped with a flame ionization detector (FID). An autosampler (model 7673A, Agilent Technologies, Palo Alto, CA) was used to inject 1 µL of each solution into a split/splitless injector at 250 °C, which operated in the splitless mode for the entire GC × GC
run. The column oven temperature was held at 95 °C, and the FID, at 250 °C. The column head pressure was set at 12.8 psi. Helium was the carrier gas. Column 1 of the GC × GC system was a 9.2-m × 530-µm-i.d. capillary column with a 3-µm poly(dimethylsiloxane) film (SPB-1, Supelco, Bellefonte, PA). Column 2 was a 0.89-m × 180-µm-i.d. column with a 0.15-µm poly(ethylene glycol) stationary phase (Carbowax, Quadrex Corp, New Haven, CT). The diaphragm valve (model 11, Applied Automation, Bartlesville, OK) was actuated for 15 ms every 320 ms during an 80-s GC × GC run. Thus, the diaphragm valve essentially injected a small portion of the column 1 effluent onto column 2 every 320 ms. Although the cycle time between injections onto column 2 was 320 ms, the elution time for the three analytes on column 2 was approximately 1 s. In other words, each analyte reached the FID during the time between the second and third injection following the initial injection of the analyte onto column 2. A relatively short cycle time of 320 ms was used in order to initially get a large number of column 2 injections (∼14-15) across the base width of each analyte peak eluting from column 1. The effluent from column 1 was split after the diaphragm valve between column 2 and 0.5 m of 180-µm-i.d. silica tubing in order to maintain efficiency with the column 2 separations. The FID signal from the Varian was measured at a rate of 10 000 points/s by a data acquisition board (model AT-MIO-16XE50, National Instruments, Austin, TX) connected to a PC running LabVIEW 5.0 (National Instruments). The raw data was then boxcar-averaged to 500 points/s and transferred into Matlab 5.2 (The Mathworks Inc, Natick, MA) where it was converted into a matrix of data such that each row of the matrix represented a fixed time on column 2 and each column of the matrix represented a fixed time on column 1. A submatrix was selected from each data matrix for analysis. 
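The data handling described above (boxcar averaging from 10 000 to 500 points/s, then folding the trace into a matrix with one matrix column per valve cycle) might be sketched as follows. This is an illustrative reconstruction in Python rather than the authors' LabVIEW/Matlab code; the function names are hypothetical, a random trace stands in for the FID signal, and the assignment of absolute column 2 retention times (analytes elute roughly 1 s after injection, several cycles later) is not handled.

```python
import numpy as np

RAW_RATE = 10_000      # points/s from the data acquisition board
AVG_RATE = 500         # points/s after boxcar averaging
MODULATION_S = 0.320   # valve cycle time, s
RUN_S = 80             # run length, s

def boxcar_average(signal, factor):
    """Average consecutive, non-overlapping blocks of `factor` points."""
    usable = len(signal) - len(signal) % factor
    return signal[:usable].reshape(-1, factor).mean(axis=1)

def fold(signal, points_per_cycle):
    """Rows = time within a valve cycle (column 2), columns = cycle index (column 1)."""
    n_cycles = len(signal) // points_per_cycle
    trimmed = signal[: n_cycles * points_per_cycle]
    return trimmed.reshape(n_cycles, points_per_cycle).T

raw = np.random.default_rng(0).normal(size=RAW_RATE * RUN_S)  # placeholder FID trace
averaged = boxcar_average(raw, RAW_RATE // AVG_RATE)
points_per_cycle = round(MODULATION_S * AVG_RATE)
matrix = fold(averaged, points_per_cycle)
print(matrix.shape)
```

Under these settings each valve cycle contributes 160 points along the column 2 axis, and the 80-s run yields 250 cycles along the column 1 axis.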
The submatrix was 44.8-64 s on the column 1 time axis and 0.94-1.09 s on the column 2 time axis, which corresponded to a size of 61 by 76 points. This data matrix contained the overlapped 2-D peaks for PB, MET, and TMB. A total of eight of these data matrices, that is, four replicate GC × GC matrices (M1, ..., M4) from solution 2 and four replicate GC × GC matrices (N1, ..., N4) from solution 1, were obtained. Both retention time alignment and GRAM analysis of the data matrices were performed in Matlab. The algorithms for 1-D alignment, 2-D alignment, and GRAM were written in-house in Matlab. The pseudorank used in applying all three algorithms was three, because the peaks of three components, PB, MET, and TMB, were known to be present in the data matrices. Retention time alignment was only demonstrated on the eight data matrices after they were artificially shifted to mimic LC × CE separations. The expected maximum shifts entered into the 1-D alignment and 2-D alignment algorithms were 4 and 10 data points, respectively. These maximum shifts were based on the known true maximum shifts of the artificially shifted data. In real applications, the expected maximum shift can be objectively calculated on the basis of the typical retention time precision for the 2-D separations. Besides the pseudorank and the two expected maximum shift values, no other information was required to apply retention time alignment and GRAM analysis on the data matrices.
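To make the role of the two inputs (pseudorank and expected maximum shift) concrete, the following is a minimal sketch of a column 1 alignment step based on the percent residual variance idea from the Theory section. It is not the in-house Matlab algorithm: only integer shifts are searched (no interpolation), the degrees-of-freedom term is omitted, squared singular values are used as the eigenvalues, and the synthetic two-peak data are assumptions made for illustration.

```python
import numpy as np

def gaussian(n, center, width=2.5):
    t = np.arange(n)
    return np.exp(-0.5 * ((t - center) / width) ** 2)

def peaks(conc, shift, n=40):
    """Synthetic bilinear data: rows = column 2 time, columns = column 1 time."""
    centers = [(15.0, 18.0), (21.0, 22.0)]  # hypothetical (column 1, column 2) centers
    return sum(c * np.outer(gaussian(n, c2), gaussian(n, c1 + shift))
               for c, (c1, c2) in zip(conc, centers))

def shift_columns(matrix, shift):
    """Shift data along the column 1 axis (matrix columns), padding with zeros."""
    out = np.zeros_like(matrix)
    if shift > 0:
        out[:, shift:] = matrix[:, :-shift]
    elif shift < 0:
        out[:, :shift] = matrix[:, -shift:]
    else:
        out[:] = matrix
    return out

def percent_residual_variance(matrix, pseudorank):
    """Simplified: 100 * secondary / total squared singular values (no DOF term)."""
    s2 = np.linalg.svd(matrix, compute_uv=False) ** 2
    return 100.0 * s2[pseudorank:].sum() / s2.sum()

def align_column1(M, N, pseudorank, max_shift):
    """Return the integer shift of M that minimizes the residual variance."""
    shifts = range(-max_shift, max_shift + 1)
    scores = [percent_residual_variance(np.vstack([shift_columns(M, s), N]), pseudorank)
              for s in shifts]
    best = shifts[int(np.argmin(scores))]
    return best, shift_columns(M, best)

N = peaks([1.0, 0.5], shift=0)
M = peaks([0.6, 0.9], shift=3)      # sample peaks elute 3 points late on column 1
best_shift, M_aligned = align_column1(M, N, pseudorank=2, max_shift=4)
print(best_shift)
```

A negative result means the peaks in M elute later than those in N, matching the sign convention used for Figure 1B. Alignment along the column 2 axis would follow the same pattern with the matrices joined side-by-side and the shift applied along the rows.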

Figure 3. (A) 3-D plot of the standard data matrix N1 containing the peaks of three analytes: PB, propylbenzene; MET, m-ethyltoluene; TMB, 1,3,5-trimethylbenzene. A total of four standard data matrices, N1, N2, N3, and N4, were obtained from replicate GC × GC analyses of solution 1 (see Experimental Section). (B) Sample data matrix M1, also containing the peaks for PB, MET, and TMB, but at concentration ratios relative to the standard of 0.89, 0.31, and 1.9, respectively. A total of four sample data matrices, M1, M2, M3, and M4, were obtained from replicate GC × GC analyses of solution 2.

Figure 4. (A) Deconvoluted peak matrix M1,MET containing the deconvoluted peak of MET in M1 (see Figure 3B) obtained by GRAM analysis of M1 and N1 (see Figure 3A). (B) An overlay of the averaged deconvoluted peak matrix for each analyte, PB, MET, and TMB, summed onto the column 1 axis obtained from 16 GRAM analyses using four replicate sample data matrices (M1, ..., M4) and four standard data matrices (N1, ..., N4). Each analyte is easily identified, even though the resolution was quite low. (C) As in B, except the averaged deconvoluted peak matrix for each analyte was summed onto the column 2 time axis. The high quality of the deconvoluted peak profiles on each column axis is indicative of the high run-to-run retention time precision obtained for this GC × GC data set.
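For readers unfamiliar with GRAM, the deconvolution step behind Figure 4 can be sketched as an eigenproblem in the joint subspace of the sample and standard. The code below is one commonly described formulation (projection onto the truncated SVD of M + N), written under that assumption; it is not the authors' in-house Matlab code, it uses noise-free synthetic data, and it does not handle components absent from the standard (which would give an eigenvalue of one).

```python
import numpy as np

def gaussian(n, center, width=2.5):
    t = np.arange(n)
    return np.exp(-0.5 * ((t - center) / width) ** 2)

def peaks(conc, n=40):
    """Synthetic bilinear data: rows = column 2 time, columns = column 1 time."""
    centers = [(15.0, 18.0), (21.0, 22.0)]  # hypothetical (column 1, column 2) centers
    return sum(c * np.outer(gaussian(n, c2), gaussian(n, c1))
               for c, (c1, c2) in zip(conc, centers))

def gram(M, N, pseudorank):
    """Return sample/standard concentration ratios and unscaled row-axis profiles."""
    k = pseudorank
    U, s, Vt = np.linalg.svd(M + N, full_matrices=False)
    U, s, V = U[:, :k], s[:k], Vt[:k].T
    # Project the sample onto the joint subspace; each eigenvalue is c_M / (c_M + c_N).
    eigvals, T = np.linalg.eig(U.T @ M @ V @ np.diag(1.0 / s))
    lam = eigvals.real
    ratios = lam / (1.0 - lam)
    profiles = (U @ T).real          # profiles along the column 2 (row) axis
    return ratios, profiles

N = peaks([1.0, 0.5])                # standard
M = peaks([0.6, 0.9])                # sample: true ratios 0.6 and 1.8
ratios, profiles = gram(M, N, pseudorank=2)
print(np.sort(ratios))
```

The only user input besides the two matrices is the pseudorank, which mirrors the statement in the Experimental Section that no other information was required.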

RESULTS AND DISCUSSION

GC × GC data was obtained from replicate runs of the two solutions described in the Experimental Section. The initial GC × GC data had almost no peak shifting in either dimension, primarily because of excellent column flow stability. At this juncture we must ascertain how suitable this initial (unshifted) GC × GC data will be as a benchmark for studying the effects of added run-to-run 2-D shifting and reduced data density. Figure 3A depicts, in 3-D plot format, an initial GC × GC data matrix used for this study. This data matrix, N1, represents a standard data matrix, N, and was obtained by the GC × GC analysis of solution 1. It contains the overlapped peaks for the components PB, MET, and TMB. The peaks in Figure 3A are overlapped to the point that accurate quantification using traditional methods, such as peak volume or height, is not possible. Figure 3B depicts an original data matrix obtained by the GC × GC analysis of solution 2. This matrix, M1, represents a sample data matrix, M. The peaks in M1 are from the same three components in N1 but at different concentrations. When GRAM analysis is performed on M1 and N1, depicted in Figure 3, parts B and A, respectively, the deconvoluted peaks for PB, MET, and TMB are produced for both M1 and N1. Each deconvoluted peak is represented by a data matrix just like M1 or N1 but without the presence of the
other two peaks. Figure 4A depicts the data matrix containing the deconvoluted peak for MET in M1 (see Figure 3B). For brevity, the other deconvoluted peaks are not shown. GRAM analysis also produces each deconvoluted peak’s concentration in M1 relative to N1. Because the GC × GC data used in this study consists of four replicate M matrices (i.e., M1, ..., M4) and four replicate N matrices (i.e., N1, ..., N4), a total of 16 different GRAM analyses are performed. Hence, 16 different deconvoluted peaks are generated for each component in the sample (i.e., solution 2) and for each component in the standard (i.e., solution 1). Figure 4B depicts an overlay of the averaged deconvoluted peaks for PB, MET, and TMB in the sample summed onto the column 1 axis. Each peak depicted in Figure 4B is the summation onto the column 1 axis of the data matrix obtained by averaging 16 deconvoluted peak matrices. Matrix summation permits the assessment of peak shape and resolution along a particular axis. Figure 4C depicts the averaged deconvoluted peaks for PB, MET, and TMB in the sample summed onto the second column axis. Figure 4B,C depicts deconvoluted peaks with realistic shapes; that is, they are nearly Gaussian in appearance. The realistic peak shapes indicate reliable GRAM results.5

Table 1. Quantitative Results for GRAM Analysis

                                                mean sample/standard      % RSD sample/standard
                                                concentration ratio       concentration ratio (a)
                                                PB      MET     TMB       PB     MET    TMB    ave
gravimetric                                     0.89    0.31    1.9
GRAM, original data (b)                         0.95    0.32    1.8       8.0    3.1    1.5    4.2
GRAM with 2-D alignment (c)                     0.97    0.32    1.8       8.3    2.7    1.4    4.1
GRAM with 2-D alignment (1/4 data density) (d)  0.97    0.31    1.7       8.0    3.6    3.5    5.0

(a) % RSD equals the relative standard deviation, expressed as a percentage. (b) GRAM analysis of original data, as demonstrated in Figure 4. (c) GRAM analysis of data after adding run-to-run shifting prior to applying 2-D alignment, as demonstrated in Figure 6B,C. (d) GRAM analysis of data, as in b, but after reducing the data density to one-fourth the original, as demonstrated in Figure 7.

The predicted mean sample-to-standard concentration ratios of 0.95, 0.32, and 1.8 for
PB, MET, and TMB, respectively, should be compared to the gravimetrically determined ratios of 0.89, 0.31, and 1.9, as reported in Table 1. The discrepancies in the sample-to-standard concentration ratio for the gravimetric and GRAM (unshifted original data) are then 6.7% for PB, 3.2% for MET, and 5.3% for TMB. Because very small quantities of these volatile components were used in sample preparation, these discrepancies in quantification accuracy are reasonable and consistent with the performance of GRAM in previous work.5,9 The percent relative standard deviations (% RSD) for the predicted concentration ratios are 8.0, 3.1, and 1.5% for PB, MET, and TMB, respectively. The possible sources of error with regard to these precision measurements are retention time variation in both separation dimensions, signal-to-noise levels, injection volume reproducibility for both separation dimensions, and analyte-to-interference-peak height ratio.10 And foremost, the degree of peak overlap, that is, resolution, is another important source of error.10 For instance, the larger % RSD for PB can be explained by the fact that its degree of peak overlap, as measured by its net analyte signal, is significantly more than the other analytes. The net analyte signal34 is defined as that part of the signal that relates uniquely to the analyte of interest and can be determined from the GRAM resolved peaks. Generally, as the net analyte signal decreases (i.e., gets worse), the % RSD for GRAM quantification gets larger.35 On average, the precision level for the three analytes is reasonable for the limited data set used in this study. It is instructive to note that the present work is not about the performance of GRAM, which has been reported in more detail earlier,4-10 but rather is about the performance of the 2-D alignment algorithm. 
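The accuracy and precision figures quoted above follow from simple arithmetic on the reported ratios, which can be reproduced as a check. The sketch below uses only the rounded values given in the text and Table 1; the dictionary names are illustrative.

```python
gravimetric = {"PB": 0.89, "MET": 0.31, "TMB": 1.9}
gram_original = {"PB": 0.95, "MET": 0.32, "TMB": 1.8}
rsd_original = {"PB": 8.0, "MET": 3.1, "TMB": 1.5}   # % RSD, original data

# Percent discrepancy of the GRAM ratio relative to the gravimetric ratio.
discrepancy = {
    name: 100.0 * abs(gram_original[name] - gravimetric[name]) / gravimetric[name]
    for name in gravimetric
}
average_rsd = sum(rsd_original.values()) / len(rsd_original)

for name, value in discrepancy.items():
    print(f"{name}: {value:.1f}%")
print(f"average % RSD: {average_rsd:.1f}")
```

Rounded to one decimal place, these reproduce the discrepancies stated in the text and the average % RSD listed for the original data in Table 1.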
To this end, the GRAM results for the unshifted original data serve adequately as the benchmark to compare the effects of added run-to-run 2-D shifting and reduced data density, as will now be addressed.

(33) Liu, Z.; Sirimanne, S. R.; Patterson, D. G., Jr.; Needham, L. L.; Phillips, J. B. Anal. Chem. 1994, 66, 3086-3092.
(34) Lorber, A.; Faber, K.; Kowalski, B. R. Anal. Chem. 1997, 69, 1620-1626.
(35) Bruckner, C. A. Doctoral Dissertation, University of Washington, Seattle, WA, 1998.

The above GRAM concentrations were achieved without prior alignment of the four M and four N matrices, because the run-to-run retention times on each time axis were sufficiently reproducible for GRAM analysis. However, there are 2-D separations that do not have adequate retention time precision on one or both time axes for GRAM analysis. In those cases, peak alignment between the M and N matrices is needed. Previously, 1-D alignment has been demonstrated on GC × GC data that only needed alignment on the column 1 axis.5 It is the goal of this paper to develop the concept of and demonstrate 2-D alignment on
comprehensive 2-D separation data needing alignment on both time axes. This is accomplished by first artificially shifting the GC × GC data (i.e., M1, ..., M4 and N1, ..., N4) and then aligning it using the 2-D alignment method. The artificial shifts were chosen to mimic real shifts obtained in LC × CE separations. To simulate the peak shifting found in LC × CE separations, the peak-width-based retention time precision, δ, had to be determined for the given GC × GC data and for published LC × CE data. δ is defined by the equation

$$\delta = \frac{s_t}{4\sigma_t} = \frac{1}{4}\left(\frac{s_t}{t_r}\right)N^{1/2} \qquad (1)$$

where s_t is the standard deviation of retention time, 4σ_t is the peak base width, t_r is the mean retention time, and N is the plate count.36 For the GC × GC data, the δ values were 0.017 for column 1 separations and 0.012 for column 2 separations. The δ values are small enough that the GRAM results (e.g., peak shapes and concentration ratios) obtained with 2-D alignment were not significantly or statistically different from those obtained without alignment (see earlier paragraph for latter results). For LC × CE, the δ values were determined to be 0.087 for the LC column separations and 0.13 for the CE column separations. The δ values were based on two published LC × CE reports21,22 that provided the values for s_t, 4σ_t, s_t/t_r, and N used to calculate δ. To simulate the peak shifting occurring in LC × CE separations, the calculated δ values for the GC × GC data needed to be increased. This was accomplished by first calculating the N values for the column 1 and column 2 separations depicted in Figure 4B,C, respectively. The calculated N values are 2500 for column 1 and 3800 for column 2. Using eq 1 and the calculated N for each GC column, the relative retention time precision, or s_t/t_r, needed to achieve the desired δ for LC × CE conditions is obtained. For column 1, s_t/t_r needed to be 0.0069 in order to have a δ value of 0.087 with the GC × GC data, which is typical of LC separations in LC × CE. For column 2, s_t/t_r needed to be 0.0083 in order to have a δ value of 0.128, which is typical of CE separations in LC × CE. Specific steps were performed to give the GC × GC data the s_t/t_r values needed to simulate LC × CE data. First, a set of eight simulated retention times for column 1 and another set of eight for column 2 were generated. Each set represented the run-to-run retention times of MET on a column time axis.
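Equation 1 can be rearranged to give the relative retention time precision needed for a target δ, which is the calculation described above. The short sketch below applies that rearrangement to the quoted δ and N values; the function names are illustrative, and small differences from the reported figures are to be expected if the published inputs were rounded.

```python
import math

def delta(rel_precision, plates):
    """Eq 1: delta = (1/4) * (s_t / t_r) * sqrt(N)."""
    return 0.25 * rel_precision * math.sqrt(plates)

def required_rel_precision(target_delta, plates):
    """Eq 1 solved for s_t / t_r."""
    return 4.0 * target_delta / math.sqrt(plates)

col1 = required_rel_precision(0.087, 2500)   # LC-like shifting on column 1
col2 = required_rel_precision(0.128, 3800)   # CE-like shifting on column 2

print(f"column 1: s_t/t_r = {col1:.4f}")
print(f"column 2: s_t/t_r = {col2:.4f}")
```

Both results fall within rounding of the 0.0069 and 0.0083 values given in the text.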
The averaged retention time of each set equaled the retention time of the averaged deconvoluted MET peak depicted in Figure 4B for column 1 and Figure 4C for column 2. Each set’s relative standard

(36) Bahowick, T. J.; Synovec, R. E. Anal. Chem. 1995, 67, 631-40.

Figure 5. (A) Single boundary contour line in an overlay plot for the four sample matrices M1 through M4 (see Figure 3B for M1 in 3-D plot format). Very little run-to-run retention time variation is observed, which is the reason the GRAM deconvolution results in Figure 4 were excellent. (B) Single boundary contour line in an overlay plot after random addition of run-to-run retention time shifts on both separation time axes that are consistent with LC × CE reports. (C) Overlay plot of four sample matrices in B after retention time alignment with the standard matrix N4 along the column 1 time axis as illustrated in Figure 1A,B. (D) Overlay plot of four sample matrices in C after retention time alignment with the standard matrix N4 along the column 2 time axis as illustrated in Figure 1C,D. Note that after alignment on both separation axes, the overlay plot in D is essentially the same as before adding retention time shifting as in A.

deviation equaled the st/tr value needed to simulate LC × CE separations. The simulated retention times were then randomly assigned among the four M and four N matrices such that each matrix had one unique column 1 retention time and one unique column 2 retention time. The peaks in the M and N matrices were then shifted by the amount needed so that the MET peak had the simulated column 1 and column 2 retention times. Figure 5A demonstrates the run-to-run peak shifting present in the original GC × GC data by depicting the unresolved peak outline for PB, MET, and TMB for each of the four M matrices. For brevity, the peak outlines for the four N matrices are not shown. Each peak outline is an isogram (a contour line connecting points of equal value), here drawn at a specific signal intensity near the base of the peak cluster. All four isograms are drawn at the same signal intensity. As expected, all four isograms in Figure 5A are tightly grouped because of the good retention time reproducibility of the GC × GC data. In contrast, adding 2-D shifting consistent with LC × CE separations, as previously described, yields the four M matrices in Figure 5B. Although not depicted, for brevity, the four N matrices with simulated retention times have similar 2-D shifting. The artificially shifted M and N matrices clearly need 2-D alignment before GRAM is applied. As discussed in the theory section, the 2-D alignment method aligns the peaks in M to those in N along the column 1 axis and then along the column 2 axis. Figure 5C depicts the peak outlines of the four artificially shifted M matrices after alignment of the Figure 5B data along the column 1 axis using the 1-D alignment method with interpolation. Interpolation was used to artificially increase the column 1 data density by a factor of 2. All M matrices were aligned to one artificially shifted N matrix. 
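The shift-simulation steps described above (eight retention times per column with a prescribed mean and relative standard deviation, randomly assigned to the four M and four N matrices) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: the function names are hypothetical, the draws are Gaussian, the sample (n - 1) standard deviation is used, and the mean retention times of 300.0 and 1.5 are placeholders because the text does not give the MET retention times.

```python
import random
import statistics

def simulate_retention_times(mean_tr, rsd, n=8, seed=0):
    """Return n retention times whose sample mean equals mean_tr and whose
    sample relative standard deviation equals rsd."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mu = statistics.mean(draws)
    sd = statistics.stdev(draws)  # sample standard deviation (n - 1)
    # Standardize the draws, then rescale to the target mean and spread.
    return [mean_tr + (d - mu) / sd * rsd * mean_tr for d in draws]

def assign_to_matrices(times_col1, times_col2, labels, seed=0):
    """Randomly give each matrix one column 1 and one column 2 retention time."""
    rng = random.Random(seed)
    c1, c2 = times_col1[:], times_col2[:]
    rng.shuffle(c1)
    rng.shuffle(c2)
    return {label: (t1, t2) for label, t1, t2 in zip(labels, c1, c2)}

labels = ["M1", "M2", "M3", "M4", "N1", "N2", "N3", "N4"]
# Placeholder mean retention times (assumed, not taken from the paper).
col1 = simulate_retention_times(mean_tr=300.0, rsd=0.0069, seed=1)
col2 = simulate_retention_times(mean_tr=1.5, rsd=0.0083, seed=2)
targets = assign_to_matrices(col1, col2, labels, seed=3)

for label, (t1, t2) in targets.items():
    print(label, round(t1, 3), round(t2, 5))
```

Each matrix would then be shifted by the difference between its assigned target time and the observed MET retention time on the corresponding axis.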
Thus, only four pairs of sample/standard combinations were analyzed. Figure 5D depicts the four peak outlines following the second step of the alignment process, along the column 2 time axis. Again, all M matrices were aligned to the same N matrix. Interpolation was not used for column 2 alignment because of a relatively high data density of ∼32 data points/peak base width. Note how Figure 5A, which depicts the original data before artificial introduction of shifting, and Figure 5D, which depicts the artificially shifted data after 2-D alignment, are almost identical except for the overall position of the peak outlines. This is because the N matrix, to which each of the four M matrices was aligned, was also artificially shifted. Figure 5D only really demonstrates the reproducibility, or precision, of the alignment method. To determine the accuracy of

the alignment method, one must either compare the calculated shifts with the simulated shifts (i.e., true shifts) or perform GRAM analysis. The latter approach is taken, because it would be the approach used in real-world applications. The shape of a deconvoluted peak produced by GRAM analysis can be an excellent indicator of whether peaks are accurately aligned. Peak shape has already been reported as a parameter for determining the relative accuracy of GRAM analysis (ref 5). Peak shape provides an objective assessment of GRAM analysis, because the GRAM algorithm does not put any restrictions on the predicted shapes of deconvoluted peaks. In essence, realistic peak shapes indicate valid GRAM results. The concept of using peak shapes for judging the validity of GRAM analysis can be applied to the determination of alignment accuracy. Assuming that peak shifting is the limiting factor for accurate GRAM analysis, a realistic peak shape would indicate accurate peak alignment, but an unrealistic peak shape would indicate the opposite. A good example of unrealistic peak shapes is depicted in Figure 6A. Figure 6A is an overlay of the deconvoluted peaks obtained from the GRAM analysis of the four M matrices depicted in Figure 5C, summed onto the column 1 axis. These four M matrices are aligned along the column 1 axis only, although alignment along both axes is warranted. GRAM analysis was accomplished using the same N matrix used to align the four M matrices along the column 1 axis. Although it is difficult to tell, a total of 12 peaks are depicted in Figure 6A, 4 peaks for each component. The unrealistic peak shapes (e.g., peaks having more than one apex and dipping below the baseline) indicate that not even one M matrix was accurately aligned with the N matrix. Although, for brevity, they are not shown, the deconvoluted peaks summed onto the column 2 axis are just as poor as those depicted in Figure 6A. 
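The two symptoms of an unrealistic peak shape named above (more than one apex, dipping below the baseline) can be turned into a simple automated check. The sketch below is only an illustration of that idea: the function name `looks_realistic` and both tolerance values are assumptions chosen for the example, and the text describes judging peak shapes by inspection rather than by any such rule.

```python
def looks_realistic(profile, baseline_tol=0.02, apex_floor=0.05):
    """Heuristic check of a deconvoluted 1-D peak profile.

    A profile passes if it has exactly one apex and never dips below the
    baseline (zero) by more than baseline_tol times the peak height.
    Local maxima lower than apex_floor times the peak height are treated
    as noise and ignored. Both thresholds are arbitrary assumptions.
    """
    height = max(profile)
    if height <= 0:
        return False
    dips_below_baseline = min(profile) < -baseline_tol * height
    apexes = [
        i for i in range(1, len(profile) - 1)
        if profile[i - 1] < profile[i] > profile[i + 1]
        and profile[i] > apex_floor * height
    ]
    return (not dips_below_baseline) and len(apexes) == 1

single_peak = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0]
double_apex = [0.0, 4.0, 2.0, 5.0, 1.0, 0.0]
negative_dip = [0.0, 3.0, -2.0, 5.0, 1.0, 0.0]

print(looks_realistic(single_peak))
print(looks_realistic(double_apex))
print(looks_realistic(negative_dip))
```

A check of this kind only flags gross failures; a flat-topped or noisy profile would need a more careful treatment.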
Following 2-D alignment, Figure 6, parts B and C, are each an overlay of the deconvoluted peaks obtained from the GRAM analysis of the four M matrices depicted in Figure 5D summed onto the column 1 and column 2 axes, respectively. Each of the four M matrices is aligned to the same N matrix along both column axes. GRAM analysis was accomplished using the same N matrix. Four peaks for each component are depicted in each figure. The proper peak shapes in Figure 6B,C support accurate 2-D alignment of each M matrix to the N matrix. Additional confirmation of accurate 2-D alignment comes from the GRAM predicted concentrations for PB, MET, and TMB in the sample relative to the standard. The mean ratio of the analyte concentration of the sample and standard is depicted in Table 1, row 3, and

Analytical Chemistry, Vol. 73, No. 24, December 15, 2001


Figure 6. (A) Applying GRAM to four sample matrices and a single standard matrix with retention time shifting randomly added and following alignment along the column 1 time axis only (see Figure 5C) produces poor deconvoluted profiles because shifting along the column 2 time axis now requires correction (12 profiles from four sample runs with three analytes). (B) After alignment along both axes (see Figure 5D), the GRAM results are quite satisfactory for the deconvoluted peak profiles on the column 1 time axis. Compare the profiles to Figure 4B. (C) Likewise, the deconvoluted peak profiles on the column 2 time axis are also good and can be compared to Figure 4C. Thus, the 2-D alignment algorithm successfully corrects for retention time shifting on both dimensions in an independent, stepwise fashion.

was determined for each component using all four M and four N matrices. The GRAM predicted mean concentration ratios compare favorably with the gravimetrically determined concentration ratios and those obtained from the GRAM analysis of the original GC × GC data (see Table 1, rows 1 and 2). Up to this point, 2-D alignment has been successfully demonstrated with GC × GC data that were given the run-to-run retention time variation of LC × CE data. These GC × GC data have a relatively high data density of 14-15 points/peak base width along the column 1 axis because of fast column 2 injection cycle times. However, for most comprehensive 2-D separations, such as LC × CE, the data density of the first column can be as low as 3-4 points/peak. Therefore, to more closely mimic LC × CE separations or a related 2-D separation method, the GC × GC data density was reduced to ∼3-4 points/peak, that is, to 3-4 column 2 separations/peak base width on column 1. Then, the 2-D alignment method and GRAM analysis were applied after the reduction in data density. This data reduction was accomplished by creating a new set of M and N data matrices that included only every fourth second-column separation vector from the artificially shifted M and N matrices. Parts A and B of Figure 7 are an overlay of the deconvoluted peaks obtained from the GRAM analysis of four data-reduced M matrices with the same data-reduced N matrix summed onto the column 1 axis and column 2 axis, respectively. Each of the four M matrices was 2-D aligned with the same N matrix prior to GRAM analysis. Data point interpolation was used as part of the 2-D alignment algorithm to artificially increase the data density on the column 1 axis by a factor of 8. Figures 7A and 6B can be directly compared with one another, because initially, the same four M matrices and the same N matrix were used in generating both figures. Figures 7A and 6B are essentially the same except for the data density. 
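The data reduction and interpolation steps can be illustrated with a small example. This is a sketch under assumptions rather than the authors' implementation: each matrix row is taken to be one second-column separation (one point on the column 1 axis), the function names are hypothetical, and linear interpolation is assumed because this passage does not state the interpolation scheme used.

```python
def reduce_density(matrix, step=4, offset=0):
    """Keep every step-th second-column separation (row) of a 2-D matrix."""
    return matrix[offset::step]

def interpolate_rows(matrix, factor):
    """Linearly interpolate between consecutive rows so that each original
    interval on the column 1 axis is divided into `factor` steps."""
    if len(matrix) < 2:
        return [row[:] for row in matrix]
    out = []
    for a, b in zip(matrix, matrix[1:]):
        for k in range(factor):
            w = k / factor
            out.append([(1 - w) * x + w * y for x, y in zip(a, b)])
    out.append(matrix[-1][:])  # keep the final original row
    return out

# Toy matrix: 16 column 2 separations (rows), 2 points each.
toy = [[float(i), float(2 * i)] for i in range(16)]
reduced = reduce_density(toy, step=4)       # rows 0, 4, 8, 12
dense = interpolate_rows(reduced, factor=8)  # 3 intervals * 8 + 1 rows

print(len(reduced), len(dense))
```

Interpolation of this kind adds no new chemical information; it only provides a finer grid on which shifts smaller than one column 1 sampling interval can be applied.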
Figures 7B and 6C can also be directly compared. These two figures are very similar except for the peak intensities. The lower peak intensities in Figure 7B are due to the removal of analyte signals by the data reduction procedure. The 2-D alignment and the GRAM analysis of these new data were successful, as indicated by the realistic peak shapes in Figure 7A,B and the accurate GRAM concentration ratios listed in Table 1, row 4. The quantification % RSD did increase, as one might expect, but remained within an acceptable level.

CONCLUSIONS

It was demonstrated that a 2-D alignment method can be used to correct run-to-run retention time shifts along both time axes

Figure 7. Application of GRAM to 2-D data with a reduced data density so that only three to four runs on the column 2 time axis are available to define the peak width eluting from the column 1 dimension. The deconvoluted peak profiles in A and B should be compared to Figure 6B and C, respectively. The results have a positive impact on the application of GRAM for 2-D separation methods, such as GC × GC and LC × CE operated in a high-speed mode.

for comprehensive 2-D separation data. We acknowledge that, ideally, controlled experiments with the 2-D alignment method should be undertaken with data collected using 2-D techniques other than GC × GC, such as LC × CE. The reader is cautioned that these additional studies are warranted in order to more fully ascertain the utility of the 2-D alignment algorithm coupled with a second-order analysis method such as GRAM. Chemometric peak resolving, peak quantifying, and sample classifying methods show a great deal of promise as efficient means of interpreting the large amount of information contained in comprehensive 2-D separation data. The correction of run-to-run retention time shifts, which is addressed in this paper, is a vital step toward the implementation of these chemometric methods on comprehensive 2-D separation data. The 2-D alignment with GRAM was successfully extended to high-speed 2-D separation conditions with a reduced data density.

ACKNOWLEDGMENT

We thank the Center for Process Analytical Chemistry (CPAC) for partial support.

Received for review June 12, 2001. Accepted September 21, 2001.

AC010656Q
