Anal. Chem. 1981, 53, 821-825


Multichannel Detection and Numerical Resolution of Overlapping Chromatographic Peaks

F. J. Knorr, H. R. Thorsheim, and J. M. Harris*

Department of Chemistry, University of Utah, Salt Lake City, Utah 84112

A method of data analysis for chromatographic systems employing multichannel detectors is shown to allow resolution of components of mixtures which are overlapped both chromatographically and spectrally. The multichannel chromatographic data form a matrix, [D], which can be decomposed into its factor matrices [A], [Q], and [C], which contain the spectra, concentrations, and chromatograms of the individual components, respectively. Trial retention times, used to construct [C] from chromatographic response theory, are optimized for minimal error in reproducing [D]. The spectral information is extracted by multiplying [D] by the pseudoinverse of [C]. The technique is also successful in identifying the number of major components in the data. An experimental demonstration of the method on GC/MS data from overlapped binary and ternary mixtures is presented.

Chromatographic techniques which employ multichannel detection, such as gas chromatography-mass spectrometry (GC/MS) or gas chromatography-infrared spectrometry (GC/IR), are among the most powerful techniques for the qualitative and quantitative analysis of mixtures. These methods derive their power from the large number of independent informational degrees of freedom in the measurement. As a result, they represent the method of choice for analysis of many complex biomedical and environmental samples. Yet these powerful techniques have their limitations, particularly when the signal from an analyte is severely overlapped with signals from background or neighboring components in both the time and spectral dimensions. In this paper, we present a new approach to analysis of data from multichannel chromatographic systems for identifying components of mixtures which are severely overlapped chromatographically and spectrally. Although this discussion applies generally to all multichannel chromatographic systems, it is illustrated in particular for the case of GC/MS.

The literature on interpreting spectral data from overlapped chromatographic peaks is directed predominantly toward GC/MS. The data analysis techniques have in common the recognition that the spectral pattern of a particular compound will rise and fall in unison when that compound elutes as a single species. One approach to using this behavior presumes that one of the m/e channels, in a region of chromatographic overlap, belongs to only one compound. This pure mass channel is identified as the one which is sharpest within a particular time window, determined by fitting the mass chromatograms to a logarithmic or polynomial rate function (1-3). The pure mass chromatogram, once identified, then serves as a template against which chromatograms at other mass channels are compared by correlation to extract the spectrum of the component in question.
Following subtraction of the first component from the data set, the process is repeated to identify the spectra of other species in the mixture. The major limitation of this approach is the requirement of a unique mass fragment for each component identified, particularly in the analysis of compounds having similar structures where spectral and chromatographic overlap are both likely.

An alternate approach to resolving overlapped GC/MS signals has been to subject the mixed spectral signal to library searching. This method seeks to find mass spectra from library files which, when added together, would produce the mixed spectra measured (4-8). This approach is limited by the scope of the library available; in addition, the spectral matching must be performed by an efficient algorithm, otherwise computer connect time becomes prohibitively expensive. Furthermore, the correlation between the library and recorded spectra might not be ideal due to the instrument signature imposed on the data, requiring spectrum linearization procedures (6).

To avoid the above limitations, we have developed a method of data reduction which makes use of the reproducible elution behavior of the chromatographic system to model the time response of the data, which eliminates the requirement of a pure mass chromatogram. The spectral information is extracted by a simple least-squares procedure, without requiring a library search. This data reduction concept was first demonstrated (9) on mixtures of fluorescent compounds, where the decay of emission following pulsed excitation was measured on a nanosecond time scale and the modeling of the time response was based on exponential relaxation of the excited state. The theory of this data reduction technique, as it applies to chromatography, is presented in general terms, together with a preliminary demonstration of its capability to resolve severely overlapped data from GC/MS.

THEORY

Multichannel chromatographic data can be described in the form of an x by t matrix, [D], where x is the number of spectral channels (wavelengths, m/e values, etc.) and t is the number of time channels at which spectral scans are gathered. If the components in the sample behave independently and the detector responds linearly, then the signal measured at spectral channel i and time interval j is the sum of the contributions from all n components

$$D_{ij} = \sum_{k=1}^{n} A_{ik}\, Q_k\, C_{kj} \qquad (1)$$

where $A_{ik}$ is the normalized spectral amplitude of compound k at spectral channel i, $Q_k$ is a quantitative factor related to the amount of compound k present, and $C_{kj}$ is the amplitude of the normalized chromatogram of compound k at time j. The elements in the data matrix from eq 1 are more conveniently expressed by the product of three matrices

$$[D] = [A][Q][C] \qquad (2)$$

where [A] is an x by n matrix containing the normalized spectra of the n components in its columns, [Q] is a diagonal n by n matrix containing the quantitative factor of each component, and [C] is an n by t matrix containing the normalized chromatograms of the n components in its rows. The resolution of the mixture requires that the data matrix be decomposed into the factors [A], [Q], and [C] in order to

0003-2700/81/0353-0821$01.25/0 © 1981 American Chemical Society


ANALYTICAL CHEMISTRY, VOL. 53, NO. 6, MAY 1981

obtain the spectra, amounts, and chromatograms of the individual components. To carry out this decomposition, the rows of [C] are modeled as Gaussian functions convoluted with a single-sided exponential to account for the tailing due to extracolumn dead volume (10-12). Since the instrument response function can initially be measured by using pure samples to determine the number of theoretical plates, N, and the characteristic tailing decay time, τ, the construction of the entire trial chromatogram matrix, [C'], depends on only one parameter for each component, the retention time, $t_{R_k}$

$$C_{kj} = G_{kj} * \exp(-j\Delta t/\tau) \qquad (3)$$

where the asterisk denotes convolution over the time index, Δt is the time interval between spectral scans, and

$$G_{kj} = \sigma_k^{-1}(2\pi)^{-1/2} \exp[-(j\Delta t - t_{R_k})^2/2\sigma_k^2] \qquad (4)$$

where

$$\sigma_k = t_{R_k} N^{-1/2} \qquad (5)$$
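As an illustration of eqs 3-5, the following is a minimal sketch of how a trial matrix [C'] might be assembled from a set of trial retention times. The function name `trial_chromatograms`, the window start of 80 s, and the unit-sum normalization of each row are assumptions made for this example; N = 750, τ = 8 s, the 5-s scan period, and the retention times are the values quoted elsewhere in this paper.

```python
import numpy as np

def trial_chromatograms(t_r, times, n_plates, tau):
    """Return a trial matrix [C'] with one normalized peak per row."""
    times = np.asarray(times, dtype=float)
    # Single-sided exponential sampled at the scan interval: exp(-j*dt/tau)
    decay = np.exp(-(times - times[0]) / tau)
    rows = []
    for t_rk in np.atleast_1d(t_r):
        sigma = t_rk / np.sqrt(n_plates)                        # eq 5
        gauss = (np.exp(-(times - t_rk) ** 2 / (2 * sigma ** 2))
                 / (sigma * np.sqrt(2 * np.pi)))                # eq 4
        peak = np.convolve(gauss, decay)[: times.size]          # eq 3, causal part only
        rows.append(peak / peak.sum())                          # unit-sum normalization (assumed)
    return np.array(rows)

# Hypothetical 30-scan window starting at 80 s, one scan every 5 s.
times = 80.0 + 5.0 * np.arange(30)
C_trial = trial_chromatograms([128.0, 138.0, 152.0], times, n_plates=750, tau=8.0)
```

Because the plate count and decay time are fixed beforehand, the only free inputs are the retention times, which is what keeps the later search low-dimensional.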

For a particular chromatographic trial matrix, [C'], the corresponding best spectral-amplitude matrix product, [A'][Q], can be found by multiplying the data matrix by the pseudoinverse (13) of [C'].

$$[A'][Q] = [D][C']^{T}([C'][C']^{T})^{-1} \qquad (6)$$
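Eq 6 is the ordinary least-squares solution for the spectral amplitudes given a fixed [C']. A minimal sketch on a small, noise-free example follows; the matrices `A`, `Q`, and `C` are made-up numbers chosen only to show that the product [A][Q] is recovered, and `spectra_from_trial` is a hypothetical helper name.

```python
import numpy as np

def spectra_from_trial(D, C_trial):
    """Least-squares estimate of the product [A'][Q] for a trial [C'] (eq 6)."""
    gram = C_trial @ C_trial.T                     # n x n matrix [C'][C']^T
    # Solve gram * X = C' D^T instead of forming the inverse explicitly;
    # X^T equals D C'^T (C' C'^T)^-1 because gram is symmetric.
    return np.linalg.solve(gram, C_trial @ D.T).T

# Hypothetical example: 4 spectral channels, 2 components, 6 scans.
A = np.array([[1.0, 0.2], [0.5, 1.0], [0.0, 0.7], [0.3, 0.0]])
Q = np.diag([2.0, 3.0])
C = np.array([[0.1, 0.4, 0.3, 0.15, 0.05, 0.0],
              [0.0, 0.1, 0.3, 0.35, 0.2, 0.05]])
D = A @ Q @ C
AQ = spectra_from_trial(D, C)
```

The same result is obtained from `D @ np.linalg.pinv(C)` whenever the rows of [C'] are linearly independent; solving the normal equations directly simply mirrors the form of eq 6.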

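This process yields the product [A'][Q] corresponding to the least-squared error, χ², between the actual data matrix [D] and the model matrix [D'] for a given [C'], where

$$[D'] = [A'][Q][C'] \qquad (7)$$

and where

$$\chi^2 = \frac{\sum_{i=1}^{x}\sum_{j=1}^{t}\,(D_{ij} - D'_{ij})^2}{xt - n(x + 1)} \qquad (8)$$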

which is weighted by the inverse of the number of degrees of freedom, xt − n(x + 1). The actual value of χ² thus obtained depends on the choice of retention times used to model [C']. The choice of retention times is guided by a SIMPLEX (14) minimization of χ² with respect to the $t_{R_k}$. To aid the search in obtaining results which are meaningful, an additional penalty may be added to χ² for negative spectral values. The error function which is actually minimized to find the optimum retention times, ERR, includes this penalty through $P_{neg}$, a constant of the order of 1-100 which sets the amount of penalty for negative spectral values. When the minimum in ERR is found, each column of the product matrix [A'][Q] is divided by the largest element in the column, thus constructing a normalized matrix of spectra, [A'], which makes comparisons to reference spectra simpler. The largest elements found in each column are then placed along the diagonal of the quantitation matrix, [Q].

The preceding data matrix treatment has presumed the number of components in the data set to be known. Since this is rarely the case, it is fortunate that the method is also well suited to the determination of n. If the above procedure is carried out for increasing values of n, one can expect the minimum χ² found to continue to decrease so long as n is too small. This is because the peak shapes that are used to model [C'], as in eq 3, represent a low pass filter of the time information that can be added to [D'] in an effort to obtain [D]. As long as the value of n is too low, there will be low-frequency features in the residuals matrix which can only be corrected by allowing the number of components to increase to the proper value. If the temporal model used in the fit is correct, then, when n is the proper value, only high-frequency random noise is left in the residuals matrix. The contribution of this noise to χ² cannot significantly be decreased by further increasing n due to the low-frequency nature of the model. As a result, χ² does not decrease further by increasing n beyond its true value, particularly when one accounts for the additional loss of degrees of freedom in calculating χ².

This approach to determining the number of components in the mixture is an attractive alternative to methods which attempt to identify the number of independent contributing functions in the data set using factor or principal component analysis (15-18). The major problem with these methods is the fact that the eigenvectors of the covariance matrix, which are determined, have little relationship to the actual functional form of the physical process. The above approach, on the other hand, searches for an optimum fit to the data using, as a basis set, functions which are the proper representations of the physical system. As a result, the determination of n is much less sensitive to noise, which is the most independent function in any real data set.

EXPERIMENTAL SECTION

To illustrate this method of separation of severely overlapped mixtures, we gathered repetitively scanned GC/MS data on binary and ternary mixtures of poorly resolved compounds. The mixtures used were as follows: (1) 0.05 M 1,3-dimethylnaphthalene, 0.05 M 2-ethylnaphthalene, and 0.05 M 2,6-dimethylquinoline in CH2Cl2; (2) 0.05 M 1,3-dimethylnaphthalene and 0.05 M 2-ethylnaphthalene in acetone; (3) 0.07 M 3-methylcyclohexanol, 0.05 M 1-heptanol, and 0.05 M 3-methylcyclohexanone in CH2Cl2. The mass spectra of the individual components of these mixtures were gathered under identical instrumental conditions to compare the spectra separated by numerical means.

The chromatograph used a 3 ft × 2 mm i.d. glass column packed with 1% OV-17 on Chromosorb W, interfaced to the mass spectrometer through a jet separator. The naphthalene mixture was eluted with an initial column temperature of 130 °C, temperature programmed at 20 °C/min starting at injection time. The ketone-alcohol mixture was eluted at an initial temperature of 30 °C, programmed at 20 °C/min. Injector temperature was 225 °C. Volumes of 1.0 µL of the mixtures were injected. Helium carrier gas flow rate was 18 mL/min.

The mass spectrometer system was an LKB 9000S interfaced to a DEC PDP 11/40 running the RT-11 V2 operating system, which controlled spectrometer functions and data acquisition. Ion source pressure was 2 X lo4 torr; ion source temperature was 250 °C. Electron energy was 70 eV and electron trap current was 60 µA. Repetitive scanning period for this system was a relatively slow 5 s.

The repetitively scanned GC/MS data were copied to magnetic tape, and the data manipulation described here was performed on a DEC PDP 11/45 running the RSX-11M operating system, affectionately known as Petie. With this system, it was possible to manipulate up to 30 by 30 data matrices. Matrices were formed from time windows of width less than or equal to 30 scans; m/e channels of negligible intensity were deleted until the number of rows was less than or equal to 30.

The chromatographic system parameters, N and τ, were evaluated from the total ion current signal measured for the elution of a pure compound. The standard deviation, σ, of the underlying Gaussian and the exponential decay parameter, τ, were calculated from the second and third moment of the peak as previously described by Lochmuller and Sumner (12). The chromatographic system typically exhibited ~750 theoretical plates and a tailing decay time, τ = 8 s. To guide the search algorithm in finding the correct spectra, we set the penalty constant for negative spectral values at $P_{neg}$ = 100. Values of $P_{neg}$ as small as 1 still allowed the data to be separated, but the spectral assignments were significantly poorer.
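The moment-based evaluation of σ and τ mentioned above can be sketched as follows. This is not necessarily the exact procedure of ref 12; it assumes only the standard relations for a Gaussian convolved with an exponential decay, namely that the second central moment is σ² + τ², the third central moment is 2τ³, and the first moment is the retention time plus τ. The function name and the synthetic peak are hypothetical.

```python
import numpy as np

def emg_parameters_from_moments(t, signal):
    """Estimate sigma, tau, retention time and plate count from peak moments."""
    w = signal / signal.sum()                 # treat the peak as a distribution
    mean = np.sum(w * t)                      # first moment: t_R + tau
    m2 = np.sum(w * (t - mean) ** 2)          # second central moment: sigma^2 + tau^2
    m3 = np.sum(w * (t - mean) ** 3)          # third central moment: 2 * tau^3
    tau = np.cbrt(m3 / 2.0)
    sigma = np.sqrt(m2 - tau ** 2)
    t_r = mean - tau
    n_plates = (t_r / sigma) ** 2             # eq 5 solved for N
    return sigma, tau, t_r, n_plates

# Synthetic, noise-free peak on a fine time grid (hypothetical values).
t = np.arange(0.0, 300.0, 0.1)
true_sigma, true_tau, true_t_r = 4.75, 8.0, 130.0
gauss = np.exp(-(t - true_t_r) ** 2 / (2 * true_sigma ** 2))
peak = np.convolve(gauss, np.exp(-t / true_tau))[: t.size]
sigma, tau, t_r, n_plates = emg_parameters_from_moments(t, peak)
```

Higher moments are sensitive to baseline noise and to truncation of the tail, so on real total ion current data the integration window would need to be chosen with some care.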

RESULTS AND DISCUSSION

The GC/MS data matrix for the ternary naphthalene mixture, 2-ethylnaphthalene, 1,3-dimethylnaphthalene, and 2,6-dimethylquinoline, is shown in Figure 1. The solvent tailing into the time window is apparent to the left. It is clear that the signals from the analytes are severely overlapped. Application of the data analysis procedure, while incrementing the value of n, was successful in identifying the correct number of components. The value of χ², as determined by using eq 8, decreased rapidly until the proper value of n was used to construct [C'] as shown in Table I. Further increases in n do not reduce χ², since the fit does not improve any faster than the loss of degrees of freedom. As a check on the method, a binary naphthalene sample, 2-ethylnaphthalene and 1,3-dimethylnaphthalene, was run for comparison, and the χ² values minimized properly.

Figure 1. GC/MS data matrix, ternary naphthalene mixture. Note tail of solvent peak on left.

Table I. Determination of the Number of Analyte Components (Values of χ²(scaled)a vs (n − 1)b)

mixture (no. of analytes); (n − 1) = 1, 2, 3, 4

1.14 3.78 1.00 9.45 (1) naphthalenes (3) 1.02 1.00 1.04 (2) naphthalenes (2) 1.58 1.00 2.31 1.00 20.77 (3) alcohols-ketone (3)

a Values of χ² are scaled to the minimum value found. b Since the solvent peak, tailing into the data matrix, is easily separated out by the algorithm, the number of analyte components is one less than the total number in the mixture, n.

Figure 2. Mass spectra of 2-ethylnaphthalene: (A) spectrum of the isolated compound; (B) spectrum of first sample component numerically resolved from the data matrix.

Figure 3. Mass spectra of 1,3-dimethylnaphthalene: (A) spectrum of the isolated compound; (B) spectrum of second sample component numerically resolved from the data matrix.

Figure 4. Mass spectra of 2,6-dimethylquinoline: (A) spectrum of the isolated compound; (B) spectrum of third sample component numerically resolved from the data matrix.

Table II. Spectral Error for Numerical Resolution of Mixtures

mixture   component                  tR, s   rel [A] error,a %
(1)       2-ethylnaphthalene         128     15
          1,3-dimethylnaphthalene    138     8
          2,6-dimethylquinoline      152     10
(2)       2-ethylnaphthalene         127     13
          1,3-dimethylnaphthalene    135     10
(3)       3-methylcyclohexanol       62      21
          1-heptanol                 66      23
          3-methylcyclohexanone      78      7

a The absolute values of the spectral errors are added and normalized to the total spectral amplitude. Only those masses represented in the data matrix are included in the error calculation.

If the correct number of components is used to construct [C'], and the retention times are adjusted for a minimum χ², then the normalized spectra of the individual components should occupy the columns of the matrix, [A'], factored from the data by the pseudoinverse of [C']. The spectra taken from [A'] are shown in Figures 2-4, together with the spectra of the isolated components gathered on the instrument under identical conditions. The major features of the spectra have clearly been resolved successfully, even though none of the three components has a significant mass peak which is unique. The relative errors in the spectra, numerically resolved from the data matrix, were determined by comparison with the spectra of the isolated components, as shown in Table II. Several masses represented in each of the isolated component spectra are missing in the numerically resolved spectra since


several of the lowest intensity mass channels in the original data have been deleted in constructing the data matrix, in order to keep that matrix within the limits of the memory size. In calculating the relative error in the spectra, only those mass channels represented in the data matrix are used so as not to bias the results by a technical limitation. On the average, the deleted mass problem would contribute an additional 4% relative error to the results reported.

To further illustrate the magnitude of the numerical separation problem in this case, we reconstructed the chromatograms of the components, including the quantitation information, by the matrix product [Q][C']. The rows of this matrix are plotted in Figure 5 along with the total ion current signal. The chromatographic resolution of the first and second peak pairs is R = 0.25 and R = 0.36, respectively, where R = 1 indicates "base line" resolution (19).

Figure 5. Chromatograms of the naphthalene mixture components, reconstructed from the matrix product [Q][C']. The total ion current signal, solid triangles, is plotted for comparison.

Figure 6. Mass spectra of 3-methylcyclohexanol: (A) spectrum of the isolated compound; (B) spectrum of first sample component numerically resolved from the data matrix.

Figure 7. Mass spectra of 1-heptanol: (A) spectrum of the isolated compound; (B) spectrum of second sample component numerically resolved from the data matrix.

Figure 8. Mass spectra of 3-methylcyclohexanone: (A) spectrum of the isolated compound; (B) spectrum of third sample component numerically resolved from the data matrix.

Figure 9. Chromatograms of the alcohols-ketone mixture components, reconstructed from the matrix product [Q][C']. The total ion current signal, solid triangles, includes a larger fraction of the solvent tail due to the short retention times of the components.

A second example of numerical separation of a ternary mixture is provided in order to illustrate a complementary situation in terms of spectral and temporal overlap. The spectra of the alcohol and ketone analytes, shown in Figures 6-8, have a richer fragmentation pattern and several significant peaks which are nearly unique or pure masses. The chromatographic resolution for the two peak pairs, R = 0.12 and R = 0.35, respectively, is illustrated in the reconstructed chromatograms in Figure 9. The poorer chromatographic resolution of the alcohol components leads to a much larger relative error in spectral assignment (see Table II) despite the greater richness of spectral information. Part of the difficulty in numerically separating these particular components arises from the fact that their retention times differ by less than one scan period of the spectrometer. A slow sampling rate significantly erodes the reliability of distinguishing components having nearly equivalent retention times. Furthermore, the spectral assignments in this data analysis depend on numerical construction of an accurate representation of the time dependence of a particular mass channel. The spectral separation does little to help overcome this problem; if the retention times of two components are indistinguishable, the


algorithm will simply return a single component having a spectrum equivalent to the sum of the two components.

This data set points out a possible pitfall of interpreting spectra separated numerically. Although the general features of the 1-heptanol spectrum were accurately reproduced, a new peak at m/e 112 appears which is absent in the isolated compound spectrum. Clearly, some intensity at this mass has been incorrectly assigned from the adjacent 3-methylcyclohexanone component. This type of error is particularly serious, since this mass might be mistaken for the parent ion of the compound in a subsequent interpretation of the spectrum. Minor peaks which have large amplitude neighboring components must, therefore, always be suspected of error; similarly, interpretation of precise isotopic ratios is subject to error under these circumstances. Since the predominant peaks and their relative intensities are, however, extracted with good precision, library search routines should be able to make effective use of the numerically separated spectra.

The data matrix analysis has successfully performed separation of major components having a high degree of chromatographic and spectral overlap. Further work will be required to characterize the performance on minor components at a more favorable scan rate. One would not expect χ² to drop as rapidly when a minor component is being extracted; other clues, however, may be useful in distinguishing between a phantom and a real but minor component. When n exceeds the correct value in the present work, either a major component splits into two having nearly identical spectra and similar quantitation elements or a spectrum appears having a large fraction of negative amplitude and a very small quantitation element. This behavior is characteristic of the algorithm attempting to adjust the peak shapes to account for small differences between the model of the chromatographic response and the actual data. A minor but real component should, however, separate out as a unique spectrum having a small fraction of negative amplitude. The utility of this information will need to be evaluated as a function of the spectral overlap with adjacent components.

This data matrix analysis technique does not depend on the particular multichannel detector used as long as the restriction of linear response, described in the theory section,


can be satisfied. This could provide a significant improvement for detection methods such as IR, UV-VIS, and fluorescence where the likelihood of spectral overlap is even greater than for mass spectrometry. The overall data matrix reduction concept, that is, to model the response in the overdetermined dimension and fit to the data matrix by optimization of a small number of parameters in the model using the pseudoinverse to obtain the second dimension response, was easily transferred from its original development for time-resolved fluorescence (9). This opens up the possibility for application in any number of "hy-phen-ated" analytical methods (20) where a linear response is observed.
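The overall concept summarized above (model the time dimension, obtain the spectral dimension by least squares, and compare χ² for increasing n) can be sketched end to end on synthetic data. Several simplifications are assumptions of this example and not the procedure used in this work: an exhaustive search over a coarse grid of candidate retention times stands in for the SIMPLEX search, the penalty for negative spectral values is omitted, and all numerical values (spectra, amounts, noise level, 2-s scan period) are hypothetical.

```python
import itertools
import numpy as np

def emg_rows(t_r_list, times, n_plates, tau):
    """Trial [C']: Gaussian peaks convolved with an exponential tail, unit-sum rows."""
    decay = np.exp(-(times - times[0]) / tau)
    rows = []
    for t_r in t_r_list:
        sigma = t_r / np.sqrt(n_plates)
        gauss = np.exp(-(times - t_r) ** 2 / (2 * sigma ** 2))
        peak = np.convolve(gauss, decay)[: times.size]
        rows.append(peak / peak.sum())
    return np.array(rows)

def chi_squared(D, C_trial):
    """Residual sum of squares over xt - n(x + 1) degrees of freedom (eq 8)."""
    AQ = np.linalg.lstsq(C_trial.T, D.T, rcond=None)[0].T   # least-squares [A'][Q]
    x, t = D.shape
    n = C_trial.shape[0]
    return np.sum((D - AQ @ C_trial) ** 2) / (x * t - n * (x + 1))

def best_fit(D, times, n, candidates, n_plates, tau):
    """Grid search over retention times; a simple stand-in for the SIMPLEX search."""
    trials = itertools.combinations(candidates, n)
    return min((chi_squared(D, emg_rows(tr, times, n_plates, tau)), tr) for tr in trials)

# Synthetic two-component data set with added noise (all values hypothetical).
rng = np.random.default_rng(0)
times = 100.0 + 2.0 * np.arange(40)
A = np.array([[1.0, 0.3], [0.6, 1.0], [0.2, 0.8], [0.9, 0.1], [0.0, 0.5], [0.4, 0.4]])
Q = np.diag([100.0, 80.0])
D = A @ Q @ emg_rows((130.0, 138.0), times, 750, 8.0) + rng.normal(scale=0.05, size=(6, 40))

candidates = np.arange(120.0, 150.0, 2.0)
results = {n: best_fit(D, times, n, candidates, 750, 8.0) for n in (1, 2, 3)}
for n, (chi2, t_r) in results.items():
    print(n, chi2, t_r)
```

In a run of this kind one would expect χ² to fall sharply from n = 1 to n = 2 and then level off, which is the behavior used in this paper to identify the number of components.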

LITERATURE CITED

(1) Hites, R. A.; Biemann, K. Anal. Chem. 1970, 42, 855-860.
(2) Biller, J. E.; Biemann, K. Anal. Lett. 1974, 7, 515-528.
(3) Gromey, R. G.; Stefik, M. J.; Rindfleisch, R. C.; Duffield, A. M. Anal. Chem. 1976, 48, 1368-1375.
(4) McLafferty, F. W.; Hertel, R. H.; Villwock, R. D. Org. Mass Spectrom. 1974, 9, 690-720.
(5) Abramson, F. P. Anal. Chem. 1975, 47, 45-49.
(6) Davidson, W. C.; Smith, M. J.; Schaefer, D. J. Anal. Lett. 1977, 10, 309-331.
(7) Atwater (Fell), B. L.; Venkataraghavan, R.; McLafferty, F. W. Anal. Chem. 1979, 51, 1945-1979.
(8) Weiss, M.; Sokolow, S., presented at the 28th Annual Conference on Mass Spectrometry and Allied Topics, New York, June 1980.
(9) Knorr, F. J.; Harris, J. M. Anal. Chem. 1981, 53, 272-276.
(10) Gladney, H. M.; Dowden, B. F.; Swalen, J. P. Anal. Chem. 1969, 41, 883-888.
(11) Grushka, E.; Myers, M. N.; Giddings, J. C. Anal. Chem. 1970, 42, 21-26.
(12) Lochmuller, C. H.; Sumner, M. J. Chromatogr. Sci. 1980, 18, 159-165.
(13) Strang, G. "Linear Algebra and Its Applications"; Academic Press: New York, 1976; Chapter 3.
(14) Morgan, S. N.; Deming, S. L. Anal. Chem. 1973, 45, 278A-284A.
(15) Davis, J. E.; Shephard, A.; Stanford, N.; Rogers, L. E. Anal. Chem. 1974, 46, 821-825.
(16) Ritter, G. L.; Lowry, S. R.; Isenhour, T. L.; Wilkins, C. L. Anal. Chem. 1976, 48, 591-595.
(17) Malinowski, E. R. Anal. Chem. 1977, 49, 612-617.
(18) Halket, J. M. J. Chromatogr. 1979, 185, 229-241.
(19) Karger, B. L.; Snyder, L. R.; Horvath, C. "An Introduction to Separation Science"; Wiley-Interscience: New York, 1973; Chapter 5.
(20) Hirschfeld, T. Anal. Chem. 1980, 52, 297A-312A.

RECEIVED for review October 14, 1980. Accepted February 3, 1981. This research was supported in part through funds provided by the National Institutes of Health Biomedical Research Support Grant No. RR 7092.