Anal. Chem. 1982, 54, 63-66
Theoretical Limitations of Gas Chromatographic/Mass Spectrometric Identification of Multicomponent Mixtures
David Rosenthal¹
Chemistry and Life Sciences Group, Research Triangle Institute, Research Triangle Park, North Carolina 27709
Automatic direct search techniques frequently fail to provide correct identification of spectra when analyzed by gas chromatography/mass spectrometry (GC/MS). Detailed examination of several of these cases revealed that the misidentified spectra were mixtures of two or more components with precisely the same retention time. Probability calculations for this situation showed that this condition occurs more frequently than may have been previously realized and that, for example, a 200-component mixture will, on the average, contain 40 compounds (20% of the components) appearing in the form of multiplets, even when using capillary columns with 2-s resolution during the course of 60-min runs. In practical applications, the situation may be substantially worse owing to the presence of closely spaced isomers. This consideration constitutes a theoretical limit for the effectiveness of GC/MS analysis of multicomponent mixtures when identifications are carried out by direct search methods.
Identification of organic compounds by pattern recognition of their mass spectra as they elute from gas chromatographic columns has become one of the primary methods for volatile organic analysis, particularly when the technique is applied to complex mixtures of components (1). Methods such as those of Biemann (2), Clerc (3), or PBM (4) have been used successfully for years for the identification of compounds whose spectra are stored in reference libraries assembled for searching purposes. In general, when components are well separated by gas chromatography (GC) and are present in sufficient concentration to provide clear spectra, all of these systems are able to identify individual components with little difficulty.

In our laboratories, we have implemented a mass spectral search system which was designed primarily to process the large volumes of data which are frequently encountered in complex environmental samples (5). Processing is carried out by means of a series of programs which detect individual components, remove unwanted noise from their spectra, and submit the spectra to a forward library search for identification. In a typical run, using 2-s scans over a 60-min period, from 20 to over 300 components have been isolated and processed through the system in an automatic fashion. When the computer results were examined carefully, it was found that in all cases where complex mixtures with a large number of components were present, some of the spectral identifications were clearly in error, or showed correlation factors which were much lower than would be expected for identification of the components with a high probability of success. These results were in sharp contrast with other runs using completely synthetic mixtures, in which the components were specifically chosen so as not to interfere with one another in the gas chromatographic run. In the latter cases, nearly every component found was correctly identified by the computer.
¹ Present address: GCA Corporation, Burlington Division, Bedford, MA 01730.
Further investigation of this anomaly revealed that in nearly every case where poor identifications were encountered, the reason for the computer failure was that the spectrum presented to the library search routine corresponded to that of a mixture of components rather than that of a single compound. This result was unexpected, since mixtures of spectra were supposed to be deconvoluted by an algorithm incorporated into the "cleanup" portion of the processing routine. Detailed examination of the spectral intensity profiles in the region containing the misidentified components revealed that the deconvolution program could not detect the presence of a mixture because the two components in question showed relative retention times closer than one scan (2 s) apart. Under these conditions, the deconvolution routine assumed that the spectrum encountered was of a single component and no deconvolution was attempted (nor was it possible with the existing algorithm).

The observation of multiple components which overlap and whose relative retention times were too similar to allow deconvolution led us to examine the entire question of determining, statistically, how frequently such events were likely to occur. Previous analysis of this general situation has been directed toward the problems which arise in GC analysis unenhanced by mass spectrometry (MS) (6). Under these conditions the investigator has only a single dependent variable (detector response, equivalent to total ion current (TIC) in GC/MS) to measure against time. Owing to the availability of full spectra in each scan, GC/MS is able to effectively deconvolute TIC traces of very close eluents, making it unnecessary to work with the TIC trace alone. The problem of overlapping peaks has not been considered previously in this light, taking into account the enhanced capability of the GC/MS technique.
THEORETICAL SECTION Treatment of Overlapping Peaks. In order to treat this problem mathematically, we applied the techniques of elementary combinatorial analysis (7). The basic problem to be solved was the calculation of the probability of the occurrence of more than one peak within a given time window during the course of a GC run. The minimum window which we used was the time of a single scan of the mass spectrometer, or approximately 2 s. We next assumed that, for the purposes of calculation, two components would be identified and separated by the deconvolution scheme if the difference in the retention time of their maxima exceeded 2 s. The system thus described is exactly analogous to the well-known situation in combinatorial analysis, often referred to as the "balls in the boxes" problem. In this case, we have a series of GC peaks which are distributed into a sequence of time slots, and the problem we need to solve is the probability distribution to be expected for finding a particular mix of peaks in the available slots. As a first approximation, we made the further assumption that there was an equal likelihood of any individual peak to be found in any particular slot. This assumption is not strictly true in the real system. If the peaks are not randomly distributed, then the likelihood of overlap will be greater than that calculated. This point will be dealt with later.
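The "balls in the boxes" model described above can be explored numerically before any formula is derived. The following is a minimal Python sketch, assuming equally likely time slots as stated in the text; the function names, the number of trials, and the random seed are illustrative choices and are not part of the original work.

```python
import random
from collections import Counter


def simulate_occupancy(n_peaks, n_slots, rng):
    """Drop n_peaks peaks into n_slots equally likely time slots.

    Returns a Counter mapping occupancy (1 = singlet, 2 = doublet, ...)
    to the number of slots holding that many peaks.
    """
    slot_of_peak = [rng.randrange(n_slots) for _ in range(n_peaks)]
    peaks_per_slot = Counter(slot_of_peak)
    return Counter(peaks_per_slot.values())


def mean_overlapped_fraction(n_peaks, n_slots, trials=2000, seed=0):
    """Average fraction of peaks sharing a slot with at least one other peak."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatable runs
    total = 0.0
    for _ in range(trials):
        occupancy = simulate_occupancy(n_peaks, n_slots, rng)
        singlets = occupancy.get(1, 0)
        total += (n_peaks - singlets) / n_peaks
    return total / trials


if __name__ == "__main__":
    # A 60-min run with 2-s scans gives 1800 slots (values from the text).
    print(mean_overlapped_fraction(n_peaks=200, n_slots=1800))
```

A simulation of this kind only estimates averages; the combinatorial treatment that follows gives the probability of each individual configuration.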
0003-2700/82/0354-0063$01.25/0 © 1981 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 54, NO. 1, JANUARY 1982
Figure 2. Distribution of molecular weights in a 26 000 member mass spectrum library.
Figure 1. Scans required for 50% probability of total separation. (Horizontal axis: number of peaks in mixture.)

If we consider $p$ peaks which are distributed into $c$ slots (cells), then the total number of possible arrangements of peaks (with replacements) is $c^p$. If we further define a series of $j$ groups, where for each group $i$ there is defined the variable $n_i$, the number of cells containing $oc_i$ peaks each (the occupancy number), then the total number of possible configurations $K$ of one type is given by

$$K = \frac{c!\,p!}{\prod_{i=1}^{j} n_i!\,(oc_i!)^{n_i}} \qquad (1)$$

The probability $P$ of finding such a configuration would then be

$$P = K/c^p \qquad (2)$$

The general formula, eq 2, can be made specific for any individual case being considered. For example, if we were interested in calculating the probability of occurrence of a distribution of $p$ peaks in which there were $s$ singlets, $d$ doublets, $t$ triplets, and $z$ cells in which no peak appears, the probability for this configuration would be

$$P = \frac{c!\,p!}{s!\,d!\,t!\,z!\,(2!)^{d}\,(3!)^{t}\,c^{p}} \qquad (3)$$

Multiple applications of eq 3 with different configurations give the individual probabilities, which sum to 1.0. Successive application of this approach under a variety of conditions yielded the results described below.
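The configuration probability for singlets, doublets, and triplets can be sketched in Python as follows. This is an illustration, not the authors' BASIC program: it assumes the standard occupancy count for distinguishable peaks in distinguishable cells, $K = c!\,p!/\prod_i n_i!\,(oc_i!)^{n_i}$, and evaluates it in logarithms. The function names are hypothetical.

```python
from math import exp, lgamma, log


def log_factorial(n):
    """Natural logarithm of n!, obtained from the log-gamma function."""
    return lgamma(n + 1)


def configuration_probability(p, c, s, d, t):
    """Probability that p peaks in c cells form s singlets, d doublets, t triplets.

    The number of empty cells z is implied: z = c - s - d - t.
    Assumes every peak is equally likely to fall in every cell.
    """
    if s + 2 * d + 3 * t != p:
        raise ValueError("s + 2d + 3t must equal the number of peaks p")
    z = c - s - d - t
    if z < 0:
        raise ValueError("more occupied cells than cells available")
    log_k = (log_factorial(c) + log_factorial(p)
             - log_factorial(s) - log_factorial(d)
             - log_factorial(t) - log_factorial(z)
             - d * log(2)    # (2!)^d
             - t * log(6))   # (3!)^t
    return exp(log_k - p * log(c))  # K / c^p


if __name__ == "__main__":
    # Three peaks in four cells: the only possible configurations are
    # three singlets, one singlet plus one doublet, or one triplet.
    configs = [(3, 0, 0), (1, 1, 0), (0, 0, 1)]
    probs = [configuration_probability(3, 4, *cfg) for cfg in configs]
    print(probs, sum(probs))
```

For three peaks in four cells the three configurations have probabilities 0.375, 0.5625, and 0.0625, which sum to 1.0 as required.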
RESULTS AND DISCUSSION Using the equations developed in the previous section, it was possible to do a number of different calculations to bring the problem of overlapping GC peaks into perspective. Computation of the probability factors was carried out on a Hewlett-Packard Access 2000 computer system. Programs were written in the BASIC language. In order to avoid system overflows, we made extensive use of logarithms. Factorials up to 15! were calculated directly. For factorials larger than 15!, we made use of Stirling's approximation

$$\log n! \approx \log \sqrt{2\pi} + (n + 0.5) \log n - n \qquad (4)$$

One approach to the problem involved answering the following question: If one were to carry out a series of GC/MS runs with progressively larger numbers of components, how rapidly would one have to scan in order to be assured with some reasonable probability that all of the components were separated? This question reduces to the calculation of the probability of singlets occurring for varying combinations of scan speeds and peak numbers. We assumed a 1-h total period for the entire chromatographic run and calculated the scan speed necessary to give a 50% probability of finding each component in a separate time window. The results are shown in Figure 1. From this treatment, it is seen that if one considers the limit of today's technology to be equivalent to a full scan in 1 s, then somewhat over 60 components could be fully resolved to a level of 50% confidence. At a 2 s/scan rate, the probable number of fully separable components is reduced to 45. If one wished to guarantee full resolution of a 200-component mixture (with a 50% success rate), then over 20 000 individual scans would have to be recorded during a 1-h run, which is equivalent to acquiring each scan in less than 200 ms. This capability is entirely beyond the realm of current technology.

We note, then, that as the number of components increases, the likelihood of total separation of all the components diminishes very rapidly. We therefore attacked the problem from a slightly different viewpoint. Rather than insist on total separation of all components, we chose to determine, for a particular achievable set of scan conditions, what proportion of the components would most likely be resolved and what proportion would be overlapped.

Before doing this, however, it was necessary to consider, for practical applications, to what extent our assumptions about the random distribution of components in a single chromatogram were valid. Although it is nearly impossible to speak to this point in any individual case, we felt intuitively that the assumption of uniform distribution of chromatographic peaks was not strictly warranted. Since retention times in GC are roughly correlated with molecular weight, we decided to determine the molecular weight distribution in our existing 26 000 mass spectrum library. The distribution pattern is shown graphically in Figure 2.
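The total-separation question and the use of Stirling's approximation can both be sketched in Python. This is a hedged illustration under the uniform-placement assumption: natural logarithms are assumed in the Stirling expression, `math.lgamma` stands in for the direct and approximate factorial routines of the original BASIC programs, and all function names are hypothetical.

```python
from math import exp, lgamma, log, pi


def stirling_log_factorial(n):
    """ln n! by Stirling's approximation: ln sqrt(2*pi) + (n + 0.5) ln n - n."""
    return 0.5 * log(2 * pi) + (n + 0.5) * log(n) - n


def prob_all_singlets(p, c):
    """Probability that p peaks land in p different cells out of c."""
    if p > c:
        return 0.0
    # c! / ((c - p)! * c^p), evaluated in logarithms to avoid overflow
    return exp(lgamma(c + 1) - lgamma(c - p + 1) - p * log(c))


def max_peaks_for_confidence(c, confidence=0.5):
    """Largest peak count whose total-separation probability is >= confidence."""
    p = 1
    while prob_all_singlets(p + 1, c) >= confidence:
        p += 1
    return p


def min_cells_for_confidence(p, confidence=0.5):
    """Smallest number of cells separating all p peaks with >= confidence."""
    c = p
    while prob_all_singlets(p, c) < confidence:
        c += 1
    return c


if __name__ == "__main__":
    print(stirling_log_factorial(16), lgamma(17))  # approximation vs. reference
    print(max_peaks_for_confidence(3600))          # 1-s scans over 1 h
    print(max_peaks_for_confidence(1800))          # 2-s scans over 1 h
    print(min_cells_for_confidence(200))           # scans needed for 200 peaks
```

Because this sketch evaluates the uniform model exactly, its output need not coincide with values read from Figure 1; it is intended only to show the shape of the calculation and how steeply the required number of scans grows with the number of components.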
Although it may be argued that a large library of this nature is not representative of the mix of compounds presented in a typical GC run, this graph vividly accentuates the fact that compounds tend to cluster about a mean molecular weight (M/z = 160), which would most likely result in a very nonrandom distribution under GC conditions. An additional point which can be made is that if two compounds elute in adjacent cells (i.e., their retention times differ by 2-4 s), then they are nearly impossible to deconvolute completely, particularly if they have a number of peaks in common, because of random variations in the measurement of peak intensities, particularly at low levels. The assumption of randomness further implies that the spectra of closely eluting components are completely independent of each other, while in actual fact, there is frequently a large element of correlation of common ions in such spectra.

For these reasons, we decided that from a realistic and practical standpoint, even if scans were taken at 2-s intervals, the clustering of molecular weight and the likelihood of finding shared ions would result in the effective loss of a factor of 2 in the resolving power of the GC/MS system. Thus, for a 1-h run with 2-s scans, effectively only 900 different time intervals are available in which the components can be distributed. The subsequent calculations were therefore carried out under the assumption that 900 individual time intervals were available.

Calculations were carried out, using eq 3, to determine, for varying numbers of components, the percent of the total distributions which were singlets under a typical set of scan conditions. We assumed a 60-min run, with 2-s scans, and an effective probable resolution of 2 scans (900 compartments).
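A quick estimate of the singlet percentage for 900 compartments can be obtained without enumerating configurations. The Python sketch below uses the expected value under uniform placement (each peak is alone with probability $(1 - 1/c)^{p-1}$) rather than the most probable configuration obtained from eq 3, so it is an approximation to the calculation described above; the function name is hypothetical.

```python
def expected_singlet_fraction(p, c):
    """Expected fraction of p peaks that occupy a cell alone, for c equally likely cells."""
    # A given peak is a singlet when each of the other p - 1 peaks misses its cell.
    return (1.0 - 1.0 / c) ** (p - 1)


if __name__ == "__main__":
    cells = 900  # 60-min run, 2-s scans, effective resolution of 2 scans
    for peaks in (1, 50, 100, 200, 300):
        frac = expected_singlet_fraction(peaks, cells)
        print(f"{peaks:4d} components: {100 * frac:5.1f}% singlets, "
              f"{peaks * (1 - frac):6.1f} expected in multiplets")
```

For 200 components this estimate gives close to 80% singlets, i.e., roughly 40 components in multiplets, in line with the figure quoted in the abstract.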
Figure 3. Number of overlaps, with standard deviation vs. number of components. (Horizontal axis: number of components.)

The results of these calculations show that with one component, all the peaks are, of course, singlets, and that the percent of singlets observed decreases approximately linearly, so that for 200 components, approximately 82% are still singlets, the remainder being multiplets. The most probable number of singlets is shown in Figure 3 along with its standard deviation ($\sigma$). This value was calculated using the equation

$$\sigma = \left[\sum_i p_i n_i^2 - \left(\sum_i p_i n_i\right)^2\right]^{1/2} \qquad (5)$$

where $n_i$ represents the number of occurrences with probability $p_i$, with the condition $\sum_i p_i = 1.0$ (8). It may be seen from Figure 3 that if a mixture contains as many as 200 components, then fully 20% of them are likely to occur on the column as multiplets, making their identification by usual search procedures highly suspect.

Application to a Real Example. We felt that this finding was sufficiently important to test it by a real example measured under laboratory conditions. A synthetic mixture of 46 compounds was prepared by dissolving the components in a solution of methanol (300-@DO ng/µL for each component). The mixture consisted of a very broad range of organic compounds chosen in random fashion with a wide range of molecular weight (70-250) covering a large variety of functional types. The mixture was subjected to GC/MS analysis using an LKB 2091 gas chromatograph-mass spectrometer fitted with a 25-m, narrow bore glass capillary column coated with SE-30. Helium carrier was passed through at a rate of 1.8 mL/min and the column temperature was programmed at 8 °C/min from 70 to 260 °C. Scans were taken at 1.7-s intervals for approximately 30 min. Nine hundred scans/run were acquired. The TIC trace of the GC run is shown in Figure 4. Data for this run were subjected to the peak finding algorithm developed by Rindfleisch and colleagues (9) as modified in these laboratories (10). Thirty-five components in the mixture were detected.
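For the run just described (46 components, 900 scans), the mean and standard deviation of the number of co-eluting components can be estimated as sketched below. This is an illustrative Python simulation, not the authors' calculation: it assumes uniform random placement, estimates the probabilities $p_i$ by Monte Carlo instead of eq 3, and computes the standard deviation as $[\sum p_i n_i^2 - (\sum p_i n_i)^2]^{1/2}$. Function names, trial count, and seed are hypothetical choices.

```python
import random
from collections import Counter
from math import sqrt


def overlapped_count_distribution(n_peaks, n_slots, trials=5000, seed=0):
    """Estimate p_i for each number n_i of peaks that share a slot with another."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatable runs
    tally = Counter()
    for _ in range(trials):
        per_slot = Counter(rng.randrange(n_slots) for _ in range(n_peaks))
        overlapped = sum(k for k in per_slot.values() if k > 1)
        tally[overlapped] += 1
    return {n: count / trials for n, count in tally.items()}


def mean_and_sigma(distribution):
    """Mean and standard deviation of a discrete distribution {n_i: p_i}."""
    mean = sum(p * n for n, p in distribution.items())
    mean_sq = sum(p * n * n for n, p in distribution.items())
    # Guard against tiny negative values caused by rounding.
    return mean, sqrt(max(mean_sq - mean * mean, 0.0))


if __name__ == "__main__":
    dist = overlapped_count_distribution(n_peaks=46, n_slots=900)
    mean, sigma = mean_and_sigma(dist)
    print(f"co-eluting components: mean {mean:.2f}, sigma {sigma:.2f}")
```

Under these assumptions the mean corresponds to roughly one doublet per run, with a spread of comparable size, so individual runs can easily show none or several.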
According to our calculations, under such circumstances it might be expected that one or two of these components were in fact doublets, with two components eluting at precisely the same point. A careful study of the spectra was then undertaken to see if this was in fact observed. Of the major peaks in the spectrum, it was noted that all but two of them were well-resolved singlets. The two components which were shown to be doublets were centered at spectrum no. 199 and spectra no. 536-537 (marked with arrows on the TIC plot in Figure 4). Spectrum no. 199 was shown to be a nearly 1:1 mixture of 2-chlorotoluene and 1,2-dichlorobenzene (Figure 5). Analysis of spectra no. 536-537 showed them to represent a mixture of fluorene and n-hexadecane, which were only partly resolved by the deconvolution program. Visual inspection of the total TIC plot would give no indication at all that these two components were in fact coeluting mixtures, and, indeed, it was not until a painstaking manual search of the individual spectra was carried out that this fact was revealed.

Figure 4. TIC plot of standard mixture. (Horizontal axis: scan number.)

Figure 5. Mass spectrum of observed mixture and library spectra of its individual components. (Horizontal axis: M/z.)

It is, of course, implicit in a study of this kind that because of the probabilistic nature of the occurrence of overlaps, it is impossible to predict in advance where these events are likely to occur. This work was not undertaken to prove that multiplets in GC/MS runs occur. When working in the field, these observations are legion. We wish only to show that both in theory, and by practical example, the occurrence of overlapping components is far more prevalent than may have previously been realized. It is important to understand this when one is designing a search procedure to identify multiple component mixtures analyzed by GC/MS. In complex mixtures, where the number of components exceeds 100, it is a virtual certainty that it is not possible to identify more than a fraction of the components by first-order techniques.

Since most of the existing commercially available computer search procedures involve forward searching from libraries, this work serves as a reminder that there is a mathematically definable limit to which such methods can be extended. Beyond that, statistical factors limit the degree to which successful identifications can be accomplished. Other methods for handling these complex cases have been considered and some of them have achieved considerable success. The techniques which can be applied in such instances include reverse search, spectrum stripping, classification by retention time and iterative processing of the identified components, and combined strategies. By calling attention to this problem, it is hoped that further improvements in searching procedures will emerge.
ACKNOWLEDGMENT We wish to thank Pamela Gentry for running the mass spectra, and Kenneth Tomer and William F. Hargrove for helpful advice and comments.
LITERATURE CITED
(1) Chapman, J. R. "Computers in Mass Spectrometry"; Academic Press: London, 1978; Chapter 6.
(2) Hertz, H. S.; Hites, R. A.; Biemann, K. Anal. Chem. 1971, 43, 681.
(3) Naegeli, P. R.; Clerc, J. T. Anal. Chem. 1974, 46, 739 A.
(4) Pesyna, G. M.; Venkataraghavan, R.; Dayringer, H. E.; McLafferty, F. W. Anal. Chem. 1976, 48, 1362.
(5) Rosenthal, D.; Bumgarner, D. G.; Brown, W.; Hargrove, W. F. Proceedings of the 25th Annual Conference on Mass Spectrometry, Washington, DC, A.S.M.S., 1977; p 287.
(6) Karger, B. L.; Snyder, L. R.; Horvath, C. "An Introduction to Separation Science"; Wiley: New York, 1973; p 157.
(7) Feller, W. "An Introduction to Probability Theory and Its Applications", 2nd ed.; Wiley: New York, 1959; Vol. 1, Chapter IV.
(8) Feller, W. "An Introduction to Probability Theory and Its Applications", 2nd ed.; Wiley: New York, 1959; Vol. 1, p 213.
(9) Dromey, R. G.; Stefik, M.; Rindfleisch, T. C.; Duffield, A. M. Anal. Chem. 1976, 48, 1368.
(10) Hargrove, W. F.; Rosenthal, D.; Cooley, P. C. Anal. Chem. 1981, 53, 538.
RECEIVED for review February 5, 1981. Accepted September 8, 1981. Portions of this work were presented by D. Rosenthal and W. F. Hargrove, 29th Annual Conference on Mass Spectrometry and Allied Topics, New York, May 1980, p 125.
Anal. Chem. 1982, 54, 66-68
Determination of Heroin by Circular Dichroism
John M. Bowen, Terry A. Crone, Robert K. Kennedy, and Neil Purdie*
Department of Chemistry, Oklahoma State University, Stillwater, Oklahoma 74078
A method of analysis for heroin is described in which separation of the drug is not a prerequisite to its quantitative determination. The normally encountered diluents and adulterants present in confiscated samples do not exhibit the phenomenon of circular dichroism and are noninterfering. Analysis is uncomplicated, quick, and easily reproducible. Correspondence with determinations which used gas chromatography is better than 1%.
Identification of illicit substances is a major effort of criminalistics laboratories and, consequently, very time-consuming. Any new technique, or modification of presently acceptable techniques, which has the potential to alleviate the load, is worthy of development. In this work we are making a case for the acceptance of circular dichroism (CD) spectropolarimetry as an important addition to the arsenal of analytical methods. We will show that the method is both quick and quantitative and refer to its general applicability in the identification of drugs of abuse.

In most states, legislation requires only the qualitative identification of a controlled substance, since mere possession alone is sufficient grounds for conviction. Even so, ratification of the presence of an anonymous compound in a complex and cleverly contrived mixture still can be difficult, particularly since the controlled substance is usually a very minor component. Confirmation of the presence of a suspected substance is a consequence of positive responses from a number of complementary tests. In the case of heroin the standard accepted testing procedures include color spot-tests (1), microcrystalline tests (2), and a variety of instrumental methods (3) such as GC (4, 5), HPLC (6), IR and UV absorption spectrophotometry (1, 7), and mass spectrometry (8, 9) in its many forms such as CIMS, CEMS, MID, etc. Diluents, adulterants, and metabolites can mask the specificity of the tests for heroin, which is the motivation for their presence in the first place. Separation, therefore, is often a prerequisite to identification. In what might be one of the most complex procedures, there are instances where heroin is first extracted, then converted to morphine, and finally derivatized prior to analysis. Even then, positive identification requires that a comparison be made with a standard each and every time an analysis is performed, because of variations in instrument parameters.

Day to day variations in instrument parameters are not a factor in quantitation by UV absorption spectrophotometry. The method is not too specific, and not always preferred, because of the broad unstructured bands. To introduce specificity to the screening and analysis of drugs by UV absorption, we have measured the circular dichroism spectra of a number of opiates in anisotropic (10, 11) and isotropic (12, 13) solvent media. It should be emphasized that a CD spectrum, in which ellipticity is plotted vs. wavelength, is really no more than a modified absorption spectrum. The ellipticity is directly proportional to the difference in absorption between left and right circularly polarized light. The qualitative success of the technique has been adequately demonstrated for all three solvent systems (10-13). Of the three, the aqueous solvent system is most easily disposed to quantitative studies and the molar ellipticity coefficients are available for 10 opiates (12). The spectra are quite unique for the opiates which we have investigated and could be used as confirmation after separation from mixtures. We were more intrigued by the prospects of identification without separation. To this end we obtained four heroin confiscates which were analyzed for the drug before and after extraction and compared the results with an in-house quantitation by GC. The results from all three procedures are in excellent agreement.

EXPERIMENTAL SECTION Standard samples of heroin as the hydrochloride were obtained from the National Institute for Drug Abuse via the Research Triangle Institute. Four samples of confiscated drugs were generously provided by the Criminalistics Laboratory of the Oklahoma State Bureau of Investigation. Three of these were typical of what is known as "brown" heroin; the fourth was a white specimen which proved to be very high in heroin composition. Each sample had been recovered from dead case files prior to incineration. CD measurements were made on a Cary 61 spectropolarimeter over the wavelength range 220-350 nm. Sample sizes were on the order of 2-4 mg and were dissolved in 25 mL of either aqueous