Weighted file ordering for fast matching of mass spectra against a comprehensive data base

Oct 1, 1981 - Fred W. McLafferty, Sheauchi Cheng, Kathleen M. Dully, Chuan-Jie Guo, In Ki Mun, David W. Peterson, Steven O. Russo, David A. Sal...

Anal. Chem. 1981, 53, 1938-1939

integration. The resulting values are then directly proportional to the local concentrations. Concentration maps constructed in this manner are consistent with the known properties of these flames and should in fact be more reliable than those derived from other types of measurements.

The precision of the measurements can be judged from the goodness of the individual fits. The average standard deviation of the observed transmittance values from the best-fit Voigt profile is on the order of 0.03 (in transmittance units), comparable to the reproducibility of the reference intensity (I0) over a scan period. The 100 K fitting interval limits the precision of the temperatures to ±50 K, and flame jitter limits the precision of the positions to about 0.4 mm. Improvements can be expected from a cooled vidicon tube and from further stabilization of the laser intensity, which in the present arrangement is affected by occasional microbubbles in the dye jet. More bits in the A/D conversion, higher spectral resolution, and more scans will also improve the signal-to-noise level, so that fitting intervals smaller than 100 K can be justified. Finally, more smoothing can be applied to the data, as long as care is taken not to degrade the spatial resolution.

Since the light source is a tunable, high-resolution laser, this concept can be used to determine vibrational, rotational, and electronic temperatures. Together with local atomic and molecular concentrations, these measurements make it possible to study the dynamics in, for example, combustion chambers in a remote-monitoring configuration. If the vidicon interface has direct memory access capability, data can be gathered at one frame every 1/30 s, so that a complete scan takes 1 s. The laser can also be gated (5) to extend temperature measurements to pulsed events, as long as the events can be repeated enough times to build up the spectral scan point by point. We expect that demonstrations of these concepts will be forthcoming shortly.
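The ±50 K figure follows from fitting at discrete temperatures spaced 100 K apart: in the ideal case the best grid point lies at most half an interval from the true value. The sketch below illustrates this on synthetic, noise-free data. It is a hypothetical illustration, not the procedure of the article: it assumes that temperature enters only through a Gaussian (Doppler) width proportional to √T, it neglects the Lorentzian part of the Voigt profile, and the names `doppler_fwhm`, `fit_temperature`, `width_ref`, and `temp_ref` are invented for the example.

```python
import math


def doppler_fwhm(temp_k, width_ref=1.0, temp_ref=2000.0):
    """Gaussian (Doppler) FWHM scaling as sqrt(T); reference values are hypothetical."""
    return width_ref * math.sqrt(temp_k / temp_ref)


def gaussian_shape(x, fwhm):
    """Unit-height Gaussian line shape at detuning x (Lorentzian part neglected)."""
    return math.exp(-4.0 * math.log(2.0) * (x / fwhm) ** 2)


def fit_temperature(detunings, transmittance, grid):
    """Grid-search fit: return (best temperature, rms transmittance residual).

    For each candidate temperature the peak absorbance is found by linear least
    squares on -ln(transmittance); the residual is then measured in transmittance.
    """
    absorbance = [-math.log(t) for t in transmittance]
    best = None
    for temp in grid:
        shape = [gaussian_shape(x, doppler_fwhm(temp)) for x in detunings]
        peak = sum(a * g for a, g in zip(absorbance, shape)) / sum(g * g for g in shape)
        model = [math.exp(-peak * g) for g in shape]
        rms = math.sqrt(sum((t - m) ** 2 for t, m in zip(transmittance, model)) / len(model))
        if best is None or rms < best[1]:
            best = (temp, rms)
    return best


# Synthetic example: true temperature 1830 K, candidate temperatures every 100 K.
DETUNINGS = [i / 20 for i in range(-40, 41)]
TRUE_FWHM = doppler_fwhm(1830.0)
OBSERVED = [math.exp(-1.2 * gaussian_shape(x, TRUE_FWHM)) for x in DETUNINGS]
BEST_T, BEST_RMS = fit_temperature(DETUNINGS, OBSERVED, range(1000, 3001, 100))
```

On these noise-free data the fit settles on the nearest grid point, 1800 K. With real data the residual also contains noise (about 0.03 in transmittance, as reported above), which is why a finer grid pays off only once the signal-to-noise level improves.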

LITERATURE CITED

(1) Henning, F.; Tingwaldt, C. Z. Phys. 1928, 48, 805-823.
(2) Lewis, B.; von Elbe, G. "Combustion, Flames and Explosions of Gases"; Academic Press: New York, 1951; pp 261-265.
(3) Ribaud, G. C. R. Hebd. Seances Acad. Sci. 1930, 190, 369-371.
(4) Mavrodineanu, R.; Boiteux, H. "Flame Spectroscopy"; Wiley: New York, 1965; pp 26-37.
(5) Steenhoek, L. E.; Yeung, E. S. Anal. Chem. 1981, 53, 528-532.
(6) Kielkopf, J. F. J. Opt. Soc. Am. 1973, 63, 987-995.
(7) Hinnov, E.; Kohn, H. J. Opt. Soc. Am. 1957, 47, 156-162.
(8) Sobolev, N. N.; Mezhericher, E. M.; Rodin, G. M. Zh. Eksp. Teor. Fiz. 1951, 21, 350-366; Chem. Abstr. 1952, 46, 4359i.
(9) Kuo, J. C.; Yeung, E. S. J. Chromatogr. 1981, 223, 321-329.
(10) Milne, W. E. "Numerical Calculus"; Princeton University Press: Princeton, NJ, 1949; p 242.
(11) Harrison, G. R. "M.I.T. Wavelength Tables"; M.I.T. Press: Cambridge, MA, 1969; p 60.
(12) Margenau, H.; Watson, W. W. Phys. Rev. 1933, 44, 92-98.

Edward S. Yeung*
Larry E. Steenhoek
William G. Tong
Donald R. Bobbitt
Ames Laboratory and Department of Chemistry
Iowa State University
Ames, Iowa 50011

RECEIVED for review March 19, 1981. Accepted July 10, 1981. Ames Laboratory is operated for the U.S. Department of Energy by Iowa State University under Contract No. W-7405-eng-82. This research was supported by the Director for Energy Research, Office of Basic Energy Science, WPASKC-03-02-03.

Weighted File Ordering for Fast Matching of Mass Spectra against a Comprehensive Data Base

Sir: The time required for spectral identification is a serious restriction on the efficiency of gas chromatograph/mass spectrometer/computer (GC/MS/COM) systems for the analysis of complex organic mixtures. Although commercial systems can measure a complete mass spectrum in 1-3 s (1), deducing the structural information usually requires much more time (2-7), unless identification from only a limited list of target compounds is desired (3). The probability based matching (PBM) system (3), which appears to have the best retrieval capability of the available search algorithms (4), requires 15 s on a laboratory minicomputer to search 24 000 spectra (our current data base contains 41 429 spectra) (8). Thus matching the spectra of all components of a complex mixture on the GC/MS computer can double the instrument time required for a GC/MS run, a prohibitive increase in most laboratories. Dromey (6) recently proposed an ordered-file search procedure that reduced the number of necessary matches by an impressive amount. We report here that file ordering based on the statistical importance of the spectral data gives a substantial further improvement; for GC/MS/COM systems this makes it feasible to display, during the GC/MS run, component identifications based on a comprehensive file search.

EXPERIMENTAL SECTION

For PBM, the m/z value and abundance of each spectral peak are weighted on a log base 2 scale according to their statistical occurrence in a large data base (3, 4); these weights are designated the mass "uniqueness" (U) and "abundance" (A) values. For the condensed reference spectra used in matching, 15-27 peaks (depending on molecular weight) are selected according to their U + A values (highest first), with a secondary preference for peaks of highest mass.

For the ordered reference file proposed here, each condensed spectrum is filed according to the m/z value of its most significant peak (MSP). Of the peaks selected for the condensed spectrum, the MSP is the first whose abundance is >9% of that of the base peak and whose m/z value is >50 (if none of the first three peaks has m/z >50, the first peak is chosen). The 9% abundance minimum reduces the probability of failing to match the reference spectrum of a minor mixture component, for which a weaker peak chosen as the MSP might not appear in the unknown spectrum.

For a PBM search, condensed reference spectra are matched against the full unknown spectrum (a "reverse search") (3). The reference spectra searched are limited to those filed under MSP mass values corresponding to peak masses in the unknown spectrum. These peaks are the ten of highest U + A value (secondarily, those of highest mass) plus any additional peaks whose U + A value differs from the highest value by no more than three units; all peaks used must also have U + A > 0. If the unknown spectrum was not scanned below m/z 51, all reference spectra filed under MSP mass values <51 are also searched.

The system was tested with an IBM 370/168 computer using a reference file of 41 429 spectra of 32 403 compounds (8). As unknowns representing pure compounds, 431 spectra were randomly selected from those of compounds for which multiple spectra were present in the file (7); the average molecular weights of the reference and unknown compounds were 208 and 200, respectively. A PBM search of the entire reference file retrieved 934 other spectra of the same compounds.

0003-2700/81/0353-1938$01.25/0 © 1981 American Chemical Society

Anal. Chem. 1981, 53, 1939-1942
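The filing and candidate-selection rules described in the Experimental Section can be sketched in Python as follows. This is a minimal illustration, not the authors' PBM code. Each peak's U + A value is taken as a precomputed integer because the PBM weighting tables are not reproduced here. Abundances are assumed to be expressed as percent of the base peak. The names `Peak`, `most_significant_peak`, `build_ordered_file`, `search_keys`, and `candidate_references` are hypothetical. The text leaves one case open: a spectrum where one of the first three peaks has m/z > 50 but no ranked peak meets both MSP conditions. For that case the sketch falls back to the first peak, which is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: int           # m/z value
    abundance: float  # assumed: percent of the base peak (base peak = 100)
    ua: int           # assumed: precomputed U + A weight for this peak


def ranked(peaks):
    """Order peaks by U + A, highest first; ties go to the higher m/z."""
    return sorted(peaks, key=lambda p: (-p.ua, -p.mz))


def most_significant_peak(condensed):
    """m/z of the MSP of a condensed reference spectrum, per the filing rule."""
    order = ranked(condensed)
    # If none of the first three ranked peaks lies above m/z 50, take the first peak.
    if not any(p.mz > 50 for p in order[:3]):
        return order[0].mz
    for p in order:
        if p.abundance > 9 and p.mz > 50:
            return p.mz
    return order[0].mz  # assumption: fallback not specified in the text


def build_ordered_file(library):
    """Ordered reference file: MSP m/z -> list of reference-spectrum IDs."""
    index = defaultdict(list)
    for ref_id, condensed in library.items():
        index[most_significant_peak(condensed)].append(ref_id)
    return index


def search_keys(unknown, window=3, min_peaks=10):
    """Masses of the unknown whose MSP categories will be searched.

    Test a: the top `min_peaks` ranked peaks plus any peak within `window`
    U + A units of the highest value. Tests b and c: window=2 or 1, min_peaks=0.
    Only peaks with U + A > 0 are eligible.
    """
    order = [p for p in ranked(unknown) if p.ua > 0]
    if not order:
        return set()
    top = order[0].ua
    keys = {p.mz for p in order[:min_peaks]}
    keys |= {p.mz for p in order if top - p.ua <= window}
    return keys


def candidate_references(index, unknown, lowest_scanned_mz, window=3, min_peaks=10):
    """Reference IDs to pass on to the (not shown) PBM reverse search."""
    keys = search_keys(unknown, window, min_peaks)
    if lowest_scanned_mz >= 51:
        # Unknown not scanned below m/z 51: also search every category filed under m/z < 51.
        keys |= {mz for mz in index if mz < 51}
    return sorted(ref for mz in keys for ref in index.get(mz, []))


# Hypothetical toy data, for illustration only.
EXAMPLE_LIBRARY = {
    "ref1": [Peak(43, 100, 10), Peak(58, 20, 9), Peak(71, 5, 8)],
    "ref2": [Peak(91, 100, 12), Peak(65, 10, 6), Peak(39, 8, 5)],
    "ref3": [Peak(29, 100, 8), Peak(27, 60, 7), Peak(41, 30, 6), Peak(57, 5, 5)],
    "ref4": [Peak(43, 100, 9), Peak(77, 4, 8), Peak(105, 50, 7)],
}
EXAMPLE_UNKNOWN = [Peak(91, 100, 12), Peak(92, 60, 9), Peak(65, 12, 6)]
```

With test a settings, the toy unknown selects the categories for m/z 91, 92, and 65. Only ref2 is filed under one of them, so only ref2 would reach the reverse search. If the unknown had not been scanned below m/z 51, ref3 (filed under m/z 29) would be added. Calling `search_keys` with `window=2` or `window=1` and `min_peaks=0` gives the narrower key sets of tests b and c.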

RESULTS AND DISCUSSION

Matching the 431 unknown spectra of pure compounds against only the reference spectra selected as described ("test a") retrieved all 934 other spectra of the same compounds that were found by searching the entire data base. The number of reference spectra searched was reduced from 41 429 to an average of 2925 (7.5%), representing spectra filed under an average of 20 MSP mass indices. The total computer time required was reduced by a comparable amount (from 2 to 0.2 s).

In additional tests (b and c), the reference spectra searched were limited to those filed under MSP mass indices corresponding to peaks in the unknown spectrum whose U + A values differ from the highest value by (b) no more than two units and (c) no more than one unit, with no requirement for a minimum number of peaks. These tests retrieved (b) 99.2% and (c) 97.5% of the reference spectra while searching only (b) 3.8% and (c) 1.6% of the reference file. As a "reverse search" system, PBM has improved capabilities for the identification of minor components in mixtures. The loss of performance for such components under test a conditions can be estimated from the test b and c results: these should represent components present at ~50% and ~25%, respectively, of the concentration of those tested under test a conditions, since on the log base 2 scale of A values each unit corresponds to a factor of 2 in abundance. Thus a loss of only a few percent in reliability is expected for the identification of a 25% component.

The pioneering proposal of Dromey (6) for searching an ordered mass spectral file showed impressive results: 68% of spectra were retrieved by searching 1% of the file, 82% with 2%, 92% with 3%, and 100% with more than 3%. However, these results represented the system's ability to retrieve the identical spectrum from the file; in contrast, our tests used other spectra of the unknown compound, as would be the case for a real unknown. Further, it was reported that an impurity peak at m/z 40-99 could produce a false file-order value, severely limiting the system's applicability to mixture spectra (6).

In our system, classifying each reference spectrum by the mass of its most "important" peak (as determined by U + A) divides the data base into 732 categories. Most of these contain approximately the same number of entries, because the U + A values are based on the statistical occurrence of peaks in the reference file. (Classifying reference spectra by the mass of their most abundant peak, without mass-uniqueness weighting, gave 608 categories with a much wider range of sizes, requiring 80% more spectra to be searched.) Thus searching the spectra filed under 20 of these categories takes little time compared with searching the whole file, and it is not surprising that the categories corresponding to the 20 (on average) most important peaks of an unknown spectrum contain the correct reference spectra, if present.

The real-time performance possible using PBM in this manner on the computers of GC/MS/COM systems depends on several factors, such as the capacity for PBM processing in competition with higher-priority tasks like data collection. Implementation on a GC/MS/COM system is in progress; at present, search results can be obtained for up to six spectra per minute during the GC/MS run, and a substantial increase in this performance appears feasible.

ACKNOWLEDGMENT

The authors are indebted to R. G. Dromey and D. W. Peterson for helpful discussions.

LITERATURE CITED

(1) Burlingame, A. L.; Baillie, T. A.; Derrick, P. J.; Chizhov, O. S. Anal. Chem. 1980, 52, 214R.
(2) Hertz, H. S.; Hites, R. A.; Biemann, K. Anal. Chem. 1971, 43, 681.
(3) McLafferty, F. W.; Hertel, R. H.; Villwock, R. D. Org. Mass Spectrom. 1974, 9, 690.
(4) Pesyna, G. M.; Venkataraghavan, R.; Dayringer, H. E.; McLafferty, F. W. Anal. Chem. 1976, 48, 1362-1368.
(5) Pesyna, G. M.; McLafferty, F. W. "Determination of Organic Structures by Physical Methods"; Nachod, Zuckerman, Randall, Eds.; Academic Press: New York, 1976; pp 91-155.
(6) Dromey, R. G. Anal. Chem. 1979, 51, 229.
(7) Atwater, B. L. Ph.D. Thesis, Cornell University, 1980.
(8) Stenhagen, E.; Abrahamsson, S.; McLafferty, F. W. "Registry of Mass Spectral Data", extended version of magnetic tape; Wiley: New York, 1978.

In Ki Mun
Daniel R. Bartholomew
Douglas B. Stauffer
Fred W. McLafferty*
Department of Chemistry
Cornell University
Ithaca, New York 14853

RECEIVED for review December 22, 1980. Resubmitted April 20, 1981. Accepted July 17, 1981. Research sponsored by the National Science Foundation, Grant No. CHE-7910400.

AIDS FOR ANALYTICAL CHEMISTS

Graphical Method for Obtaining Retention Time and Number of Theoretical Plates from Tailed Chromatographic Peaks

William E. Barber and Peter W. Carr*
Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455

Recently there has been a great deal of interest in methods of characterizing chromatographic peaks. Ideal Gaussian profiles (1) are rarely if ever observed. Any number of internal and extracolumn processes (2-8) may lead to peak asymmetry; thus in practice peaks are usually tailed. Such asymmetry can greatly complicate the measurement of the number of theoretical plates (N) on a column. Conventionally N is defined as

$$N = \frac{t_R^2}{m_2} \qquad (1)$$

where $t_R$ and $m_2$ are the retention time (i.e., the time of appearance of the peak maximum) and the variance (i.e., the second central moment) of the peak, respectively (2). The equivalent expression for a pure Gaussian is

$$N = \frac{t_R^2}{\sigma^2} \qquad (2)$$
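Eq. 1 can be evaluated directly from a sampled peak: $t_R$ is the time of the maximum, and $m_2$ is the second central moment about the peak centroid. For a Gaussian of standard deviation $\sigma$, $m_2 = \sigma^2$, so Eq. 1 reduces to Eq. 2. The sketch below assumes uniformly spaced samples with the baseline already removed. To show how tailing inflates $m_2$, it uses an exponentially modified Gaussian, a common model for a tailed peak, chosen here only for illustration and not taken from this note. The function names and peak parameters are hypothetical.

```python
import math


def plates_from_moments(times, signal):
    """Eq. 1: N = t_R**2 / m_2 for a uniformly sampled, baseline-corrected peak."""
    area = sum(signal)
    centroid = sum(t * s for t, s in zip(times, signal)) / area
    m2 = sum((t - centroid) ** 2 * s for t, s in zip(times, signal)) / area
    t_r = times[max(range(len(signal)), key=signal.__getitem__)]  # time of peak maximum
    return t_r ** 2 / m2, t_r, m2


def gaussian(t, mu, sigma):
    """Unnormalized Gaussian peak centered at mu."""
    return math.exp(-((t - mu) ** 2) / (2 * sigma ** 2))


def emg(t, mu, sigma, tau):
    """Exponentially modified Gaussian (unit area): Gaussian convolved with exp(-t/tau)/tau."""
    z = (sigma / tau - (t - mu) / sigma) / math.sqrt(2)
    return math.exp(sigma ** 2 / (2 * tau ** 2) - (t - mu) / tau) * math.erfc(z) / (2 * tau)


TIMES = [i / 100 for i in range(25000, 35001)]  # 250.00 s to 350.00 s in 0.01 s steps
N_GAUSS, TR_GAUSS, M2_GAUSS = plates_from_moments(TIMES, [gaussian(t, 300.0, 3.0) for t in TIMES])
N_TAILED, TR_TAILED, M2_TAILED = plates_from_moments(TIMES, [emg(t, 300.0, 3.0, 3.0) for t in TIMES])
```

For the Gaussian ($t_R$ = 300 s, $\sigma$ = 3 s), the moment-based value matches Eq. 2, about 10 000 plates. For the tailed peak, $m_2$ grows to about $\sigma^2 + \tau^2$ = 18 s², while the maximum shifts only slightly. Eq. 1 then gives roughly half as many plates.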