Computer Methods in Analytical Mass Spectrometry. Identification of an Unknown Compound in a Catalog

L. R. Crawford, Division of Chemical Physics, C.S.I.R.O., Chemical Research Laboratories, Melbourne, Australia

J. D. Morrison, Division of Physical Chemistry, La Trobe University, Bundoora 3083, Victoria, Australia

The combination of a gas-liquid chromatograph and mass spectrometer is capable of producing a great amount of information about a chemical compound’s structure. Methods of applying a digital computer to process this information are considered. The first step in such processing is the identification, by comparison, of an unknown that is known to be a member of a catalog. The specificity of the complete mass spectrum as a means of identification and the effects of impurities and instrumental errors on the comparison are discussed. Some search techniques suitable for use with the ASTM catalog of the 6 strongest peaks for some 3200 substances are described. The time taken to identify an unknown in this catalog has been reduced to 1.7 seconds.

THE COMBINATION OF a gas-liquid chromatograph and mass spectrometer (GLC-MS) provides an extremely powerful tool for the identification of organic compounds, in particular of relatively low molecular weight substances such as food flavor components and insect pheromones, whose molecular weights are usually less than 350 amu. The GLC-MS removes much of the need for prepurification of samples, and its sensitivity can be such that a single run of an hour or so on a submicrogram sample may lead to the identification of some 50 or more components. The use of this instrument is accordingly becoming widespread, but it has produced a serious problem in the sheer bulk of mass spectral data generated, even with only a low-resolution (1 in 1000) mass spectrometer: a mass spectrum covering the range m/e 12-500 can be recorded every 3 seconds. While most of the substances emerging from the GLC are expected and known, having occurred in previous samples, a quick identification of them is often needed, for example to confirm the column calibration. A skilled operator can do this, but it rapidly becomes fatiguing if a run is long. It is therefore worth considering whether the identification can be carried out automatically. A brief note by S. Abrahamsson, S. Stenhagen-Stallberg, and E. Stenhagen (1) was the first published proposal to use a computer to search mass spectra. Rapid progress has been made in the digitization of mass spectral data, and it is now possible to purchase apparatus that produces a paper or magnetic tape record which can be converted into a list of the mass numbers and heights of all the peaks detected. A completely automated data collection system for mass spectra is described by Hites and Biemann (2). In the present series of papers it will be assumed that such a record is available, and it is proposed to discuss ways whereby the mass spectrum can be processed to give an identification. If possible, such an identification should be made within 20 seconds, allowing continuous operation of the GLC.

IDENTIFICATION OF AN UNKNOWN SPECTRUM IN A CATALOG

Over the past 20 years an impressive catalog of the mass spectra of reference compounds has been built up (3), and the simplest approach to this problem is clearly to compare the mass spectrum of an unknown with such a catalog of known spectra. This immediately raises the question: just how specific is a mass spectrum? The fact that mass spectra can be used for quantitative analysis, by setting up simultaneous equations in the peak heights at certain mass numbers for a set of pure components and an unknown mixture and solving for the relative concentrations of the components, indicates that mass spectra can be specific, but as far as is known, no general survey of such specificity has been made. Where the unknown is known to belong to a limited set, the solution of such a set of equations will of course give both qualitative and quantitative analysis (4). It would not be very practicable as a general method, however, since it would involve the diagonalization of large matrices, and the results would be seriously affected by scatter in the spectra of the unknowns.

NORMALIZATION OF MASS SPECTRA

An essential requirement of any method of comparing mass spectra is that they be placed on comparable scales by some normalization procedure. The method used in the API tables, making the largest peak in each spectrum equal 100, is unsatisfactory for two reasons. First, it puts all its weight on the one peak that is most likely to be overloaded or not fully recorded at fast scanning speeds, so any error in this peak height affects the whole normalized spectrum. Second, the largest peak in the spectrum very often is not at all significant from the point of view of identification. There are a number of preferable normalization procedures, each having special advantages. The simplest of these is to make the sum of all the k observed peak heights, $P_n$, equal unity:

$$\sum_{n=1}^{k} P_n = 1 \tag{1}$$

(1) S. Abrahamsson, S. Stenhagen-Stallberg, and E. Stenhagen, Biochem. J., 92, 2P (1964).
(2) R. A. Hites and K. Biemann, ANAL. CHEM., 39, 965 (1967).
(3) Index of Mass Spectral Data, ASTM Publication No. 356, American Society for Testing and Materials, Philadelphia, Pa., 1963.
(4) D. D. Tunnicliff and P. A. Wadsworth, ANAL. CHEM., 37, 1082 (1965).

1464

ANALYTICAL CHEMISTRY

This is by far the simplest, but it still puts undue weight on the large peaks in the spectrum. The relative importance of the smaller peaks may be increased by normalizing so that

$$\sum_{n=1}^{k} \sqrt{P_n} = 1 \tag{2}$$

or

[Equation 3: illegible in the source scan]

A method with more theoretical justification is:

$$\sum_{n=1}^{k} P_n^2 = 1 \tag{4}$$
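The normalizations of Equations 1, 2, and 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' program: a spectrum is assumed to be held as a dict mapping mass number to peak height, and the function name `normalize` is hypothetical. Equation 3 is omitted because its form cannot be read from the source. Since Equation 2 is to be used with the square roots of the peak heights in Equation 5, the sketch returns the rescaled square roots for that case.

```python
import math


def normalize(spectrum, method=1):
    """Return a normalized copy of a spectrum given as {mass: peak height}.

    method=1: heights scaled so that sum(P_n) == 1 (Equation 1).
    method=2: returns sqrt(P_n), scaled so that these square roots sum to 1
              (Equation 2); these are the values to use in Equation 5.
    method=4: heights scaled so that sum(P_n ** 2) == 1 (Equation 4).
    """
    if method == 1:
        values = dict(spectrum)
        scale = sum(values.values())
    elif method == 2:
        values = {m: math.sqrt(h) for m, h in spectrum.items()}
        scale = sum(values.values())
    elif method == 4:
        values = dict(spectrum)
        scale = math.sqrt(sum(h * h for h in values.values()))
    else:
        raise ValueError("method must be 1, 2 or 4")
    return {m: h / scale for m, h in values.items()}


raw = {43: 100.0, 71: 25.0}  # invented two-peak spectrum
print(normalize(raw, 1))  # {43: 0.8, 71: 0.2}
print(normalize(raw, 2))  # square roots 10 and 5 rescaled to 2/3 and 1/3
```

The square-root case shows the intended effect: the small peak carries one third of the total weight instead of one fifth.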

More elaborate methods of normalizing, in which only the peaks within limited mass ranges are normalized together, may be necessary, as discussed below under systematic errors.

TESTS FOR SIMILARITY

A very simple method of comparing two mass spectra is given by the discrepancy factor

$$D = \sum_{n=1}^{k} \left| P_n^{\mathrm{ref}} - P_n^{\mathrm{unknown}} \right| \tag{5}$$

where n now runs over all masses observed in either spectrum, and both sets of peaks must be normalized by Equation 1. For complete similarity of the two spectra, D equals 0.0; for complete dissimilarity (i.e., no peaks at a common mass number), D equals 2.0, although in practice D = 2.0 is most unlikely. Values of D obtained when one spectrum is compared with a series of others are listed in Table I; these are typical of some 200 complete spectra that were taken. When normalization is carried out according to Equation 2, the square roots of the peak heights must be used in Equation 5. Where normalization according to Equation 4 is used, it is more appropriate to use the formula

$$D = \sum_{n=1}^{k} \left( P_n^{\mathrm{ref}} - P_n^{\mathrm{unknown}} \right)^2$$

Figure 1. Ratio of peak heights a/b, plotted against mass, for mass spectra of n-tridecane: (a) National Bureau of Standards, July 1950, 180° machine, magnet sweep; (b) Lion Oil Research, Monsanto, 1960, 180° machine, voltage sweep.
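A minimal Python sketch of the discrepancy factor and of identification by smallest D follows. It assumes the spectra are already normalized (by Equation 1 for the absolute form, by Equation 4 for the squared form) and stored as dicts mapping mass to normalized height; `discrepancy`, `identify`, and the two-entry library are hypothetical and chosen only to make the arithmetic exact.

```python
def discrepancy(ref, unknown, squared=False):
    """Discrepancy factor between two normalized spectra {mass: value}.

    squared=False: D = sum |P_ref - P_unknown|    (Equation 5)
    squared=True:  D = sum (P_ref - P_unknown)**2 (for Equation 4 normalization)
    A mass present in only one spectrum counts as a zero-height peak in the other.
    """
    masses = set(ref) | set(unknown)
    diffs = [ref.get(m, 0.0) - unknown.get(m, 0.0) for m in masses]
    if squared:
        return sum(d * d for d in diffs)
    return sum(abs(d) for d in diffs)


def identify(unknown, library, squared=False):
    """Name of the library spectrum with the smallest D against the unknown."""
    return min(library, key=lambda name: discrepancy(library[name], unknown, squared))


# Equation-1-normalized toy spectra (invented values, not catalog data).
library = {"A": {43: 0.75, 58: 0.25}, "B": {31: 0.5, 45: 0.5}}
print(discrepancy(library["A"], library["B"]))  # 2.0: no masses in common
print(identify({43: 0.5, 58: 0.5}, library))    # A
```

The first call reproduces the limiting value D = 2.0 for two spectra with no common mass number.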

The corresponding discrepancies for these other methods of normalization, without and with scatter, are also given in Table I. An unknown is identified by choosing the reference compound for which D is smallest.

EFFECTS OF RANDOM ERRORS IN PEAK HEIGHTS

For a variety of experimental reasons, no two measurements of the mass spectrum of a pure compound are likely to be exactly identical, so the discrepancy factor for identity is never zero. A 5% random error in peak heights gives a D factor for identity of 0.05. Scatter averaging approximately ±20% on all peaks was added to the test spectrum, and its effect on the discrepancy value is shown in Table I. The values suggest that normalization by Equation 2 is slightly superior to the others when scatter is present. It is clear from the D values in Table I that an even

Table I. Discrepancy Value D Obtained by Comparing the Mass Spectrum of Methyl Butanoate, with and without Scatter, with a Series of Library Spectra Using Equations 1, 2, and 4

Columns: library compound; discrepancy factor D by Equation 1 (pure, with scatter), Equation 2 (pure, with scatter), and Equation 4 (pure, with scatter).

0.78 0.84 0.88 0.90 0.90 0.99 n-Nonane 0.76 0.78 0.76 0.78 0.77 0.85 3-Hexanone 1.39 1.39 1.50 1.50 1.78 1.78 Ethylbenzene 1.32 1.32 1.03 1.02 1.45 1.43 Tert-Butylamine Methyl butanoate 0.00 0.20 0.00 0.10 0.00 0.20 1.38 1.38 1-Methyl-3-ethylbenzene 1.46 1.46 1.75 1.75 0.69 0.76 0.81 0.84 0.83 0.92 n-Octane Ethyl isopropyl ether 1.16 1.17 0.81 0.80 1.20 1.17 1.18 1.18 0.89 0.89 1.20 1.18 2-Heptanol 1.28 1.27 1.03 1.04 1.42 1.41 Triethylamine Pentanoic acid 1.18 1.18 0.74 0.73 1.13 1.12 1.19 1.19 Cyclohexene 1.09 1.08 1.32 1.31 0.89 0.92 2-Methyl-3-pentanone 0.80 0.83 0.90 0.97 1.11 1.11 Hexanoic acid 0.74 0.73 1.07 1.06 1.13 1.09 Methyl s-butyl ether 0.81 0.78 1.21 1.16 1.18 1.19 2-Ethyl hexanol 1.06 1.06 1.29 1.29 1.22 1.22 1.20 1.21 1.46 1.47 1,5-Hexadiene 0.97 1.00 1.09 1.09 1.21 1.20 Octanal 1.26 1.24 Ethyl propanoate 1.02 1.01 1.50 1.48 1.10 1.10 3-Methyl-2-hexanol 0.91 0.91 1.14 1.14

VOL. 40, NO. 10, AUGUST 1968


Table II. Reciprocal Discrepancies (1/D) Obtained by Comparing the Mass Spectra of Two Mixtures, with and without Scatter, with a Series of Library Spectra Using Equations 1, 2, and 4

Columns: library compound; reciprocal discrepancy factor 1/D by Equation 1 (pure, with scatter), Equation 2 (pure, with scatter), and Equation 4 (pure, with scatter).

(i) 90% triethylamine, 10% 2-heptanol
n-Nonane 0.81 0.82 1.00 0.98 0.74 0.73
3-Hexanone 0.85 0.86 0.96 0.94 0.77 0.77
Ethylbenzene 0.72 0.72 0.65 0.65 0.54 0.54
Tert-Butylamine 0.76 0.76 1.20 1.16 0.76 0.74
Methyl butanoate 0.80 0.80 1.03 1.01 0.73 0.72
1-Me-3-ethylbenzene 0.72 0.72 0.66 0.66 0.54 0.54
n-Octane 0.82 0.83 1.05 1.03 0.77 0.75
Et isopropyl ether 0.80 0.80 1.00 0.99 0.73 0.72
0.83 0.82 1.50 2-Heptanol 1.42 0.88 0.85 4.51 8.80 5.74 Triethylamine 4.62 7.90 4.79
Pentanoic acid 0.80 0.81 1.19 1.15 0.84 0.82
Cyclohexene 0.81 0.81 0.94 0.93 0.70 0.69
2-Me-3-pentanone 0.84 0.85 0.98 0.96 0.78 0.77
Hexanoic acid 0.80 0.80 1.16 1.11 0.80 0.77
Me s-butyl ether 0.80 0.80 1.20 1.16 0.84 0.81
2-Et hexanal 0.77 0.77 0.93 0.92 0.67 0.67
1,5-Hexadiene 0.79 0.79 0.80 0.80 0.64 0.64
Octanol 0.84 0.85 0.99 0.99 0.75 0.76
Ethyl propanoate 0.83 0.83 0.94 0.92 0.74 0.72
3-Me-2-hexanol 0.83 0.82 1.10 1.07 0.78 0.75

(ii) 70% methyl s-butyl ether, 20% 2-methyl-3-pentanone, 10% hexanoic acid
n-Nonane 0.98 1.03 1.09 1.10 0.91 0.98 1.04 1.08 1.08 1.12 3-Hexanone 1.02 1.00 0.72 0.72 0.69 0.68 Ethylbenzene 0.56 0.55 0.79 0.80 1.19 1.22 Tert-Butylamine 0.75 0.77 0.99 1.04 1.34 1.40 Methyl butanoate 0.93 0.96 0.73 0.73 0.70 0.70 1-Me-3-Et-benzene 0.57 0.56 0.97 1.02 1.16 1.17 0.95 n-Octane 1.02 0.88 0.89 1.53 1.50 0.98 Et isopropyl ether 0.96 0.87 0.89 1.51 1.50 2-Heptanol 0.99 0.99 0.81 0.82 1.14 1.12 Triethylamine 0.81 0.79 0.91 0.92 1.69 1.6i 1.09 Pentanoic acid 1.09 0.82 0.87 0.94 0.96 0.73 Cyclohexene 0.76 1.06 1.09 1.14 1.19 1.06 2-Me-3-pentanone 1.05 0.94 0.97 1.74 1.74 Hexanoic acid 1.10 1.13 3.60 2.52 4.05 3.31 Me s-butyl ether 3.22 2.15 0.85 0.88 1.02 1.05 0.83 2-Et hexanal 0.86 0.81 0.86 0.84 0.85 0.67 1,5-Hexadiene 0.70 0.92 0.97 1.00 1.02 0.86 Octanal 0.90 0.96 0.92 1.06 1.04 0.84 Ethyl propanoate 0.79 0.89 0.92 1.14 1.19 3-Me-2-hexanol 0.89 0.93

-

-

-

-

-

-

Table III. Time Required for Comparison of Unknowns with the ASTM Library, Showing the Effect of Various Methods of Filtering

Library order        Filter (compare only if true)                     Unknowns  Correct, 1st  Correct, in 10  Time, sec
Unsorted             None                                              30        29            30              22.4
Asc. base peak mass  1st 2 masses same as reference 1st 2              30        27            27              3.4
Asc. base peak mass  1st 3 masses same as reference 1st 3              30        27            27              3.7
Asc. base peak mass  1st 4 masses same as reference 1st 4              30        23            23              4.6
Asc. base peak mass  SMR and 1st 2 masses as above                     30        28            28              4.1
Asc. base peak mass  Either of the 1st 2 masses = base mass of ref.    30        29            29              4.7
Asc. base peak mass  1st or 2nd mass = 1st or 2nd ref. mass            30        29            30              5.6
Asc. base peak mass  As above, to 3rd mass                             30        29            30              8.0
Desc. mol wt         SMR                                               30        29            30              6.4
Desc. mol wt         SMR and 1st 2 masses                              30        28            28              3.2

Times are the total times taken to retrieve one unknown from the ASTM library of the 6 strongest peaks, including data reading and preprocessing, printout, and tape rewinding time. These searches were run on the CDC 3200, processing five unknowns at a time. SMR means significant mass range, as defined in the text.


greater amount of random error in the peak heights of an unknown will not seriously affect its identification by Equations 1, 2, or 4.

EFFECT OF IMPURITIES ON IDENTIFICATION

Unknown substances are very often mixtures, and the unknown mass spectrum will then be the sum of the mass spectra of several components. When the D values and their reciprocals are calculated for comparison of such a spectrum with reference spectra, it appears, in many cases at least, that the main component can still be detected, and sometimes some of the other components as well. Examples of this are given in Table II. Fortunately, mixture spectra of this kind are less common when the GLC-MS is used and, if found, usually have one predominating component.
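The effect can be illustrated with a small Python sketch. It assumes, as a simplification, that a mixture spectrum is the fraction-weighted sum of the Equation-1-normalized component spectra (real sensitivities differ from compound to compound), and it ranks library entries by 1/D as in Table II. All function names and the three-entry library are hypothetical; the toy spectra share no masses so that the arithmetic stays easy to follow.

```python
def normalize1(spectrum):
    """Equation 1: scale peak heights so they sum to 1."""
    total = sum(spectrum.values())
    return {m: h / total for m, h in spectrum.items()}


def discrepancy(ref, unknown):
    """Equation 5 on two Equation-1-normalized spectra."""
    masses = set(ref) | set(unknown)
    return sum(abs(ref.get(m, 0.0) - unknown.get(m, 0.0)) for m in masses)


def mixture(parts):
    """Fraction-weighted sum of normalized component spectra.

    Assumes each component contributes in proportion to its fraction,
    which ignores differences in sensitivity between compounds.
    """
    total = {}
    for fraction, spectrum in parts:
        for m, h in normalize1(spectrum).items():
            total[m] = total.get(m, 0.0) + fraction * h
    return normalize1(total)


def rank_by_reciprocal(unknown, library):
    """Library names sorted by 1/D, largest (best match) first."""
    scores = {}
    for name, spectrum in library.items():
        d = discrepancy(normalize1(spectrum), unknown)
        scores[name] = float("inf") if d == 0 else 1.0 / d
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy spectra with no masses in common.
library = {
    "A": {30: 10.0, 58: 100.0, 86: 20.0},
    "B": {31: 30.0, 45: 100.0, 55: 40.0},
    "C": {39: 50.0, 67: 100.0, 82: 30.0},
}
unknown = mixture([(0.9, library["A"]), (0.1, library["B"])])
for name, score in rank_by_reciprocal(unknown, library):
    print(name, round(score, 2))
```

For a 90:10 mixture of A and B, A scores 1/D = 1/0.2 = 5, B scores 1/1.8, and the absent compound C scores 1/2, so the main component is found first and the minor one second.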


EFFECT OF SYSTEMATIC ERRORS IN PEAK HEIGHTS

It is often observed that when the mass spectrum of a given substance is measured on different spectrometers, and especially on different types of mass spectrometer (e.g., 60°, 90°, 180°), systematic differences in the pattern occur. This is especially marked when a mass spectrum scanned by changing the ion accelerating voltage is compared with one obtained by a magnetic field sweep. The difference may follow a monotonic curve with mass, as it does for n-tridecane (Figure 1), but often it does not, and it sometimes shows very unusual discrimination effects for fragmentations involving multiple bond breakages. Frequently the only reference spectrum available differs from the unknown in this way, and the simple discrepancy factor is then no longer so satisfactory. In such a case, the ion peaks can be normalized within a series of mass ranges (e.g., 7-20, 21-34, 35-48, etc.) for both the reference and the unknown spectra, and the overall discrepancy factor taken as the sum of the separate discrepancies for each small mass range. This method is satisfactory but relatively lengthy. A comparison using the square roots of the peak heights (normalization by Equation 2) appears to be almost as satisfactory, because it reduces the influence of the large but relatively unimportant peaks.

The results obtained in this preliminary survey indicated that mass spectra are highly specific indeed, and that identifying an unknown spectrum in a catalog is an almost trivial problem when the whole mass spectrum is used. Even when a reduced number of peaks is used (a test was made, for example, using only the peaks from m/e = 1 to 75), identification is usually achieved. Talroze, Raznikov, and Tantsyrev (5, 6) have demonstrated that in some cases an even smaller number of comparison peaks is adequate. While the methods described so far give almost certain identification when a large number of mass peaks are used, they are wasteful of computer time.
The comparison between an unknown and one reference spectrum takes a time of the order of 0.1 sec, when using a CDC 3600 computer with drum storage. Also, the conversion of all the complete API mass spectra to magnetic tape storage would be a formidable task.
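The range-by-range normalization described above can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the window boundaries follow the example ranges 7-20, 21-34, 35-48 (14 amu wide, starting at mass 7), spectra are assumed to be dicts of mass to peak height, and `windowed_discrepancy` is a hypothetical name. Each window is normalized to unit sum separately in both spectra, and the per-window Equation 5 discrepancies are added.

```python
from collections import defaultdict


def windowed_discrepancy(ref, unknown, start=7, width=14):
    """Sum of Equation-5 discrepancies computed window by window.

    Peaks are grouped into mass windows start..start+width-1 (7-20, 21-34,
    35-48, ... with the defaults), and each window is normalized to unit sum
    separately in both spectra before the discrepancies are added up.
    """
    def split(spectrum):
        windows = defaultdict(dict)
        for m, h in spectrum.items():
            windows[(m - start) // width][m] = h
        return windows

    def norm(window):
        total = sum(window.values())
        return {m: h / total for m, h in window.items()} if total > 0 else {}

    ref_windows, unk_windows = split(ref), split(unknown)
    d = 0.0
    for key in set(ref_windows) | set(unk_windows):
        r = norm(ref_windows.get(key, {}))
        u = norm(unk_windows.get(key, {}))
        d += sum(abs(r.get(m, 0.0) - u.get(m, 0.0)) for m in set(r) | set(u))
    return d


# Hypothetical reference spectrum and a copy distorted by a mass-dependent gain
# that is constant within each 14-amu window (a crude stand-in for instrument bias).
reference = {15: 10.0, 27: 40.0, 29: 60.0, 41: 50.0, 43: 100.0, 57: 80.0, 71: 30.0}
distorted = {m: h * (1 + (m - 7) // 14) for m, h in reference.items()}
print(windowed_discrepancy(reference, distorted))  # 0.0: the bias cancels per window
```

A whole-spectrum Equation 5 comparison of the same two spectra gives a clearly nonzero D, which is the motivation for normalizing within mass ranges when only a reference from a different instrument is available.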

(5) V. V. Raznikov and V. L. Talroze, Dokl. Akad. Nauk SSSR, 170, 379 (1966).
(6) V. L. Talroze, V. V. Raznikov, and G. D. Tantsyrev, Dokl. Akad. Nauk SSSR, 159, 182 (1964).

Figure 2. Reduction of time per unknown as the number of unknowns run per batch is increased (x-axis: number of unknowns in batch, 0 to 20). Computers used were Control Data Corp. CDC 3200 and CDC 3600.

USE OF THE ASTM CATALOG OF 6 STRONGEST PEAKS

The ASTM compilation of the 6 strongest peaks for some 3200 uncertified mass spectra is available from the American Society for Testing and Materials on punched cards, one card per spectrum, each containing the mass numbers and heights of the 6 strongest peaks in the spectrum, the molecular weight, the compound name, and a reference number. The peak heights are normalized to make the strongest peak equal 100. It is of interest to examine how successfully a limited set of peaks, as in this catalog, can be used to identify unknowns. A qualitative approach of this kind has been described previously by Pettersson and Ryhage (7). The mass spectrum of the unknown must be sorted to give its six strongest peaks and then normalized appropriately for comparison. A discrepancy factor can then be calculated for the unknown and a reference spectrum on the basis of the six peaks from each, normalized by Equation 1. Where both spectra have a peak at a common mass, the peak heights are subtracted; where they do not, the missing peak height is taken as zero, so the other peak height makes a large contribution to the discrepancy. The contents of the reference cards are loaded onto magnetic tape. To speed up comparison, a batch of 100 library spectra is buffered into store, and buffering of the next 100 is initiated while comparisons are being made; the storage location is switched back and forth so that comparisons are made with one batch of spectra while the next batch is being read in. Each library spectrum is compared in turn with each of the unknown spectra, and its name is entered, where appropriate, into a list of the ten compound names with the smallest discrepancies, kept in order of increasing discrepancy.
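The comparison against six-peak cards and the running list of the ten best matches can be sketched in Python as below. Assumptions, stated plainly: the cards are held in memory as (name, {mass: height}) pairs rather than read from tape, tape buffering and batching of several unknowns are not modeled, and `six_strongest`, `search`, and the two cards are hypothetical. Both six-peak sets are normalized by Equation 1 before Equation 5 is applied, so D here lies between 0 and 2, which may differ from the scale of the discrepancies printed in Table IV.

```python
import heapq


def six_strongest(spectrum):
    """Six largest peaks of {mass: height}, renormalized to unit sum (Equation 1)."""
    top = heapq.nlargest(6, spectrum.items(), key=lambda peak: peak[1])
    total = sum(h for _, h in top)
    return {m: h / total for m, h in top}


def discrepancy(a, b):
    """Equation 5; a mass present in only one list is compared against zero."""
    return sum(abs(a.get(m, 0.0) - b.get(m, 0.0)) for m in set(a) | set(b))


def search(unknown, cards, keep=10):
    """One pass over (name, peaks) cards; return the `keep` best (name, D) pairs."""
    probe = six_strongest(unknown)
    scored = ((discrepancy(six_strongest(peaks), probe), name) for name, peaks in cards)
    return [(name, d) for d, name in heapq.nsmallest(keep, scored)]


# Hypothetical six-peak cards on the ASTM scale (strongest peak = 100).
cards = [
    ("X", {29: 30, 41: 50, 43: 100, 57: 80, 71: 40, 85: 20}),
    ("Y", {27: 10, 29: 15, 31: 100, 43: 5, 45: 60, 59: 20}),
]
unknown = {15: 2, 29: 30, 41: 50, 43: 100, 57: 80, 71: 40, 85: 20, 99: 1}
print(search(unknown, cards))  # X ranks first, with D = 0.0
```

`heapq.nsmallest` keeps only the best `keep` candidates while the cards stream past, which mirrors the ten-name list kept during the tape scan.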

(7) B. Pettersson and R. Ryhage, Arkiv Kemi, 26, 293 (1966).

After a complete scan of the tape, or a sufficient part of it, the ten names and discrepancies are printed and the tape is rewound. An unfiltered complete scan of the catalog in this way takes 45 seconds on a Control Data Corporation CDC 3200 computer. By reading in up to 20 unknowns at once, keeping a separate list for each, and filtering, the time can be reduced to 1.7 seconds per compound (Figure 2). Depending on which computer is used, the comparison process may be faster or slower than the tape reading. In the first case the computer has to wait for the buffering operation to finish before starting on the next area of store; in the second, the tape is read jerkily, stopping while the computer finishes one area of store and switches to the other. By adjusting the number of unknowns treated simultaneously, a balance can be struck between calculation and buffering time such that the calculation time is just less than the buffering time. The tape is then read smoothly and a minimum of calculation time is lost. When the unknown presented for comparison is a member of the catalog, this method is very satisfactory, giving correct identification every time. When the unknown is an actually recorded spectrum, the percentage of successful identifications decreases somewhat; however, even when the first identification is wrong, the substance name almost always occurs among the ten names listed.

FILTERING METHODS TO REDUCE SEARCH TIME

The time taken for this comparison is still large, and the method is in fact not the one that would be used in a manual search of the catalog. In a manual search, a preliminary filtering process takes place: by looking at only certain portions of the library spectra, a decision can be made whether to proceed with the comparison, to jump to the next comparison, or to stop the search.

Figure 3. Number of molecules listed with molecular weights at each mass number for the ASTM compilation of 3200 mass spectra (x-axis: molecular mass).

NONCOMMON MASS REJECTION

If a record were rejected simply because its six listed masses were not the same as those found for the unknown, the number of comparisons to be made would be considerably reduced. In practice, errors in experimentally determined peak heights mean that the six strongest peaks are not always the same for a given compound, so this criterion is too stringent. Several other methods of rejection have been tested. The maximum amount of such filtering that could be tolerated was given by the following: a detailed comparison of the peak heights was made only if the two or three largest peaks in the unknown and the reference coincided, without regard to their order. The values in the ASTM library of spectra are rounded off to two significant figures, and it is more economical in computer storage and calculation time to use integer arithmetic. For comparison, the peak heights of the unknown must therefore be converted to integers, either by rounding off or by truncation. As might be expected, over a range of test compounds more retrievals are achieved when the unknown spectra are rounded off rather than truncated.

Table IV. Results of Search Program on Two Unknowns

Unknown: 2-Methyl-1-pentanol or 2-ethyl-1-butanol?
Rank  Ref. no.  Possible substance                    Mol wt  Discrepancy
1     1055A     1-Butanol, 2-ethyl-                   102     13.0
2     175C      Acetic anhydride                      102     14.7
3     779C      Isobutyrate, methyl-                  102     14.9
4     1649A     Isobutyric acid, methyl ester         102     15.1
5     390A      Acetate, isopropyl-                   102     15.3
6     623C      Isobutyrate, methyl-                  102     16.0
7     24C       Di-n-propyl ether                     102     16.4
8     1053A     1-Pentanol, 2-methyl-                 102     18.6
9     389A      Acetate, n-propyl-                    102     18.8
10    392A      Butanoate, methyl-                    102     22.4

Unknown: 2-Ethyl-1-hexanol?
Rank  Ref. no.  Possible substance                    Mol wt  Discrepancy
1     624C      1-Hexanol, 2-ethyl-                   130     5.3
2     124C      Propionate, t-butyl-                  130     13.4
3     1159A     Propanoate, isobutyl-                 130     16.3
4     830A      Di-n-butyl ether                      130     16.5
5     1152A     Propanoic anhydride                   130     19.7
6     1158A     Propanoate, n-butyl-                  130     21.0
7     65H       1-Hexanol, 2-ethyl-                   130     21.7
8     4C        n-Amyl acetate                        130     22.6
9     5C        Isoamyl acetate                       130     23.0
10    677C      1,3-Dioxolane, 2-vinyl-4-methylol-    130     25.1

Retrieval time was 3.43 seconds.

MOLECULAR WEIGHT REJECTION

If the molecular weight of an unknown sample is known, and the catalog tape is arranged with the spectra sorted in order of descending molecular weight, spectra can be skipped until the group with the correct molecular weight is reached; these are compared, and the search is stopped once the group has been passed. To be successful, this method depends on a reliable determination of the molecular weight of the unknown. The need for this can be partly overcome by defining a range of masses that will most probably include the parent mass. Except when an impurity is present, the parent mass is unlikely to be less than the highest recorded mass minus 9 amu. It may, however, be greater, e.g., for substances such as alcohols, which tend to lose H2O readily. A system was tried in which spectra were examined if their molecular weights were at the three highest recorded masses of the unknown, or at these masses plus 15, plus 18, or plus 44, to allow for a possible loss of CH3, H2O, or CO2. It might be expected that skipping the spectra on the library tape in descending order, in blocks, until the desired masses are encountered would give much greater speed; Figure 3 shows the number of spectra to be examined at each mass number. However, the skip function has to read the tape to find the file marks, so this method was found to give an increase in scanning speed of only the same order as noncommon mass rejection. If a disk or addressable tape were available, a much greater increase might be possible. The two methods of rejection can be combined, giving a slight increase in speed, as shown in Table III. A typical output from the computer is given in Table IV, using as unknowns two spectra published by Pettersson and Ryhage (7). In both cases a correct identification is made.

CONCLUSION

If a catalog of complete mass spectra can be stored, the retrieval and identification of a given spectrum is an almost trivial problem. Even with the ASTM catalog of the six strongest mass spectral peaks, a very successful search routine can be devised and, by making use of simple filtering techniques, identifications can be achieved in times commensurate with the real time of the GLC-MS.

ACKNOWLEDGMENT

The authors are indebted to Caulfield Technical College for the use of its card sorting machine in this work.

RECEIVED for review December 18, 1967. Accepted April 15, 1968.

Computer Methods in Analytical Mass Spectrometry. Empirical Identification of Molecular Class

L. R. Crawford, Division of Chemical Physics, C.S.I.R.O., Chemical Research Laboratories, Melbourne, Australia

J. D. Morrison, Division of Physical Chemistry, La Trobe University, Bundoora, Victoria, 3083, Australia

The use of the computer to identify, from a mass spectrum, the functional groups present in an unknown molecule is discussed. Four empirical methods of carrying out this operation are described. Three of these involve the derivation of “average” mass spectra representative of various functional groups, or of atom and bond structure content. The fourth makes use of mass spectral correlations similar to those derived by McLafferty. A weighted mean of the results is then taken. It is concluded that, at least for low molecular weight compounds, a successful group identification can usually be made.

THE FIRST PAPER of this series (1) dealt with the problem of recognizing an unknown spectrum that is known to be a member of a catalog. It was shown there that mass spectra are highly specific and that, even when the samples were impure or noise was present, identification in this way was, in many cases at least, a rather trivial problem.

(1) L. R. Crawford and J. D. Morrison, ANAL. CHEM., 40, 1464 (1968).

It was also evident that the scope of the search could be reduced in various ways, thereby speeding up this identification process considerably. The present and succeeding papers deal with the more general problem, where the unknown spectrum may not be listed in a catalog and the identification has to proceed ab initio. Perhaps the first thing a mass spectrometrist does on examining the mass spectrum of an unknown is to look for certain key peaks that indicate the presence of functional groups in the molecule (2). These peaks usually lie in the mass region between m/e = 12 and m/e = 200. If a preliminary identification of the class of a molecule can be made, the subsequent elucidation of the complete structure becomes much easier. For some analytical work a complete identification is not even required; all that is wanted is a determination of the type of molecule present and, for example, the relative amount of

(2) F. W. McLafferty, “Mass Spectral Correlations,” Advan. Chem. Ser. 40, Analytical Chemistry 1963.
