third longer fusion times were used (30 to 35 minutes at 380 °C). The fusion reaction products of each compound were clear white. An infrared analysis of each fusion product showed only resorcinol and phloroglucinol, respectively. The first samples of m-benzenedisulfonate were taken as received from the supplier; the average of six determinations was 92.5% as the disulfonate. However, when further purified samples were taken (elemental analysis showed 99.4% of the theoretical amount of sulfur), the average of five determinations was 100.4%. The fusion of 2,7-naphthalenedisulfonate was found to go to completion without difficulty under the normal fusion conditions. The average of the standard deviations for all the compounds given in Table V is about 2.0% for sample sizes between 3 and 10 mg. The alkali fusion of p-chlorobenzenesulfonate gave 50% conversion to resorcinol rather than the predicted product, p-chlorophenol. An NMR spectrum of the starting sulfonate showed that it was indeed para-substituted. However, an infrared identification of the fusion products showed only the presence of resorcinol. A probable explanation of this result is that both the halogen and the sulfonate groups of halogenated sulfonates undergo alkaline hydrolysis. An elimination-addition mechanism has been suggested, with formation of a benzyne intermediate that leads to resorcinol (15).

When a high temperature reaction such as alkali fusion is used, the possibility of thermal degradation of the starting sample or of the reaction product is always present. An examination of thermal stabilities is therefore suggested to establish two things: first, which sulfonates can be analyzed by fusion, and second, the highest fusion temperature that can safely be used. This aspect of the fusion reaction was studied by thermogravimetry in both air and helium atmospheres. The data obtained for a large number of sulfonates and several phenolates are presented in Table VI. Most sulfonate salts examined here were stable to temperatures well above 400 °C, and some are stable above 500 °C. Sulfonic acids, however, start to decompose at much lower temperatures than their corresponding salts. The acids can nevertheless be analyzed without difficulty because they are quickly neutralized by the alkali, and fusion of the potassium sulfonate salt then proceeds normally. The phenolates are much more stable in a helium atmosphere than in air. For this reason, the fusion reaction oven must be continuously purged with a flow of helium.

RECEIVED for review August 9, 1971. Accepted October 14, 1971. This work was supported by the National Science Foundation, Grant GP 12171.

(15) S. Oae, N. Furukawa, and T. Asari, Bull. Chem. Soc. Jap., 42, 177 (1969).

A Compound Classifier Based on Computer Analysis of Low Resolution Mass Spectral Data
Geochemical and Environmental Applications

Dennis H. Smith¹
Organic Geochemistry Unit, School of Chemistry, Bristol University, Bristol BS8 1TS, England

A computer-based method has been developed for determining chemical compound class from low resolution mass spectral data. The method relies on computer analysis of sets of standard spectra and reduces these large data sets to a much smaller "correlation set." The correlation set consists of the "ion series spectra" of each class and is used in subsequent automatic computer classification of mass spectra. This approach is particularly important in analysis of data from coupled gas chromatograph/mass spectrometer systems. With such systems, large numbers of spectra of the separated components of complex mixtures can be classified rapidly, and further structural information can be elicited from this classification. The correlation set and structural information programs were initially written for compound classes relevant to geochemical and environmental studies, but they can easily be expanded to include classes important in other areas of research. Because the method is simple, it lends itself readily to small computers or to semiautomatic methods of data reduction and analysis.

¹ Present address: Department of Chemistry, Stanford University, Stanford, Calif. 94305.

ANALYTICAL CHEMISTRY, VOL. 44, NO. 3, MARCH 1972

COMPLEX MIXTURES of organic compounds are encountered in many chemical investigations. The analysis of these mixtures presents formidable problems of separation, isolation, and identification of the many individual components present. In investigations in organic geochemistry and environmental chemistry, the situation is frequently further complicated because only small amounts of material are available. In these cases, every chemical separation or isolation procedure carries great risks of contamination and sample loss. These analytical problems are being countered to some extent in three ways:

- sophisticated instrumentation capable of dealing with small amounts of material;
- the use of computers in data handling and analysis;
- new chemical procedures for derivatization and volatilization of organic material.

In instrumentation, the combined gas chromatograph/mass spectrometer (GC/MS) has made the most significant impact. When such an instrument is coupled to a digital computer for data acquisition and reduction, large amounts of data on the separated components of mixtures can be acquired rapidly and accurately. This exploits the full sample throughput of GC/MS and provides data in a format suitable for further analysis (1-5).

(1) R. A. Hites and K. Biemann, ANAL. CHEM., 40, 1217 (1968).
(2) C. C. Sweeley, B. D. Ray, W. I. Wood, J. F. Holland, and M. I. Krichevsky, ibid., 42, 1505 (1970).
(3) D. H. Smith, R. W. Olsen, F. C. Walls, and A. L. Burlingame, ibid., 43, 1796 (1971).
(4) D. Henneberg and G. Schomberg, "Advances in Mass Spectrometry, Vol. 5," A. Quayle, Ed., The Institute of Petroleum, London, in press.
(5) W. E. Reynolds, V. A. Bacon, J. C. Bridges, T. C. Coburn, B. Halpern, J. Lederberg, E. C. Levinthal, E. Steed, and R. B. Tucker, ANAL. CHEM., 42, 1122 (1970).

Parallel to this development, significant advances have been made in the use of computers to analyze mass spectral data; the reader is referred to a recent publication by Hertz et al. (6) for a summary. Derivatization procedures have now been developed to the point where most classes of compounds can be made suitable for GC/MS analysis, including such diverse compounds as porphyrins (7) and aminoglycoside antibiotics (8). In principle, therefore, many components of complex mixtures can be treated by GC/MS followed by computer analysis of the data. However, many computer-based approaches to interpreting low resolution mass spectra make considerable demands on computer facilities. Many have also not been generalized enough to cope with the variety of compound classes found in most areas of research. With this in mind, and in view of the specific chemical problems of this research group, a different approach was adopted. It uses computer analysis of low resolution GC/MS data with the primary goal of determining the compound class of each separated component.

Compound classification has been studied in the petroleum industry for many years, using both low (9, 10) and high (11-13) resolution mass spectral data of total mixtures. Other approaches to the problem have been discussed by Crawford and Morrison (14) and by Hites and Biemann (15), the latter using GC/MS techniques. The present investigation set out to develop a general approach with three features: it reflects modern developments in GC/MS instrumentation, it covers many different compound classes over a wide molecular weight range, and it does not depend on the presence or absence of particular key peaks as indicators of compound class. Put simply, the separation and analysis of complex mixtures are carried out in the computer rather than at the laboratory bench.

A CORRELATION SET

It is widely known that the mass spectra of members of a given compound class are generally quite similar, and an unknown mass spectrum is interpreted on this basis. For computer classification, this similarity should be expressed in a simple format that is sufficient to identify a compound class and to distinguish it from all other classes. Such an expression can be termed a "correlation set," and it has two important advantages.

The first advantage is compactness. The correlation set holds a small amount of information that summarizes a much larger set of data, namely the individual mass spectra of each member of each class. This matters more and more as small computers with limited memory become integral parts of mass spectrometers and other laboratory instrument systems (3). A correlation set needs relatively little memory and can be searched very rapidly.

The second advantage is generality. If the method of correlation is chosen properly, a class can be characterized even when standard mass spectra of some of its members are missing. Spectra that were not available when the correlation set was developed can then be classified correctly later.

(6) H. S. Hertz, R. A. Hites, and K. Biemann, ANAL. CHEM., 43, 681 (1971).
(7) D. B. Boylan and M. Calvin, J. Amer. Chem. Soc., 89, 5472 (1967).
(8) K. Tsuji and J. H. Robertson, ANAL. CHEM., 42, 1661 (1970).
(9) "ASTM Standards on Petroleum Products and Lubricants, Vol. 1," 38th Ed., American Society for Testing and Materials, Philadelphia, Pa., 1961, p 1120.
(10) A. Hood, "Mass Spectrometry of Organic Ions," F. W. McLafferty, Ed., Academic Press, New York and London, 1963, Chapter 12, p 597.
(11) W. G. Seifert and R. M. Teeter, ANAL. CHEM., 42, 180 (1970).
(12) T. Aczel, D. E. Allan, J. H. Harding, and E. A. Knipp, ibid., p 341.
(13) W. G. Seifert and R. M. Teeter, ibid., p 750.
(14) L. R. Crawford and J. D. Morrison, ibid., 40, 1469 (1968).
(15) R. A. Hites and K. Biemann, ibid., 42, 855 (1970).

ION SERIES SUMMATIONS AND SPECTRA

One way to build a correlation set for classifying mass spectra is to use ion series summations. This approach was employed in the mass spectrometer/computer system used for the preliminary organic analysis of returned lunar samples (3, 16, 17) and was suggested independently by Crawford and Morrison (14). A low resolution mass spectrum is expressed as a set of numbers, each giving the per cent contribution of one of the fourteen distinguishable ion series to the total ionization. For saturated alkanes, the spectrum is thus viewed as fourteen ion types: the possible ion series $C_nH_{2n+2}^+$, $C_nH_{2n+1}^+$, $C_nH_{2n}^+$, ..., $C_nH_{2n-11}^+$. Only fourteen are possible because members of the homologous series $C_nH_{2n-12}^+$ and $C_nH_{2n+2}^+$ have the same nominal masses.

This way of viewing spectra is also related to a method for presenting high resolution mass spectral data (18). Mass spectra have also been presented as "rectangular arrays," which show how ion intensities are distributed among the various ion series. This presentation has proved useful for correlating mass spectra with structure for a limited number of compound classes (19, 20). For the fourteen ion series summation values to be useful for classification, they must be approximately reproduced in the spectra of all members of a class.
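Why there are exactly fourteen series: an ion $C_nH_{2n+k}^+$ has nominal mass $12n + 2n + k = 14n + k$. Its series is therefore fixed by the mass modulo 14, and shifting $k$ by 14 only renames $n$. The Python sketch below is illustrative; the function name `ion_series_label` is hypothetical and not part of the original program. It assumes integer nominal masses and returns the offset $k$ in the 2n + k labelling, folded into the window 2n - 11 to 2n + 2.

```python
def ion_series_label(mass: int) -> int:
    """Return k such that nominal mass `mass` lies in the 2n + k ion series.

    An ion C_nH_(2n+k) has nominal mass 14n + k, so k is set by mass mod 14.
    Residues 0, 1, 2 map to 2n, 2n + 1, 2n + 2; residues 3..13 are folded to
    -11..-1, because C_nH_(2n-12) and C_(n-1)H_(2n) share a nominal mass.
    """
    k = mass % 14
    return k if k <= 2 else k - 14


if __name__ == "__main__":
    for mz in (43, 44, 74, 91, 92):
        print(mz, ion_series_label(mz))
    # 43 1
    # 44 2
    # 74 -10
    # 91 -7
    # 92 -6
```

For example, m/e 44, 58 and 72 all return 2 (the 2n + 2 series), and m/e 92 and 106 both return -6 (the 2n - 6 series).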
For low resolution spectra, the required similarity of the summations within a class has not been systematically investigated. One study using high resolution data has been reported (21). It noted that the $C_nH_{2n-1}O^+$ series was relatively constant for aliphatic aldehydes and ketones, although data on other ion series for these compounds were not reported. Devising the correlation set therefore required two steps:

1. a detailed investigation of how similar the ion series summations are across the mass spectra of all members of a given class;
2. a determination of how unique these summations are when the class is compared with other classes.

Class Definition. This approach succeeds only if compound class is defined properly. Some compounds contain heteroatoms that can give intense rearrangement ions (for example, esters). Such compounds are not expected to show similar ion series summations when the rearrangement is suppressed or eliminated by branching at the appropriate carbon atom. For this reason, the following definition was adopted. Considering a general molecule,

$$\mathrm{R{-}(CH_2)_nH} \qquad (n = 0, 1, 2, \ldots) \qquad \text{(I)}$$

compounds in which R stays constant as n varies are considered to be in the same class, provided that branches or extensions of the alkyl chain do not change the basic fragmentation pattern. Such a change would occur if the chain took part in rearrangements or other fragmentation pathways not normally available to lower homologs. This definition can produce a large number of compound classes, because major classes (ketones, esters, alcohols, and so forth) are divided into a number of subclasses. In practice, however, such subdivision may not be possible, mainly because mass spectra of standard compounds are lacking. In these cases, the ion series summation data should still be enough to determine the basic class, and further subdivision can be left to more specialized programs. Where the presence of a single compound, or of a few compounds, in a complex mixture must be determined repeatedly, it may be worth computing ion series summations for individual compounds. In short, this definition of class may need to be relaxed, depending on the reference spectra available and the particular chemical problem at hand.

Method of Summation. The mass spectral data used in this study come from a library of mass spectra on two IBM-compatible magnetic tapes, one containing 6281 spectra and the other 3000. Both are available from the Mass Spectrometry Data Centre, Aldermaston, Reading RG7 4PR, England, and are referred to hereafter as the "6281" and "3000" data tapes, respectively. Where indicated, these data were supplemented with mass spectral data from this laboratory. To keep the number of classes in the correlation set manageable, initial emphasis was placed on compound classes important to this laboratory's geochemical and environmental research.

The spectra in the library impose a further restriction on the classification method: many spectra have ions recorded only above m/e 36 or m/e 60. Comparisons of ion series summations are useful only if all data are treated in the same way. Summations were therefore performed above m/e 35, so that the maximum number of spectra could be included. To develop the correlation set, the computer searches the library for members of a given class and then automatically performs ion series summations for each spectrum using the equation

$$S_m = 100 \times \frac{\sum_{n} I(30 + m + 14n)}{\sum_{j} I(j)} \qquad (1)$$

where $I(j)$ is the relative intensity of mass $j$, $m = 1, 2, \ldots, 14$, $n = 0, 1, 2, \ldots$, and $S_m$ is the per cent contribution of ion series $m$ to the total ion current. Under this indexing, $m = 14$ collects masses 44, 58, 72, ... (the 2n + 2 series defined below), and $m = 1$ collects the 2n - 11 series. The library frequently contains replicate spectra of the same compound from different contributors, recorded on different spectrometers; in these cases the summation results are averaged. Summation data are then output for each member of the class.

For consistent notation, each ion series is designated with reference to n-alkanes:

- Masses corresponding to molecular ions of n-alkanes (empirical formulas $C_nH_{2n+2}$) form the 2n + 2 series (for n ≥ 3, m/e 44, 58, 72, ...).
- Masses corresponding to saturated fragment ions from an n-alkane form the 2n + 1 series (for n ≥ 3, m/e 43, 57, 71, ...).
- Molecular ions of alkyl benzenes fall in the 2n - 6 series (m/e 92, 106, ...).
- Molecular ions of acyclic, aliphatic ketones have the same nominal masses as n-alkanes and so fall in the 2n + 2 series.

This notation, while consistent, does not of course represent the true hydrogen content of compounds that contain elements other than carbon and hydrogen.

(16) The Lunar Sample Preliminary Examination Team, Science, 165, 1211 (1969).
(17) Ibid., 167, 1325 (1970).
(18) A. L. Burlingame and D. H. Smith, Tetrahedron, 24, 5749 (1968).
(19) M. C. Hamming and R. D. Grigsby, "Proceedings of the 15th Annual Conference on Mass Spectrometry and Allied Topics," May 14-19, 1967, Denver, p 107.
(20) B. Petterson and R. Ryhage, ANAL. CHEM., 39, 790 (1967).
(21) R. Venkataraghavan, F. W. McLafferty, and G. E. Van Lear, Org. Mass Spectrom., 2, 1 (1969).

Figure 1. Per cent total ionization vs. carbon number data for the 2n + 2, 2n + 1, 2n, and 2n - 1 ion series for the compound class n-alkane

Figure 2. Per cent total ionization vs. carbon number data for the 2n + 1, 2n - 5, 2n - 6, and 2n - 7 ion series for the compound class mono-n-alkyl benzene

Figure 3. Per cent total ionization vs. carbon number data for the 2n + 1, 2n - 1, 2n - 10, and 2n - 11 ion series for the compound class methyl-n-alkanoate

Figure 4. Ion series spectra for the compound classes n-alkane and n-alkan-2-one (per cent total ionization vs. ion series, 2n - 11 to 2n + 2)

Figure 5. Ion series spectra for the compound classes n-alk-1-ene and n-alkyl cyclohexane

The data presented in the following examples concentrate on higher carbon numbers, because chemical procedures encountered in geochemical and
environmental studies discriminate against volatile, lower molecular weight material; these procedures generally involve solvent extraction and evaporation before GC/MS analysis. Data on lower carbon numbers indicate that they follow the generalizations adopted for higher carbon numbers. The ion series values do change slightly, however, because masses below m/e 36 contribute relatively more of the total ionization. Summation of the spectrum of n-C13 alkane (catalog No. AST 1503) confirms this. Similar results are obtained for classes in which R (structure I) contains a functional group other than alkyl.

Figure 2 presents ion series vs. carbon number data for the important series of mono-n-alkyl benzenes ($R = \mathrm{C_6H_5CH_2CH_2CH_2CH_2}{-}$). The graph illustrates a behavior noted for some classes: certain ion series steadily decrease or increase with carbon number, while others remain constant. Here, the 2n - 7 and 2n + 1 series formally represent simple cleavages with charge retention on the aromatic or the alkyl fragment, respectively. As the alkyl chain lengthens, the 2n - 7 series decreases slightly and the 2n + 1 series increases slightly, as might be expected. The 2n - 6 series, however, remains relatively constant. This is so even though both the molecular ions and rearrangements yielding the m/e 92 ion contribute significantly to it.

The ion series variations noted in this compound class do not, however, change the characteristic distribution (obtained by averaging the ion series summations) enough to affect its utility, and this distribution is very different from the others in the present correlation set. These variations may, in fact, be useful in further treatment of the data, as discussed in a subsequent section. The results for the most abundant ion series of the class methyl-n-alkanoates (methyl esters of n-fatty acids) are presented in Figure 3. Again, slight variations are noted as a function of carbon number, but the ion series distribution is surprisingly constant.

Ion Series Spectra. One way to visualize the ion series distributions is a simple bar graph of per cent total ionization vs. ion series (2n - 11 to 2n + 2), termed an "ion series spectrum." [A related presentation has been termed a "reduced mass spectrum" (14), a somewhat less specific term.] This presentation allows quick comparison of classes whose molecular ions fall in the same ion series (same nominal masses) and shows how their distributions differ. For example, Figure 4 presents ion series spectra for n-alkanes and n-alkan-2-ones, and Figure 5 those for n-alk-1-enes and n-alkyl cyclohexanes. Classes sharing one or more functional groups can be compared just as easily, as Figure 6 illustrates for methyl-n-alkanoates and dimethyl-n-alkan-dioates. An ion series spectrum is a graphical summary of the distribution of ion types in the original data. It can therefore be examined with the same mental processes used in the initial examination of a conventional low resolution mass spectrum.
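Below is a minimal Python sketch of the ion series summation (Equation 1) and of an ion series spectrum drawn as a text bar graph. It assumes a spectrum given as a mapping from integer nominal mass to relative intensity. Both the series sums and the total ion current use the same m/e > 35 cutoff described above. The fourteen series are ordered from 2n - 11 to 2n + 2, as in Table I. The function names and the bar scaling are illustrative choices, not the original Fortran program.

```python
from collections.abc import Mapping

SERIES = tuple(range(-11, 3))  # offsets k for 2n - 11 ... 2n + 2 (Table I column order)


def ion_series_spectrum(spectrum: Mapping[int, float], min_mass: int = 36) -> dict[int, float]:
    """Per cent of total ionization in each ion series (Equation 1).

    `spectrum` maps integer m/e to relative intensity. Peaks below `min_mass`
    are dropped from both the series sums and the total ion current.
    """
    sums = dict.fromkeys(SERIES, 0.0)
    for mass, intensity in spectrum.items():
        if mass < min_mass:
            continue
        k = mass % 14  # series offset, folded into -11..+2 below
        sums[k if k <= 2 else k - 14] += intensity
    total = sum(sums.values())
    if total <= 0:
        raise ValueError("no ion current at or above min_mass")
    return {k: 100.0 * s / total for k, s in sums.items()}


def format_ion_series_spectrum(spec: Mapping[int, float], scale: float = 0.5) -> str:
    """Text bar graph: one row per series, `scale` characters per per cent."""
    rows = []
    for k in SERIES:
        label = "2n" if k == 0 else f"2n{k:+d}"
        rows.append(f"{label:>6} {'#' * round(spec[k] * scale):<50} {spec[k]:5.1f}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Made-up intensities for illustration only; m/e 29 falls below the cutoff.
    toy = {29: 50.0, 41: 30.0, 42: 10.0, 43: 100.0, 57: 80.0, 71: 40.0, 85: 20.0}
    spec = ion_series_spectrum(toy)
    print(round(spec[1], 1), round(spec[0], 1), round(spec[-1], 1))  # 85.7 3.6 10.7
    print(format_ion_series_spectrum(spec))
```

The exact cutoff matters less than applying the same one to every spectrum: as noted above, summations can be compared only if all data are treated in the same way.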

Figure 6. Ion series spectra for the compound classes methyl-n-alkanoate and dimethyl-n-alkan-dioate

The large quantity of data examined, covering many compound classes, supports the following generalization: within the restricted definition of class given above, a compound class can be characterized for classification purposes by its ion series spectrum. Besides providing a means of developing a correlation set, this statement has interesting physical implications. Within a class, extending the alkyl chain changes the distribution of peak intensity vs. mass. It has little or no effect, however, on the distribution of ion types that make up the various ion series.

Table I summarizes the ion series values for the compound classes considered to date, together with the carbon number range over which reference spectra were available. Besides classes of geochemical and environmental interest, several other common classes have been included, for example, alcohols, ethers, and amines. Some of these would normally be derivatized to other classes before GC/MS analysis (e.g., alcohols → trimethylsilyl ethers).

It must be stressed that some compound classes, such as branched alkanes, aliphatic ketones, and amines, lack enough reference spectra in the existing library for a detailed breakdown into subclasses. Their classification may therefore be somewhat uncertain. For example, one important difference between the ion series spectra of n-alkanes and ketones is the 2n + 2 series, which is larger for ketones partly because of intense rearrangement ions. A ketone spectrum in which rearrangements are suppressed may therefore resemble both the ketone and the n-alkane classes. The remedy is to obtain more reference spectra for these incomplete classes and to modify and extend the data in Table I accordingly.
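Building a correlation set entry amounts to averaging ion series spectra: first over replicate spectra of the same compound, then over the members of the class, as described above. The Python sketch below assumes that each compound carries equal weight once its replicates are averaged; the text does not state the weighting. It also reports the per-series spread across members, as one simple way to check that a class's values are approximately reproduced. The data layout and function names are hypothetical.

```python
from statistics import mean, pstdev

Spectrum = dict[int, float]  # ion series offset k (2n + k) -> per cent total ionization


def average_spectra(spectra: list[Spectrum]) -> Spectrum:
    """Element-wise mean of ion series spectra that share the same keys."""
    if not spectra:
        raise ValueError("need at least one spectrum")
    return {k: mean([s[k] for s in spectra]) for k in spectra[0]}


def class_entry(members: dict[str, list[Spectrum]]) -> tuple[Spectrum, Spectrum]:
    """Return (mean, population std. dev.) per series for one compound class.

    `members` maps compound name -> list of replicate ion series spectra.
    Replicates are averaged first, so each compound counts once.
    """
    per_compound = [average_spectra(reps) for reps in members.values()]
    centre = average_spectra(per_compound)
    spread = {k: pstdev([s[k] for s in per_compound]) for k in centre}
    return centre, spread


if __name__ == "__main__":
    # Two-series toy data, invented for illustration.
    members = {
        "compound A": [{-1: 20.0, 1: 80.0}, {-1: 30.0, 1: 70.0}],  # two replicates
        "compound B": [{-1: 10.0, 1: 90.0}],
    }
    centre, spread = class_entry(members)
    print(centre, spread)  # {-1: 17.5, 1: 82.5} {-1: 7.5, 1: 7.5}
```

Weighting every spectrum equally instead would give 20.0 rather than 17.5 for the first series in this toy case, so the choice is worth stating whenever a correlation set is rebuilt.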

THE CLASSIFIER: COMPOUND CLASSIFICATION AND STRUCTURAL INFORMATION

The data in Table I constitute a correlation set. The set can be used manually, by performing ion series summations on an unknown spectrum and comparing the results with the table. The procedure is best done by computer, however, particularly when large quantities of GC/MS data are involved. In terms of computer usage, this correlation set is extremely efficient. Suppose each class is keyed with a number rather than a name, and bit packing is used to encode the class number, the molecular weight of the lowest member, and the numbers of double bond equivalents and heteroatoms for the class. The information on each class can then be stored in sixteen words, assuming a computer with 16-bit words and four different heteroatoms. Additional information would require another word of storage. However, unpacking this information requires extra computer time compared with storing each datum in a separate word.

Figure 7. Flow chart of the CLASSIFIER program illustrating the basic steps in classification of mass spectra

MISMATCH Values. Compound classification uses the Classifier program outlined in Figure 7, run on an ICL Model 4/75 computer and written primarily in Fortran. The ion series summation for an unknown spectrum is carried out exactly as described previously (Equation 1). The resulting values are then compared with each class in the correlation set, and a "MISMATCH" value is calculated as

$$\mathrm{MISMATCH} = \sum_{n=1}^{14} \left| \mathrm{Correlation\ Set\ Series}_n - \mathrm{Unknown\ Series}_n \right| \qquad (2)$$

Table I. Correlation Set (ion series values in per cent of total ionization)

| Class (a) | Carbon number range (b) | Source (c) | 2n - 11 | 2n - 10 | 2n - 9 | 2n - 8 | 2n - 7 | 2n - 6 | 2n - 5 | 2n - 4 | 2n - 3 | 2n - 2 | 2n - 1 | 2n | 2n + 1 | 2n + 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n-Alkane | 8-36 | MSDC | 0.2 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 0.3 | 0.2 | 2.7 | 1.9 | 18.7 | 12.1 | 59.9 | 3.8 |
| iso-Alkane | 8-24 | MSDC | 0.1 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 0.3 | 0.2 | 3.8 | 1.5 | 17.4 | 14.2 | 59.0 | 3.3 |
| anteiso-Alkane | 8-21 | MSDC | 0.1 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 0.3 | 0.2 | 3.9 | 1.3 | 17.6 | 17.3 | 56.0 | 3.1 |
| Isoprenoid alkane | 15-21 | OGU | 0.2 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 0.2 | 0.1 | 0.3 | 0.8 | 11.9 | 15.7 | 64.6 | 6.0 |
| Highly branched alkane | 8-28 | MSDC | 0.1 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 0.3 | 0.2 | 3.5 | 0.8 | 15.5 | 10.9 | 65.4 | 3.1 |
| n-Alk-1-ene | 8-18 | MSDC | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.2 | 0.6 | 0.5 | 7.1 | 5.9 | 38.6 | 25.2 | 20.8 | 0.9 |
| Branched alkene | 8-14 | MSDC | 0.1 | 0.1 | 0.1 | 0.1 | 0.8 | 0.5 | 1.8 | 1.0 | 10.3 | 3.1 | 51.3 | 18.2 | 12.0 | 0.6 |
| n-Alkyl cyclohexane | 8-16 | MSDC | 0.0 | 0.0 | 0.0 | 0.0 | 0.3 | 0.2 | 0.7 | 0.5 | 8.5 | 17.4 | 54.5 | 11.8 | 5.9 | 0.2 |
| n-Alk-1-yne | 8-12 | MSDC | 0.0 | 0.0 | 0.1 | 0.1 | 1.0 | 0.7 | 3.3 | 2.6 | 33.6 | 12.8 | 24.6 | 6.8 | 13.9 | 0.5 |
| n-Alkyl decalin | 10-25 | MSDC | 0.1 | 0.0 | 0.1 | 0.1 | 1.3 | 0.5 | 3.2 | 4.4 | 41.1 | 19.6 | 20.9 | 2.9 | 5.4 | 0.4 |
| n-Alkyl perhydro-anthracene/phenanthrene | 14-26 | MSDC | 0.3 | 0.2 | 0.5 | 0.8 | 3.3 | 2.3 | 21.7 | 12.1 | 27.2 | 9.0 | 16.4 | 1.9 | 3.8 | 0.5 |
| Sterane | 27-30 | OGU | 0.0 | 0.0 | 0.4 | 0.3 | 15.5 | 13.9 | 16.2 | 8.5 | 18.4 | 4.3 | 13.1 | 1.8 | 7.1 | 0.4 |
| n-Alkyl benzene | 10-26 | MSDC | 0.5 | 0.4 | 1.9 | 2.6 | 32.9 | 34.7 | 7.0 | 1.1 | 2.6 | 0.6 | 5.9 | 1.4 | 7.9 | 0.5 |
| Triterpane-I (d) | 29-30 | OGU | 0.1 | 0.0 | 1.2 | 1.7 | 5.8 | 3.1 | 26.8 | 9.7 | 26.2 | 7.0 | 14.6 | 1.5 | 2.1 | 0.2 |
| Triterpane-II | 30 | OGU | 0.8 | 0.4 | 2.9 | 3.1 | 8.6 | 5.9 | 17.0 | 8.0 | 23.4 | 8.7 | 15.5 | 1.8 | 3.2 | 0.7 |
| Triterpane-III | 30 | OGU | 0.2 | 0.1 | 2.3 | 3.3 | 9.1 | 5.4 | 16.0 | 7.3 | 27.1 | 7.2 | 16.7 | 1.8 | 3.1 | 0.4 |
| Triterpane-IV | 30 | OGU | 0.8 | 0.2 | 6.1 | 4.1 | 13.3 | 8.6 | 15.2 | 5.8 | 17.6 | 2.7 | 12.1 | 1.3 | 11.2 | 1.0 |
| 7 DBE's, 2-ring PNA | 10 | MSDC | 7.6 | 5.2 | 2.9 | 2.4 | 5.0 | 7.7 | 5.2 | 1.2 | 1.3 | 0.1 | 0.6 | 3.7 | 6.1 | 51.0 |
| 7 DBE's, 2-ring subs. PNA | 11-14 | MSDC | 9.8 | 2.1 | 1.9 | 2.6 | 3.9 | 2.4 | 1.8 | 0.9 | 2.2 | 2.2 | 4.6 | 3.1 | 33.0 | 29.5 |
| 8 DBE's, biphenyl | 12 | MSDC | 2.9 | 2.8 | 2.4 | 5.6 | 4.9 | 3.8 | 3.3 | 1.7 | 3.8 | 8.2 | 12.0 | 39.4 | 6.6 | 2.6 |
| 8 DBE's, subs. biphenyl | 13-14 | MSDC | 2.9 | 1.9 | 2.4 | 2.6 | 4.6 | 2.2 | 3.8 | 2.4 | 11.0 | 9.3 | 24.1 | 24.9 | 5.7 | 2.2 |
| 9 DBE's, PNA | 13 | MSDC | 2.3 | 1.6 | 1.8 | 1.6 | 2.6 | 2.4 | 6.1 | 5.3 | 27.1 | 32.6 | 10.8 | 3.3 | 1.4 | 1.1 |
| 10 DBE's, 3-ring PNA | 14 | MSDC | 1.9 | 5.5 | 8.1 | 6.7 | 4.3 | 8.7 | 2.4 | 42.8 | 10.6 | 3.5 | 1.5 | 1.5 | 1.2 | 1.3 |
| 10 DBE's, 3-ring subs. PNA | 15-18 | MSDC | 2.6 | 2.8 | 4.1 | 3.6 | 8.9 | 5.7 | 24.5 | 28.3 | 10.3 | 3.8 | 1.9 | 0.9 | 1.1 | 1.5 |
| 11 DBE's, PNA | 15 | MSDC | 1.2 | 1.8 | 4.8 | 5.6 | 18.3 | 24.5 | 23.5 | 10.2 | 5.0 | 1.4 | 1.4 | 0.8 | 0.7 | 0.8 |
| 12 DBE's, 4-ring PNA | 16 | MSDC | 11.7 | 10.5 | 5.9 | 44.6 | 9.4 | 2.2 | 1.5 | 1.7 | 1.2 | 1.0 | 0.6 | 1.0 | 1.5 | 7.2 |
| 12 DBE's, 4-ring subs. PNA | 17-18 | MSDC | 5.9 | 3.3 | 17.6 | 39.0 | 11.0 | 3.4 | 3.4 | 6.9 | 3.1 | 0.9 | 0.8 | 0.8 | 1.9 | 2.0 |
| 13 DBE's, 4-ring PNA | 18 | MSDC | 7.8 | 37.2 | 9.4 | 3.3 | 2.7 | 1.7 | 1.3 | 1.4 | 1.6 | 0.9 | 1.3 | 4.7 | 9.2 | 17.5 |
| 13 DBE's, 4-ring subs. PNA | 19-26 | MSDC | 16.0 | 29.2 | 8.2 | 2.5 | 2.8 | 4.1 | 3.0 | 1.5 | 1.7 | 1.8 | 3.2 | 3.9 | 12.4 | 9.7 |
| 14 DBE's, binaphthyl | 20 | MSDC | 7.3 | 2.1 | 1.5 | 1.2 | 2.0 | 1.8 | 2.0 | 1.8 | 2.0 | 3.8 | 5.2 | 20.9 | 20.7 | 27.7 |
| 15 DBE's, 5-ring PNA | 20 | MSDC | 1.3 | 1.0 | 0.7 | 0.8 | 0.7 | 0.9 | 0.8 | 2.2 | 2.6 | 10.5 | 10.1 | 51.7 | 13.3 | 3.4 |
| 16 DBE's, PNA | 22-24 | MSDC | 2.6 | 3.2 | 3.2 | 3.2 | 3.8 | 3.3 | 4.5 | 4.8 | 5.6 | 40.4 | 16.7 | 3.8 | 2.5 | 2.4 |
| n-Alkan-1-ol | 8-29 | MSDC | 0.9 | 0.1 | 0.1 | 0.1 | 0.4 | 0.3 | 0.6 | 0.7 | 6.3 | 8.7 | 35.1 | 19.8 | 24.6 | 2.3 |
| Di-n-alkyl ether | 8-20 | MSDC | 4.9 | 0.8 | 0.7 | 0.1 | 0.1 | 0.1 | 0.2 | 0.1 | 3.0 | 2.8 | 19.3 | 17.9 | 46.8 | 3.2 |
| n-Alkyl aldehyde | 8-14 | MSDC | 2.2 | 0.4 | 0.1 | 0.0 | 0.2 | 0.1 | 0.6 | 0.8 | 11.4 | 14.2 | 22.3 | 15.7 | 23.8 | 8.2 |
| n-Alkan-2-one | 8-15 | MSDC | 8.9 | 0.4 | 0.0 | 0.0 | 0.1 | 0.1 | 0.2 | 0.2 | 4.5 | 5.3 | 11.6 | 5.3 | 32.8 | 30.6 |
| Other aliphatic ketone | 8-21 | MSDC | 2.7 | 0.3 | 0.1 | 0.0 | 0.1 | 0.1 | 0.5 | 0.4 | 6.0 | 2.5 | 13.7 | 4.8 | 51.2 | 17.6 |
| Methyl-n-alkanoate | 8-27 | MSDC | 26.3 | 27.0 | 5.1 | 0.4 | 0.2 | 0.2 | 1.1 | 0.4 | 3.3 | 1.9 | 14.6 | 4.5 | 13.2 | 1.8 |
| Dimethyl-n-alkan-dioate | 8-24 | MSDC | 20.3 | 8.8 | 1.1 | 0.4 | 0.8 | 0.5 | 2.2 | 1.2 | 5.4 | 7.8 | 27.0 | 11.6 | 7.6 | 5.3 |
| 2,6-Isoprenoid Me-ester | 15-20 | OGU | 18.8 | 26.9 | 1.7 | 0.2 | 0.5 | 0.0 | 1.1 | 0.1 | 2.9 | 2.6 | 20.0 | 6.0 | 17.6 | 1.6 |
| 3,7-Isoprenoid Me-ester | 16-21 | OGU | 27.2 | 16.5 | 4.8 | 0.4 | 0.1 | 0.1 | 1.0 | 0.3 | 3.7 | 1.8 | 19.7 | 6.1 | 17.5 | 0.8 |
| 4,8-Isoprenoid Me-ester | 15-22 | OGU | 27.5 | 10.1 | 1.6 | 0.0 | 0.1 | 0.0 | 1.2 | 0.2 | 3.6 | 1.4 | 20.5 | 5.9 | 25.9 | 2.0 |
| 5,9-Isoprenoid Me-ester | 13-23 | OGU | 18.0 | 20.9 | 3.7 | 0.1 | 0.3 | 0.0 | 1.2 | 0.7 | 3.8 | 2.0 | 21.0 | 7.1 | 19.7 | 1.5 |
| OH- or MeO-Me-ester | 19-25 | MSDC | 13.3 | 5.8 | 9.5 | 1.5 | 0.8 | 0.5 | 1.2 | 1.1 | 7.7 | 7.6 | 21.4 | 7.1 | 18.5 | 4.0 |
| Keto-Me-ester | 9-19 | MSDC | 11.2 | 8.4 | 1.3 | 0.2 | 0.3 | 0.2 | 1.9 | 1.3 | 5.9 | 4.6 | 20.5 | 9.9 | 25.0 | 9.3 |
| Isoprenoid acetate | 16-22 | OGU | 1.0 | 0.1 | 1.9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.7 | 5.1 | 33.5 | 24.0 | 30.2 | 1.5 |
| Di-n-alkyl phthalate | 10-26 | MSDC | 1.2 | 0.7 | 1.1 | 4.3 | 5.5 | 3.7 | 45.0 | 8.5 | 4.5 | 2.4 | 8.8 | 5.4 | 7.6 | 1.3 |
| 1-n-Alkyl amine (1°) | 8-11 | MSDC | 10.9 | 2.2 | 0.2 | 0.2 | 0.6 | 2.7 | 0.7 | 1.3 | 5.0 | 4.0 | 20.5 | 13.2 | 15.3 | 23.2 |
| >1-n-Alkyl amine (1°) | 8 | MSDC | 2.8 | 0.1 | 0.0 | 0.0 | 0.3 | 0.1 | 0.2 | 0.2 | 2.6 | 0.8 | 8.6 | 6.0 | 5.1 | 73.2 |
| 2°- or 3°-Alkyl amine | 8-14 | MSDC | 5.8 | 0.3 | 0.0 | 0.0 | 0.1 | 0.1 | 0.2 | 0.3 | 3.2 | 1.2 | 10.2 | 9.1 | 14.7 | 54.8 |

(a) PNA refers to polynuclear aromatic; alkyl-substituted classes are mono-alkyl unless otherwise indicated.
(b) The indicated range may not include all the carbon numbers within the range.
(c) MSDC refers to Mass Spectrometry Data Centre spectra; OGU refers to Organic Geochemistry Unit spectra.
(d) The triterpane classes are hydrocarbons: I corresponds to 8,14-methyl substitution; II corresponds to 13,14-methyl substitution; III corresponds to a 7-membered ring C; IV corresponds to 9,10-methylene (cyclopropane ring).
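Equation 2 and the ranking step can be sketched in a few lines of Python. The four reference rows below are transcribed from Table I. The optional `prefilter` is one hypothetical reading of the speed-up mentioned in this section (subdividing the correlation set by abundant ion series): it compares the unknown only with classes whose most abundant series matches its own. It is not a description of the original Classifier program. Because each ion series spectrum sums to 100, the triangle inequality bounds MISMATCH between 0 and 200.

```python
SERIES = tuple(range(-11, 3))  # 2n - 11 ... 2n + 2, Table I column order


def _row(*values: float) -> dict[int, float]:
    if len(values) != len(SERIES):
        raise ValueError("expected 14 ion series values")
    return dict(zip(SERIES, values))


# Excerpt of Table I (per cent total ionization, columns 2n - 11 ... 2n + 2).
CORRELATION_SET = {
    "n-Alkane": _row(0.2, 0.0, 0.0, 0.0, 0.1, 0.1, 0.3, 0.2, 2.7, 1.9, 18.7, 12.1, 59.9, 3.8),
    "n-Alkyl benzene": _row(0.5, 0.4, 1.9, 2.6, 32.9, 34.7, 7.0, 1.1, 2.6, 0.6, 5.9, 1.4, 7.9, 0.5),
    "11 DBE's, PNA": _row(1.2, 1.8, 4.8, 5.6, 18.3, 24.5, 23.5, 10.2, 5.0, 1.4, 1.4, 0.8, 0.7, 0.8),
    "Methyl-n-alkanoate": _row(26.3, 27.0, 5.1, 0.4, 0.2, 0.2, 1.1, 0.4, 3.3, 1.9, 14.6, 4.5, 13.2, 1.8),
}


def mismatch(reference: dict[int, float], unknown: dict[int, float]) -> float:
    """Equation 2: sum over the 14 series of |reference - unknown|."""
    return sum(abs(reference[k] - unknown[k]) for k in SERIES)


def base_series(spec: dict[int, float]) -> int:
    """Offset k of the most abundant ion series."""
    return max(SERIES, key=lambda k: spec[k])


def classify(unknown, correlation_set=CORRELATION_SET, top=2, prefilter=False):
    """Return the `top` (class, MISMATCH) pairs, best (lowest MISMATCH) first."""
    candidates = correlation_set.items()
    if prefilter:
        wanted = base_series(unknown)
        candidates = [(name, ref) for name, ref in candidates if base_series(ref) == wanted]
    scored = sorted((mismatch(ref, unknown), name) for name, ref in candidates)
    return [(name, round(value, 1)) for value, name in scored[:top]]


if __name__ == "__main__":
    # Classifying the n-alkyl benzene row itself: a perfect match (0.0) comes first.
    print(classify(CORRELATION_SET["n-Alkyl benzene"]))
    # [('n-Alkyl benzene', 0.0), ("11 DBE's, PNA", 74.2)]
```

With only four rows, the second-rated class for the n-alkyl benzene entry happens to be 11 DBE's, PNA, the same second choice listed in Table II. Reproducing the paper's results would still require the full correlation set. A strict base-series prefilter trades completeness for speed: it can exclude the correct class when an unknown's most abundant series differs from its class average.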

The lower the MISMATCH value, the more closely the unknown spectrum matches the class: a perfect match gives MISMATCH = 0, and a complete mismatch gives MISMATCH = 200. If required, the comparison can be speeded up by subdividing the correlation set according to one or more abundant ion series, which restricts the number of comparisons that must be made. The best matches are then output, completing the first phase of the treatment.

Table II shows the magnitude of the MISMATCH values obtained when the Classifier is run on spectra that were used in developing the correlation set, here for the class of mono-n-alkyl benzenes. This class was chosen because its ion series values vary slightly with carbon number (Figure 2), so it shows the largest MISMATCH values that may be expected. Notably, the MISMATCH values for the second-rated class are very different from those for the correct class. These simple procedures therefore readily accomplish the major step of basic class indication. Some ambiguities can occur within a major class (e.g., acyclic alkanes), and further expansion of the correlation set may be expected to introduce more. In any case, the original mass spectrum always remains available for more sophisticated treatment that can derive additional structural information and resolve these ambiguities.

Table II. Classification Results for Mono-n-Alkyl Benzenes

| Compound | Class assignment (1) | Class assignment (2) |
|---|---|---|
| n-Butyl benzene | n-Alkyl benzene | 11 DBE's, PNA |
| n-Pentyl benzene | n-Alkyl benzene | 11 DBE's, PNA |
| n-Hexyl benzene | n-Alkyl benzene | 11 DBE's, PNA |
| n-Heptyl benzene | n-Alkyl benzene | 11 DBE's, PNA |
| n-Octyl benzene | n-Alkyl benzene | 11 DBE's, PNA |
| n-Decyl benzene | n-Alkyl benzene | 11 DBE's, PNA |
| n-Dodecyl benzene | n-Alkyl benzene | 11 DBE's, PNA |
| n-Tetradecyl benzene | n-Alkyl benzene | 11 DBE's, PNA |
| n-Hexadecyl benzene | n-Alkyl benzene | 11 DBE's, PNA |
| n-Eicosyl benzene | n-Alkyl benzene | 11 DBE's, PNA |

Molecular Weight and Formula. Once classification is complete, certain types of computer treatment of the data become much simpler. As indicated in Figure 7, the first step is to determine the molecular weight and molecular formula from the unknown spectrum. In this procedure only the classes that meet an empirical limitation of MISMATCH