Classification of compounds by the factor analysis of their mass spectra


Classification of Compounds by the Factor Analysis of Their Mass Spectra

Richard W. Rozett and E. McLaughlin Petersen

Chemistry Department, Fordham University, Bronx, N.Y. 10458

Factor analysis is a statistical technique which can analyze large sets of mass spectra into a smaller set of simpler patterns, the factors of the mass spectra. We illustrate one use of these patterns, the partitioning of the compounds into classes which show similar mass spectral behavior. Typical, principal, and partial factors are used in turn to provide different classes. The masses at which the intensities of the fragment ions are measured are grouped in a similar manner. Polar and triangular graphs provide a summary of the information contained in the mass spectra of the 22 isomers of C10H14.

The factor analysis (FA) of mass spectra is a statistical method which simultaneously analyzes large groups of mass spectra. It separates noise from significant information, and removes the redundancy from correlated spectra. Common features of the set of compounds are separated from behavior peculiar to one or a few compounds. As a first step, FA determines p, the number of significantly different kinds of behavior shown by the compounds in their mass spectra (MS). It then isolates and defines each type of behavior. A set of p row factors (compound-like patterns, “loadings”, F) and p column factors (mass-like patterns, “scores”, S) is calculated. The product of the row factors by the column factors approximately equals the original data matrix, Y (Equation 1).

$$ y_{ij} \approx \sum_{k=1}^{p} s_{ik} f'_{kj} \tag{1} $$

In other words, the original n × m data matrix, Y, consisting of the intensities of n compounds measured at the same m masses, is decomposed into a product of an n × p score matrix, S, and an m × p loading matrix, F (Equation 2). F' is the transpose of the F matrix, so that $f'_{kj}$ in Equation 1 is the element $f_{jk}$ of F.

$$ Y = SF' \tag{2} $$
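As a small numerical illustration of Equation 2, the following sketch builds a data matrix from a score matrix and a loading matrix. All of the numbers are invented for the example and are not taken from the C10H14 spectra; the helper names `transpose` and `matmul` are likewise only illustrative.

```python
# Illustrative sketch of Equation 2, Y = S F'. All numbers are hypothetical.

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

# Hypothetical score matrix S: n = 3 compounds, p = 2 factors.
S = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]

# Hypothetical loading matrix F: m = 4 masses, p = 2 factors.
F = [[100.0, 0.0],
     [40.0, 10.0],
     [5.0, 60.0],
     [0.0, 100.0]]

# Y is n x m; element y_ij is the sum over k of s_ik * f_jk (Equation 1).
Y = matmul(S, transpose(F))
```

In this toy case the second compound scores equally on both factors, so its recalculated spectrum is the average of the two loading patterns.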

FA has been used extensively in chemistry (1-15). A somewhat simpler form, principal component analysis, is frequently encountered (16-19). The procedures appropriate for the FA of mass spectra have been studied (14). Methods for calculating the principal, typical, basic, and partial factors of mass spectra have been developed (15). On the other hand, there are no established procedures for classifying compounds by the FA of their mass spectra. This is one purpose of the present article.

The quantitative classification of MS should be useful. On the simplest level, it provides a summary of the information contained in a large number of MS. Removal of noise, redundancy, and peculiarities leads to a simple and economical representation of the information. Determination of the common features and class structure of the MS provides simultaneous insight into a complete set of spectra. Compound identification and library search can be simplified.

One might consider the use of FA for the classification of compounds a needless complication. Classical chemical methods built upon elemental formulas, functional groups, and structural descriptions seem to provide direct and powerful methods which are adequate for the classification of compounds. But the classification produced by FA is not identical with the classification based on static atomic, electronic, and structural features. It is based upon the kind of information present in MS, and this is essentially kinetic and mechanistic in character. A MS is a summary of the dissociation reactions undergone by a parent ion. FA classifies compounds on the basis of these fragmentation reactions. The results have no one-to-one correspondence with static structural classes.

The quantitative construction of empirical classes is currently an area of active research (20-23). Only two aspects of this subject need concern us: the relationship of FA to classifying properties, and the character of an acceptable class.
An empirical measurement is equivalent to the score of a compound on the property measured. A class of compounds is found if a number of compounds score similarly on all the properties measured. The empirical properties of MS, e.g., the fragment ion intensities from a compound at the masses studied, have a number of unpleasant characteristics. They are correlated, i.e., they do not represent completely independent properties. They are redundant, i.e., more empirical properties are measured than there are independent characteristics to measure. FA can simplify the empirical properties. It can eliminate redundant properties, and redefine the residual properties so that a minimal number of optimal properties are retained. FA replaces the large number of complex empirical properties with a smaller number of simpler new properties, the factors. Only the scores on these optimal properties need be taken into account when clusters of compounds are identified.

While the FA of MS guarantees the existence of one or more sets of the optimal properties needed to define the clusters, it does not guarantee the presence of the clusters themselves. No clusters may be present. Empirically at least two kinds of distributions of compounds are found. Some are continuously or randomly distributed, without any obvious grouping or structure; others are distributed in patches or clusters separated from each other by zones of free space. Even if clusters appear clearly, the further question may arise about how one should handle overlapping clusters. Should one accept as two classes clusters which partly overlap? To some extent the decision is arbitrary. In this work, we have decided to use FA as a way to simplify the representation of the clusters, and then use intuitive methods to pick out the clusters. Therefore we accept as valid classes only those which are isolated from each other by free space. A more sophisticated mathematical and statistical procedure could be used to identify the clusters which are present in the results of the FA. The emphasis here is on FA, and only intuitive clustering methods are used on the results of the FA.

In order to represent clusters defined by one or more properties, a measure of the similarity of scores on a property is necessary. There are a number of such measures. If entities are represented as a point in a multidimensional space, then the distance between the points may be taken as a measure of similarity (24-27). A cluster in such a case is the group of all those things which have a slight distance from one another. A second kind of measure of similarity is based upon the representation of an entity as a vector, a directed line segment, in some multidimensional space. The angle between vectors can be taken as a measure of similarity. Zero angle implies perfect similarity; 90° implies a complete difference, i.e., the presence of a fundamentally different property. The angular measure of similarity will be used in this paper. FA has a fundamental relationship to angular measures of similarity, and angular measures are particularly easy to represent graphically. Angles provide a kind of general language into which the many different aspects of FA may be translated and so compared (28).

One familiar example of an angular measure of similarity is the correlation coefficient. The usual (Pearson) correlation coefficient is the cosine of the angle ($\alpha_m$) between two vectors which have been transformed separately to have a zero mean (Equation 3).

$$ \cos\alpha_m = \frac{\sum_{i=1}^{m} (y_{ji}-\bar{y}_j)(y_{ki}-\bar{y}_k)}{\left[\sum_{i=1}^{m} (y_{ji}-\bar{y}_j)^2 \sum_{i=1}^{m} (y_{ki}-\bar{y}_k)^2\right]^{1/2}} \tag{3} $$

$\bar{y}_j$ and $\bar{y}_k$ are the means of the jth and kth rows of the matrix of MS, respectively. In the present study an alternate form of the correlation coefficient is used, the coefficient of congruence (Equation 4) (29). It is the cosine of the angle ($\alpha_0$) between two untransformed vectors.

$$ \cos\alpha_0 = \frac{\sum_{i=1}^{m} y_{ji}\,y_{ki}}{\left[\sum_{i=1}^{m} y_{ji}^2 \sum_{i=1}^{m} y_{ki}^2\right]^{1/2}} \tag{4} $$

The reasons for preferring $\alpha_0$ to $\alpha_m$ are those mentioned previously (14) for preferring intensities measured about the true zero of the mass spectral scale to those transformed into measurements about the mean of all of the measurements for the compound in question. Mass spectral intensities are measured on a scale with a significant zero, i.e., zero implies that there is no ionic fragment with that mass. The coefficient of congruence preserves the information about the real zero; the correlation coefficient does not, because it recenters the scale to the mean of the vector. Both the coefficient of congruence and the correlation coefficient rescale the length of each vector to 1. Only the angle between the vectors is significant, not the absolute length of each vector. In other words, the difference between the ionization efficiency of each compound is not taken into account. Since MS are usually renormalized so that the largest peak is set to 100, this is not a significant omission. Table I lists the 22 compounds whose angles $\alpha_0$ are reported in Table II. These angles will be of interest later in the article.

Table I. The 22 Isomers of C10H14
 1  n-Butylbenzene
 2  Isobutylbenzene
 3  sec-Butylbenzene
 4  tert-Butylbenzene
 5  1-Methyl-2-propylbenzene
 6  1-Methyl-3-propylbenzene
 7  1-Methyl-4-propylbenzene
 8  1,2-Diethylbenzene
 9  1,3-Diethylbenzene
10  1,4-Diethylbenzene
11  1-Isopropyl-2-methylbenzene
12  1-Isopropyl-3-methylbenzene
13  1-Isopropyl-4-methylbenzene
14  1,2-Dimethyl-3-ethylbenzene
15  1,2-Dimethyl-4-ethylbenzene
16  1,3-Dimethyl-2-ethylbenzene
17  1,3-Dimethyl-4-ethylbenzene
18  1,3-Dimethyl-5-ethylbenzene
19  1,4-Dimethyl-2-ethylbenzene
20  1,2,3,4-Tetramethylbenzene
21  1,2,3,5-Tetramethylbenzene
22  1,2,4,5-Tetramethylbenzene

ANALYTICAL CHEMISTRY, VOL. 48, NO. 6, MAY 1976
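The difference between the two angular measures can be seen on a pair of toy vectors. The sketch below is only an illustration under stated assumptions: the function names `congruence_angle` and `correlation_angle` and the two three-peak "spectra" are invented for the example, and zero vectors are not handled.

```python
import math

def congruence_angle(u, v):
    """Angle in degrees between two untransformed vectors (coefficient of congruence)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cosine))

def correlation_angle(u, v):
    """Angle in degrees after each vector is centered on its own mean (Pearson correlation)."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return congruence_angle([a - mean_u for a in u], [b - mean_v for b in v])

# Hypothetical three-peak spectra, not data from the paper.
a = [100.0, 50.0, 0.0]
b = [0.0, 50.0, 100.0]

angle_0 = congruence_angle(a, b)   # measured about the true zero of the intensity scale
angle_m = correlation_angle(a, b)  # measured about the mean of each vector
```

For these two vectors the congruence angle stays below 90° because the shared middle peak is measured from the true zero, whereas centering turns the same pair into perfectly anticorrelated vectors. Rescaling either vector leaves both angles unchanged.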

CLASSIFICATION BY TYPICAL FACTORS

The result of FA is a set of p row factors, each of which looks like a row of the original data matrix, and p column factors, each of which looks like a column of the data matrix. The row and column factors are not unique. One of the simplest ways to derive a set of factors is to choose the p row factors, for example, from among the mass spectra of the n compounds. The corresponding column factors are then calculated so that they optimize the fit of the recalculated data to the original set of mass spectra. If there are three factors, for example, then the set of the three mass spectra most representative of the other compounds is chosen from among the measured spectra. The best set of three is the one which permits the data to be recalculated most accurately.

In the case under study here, the MS of the 22 benzenoid compounds with the formula C10H14 are studied (Table I). Four carbon atoms are found in the benzene ring side chains in a variety of ways ranging from n-butylbenzene to the tetramethylbenzenes. Three factors are needed to account for 99% of the variation of the mass spectra (14, 15). The three most representative isomers proved to be n-butylbenzene (1), 1-methyl-2-propylbenzene (5), and 1,3-dimethyl-2-ethylbenzene (16). When three column factors are chosen from among the experimental measurements, the intensities at masses 119 (C9H11+), 105 (C8H9+), and 91 (C7H7+) prove to be the most typical.

Table II. Angle in Degrees between Isomers of C10H14 (columns 1-10)
       1   2   3   4   5   6   7   8   9  10
 1     0   9  76  64  78  76  79  73  75  74
 2         0  80  64  82  80  83  75  77  76
 3             0  81   5   6   5  42  44  49
 4                 0  83  81  84  44  42  37
 5                     0   4   2  42  44  50
 6                         0   5  40  43  48
 7                             0  43  45  50
 8                                 0   4   8
 9                                     0   6
10                                         0

One way to present the results of a FA is the polar plot, such as that shown in Figure 1. The angles between the radial lines are the only significant information present in the graph; the lengths of the radii are not significant. Each radial line is labeled with the number of one or more of the isomers which are identified in Table I. Several isomers label the same angle only when space prevents their separate representation. The major information present in any one of the three polar plots of Figure 1 is the angle between the reference axis, found at the lower left of the semicircle, and any other vector. The angle between any arbitrary pair of vectors in the plot may or may not be significant, as we shall see later.

Three polar representations are present in Figure 1 since there are three important independent kinds of behavior shown by the set of 22 mass spectra (14, 15). Each of the representative isomers, 16, 5, and 1, has its own semicircle. The lower semicircle represents the angles between isomer 16, 1,3-dimethyl-2-ethylbenzene, and the other 21 isomers. The angle is a measure of similarity defined by Equation 4 and recorded in Table II. The middle semicircle shows the angles from isomer 5, 1-methyl-2-propylbenzene, and the upper semicircle represents the angles from isomer 1, n-butylbenzene. An angle of zero degrees from the reference vector implies that the two mass spectra are perfectly similar. An angle of 90° implies that the pair of MS are radically different, or orthogonal.
An angle of 180° with reference to the angular zero would imply perfect anticorrelation between the pair, but no such pairs are found. In fact no angles larger than 90° are found in the figure because of the nonnegative character of mass spectral intensities. This restriction of the angles to the first quadrant is not important in the case studied here, but it could make it advisable to use oblique factors when one hopes to pass the factors through clusters of the experimental measurements. The angles shown in Figure 1 are equivalent to a table of correlation coefficients, or angles between the isomers such as those shown in Table II. But the FA allows one to simplify the representation significantly. The number of correlation coefficients necessary to represent the relationships shown in Figure 1 would be 231, i.e., n(n - 1)/2; the number of angles shown in Figure 1 is 63, or p(n - 1).

Table II (continued). Angle in Degrees between Isomers of C10H14 (columns 11-22)
      11  12  13  14  15  16  17  18  19  20  21  22
 1    77  77  77  78  78  79  80  78  77  77  77  75
 2    77  77  77  78  78  80  80  79  78  77  77  75
 3    83  82  83  78  79  81  82  78  76  81  80  80
 4    18  18  18  21  20  21  22  22  21  24  25  24
 5    83  82  83  78  79  81  82  78  76  80  80  80
 6    81  81  81  76  77  79  80  77  75  79  78  78
 7    84  83  84  78  80  81  83  79  77  81  81  81
 8    41  41  41  36  37  39  41  37  35  40  40  40
 9    39  38  39  34  35  37  38  35  32  38  38  38
10    33  33  33  28  29  31  33  29  27  33  33  33
11     0   1   1   7   5   5   4   8   8  15  15  16
12         0   1   6   5   4   4   7   7  14  14  15
13             0   7   6   5   4   8   8  15  15  16
14                 0   3   4   5   3   2  12  12  13
15                     0   4   5   5   3  12  13  13
16                         0   2   5   5  13  13  15
17                             0   6   7  13  14  16
18                                 0   4   9   9  11
19                                     0  11  12  13
20                                         0   1   5
21                                             0   4
22                                                 0

Figure 1. Angle of congruence (α0) in degrees between the typical isomers, 1, 5, and 16, and the other isomers (Table I)

In Figure 1, a noncompact bundle of vectors is shown in the lower semicircle, isomers 4 and 8-22. In the middle plot, a compact cluster of radii is shown which contains isomers 3 and 5-7, and another cluster containing isomers 8-10, the diethylbenzenes. In the upper plot, a compact bundle composed of isomers 1 and 2 appears.

A second representation of the mass spectral data is shown in Figure 2. It is also constructed around the typical factors of the MS, but the intensities of the representative fragment ions at masses 119 (C9H11+), 105 (C8H9+), and 91 (C7H7+) are used, rather than the typical isomers. A compound which had a MS with an intensity at mass 119, but zero intensities at masses 105 and 91, would appear at corner A on the graph. If the zero intensities were at masses 119 and 91, it would appear at B. Zero intensities at masses 119 and 105 would make it appear at corner C. Figure 2 represents a special plotting technique which can be applied only to cases with three factors. The three-dimensional solution is represented on a plane using a triangular graph. The three coordinates plotted are the experimental intensities for each isomer measured at each of the representative masses. The graph requires that the sum of the three intensities for any one isomer be normalized to 100. This is not a significant restriction, since it implies that each isomer is weighted equally. In fact this makes the graph into a plot of the percent intensity at the three representative masses for each isomer.

Figure 2. Plot of the fragment ion intensity for each isomer at each of the three typical masses, 91, 105, and 119

There are five clusters of isomers in the triangular plot. One cluster appears in corner A, containing the isomers 11-22 (Table I). A second cluster appears in corner B, consisting of isomers 3 and 5-7. A third cluster appears in corner C with isomers 1 and 2. A cluster also appears along the AB edge with the diethyl isomers, 8-10. A fifth solitary point on the AC edge is isomer 4, tert-butylbenzene.
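The normalization behind the triangular graph of Figure 2 amounts to converting the three intensities of each isomer into percentages. The following is a minimal sketch of that step only; the function names and the input intensities are hypothetical, and the corner labels follow the A (mass 119), B (mass 105), C (mass 91) convention described above.

```python
def percent_intensities(i119, i105, i91):
    """Rescale the intensities at masses 119, 105, and 91 so that they sum to 100."""
    total = i119 + i105 + i91
    if total == 0:
        raise ValueError("all three intensities are zero")
    return tuple(100.0 * x / total for x in (i119, i105, i91))

def nearest_corner(coords, labels=("A", "B", "C")):
    """Label of the corner whose coordinate is the largest of the three."""
    return labels[max(range(3), key=lambda k: coords[k])]

# Hypothetical intensities for one isomer, not values from the paper.
point = percent_intensities(10.0, 10.0, 20.0)
corner = nearest_corner(point)
```

A spectrum with intensity only at mass 119 maps to (100, 0, 0), that is, corner A, whatever its absolute intensity, which is the sense in which each isomer is weighted equally.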

CLASSIFICATION BY PRINCIPAL FACTORS

When one looks for typical factors, a number of possible sets of typical factors must be tried and the fit of each compared, to arrive at the most representative set. A more direct way to derive a set of factors is desirable. Eigenanalysis of the product of the original data matrix by itself is such a method. The principal row and column factors are the eigenvectors of the Y'Y and the YY' matrices, respectively. The eigenvectors have the property that the first factor, i.e., the eigenvector associated with the largest eigenvalue, is oriented along the direction in space which accounts for the largest part of the variance of the data. The second eigenvector points in the direction which accounts for the maximum part of the variance unaccounted for by the first eigenvector, and so on. The eigenvectors, or principal factors, are especially useful for deciding how many factors are necessary to account for the variation of the measured spectra (15).

Three polar plots associated with the angles from each of the three eigenvectors are shown in Figure 3. A comparison of Figure 3 with Figure 1 shows that no experimental vector occupies the zero angle position in the lower left of the semicircle here. This is occupied by the appropriate eigenvector. A cluster containing 16 isomers appears in plot A, isomers 4 and 8-22. In plot B, a second cluster occurs, isomers 3 and 5-7. In plot C, two isomers are grouped, 1 and 2. The figure illustrates the character of the eigenvectors. The first eigenvector, A, is a kind of grand average of all of the MS. Many of the isomers do not differ significantly from the average. The second and third eigenvectors, B and C, represent significant differences from the average behavior. In no case does the eigenvector pass through an experimental MS.

Figure 3. Angle of congruence (α0) in degrees between the principal factors and the isomers

Figure 2 is a triangular plot in which the coordinates of the points representing the isomers are the intensities for each isomer measured at masses 119, 105, and 91. Figure 4 is a different kind of triangular plot. The column factors from principal component analysis are plotted. The column factors are an n × p matrix (22 × 3). The three columns of the matrix are each normalized to the square root of the eigenvalue, and each element is then squared. Each of the three elements in a row of the matrix is used as a coordinate of the isomer for the plot. The plot requires that the coordinates all be positive, hence the squaring. The sum of the three coordinates must be equal to 100, hence the row is normalized to sum to 100. The graph has a significant theoretical interpretation. It plots the percent variance associated with each factor for each isomer. In Figure 4, three clusters appear.
At corner A, isomers 4 and 8-22 occur; in corner B, isomers 3 and 5-7; and in corner C, isomers 1 and 2. The triangular plot contains the same information as the three polar plots. (The minor differences will be pointed out later.) This can be shown by folding the triangular plot in three different ways. If corner B is rotated toward corner C in Figure 4, sweeping the points along with the edge of the graph, then a linear representation results. When this is bent around the circumference of a circle, a polar plot very similar to polar plot A results. The isomers appear in the same order. When corner A is folded into C, the order of the B polar plot is reproduced. When A is folded into B, then the C polar plot is produced. The triangular graph, Figure 4, sums up the same information as the three polar plots of Figure 3. A single three-dimensional plot sums up three one-dimensional plots.

Figure 4. Plot of the square of the three loadings from principal component analysis of the transpose of the data matrix, Y

The comparison of Figures 3 and 4 points out some of the peculiarities of the polar plot. As we mentioned previously, the principal information present in the polar plot is the angle of similarity between the reference axis and any other vector. But other information is present, the similarity between any pair of vectors on the plot. This information is not as obvious as the similarity with the reference vector. Three cases can occur. The pair of vectors compared may be grouped near the angle zero, the reference axis in the lower left of the semicircle. Then the small angle between the two truly indicates that the two isomers have similar MS. In a second case, two vectors are at 90° from each other; the two isomers in such a case are, indeed, very different, independent, or orthogonal. The third case can be misleading: a pair of vectors close to each other, but at 90° from the reference axis. The small angle between the isomers does not imply that the two isomers are similar; they may be similar or different. They are both in a plane perpendicular to the reference axis, but within this perpendicular plane they may be parallel or perpendicular to each other. The perpendicular plane has two independent dimensions in the present case. If there are more than three factors, the perpendicular hyperplane may have more than two dimensions.
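The eigenanalysis and the percent-variance coordinates of Figure 4 can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' program: power iteration with deflation is swapped in as a simple way to obtain the leading eigenvectors of Y'Y, the function names are invented, and the 2 × 2 matrix is a toy input. The scores of a row of Y on a unit eigenvector already form a column whose length is the square root of the eigenvalue, so squaring them and rescaling each row to 100 gives the plotted coordinates.

```python
import math
import random

def matvec(m, v):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def leading_eigenpair(m, iterations=500, seed=0):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in m]
    for _ in range(iterations):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))
    return eigenvalue, v

def principal_factors(y, p):
    """First p eigenvalues and eigenvectors of Y'Y, obtained by repeated deflation."""
    cols = list(zip(*y))
    yty = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    pairs = []
    for _ in range(p):
        lam, v = leading_eigenpair(yty)
        pairs.append((lam, v))
        # Remove the factor just found so the next pass finds the following one.
        yty = [[yty[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return pairs

def percent_variance(y, pairs):
    """Squared score of each row of Y on each factor, rescaled so each row sums to 100."""
    result = []
    for row in y:
        squares = [sum(a * b for a, b in zip(row, v)) ** 2 for _, v in pairs]
        total = sum(squares)
        result.append([100.0 * s / total if total else 0.0 for s in squares])
    return result

# Toy data matrix (2 compounds x 2 masses), not data from the paper.
Y = [[3.0, 0.0], [0.0, 1.0]]
pairs = principal_factors(Y, 2)
coords = percent_variance(Y, pairs)
```

Because the scores are squared, the arbitrary sign of each eigenvector does not affect the coordinates. The percentages are relative to the variance captured by the p retained factors only.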

CLASSIFICATION BY PARTIAL FACTORS

Typical factor analysis and principal factor analysis resolve the set of mass spectra into row and column factors with special properties. In the former case, either the row or the column factors are taken directly from among the experimental measurements. In the latter, the principal factors are chosen so that each succeeding factor accounts for a maximum of the unaccounted variance of the data set. A third kind of factor is the partial factor (15). Partial row and column factors each preserve the properties of the original MS, i.e., all the numbers of both the row and column factors are greater than or equal to zero. Since there are no negative ionic intensities in a mass spectral measurement, none are allowed in the partial factors. Principal factors always show negative intensities in the second and subsequent factors. And while the typical factors taken from experiment cannot show negative intensities, the corresponding calculated row or column factors usually have negative coefficients. When the factors preserve the properties of the original measurements, we can treat the original set of MS as a mixture of a smaller number of simpler partial mass spectra. Each partial spectrum contributes its part to the “composite” experimental MS. The set of MS as a whole, and each spectrum individually, is treated as a weighted average of the partial factors. All the techniques useful for the study of MS can be applied to the smaller set of simpler partial mass spectra. They often seem to be associated with the fragmentation pattern of a single important ion present in the MS.

Figure 5. Angle of congruence (α0) in degrees between the partial factors and the isomers

Figure 5 is a polar plot of the results of the three partial factors produced by the varimax rotation of the principal factors (15). Note that the reference factors, the vector on the lower left edge of each semicircle, are no longer identical with one of the experimental vectors, as in the typical factor case (Figure 1). Nor is the reference vector far from the experimental cluster, as in the principal factor case (Figure 3). It is a characteristic of this kind of transformation that the partial vector tends to pass through a cluster of the experimental data. The results in the display are similar to those in the former polar plots.
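The varimax rotation mentioned above can be illustrated with Kaiser's classical pairwise-rotation scheme. The sketch below is an assumption-laden stand-in, not the procedure of reference 15: it implements raw varimax (no row normalization), the function name and defaults are invented, and nothing in it enforces the nonnegativity that the partial factors are required to have.

```python
import math

def varimax(loadings, sweeps=50, tol=1e-10):
    """Raw varimax rotation of a loading matrix (list of rows) by pairwise plane rotations."""
    a = [list(row) for row in loadings]
    n, p = len(a), len(a[0])
    for _ in range(sweeps):
        largest = 0.0
        for i in range(p - 1):
            for j in range(i + 1, p):
                u = [r[i] ** 2 - r[j] ** 2 for r in a]
                v = [2.0 * r[i] * r[j] for r in a]
                su, sv = sum(u), sum(v)
                num = 2.0 * sum(x * y for x, y in zip(u, v)) - 2.0 * su * sv / n
                den = sum(x * x - y * y for x, y in zip(u, v)) - (su ** 2 - sv ** 2) / n
                # atan2 picks the quadrant of 4*phi that maximizes the criterion.
                phi = 0.25 * math.atan2(num, den)
                largest = max(largest, abs(phi))
                c, s = math.cos(phi), math.sin(phi)
                for r in a:
                    r[i], r[j] = c * r[i] + s * r[j], -s * r[i] + c * r[j]
        if largest < tol:  # no pair needed a noticeable rotation in this sweep
            break
    return a

# Toy loadings: two rows lying at 45 degrees to both axes, not data from the paper.
rotated = varimax([[1.0, 1.0], [1.0, -1.0]])
```

In the toy case each row ends up along a single axis, which mirrors the tendency described above for a rotated factor to pass through a cluster of the data. The rotation is orthogonal, so the sum of squares of each row is unchanged.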
In the lower plot of Figure 5, a cluster of isomers 4 and 11-22 appears, with a suggestion of a separation of isomers 4 and 20-22 from the 11-19 group. Isomers 8-10, the diethyl isomers, seem to be an intermediate cluster. In the middle polar plot of Figure 5, the isomers 3 and 5-7 are a compact cluster, and 8-10 again form an intermediate group. Finally, in the upper plot, isomers 1 and 2, n- and isobutylbenzene, cluster. Isomer 4, tert-butylbenzene, may form a separate class.

Figure 6. Plot of the square of the three loadings from varimax rotation, Q (transpose of the data) analysis

The triangular plot, Figure 6, is prepared in the same fashion as Figure 4, but from the partial factors. It is, therefore, a summary of the three polar plots in Figure 5. The three polar plots are summed up except for the separation of isomers 20-22, the tetramethylbenzenes. On the graph they appear too close to be distinguished. The best summary of the isomer classification is found in this figure. Three kinds of behavior appear, the corners of the triangular graph. Each kind of behavior is close to but not identical with any one experimental MS. In addition to the corner clusters, two intermediate or derived clusters appear, isomers 8-10 and isomer 4. The corner clusters are primary clusters, each fairly closely identified with the partial factor. The edge clusters are linear combinations of the corner properties, and therefore are a kind of weighted average of two corner properties. No clusters appear in the interior of the plot, i.e., there are no clusters which depend on a linear combination of all three properties. In fact the definition of the properties is changed by FA in order to move the isomers in the interior toward an edge, and those on an edge toward a corner, insofar as this is possible. Clusters are constructed so that they depend on as few properties as possible.

From the results of Figure 6 and from subsequent results to be shown, one can suggest an identity for each of the corner properties into which the MS of the isomers have been resolved. The partial factor of corner A is a fragmentation pattern associated with C9H11+ and similar ions. It shows the property of isomers which lose a single carbon atom from the parent ion. Corner B represents the fragmentation pattern primarily associated with the C8H9+ ion. It represents the behavior of the isomers which lose two carbon atoms from the parent ion. Corner C represents the fragmentation pattern primarily associated with the fragmentation of the C7H7+ ion. It represents the behavior of isomers which lose a propyl group from the parent ion. The edge clusters represent those which lose both methyl and ethyl groups (the diethyl isomers, 8-10), and the single isomer which loses both a methyl group and a propyl group, tert-butylbenzene, isomer 4. These dynamic chemical properties are related to and limited by static structural properties. For example, ortho, meta, and para isomers always occur in the same cluster. But there is no one-to-one correspondence between cluster and the number and kind of side chains in the isomers.

CLASSIFICATION OF MASSES BY THE FACTOR ANALYSIS OF MS

The procedures which have been used to separate isomers into groups of compounds with the same sort of dissociation patterns can be used equally well to separate masses into groups which show the same sort of pattern across the isomers. The classification of compounds is useful for the identification of unknown MS and perhaps for the search of whole libraries of spectra. The classification of masses is useful for the study of the mechanisms of ionic dissociation within a fragmentation pattern. FA can determine how many different kinds of behavior are shown by all the fragment ions, and how many of these different kinds of behavior are shown by any one fragment ion in particular. The isomer classification problem is much simpler than the classification of the masses, in the present case at least. The isomers studied here are inherently similar. The patterns of the intensities of the fragment ions are more dissimilar. Twenty-two isomers measured at 20 masses were studied. These 20 masses include the ten largest peaks for all the isomers. Table III lists the chemical composition of the fragment ions which appear at the 20 masses studied. These masses are used to label the graphs which follow.

Table III. Formula of Ions Which Occur at a Certain Mass
Mass  Ion        Mass  Ion
134   C10H14+    91    C7H7+
133   C10H13+    79    C6H7+
120   C9H12+     78    C6H6+
119   C9H11+     77    C6H5+
117   C9H9+      65    C5H5+
106   C8H10+     51    C4H3+
105   C8H9+      43    C3H7+
103   C8H7+      41    C3H5+
93    C7H9+      39    C3H3+
92    C7H8+      27    C2H3+

Classification of Masses by Typical Factors. Figure 7 shows the angles between the three most representative masses and all the other masses. The angles were calculated by the application of Equation 4 to the transpose of the data matrix.
The greater complexity of the classification of masses is clear by a comparison of Figure 7 with Figure 1, the corresponding plot for the isomers. Many more intermediate vectors occur. The gaps which separated out clusters are not as apparent. Figure 8 is a triangular graph which plots the intensities at all masses of the three representative isomers, 16,5, and 1 (Table I). Figure 8 corresponds to Figure 2 of the isomers. Four clusters appear in the figure. Cluster A contains masses 133 (Cl&13+), 120 (CgH12+), 119 (CgHlr+), and 117 (CgHg+). Cluster B holds masses 106 (CsHlo+) and 105 (CsH9'). Cluster C consists of masses 93 (C7Hg+), 92 (C7Hs+), 91 (C7H7+), and 43 (C3H7+). A central cluster, ABC, contains the masses 134 (ClOH14+), 103 (CsH7+), 79 (CsH7+), 78 (CsHs+), 76 (CsH5'), 65 (C5H5+), 51 (C4H3+), 41 (C3H5+),39 (C3H3+),and 27 (C*H3+). Classification of Masses by Principal Factors. Figure 9 is a polar plot representing the angles between each of the principal factors and the experimental measurements at

Figure 7. Angle of congruence (α°) in degrees between the typical masses, 91, 105, and 119, and all the other masses (cf. Table III)

Figure 9. Angle of congruence (α°) in degrees between the principal factors and the masses


Figure 8. Plot of the fragment ion intensities for each mass for the three typical isomers, 1, 5, and 16 (Table I)

Figure 10. Plot of the square of the three loadings from principal component analysis

any one mass for all the isomers. It is the equivalent for masses of Figure 3. The clusters of the three polar plots are less compact and the gaps are less obvious than in Figure 3. Figure 10 sums up the three polar plots of Figure 9 in a single representation. The simplification brought about by the principal factors is quite remarkable. The central cluster present in Figure 8 has disappeared. Fourteen masses are clustered in the A corner, which represents the first eigenvector. Near corner B, masses 105 and 106 cluster. In corner C masses 43 and 92 occur. Two masses are spread along the AC edge, 91 and 93. Perhaps one might want to separate out mass 79 on the AB edge. Most of the masses congregate near the first eigenvector, which describes a kind of average behavior for the masses. The obscuring of differences characteristic of the first eigenvector is clearly shown by the graph.

Some discrepancies between the three-dimensional triangular graph and the three one-dimensional polar plots appear here for the first time. For example, in Figure 7, A, mass 133 (C10H13+) appears at an angle of about 50° from the first eigenvector. In the triangular plot, Figure 8, it appears close to the eigenvector, the corner A. In C of Figure 7, mass 43 appears at about 40° from the third eigenvector. In Figure 8, on the other hand, it is nearer to the C corner, the third eigenvector, than any other mass. The triangular graph is not precisely a summary of the three polar graphs.

ANALYTICAL CHEMISTRY, VOL. 48, NO. 6, MAY 1976
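One way to see how a polar plot and a triangular plot can place the same mass differently is to compute both quantities from the same loadings. The sketch below is an illustration under stated assumptions, not the authors' program: it takes the data about the origin (no mean-centering), obtains the principal factors from a singular value decomposition, scales the loadings by the square root of the eigenvalue, and rescales the three squared loadings of each mass to sum to 100. All function names and the toy intensities are hypothetical.

```python
import numpy as np

def loadings_on_principal_factors(data, n_factors=3):
    """Loadings of every column (mass) on the leading principal factors.

    data: rows = isomers, columns = masses, analyzed about the origin
    (no mean-centering; an assumption of this sketch).
    """
    data = np.asarray(data, dtype=float)
    # Rows of vt are eigenvectors of data.T @ data; the singular values s
    # are the square roots of the corresponding eigenvalues.
    _, s, vt = np.linalg.svd(data, full_matrices=False)
    return vt[:n_factors].T * s[:n_factors]  # shape (n_masses, n_factors)

def polar_angles_deg(data, loadings):
    """Angle between each original data column and each factor."""
    norms = np.linalg.norm(np.asarray(data, dtype=float), axis=0)
    # Eigenvector signs are arbitrary, so the absolute loading is used.
    cosines = np.clip(np.abs(loadings) / norms[:, None], 0.0, 1.0)
    return np.degrees(np.arccos(cosines))

def triangular_coordinates(loadings):
    """Squared loadings rescaled so the three coordinates sum to 100."""
    squared = loadings ** 2
    return 100.0 * squared / squared.sum(axis=1, keepdims=True)

# Made-up intensities for 6 isomers (rows) at 5 masses (columns).
toy = np.array([
    [100.0, 60.0, 5.0, 20.0, 8.0],
    [90.0, 55.0, 10.0, 25.0, 3.0],
    [20.0, 10.0, 100.0, 30.0, 12.0],
    [15.0, 12.0, 95.0, 35.0, 1.0],
    [30.0, 20.0, 25.0, 100.0, 9.0],
    [35.0, 18.0, 20.0, 90.0, 15.0],
])
loads = loadings_on_principal_factors(toy)
ang = polar_angles_deg(toy, loads)
tri = triangular_coordinates(loads)
print(np.round(ang, 1))
print(np.round(tri, 1))
```

The angles are referred to the full length of each original data column, whereas the triangular coordinates are referred only to the part of that column reproduced by the three retained factors. For a mass that the three factors reproduce poorly, the two descriptions can therefore diverge.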


Figure 11. Angle of congruence (α°) in degrees between the partial factors and the masses

Figure 12. Plot of the square of the three loadings from varimax rotation

These differences occur for a number of reasons. The separate polar plots are unweighted. The variance of the data is excluded from the graph by normalization. The variance of the data, on the other hand, is included in the triangular graph by the normalization of the factors to the square root of the eigenvalue. A second reason for the difference is the weighting inherent in the requirement that the coordinates of the triangular graph sum to 100. But the major reason for the difference is that the polar plots record angles, i.e., correlations, as they appear in the data. They are the angles between the factors and a column of the original data matrix. In the triangular graph, one is dealing with relationships within the recalculated data, the results of the factor analysis. Since the factors account for over 99% of the variance of the data, the agreement between the two plots is generally very good. But for masses whose variance is not well accounted for, such as masses 133 and 43 (15), the two plots may not correspond completely.

Classification of Masses by Partial Factors. Figure 11 displays the angle between the experimental intensities at each mass for all the isomers and the partial factors derived by the varimax rotation of the principal component factors. Three clusters appear, one in each of the polar plots. One cluster contains masses 120 (C9H12+) and 119 (C9H11+). Another group comprises masses 106 (C8H10+) and 105 (C8H9+). A final cluster is made up of masses 92 (C7H8+) and 91 (C7H7+). Other clusters are not clear. In Figure 12, three clusters are obvious; the cluster at the A corner contains masses 133, 120, 119, and 117. The B corner is occupied by masses 106 and 105. The C corner contains masses 92, 91, and 43. The other clusters, if they exist, are more difficult to identify. Perhaps two clusters on the AC edge contain mass 93 and mass 41, respectively. On the AB edge masses 103 and 77 are close to each other, and separately, mass 79. All the other masses are in the interior of the plot and not clearly grouped.

A detailed account of the results of the classification of masses by the factor analysis of the mass spectra of the C10H14 isomers would require the introduction of a whole series of new considerations. This will be done in another place. But some account of the results which appear in Figure 12 can be given quickly. Three corner clusters are fairly well defined, and we shall try to account for the property defined by each factor, and for the membership of each mass within that cluster.

The cluster in the A corner contains the ions 133 (C10H13+), 120 (C9H12+), 119 (C9H11+), and 117 (C9H9+). All these ions have the common predecessor, C10H14+ (134). Mass 119 is generally formed by β cleavage of the neutral, CH3. Mass 120 is formed by β cleavage plus the migration of a hydrogen atom. It is to be expected that these reactions should be highly coupled. α cleavage of a methyl neutral can also form mass 119, and certainly does so in the case of the tetramethylbenzenes. Loss of an H atom is strongly coupled with the loss of methyl groups by α cleavage. The presence of mass 117 in the group is no doubt due to the presence of its predecessor, mass 119, which forms mass 117 and hydrogen.

The B cluster contains masses 106 (C8H10+) and 105 (C8H9+). The two ions have a common progenitor, C10H14+. Mass 105 is formed primarily by β cleavage to lose an ethyl group. β cleavage of an ethyl group plus H migration accounts for the coupling of mass 106 with mass 105.

The C cluster comprises masses 92 (C7H8+), 91 (C7H7+), and 43 (C3H7+). β cleavage of a propyl group is the main source of mass 91. β cleavage and H migration account for the presence of mass 92. Mass 43 may seem to be anomalous, but it can be accounted for by its coupling with mass 91. The sum of masses 91 and 43 is 134, the mass of the parent ion.
Both mass 91 and 43 apparently are formed from the parent ion directly. In a certain number of cases, the electron partitions with the 91 mass species; more often it is carried off by the mass 43 fragment.

In conclusion, one can see that the FA of the MS can lead to insight into the classes of compounds present, and also into the classes of behavior of ions present within the spectra. The method results in a concentration of information, and a ready graphical summary of the information present in large groups of MS. Figures 6 and 12 in particular provide a kind of summary of the information present in the MS of the isomers of C10H14.
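The mass bookkeeping behind the three corner clusters can be checked with a few lines of arithmetic. The sketch below uses nominal masses only (C = 12, H = 1) and treats each fragment as the parent ion minus a neutral alkyl radical; the helper name and the dictionary of losses are illustrative and not part of the original work.

```python
import re

# Nominal (integer) atomic masses; only C and H are needed for C10H14.
NOMINAL_MASS = {"C": 12, "H": 1}

def nominal_mass(formula):
    """Nominal mass of a hydrocarbon formula such as 'C10H14'."""
    total = 0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += NOMINAL_MASS[element] * (int(count) if count else 1)
    return total

parent = nominal_mass("C10H14")

# Fragment ion left after loss of each neutral alkyl radical.
fragments = {
    neutral: parent - nominal_mass(neutral)
    for neutral in ("CH3", "C2H5", "C3H7")
}

for neutral, mass in fragments.items():
    # The companion ion one mass unit higher corresponds to cleavage
    # accompanied by migration of a hydrogen atom.
    print(f"loss of {neutral}: mass {mass}, with H migration: mass {mass + 1}")
```

Each corner cluster pairs a cleavage product with the ion one mass unit higher: 119 with 120, 105 with 106, and 91 with 92. Mass 43 is the complement of mass 91 with respect to the parent ion.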

LITERATURE CITED

(1) E. R. Malinowski, Doctoral Dissertation, Stevens Institute of Technology, Hoboken, N.J., 1961.
(2) P. T. Funke, E. R. Malinowski, E. E. Martire, and L. Z. Pollara, Sep. Sci., 1, 661 (1966).
(3) E. R. Malinowski and P. H. Weiner, J. Am. Chem. Soc., 92, 4193 (1970).
(4) P. H. Weiner, E. R. Malinowski, and A. R. Levinstone, J. Phys. Chem., 74, 4537 (1970).
(5) P. H. Weiner, Doctoral Dissertation, Stevens Institute of Technology, Hoboken, N.J., 1971.
(6) P. H. Weiner and E. R. Malinowski, J. Phys. Chem., 75, 1207 (1971).
(7) P. H. Weiner and D. G. Howery, Anal. Chem., 44, 1189 (1972).
(8) P. H. Weiner and J. F. Parcher, Anal. Chem., 45, 302 (1973).
(9) P. H. Weiner, J. Am. Chem. Soc., 95, 5845 (1973).
(10) J. C. Stover, Doctoral Dissertation, Fordham University, Bronx, N.Y., 1974.
(11) P. H. Weiner, H. L. Liao, and B. L. Karger, Anal. Chem., 46, 2182 (1974).
(12) J. B. Justice, Jr., and T. L. Isenhour, Anal. Chem., 47, 2286 (1975).
(13) E. McLaughlin Petersen, Doctoral Dissertation, Fordham University, Bronx, N.Y., 1975.
(14) R. W. Rozett and E. M. Petersen, Anal. Chem., 47, 1301 (1975).
(15) R. W. Rozett and E. M. Petersen, Anal. Chem., 47, 2377 (1975).
(16) R. M. Wallace, J. Phys. Chem., 64, 899 (1960).
(17) D. Macnaughtan, Jr., L. B. Rogers, and G. Wernimont, Anal. Chem., 44, 1421 (1972).
(18) N. Ohta, Anal. Chem., 45, 553 (1973).
(19) J. T. Bulmer and H. F. Shurvell, J. Phys. Chem., 77, 256 (1973).
(20) M. R. Anderberg, "Cluster Analysis for Applications", Academic Press, New York, N.Y., 1973.
(21) M. G. Kendall, "Discriminant Analysis and Applications", T. Cacoullos, Ed., Academic Press, New York, N.Y., 1973, p 179.
(22) J. T. Tou and R. C. Gonzalez, "Pattern Recognition Principles", Addison-Wesley, London, 1974.
(23) P. H. A. Sneath and R. R. Sokal, "Numerical Taxonomy", W. H. Freeman, San Francisco, Calif., 1973.
(24) P. C. Jurs and T. L. Isenhour, "Chemical Applications of Pattern Recognition", Wiley-Interscience, New York, N.Y., 1975.
(25) T. L. Isenhour, B. R. Kowalski, and P. C. Jurs, Crit. Rev. Anal. Chem., 4, 1 (1974).
(26) B. R. Kowalski, "Computers in Chemical and Biochemical Research", Vol. 2, C. E. Klopfenstein and C. L. Wilkins, Ed., Academic Press, New York, N.Y., 1974.
(27) T. L. Isenhour and P. C. Jurs, Anal. Chem., 43, 20A (1971).
(28) M. R. Anderberg, "Cluster Analysis for Applications", Academic Press, New York, N.Y., 1973, p 98.
(29) R. J. Rummel, "Applied Factor Analysis", Northwestern University Press, Evanston, Ill., 1970, p 461.

RECEIVED for review October 16, 1975. Accepted February 9, 1976.

Application of Alkali Ions in Chemical Ionization Mass Spectrometry

R. V. Hodges and J. L. Beauchamp

The Arthur Amos Noyes Laboratory of Chemical Physics, California Institute of Technology, Pasadena, Calif. 91125

A technique is described for obtaining mass spectra consisting solely of quasi-molecular ions formed by addition of an alkali ion to the sample molecule. Alkali ions are generated by thermionic emission externally to the enclosed ion source and injected into a reagent gas containing a trace amount of sample. Alkali ions initially bind to the reagent molecules and then are transferred to the sample in bimolecular reactions. Experimental conditions, choice of a reagent gas, and potential applications are discussed.

Chemical ionization (CI) mass spectrometry (1) is a form of mass spectrometry in which the sample is ionized in ion-molecule reactions with reagent ions. The sample is introduced into the mass spectrometer as a trace component in a reagent gas. The reagent ions are usually produced by electron impact ionization and ion-molecule reactions in the reagent gas. Generally CI mass spectra contain few ions and often exhibit major molecular or quasi-molecular ions, thus giving the molecular weight of the substance. This is particularly useful in cases where the electron impact mass spectrum does not yield a molecular ion.

An effective method to avoid fragmentation and produce a quasi-molecular ion is to bind a relatively inert reagent ion to the neutral sample molecule. For example, when Si(CH3)4 is used as a reagent gas, the principal reagent ion formed is Si(CH3)3+. This ion forms addition complexes with representative compounds containing a range of functional groups to give the abundant quasi-molecular ion (P + 73)+ (2). Halide ions, which can be generated conveniently from appropriate reagent gases in dissociative attachment processes, bind to a variety of functional groups under CI conditions to give the quasi-molecular ion (P + X)- (3, 4). Recent studies have demonstrated similar behavior for alkali cations (5-10). The use of alkali ions as reagent species for CI mass spectrometry has not, however, been previously explored. We report studies which delineate the experimental conditions and suggest applications for using alkali ions as reagent species in CI mass spectrometry.

The binding energy of a Lewis base, B, to an alkali ion, M+, is defined by the enthalpy change for Equation 1.

$$\mathrm{B{-}M^{+}} \rightarrow \mathrm{B} + \mathrm{M^{+}} \qquad D(\mathrm{B{-}M^{+}}) = \Delta H \tag{1}$$
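The link between the enthalpy change of Equation 1 and an equilibrium pressure can be illustrated with standard thermodynamics. The sketch below is not taken from the paper: it assumes ideal-gas behavior and a standard state of 1 atm (760 Torr), computes the free energy change as ΔG = ΔH − TΔS, and returns the pressure of B at which the ratio [B-M+]/[M+] is unity. The function name and default temperature are illustrative; the example call uses the 21 kcal/mol and 20 eu quoted in this section.

```python
import math

R_CAL = 1.987          # gas constant, cal mol^-1 K^-1
TORR_PER_ATM = 760.0   # assumed standard state: 1 atm

def unity_ratio_pressure_torr(delta_h_kcal, delta_s_eu, temperature_k=298.0):
    """Pressure of B (Torr) at which [B-M+]/[M+] = 1.

    delta_h_kcal: enthalpy change of B-M+ -> B + M+ in kcal/mol
    delta_s_eu:   entropy change of the same reaction in cal mol^-1 K^-1
    """
    delta_g_cal = delta_h_kcal * 1000.0 - temperature_k * delta_s_eu
    # Dissociation constant K = p(B) [M+] / [B-M+], in atm.
    k_dissociation_atm = math.exp(-delta_g_cal / (R_CAL * temperature_k))
    # The ratio is unity when p(B) equals K.
    return k_dissociation_atm * TORR_PER_ATM

pressure = unity_ratio_pressure_torr(21.0, 20.0)
print(f"{pressure:.1e} Torr")
```

A larger binding enthalpy lowers this pressure exponentially, which is why the more strongly bound alkali ions are better suited to detecting species present at very low partial pressure.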

Values for the enthalpy, entropy and free energy changes in Equation 1 for B = H2O and Ar are listed in Table I. The enthalpy and free energy changes decrease monotonically in proceeding from Li+ to Cs+. In the columns labeled p(H2O) and p(Ar) are listed the pressure of H2O and Ar at which the equilibrium ratio [B-M+]/[M+] is unity at 298 K. Clearly, the binding energy of the reagent ion to the sample molecule is an important consideration in choosing a reagent ion. The binding energy must be great enough to permit a significant stable population of the complex at the partial pressure of the species being analyzed. It is evident from the data in Table I that Li+ is the preferred alkali ion reagent for trace analysis. Using Li+ to detect a species with a partial pressure of Torr in a sample at lo-' Torr (1 ppm), it is desirable to have an enthalpy of binding in excess of 21 kcal/mol. This assumes a typical entropy change of 20 eu for Equation 1.

Recently, complexes of alkali ions with Lewis bases have been generated and studied using the techniques of ion cyclotron resonance spectroscopy (6-8). These complexes are formed in bimolecular reactions of Li+ and Na+ with certain alkyl halides. For example, Li+ reacts with isopropyl