Factor analysis of the mass spectra of the isomers of C10H14

Factor Analysis of the Mass Spectra of the Isomers of C10H14
Richard W. Rozett and E. McLaughlin Petersen
Chemistry Department, Fordham University, Bronx, New York, N.Y. 10458

The mass spectra of the 22 isomers of the benzenoid derivatives with the formula C10H14 are factor analyzed. Mathematical, principal, typical, basic, and partial factors of the mass spectra are defined. The principal factors are used to establish that three factors are sufficient to describe the basic features of the mass spectra. The three most representative isomers and masses are described as typical factors. Basic factors, fundamental molecular properties which account for the mass spectra, are searched for with some success. Partial factors are shown to be a great help for the study of mass spectra. The original set of mass spectra is analyzed into a set of component mass spectra which are less numerous, simpler, and more easily interpretable than the experimental measurements.

The mass spectra of large organic molecules often contain hundreds of intensities. Detailed structural and mechanistic information is present in the measurements, but it is not easily accessible. One reason for this is the lack of a convenient theory relating the intensities to fundamental molecular properties. In addition, the correlated and redundant measurements present within a given mass spectrum (MS) further inhibit the interpretation of the spectrum. Finally, in order to discover the properties of functional groups and classes of compounds, large numbers of mass spectra must be analyzed simultaneously. A computer-oriented method is needed. Factor analysis (FA) promises to be an objective, quantitative method which can remedy these problems. FA is a multivariate statistical technique which simultaneously analyzes multiple measurements on many compounds. The correlations between the mass spectra of many compounds, for example, are analyzed to determine how many independent properties are at work behind the measured variation, and how these independent properties are related to the original mass spectra. FA can eliminate redundant information and noise, economize on the storage requirements of the data base, extract significant information, determine class characteristics, and test hypotheses about the interpretation and origin of group properties. A methodological study has established the factor analytic techniques appropriate to mass spectra (1). Different types of data transformation have been investigated. We concluded that data scaling and centering techniques should take advantage of the real scale zero and the consistent scale unit found in mass spectral measurements. The invariance of the results under various forms of data selection, data transposition, and factor compression is assured only if the appropriate techniques and criteria are used (1). FA produces, in the most superficial sense, the mathematical factors of the data.
The usefulness of such factors for a chemist, and their interpretation, are not immediately obvious. The main concern of the present study is to develop meaningful interpretations of the factors of mass spectral data. FA has been applied to other kinds of data in the past, such as GC peak shapes (2), retention indices in GLC (3, 4), solvent shifts in NMR (5, 6), gas and solution acidities (7), and IR and UV measurements (8-11). But the work of other investigators is less helpful than might be expected because of the peculiarities of mass spectral data. In some ways, mass spectral data are an ideal field for FA. The real zero and uniform scale unit, the linearity of the intensities, the large quantity of data available, and the lack of theoretical insight all contribute to the usefulness of FA for the study of MS.

The test case and techniques developed in the previous study are employed here (1). The MS of the 22 isomers of the benzenoid derivatives with the formula C10H14 (Table I) provide a simple, varied, and well-studied case open to intercomparison and independent validation. The matrix of covariances about the origin is analyzed, i.e., the original zero and unit of the experimental measurements are retained. For MS, this procedure has been found most invariant to data transposition, and robust to the substitution of different criteria for data compression. Absolute intensities are used throughout, i.e., calibrated rather than relative intensities. This choice has been shown to be useful, but not crucial (1). The precision and accuracy of the 22 MS studied are not known. Undoubtedly they vary with the size of the peak, and the state of tune of the CEC 103C mass spectrometers used to take the measurements. A reasonable average estimate for the repeatability of such standard mass spectra is 1% of the peak height.
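As an illustration of what "covariance about the origin" means, the following minimal sketch contrasts cross-products taken about the experimental zero with those taken about the column means. The function name `cross_products`, the division by n, and the toy numbers are assumptions made for illustration; they are not taken from the authors' program.

```python
def cross_products(Y, center=False):
    """Return the m x m matrix of mean cross-products of the columns of Y.

    Y is a list of n rows (compounds), each holding m intensities (masses).
    center=False keeps the experimental zero (covariance about the origin);
    center=True subtracts each column mean first (covariance about the mean).
    Dividing by n (rather than n - 1) is an assumption of this sketch.
    """
    n, m = len(Y), len(Y[0])
    means = [sum(row[j] for row in Y) / n if center else 0.0 for j in range(m)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in Y) / n
             for b in range(m)]
            for a in range(m)]


# Toy data invented for illustration: 3 "compounds" measured at 2 "masses".
Y = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
about_origin = cross_products(Y)             # original zero retained
about_mean = cross_products(Y, center=True)  # column means removed
```

In the toy data the second column is exactly twice the first, so both matrices have rank one; only the matrix about the origin still carries the average size of the intensities.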

MATHEMATICAL FACTORS
Before looking at factors from a specifically mass spectral point of view, it will provide some perspective to recall three interpretations of the factors of a data set. For the sake of simplicity, we limit the following discussion to the case of orthogonal factors.

Factor as an Independent Property. The data matrix analyzed by FA consists of the mass spectra of n compounds, each recorded at the same m masses. A mass spectrum is a collection of the measurements of the abundance of all the ions with the same mass over charge ratio formed from a single neutral compound in a mass spectrometer. Because all 22 compounds in the present study have the same molecular weight, and contain only carbon and hydrogen, the measurement at any given mass represents the abundance of a unique ionic formula. The abundance of the ion formed at any mass from the neutral parent compound represents a measurement of a property of the compound: the proclivity of the compound to form this particular ion. The data matrix analyzed, therefore, expresses the measurement of m properties of n compounds. The measured properties are interdependent, correlated, and redundant. By FA, they are resolved into a small number of independent properties, the factors. The factors, nonetheless, reproduce the observed variation of the data effectively. FA determines the number of really different properties present in the data. The criteria for being "really different" were previously discussed under the title Factor Compression (1). The factors of mass spectral data, therefore, express the number of really different kinds of compounds or the number of different measurements which must be taken on the n compounds in order to define their behavior adequately. A factor is one of the independent properties at work behind the observed spectral variation.

ANALYTICAL CHEMISTRY, VOL. 47, NO. 14, DECEMBER 1975


Table I. The Twenty-Two Isomers of C10H14
1. n-Butylbenzene
2. Isobutylbenzene
3. sec-Butylbenzene
4. tert-Butylbenzene
5. 1-Methyl-2-propylbenzene
6. 1-Methyl-3-propylbenzene
7. 1-Methyl-4-propylbenzene
8. 1,2-Diethylbenzene
9. 1,3-Diethylbenzene
10. 1,4-Diethylbenzene
11. 1-Isopropyl-2-methylbenzene
12. 1-Isopropyl-3-methylbenzene
13. 1-Isopropyl-4-methylbenzene
14. 1,2-Dimethyl-3-ethylbenzene
15. 1,2-Dimethyl-4-ethylbenzene
16. 1,3-Dimethyl-2-ethylbenzene
17. 1,3-Dimethyl-4-ethylbenzene
18. 1,3-Dimethyl-5-ethylbenzene
19. 1,4-Dimethyl-2-ethylbenzene
20. 1,2,3,4-Tetramethylbenzene
21. 1,2,3,5-Tetramethylbenzene
22. 1,2,4,5-Tetramethylbenzene

Factor as a Pattern. From the point of view of a mass spectroscopist, perhaps it is most congenial to consider a factor as a pattern associated with a data set. The original set of fragmentation patterns can be reduced to a smaller number of patterns, each of which expresses a different property of the original set. The fragmentation pattern of a compound is a set of numbers in a certain order, a pattern. Each factor is also an ordered set of numbers which can be displayed as a bar graph, a pattern. Linear combinations of the factor patterns can regenerate the original data matrix within some predetermined degree of accuracy. The factors are a set of characteristic patterns, a set of empirical functions, in terms of which one can expand the original data set. The less numerous, simpler, and independent characteristic patterns contain the information of the original data in a more revealing and more compact way.

Factor as the Axis of a Coordinate System. Finally, one can look at the factor of a data set as the axis of a coordinate system appropriate for the representation of the data. This may be more obvious if we recall that the original mass spectra of the n compounds measured at m masses can be represented as n points in an m dimensional space. Each dimension of this space represents one of the masses at which the intensity of each compound is measured. This original coordinate system is very complex. Its axes are oblique or correlated.
There are surplus axes, i.e., more axes than are needed to span the space or represent the data adequately. FA analyzes this original coordinate system in order to determine the minimum number of axes needed to represent the variation of the data adequately. It then constructs an orthogonal coordinate system with the minimum number of axes. Each of these axes is a factor of the data, and represents an independent property of the data. Each axis is a pattern associated with the data. In more exact mathematical language, one constructs a function space in which one represents the data. Each axis of the space is an empirical basis function which represents an independent dimension of the data. In this coordinate system, the original measurements may be represented efficiently and simply.

Representation of the abundances of m ions from n compounds as n points in an m dimensional space has just been discussed. An equally valid expression of the data matrix presents m points in an n dimensional space. Each dimension or axis of this space represents one of the compounds whose mass spectrum is being analyzed. This second coordinate system can be factor analyzed in precisely the same way that the first coordinate system was analyzed. It can be replaced by an orthogonal coordinate system with a minimum number of axes. Each of these axes is a factor of the data, and represents a really different kind of compound. This second kind of factor is appropriately called a column factor because it has the same number of elements as a column of the original data matrix, n. The previously discussed kind of factor is appropriately called a row factor because it has the same number of elements as a row of the original data matrix, m. Together the row and column factors form a complete factor analysis of the data matrix. The collection of row factors is called the "loadings"; written as rows, they form the p by m matrix F', the transpose of the m by p loadings matrix F. The collection of column factors is traditionally called "scores" by factor analysts, the n by p matrix, S. The product of the scores by the transpose of the loadings, F', is the (transformed) n by m data matrix, Y (Equation 1).

\[ Y = SF' \tag{1} \]

It may be more enlightening to express the same relationship in another form (Equation 2).

\[ y_{ij} = \sum_{k=1}^{p} s_{ik} f_{kj} \tag{2} \]

y_ij, the intensity of the fragment ion with mass j produced from compound i, is a linear sum of terms, each of which is the product of a compound dependent term, s_ik, and a mass dependent term, f_kj. p of these terms are needed to adequately reconstitute the data from the compound and mass factors.
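The bilinear relationship of Equations 1 and 2 can be sketched in a few lines. The function name `reconstruct`, the variable `Ft` (standing for the p by m array of mass-dependent terms f_kj), and all numbers below are illustrative assumptions, not data from the study.

```python
def reconstruct(S, Ft):
    """Return Y with Y[i][j] = sum over k of S[i][k] * Ft[k][j] (Equation 2).

    S  : n x p list of compound-dependent terms (scores)
    Ft : p x m list of mass-dependent terms (row factors)
    """
    p, m = len(Ft), len(Ft[0])
    return [[sum(S[i][k] * Ft[k][j] for k in range(p)) for j in range(m)]
            for i in range(len(S))]


# Invented example: n = 3 compounds, m = 4 masses, p = 2 factors.
S = [[1.0, 0.0],    # compound made of pattern 1 only
     [0.0, 1.0],    # compound made of pattern 2 only
     [0.5, 0.5]]    # equal mixture of both patterns
Ft = [[10.0, 0.0, 2.0, 4.0],
      [0.0, 8.0, 2.0, 0.0]]
Y = reconstruct(S, Ft)
```

Each row of `Y` is a "spectrum" built from only two patterns, which is the sense in which p terms suffice to reconstitute the data.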

PRINCIPAL FACTORS
The factors of a data set are not unique. No one set of factors is satisfactory for every purpose. If factors are looked upon as the axes of a coordinate system, then it is clear that more than one coordinate system may be used. Coordinate systems may have different orientations in space. One may be oblique, another orthogonal, even though both span the same space. One of the main purposes of the present study is to develop meaningful interpretations of the factors of mass spectral data. From the point of view of the mass spectroscopist, we found it helpful to define four different kinds of factors: principal, typical, basic, and partial factors. They will be discussed in turn.

Principal (component) factors result from the eigenanalysis of the covariance matrix. These factors are the eigenvectors of the covariance matrix. Principal factors constitute the axes of an orthogonal coordinate system such that the axis associated with the largest eigenvalue lies in the direction which accounts for most of the variance of the data. The second eigenvector is orthogonal to the first and is oriented in the direction which accounts for the maximum of the residual variance, and so forth (1). The first eigenvector expresses the average or common behavior of the variables. Subsequent eigenvectors describe differences from the mean behavior. Principal factors are useful to determine the dimensionality of the space, i.e., the number of factors needed to reproduce the data matrix adequately. The criteria for factor compression were discussed previously (1). If the matrix of covariances about the origin is analyzed, as we do here, the two important criteria for factor acceptance, the mean square data reproduction error and the average eigenvalue criterion, both generate the same number of factors. Table II summarizes 24 different tests applied to the C10H14 case.
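The eigenanalysis and the average eigenvalue criterion described above can be sketched as follows. This is a minimal illustration that assumes NumPy is available; the function names, the division by n, and the toy data (mixtures of two invented patterns) are assumptions, and the sketch is not the authors' program.

```python
import numpy as np


def principal_factors(Y):
    """Eigenanalysis of the covariance about the origin of Y (n x m)."""
    C = Y.T @ Y / Y.shape[0]               # m x m, no mean subtraction
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    return eigvals[::-1], eigvecs[:, ::-1]  # reorder: largest eigenvalue first


def count_by_average_eigenvalue(eigvals):
    """Number of eigenvalues larger than the average eigenvalue."""
    return int(np.sum(eigvals > eigvals.mean()))


# Invented data: 6 "compounds" that are mixtures of 2 patterns over 5 "masses".
patterns = np.array([[9.0, 1.0, 0.0, 3.0, 0.5],
                     [0.0, 2.0, 8.0, 1.0, 4.0]])
weights = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1],
                    [0.1, 1.0], [0.9, 0.0], [0.0, 0.9]])
Y = weights @ patterns

eigvals, loadings = principal_factors(Y)
n_factors = count_by_average_eigenvalue(eigvals)
scores = Y @ loadings[:, :n_factors]         # coordinates on the kept axes
Y_hat = scores @ loadings[:, :n_factors].T   # data rebuilt from kept factors
```

Because the toy data contain exactly two patterns, two eigenvalues exceed the average and `Y_hat` reproduces `Y`; with real spectra the reproduction is only approximate and the choice of criterion matters, as the discussion of Table II shows.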
The twenty-four tests arise from the same data set by applying to it four different methods of data transformation, three different criteria for factor acceptance, and two forms of data transposition, R and Q (1). In the Table, the 24 tests are arranged in four different ways. The first column lists the number of factors which result from a certain method of data analysis. The second column lists the total number of ways a given number of factors was produced by all 24 methods. The third and fourth columns list the number of times a certain number of factors was produced by the 12 analyses of a direct (R) matrix, and the 12 analyses of the transpose (Q) matrix. The next three columns list the number of times a given number of factors was produced by three different criteria for factor acceptance. The first criterion accepts as a meaningful independent characteristic those dimensions of the data which have an eigenvalue (or variance) larger than the average eigenvalue, λ̄ (or average variance), of the m original data variables. The second criterion, λ (99%), accepts as many dimensions as are necessary to account for 99% of the variance of the data. The third criterion accepts those dimensions sufficient to account for the variance which exceeds an estimate of the average error of the data, e (1). The final four columns of Table II list the number of times a given number of factors was produced by the four different forms of data transformation. The capital R heading a column of the Table implies that each column of the data matrix was normalized to unity before the analysis. A subscript m implies that the column mean was subtracted from the column of the data matrix before the analysis. Co, therefore, represents the analysis of the unchanged data, Cm analysis of the covariance matrix, and Rm analysis of the correlation matrix, using the terms as they are defined by workers in Statistics. From the Table one can conclude that only the Co data transformation is invariant with respect to the choice of the criterion for factor acceptance, and to the transposition of the data. The constant conclusion for the Co transformation (covariance about the origin) is that three factors are sufficient to represent the mass spectral data. Two thirds of all the tests agree with the same conclusion.

Table II. Number of Factors Accepted according to Different Methods of Data Treatment
Column headings: No. of factors; Total; Transposition (R, Q); Criteria (λ̄, λ (99%), e); Transformations (Co, Cm, Ro, Rm)

Figure 1. Three values for each of the 20 masses in the mass spectra of C10H14 (cf. Table III). Loadings from R analysis, scores from Q analysis, principal component orientation, covariance about the origin. Each factor is normalized to 1

Table III. Experimental and Calculated Intensities for Isomers 1, 5, and 16

           Isomer 1          Isomer 5          Isomer 16
Mass     Exptl   Calcd     Exptl   Calcd     Exptl   Calcd
134      14.16   14.03     19.18   18.88     16.82   17.91
133       0.05    0.00      0.10    0.15      0.62    1.66
120       0.05    0.03      0.27    0.22      5.44    5.83
119       0.49    0.42      1.99    2.01     60.50   59.81
117       0.39    0.43      1.12    1.43      3.03    3.42
106       0.58    0.52      8.92    9.56      0.69    0.61
105       4.93    5.01     94.00   93.86      3.83    3.46
103       1.15    0.94      3.20    3.81      2.11    2.22
 93       2.22    2.15      0.35    0.31      0.62    0.12
 92      31.19   31.66      2.20    1.19      0.70   -0.58
 91      51.80   51.31      6.00    6.42      6.53    6.10
 79       1.60    1.62      4.14    4.93      1.66    1.86
 78       3.60    2.69      2.31    2.81      1.51    1.62
 77       3.42    2.91      1.61    8.38      4.52    4.68
 65       6.01    6.03      3.38    2.98      3.21    3.29
 51       4.30    4.37      4.35    5.15      3.60    4.06
 43       1.59    5.46      0.64    0.09      0.05   -0.09
 41       3.08    5.62      3.26    1.93      3.21    4.52
 39       6.01    6.94      1.13    6.63      5.69    6.32
 27       6.65    6.03      1.41    1.74      3.94    4.33

Table IV. Comparison of Experimental and Calculated Intensities for the 22 Isomers of C10H14 at Masses 119, 105, and 91 amu

            Mass 119          Mass 105          Mass 91
Isomer    Exptl   Calcd     Exptl   Calcd     Exptl   Calcd
 1         0.49    0.42       4.9     5.0      51.8    57.3
 2         0.69    0.71      0.54    0.49      58.2    59.8
 3         1.3     1.1       15.4    15.2      10.2     8.4
 4        51.1    56.5       0.38   -0.28      21.3    22.9
 5         2.0     2.0       94.0    93.9       6.0     6.4
 6         3.4     3.9       89.9    90.6       7.8     9.1
 7         1.1     0.9      100.0    99.1       6.5     6.3
 8        28.0    28.0       28.1    28.6       7.2     6.1
 9        30.9    30.6       28.5    28.3       5.9     5.1
10        34.4    34.0       25.8    25.5       1.3     6.7
11        14.7    13.4        2.1     1.1      12.2    11.3
12        64.1    63.2        2.1     1.4      10.2     9.8
13        13.6    12.3        2.0    0.96      11.9    10.9
14        54.5    54.3        6.1     6.2       6.8     7.1
15        61.6    67.4        6.1     6.0       9.1     9.4
16        60.5    59.8        3.8     3.5       6.5     6.1
17        19.5    18.3        3.6     2.9       8.4     8.3
18        53.4    53.9        4.8     5.3       6.7     1.4
19        56.3    56.4       l..l     7.8       7.9     8.1
20        50.0    52.6        1.5     3.3       6.5     8.5
21        55.1    58.7        2.0     4.0       1.0     9.4
22        41.4    44.2        1.4     3.2       6.0     8.2

Figure 1 shows the three eigenvectors or principal factors of the masses of the C10H14 mass spectral data. The isomer principal factors have been illustrated previously in Figure

7 of Ref. 1. The first principal factor describes the average behavior of all the compounds (masses). The subsequent factors represent significant differences from the average behavior. When the isomer principal factors (R scores in the terminology of Ref. 1) matrix multiply the mass principal factors (R loadings in the terminology of Ref. 1), the data are reproduced. Table III compares the reproduced data for 3 of the isomers with the original data. Table IV compares the reproduced and original intensities for three of the masses. As we shall see in the next section, these three masses and three compounds prove to be the most representative sets of typical factors.

TYPICAL FACTORS
Principal factors are useful to define the dimensionality of the mass spectral data, but the principal factors themselves may not be specially meaningful. Principal factors are simple in a mathematical sense, not necessarily in a mass spectral sense. For example, the first principal factor may describe the behavior common to a number of different functional groups all lumped together. This pattern may be no more interpretable than the original spectra. Typical factors, on the other hand, have a direct and simple meaning. Of all kinds of factors, they are the most closely related to the experimental measurements. After one has determined how many essentially different kinds of behavior are present in the spectra, one can try to pick out from the measured mass spectra the best representative of each kind of behavior. These are the typical factors. Target rotation permits us to do this in a quantitative fashion (5). The criterion used here to define the success of the procedure is the average deviation of the recalculated data from the experimental measurements. Other criteria are available and will be used later on.

Typical Isomers. In our study, three factors are present. Therefore, we must determine which three of the 22 isomers best represent the factors of the data. This can be done isomer by isomer, but such a procedure has the disadvantage that the three isomers which fit best individually need not be the best set when taken together. This occurs whenever two or more of the isomers which fit best represent the same property. But it can also happen even if the three isomers are the best individual representatives of the separate factors. It is necessary to test sets of three isomers simultaneously. This procedure provides a small deviation and a good fit only if the three factors are adequately represented by the three test vectors chosen.

Table V lists the five sets of three isomers which best typify the three factors of the data. The MS of isomers 1, 5, and 16, n-butylbenzene, 1-methyl-2-propylbenzene, and 1,3-dimethyl-2-ethylbenzene (Table I), reproduce the MS of all the isomers with an average absolute deviation from the original data of 0.71. Table V also reports the five sets of three isomers with the highest data reproduction error. Isomers 4, 21, and 22, tert-butylbenzene, 1,2,3,5-tetramethylbenzene, and 1,2,4,5-tetramethylbenzene, have an average error of 24. There are at least two reasons why certain sets of isomers perform badly. First, isomers 21 and 22 represent the same factor. This can be seen from the similarity of the representations of the two isomers in Figure 8 of Ref. 1. Therefore one of the three factors is not represented by the isomer set. Second, some isomers are maverick or unique. They have a significant amount of variation which is not shown by the other isomers. As we shall see, isomer 4, tert-butylbenzene, is such an isomer.

Table V. Average Reproduction Error for the Five Best Fitting and the Five Worst Fitting Sets of Three Isomers

Isomers         Av error
 1    5   16      0.71
 1    7   16      0.71
 2    5   16      0.72
 1    5   17      0.73
 1    5   19      0.73
 3    5   16      2
 1   14   16      3
 2    5    7     17
 5    6    7     23
 4   21   22     24

From a study such as that partially reported in Table V, one can conclude that certain isomers belong to the same class, i.e., typify the same factor. For example, in Table V, isomer 5 can be replaced by isomer 7 without any significant increase in the average error. In the same way, isomer 2 can substitute for isomer 1, and isomer 17 or 19 can replace isomer 16. From a complete study, one can conclude that the first factor is represented by isomers 1 and 2, n- and isobutylbenzene. The second factor is most similar to isomers 3, 5, 6, and 7, sec-butylbenzene, 1-methyl-2-propylbenzene, 1-methyl-3-propylbenzene, and 1-methyl-4-propylbenzene, respectively. The third factor is most similar to the MS of isomers 11 to 22, the isopropylmethylbenzenes, and all the tri- and tetra-substituted benzenes. The MS of some isomers represent combinations of two factors, rather than a single factor. The MS of isomers 8, 9, and 10, the diethylbenzenes (Table I), can be represented as a weighted sum of factors 2 and 3. Isomer 4, tert-butylbenzene, is a linear combination of factors 1 and 3. Figure 8 of Ref. 1 clearly shows this.

If one supposes that isomers 1, 5, and 16 best represent the factor patterns, one still may ask how well the different isomers and the different masses are reproduced in this coordinate system. Clearly, isomers 1, 5, and 16 are reproduced exactly. The isomers and masses represented most poorly are listed in Table VI. tert-Butylbenzene is the least well represented, along with the tetramethylbenzenes, 20, 21, and 22. These are the most untypical or unique isomers. If further factors were used, these isomers would be candidates for the role. The mass which is least typical is the parent peak, mass 134. Masses 41 and 92 are also poorly reproduced.

Table VI. Average Reproduction Error for the Five Worst Fitting Masses and Isomers When Isomers 1, 5, and 16 Are the Typical Factors

Isomers   Av error     Masses   Av error
  4         2.2          134      2.7
 22         1.7           41      1.3
 21         1.6           92      1.2
 20         1.4           91      1.0
  2         1.3          133      0.9

Typical Masses. Expanding the original measurements in terms of the MS of typical isomers is natural. The same procedure can be carried out with typical masses. The three most representative masses can be chosen in the same way that the representative isomers were chosen. Table VII shows the average deviation of the predicted data for the five best sets of three masses. Masses 91, 105, and 119 have an average absolute deviation of 0.64. Table VII also lists the 5 sets of masses which represent the data most poorly. Masses 105, 119, and 133 have an average error of 389. This large deviation has the same explanation as the poor fit among certain sets of isomers. With these

three masses, only two of the factors are represented, and mass 133 is a unique and atypical mass.

Table VII. Average Reproduction Error for the Five Best Fitting and the Five Worst Fitting Sets of Three Masses

Masses            Av error
 91   105   119      0.64
 92   105   119      0.67
 91   105   120      0.68
 65   105   119      0.87
 92    77   119      0.89
 91   119   120     32
 91    93   119     36
105   119   120     92
105   106   119    382
105   119   133    389

A complete study such as that partially represented by Table VII allows one to classify the masses into groups. From the Table, it is clear that mass 91 may substitute for mass 92, and the measurement at mass 120 may substitute for the measurement at mass 119, without seriously affecting the average error. Among the masses related to a single factor are 91 and 92 related to the first factor, 105 and 106 to the second, and 119 and 120 to the third. No simple combinations of two factor masses are found. All the rest have a contribution from all three factors. The classification of masses lacks the neat simplicity of the classification of isomers.

Table VIII lists the five isomers and masses which are least well represented when masses 91, 105, and 119 for the 22 isomers are used as factors. Obviously masses 91, 105, and 119 are perfectly reproduced. The least well-reproduced mass spectra are those of isomers 4, 20, 21, and 22 (Table I). The same isomers were poorly represented in the system of factors formed by the mass spectra of isomers 1, 5, and 16. Masses 134, 92, and 41 are poorly reproduced in the typical mass representation, just as they were poorly reproduced in the typical isomer representation.

Table VIII. Average Reproduction Error for the Five Worst Fitting Masses and Isomers When Masses 91, 105, and 119 Are the Typical Factors

Isomers   Av error     Masses   Av error
  4         2.1          134      3.4
 22         1.2           92      1.3
 21         1.1           41      1.2
 20         0.9          133      0.9
  3         0.8           39      0.7

Figure 2 shows the mass factors produced when the experimental intensities of the 22 isomers at masses 91, 105, and 119 are used as typical factors. Figure 3 shows the experimental intensities of isomers 1, 5, and 16 at the same masses. These two figures correspond to Figures 9 and 10 of Ref. 1. The significance of both of these similarities will be discussed in the conclusion.
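The search for the best set of typical isomers described in this section can be sketched as an exhaustive test of all sets of p spectra. The sketch below is a simplification and not the authors' target rotation procedure: it fits every spectrum directly by least squares as a linear combination of the chosen spectra, rather than working inside the space of the principal factors. It assumes NumPy is available; the function names and toy data are invented for illustration.

```python
from itertools import combinations

import numpy as np


def average_error(Y, rows):
    """Mean absolute deviation when every spectrum (row of Y) is fitted by
    least squares as a linear combination of the spectra listed in rows."""
    T = Y[list(rows)]                                    # p x m test spectra
    coeffs, *_ = np.linalg.lstsq(T.T, Y.T, rcond=None)   # p x n coefficients
    fitted = (T.T @ coeffs).T                            # n x m recalculated data
    return float(np.mean(np.abs(Y - fitted)))


def best_typical_set(Y, p):
    """Try every set of p spectra and return the set with the smallest error."""
    return min(combinations(range(Y.shape[0]), p),
               key=lambda rows: average_error(Y, rows))


# Invented data with two underlying patterns A and B over 4 "masses".
A = np.array([10.0, 0.0, 2.0, 1.0])
B = np.array([0.0, 8.0, 1.0, 3.0])
Y = np.array([A, 2 * A, B, A + B])   # rows 0 and 1 typify the same factor

same_factor_error = average_error(Y, (0, 1))   # both test spectra are pattern A
mixed_error = average_error(Y, (0, 2))         # one spectrum for each pattern
best = best_typical_set(Y, 2)
```

Rows 0 and 1 fit well individually but form a poor set together because they represent the same pattern, which is the behavior reported above for isomers 21 and 22.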

Figure 2. Three values for each of the 20 masses in the mass spectra of C10H14 (cf. Table III). Loadings from R analysis, target rotation, covariance about the origin. Suggested scores were the intensities for the 22 isomers at the masses 119, 105, and 91, in that order. Each factor is normalized to 1

Figure 3. Three experimental intensities for each of the 20 masses in the mass spectra of C10H14 (cf. Table III). The isomers are 16, 5, and 1 (Table I), in that order. The intensities for each isomer are normalized to 1

BASIC FACTORS
Principal factors are mathematically simple; typical factors are directly related to experimental measurements; basic factors are closely connected to fundamental molecular properties. A basic factor is a molecular property which accounts for the observed variation of the set of mass spectra. Just as we used typical isomers as test factors, we can see whether some molecular property such as melting point or polarizability fits well as a test factor. The target rotation form of hypothesis testing is appropriate here just as it was for the typical factors. But when investigating basic factors, it is helpful to look at factors individually, rather than in sets of three. It is unlikely that we shall chance upon all three factors simultaneously.

When testing sets of test factors simultaneously, we used the average error as a measure of the success of the test. For single test factors it is more appropriate to define a new measure of fit, the relative factor error (RFE). When a factor is tested, a least squares estimate of the test factor within the space of the principal factors is calculated. The deviation of the least squares factor from the test factor is defined as the factor error. The factor error is a function of the magnitude of the values of the test property. To obtain a measure of fit independent of the magnitude of the test property, one can divide the factor error by the length of the test vector, i.e., the root mean square of the elements in the test factor. This is the relative factor error. To provide a norm of comparison for the study of basic factors, we list in Table IX the RFE for some typical factors, five isomers and five masses separately in a three-factor space.

Geometrical, physical, and thermodynamic properties were used as test factors. The geometric factors are listed in Tables X, XI, and XII. They include Wiener path numbers (sum of the number of bonds between all pairs of carbon atoms) (12), the total number of side chains for an isomer, and the number of methyl, ethyl, and propyl side chains in an isomer, each tested separately. Other factors tested are the number of ways an isomer can form an ion of given mass by breaking a single bond, by cleavage of a carbon-carbon bond α to the ring, and by cleavage of a bond β to the ring. Table XII shows the average absolute data reproduction error for various sets of three geometric factors.
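The relative factor error defined above can be sketched as follows. The sketch assumes that the factor space is supplied as a set of orthonormal vectors (as the principal factors are), so that the least squares estimate is a simple projection, and it assumes that the factor error is the root mean square of the deviation; the text does not spell out that last detail. Function names and numbers are invented for illustration.

```python
from math import sqrt


def rms(v):
    """Root mean square of the elements of v."""
    return sqrt(sum(x * x for x in v) / len(v))


def relative_factor_error(test, basis):
    """RFE of a test vector with respect to the space spanned by basis.

    basis must hold orthonormal vectors, so the least squares estimate of
    the test vector is the sum of its projections on each basis vector.
    """
    estimate = [0.0] * len(test)
    for u in basis:
        c = sum(t * x for t, x in zip(test, u))            # projection coefficient
        estimate = [e + c * x for e, x in zip(estimate, u)]
    deviation = [t - e for t, e in zip(test, estimate)]
    return rms(deviation) / rms(test)


# Two orthonormal factors over four "compounds" (invented numbers).
basis = [[0.5, 0.5, 0.5, 0.5],
         [0.5, -0.5, 0.5, -0.5]]
inside = [3.0, 1.0, 3.0, 1.0]    # equals 4*basis[0] + 2*basis[1]
outside = [1.0, 0.0, 0.0, 0.0]   # only partly inside the factor space

rfe_inside = relative_factor_error(inside, basis)
rfe_outside = relative_factor_error(outside, basis)
```

A property that lies entirely within the factor space gives an RFE of zero, while `outside` gives an RFE of about 0.71; multiplying a test vector by a constant leaves its RFE unchanged, which is the purpose of the normalization.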


Table IX. Relative Factor Error (RFE) for Five Typical Isomers and Five Typical Masses

Isomer    RFE × 10³    Mass    RFE × 10³
7         3            105     3
19        4            119     4
5         4            120     6
16        7            91      11
1         8            77      17

Table X. Relative Factor Error for Various Geometric Test Factors

Geometric properties                        RFE × 10³
Weiner path number                          108
Total number of side chains                 241
Number of terminal
  methyl groups                             454
  ethyl groups                              646
  propyl groups                             598
  butyl groups                              319
Number of ways to obtain ion of mass
  119                                       152
  105                                       556
  91                                        411

Table XI. Relative Factor Error (RFE) for the Number of Ways to Obtain Masses 91, 105, and 119 by α and β Cleavage

Mass    Type of cleavage    RFE × 10³
119     α                   455
105     α                   646
91      α                   597
119     β                   167
105     β                   167
91      β                   103

Table XII. Average Absolute Deviation of Recalculated from Experimental Intensities for Various Sets of Three Geometric Test Factors

Test factor                                                                   Av error
Number of methyl, ethyl, and propyl side chains                               85.7
Total number of ways to obtain masses 91, 105, and 119 by all methods
  of cleavage of side chains                                                  8.7
Number of ways to obtain masses 91, 105, and 119 by α cleavage                86.0
Number of ways to obtain masses 91, 105, and 119 by β cleavage                3.1

Table XIII. Relative Factor Error (RFE) for Various Physical Properties

Physical properties    RFE × 10³
Melting point          175
Boiling point          124
Density                115
Refractive index       74

Table XIV. Relative Factor Error for Various Thermodynamic Properties of the Isomers of C10H14

Thermodynamic properties    RFE × 10³
ΔHf° (298°, g)              187
S° (298°, g)                40
ΔGf° (298°, g)              46
-log Kp (g)                 65
Cp° (298°)                  54
ΔHv                         74
ΔHc                         7
Ionization potential        6

Table XIII lists the RFE for various physical properties of the isomers (13), and Table XIV shows the RFE for various thermodynamic properties of the isomers, including the heat capacity, Cp, the standard enthalpy of formation, ΔHf°, the entropy, free energy, and equilibrium constant for the formation of the compound from its elements (14). The heat of vaporization, ΔHv, the heat of combustion, ΔHc (15), and the ionization potential (16) of the neutral isomers were also used as test vectors. If one compares the fit of the typical factors reported in Table IX with the fit of the basic factors reported in Tables X to XIV, it becomes clear that no basic factor fits as well as the typical factors. The RFE of the typical factors ranged from 0.003 to 0.017; the RFE of the geometric factors in Tables X and XI range from 0.11 to 0.65. The physical property test factors of Table XIII range from 0.074 to 0.175. The RFE of Table XIV, the thermodynamic properties, seem more promising. While ΔHf° has an RFE of 0.187, the other thermodynamic properties range from 0.006 to 0.065, within the range of the typical factors. But this interpretation of Table XIV is misleading. Of the properties tested in the Table only ΔHf° had the complete 22 values. The other properties in the Table had at most 7 values. Since any vector with 3 values can fit perfectly in a 3-dimensional space, the fewer values in the test factor, the better the apparent fit. The RFE is a function of the number of missing values in the test factor. The low values of the thermodynamic properties, apart from ΔHf°, are merely an indication that two thirds of the values of the test factor are missing.

The most promising basic factor is shown in Table XIII. Beta cleavage to produce the three most important fragment ions provides a fit about five times worse than the typical factors. No adequate set of basic factors has been identified, although the number of ways to produce the major ions by β bond cleavage is certainly indicated as the nearest thing to a basic factor identified.

We investigated the reason for this failure. The functional form of the data and the test factor was varied. We tried the various combinations of the logarithm to linearize the interdependence of data and factor (17), but the RFE does not improve. It may be that properties of the ions, rather than properties of the parent neutral molecules, must be used. Perhaps vibrational frequencies, when they are available, will prove to be a basic factor, as one expects from the Quasi-Equilibrium Theory of Mass Spectra (QET) (18). Finally, the failure to find adequate basic factors may be a consequence of the kinetic character of mass spectra. A mass spectrum results from the sampling of a multicomponent system of ions whose abundances are the result of a complex set of interconnected reaction rates. This kinetic system is sampled at one time only.
If one wants to derive basic factors of MS, it may be necessary to measure the MS as a function of residence time in the ion source, separate out the time dependence by an O, P, S, or T form of factor analysis (Figure 3 of Ref. 1), and then analyze the time-independent mass spectra. Such a higher order FA is possible, but it presents an entirely new set of problems which we will not look into here. The significance of basic factors is fairly obvious. With them, one could derive mass spectra from fundamental molecular properties. One could predict mass spectra or missing values of a mass spectrum. In fact, if one could identify all the basic factors of mass spectra, one could construct an empirical theory of mass spectra which would sum up all the information present in mass spectral measurements, and provide the interface between a fundamental derivation of MS and mass spectral libraries. This empirical theory of MS has not been realized as yet. Only future work can establish whether it is possible.
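The effect of missing values on the RFE, noted in the discussion of Table XIV, can be checked numerically. In this sketch, which uses random numbers rather than any data from the paper, a 22-element test vector is fitted in a three-factor space twice: once with all values present and once with only three values known. The helper name `rfe_known_values` and the use of NaN to mark missing entries are assumptions made for the illustration.

```python
import numpy as np

def rfe_known_values(factors, test):
    """RFE computed only over the entries of `test` that are not NaN."""
    known = ~np.isnan(test)
    F, t = factors[known], test[known]
    # Least squares fit restricted to the rows with a known test value.
    coeffs, *_ = np.linalg.lstsq(F, t, rcond=None)
    residual = F @ coeffs - t
    return np.sqrt(np.mean(residual ** 2)) / np.sqrt(np.mean(t ** 2))

rng = np.random.default_rng(1)
factors = rng.normal(size=(22, 3))      # 22 items, 3 factors (random stand-ins)
test_full = rng.normal(size=22)         # a property unrelated to the factors

# Keep only three of the 22 values; the rest are treated as missing.
test_sparse = np.full(22, np.nan)
test_sparse[[0, 7, 15]] = test_full[[0, 7, 15]]

rfe_full = rfe_known_values(factors, test_full)
rfe_sparse = rfe_known_values(factors, test_sparse)
```

With three known values and three factors the fit is exact, so the near-zero RFE of the sparse vector says nothing about the property itself, while the same property with all 22 values present fits poorly.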

PARTIAL FACTORS

Principal factors help define the dimensionality of the set of mass spectra; typical factors pick out spectra representative of the factor properties; partial factors provide analytical insight into the set of mass spectra. If one looks on the set of mass spectra analyzed as a kind of mixture of various components, then FA can look for the components or elements of the mixture. Partial factors or component factors are these constitutive parts of the mixture, i.e., linear combinations of the parts which regenerate the observed whole of the mass spectral data set. In one sense, every kind of factor defined thus far is a “component” factor, i.e., linear combinations of the factors reconstitute the observed spectra. We distinguish partial factors from other component factors in several ways. First, partial factors must preserve the character of the original spectra, e.g., they must have positive intensities, unlike principal factors which necessarily have negative elements in the second (difference) factor. Partial factors preserve the character of the measured MS as the components of a mixture preserve the characteristics of the mixture. Preserving the character of the original spectra ensures that the methods used to investigate MS in general can be used to analyze partial factors also. Second, unlike typical factors, partial factors are not limited to observed variables. Typical factors will analyze the observed MS into fundamental classes only if one is lucky enough to find a pure example of each basic type among the data. There is no guarantee that one will find the pure components among the sampled spectra. To put it another way, if there are p typical factors, then necessarily p of the mass spectra remain unanalyzed into simpler elements. On the other hand, if one removes the restriction that the factor must be found among the observed spectra, then all the spectra may be analyzed into simpler elements.
A partial factor is, therefore, a sort of pure and perhaps unobserved typical factor. The set of mass spectra as a whole and each spectrum individually is treated as a weighted mixture of the partial factors. In specifically mass spectral terms, we resolve the observed spectrum into a set of partial mass spectra. The set of all the observed fragmentation patterns is resolved into a smaller set of simpler fragmentation patterns. Each partial or component mass spectrum may be associated with a particular ion decomposition pathway, perhaps associated with the dissociation of a single important ion. Each partial mass spectrum may define the decay of structurally different parent ions, i.e., describe the mass spectrum of geometrically and electronically non-equivalent parent ions. Partial factors can be generated by the varimax rotation of the principal factors. Varimax rotation is an analytical method, like the principal factor method and unlike the target rotation method which generates typical and basic factors. Equation 3 expresses the function which must be maximized to perform the varimax transformation (19).
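A common form of the normalized varimax criterion, written with the h and f symbols defined in the next paragraph, is given here for orientation. The index ranges (n variables, p factors) and the overall normalization are assumptions and may differ in detail from the authors' Equation 3.

```latex
V = \frac{1}{n^{2}} \sum_{j=1}^{p}
    \left[ \, n \sum_{i=1}^{n} \left( \frac{f_{ij}}{h_{i}} \right)^{4}
    - \left( \sum_{i=1}^{n} \frac{f_{ij}^{2}}{h_{i}^{2}} \right)^{2} \right]
```

Here f_{ij} is the loading of variable i on factor j and h_{i} is the weight of variable i. Setting every h_{i} equal to 1 gives the unweighted ("raw") form of the criterion.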

The h's are weights for each variable. The f's are the factor loadings, i.e., numbers relating the factors to the variables originally measured (cf. Ref. 1 for terminology). In varimax rotation, intermediate loadings are changed into high and low loadings insofar as this is possible. A mixed dependence on several of the original variables becomes a strong dependence on one and a weak dependence on others. Several other methods for producing partial factors are known, e.g., special methods of target rotation, and oblique analytical methods, but we will not discuss them here.

Figure 4. Three values for each of the 20 masses in the mass spectra of C10H14 (cf. Table III). Loadings from R analysis, varimax rotation, covariance about the origin. Each factor is normalized to 1. (Horizontal axis: masses, marked at 27, 43, 77, 91, 105, 119, and 134.)

Partial Factors of C10H14 Isomers. Figure 4 shows three factors derived by the varimax rotation of the mass factors. Figure 8 of Ref. 1 shows the factors derived by the varimax rotation of the isomer factors. The first partial factor in Figure 4 has a base peak at mass 91, corresponding to the ionic formula C7H7+, and to the loss of the neutral C3H7, mass 43. Associated with the base peak is mass 92, corresponding to the ion C7H8+, and to the loss of the neutral C3H6, mass 42. Factor 1 is associated with the isomers n-butyl- and isobutylbenzene. Cleavage of the C-C bond β to the ring accounts for the general feature of the base peak; β cleavage together with H atom rearrangement accounts for mass 92. Also peculiar to this factor is mass 43, corresponding to the ion C3H7+, and to the neutral C7H7. This peak is clearly associated with the base peak ion. The ionization potential of C3H7 must not be very much greater than the ionization potential of C7H7, so a certain fraction of the electrons is extracted from the smaller fragment. The first factor is also associated with tert-butylbenzene, but we will discuss that case with the third factor.

The second partial factor in Figure 4 is dominated by mass 105 due to the formula C8H9+ and corresponding to the loss of C2H5. Associated with this base peak is a higher than average intensity at mass 106, C8H10+ and C2H4 loss. This partial factor is related to isomers 3, 5, 6, and 7, i.e., sec-butyl, and the methylpropylbenzenes. β cleavage of each will produce the base peak. β cleavage followed by H atom transfer will result in mass 106. The second factor is also related to isomers 8, 9, and 10, the diethylbenzenes, but they will be discussed with the third factor.
The third factor of Figure 4 has its largest intensity at mass 119, corresponding to the presence of C9H11+ and the loss of CH3. Associated with it is the peak at mass 120 due to C9H12+ and CH2 loss. Factor 3 is closely associated with isomers 11-22, the isopropylmethylbenzenes, the ethyldimethyl isomers, and the tetramethyl isomers. β cleavage of the isopropylmethyl isomers and the ethyldimethyl isomers will produce the base peak. β cleavage together with hydrogen atom rearrangement will produce mass 120. β cleavage followed by loss of H2 from the ion results in mass 117, ion C9H9+, which is also associated with the third factor. In the tetramethyl isomers, 20-22, no β C-C bond is present. Cleavage of the β C-H bond produces mass 133, which is associated with this factor. α cleavage of a methyl group is a necessary path to mass 119 for these isomers. This variation in mechanism is presumably the reason why the tetramethyl isomers are non-typical in the study of typical factors.
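The varimax rotation that produces partial factors such as those of Figure 4 can be sketched in a few lines. The version below is the raw (unweighted) criterion, i.e., every weight h is taken as 1, maximized by Kaiser's classical pairwise planar rotations. This is a standard textbook procedure substituted here for illustration, not necessarily the authors' program. The function names and the small loading matrix are hypothetical.

```python
import numpy as np

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings (raw varimax)."""
    sq = np.asarray(loadings, dtype=float) ** 2
    return float(np.sum(np.mean(sq ** 2, axis=0) - np.mean(sq, axis=0) ** 2))

def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Rotate an (n variables x p factors) loading matrix toward simple structure."""
    L = np.array(loadings, dtype=float)
    n, p = L.shape
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for j in range(p - 1):
            for m in range(j + 1, p):
                x, y = L[:, j].copy(), L[:, m].copy()
                u, v = x ** 2 - y ** 2, 2.0 * x * y
                A, B = u.sum(), v.sum()
                C, D = np.sum(u ** 2 - v ** 2), 2.0 * np.sum(u * v)
                # Angle that maximizes the criterion for this pair of factors.
                phi = 0.25 * np.arctan2(D - 2.0 * A * B / n,
                                        C - (A ** 2 - B ** 2) / n)
                c, s = np.cos(phi), np.sin(phi)
                L[:, j] = x * c + y * s
                L[:, m] = -x * s + y * c
                largest_angle = max(largest_angle, abs(phi))
        if largest_angle < tol:   # no pair needs further rotation
            break
    return L

# Hypothetical loadings: a clean two-factor pattern mixed by a 30 degree rotation.
simple = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
theta = np.radians(30.0)
mix = np.array([[np.cos(theta), np.sin(theta)],
                [-np.sin(theta), np.cos(theta)]])
mixed = simple @ mix
rotated = varimax(mixed)
```

In this toy case the mixed loadings are intermediate in size on both factors, and the rotation restores a pattern in which each variable loads strongly on one factor and not at all on the other, which is the behavior described in the text.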


Table XV. Angle in Degrees through Which Each of the Three Factors Must Be Rotated to Coincide for Principal Factors (PC), Typical Factors (TR), Partial Factors (VM), and Experimental Factors (EX)

            R analysis                   Q analysis
Factor      PC-EX    TR-EX    VM-EX      PC-EX    TR-EX    VM-EX
1           18.2     10.6     3.2        19.9     12.9     3.7
2           26.7     4.9      6.1        27.1     5.0      4.4
3           20.4     6.0      8.4        32.5     19.3     12.3

Isomer 4, tert-butylbenzene, and isomers 8, 9, and 10, the diethylbenzenes, are each related to two factors. The former isomer is related to factors 1 and 3; the latter to factors 2 and 3. β cleavage of tert-butylbenzene accounts for its dependence on factor 3. But, in addition, another decomposition pathway leading to the loss of a total of three carbon atoms accounts for its dependence on factor 1. The mechanism of this dissociation has been studied extensively (20). The diethyl isomers, 8, 9, and 10, depend on factor 3 through β cleavage. But a second dissociation pathway is present which results in the loss of a total of two carbon atoms and accounts for their dependence on factor 2. Single α cleavage, double β cleavage together with H atom rearrangement, or a concerted double β cleavage together with H atom rearrangement are possible. Loss of C2H5 is slightly favored energetically over the loss of CH3, and is obviously highly favored over the simultaneous loss of CH3 and CH2 (21).

CONCLUSIONS

The test case chosen for this study has several special characteristics. The MS of the 22 isomers are more similar than a random collection of MS. Because of the elemental simplicity of the hydrocarbons, the intensity measured at each mass corresponds to the abundance of a definite ionic formula. Because all the isomers have the same molecular weight, detection of a definite ionic formula uniquely implies the loss of the same neutral fragment. Analysis of the abundances of ionic fragments is also an analysis of neutral fragments. All the compounds analyzed are benzenoid. The stability of the ring leads to mass spectra with a few strong features. Aliphatic hydrocarbons, or compounds with a number of different functional groups, should not show such a simple structure.

Table XV compares the principal factors (PC), the typical factors (TR), and the partial factors (VM) by comparing them all to experimental factors (EX). The different kinds of factors can be looked on as a coordinate system with different orientations. The angle necessary to rotate a particular solution into the experimental measurements is listed in the Table. The three experimental factors of R analysis are, respectively, isomers 16, 5, and 1, i.e., the data shown in Figure 3. The three experimental factors of Q analysis are, respectively, the measurements at mass 119, 105, and 91, i.e., Figure 10 of Ref. 1. The three columns under R analysis in the Table briefly compare Figures 1 and 3, Figures 2 and 3, and Figures 4 and 3, respectively. The three columns under Q analysis compare Figures 7 and 10, Figures 9 and 10, and Figures 8 and 10 of Ref. 1. From the Table, one can see that the principal factors in general are most different from the experimental factors, the typical factors are at an intermediate position, and the partial factors are quite close to the most representative masses and isomers; an average of six degrees separates them. Partial factors derived by varimax rotation turn out to be very closely related to the most typical isomers and masses.

FA of this set of 22 mass spectra analyzed the experimental measurements into a set of three patterns by eliminating noise, redundant information, and unique features. Typical and partial factors were the most productive. FA of the MS provided a synthesis of the original spectra. Clusters and a structure of classes were identified. Isomers (and masses) were compared to each other, and interrelated in an objective and comprehensive way. The factor patterns provide a concentrated summary of the information present in the MS. FA simplified the MS in a number of ways. Most obviously, it derived three patterns from 22. It isolated mass properties from compound properties. It separated the complex original patterns into a set of independent properties. The partial patterns are simpler and more open to interpretation than the original spectra. Finally, the technique of testing hypotheses in a quantitative fashion allows one to identify the actual factors and reject false factors. A quantitative method of factor interpretation is available. Partial factorization of mass spectra provides a powerful tool for the simultaneous study of large groups of MS.
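The text does not spell out how the angles of Table XV were computed. One simple reading, assumed here, is the angle between a derived factor and the corresponding experimental factor, each treated as a vector of loadings. The function name and the example vectors are hypothetical.

```python
import numpy as np

def angle_degrees(a, b):
    """Angle in degrees between two factor vectors of equal length."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip guards against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))

# Made-up example: a factor tilted 20 degrees away from a reference factor.
reference = np.array([1.0, 0.0, 0.0])
tilted = np.array([np.cos(np.radians(20.0)), np.sin(np.radians(20.0)), 0.0])
separation = angle_degrees(reference, tilted)
```

Because the cosine is normalized by both vector lengths, the angle does not depend on how each factor is scaled, so factors normalized in different ways can still be compared.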

LITERATURE CITED

(1) R. W. Rozett and E. M. Petersen, Anal. Chem., 47, 1301 (1975).
(2) D. Macnaughtan, Jr., L. B. Rogers, and G. Wernimont, Anal. Chem., 44, 1421 (1972).
(3) P. H. Weiner and D. G. Howery, Anal. Chem., 44, 1189 (1972).
(4) P. H. Weiner and J. F. Parcher, Anal. Chem., 45, 302 (1973).
(5) P. H. Weiner, E. R. Malinowski, and A. R. Levinstone, J. Phys. Chem., 74, 4537 (1970).
(6) P. H. Weiner and E. R. Malinowski, J. Phys. Chem., 75, 1207 (1971).
(7) P. H. Weiner, J. Am. Chem. Soc., 95, 5845 (1973).
(8) J. J. Kankare, Anal. Chem., 42, 1322 (1970).
(9) Z. Z. Hugus, Jr., and A. A. El-Awady, J. Phys. Chem., 75, 2954 (1971).
(10) N. Ohta, Anal. Chem., 45, 553 (1973).
(11) J. C. Stover, Doctoral Dissertation, Fordham University, New York, 1974.
(12) J. Farquharson and M. Sastri, Trans. Faraday Soc., 33, 1474 (1937).
(13) "Handbook of Chemistry and Physics", 49th ed., Chemical Rubber Co., Cleveland, Ohio, 1968-69.
(14) D. R. Stull, E. F. Westrum, Jr., and G. C. Sinke, "The Chemical Thermodynamics of Organic Compounds", Wiley and Sons, New York, 1969, pp 373 722.
(15) J. D. Cox and G. Pilcher, "The Thermochemistry of Organic and Organometallic Compounds", Academic Press, New York, 1970, pp 170-173.
(16) V. I. Vedeneev et al., "Bond Energies, Ionization Potentials and Electron Affinities", Edward Arnold, London, 1966, p 159.
(17) P. H. Weiner and J. F. Parcher, Anal. Chem., 45, 302 (1973).
(18) H. M. Rosenstock, M. B. Wallenstein, A. L. Wahrhaftig, and H. Eyring, Proc. Natl. Acad. Sci. USA, 38, 667 (1952).
(19) R. J. Rummel, "Applied Factor Analysis", Northwestern University Press, Evanston, Ill., 1970, p 391.
(20) H. M. Grubb and S. Meyerson, "Mass Spectrometry of Organic Ions", F. W. McLafferty, Ed., Academic Press, New York, 1963, pp 453-527.
(21) E. McLaughlin Petersen, Doctoral Dissertation, Fordham University, 1975.


RECEIVED for review May 27, 1975. Accepted September 2, 1975.
