Application of pattern separation techniques to mass spectrometric

Table V. Partial Mass Spectra and Mass Shifts of the Most Significant TMS-Containing Ions in the Mass Spectra of VI1Ia-c VIIIC VIIIa VIIIb [3P,17P(3P-TMS-do; [3P,17B(TMS-ddzI 17P-TMS-dg) (TMS-do)rl 129 [34P 138 129 131 [24] 140 131 157 [loo] 166 166 185 [23] 185 185 211 1261 211 211 226 [20] 226 226 241 [19] 241 241 253 [39] 253 253 267 [59] 267 267 282 [80] 282 282 316 [lo] 325 316 343 [24] 352 343 (1)b 352 (3)* 372 [23] 381 372 (20) 381 (1) 433 [ll] 451 442 447 [3] 462 456 462 [2] 480 47 1 a Figures in brackets refer to the relative intensities of the ions of indicated mass. b Figures in parentheses refer to the ratios of the indicated peaks.

the selective TMS-d9 labeling technique involves the loss of a methyl group from either the M+. or the (M-90)+. ions to give the peaks at m/e 433 and m/e 343, respectively. The spectra of the labeled derivatives of the 17α-methyl (VIIb,c; Figure 2b,c) and 17α-ethyl (VIIIb,c; Table V) compounds provide sufficient isotope or substituent labeling to show that 40% of the methyl loss from M+. originates from the 3β-TMS function and the remaining 60% represents elimination of the 17α-methyl group. In addition, the spectrum of the selectively labeled derivative VIIc (Figure 2c) clearly shows that the (M-90)+. ion in the spectrum of VIIa consists of two different species, each eliminating a methyl group with different relative preference. This is indicated by the 10:1 ratio of m/e 352 : m/e 367 as compared to a 1:4 ratio of m/e 343 : m/e 358 (Figure 2c). The (M-90)+. ion originating from elimination of trimethylsilanol from the 3β position exhibits favorable loss of the 17α-methyl group. By contrast, the loss of methyl from the other (M-90)+. ion is much less favorable and is distributed between the 17α and the angular methyl groups.

CONCLUSIONS

The data presented above show that it is possible to prepare mixed TMS-d0 and TMS-d9 steroidal ethers in which the labeled trimethylsilyl moiety occupies a specific position. Unlike previous investigations, which employed an on-column exchange method, we were able to prepare selectively labeled TMS-d9 derivatives in solution by using standard silylation procedures. As illustrated in the examples shown in Figures 1 and 2, the selective silylation in solution, with careful control of the experimental conditions, enabled us to prepare derivatives consistently with an isotopic purity and specificity of 90% or more. This compares with a purity of 40-70% obtained by the on-column exchange method, which usually varied widely with the general condition (e.g., temperature, previous use, etc.) of the gas chromatographic column. As shown in the examples discussed here, the preparation of these derivatives is particularly helpful in the interpretation of the mass spectra of trimethylsilyl steroidal ethers. Much of the mass spectrometric information could otherwise have been arrived at only by elaborate deuterium and 18O-labeling. In view of their simplicity, the selective TMS-d9 labeling procedures described here should be readily applicable to other classes of compounds.


RECEIVED for review July 5, 1972. Accepted September 18, 1972. This work was supported by grants from the National Institute of General Medical Sciences (GM-13901, GM16216).

Application of Pattern Separation Techniques to Mass Spectrometric Data
Determination of Hydrocarbon Types and the Average Molecular Structure of Gasoline

D. D. Tunnicliff and P. A. Wadsworth
Shell Development Company, Houston, Texas

The mass spectra of a large group of pure compounds typical of those found in gasoline have been used to derive a set of weight vectors which can be used to determine the average properties of gasoline samples. The method is illustrated for the determination of hydrocarbon types and for the determination of the structural features of the average molecule.

It has been demonstrated (1-3) that pattern separation techniques are powerful tools in the identification of the molecular structure of a molecule from its mass spectrum. It has also been shown (4) that data from several different sources can be combined in a single calculation. One of these techniques (3) can also be applied to the determination of molecular types and the structure of the average molecule in a complex mixture. This approach is based on the following set of

(1) L. R. Crawford and J. D. Morrison, Anal. Chem., 40, 1469 (1968).
(2) P. C. Jurs, B. R. Kowalski, and T. L. Isenhour, ibid., 41, 21 (1969).
(3) B. R. Kowalski, P. C. Jurs, and T. L. Isenhour, ibid., p 695.
(4) P. C. Jurs, B. R. Kowalski, T. L. Isenhour, and C. N. Reilley, ibid., p 1949.

ANALYTICAL CHEMISTRY, VOL. 45, NO. 1, JANUARY 1973


Compound types Paraffins Cyclopropanes Cyclobutanes Cyclopentanes Cyclohexanes Aromatics Mix. Cond. Ring Monoolefins Diolefins Cyclic olefins Aromatic olefins Total


Table I. Number of Compounds in Training Set by Type and by Carbon Number Distribution by carbon number c 5 C6 c 7 CS C9 ClO cn 2 5 9 18 26 10 4 3 1 1 1 6 13 5 1 1 8 11 9 1 1 1 4 8 22 13 2 4 5 1 17 31 14 5 6 1 4

6

39

55

I 1 65

equations, one for each compound in the file of reference spectra serving as the training set

$$a_0 + \sum_{j=1}^{n} a_j x_{ij} = y_i, \qquad i = 1, 2, \ldots, m$$

5

Total 19 4 1 25

6

37

20

69

6

11 71

2

I

1

6

3

c 1 2

(1)

where $x_{ij}$ = the mass spectrometric intensity at mass $j$ for compound $i$, $a_j$ = coefficients in the weight vector to be used for determining a particular property of the mixture, and $y_i$ = the value of the property of interest for compound $i$. If the number of compounds in the training set ($m$) is larger than ($n + 1$), the number of terms in the weight vector, the latter can be determined using the method of least squares.

The above set of equations is quite general in application. Analytical data of different types such as mass spectrometric, infrared, ultraviolet, nuclear magnetic resonance (NMR), refractive index, and density may all be combined in one equation as variables $x_{ij}$, provided the units are all compatible. It is essential that the data chosen be additive for a mixture, i.e., for the mixture the value of each variable must be equal to the sum of the contributions from all the individual components.

Similarly, $y_i$ may be a measure of any property of interest in a mixture. It may be the percentage of a single compound or any group of compounds such as the percentage of aromatics or, even more specifically, the percentage of C8 aromatics. In such cases the value of $y_i$ is set equal to 100.0 for all compounds belonging to the class being determined and to 0.0 for all compounds not in this class. It may also be used to express some structural property of an average molecule such as the number of -CH2- groups, the number of carbon atoms, or the number of aromatic C=C groups. Each such property of interest requires the calculation of a new set of $a_j$'s. These may then be substituted into Equation 2 so as to calculate the value for each of these properties for a mixture

$$y = a_0 + \sum_{j=1}^{n} a_j x_j \tag{2}$$

where $x_j$ = the same set of analytical data for a mixture as used for the pure compounds in Equation 1, and $y$ = the computed value of the property for the sample being analyzed. The usefulness of any set of $a_j$'s for analyzing samples will depend on the degree of correlation which exists between $y_i$ and $x_{ij}$ in Equation 1. If there is a high degree of correlation, then quite reliable results will be obtained. If there is very little correlation, the results will be very erratic and will depend quite strongly on the composition of the particular sample being analyzed. One advantage of this empirical approach is the possibility of developing analyses based on

1 4 62

18

2 53

24

1

8

41

3 42

very complex correlations in the data which might escape detection by the analyst.

This paper describes the application of the general method outlined above to the determination of hydrocarbon types and the structural features of an average molecule in gasolines. The method has been tested using mass spectrometric data alone and with mass spectrometric data combined with NMR data, the refractive index, and the density. The inclusion of the NMR data, the refractive index, and the density turned out to be a disadvantage for reasons which will be discussed later.

Training Set. The immediate goal of this project was the investigation of new methods of analysis for hydrocarbon mixtures in the gasoline range from which all C4 components had been removed. It was further assumed that the only significant C5 components would be n-pentane, isopentane, and 2-methyl-2-butene. Although the method could be easily extended to include full-range gasoline, it seemed desirable to cover only this restricted range because of the difficulty of obtaining reliable analytical data for mixtures containing large percentages of the more volatile components. Consequently, the compounds chosen for the training set consisted mainly of hydrocarbons in the C6 to C12 range together with the three C5 compounds mentioned above. An effort was made to exclude compounds (such as acetylenes) which are very unlikely to occur in gasolines and to include all the likely compounds for which adequate spectra were available. Also, many compounds were included which may not actually be present in gasoline but which are similar in structure to those which are known to be present. Inclusion of such compounds is intended to provide average data for those compounds for which spectra were not available and also for the inevitable variations in composition of such products. The distribution of compounds in the training set by type and by carbon number is given in Table I.
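The training scheme of Equations 1 and 2 can be illustrated with a minimal sketch. It assumes NumPy's general least-squares solver rather than the authors' own program, and the arrays `X_train` and `y_type` are hypothetical toy values, not data from the paper. The function names are likewise illustrative.

```python
import numpy as np

def train_weight_vector(X, y):
    """Least-squares fit of Equation 1: a_0 + sum_j a_j * x_ij = y_i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # A leading column of ones carries the constant term a_0.
    design = np.hstack([np.ones((X.shape[0], 1)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs  # [a_0, a_1, ..., a_n]

def predict_property(coeffs, x):
    """Equation 2: y = a_0 + sum_j a_j * x_j for one sample."""
    return coeffs[0] + np.asarray(x, dtype=float) @ coeffs[1:]

# Hypothetical toy training set: 6 "compounds" described by 3 "masses".
X_train = np.array([[10, 0, 1], [8, 1, 0], [9, 2, 2],
                    [0, 9, 1], [1, 10, 0], [2, 8, 3]], dtype=float)
# Type analysis: 100.0 for members of the class, 0.0 for all others.
y_type = np.array([100.0, 100.0, 100.0, 0.0, 0.0, 0.0])

a = train_weight_vector(X_train, y_type)

# A 50/50 blend of one compound from each class (additive data assumed).
x_mix = 0.5 * X_train[0] + 0.5 * X_train[3]
y_mix = predict_property(a, x_mix)
```

Because Equation 2 is linear and the blend fractions sum to one, the value computed for the blend equals the same weighted average of the values computed for its components, which is why the input data must be additive for a mixture.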
The mass spectrometric data for the training set were taken mainly from the compilation available on magnetic tape from the Mass Spectrometry Data Centre, Aldermaston, Reading, England. The mass spectra in this file were obtained from many different laboratories and were run on several different kinds of instruments under different conditions. Since calculations based on these data were to be applied to sample data obtained with our own mass spectrometer (CEC 21-103), it was important to exclude any data which had not been obtained under conditions conforming reasonably well to our own. Only data obtained on CEC 21-101, 102, and 103 mass spectrometers operated with an ionizing voltage of 70 eV and an ion-source temperature of approximately 250 °C were used for this application. Since mass


Table II. Structural Code (Number per Molecule)

Carbon atoms
Hydrogen atoms
-CH3 groups
-CH2- groups in chains
>CH- groups in chains
>C< groups in chains
-CH2- groups in rings
>CH- groups in rings
>C< groups in rings
>CH- groups in ring junctions
>C< groups in ring junctions
=CH2 groups
=CH- groups in chains
=C< groups in chains
=CH- groups in aromatic rings
=C< groups in aromatic rings
=C< groups in ring junctions
=CH- groups in non-aromatic rings
=C< groups in non-aromatic rings
Rings + double bonds
C=C total
C=C in chains
C=C in non-aromatic rings
C=C in aromatic rings
3-atom rings
4-atom rings
5-atom rings
6-atom rings
7-atom rings
8-atom rings
Saturated rings
Condensed-saturated rings
Olefinic rings
Condensed-olefinic rings
Aromatic rings
Condensed-aromatic rings

spectrometric data from masses 24 to 170 were to be used, only spectra which provided data for all, or nearly all, of this range at one setting of the magnetic field for this interval were included. A total of 88 spectra obtained on our own instrument were included; these covered many of the most significant constituents of gasolines.

The selection of the masses for which intensity data are to be used was found to be very important. Initially all data between mass 24 and 175 were used in the calculations. However, it was observed that some of the coefficients, $a_j$ in Equation 2, were so large that the normal errors in the sample data caused a very large error in the computed result. It was discovered that this problem was due to the use of data for masses for which the intensities for all compounds were very low and so subject to rather large errors. Some improvement was made by using a stepwise regression procedure to choose the most significant masses. The best solution found was to delete data for low intensities. The deletion was accomplished by adding all 342 normalized spectra together and then sorting the resulting data in order of decreasing intensity sums. These sorted data were then examined for the purpose of finding the mass corresponding to the smallest intensity sum which was still considered to be significant in a mass spectrometric sense. Data for 91 masses were selected for the calculations by this rather arbitrary procedure. All mass spectrometric intensities between mass 24 and 175 are normalized to total 1000.0 before selecting the data to be used for the calculations.

The NMR data in the training set were calculated from the structural formula of each compound by merely counting the number of hydrogens on aromatic, olefinic, and saturated carbon atoms. In order to keep the units consistent, as will be discussed later, these percentages must be converted to moles of hydrogen of each type per 100 ml of each compound. The refractive indices at 25 °C for the Na-D line and the densities at 25 °C were obtained from the API Research Project 44 Tables whenever possible. In a few instances, it was necessary to convert literature values obtained at other temperatures to 25 °C, and in a very few instances the value was estimated from the values of very similar isomers for which data were available.

Some simple means of fixing the appropriate value of $y_i$ for any property of interest is highly desirable. This was achieved by means of the structural code given in Table II. Data for this code are included with the other data for each compound. For structural determinations these values, expressed in appropriate units, are used for $y_i$. This code is also very valuable when training for type analysis since it provides many different ways of dividing the compounds up into groups. For example, all compounds with a specified carbon number and a specified number of rings + double bonds can be given a value for $y_i$ of 100.0 and all others given a value of 0.0.

Outline of the Calculations. The set of equations defined by Equation 1 can be expressed in matrix form by

$$XA = Y \tag{3}$$

$$X^{T}XA = X^{T}Y \tag{4}$$

Then, briefly

$$A = (X^{T}X)^{-1}X^{T}Y \tag{5}$$

where $X^{T}$ = the transpose of $X$ and $(X^{T}X)^{-1}$ = the inverse of the coefficients of the normal equations $X^{T}X$. In actual practice, Equation 4 can be expressed in terms of the correlation matrix

$$E^{-1}\,[E X^{T} X E^{T}]\,(E^{T})^{-1} A = E^{-1}\,[E X^{T} Y] \tag{6}$$

where $E$ = a diagonal matrix whose terms consist of the reciprocals of the square roots of the diagonal terms of $X^{T}X$ and $E X^{T} X E^{T}$ = the correlation matrix. The use of the correlation matrix is quite important in this application as it tends to reduce round-off errors during the solution of the ill-conditioned matrices encountered (5). Although for simplicity no further reference will be made to the correlation matrix, its use will always be implied.

In practice the array defined by $(X^{T}X)^{-1}X^{T}$ is calculated and stored on magnetic tape for later reference. Training for a new weight vector, $A$, then involves only fixing the values of $Y$ and carrying out the multiplication specified in Equation 5. This operation, together with testing the computed vector by substitution back into Equation 1 and comparing the true and computed values of $Y$, requires 25 seconds of Univac 1108 time. Approximately 130 seconds are required for the calculation of $(X^{T}X)^{-1}X^{T}$ and testing. It is, of course, possible to solve for $A$ without calculating a complete solution. This calculation and the testing requires approximately 40 seconds. The last approach is the most efficient when only a few weight vectors are to be computed for a given set of data but becomes

(5) D. W. Marquardt, J. Soc. Ind. Appl. Math., 11, 431 (1963).


Table 111. Standard Deviation and Average Deviation in Training for Type Analysis No. in cateHydrocarbon type gory 0 Rings double bonds 79

+

1

2 4 5 7

Total Cg c 7

CS CO C1a c 1 1 C12

Benzene Toluene Cs Benzenes C9 ClO c 1 1

143 25 61 22 2 37 55 67 62 55 24 39 1 1 4 8 22 13 18 102 93

C12 Total olefins M onoolefins Monoolefins C==C in chain 84 Monoolefins C=C in ring 9 5-Carbon ring sats. 27 5-Carbon ring 41 total &Carbon ring 39 sats. 6-Carbon ring 51 non-aromatic &Carbon ring total 143 Based on Equation 8. Based on Equation I.

0%

100%

0%

100%

MS, RI,a and density Std dev Av dev 0% 100% 0 % 100%

10.3 9.6 4.0 1.9 1.9 0.3 9.6 16.5 15.6 16.8 10.0 6.1 10.5 0.3 2.0 1.1 2.6 2.8 2.1 4.1 16.9 18.6

10.9 15.1 13.4 5.8 6.2 0.2 23.1 32.7 33.6 36.4 29.3 39.8 32.3 9.6 16.5 14.3 17.7 14.7 26.9 30.3

1.4 2.5 0.3 0.1 0.1 0.0 1.5 4.7 5. I 5.7 2.6 1.5 2.4 0.0 0.0 0.0 0.1 0.2 0.2 0.3 5.9 6.8

-4.7 -3.5 -3.7 -0.5 -0.9 -0.1 -12.8 -24.6 -21.1 -25.8 -13.6 -20.1 -18.7 -0.3 -16.4 -1.7 -5.2 -3.1 -4.0 -5.0 -13.9 -18.3

10.0 9.7 4.0 1.9 1.9 0.3 9.7 16.5 15.6 16.8 10.0 6.3 10.5 0.3 2.0 1.1 2.6 2.8 2.2 4.1 11.4 16.1

19.4

33.6

1.4

-22.6

7.0 12.8

30.3 40.5

0.7 3 .O

13.5

32.1

11.4

MS

Av dev

Std dev

9.6 16.5 13.8 17.0 14.3 21.9 27.0

1.4 2.5 0.3 0.1 0.1 0.0 1.5 4.6 5.1 5.7 2.6 1.5 2.3 0.0 0.0 0.0 0.1 0.2 0.2 0.3 3.3 5.3

-4.5 -3.5 -3.6 -0.4 -0.9 -0.1 -12.4 -24.1 -21.0 -25.8 -13.3 -19.4 -17.5 -0.3 -16.0 -1.7 -5.2 -3.0 -3.8 -4.9 -7.8 -14.1

16.7

28.3

5.4

-16.4

-26.3 -35.0

6.8 12.7

28.1 37.9

0.7 2.8

-24.3 -32.6

3.2

-23.3

13.3

30.1

3.0

-21.9

30.7

2.5

-19.2

11.0

26.0

2.1

-15.9

13.3

29.4

3.2

-18.5

12.5

25.7

2.7

-15.4

15.6

18.7

4.9

-6.8

14.6

15.8

3.9

-5.4

.

I

.

...

quite inefficient for computing a large number of weight vectors. Although the terms in the weight vector, $A$, varied slightly depending upon whether they were obtained by a direct solution or by the use of the $(X^{T}X)^{-1}X^{T}$ matrix, both gave the same value for $y$ in Equation 2. These differences in the $A$ vector are probably due to round-off errors in the computer calculations. In such cases it is less important that the $A$ vector be unique than that the individual terms be self-consistent.

It is quite important in these calculations that all the units be consistent. Since mass spectrometric data for liquids are expressed as intensity at each mass per unit liquid volume of sample, it was necessary to define all other quantities in terms of an intensity per unit liquid volume. As mentioned previously, the NMR data were converted to moles of hydrogen per 100 ml. Fortunately, the refractive index and the density are easily expressed in terms of volume.

$$n_M = \sum_{i=1}^{n} c_i n_i \tag{7}$$

10.9 14.8 12.9 5.4 6.2 0.2 21.9 31.8 33.5 36.4 28.7 38.3 30.2

...

...

MS, NMR, RI,b and density Std dev Av dev _____ 0%

100%

0%

100%

9.6 16.2 13.6 15.0 14.1

0.6 1.9 0.3 0.1 0.1 0.0 1.5 4.6 5.1 5.7 2.6 1.5 2.3 0.0 0.0 0.0 0.1 0.2 0.1 0.3

-1.9 -2.4 -3.5 -0.3 -0.8 -0.1 -12.1 -23.9 -20.6 -25.5 -13.3 -19.1 -17.3 -0.3 -14.1 -1.7 -5.0 -3.0 -3.4 -4.8

15.5

23.5

4.5

-11.8

12.4

36.2

2.6

-30.6

6.9 9.2 4.0 1.9 1.9 0.3 9.7 16.5 15.6 16.8 10.0 6.4 10.6 0.3 1.9 1.1 2.5 2.9 2.3 4.0

5.7 11.6 12.7 4.2 5.8 0.2 21.5 31.6 32.9 36.2 28.6 37.9 29.8

... ...

where $n_M$ = refractive index of the mixture, $n_i$ = refractive index of component $i$, $c_i$ = volume fraction of component $i$, $n$ = number of components in the mixture, $r_M$ = refraction of the mixture = $(n_M^2 - 1)/(n_M^2 + 2)$, $d_M$ = density of the mixture, and $d_i$ = density of component $i$. Equation 7 was used initially for combining the refractive indices, but this was later changed to the more exact Equation 8, which gave somewhat smaller standard deviations.

The values for $y_i$ in Equation 1 must also be expressed in terms of an intensity per unit volume. If $y_i$ is given a value of 0.0 or 100.0 for type analysis, then the value of $y$ calculated from Equation 2 will be expressed in volume per cent. For structural analyses, $y_i$ is expressed as moles of the particular group per 100 ml of pure compound. The computed value of $y$ from Equation 2 then becomes moles per 100 ml of sample. Since this is not apt to be a very useful relation, a weight vector was also computed for the number of average molecules per 100 ml of sample. The ratio of these two values then gives the number of groups per average molecule, which is a more meaningful quantity. All results for structural features listed in this report are expressed as number of groups per average molecule unless stated otherwise.

Training Results. The computer output from the calculation and testing of each weight vector includes some statistical information as to the kind of fit obtained. For type

+


Table V. Composition of Synthetic Samples

Table IV. Standard Deviation and Average Deviation in Training for Structural Analysis MS, RI,O MS, NMR, and RI.* and density, density, MS, Std dev Std dev Std dev Structural features 0.03 0.03 Moles/100 ml 0.03 0.11 0.02 Carbon atoms 0.01 0.20 0.03 Hydrogen atoms 0.13 0.33 -CHI 0.27 0.31 0.52 -CH%- in chains 0.47 0.47 0.28 0.27 -CH: in chains 0.27

No. of

Aggregates Paraffins Cyclopentanes Cyclohexanes Aromatics Cond.-ring aromatics Olefins

I

I

-C-

in chains

0.16

0.16

0.15

0.45 0.27

0.30 0.24

0.28 0.24

in rings

0.14

0.13

0.12

=CH2 =CH- in chains =C: in chains =CH- in non-aromatic rings =C: in non-aromatic rings =CH- in aromatic rings =C: in aromatic rings C=C total C=C in chains C=C in non-aromatic rings C=C in aromatic rings 5-Carbon rings 6-Carbon rings Based on Equation 8. * Based on Equation 7.

0.18 0.28 0.19

0.17 0.22 0.18

0.10 0.19 0.16

0.12

0.12

0.11

0.08

0.08

0.08

0.10

0.10

0.00

0.08 0.14 0.17

0.08 0.09 0.11

0.07 0.05’ 0.08

0.07 0.04 0.12 0.12

0.07 0.04 0.12 0.11

0.06 0.03 0.11 0.10

I

-CH2in rings -CH: in rings

I -CI

Q

analysis this consists of the standard deviation and the average deviation for all compounds for which $y_i$ = 0.0 and the same information for compounds for which $y_i$ = 100.0. For structural analysis, the output gives the overall standard deviation and the average deviation. Tables III and IV give these data for type and structural analyses for three cases: mass spectrometric data alone; mass spectrometric data combined with refractive index and density data; and mass spectrometric data combined with NMR, refractive index, and density data. The standard deviations in Table IV are all expressed in terms of moles per 100 ml. The average deviations in Table IV were all less than 0.005 except for “carbon atoms” and “hydrogen atoms,” where the values were 0.02 and 0.03, respectively. Sometimes the standard deviations are quite large. In such cases, the accuracy obtained in the analysis of a sample for that category will be quite dependent on the composition of the sample being analyzed.

Adjustment of Weight Vectors. It will be noted in Table III that the average deviation for $y_i$ = 0.0 is always positive while the average deviation for $y_i$ = 100.0 is always negative. These systematic deviations presumably arise because the calculations are designed to fit a least-squares criterion, which requires that the sum of the squares of the deviations be a minimum for the entire set. However, for some applications, it is preferable for the average deviations at $y_i$ = 0.0 and $y_i$ = 100.0 to be near 0.0. A minimization of the average deviations was accomplished by applying a

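Table V. Composition of Synthetic Samples

                          No. of            Volume, %
Aggregates                components   131    132    133    134
Paraffins                     29       43.6   35.2   45.5   26.6
Cyclopentanes                 11        6.9   15.4    6.9   15.4
Cyclohexanes                   6        4.3   14.9    4.1   15.6
Aromatics                     25       42.9   32.7   34.0   24.0
Cond.-ring aromatics           5        2.2    1.9    1.4    2.0
Olefins                       15        0.0    0.0    8.0   16.4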

linear adjustment to the computed weight vectors. This procedure was used to adjust all weight vectors used for type analysis. However, since $y_i$ takes on a wide range of values for structural analysis, such a procedure is not applicable in that case.

Results on Synthetic Samples. The method has been tested by the analysis of four synthetic mixtures prepared from pure compounds. In order to minimize the amount of work required, the pure compounds were first used to prepare the following six aggregates: paraffins, cyclopentanes, cyclohexanes, aromatics, condensed-ring aromatics, and olefins. The composition of each aggregate was based on the relative concentrations of the more significant components of “Regular” gasoline as given by Sanders and Maynard (6). However, a few components, none major, were omitted because the pure compounds were not available. There are several instances in which they list two or three possible components in a single gas-chromatographic peak. In such instances the first component named was used. The low solubility of naphthalene caused problems in preparing the condensed-ring aromatic aggregate. Sufficient 5-methyltetralin, not listed as one of the components of “Regular” gasoline, was added to dissolve all the naphthalene.

These aggregates were then blended to give four synthetic samples as given in Table V. It will be noted that samples 131 and 132 contain no olefins. The composition of 133 is quite close to that given by Sanders and Maynard for “Regular” gasoline. Sample 134 contains large amounts of cyclopentanes, cyclohexanes, and olefins. These types were enhanced deliberately in order to obtain a better measure of the reliability of the method for the determination of such compounds.
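The linear adjustment of type-analysis weight vectors mentioned at the start of this passage is not spelled out in the text. The sketch below shows one plausible form, offered purely as an assumption: the weight vector is shifted and scaled so that the mean computed value becomes 0.0 for the training compounds with y = 0 and 100.0 for those with y = 100. The function name and the toy arrays are hypothetical.

```python
import numpy as np

def adjust_weight_vector(coeffs, X, y):
    """Shift and scale [a_0, a_1, ..., a_n] so the class means land on 0 and 100.

    Assumed form of the adjustment; the paper does not give its formula.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    computed = coeffs[0] + X @ coeffs[1:]
    mean_0 = computed[y == 0.0].mean()      # average computed value, y = 0 class
    mean_100 = computed[y == 100.0].mean()  # average computed value, y = 100 class
    scale = 100.0 / (mean_100 - mean_0)
    adjusted = scale * coeffs
    adjusted[0] = scale * (coeffs[0] - mean_0)
    return adjusted

# Hypothetical toy data (not from the paper): 6 compounds, 3 masses.
X_toy = np.array([[10, 0, 1], [8, 1, 0], [9, 2, 2],
                  [0, 9, 1], [1, 10, 0], [2, 8, 3]], dtype=float)
y_toy = np.array([100.0, 100.0, 100.0, 0.0, 0.0, 0.0])

# Unadjusted least-squares weight vector, as in Equation 1.
design = np.hstack([np.ones((len(y_toy), 1)), X_toy])
raw, *_ = np.linalg.lstsq(design, y_toy, rcond=None)

adj = adjust_weight_vector(raw, X_toy, y_toy)
adjusted_values = adj[0] + X_toy @ adj[1:]
```

A map of this kind needs two fixed reference levels, which is consistent with the statement that the procedure does not carry over to structural analysis, where $y_i$ takes many different values.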
Although this method of using various proportions of the six aggregates to prepare the four synthetic samples has the advantage of greatly reducing the effort of preparing these samples, it does have the disadvantage of not providing for any variation of the relative concentrations of the individual compounds in each aggregate.

The mass spectrum, the NMR integrals, the refractive index, and the density were obtained for each of these samples. The NMR data were obtained by diluting the sample in carbon tetrachloride containing a little hexamethyldisiloxane as an internal reference. The results were calculated in terms of moles of each type of hydrogen per 100 ml by comparison with a sample of pure ethylbenzene diluted and run in the same manner. A simple computer program was written for calculating, from the reference data in the training set, the mass spectrum, the NMR, refractive index, and density data for each of these mixtures.

(6) W. N. Sanders and J. B. Maynard, Anal. Chem., 40, 527 (1968).
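A calculation of the kind described above, deriving mixture data from pure-compound reference data, might be sketched as follows. The sketch assumes that spectra are expressed per unit liquid volume and that volumes are additive, so that each mixture quantity is the volume-fraction-weighted sum of the component values. The function `blend` and all numbers are hypothetical, not taken from the authors' program or data.

```python
import numpy as np

def blend(volume_fractions, spectra, densities):
    """Return (mixture spectrum, mixture density) for a blend of pure compounds.

    Assumes additive volumes: every quantity is a volume-weighted sum.
    """
    c = np.asarray(volume_fractions, dtype=float)
    if not np.isclose(c.sum(), 1.0):
        raise ValueError("volume fractions must sum to 1")
    mix_spectrum = c @ np.asarray(spectra, dtype=float)
    mix_density = float(c @ np.asarray(densities, dtype=float))
    return mix_spectrum, mix_density

# Hypothetical reference data: two compounds, three masses,
# each spectrum normalized to total 1000.0.
spectra = np.array([[600.0, 300.0, 100.0],
                    [100.0, 200.0, 700.0]])
densities = np.array([0.70, 0.87])  # g/ml, illustrative values only

mix_spectrum, mix_density = blend([0.25, 0.75], spectra, densities)
```

Comparing such calculated mixture data with the measured data for the same blend separates errors caused by the weight vectors from errors caused by the measurements or by non-additivity.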


Table VI. Results of Type Analysis Based on Unweighted MS Data Volume, Z 132

131

Hydrocarbon type Theory 0 Rings double 43.6 bonds L

4 5 7

Total Total Cg C? c 8

CS Cl0 Cil ClZ Total Benzene Toluene Cg Benzenes c 9

Cl 0 Cli c 1 z

Total Total olefins Monoolefins Monoolefins C==C in chain Monoolefins C=C in ring 5-Carbon-ring sats. 5-Carbon-ring total 6-Carbon-ring sats. 6-Carbon-ring non-aromatic 6-Carbon-ring total

Theory

35.2 30.2 0.0 32.7 1.4 0.5 100.0 24.8 24.2 29.4 13.6 7.0 0.9 0.0 99.9 1.4 6.3 10.9 9.6 4.5 0.0 0.0 32.7

33.4 31.5 -0.7 33.4 1.4 0.6 99.6 27.5 28.2 30.6 12.5 4.7 1.9 -5.1 100.3 1.5 6.6 11.2 10.8 4.8 0.6 -1.1 34.4

34.3 31.8 -0.6 32.6 1.1 0.6 99.8 27.3 25.2 30.9 14.7 6.0 1.1 -4.8 100.4 1.5 6.6 10.9 10.7 5.0 0.2 -1.3 33.6

-0.6 -1.5

0.0 0.0

-0.9 0.0

-4.0

0.0

-3.6

Theory

40.7 12.1 -0.5 44.3 1.7 0.8 99.1 23.9 24.4 28.1 18.1 7.0 2.1 -3.3 100.3 2.0 8.7 14.8 14.6 6.3 0.6 -1 .o 46.0

42.4 12.0 -0.3 42.9 1.4 0.7 99.1 24.8 21.9 29.0 18.6 7.8 1.5 -3.3 100.3 1.9 8.7 14.4 14.1 6.4 0.3 -1.3 44.5

0.0 0.0

-0.5 -0.6

0.0

-4.2

+

1

Calcd data

Calcd data

11.2 0.0 42.9 1.5 0.7 99.9 23.3 21.3 27.8 17.6 8.9 1.1 0.0 100.0 1.9 8.2 14.3 12.6 6.0 0.0 0.0 43.0

134

133

Exutl data

Exutl daia

Exutl daia

Calcd data

Exptl daia

Calcd data

45.5 19.0 0.0 34.0 1.o 0.4 99.9 29.7 21.8 25.2 15.2 7.2 0.9 0.0 100.0 1.5 6.5 11.3 10.0 4.7 0.0 0.0 34.0

43.6 19.6 -0.3 35.2 1.o 0.5 99.6 31.3 24.9 26.1 13.3 4.8 1.8 -3.3 98.9 1.6 7.0 11.8 11.3 4.7 0.8 -0.9 36.3

44.2 20.3 -0.3 34.0 0.9 0.5 99.6 31.3 22.8 26.2 14.7 5.5 1.4 -3.0 98.9 1.5 7.0 11.4 11.2 4.9 0.2 -1.1 35.1

26.6 47.4 0.0 24.0 1.4 0.6 100.0 34.3 24.2 24.9 10.1 5.6 0.9 0.0 100.0 1 .o 4.6 8.0 7.0 3.3 0.0 0.0 23.9

26.1 47.2 -0.6 25.1 1.6 0.7 100.0 36.1 28.9 27.1 7.5 4.2 0.9 -6.6 98.1 1.2 5.2 8.4 8.2 3.6 0.4 -1.3 25.1

25.7 49.3 -0.4 23.8 1.3 0.6 100.3 36.5 25.8 25.9 9.7 4.3 0.7 -4.9 98.0 1.1 5.0 8.0 7.8 3.5 0.2 -1.2 24.4

-0.2 0.0

8.0 8.0

6.5 6.7

8.4 8.3

16.4 16.4

13.9 15.9

17.2 19.1

-2.1

8.0

4.2

6.8

16.4

14.2

18.7

Theory

0.0

1.7

0.5

0.0

1.6

0.3

0.0

1 .o

0.2

0.0

0.9

6.9

3.9

4.1

15.4

20.4

20.3

6.9

5.6

4.6

15.4

22.5

22.2

7.4

3.7

4.6

15.7

17.5

17.9

7.2

4.8

4.8

15.8

19.2

19.7

4.3

1.7

1.3

14.9

11.2

10.3

4.1

2.0

1.3

15.6

12.0

10.9

4.3

1.2

0.4

14.9

10.9

9.6

4.1

1.4

0.4

15.6

11.8

10.2

49.4

52.0

49.3

49.4

47.9

45.4

39.6

40.8

38.5

41.6

39.6

36.3

Some of the coefficients for the NMR data in the weight vector were so large that the differences between the observed and calculated NMR data would cause quite large errors in the computed values of the $y$'s. One problem with using NMR data for mixtures is the difficulty of properly defining the frequency range for integrating for each proton type. The normal shifts noted in the location of the range of frequencies associated with each proton type in different compounds give no trouble when studying pure compounds since the integration limits can be adjusted accordingly. However, in mixtures, the optimum ranges for the individual compounds overlap so that the integration range must be predefined somewhat arbitrarily.

The use of calculated rather than experimental NMR data was another source of trouble. Since these data are without error, the calculations give too much weight to the NMR data. The use of experimental data would have avoided this latter problem, but such data were not readily available for all the pure compounds. Although Tables III and IV show several instances where the inclusion of exact NMR data reduced the standard deviations, the use of experimental NMR data would surely have a smaller effect.

The inclusion of refractive index and density data resulted in a similar problem. A measure of the effect of errors in the refractive index and density was determined by arbitrarily

-0.1

increasing the refractive index alone and then the density alone for one of the synthetic samples, repeating the calculations (defined in Equation 2), and observing the change in the results. It was noted that even a large error in the density generally had only a small effect on the results. The determination of olefin types and 5- and 6-carbon ring types was the exception. It follows that in cases where errors in the density did not affect the results, the inclusion of density data is of little benefit. Errors in the refractive index also showed very little effect in many cases but had a very large effect in the determination of olefin types. A discrepancy of about 0.001 between the calculated and observed refractive index of a mixture can be expected due to the non-additivity of volumes. An error of this magnitude would result in quite a significant error in the determination of olefin types.

It is obvious from the above discussion that the data in the training set must never be measured with a significantly higher accuracy than can be obtained for the same measurements on actual samples. If the data for the training set are substantially more accurate, then there appear to be only two alternatives. One is to delete such data. The other is to degrade the data so that the accuracy becomes comparable with the accuracy to be obtained on the samples. This could be achieved in the case of the refractive index and the density by rounding the literature values back to match the expected

ANALYTICAL CHEMISTRY, VOL. 45, NO. 1, JANUARY 1973


Table VII. Results of Structural Analysis^a Based on Unweighted MS Data

                       Sample 131              Sample 132              Sample 133              Sample 134
Structural features    Theory  Exptl   Calcd   Theory  Exptl   Calcd   Theory  Exptl   Calcd   Theory  Exptl   Calcd
Moles/100 ml           0.743   0.743   0.739   0.750   0.750   0.743   0.740   0.741   0.735   0.758   0.761   0.752
Carbon atoms           7.61    7.64    7.65    7.46    7.44    7.49    7.41    7.43    7.46    7.21    7.19    7.24
Hydrogen atoms         13.0    12.9    13.1    13.3    13.2    13.3    13.3    13.2    13.4    13.2    13.0    13.2
-CH3                   2.12    2.12    2.17    1.95    2.01    2.07    2.23    2.19    2.25    2.00    2.03    2.09
-CH2- in chains        1.53    1.52    1.55    1.25    1.29    1.33    1.68    1.64    1.68    1.18    1.18    1.22
-CH< in chains         0.32    0.33    0.34    0.25    0.25    0.27    0.34    0.35    0.37    0.22    0.22    0.24
>C< in chains          0.01    0.03    0.04    0.01    0.02    0.04    0.01    0.04    0.05    0.01    0.02    0.03
-CH2- in rings         0.53    0.39    0.40    1.37    1.09    1.08    0.50    0.45    0.42    1.40    1.20    1.20
-CH< in rings          0.16    0.11    0.11    0.38    0.35    0.33    0.15    0.12    0.10    0.38    0.37    0.35
>C< in rings           0.01    -0.01   0.00    0.02    0.02    0.03    0.01    0.00    0.00    0.02    0.04    0.04

0.01 0.11

0.05

0.05 0.15 0.01

0.05 0.16 0.02

0.02 0.22 0.10

0.05 0.24 0.05

0.05 0.26 0.07

0.00

0.03

0.01

0.00

0.02

0.00

0.00

0.01

0.01

0.00

0.02

0.01

1.64 0.65 1.24 0.09

1.69 0.66 1.31 0.10

1.64 0.65 1.28 0.11

1.17 0.45 1.00 0.17

1.23 0.49 1.06 0.17

1.17 0.45 1.03 0.19

0.00 1.16 0.08 0.44

0.02 1.19 0.07 0.44

0.01 1.16 0.07 0.43

0.00 0.83 0.17 0.45

0.02 0.87 0.17 0.42

0.01 0.83 0.17 0.40

=CHz =CH-

0.02 0.02 0.00 0.01 0.01 0.00 in chains 0.00 0.08 0.07 0.00 0.09 0.10 =C< in chains -0.03 -0.02 0.00 -0.03 -0.02 0.00 =CH- in non-aromatic rings 0.04 0.00 0.03 0.01 0.01 0.00 =C: in non-aromatic 0.00 0.00 0.02 0.01 rings 0.01 0.01 =CH- in aromatic rings 1.58 2.07 1.62 1.57 2.08 2.15 0.81 0.83 0.81 0.61 0.63 0.61 =C: in aromatic rings 1.52 1.11 C=C total 1.47 1.57 1.20 1.17 0.04 0.04 0.04 C=C in chains 0.00 0.00 0.03 C=C in non-aromatic rings 0.02 0.01 0.02 0.01 0.00 0.00 C=C in aromatic rings 1.47 1.11 1.14 1.11 1.51 1.47 5-Carbon rings 0.08 0.17 0.16 0.16 0.06 0.07 0.50 0.48 0.54 0.55 0.55 0.53 &Carbon rings a All values except moles/100 ml are expressed as the number per average molecule.

accuracy for the samples. In the case of NMR data this can best be accomplished by using actual experimental integrals taken over the frequency range to be used for the samples. The option chosen for the present was to delete the NMR, the refractive index, and the density data and depend entirely on the mass spectrometric data. The small gain to be achieved in the reliability of the results did not seem to justify the cost of these extra measurements. However, the use of all possible data should be considered whenever the very best results are required.

Tables VI and VII show the results obtained for type analysis and structural analysis for the four synthetic samples. The results in the column headed by “Exptl data” are based on the use of experimentally measured mass spectrometric data. The results in the column headed by “Calcd data” are based on the use of mass spectrometric data calculated from reference spectra in the training set and the known composition of the sample. The results obtained using the calculated data are a measure of the degree to which the synthetic samples are representative of the entire training set. In these calculations, each of the 342 compounds in the training set must be considered as being an equally probable component in the sample. (The use of weighting factors to emphasize the more probable components will be discussed in a later section.) The results obtained using the experimental data are influenced by the above effect and also by the degree to which the operating conditions for our mass spectrometer agree with the average operating conditions of the mass spectrometers used to obtain the training set. A comparison of these two sets of results gives some indication of the relative importance of these two effects. These results would seem to indicate that the fact that most of the data in the training set were not obtained on our instrument does not materially affect the results. These results do not, however, really test how good the analyses would be with a totally consistent training set, i.e., one in which all spectra were obtained on the same instrument.

It will be noted that there are several different categories for 5-carbon-ring compounds, 6-carbon-ring compounds, and for olefin types. Particular emphasis was given to these categories partly because of the general interest in methods for determining 5- and 6-carbon-ring compounds and also because such analyses are considered to be particularly difficult. The results obtained for these categories are rather poor.

Use of Weighting Factors. As mentioned above, all calculations thus far have assumed that each compound in the training set is an equally probable component in an actual sample. However, when this method is to be applied to a particular kind of mixture, such as gasoline, this assumption is not necessary. When the approximate composition of the sample is known beforehand, this information can be used to weight the data for these known components, thus giving much better results. A simple method of weighting the data is to multiply Equation 1 by a large number for important components and by a small number for minor components or those totally absent. Simply omitting all data for those components believed to be absent from the samples to be analyzed is thought to be undesirable. As mentioned earlier, it will be virtually impossible to include data in the training set for all components actually present in gasolines. Elimination of even minor components might cause quite a significant effect on the results. In choosing weighting factors, it seemed desirable to make them somehow related to the relative concentration of that component in a typical sample. The quite arbitrary method selected was to use a weighting factor for each compound in


Table VIII. 131 Exptl data

Calcd data

0.0 0.0

42.0 10.4 -0.2 44.5 1.7 0.7 99.1 21.1 25.7 23.3 19.8 9.3 0.9 -2.2 97.9 1.9 8.4 14.3 13.3 6.3 0.3 -0.1 44.4 -1.7 -1.7

43.3 10.9 -0.1 42.9 1.5 0.7 99.2 23.3 23.3 27.0 17.7 9.1 0.8 -1.8 100. 1 1.9 8.5 14.2 12.1 6.2 0.1 -0.2 42.8 -1.4 -1.9

0.0

-2.7

Hydrocarbon Theory type 0 Rings double 43.6 bonds 1 11.2 2 0.0 42.9 4 5 1.5 7 0.7 Total 99.9 Total c g 23.3 21.3 C? 27.8 17.6 CS 8.9 c 1 0 1.1 Cll

+

c 1 2

Total Benzene Toluene cg Benzenes

c9

c 1 0

Cn ClZ

Total Total olefins Monoolefins Monoolefins C=C in chain Monoolefins C=C in ring 5-Carbon-ring sats. 5-Carbon-ring total 6-Carbon-ring sats. 6-Carbon-ring non-aromatic 6-Carbon-ring total Total 1-ring sats.

0.0 100.0

1.9 8.2 14.3 12.6 6.0 0.0 0.0 43.0

Results of Type Analysis Based on Weighted MS DATA Volume. ’Z IJJ

1J L . ~

Exptl data

Calcd data 34.7 30.7 -0.1 32.6 1.3 0.6 99.8 25.5 26.2 29.1 13.1 7.1 0.4 -2.0 99.4 1.4 6.5 10.7 9.2 4.8

0.0 0.0

34.0 30.2 -0.2 33.6 1.5 0.6 99.7 22.2 28.9 27.1 13.2 7.4 0.8 -2.5 97.1 1.5 6.4 10.7 9.7 5.0 0.3 -0.1 33.5 -1.6 -1.6

-0.3 32.3 -0.9 -1.4

-2.2

0.0

-2.5

-0.5

0.0

0.0

Theory 35.2 30.2 0.0

32.7 1.4 0.5 100.0 24.8 24.2 29.4 13.6 7.0 0.9 0.0

99.9 1.4 6.3 10.9 9.6 4.5 0.0 0.0

32.7

Theory

Exptl data

45.5 19.0

44.7 18.0

0.0

0.0

34.0 1.0 0.4 99.9 29.7 21.8 25.2 15.2 7.2 0.9

34.0 8.0 8.0

35.3 1.1 0.5 99.6 26.7 25.5 22.4 15.2 8.1 0.8 -2.2 96.5 1.6 6.7 11.4 9.8 5.2 0.4 -0.1 35.0 5.6 5.6

-1.6

8.0

5.1

-0.6

0.0

0.0

0.0

0.0 100.0

1.5 6.5 11.3 10.0 4.7 0.0 0.0

Calcd data

Theory

45.2 18.8 -0.1 34.0 1.0 0.4 99.3 30.0 23.4 24.6 14.9 7.0 0.5 -1.6 98.8 1.5

6.7 11.2 9.7 4.7 0.0

-0.2 33.6 7.6 7.3 7.5 -0.6

26.6 47.4 0.0

24.0 1.4 0.6 100.0 34.3 24.2 24.9 10.1 5.6 0.9 0.0

100.0 1.0 4.6 8.0 7.0 3.3 0.0 0.0

Exptl data 27.1 45.6 -0.1 25.2 1.7 0.7 100.2 32.8 25.6 24.1 8.1 7.1 0.3 -2.9 95.1 1.1 4.8 8.0 7.3 3.9

Calcd data 26.1 48.3 -73.1 23.8 1.5 0.6 100.2 35.2 24.8 24.3 9.4 5.6 0.5 -1.7 98.1

0.1

1.1 4.8

7.8 6.8 3.3 0.0

23.9 16.4 16.4

-0.3 24.9 14.2 14.5

-0.2 23.6 17.3 17.4

16.4

14.2

18.1

0.0

0.2

-0.6

0.0

0.1

6.9

6.3

7.1

15.4

20.2

21.1

6.9

7.9

6.9

15.4

20.0

21.7

7.4

5.5

6.9

15.7

18.2

19.6

7.2

6.6

6.6

15.8

18.2

20.2

4.3

3.6

3.1

14.9

14.8

13.7

4.1

3.6

3.0

15.6

16.3

14.5

4.3

3.7

2.9

14.9

15.0

13.6

4.1

3.7

2.8

15.6

16.7

14.5

49.4 11.2

52.1 9.9

49.5 10.2

49.4 30.2

50.4 31.6

47.7 31.4

39.6 11.0

41.4 11.2

39.2 10.1

41.6 31.0

43.1 33.1

39.4 32.8

the training set equal to 10.0 times the square root of the percentage present in a typical sample. The typical sample used for this purpose was synthesized by averaging the percentages given by Sanders and Maynard (6) for each component in “Regular” gasoline, “Premium” gasoline, and “API Prototype Fuel No. 1.” Again, only the first-named component was considered where several possible choices were given for a single gas-chromatographic peak. No attempt was made to normalize these average values so as to total 100.0%. The largest weighting factor based on this criterion was 36.5 for toluene. No weighting factor was permitted to become smaller than 1.0. The training was repeated using these weighting factors, and the four synthetic samples were reanalyzed using the new weight vectors.

The results obtained for the analysis of the four synthetic samples are given in Tables VIII and IX. In this case the results based on calculated data agree with the theoretical results somewhat better than those based on the experimental data. This is believed to indicate that the principal errors are now due to discrepancies between the mass spectra in the training set and those obtained using our instrument.

CONCLUSIONS
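The square-root weighting rule described in the section on weighting factors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the three blend compositions are hypothetical (they are not the Sanders and Maynard data), and a component missing from a blend is assumed to count as 0% in the average. The toluene figures were chosen so that the average is about 13.3%, which gives the factor of 36.5 quoted in the text.

```python
import math

MIN_FACTOR = 1.0  # floor stated in the text
SCALE = 10.0      # multiplier stated in the text


def typical_percentages(compositions):
    """Average each component's percentage over the reference blends.

    Assumption of this sketch: a component absent from a blend counts as 0 %.
    The averages are deliberately not renormalized to 100 %.
    """
    names = set().union(*compositions)
    n = len(compositions)
    return {name: sum(c.get(name, 0.0) for c in compositions) / n for name in names}


def weighting_factor(percent):
    """Return 10.0 * sqrt(percent), but never less than 1.0."""
    return max(MIN_FACTOR, SCALE * math.sqrt(percent))


# Hypothetical blend compositions in per cent (illustrative only).
regular = {"toluene": 12.0, "benzene": 1.0, "n-heptane": 0.009}
premium = {"toluene": 15.0, "benzene": 1.0}
prototype = {"toluene": 12.9675, "benzene": 1.0}

typical = typical_percentages([regular, premium, prototype])
factors = {name: weighting_factor(p) for name, p in typical.items()}

for name in sorted(factors):
    print(f"{name:10s} {typical[name]:8.4f} %   weight {factors[name]:6.2f}")
```

With this rule a compound that is absent from the typical sample still receives a factor of 1.0, so it stays in the training set with reduced influence rather than being omitted.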

The evaluation of the results obtained on real samples is rather difficult since the true values for all the calculated properties are not readily available. The results obtained for hydrocarbon types such as paraffins, naphthenes, and aromatics agree quite well with the results obtained by conventional mass spectrometric analysis, provided the average carbon number does not exceed 8.0. Samples with a higher average carbon number show rather poor agreement. This is believed to be due to the relatively small number of the higher-carbon-number compounds in the training set, as shown in Table I. In addition, the spectra of nearly all such compounds were obtained on other instruments, whereas the spectra of many of the lower-carbon-number compounds were obtained on our instrument.

This method has also been found useful for the determination of the per cent weight of carbon in hydrocarbon samples from the values for numbers of carbon and hydrogen atoms obtained in the structural analyses. The computed percentages of carbon for 41 samples were compared with the results as determined by combustion analysis. The standard deviation of the difference between the computed and observed values was 0.027% carbon. The average carbon number and the percentage of aromatics in these samples varied from 7.4 to 8.5 and from 2 to 23%, respectively.

It must be emphasized that the present work represents principally an investigation of a new approach to the analysis of very complex mixtures. Although these calculated weight


Table IX. Results of Structural Analysis' Based on Weighted MS Data 131

Structural features Moles/100 ml Carbon atoms Hydrogen atoms -CHa -CHZ- in chains -CH: in chains

132

Theory

Exptl data

Calcd data

0.743 7.61

0.748 7.59

0.744 7.60

133

Theory

Exptl data

Calcd data

0.750 7.46

0.755 7.41

134

Theory

Exptl data

Calcd data

0.749 7.45

0.740 7.41

0.745 7.39

Theory

Exptl data

Calcd data

0.741 7.40

0.758 7.21

0.765 7.16

0.757 7.21

13.0 2.12

12.9 2.10

13.0 2.17

13.3 1.95

13.1 1.92

13.3 2.01

13.3 2.23

13.2 2.18

13.3 2.28

13.2 2.00

13.0 1.99

13.2 2.07

1.53

1.47

1.48

1.25

1.24

1.22

1.68

1.63

1.60

1.18

1.13

1.13

0.32

0.35

0.35

0.25

0.26

0.28

0.34

0.36

0.40

0.22

0.24

0.26

0.01

0.00

0.02

0.01

0.00

0.02

0.01

0.00

0.01

0.01

0.53 0.16

0.50 0.14

0.49 0.15

1.37 0.38

1.33 0.36

1.28 0.37

0.50 0.15

0.55 0.15

0.48 0.14

1.40 0.38

1.40 0.36

1.34 0.38

0.01

0.00

0.01

0.02

0.02

0.02

0.01

0.00

0.01

0.02

0.02

0.03

=CH2 0.00 0.00 0.00 0.00 0.00 0.00 0.01 =CH- in chains 0.00 0.01 0.11 0.01 0.00 0.01 0.01 =C: in chains 0.00 -0.01 0.00 0.00 0.00 0.05 0.00 =CH- in non-aromatic rings 0.00 0.01 0.00 0.00 0.01 0.00 0.00 =C: in nonaromatic rings 0.00 0.00 0.00 0.00 0.00 0.00 0.00 =CH- in aromatic rings 2.08 2.15 2.07 1.58 1.61 1.64 1.57 =C: in aromatic rings 0.81 0.62 0.83 0.81 0.61 0.63 0.65 C==C total 1.47 1.51 1.47 1.11 1.15 1.12 1.24 C=C in chains 0.00 0.00 0.00 0.00 0.00 0.01 0.09 C=C in nonaromatic rings 0.00 0.01 0.00 0.00 0.01 0.00 0.00 C - C in aromatic rings 1.47 1.51 1.11 1.14 1.46 1.11 1.16 5-Carbon rings 0.08 0.07 0.08 0.17 0.17 0.18 0.08 6-Carbon rings 0.55 0.56 0.54 0.54 0.53 0.51 0.44 All values except moles/100 ml, are expressed as the number per average molecule.

0.00

0.02

0.02

0.01

0.03

0.09 0.05

0.11 0.05

0.22 0.10

0.18 0.10

0.21 0.10

0.01

0.00

0.00

0.01

0.00

0.00

0.00

0.00

0.00

0.00

1.69

1.63

1.17

1.22

1.17

0.66 1.26 0.07

0.65 1.24 0.09

0.45 1.00 0.17

0.48 1.02 0.15

0.45 1.00 0.17

0.01

0.00

0.00

0.01

0.00

1.18 0.08 0.45

1.15 0.07 0.43

0.83 0.17 0.45

0.87 0.17 0.45

0.83 0.18 0.42

I -CI

in chains

CHZ- in rings -CH: in rings I

-C-

I

in rings

vectors are useful for analyzing samples similar to gasoline in composition, they are not applicable to samples of higher average carbon number because of the limitation of the training set. Reliable results for such samples can be obtained only by retraining with an expanded training set. Also, it would be very desirable to obtain all spectra on the same mass spectrometer that is to be used for the analyses.

As indicated previously, this same approach may be applied to the interpretation of many other kinds of analytical data. However, it is not intended for the analysis of relatively simple mixtures where the number of components is less than the number of analytical measurements available. Such mixtures are better analyzed by a least-squares solution of a set of equations based on data for just the components present in the sample. The principal advantage of the method is in the analysis of very complex mixtures where the number of components exceeds the number of analytical measurements. Sometimes such correlations are not obvious and may even be totally unexpected. As an example, it was discovered in this work that the carbon number could be calculated from the following simple expression

$$C = -0.0309 + 8.3134\,D + 0.0736\,R - 0.0838\,H + B \qquad (10)$$

where C = the carbon number in moles/100 ml, D = the density at 25 °C, R = the refractive index at 25 °C, H = the moles of hydrogen/100 ml, and B = the sum of a large number of very small terms based on the mass spectrometric intensity. Separate calculations with 15 compounds from the training set have shown that the value of B is nearly independent of the type of compound. A value of -0.067 for B gave errors of less than 0.01 mole/100 ml in the carbon number of these 15 compounds.

Another advantage of this method is the simplicity of its application to actual samples. Although deriving the weight vectors involves the collection of a large set of data for the training set and some rather involved calculations, the sample calculations consist only of substituting the sample data into Equation 2.

ACKNOWLEDGMENT

The authors wish to acknowledge the work of A. K. Irikura in testing the method and in the preparation of a comparison between the computed percentage carbon and the combustion values, and also the valuable advice of J. H. Schachtschneider in some of the mathematical aspects of the calculations.
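As a hedged illustration of two calculations mentioned in the Conclusions, the sketch below evaluates Equation 10 and the per cent weight of carbon. Several things here are assumptions rather than statements from the paper: the coefficients are read from the printed expression as 8.3134 D and 0.0838 H; the atomic masses are standard values; and the toluene density and refractive index are approximate handbook-style figures chosen only for the demonstration (toluene is not claimed to be one of the 15 compounds tested). The reading of the coefficients is supported by a simple mass balance for a pure hydrocarbon, C = (100 D - 1.008 H)/12.011, whose coefficients (about 8.326 and 0.0839) are close to the fitted ones.

```python
# Standard atomic masses (assumed here; not quoted in the paper).
M_C = 12.011
M_H = 1.008


def carbon_number_eq10(density, refractive_index, hydrogen, b=-0.067):
    """Equation 10: moles of carbon per 100 ml.

    density          -- g/ml at 25 C
    refractive_index -- at 25 C
    hydrogen         -- moles of hydrogen per 100 ml
    b                -- small mass-spectrometric term; -0.067 is the value
                        the text reports as adequate for 15 compounds
    """
    return (-0.0309 + 8.3134 * density + 0.0736 * refractive_index
            - 0.0838 * hydrogen + b)


def carbon_number_mass_balance(density, hydrogen):
    """Moles of carbon per 100 ml of a pure hydrocarbon from mass balance."""
    return (100.0 * density - M_H * hydrogen) / M_C


def weight_percent_carbon(carbon_atoms, hydrogen_atoms):
    """Per cent weight of carbon from atoms per average molecule."""
    mass_c = M_C * carbon_atoms
    return 100.0 * mass_c / (mass_c + M_H * hydrogen_atoms)


# Illustrative check with toluene, C7H8 (approximate property values).
density = 0.862
refractive_index = 1.494
moles = 100.0 * density / (7 * M_C + 8 * M_H)  # moles of toluene per 100 ml
true_c, true_h = 7 * moles, 8 * moles

estimate = carbon_number_eq10(density, refractive_index, true_h)
balance = carbon_number_mass_balance(density, true_h)
print(f"C from composition : {true_c:.4f} mol/100 ml")
print(f"C from Equation 10 : {estimate:.4f} mol/100 ml")
print(f"C from mass balance: {balance:.4f} mol/100 ml")

# Theory values for sample 131: 7.61 C and 13.0 H per average molecule.
print(f"wt % carbon        : {weight_percent_carbon(7.61, 13.0):.2f}")
```

For this example the Equation 10 estimate falls within 0.01 mole/100 ml of the value computed from composition, which is the size of error the text reports for its 15 test compounds.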

RECEIVED for review May 1, 1972. Accepted September 25, 1972.
