Qualitative and Quantitative Determination of Suspected Components in Mixtures by Target Transformation Factor Analysis of Their Mass Spectra

Edmund R. Malinowski* and Matthew McCue

Department of Chemistry and Chemical Engineering, Stevens Institute of Technology, Hoboken, N.J. 07030
Prior studies have shown that the number of components in a series of related mixtures can be determined by factor analyzing their mass spectra. The present investigation shows how target transformation factor analysis can be used to identify the components and to yield a quantitative analysis of each mixture. An example calculation, using data from the literature, is presented.
Factor analysis (FA), a computer method for data processing, is rapidly becoming an important tool for the analytical chemist involved in component analysis (1-7). In a recent study Isenhour and co-workers (8) demonstrated how the principal component analysis (PCA) feature of FA can be used to determine the number of components in a series of related mixtures by factor analyzing their mass spectra (MS). This method is extremely valuable for discerning whether a single gas chromatographic peak is really composed of two or more components.

We wish to describe here how the target transformation feature of FA can be used to yield both qualitative and quantitative information. A detailed discussion of target transformation, also known as least-squares rotation, can be found in the literature (9) and further description will not be presented here. In MS studies, FA target transformation can be used to decide whether or not a suspected substance is present in the mixtures, thus providing the analytical chemist with a new tool for qualitative identification. Having identified the components, the analyst can then use target transformation to deduce the composition of each mixture. In addition to the MS of the pure components, either an exact knowledge of the sample pressures of the pure components in the mass spectrometer or the MS of one mixture of known composition is required for quantitative analysis. All spectra must be recorded under the same experimental conditions.

The major advantage of FA over simply solving n simultaneous equations for n unknowns, where n is the number of components in the mixtures, is that FA enables us to use as many values of m/e as are experimentally available. Thus from the standpoint of handling random errors, FA, because of its inherent statistical characteristics, is a more desirable computational technique than conventional methods. The existing computer program (10) requires no modification to perform the above tasks.
BASIS

Factor analysis is based upon expressing a measurable as a linear combination of product functions. Each term in the sum, called a factor, is a product of two cooperating factors, called cofactors. The MS of a series of mixtures containing the same components but varying composition is ideally suited for FA because the intensity (height) of each mass peak is a linear sum of contributions arising from each component:

$$H(i,\alpha) = \sum_{j=1}^{n} h^{\circ}(i,j)\, p(j,\alpha) \qquad (1)$$

where H(i,α) is the height of the ith m/e peak in mixture α, h°(i,j) is the height of the corresponding peak of the pure jth component per unit pressure, and p(j,α) is the partial pressure of component j in mixture α. We note here that p(j,α) is the partial pressure of the component in the ionization chamber and not the sample reservoir. h°(i,j) is obtained from the MS of the pure component since

$$h^{\circ}(i,j) = \frac{H^{\circ}(i,j)}{p^{\circ}(j)} \qquad (2)$$

where H°(i,j) is the height of the ith m/e peak in the MS of pure j and p°(j) is the pressure of pure j in the ionization chamber during the measurement. Because of the extremely low pressures involved, Dalton's law applies. The partial pressure of each component in the sample reservoir is directly proportional to the mole fraction X(j,α) of j in the original sample mixture α, provided that the entire sample is volatilized. Because of the nature of the leak system, which introduces mass discrimination, the partial pressures in the ionization chamber are different from but proportional to those in the sample reservoir. In other words

$$p(j,\alpha) = D(j)\, X(j,\alpha)\, P(\alpha) \qquad (3)$$

Here P(α) is the total pressure in the ionization chamber. D(j) is a discrimination term introduced to account for the various peculiarities of the mass spectrometer which tend to cause discrimination between the components of a mixture. For example, for a molecular leak inlet system, because of molecular effusion, the discrimination term would be inversely proportional to the square root of the molecular weight of the component. Inserting Equations 2 and 3 into 1, we find

$$H(i,\alpha) = \sum_{j=1}^{n} H^{\circ}(i,j)\, \frac{D(j)}{p^{\circ}(j)}\, X(j,\alpha)\, P(\alpha) \qquad (4)$$

This expression can be reduced to a product function of two cofactors as follows:

$$H(i,\alpha) = \sum_{j=1}^{n} H^{\circ}(i,j)\, F(j,\alpha) \qquad (5)$$

where

$$F(j,\alpha) = \frac{D(j)}{p^{\circ}(j)}\, X(j,\alpha)\, P(\alpha) \qquad (6)$$
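The linear model of Equation 5 can be made concrete with a short numerical sketch. The block below is only an illustration in NumPy and is not the FACTANAL program (10): it takes six rows of the pure-component test vectors listed in Table I as H°(i,j), uses invented cofactors F(j,α) for three hypothetical mixtures, and recovers those cofactors by least squares over all rows. The names `H_pure`, `F_true`, `mixture_spectra` and `recover_cofactors` are assumptions introduced here.

```python
import numpy as np

# Pure-component peak heights H°(i, j) at m/e 41, 43, 55, 56, 57, 84,
# copied from the test vectors of Table I. Columns: cyclohexane, hexane.
H_pure = np.array([
    [7.1, 8.8],    # m/e 41
    [2.2, 8.6],    # m/e 43
    [4.6, 0.9],    # m/e 55
    [13.5, 6.8],   # m/e 56
    [1.2, 12.2],   # m/e 57
    [10.7, 0.1],   # m/e 84
])

# Hypothetical cofactors F(j, alpha) for three mixtures (illustrative values only).
F_true = np.array([
    [0.8, 0.5, 0.2],   # cyclohexane
    [0.1, 0.4, 0.9],   # hexane
])


def mixture_spectra(H_pure, F):
    """Equation 5: H(i, alpha) = sum over j of H°(i, j) * F(j, alpha)."""
    return H_pure @ F


def recover_cofactors(H_pure, H_mix):
    """Least-squares estimate of F(j, alpha) using every available m/e row."""
    F_est, *_ = np.linalg.lstsq(H_pure, H_mix, rcond=None)
    return F_est


H_mix = mixture_spectra(H_pure, F_true)      # shape (6 peaks, 3 mixtures)
F_est = recover_cofactors(H_pure, H_mix)     # shape (2 components, 3 mixtures)
```

With six peaks and two components the system is overdetermined, which is the situation described in the introduction: every measured m/e value contributes to the estimate rather than only n of them.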
ANALYTICAL CHEMISTRY, VOL. 49, NO. 2, FEBRUARY 1977
Equation 5 expresses the problem in terms of the spectra of the pure components, H°(i,j). If the peak heights of the pure components are used as test vectors in the FA scheme, the corresponding coefficients, F(j,α), which emerge will be related to the mole fractions as shown in Equation 6. For a given solution with components 1, 2, ..., n, the ratio of these coefficients will be in the following proportion:

$$F(1,\alpha) : F(2,\alpha) : \cdots : F(j,\alpha) : \cdots : F(n,\alpha) = X(1,\alpha) : \left(\frac{D(2)\,p^{\circ}(1)}{D(1)\,p^{\circ}(2)}\right) X(2,\alpha) : \cdots : \left(\frac{D(j)\,p^{\circ}(1)}{D(1)\,p^{\circ}(j)}\right) X(j,\alpha) : \cdots : \left(\frac{D(n)\,p^{\circ}(1)}{D(1)\,p^{\circ}(n)}\right) X(n,\alpha) \qquad (7)$$

From this equation we see that in order to determine the composition of the mixture from the F(j,α) cofactors we need to know the quantities in the parentheses involving the ratios of pressures and discrimination factors for the pure components. For a molecular leak inlet system

[equation missing]

where P°(1) and P°(j) are the experimentally measured pressures of the pure components in the sample reservoir. A way of obtaining the compositions, independent of pressure measurements, is to employ a standard solution of known composition. Only one standard solution is required.

The following sequence of operations describes the routine for 1) deducing the number of components, 2) identifying the components, and 3) determining the chemical compositions of the related mixtures. The reader is advised to refer to Figures 1 and 2 throughout our brief description. First, the MS is subjected to mathematical "decomposition" which expresses the data in terms of abstract factors. The number, n, of abstract factors required to reproduce the data within experimental error represents the number of components in the mixtures. Second, the MS of pure substances suspected to be components are then introduced individually into the scheme as test vectors. The target transformation procedure allows the analyst to decide whether or not the suspected substance is a true component. Third, after having successfully identified all of the components, a standard solution of known composition, containing all of the components, is prepared. The MS of the standard is recorded under the same experimental conditions as the unknowns. The data are added to the original data matrix and the enlarged data matrix is "decomposed" into abstract factors. The MS of the pure components are fed into the computer program as a combination set. The target transformation scheme will yield the F(j,α) cofactors for the standard solution as well as the unknowns. By means of Equation 7 and the known composition of the standard solution, the composition of the unknown mixtures is quickly determined.

Figure 1. Diagram of the FA routine used to deduce the number of components and their identities: qualitative tests. [Diagram labels: MS of solutions of unknowns; decomposition; abstract factors (n components); MS of pure substances suspected to be components; n factors.]

Figure 2. Diagram of the FA routine used to determine the compositions: quantitative analyses. [Diagram labels: MS of solutions of unknowns and standard solution; decomposition; MS of an acceptable set of true components; combination; target transformation; cofactors of standard solution; composition of unknowns.]

APPLICATIONS
As an example of the use of the FA methodology developed here, we will examine the MS data recently published by Isenhour and co-workers (8). One of these studies involved recording the intensities of 18 m/e values for 7 cyclohexane/hexane mixtures. When this data matrix was factor analyzed, surprisingly, three factors emerged. This gave evidence that not two but three components were present. The removal of the m/e 28 peak from the analysis yielded two factors. Thus the third component was identified to be nitrogen contaminant.

To illustrate the principles developed here, we factor analyzed the cyclohexane/hexane data of reference (8) after deleting m/e 28 and the MS of the pure components. In agreement with the earlier investigation, we found that two abstract factors reproduced the data within experimental error, ±0.05.

Target transformation, using two factors, was used to find out whether or not cyclohexane was a component. The MS of pure cyclohexane (see Table I) was input as a test vector. To make the test more severe, the intensities of six masses, in parentheses, were purposely free-floated (left blank) in the test vector. Table I shows that the FA prediction is in excellent agreement with the test values as well as those which were purposely deleted from the test vector. The average difference between the test points and the predicted points, 0.11, lies within an acceptable range of the estimated error. We conclude, therefore, that cyclohexane is present in the mixtures.

A similar target transformation was carried out to test for the presence of hexane. Table I shows the MS test vector which was employed. Again six masses, in parentheses, were free-floated. The agreement between the prediction and test confirms the presence of hexane. The average difference between the test and prediction is 0.14, within the acceptable error estimation range.
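The two qualitative steps just described (counting abstract factors, then testing a suspected spectrum with some entries free-floated) can be sketched as follows. This is a simplified stand-in for the target transformation of reference (9), written in NumPy under stated assumptions: the mixtures are synthetic and noise-free, built from ten rows of the Table I test vectors with invented weights; the least-squares fit in factor space uses only the entries that are not blank; and the "fake" target is a made-up vector. The function names `abstract_row_factors` and `target_test` are ours, not those of the FACTANAL program.

```python
import numpy as np

# Pure spectra (Table I test vectors) at m/e 27, 29, 39, 41, 43, 55, 56, 57, 69, 84.
cyclohexane = np.array([1.8, 1.3, 2.5, 7.1, 2.2, 4.6, 13.5, 1.2, 3.8, 10.7])
hexane = np.array([2.8, 5.1, 1.6, 8.8, 8.6, 0.9, 6.8, 12.2, 0.2, 0.1])

# Hypothetical noise-free mixtures: rows are m/e values, columns are mixtures.
weights = np.array([[0.9, 0.7, 0.5, 0.3, 0.1],
                    [0.1, 0.3, 0.5, 0.7, 0.9]])
data = np.column_stack([cyclohexane, hexane]) @ weights


def abstract_row_factors(data, tol=1e-8):
    """Decompose the data matrix and keep the factors needed to reproduce it."""
    u, s, _ = np.linalg.svd(data, full_matrices=False)
    n = int(np.sum(s > tol * s[0]))          # number of significant factors
    return u[:, :n] * s[:n], n


def target_test(row_factors, target):
    """Least-squares fit of a test vector in factor space.

    NaN entries are free-floated: they are excluded from the fit but still
    predicted. Returns the full predicted vector and the mean absolute
    difference over the entries that were supplied.
    """
    known = ~np.isnan(target)
    t, *_ = np.linalg.lstsq(row_factors[known], target[known], rcond=None)
    predicted = row_factors @ t
    avg_diff = np.mean(np.abs(predicted[known] - target[known]))
    return predicted, avg_diff


factors, n = abstract_row_factors(data)

# Test cyclohexane with m/e 27, 41 and 57 deliberately left blank.
test_vec = cyclohexane.copy()
test_vec[[0, 3, 7]] = np.nan
pred_true, diff_true = target_test(factors, test_vec)

# A made-up vector that is not a component should be reproduced poorly.
fake = np.array([5.0, 0.0, 5.0, 0.0, 5.0, 0.0, 5.0, 0.0, 5.0, 0.0])
pred_fake, diff_fake = target_test(factors, fake)
```

Because the synthetic data contain no noise, the true target is reproduced essentially exactly, including the blank entries, while the made-up target leaves a large average difference. With real spectra the comparison would instead be made against the estimated experimental error, as in the text.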
The target transformation test procedure described above is a powerful facet of factor analysis because it provides us with a method to test individually whether or not a specific substance is present. Poor agreement between the prediction and test vector is evidence for the absence of the suspected substance.

As another example of component identification, we factor analyzed the MS data of reference (8) concerning mixtures of cyclohexane and cyclohexene. Two abstract factors reproduced the data satisfactorily, indicating two components. The results shown in Table II confirm the presence of cyclohexane, the average difference between the test and prediction being 0.16. The prediction using the MS of pure
Table I. Comparisons between Experimental Test Vectors and Results Predicted by Target Transformation FA Using Two-Factor Space. The Data Matrix Was Composed of the MS Intensities of Five Cyclohexane/Hexane Mixtures(a)

            Cyclohexane                          Hexane
m/e    Test(b)  Prediction  Difference     Test(b)  Prediction  Difference
27     (1.8)    1.9          0.1           (2.8)    2.7         -0.1
29     1.3      1.3          0             5.1      4.8         -0.3
39     2.5      2.3         -0.2           (1.6)    1.8          0.2
40     0.7      0.6         -0.1           (0.4)    0.2         -0.2
41     (7.1)    7.3          0.2           8.8      8.7         -0.1
42     (3.5)    3.5          0             4.7      4.6         -0.1
43     2.2      2.1         -0.1           (8.6)    8.8          0.2
44     0.2      0.2          0             0.3      0.2         -0.1
54     (0.8)    0.7         -0.1           0.1      0.1          0
55     4.6      4.9          0.3           0.9      0.6         -0.3
56     13.5     13.6         0.1           6.8      6.9          0.1
57     (1.2)    1.2          0             12.2     12.3         0.1
69     3.8      4.1          0.3           0.2      0.3          0.1
83     0.8      0.7         -0.1           (0.1)    0.3          0.2
84     10.7     10.4        -0.3           0.1      0.2          0.1
85     0.9      0.9          0             (0.1)    0.2          0.1
86     (0.1)    0.1          0             2.8      3.0          0.2
Av                           0.11                                0.14

(a) Taken from Table III of reference (8). (b) Values in parentheses were free-floated, i.e., left blank in the test vector.
Table II. Comparisons between Experimental Test Vectors and Results Predicted by Target Transformation FA Using Two-Factor Space. The Data Matrix Was Composed of the MS Intensities of Four Cyclohexane/Cyclohexene Mixtures(a)

            Cyclohexane                          Hexane
m/e    Test(b)  Prediction  Difference     Test(b)  Prediction  Difference
27     1.8      1.8          0             2.8      1.2         -1.6
28     ...      1.0          ...           ...      0.6          ...
29     1.3      0.9         -0.4           5.1      0.5         -4.6
39     2.5      2.6          0.1           1.6      2.1          0.5
40     0.7      0.6         -0.1           0.4      0.4          0
41     7.1      7.1          0             8.8      4.2         -4.6
42     3.5      3.3         -0.2           4.7      1.6         -3.1
43     2.2      1.6         -0.6           8.6      0.7         -7.9
51     ...      0.3          ...           ...      0.4          ...
53     ...      0.5          ...           ...      0.5          ...
54     0.8      0.8          0             0.1      2.4          2.3
55     4.6      4.7          0.1           0.9      2.3          1.4
56     13.5     13.6         0.1           6.8      6.4         -0.4
67     ...      0.2          ...           ...      3.1          ...
68     ...      0.3          ...           ...      0.3          ...
69     3.8      4.0          0.2           0.2      1.9          1.7
79     ...      0.1          ...           ...      0.3          ...
81     ...      0.1          ...           ...      0.3          ...
82     ...      0.1          ...           ...      1.1          ...
84     10.7     10.6        -0.1           0.1      5.0          4.9
Av                           0.16                                2.75

(a) Taken from Table I of reference (8). (b) Taken from Table III of reference (8).
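The "Av" entries of Table II can be checked by hand. The sketch below assumes that "Av" is the mean absolute difference over the twelve m/e values for which a test value was available; this assumption is ours, though it reproduces both tabulated averages. The acceptance tolerance of 0.3 is an arbitrary illustrative threshold, not a value from the paper.

```python
# Differences (prediction minus test) at m/e 27, 29, 39, 40, 41, 42, 43,
# 54, 55, 56, 69, 84, as listed in Table II.
cyclohexane_diff = [0, -0.4, 0.1, -0.1, 0, -0.2, -0.6, 0, 0.1, 0.1, 0.2, -0.1]
hexane_diff = [-1.6, -4.6, 0.5, 0, -4.6, -3.1, -7.9, 2.3, 1.4, -0.4, 1.7, 4.9]


def average_difference(diffs):
    """Mean absolute difference between prediction and test vector."""
    return sum(abs(d) for d in diffs) / len(diffs)


avg_cyclohexane = average_difference(cyclohexane_diff)
avg_hexane = average_difference(hexane_diff)

# Hypothetical decision rule: accept a target when its average difference
# does not exceed a chosen tolerance (0.3 is arbitrary, for illustration).
tolerance = 0.3
accepted = {
    "cyclohexane": avg_cyclohexane <= tolerance,
    "hexane": avg_hexane <= tolerance,
}
```

Under this reading the cyclohexane test vector is accepted and the hexane test vector is rejected for the cyclohexane/cyclohexene mixtures.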
hexane as a test vector is also shown in Table II. In this case the average difference between the test and prediction is 2.75, a disagreement approximately twenty times the acceptable error, indicating that hexane is not a component. Unfortunately the MS of cyclohexene was not reported and a test for its presence could not be carried out.

Let us return to our study of the cyclohexane/hexane mixtures. Using target transformation FA we were able to confirm the presence of cyclohexane and hexane. We now enlarge our data matrix by adding the MS of the pure components. This is not necessary, but it should improve the statistical averaging in our computations. We chose the 55 mol % cyclohexane solution to represent our standard solution of known composition. The MS of the standard solution must be included in the enlarged data matrix. When the enlarged data matrix was decomposed by FA, two abstract factors emerged, as expected. The MS of the identified components, cyclohexane and hexane, were input as a combination set in the target transformation (see Figure 2). The complete set of cofactors F(j,α) which emerged is shown in Table III. From these cofactors and the composition of the standard solution the compositions of the unknown mixtures were calculated. These results are compared to the reported values in Table III. Unfortunately, the accuracies of the reported compositions in reference (8) were not published. We suspect that the original solutions were prepared somewhat
crudely since the original purpose was to determine the number of components by FA and not the compositions.

ACKNOWLEDGMENT

The authors extend their thanks to Harry Rozyn for his help in carrying out the computations.

LITERATURE CITED

(1) J. J. Kankare, Anal. Chem., 42, 1322 (1970).
(2) Z. Z. Hugus, Jr., and A. A. El-Awady, J. Phys. Chem., 75, 2954 (1971).
(3) J. T. Bulmer and H. F. Shurvell, J. Phys. Chem., 77, 256, 2085 (1973); Can. J. Chem., 53, 1251 (1975).
(4) D. Macnaughtan, Jr., L. B. Rogers, and G. Wernimont, Anal. Chem., 44, 1421 (1972).
(5) J. E. Davis, A. Shepard, N. Stanford, and L. B. Rogers, Anal. Chem., 46, 821 (1974).
(6) W. H. Lawton and E. A. Sylvestre, Technometrics, 13, 617 (1971).
(7) E. A. Sylvestre, W. H. Lawton, and M. S. Maggio, Technometrics, 16, 353 (1974).
(8) G. L. Ritter, S. R. Lowry, T. L. Isenhour, and C. L. Wilkins, Anal. Chem., 48, 591 (1976).
(9) P. H. Weiner, E. R. Malinowski, and A. R. Levinstone, J. Phys. Chem., 74, 4537 (1970).
(10) E. R. Malinowski, D. G. Howery, P. H. Weiner, J. M. Soroka, P. T. Funke, R. B. Selzer, and A. Levinstone, "FACTANAL-Target Transformation Factor Analysis", Program 320, Quantum Chemistry Program Exchange, Indiana University, Bloomington, Ind., 1976.

RECEIVED for review August 25, 1976. Accepted November 12, 1976. M. McCue was supported by a Robert Crooks Stanley Fellowship.

Table III. The Results of Factor Analysis of Seven Cyclohexane/Hexane Mixtures

                     F(j,α)                 Mole fraction cyclohexane
Mixture (α)    Cyclohexane   Hexane        (from FA)    (reported)
1              1.00          0.00          1.00         1.00
2              0.79          0.10          0.88         0.92
3              0.84          0.19          0.81         0.83
4              0.66          0.53          0.55(a)      0.55(a)
5              0.34          0.76          0.30         0.23
6              0.17          0.87          0.16         0.12
7              0.00          1.00          0.00         0.00

(a) Used to represent the standard solution.
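The "from FA" mole fractions of Table III can be approximately reproduced from the tabulated cofactors and the 55 mol % standard. The sketch below is our own reading of Equations 6 and 7, not the FACTANAL code: it assumes F(j,α) equals a component-specific constant times X(j,α), times a factor common to all components of a mixture that cancels on normalization. The name `mole_fractions` is hypothetical, and the hexane fraction of the standard (0.45) is inferred from the two-component assumption.

```python
import numpy as np

# Cofactors F(j, alpha) from Table III; rows are mixtures 1-7,
# columns are cyclohexane and hexane.
F = np.array([
    [1.00, 0.00],
    [0.79, 0.10],
    [0.84, 0.19],
    [0.66, 0.53],   # mixture 4: the standard solution
    [0.34, 0.76],
    [0.17, 0.87],
    [0.00, 1.00],
])
standard_index = 3
x_standard = np.array([0.55, 0.45])   # known mole fractions of the standard


def mole_fractions(F, standard_index, x_standard):
    """Convert cofactors to mole fractions using one standard solution.

    The standard fixes the per-component constants c(j) up to a common
    factor, which drops out when each row is normalized to sum to one.
    """
    c = F[standard_index] / x_standard
    scaled = F / c
    return scaled / scaled.sum(axis=1, keepdims=True)


X = mole_fractions(F, standard_index, x_standard)
x_cyclohexane = X[:, 0]
```

Because the cofactors in Table III are rounded to two decimals, the recomputed values agree with the "from FA" column only to within about 0.01.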
Gas Chromatography-Mass Spectrometry Study of Acetylacetonyl Dipeptide Methyl Esters

Hartmut Frank,¹ K. D. Haegele,² and D. M. Desiderio*

Institute for Lipid Research and Marrs McLean Department of Biochemistry, Baylor College of Medicine, Houston, Texas 77025
Reaction conditions for derivatization of dipeptides with acetylacetone are optimized. Compatible procedures for derivatization of trifunctional amino acids are developed. GC/MS properties of acetylacetonyl dipeptide methyl esters are investigated and found to be suitable for identification of components in a complex mixture of derivatized dipeptides. Unambiguous identification of individual components is achieved for unresolved GC peaks.
¹ Present address, Chemisches Institut, Universität Tübingen, Tübingen, West Germany.
² Present address, Clinical Pharmacology Program, University of Texas, Health Science Center, Departments of Pharmacology and Pathology, San Antonio, Texas 78284.

Most amino acid sequences of natural proteins and peptides have been elucidated by Edman degradation (1). In spite of these indisputable successes, some difficulties have been encountered in sequence determination of peptides isolated in amounts of less than 50 nmol (2). Each successive step has a yield of about 98% with the sequenator, and the amount of detectable amino acid derivative is gradually lowered while the background increases. Other problems may arise during extraction. Slight solubility of polypeptides in the organic solvent used in this step complicates analysis of amounts of less than 25 nmol. Solubility of oligopeptides in organic solvents may also cause loss of C-terminal peptide (2).

Another elegant technique for protein sequence determination employs one of the dipeptidyl aminopeptidases (DAP) (3-5). DAP hydrolyzes a polypeptide into a set of dipeptides starting from the amino terminus to the carboxyl terminus or to any Pro, Lys, or Arg residues. If the latter are present, they are removed with one step of the Edman procedure. The N-terminal amino acid is removed by a chemical method and the resulting polypeptide is again subjected to enzymic digestion with DAP yielding a second, overlapping set of dipeptides. DAP I has been most widely employed. The method is suitable for sequence elucidation in combination with trypsin which selectively cleaves on the carboxyl side of Arg and Lys. These bonds are not hydrolyzed by DAP I.

One potential drawback is the requirement for very efficient methods to separate the mixture into components and identify each dipeptide. In Edman degradation one deals with only 20 different molecular species and determination of Rf values distinguishes the 20 PTH-amino acids. Identification of all dipeptides in a mixture resulting from DAP digestion is a much more complicated task. The number of molecular species potentially present is high, and properties governing separation (molecular weight, hydrophilicity, ionic charge, conformation) are often similar for different dipeptides, so complete separation can only be expected when the mixture contains a small number of dipeptides with different characteristics. Therefore Rf values or retention times are insufficient for identification of dipeptides resulting from DAP hydrolysis of a polypeptide.

One of the least tedious, and at the same time most informative, methods is the combination of gas chromatography and mass spectrometry (GC-MS). Several parameters can be determined in one analysis: retention time, molecular weight, sequence-related fragmentation, and the amino acid sequence. Amino and carboxyl groups of dipeptides must be derivatized to make them suitable for GC. Perfluoroacyl derivatives