Anal. Chem. 1988, 60,1503-1510
1503
Fast Self-Modeling Curve-Resolution Method for Time-Resolved Mass Spectral Data Willem Windig,*,' Shirley A. Liebman,' Michael B. Wasserman, and A. Peter Snyder US.Army Chemical Research, Development and Engineering Center, SMCCR-RSL, Aberdeen Proving Ground, Maryland 21010-5423
A rapid self-modeilng curve-resolution method for time-resdved mass spectral data Is described. The method is much simpler and thus faster than factor analysis based methods. Examples are given of the application of thls technique to an unresoked gas chromatography/mass spectrometry (QC/MS) data set of a mixture of dicyciohexylamine, diphenylamine, and dibenzylamlne and time-resdved mass spectral data sets of a biopolymer mlxture and a group A Streptococcus organism which were obtained by oxldative pyrolysis atmospheric pressure ionization mass spectrometry. The method can be applied in a continuous ("on-line") mode, which is demonstrated on the overlapping GC/MS data.
Most of the modern self-modeling curveresolution methods are based on the use of factor analysis (1-19). Factor analysis is well suited for curve-resolution procedures because of the noise reduction that can be obtained by the use of the proper number of factors (20)and because of the ability to find "pure masses"; i.e. masses that have an intensity for only one of the components of the mixture under consideration ( 2 , 5 , 9 ) . Although factor analysis is presently the state-of-the-art approach for self-modeling curve resolution, problems arise when applied to complex mass spectral data, such as that resulting from pyrolysis mass spectrometry. There is no established way to determine the proper number of factors. Error indicator functions such as those developed by Malinowski (20) may not be applied in their present form to data from pyrolysis MS. Furthermore, noisy peaks behave like pure masses, and therefore, the latter must be confirmed by independent methods, e.g. by checking if a pure mass axis shows a correlated behavior with other masses that are more or less characteristic for a particular component (21). The above problems inherent in factor analysis necessarily yield a time-consuming process for which an experienced operator is required. A rapid method with minimal or no operator interaction would make self-modeling curve resolution more accessible for general use. Also, an approach for analyte detection in a continuous mode would be desirable, e.g. for process monitoring with a mass spectrometer. A method is described for self-modeling curve resolution that uses a simpler and faster algorithm than that of factor analysis and yields time-resolved component curves and spectra of the pure components. To obtain the spectra of the pure components, a method is described that does not require the knowledge of the number of components in the mixture, which is in contrast to pure mass methods.
EXPERIMENTAL SECTION Gas Chromatography/Mass Spectrometry. Samples of dibenzylamine, diphenylamine, and dicyclohexylamine were obtained from commercial sources and used as received. The Affiliated with GEO Centers, Inc., Prospect Square Building, Suite 502,10903 Indian Head Highway, Ft. Washington, MD 20744. Presently a t the Chemometrics Laboratory, Eastman Kodak Co., Rochester, NY 14650. 0003-2700/88/0360-1503$01.50/0
samples were diluted in methanol and syringe-injected in 0.1-pL aliquota for analysis on a DB-5,30 m X 0.25 mm fused silica capillary column contained in a Hewlett-Packard 5985B GC/MS system. Conditions were chosen in order to obtain coeluting peaks of the three selected amines in approximately equal amounts with methanol as the solvent. Reference spectra for each of the amines were obtained in separate analyses. Typical experimental conditions were 1-2 mL/min helium carrier and 200 "C isothermal oven for 2-3 min and then isothermal at 300 "C. Total ion currents were recorded with standard electron impact conditions: source temperature, 200 "C; electron energy, 70 eV; 300 PA emission current with a 33-500 mass scan range. Mass Spectrometry. The biopolymer mixture was prepared from DNA, glycogen (GLY), and bovine serum albumin (BSA) (Sigma, D-2251, G-0885, and A-9647, respectively). The 1.0 mg/O.l mL solutions were prepared for each biopolymer with distilled, deionized water as well as one solution containing 1.0 mg/O.l mL of each biopolymer. A group A strain of Streptococcus bacteria (22) was obtained as a gift from Alvin Fox of the University of South Carolina. A 1.0 mg/O.l mL suspension in distilled deionized water was made and sonicated for 30 s. Muramic acid (Sigma M-2503), in a water solution (1.0 mg/O.l mL), was prepared. Twenty micrograms (2 pL) of each sample was applied to a Chemical Data System pyroprobe platinum filament, and the probe was connected to a Chemical Data Systems Micro-Extended Pyroprobe pyrolysis power supply. The solvent was evaporated by flash heating the ribbon to 100 "C for 20 s prior to analysis. Oxidative pyrolysis of the samples was conducted at 300 "C/min with a final set temperature of 600 "C, and the total heating time was 2 min. Pyrolyzate transfer from the heating zone to the atmospheric pressure chemical ionization (APCI) source and subsequent removal were effected by an air flow through the probe in combination with an 8-mm pressure difference between the ion source canister and ambient obtained by the exhaust motor of the mass spectrometer. The configuration of the platinum ribbon probe interfaced to the APCI source was identical with that of a pyroprobe quartz tube assembly (23). The ion source exhaust motor was necessary in order to return the total ion current (TIC) to the background level within the 2-min analysis time. The mass spectral analysis was started after the first minute of the temperature program, since no TIC increase was observed before that, and a subsequent 1-min data collection was performed. Mass spectral analysis was performed on a Sciex TAGA 6OOO APCI triple quadrupole mass spectrometer. The analysis conditions were: mass range scanned from m / z 74 to 200 at a mass scan speed of approximately 1spectrum/s. This resulted in 54 spectra for a 1-min analysis period. Daughter ion analysis was performed with the central quadrupole in the rf-only mode and the pressure of the argon collision gas was 6.0 X Torr. RESULTS AND DISCUSSION In order to demonstrate the data analysis procedure, a simulated data set of three components with an overlapping 0 1988 American Chemical Society
1504
ANALYTICAL CHEMISTRY, VOL. 60, NO. 15, AUGUST 1, 1988 loo
1
b
50
I
1
1
I
I
I
I
17
1
9 SPECTRUM NUMBER
17
I
a SPECTRUM NUMBER
V 25 I
I
I
1
9 SPECTRUM NUMBER
17
,
I
1
9
17
SPECTRUM NUMBER
1
0
1
1
1
1
1
TIME
1
1
1
1
Flgure 2. (a) Total ion current (TIC) of a smoothed GC/MS trace. There is no clear indication of the presence of three components in
1
10
Flgure 1. (a) Simulated overlapping evolution of three components, A, B, and C, resulting in a total signal that does not show the presence
of three processes. (b) After the intensities of the three components are squared, the total signal shows the presence of the three processes. The intensities are expressed as a percentage of the maximum total signal.
temporal response was used. The behavior in time of the individual components and their resulting TIC is presented in Figure la. A clear evolution of the three components is not observed in the TIC. If the peak widths of the TIC curves of each of the three components would be narrower, the presence of three components could be observed in the total ion current plot. A mathematical way to achieve this is to square the intensities of the three separate components. This is demonstrated with the simulated data set in Figure lb. If the desired result is not reached after squaring the values, the process can be continued by multiplying the data set by itself again, resulting in the fourth power of the data set. GC/MS Data Set. The partially resolved GC/MS data set as presented in the inset of Figure 2a was used as a model for the self-modeling curve resolution approach by smoothing the chromatogram to produce an unresolved chromatogram (Figure 2a). This was achieved by applying smoothing eq l a
s,* = 0.25s,-, + 0.5S, + o.25st+l for t = 2, m - 1
(la)
four times to the chromatographic data, where St* is the smoothed spectrum at time t , S, is the unsmoothed spectrum at time t, and m is the total number of spectra. For the first and last spectrum, the following smoothing formulas were used:
As can be seen in Figure 2a, the TIC curve does not show a clear presence of three components. The approach described above to resolve the overlapping behavior of the components
the mixture analyzed. The inset represents the unsmoothed, partiilly resolved data set. (b) TIC after the mass variables were scaled to make the length of all mass vectors in the data set equal. (c) TIC resulting kom the squaring of the scaled mass intensltles, the presence of three components becomes visible. (d) TIC resulting from the fourth power of the scaled mass intensities. The maxima assigned to the three components are clearly visible. Spectrum number 6 represents dicylohexylamine, 10 represents diphenylamine, and 15 represents dibenzylamine. can be adapted for this data set as follows: The behavior of a mass that is typical for dicyclohexylamine will show a similar behavior in time as the dicyclohexylamine component. The same is true for masses that are typical for the other two components. Therefore squaring the intensities of these variables results in a narrowing of the component curves comprising the TIC and, consequently, produces a temporal separation of the individual components. Other masses that have contributions from more than one component will produce less of an overall separation effect; however they will contribute to the temporal separation of overlapping components as long as they are biased to one component or nonoverlappingcomponents, e.g. the fist and third component of the data set under consideration. Before the squaring procedure is applied to the data set, the mass variables are scaled, in order to yield features of equal contributions by giving them equal vector length (20). This is important in order to prevent dominance in the data set of high-intensity mass spectral peaks. This is especially true for pyrolysis mass spectra, since high intensity peaks often represent noncharacteristic fragments (24). In mathematical terms the scaling procedure is the following: The original matrix D (size s X u, where s is the number of spectra and u is the number of mass variables) is transformed to matrix DS by
D# = DB bjj =
k[(d~,~)-~/~] i=l
(24
ANALYTICAL CHEMISTRY, VOL. 60, NO. 15, AUGUST 1, 1988
e "j
'?I
1.0,
0
1
30
E
1505
'*!
E
., il , , I , 50
70
Id,
00
,,I,,
110 miz
,)
120
, , , , , ,
130
150
170
1m
,
lo0 mir
mlz
Figure 3. Model spectra of the three components dicyclohexylamine, diphenylamine, and dibenzylamine presented in 3a-c, respectively. The spectra at the local maxima in the plot given in Figure 2d, which have the highest amount of the above three components, are presented in d-f. Overlap is obvious in 3d. The mathematicallyextracted spectra, calculated on the basis of the spectra in 3d-f, are given in 39-1. It is obvious that these spectra are similar to the spectra of the model components.
The scaling procedure in eq 2 has a disadvantage in that indiscriminate,low-intensity features are treated the same way as peaks characteristic of a particular component. In order to avoid this undesirable effect, mass vectors with a length less than 10% of the maximum length were eliminated. The combination of scaling the data, followed by calculating the second, fourth, or higher power and the TIC of the obtained data will be called the scaled power TIC method, abbreviated SP-TIC. Application of the scaling procedure on the GC/MS data set is presented in Figure 2b and does not change the overlapping behavior of the three components. The result of squaring the scaled intensities is given in Figure 212, and the presence of the three components becomes clear. Squaring the data again, resulting in the fourth power of the scaled intensities, indicates a clear presence of the three components as can be deduced from Figure 2d. The mass spectra of the pure components are given in Figure 3a-c. The spectra at the local maxima in Figure 2d, which have a maximum contribution of the components dicyclohexylamine, diphenylamine, and dibenzylamine, respectively (spectrum numbers 6, 10, and 15), are given in Figure 3d-f. Comparison of the mass spectrum of pure dicyclohexylamine (Figure 3a) with that of maximum dicyclohexylamine content in the GC/MS data (Figure 3d) shows a minor contribution of the diphenylamine (e.g. m/z 169, Figure 3b) and dibenzylamine (e.g. m / z 91, Figure 3c) components. The spectrum at the local maximum assigned to diphenylamine (Figure 3e) shows clear contributions of dicyclohexylamine (e.g. m / z 138, Figure 3a) and dibenzylamine (e.g. m / z 91, Figure 3c). Finally, the spectrum at the third local maximum of Figure 2d, assigned to dibenzylamine (Figure 30, shows some overlap when compared to the spectra of the other two components. Dicyclohexylamine is clearly represented by m/z 138 and a minor contribution from m / z 169 indicates the presence of diphenylamine. The overlap between the spectra indicated that a further procedure is necessary to produce the mass spectra of the individual components with no contributions from the other components. To achieve this, the knowledge of "pure masses* is desirable. Although pure masses have a limited meaning for complex data, it is the only way to calculate the spectra of the pure components. A pure mass receives contributions from only one of the components in the mixture. Since all the masses are scaled to the same length, the intensities of a pure mass during the evolution of the associated component will be higher than the
Table I. Contents of Matrix A, Consisting of the Original (Le. Unscaled) Intensities of the Pure Masses of the Spectra at the Local Maxima in Figure 2d" spectrum no.
m/z
mlz
169
m/z 106
6 10 15
0.1329 0.0387 0.0068
0.0670 0.3148 0.0207
0.0305 0.1270 0.3371
55
"Pure masses were selected on the basis of the largest lengthscaled intensities at the local maxima in Figure 2d.
intensities of masses that are not pure for that particular component. This behavior (the pronounced intensity) will even be stronger after raising the intensities to the fourth power. This means that a pure mass can be found as the mass which has the highest intensities (in the power-raised data) during the evolution of the components. The procedure described above does not guarantee that the masses are pure, only that they are the most characteristic for their particular components. An easy way to check the "purenessn of a mass is to plot its intensity versus time: a real pure mass should only show one maximum. (Plots of the pure masses will be presented later in this article.) Since the local maxima found by the SP-TIC method have the highest contributions of the components, they are reliable points in the curve to find the pure masses. The pure masses for dicyclohexylamine, diphenylamine, and dibenzylamine were found to be m/z 55, mlz 169, and mlz 106, respectively. The validity of each of the pure masses was confirmed by determining if they had a high intensity in their immediate neighboring mass spectra as well as a minimal intensity in the spectra at the other maxima. As can be seen, the procedure for pure mass assignment to a particular component does not always agree with assignments made by visual inspection of the pure spectra. Once the pure masses of the three components are obtained, the spectra of the pure components can be calculated as follows. Matrix A (size n X n, where n is the number of components) (Table I) presents the intensities of the unscaled pure masses of the three spectra at the local maxima in Figure 2d. Since the pure masses are a direct measure for the contributions of the components, the rows of matrix A give the mixture composition of the spectra 6, 10, and 15. In mathematical terms, the matrix A reveals the following information:
AC = M
(3)
1506
ANALYTICAL CHEMISTRY, VOL. 60, NO. 15, AUGUST 1, 1988 12
100 _I
b
c -
z
E b
w
$ 1 E
1
1
I
I
I
1
S
17
SPECTRUM NUMBER
Figure 4. Plot of the evolution of the three separate components, represented by their pure masses after the proper scaling and the resulting TIC (solid line).
where C (size n X u ) is a matrix with the spectra of the pure components in its columns and M (size n x u) is the (unscaled) mixture spectra 6,10, and 15, respectively. It will be obvious that the unknown spectra can be obtained by the following equation:
C = A-lM (4) The result of this procedure is given in Figure 3d-f. The mathematically extracted spectra are almost identical with the separate compounds (Figure 3a-c). It is in principle also possible to use all the spectra (mixture) to extract the pure component spectra by using standard numerical methods to solve overdetermined sets of linear equations. Since the approach here is aimed a t a fast algorithm and the results obtained by the method presented above are satisfactory, this has not been pursued. The use of the inverse matrix as discussed above is in principle the same as used for factor analysis based mixture analysis approaches (20, 21). The time-resolved envelope of the three components can be obtained either by using the pure masses or by expressing every mass spectrum in terms of the extracted spectra of the pure components. Since the extracted spectra are based on the pure masses, they are in principle no better than the pure masses. Therefore, the extracted pure masses were used, which simplifies the mathematical procedure and increases the speed of the algorithm. The pure masses cannot be used directly as an approach for the calculation of the TIC, since the relation between the intensities of the pure masses and the TIC is not known. The relation can be found, however, by solving the following expression: AF = T (5) where A is the matrix of the pure masses (Table I) (size n x n). Matrix T (size n X 1)contains the TIC’s of the spectra under consideration (i.e. 6, 10, and 15 in Figure 2d). Thus matrix F (size n X 1) contains the factors to convert the intensities of the pure masses of the spectra 6,10, and 15 into the TIC’s of these spectra. The solution of the unknown matrix F is F = A-lT (6) Since A-l is known from the pure spectra calculations, it is simply a matter of multiplication of matrices. Since the factors to convert the intensities of the pure masses into TIC’s are known now (matrix F), the intensities of the components can be plotted by using the scaled, pure mass intensities. The plot of the extracted component curves and the reconstructed TIC calculated from these curves (TIC*) are shown in Figure 4. The original smoothed TIC (Figure 2a) is observed to closely
20 40 SPECTRUM NUMBER
1
20
40
SPECTRUM NUMBER
Flgure 5. (a)Total ion current resulting from oxidative pyrolysis of DNA, GLY, and BSA. There is a clear overlap between the signals of the three compments. (b) TIC resuking from the fourth power of the
scaled mass intensities. The three components are clearly visible.
track the reconstructed TIC*. Biopolymer Mixture. The next example is based on the oxidative pyrolysis of a mixture of DNA, glycogen (GLY), and a protein, bovine serum albumin (BSA), followed by mass spectral analysis. These biopolymers have been the subject of several other Py-APCI-MS studies (25,26). In a previous pyrolysis study, it was determined that DNA evolves in time before GLY, which in turn evolves before BSA (19). In Figure 5a the TIC of the one-time smoothed data set is presented. Although some structure is present in this curve, it does not give a clear indication of three components. The results of the SP-TIC method (fourth power) are presented in Figure 5b and the maxima assigned to the three components have been indicated by their respective mass spectrum numbers. In spite of the presence of three components in the sample, more than three local maxima can be observed, e.g. there are two local maxima in the region assigned to the BSA profile. Since replicate analyses appeared to be reproducible after similar mathematical treatments, the split maximum was studied more closely. Both local maxima were caused by mass intensities that could be assigned to BSA (Figure 6c). Differences between the two maxima were apparently represented by a slight domination of mass intensities representing aliphatic fragments in the first maximum and of aromatic mass peaks in the second maximum. The conclusion was that the behavior of BSA could be attributed to the complex nature of its pyrolysis over time. This phenomenon was also observed with the protein component in the time-resolved data analysis of the Streptococcus sample discussed below. It is interesting to note that protein has shown a complex behavior in several pyrolysis studies ( 2 7 , B ) .For the present discussion, the BSA maximum will be considered on the basis of spectrum number 21. As observed in Figure 5b, there is also a minor fourth maximum, indicated by an asterisk. Mass spectral analysis indicates a secondary volatilization of condensed products on the glass wall of the probe. Therefore, this maximum was not used in the data processing. The spectra at the local maxima of Figure 5b, i.e. spectra numbers 6,12, and 21, produce differences in comparison to the spectra of the pure Components, which cannot be explained by a simple temporal overlap. For example, a comparison of the mass spectrum of pure DNA (Figure 6a) with the mass spectrum with the maximum DNA content in the time-resolved mixture data set (Figure 6d) shows that m / z 127 is almost absent in the latter spectrum. This phenomenon clearly indicates that the analysis of a mixture is not always the simple sum of the analyses of the separate components and is especially important for ionization under atmospheric pressure circumstances (29). This information was confirmed with m / z 127 daughter ion mass spectra of DNA, GLY, and
ANALYTICAL CHEMISTRY, VOL. 60, NO. 15, AUGUST 1, 1988
I
a
7
127
0
1
C
l'O1
i'
,
,
,
I,
oj
, , , ,, ,
70
90
110
130 mlr
150
170
190
,
,,lj;
I,
, , , ,
90
,,,
,
I
0 70
,
ii
0
0
,11
1507
110
130
fM
170
190
-+ 70
90
110
130 mlz
mlz
150
170
190
Figure 8. Model spectra of DNA, GLY, and BSA presented in 6a-c, respectively. The spectra at the local maxima in the plot given in Figure 5b, which have the highest amount of the three biopolymers, are presented in 6d-f. Overlap between BSA and GLY can be observed. The mathematicaiiv extracted sDectra, calculated on the basis of the sDectra given in parts d-f, are given in parts 69-1. The presence of GLY in the BSA component (Figure 6f) became less pronounced.
the biopolymer mixture. The mlz 127 daughter spectrum of the biopolymer mixture essentially yielded only the characteristic GLY mlz 127 spectrum (not shown). A clear, reproducible difference of the relative intensity of mlz 136 is also observed. A comparison of the GLY model spectrum with the spectrum with the highest GLY content in the biopolymer mixture data (number 12), Figure 6, parts b and e, respectively, again provide a different peak distribution. For example, the presence of mlz 144 and 162 (Figure 6e) in the extracted spectrum rather than mlz 145 and 163 (Figure 6b), respectively, might indicate a charge transfer mechanism rather than protonation. The presence of mlz 136 very likely indicates an overlap with DNA. Finally, the model spectrum of BSA and the spectrum with the maximum BSA concentration in the data f i e (number 21), Figure 6, parts c and f, respectively, show some differences, e.g. the mlz values 115 and 129 are low in intensity in the spectrum presented in Figure 6f. The origin of these peaks is unclear; however, their absence in the extracted mass spectrum provides an indication of their relatively low proton affinity as compared to the pyrolysis products of the DNA and GLY biopolymers. The spectrum with the maximum BSA content in the file also shows some peaks that indicate an overlap with the evolution of GLY, e.g. mlz 127,134, and 144. The minor amount of mlz 136 might indicate an overlap with the DNA component. The pure masses for this data set were mlz 136,127, and 113 for DNA, GLY, and BSA, respectively. The mathematically extracted spectra are presented in Figure 6g-i. The mass spectra of DNA and GLY do not show significant differences, when compared to the spectra in parts d and e of Figure 6, respectively, except for the absence of mlz 136 in the GLY spectrum. Minor differences are observed in a comparison of the BSA mass spectra (Figure 6, parts f and i). The influence of GLY, represented by mlz 99,117, and 127 is virtually absent, although a small amount of mlz 134 and 144 is still present. From the single ion currents of these masses, it appeared that they indeed showed a more gradual decrease in time during the last part of the heating program than other masses typical for GLY. One has to keep in mind that programmed heating of a biopolymer does not result in a simple, concurrent evolution of all masses but that there are also complex degradative reactions involved, which can affect the different masses of one component to different degrees, e.g. the double maxima for the BSA component (Figure 5b).
1
loo
OJ I
1
I
I
20 40 SPECTRUM NUMBER
Figure 7. Plot of the evolution of the three separate biopolymer components, represented by their pure masses, and the resulting TIC.
-I
1
20 40 SPECTRUM NUMBER
1
20 40 SPECTRUM NUMBER
Flgure 8. (a)TIC from oxidative pyrolysis of a group A Streptococcus organism. (b) TIC resulting from the fourth power of the scaled mass intensities. Three processes can be distinguished. The maximum indicated by an asterisk is the result of secondary volatilization of pyrolysis products condensed on the glass wall of the probe.
The reconstruction of the TIC'S of the three separate components, represented by their pure masses, is shown in Figure 7. The reconstructed TIC portrays some differences when compared to the original TIC (Figure 5a), which are probably due to the noise reduction obtained by reconstructing the total ion current from a three-dimensional event rather than using all the information in the data set.
1508
ANALYTICAL CHEMISTRY, VOL. 60, NO. 15, AUGUST 1, 1988 a
100 1
h
'I I
1
I
1
20
40
SPECTRUM NUMBER
Figure 10. Plot of the evolutlon of the three separate bacterial components, represented by their pure masses, and their resulting TIC.
d
8 i pl
108
0
-A*
I
l
l
Group A Streptococcus. The SP-TIC procedure was applied on the smoothed data of the pyrolysis of a group A Streptococcus strain (Figure 8a), which has been the subject of a number of GC and GC/MS pyrolysis studies (22,30,31). Subjects of this complexity provide a considerable challenge to the SP-TIC approach. Analytical pyrolysis mass spectrometry studies generally rely on the single spectrum resulting from signal averaging all the spectra obtained during the pyrolyzate production (24). From such a single spectrum it is not possible to obtain information on (classes of) separate components in a self-modelingway, i.e. without using reference spectra. The results of the SP-TIC method (fourth power) on the Streptococcus sample are presented in Figure 8b. As was observed in the biopolymer mixture analysis, the region assigned to protein (spectra 31-34 in Figure 8b) shows two maxima. Since the mathematically extracted mass spectra and the spectra at the local maxima of Figure 8a did not differ significantly, an analysis of the mathematically extracted spectra will suffice (Figure 9a-c). The choice of model compounds for this complex "mixtureis indeed a challenge. Since the presence of muramic acid in this organism is well documented in the literature (32,33), it was selected as a model compound. The mathematically extracted spectrum of the first maximum (Figure 9a) indeed shows the presence of peaks typical for muramic acid, i.e. m/z 90 and 108, as can be deduced from its model spectrum in Figure 9d. This was c o n f i e d by daughter ion spectra of mlz 90 and 108 in both muramic acid and the group A Streptococcus organism. The daughter ion spectrum of m / z 90 in both cases produced essentially identical spectra (not shown).
The m / z 108 MS/MS spectrum from the organism produced the same m/z distribution and intensity pattern as muramic acid along with a few other signatures, the latter presumably representing other compounds contributing to m / z 108 (not shown). The presence of m/z 136 might originate from nucleic acid like (see also Figure 6a) materials such as adenine (24, 34). The extracted spectrum (Figure 9b) associated with the second maximum (spectrum number 23, Figure 8b) presents a complex pattern, which contains some peak series that might be related with hexoses, e.g. m / z 96,98,102, and 126 in an unprotonated form (24,26,35).The presence of mlz 115 and 129 possibly is an indication of the presence of fatty acids. The third maximum shows similarities with a protein pattern, e.g. m / z 80, 84, 86, 113, and 153 (also see Figure 6c). In addition a peak series with m/z 111,125, and 139 is observed, which might be indicative for the presence of amino sugars, often present in capsular polysaccharides (24, 36,37). Systematic studies by MS/MS and/or GCIMS with the appropriate model compounds are warranted for a more complete accounting of structure/peak assignments. Study of the mass spectral patterns that cause the fourth maximum (indicated by an asterisk in Figure 8b) showed again a secondary volatilization process. The reconstruction of the TIC'Sof the separate components and the reconstructed TIC* are given in Figure 10. The TIC* shows differences with the original TIC from Figure 8a, which is very likely due to noise reduction by reconstructing the data from the three components. An additional indication for this is the fact that the TIC of the scaled data (not shown) is relatively similar to the reconstructed TIC*. Continuous Mode or On-Line Analysis Application. Self-modeling curve resolution such as the SP-TIC method appears to have a number of potential applications when applied in a continuous mode. For example the method can apply to monitoring applications of spectral methods, or for "on-line" data analpis of GC/MS data. The following outlines the self-modeling curve-resolution method in a continuous analysis mode; see also Table 11. (a) After m spectra are obtained ( m is odd), t,, t2, tS, ...,t, (ti thus is an array with all the unscaled mass intensities of a spectrum at time i) the SP-TIC (fourth power) method is applied. In order to simplify the example, the value of 9 is assigned to m, based on the actual value which will be used below. The obtained TIC values are labeled ql, ..., a9,,, where the first subscript indicates the spectrum number and the second subscript the "set" number, indicating that this is the first set of spectra taken in consideration. The central data point of the obtained values, i.e. ~ 5 , 1 ,is used as the first data point for the continuous data analysis mode.
ANALYTICAL CHEMISTRY, VOL. 60, NO. 15, AUGUST 1, 1988
1509
6
Table 11. Calculation Scheme for Continuous Curve Resolution TIC of scaled fourth power data of
sequence
original data
1-9
2-10
3-11
...
4-12
a3,3 a4.3
a..
a5,3
a..
a..
a6.3 a7.3 a8.3 a9,3 a10.3
a..
a,.
a,.
a..
a.. a..
a.. a,.
(b) After the next measurement has been made, the data analysis procedure is applied on the next set of nine spectra, i.e. t2-t10,which yields the data points a2,2-a10,2.The central point of these numbers, i.e. u ~ , is~ the , next point for the continuous data analysis mode. (c) The same procedure as described under (b) is repeated for every measured spectrum. It appeared, however, that a plot of the a i j values (fourth power) obtained in this way on the GC/MS data did not result in a curve that reveals the local maxima in a similar way as Figure 2d does. The reason for this is that the length scaling is different for each of the data sets evaluated for the continuous mode, since the lengths of the mass variables are dependent on the set of spectra used for the calculations. This means that an additional scaling procedure is required to “correct” for the differences in the length scaling for each set of spectra. This correction for the different mass vector lengths can be made by scaling the data in such a way that the intensity of a continuous data point is the same as the corresponding intensity in the analysis of the next set. Therefore, the scaling requires that ~ 5 , 2has the same value as as,1after the scaling procedure. It will be obvious that the scaling factor to achieve this equals This same scale factor is applied on a6,2, which yields the next point for the continuous data analysis mode. The other points for the continuous mode are scaled in the same way. In more general mathematical terms, the procedure does the following: For the first point (C,) and for all the other points where Cj is the TIC for the continuous data analysis mode and a k jis the result of scaling and fourth power calculations on the spectra (It - 0.5m + 0.5) to (It + 0.5m - 0.5),where m is the number of continuous data points and is of an odd value. The length of the vectors can be calculated in an efficient way in a Continuous data analysis mode by adjusting the length of the previous calculation by extracting the contribution of the first spectrum of the previous calculation and adding the contribution of the newly obtained data point. This process was applied on the GC/MS data shown in Figure 2a. A window of nine spectra appeared to give satisfadory results for this data set. Because of the use of a window of nine spectra, the number of points calculated for the curve (Table 11) would miss the first and last four data points of
I
I
I
1
9
17
SPECTRUM NUMBER
Flgure 11. TIC curve of the GUMS data (Figure 2a), scaled and raised to the fourth power resulting from a continuous analysis mode.
the curve in Figure 2d. In order to obtain the same number of data points for the continuous mode, four spectra were added in front of the data set, i.e. a quadruplicate set of the original first mass spectrum. Similarly, four spectra were added to the end of the data set by appending the last spectrum four more times. The curve (Figure 11)is similar to the results obtained when the data analysis procedure is applied on the whole data set (Figure 2d). For the mathematical extraction of the spectra, the best approach is to use the spectra at the local maxima in Figure 11,because it is these spectra that have the maximum contributions of the respective components. In a continuous mode, this is not possible, since the “future” maximum is not yet known. It appeared, however, that it was still possible to use the spectral calculation procedure given by eq 3 and 4. For the local maximum at spectrum 10 (Figure ll),for example, the spectra 6-14 were used in the calculation of the data point Clo (see Table 11). Application of eq 3 and 4 on the pure masses of the mass spectra 6,10, and 14 resulted in mathematically extracted spectra (not shown) highly similar to the ones presented in Figure 3g-i. Since speed can play an important role for on-line applications, a simpler, faster approach for resolving the overlapping spectra is presented. The procedure simply uses the data on which the resolved TIC-curve is based, i.e. the scaled fourth power data. This means that the number of components and their associated pure masses do not have to be known for this simplified procedure. The applied scaling procedure, however, produces a spectrum that is difficult to analyze from a chemical point of view. Masses that are known to have a high intensity for a certain component will have the same intensity as masses that have a low intensity when they are both typical for the component. This problem can be solved by simply back-transforming the data (38). This means that the reverse process of eq 2 has to be done, i.e. multiplying the scaled (fourth power) intensities by their respective lengths
S = SSB-1 (8) where S# is an (row) array with the scaled fourth power spectrum to be back-transformed and S is the back-transformed spectrum. The result of this procedure applied on the spectrum with the highest overlap, i.e. spectrum number 10, see Figure 3e, is given in Figure 12. As can be seen, the diphenylamine extracted spectrum shows some overlap with the other two components, i.e. m / z 56, 138 represent dicyclohexylamine and m / z 91, 106 represent dibenzylamine. However, the mathematically extracted spectrum shows only a minor contribution of dicyclohexylamine and dibenzylamine, when compared to the mass spectrum number 10 (Figure 3e).
1510
ANALYTICAL CHEMISTRY, VOL. 60, NO. 15, AUGUST 1, 1988
1 ”$ z
,
56 1
, , ,
108
9’
/.
,
I
138 I
I
I
II
I
I
,
I
mlz
Flgure 12. Mathematically extracted spectrum from applying the SP-TIC continuous mode on the GUMS data in Flgure 11. This spectrum is calculated by back-transforming the (length-scaled) spectrum number 10 by muklplylng the mass intensities by their respective lengths. Compare with parts e and h of Figure 3.
CONCLUSIONS The data resulting from Py-APCI-MS appeared to be of a complex character, since the mixture data clearly showed that they were not a simple sum of the analyses of the separate compounds. Despite the complexity of the data used, the results presented showed that self-modelingcurve resolution can be accomplished by much simpler ways than factor analysis based methods. Although no direct comparisons have been done between the two methods, indications are that the method is 1to 2 orders of magnitude faster than fador analysis based methods. Another important feature of the SP-TIC approach is that it can be used in a countinous mode. Extension of this method with an algorithm that finds the maxima and checks if the maxima describe different chemical components (by checking the singularity of matrix A) could produce a fully automated approach, which will be a subject for further research in this laboratory. There are curve resolution problems in the SP-TIC method that can be obviated by factor analysis. The SP-TIC requires that the components have a different maximum in time. This is not required for factor analysis, as long as the shape of the curves of the components is different. It is also perceived that the SP-TIC approach can be used as a preprocessing step for factor analysis, e.g. to provide an estimate of the dimension of the data set (i.e. the number of maxima), or as a transformation step of the data matrix. ACKNOWLEDGMENT Mark Jones and Christine Houseknecht are acknowledged for their assistance with the data analysis. Registry No. GLY, 9005-79-2; dicyclohexylamine, 101-83-7; diphenylamine, 122-39-4;dibenzylamine, 103-49-1.
Jie-Hsung, Chen; Lian-Pin, Hwang Anal. Chlm. Acta 1981, 133. 271-281. Sharaf, M. A.; Kowalski, B. R. Anal. Chem. 1981,5 3 , 518-522. Malinowski, E. R. Anal. Chim. Acta 1982, 134. 129-137. Sharaf, M. A.; Kowalskl, R . K. Anal. Chem. 1982,5 4 , 1291-1296. Glliette, P. C.; Lando, B. L.; Koenib, J. L. Anal. Chem. 1983, 5 5 , 630-633. Osten, D. W.; Kowalski, B. R. Anal. Chem. 1984,5 6 , 991-995. Malinowski, E. R. Anal. Chem. 1984,56, 778-781. Vandeginste, B.; Essers, R.; Bosman, T.; Reijnen, J.; Kateman, G. Anal. Chem. 1985,5 7 , 971-985. Borgen, 0. S.; Kowaiski, B. R. Anal. Chim. Acta 1985, 174, 1-26. Ramos, L. S.; Burger. J. E.; Kowalski, B. R. Anal. Chem. 1985, 5 7 , 2620-2625. King, M. D.; King, G. S. Anal. Chem. 1985,57, 1049-1056. Gemperline, P. J. Anal. Chem. 1986,58, 2656-2668. Lacey, F. L. Anal. Chem. 1988,58, 1404-1410. Sharaf, M. A. Anal. Chem. 1986,58, 3084-3091. Windig, W.; Chakravarty, T.; Rlchards, J. M.; Meuzelaar, H. L. C. Anal. Chim. Acta 1986, 191, 205-218. Lindberg, W. L.; Ohman, J. Anal. Chem. W88, 5 8 , 299-303. Windig, W.; Jakab, E.; Richards, J. M.; Meuzelaar, H. L. C. Anal. Chem. 1987,5 9 , 317-323. Malinowski, E. R.; Howery, D. G. Factor Analysis in Chemistry;WileyInterscience: New York, 1980. Windig, W.; McClennen, W. H.; Meuzelaar, H. L. C. Chemom. Intelligent Lab. Syst. 1987, 1, 151-165. Smith, C. S.;Morgan, S. L.; Parks, C. D.; Fox, A,; Pritchard, D. G. Anal. Chem. 1987,5 9 , 1410-1413. Liebman, S. A.; Snyder, A. P.; Krerner, J. H.; Reutter, D. J.; Schroeder, M. A.; Fifer, R. A. J. Anal. Appl. Pyrolysis 1987, 12, 83-95. Meuzelaar, H. L. C.; Haverkamp, J.; Hileman, F. D. pVrolysis Mass Spectrometry Of Recent and Fossil Biomaterials ; Compendium and Atlas; Elsevier: Amsterdam, 1982. Snyder, A. P.; Kremer, J. H.; Meuzelaar, H. L. C.; Windig, W.; Taghizadeh, K. Anal. Chem. 1987,5 9 , 1945-1951. Snyder, A. P.; Kremer, J. H.; Meuzeiaar, H. L. C.; Windig, W.J. Anal. Appl. pVr0lysi.s 1988, 13, 77-88. Van de Meent, D.; De Leeuw, J. W.; Schenck, P. A,; Windig, W.; Haverkamp, J. J. Anal. Appi. pVrolysis 1982,2 , 133-142. Windig, W.; Meuzelaar, H. L. C.; Haws, B. A,; Campbell, W. F.; Asay, K. H. J. Anal. Appl. pVrolysis 1983,2 , 183-198. Vandiver, V. J.; Leasure, C. S.; Eiceman, G. A. Int. J. Mass Spectrom. Ion Processes 1985,66, 223-238. Elliot. T. R.; Rogers, A. H.; Haverkamp, J. R.; Groothuis, D. Forensic Sci. Int. 1984,26, 131-137. Eudy, L. W.; Walla, M. D.; Hudson, J. R.; Morgan, S. L. J. Anal. Appl. Pyrolysls 1985, 7 , 231-247. de Cuenlnck, B. J.; Shockman, G. D.; Swenson, R . M. Infect. Immun. 1982,35, 572-582. Fox, A.; Morgan, S. L. I n Instrumental Methods for Rapid Microbiological Analysis; Nelson, W. R., Ed.; VCH Publishers: Deerfield Beach, FL, 1985; pp 135-164. Alexander, A. J.; Kebarle, P.; Fuciarelli, A. F.; Raleigh, J. A. Anal. Chem. 1987,5 9 , 2484-2491. Schulten, H.-R.;Bahr, U.; Gortz, W.J . Anal. Appl. pVr0lysi.s 1981,3 , 137-150. Wlndig, W.; McClennen, W. H.; Stoik, H.; Meuzelaar, H. L. C. Opt. Eng. 1988,25, 117-122. Haverkamp, J.; Eshuis, W.; Boerboom, A. J. H.; Guinee, P. A. M. I n Advances in Mass Spectrometry;Quayle, A. A,, Ed., Heyden and Son, London, 1980; VoI. 8a. Windig, W.; Kistemaker, P. G.; Haverkamp, J.; Meuzelaar, H. L. C. J. Anal. Appl. pVrolysis 1980,2 , 7-18.
LITERATURE CITED (1) Lawton, W. H.; Syhrestre, A. S. Technometrlcs 1971, 13, 617-633. (2) Knorr. F. H.; Futreil, J. H. Anal. Chem. 1979,5 1 , 1236-1241.
RECfor review September 29, 1987. Accepted February 21, 1988.