Use of Computer Techniques in High-Resolution Mass Spectrometry

D. D. Tunnicliff and P. A. Wadsworth
Shell Development Co., Emeryville, Calif.

Three sequential computer programs have been developed for processing data obtained on a high-resolution mass spectrometer. Although designed for spectral data recorded on a photographic plate, these same programs should be usable with data obtained by some other method of detection. In the final output, each identified ion is listed in a column that corresponds to a particular set of heteroatoms. Experience has shown that a weighted standard error of 0.3 to 0.8 millimass units (1 to 3 ppm) in the fit of the standard masses to the spectrum is usually obtainable for a good spectrum with a dispersion that gives a maximum mass of about 300. A comparison of the known and calculated masses of ions other than the standard masses shows similar accuracy.
THE MOST important application of high-resolution mass spectrometry in analytical chemistry is the accurate determination of the masses of a series of ions and the deduction of the corresponding empirical formulas. Information of this type can be used to estimate the empirical formula of an unknown organic molecule and in many cases can lead to deductions about some of the structural features of the molecule.

Two methods of recording a mass spectrum are used with a high-resolution mass spectrometer of the Mattauch-Herzog geometry, such as the CEC Model 21-110B. One method is the electrical detection of ions as they are swept across a slit by varying either the magnetic or the electric field of the instrument. The other is the use of a special photographic plate placed in the focal plane of the instrument. Each ion type then forms a slit image on the plate, so the spectrum consists of a series of lines of differing density. The distance between the lines is related to the masses of the corresponding ions, and the variations in intensity are related to the abundance of each ion type. Although the equipment and computer programs to be described are specifically designed for processing data recorded on a photographic plate, the same programs could be used with electrical detection if similar digital information on the spectrum were available.

If the mass of the ion corresponding to a line on the photographic plate can be determined with sufficient accuracy, the empirical formula of the ion can be determined unambiguously, because only one combination of atoms will give this exact mass. In practical problems, however, the accuracy often will not be sufficient to give an unequivocal answer as to the composition of an ion. This is particularly true in the higher mass ranges, where there may be many possible combinations of atoms with masses very nearly equal to the mass of an unknown ion.
As the number of possible combinations is strongly dependent on the accuracy of the measurement, it is highly desirable to measure the masses of all ions with the maximum possible accuracy. The remaining uncertainties can often be resolved by simultaneously considering the composition of other ions in the same spectrum. This knowledge is then used in conjunction with any other available knowledge about the sample to deduce the empirical formula of the original molecule and some of its structural features.
ANALYTICAL CHEMISTRY
According to Duckworth and Ghoshal (1), the observed positions for a series of ions should be a linear function of the square roots of the corresponding masses. Consequently, the accuracy of the mass determination depends both on the accuracy with which the center of each line can be determined and on the accuracy of the distance measurement between these line centers. The usual practice (2-4) is not to depend on absolute distance measurements but instead to add a calibration compound to the sample and then measure the positions of unknown lines relative to the known lines of the calibration compound. Perfluorokerosene (PFK) is commonly used for this purpose because it provides a convenient series of lines with masses that differ significantly from the masses of the usual sample ions. However, deuterated paraffins, n-paraffins, a set of previously identified lines in the sample spectrum, or any combination of such lines may be used.

As a normal spectrum contains 100-500 lines, any manual method of determining the masses of all the lines is quite impractical; about all that can be done manually is to determine the masses of a few lines thought to be especially significant. What is needed is a method of automatically digitizing the spectral data, followed by a computer calculation of the masses and identification of the corresponding empirical formulas. We have recently acquired a CEC Model 21-087 Automatic Mass Spectrum Data System for digitizing the spectral data. This equipment, which has been described by Venkataraghavan, McLafferty, and Amy (4), consists of a high-quality comparator-densitometer that scans across a 35-cm spectrum in 12 minutes.
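The Duckworth and Ghoshal relation is what makes calibration by interpolation possible. The short Python sketch below assumes that plate position is exactly linear in the square root of the mass between two bracketing calibration lines. The function names and the plate law `x = 50 * sqrt(M)` are hypothetical and serve only to generate self-consistent test positions; masses are computed from monoisotopic atomic masses, neglecting the electron mass.

```python
import math

def mass_by_interpolation(x, x1, m1, x2, m2):
    """Mass of a line at plate position x, interpolated between calibration lines
    (x1, m1) and (x2, m2), assuming position is linear in sqrt(mass)."""
    s1, s2 = math.sqrt(m1), math.sqrt(m2)
    s = s1 + (x - x1) * (s2 - s1) / (x2 - x1)   # linear interpolation in sqrt(mass)
    return s * s                                # back to mass

def synthetic_position(m, k=50.0):
    """Hypothetical plate law x = k * sqrt(M) (mm), used only to make test positions."""
    return k * math.sqrt(m)

# CF3 and C2F5 (PFK calibration ions) and C7H7 as the "unknown",
# from atomic masses C = 12, F = 18.9984032, H = 1.00782503.
m_cf3, m_c2f5 = 12.0 + 3 * 18.9984032, 24.0 + 5 * 18.9984032
m_c7h7 = 7 * 12.0 + 7 * 1.00782503

x_unknown = synthetic_position(m_c7h7)
m = mass_by_interpolation(x_unknown, synthetic_position(m_cf3), m_cf3,
                          synthetic_position(m_c2f5), m_c2f5)
print(f"{m:.6f}")  # 91.054775
```

Because the test plate law is exactly linear in the square root of the mass, the interpolation recovers the unknown mass to rounding error; on a real plate the result is only as good as the linearity between the two calibration lines.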
Whenever the density (100.0 - transmittance) exceeds some preset threshold value, the density values corresponding to the line profile are recorded at 0.25-micron intervals until the density falls below the threshold value. Each such string of density values is followed by the comparator-densitometer position corresponding to the last density value and is recorded on the magnetic tape as a separate tape record. This output tape is taken directly to the computer.

The following modifications to the standard instrument were made by the manufacturer at our request; they significantly improve the usefulness of this equipment:

1. The fixed threshold concept is not entirely satisfactory because of the variation in background observed across a photographic plate. If the threshold is set high, weak lines in a region of low background will be missed. If, however, the threshold is set low and the background density in a high-background region is above the threshold value, the unit will continue to record density values at 0.25-micron intervals through the whole region. Because this results in 4000 density values per millimeter of plate travel, the amount of data in a magnetic tape record can easily exceed the capacity of the computer to process it. This problem has been largely eliminated by connecting an additional amplifier to the output of the photomultiplier. This amplifier has a comparatively long time constant, so its output is mainly a function of the background density and is only slightly influenced by the density of the occasional lines. The magnitude of the difference between the output of this long-time-constant amplifier and that of the regular short-time-constant amplifier is used to initiate the recording of the data on magnetic tape. The performance of this feature has been excellent.

2. Although the standard unit will automatically process any specified number of spectra on a single plate without any intervention, it has no provision for automatically skipping unwanted spectra. As the usual practice is to record many spectra on the plate and then pick the ones with optimum exposures, the standard unit requires more attention than is really necessary. A set of 28 switches was added so that any selected set of spectra may be recorded automatically.

3. Provision was made for automatically recording the background density at preselected intervals. This feature should make it possible to correct the observed densities for background.

4. Provision was made for skipping selected spectral regions.

The last two features were added mainly because of their possible application to other fields of interest, such as recording spectral data obtained from emission spectrometry. The skip feature would permit skipping the many lines observed in band spectra, which are not generally useful and which would greatly increase the computer time required for processing the data. This modified plate reader has been found to be quite satisfactory for digitizing the mass spectral data.

(1) H. E. Duckworth and S. N. Ghoshal, "Mass Spectrometry," C. A. McDowell, Ed., McGraw-Hill Book Co., New York, 1963, p 201.
(2) D. Desiderio and K. Biemann, Twelfth Annual Conference on Mass Spectrometry, Montreal, June 1964.
(3) W. Hargrove, Eli Lilly, Indianapolis, private communication, 1966.
(4) R. Venkataraghavan, F. W. McLafferty, and J. W. Amy, Anal. Chem., 39, 178 (1967).
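The variable-threshold modification in item 1 can be imitated in software to see why it helps. In this sketch (an assumption, not the manufacturer's circuit), an exponential moving average with a small coefficient `alpha` plays the part of the long-time-constant amplifier, and a record is written whenever the fast density signal exceeds that slowly following background by more than `threshold`. The synthetic trace, the threshold, and `alpha` are illustrative values only.

```python
import math
from typing import List, Tuple

def extract_records(density: List[float], threshold: float,
                    alpha: float = 0.001) -> List[Tuple[int, List[float]]]:
    """Split a densitometer trace into tape-like records.

    A slowly following exponential average of the density stands in for the
    long-time-constant amplifier; recording starts when the fast signal exceeds
    that background by more than `threshold` and stops when it falls back.
    Each record is (index of last sample, samples), mimicking a tape record
    that ends with the position of its last density value.
    """
    records: List[Tuple[int, List[float]]] = []
    background = density[0]
    current: List[float] = []
    for i, d in enumerate(density):
        if d - background > threshold:
            current.append(d)
        elif current:
            records.append((i - 1, current))
            current = []
        background += alpha * (d - background)   # slow response, only slightly moved by lines
    if current:
        records.append((len(density) - 1, current))
    return records

# Synthetic plate: background density rising from 5 to 30, plus two lines of height 40.
N = 20000
trace = [5.0 + 25.0 * i / N for i in range(N)]
for centre in (5000, 15000):
    for i in range(centre - 200, centre + 200):
        trace[i] += 40.0 * math.exp(-0.5 * ((i - centre) / 20.0) ** 2)

records = extract_records(trace, threshold=5.0)
print([(end, len(samples)) for end, samples in records])
```

With these values only the two lines produce records, whereas a fixed threshold of 10 density units on the raw trace would be exceeded by the background alone over most of the plate.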
The variable threshold feature is particularly useful. One unforeseen advantage is that it greatly reduces the recording of data for the unavoidable specks of dust, lint, plate scratches, marks due to plate fogging, and metastable ions. The instrument can also be operated in the fixed-threshold mode when desired.

COMPUTER PROGRAMS
Three basic goals have been considered in developing these computer programs:

1. The mass of the ions should be determined as accurately as possible.
2. The program should be economical of computer time.
3. The calculations should be quite routine so as to minimize any special handling.

Although the goals of high accuracy and minimum computer time are not entirely compatible, an acceptable compromise has been achieved. The routine nature of the calculation has been achieved by building into the program the capability of making most of the decisions about how the calculations should proceed under a wide variety of conditions.

The calculations have been divided among three separate computer programs, both because of computer memory limitations and to improve efficiency. The pertinent results from each program are stored on magnetic tape for use by the following program. Although it is possible to go through all the calculations at one time, it is often desirable to inspect the results from one program before proceeding to the next. It is also an advantage to be able to repeat part of the calculations without having to start at the beginning.

The first computer program, named HIRES1, determines the number of lines in each block of data and then computes the exact center of each line. This program also assigns a weighting factor to each line, as described below. The second program, named HIRES2, identifies a set of lines as corresponding to a specified set of standard masses and uses this information to compute the masses of all the observed lines. The last program, named HIRES3, calculates, from a selected group of atoms, all possible empirical formulas that correspond to each of the computed masses and prepares the final output. The calculations in HIRES1, including the band deconvolutions, are similar to those described by Venkataraghavan, McLafferty, and Amy (4) but were arrived at independently. The calculations in the other programs, however, differ considerably.

[Figure 1. Profile of lines with satisfactory exposure (numbers represent nominal mass)]

DESCRIPTION OF HIRES1
A single magnetic tape record (a block of data) may contain density values for a single line or for several lines with various degrees of overlapping. It may even result from a speck of dust, a scratch, or another plate imperfection. Figures 1 and 2 show the line profiles for a few selected records from a single high-quality spectrum. The exposure for the lines shown in Figure 1 permits a clear resolution of the individual lines. Unfortunately, the over-exposed lines shown in Figure 2 merge together, making it very difficult to ascertain how many lines are represented in such records. Because of the wide range in intensities of the individual lines in a spectrum, most spectra contain lines of all degrees of exposure, from barely perceptible lines to over-exposed lines, and the line profiles are often much worse than any of those shown in Figure 2.

A typical magnetic-tape record will contain 30 to 300 density values. Satisfactory results can usually be obtained by using only part of these data: our usual practice is to use about 30 equally spaced density values per line in the record, which significantly reduces the computer time required to process the data. Records containing fewer than some minimum number of density values, usually 18, are rejected as not representing a real line. Records containing more than 8000 data points (2.0 millimeters of plate travel) are rejected as being too large to process.

VOL. 40, NO. 12, OCTOBER 1968
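The record-screening rules just described (reject records with fewer than 18 or more than 8000 values, keep about 30 values per line) can be sketched as below. Picking the nearest raw sample at evenly spaced positions is an assumed decimation scheme, and the constant and function names are hypothetical.

```python
from typing import List, Optional

MIN_POINTS = 18        # fewer values: not a real line
MAX_POINTS = 8000      # 8000 x 0.25 micron = 2.0 mm of plate travel
VALUES_PER_LINE = 30   # working density values kept per line

def screen_and_decimate(record: List[float], n_lines: int = 1) -> Optional[List[float]]:
    """Return about VALUES_PER_LINE * n_lines equally spaced values from a record,
    or None if the record is too short or too long to be processed."""
    if not MIN_POINTS <= len(record) <= MAX_POINTS:
        return None
    target = VALUES_PER_LINE * n_lines
    if len(record) <= target:
        return list(record)                      # already sparse enough
    step = (len(record) - 1) / (target - 1)      # keep the first and last samples
    return [record[round(k * step)] for k in range(target)]

print(len(screen_and_decimate(list(range(300)))))             # 30
print(len(screen_and_decimate(list(range(300)), n_lines=2)))  # 60
print(screen_and_decimate([0.0] * 10))                        # None
```

Passing `n_lines` lets the same routine supply the denser point set used once a record is found to contain more than one line.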
[Figure 2. Profile of over-exposed lines (numbers represent nominal mass)]

The program first reads a tape record and then enters an 11-point smoothing routine (5) to reduce the effect of possible random noise. It is first assumed that the record contains a single line, so only about 30 equally spaced smoothed values are computed. If these smoothed values do not show a positive slope followed by a negative slope, with both slopes greater than some preset value, the record is rejected as not representing a real line. First differences between consecutive points are computed, and the changes in these differences are used to find inflections, maxima, and minima. Each inflection and maximum implies the presence of a line.

One of the critical problems, however, is deciding whether the changes in the first differences represent a real line or are just due to plate noise, which may arise from various sources. The program input includes tolerances that are used in making these decisions. The optimum magnitude of these tolerances is related in some degree to the quality of the photographic plate: broad, poorly focused lines with high background and poor plate-development technique may require higher tolerances than a high-quality plate. Part of the problem of plate noise is eliminated by requiring that no two lines be closer together than some fraction of the average half-width. If more than one line is found, a new set of smoothed, equally spaced points is computed, sufficient in number to give about 30 values per line. These data are again inspected for the number of lines, and the procedure is repeated until no additional lines are found. The results from this section of the program consist of the number of lines and the approximate position and intensity of each line.

The following calculations assume that the shape of a single line can be represented by a Gaussian function and that several lines can be represented by the sum of an appropriate number of Gaussians.
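The line-counting step described above can be sketched as follows. Two substitutions are assumptions: a plain 11-point moving average stands in for the Whittaker-Robinson 11-point smoothing formula (5), whose coefficients are not given here, and only maxima are detected, whereas the program also uses inflections to flag shoulders. `slope_tol` and `min_sep` play the role of the input tolerances and the minimum line separation; all names are hypothetical.

```python
import math
from typing import List

def smooth_at(values: List[float], i: int, half: int = 5) -> float:
    """11-point moving average centred on index i (window truncated at the ends)."""
    lo, hi = max(0, i - half), min(len(values), i + half + 1)
    return sum(values[lo:hi]) / (hi - lo)

def find_line_maxima(values: List[float], n_points: int = 30,
                     slope_tol: float = 0.0, min_sep: int = 2) -> List[int]:
    """Approximate sample indices of line maxima in one record.

    Smoothed values are taken at n_points equally spaced positions; a maximum is
    accepted where the first difference changes from > slope_tol to < -slope_tol,
    and a maximum closer than min_sep grid steps to the previous one is ignored.
    """
    step = (len(values) - 1) / (n_points - 1)
    grid = [round(k * step) for k in range(n_points)]
    s = [smooth_at(values, i) for i in grid]
    diffs = [b - a for a, b in zip(s, s[1:])]          # first differences
    peaks: List[int] = []
    for k in range(1, len(diffs)):
        if diffs[k - 1] > slope_tol and diffs[k] < -slope_tol:
            if not peaks or k - peaks[-1] >= min_sep:
                peaks.append(k)
    return [grid[k] for k in peaks]

# Two synthetic lines centred at samples 90 and 210.
record = [math.exp(-0.5 * ((i - 90) / 12) ** 2) + 0.6 * math.exp(-0.5 * ((i - 210) / 12) ** 2)
          for i in range(300)]
print(find_line_maxima(record, slope_tol=1e-3))
```

The returned positions are only grid-resolution estimates; they serve as starting values for the Gaussian fits that follow.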
The program permits up to 10 lines in a record. The Gaussian function has the form:

$$Y = Y_1 \exp\left[-\frac{4\ln 2\,(X - X_0)^2}{W^2}\right] \qquad (1)$$

where:
Y = intensity at position X
Y_1 = intensity at the maximum
X_0 = position of the maximum
W = half-width (full width at half maximum)

(5) E. T. Whittaker and G. Robinson, "The Calculus of Observations," 4th ed., Blackie and Son, London, 1949, p 295.

When a block of data represents a single line, the position, intensity, and half-width can be calculated from the following equation, which is derived from Equation 1 by taking the logarithm of both sides and rearranging terms:

$$\ln Y = B_0 + B_1 X + B_2 X^2 \qquad (2)$$

where $B_2 = -4\ln 2/W^2$, $B_1 = 8\ln 2\,X_0/W^2$, and $B_0 = \ln Y_1 - 4\ln 2\,X_0^2/W^2$.
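Because Equation 2 is linear in its coefficients, the single-line case needs no iteration. A minimal NumPy sketch, assuming the full-width-at-half-maximum form of Equation 1 written above: fit a quadratic to ln Y, then recover X0 = -B1/(2 B2), W = sqrt(-4 ln 2 / B2), and ln Y1 = B0 - B1^2/(4 B2). Discarding values below 5% of the peak before taking logarithms (`min_frac`) and using an unweighted fit are assumptions of this sketch.

```python
import math
import numpy as np

FOUR_LN2 = 4.0 * math.log(2.0)

def fit_single_line(x, y, min_frac=0.05):
    """Position X0, peak height Y1 and half-width W (FWHM) of one Gaussian line,
    from a least-squares quadratic fit of ln Y against X (Equation 2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = y > min_frac * y.max()                            # drop near-zero tail values
    b2, b1, b0 = np.polyfit(x[keep], np.log(y[keep]), 2)     # highest power first
    if b2 >= 0:
        raise ValueError("profile is not peaked; not a single Gaussian line")
    x0 = -b1 / (2.0 * b2)
    w = math.sqrt(-FOUR_LN2 / b2)
    y1 = math.exp(b0 - b1 * b1 / (4.0 * b2))
    return x0, y1, w

# Synthetic line: centre 31.3 microns, peak density 0.8, half-width 6 microns.
xs = np.arange(0.0, 60.0, 0.25)
ys = 0.8 * np.exp(-FOUR_LN2 * (xs - 31.3) ** 2 / 6.0 ** 2)
print(fit_single_line(xs, ys))
```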
The coefficients of the polynomial in Equation 2 are calculated by least squares, and the required values of the position, intensity, and half-width are derived directly from these coefficients. This calculation requires only about 60% of the time that would be required to process the same data by the iteration procedure described below.

If the tape record represents density values for more than one line, a direct solution for the line positions, intensities, and half-widths is no longer possible, and it becomes necessary to employ an iteration process. Several different procedures have been tried for these calculations. The most successful is the use of damped least squares, essentially as described by Marquardt (6) but with a different procedure for choosing the damping factors. The damping factor as used in these calculations is defined as the value that is added to all the diagonal terms of the correlation matrix before solving for the changes in the line parameters. A value of 0.0 corresponds to no damping and may result in divergence rather than convergence; too large a damping factor results in slow convergence. Unfortunately, no single damping factor is satisfactory for all conditions or even for all steps of a given problem. The general approach is to find, for each iteration, the damping factor that gives nearly the minimum standard error in the fit between the computed and observed data. Tests with different damping factors have shown that the standard error near the minimum is not a very sensitive function of the damping factor. The present practice is to start with a small damping factor (usually 0.003) and then derive successive damping factors by multiplying the previous value by some number (usually 10). This procedure is continued until a minimum in the standard error as a function of the damping factor has been found. Although, as pointed out by Marquardt (6), this procedure tends to give damping factors that are larger than the optimum values, the calculations nearly always converge. Smaller damping factors may converge somewhat faster in some cases, but in other cases the calculations never arrive at satisfactory values. Reliability in obtaining convergence is of great importance in this application, because it is quite impractical to inspect the results and then repeat the calculations with a modified procedure when satisfactory results are not obtained.

The starting points for the iterations are the line centers and intensities estimated in the earlier part of the program. The initial half-width at the start of the processing of a spectrum is set by the program input. Thereafter, this value is updated by averaging the previous average with the computed half-width for each record containing only a single line, provided the standard error in the fit is below some preset value. The iteration procedure is allowed to continue until the change in the position of every line center is less than some preset percentage (usually 0.2%) of the ratio of the half-width to the intensity. Thus, the tolerance for a narrow strong line is smaller than the tolerance for a broad weak line.

(6) D. W. Marquardt, J. Soc. Ind. Appl. Math., 11, 431 (1963).

If the data to be used in the calculations were selected manually, it might be assumed that all the data were of reasonable quality. When the data are recorded automatically, however, no such assumption is possible. The computer program must be prepared to discriminate between data for real lines and data from other sources. Even data for real lines will differ widely in quality: some lines are sharp and well resolved, while others are over-exposed, broad, and poorly resolved from adjacent lines. A simple solution to this problem is to assign a weighting factor to each line. Lines with high weighting factors then receive more emphasis in the subsequent calculations, while lines with very low weighting factors receive very little emphasis. The weighting factors for each line are based on the following equation:

(3)

where:
WF = weighting factor
W_0 = average half-width for a sharp line
W = half-width in microns
SE = standard error in the Gaussian fit to the data
D = distance between the line and its nearest neighbor
a_0, a_1 = input constants (normally 5)
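The damped least-squares iteration described above for records containing several lines can be sketched in NumPy as below. It follows Marquardt's practice of scaling the normal-equation matrix to unit diagonal (correlation form) before adding the damping factor, and it uses the damping-factor search given in the text (start at 0.003, multiply by 10 until the standard error stops decreasing) together with the 0.2% convergence test on the line centers. Fitting a separate half-width to every line, the analytic Jacobian, and all function names are assumptions of this sketch, not details taken from HIRES1.

```python
import math
import numpy as np

C = 4.0 * math.log(2.0)   # Gaussian written with W = full width at half maximum

def model(x, p):
    """Sum of Gaussians; p holds (X0, Y1, W) for each line, flattened."""
    y = np.zeros_like(x)
    for x0, y1, w in p.reshape(-1, 3):
        y += y1 * np.exp(-C * (x - x0) ** 2 / w ** 2)
    return y

def jacobian(x, p):
    cols = []
    for x0, y1, w in p.reshape(-1, 3):
        e = np.exp(-C * (x - x0) ** 2 / w ** 2)
        cols += [y1 * e * 2 * C * (x - x0) / w ** 2,        # dY/dX0
                 e,                                        # dY/dY1
                 y1 * e * 2 * C * (x - x0) ** 2 / w ** 3]   # dY/dW
    return np.column_stack(cols)

def std_error(x, y, p):
    r = y - model(x, p)
    return math.sqrt(float(r @ r) / max(len(x) - len(p), 1))

def damped_step(x, y, p, lam):
    """One damped least-squares step: lam is added to the diagonal of the
    normal-equation matrix after scaling it to unit diagonal."""
    J = jacobian(x, p)
    A, g = J.T @ J, J.T @ (y - model(x, p))
    d = np.sqrt(np.diag(A))
    delta = np.linalg.solve(A / np.outer(d, d) + lam * np.eye(len(p)), g / d) / d
    return p + delta

def fit_lines(x, y, p0, lam0=0.003, factor=10.0, tol_pct=0.2, max_iter=10):
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        # Try lam0, lam0*factor, ... and keep the step with the smallest standard error.
        lam, best_p, best_se = lam0, None, math.inf
        for _ in range(12):
            trial = damped_step(x, y, p, lam)
            se = std_error(x, y, trial)
            if not math.isfinite(se) or se >= best_se:
                break
            best_p, best_se, lam = trial, se, lam * factor
        if best_p is None:
            break
        shift = np.abs(best_p[0::3] - p[0::3])               # change in line centres
        tol = tol_pct / 100.0 * best_p[2::3] / best_p[1::3]   # 0.2% of half-width/intensity
        p = best_p
        if np.all(shift < tol):
            break
    return p.reshape(-1, 3)

# Two overlapped synthetic lines and rough starting estimates.
xs = np.arange(0.0, 40.0, 0.25)
true = np.array([15.0, 1.0, 4.0, 21.0, 0.6, 4.0])
ys = model(xs, true)
fitted = fit_lines(xs, ys, [14.0, 0.9, 5.0, 22.0, 0.5, 5.0])
print(np.round(fitted, 4))
```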
Although the exact form of Equation 3 may not be optimal, it does assign a higher weight to an isolated, narrow, well-shaped line and a much lower weight to a broad, poorly shaped, badly overlapped line.

All calculations to this point are expressed in terms of density. However, the area under a line has been found to be a much more sensitive measure of line intensity than the maximum density. The area, A, is derived from the maximum density, D, and the line half-width, W, by the following relation:

$$A = 1.064\,D\,W \qquad (4)$$
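The constant 1.064 in Equation 4 is sqrt(pi / (4 ln 2)), the area of a Gaussian with unit peak height and unit full width at half maximum, which is consistent with reading W as the full width at half maximum. A quick Python check comparing the closed form with a brute-force sum (function names are illustrative):

```python
import math

# For Y = D * exp(-4 ln2 * x**2 / W**2), the integral over x is D * W * sqrt(pi / (4 ln 2)).
K = math.sqrt(math.pi / (4.0 * math.log(2.0)))
print(f"{K:.4f}")   # 1.0645

def line_area(d_max, w):
    """Area of a Gaussian line from its peak density and half-width (FWHM)."""
    return K * d_max * w

def numeric_area(d_max, w, dx=0.001):
    """Brute-force check: Riemann sum of the same Gaussian over +/- 5 half-widths."""
    c = 4.0 * math.log(2.0)
    n = int(5 * w / dx)
    return dx * sum(d_max * math.exp(-c * (k * dx) ** 2 / w ** 2) for k in range(-n, n + 1))

print(line_area(0.7, 2.0), numeric_area(0.7, 2.0))
```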
Limited experience has shown that a straight-line relation exists between the logarithm of the ion abundance, I, and the area of the line, as shown by the following equation:

$$\log I = b_0 + b_1 A \qquad (5)$$

where b_0 and b_1 are empirical constants.

The program has provision for reporting intensity in terms of density, area, or relative abundance. The position, intensity, half-width, and weighting factor for each line are stored until all the data for a spectrum have been processed and are then written on magnetic tape for use by HIRES2.

DESCRIPTION OF HIRES2

The principal purpose of this program is to assign precise mass values to each of the lines found by HIRES1. The computer method described by Desiderio and Biemann (2) calculates the masses corresponding to the lines between two adjacent lines of the calibration compound by interpolation, assuming that the square root of the mass is a linear function of the distance. The first two standard lines are identified manually, and the next line corresponding to a standard mass is then found by linear extrapolation. Data for this line and the previous standard line are then used to find the next standard line, and so on. The accuracy of the mass determination thus depends on the validity of the linear interpolation between two lines and on the accuracy of the measurement of the position of each line relative to these two adjacent standard lines.

This computer program follows a somewhat different approach. It is assumed that the relation between the mass and the distance can be defined by the following equation:

$$M^p = a_0 + a_1 D + a_2 D^2 + a_3 D^3 + \cdots + a_n D^n \qquad (6)$$

where:
M = the mass of any ion in the spectrum
p = a constant, nominally 0.5
D = the distance of the line corresponding to the mass M from some arbitrary zero
a_0, a_1, a_2, etc. = constants uniquely determined for each spectrum
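A weighted fit of Equation 6 takes only a few lines with NumPy. In this sketch, `numpy.polynomial.Polynomial.fit` (which rescales D to [-1, 1] internally to reduce round-off) is used in place of the Forsythe orthogonal-polynomial technique that the program uses for its final fit, and the weights passed to NumPy are the square roots of the HIRES1 weighting factors because NumPy applies `w` to the unsquared residuals. The plate law used to make test data and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

def fit_mass_scale(distances, masses, weights, n, p=0.5):
    """Weighted least-squares fit of M**p as a degree-n polynomial in D (Equation 6)."""
    d = np.asarray(distances, dtype=float)
    y = np.asarray(masses, dtype=float) ** p
    # numpy multiplies the unsquared residuals by w, so w = sqrt(WF)
    # weights the squared residuals by WF.
    w = np.sqrt(np.asarray(weights, dtype=float))
    return Polynomial.fit(d, y, n, w=w)

def masses_from_scale(poly, distances, p=0.5):
    """Convert plate distances to masses with a fitted Equation 6 polynomial."""
    return np.asarray(poly(np.asarray(distances, dtype=float))) ** (1.0 / p)

# Synthetic plate law (hypothetical): sqrt(M) = 3 + 0.05 D + 1e-5 D**2, D in mm.
std_d = np.linspace(100.0, 300.0, 12)
std_m = (3.0 + 0.05 * std_d + 1e-5 * std_d ** 2) ** 2
poly = fit_mass_scale(std_d, std_m, weights=np.ones_like(std_d), n=3)
print(poly.convert().coef)          # coefficients a0..a3 in the original D units
print(masses_from_scale(poly, [150.0, 250.0]))
```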
This program also depends on the addition of a calibration compound or on the prior identification of known lines in the sample spectrum. A set of up to 200 standard masses may be specified, preferably chosen at roughly evenly spaced intervals throughout the spectrum. The input to the program includes the positions of the lines corresponding to any two masses in the first part of the spectrum, together with a tolerance on the specification of these two positions. This information is used to calculate the ranges within which the lines corresponding to the first four standard masses expected in the spectrum should be found. The program starts with one line in each of the first two of these ranges and uses an extrapolation procedure based on Equation 6 to identify the lines corresponding to each of the subsequent standard masses. The value of n in Equation 6 is first set equal to 1 and is then gradually increased to 3 or 4 as more and more lines are identified. The values of the coefficients are recomputed by weighted least squares after the identification of each additional line, with data for all previously identified lines included in the calculations. The weighting factors used are the values computed in HIRES1. The tolerance permitted between the computed and true values of M^p is calculated from the slope of the function and the length of the extrapolation from the last identified line.
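The extrapolation procedure described above can be sketched as follows. The text states only that the tolerance depends on the slope of Equation 6 and on the length of the extrapolation; the specific window used here (`rel_tol` times the extrapolation length plus `min_tol_mm`, converted to M^p units by the slope) and the schedule for raising n are hypothetical choices. The trial-and-error choice of starting lines is omitted: `start` supplies the first two identifications directly.

```python
import numpy as np
from numpy.polynomial import Polynomial

def identify_standards(line_pos, line_wt, std_masses, start, rel_tol=0.05,
                       min_tol_mm=0.05, max_degree=3, p=0.5):
    """Assign observed lines to standard masses by successive extrapolation.

    line_pos, line_wt: positions (mm) and weights of all observed lines.
    std_masses: standard masses in increasing order.
    start: {standard index: line index} for the two located starting standards.
    Returns {standard index: line index}; standards with no acceptable line are skipped.
    """
    pos = np.asarray(line_pos, dtype=float)
    wt = np.asarray(line_wt, dtype=float)
    ident = dict(start)
    for j in range(max(ident) + 1, len(std_masses)):
        stds = sorted(ident)
        lines = [ident[k] for k in stds]
        y = np.asarray([std_masses[k] for k in stds]) ** p
        deg = min(max_degree, 1 + (len(stds) - 2) // 3)    # hypothetical: n grows with data
        poly = Polynomial.fit(pos[lines], y, deg, w=np.sqrt(wt[lines]))
        d_last = pos[lines].max()
        slope = poly.deriv()(d_last)
        target = std_masses[j] ** p
        extrap_mm = (target - poly(d_last)) / slope             # linearised extrapolation length
        tol = slope * (rel_tol * abs(extrap_mm) + min_tol_mm)   # window expressed in M**p
        resid = np.abs(poly(pos) - target)
        resid[pos <= d_last] = np.inf                           # search beyond the last line only
        best = int(np.argmin(resid))
        if resid[best] <= tol:
            ident[j] = best
    return ident

def plate_pos(m):
    """Invert the synthetic law sqrt(M) = 3 + 0.05 D + 1e-5 D**2 for D (mm)."""
    a, b, c = 1e-5, 0.05, 3.0 - m ** 0.5
    return (-b + (b * b - 4 * a * c) ** 0.5) / (2 * a)

standards = [69.0, 100.0, 119.0, 131.0, 150.0, 169.0, 181.0, 200.0,
             219.0, 231.0, 250.0, 262.0, 281.0]
observed = sorted([m for m in standards if m != 150.0]            # 150 not observed
                  + [77.04, 91.05, 105.07, 128.06, 155.0, 210.5])  # sample lines
positions = [plate_pos(m) for m in observed]
start = {0: observed.index(69.0), 1: observed.index(100.0)}
ident = identify_standards(positions, [1.0] * len(observed), standards, start)
print({standards[k]: observed[i] for k, i in ident.items()})
```

In this example the missing standard at mass 150 is skipped rather than matched to the nearby sample line at 155, because the predicted M^p falls well outside the window.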
[Figure 3. Computer output for HIRES3]
There are two complications to the procedure outlined above. First, lines corresponding to one or more of the standard masses may not be observed. This may be due to a low-abundance mass combined with a short exposure, to the obscuring of the line by some very strong adjacent line, or to other reasons. Second, it is not always possible to identify the two starting lines positively; instead, there may be several possible lines within the calculated ranges for the first four standard masses. Because of these problems, a trial-and-error procedure is used for identifying the first 10 standard masses. One line from each of the computed ranges for the first two masses is used as a starting point, and an attempt is made to find lines corresponding to the next 8 masses. If this fails, other combinations of lines are used as starting points. The first combination that permits the identification of some preset number (usually 5 to 8) of these masses is considered to be correct. If no such combination can be found, it is assumed that one of the first standard masses is absent, and either the third or the fourth mass is tried in its place. The only basic requirement is that the first and second standard masses may not both be absent and, if one of these is absent, the third and fourth masses may not both be absent. If the required number of the first 10 masses can be identified, the calculations proceed from mass to mass, skipping masses for which the corresponding line cannot be identified.
When all possible standard masses have been identified, a more precise set of the coefficients for Equation 6 is computed using weighted least-squares and a higher value for n (up to 12). Double-precision arithmetic is used for these final calculations in order to reduce errors due to computer round-off which can become serious for such high-order polynomials. Orthogonal polynomial curve-fitting techniques described by Forsythe (7) are used to further minimize errors due to computer round-off. All the above identifications are based on a series of extrapolations from previously identified lines. Such extrapolations are always uncertain and will result in some errors. The program has provision for using the computed coefficients in the power series as a basis for a new identification of the standard masses by interpolation. This procedure usually results in deleting some lines which were incorrectly identified in the previous calculations and also results in some new identifications which were missed previously. There is provision for an automatic reidentification of the standard masses whenever the standard error in the fit is above some specified tolerance. If desired, several such passes may be made with a reduced tolerance for identifying the standard masses with each pass. The coefficients from the last calculation are then used to compute the mass for each recorded line and these values are printed out in tabular form. The program also has provision for deleting a specified list of masses which usually includes the masses due to the calibration standard and any other values such as argon, mercury, etc. which are not part of the sample spectrum. There is provision for preparing a composite spectrum using data obtained from several different exposures of the same sample in order to combine data for both weak and strong lines into a single spectrum. 
All the intensities of the new spectrum being added to the composite are first scaled so as to conform, on the average, to the intensities of the same masses already in the composite. The weighting factors computed in HIRES1 are again used as the basis for all averaging of equivalent masses. The results from HIRES2 are written on magnetic tape for use by HIRES3.

DESCRIPTION OF HIRES3

The purpose of this program is first to calculate all possible empirical formulas corresponding to each computed mass and then to prepare the final output. The procedure used in calculating the empirical formulas is based on the program described by Desiderio and Biemann (2). It subtracts the sum of the masses of the atoms being considered from the observed mass and then tests whether the remainder corresponds to an integral number of hydrogen atoms. The calculations continue for each mass until all possible combinations of atoms that agree with the observed mass within some prescribed tolerance have been found. Limits to the number of each kind of atom permitted are based on the procedure described by Tunnicliff, Wadsworth, and Schissler (8). The kinds of atoms to be considered and the constants in the relation defining the limits on the number of each kind of atom are specified in the program input. A maximum of 20 different atoms (each isotope is considered a different kind of atom) may be specified in the program input, and up to 12 atoms from this group may be used for a particular problem. These may be specified with the input for the problem or may be determined by the program. In the latter case, all possible empirical formulas are computed for the lower masses using all possible kinds of atoms; all atoms involved in these empirical formulas, together with any atoms specified with the input, are then used in the further calculations.

(7) G. E. Forsythe, J. Soc. Ind. Appl. Math., 5, 75 (1957).
(8) D. D. Tunnicliff, P. A. Wadsworth, and D. O. Schissler, Anal. Chem., 37, 543 (1965).

The output for this program consists of a table of all possible empirical formulas corresponding to the computed mass of each line and another table that is a modified form of the "element map" described by Desiderio and Biemann (2). This element map provides columns for 12 different sets of heteroatoms and a final column containing any overflow.

Ions containing such terminal groups as -C4H0, -OH, -NH2, etc. are usually accompanied by a corresponding ion that has lost this group. A list of such terminal groups is included with the input. The program checks all ions for the presence of another ion that corresponds to the loss of any of these terminal groups, and terminal groups that may have been lost from a particular ion are listed with the empirical formula of that ion. It must be pointed out, however, that the presence of two ions differing by the mass of the terminal group does not prove that the lighter ion was formed in this manner.

Figure 3 shows part of the computer output of HIRES3 for the compound ClC4H8OC4H8Cl. The mass range recorded from the photographic plate by the comparator-densitometer was from mass 18 to about mass 400. The nominal resolution of this spectrum is about 1/2[…]. The empirical formula of each ion is listed first. Immediately below this formula is the relative intensity of the ion; the most intense ion in the spectrum is assigned the value 1.00000. The first number below the intensity value is the error, in millimass units, between the observed mass and the theoretical mass of the assigned composition. The second number is the weighting factor determined by HIRES1. The nominal mass of the ion is listed along the left edge of the output.
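The formula search and the terminal-group check described for HIRES3 can be sketched in Python as below. Heavy atoms are enumerated and hydrogen is solved for, as in the Desiderio-Biemann procedure; the simple per-element caps in `limits` are an assumed stand-in for the limit relation of reference (8), and the atom set, the 3-millimass tolerance, and all function names are illustrative.

```python
from itertools import product

# Monoisotopic atomic masses (12C = 12 exactly); each isotope is its own "atom".
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462,
        "Cl": 34.96885268, "Cl(37)": 36.96590259}

def formulas_for_mass(mass, max_counts, tol=0.003):
    """All compositions within tol (u) of mass. Heavy atoms are enumerated; the
    remainder must correspond to an integral number of hydrogen atoms."""
    heavy = [a for a in max_counts if a != "H"]
    hits = []
    for counts in product(*(range(max_counts[a] + 1) for a in heavy)):
        rest = mass - sum(n * MASS[a] for a, n in zip(heavy, counts))
        n_h = round(rest / MASS["H"])
        if n_h < 0 or n_h > max_counts.get("H", n_h):
            continue
        error = rest - n_h * MASS["H"]
        if abs(error) <= tol:
            comp = {a: n for a, n in zip(heavy, counts) if n}
            if n_h:
                comp["H"] = n_h
            hits.append((comp, error))
    return hits

def fmt(comp):
    """Formula string with C and H first, e.g. {'C': 4, 'H': 8, 'Cl': 1} -> 'C4H8Cl'."""
    order = ["C", "H"] + sorted(a for a in comp if a not in ("C", "H"))
    return "".join(a + (str(comp[a]) if comp[a] > 1 else "") for a in order if a in comp)

def terminal_losses(ions, groups):
    """For each ion, the terminal groups whose loss yields another observed ion."""
    observed = {tuple(sorted(c.items())) for c in ions}
    found = {}
    for comp in ions:
        for name, group in groups.items():
            rest = dict(comp)
            for a, n in group.items():
                rest[a] = rest.get(a, 0) - n
            if any(n < 0 for n in rest.values()):
                continue
            rest = {a: n for a, n in rest.items() if n}
            if rest and tuple(sorted(rest.items())) in observed:
                found.setdefault(fmt(comp), []).append(name)   # listed with the heavier ion
    return found

limits = {"C": 10, "H": 30, "N": 2, "O": 3, "Cl": 2}   # hypothetical atom limits
for m in (91.03145, 91.05478):
    print(m, [(fmt(c), round(e * 1000, 2)) for c, e in formulas_for_mass(m, limits)])

ions = [{"C": 4, "H": 8, "Cl": 1}, {"C": 4, "H": 7}]
print(terminal_losses(ions, {"HCl": {"H": 1, "Cl": 1}, "CH3": {"C": 1, "H": 3}}))
```

As the text cautions, a match such as C4H8Cl and C4H7 differing by HCl only shows that the pair exists, not that the lighter ion was formed by that loss.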
In this output, if more than one elemental composition had been found for an ion, all would have been listed and an asterisk placed after the nominal mass. Tabulated in the column labeled "Fragment" are those fragments for which pairs of ions differing by the fragment have been detected; each fragment is listed opposite the heavier ion of the pair. The table headings are assembled by the computer on the basis of the determined ion compositions.

The ultimate output for this program would be the structural formula of the molecule. As the laws relating the mass spectrum of a molecule to its structure are not fully understood at this time, this ideal goal is not yet attainable. It will be desirable, however, to expand the output of this program further as more useful concepts relating the mass spectrum to structure are formulated.

DISCUSSION
The program HIRES1 seldom requires any change in the input unless the spectrum being processed is of particularly low quality. HIRES2 requires a little more attention because the plate dispersion is usually varied according to the expected mass of the ions to be observed for a given sample. However, such changes usually require only a change in the specification of the mass and position of the two identified lines. They do not normally require any change in the list of standard masses unless a different calibration compound is used. The program takes the first standard mass to be the first mass in the list at or above the first known mass specified in the input. However, if no lines are observed in this region, the next higher standard mass is used as the starting point. If not enough of the first 10 standard masses can be identified, the next higher standard mass is again chosen as the starting point and a new attempt is made.
VOL. 40, NO. 12, OCTOBER 1968
The positions of the two known lines are usually most conveniently determined when the photographic plate is initially oriented on the plate reader. At other times, an inspection of the output from HIRES1 may be required to identify two known lines. It is occasionally necessary to repeat the HIRES2 calculations if too many of the lines corresponding to the standard masses are missing because of poor plate exposure. In such cases it may be necessary to change some of the standard masses, or perhaps to reduce the number of the first 10 masses which must be identified. In relatively simple spectra, the positions of the two known lines may be in error by several millimeters and still permit proper identification of the standard masses. However, a few cases have been noted in which the computed mass scale was off by exactly 1.0 mass unit. This can be avoided by specifying these two lines more precisely or by choosing standard masses which are more distinctive. Adding xenon to the sample to provide some additional known lines has been found to give good results. The multiply-charged xenon ions are quite distinctive, and they also fill the gap between mass 31 and 50, where PFK has no useful lines.
The iteration procedure for finding the centers of overlapped lines is quite satisfactory for properly exposed lines. Figure 4 shows the change in the computed positions of the line centers at each step of the iteration for the record corresponding to nominal mass 82 in Figure 1. Usually 4 to 7 iterations are sufficient; the maximum number of iterations permitted is usually 10.
The results obtained from these three programs depend on the quality of the spectrum. Broad, over-exposed lines with superimposed plate noise present a difficult problem. There is always the danger of finding more or fewer lines than are actually present.
Errors in estimating the actual number of lines will then affect the computed centers of the lines. Such errors are particularly objectionable for lines corresponding to one of the standard masses. However, the impact of such lines on the computed masses of other lines is minimized by fitting all the data simultaneously to a single continuous function by weighted least-squares techniques. The very small weighting factors usually associated with a poor line are particularly valuable in minimizing the impact of a poor standard line on the computed results for nearby unknown lines. The continuous function also avoids the errors inherent in the usual linear interpolation. An additional advantage of this approach is that it gives satisfactory results even when lines corresponding to some of the standard masses are not observed. Experience has shown that a weighted standard error of 0.3 to 0.8 millimass units (1 to 3 ppm) in the fit of the standard masses to the spectrum is usually obtainable for a good spectrum with a dispersion which gives a maximum mass of about 300. The weighted standard error for a mass range up to mass 600 is usually 0.5 to 1.5 millimass units (1 to 3 ppm). This corresponds to a relative accuracy in the line measurements of 0.2 to 0.5 micron, independent of the dispersion. These weighted standard errors are computed using the weighting factors derived in HIRES1. A comparison of the known and calculated masses of known ions, other than the standard masses, shows an accuracy comparable to the above values. Limited experience has indicated that the ion intensity calculated from Equation 5 is rather erratic, as can be seen in Figure 3. Some improvement in this relation is desirable.
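The calibration strategy discussed above can be sketched in Python. First, extrapolate from two identified lines to locate the standard masses. Then fit all located standards at once by weighted least squares. This is a hedged illustration rather than HIRES2 itself. The relation M**p = a + b*x, with p = 0.5 (the square-root law), stands in for Equation 4. The calibration curve is an ordinary cubic in rescaled plate position. The weights are supplied by the caller rather than derived as in HIRES1. The standard-error definition (weights normalized to a mean of 1, with n - k degrees of freedom) is an assumption. All function names and the synthetic plate data are hypothetical.

```python
import math


def two_line_scale(known, p=0.5):
    """(a, b) such that M**p = a + b*x passes through two identified lines."""
    (m1, x1), (m2, x2) = known
    b = (m2 ** p - m1 ** p) / (x2 - x1)
    return m1 ** p - b * x1, b


def identify_standards(standards, lines, known, tol_mm, p=0.5):
    """Map each standard mass to the nearest observed line within tol_mm.

    Standards with no line near their extrapolated position are skipped,
    so missing standard lines do not stop the calibration.
    """
    a, b = two_line_scale(known, p)
    matches = {}
    for m in standards:
        predicted = (m ** p - a) / b
        nearest = min(lines, key=lambda x: abs(x - predicted))
        if abs(nearest - predicted) <= tol_mm:
            matches[m] = nearest
    return matches


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [v] for row, v in zip(matrix, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    coef = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (aug[r][n] - tail) / aug[r][r]
    return coef


def weighted_fit(positions, masses, weights, degree=3, p=0.5):
    """Weighted least-squares fit of M**p as one polynomial in plate position.

    Positions are rescaled to [-1, 1] to keep the normal equations well conditioned.
    Returns a function mapping a plate position to a mass.
    """
    lo, hi = min(positions), max(positions)
    mid, half = (hi + lo) / 2, (hi - lo) / 2
    k = degree + 1
    ata = [[0.0] * k for _ in range(k)]
    aty = [0.0] * k
    for x, m, w in zip(positions, masses, weights):
        t = [((x - mid) / half) ** j for j in range(k)]
        for i in range(k):
            aty[i] += w * t[i] * m ** p
            for j in range(k):
                ata[i][j] += w * t[i] * t[j]
    coef = solve(ata, aty)
    return lambda x: sum(c * ((x - mid) / half) ** j for j, c in enumerate(coef)) ** (1 / p)


def weighted_standard_error_mmu(positions, masses, weights, mass_at, n_params):
    """Weighted standard error of the fit in millimass units (assumed definition)."""
    n = len(masses)
    ss = sum(w * (1000 * (mass_at(x) - m)) ** 2
             for x, m, w in zip(positions, masses, weights))
    return math.sqrt(ss * n / (sum(weights) * (n - n_params)))


def true_root(x):
    """Synthetic plate: sqrt(M) is slightly curved in position x (mm)."""
    return 5 + 0.05 * x + 1e-6 * x * x


grid = [20, 45, 70, 95, 120, 150, 180, 210]
standards = [true_root(x) ** 2 for x in grid]
lines = [x for x in grid if x != 95] + [160]  # one standard missing, one extra line
known = ((standards[0], 20), (standards[-1], 210))

matches = identify_standards(standards, lines, known, tol_mm=0.5)
pos, ms = list(matches.values()), list(matches.keys())
weights = [1.0] * (len(ms) - 1) + [0.2]  # e.g. a poorly exposed last line
mass_at = weighted_fit(pos, ms, weights)
```

In this synthetic run, the standard whose line is absent is skipped rather than halting the calibration. The single continuous fit still recovers that standard's mass at its plate position. A `tol_mm` that is too generous is one way an extrapolated scale could lock onto the wrong neighboring line.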
ANALYTICAL CHEMISTRY
Figure 4. Change in computed line center for each step of iteration, nominal mass 82, Figure 1
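The chord-slope test for choosing P, discussed in the paragraph that follows, can be sketched in Python. The program described there plots the derivative of its fitted power series. This sketch instead uses the slopes of the chords between consecutive standards, which is the quantity examined by Hargrove and by Venkataraghavan et al. It scores each candidate P by the relative spread of those slopes. The PFK reference masses are standard values, but the plate positions are synthetic and the function names are hypothetical.

```python
import statistics


def chord_slope_spread(masses, positions, p):
    """Relative spread (std/mean) of chord slopes d(M**p)/dx between neighboring standards."""
    pts = sorted(zip(positions, masses))
    slopes = [(m2 ** p - m1 ** p) / (x2 - x1) for (x1, m1), (x2, m2) in zip(pts, pts[1:])]
    return statistics.pstdev(slopes) / abs(statistics.fmean(slopes))


def best_exponent(masses, positions, candidates):
    """Candidate P whose chord slopes are most nearly constant."""
    return min(candidates, key=lambda p: chord_slope_spread(masses, positions, p))


# PFK reference masses (CF3, C2F4, C2F5, C3F5, C3F7, C4F7, C4F9, C5F9).
pfk = [68.99521, 99.99361, 118.99202, 130.99202, 168.98882, 180.98882, 218.98563, 230.98563]
# Synthetic plate positions (mm), constructed to be exactly linear in M**0.49.
positions = [(m ** 0.49 - 3.0) / 0.04 for m in pfk]
candidates = [round(0.45 + 0.005 * i, 3) for i in range(21)]  # 0.450 ... 0.550
print(best_exponent(pfk, positions, candidates))
```

On this constructed plate the lowest spread falls at P = 0.49. With real line positions, the minimum would be broader and affected by measurement noise.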
As stated previously, it is commonly assumed that the square root of the mass is a linear function of the position of the observed lines. However, Hargrove (3) and Venkataraghavan, McLafferty, and Amy (4) have found that the slope of the chords connecting consecutive standard masses is not constant, as this relation would require. Calculations for our instrument have shown that a more nearly linear relation can be obtained if the value of P in Equation 4 is about 0.49. This smaller value does seem to help in identifying the standard masses by the extrapolation procedure. The program has provision for printing and plotting the derivative of the power series as a function of distance for any value of P. The value of P which gives the most nearly constant derivative is used for the subsequent calculations.
PROGRAM LANGUAGE AND CALCULATION TIME
The three programs described above are written mostly in FORTRAN IV; however, several short machine-language programs are also required. Although these programs were originally written for use on an IBM 7040, versions suitable for use on a UNIVAC 1108 have been developed. Either version of the programs may be obtained by contacting the authors. The computing times listed below for various operations are for an IBM 7040. An IBM 7094 would be expected to be 4 to 5 times faster, and a UNIVAC 1108 is about 15 to 18 times faster. The program HIRES1 requires an average of about 0.6 second for the complete processing of a record containing data for a single line. A typical record containing data for two lines requires about 5 seconds. Records for 5 to 6 lines may require from 50 to 150 seconds. A spectrum containing 200 lines will usually require about 3 to 7 minutes. The program HIRES2 usually requires 10 to 60 seconds to identify the standard masses and compute the masses of all the lines. The computing time required by HIRES3 depends on the number of atoms to be considered and on the size of the constants which define the maximum number of each atom permitted in a given formula. About 40 to 60 seconds is typical for computing all possible combinations of C, C13, H, O, and N for a spectrum containing 200 lines when one C13 and 2 to 4 O and N atoms are permitted in each ion.
ACKNOWLEDGMENT
The authors acknowledge the valuable advice of A. C. Jones and J. H. Schachtschneider on some of the mathematical aspects of these programs.
RECEIVED for review June 10, 1968. Accepted July 19, 1968.
Automatic Apparatus for Sampling and Preparing Gases for Mass Spectral Analysis in Studies of Carbon Isotope Fractionation during Methane Metabolism
Melvin P. Silverman and Vance I. Oyama
Exobiology Division, Ames Research Center, NASA, Moffett Field, Calif. 94035
An automatic apparatus is described which samples the gas phase above a microbial methane-metabolizing system every two hours, separates and measures the individual gas components (H2, O2, N2, CO2, CH4) by dual-column gas chromatography, combusts methane quantitatively to CO2, and collects separately the metabolic CO2 and the CO2 derived from methane combustion for subsequent mass spectral analysis of carbon isotope ratios. With this apparatus, it was found that the aerobic methane-oxidizing bacterium Methanomonas methanooxidans preferentially utilizes the lighter isotope of carbon, thus leaving the residual methane enriched in 13C. The microflora in Bower's clay soil also preferentially utilize 12C during anaerobic methane production from hydrogen and carbon dioxide; the residual carbon dioxide becomes enriched in 13C, and the biogenic methane becomes progressively more enriched in 13C.
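The inference in this abstract follows from the standard Rayleigh fractionation model for a closed pool: preferential consumption of 12C leaves the residual substrate enriched in 13C. The Python sketch below assumes a constant kinetic fractionation factor alpha = k(13C)/k(12C) < 1. It treats the fraction of substrate remaining as the fraction of 12C remaining, a good approximation at natural 13C abundance. The starting delta-13C of -40 per mil and alpha = 0.98 are illustrative values only, not results from this work.

```python
def residual_delta13c(delta0_permil, fraction_remaining, alpha):
    """Rayleigh model: delta-13C (per mil) of the unreacted substrate pool.

    alpha = k(13C) / k(12C); alpha < 1 means 12C reacts preferentially.
    fraction_remaining is the fraction of the original substrate still present.
    """
    if not 0.0 < fraction_remaining <= 1.0:
        raise ValueError("fraction_remaining must be in (0, 1]")
    ratio_factor = fraction_remaining ** (alpha - 1.0)  # R / R0 for the residual pool
    return (delta0_permil + 1000.0) * ratio_factor - 1000.0


# Illustrative values only: residual methane as it is progressively consumed.
for f in (1.0, 0.75, 0.5, 0.25):
    print(f, round(residual_delta13c(-40.0, f, 0.98), 1))
```

Because alpha - 1 is negative, the factor f**(alpha - 1) grows as f falls. The delta-13C of the remaining pool therefore rises steadily, which is the direction of change reported for the residual methane.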
FRACTIONATION of the stable isotopes of carbon by biological systems has been well established (1-5), and it may be possible to exploit this phenomenon as an indicator of life processes. Microorganisms that metabolize methane are widely distributed on Earth; they either oxidize it to CO2 in the presence of oxygen or produce it anaerobically from CO2 or other organic compounds. Methane-producing bacteria are responsible for some of the highest carbon isotope fractionations known (6, 7), but no such information is available for the methane-oxidizing bacteria. The possibility that certain extraterrestrial bodies may harbor similar microorganisms led us to study carbon isotope fractionation during methane metabolism, in order to evaluate this process as the basis for the design of a life detection tool. Standard methods of preparing samples for mass spectrometric analysis of 13C/12C ratios demand considerable attention to detail (1, 8). The samples must be converted completely to CO2, freed of water, and transferred to a suitable vessel before they enter the mass spectrometer. These procedures require leak-proof combustion and vacuum systems, with their attendant problems. They also involve time-consuming and tedious manual operations and limit the number of samples that can be processed in a reasonable time. For our studies of carbon isotope fractionation during microbial methane metabolism, it was necessary to develop an automatic apparatus that would periodically sample the gas phase above a methane-metabolizing culture, separate and measure the individual gas components (H2, O2, N2, CO2, CH4), combust methane quantitatively to CO2, and collect separately the metabolic CO2 and the CO2 derived from methane combustion for subsequent mass spectrometric analysis. This paper describes such an apparatus, capable of processing a gas sample every 2 hours. Preliminary results are given for experiments on carbon isotope fractionation by an aerobic methane-oxidizing system and an anaerobic methane-producing system.
(1) H. Craig, Geochim. Cosmochim. Acta, 3, 53 (1953).
(2) R. Park and S. Epstein, ibid., 21, 110 (1960).
(3) P. H. Abelson and T. C. Hoering, Proc. Nat. Acad. Sci. U.S., 47, 623 (1961).
(4) P. L. Parker, Geochim. Cosmochim. Acta, 28, 1155 (1964).
(5) E. S. Cheney and M. L. Jensen, ibid., 29, 1331 (1965).
(6) W. D. Rosenfeld and S. R. Silverman, Science, 130, 1658 (1959).
(7) N. Nakai, "Geochemical Studies on the Formation of Natural Gas," Ph.D. diss., Nagoya Univ., Nagoya, Japan (1961); cited in (5).
(8) N. Nakai, J. Earth Sci., Nagoya Univ., 8, 174 (1960).
EXPERIMENTAL
Automatic Apparatus. The apparatus is essentially a dual-column gas chromatograph connected in series with a combustion system and collection tubes. It is mobile and self-contained except for a source of ac power. Table I lists the construction features and operational modes that were found to provide satisfactory performance. A flow diagram of the apparatus is shown in Figure 1, and the programmed sequence of operations and events is illustrated in Figure 2. The water-jacketed fermentation unit, fitted with bacteriological filters at the gas inlets and outlets, is sterilized by autoclaving. Sterile medium is added to the main body of the fermentor and inoculum to the side arm. With stopcock 1 open, the unit is evacuated and then filled with the desired gas mixture. The inoculum is added at zero time through stopcock 2, and stopcocks 1 and 2 are closed. The culture is stirred with a Teflon-coated magnetic stirring bar.
The temperature of the culture is controlled by pumping water from a water bath through the water jacket of the fermentor. Gas enclosed in the fermentation unit is circulated through the sample loop by a diaphragm pump to provide thorough mixing of the gases. At the time of sampling, the gases in the sample loop are injected into the helium carrier gas and pass through the Porapak T column. There, a composite peak (H2, O2, N2, CH4) is separated from a subsequent CO2 peak. Both peaks are sensed by a detector, and the composite peak passes through valve V1 into the molecular sieve column.