Anal. Chem. 2003, 75, 1470-1482
Reduction of Chemical Formulas from the Isotopic Peak Distributions of High-Resolution Mass Spectra
Stilianos G. Roussis* and Richard Proulx
Research Department, Products and Chemicals Division, Imperial Oil, Sarnia, Ontario, Canada, N7T 8C8
A method has been developed for the reduction of the chemical formulas of compounds in complex mixtures from the isotopic peak distributions of high-resolution mass spectra. The method is based on the principle that the observed isotopic peak distribution of a mixture of compounds is a linear combination of the isotopic peak distributions of the individual compounds in the mixture. All possible chemical formulas that meet specific criteria (e.g., type and number of atoms in structure, limits of unsaturation, etc.) are enumerated, and theoretical isotopic peak distributions are generated for each formula. The relative amount of each formula is obtained from the accurately measured isotopic peak distribution and the calculated isotopic peak distributions of all candidate formulas. The formulas of compounds in simple spectra, where peak components are fully resolved, are rapidly determined by direct comparison of the calculated and experimental isotopic peak distributions. The singular value decomposition linear algebra method is used to determine the contributions of compounds in complex spectra containing unresolved peak components. The principles of the approach and typical application examples are presented. The method is most useful for the characterization of complex spectra containing partially resolved peaks and structures with multiisotopic elements.
The development of techniques such as matrix-assisted laser desorption/ionization (MALDI),1 electrospray ionization (ESI),2 and atmospheric pressure chemical ionization (APCI)3,4 has greatly enhanced the range of compounds amenable to characterization by mass spectrometry. MALDI provides rapid and routine analysis of large, nonvolatile molecules. For example, intact microorganisms and bacteria can be rapidly analyzed by the method.5,6 ESI is more compatible than MALDI with on-line liquid chromatographic methods of separation and is being widely used in pharmaceutical, biochemical, and environmental applications to analyze large, nonvolatile and thermally labile compounds.7-9 More recently, ESI has been applied to the characterization of petroleum fractions.10-14 APCI was the earliest of the three methods to be developed, and although not as widely used as ESI, it has unique capabilities for quantitative analysis due to its linear signal increase with analyte concentration over a wide dynamic range.14,15 APCI is readily compatible with liquid chromatography, and even though its ability to analyze ionic compounds is limited in comparison to ESI, its application areas greatly overlap with those of ESI.14,16
The newer ionization methods have great potential to characterize nonboiling petroleum fractions, a task that has historically been difficult with conventional sample introduction systems (e.g., direct insertion probe, all-glass-heated inlet system) and ionization methods (e.g., electron ionization, chemical ionization). However, the ability to ionize high molecular weight compounds introduces new challenging demands for higher instrument resolving powers. The resolving power needed to separate two compounds differing by a constant mass doublet increases linearly with the mass of the compounds. For example, the compound types alkylbenzenes (CnH2n-6) and benzothiophenes (CnH2n-10S) differ by the C2H8-S mass doublet (0.0905 amu). To separate the isobars at m/z 134 a resolving power of ∼1500 is required, whereas to separate the isobars at m/z 1394 a resolving power of ∼15 000 is required. Separation of this mass doublet is readily achievable by many modern medium and high resolving power instruments. However, separation of other mass doublets common in petroleum mixtures (e.g., 13CH/N, C3/SH4) can become very challenging (required resolving power of ∼62 000 and ∼147 000, respectively, to separate these mass doublets at m/z 500)17 even for state-of-the-art ultrahigh-resolution FT-ICR instruments.11
A partial reduction of the complexity can be achieved by using liquid chromatography to separate the sample into fractions of different polarity. Unfortunately, this approach does not always reduce the resolving power requirements, as is the case for the nonpolar sulfur aromatic compounds, which overlap with the nonpolar hydrocarbon compound types in the bulk hydrocarbon matrix but cannot be separated based on polarity differences. Gas chromatography (GC) can reduce the resolving power requirements for the characterization of low-boiling fractions by separating the mixture components as a function of boiling point.18 However, there is no significant improvement for fractions boiling higher than ∼350 °C due to their high complexity.19 Furthermore, the short elution time of each mixture component (e.g., 10 s/mass decade).
The ability of ESI to characterize heavy and nonboiling aromatic petroleum fractions by complexation with Ag+ has been demonstrated recently.13 The approach extends the ESI applicability range from the conventional analysis of polar and ionic compounds to the analysis of neutral hydrocarbons in heavy petroleum fractions and vacuum residues. The method is sensitive, and its ability to introduce samples in a continuous-flow mode makes it suitable for high resolving power experiments. However, the reduction of molecular information from the raw data is very challenging. In addition to the natural complexity of the fractions, further complexity is introduced from the use of the multiisotopic silver reagent ion. Comprehensive characterization of the heavy petroleum fractions using Ag+ ESI experiments would require the combined use of high resolving power experiments and advanced data reduction methods. Such a combined approach is presented in this report.
* To whom correspondence should be addressed. Tel: 519-339-2441. Fax: 519-339-4436. E-mail: [email protected].
(1) Karas, M.; Hillenkamp, F. Anal. Chem. 1988, 60, 2299-2301.
(2) Fenn, J. B.; Mann, M.; Meng, C. K.; Wong, S. F.; Whitehouse, C. M. Science 1989, 246, 64-71.
(3) Horning, E. C.; Carroll, D. I.; Dzidic, I.; Haegele, K. D.; Horning, M. G.; Stillwell, R. N. J. Chromatogr. 1974, 99, 13-21.
(4) Henion, J. D.; Thomson, B. A.; Dawson, P. H. Anal. Chem. 1982, 54, 451-456.
(5) Fenselau, C.; Demirev, P. A. Mass Spectrom. Rev. 2001, 20, 157-171.
(6) Lay, J. O. Mass Spectrom. Rev. 2001, 20, 172-194.
(7) Cole, R. B., Ed. Electrospray Ionization Mass Spectrometry: Fundamentals, Instrumentation and Applications; John Wiley & Sons: New York, 1997.
(8) Niessen, W. M. A. Liquid Chromatography-Mass Spectrometry, 2nd ed.; Marcel Dekker: New York, 1999.
(9) Pramanik, B. N.; Ganguly, A. K.; Gross, M. L., Eds. Applied Electrospray Mass Spectrometry; Marcel Dekker: New York, 2002.
(10) Hsu, C. S.; Dechert, G. J.; Robbins, W. K.; Fukuda, E. K. Energy Fuels 2000, 14, 217-223.
(11) Qian, K.; Rodgers, R. P.; Hendrickson, C. L.; Emmett, M. R.; Marshall, A. G. Energy Fuels 2001, 15, 492-498.
(12) Qian, K.; Robbins, W. K.; Hughey, C. A.; Cooper, H. J.; Rodgers, R. P.; Marshall, A. G. Energy Fuels 2001, 15, 1505-1511.
(13) Roussis, S. G.; Proulx, R. Anal. Chem. 2002, 74, 1408-1414.
(14) Roussis, S. G.; Fedora, J. W. Rapid Commun. Mass Spectrom. 2002, 16, 1295-1303.
(15) Carroll, D. I.; Dzidic, I.; Horning, E. C.; Stillwell, R. N. Appl. Spectrosc. Rev. 1981, 17, 337-406.
(16) Thurman, E. M.; Ferrer, I.; Barcelo, D. Anal. Chem. 2001, 73, 5441-5449.
10.1021/ac020516w CCC: $25.00 © 2003 American Chemical Society. Published on Web 02/15/2003
1470 Analytical Chemistry, Vol. 75, No. 6, March 15, 2003
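The resolving power figures quoted in the Introduction follow from R = m/Δm, where Δm is the mass difference of the doublet. The short sketch below reproduces that arithmetic; the function names and the rounded monoisotopic masses are illustrative assumptions and are not part of the original work.

```python
# Monoisotopic masses in u (standard values, rounded to 6 decimals; assumed here).
MASS = {"H": 1.007825, "C": 12.000000, "13C": 13.003355,
        "N": 14.003074, "S": 31.972071}

def group_mass(composition):
    """Sum the masses of a group given as {isotope: count}."""
    return sum(MASS[iso] * n for iso, n in composition.items())

def resolving_power(mz, group_a, group_b):
    """Resolving power m/dm needed to separate a mass doublet at a given m/z."""
    delta = abs(group_mass(group_a) - group_mass(group_b))
    return mz / delta

# Mass doublets named in the text, written as (group A, group B).
doublets = {
    "C2H8/S": ({"C": 2, "H": 8}, {"S": 1}),
    "13CH/N": ({"13C": 1, "H": 1}, {"N": 1}),
    "C3/SH4": ({"C": 3}, {"S": 1, "H": 4}),
}

for name, (a, b) in doublets.items():
    print(name, "at m/z 500:", round(resolving_power(500.0, a, b)))
print("C2H8/S at m/z 134:", round(resolving_power(134.0, *doublets["C2H8/S"])))
print("C2H8/S at m/z 1394:", round(resolving_power(1394.0, *doublets["C2H8/S"])))
```

Because Δm is fixed for a given doublet, the required resolving power scales linearly with m/z, which is why the same C2H8/S doublet needs roughly ten times more resolving power at m/z 1394 than at m/z 134.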
The traditional approach used to obtain information about the chemical nature of compounds in a sample analyzed by high-resolution mass spectrometry is based on the assignment of the most likely chemical formula(s) to each accurately measured peak in the spectrum, within a given experimental error (e.g., 10 ppm).20,21 Restrictions about the type and number of atoms in the candidate structure are applied based on supplemental knowledge derived from other experiments (elemental analysis, IR, tandem mass spectrometry, etc.). Prior knowledge of the sample origin and previous analysis of similar samples can further reduce the number of possible candidate formulas by applying additional restrictions. The approach is generally successful when the resolving power used for the analysis is equal to or higher than that required to separate the isobaric mass doublets. For samples of similar nature, it has been possible in the past to use empirical approaches to account for the limitations of the resolving power of the instruments. For example, the peak intensity trends of the fully separated mass doublets at the lower masses have been used to assign by extrapolation the relative amounts of the nonresolved mass doublet components at the higher masses.22,23 A more accurate approach to obtain the relative amounts of components in overlapping mass doublets is by consideration of the unresolved peak abundance and the difference between the theoretical and measured masses of the candidate formulas (peak centroid method).17
Peaks in spectra produced by cationization using silver or other multiisotopic reagent ions contain, in addition to contributions of unresolved isobaric peak components, additional contributions from the heavy isotopes of the lower mass components. Direct application of the centroid method is not possible because the observed peak mass centroid will be affected by the isotopic contributions of the lower mass components. The contribution of the lower mass peak isotopes must be accurately accounted for to reliably assign the formulas of the higher mass peaks.
In this work, to improve the traditional approach of formula reduction from high-resolution mass spectra, which is based on the measurement of the mass of a single peak, a method has been developed that uses all accurately measured peaks in the isotopic peak cluster. In principle, the observed isotopic peak distribution is a linear combination of the isotopic distributions of a list of possible structures with different formulas. The method first enumerates all possible chemical formulas that meet specific criteria (e.g., type and number of atoms in structure, limits of unsaturation, etc.). The criteria limits can vary depending on the knowledge about the nature of the sample. Theoretical isotopic distributions are then generated for each candidate chemical formula. The relative amount of a given formula is finally obtained from the accurately measured isotopic peak distribution and the calculated isotopic peak distributions of all candidate formulas.
(17) Roussis, S. G. Rapid Commun. Mass Spectrom. 1999, 13, 1031-1051.
(18) Hsu, C. S.; Green, M. Rapid Commun. Mass Spectrom. 2001, 15, 236-239.
(19) Roussis, S. G.; Fitzgerald, W. P. Energy Fuels 2001, 15, 477-486.
(20) Johnson, B. H.; Aczel, T. Anal. Chem. 1967, 39, 682-685.
(21) Aczel, T.; Allan, D. E.; Harding, J. H.; Knipp, E. A. Anal. Chem. 1970, 42, 341-347.
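As a rough illustration of the linear-combination principle, the sketch below builds a synthetic profile-mode spectrum from two hypothetical candidates and recovers their relative amounts with a pseudo-inverse obtained from the singular value decomposition. The candidate masses, abundances, Gaussian peak shape, noise level, and all function names are assumptions made for the example; the column-vector convention e = T x is also a choice made here and is not taken from the original program.

```python
import numpy as np

def gaussian(mz, center, fwhm):
    """Gaussian peak of unit height (an assumed peak shape)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((mz - center) / sigma) ** 2)

def profile(mz, sticks, fwhm):
    """Profile-mode distribution built from (mass, abundance) sticks."""
    return sum(a * gaussian(mz, m, fwhm) for m, a in sticks)

# Two hypothetical candidates whose peaks overlap at this peak width.
mz = np.linspace(499.5, 503.5, 801)
fwhm = 0.05
cand_a = [(500.000, 1.00), (501.003, 0.40), (502.006, 0.08)]
cand_b = [(500.020, 1.00), (501.023, 0.30), (502.018, 0.50)]

# Each column of T is the theoretical profile of one candidate formula.
T = np.column_stack([profile(mz, cand_a, fwhm), profile(mz, cand_b, fwhm)])

# Synthetic "experimental" spectrum: a 70:30 mixture plus a little noise.
true_x = np.array([0.7, 0.3])
rng = np.random.default_rng(0)
e = T @ true_x + rng.normal(0.0, 0.002, size=mz.size)

# Least-squares solution through the SVD, discarding negligible singular values.
U, s, Vt = np.linalg.svd(T, full_matrices=False)
keep = s > 1e-10 * s[0]
x = Vt[keep].T @ ((U[:, keep].T @ e) / s[keep])

print("estimated relative amounts:", x / x.sum())
```

With 801 data points and only two unknowns the system is strongly overdetermined, so the two overlapping candidates can be told apart even though their main peaks are not resolved.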
The method is most useful for the reduction of chemical formulas from complex spectra containing partially resolved peaks and structures containing multiisotopic elements.
EXPERIMENTAL SECTION
Mass Spectrometer. A ZabSpec Ultima tandem double-focusing magnetic sector/orthogonal acceleration time-of-flight (oa-TOF) mass spectrometer (Micromass Ltd., Manchester, U.K.) was used for the experiments.24 Experiments were conducted either by scanning the fully laminated Ultima magnet (field strength ∼1.5 T) (magnet scan) or by scanning the accelerating voltage (voltage scan). Spectra were acquired in the continuum (profile) acquisition mode to maintain peak shape and fine structure information. The magnet was rapidly scanned (e.g., 2 s/mass decade) over wide mass ranges (e.g., m/z 1500-100), usually under low resolving power conditions (e.g., ∼1500), to acquire the entire molecular weight distributions of samples. Most high resolving power experiments were conducted by the voltage scan. The scanning rate for the voltage scan was 5 s/mass decade. The resolving power for the voltage scan experiments ranged, depending on the application, between 1500 and 20 000 (full width at half peak height, fwhh). Several narrow (∼100-150 Da each), consecutive, and overlapping scans were used to acquire the entire mass range of interest by the voltage scan.
(22) Aczel, T. Rev. Anal. Chem. 1972, 1, 226-261.
(23) Schmidt, C. E.; Sprecher, R. F.; Batts, B. D. Anal. Chem. 1987, 59, 2027-2033.
(24) Bateman, R. H.; Green, M. R.; Scott, G.; Clayton, E. Rapid Commun. Mass Spectrom. 1995, 9, 1227-1233.
Samples were analyzed using the ESI interface in the positive ionization mode. The accelerating voltage was 4000 V. The potential difference between the counter electrode and the electrospray needle was ∼3000 V. Nitrogen was used as bath and nebulizer gas. The ESI interface temperature was 90 °C. The needle, sampling cone, skimmer lens, and ring electrode voltages were tuned for maximum sensitivity. Poly(propylene glycol) (PPG) mixtures were used to calibrate the mass scale.
Sample Preparation and Introduction. Solutions of crude oils and fractions (∼1000 ppm) were prepared by introducing appropriate amounts of the samples into a hydrocarbon solvent (e.g., toluene). Further dilution to suitable concentrations for ESI analysis (e.g., 10 ppm) was done using methanol. For the metal cationization experiments, silver nitrate (∼10 ppm) was added either directly into the methanol solution containing the hydrocarbons (continuous infusion experiments) or into the methanol solvent used as the mobile phase (flow injection experiments). A Hewlett-Packard 1090 HPLC unit and a Harvard model 22 syringe pump were used for the flow injection and continuous infusion experiments, respectively. Flow rates ranged between 2 and 50 µL/min. The injection volume was 20 µL.
Samples. Model compounds were obtained from Aldrich. Crude oils, petroleum fractions, and products were obtained from Imperial Oil, Products and Chemicals (Sarnia, ON, Canada).
RESULTS AND DISCUSSION
Several methods have been developed to determine the elemental compositions of ions from the measured isotopic peak distributions.25-27 However, most of these methods are based on low-resolution mass spectra with peaks that are not accurately measured and cannot be used to treat complex systems involving overlapping peak components.
The presence of overlapping peak components was considered in an early work by Hilmer and Taylor,28 who developed a method to reduce the number of possible formulas corresponding to peaks in high-resolution mass spectra by converting isotopically equivalent formulas to single nonisotopic formulas. The contributions of individual chemical species were obtained by mathematical fitting of the measured and calculated isotopic peak distributions. However, the use of a single centroid for each measured peak produced underdetermined systems of equations. That is, the spectrum could contain fewer peaks than the number of chemical species in the mixture (i.e., more unknowns than equations). In later work, van Katwijk29 used the entire profile of peaks to obtain many data points per chemical species and, hence, systems with more equations than unknowns (overdetermined systems of equations). In the current work, we have used the profiles of isotopic peak distributions to determine the nature and amounts of individual components in complex mixtures. This was done to maximize the use of the spectral information available in high-resolution mass spectra. Our need to characterize heavy and nonboiling petroleum fractions by complexation with Ag+ in ESI experiments has placed an emphasis on the reduction of chemical formulas containing multiisotopic elements. The principles of the approach and typical applications are described below.
(25) Evans, J. E.; Jurinski, N. B. Anal. Chem. 1975, 47, 961-963.
(26) Tenhosaari, A. Org. Mass Spectrom. 1988, 23, 236-239.
(27) Do Lago, C. L.; Kascheres, C. Comput. Chem. 1991, 15, 149-155.
(28) Hilmer, R. M.; Taylor, J. W. Anal. Chem. 1974, 46, 1038-1044.
(29) van Katwijk, J. Int. J. Mass Spectrom. Ion Phys. 1981, 39, 287-310.
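The difference between the centroid and profile treatments can be seen by counting equations and unknowns in a deliberately simplified case: two hypothetical isobars under a single unresolved peak. This is only a sketch; the masses, peak width, Gaussian peak shape, and variable names are assumptions, and a real isotopic cluster would contribute more than one centroid.

```python
import numpy as np

def gaussian(mz, center, fwhm):
    """Gaussian peak of unit height (an assumed peak shape)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((mz - center) / sigma) ** 2)

fwhm = 0.05
masses = [500.000, 500.020]  # two hypothetical unresolved isobars

# Centroid treatment: one measured peak gives one equation, e = x1 + x2.
T_centroid = np.array([[1.0, 1.0]])

# Profile treatment: every data point across the peak gives one equation.
mz = np.linspace(499.90, 500.12, 45)
T_profile = np.column_stack([gaussian(mz, m, fwhm) for m in masses])

rank_centroid = np.linalg.matrix_rank(T_centroid)
rank_profile = np.linalg.matrix_rank(T_profile)
print("centroid:", T_centroid.shape, "rank", rank_centroid)
print("profile: ", T_profile.shape, "rank", rank_profile)
```

A rank lower than the number of unknowns means the two amounts cannot be determined uniquely, whereas the profile matrix has full column rank and supports a least-squares solution.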
Principles of Approach. The current approach is based on the fundamental assumption that the observed isotopic peak distribution of a mixture of compounds is a linear combination of the isotopic peak distributions of the individual compounds in the mixture. The experimental peak intensity $e_i$ of a data point $i$ in the distribution is equal to the sum of the peak intensities $t_{ij}$ of the $n$ individual mixture components at the same data point $i$, adjusted for their relative amounts $x_j$ in the mixture:
$$e_i = \sum_{j=1}^{n} t_{ij} x_j \qquad (1)$$
In that manner, the formula and amount of each mixture component are determined from the accurately measured spectrum and the theoretical peak distributions of the mixture compounds. In matrix notation, the equation becomes
$$\{X\} = \{E\}[T]^{-1} \qquad (2)$$
where {X} is a vector containing the compound concentrations (output), {E} is the experimental spectrum acquired in profile mode (input), and [T] is a matrix containing the theoretical isotopic peak distributions of all possible compounds in the mixture, also in profile mode (input). Solution of the equation generates the relative amounts of the individual components in the mixture. The use of the profile mode ensures more equations than unknowns. Uniform response factors are assumed for the mixture components. Predetermined response factors for specific compounds or groups of compound types could be used to increase the quantitative accuracy of the method.
Data Acquisition. Data acquisition and preliminary data treatment are conducted using the Micromass Opus V3.5X data system operating on a Digital AlphaStation 255 computer. In typical high-resolution experiments, the mass spectrometer is first tuned to the required resolving power by fine adjustment of the instrument slits and the ion focusing lens voltages. High- and low-mass limits, experimental resolution, and scan rate (s/mass decade) parameters are defined for the acquisition of mass spectra. A 16-bit 400-kHz analog-to-digital converter is used to acquire the data. The generation of a composite signal by the accumulation of several scans increases the signal-to-noise ratio but generally reduces the effective peak resolution. Although the current method was developed using data acquired with a magnetic sector instrument, data acquired with other instruments capable of performing high-resolution experiments, such as FT-ICR and TOF, can equally be used by the method. The only requirement in such cases is the accurate description of the peak shape as a function of mass, since it can change depending on the instrument used for the experiments. The discussion in this report is restricted to the data acquired with the ZabSpec instrument.
Data Treatment Steps: (1) General Information.
Smoothing and mass calibration of the raw data is done using the existing routines of the Opus data system. The List routine is used to export the data to an ASCII file, which is then transferred to a Pentium 4 PC (2.2 GHz, 512 MB memory) for further treatment. The file contains the absolute and relative peak intensities as a function of mass and acquisition time in ascending (voltage scan) or descending (magnet scan) mass order. A computer program residing on the PC was written for the calculations. The original code was developed in Compaq Visual Fortran 6.5. Later, it was transferred to Microsoft Visual Basic 6.0, which although slower than Fortran provided a convenient platform for the development of the user interface. A flowchart summary of the data treatment steps is shown in Figure 1. The program provides for the manual treatment of selected mass segments or the automated treatment of the entire spectrum. In the automated mode, individual mass segments are treated stepwise from the low to the high masses. Criteria can be changed for the treatment of selected mass segments, but they are fixed for the automated treatment option. Manual treatment of selected mass segments is usually done to establish the most appropriate treatment criteria and typically precedes the automated option.
Figure 1. Flowchart summary of the data treatment steps.
(2) Abundance Threshold. An abundance threshold is initially defined in absolute or relative counts (e.g., percent of base peak) to eliminate the baseline signal and reduce the overall number of data points that the program has to handle. We found that this significantly reduces the computational time with very little, or no, effect on the accuracy of the results. Incorporating the baseline data points in the calculations offers no benefit since they contain no useful information about the sample components. This becomes particularly important at the higher resolution settings where a very large number of baseline data points are acquired between peaks. This is because at the higher resolution settings peak widths become narrower and the sampling rate must increase in order to maintain a constant number of data points per peak. By applying the abundance threshold, the computational time is only a function of the data points acquired with signal above
the baseline. However, caution is required to ensure that the abundance threshold is lower than the lowest abundance peak in the spectrum.
(3) Peak Definition. Two peak definition methods are possible with the program: (1) full peak and (2) peak top. In the first method, the main criteria used to define peaks are the peak start, peak end, and peak width. The peak valley criterion is not used in order to maintain the integrity of partially resolved peaks. In that way, partially resolved components remain parts of a single peak. This maintains the accuracy of the peak shape and ensures the accurate representation of the experimental peak shapes needed for the calculations. The peak width criterion is primarily used to discount the detection of noise peaks (e.g., peaks less than ∼3 data points wide). The number of formulas determined by this peak definition method is the largest possible since all formulas with masses within the peak start and peak end mass boundaries are considered possible. Using this method, the number of possible formulas can only be reduced by performing the experiment at higher resolving powers where the peak widths are reduced and individual components can be separated. The second peak definition method is more flexible as it uses the peak top and a user-specified mass error to determine the number of possible formulas. By changing the mass error window, one can control the number of formulas determined. The peak top definition method is most beneficial for experiments conducted at low-resolution settings involving simple (i.e., not overlapping) systems. In that case, using the full peak definition method would generate a large number of formulas since at low resolution the peak width becomes large. On the other hand, using the peak top definition and the user-specified mass error limits the number of formulas and produces accurate results without the need for high resolving power. As the mass error window is widened, the peak top method becomes very similar to the full peak definition method.
(4) Determination of Possible Formulas. All formulas with masses within the mass boundaries of each defined peak are determined by consideration of a list of user-specified criteria. These criteria include the minimum and maximum number of atoms in the structure, the minimum and maximum ranges for the degrees of unsaturation (Z-number, double bond equivalent), and the maximum number of heteroatoms in the structure. The type of ion can be defined (i.e., odd electron, even electron, or both) as this contains information about the nature of the ions one expects in the spectrum, depending on the type of ionization used. The number of charges criterion is available for the consideration of multiply charged ions.
(5) Generation of Theoretical Isotopic Peak Distributions. For each formula in each peak, the theoretical isotopic peak distributions are determined using an algorithm developed from first principles.30,31 The abundance of the isotopic peak distribution of a molecule with a general formula AB...X is obtained from the multiplication of the natural isotopic abundances A1, A2, A3, ..., Am of element A with those of element B (B1, B2, B3, ..., Bn)..., and those of element X (X1, X2, X3, ..., Xk). Atoms of the same element in the formula can be treated as different elements and their isotopic distributions can be calculated in the same sequential multiplication fashion.
(30) Hugentobler, E.; Loliger, J. J. Chem. Educ. 1972, 49, 610-612.
(31) Genty, C. Anal. Chem. 1973, 45, 505-511.
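Steps (4) and (5) can be sketched together in a few lines. The example below enumerates CcHhSs formulas inside a mass window subject to double bond equivalent limits, then builds an isotopic distribution by atom-by-atom multiplication of isotope abundances. It is a minimal illustration under stated assumptions and not the authors' Fortran or Visual Basic code: the function names, the restriction to C, H, and S, the even-hydrogen rule, the pruning threshold, and the rounded isotope masses and abundances are all choices made here, and ion type, charge state, and the electron mass are ignored.

```python
# (mass in u, natural abundance) per isotope; rounded standard values (assumed).
ISOTOPES = {
    "C": [(12.000000, 0.9893), (13.003355, 0.0107)],
    "H": [(1.007825, 0.999885), (2.014102, 0.000115)],
    "S": [(31.972071, 0.9499), (32.971459, 0.0075),
          (33.967867, 0.0425), (35.967081, 0.0001)],
    "Ag": [(106.905092, 0.51839), (108.904755, 0.48161)],
}

def enumerate_formulas(mz_low, mz_high, max_c, max_s, dbe_min, dbe_max):
    """List (c, h, s, mass) for CcHhSs formulas within a mass window and DBE range."""
    m_c, m_h, m_s = (ISOTOPES[el][0][0] for el in ("C", "H", "S"))
    found = []
    for c in range(1, max_c + 1):
        for s in range(max_s + 1):
            for h in range(0, 2 * c + 3, 2):  # even H count keeps the DBE an integer
                dbe = c - h / 2 + 1           # divalent S does not change the DBE
                mass = c * m_c + h * m_h + s * m_s
                if dbe_min <= dbe <= dbe_max and mz_low <= mass <= mz_high:
                    found.append((c, h, s, round(mass, 5)))
    return found

def isotope_pattern(formula, prune=1e-9):
    """Isotopic distribution by sequential multiplication, one atom at a time."""
    dist = {0: 1.0}  # key: mass in integer micro-u (exact sums), value: abundance
    for element, count in formula.items():
        isotopes = [(round(m * 1e6), a) for m, a in ISOTOPES[element]]
        for _ in range(count):
            new = {}
            for mass, abund in dist.items():
                for iso_mass, iso_abund in isotopes:
                    key = mass + iso_mass
                    new[key] = new.get(key, 0.0) + abund * iso_abund
            # Drop species of negligible abundance to keep the list short.
            dist = {m: a for m, a in new.items() if a >= prune}
    return [(m / 1e6, a) for m, a in sorted(dist.items())]

candidates = enumerate_formulas(134.0, 134.2, max_c=12, max_s=1, dbe_min=0, dbe_max=10)
print(candidates)

pattern = isotope_pattern({"C": 10, "H": 14})
silver_adduct = isotope_pattern({"C": 10, "H": 14, "Ag": 1})
print(pattern[:3])
print(len(silver_adduct), "isotopic species for the hypothetical Ag adduct")
```

With these limits the mass window around m/z 134 returns the alkylbenzene C10H14 and the benzothiophene C8H6S, the pair separated by the C2H8/S doublet. Storing masses as integer micro-units is a design choice that lets identical isotopic compositions reached by different multiplication orders merge exactly, while species of different composition (for example, one 13C versus one 2H) remain distinct.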
A very large number of isotopic species is possible for molecules containing many elements, which can make the calculations long and difficult to manage. However, one can considerably reduce the number of the species by eliminating the isotopes having extremely low abundances (e.g.,