Anal. Chem. 2010, 82, 6726–6729
Computation of the Isotopic Distribution in Two Dimensions

Jorge Fernandez-de-Cossio*

Bioinformatics Department, Center for Genetic Engineering and Biotechnology, P.O. Box 6162, CP 10600, C. Habana, Cuba

A new algorithm for the calculation of the isotopic distribution is described here. The two extreme levels of detail, the coarser structures (∼1 Da) and the sharper irregular details (∼millidaltons), are split into separate dimensions. Consequently, dense sampling can be concentrated in the informative close surroundings of the isotopic peaks. The one-dimensional isotopic distribution is reconstructed by allotting the abundances along the mass axis. Performance is evaluated with small and relatively large compounds (300 Da to 90 kDa) of diverse composition, including challenging polyisotopic elements. The superiority over well-established methods in terms of accuracy, speed, and memory resources is clearly demonstrated for high and ultrahigh resolution ranges by a program implementing the algorithm.

The isotopic distribution shaping the mass spectrum signals conveys relevant information for the accurate analysis and “deconvolution” of complex mixture data at high resolution.1-6 Efficient tools are widely available to calculate the isotopic distribution of compounds at low and moderately high resolution (Figure S-8 in the Supporting Information). Unfortunately, their performance deteriorates at higher resolutions, calling for improvements and better automation.
* To whom correspondence should be addressed. E-mail: Jorge.cossio@cigb.edu.cu.
(1) Senko, M. W.; Beu, S. C.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 1995, 6, 229–233.
(2) Horn, D. M.; Zubarev, R. A.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 2000, 11 (4), 320–332.
(3) Roussis, S. G.; Proulx, R. Anal. Chem. 2003, 75, 1470–1482.
(4) Fernandez-de-Cossio, J.; Gonzalez, L. J.; Satomi, Y.; Betancourt, L.; Ramos, T.; Huerta, V.; Besada, V.; Padron, G.; Minamino, N.; Takao, T. Rapid Commun. Mass Spectrom. 2004, 18, 2465–2472.
(5) Sperling, E.; Bunner, A. E.; Sykes, M. T.; Williamson, J. R. Anal. Chem. 2008, 80 (13), 4906–4917.
(6) Guan, S.; Burlingame, A. L. Mol. Cell. Proteomics 2009.
(7) Fernandez-de-Cossio, J. Anal. Chem. 2010, 82 (5), 1759–1765.

Analytical Chemistry, Vol. 82, No. 15, August 1, 2010

Figure 1. Three-dimensional representation of the 2D-isotopic distributions of (C2Br3Cl3)5 at resolving power 5 × 10^6. The isotopic distributions obtained with the Da and the Da* metrics are plotted in dark blue and dark green, respectively (1 Da* = 0.998 874 Da).

It was recently demonstrated that a considerable reduction in computation can be achieved by merely changing the mass of the elemental isotopes.7 However, a single implementation of this procedure optimally applicable for all compounds and resolutions remains difficult to reach, so far allowing efficient tuning only for families of compounds.7 A new algorithm is proposed in this manuscript for efficient isotopic distribution calculations, equally applicable for all compounds and resolutions, hence amenable to be introduced as a tool in high-throughput automated workflows. The crucial step
of this algorithm separates the coarser (∼1 Da) and sharper (∼millidalton) structures by 2D-Fourier analysis, removing a burden of unnecessary calculations without loss of accuracy. The superior performance over the two prevailing methods, polynomial8,9 and Fourier transform (1D),10,11 is clearly established for high and ultrahigh resolution.

METHODS

Rationale. 2D-Isotopic Distribution. The isotopic species are separated according to their number of nucleons, and the sharp details are accommodated along a second dimension. Graphically, the sharp details are profiled in parallel planes separated by integer mass units. A 3D plot of the 2D isotopic distribution of the compound (C2Br3Cl3)5, at resolving power 5 × 10^6, is shown in Figure 1 (blue profile). Let F(n, ∆m) be the relative abundance at integer mass n and mass defect ∆m. The abundances $a_{n_s,\Delta m_s}$ of the species s with integer mass $n_s$ and mass defect $\Delta m_s = m_s - n_s$ sum up to F(n, ∆m) whenever $n = n_s$ and $\Delta m = \Delta m_s$:

$$F(n, \Delta m) = \sum_{s \in \text{species}} a_{n_s,\Delta m_s}\, \delta_{n,n_s}\, \delta(\Delta m - \Delta m_s) \qquad (1)$$
where $\delta_{j,k}$ and $\delta(\cdot)$ are the Kronecker and Dirac delta functions, respectively. To recover the isotopic distribution accurately along the one-dimensional mass scale, the abundance at mass m is obtained by summing all the terms F(n, ∆m) for which n + ∆m = m.

(8) Senko, M. W. IsoPro, version 3.1; Cornell University: Ithaca, NY, 1997. A computer program that implements Yergey’s improved polynomial method running under Microsoft Windows (ref 9).
(9) Yergey, J. A. Anal. Chem. 1983, 55, 353–356.
(10) Rockwood, A. L.; Van Orden, S. L. Mercury; Pacific Northwest Laboratory: Richland, WA, 1995. A software developed for the calculation of the isotopic distribution by Fourier transform methods (refs 11 and 12).
(11) Rockwood, A. L.; Van Orden, S. L.; Smith, R. D. Rapid Commun. Mass Spectrom. 1996, 10, 54–59.
10.1021/ac101039x 2010 American Chemical Society. Published on Web 07/08/2010

2D-Fourier Transform. The two-dimensional Fourier transform f, in terms of the frequencies u and w, yields

$$f(u, w) = \sum_{s \in \text{species}} a_{n_s,\Delta m_s}\, e^{-2\pi i (n_s u + \Delta m_s w)}, \qquad i = \sqrt{-1} \qquad (2)$$
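As an illustration of eqs 1 and 2, the following minimal Python sketch enumerates the isotopic species of the diatomic BrCl by brute force and groups them by integer mass and mass defect. It is not taken from the paper: the isotope masses and abundances are rounded values assumed for the example (not the ISOTOPE.DAT entries), and the names distribution_2d and fourier_2d are hypothetical.

```python
import cmath
import itertools
import math
from collections import defaultdict

# Illustrative isotope data: (mass in Da, abundance). Assumed rounded values.
BR = [(78.9183371, 0.5069), (80.9162906, 0.4931)]
CL = [(34.9688527, 0.7576), (36.9659026, 0.2424)]


def distribution_2d(atoms):
    """Eq 1 by brute force: map integer mass n -> {mass defect: abundance}."""
    F = defaultdict(lambda: defaultdict(float))
    for combo in itertools.product(*atoms):           # one isotope choice per atom
        n = sum(round(m) for m, _ in combo)           # integer mass n_s
        dm = round(sum(m for m, _ in combo) - n, 7)   # mass defect m_s - n_s
        F[n][dm] += math.prod(a for _, a in combo)    # abundance of the species
    return F


def fourier_2d(F, u, w):
    """Eq 2 evaluated at a single frequency pair (u, w)."""
    return sum(a * cmath.exp(-2j * math.pi * (n * u + dm * w))
               for n, peaks in F.items() for dm, a in peaks.items())


F = distribution_2d([BR, CL])                         # the diatomic BrCl
for n in sorted(F):
    for dm, a in sorted(F[n].items()):
        # the one-dimensional mass is recovered as m = n + dm
        print(n, dm, round(n + dm, 7), round(a, 4))
```

With these assumed values, two species share the integer mass 116 but differ in mass defect by roughly a millidalton, which is the kind of fine structure the second dimension is meant to hold.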
The isotopic distributions $F_A$ and $F_B$ of two compounds (or atoms) A and B convolve into the isotopic distribution $F_{AB}$ of the “merged” compound AB.12 By repeated application of the convolution theorem, the Fourier transform of the isotopic distribution of a compound with molecular formula $A_{n_A}B_{n_B}\cdots Z_{n_Z}$ yields

$$f_{A_{n_A}B_{n_B}\cdots Z_{n_Z}}(u, w) = [f_A(u, w)]^{n_A}\,[f_B(u, w)]^{n_B}\cdots[f_Z(u, w)]^{n_Z} \qquad (3)$$

where $f_A, f_B, \ldots$ are the Fourier transforms of the isotope abundance distributions of the elements A, B, ..., and $n_A, n_B, \ldots$ are the respective numbers of atoms of each element in the compound.

Peak Shape. Any reasonably chosen peak shape can be naturally convolved with the isotopic distribution.13 The peak shape is a function of ∆m only, so its Fourier transform P is a function of w only. Multiplication in the Fourier domain yields

$$f_{A_{n_A}B_{n_B}\cdots Z_{n_Z}}(u, w) \times P(w) \qquad (4)$$
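The convolution theorem behind eq 3 can be checked numerically. The sketch below is an illustration only: it is restricted to the integer-mass dimension (the w = 0 slice, with mass defects ignored), it uses assumed rounded abundances, and the compound C2Cl3 is chosen purely as an example. It compares a direct repeated convolution with the inverse FFT of the product of powers of the elemental transforms.

```python
import numpy as np

# Nominal-mass abundance vectors; index = extra nucleons over the lightest
# isotope. Rounded values assumed for illustration.
C = np.array([0.9893, 0.0107])        # 12C, 13C
CL = np.array([0.7576, 0.0, 0.2424])  # 35Cl, (no 36Cl), 37Cl


def direct(elements):
    """Repeated convolution of the elemental distributions."""
    out = np.array([1.0])
    for abundances, count in elements:
        for _ in range(count):
            out = np.convolve(out, abundances)
    return out


def via_fourier(elements, size):
    """Eq 3 on a discrete grid: product of powers, then inverse FFT."""
    f = np.ones(size, dtype=complex)
    for abundances, count in elements:
        f *= np.fft.fft(abundances, n=size) ** count   # zero-padded transform
    return np.fft.ifft(f).real


formula = [(C, 2), (CL, 3)]           # C2Cl3, an arbitrary example
reference = direct(formula)           # 2*1 + 3*2 + 1 = 9 nominal masses
fast = via_fourier(formula, size=16)  # power of 2 large enough to avoid wrap-around
print(np.allclose(fast[:reference.size], reference))
```

The grid size must cover the whole span of nominal masses, otherwise the periodic FFT folds the tail of the distribution back onto its head.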
(12) Rockwood, A. L. Rapid Commun. Mass Spectrom. 1995, 19, 103–105.
(13) Rockwood, A. L.; Van Orden, S. L.; Smith, R. D. Anal. Chem. 1995, 67 (15), 2699–2704.
(14) Rockwood, A. L.; Haimi, P. J. Am. Soc. Mass Spectrom. 2006, 17, 415–419.

Ranges. The rule 10(1 + σ^2)^(1/2), successfully applied by Rockwood, is adopted here to size the range spanned by the isotopic distribution in the n-dimension, i.e., about 10 times its standard deviation σ. The n-dimension is heterodyned in the Fourier domain so as to shift the average mass to the vicinity of the origin. By replacement of σ with the average spread about each individual isotope peak, the same rule can be applied to the range spanned by the ∆m-dimension. A polynomial method, originally conceived to accurately calculate the individual isotope peak centroids,14 was modified at this stage to obtain the required average peak spread: the accumulated variance of each peak is recomputed at each polynomial expansion and merging step. To comply with the requirement of the FFT, a power of 2 was chosen for the number of sampling points along both dimensions.

Mass Scale Metrics. Even when the mass defect concentrates in small ranges, it might be located away from the integer nominal mass of the peak. In order to reduce the mass defect range, a change of scale is adopted in the n-dimension as an alternative to the standard dalton units. The Da* metric is defined such that 1 Da* is equal to the average interpeak distance in Da.

Algorithm.
(1) Calculate the 2D-Fourier transform of the involved elements, adopting the Da* metric.
(2) Multiply all of them according to their multiplicity in the chemical formula (eq 3).
(3) Multiply by the peak-shape Fourier transform (eq 4).
(4) Inverse Fourier transform to the n × ∆m domain.
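The four steps can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions, not the DEUTERIUM implementation: it works in plain Da rather than Da*, it replaces heterodyning by constant offsets n0 and d0 that move the lightest species to the grid origin, the grid sizes are passed in by hand instead of being derived from the range rule, and the isotope table holds rounded values assumed for the example. The function names are hypothetical.

```python
import numpy as np

# Illustrative isotope table: element -> [(mass in Da, abundance), ...].
# Rounded values assumed for this sketch, not the ISOTOPE.DAT entries.
ISOTOPES = {
    "C": [(12.0, 0.9893), (13.0033548, 0.0107)],
    "Cl": [(34.9688527, 0.7576), (36.9659026, 0.2424)],
    "Br": [(78.9183371, 0.5069), (80.9162906, 0.4931)],
}


def split(element):
    """Return nominal masses, mass defects and abundances of an element."""
    masses = np.array([m for m, _ in ISOTOPES[element]])
    abund = np.array([a for _, a in ISOTOPES[element]])
    nominal = np.rint(masses)
    return nominal, masses - nominal, abund


def distribution_2d(formula, resolving_power, n_points=16, d_points=1024,
                    samples_per_fwhm=7):
    """Return F, n0, d0, h, where F[k, j] is the abundance at integer mass
    n0 + k and mass defect d0 + j * h (both in Da)."""
    parts = {el: split(el) for el in formula}
    # Offsets that place the lightest nominal mass / smallest defect at index 0.
    n0 = sum(cnt * parts[el][0].min() for el, cnt in formula.items())
    d0 = sum(cnt * parts[el][1].min() for el, cnt in formula.items())
    mass = sum(cnt * ISOTOPES[el][0][0] for el, cnt in formula.items())
    fwhm = mass / resolving_power                  # peak width at half-maximum
    h = fwhm / samples_per_fwhm                    # sampling step along the defect axis
    u = np.fft.fftfreq(n_points, d=1.0)[:, None]   # frequencies for n
    w = np.fft.fftfreq(d_points, d=h)[None, :]     # frequencies for the defect
    f = np.ones((n_points, d_points), dtype=complex)
    for el, cnt in formula.items():
        nom, dm, ab = parts[el]
        phase = ((nom - nom.min())[:, None, None] * u
                 + (dm - dm.min())[:, None, None] * w)
        f_el = (ab[:, None, None] * np.exp(-2j * np.pi * phase)).sum(axis=0)  # step 1
        f *= f_el ** cnt                                                      # step 2
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    f *= np.exp(-2.0 * (np.pi * sigma * w) ** 2)   # step 3: Gaussian peak shape
    F = np.fft.ifft2(f).real                       # step 4
    return F, int(n0), d0, h


F, n0, d0, h = distribution_2d({"C": 2, "Br": 3, "Cl": 3}, resolving_power=1e6)
j = int(np.argmax(F[0]))
print(n0, d0 + j * h, F[0].sum())   # lightest peak: integer mass, defect, abundance
```

Summing each row of F over the defect axis gives the abundance of the corresponding nominal-mass peak, which offers a quick sanity check against a plain integer-mass convolution.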
The four steps outlined above loosely resemble Rockwood’s original formulation.13 The output at this stage is a 2D isotopic distribution. An additional step is required to build the conventional 1D isotopic distribution. For high and ultrahigh resolutions, the mass defect and the corresponding integer mass of each sampled abundance are added to obtain the actual mass. At lower resolutions, the isotopic peaks are not fully resolved to the baseline and multiple masses can overlap; the abundances of those masses are simply added up.

Materials. Isotope Data. The isotope masses and abundances used in this manuscript come from the file ISOTOPE.DAT distributed along with the Mercury software.10

Compounds. Various compounds are used in the proof of concept and performance evaluation of the algorithm described in this manuscript (Table 1). Their particular bearing on the tests is described in the Supporting Information.

Computer. Calculations were performed with an Intel Pentium D CPU at 3.00 GHz with 1 GB RAM, running a 32-bit Windows Vista OS.15

Software. Several software packages were involved in the comparative evaluation of performance and accuracy. The Mercury software10 includes an option implementing a digital filtering approach to zoom in on the fine structure of a single selected isotope peak.11 The IsoPro software8 implements Yergey’s improved polynomial method.9 The IsoDalton software16 calculates the isotopic distribution at “infinite” resolution by polynomial methods.17 Accurate centroids were calculated with an in-house implementation of the algorithm reported by Rockwood.14 Some computations apply the packing procedure7 using ad hoc scripts with manually assigned spacer lengths. A software program,18 conceived as a proof of concept in the first place, implements the algorithm put forth in this manuscript. The software adopts a Gaussian peak shape and seven sampling points per full-width at half-maximum (fwhm).
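To make the grid sizing concrete, the sketch below applies the range rule 10(1 + σ^2)^(1/2) and the seven-points-per-fwhm sampling to (C2Br3Cl3)5 at resolving power 5 × 10^6. It is only a back-of-the-envelope illustration: the abundances are assumed rounded values, the mass of 1850 Da is the MW of Table 1, the helper names are hypothetical, and the assumption that a one-dimensional grid covers the same range at the same density is ours, not a statement about how any of the cited programs size their grids.

```python
import math

# Nominal-mass isotope abundances; rounded values assumed for illustration.
ABUNDANCES = {
    "C": {12: 0.9893, 13: 0.0107},
    "Cl": {35: 0.7576, 37: 0.2424},
    "Br": {79: 0.5069, 81: 0.4931},
}


def next_power_of_two(x):
    """Smallest power of 2 that is >= x."""
    return 1 << max(0, math.ceil(math.log2(x)))


def n_dimension_points(formula):
    """Range rule 10 * sqrt(1 + sigma^2), rounded up to a power of 2."""
    variance = 0.0
    for element, count in formula.items():
        iso = ABUNDANCES[element]
        mean = sum(n * a for n, a in iso.items())
        variance += count * sum(a * (n - mean) ** 2 for n, a in iso.items())
    return next_power_of_two(10.0 * math.sqrt(1.0 + variance))


def sampling_step(mass, resolving_power, samples_per_fwhm=7):
    """Grid step in Da: fwhm = mass / resolving power, split into equal samples."""
    return mass / resolving_power / samples_per_fwhm


formula = {"C": 10, "Br": 15, "Cl": 15}          # (C2Br3Cl3)5
n_points = n_dimension_points(formula)           # points along the n-dimension
step = sampling_step(1850.0, 5e6)                # about 5.3e-5 Da
points_1d = next_power_of_two(n_points / step)   # same range, sampled densely in 1D
print(n_points, step, points_1d)
```

Under these assumptions the n-dimension needs 64 points and a densely sampled one-dimensional grid over the same range needs 2^21 points, which gives a feel for how much sampling is saved when the dense grid is confined to the mass defect axis.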
The command line requires two parameters: the chemical formula and the resolving power (m/∆m50%). Optional parameters (dealing with output formats) and usage are described in the readme.txt file accompanying the software. The program was coded in C++ using double-precision arithmetic and compiled with Visual Studio19 to run on a Windows OS.

(15) Microsoft Corporation. Microsoft Windows Vista, 2007.
(16) Snider, R. K. IsoDalton, version 1.0, 2007. Software for the calculation of the isotopic distribution by polynomial methods using dynamic programming techniques (ref 17).
(17) Snider, R. K. J. Am. Soc. Mass Spectrom. 2007, 18, 1511–1515.
(18) CIGB. DEUTERIUM, version 0.1, 2010. Software for the isotopic distribution calculation based on 2D-Fourier transform techniques (described here).
(19) Microsoft Corporation. Microsoft Visual Studio, version 9 RTM; 2008.

RESULTS AND DISCUSSION

Number of Sampling Points. The mass defect range spanned by each of the ∼39 peaks of the 2D isotopic distribution of (C2Br3Cl3)5 falls below ∼0.0474 Da (Figure 1, blue profile). A total of 2048 points is required to cover this range at a resolving power of 5 × 10^6, for a total of $N_{2D}$ = 131 072 (64 × 2048) sampling points. With the original Fourier transform method13 at the same resolving power, a total of N = 2 097 152 sampling points is required, i.e., 16 $N_{2D}$. By removal of spacers of length 0.987 Da, the packing procedure7 requires a total of $N_P$ = 32 768 sampling points, i.e., 1/4 of $N_{2D}$ (this spacer length is near
Table 1. Comparative Performance of the Calculations Exhibited by IsoPro and DEUTERIUM Software

compound            formula                     MW (kDa)
C2Br3Cl3            C2Br3Cl3                    0.37
(C2Br3Cl3)5         (C2Br3Cl3)5                 1.85
bovine-insulin      C254H378N65O75S6            5.73
EGF                 C270H401N73O83S7            6.22
bovine-ubiquitin    C378H629N105O118S           8.56
P16                 C681H1100N216O208S5         15.8
IFN-R 2             C964H1531N251O283S12        21.55
substance P         C2135H3243N517O579S26       46.25
human plasminogen   C3948H6073N1123O1213S59     90.57
Sn10                Sn10                        1.19
Sn100               Sn100                       11.87

Remaining columns: resolving power (m/∆m50%), Mercury a (seconds), IsoPro b (seconds), DEUTERIUM (seconds)
1 × 10^6, 2 × 10^6, 3 × 10^6, 5 × 10^6, 1 × 10^6, 3 × 10^6, 5 × 10^6, 6 × 10^6, 1 × 10^7, 3 × 10^7, 1 × 10^7, 2.5 × 10^8, 2 × 10^7, 1.5 × 10^8, 3 × 10^7, 3.5 × 10^8, 3 × 10^7, 5 × 10^7, 1 × 10^8, 1 × 10^7, 5 × 10^7, 1.25 × 10^8, 1 × 10^7, 5 × 10^7, 1 × 10^8, 3 × 10^8, 5 × 10^8, 1 × 10^6, 1 × 10^7, 1 × 10^7, 5 × 10^7, 1 × 10^8
2.30 ----3.99 ----4.09 ----3.60 --27.89 3.65 --7.63 ----7.69 ------32.39 ------25.18 ---