Computation of Isotopic Peak Center-Mass Distribution by Fourier Transform

Jorge Fernandez-de-Cossio Diaz† and Jorge Fernandez-de-Cossio*,‡

† Faculty of Physics, Havana University, Havana, Cuba
‡ Bioinformatics Department, Center for Genetic Engineering and Biotechnology, La Habana, Cuba

Article, pubs.acs.org/ac



ABSTRACT: We derive a new efficient algorithm for the computation of the isotopic peak center-mass distribution of a molecule. With the use of Fourier transform techniques, the algorithm accurately computes the total abundance and average mass of all the isotopic species with the same number of nucleons. We evaluate the performance of the method with 10 benchmark proteins and other molecules; results are compared with BRAIN, a recently reported polynomial method. The new algorithm is comparable to BRAIN in accuracy and superior in terms of speed and memory, particularly for large molecules. An implementation of the algorithm is available for download.

Received: May 12, 2012. Accepted: July 23, 2012. Published: August 8, 2012.
© 2012 American Chemical Society. dx.doi.org/10.1021/ac301296a | Anal. Chem. 2012, 84, 7052−7056

Dramatic technological advances in mass spectrometry have been impacting protein chemistry laboratories in the past few years. Nowadays, widely available instruments routinely deliver accurate mass measurements at very high resolution and sensitivity, serving current high-throughput technologies such as proteomics, metabolomics, and others (refs 1, 2). However, the valuable information carried by these accurate data is still poorly exploited by current identification methods (ref 3). The isotopic distribution, the basic structure of an individual signal in a mass spectrum, is typically resolved into its individual isotopic peaks, each of which tightly agglomerates the isotopic species with the same nominal mass. At the high mass accuracy and resolution delivered by state-of-the-art instruments, the center masses of the individual isotopic peaks should provide higher discriminant power for the identification of compounds than single mass statistics (for example, the monoisotopic, average, or most abundant mass). The location, spread, and distances between center masses differ between molecules with different elemental compositions (refs 4, 5).

Two approaches dominate the computation of isotopic distributions: polynomial methods (ref 6) and Fourier transform methods (ref 7). See ref 8 for a recent overview. Accurate center-mass distributions have mainly been approached by polynomial methods (refs 9−12). In the present manuscript we derive for the first time a Fourier transform method for the accurate and efficient computation of center-mass distributions. The performance of the algorithm in terms of speed, memory, and accuracy is compared with BRAIN, an efficient polynomial method recently reported (ref 9). Comparisons of BRAIN with other software can be found in the same reference.

METHODS

Rationale. Consider a molecule G = XY...Z, where X, Y, ..., Z stand for distinct generic elements. Element multiplicity will be considered later. Let $a(n)$ and $c(n)$ be the total abundance and average mass of all isotopic configurations of G with equal excess nucleon number n (by number of excess nucleons we mean the number of nucleons a molecule or atom has over its lightest isotopic variant; from now on, “excess nucleon number” will be abbreviated to “nucleon number”). By center-mass distribution of G we mean the list of ordered pairs $(c(n), a(n))$ for the relevant values of n (the prominent peaks). Let $a_i(n)$ and $c_i(n)$ be the abundance and mass of the isotope of i with n nucleons, where i is one of X, Y, ..., Z. If i has no stable isotope with n nucleons, then $a_i(n) = 0$. We can express $a(n)$ and $c(n)$ in terms of $a_i(n)$ and $c_i(n)$. Indeed, the isotopic species of G with n nucleons are those in which $n_X + n_Y + \dots + n_Z = n$, where $n_i$ is the nucleon number of the isotope present for i. The abundance of each of those species is $a_X(n_X)\,a_Y(n_Y) \cdots a_Z(n_Z)$. It follows that

$$ a(n) = \sum_{n_X + n_Y + \dots + n_Z = n} a_X(n_X)\,a_Y(n_Y) \cdots a_Z(n_Z) = (a_X * a_Y * \dots * a_Z)(n) \tag{1} $$

where $(f * g)(n)$ denotes the discrete convolution of f and g (ref 7). The center mass is calculated by adding the masses of the isotopic configurations of G with nucleon number n, weighted by their abundances (ref 13). The mass of a particular isotopic configuration of G is $c_X(n_X) + c_Y(n_Y) + \dots + c_Z(n_Z)$, and its abundance is $a_X(n_X)\,a_Y(n_Y) \cdots a_Z(n_Z)$; summing over all values of $n_X, n_Y, \dots, n_Z$ with $n_X + n_Y + \dots + n_Z = n$ and multiplying by the normalization factor $1/a(n)$ gives

$$ c(n) = \frac{1}{a(n)} \sum_{n_X + n_Y + \dots + n_Z = n} a_X(n_X)\,a_Y(n_Y) \cdots a_Z(n_Z)\,\bigl(c_X(n_X) + c_Y(n_Y) + \dots + c_Z(n_Z)\bigr) $$

Defining $b(n) = a(n)c(n)$ and $b_i(n) = a_i(n)c_i(n)$ yields

$$ b(n) = (b_X * a_Y * \dots * a_Z)(n) + (a_X * b_Y * \dots * a_Z)(n) + \dots + (a_X * a_Y * \dots * b_Z)(n) \tag{2} $$

Taking the discrete Fourier transform of eqs 1 and 2 gives

$$ \hat a(\nu) = \hat a_X(\nu)\,\hat a_Y(\nu) \cdots \hat a_Z(\nu) $$

$$ \hat b(\nu) = \hat b_X(\nu)\,\hat a_Y(\nu) \cdots \hat a_Z(\nu) + \hat a_X(\nu)\,\hat b_Y(\nu) \cdots \hat a_Z(\nu) + \dots + \hat a_X(\nu)\,\hat a_Y(\nu) \cdots \hat b_Z(\nu) \tag{3} $$

where $\hat f(\nu)$ denotes the discrete Fourier transform of $f(n)$. We recover $a(n)$, $b(n)$ by taking the inverse discrete Fourier transforms of $\hat a(\nu)$, $\hat b(\nu)$. Finally, the center masses are obtained from

$$ c(n) = b(n)/a(n) \tag{4} $$

This is the mathematical basis of our algorithm. Suppose now that X, Y, ..., Z are distinct elements having multiplicities p, q, ..., r in G:

$$ G = X_p Y_q \cdots Z_r $$

It is then convenient to rewrite eqs 3 in the form

$$ \hat a(\nu) = [\hat a_X(\nu)]^p\,[\hat a_Y(\nu)]^q \cdots [\hat a_Z(\nu)]^r $$

$$ \hat b(\nu) = \left( p\,\frac{\hat b_X(\nu)}{\hat a_X(\nu)} + q\,\frac{\hat b_Y(\nu)}{\hat a_Y(\nu)} + \dots + r\,\frac{\hat b_Z(\nu)}{\hat a_Z(\nu)} \right) \hat a(\nu) \tag{5} $$

It is clear from eq 5 that the numbers $\hat a_i(\nu)$, $\hat b_i(\nu)$ need to be computed only once per distinct chemical element in G. To implement the Fourier transforms it is necessary to determine the range of values of n to include in the calculation. This is the range spanned by the prominent peaks. Following ref 14, we use the standard deviation σ of the mass distribution and calculate

$$ N = \left\lceil \alpha \sqrt{1 + \sigma^2} \right\rceil \tag{6} $$

where the half-brackets ⌈x⌉ denote the smallest integer not less than x, and α is a constant factor, usually taken as α = 10. We use here α = 16. A range of N points is then taken, centered at the average nucleon number of G.

The new algorithm can now be summarized as follows:
(1) Determine the range N of points to include in the calculation by eq 6.
(2) Compute $\hat a_i(\nu)$ and $\hat b_i(\nu)$ for each chemical element in G.
(3) Compute $\hat a(\nu)$ and $\hat b(\nu)$ by eq 5.
(4) Apply the inverse fast Fourier transform to obtain $a(n)$, $b(n)$.
(5) Divide $b(n)$ by $a(n)$ to obtain the average masses $c(n)$.

Consider, for example, the carbon dioxide molecule, G = CO2. The abundances and masses of $^{12}$C, $^{13}$C, $^{16}$O, $^{17}$O, and $^{18}$O are denoted by $a_C(0)$, $a_C(1)$, $a_O(0)$, $a_O(1)$, $a_O(2)$ and $c_C(0)$, $c_C(1)$, $c_O(0)$, $c_O(1)$, $c_O(2)$, respectively. Using the range given by eq 6 with $\sigma^2 = \sigma_C^2 + 2\sigma_O^2$, we compute the Fourier transforms $\hat a_C(\nu)$, $\hat b_C(\nu)$, $\hat a_O(\nu)$, $\hat b_O(\nu)$. Then we have

$$ \hat a_{CO_2}(\nu) = \hat a_C(\nu)\,[\hat a_O(\nu)]^2, \qquad \hat b_{CO_2}(\nu) = \left( \frac{\hat b_C(\nu)}{\hat a_C(\nu)} + 2\,\frac{\hat b_O(\nu)}{\hat a_O(\nu)} \right) \hat a_{CO_2}(\nu) $$

From the inverse Fourier transforms $a_{CO_2}(n)$, $b_{CO_2}(n)$ the center masses are given by $c_{CO_2}(n) = b_{CO_2}(n)/a_{CO_2}(n)$.

Software. A program implementing the algorithm proposed here was coded in C#, Microsoft .NET Framework 4. The BRAIN algorithm (ref 9) is implemented as a Bioconductor package coded in R. Version 1.0.0 was downloaded from http://www.bioconductor.org/packages/release/bioc/html/BRAIN.html. The number of consecutive peaks, starting from the monoisotopic mass, required by the BRAIN algorithm was computed by the rule recommended in ref 9, with the function “calculateNrPeaks” provided in the BRAIN package. We set the BRAIN output precision to 14 digits. The R scripts to reproduce the BRAIN computations in this manuscript, as well as an implementation of our algorithm, can be downloaded from http://bioinformatica.cigb.edu.cu/isotopica/centermass.html. Only peaks within ±8σ of the theoretical average mass were considered in the computation of the average mass and standard deviation from the output of both algorithms.

Table 1. Ten-Polypeptide Set

| no. | common name | molecular formula | monoisotopic mass | average mass |
|---|---|---|---|---|
| 1 | angiotensin II | C50H71N13O12 | 1045.534515 | 1046.181107 |
| 2 | bovine insulin | C254H377N65O75S6 | 5729.600867 | 5733.510759 |
| 3 | human insulin | C520H817N139O147S8 | 11616.84935 | 11624.44875 |
| 4 | human myoglobin | C744H1224N210O222S5 | 16812.95478 | 16823.32135 |
| 5 | human intrinsic factor | C2023H3208N524O619S20 | 45387.00703 | 45415.67937 |
| 6 | bovine serum albumin | C2934H4615N781O897S39 | 66389.86247 | 66432.45556 |
| 7 | human Na/K ATPase renal isoform, subunit | C5047H8014N1338O1495S8 | 112823.8795 | 112895.1259 |
| 8 | human ATP binding cassette protein | C8574H13378N2092O2392S77 | 186386.7993 | 186506.0526 |
| 9 | human intrinsic factor−hydroxocobalamin receptor | C17600H26474N4752O5486S197 | 398470.367 | 398722.9725 |
| 10 | human dynein | C23832H37816N6528O7031S170 | 533403.4751 | 533735.2147 |

Data. The masses and abundances of the stable isotopes of carbon, hydrogen, nitrogen, oxygen, and sulfur were taken from the IUPAC 1997 standard (ref 15). Three probe data sets are used to evaluate and compare the algorithm proposed here; they are named the 10-polypeptide set, the sulfur set, and the averagine set. The 10-polypeptide set is a set of 10 biomolecules used as benchmarks for isotopic computations (refs 9, 10; Table 1). The sulfur set contains compounds with generic formula $\mathrm{S}_{100 \times n}$, where n = 1, ..., 50. Sulfur has four abundant stable isotopes, which makes it peculiar and suitable for tests (refs 4, 5, 16). The averagine set contains molecules with generic formula averagine$_n$, where averagine is a model molecule with the elemental composition of an average protein, C4.9384H7.7583N1.3577O1.4773S0.0417 (ref 17). Here n = 10, 50, 250,

500, 750, 1000, 2000, 3000, 4000, 5000. A noninteger multiplicity of an atomic element is rounded to the nearest integer, for example, S4.17 → S4.

Figure 1. Time required for 100 repetitions of the computation of the isotopic center-mass distribution of the compounds in the three data sets described in the Methods section. Computations with the Fourier algorithm are shown in the upper panel; the plots are fitted with a square root curve. Computations with the BRAIN algorithm are shown in the lower panel; the plots are fitted with a quadratic curve.

DISCUSSION

Heterodyning. Contrary to previous Fourier transform formulations of isotopic distribution computations (ref 7), the present algorithm does not, in principle, require heterodyning. Any translation in the nucleon domain keeps the numerator paired to the corresponding denominator in eq 4, producing the same set of mass-abundance pairs. The sole advantage of heterodyning here is to produce, in only O(N) operations, this same set ordered by mass.

Performance. Comparing algorithms through the performance of particular implementations is in general misleading, since the codes run on different platforms and software and have unequal degrees of optimization. So we first give a comparison in terms of the theoretical limits of performance of the algorithms, and only then illustrate the performance of actual implementations. The algorithm proposed in this article requires the computation of the abundances and of the masses times abundances ($\hat a_k$ and $\hat b_k$) in the frequency domain, once for each distinct element occurring in the molecular formula. These computations are fast because most entries in $a_i(n)$, $b_i(n)$ are zero except for a few small values of n, since the number of isotopes of each element is very small compared with the range N. The major burden of the calculations lies in the computation of the inverse fast Fourier transform of $\hat a(\nu)$ and $\hat b(\nu)$. Hence, the order of the algorithm is dominated by the O(N log N) operations performed in this step. Since N is approximately proportional to the standard deviation of the isotopic distribution by eq 6, it increases roughly as the square root of the mass m. Hence, the performance is roughly of order $\sqrt{m}\,\log m$.

A polynomial method named BRAIN, which computes the center-mass distribution based on Viète’s formulas and Newton’s identities, has been recently published (ref 9). Briefly, let $g(\xi)$ be a polynomial such that $g(0) \neq 0$ with (in general complex) roots $\xi_1, \xi_2, \dots, \xi_l$, which need not all be distinct. Then

$$ g(\xi) = \alpha_l (\xi - \xi_1)(\xi - \xi_2) \cdots (\xi - \xi_l) = \alpha_0 + \alpha_1 \xi + \dots + \alpha_l \xi^l $$

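Given that one knows the roots $\xi_1, \xi_2, \dots, \xi_l$ and the coefficient $\alpha_0$, the coefficients $\alpha_n$ with $1 \le n \le l$ can be computed recursively through the formula

$$ \alpha_n = -\frac{1}{n} \sum_{k=1}^{n} \alpha_{n-k} \left( \xi_1^{-k} + \xi_2^{-k} + \dots + \xi_l^{-k} \right), \qquad 1 \le n \le l \tag{7} $$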
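As a quick numerical check of formula 7, the following Python sketch rebuilds the coefficients of a small polynomial from its roots and constant term. It is only an illustration and not the BRAIN implementation: the function name `coefficients_from_roots` is hypothetical, and exact rational arithmetic is an assumption made so that the result can be compared with the hand-expanded polynomial (ξ − 2)(ξ − 3) = 6 − 5ξ + ξ².

```python
from fractions import Fraction


def coefficients_from_roots(roots, alpha0):
    """Return [alpha_0, ..., alpha_l] from the roots and alpha_0 using formula 7."""
    l = len(roots)
    # power_sums[k] = xi_1^(-k) + ... + xi_l^(-k); index 0 is unused.
    power_sums = [None] + [
        sum(Fraction(1) / Fraction(r) ** k for r in roots) for k in range(1, l + 1)
    ]
    alpha = [Fraction(alpha0)]
    for n in range(1, l + 1):
        s = sum(alpha[n - k] * power_sums[k] for k in range(1, n + 1))
        alpha.append(-s / n)
    return alpha


# (xi - 2)(xi - 3) has roots 2 and 3 and constant term 6.
coeffs = coefficients_from_roots([2, 3], 6)
print(coeffs)
```

Each new coefficient reuses all previously computed ones, which is why the cost of the recursion grows quadratically with the number of coefficients.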

Formula 7 follows from substituting Newton’s identities into Viète’s formulas. Consider the molecule $G = X_p Y_q \cdots Z_r$. One can obtain the peak abundances from the expansion of the polynomial in a symbolic variable ξ:

$$ g(\xi; p, q, \dots, r) = [g_X(\xi)]^p\,[g_Y(\xi)]^q \cdots [g_Z(\xi)]^r $$

where

$$ g_i(\xi) = a_i(0) + a_i(1)\,\xi + a_i(2)\,\xi^2 + \dots $$

Expanding:

$$ g(\xi; p, q, \dots, r) = \sum_n a(n)\,\xi^n \tag{8} $$
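Equation 8 can be illustrated by expanding the polynomial for CO2 directly. The Python sketch below multiplies the element polynomials as coefficient lists, so that the coefficient of ξ^n is the abundance a(n). The helper names `poly_mul` and `poly_pow` are hypothetical, and the isotope abundances are approximate values entered for illustration, not the IUPAC 1997 table used in the paper.

```python
def poly_mul(f, g):
    """Multiply two polynomials given as coefficient lists (index = power of xi)."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def poly_pow(f, p):
    """Raise polynomial f to the non-negative integer power p by repeated multiplication."""
    out = [1.0]
    for _ in range(p):
        out = poly_mul(out, f)
    return out


# Assumed, approximate isotope abundances (illustrative values only).
g_C = [0.9893, 0.0107]             # 12C, 13C
g_O = [0.99757, 0.00038, 0.00205]  # 16O, 17O, 18O

# Eq 8 for CO2: g(xi) = g_C(xi)^1 * g_O(xi)^2; coefficient n is a(n).
a_CO2 = poly_mul(poly_pow(g_C, 1), poly_pow(g_O, 2))

for n, abundance in enumerate(a_CO2):
    print(n, abundance)
```

Multiplying coefficient lists in this way is the same operation as the discrete convolution in eq 1, so the polynomial and Fourier formulations describe the same abundances.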

Applying the recursive formula 7 gives

$$ a(n) = -\frac{1}{n} \sum_{k=1}^{n} a(n-k)\,\psi_k \tag{9} $$

where $\psi_k$ is a linear combination of the −kth powers of the roots of the polynomials $g_X(\xi)$, $g_Y(\xi)$, ..., $g_Z(\xi)$, which need to be computed only once per distinct element. BRAIN computes the peak abundances through this formula. To calculate the center masses, a recursion like formula 9 is carried out for each distinct element in G; see ref 9 for the details.

Formula 9 embodies the computational burden of the BRAIN algorithm. The number of multiplications required by BRAIN to compute a(k) is k plus the number of multiplications required to compute a(k − 1), plus the number required to compute a(k − 2), and so on. Summing gives a total of k(k + 1)/2 multiplications to compute all the coefficients up to a(k). Thus, BRAIN is of quadratic order in the number of peaks that are calculated. Since BRAIN starts calculations at the monoisotopic peak, the number of peaks required increases roughly linearly with the mass. Therefore, BRAIN is of quadratic order in the mass.

The isotopic distributions of the compounds in the three data sets described in the Methods section were computed by both the Fourier and the BRAIN algorithms. The average times required for 100 repetitions of these computations are plotted in Figure 1. Figure 1 validates the square root and quadratic dependency on mass predicted, respectively, for the Fourier and BRAIN algorithms. As mentioned above, the actual computation speed depends on the platform used. Both algorithms were run on the same computer, but BRAIN was programmed in R, an interpreted language, whereas our algorithm was compiled in C#.

Table 2. Difference between the Theoretical and Calculated Average Mass and Standard Deviation of the Isotopic Distributions of Compounds in the Three Data Sets, as Computed by the Fourier Transform Algorithm and the BRAIN Algorithm

| set | molecular formula | average mass | SD (σ) | Fourier Δav | Fourier ΔSD | BRAIN Δav | BRAIN ΔSD |
|---|---|---|---|---|---|---|---|
| 10-polypeptide | C50H71N13O12 | 1046.181107 | 0.831659 | −1.44 × 10^−5 | −6.549 × 10^−5 | −4.434 × 10^−3 | −1.186 × 10^−2 |
| 10-polypeptide | C254H377N65O75S6 | 5733.510759 | 2.160874 | −1.87 × 10^−7 | −7.360 × 10^−6 | −3.178 × 10^−1 | −3.654 × 10^−1 |
| 10-polypeptide | C520H817N139O147S8 | 11624.448751 | 2.964740 | 5.46 × 10^−12 | −8.198 × 10^−6 | −8.200 × 10^−2 | −1.217 × 10^−1 |
| 10-polypeptide | C744H1224N210O222S5 | 16823.321352 | 3.407244 | 0 | −8.212 × 10^−6 | −4.581 × 10^−2 | −7.354 × 10^−2 |
| 10-polypeptide | C2023H3208N524O619S20 | 45415.679370 | 5.711423 | 1.46 × 10^−11 | −6.704 × 10^−5 | −1.378 × 10^−4 | −3.697 × 10^−4 |
| 10-polypeptide | C2934H4615N781O897S39 | 66432.455560 | 7.007350 | 0 | −9.950 × 10^−5 | −1.744 × 10^−6 | −2.329 × 10^−5 |
| 10-polypeptide | C5047H8014N1338O1495S8 | 111612.482544 | 8.592460 | 0 | −1.277 × 10^−4 | 2.646 × 10^−8 | −1.326 × 10^−5 |
| 10-polypeptide | C8574H13378N2092O2392S77 | 186506.052593 | 11.594232 | 0 | −2.055 × 10^−4 | 4.013 × 10^−8 | −2.549 × 10^−5 |
| 10-polypeptide | C17600H26474N4752O5486S197 | 398722.972482 | 17.005109 | 1.16 × 10^−10 | −3.458 × 10^−4 | 1.067 × 10^−7 | −4.159 × 10^−5 |
| 10-polypeptide | C23832H37816N6528O7031S170 | 533735.214649 | 19.287229 | 1.16 × 10^−10 | −3.867 × 10^−4 | 1.250 × 10^−7 | −4.331 × 10^−5 |
| averagine | C49H78N14O15 | 1103.230915 | 0.843358 | −2.05 × 10^−5 | −9.122 × 10^−5 | −5.214 × 10^−3 | −1.378 × 10^−2 |
| averagine | C247H388N68O74S2 | 5558.279442 | 1.976640 | −2.53 × 10^−7 | −5.413 × 10^−6 | −3.396 × 10^−1 | −3.786 × 10^−1 |
| averagine | C1235H1940N339O369S10 | 27761.391061 | 4.418516 | 3.64 × 10^−12 | −2.317 × 10^−5 | −5.439 × 10^−3 | −1.101 × 10^−2 |
| averagine | C2469H3879N679O739S21 | 55571.835678 | 6.262656 | 0 | −6.880 × 10^−5 | −1.937 × 10^−5 | −6.888 × 10^−5 |
| averagine | C3704H5819N1018O1108S31 | 83333.226739 | 7.664472 | 0 | −1.019 × 10^−4 | −3.351 × 10^−8 | −1.726 × 10^−5 |
| averagine | C4938H7758N1358O1477S42 | 111127.671952 | 8.856247 | −1.46 × 10^−11 | −1.274 × 10^−4 | 2.596 × 10^−8 | −1.964 × 10^−5 |
| averagine | C9877H15517N2715O2955S83 | 222238.289157 | 12.518351 | −2.91 × 10^−11 | −2.377 × 10^−4 | 5.451 × 10^−8 | −2.838 × 10^−5 |
| averagine | C14815H23275N4073O4432S125 | 333365.961109 | 15.334348 | 0 | −3.136 × 10^−4 | 8.062 × 10^−8 | −3.600 × 10^−5 |
| averagine | C19754H31033N5431O5909S167 | 444505.643796 | 17.708360 | −5.82 × 10^−11 | −3.568 × 10^−4 | 1.045 × 10^−7 | −4.224 × 10^−5 |
| averagine | C24692H38792N6788O7386S208 | 555588.250861 | 19.795018 | −1.16 × 10^−10 | −3.989 × 10^−4 | 1.113 × 10^−7 | −4.585 × 10^−5 |
| sulfur | S500 | 16033.042347 | 9.295790 | 3.64 × 10^−12 | −9.929 × 10^−5 | −1.612 × 10^−4 | −4.178 × 10^−4 |
| sulfur | S1000 | 32066.084695 | 13.146233 | 3.64 × 10^−12 | −2.229 × 10^−4 | −1.215 × 10^−8 | −7.282 × 10^−7 |
| sulfur | S1500 | 48099.127042 | 16.100781 | 7.28 × 10^−12 | −2.812 × 10^−4 | 8.004 × 10^−11 | −8.326 × 10^−7 |
| sulfur | S2000 | 64132.169390 | 18.591580 | −7.28 × 10^−12 | −3.402 × 10^−4 | −7.421 × 10^−10 | −9.785 × 10^−7 |
| sulfur | S2500 | 80165.211737 | 20.786019 | 0 | −3.925 × 10^−4 | 1.382 × 10^−9 | −1.128 × 10^−6 |
| sulfur | S3000 | 96198.254085 | 22.769943 | −2.91 × 10^−11 | −4.824 × 10^−4 | 4.220 × 10^−10 | −1.052 × 10^−6 |
| sulfur | S3500 | 112231.296432 | 24.594349 | −1.46 × 10^−11 | −4.902 × 10^−4 | 2.910 × 10^−10 | −1.362 × 10^−6 |
| sulfur | S4000 | 128264.338780 | 26.292465 | 2.91 × 10^−11 | −5.308 × 10^−4 | −3.041 × 10^−9 | −1.493 × 10^−6 |
| sulfur | S4500 | 144297.381127 | 27.887370 | 2.91 × 10^−11 | −5.672 × 10^−4 | −3.551 × 10^−9 | −1.545 × 10^−6 |
| sulfur | S5000 | 160330.423475 | 29.395870 | 0 | −6.079 × 10^−4 | −5.646 × 10^−9 | −1.563 × 10^−6 |

Accuracy. To compute the average masses of the individual peaks, c(n), a division by the abundances a(n) is required according to eq 4. Division by a(n) = 0 is never needed, since in that case c(n) is irrelevant. Divisions by positive but vanishingly small abundances are prone to produce inaccurate average masses. However, those vanishing peaks are indistinguishably buried within the noise. The interesting peaks are usually the more abundant ones, which are accurately computed by this algorithm. Most of the peaks of the center-mass distribution of moderately large molecules are vanishingly small. Computations can then be targeted to the more abundant peaks without significant accuracy loss. It is a bonus of the Fourier transform method that the range spanning only the prominent peaks can be established from the beginning. In the BRAIN algorithm, the peaks are necessarily computed in order from the lightest to the heaviest. Computations start from the monoisotopic mass and can stop when the abundances become too small (ref 9), after the prominent peaks have been obtained. For large molecules, the monoisotopic and other lighter peaks become vanishingly small, but the BRAIN algorithm has to include them, thus requiring a larger range than the Fourier transform algorithm to compute the center-mass distributions.
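To make the division step of eq 4 concrete, the following Python sketch runs the five steps of the Fourier algorithm on CO2 and skips peaks whose abundance falls below a cutoff. It is a sketch under stated assumptions, not the authors' C# program: a naive O(N²) discrete Fourier transform stands in for a fast Fourier transform, N is simply chosen large enough to hold every peak instead of being derived from eq 6, and the isotope masses and abundances are approximate values entered for illustration. The names `dft`, `center_mass_distribution`, and `min_abundance` are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for v in range(n):
        s = sum(x[k] * cmath.exp(sign * 2j * cmath.pi * v * k / n) for k in range(n))
        out.append(s / n if inverse else s)
    return out


def center_mass_distribution(elements, n_points, min_abundance=1e-12):
    """elements: (abundances, masses, multiplicity) per distinct element.

    Returns (n, c(n), a(n)) for every peak with a(n) above min_abundance.
    """
    a_hat = [1 + 0j] * n_points
    ratio_sum = [0j] * n_points
    for abund, mass, mult in elements:
        pad = [0.0] * (n_points - len(abund))
        a_i_hat = dft(list(abund) + pad)
        b_i_hat = dft([ab * m for ab, m in zip(abund, mass)] + pad)
        for v in range(n_points):
            a_hat[v] *= a_i_hat[v] ** mult                  # eq 5, first line
            ratio_sum[v] += mult * b_i_hat[v] / a_i_hat[v]  # eq 5, bracket
    b_hat = [r * a for r, a in zip(ratio_sum, a_hat)]
    a = [z.real for z in dft(a_hat, inverse=True)]
    b = [z.real for z in dft(b_hat, inverse=True)]
    # Eq 4, guarded: vanishing peaks are skipped rather than divided.
    return [(n, b[n] / a[n], a[n]) for n in range(n_points) if a[n] > min_abundance]


# Assumed, approximate isotope data (illustrative values only).
carbon = ([0.9893, 0.0107], [12.0, 13.0033548], 1)
oxygen = ([0.99757, 0.00038, 0.00205], [15.9949146, 16.9991317, 17.9991610], 2)

peaks = center_mass_distribution([carbon, oxygen], n_points=8)
for n, c, a in peaks:
    print(n, c, a)
```

The cutoff plays the role described in the Accuracy paragraph: peaks with negligible abundance carry no usable center mass, so leaving them out costs nothing in practice.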



In Table 2, the accuracy of the calculations is evaluated in terms of the theoretical average and standard deviation of the isotopic distributions, for both the Fourier transform algorithm and the BRAIN algorithm. The accuracy of BRAIN apparently deteriorates for the smaller molecules in the table. This is not an issue of the BRAIN polynomial method itself. It is only an issue of the rule of thumb we chose to calculate the number of peaks given as a parameter, which for small molecules falls short of the number required to span all the prominent peaks (ref 9). Accuracy was promptly recovered after increasing the span range (data not shown). However, computation time increases with the number of computed peaks. The accuracy of the new method is comparable to that of BRAIN for small and moderately large compounds. The theoretical standard deviation σ of the isotopic distribution, used in Table 2, is related to the standard deviation $\sigma_{CM}$ of the peak center-mass distribution by

$$ \sigma^2 = \sigma_{CM}^2 + \sum_n a(n)\,\sigma_n^2 > \sigma_{CM}^2 $$

where $\sigma_n$ denotes the standard deviation of the mass distribution of the isotopic configurations with n nucleons. The values of $\sigma_n^2$ can be expected to be very small, and therefore $\sigma^2 \approx \sigma_{CM}^2$. Both algorithms give values smaller than and very close to $\sigma^2$, as expected.

Memory. The algorithm presented in this paper stores in memory the numbers $\hat a_i(\nu)$, $\hat b_i(\nu)$, ν = 0, ..., N − 1, for each distinct element i in the molecule G. This gives

$$ 2N \times (\text{number of distinct elements in } G) $$

values stored in memory. All further calculations are performed in this memory space. As we noted earlier, N increases roughly as the square root of the mass; therefore, the memory required scales as the square root of the mass. BRAIN stores the numbers $\psi_k$, a(k) for each peak to compute the abundances. To compute the center masses, BRAIN does a separate recursion for each chemical element. Each recursion also requires a memory space linear in the number of peaks. Therefore, BRAIN’s overall memory requirements are linear in the number of peaks, and thus linear in the mass.

CONCLUSION

The new algorithm based on Fourier transform techniques for the computation of the center-mass distribution is faster than polynomial methods, particularly for large molecules, which are now being measured with the accuracy of state-of-the-art instruments (ref 18). Fourier transform techniques are well and long established tools with wide applications in every area of science, engineering, and industry. Though the compactness of the recursive BRAIN algorithm facilitates its implementation from scratch, we think that the methods developed here are simpler, more general, and easier to implement than polynomial or other relatively more customized methods, due to the availability of highly optimized fast Fourier transform (FFT) recipes.

AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected].

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This study was supported in part by the CIGB of Havana. The authors thank Professor Alfredo Delgado Rodriguez (CIGB) for language revision and style corrections. We are grateful to the reviewers for their comments, which resulted in an improved text.

REFERENCES

(1) Mann, M.; Kelleher, N. L. Proc. Natl. Acad. Sci. U.S.A. 2008, 105 (47), 18132−18138.
(2) Michalski, A.; Damoc, E.; Hauschild, J. P.; Lange, O.; Wieghaus, A.; Makarov, A.; Nagaraj, N.; Cox, J.; Mann, M.; Horning, S. Mol. Cell. Proteomics 2011, 10 (9), M111.
(3) Cox, J.; Mann, M. J. Am. Soc. Mass Spectrom. 2009, 20 (8), 1477−1485.
(4) Fernandez-de-Cossio, J. Anal. Chem. 2010, 82 (5), 1759−1765.
(5) Fernandez-de-Cossio, J. Anal. Chem. 2010, 82 (15), 6726−6729.
(6) Yergey, J.; Heller, D.; Hansen, G.; Cotter, R. J.; Fenselau, C. Anal. Chem. 1983, 55 (2), 353−356.
(7) Rockwood, A. L. Rapid Commun. Mass Spectrom. 1995, 9 (1), 103−105.
(8) Valkenborg, D.; Mertens, I.; Lemiere, F.; Witters, E.; Burzykowski, T. Mass Spectrom. Rev. 2012, 31, 96−109.
(9) Claesen, J.; Dittwald, P.; Burzykowski, T.; Valkenborg, D. J. Am. Soc. Mass Spectrom. 2012, 23 (4), 753−763.
(10) Olson, M. T.; Yergey, A. L. J. Am. Soc. Mass Spectrom. 2009, 20 (2), 295−302.
(11) Rockwood, A. L.; Haimi, P. J. Am. Soc. Mass Spectrom. 2006, 17 (3), 415−419.
(12) Rockwood, A. L.; Van Orman, J. R.; Dearden, D. V. J. Am. Soc. Mass Spectrom. 2004, 15 (1), 12−21.
(13) Roussis, S. G.; Proulx, R. Anal. Chem. 2003, 75 (6), 1470−1482.
(14) Rockwood, A. L.; Van Orden, S. L.; Smith, R. D. Anal. Chem. 1995, 67 (15), 2699−2704.
(15) Rosman, K. J. R.; Taylor, P. D. P. Pure Appl. Chem. 1998, 70 (1), 217−235.
(16) Valkenborg, D.; Jansen, I.; Burzykowski, T. J. Am. Soc. Mass Spectrom. 2008, 19 (5), 703−712.
(17) Senko, M. W.; Beu, S. C.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 1995, 6, 229−233.
(18) Tipton, J. D.; Tran, J. C.; Catherman, A. D.; Ahlf, D. R.; Durbin, K. R.; Lee, J. E.; Kellie, J. F.; Kelleher, N. L.; Hendrickson, C. L.; Marshall, A. G. Anal. Chem. 2012, 84 (5), 2111−2117.