Anal. Chem. 1995, 67, 3793-3798

Approaches and Limits for Accurate Mass Characterization of Large Biomolecules

Roman A. Zubarev,* Plamen A. Demirev, Per Håkansson, and Bo U. R. Sundqvist

Ion Physics Department, Institute for Radiation Sciences, Uppsala University, Box 535, 75121 Uppsala, Sweden

The use of the average mass for mass characterization of large biomolecules is examined in light of the latest achievements in mass spectrometry, and factors affecting the accuracy of both theoretical calculation and experimental determination are analyzed. It is concluded that, in practice, the accuracy of average mass measurements is limited to ±0.1 Da for molecular masses below 10 000 Da and to 10 ppm for masses above that value. Inherent properties of the isotopic distributions lead to a systematic underestimation of the average mass during the measurements. The procedure proposed earlier (Zubarev, R. A. Int. J. Mass Spectrom. Ion Processes 1991, 107, 17-27) to correct for this effect is now extended to the case of multiply charged ions and their use for mass scale calibration. A formula is derived for the relationship between mass accuracy and both the instrumental resolving power and molecular ion peak statistics. Monoisotopic mass measurements are recommended whenever possible. As a complement, other additive quantities, such as the ratio of intensities of the first isotopic peak to the monoisotopic peak, can be employed.

Recent advances in mass spectrometry of biomolecules bring to life the old issue of the “most comprehensive” mass spectrometric characteristic of molecular weight (MW) (ref 1). The following well-known quantities are usually applied to characterize MW: nominal, monoisotopic, most abundant, and average (chemical) mass (ref 1). Previously, the average mass (AM), defined as the centroid of the biomolecule's isotopic distribution, was found to be the least ambiguous for MW of large biomolecules (ref 1). Later, it was realized that, because of the natural variations of isotopic abundances of the elements, AM can only be determined, even theoretically, with limited accuracy (ref 2).
Carbon atoms alone, contributing ~50% of a polypeptide's mass and 70% of the isotopic shift (the difference between the AM and the monoisotopic mass), give ±8 ppm uncertainty in the AM of a protein (ref 3). It is not an overestimation to assume that other elements raise the total uncertainty to 10 ppm. Thus, the theoretical AM of the natural protein horse myoglobin lies within the interval 16 951.3-16 951.7 Da, although a straightforward calculation from the sequence gives a value of 16 951.49 Da (ref 4). Since the theoretical value itself is defined with ±10 ppm uncertainty, any comparison of the experimental results with theoretical calculations is only meaningful if the difference exceeds that value.

Another finding concerning the AM is that its experimental determination almost inevitably leads to a systematic error that can be as large as 1 Da (ref 5). The origin of this systematic error can be rationalized by considering the shape of the biomolecules' isotopic distribution. The latter is nonsymmetric and has a “tail” stretching toward higher masses. Because of its low abundance, this tail is generally ignored during the measurements in order to suppress interference from the noise that is always present in real mass spectra. Therefore, the mass obtained as the centroid of the isotopic distribution in such measurements is always “lighter” than the true AM. The remarkable feature of this systematic error, or shift between the theoretical and measured AM, is that it is almost independent of the MW or elemental composition of a biomolecule. Instead, it is determined by what fraction of the isotopic distribution is used for the measurements. If, for instance, one considers only the upper half of the distribution, the shift will be 0.5 ± 0.1 Da for biomolecules with MW > 700 (ref 5).

So far, no attempts have been made to rationalize this particular behavior of AM. This work proposes an explanation based on the properties of the binomial distribution. The explanation confirms that the shift is, in fact, an inherent feature of the biomolecules' isotopic distribution and that the “measure-and-correct” approach proposed in ref 5 is valid for a wide range of biomolecules. An extension of the approach to multiply charged ions and to reference peaks used for mass scale calibration is also given.

In experimentally obtained mass spectra, one observes isotopic distributions “simulated” by nature. Because of the limited number of ions accumulated in the peaks and other instrumental factors, the obtained distributions deviate from the theoretically calculated one, and so does the measured AM. Limited mass resolution of the instrument leads to a “fusion” of the isotopic peaks, which might also shift the centroid of the distribution. Of analytical interest is what instrumental resolution and accumulated statistics are necessary to achieve the desired mass accuracy. In the current work, an attempt is made to find the answer to this question using Monte Carlo simulations.

(1) Yergey, J.; Heller, D.; Hansen, G.; Cotter, R. J.; Fenselau, C. Anal. Chem. 1983, 55, 353-356.
(2) Pomerantz, S. C.; McCloskey, J. A. Org. Mass Spectrom. 1987, 22, 251-253.
(3) Beavis, R. C. Anal. Chem. 1993, 65, 496-497.
(4) Zaia, J.; Annan, R. S.; Biemann, K. Rapid Commun. Mass Spectrom. 1992, 6, 32-36.

0003-2700/95/0367-3793$9.00/0 © 1995 American Chemical Society

EXPERIMENTAL SECTION

Computational Methods. Isotopic distributions of bovine insulin were calculated on an IBM PC AT with the Borland C v. 2.0 compiler using an approach similar to that described by Yergey et al. (ref 6). Different threshold values were used to limit the total number of permutations involved, ranging from 0.001% to 1% of the intensity of the most abundant permutation. The Monte Carlo simulations were performed on a Macintosh IIci using a program written in THINK C v. 4.0 (Symantec). These simulations involved the computation of isotopic distributions of $C_N$ clusters with N = 100-1000 carbon atoms and a total of $10^2$-$10^4$ molecules (clusters) in each distribution.

Instrumentation. The in-house-built DIPLOMA time-of-flight plasma desorption mass spectrometer was used in this study; it is described in detail elsewhere (ref 7). The instrument employs a single-stage mass reflectron and provides a typical mass resolution of 5000 fwhm for organic ions and 10 000 for CsI. The total effective length of the flight path of ions is 2.3 m. The primary 72.3 MeV $^{127}\mathrm{I}^{13+}$ ions come from a tandem Van de Graaff accelerator at a typical rate of 2500 s$^{-1}$. Spectra were taken at an acceleration voltage for secondary ions of +15 kV.

Samples. The peptide thymosin was obtained from a commercial source and was used without further purification. The samples were prepared in the following way. First, a nitrocellulose (NC) layer of 400-700 Å thickness was deposited by spin coating from an acetone solution on a 1 × 1 cm² silicon slice (chip). Peptide molecules were dissolved in water and were dried on top of the backing. Finally, the samples were washed with water in order to remove salts and low-mass contaminants (ref 8).

(5) Zubarev, R. A. Int. J. Mass Spectrom. Ion Processes 1991, 107, 17-27.
(6) Yergey, J. A. Int. J. Mass Spectrom. Ion Phys. 1983, 52, 337-349.

Analytical Chemistry, Vol. 67, No. 20, October 15, 1995

RESULTS AND DISCUSSION

Peculiarities of the Theoretical Calculation of AM. The theoretical AM can be calculated in two ways: (i) by multiplying the coefficients in the elemental composition formula by the chemical masses of the elements and (ii) by generating the isotopic distribution and calculating its centroid:

$$\mathrm{AM} = \frac{\sum_i M_i I_i}{\sum_i I_i} \qquad (1)$$

where $M_i$ and $I_i$ are the masses and corresponding intensities of the spectrum points (bins of a line histogram). In theory, the two computational approaches for AM calculations are equivalent. In practice, however, the second way may involve complications because of the huge number of permutations involved. For the relatively simple molecule of glucagon (MW = 3483.8), for instance, the calculation of $7.9 \times 10^{9}$ unique permutations is needed (ref 6). A commonly employed method to avoid excessive calculations is to generate only permutations exceeding a threshold value (ref 6). The introduction of a threshold may introduce an error: the isotopic distribution of bovine insulin, calculated with a 0.1% threshold relative to the most abundant permutation, gives an average mass of 5733.521 Da (ref 6). The result deviates from the theoretical value AM = 5733.585 Da, calculated from the elemental chemical masses, by 0.064 Da or 11 ppm. That is, the error obtained is of the order of the uncertainty from the isotopic abundance variations! We have found that, in fact, a threshold as low as 0.001% is needed to achieve an agreement of better than 0.001 Da between the average masses calculated by the two approaches.

(7) Brinkmalm, G.; Håkansson, P.; Kjellberg, J.; Demirev, P.; Sundqvist, B. U. R.; Ens, W. Int. J. Mass Spectrom. Ion Processes 1992, 114, 183-207.
(8) Jonsson, G. P.; Hedin, A. B.; Håkansson, P. L.; Sundqvist, B. U. R.; Säve, B. G. S.; Nielsen, P. F.; Roepstorff, P.; Johansson, K.-E.; Kamensky, I.; Lindberg, M. S. L. Anal. Chem. 1986, 58, 1084-1087.
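The effect of a threshold on the calculated centroid can be illustrated with a minimal sketch. It does not reproduce the insulin calculation: it assumes a pure carbon cluster, for which every isotopic peak corresponds to a single isotopic composition, so a threshold on compositions is the same as a threshold on peaks. The isotope masses, the abundance `P13`, and all function names are assumptions made for the illustration.

```python
M12, M13 = 12.0, 13.0033548  # assumed isotope masses of 12C and 13C, Da
P13 = 0.011                  # assumed relative abundance of 13C


def carbon_cluster_peaks(n_atoms, p=P13):
    """Return [(mass, intensity)] for all isotopic peaks of a C_N cluster."""
    intensity = (1.0 - p) ** n_atoms          # monoisotopic (all-12C) peak
    peaks = [(n_atoms * M12, intensity)]
    for n in range(n_atoms):
        # Binomial recurrence: P(n+1) = P(n) * (N - n)/(n + 1) * p/(1 - p)
        intensity *= (n_atoms - n) / (n + 1) * p / (1.0 - p)
        peaks.append(((n_atoms - n - 1) * M12 + (n + 1) * M13, intensity))
    return peaks


def centroid(peaks):
    """Intensity-weighted mean mass of a list of (mass, intensity) peaks."""
    total = sum(i for _, i in peaks)
    return sum(m * i for m, i in peaks) / total


def apply_threshold(peaks, threshold):
    """Keep only peaks at or above `threshold` times the most intense peak."""
    cutoff = threshold * max(i for _, i in peaks)
    return [(m, i) for m, i in peaks if i >= cutoff]


N = 500
peaks = carbon_cluster_peaks(N)
am_chemical = N * ((1.0 - P13) * M12 + P13 * M13)   # route (i): chemical mass
am_full = centroid(peaks)                           # route (ii): no threshold
for threshold in (1e-2, 1e-3, 1e-5):
    am_cut = centroid(apply_threshold(peaks, threshold))
    print(f"threshold {threshold:g}: centroid error {am_cut - am_chemical:+.5f} Da")
```

In this sketch the untruncated centroid agrees with the chemical-mass route, while each threshold removes more of the high-mass tail than of the low-mass side and so pulls the centroid down; the error shrinks as the threshold is lowered.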


Figure 1. Centroid determination of the bovine insulin isotopic distribution (horizontal axis: mass, Da; range shown 5726-5742 Da): (a) top-fitting method, a cutoff at a level h and averaging only the upper part of the distribution, i.e., background subtraction; (b) integration down to the base line of all prominent isotopic peaks exceeding h (no background subtraction).
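The two centroid definitions in Figure 1 can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' procedure: it uses a stick spectrum of fully resolved peaks of a carbon cluster instead of a measured insulin profile, and the abundance `P13`, the peak spacing `SPACING`, and the function names are hypothetical.

```python
P13 = 0.011          # assumed relative abundance of 13C
SPACING = 1.0033548  # assumed 13C - 12C mass difference, Da


def binomial_peaks(n_atoms, p=P13):
    """Relative intensities P(0..N) of the isotopic peaks of a C_N cluster."""
    probs = [(1.0 - p) ** n_atoms]
    for n in range(n_atoms):
        probs.append(probs[-1] * (n_atoms - n) / (n + 1) * p / (1.0 - p))
    return probs


def top_fitting_centroid(probs, h):
    """Method (a): subtract the level h*max and average only what lies above it."""
    level = h * max(probs)
    excess = [max(p - level, 0.0) for p in probs]
    return sum(n * e for n, e in enumerate(excess)) / sum(excess)


def down_to_baseline_centroid(probs, h):
    """Method (b): use the full intensity of every peak that exceeds h*max."""
    level = h * max(probs)
    kept = [p if p > level else 0.0 for p in probs]
    return sum(n * k for n, k in enumerate(kept)) / sum(kept)


probs = binomial_peaks(500)
true_centroid = 500 * P13   # mean of the binomial, in units of peak index
shift_a = (top_fitting_centroid(probs, 0.5) - true_centroid) * SPACING
shift_b = (down_to_baseline_centroid(probs, 0.5) - true_centroid) * SPACING
print(f"top-fitting shift:      {shift_a:+.2f} Da")
print(f"down-to-baseline shift: {shift_b:+.2f} Da")
```

For this cluster of about 6000 Da and h = 50%, both shifts come out negative, and the top-fitting shift is roughly −0.4 Da. Note that method (b) changes discontinuously whenever a peak crosses the level h, whereas method (a) varies smoothly with h.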

Another constraint may arise from the number m of significant isotopic peaks in the obtained distribution. Calculations for bovine insulin show that, in order to obtain better than ±0.1 Da accuracy, one needs to measure both masses and intensities of at least m = 11 isotopic peaks, the smallest of which is just 3% of the maximum intensity (provided that both masses and intensities are measured exactly). It should be noted that m increases rapidly with MW: for MW = 12 000, m = 20 if ±0.1 Da accuracy is desired. The above results suggest that proper care should be taken when using computer programs (including commercial ones) for generating isotopic distributions in order to avoid miscalculations. The best strategy is to calculate the centroid of the obtained distribution and compare it with the AM value obtained from the chemical masses of the elements.

Experimental Measurements of AM. Two Approaches to Centroid Measurements. In order to suppress interference from the noise in experimentally obtained spectra, one has to select the “most prominent” peaks by setting a suitable threshold. In Figure 1, two possible approaches for centroid determination are shown: (a) “top-fitting” (refs 5, 9, 10), a cutoff at a certain level h and averaging only the upper part of the distribution, i.e., background subtraction; (b) integration down to the base line of all prominent isotopic peaks exceeding h, i.e., no background subtraction. The centroid obtained in both approaches is always “lighter” than the true average mass, because the high-mass tail is not included in the calculations. Both cases, therefore, lead to a negative systematic error. For a cutoff level h = 50% and top-fitting, this systematic error is almost independent of the mass and amino acid composition of polypeptides with MW > 700 and lies within 0.45 ± 0.10 Da (ref 5). Thus, an obvious approach for improving the accuracy of the average mass is to add a correction factor ($\Delta m_{50\%}$ = 0.45 Da) to the experimentally determined value. The uncertainty of ±0.1 Da in the correction factor remains, however, as a possible mass error. The “down-to-base line” integration (Figure 1b) is, in fact, a combination of two parts: the top-fitting part (as in Figure 1a) and an under-the-threshold part. The comparison of the two approaches will be given later.

(9) Feng, R.; Konishi, Y.; Bell, A. W. J. Am. Soc. Mass Spectrom. 1992, 2, 387-401.
(10) McEwen, C. N.; Larsen, B. S. Rapid Commun. Mass Spectrom. 1992, 6, 173-178.

Origin of the Systematic Shift in Measured AM Values. The source of the systematic deviation of the measured centroid from the theoretical AM is best illustrated with the example of the isotopic distribution of single-element two-isotope molecules, e.g., pure carbon clusters $C_N$ (including fullerenes). The isotopic peak abundances of such a distribution are described by the binomial distribution

$$P(n) = \frac{N!}{n!\,(N-n)!}\, p^{n} (1-p)^{N-n}$$

where P(n) is the relative intensity of the nth peak (the monoisotopic peak is denoted as the 0th peak) and p < 1 is the relative abundance of the less abundant isotope. The position $n_{\max}$ of the most abundant peak in the distribution satisfies the following relation:

$$\mu - (1 - p) \le n_{\max} \le \mu + p$$

Figure 2. Dependence of $\mu - n_{\max}$ for carbon clusters as a function of the total number N of atoms in the cluster ($\mu$, true position of the isotopic distribution centroid; $n_{\max}$, position of the maximum peak). The period of the “oscillations” is equal to 1/p, where p is the relative abundance of the less abundant isotope. (Horizontal axis: number of carbon atoms, 0-500.)

[Second plot; horizontal axis: number of carbon atoms, 0-500; caption not preserved.]
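The behavior of the difference between the centroid and the most abundant peak for carbon clusters can be checked numerically. The sketch below relies on a standard property of the binomial distribution (its mode lies between (N + 1)p − 1 and (N + 1)p); the abundance `P13` and the function names are assumptions made for the illustration.

```python
P13 = 0.011  # assumed relative abundance of 13C


def most_abundant_peak(n_atoms, p=P13):
    """Index n_max of the most intense peak of the binomial distribution."""
    ratio = p / (1.0 - p)
    prob = (1.0 - p) ** n_atoms
    best_n, best_prob = 0, prob
    for n in range(n_atoms):
        prob *= (n_atoms - n) / (n + 1) * ratio   # P(n+1) from P(n)
        if prob > best_prob:
            best_n, best_prob = n + 1, prob
    return best_n


def centroid_minus_mode(n_atoms, p=P13):
    """mu - n_max, with mu = p*N the centroid in units of peak index."""
    return n_atoms * p - most_abundant_peak(n_atoms, p)


offsets = {n: centroid_minus_mode(n) for n in range(100, 501)}
print(min(offsets.values()), max(offsets.values()))
```

Each time N grows by about 1/p (roughly 91 atoms for the assumed abundance), the most abundant peak moves up by one and the difference drops by one unit, which gives the sawtooth pattern with period 1/p.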

Here $\mu = pN$ is the mean value of the binomial distribution, i.e., its centroid. Making an assumption p [...] 1 charges, their measured centroid will be shifted by (−0.5/z + 0.5) Da. After deconvolution (multiplication of the measured centroid values by z), one obtains an error of 0.5(z − 1) Da. In the general case, when ions with z = n are used for the calibration with centroids determined above threshold h, the corresponding error of q-charged ion peak centroids will be $\Delta m_h(q - n)$.

Influence of Instrumental Parameters on AM Measurements. Monte Carlo simulations of the isotopic distributions of carbon clusters $C_N$ were performed in order to determine how resolution and statistics affect the mass accuracy. The distributions containing $N_i$ molecules were “observed” with an instrumental resolution of R, i.e., every isotopic peak was replaced by a Gaussian distribution with dispersion $\sigma_i = \mathrm{AM}/(2.35R)$. The latter expression originates from the definition of resolving power R = AM/fwhm, where fwhm = $2.35\sigma_i$ is the full width at half-maximum of a single isotopic peak. Masses of the individual isotopic peaks $M_i$ were assumed to be known exactly, while intensities (abundances) $I_i$ were allowed to deviate randomly from their theoretical values by up to $\pm 2\sqrt{I_i}$ (Poisson statistics was assumed). For every simulated distribution, the centroid was calculated at h = 0% using eq 1 and compared with the theoretical AM. Every kind of distribution was simulated $10^4$ times, and the standard deviation $\sigma_{AM}$ from the theoretical AM was calculated. Based on the results of the simulations, an empirical formula was found for $\sigma_{AM}^2$:

[Equation 2 (empirical expression for $\sigma_{AM}^2$ as a function of AM, $N_i$, and R) is not recoverable from this copy.]
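The kind of simulation described above can be sketched in a few lines. This is not the authors' program and does not reproduce eq 2: instead of perturbing peak intensities, it draws the mass of every ion from the binomial distribution and adds Gaussian peak broadening, and it compares the result with the ordinary standard error of a mean. The isotope data, the default number of trials, and the function names are assumptions.

```python
import random
import statistics

M12, M13 = 12.0, 13.0033548  # assumed isotope masses of 12C and 13C, Da
P13 = 0.011                  # assumed relative abundance of 13C


def peak_table(n_atoms, p=P13):
    """Masses and binomial intensities of the isotopic peaks of a C_N cluster."""
    probs = [(1.0 - p) ** n_atoms]
    for n in range(n_atoms):
        probs.append(probs[-1] * (n_atoms - n) / (n + 1) * p / (1.0 - p))
    masses = [(n_atoms - n) * M12 + n * M13 for n in range(n_atoms + 1)]
    return masses, probs


def simulated_sigma(n_atoms, n_ions, resolution, trials=400, seed=1):
    """RMS deviation of the simulated centroid from the theoretical AM."""
    rng = random.Random(seed)
    masses, probs = peak_table(n_atoms)
    am = sum(m * w for m, w in zip(masses, probs))
    sigma_peak = am / (2.35 * resolution)   # Gaussian width of a single peak
    centroids = []
    for _ in range(trials):
        ions = rng.choices(masses, weights=probs, k=n_ions)
        centroids.append(sum(m + rng.gauss(0.0, sigma_peak) for m in ions) / n_ions)
    return statistics.pstdev(centroids, mu=am)


def textbook_sigma(n_atoms, n_ions, resolution, p=P13):
    """Standard error of a mean: sqrt((isotopic variance + peak variance) / n_ions)."""
    am = n_atoms * ((1.0 - p) * M12 + p * M13)
    isotopic_var = n_atoms * p * (1.0 - p) * (M13 - M12) ** 2
    peak_var = (am / (2.35 * resolution)) ** 2
    return ((isotopic_var + peak_var) / n_ions) ** 0.5


print(simulated_sigma(833, 400, 2000), textbook_sigma(833, 400, 2000))
```

For a cluster of about 10 000 Da (N = 833) with 4000 ions and very high resolving power, `2 * textbook_sigma(833, 4000, 1e9)` comes out just under 0.1 Da under these assumptions, which is of the same order as the ion count quoted in the text for that accuracy.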

Equation 2 implies that, in order to obtain an accuracy of ΔM = 2σ = 0.1 Da in the average mass measurement of a compound with AM ≈ 10 000 Da, one should accumulate at least $N_i$ = 4000 ions in the molecular ion peak, provided the isotopic peaks are well resolved. Therefore, high sensitivity does not necessarily lead to high accuracy of mass determination. Even though single molecules can be trapped in the FTICR cell (ref 11) and excellent resolution can be achieved, accurate AM measurements will still require a collection of thousands of ions.

(11) Cheng, X.; Bakhtiar, R.; Van Orden, S.; Smith, R. D. Anal. Chem. 1994, 66, 2084-2087.

Another conclusion that can be drawn is that, in terms of the error in AM measurements, the requirements on the resolving power of an instrument are rather modest. It is easy to see from eq 2 that, with equal statistics accumulated in the molecular ion peak, a resolving power of $R \ge (150\cdot\mathrm{AM})^{1/2}$ provides an accuracy comparable with that obtained with much higher resolution. From that point of view, the “sufficient” resolution is R = 1200 for MW ≈ 10 000 and R = 3900 for MW up to 100 000: in both cases, the accuracy of the AM determination will not be compromised by the resolution.

Comparison of AM with Other Mass Spectrometric Characteristics. From a practical point of view, the use of additive quantities for biomolecule characterization is preferable. Nominal, monoisotopic, and average mass are additive: their values for the parts of a molecule, summed together, yield the corresponding values for the whole molecule. On the contrary, the mass of the most abundant isotopic peak, as well as any other peak besides the monoisotopic one, is not additive (see below). Another example of a nonadditive quantity is the width of the isotopic distribution. Additive quantities are linear functions of the elemental composition for all biomolecules and of the monomer composition for biopolymers and thus allow a wider range of mathematical techniques to be implemented for molecule recognition than nonadditive ones (ref 12).

Monoisotopic Mass. The monoisotopic mass carries information on the elemental composition of the molecule but not on isotopic abundances. This is, in fact, an advantage, unless deuteration or similar procedures are used. There are no basic limitations on the accuracy of monoisotopic mass measurements of a biomolecule, except that the abundance of the monoisotopic peak is very low at high MW. Generally, it is hard to establish the position of the monoisotopic peak at MW > 5000, even though the isotopic pattern might be well resolved. Erratic monoisotopic mass assignment leads to an error of at least 1 Da. The way out has been shown by the approach based on an a priori relationship between the AM and the most probable position $M_{\mathrm{mono}}$ of the monoisotopic peak (ref 13):

[Equation 3, relating $M_{\mathrm{mono}}$ to AM through the constant K, is not recoverable from this copy.]

where K = 1463 for proteins and K = 2092 for polynucleotides. Here it was assumed that the monomers (amino acids and nucleotides, respectively) are equally abundant in biopolymers. Formula 3 works satisfactorily, i.e., gives an error less than ±0.5 Da, for proteins below 10 kDa and polynucleotides below 100 kDa. Senko et al. improved the approach for peptides by constructing the model amino acid averagine, which takes into account the natural abundances of different amino acids (ref 14). The factor K, calculated through the averagine, turns out to be 1741.5. The difference in K reflects the relatively low natural abundance of the S-containing amino acids, methionine and cysteine. Senko's value gives a better estimate of the most probable monoisotopic mass of natural peptides but otherwise does not extend the mass range of validity of eq 3, because the spread of monoisotopic masses at MW = 10 000 remains ±0.5 Da.

(12) Berdnikov, A. S. Ph.D. Thesis, Institute for Analytical Instrumentation, USSR Academy of Sciences, Leningrad, 1990.
(13) Zubarev, R. A.; Bondarenko, P. V. Rapid Commun. Mass Spectrom. 1991, 5, 276-277.
(14) Senko, M. W.; Beu, S. C.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 1995, 6, 229-233.

Masses of Individual Isotopic Peaks. Since every isotopic peak except the monoisotopic one consists of many closely situated components (isobars) (ref 1), the mass of each of those peaks is, in fact, a centroid of the isobar distribution. This makes the mass of an individual isotopic peak a nonadditive quantity. The width of that distribution depends on the distance from the monoisotopic peak and usually is in the range of several millidaltons (refs 15, 16). The calculation of the exact mass of an isotopic peak must involve, therefore, a full-scale generation of the isotopic distribution; it is not enough just to calculate the position of the $^{13}$C components. Obviously, the natural spread in the isotopic abundances will influence the exact position of an isotopic peak, although it is not yet clear to what extent. For high-mass compounds, no individual isotopic peak can always be recognized as the most intense (ref 1), and therefore, the mass of the most abundant peak can hardly serve as a characteristic of MW. However, the exact masses of individual isotopic peaks can be measured with an accuracy of a few ppm (ref 17). It has been suggested therefore that the set of masses of individual peaks can be used for biomolecular mass characterization (ref 17). It is not clear yet how big an error in the molecular characterization could arise from even a 1 ppm error in the mass of an individual isotopic peak. The main difficulty here is to identify the “origin of the coordinates”, i.e., to find the position of the monoisotopic peak. Again, a mistake in assignment “costs” at least 1 Da error in MW determination.
It is therefore doubtful that the set of individual isotopic peak masses alone can be a sufficient characteristic of MW, although it might be a useful complement to other mass spectrometric characteristics, e.g., the average mass.

Other Additive Characteristics of MW. There is yet another additive quantity that can be useful for the mass spectrometric characterization of molecules, provided the isotopic peaks are resolved and the intensity of the monoisotopic peak can be measured. This is $K_{1/0}$, the ratio of intensities of the first isotopic peak, usually referred to as the $^{13}\mathrm{C}_1$ peak, to the monoisotopic one. $K_{1/0}$ is a linear function of the molecular elemental composition (except for elements without M+1 isotopes, like Cl or P) and isotopic abundances:

$$K_{1/0} = \sum_i E_i \frac{A_{1,i}}{A_{0,i}}$$

where $E_i$ are the numbers of atoms in the elemental formula and $A_{0,i}$ and $A_{1,i}$ are the corresponding abundances of the M and M+1 isotopes for each element, respectively. For average natural isotopic abundances, it follows that

$$K_{1/0} = 0.0111E_{\mathrm{C}} + 0.00015E_{\mathrm{H}} + 0.0037E_{\mathrm{N}} + 0.0004E_{\mathrm{O}} + 0.0079E_{\mathrm{S}}$$
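The numerical formula for the ratio of the first isotopic peak to the monoisotopic peak is simple to evaluate, and its additivity can be checked directly. In the sketch below the coefficients are those given in the text; the function names and the example composition (the commonly quoted formula of bovine insulin, which the paper does not list) are assumptions.

```python
# Coefficients A1/A0 taken from the numerical formula in the text
# (average natural isotopic abundances).
RATIO_M1_TO_M0 = {"C": 0.0111, "H": 0.00015, "N": 0.0037, "O": 0.0004, "S": 0.0079}


def k_1_0(composition):
    """Ratio of the first isotopic peak to the monoisotopic peak.

    `composition` maps element symbols to atom counts; elements without an
    M+1 isotope (e.g. P or Cl) contribute nothing.
    """
    return sum(RATIO_M1_TO_M0.get(el, 0.0) * count for el, count in composition.items())


def combine(a, b):
    """Elemental composition of two parts taken together (no atoms lost)."""
    return {el: a.get(el, 0) + b.get(el, 0) for el in set(a) | set(b)}


# Hypothetical example: C254 H377 N65 O75 S6, assumed here for bovine insulin.
insulin = {"C": 254, "H": 377, "N": 65, "O": 75, "S": 6}
print(f"K_1/0 = {k_1_0(insulin):.3f}")
```

Because the expression is linear in the atom counts, `k_1_0(combine(a, b))` equals `k_1_0(a) + k_1_0(b)` for any two compositions `a` and `b`, which is the additivity property the text refers to.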

Theoretically, $K_{1/0}$ contains information on both the elemental composition and abundances and thus is a valuable mass spectrometric characteristic. In practice, however, it might be difficult to measure this parameter with sufficient accuracy due to resolution and statistics constraints. Besides, the 10 ppm limit in accuracy due to natural variation in the isotopic abundances applies also to that quantity.

(15) Werlen, R. C. Rapid Commun. Mass Spectrom. 1994, 8, 976-980.
(16) Yergey, J.; Cotter, R. J.; Heller, D.; Fenselau, C. Anal. Chem. 1984, 56, 2262-2263.
(17) Dobberstein, P.; Schroeder, E. Rapid Commun. Mass Spectrom. 1993, 7, 861-864.

CONCLUSION

Accurate average molecular weight characterization of biomolecules faces substantial difficulties when the desired accuracy approaches 0.1 Da for MW below 10 000 or 10 ppm above that limit. Even theoretically calculated isotopic distributions may deviate from the “true” shape if an insufficiently low threshold is chosen to limit the number of permutations. In practice, measurements of the average mass inevitably lead to an underestimation of its value; even with a proper correction, the residual uncertainty is still of the order of ±0.1 Da. Such a correction can only be made if the top-fitting approach is used for centroid measurements; the down-to-base line method leads to a greater, and not correctable, error. One should be aware of the origin of the systematic shift between the theoretical and measured AM, which is in the very nature of the isotopic distribution of biomolecules. Statistics accumulated in the molecular ion peak, together with the limited resolving power of the instrument, can compromise the accuracy of high-sensitivity AM measurements. An equation is obtained (eq 2) allowing one to estimate the statistical error arising from these factors. The use of the monoisotopic mass, whenever possible, is preferable. Other additive quantities, such as $K_{1/0}$, can also be used in combination with the monoisotopic or average mass to characterize biomolecules. As for the set of exact masses of the individual isotopic peaks, it is doubtful that it can serve as a sole MW characteristic.

ACKNOWLEDGMENT

This work has been supported by the Swedish Natural Sciences Research Council (NFR) and the Swedish National Board for Technical Development (STU). Dr. Curt T. Reimann is gratefully acknowledged for careful reading of the manuscript and for a number of useful suggestions.

Received for review April 17, 1995. Accepted August 2, 1995.

AC950370J

Abstract published in Advance ACS Abstracts, September 1, 1995.