Anal. Chem. 1996, 68, 4060-4063
Accuracy Requirements for Peptide Characterization by Monoisotopic Molecular Mass Measurements
Roman A. Zubarev,* Per Håkansson, and Bo Sundqvist
Ion Physics Division, Department for Radiation Sciences, Uppsala University, Box 535, 75 121 Uppsala, Sweden
The accurately measured monoisotopic mass of a biomolecule can reveal its elemental composition and, in the case of a biopolymer, its monomer composition. In this work, the limitations of the technique were analyzed in application to peptides. For the currently available level of mass accuracy of about ±1 ppm, the mass limit for revealing the unique elemental composition of a peptide was found to be 700-800 Da, with the possibility of extending this range as the mass accuracy improves. As for the amino acid composition determination, the fundamental limit of ∼500-600 Da cannot be overcome in the general case by instrumental or methodological improvements. It is proposed that, for peptide characterization, the molecular mass must be determined with sufficient accuracy to rule out a significant fraction of the peptides having the same nominal mass but different elemental and amino acid compositions. An accuracy of ±1 ppm was found to exclude 99% of such peptides and, therefore, to ensure a high degree of confidence in peptide characterization.

Determination of the molecular mass is one way to characterize a biomolecule. If all chemical elements had only one stable isotope, the molecular mass would be described by just a single number. Because the elements have several stable isotopes, the molecular mass must instead be described by a nontrivial concept, the isotopic distribution.1 The isotopic distribution can be characterized by certain parameters, such as the nominal, monoisotopic, and average isotopic masses, the mass of the most abundant isotopic peak, etc.1 In our previous paper, we discussed the possibilities and limitations of average isotopic mass measurements for biomolecule characterization.2 It was found that, for large biomolecules, the average mass is the most relevant characteristic, although the practically achievable accuracy of its determination is limited to ∼0.1 Da in most cases.
Even though the complications involved in the experimental measurements can be overcome, the intrinsic uncertainty of the average mass due to the natural isotopic spread remains at the level of 10 ppm.2-4 As will be shown below, such a level of mass accuracy is not sufficient for the characterization of peptides by molecular mass. There is, however, no commonly shared opinion about what mass accuracy is sufficient.
(1) Yergey, J.; Heller, D.; Hansen, G.; Cotter, R. J.; Fenselau, C. Anal. Chem. 1983, 55, 353-356.
(2) Zubarev, R. A.; Demirev, P. A.; Håkansson, P.; Sundqvist, B. U. R. Anal. Chem. 1995, 67, 3793-3798.
(3) Pomerantz, S. C.; McCloskey, J. A. Org. Mass Spectrom. 1987, 22, 251-253.
(4) Beavis, R. C. Anal. Chem. 1993, 65, 469-497.
4060 Analytical Chemistry, Vol. 68, No. 22, November 15, 1996
Should one pursue some defined figure,
such as the requirement of the Journal of Organic Chemistry that the acceptable error limits are ±5 ppm for high-resolution mass spectral data?5 Or should one be able to determine the elemental composition unequivocally (which would demand, in general, much higher accuracy)?5

Let us consider the case of peptides. For them, as for other biopolymers, the monomer (amino acid) composition is far more essential than the elemental composition. It has been demonstrated that the monomer composition of a biopolymer can be revealed under appropriate circumstances by accurate mass measurements.6,7 Unlike the elemental composition, which can always be revealed unequivocally provided the mass accuracy is sufficient, a unique monomer composition cannot always be determined by accurate mass measurements, since different monomer compositions may correspond to one and the same elemental composition and, thus, to the same molecular mass (see Figure 1). It should be noted that, in this work, peptides with the same amino acid (monomer) composition but different sequences are considered identical, since they have the same elemental composition and molecular mass.

In principle, a mass accuracy that allows determination of the unique elemental composition of a peptide is sufficient, since no additional information (in terms of the monomer composition) can be extracted by increasing the mass accuracy further. This does not mean that the list of plausible elemental formulae resulting from the accurate mass measurement should contain only one entry. There might be many more elemental compositions that are plausible in principle, but all of the others belong to non-peptide molecules. An elemental composition corresponding to at least one peptide combination will be referred to here as a peptide elemental composition.
(5) Gross, M. L. J. Am. Soc. Mass Spectrom. 1994, 5, 57.
(6) Pomerantz, S. C.; Kowalak, J. A.; McCloskey, J. A. J. Am. Soc. Mass Spectrom. 1993, 4, 204-209.
(7) van Setten, D. C.; van de Werken, G. Rapid Commun. Mass Spectrom. 1994, 8, 917-919.
(8) Mann, M. Useful Tables of Possible and Probable Peptide Masses. Abstracts of the 43rd ASMS Conference on Mass Spectrometry and Allied Topics; Atlanta, GA, May 21-26, 1995.
(9) Zubarev, R. A.; Bondarenko, P. V. Rapid Commun. Mass Spectrom. 1991, 5, 276-277.
© 1996 American Chemical Society

Figure 1. Information derived via accurate mass measurements. The unique elemental composition can be revealed by improving the mass accuracy, while the monomer (amino acid) compositions corresponding to the same elemental composition remain unresolved.

Since both monoisotopic and average masses of biopolymers including peptides are not just randomly distributed, but rather concentrated around certain points with ∼1 Da intervals,8,9 the density of peptide elemental compositions per unit mass will depend on where on the mass scale the measured mass is centered. Mann gives the coordinates of the positions Mp around which the monoisotopic masses of peptides are gathered:8

$$ M_p = [M] + 0.00048M \ \text{(Da)} \tag{1} $$
where [M] is the lower integer value of the molecular mass M (i.e., nominal mass). The width Wp encompassing 95% of all amino acid compositions was found to be8
$$ W_p = 0.19 + 0.0001M \ \text{(Da)} \tag{2} $$
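As an illustration of how eqs 1 and 2 might be applied, the following minimal Python sketch tests whether a measured monoisotopic mass falls inside the band that holds 95% of peptide compositions. The function names are hypothetical, and the code assumes that Wp is a full width centered on Mp (so the band is Mp ± Wp/2); the text does not state this explicitly. The sketch is only meaningful below roughly 2000 Da, where the offset 0.00048M stays under 1 Da.

```python
import math


def peptide_mass_window(mass):
    """Return (low, high) bounds in Da of the 95% peptide band from eqs 1 and 2."""
    nominal = math.floor(mass)          # [M], the lower integer value of M
    center = nominal + 0.00048 * mass   # eq 1
    width = 0.19 + 0.0001 * mass        # eq 2, assumed here to be a full width
    return center - width / 2, center + width / 2


def could_be_peptide(mass):
    """True if the mass lies inside the band; a screen, not a proof of identity."""
    low, high = peptide_mass_window(mass)
    return low <= mass <= high


if __name__ == "__main__":
    for m in (1000.545, 1000.050):
        low, high = peptide_mass_window(m)
        print(f"{m:.3f} Da: band {low:.3f} to {high:.3f} Da, "
              f"plausible peptide: {could_be_peptide(m)}")
```

A mass outside the band is unlikely, though not impossible, for an unmodified peptide, since 5% of the compositions lie in the wings.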
It has been proposed to use these equations to check whether a molecule with a certain monoisotopic mass can, indeed, be a peptide.8 Another question that arises is, what is the average number of monomer (amino acid) compositions per single elemental composition, and how does it depend on the molecular mass? In this paper, we try to answer this and other questions by modeling the distributions of monoisotopic masses for all possible peptides of a given integer (nominal) mass. The results allow us to estimate the average number of peptide elemental and monomer compositions per mass unit and to derive the mass accuracy requirements for peptides.

EXPERIMENTAL SECTION
Computer programs were written on a Macintosh IIci (Apple Computers, Cupertino, CA) in THINK C v.4.0 (Symantec Co., Cupertino, CA). Two programs were used. The first program found all possible peptide combinations (i.e., compositions of amino acid residues) corresponding to a given nominal molecular mass; a program that functioned in a similar manner was described by Mann earlier.8 Additionally, the program wrote to a separate file a list of the elemental (empirical) formulae of all peptides found. The residues for the peptide compositions were chosen from the set of 20 common amino acid residues; the terminal groups were assumed to be H and OH. The second program searched through the list and counted the multiplicity of each elemental formula, as well as the total number of unique formulae occurring in the list. The multiplicity value corresponds to the number of different amino acid compositions that have one and the same elemental formula.

RESULTS AND DISCUSSION
A. Distribution of Peptide Monomer Compositions. The peptide distribution around nominal mass [M] = 1000 Da obtained in the current study is shown in Figure 2. The distribution is built with 10 mDa intervals and contains ∼50 000 peptides with different amino acid compositions. A Gaussian fit to the distribution had the following parameters: position of the center, 1000.52 ± 0.01 Da; height, 2320 ± 36 combinations per 10 mDa; and dispersion, 0.082 ± 0.001 Da. The width of the distribution encompassing 95% of the compositions is 0.33 ± 0.01 Da, which is in good agreement with formula 2.

Figure 2. Number of possible amino acid compositions (peptide combinations) as a function of the peptide monoisotopic molecular mass for [M] = 1000 Da ([M] is the nominal molecular mass, i.e., lower integer mass value). The histogram is built with a 10 mDa step. The top density of the distribution is ∼230 peptide compositions per millidalton or per ppm.

The total number of possible peptide compositions Np for a peptide with molecular mass M (M < 1200 Da) can be approximated by the empirical formula
$$ N_p \approx 0.64 \exp(M/89) \tag{3} $$
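The counting step of the first program described in the Experimental Section can be sketched as follows. This is not the original THINK C code: it is a minimal Python illustration that counts amino acid compositions (multisets of residues, with Leu and Ile counted separately) by dynamic programming over integer residue masses. It assumes that the nominal mass of a peptide equals the sum of the integer nominal masses of its residues plus 18 Da for the H and OH terminal groups, which is a simplification of the "lower integer value" definition used in the text. All names are hypothetical. The result can be compared with eq 3.

```python
import math

# Nominal (integer) residue masses of the 20 common amino acids, in Da.
RESIDUE_NOMINAL_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
WATER_NOMINAL_MASS = 18  # H and OH terminal groups


def count_compositions(nominal_mass):
    """Count residue multisets whose nominal mass plus water equals nominal_mass.

    Sequence order is ignored, as in the text. A target of 18 Da returns 1
    (bare terminal groups, no residues), which is not a real peptide.
    """
    target = nominal_mass - WATER_NOMINAL_MASS
    if target < 0:
        return 0
    ways = [0] * (target + 1)
    ways[0] = 1
    # Looping over residues on the outside counts each multiset exactly once.
    for mass in RESIDUE_NOMINAL_MASS.values():
        for total in range(mass, target + 1):
            ways[total] += ways[total - mass]
    return ways[target]


def n_p_empirical(mass):
    """Eq 3: empirical number of peptide compositions (stated for M < 1200 Da)."""
    return 0.64 * math.exp(mass / 89.0)


if __name__ == "__main__":
    for m in (500, 800, 1000):
        print(m, count_compositions(m), round(n_p_empirical(m)))
```

Because eq 3 is an empirical fit, exact agreement with the enumerated counts at any single nominal mass should not be expected.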
B. Distribution of Peptide Elemental Compositions. Only ∼600 unique peptide elemental compositions were found for [M] = 1000 Da (the distribution is shown in Figure 3). The elemental composition curve has a much lower density than the monomer composition curve: ∼2.3 compositions per millidalton. For this mass, there are, on average, 90 different amino acid combinations per unique peptide elemental composition, whereas the average length of the peptide chain at [M] = 1000 Da is only nine residues. It is doubtful that any useful information on peptide composition can be extracted in this case by accurate mass measurements, even if the elemental composition is determined. It is interesting that the average value of this distribution is 1000.437 Da, i.e., almost 0.1 Da less than that of the monomer composition distribution (compare Figures 2 and 3). The elemental composition distribution is also wider than the monomer composition distribution: 0.4 Da encompasses 95% of the compositions, 1.5 times wider than predicted by formula 2. The following empirical formula, derived for the mass range 600-1200 Da, estimates the total number of unique elemental compositions Ne for a peptide with molecular mass M:
$$ N_e \approx 1.77 \exp(M/175) \tag{4} $$
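The multiplicity count performed by the second program can be illustrated with a short Python sketch (again, not the original code). It groups amino acid compositions by elemental formula, using the standard residue formulae of the 20 common amino acids written as (C, H, N, O, S) tuples, and assumes integer nominal masses as a simplification. The function names are hypothetical. Small nominal masses already show the effect in Figure 1: Gly-Gly and Asn share one elemental formula.

```python
import math
from collections import Counter

# Residue elemental formulae as (C, H, N, O, S) counts.
RESIDUE_FORMULA = {
    "G": (2, 3, 1, 1, 0), "A": (3, 5, 1, 1, 0), "S": (3, 5, 1, 2, 0),
    "P": (5, 7, 1, 1, 0), "V": (5, 9, 1, 1, 0), "T": (4, 7, 1, 2, 0),
    "C": (3, 5, 1, 1, 1), "L": (6, 11, 1, 1, 0), "I": (6, 11, 1, 1, 0),
    "N": (4, 6, 2, 2, 0), "D": (4, 5, 1, 3, 0), "Q": (5, 8, 2, 2, 0),
    "K": (6, 12, 2, 1, 0), "E": (5, 7, 1, 3, 0), "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0), "F": (9, 9, 1, 1, 0), "R": (6, 12, 4, 1, 0),
    "Y": (9, 9, 1, 2, 0), "W": (11, 10, 2, 1, 0),
}
WATER = (0, 2, 0, 1, 0)                 # H and OH terminal groups
ELEMENT_NOMINAL_MASS = (12, 1, 14, 16, 32)


def nominal_mass(formula):
    """Integer nominal mass of a (C, H, N, O, S) formula."""
    return sum(n * m for n, m in zip(formula, ELEMENT_NOMINAL_MASS))


def formula_multiplicities(peptide_nominal_mass):
    """Map each peptide elemental formula to its number of amino acid compositions."""
    target = peptide_nominal_mass - nominal_mass(WATER)
    if target < 0:
        return Counter()
    table = [Counter() for _ in range(target + 1)]
    table[0][WATER] = 1
    # Residues on the outer loop, so each multiset of residues is counted once.
    for residue in RESIDUE_FORMULA.values():
        step = nominal_mass(residue)
        for total in range(step, target + 1):
            for partial, count in table[total - step].items():
                grown = tuple(a + b for a, b in zip(partial, residue))
                table[total][grown] += count
    return table[target]


def n_p_empirical(mass):
    """Eq 3."""
    return 0.64 * math.exp(mass / 89.0)


def n_e_empirical(mass):
    """Eq 4 (derived for 600-1200 Da)."""
    return 1.77 * math.exp(mass / 175.0)


if __name__ == "__main__":
    for m in (132, 146, 600):
        found = formula_multiplicities(m)
        print(m, "unique formulae:", len(found),
              "compositions:", sum(found.values()))
    print("eq 3 / eq 4 at 1000 Da:", n_p_empirical(1000) / n_e_empirical(1000))
```

The ratio of eq 3 to eq 4 at 1000 Da evaluates to roughly 90, consistent with the average multiplicity quoted in the text.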
As can be seen from the comparison of the exponential factors, Ne grows significantly more slowly with mass than Np. Perhaps more important than either of the quantities is their ratio Np/Ne, i.e., the average number of peptide amino acid combinations per unique elemental composition. This value, which determines the utility of accurate mass measurements for revealing the amino acid composition, should be compared with the length of the peptide chain. Already at a mass of 500-600 Da, the Np/Ne ratio exceeds the average number of amino acids in peptides of this mass (five or six). This can be considered a fundamental limit of the technique. It should be noted, however, that this limit is valid only in the most probable case. At the “wings” of both the monomer and elemental composition distributions, the density is much lower than at the center (see Figures 2 and 3). There one may find just a few possible amino acid combinations, even for a large peptide.

Figure 3. Total number of unique peptide elemental compositions as a function of the monoisotopic molecular mass for [M] = 1000 Da. The histogram is drawn with a 50 mDa step. The top density of the distribution is ∼2.3 elemental compositions per millidalton or per ppm.

C. Determination of the Peptide Elemental Composition. Since the top density of Ne at M = 1000 Da is ∼2.3 compositions per millidalton (see above), a mass accuracy of 0.4 ppm should be sufficient for unequivocally determining the peptide elemental composition (this is, of course, the general case; at some mass points, the accuracy needed may be somewhat higher or lower). For other masses, the accuracy needed can be calculated from the empirical formula (tested for masses up to M ≈ 1200 Da):
$$ \Delta M / M = 14.4 \exp(-M/285) \ \text{(ppm)} \tag{5} $$
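Equation 5 is straightforward to evaluate. The minimal Python sketch below (function names are hypothetical) computes the required accuracy at several masses and, for comparison, converts the peak density of ∼2.3 elemental compositions per millidalton at 1000 Da into an average spacing in ppm. Values above ∼1200 Da are extrapolations beyond the range for which the formula was tested.

```python
import math


def required_accuracy_ppm(mass):
    """Eq 5: mass accuracy (ppm) needed to single out one peptide elemental composition."""
    return 14.4 * math.exp(-mass / 285.0)


def spacing_ppm(density_per_mda, mass):
    """Average spacing between compositions, in ppm, for a density given per millidalton."""
    spacing_da = 1e-3 / density_per_mda   # 1 mDa = 1e-3 Da
    return spacing_da / mass * 1e6


if __name__ == "__main__":
    for m in (700, 1000, 2600):
        ppm = required_accuracy_ppm(m)
        print(f"M = {m} Da: {ppm:.4g} ppm ({ppm * 1000:.3g} ppb)")
    print(f"spacing at 1000 Da for 2.3 per mDa: {spacing_ppm(2.3, 1000):.3f} ppm")
```

At 1000 Da both routes give about 0.4 ppm, which is the figure quoted in the text.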
For the mass of 700 Da, which was found to be the upper limit for the amino acid composition determination (see above), eq 5 gives a value of 1.2 ppm. This level of accuracy has already been reached by both magnetic sector10 and FT ICR11 mass spectrometers. On the other hand, time-of-flight instruments with particle desorption exhibit, at best, an accuracy an order of magnitude poorer.12

Suppose that eq 5 holds for even higher molecular masses. For a mass of 2600 Da, it requires an accuracy of 1.6 ppb.
(10) Dobberstein, P.; Schroeder, E. Rapid Commun. Mass Spectrom. 1993, 7, 861-864.
(11) Wang, M.; Marshall, A. G. Anal. Chem. 1989, 61, 1288-1293.
(12) Zubarev, R. A.; et al. Rapid Commun. Mass Spectrom., in press.
To appreciate this figure, one should take into account that the mass
scale standard, i.e., the mass of the 12C atom, is defined with 1.7 ppb accuracy.13 The uncertainty arises from the tiny difference between the mass of the ionized carbon atom in the gas phase and that of the carbon atom in the condensed state. This difference is considered to be so subtle that it is neglected by the current IUPAC standard.13 Therefore, such a level of mass accuracy, when achieved, will require certain changes in mass scale units.

The above considerations are based solely on monoisotopic mass measurements. Other parameters of the isotopic distribution, such as the average mass, the ratio of intensities of the first two isotopic peaks, etc., may, in principle, also be used for determining the unique peptide elemental composition.2 This involves accurate measurements of the isotopic peak abundances, which is not always easy at high resolution. Accurate determination of the masses of isotopic peaks is also not without problems, one of the most serious of which arises from the multiplet nature of those peaks (except for the monoisotopic peak).1,14 The relative intensities of the peaks contributing to the fine structure of the isotopic peaks may vary due to the spread in natural isotopic abundances, which results in a “walk” of isotopic peak masses on the mass scale. The expected mass error arising from this effect has not yet been estimated.

The conclusion can be drawn that the current limit for determining the unique peptide elemental composition is a molecular mass of ∼700-800 Da, which is somewhat higher than the upper limit for the amino acid composition determination. The extension of the range for elemental composition determination depends mainly on improving the accuracy of mass determination.

D. “Reasonable” Mass Accuracy for Peptide Characterization. Suppose that the analyzed sample is expected to contain a known peptide with a theoretical monoisotopic molecular mass of 1000.54408 Da.
Suppose also that the mass of the peptide in the sample was measured experimentally with 1 ppm accuracy and was found to be 1000.545 ± 0.001 Da. What are the chances that the peptide in the sample is the expected known peptide?

Let us consider the density of the peptide combinations at the nominal mass [M] = 1000 Da (see Figure 2). The point M = 1000.545 Da is close to the top of the distribution, where the average density is ∼230 combinations per millidalton or per ppm. In total, the distribution contains ∼50 000 peptides with different amino acid compositions. Therefore, the interval of ±1 ppm encompasses ∼460 peptides, or 1% of the total number of different peptides with nominal mass [M] = 1000 Da. That is, the chance that the peptide in the sample is the expected peptide (in terms of amino acid composition), and not some other peptide with the same nominal mass, is at least 99%. This is a conservative estimate, made for the most unfavorable case of the maximum density of the Gaussian distribution. Another, more straightforward estimation yields (1 - 1/50000) × 100% = 99.998%. The same calculations made for the elemental composition distribution, not surprisingly, yielded the same result: a mass interval of ±1 ppm encompasses at most 1% of the unique peptide elemental compositions.

It is quite plausible that, in practice, such a level of confidence is sufficient for peptide characterization. Since the width of the peptide distribution grows approximately linearly with the mass (see eq 2), the ±1 ppm interval always encompasses ∼1% of all possible peptide combinations and unique elemental compositions for a particular nominal mass. Again, this estimate is made for the top of the distribution, i.e., for the most probable case; for the wings, the level of confidence is higher (although the probability of such an event is lower).

(13) Dougherty, R. C.; Marshall, A. G.; Eyler, J. R.; Richardson, D. E.; Smalley, R. E. J. Am. Soc. Mass Spectrom. 1994, 5, 120-123.
(14) Yergey, J. A.; Cotter, R. J.; Heller, D.; Fenselau, C. Anal. Chem. 1984, 56, 2262-2263.

CONCLUSION
The possibilities of determining the elemental composition and amino acid composition of peptides using accurate monoisotopic mass measurements were analyzed. For the mass accuracy of about ±1 ppm currently achieved for peptides with molecular masses of several hundred daltons, the limit for revealing the unique elemental composition of a peptide was found to be 700-800 Da, with the possibility of extending this range as the technique progresses and mass accuracy improves. As for the amino acid composition determination, the fundamental limit of ∼500-600 Da cannot be overcome by instrumental or methodological improvements. It was proposed that, for peptide characterization, the molecular mass must be determined with sufficient accuracy to rule out a significant fraction of the peptides having the same nominal mass but different elemental and amino acid compositions. An accuracy of ±1 ppm was found to exclude 99% of such peptides (in the worst case) and therefore to ensure a high degree of confidence. Although the simulations in this work were performed for linear nonmodified peptides containing the 20 common amino acids, the results are also applicable to other peptides, provided the type of modification is known a priori. The general approach to data analysis used here can also be applied to other types of biopolymers.

ACKNOWLEDGMENT
This work has been supported by the Swedish Natural Sciences Research Council (NFR) and the Swedish National Board for Technical Developments (STU).

Received for review May 9, 1996. Accepted August 28, 1996.
AC9604651
Abstract published in Advance ACS Abstracts, October 1, 1996.