Theoretical and Experimental Prospects for Protein Identification

Theoretical and Experimental Prospects for Protein Identification Based Solely on Accurate Mass Measurement Fei He,† Mark R. Emmett,†,‡ Kristina Håkansson,§ Christopher L. Hendrickson,†,‡ and Alan G. Marshall*,†,‡ Ion Cyclotron Resonance Program, National High Magnetic Field Laboratory, Florida State University, 1800 East Paul Dirac Drive, Tallahassee, Florida 32310-4005 Received August 4, 2003

We discuss the theoretical and experimental potential and limitations of protein identification by mass measurement of proteolytic peptides and database searching. For peptides differing in composition by one (or two or three) amino acids, a surprisingly high number turn out to have isomers: 10% (or 29% or 53%), considering the 20 common amino acids with equal relative abundance. Even if isomers differing by leucine/isoleucine are excluded, the latter numbers are 14% and 38%; those isomeric peptides cannot be distinguished based on mass alone, and tandem mass spectrometry and/or other additional constraints are needed. However, for nominally isobaric peptides, the mass accuracy and resolving power of broadband Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometry theoretically and experimentally suffice to resolve virtually all peptide doublets differing by up to two amino acids, including the smallest mass difference of 3.4 mDa. We demonstrate experimental resolution of another pair of peptides differing by 11 mDa, even when present in a complex mixture of hundreds of other peptides.

Keywords: protein identification • FTMS • FT-ICR • FTICR • exact mass • accurate mass • elemental composition • chemical formula • peptide • protein database • mass resolving power • mass resolution • mutation • post-translational modification

* To whom correspondence should be addressed. E-mail: marshall@magnet.fsu.edu.
† National High Magnetic Field Laboratory, Florida State University.
‡ Department of Chemistry and Biochemistry, Florida State University, Tallahassee, FL 32306.
§ Department of Chemistry, University of Michigan, 930 North University, Ann Arbor, MI 48109-1055.

Journal of Proteome Research 2004, 3, 61-67. Published on Web 12/10/2003. 10.1021/pr034058z CCC: $27.50 © 2004 American Chemical Society

Introduction

Protein identification is typically performed in one of two ways. The “bottom up” approach is based on mass measurement of proteolytic peptides from a condensed-phase (solution, gel, or immobilized) enzymatic digest (peptide mapping) and/or gas-phase fragmentation to yield peptide sequence tags combined with protein, DNA, or expressed sequence tag (EST) database searching.1-12 The “top down” approach is based on mass measurement of the whole protein, dissociation of larger proteolytic fragments or whole protein ions, and database searching.13,14 Although most such methods typically rely on mass measurement accuracy of ∼1 Da, it is well-known that the number of possible amino acid composition candidates drops rapidly with increasing mass measurement accuracy.15-21 Recent reports have exploited the high mass accuracy of Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) for peptide mapping and protein database searching.22,23 Conrads et al. also reported ∼1 ppm or even sub-ppm mass measurement accuracy with an accurate mass tag (AMT) for proteome-wide protein identification.20 By considering all possible amino acid compositions (i.e., not just those for a particular protein or a database of all known protein sequences), Zubarev et al. have shown that mass accuracy of 1 ppm suffices to establish a unique amino acid composition for peptides less than ∼600 Da in mass.16 The same paper also notes that further improvement in mass accuracy cannot overcome this limitation. If more than one proteolytic peptide from the same protein is used for protein identification, Eriksson et al. have shown that the number of random matches to a database decreases sharply for a mass accuracy to within better than 0.1 Da.24 However, the same authors found some random matches even at a mass accuracy of 0.006 Da. Thus, in practice, correct identification of peptides must also rely on partial sequence information6,8,9,25,26 and/or other additional constraints.

The use of peptide retention time from an on-line LC/MS run has been proposed by Palmblad et al.27 The major challenge for that approach is accurate prediction of chromatographic retention time. In that respect, Smith and co-workers have recently demonstrated successful prediction by use of artificial neural networks.28 On-line liquid chromatography reduces the complexity of the sample presented to the mass spectrometer at any given instant, improving dynamic range and reducing ion suppression, but does not affect the need for high mass accuracy to match an experimentally observed peptide to a segment of a putative database protein.

Here, we investigate the probability that different amino acid compositions possess the same nominal (nearest-integer) mass

or even identical mass. We consider two situations: (a) peptides of different elemental composition but the same nominal mass (nominally isobaric peaks); and (b) peptides of different amino acid composition but the same elemental composition (isomers). Among the 20 common (unmodified) amino acids, for example, glutamine and lysine have the same nominal “residue” mass (128 Da, namely, the mass of the neutral amino acid minus the mass of H2O that is lost on forming a peptide linkage at both the amine and carboxyl termini) but differ by 0.036 Da in “exact” mass, corresponding to the elemental difference, CH4 vs O. Conversely, leucine and isoleucine are isomers, and thus have identical mass. (Actually, positional isomers have different heat of formation, and thus different mass, ΔE = Δmc², in which c is the speed of light, but an energy difference of ∼1 eV corresponds to a mass difference of ∼10⁻⁹ Da, so we are unlikely to mass-resolve such isomers in the foreseeable future!) Thus, two peptides of different elemental composition may be distinguished by mass alone at sufficiently high mass resolving power, whereas isomers are for practical purposes indistinguishable.

We next address a series of questions whose answers determine the probability that a peptide can be identified by mass alone. First, if two peptides (e.g., an experimental tryptic peptide and a segment of a protein in a database) differ (randomly) by one (or two or three) amino acids, what are the possible nominal “isobaric” pairs of peptides, and how many ways can each arise? Second, how many nominal “isobaric” pairs of peptides are in fact isomers (and thus unresolvable experimentally)? Third, how do the above results change if posttranslational modifications are considered? Finally, in nature, neither amino acid compositions nor amino acid sequences are random, and we shall discuss that issue briefly.
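As a quick numerical check of the two mass differences quoted above, the following minimal Python sketch recomputes the Lys/Gln residue mass difference from elemental compositions and converts 1 eV to daltons. The atomic masses and the conversion factor (1 Da ≈ 931.494 MeV/c²) are standard tabulated values supplied here as assumptions, and the helper name `formula_mass` is hypothetical; none of this code comes from the paper.

```python
# Monoisotopic atomic masses in Da (standard tabulated values, assumed here).
ATOMIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Residue compositions (neutral amino acid minus H2O).
LYS = {"C": 6, "H": 12, "N": 2, "O": 1}
GLN = {"C": 5, "H": 8, "N": 2, "O": 2}


def formula_mass(formula):
    """Sum the monoisotopic masses of the atoms in an elemental formula."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


lys_mass = formula_mass(LYS)
gln_mass = formula_mass(GLN)
lys_gln_diff = lys_mass - gln_mass  # elemental difference CH4 vs O

# Mass equivalent of 1 eV, assuming 1 Da = 931.494e6 eV/c^2.
EV_PER_DALTON = 931.494e6
ev_in_da = 1.0 / EV_PER_DALTON

print(f"Lys residue: {lys_mass:.4f} Da, Gln residue: {gln_mass:.4f} Da")
print(f"Lys - Gln: {lys_gln_diff * 1000:.1f} mDa")
print(f"1 eV corresponds to {ev_in_da:.2e} Da")
```

Both residues round to 128 Da, the exact masses differ by about 36 mDa, and the mass equivalent of 1 eV comes out near 10⁻⁹ Da, in line with the figures in the text.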
We begin by evaluating the theoretical probability of isomeric and nominally isobaric peaks arising from peptides differing in composition by one, two, or three amino acid residues, based on consideration of the following: (a) 20 common amino acids; (b) 19 common amino acids (excluding isoleucine: i.e., treating leucine and isoleucine as equivalent); and (c) 20 amino acids plus some common posttranslational modifications. We then demonstrate the resolution of nominally isobaric peptides by experimental electrospray ionization (ESI) FT-ICR broadband detection of peptide pairs, both pure and added to a complex mixture of other peptides. The theoretical and experimental results combine to evaluate the feasibility of mass-based identification of proteolytic peptides.
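The enumeration outlined above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' program (the paper describes an algorithm written in C). The residue elemental compositions are standard values supplied here, post-translational modifications are left out, and the function name `isomer_counts` is hypothetical. Compositions are generated as combinations with repetition, so sequence order is ignored, and two compositions count as isomers when their summed elemental formulas are identical.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb

# Residue elemental compositions as (C, H, N, O, S); standard values, assumed here.
RESIDUES = {
    "Gly": (2, 3, 1, 1, 0), "Ala": (3, 5, 1, 1, 0), "Ser": (3, 5, 1, 2, 0),
    "Pro": (5, 7, 1, 1, 0), "Val": (5, 9, 1, 1, 0), "Thr": (4, 7, 1, 2, 0),
    "Cys": (3, 5, 1, 1, 1), "Leu": (6, 11, 1, 1, 0), "Ile": (6, 11, 1, 1, 0),
    "Asn": (4, 6, 2, 2, 0), "Asp": (4, 5, 1, 3, 0), "Gln": (5, 8, 2, 2, 0),
    "Lys": (6, 12, 2, 1, 0), "Glu": (5, 7, 1, 3, 0), "Met": (5, 9, 1, 1, 1),
    "His": (6, 7, 3, 1, 0), "Phe": (9, 9, 1, 1, 0), "Arg": (6, 12, 4, 1, 0),
    "Tyr": (9, 9, 1, 2, 0), "Trp": (11, 10, 2, 1, 0),
}


def isomer_counts(names, n):
    """Return (distinct compositions, compositions sharing a formula with another)."""
    per_formula = Counter()
    for combo in combinations_with_replacement(names, n):
        formula = tuple(sum(RESIDUES[r][i] for r in combo) for i in range(5))
        per_formula[formula] += 1
    total = sum(per_formula.values())
    with_isomers = sum(c for c in per_formula.values() if c > 1)
    return total, with_isomers


ALL20 = sorted(RESIDUES)
NO_ILE = [name for name in ALL20 if name != "Ile"]

for label, names in (("20 amino acids", ALL20), ("19 amino acids (no Ile)", NO_ILE)):
    for n in (1, 2, 3):
        total, with_isomers = isomer_counts(names, n)
        # Combinations with repetition: (N + n - 1)! / (n! (N - 1)!)
        assert total == comb(len(names) + n - 1, n)
        print(f"{label}, n={n}: {total} compositions, {with_isomers} with isomers "
              f"({100 * with_isomers / total:.0f}%)")
```

For one and two residues the sketch gives 2 of 20 and 61 of 210 compositions with isomers (0 of 19 and 26 of 190 when isoleucine is excluded), which are the values reported in Table 2; the three-residue rows it prints can be compared with the table in the same way.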

Experimental Section

All experiments were performed with a home-built 9.4 T FT-ICR mass spectrometer as described previously.29,30 The peptides were custom synthesized in the BASS facility at Florida State University and used without further purification. Solutions of the peptides were prepared in 50:50 MeOH:H2O with 2% acetic acid and infused through a tapered 50 µm i.d. fused silica micro-electrospray31 needle (+2 kV) at a flow rate of 300 nL/min. Ions were externally accumulated in a linear octopole for 3-5 s before transfer to the ICR cell through a second octopole ion guide. The desired charge states for each species were isolated by dipolar SWIFT excitation.32,33 Following ejection of ions of undesired m/z ratios, the remaining ions were excited by broadband frequency-sweep excitation and then detected. The time-domain ICR signal (512 K) was subjected to baseline zeroing, followed by Hanning apodization and one zero-fill before Fourier transformation, magnitude calculation, and frequency-to-mass conversion34,35 based on external mass calibration. Each spectrum is the result of 3-5 scans coadded to improve the signal-to-noise ratio. All mass spectra were acquired by use of the MIDAS data system developed in this group.36
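To give a feel for the frequency-to-mass conversion step, the sketch below uses the ideal (unperturbed) cyclotron relation f = zeB/(2πm) at the 9.4 T field quoted above. This is a simplification and an assumption of this example: practical calibrations add correction terms for the trapping electric field, and the specific calibration law used by the authors is the one given in their cited references, not reproduced here. The function names are hypothetical.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # C
DALTON = 1.66053906660e-27            # kg
B_FIELD = 9.4                         # T, magnet field from the Experimental Section


def cyclotron_frequency_hz(mz, b_field=B_FIELD):
    """Ideal cyclotron frequency (Hz) for an ion of the given m/z (Da per charge)."""
    return ELEMENTARY_CHARGE * b_field / (2 * math.pi * mz * DALTON)


def mz_from_frequency(freq_hz, b_field=B_FIELD):
    """Invert the ideal relation: m/z (Da per charge) from a measured frequency."""
    return ELEMENTARY_CHARGE * b_field / (2 * math.pi * freq_hz * DALTON)


freq = cyclotron_frequency_hz(1000.0)
print(f"m/z 1000 at 9.4 T: {freq / 1e3:.1f} kHz")
print(f"round trip: m/z {mz_from_frequency(freq):.4f}")
```

Because frequency is inversely proportional to m/z in this idealized picture, ions of higher charge state (lower m/z) fall at higher frequency.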

Results and Discussion

Theoretical Prospects from Computer Simulations. Because mass measurement cannot distinguish differences in primary amino acid sequence order, it is clear that we must consider combinations, not permutations. Thus, if two peptides differ in composition by one amino acid residue, there are 20 possible compositionally distinct peptides (i.e., one for each of the 20 common amino acids). (Note that neither the sequence position of the amino acid of interest nor the ordering of the other amino acids in the peptide affects the peptide mass.) If peptides can differ by two amino acids, then the differences between peptides can be represented by 20 × 20 = 400 dipeptides. However, 20 of those are homodipeptides (e.g., AlaAla, GlyGly, etc.) and half of the rest differ only in amino acid sequence (e.g., AlaGly and GlyAla, etc.). Thus, the total number of compositionally distinct peptides is 20 + (400 − 20)/2 = 210. A general expression for the number of compositionally distinct peptides differing by n amino acids (irrespective of sequence, as usual) is given by the following well-known formula from combinatorial theory37

$$C_n^N = \frac{(N + n - 1)!}{n!\,(N - 1)!} \qquad (1)$$

in which N is the total number of amino acids considered (including any specified number of post-translationally modified amino acids). According to eq 1, the number of compositionally distinct peptides differing by one amino acid, considering 20 common amino acids, is 20. In other words, 100% (i.e., 20 out of 20) of the peptides are compositionally distinct. The number of compositionally distinct peptides differing by two amino acids is 210 out of 20 × 20 = 400 possible peptides. Thus, 210/400, or ∼52% of the possible peptides are compositionally distinct. The number of compositionally distinct peptides differing by three amino acids is 1540, or ∼19% of 20 × 20 × 20 = 8000 possible peptides. These calculations quantitate the first major limitation of accurate mass measurement for peptide identification: namely, mass alone can limit the number of possible peptide candidates, but not uniquely without independent sequence information (e.g., from tandem mass spectrometry (MS/MS) and/or other additional constraints). A severe additional limitation arises when we consider isomers among the compositionally distinct peptides (see below).

The next problem is to search all of the combinatorially possible pairs of peptides differing by one, two, or three amino acids, to determine which masses are nominally isobaric (i.e., same nominal mass but different elemental composition, and thus distinguishable by sufficiently accurate mass measurement) or identical (i.e., isomers, indistinguishable by mass alone). We therefore constructed an appropriate algorithm in C language to search for all possible isomeric and nominally isobaric species (disregarding sequence) by varying one, two, or three amino acid residues (including or excluding the common posttranslational modifications listed in Table 1). Those modifications comprise only a small subset of a more

comprehensive list reported by Ken Mitchelhill (Delta Mass Version 2.1, http://prowl.rockefeller.edu/aainfo/deltamassv2.html).

Table 1. Common Post-translational Protein Modifications Considered in This Paper

modification | elemental composition | mass shift (Da)
disulfide bond | -H2 | -2.0157
oxidation | +O | +15.9949
methylation | +CH2 | +14.0156
formylation | +CO | +27.9949
acetylation | +C2H2O | +42.0106
lipoic acid | +C8H12OS2 | +188.0330
pyroglutamic acid (Q) | -NH3 | -17.0265
deamidation (Q/N) | -NH +O | +0.9840
carboxylation (E/D) | +CO2 | +43.9898
phosphorylation | +HPO3 | +79.9663
sulfation | +SO3 | +79.9568

For example, Table 2 lists the number and proportion of peptides having isomers among all compositionally distinct peptides, based on 20 common equally abundant amino acids (or 19 by treating leucine and isoleucine as identical). It is surprising to discover how rapidly the number and proportion of isomeric peptides grow, as the number of amino acid substitutions grows. For example, among the 20 peptides differing by a single one of the 20 common amino acids, only two peptides (i.e., 10%) are isomeric (Leu vs Ile). However, the proportion of peptides having isomers jumps to 29% and 53% for peptides differing by two or three amino acids, respectively! Thus, we have the startling result that, irrespective of primary sequence, more than half of the possible compositionally distinct peptides differing by three amino acids have isomers that cannot be distinguished by accurate mass measurement! Of course, not all isomers result from Leu vs Ile. A list of all (26) such isomeric dipeptides (excluding Ile) differing by two amino acid residues is given in Table 3. Table 2 further shows that even if we exclude Leu or Ile, the proportion of peptides having isomers is still 14% and 38% of the total number of compositionally distinct peptides differing by two or three amino acids, respectively.

Table 2. Number and Proportion of Peptides Having Isomers among the Total Possible Number of Distinct Amino Acid Compositions, Considering the Substitution of One, Two, or Three Amino Acids^a

Isomers arising from 20 common amino acids
no. of amino acids varied | no. of compositionally distinct peptides | no. of peptides having isomers | percentage
1 | 20 | 2 | 10%
2 | 210 | 61 | 29%
3 | 1540 | 815 | 53%

Isomers arising from 19 common amino acids (i.e., excluding isoleucine)
no. of amino acids varied | no. of compositionally distinct peptides | no. of peptides having isomers | percentage
1 | 19 | 0 | 0%
2 | 190 | 26 | 14%
3 | 1330 | 506 | 38%

^a The total number of possible peptides (counting every sequence) is 20^1, 20^2, or 20^3 for 20 common amino acids, or 19^1, 19^2, or 19^3 if isoleucine is excluded. However, we seek only combinations, not permutations (e.g., peptides with sequences AB and BA are counted only once; see text). That list is then searched for unique pairs of isomeric peptides (A = B and B = A pairs are counted only once).

Table 3. List of All 26 Possible Dipeptides Having Isomers, Excluding Isoleucine^a

dipeptide 1 | dipeptide 2 | residue mass (Da)
Asn and Ala | Gly and Gln | 185.0800
Asp and Ala | Gly and Glu | 186.0640
Glu and Asn | Gln and Asp | 243.0855
Leu and Asn | Val and Gln | 227.1270
Leu and Asp | Val and Glu | 228.1110
Leu and Gly | Val and Ala | 170.1055
Met and Ala | Val and Cys | 202.0776
Ser and Ala | Thr and Gly | 158.0691
Ser and Gln | Thr and Asn | 215.0906
Ser and Glu | Thr and Asp | 216.0746
Ser and Leu | Val and Thr | 200.1161
Ser and Phe | Tyr and Ala | 234.1004
Trp and Asn | Tyr and His | 300.1222

^a Each mass is the sum of the masses of the two amino acids, minus the mass of H2O.

The next issue is to determine the mass accuracy required to distinguish the various possible nominally isobaric peptides. Table 4 shows some common nonisomeric isobars (including

the possible posttranslational modifications listed in Table 1), along with their frequency of occurrence, among peptides differing by two amino acids. For example, the most common nominal isobar is that arising from CH4 vs O (differing by 36 mDa). One example of that mass difference is Lys vs Gln, but there are 141 other ways in which the same mass difference can be generated (for example, Tyr + Leu vs Phe + Glu). In contrast, the PH vs S mass difference (9.5 mDa) can arise in only one way (phosphorylation vs sulfation). It is clear from Table 4 that mass accuracy of ∼5 mDa suffices to distinguish virtually all of the possible nominally isobaric peptides, except for isomers. Note that by varying three amino acids per peptide, we found only one more possible elemental difference less than one Dalton, namely, N4O vs S2H8, with a mass difference of 0.45 mDa. Although that mass difference is actually less than the mass of an electron, such a doublet has been resolved experimentally by ESI heterodyne FT-ICR mass spectrometry.38 For a peptide library (natural or combinatorial),39-42 it is common that both the sequence and monomer compositions are not random: ergo, part of the sequence is conserved throughout the whole library. In such a situation, if at most two residues vary throughout the entire library, then Tables 2 and 4 show that more than 70% can be unambiguously identified at a mass accuracy of ∼5 mDa and comparable mass resolution if, as is typically the case, several peptides are present simultaneously. If more than two residues vary, then mass measurement cannot unambiguously identify more than 47% of the peptides. Other methods, such as MS/MS,6,8,9,25,26 must be brought to bear. For protein identification, more than one peptide from a given protein must be observed in order to identify that protein from a database. Experimental Prospects: Resolution of Peptide Mass Doublets. 
Although Tables 2 and 4 show the theoretical limits of peptide identification based solely on mass, it is important to establish whether the required mass accuracy and mass resolution can be achieved experimentally in a broadband mass spectrum of peptides present in a complex mixture (as from an enzymatic digest). We therefore conclude by examining three pairs of peptides separated by smaller and smaller mass differences. Figure 1 demonstrates much better than baseline mass resolution in the broadband ESI FT-ICR mass spectrum of an equimolar mixture of two pure peptides, each ∼1060 Da in mass, differing by a single amino acid substitution, namely, Lys vs Gln (elemental difference CH4 vs O, corresponding to a mass difference of 36 mDa). Lysine and glutamine are the only

two isobaric species (leucine and isoleucine are isomers) among the 20 common amino acids and the CH4 vs O mass difference is the one most frequently encountered. Differentiation based on accurate mass alone between peptides differing by the substitution, Lys vs Gln, has been demonstrated previously, in separate experiments, by orthogonal time-of-flight mass spectrometry.43 However, that instrumentation does not provide the resolution necessary to measure the two peptide masses simultaneously in a mixture, as shown here.

Table 4. Partial List of Isobars (excluding isomers) among Peptides Differing by Two Amino Acids (including 20 naturally occurring and 11 post-translationally modified amino acids)^a

peptide A | peptide B | elemental difference, A | elemental difference, B | mass diff. (mDa) | multiplicity
Leu or Ile and 2 S-H | S-S and aspartic acid | C2H8 | O2 | 72.8 | 21
Phe + Leu | Met + Glu | C5H4 | O2S | 69.4 | 27
Val + Met | 2 Asp | C2H8S | O4 | 55.1 | 20
methylated Lys + Met | Trp + Ser | SH8 | C2O | 39.8 | 3
lysine | glutamine | CH4 | O | 36.4 | 142
Phe | oxidized Met | C4 | OS | 33.0 | 41
Gln + Arg | Phe + His | ON2H4 | C4 | 32.4 | 3
acetylated Lys + Tyr | Trp + Phe | O2H4 | C3 | 21.1 | 20
deamidated Gln + Val | Pro + Met | O2 | S | 17.7 | 26
hydrazidation (NHNH2) | O-methylation (OCH3) | N2 | CO | 11.2 | 58
phosphorylation (PO3H2) | sulfation (SO3H) | PH | S | 9.5 | 1
Pro + Met | 2 Asn | C2H4S | N2O2 | 7.4 | 2
pyroglutamic acid + Glu | His + Cys | CO3 | N2S | 6.5 | 2
Val + Trp | Glu + Arg | C5 | N2O2 | 4.0 | 4
Met + Leu | Pro + Phe | SH4 | C3 | 3.4 | 6

^a Multiplicity is the number of peptide pairs with the same elemental composition difference (see text).

Figure 1. FT-ICR mass spectrum of an equimolar mixture of the amyloid β-protein fragment 25-35 (Aβ25-35) and its single amino acid mutant K28Q (i.e., the lysine at position 28 in Aβ25-35 is replaced by glutamine). Singly protonated peptide (M + H)⁺ ions (including those with ¹³C, ¹⁵N, and/or ¹⁸O atoms) were isolated by SWIFT excitation/ejection (bottom) and the two monoisotopic (M + H)⁺ ions are completely resolved (top, plotted from a separate spectrum). The masses of two monoisotopic peptides differ by 0.036 Da, arising from the elemental composition difference, CH4 vs O.

Figure 2 shows that the same CH4 vs O mass difference can still be baseline-resolved in much larger peptides (monoisotopic masses, 2433.4302 and 2433.3938) by broadband ESI FT-ICR MS. In fact, the doublet can be resolved not only for the monoisotopic species (i.e., all carbons are ¹²C, all nitrogens are ¹⁴N, all oxygens are ¹⁶O, all sulfurs are ³²S, etc.) of the two buforin peptides, but also (not shown) for the same peptides

in which one (or two or three) ¹²C atoms are replaced by ¹³C. That is important, because for peptides of this size, species containing one ¹³C atom are actually more abundant than the monoisotopic species. Also, it is worth noting that FT-ICR mass resolving power increases linearly with charge state,44 so that this higher-mass peptide doublet can still be resolved.

Figure 2. FT-ICR mass spectrum of an equimolar mixture of Buforin II and its single amino acid mutant K21Q (BF K21Q). Two charge states (4+ and 5+) were SWIFT-isolated simultaneously (bottom), and each charge state shows a resolved isotopic distribution (middle). The monoisotopic peaks for the two peptides are baseline-resolved for each of the two charge states (top). The monoisotopic molecular weights of the two neutral peptides are 2433.4302 (buforin) and 2433.3938 (buforin K21Q) Da.

Figure 3 presents a more challenging peptide doublet, namely, two chemically modified forms of the nonapeptide, bradykinin, differing this time by only 11 mDa (CO vs N2). Again, the pure peptides are readily baseline-resolved (Figure 3, left) as an equimolar mixture of pure compounds by stored-waveform inverse Fourier transform (SWIFT)32,33 isolation of the molecular ion region, followed by broadband excitation and detection at a mass resolving power, m/∆m50% ≈ 450 000, in which m is ion mass and ∆m50% is spectral peak full width at half-maximum peak height. A more severe test is provided by broadband detection, without prior SWIFT isolation, of the same equimolar peptides mixed with a pepsin digest of a p19 tumor suppressor protein. Although the relative abundance of the target peptides is much less than many of the peptides in the digest, the two target peptides are nevertheless baseline-resolved at a mass resolving power of 180 000, demonstrating both the high resolution and the large dynamic range of FT-ICR mass spectrometry.

Figure 3. FT-ICR mass spectra of an equimolar mixture of two derivatives of the peptide bradykinin: O-methyl (CO-OCH3) and hydrazide (CO-NH-NH2). The two peptides differ in mass by 0.011 Da, corresponding to the elemental difference, CO vs N2. Left: Resolution of the doubly protonated pure species isolated by SWIFT. Right: Resolution of the same two species present as minor components in a very complex peptide mixture. The two peptide derivatives were mixed with a tumor suppressor protein p19 pepsin digest at 1:5 molar ratio, and acquired in broadband mode. Despite their low relative abundance (bottom right), the two peptides are resolved in an m/z scale-expanded view (top right), demonstrating simultaneous ultrahigh resolution as well as high dynamic range.

The smallest mass difference shown in Table 4 is 3.4 mDa (SH4 vs C3). As a final test, we therefore custom-synthesized the two peptides, ALANGMARSHALL and ALANGPARSHALF, differing by substitution of the amino acids, Pro and Phe for Met and Leu. Figure 4 shows baseline resolution of those two peptides by ESI FT-ICR MS. The doubly protonated peaks were SWIFT-isolated from an equimolar mixture followed by broadband excitation and detection. This time, the resolution of the two peptides is shown for both the monoisotopic (Figure 4, top left) and single-¹³C (Figure 4, top right) species. In both cases, the separate isotopic peaks are well-resolved. Because the 3.4 mDa mass difference is the smallest (except for isomers) that can result from two peptides differing by up to two amino acids (even allowing for 11 common posttranslational modifications), it is evident that virtually all such peptide doublets can be resolved and assigned by broadband ESI FT-ICR mass spectrometry.
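The 3.4 mDa doublet discussed above can be reproduced from elemental compositions with a short Python sketch. The residue formulas and atomic masses are standard values supplied here as assumptions, the helper name `peptide_mass` is hypothetical, and the ratio m/Δm is used only as a rough indication of the resolving power needed; actual peak separation also depends on peak shape and relative abundance.

```python
# Monoisotopic atomic masses in Da (standard tabulated values, assumed here).
ATOMIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
               "O": 15.99491462, "S": 31.97207100}
ELEMENTS = ("C", "H", "N", "O", "S")

# One-letter code -> residue composition (C, H, N, O, S); only residues needed here.
RESIDUES = {
    "A": (3, 5, 1, 1, 0), "L": (6, 11, 1, 1, 0), "N": (4, 6, 2, 2, 0),
    "G": (2, 3, 1, 1, 0), "M": (5, 9, 1, 1, 1), "R": (6, 12, 4, 1, 0),
    "S": (3, 5, 1, 2, 0), "H": (6, 7, 3, 1, 0), "P": (5, 7, 1, 1, 0),
    "F": (9, 9, 1, 1, 0),
}
WATER = 2 * ATOMIC_MASS["H"] + ATOMIC_MASS["O"]


def peptide_mass(sequence):
    """Monoisotopic mass of the neutral peptide: sum of residues plus one H2O."""
    total = WATER
    for residue in sequence:
        counts = RESIDUES[residue]
        total += sum(ATOMIC_MASS[e] * k for e, k in zip(ELEMENTS, counts))
    return total


mass_met_leu = peptide_mass("ALANGMARSHALL")
mass_pro_phe = peptide_mass("ALANGPARSHALF")
delta = mass_met_leu - mass_pro_phe          # SH4 vs C3
rough_resolving_power = mass_met_leu / delta  # m / delta-m, a rough criterion only

print(f"ALANGMARSHALL: {mass_met_leu:.4f} Da")
print(f"ALANGPARSHALF: {mass_pro_phe:.4f} Da")
print(f"difference: {delta * 1000:.2f} mDa")
print(f"m / delta-m: {rough_resolving_power:,.0f}")
```

Under these assumptions the two peptides come out near 1323.7 Da and about 3.4 mDa apart, so m/Δm is on the order of 4 × 10⁵, comparable to the resolving power quoted for the SWIFT-isolated bradykinin doublet.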

Figure 4. Resolution, in a broadband ESI FT-ICR mass spectrum, of a mass difference, 0.0034 Da (corresponding to the elemental difference, C3 vs SH4), for the two peptides, ALANGPARSHALF and ALANGMARSHALL. The mass difference arises from substitution of the amino acids, Pro and Phe for Met and Leu. This example represents the smallest (nonzero) mass difference between peptides differing by at most two unmodified amino acids.

Conclusion

In summary, a surprising number of nominally isobaric peptides are isomers, and thus indistinguishable even by very accurate mass measurement. The good news is that broadband electrospray ionization FT-ICR mass spectrometry can routinely resolve (and thus identify with confidence, even from a complex mixture) even the closest commonly encountered mass doublets for peptides differing by up to two amino acids. We have considered only peptide pairs having the same number of ¹³C atoms (typically zero or one). However, because FT-ICR MS can resolve nominal “isobars” differing in elemental composition (“isotopic fine structure”) for proteins up to 16 kDa in mass,45 it is possible to enhance peptide identification by resolving, e.g., the monoisotopic form of one peptide from the first isotopic species (containing one ¹³C) of another peptide. For example, deamidation (NH2 replaced by OH) vs replacement of ¹²C by ¹³C each increase peptide mass by ∼1 Da, but differ by 19.3 mDa (¹³CNH vs ¹²CO). Other published examples are differentiation between the monoisotopic peak of an asparagine-containing peptide and the first isotopic peak of the same peptide in which Asn is replaced by Ile or Leu, corresponding to a mass difference of 44.5 mDa (¹²CNO vs ¹³C¹²C2H5), and differentiation between the monoisotopic peak of a methionine-containing peptide and the second isotopic peak of the same peptide in which Met is replaced by Glu, corresponding to a mass difference of 8.8 mDa (¹²C2H2S vs ¹³C2O2) in a dodecapeptide library.46 Heterodyne detection in FT-ICR MS allows the resolution of peptides differing by less than an electron mass (Val-Met-Met vs Ser-His-His).38 However, that technique is limited to a relatively small mass window.

Finally, we have assumed equal relative abundances for the 20 common amino acids. In fact, the relative abundances of amino acids in (e.g.) the SWISS-PROT database differ noticeably, ranging from 1.2% for Trp to 9.53% for Leu.47 The actual number and proportion of isomeric peptides will therefore differ somewhat (but not dramatically) from the values in Table 2.
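The three isotopic comparisons quoted in the Conclusion can be checked with simple arithmetic. The sketch below is a minimal illustration; the atomic masses are standard tabulated values supplied here as assumptions, and the variable and function names are hypothetical.

```python
# Monoisotopic atomic masses in Da (standard tabulated values, assumed here).
ATOMIC_MASS = {"12C": 12.0, "13C": 13.00335484, "H": 1.00782503,
               "N": 14.00307401, "O": 15.99491462, "S": 31.97207100}


def mass(formula):
    """Mass of a group of atoms given as {isotope or element: count}."""
    return sum(ATOMIC_MASS[atom] * count for atom, count in formula.items())


C13_SHIFT = ATOMIC_MASS["13C"] - ATOMIC_MASS["12C"]

# Deamidation (NH2 -> OH, net +O -N -H) vs one 12C replaced by 13C.
deamidation_shift = mass({"O": 1}) - mass({"N": 1, "H": 1})
deamidation_vs_13c = C13_SHIFT - deamidation_shift

# Monoisotopic Asn peptide vs first isotopic peak of the Leu/Ile analogue
# (residue change +C2H5 -NO, plus one 13C).
asn_to_leu = mass({"12C": 2, "H": 5}) - mass({"N": 1, "O": 1})
asn_vs_leu_13c = asn_to_leu + C13_SHIFT

# Monoisotopic Met peptide vs second isotopic peak of the Glu analogue
# (residue change +O2 -H2S, plus two 13C).
met_to_glu = mass({"O": 2}) - mass({"H": 2, "S": 1})
met_vs_glu_13c2 = met_to_glu + 2 * C13_SHIFT

for label, value in (("deamidation vs 13C", deamidation_vs_13c),
                     ("Asn vs 13C-Leu", asn_vs_leu_13c),
                     ("Met vs 13C2-Glu", met_vs_glu_13c2)):
    print(f"{label}: {value * 1000:.1f} mDa")
```

The three differences come out at 19.3, 44.5, and 8.8 mDa, matching the values given in the text.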

Acknowledgment. The authors thank Dr. Guillaume van der Rest for many helpful discussions. This work was supported by the NSF National High-Field FT-ICR Mass Spectrometry Facility (CHE 99-09502), the USA Public Health Service (NIH GM-31683), Florida State University, and the National High Magnetic Field Laboratory at Tallahassee, Florida.

References

(1) Henzel, W. J.; Billeci, T. M.; Stults, J. T.; Wong, S. C.; Grimley, C.; Watanabe, C. Identifying proteins from 2-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases, Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 5011-5015.
(2) Mann, M.; Hojrup, P.; Roepstorff, P. Use of mass spectrometric molecular weight information to identify proteins in sequence databases, Biol. Mass Spectrom. 1993, 22, 338-345.
(3) Cox, A. L.; Skipper, J.; Chen, Y.; Henderson, R. A.; Darrow, T. L.; Shabanowitz, J.; Engelhard, V. H.; Hunt, D. F.; Slingluff, C. L. Identification of a peptide recognized by 5 melanoma-specific human cytotoxic T-cell lines, Science 1994, 264, 716-719.
(4) Cottrell, J. S. Protein identification by peptide mass fingerprinting, Pept. Res. 1994, 7, 115-124.
(5) James, P.; Quadroni, M.; Carafoli, E.; Gonnet, G. Protein identification in DNA databases by peptide mass fingerprinting, Protein Sci. 1994, 3, 1347-1350.
(6) Shevchenko, A.; Jensen, O. N.; Podtelejnikov, A. V.; Sagliocco, F.; Wilm, M.; Vorm, O.; Mortensen, P.; Shevchenko, A.; Boucherie, H.; Mann, M. Linking genome and proteome by mass spectrometry: Large-scale identification of yeast proteins from two-dimensional gels, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 14440-14445.
(7) Qin, J.; Fenyo, D.; Zhao, Y. M.; Hall, W. W.; Chao, D. M.; Wilson, C. J.; Young, R. A.; Chait, B. T. A strategy for rapid, high confidence protein identification, Anal. Chem. 1997, 69, 3995-4001.
(8) Jensen, O. N.; Wilm, M.; Shevchenko, A.; Mann, M. Peptide Sequencing of 2-DE gel-isolated proteins by nanoelectrospray tandem mass spectrometry, Methods Mol. Biol. 1999, 112, 571-588.
(9) Yates, J. R., III.; Carmack, E.; Hays, L.; Link, A. J.; Eng, J. K. Automated protein identification using microcolumn liquid chromatography-tandem mass spectrometry, Methods Mol. Biol. 1999, 112, 533-569.
(10) Gygi, S. P.; Corthals, G. L.; Zhang, Y.; Rochon, Y.; Aebersold, R. Evaluation of two-dimensional gel electrophoresis-based proteome analysis technology, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 9890-9895.
(11) Yates, J. R., III. Mass spectrometry. From genomics to proteomics, Trends Genet. 2000, 16, 5-8.
(12) Aebersold, R.; Goodlett, D. R. Mass Spectrometry in Proteomics, Chem. Rev. 2001, 101, 269-295.
(13) Kelleher, N. L.; Lin, H. Y.; Valaskovic, G. A.; Aaserud, D. J.; Fridriksson, E. K.; McLafferty, F. W. Top down versus bottom up protein characterization by tandem high-resolution mass spectrometry, J. Am. Chem. Soc. 1999, 121, 806-812.
(14) Meng, F.; Cargile, B. J.; Miller, L. M.; Forbes, A. J.; Johnson, J. R.; Kelleher, N. L. Informatics and multiplexing of intact protein identification in bacteria and the archaea, Nat. Biotechnol. 2001, 19, 952-957.
(15) Mann, M. Role of mass accuracy in the identification of proteins by the mass spectrometric peptide maps, J. Protein Chem. 1994, 13, 506-507.
(16) Zubarev, R. A.; Hakansson, P.; Sundqvist, B. Accuracy Requirements for Peptide Characterization by Monoisotopic Molecular Mass Measurements, Anal. Chem. 1996, 68, 4060-4063.
(17) Takach, E. J.; Hines, W. M.; Patterson, D. H.; Juhasz, P.; Falick, A. M.; Vestal, M. L.; Martin, S. A. Accurate Mass Measurements Using MALDI-TOF with Delayed Extraction, J. Protein Chem. 1997, 16, 363-369.
(18) Clauser, K. R.; Baker, P.; Burlingame, A. L. Role of accurate mass measurement (±10 ppm) in protein identification strategies employing MS or MS/MS and database searching, Anal. Chem. 1999, 71, 2871-2882.
(19) Berndt, P.; Hobohm, U.; Langen, H. Reliable automatic protein identification from matrix-assisted laser desorption/ionization mass spectrometric peptide fingerprints, Electrophoresis 1999, 20, 3521-3526.
(20) Conrads, T. P.; Anderson, G. A.; Veenstra, T. D.; Pasa-Tolic, L.; Smith, R. D. Utility of Accurate Mass Tags for Proteome-Wide Protein Identification, Anal. Chem. 2000, 72, 3349-3354.
(21) Falick, A. M.; Hawke, D. H.; Hall, S. C. Mass Spectrometry in combination with less-specific enzymes for protein mapping, In Proc. 47th ASMS Conf. on Mass Spectrometry & Allied Topics: Dallas, TX, 1999.


Journal of Proteome Research • Vol. 3, No. 1, 2004

(22) Palmblad, M.; Wetterhall, M.; Markides, K.; Hakansson, P.; Bergquist, J. Analysis of enzymatically digested proteins and protein mixtures using a 9.4 T Fourier transform ion cyclotron mass spectrometer, Rapid Commun. Mass Spectrom. 2000, 14, 1029-1034.
(23) Green, M. K.; Johnston, M. V.; Larsen, B. S. Mass accuracy and sequence requirements for protein database searching, Anal. Biochem. 1999, 275, 39-46.
(24) Eriksson, J.; Chait, B. T.; Fenyo, D. A Statistical Basis for Testing the Significance of Mass Spectrometric Protein Identification Results, Anal. Chem. 2000, 72, 999-1005.
(25) Gevaert, K.; Verschelde, J. L.; Puype, M.; VanDamme, J.; Goethals, M.; DeBoeck, S.; Vandekerckhove, J. Structural analysis and identification of gel-separated proteins, available in the femtomole range, using a novel computer program for peptide sequence assignment, by matrix-assisted laser desorption ionization time-of-flight mass spectrometry, Electrophoresis 1996, 17, 918-924.
(26) van der Rest, G.; He, F.; Emmett, M. R.; Marshall, A. G.; Gaskell, S. Gas-phase cleavage of Edman-derivatized electrosprayed tryptic peptides: mass-based protein identification without liquid chromatographic separation, J. Am. Soc. Mass Spectrom. 2001, 12, 288-295.
(27) Palmblad, M.; Ramstrom, M.; Markides, K. E.; Hakansson, P.; Bergquist, J. Prediction of chromatographic retention and protein identification in liquid chromatography/mass spectrometry, Anal. Chem. 2002, 74, 5826-5830.
(28) Petritis, K.; Kangas, L. J.; Ferguson, P. L.; Anderson, G. A.; Pasa-Tolic, L.; Lipton, M. S.; Auberry, K. J.; Strittmatter, E. F.; Shen, Y.; Zhao, R.; Smith, R. D. Use of artificial neural networks for the accurate prediction of peptide liquid chromatography elution times in proteome analyses, Anal. Chem. 2003, 75, 1039-1048.
(29) Senko, M. W.; Hendrickson, C. L.; Pasa-Tolic, L.; Marto, J. A.; White, F. M.; Guan, S.; Marshall, A. G. Electrospray ionization FT-ICR mass spectrometry at 9.4 T, Rapid Commun. Mass Spectrom. 1996, 10, 1824-1828.
(30) Senko, M. W.; Hendrickson, C. L.; Emmett, M. R.; Shi, S. D.-H.; Marshall, A. G. External accumulation of ions for enhanced electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry, J. Am. Soc. Mass Spectrom. 1997, 8, 970-976.
(31) Emmett, M. R.; Caprioli, R. M. Micro-electrospray mass spectrometry - ultrahigh-sensitivity analysis of peptides and proteins, J. Am. Soc. Mass Spectrom. 1994, 5, 605-613.
(32) Marshall, A. G.; Wang, T.-C. L.; Ricca, T. L. Tailored Excitation for Fourier transform ion cyclotron resonance mass spectrometry, J. Am. Chem. Soc. 1985, 107, 7893-7897.
(33) Guan, S.; Marshall, A. G. Stored waveform inverse Fourier transform (SWIFT) ion excitation in trapped-ion mass spectrometry: theory and applications, Int. J. Mass Spectrom. Ion Processes 1996, 157/158, 5-37.
(34) Ledford, E. B., Jr.; Rempel, D. L.; Gross, M. L. Space charge effects in Fourier transform mass spectrometry mass calibration, Anal. Chem. 1984, 56, 2744-2748.
(35) Shi, S. D.-H.; Drader, J. J.; Freitas, M. A.; Hendrickson, C. L.; Marshall, A. G. Comparison and interconversion of the two most common frequency-to-mass calibration functions for Fourier transform ion cyclotron resonance mass spectrometry, Int. J. Mass Spectrom. 2000, 195/196, 591-598.
(36) Senko, M. W.; Canterbury, J. D.; Guan, S.; Marshall, A. G. A High-performance modular data system for FT-ICR mass spectrometry, Rapid Commun. Mass Spectrom. 1996, 10, 1839-1844.
(37) Korn, G. A.; Korn, T. M. Manual of Mathematics; McGraw-Hill Book Company: New York, 1967.
(38) He, F.; Hendrickson, C. L.; Marshall, A. G. Baseline mass resolution of peptide isobars: a new record for molecular mass resolution, Anal. Chem. 2001, 73, 647-650.
(39) Lambert, P. H.; Boutin, J. A.; Bertin, S.; Fauchere, J. L.; Volland, J. P. Evaluation of high performance liquid chromatography electrospray mass spectrometry with selected ion monitoring for the analysis of large synthetic combinatorial peptide libraries, Rapid Commun. Mass Spectrom. 1997, 11, 1971-1976.
(40) Fang, A. S.; Vouros, P.; Stacey, C. C.; Kruppa, G. H.; Laukien, F. H.; Wintner, E. A.; Carell, T.; Rebek, J. Rapid characterization of combinatorial libraries using electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry, Comb. Chem. High Throughput Screen. 1998, 1, 23-33.
(41) Stevenson, C. L.; Augustijns, P. F.; Hendren, R. W. Use of caco-2 cells and LC/MS/MS to screen a peptide combinatorial library for permeable structures, Int. J. Pharm. 1999, 177, 103-115.


(42) Schmid, D. G.; Grosche, P.; Jung, G. High-resolution analysis of a 144-membered pyrazole library from combinatorial solid-phase synthesis by using electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry, Rapid Commun. Mass Spectrom. 2001, 15, 341-347.
(43) Bahr, U.; Karas, M. Differentiation of “isobaric” peptides and human milk oligosaccharides by exact mass measurements using electrospray ionization orthogonal time-of-flight analysis, Rapid Commun. Mass Spectrom. 1999, 13, 1052-1058.
(44) Marshall, A. G.; Hendrickson, C. L. Charge reduction lowers mass resolving power for isotopically resolved electrospray ionization Fourier transform ion cyclotron resonance mass spectra, Rapid Commun. Mass Spectrom. 2001, 15, 232-235.
(45) Shi, S. D.-H.; Hendrickson, C. L.; Marshall, A. G. Counting individual sulfur atoms in a protein by ultrahigh-resolution Fourier transform ion cyclotron resonance mass spectrometry: experimental resolution of isotopic fine structure in proteins, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 11532-11537.
(46) Ramjit, H. G.; Kruppa, G. H.; Spier, J. P.; Ross, C. W. I.; Garsky, V. M. The significance of monoisotopic and carbon-13 isobars for the identification of a 19-component dodecapeptide library by positive ion electrospray Fourier transform ion cyclotron resonance mass spectrometry, Rapid Commun. Mass Spectrom. 2000, 14, 1368-1376.
(47) Bairoch, A.; Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000, Nucleic Acids Res. 2000, 28, 45-48.

PR034058Z
