Anal. Chem. 2003, 75, 6251-6264

Mining a Tandem Mass Spectrometry Database To Determine the Trends and Global Factors Influencing Peptide Fragmentation

Eugene A. Kapp,† Frédéric Schütz,‡ Gavin E. Reid,† James S. Eddes,† Robert L. Moritz,† Richard A. J. O’Hair,§ Terence P. Speed,‡,| and Richard J. Simpson*,†

Joint ProteomicS Laboratory, Ludwig Institute for Cancer Research and The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3050, Australia, Division of Genetics and Bioinformatics, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3050, Australia, School of Chemistry, University of Melbourne, Melbourne, Victoria 3010, Australia, and Department of Statistics, University of California, Berkeley, California 94720

A database of 5500 unique peptide tandem mass spectra acquired in an ion trap mass spectrometer was assembled for peptides derived from proteins digested with trypsin. Peptides were identified initially from their tandem mass spectra by the SEQUEST algorithm and subsequently validated manually. Two different statistical methods were used to identify sequence-dependent fragmentation patterns that could be used to improve fragmentation models incorporated into current peptide sequencing and database search algorithms. The currently accepted “mobile proton” model was expanded to derive a new classification scheme for peptide mass spectra, the “relative proton mobility” scale, which considers peptide ion charge state and amino acid composition to categorize peptide mass spectra into peptide ions containing “nonmobile”, “partially mobile”, or “mobile” protons. Quantitation of amide bond fragmentation, both N- and C-terminal to any given amino acid, as well as the positional effect of an amino acid in a peptide and peptide length on such fragmentation, has been determined. Peptide bond cleavage propensities, both positive (i.e., enhanced) and negative (i.e., suppressed), were determined and ranked in order of their cleavage preferences as primary, secondary, or tertiary cleavage effects. For example, primary positive cleavage effects were observed for Xaa-Pro and Asp-Xaa bond cleavage for mobile and nonmobile peptide ion categories, respectively. We also report specific pairwise interactions (e.g., Asn-Gly) that result in enhanced amide bond cleavages analogous to those observed in solution-phase chemistry. Peptides classified as nonmobile gave low or insignificant scores, below reported MS/MS score thresholds (cutoff filters), indicating that incorporation of the relative proton mobility scale classification would lead to improvements in current MS/MS scoring functions.

* Corresponding author. Phone: +61-3-9341-3110. Fax: +61-3-9341-3192. E-mail: [email protected].
† Ludwig Institute for Cancer Research and The Walter and Eliza Hall Institute of Medical Research.
‡ Division of Genetics and Bioinformatics, The Walter and Eliza Hall Institute of Medical Research.
§ University of Melbourne.
| University of California.
10.1021/ac034616t CCC: $25.00 © 2003 American Chemical Society. Published on Web 10/03/2003.
Analytical Chemistry, Vol. 75, No. 22, November 15, 2003, 6251

Proteomics is playing a pivotal role in the postgenome era in helping to define the functional role of genes.1,2 Mass spectrometry (MS), coupled with a range of electrophoretic and multidimensional chromatographic separation techniques, has emerged as a key platform technology in proteomics for the rapid and high-throughput identification, characterization, and quantitation of proteins.3 Typically, proteins are digested using trypsin and the resultant peptides then subjected to MS analysis. The tryptic peptide masses provide a characteristic “mass fingerprint”, which can be used to identify proteins.4-10 Although this approach is useful for identifying proteins in simple mixtures (e.g., 2-DE gel spots), peptide sequence information, obtained by tandem mass spectrometry (MS/MS), is required for identifying individual proteins in more complex mixtures (e.g., 1-DE gels). To this end, sophisticated algorithms (e.g., SEQUEST, Mascot) have been developed for identifying proteins from peptide MS/MS data,11-14 whereby peptides are identified by correlating the uninterpreted MS/MS spectra with simulated (predicted) product ion spectra derived from peptides of the same mass contained in the available databases, using a relatively simple set of previously defined parameters regarding the expected fragmentation behavior of protonated peptide ions. Alternatively, proteins may be identified by a combination of peptide mass and partial sequence information, i.e., the “sequence tag” approach.15 For proteins not contained within sequence databases, it is necessary to determine partial or complete amino acid sequences using either manual16-18 or automated19-22 de novo peptide sequence analysis methods.23

While the above-mentioned algorithms for protein identification from peptide MS/MS data have enjoyed considerable success, the utility of these is directly related to the quality of the product ion spectra. Thus, if product ion spectra are formed that are not readily interpretable, low or insignificant search scores can result. This problem is exacerbated when a single peptide MS/MS spectrum is to be used for protein identification. In these instances, manual interrogation of the search results is required for data validation. Hence, a more detailed understanding of the factors influencing the gas-phase fragmentation of protonated peptides would allow the development of more robust search and scoring algorithms.

Determination of the gas-phase fragmentation mechanisms of peptide ions has been the subject of significant interest over the years.24,25 These studies have been critical to understanding the fundamental gas-phase chemistry of peptide ions and for the development of a general model describing how peptides fragment in the gas phase.

(1) Pennington, S. R.; Dunn, M. J. Proteomics: from protein sequence to function, 1st ed.; BIOS Scientific Publishers Ltd.: New York, 2001.
(2) Simpson, R. J. Proteins and Proteomics: A Laboratory Manual, 1st ed.; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY, 2003.
(3) Aebersold, R.; Goodlett, D. R. Chem. Rev. 2001, 101, 269-95.
(4) Henzel, W. J.; Billeci, T. M.; Stults, J. T.; Wong, S. C.; Grimley, C.; Watanabe, C. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 5011-5.
(5) James, P.; Quadroni, M.; Carafoli, E.; Gonnet, G. Biochem. Biophys. Res. Commun. 1993, 195, 58-64.
(6) Mann, M.; Hojrup, P.; Roepstorff, P. Biol. Mass Spectrom. 1993, 22, 338-45.
(7) Pappin, D. J.; Hojrup, P.; Bleasby, A. J. Curr. Biol. 1993, 3, 327-32.
(8) Yates, J. R., III; Speicher, S.; Griffin, P. R.; Hunkapiller, T. Anal. Biochem. 1993, 214, 397-408.
(9) Zhang, W.; Chait, B. T. Anal. Chem. 2000, 72, 2482-9.
(10) Clauser, K. R.; Baker, P.; Burlingame, A. L. Anal. Chem. 1999, 71, 2871-82.
(11) Eng, J. K.; McCormack, A. L.; Yates, J. R., III. J. Am. Soc. Mass Spectrom. 1994, 5, 976-89.
(12) Yates, J. R., III; Eng, J. K.; McCormack, A. L. Anal. Chem. 1995, 67, 3202-10.
(13) Yates, J. R., III; Eng, J. K.; McCormack, A. L.; Schieltz, D. Anal. Chem. 1995, 67, 1426-36.
(14) Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Electrophoresis 1999, 20, 3551-67.
Cleavage of amide bonds under low-energy collisional activation conditions is generally thought to be initiated by migration of the charge from the initial site of protonation (e.g., the N-terminal amino group or the side chains of basic amino acids such as arginine, lysine, and histidine) to an amide carbonyl oxygen along the peptide backbone. This has been termed the “mobile proton” hypothesis and is one of the central tenets in peptide fragmentation mechanisms.24,26,27 Fragmentation of the peptide amide bond then occurs by neighboring group attack from an adjacent nucleophilic amide carbonyl group to yield complementary N-terminal b-type, C-terminal y-type, or both28,29 product ions, i.e., a “charge-directed” process. Additional fragmentation pathways, such as those resulting in the formation of a-type ions by the loss of CO from b-type ions, the loss of small molecules such as NH3 or H2O from amino acid side chains or the peptide backbone,25 the loss of diagnostic side-chain fragments30-32 (e.g., methanesulfonic acid from methionine sulfoxide or phosphoric acid from phosphoserine or phosphothreonine), sequential fragmentations,33 or intramolecular rearrangement product ions,34-36 may also contribute to the resultant product ion spectrum.

(15) Mann, M.; Wilm, M. Anal. Chem. 1994, 66, 4390-9.
(16) Hunt, D. F.; Yates, J. R.; Shabanowitz, J.; Winston, S.; Hauer, C. R. Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 6233-7.
(17) Papayannopoulos, I. A.; Biemann, K. Acc. Chem. Res. 1994, 27, 370-8.
(18) Papayannopoulos, I. A. Mass Spectrom. Rev. 1995, 14, 49-73.
(19) Johnson, R. S.; Martin, S. A.; Biemann, K.; Stults, J. T.; Watson, J. T. Anal. Chem. 1987, 59, 2621-5.
(20) Taylor, J. A.; Johnson, R. S. Rapid Commun. Mass Spectrom. 1997, 11, 1067-75.
(21) Dancik, V.; Addona, T. A.; Clauser, K. R.; Vath, J. E.; Pevzner, P. A. J. Comput. Biol. 1999, 6, 327-42.
(22) Chen, T.; Kao, M. Y.; Tepel, M.; Rush, J.; Church, G. M. J. Comput. Biol. 2001, 8, 325-37.
(23) Verhagen, A. M.; Ekert, P. G.; Pakusch, M.; Silke, J.; Connolly, L. M.; Reid, G. E.; Moritz, R. L.; Simpson, R. J.; Vaux, D. L. Cell 2000, 102, 43-53.
(24) Wysocki, V. H.; Tsaprailis, G.; Smith, L. L.; Breci, L. A. J. Mass Spectrom. 2000, 35, 1399-406.
(25) O’Hair, R. A. J. Mass Spectrom. 2000, 35, 1377-81.
(26) Dongre, A. R.; Jones, J. L.; Somogyi, A.; Wysocki, V. H. J. Am. Chem. Soc. 1996, 118, 8365-74.
(27) Cox, K. A.; Gaskell, S. J.; Morris, M.; Whiting, A. J. Am. Soc. Mass Spectrom. 1996, 7, 759.
(28) Biemann, K. Biomed. Environ. Mass Spectrom. 1988, 16, 99-111.
(29) Roepstorff, P.; Fohlman, J. Biomed. Mass Spectrom. 1984, 11, 601.

The presence of particular amino acid residues located within a peptide sequence can act to enhance the degree of fragmentation occurring at a given amide bond. When the ionizing proton is freely “mobile”, enhanced cleavage is commonly observed at Xaa-Pro bonds, presumably due to the higher local proton affinity of the proline imide bond compared to a conventional amide bond.16,37-39 In contrast, when ionizing protons are strongly “sequestered” at the initial site of protonation (i.e., “nonmobile”), for example, the [M + H]+ ion of an arginine-containing peptide, poor fragmentation is commonly observed.
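As a concrete illustration of the b-, y-, and a-type ion series introduced above, the following minimal Python sketch computes their m/z values for every amide bond of a peptide. It is not part of the original study: the function name `fragment_ions` and the example sequence are hypothetical, the residue masses are standard monoisotopic values, and only unmodified residues and simple backbone ions are considered (no neutral losses or rearrangements).

```python
# Monoisotopic residue masses in Da (standard reference values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728   # mass of one ionizing proton
WATER = 18.01056   # H2O, retained by the C-terminal (y-type) fragment
CO = 27.99491      # carbon monoxide, lost from a b-type ion to give an a-type ion


def fragment_ions(sequence, charge=1):
    """Return one dict per amide bond with b-, y-, and a-type ion m/z values.

    Hypothetical helper for illustration only.
    """
    masses = [RESIDUE_MASS[residue] for residue in sequence]
    total = sum(masses)
    ions = []
    prefix = 0.0
    for site in range(1, len(sequence)):
        prefix += masses[site - 1]
        b_neutral = prefix                   # N-terminal residues only
        y_neutral = total - prefix + WATER   # C-terminal residues plus H2O
        ions.append({
            "site": site,
            "bond": sequence[site - 1] + "-" + sequence[site],
            "b": (b_neutral + charge * PROTON) / charge,
            "y": (y_neutral + charge * PROTON) / charge,
            "a": (b_neutral - CO + charge * PROTON) / charge,
        })
    return ions


if __name__ == "__main__":
    # Hypothetical tryptic peptide containing an Asp-Xaa and an Xaa-Pro bond.
    for ion in fragment_ions("GADLPK"):
        print(ion["site"], ion["bond"],
              round(ion["b"], 4), round(ion["y"], 4), round(ion["a"], 4))
```

Each amide bond yields a complementary b/y pair, so a spectrum dominated by a single preferred cleavage (for example, at an Asp-Xaa bond) provides far fewer of these sequence ions than the full ladder listed here.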
In this case, if an aspartic acid residue is present, preferential fragmentation of the Asp-Xaa bond may occur.40-43 A mechanism for this type of cleavage has been proposed to involve proton transfer from the aspartyl side chain to the adjacent amide bond to form a salt bridge structure, which then undergoes cleavage to yield the product ions.44 In this fragmentation pathway, the ionizing proton is not involved, making this type of cleavage a “charge-remote” process in contrast to the “charge-directed” process described above.25

In a complementary approach to the mechanism-based studies, several groups have systematically examined small databases of tryptic peptide MS/MS spectra to evaluate the effects of specific residues or charge states on fragmentation and to obtain further insights into the “mobile proton” model and peptide fragmentation behavior.45-47 For example, a recent analysis of 505 doubly charged tryptic peptides has shown that cleavage is more prominent at Asp-Xaa bonds for peptides that also contain an internal histidine residue, supporting the idea that the extent of cleavage observed C-terminal to aspartic acid is more pronounced if there is a basic residue in the peptide that can sequester available protons.46 More recently, the laboratories of Wysocki and Yates have carried out a statistical analysis of the fragmentation behavior of a database of selected doubly charged tryptic peptides.48 This work led to the development of a relative cleavage preference scale for fragmentations occurring at the N- or C-terminal side of each amino acid within the peptide sequence, as well as an examination of factors that influence the appearance of product ions corresponding to the neutral loss of ammonia and water. Moreover, the results from this study confirmed previous observations relating to preferred fragmentation of Xaa-Pro bonds for doubly protonated peptide ions,49 as well as noting enhanced fragmentation at His-Xaa and at Xaa-Gly and Xaa-Ser bonds. A mechanistic rationale to explain preferential cleavage at the C-terminal side of histidine has been described previously.24,50

Although the study by the Wysocki and Yates laboratories represents the most comprehensive peptide MS/MS database analysis carried out to date, the results represent only a subset of the types of peptides that are commonly encountered in a typical “tryptic” digest. The spectra in this database consisted of doubly protonated peptides, filtered to exclude those peptides containing less than 50% of the expected sequence ions, as well as those containing internal lysine or arginine residues. Typically, the amino acid sequences of such peptides can be readily identified from their MS/MS spectra by automated search algorithms due to the favorable fragmentation behavior of the doubly protonated precursor ions, which generally yield an extensive series of singly charged N- and C-terminal sequence ions.49,51,52 It is widely acknowledged that the precursor ion charge state and amino acid composition can have a dramatic, often detrimental, effect on the formation of sufficient product ions to enable subsequent identification of the peptide.

(30) Lagerwerf, F. M.; van de Weert, M.; Heerma, W.; Haverkamp, J. Rapid Commun. Mass Spectrom. 1996, 10, 1905-10.
(31) O’Hair, R. A. J.; Reid, G. E. Eur. Mass Spectrom. 1999, 5, 325-34.
(32) Simpson, R. J.; Connolly, L. M.; Eddes, J. S.; Pereira, J. J.; Moritz, R. L.; Reid, G. E. Electrophoresis 2000, 21, 1707-32.
(33) Ballard, K. D.; Gaskell, S. J. Int. J. Mass Spectrom. 1991, 111, 173-89.
(34) Thorne, G. C.; Ballard, K. D.; Gaskell, S. J. J. Am. Soc. Mass Spectrom. 1990, 1, 249-57.
(35) Yague, J.; Paradela, A.; Ramos, M.; Ogueta, S.; Marina, A.; Barahona, F.; Lopez de Castro, J. A.; Vazquez, J. Anal. Chem. 2003, 75, 1524-35.
(36) Farrugia, J. M.; O’Hair, R. A. J. Int. J. Mass Spectrom. 2003, 222, 229-42.
(37) Schwartz, B. L.; Bursey, M. M. Biol. Mass Spectrom. 1992, 21, 92-6.
(38) Loo, J. A.; Edmonds, C. G.; Smith, R. D. Anal. Chem. 1993, 65, 425-38.
(39) Addario, V.; Guo, Y.; Chu, I. K.; Ling, Y.; Ruggerio, G.; Rodriquez, C. F.; Hopkinson, A. C.; Siu, K. W. Int. J. Mass Spectrom. 2002, 219, 101-14.
(40) Qin, J.; Chait, B. T. J. Am. Chem. Soc. 1995, 117, 5411-2.
(41) Qin, J.; Chait, B. T. Int. J. Mass Spectrom. 1999, 191, 313-20.
(42) Yu, W.; Vath, J. E.; Huberty, M. C.; Martin, S. A. Anal. Chem. 1993, 65, 3015-23.
(43) Sullivan, A. G.; Brancia, F. L.; Tyldesley, R.; Bateman, R.; Sidhu, K.; Hubbard, S. J.; Oliver, S. G.; Gaskell, S. J. Int. J. Mass Spectrom. 2001, 210, 665-76.
(44) Lee, S. W.; Kim, H. S.; Beauchamp, J. J. Am. Chem. Soc. 1998, 120, 3188-95.
(45) van Dongen, W. D.; Ruijters, H. F.; Luinge, H. J.; Heerma, W.; Haverkamp, J. J. Mass Spectrom. 1996, 31, 1156-62.
(46) Huang, Y.; Wysocki, V. H.; Tabb, D. L.; Yates, J. R. Int. J. Mass Spectrom. 2002, 219, 233-44.
(47) Breci, L. A.; Tabb, D. L.; Yates, J. R., III; Wysocki, V. H. Anal. Chem. 2003, 75, 1963-71.
Singly charged peptide ions, particularly those terminated by an arginine residue, generally yield less sequence information, with significant neutral loss and other rearrangement processes contributing to the resultant product ion spectrum. Preferential Asp-Xaa bond cleavage, leading to the formation of a single dominant product ion, may also limit the ability of conventional search algorithms to correctly identify these peptides. A number of novel chemical and bioinformatic tools have subsequently been developed to exploit the appearance of this cleavage for protein identification.43,53,54 Tryptic peptides containing internal lysine or arginine residues generally yield triply or quadruply charged precursor ions upon electrospray ionization, which may also be detrimental to their fragmentation behavior.49 Thus, an examination of factors such as the effect of charge state,55,56 proton mobility,24 peptide composition,26 site-specific amino acid locations,57 and peptide conformation58,59 as well as product ion abundances should lead to a more global understanding of peptide fragmentation and to improved methods for protein identification by automated search algorithms.

Here, we have developed a comprehensive database of 5500 unique, manually validated, tryptic peptide MS/MS spectra. Comprehensive analysis of these spectra reveals important trends and factors regarding the fragmentation behavior of protonated peptide ions. The results of this study will enable the derivation of predictive models for MS/MS peptide ion fragmentation that can be incorporated into existing and new search algorithms.

(48) Tabb, D. L.; Smith, L. L.; Breci, L.; Wysocki, V. H.; Lin, D. Y.; Yates, J. R. Anal. Chem. 2003, 75, 1155-63.
(49) Tang, X. J.; Thibault, P.; Boyd, R. K. Anal. Chem. 1993, 65, 2824-34.
(50) Farrugia, J. M.; O’Hair, R. A. J.; Reid, G. E. Int. J. Mass Spectrom. 2001, 210, 71-87.
(51) Hunt, D. F.; Zhu, N.-Z.; Shabanowitz, J. Rapid Commun. Mass Spectrom. 1989, 3, 122-4.
(52) Eddes, J. S.; Kapp, E. A.; Frecklington, D. F.; Connolly, L. M.; Layton, M. J.; Moritz, R. L.; Simpson, R. J. Proteomics 2002, 2, 1097-103.
(53) Brancia, F. L.; Butt, A.; Beynon, R. J.; Hubbard, S. J.; Gaskell, S. J.; Oliver, S. G. Electrophoresis 2001, 22, 552-9.
(54) Sidhu, K. S.; Sangvanich, P.; Brancia, F. L.; Sullivan, A. G.; Gaskell, S. J.; Wolkenhauer, O.; Oliver, S. G.; Hubbard, S. J. Proteomics 2001, 1, 1368-77.

EXPERIMENTAL SECTION

The peptide CID MS/MS spectra in the database compiled for this study were acquired by LC/MS/MS60,61 on a quadrupole ion trap mass spectrometer (model LCQ; Finnigan, San Jose, CA) equipped with electrospray ionization,32,62 following tryptic digestion of a diverse range of membrane and cytosolic proteins from several tissue types and cancer cell lines. The amino acid sequences of the uninterpreted CID MS/MS spectra were determined by correlation with predicted spectra of peptide sequences present in a nonredundant protein sequence database (950 000 proteins)52 using the SEQUEST algorithm.11 All SEQUEST outputs were then manually validated and assigned by the investigator as (i) a positive identification (see below), (ii) “potential de novo” (false positive but good-quality spectrum, i.e., spectra containing at least four consecutive product ions with the potential to yield a tripeptide sequence), (iii) “poor spectrum” (spectrum too complex or low signal-to-noise ratio), or (iv) “nonpeptide” (background or matrix-related contaminant) using an interactive in-house program, CHOMPER,52 prior to submission to a relational database.
This program incorporates the cutoff filters or thresholds developed by Yates and co-workers63,64 as a guide to highlight probable positive identifications. Peptide spectra from proteins with multiple peptide matches were assigned first. For these proteins, peptide spectra with scores above the thresholds were quickly visualized for spectral quality and y-type and b-type ion matching continuity. Peptide spectra with scores below accepted threshold values63 or with DelCn values below 0.1 were examined more closely, first, for significant signal-to-noise ratio (product ions clearly above baseline noise) and, second, for whether dominant peaks could be easily explained and annotated on the basis of a preferred cleavage (e.g., Xaa-Pro or Asp-Xaa bond cleavage) or precursor ion neutral loss. Where proteins were identified by a single peptide spectrum, the score of the matched peptide had to exceed threshold values in order to be included as a positive identification in the spectral database. Each entry therefore contains all the information from the SEQUEST identification data, in addition to the raw spectral information (i.e., the masses and abundances of all the product ions observed).

(55) Burlet, O.; Orkiszewski, R. S.; Ballard, K. D.; Gaskell, S. J. Rapid Commun. Mass Spectrom. 1992, 6, 658-62.
(56) Downard, K. M.; Biemann, K. J. Am. Soc. Mass Spectrom. 1994, 5, 966-75.
(57) Kinter, M.; Sherman, N. E. Protein Sequencing and Identification Using Tandem Mass Spectrometry; Kinter, M., Sherman, N. E., Eds.; Wiley-Interscience: New York, 2000; Chapter 4.
(58) Loo, J. A.; He, J. X.; Cody, W. L. J. Am. Chem. Soc. 1998, 120, 4542-3.
(59) Tsaprailis, G.; Nair, H.; Somogyi, A.; Wysocki, V. H.; Zhong, W. Q.; Futrell, J. H.; Summerfield, S. G.; Gaskell, S. J. J. Am. Chem. Soc. 1999, 121, 5142-54.
(60) Moritz, R. L.; Reid, G. E.; Ward, L. D.; Simpson, R. J. Methods: Companion Methods Enzymol. 1994, 6, 213-26.
(61) Reid, G. E.; Rasmussen, R. K.; Dorow, D. S.; Simpson, R. J. Electrophoresis 1998, 19, 946-55.
(62) Ji, H.; Reid, G. E.; Moritz, R. L.; Eddes, J. S.; Burgess, A. W.; Simpson, R. J. Electrophoresis 1997, 18, 605-13.
(63) Link, A. J.; Eng, J.; Schieltz, D. M.; Carmack, E.; Mize, G. J.; Morris, D. R.; Garvik, B. M.; Yates, J. R., III. Nat. Biotechnol. 1999, 17, 676-82.
(64) Washburn, M. P.; Wolters, D.; Yates, J. R., III. Nat. Biotechnol. 2001, 19, 242-7.

Content of the MS/MS Database. The current database contains 16 165 peptide MS/MS spectra of which 11 145 have been positively identified (∼69% of total). Of these, 5500 are unique (i.e., where duplicate spectra of the same charge state but lower SEQUEST cross-correlation scores have been removed). Of the unique spectra, 1127 (∼21%) are singly charged, 3388 (∼62%) doubly charged, 876 (∼16%) triply charged, and 108 (∼2%) quadruply charged (see Figure 1). Over half of the unique spectra (3135, 57%) correspond to true tryptic peptides, i.e., containing a single lysine or arginine residue at the C-terminus, with 1993 and 1142 containing or lacking an internal histidine residue, respectively, while 2365 entries (43%) represent spectra with either one (1806 spectra), two (523 spectra), or three (36 spectra) missed tryptic cleavage sites.

Figure 1. Charge-state and relative proton mobility scale classification of the 5500 unique peptides analyzed in the data mining study (for definition of the RPM scale see the Results and Discussion section). Electrospray ionization and tryptic digestion conditions give rise to predominantly doubly charged peptides (3388, ∼62%). Of these, 1115 are classified as mobile, 2035 are classified as partially mobile, and 238 are classified as nonmobile. Singly and triply charged peptides account for ∼21% and ∼16% of the total number of peptides, respectively.

Statistical Analysis Using Cleavage Intensity Ratio (CIR) Calculations. A program, “FragX”, was developed to extract matched b- and y-type ion masses and abundances for all product ion charge states up to and including the precursor ion charge state for all unique peptide MS/MS spectra contained within the database. Product ion abundances were then normalized with respect to charge state (i.e., intensity divided by charge), assuming mass detection response to be linear with respect to the charge state of the ion, to better represent the actual relative abundances of the various amide bond fragmentation pathways. Further processing of each entry in the table was then performed by summing the normalized intensities for all the b- and y-type ions corresponding to each particular amide bond cleavage site along the peptide backbone. Then, to quantify the extent of fragmentation occurring both N- and C-terminal to each amino acid residue within the peptide (similar to that carried out by Huang et al.46), a CIR was calculated by dividing the summed intensity for each amide bond cleavage site by the average intensity of all cleavage sites within the peptide as shown in eq 1,

\[
\mathrm{CIR}_s = \frac{\sum_{z=1}^{Z} \left( b_s^{z+} + y_s^{z+} \right)}{\frac{1}{N} \sum_{i=1}^{N} \sum_{z=1}^{Z} \left( b_i^{z+} + y_i^{z+} \right)} \qquad (1)
\]

where N is the number of cleavage sites, s is the cleavage site of interest (1 ≤ s ≤ N), Z is the charge state of the precursor ion, and b_i^{z+} and y_i^{z+} are the normalized intensities of the b- and y-type ions with charge z at the ith cleavage site, respectively. The average intensities of the summed b- and y-type ion abundances were used for these calculations rather than the product ion spectrum’s total ion current in order to allow quantitative analysis of the “backbone” fragmentation processes, separate from any effects caused by other fragmentation pathways, such as those giving rise to neutral loss product ions, as well as to eliminate adverse effects from those peptide spectra with low signal-to-noise ratios. For each peptide, the average CIR value was determined, and then for a given set of peptides, an average CIR value was determined from the individual peptide average CIR values. An enhanced cleavage is indicated by an average CIR value greater than 1.0, while values less than 1.0 indicate reduced cleavage.

Due to the low-mass cutoff (LMCO) effect encountered during CID MS/MS experiments in quadrupole ion trap instruments (product ions having m/z values ∼30% or less of that of the precursor ion are not stable in the trap and are therefore not observed in the product ion spectrum), there is an inherent bias against the detection of low-mass b- and y-type product ions. Therefore, CIR values were calculated based on two separate data sets: (i) a “no cutoff” data set, where CIR values were determined for all amide bond cleavage sites, regardless of whether one of the potential product ions for that cleavage site fell below the LMCO of the instrument; and (ii) a “cutoff” data set where CIR values were not determined for a particular amide bond cleavage site if one of the potential product ions fell below the LMCO of the instrument.

Statistical Analysis Using a Linear Model. Scripts used to process the FragX output were developed using Perl (version 5.6.0, http://www.perl.com). Prior to statistical analysis, the summed intensity for each cleavage site was normalized (i.e., intensity divided by the total intensity of all cleavage sites, as for the CIR calculation but without the 1/N averaging factor) to determine the relative cleavage intensity. Statistical analysis was performed (statistical package R, version 1.5.0, http://www.r-project.org) using the following linear model,

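\[
\begin{aligned}
\log_2(\text{cleavage intensity}) ={}& \text{baseline cleavage intensity} \\
&+ (\text{increase/decrease due to residue on C-term}) \\
&+ (\text{increase/decrease due to residue on N-term}) \\
&+ \alpha(\mathrm{pos}) + \beta(\mathrm{pos})^2 + \gamma \log_2(\text{length of the peptide}) \qquad (2)
\end{aligned}
\]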

where pos denotes the relative position of the cleavage along the peptide backbone, with values ranging from 0 (cleavage at the N-terminal amide bond) to 1 (cleavage at the C-terminal amide bond). The pos terms (α(pos) + β(pos)²) were used to account for an observed increased ion abundance at intermediate mass (corresponding to cleavage of amide bonds toward the middle of the peptide ion) as opposed to low- or high-mass ions (corresponding to cleavage of amide bonds at the extremes of the peptide ion). The log2(length) factor accounts for the lower overall intensity, due to the normalization process, of a given cleavage when it occurs in a longer peptide. The baseline cleavage intensity term represents an “average” intensity that would be expected if neither of the adjacent amino acids had any special effect on product ion abundance. A linear regression was performed to estimate the effect of each of these variables on the fragmentation process (i.e., estimate α, β, and γ as well as the specific effect of the adjacent amino acid on either side of the cleavage site). A variable selection procedure (backward selection65) was then applied to remove from the model all variables not significantly different from zero at the 1% level. Therefore, only those variables with a statistically significant effect on the fragmentation process were retained.

RESULTS AND DISCUSSION

Quantitation of Asp-Xaa Bond Cleavage: Development of a “Relative Proton Mobility” Scale. The mobile proton model26 was developed to account for the apparent mobility of the migrating proton in a peptide, which results in a heterogeneous population of precursor ions that differ primarily in their site of protonation,27 and to rationalize the subsequent dissociation behavior of protonated peptide ions.24,26 Central to this model is the idea that fragmentation of most bonds within the peptide ion requires localization of a proton at the cleavage site, i.e., that the cleavages are “charge-directed”.
Thus, if an amino acid side chain tightly binds or “sequesters” a proton, more energy will be required to transfer that proton from the basic side chain to the peptide backbone to induce amide bond dissociation. In certain cases, more energy might be required to mobilize the proton than is required to initiate fragmentation by alternative pathways. For example, under these nonmobile conditions (generally considered to occur when the number of arginine residues within the peptide is equal to or greater than the number of ionizing protons), it has been proposed that fragmentation at Asp-Xaa bonds (where Xaa represents any amino acid) occurs via a charge-remote process.40-43 Given that fragmentation of Asp-Xaa bonds is highly dependent on the “mobility” of the ionizing protons,40,66,67 the extent of fragmentation occurring at these bonds should reflect the degree of proton mobility within a peptide ion and may provide a suitable framework for MS/MS spectra classification from which to initiate data mining studies.

(65) Draper, N. R.; Smith, H. Applied Regression Analysis; John Wiley & Sons: New York, 1966.
(66) Tsaprailis, G.; Somogyi, A.; Nikolaev, E. N.; Wysocki, V. H. Int. J. Mass Spectrom. 2000, 196, 467-79.
(67) Gu, C.; Tsaprailis, G.; Breci, L.; Wysocki, V. H. Anal. Chem. 2000, 72, 5804-13.

CIR values were determined for Asp-Xaa bond cleavages from the “no cutoff” data set for all singly, doubly, and triply protonated peptides and classified according to the basic amino acid composition of the peptide and the prominence of the cleavage (see Table 1). For the 5500 unique peptide MS/MS spectra contained within our database, 3355 contain aspartic acid residues (61% of the total). Of these, 1545 do not contain a proline residue (Table 1). To evaluate the effect that proline (especially Asp-Pro bonds) has on cleavage, CIR values were determined for aspartic acid-containing peptide data sets, both containing and lacking proline residues. An examination of Table 1 reveals that for all peptide ion charge states there is an increase in average CIR values and, therefore, an increase in the extent of selective and enhanced cleavage of Asp-Xaa bonds, as the number of arginine, lysine, and histidine residues within the peptide increases.

Similar to previous studies,24 we have classified peptides as “nonmobile” if the number of available protons is less than or equal to the number of arginine residues (high average CIR values). However, in contrast to previous studies,46 where the effect of other basic amino acids on proton mobility has not been specifically classified, here we have classified peptides having a number of protons greater than the number of arginine residues, but less than or equal to the total number of basic residues (combined number of arginine, lysine, and histidine), as “partially mobile” (intermediate average CIR values) and those peptides with a number of protons greater than the total number of basic residues as “mobile” (low average CIR values). Fragmentation of Asp-Xaa bonds is enhanced for all singly protonated tryptic peptides. Lysine-terminated peptides (designated as partially mobile) have average CIR values greater than 2.0, whereas arginine-terminated peptides (designated as nonmobile) have average CIR values greater than 4.0 (Table 1).
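The classification rule just described, together with the CIR of eq 1 restricted to Asp-Xaa bonds, can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' FragX or Perl code: the function names are hypothetical, the per-site intensities are assumed to be already charge-normalized and summed over b- and y-type ions, and the example numbers are invented for demonstration.

```python
def classify_mobility(sequence, charge):
    """Place a peptide ion on the relative proton mobility scale.

    nonmobile:        protons <= number of Arg
    partially mobile: number of Arg < protons <= number of Arg + Lys + His
    mobile:           protons > number of Arg + Lys + His
    """
    n_arg = sequence.count("R")
    n_basic = n_arg + sequence.count("K") + sequence.count("H")
    if charge <= n_arg:
        return "nonmobile"
    if charge <= n_basic:
        return "partially mobile"
    return "mobile"


def cleavage_intensity_ratios(site_intensities):
    """CIR per eq 1: each site's summed intensity over the mean of all sites."""
    mean = sum(site_intensities) / len(site_intensities)
    if mean == 0:
        raise ValueError("no backbone fragment intensity to normalize against")
    return [intensity / mean for intensity in site_intensities]


def asp_xaa_cir(sequence, site_intensities):
    """Average CIR over all Asp-Xaa bonds, or None if the peptide has none.

    site_intensities[k] is the summed b/y intensity for the amide bond
    between residues k and k + 1 (0-based), so len == len(sequence) - 1.
    """
    if len(site_intensities) != len(sequence) - 1:
        raise ValueError("expected one intensity per amide bond")
    cirs = cleavage_intensity_ratios(site_intensities)
    values = [cirs[k] for k in range(len(sequence) - 1) if sequence[k] == "D"]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # Hypothetical singly protonated, arginine-terminated peptide with
    # made-up intensities for its three amide bonds (A-D, D-G, G-R).
    print(classify_mobility("ADGR", charge=1))
    print(asp_xaa_cir("ADGR", [10.0, 70.0, 20.0]))
```

Note that this sketch counts only side-chain basic residues, as in the text; the N-terminal amino group is not included in the tally of basic sites.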
In the doubly protonated data set, Asp-Xaa bond cleavage is reduced for peptides classified as mobile (average CIR values less than 1.0). Those peptides classified as partially mobile (i.e., containing an internal lysine or histidine residue) generally have average CIR values between 1.0 and 2.0, while peptides classified as nonmobile (i.e., those containing at least two arginine residues) display average CIR values greater than 4.0. The CIR analysis was also carried out for glutamic acid-containing peptides from the doubly charged data set, as it has been previously reported that this residue may also lead to enhanced C-terminal cleavage under nonmobile conditions.40 These results indicated average CIR values of 2.0 for peptides classified as nonmobile, suggesting that the Glu-Xaa bond is ∼50% less labile than that of aspartic acid (data not shown). This may be substantiated mechanistically on the basis that the structure of the eight-membered ring intermediate for Glu-Xaa bond cleavage would be expected to be less favored compared to that of the seven-membered ring intermediate for Asp-Xaa bond cleavage.59,67 The average CIR values for Asp-Xaa bond cleavage of triply protonated peptides classified as mobile (average CIR less than 1.0), partially mobile (average CIR values between 1.0 and 2.0), and nonmobile (average CIR values ∼3.0) also follow the same general trends as seen for the singly and doubly protonated data sets. In all, nine classifications based on charge state and proton mobility have been defined (as shown in Table 1). Although 108 quadruply protonated peptide ions are contained in the data set (1.38% of the total), these were insufficient to allow their classification based on proton mobility and basic amino acid composition as a function of their average Asp-Xaa bond cleavage CIR values.

Table 1. Average Cleavage Intensity Ratio (CIR) Values for Asp-Xaa Bond Cleavage in the No Cutoff Data Set

| charge state of precursor ion | classification | basic amino acid composn^d | non-Pro containing peptides, CIR (n)^e | all peptides, CIR (n)^e |
|---|---|---|---|---|
| singly charged peptides (1+) | partially mobile^b | K1 | 2.18 (168) | 2.37 (238) |
| | | H1K1 | 1.49 (19) | 1.53 (26) |
| | | K2 | 1.73 (20) | 1.92 (29) |
| | | total | (207) | (293) |
| | nonmobile^c | R1 | 5.00 (84) | 5.10 (126) |
| | | H1R1 | 4.35 (10) | 4.41 (12) |
| | | K1R1 | 4.00 (17) | 4.07 (21) |
| | | total | (111) | (159) |
| doubly charged peptides (2+) | mobile^a | K1 | 0.73 (157) | 0.81 (358) |
| | | R1 | 0.91 (127) | 1.04 (316) |
| | | total | (284) | (774) |
| | partially mobile^b | H1K1 | 1.34 (100) | 1.49 (195) |
| | | H1R1 | 1.49 (105) | 1.63 (199) |
| | | K2 | 1.44 (101) | 1.66 (276) |
| | | K1R1 | 1.56 (154) | 2.06 (301) |
| | | H2K1 | 1.18 (10) | 1.16 (22) |
| | | H2R1 | 1.66 (17) | 1.94 (38) |
| | | H1K2 | 1.90 (43) | 1.85 (78) |
| | | K3 | 1.53 (20) | 1.78 (48) |
| | | H1K1R1 | 1.80 (50) | 2.09 (90) |
| | | K2R1 | 1.97 (33) | 2.49 (69) |
| | | total | (633) | (1316) |
| | nonmobile^c | R2 | 4.25 (46) | 4.96 (92) |
| | | K1R2 | 4.58 (13) | 5.46 (21) |
| | | H1R2 | 4.81 (13) | 4.88 (23) |
| | | R3 | 3.77 (6) | 4.08 (7) |
| | | total | (78) | (143) |
| triply charged peptides (3+) | mobile^a | K1 | 1.00 (2) | 0.70 (14) |
| | | R1 | 0.90 (4) | 0.73 (8) |
| | | H1K1 | 0.73 (18) | 0.88 (54) |
| | | K2 | 0.97 (18) | 0.94 (36) |
| | | H1R1 | 1.05 (12) | 1.00 (51) |
| | | K1R1 | 0.90 (11) | 0.91 (37) |
| | | R2 | 0.77 (3) | 1.31 (24) |
| | | total | (68) | (224) |
| | partially mobile^b | H2K1 | 0.97 (7) | 1.47 (29) |
| | | H1K2 | 1.15 (24) | 1.27 (62) |
| | | K3 | 1.37 (14) | 1.32 (29) |
| | | H2R1 | 1.57 (9) | 1.47 (46) |
| | | H1K1R1 | 1.48 (32) | 1.63 (79) |
| | | K2R1 | 1.19 (21) | 1.61 (57) |
| | | H1R2 | 1.69 (6) | 2.54 (18) |
| | | K1R2 | 1.52 (7) | 2.51 (21) |
| | | H3K1 | 1.66 (4) | 1.33 (11) |
| | | H1K3 | 1.11 (10) | 1.23 (28) |
| | | H2K1R1 | 0.93 (10) | 1.60 (19) |
| | | H1K2R1 | 1.80 (10) | 1.94 (23) |
| | | H1K1R2 | 2.11 (5) | 2.71 (10) |
| | | total | (159) | (432) |
| | nonmobile^c | R3 | 3.30 (5) | 3.63 (12) |
| | | H1R3 | - | 2.62 (2) |
| | | total | (5) | (14) |

a Mobile peptides: number of basic residues (i.e., combined Arg, Lys, and His residues) less than the total number of protons. b Partially mobile peptides: all peptides not classified as mobile or nonmobile. c Nonmobile peptides: number of Arg residues is greater than or equal to the total number of protons. d Basic amino acid residue composition for each category. e The number of peptides in each category is shown in parentheses.

A number of anecdotal points may be made about the data shown in Table 1. The average CIR values for peptides containing an internal histidine residue are lower than those containing an internal lysine residue, indicative of its lower proton affinity. Generally, average CIR values for Asp-Xaa bond cleavage for peptides containing proline are higher compared to non-proline-containing peptides, which is expected given the enhanced lability of the Asp-Pro peptide bond (289 instances in the data set). The generally lower average CIR values in the triply protonated peptide ion data set compared to those of the singly and doubly protonated peptide ions are potentially due to the increased number of protonated internal basic sites in these peptides, resulting in competition due to charge-localized fragmentations occurring adjacent to these residues.68 The lower Asp-Xaa bond cleavage CIR values for the triply charged data set are even more interesting when we consider that the percent occurrences of aspartic acid and glutamic acid in this data set (7.44% for aspartic acid and 9.08% for glutamic acid) are significantly higher than their percent occurrences in the protein sequence databases. For example, the Swiss-Prot database contains 5.28% aspartic acid and 6.54% glutamic acid. Aside from an expected increase in lysine and arginine percent occurrences in these peptides, no other amino acid exhibited a significant change in its occurrence. Using the percent occurrence of leucine as a reference, the triply charged data set examined here contains 10.04% leucine, whereas Swiss-Prot contains 9.56% leucine. Of further note is the observation that acidic residues are more often located within two residues of a missed tryptic cleavage site, suggesting that the presence of an acidic residue near the cleavage site affects the proteolytic specificity of trypsin (data not shown). This has previously been noted69 and included as a refinement to peptide mass fingerprint searches (e.g., FindPept, a tool to identify unmatched masses from PMF searches70); however, this observation has not been incorporated into current MS/MS search algorithms.

Quantitation and Comparison of Gas-Phase Peptide Ion Fragmentation Cleavage Effects. To quantitatively determine whether particular residues show an N- or C-terminal cleavage preference, plots were generated for each of the nine classifications of peptides defined by charge and proton mobility (as shown in Table 1), for the no cutoff and cutoff data sets, using both average CIR values and the linear model (shown in Figures 2 and 3, respectively, for the no cutoff doubly protonated peptide data set). The results of both calculations were plotted as their log2 values to allow direct comparison between the two methods. For the data from the linear model, only those amino acids that showed statistically significant effects, compared to the average, are plotted.

Figure 2. Average CIR values of N- and C-terminal cleavages for the amino acids for the doubly protonated set of peptides. Log2 CIR values are shown for the no cutoff data set for (A) peptides classified as mobile, (B) peptides classified as partially mobile, and (C) peptides classified as nonmobile. The specified cleavage site as well as the number of peptides containing the specified cleavage is indicated along the x-axis (Xaa-Pro bond cleavage represented by nP; Asp-Xaa bond cleavage represented by cD). Amino acid residues are indicated by their single-letter code with “a” representing methionine sulfoxide and “d” representing pyridylethylcysteine.

(68) Newton, K. A.; Chrisman, P. A.; Reid, G. E.; Wells, J. M.; McLuckey, S. A. Int. J. Mass Spectrom. 2001, 212, 359-76.
(69) Thiede, B.; Lamer, S.; Mattow, J.; Siejak, F.; Dimmler, C.; Rudel, T.; Jungblut, P. R. Rapid Commun. Mass Spectrom. 2000, 14, 496-502.
(70) Gattiker, A.; Bienvenut, W. V.; Bairoch, A.; Gasteiger, E. Proteomics 2002, 2, 1435-44.
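To make the log2 CIR scale concrete, the following Python sketch computes CIR values for one peptide spectrum. The CIR definition is not restated in this section, so the code assumes that the CIR of an amide bond is the summed product ion intensity for that bond divided by the mean over all amide bonds of the same peptide (which gives an average CIR of 1.0, as in the figure captions). The function names and the input intensities are hypothetical.

```python
import math


def cleavage_intensity_ratios(bond_intensities):
    """Return one CIR per amide bond, ordered from N- to C-terminus.

    Assumption: CIR = intensity at a bond / mean intensity over all
    bonds of the same peptide, so the per-peptide average CIR is 1.0.
    """
    if not bond_intensities:
        raise ValueError("at least one amide bond is required")
    mean = sum(bond_intensities) / len(bond_intensities)
    if mean == 0:
        raise ValueError("spectrum contains no product ion intensity")
    return [intensity / mean for intensity in bond_intensities]


def log2_cir(cir_values):
    """Convert CIR values to log2; an unobserved cleavage maps to -inf."""
    return [math.log2(c) if c > 0 else float("-inf") for c in cir_values]


if __name__ == "__main__":
    # Made-up summed b/y intensities for the five bonds of a hexapeptide.
    cir = cleavage_intensity_ratios([10.0, 30.0, 20.0, 0.0, 40.0])
    print(cir)            # enhanced cleavage has CIR > 1
    print(log2_cir(cir))  # enhanced > 0, suppressed < 0 on the log2 scale
```

On the log2 scale an average cleavage sits at 0, so enhanced and suppressed cleavages appear symmetrically as positive and negative bars.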


Figure 3. Statistical analysis based on the linear model (see eq 2) to determine preferential N- and C-terminal cleavage sites for the amino acids in the doubly protonated peptides from the no cutoff data set. Significant log2 effects computed using the statistical model are shown for (A) peptides classified as mobile, (B) peptides classified as partially mobile, and (C) peptides classified as nonmobile. The standard error computed from the model is indicated for all significant effects.
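The linear model itself (eq 2) is defined elsewhere in the paper and is not reproduced here. According to the text, it includes a log2(length) term, positional variables, and terms for the residue on the N- and C-terminal side of each cleavage site. The Python sketch below shows one way the cleavage sites of a peptide could be encoded as predictor rows for a regression of this kind. Everything about the encoding is an assumption made for illustration: the function name, the feature names, and in particular the use of relative position and its square as the positional variables.

```python
import math


def encode_cleavage_sites(sequence: str):
    """Build one predictor row per amide bond of a peptide.

    Hypothetical encoding: the positional variables of the published
    model are not specified here, so relative position and its square
    are used as stand-ins.
    """
    length = len(sequence)
    if length < 2:
        raise ValueError("a peptide needs at least two residues")
    rows = []
    for i in range(1, length):
        rel_pos = i / length  # 0 < rel_pos < 1 along the backbone
        rows.append({
            "n_side": sequence[i - 1],   # residue N-terminal to the bond
            "c_side": sequence[i],       # residue C-terminal to the bond
            "rel_pos": rel_pos,
            "rel_pos_sq": rel_pos ** 2,  # allows a maximum mid-peptide
            "log2_length": math.log2(length),
        })
    return rows


if __name__ == "__main__":
    for row in encode_cleavage_sites("ADPK"):
        print(row)
```

In a full analysis the two residue columns would be expanded into indicator variables and the rows regressed against log2 fragment intensity; that fitting step is omitted here.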

Examination of the data in Figures 2 and 3 indicates the different trends for N- and C-terminal cleavage preference depending on proton mobility classification (i.e., mobile, partially mobile, and nonmobile), clearly validating this classification scheme for data mining studies of MS/MS data. Regardless of the method used to evaluate N- and C-terminal cleavage preferences (CIR calculations or linear model), both plots show similar trends. Additionally, as the peptide MS/MS spectra contained within the singly and triply charged data sets were classified according to the same proton mobility criteria as the doubly charged data, the N- and C-terminal cleavage preference trends for these data were also found to be generally consistent with those shown in Figures 2 and 3 (data not shown), suggesting that proton mobility and not charge state has the largest effect on N- and C-terminal cleavage preferences.

The use of the linear model (eq 2) has several advantages compared to the simple CIR calculations for quantitative determination of cleavage preferences. First, for a given cleavage site, the intensity is apportioned between the two amino acids that appear adjacent to the cleavage site, rather than crediting both equally, as in the CIR calculations. This allows the individual contributions of each amino acid to the cleavage to be determined. Second, the location of the cleavage site within the peptide sequence has been incorporated into the linear model as a positional variable. Third, the linear model provides information about the statistical significance of each variable. For each of the nine strata defined by charge and mobility, the overall regression was significant, which means that at least one of the fitted terms had a significant coefficient at the predefined significance level (1%). In practice, several terms were selected each time, including the log2(length) term, the positional variables, and several terms describing an increase/decrease due to the amino acid residue on the N- or C-terminal side of the cleavage site. The positional variables were always found to be significant at the 1% level, and their coefficients indicate that product ions were generally more abundant toward the middle rather than at the extremes of a peptide ion (data not shown). While this may be expected in the no cutoff data set, these trends were also apparent in the cutoff data set, suggesting that there is an inherent “chemical” bias (e.g., charge solvation) to fragmentations occurring toward the middle of peptide ions, in addition to any instrument-related biases to product ion detection efficiencies. Tabb et al.48 have recently shown that the y-type and b-type product ions show a distribution centered around ∼60% and ∼45% of the precursor mass, respectively, which is in agreement with our findings.

Further examination of the data (Figures 2 and 3) based on the relative proton mobility scale allows the various N- and C-terminal cleavage effects to be classified as either N-terminal positive or negative effects (i.e., enhanced or suppressed cleavage occurring at the N-terminal side of the amino acid of interest, respectively) or C-terminal positive or negative effects (i.e., enhanced or suppressed cleavage occurring at the C-terminal side of the amino acid of interest, respectively). For simplicity, we have defined these preferential cleavage effects, depending on their overall magnitudes, as primary, secondary, or tertiary cleavage effects and discuss these below in terms of the trends observed within the individual proton mobility categories.

(i) Mobile Proton Category. In the mobile proton category of the doubly protonated data set, using both CIR values and the linear model, proline was observed as having a primary N-terminal positive cleavage effect (∼4 times greater than the average, Figures 2A and 3A). Secondary N-terminal positive cleavage effects were consistently observed for glycine and tryptophan. Secondary N-terminal positive cleavage effects were also observed for arginine and lysine using the linear model calculations (Figure 3A). In contrast, these residues showed a primary N-terminal negative cleavage effect based on the CIR calculations (Figure 2A). However, as the CIR calculation does not contain a positional variable (unlike the linear model), it cannot accurately determine cleavage effects for amino acids that appear near the ends of a peptide (arginine and lysine residues are predominantly present at the C-terminal end of peptides due to the specificity of trypsin during proteolysis). A tertiary N-terminal positive cleavage effect for serine and tyrosine was also observed (Figures 2A and 3A). An “N-bias” for proline, glycine, and serine was recently observed by Tabb et al.48 Enhanced Xaa-Gly cleavage cannot be explained on the basis of gas-phase basicity, as glycine has the lowest gas-phase basicity of all amino acids. However, enhanced cleavage could be due to the lack of a bulky side chain (glycine can assume conformations normally forbidden to other amino acids71), allowing facile proton transfer to the peptide backbone to facilitate cleavage.49 The side chain of serine is amenable to rotation which, coupled with the ease with which the γ-hydroxyl group can be involved in short-range hydrogen bonding with a neighboring group,71 may also promote facile proton transfer to the adjacent peptide backbone to facilitate cleavage. No primary or secondary N-terminal negative or primary C-terminal positive cleavage effects were observed for the mobile proton category. However, secondary C-terminal positive cleavage effects were seen for isoleucine and valine, as well as leucine, albeit to a lesser extent.
A rationale to explain the observation of enhanced Ile-Pro and Val-Pro bond cleavage, as a result of their restricted rotational conformations71 “producing a ‘reactive’ conformation that directly leads to product ions”, was recently proposed by Breci et al.47 The results presented here demonstrate that enhanced C-terminal cleavage for these aliphatic amino acids occurs regardless of the adjacent amino acid residue. A primary C-terminal negative cleavage effect was observed for Pro-Xaa bond cleavage, whereas secondary C-terminal negative cleavage effects for Gly-Xaa and, to a lesser extent, Ser-Xaa bond cleavage were observed. These C-terminal negative effects were also recently observed by Breci et al.47 All other residues were found to have only a limited effect on amide bond cleavage. Outliers, such as Arg-Xaa and His-Xaa bond cleavages, as well as Xaa-His bond cleavage, could not be accurately determined because of the low number of occurrences of these cleavages in this data set (8 for Arg and 4 for His).

(ii) Partially Mobile Proton Category. For the partially mobile category of peptide MS/MS spectra (Figures 2B and 3B), proline retains a primary N-terminal positive cleavage effect. Tertiary N-terminal positive cleavage effects were observed for glycine and tryptophan, similar to that observed for the mobile proton category. In addition, lysine and histidine also appear to have tertiary N-terminal positive cleavage effects. Thus, while preferential His-Xaa bond cleavage has previously been shown to occur,24,48 and a mechanism has been proposed to explain this localized cleavage,50 the data presented here indicate that preferential Xaa-His bond cleavage may also occur. No primary N-terminal negative or primary C-terminal positive cleavage effects were observed from this data set. However, in contrast to that seen for the mobile proton category, aspartic acid was observed to have a secondary C-terminal positive cleavage effect (approximately twice the average). In addition, Ile-Xaa, Val-Xaa, His-Xaa, and Lys-Xaa bond cleavages were also observed to have secondary C-terminal positive cleavage effects. Finally, Pro-Xaa, Gly-Xaa, and Ser-Xaa bond cleavages were found to exhibit C-terminal negative cleavage effects similar to those observed in the mobile proton category (i.e., primary negative effect for Pro-Xaa and secondary negative effect for Ser-Xaa). An exception to this trend was observed for the partially mobile and nonmobile peptide ion categories when proline was in the second position of the peptide sequence. In these cases, enhanced Pro-Xaa bond cleavage was often observed (i.e., a site-specific amino acid location dependence). Further analysis or conclusions regarding this cleavage will require additional data, as many of the b2 ions resulting from this cleavage may not be observed due to the LMCO of the ion trap instrument, thereby complicating comprehensive statistical analysis.

(71) Chakrabarti, P.; Pal, D. Prog. Biophys. Mol. Biol. 2001, 76, 1-102.
The fact that Pro-Xaa bond cleavage does not often occur has previously been used as evidence that bn ions are protonated oxazolones,72 since cleavage at the C-terminal side of proline would lead to a thermodynamically unfavored bicyclic oxazolone ion.73 However, the observation of enhanced Pro-Xaa bond cleavage when proline is in the second position suggests that an energetically favorable alternative fragmentation mechanism may be operating, such as the formation of a diketopiperazine product ion due to a nucleophilic attack by the N-terminal amino group.73 Diketopiperazine product ions have been recently demonstrated for b2 ions formed from the peptides His-Gly, Gly-His, His-Gly-Gly, and Gly-His-Gly.74

(iii) Nonmobile Proton Category. For the nonmobile set of peptides (Figures 2C and 3C), Asp-Xaa bond cleavage dominates all other cleavages (i.e., primary C-terminal positive cleavage effect, ∼11 times greater than the average). Secondary C-terminal positive effects were seen for glutamic acid, as well as arginine and lysine, albeit to a lesser extent. Proline and glycine again displayed C-terminal negative cleavage effects. No primary N-terminal positive cleavage effect was observed, as proline was relegated to having only a secondary N-terminal positive cleavage effect, along with lysine and histidine. No N-terminal negative cleavage effects were observed for this proton mobility category.

The N- and C-terminal cleavage (both positive and negative) effects described here are generally consistent, albeit more fully expanded with respect to proton mobility classification, with those observed by Tabb et al., who also recently introduced the notion of an “N- and/or C-bias” to indicate a particular amino acid residue preference for cleavage.48 The similar findings are encouraging, considering that different protocols and methods were used for analyzing the data.
However, here we are able to make quantitative inferences regarding the differences in cleavage preferences between individual residues (for example, comparing the abundances of Ile-Xaa versus Xaa-Pro bond cleavages from the mobile proton data set), rather than simply analyzing the differential between cleavage N- or C-terminal to a particular residue.

Influence of Pairwise Interactions on Cleavage. Based on the stratification of the data set (i.e., by charge state and proton mobility), CIR values were calculated for all amino acid pairs to establish the effect, if any, of specific pairwise interactions on amide bond cleavage. For illustrative purposes, box plots for Xaa-Pro (primary N-terminal positive cleavage effect), Asp-Xaa (primary C-terminal positive cleavage effect), and Pro-Xaa (primary C-terminal negative cleavage effect) bond cleavages are shown in Figures 4-6, respectively, for doubly protonated peptides classified as mobile, partially mobile, and nonmobile. Figure 4A (mobile) shows that Ile-Pro, followed by Val-Pro and Leu-Pro pairwise cleavages, on average, give rise to the most abundant product ions for all Xaa-Pro cleavages. Figure 4B (partially mobile) shows that Asp-Pro cleavage competes with Ile-Pro cleavage and that His-Pro cleavage becomes more prominent. Figure 4C (nonmobile) shows that Asp-Pro cleavage is the dominant fragmentation, followed by Glu-Pro as a poor second. In all instances Pro-Pro cleavage is the least abundant. The global average CIR values (denoted by a dashed line across each box plot) for Xaa-Pro cleavage decrease with decreasing proton mobility (3.8 for mobile, 3.5 for partially mobile, and 2.5 for the nonmobile data set). Panels A-C of Figure 5 (mobile, partially mobile, and nonmobile, respectively) show that Asp-Pro cleavage is the dominant fragmentation for all Asp-Xaa cleavages across all proton mobility sets.

(72) Schlosser, A.; Lehmann, W. D. J. Mass Spectrom. 2000, 35, 1382-90.
(73) Eckart, K.; Holthausen, M. C.; Brauninger, T.; Koch, W.; Spiess, J. Proc. 43rd ASMS Conf., Atlanta, 1995, p 1046.
(74) Farrugia, J. M.; Taverner, T.; O’Hair, R. A. Int. J. Mass Spectrom. 2001, 209, 99-112.
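The pairwise analysis described above amounts to grouping per-bond CIR values by the residue pair flanking the bond, within one charge state and mobility category. A minimal Python sketch is given below; the function names, the tuple layout of the observations, and the numeric values in the example are all hypothetical and do not come from the paper's data.

```python
from collections import defaultdict


def average_pairwise_cir(observations):
    """Average CIR for each (N-side residue, C-side residue) pair.

    `observations` is an iterable of (n_side, c_side, cir) tuples that
    have already been restricted to one charge/mobility category.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for n_side, c_side, cir in observations:
        sums[(n_side, c_side)] += cir
        counts[(n_side, c_side)] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}


def global_average_cir(observations, n_side=None, c_side=None):
    """Mean CIR over all bonds matching the given flanking residue(s).

    For example, c_side="P" gives the global Xaa-Pro average and
    n_side="D" gives the global Asp-Xaa average.
    """
    values = [
        cir for n, c, cir in observations
        if (n_side is None or n == n_side) and (c_side is None or c == c_side)
    ]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # Illustrative values only.
    obs = [("I", "P", 6.0), ("I", "P", 4.0), ("P", "P", 1.0), ("D", "G", 2.0)]
    print(average_pairwise_cir(obs))
    print(global_average_cir(obs, c_side="P"))
```

The per-pair averages correspond to the ordering of boxes along the x-axis in the box plots, and the filtered global average corresponds to the dashed line.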
In agreement with the individual N- and C-terminal preferential cleavage studies, those amino acids determined to have secondary and tertiary N-terminal positive cleavage effects (glycine and tryptophan in the mobile and partially mobile categories and lysine and histidine in the nonmobile category) are also observed in these pairwise interaction data as having positive effects on Asp-Xaa cleavage. For Asp-Xaa cleavage, the global average CIR values increase with decreasing proton mobility (1.0 for mobile, 2.0 for partially mobile, and 5.0 for nonmobile). Panels A-C of Figure 6 (mobile, partially mobile, and nonmobile, respectively) show that the global average CIR for Pro-Xaa cleavage is constant (∼0.2) across all proton mobility data sets, indicative of its primary C-terminal negative cleavage effect. However, the observation of Pro-Pro cleavage as the most abundant for Pro-Xaa cleavages indicates the competition between N-terminal positive and C-terminal negative cleavage effects on pairwise cleavage ion abundances. An important observation from Figures 4-6 is the variability in CIR values obtained across each of the different mobility sets. Note that if the doubly protonated set of peptides were grouped together (i.e., no classification based on proton mobility), the variability in CIR values would be even larger due to competing charge-directed and charge-remote fragmentation channels.

The global pairwise interactions for the doubly protonated set of peptides classified as partially mobile are summarized by way of a 3-D plot (Figure 7), showing average CIR values for all amino acid combinations. It can be seen that Xaa-Pro bond cleavage clearly dominates all other cleavages. One of the interesting secondary pairwise effects is cleavage between adjacent Lys-His residues, which gives rise to enhanced product ions (greater than the sum of their individual contributions). Recent fragmentation studies of small peptides containing lysine and histidine have demonstrated the role of the basic side chains in “sequestering” charge and directing fragmentation through the amide bond C-terminal to these residues. This has been proposed to involve initial charge solvation of a proton localized at the ε-amino or imidazole nitrogens of lysine and histidine residues, respectively, by the amide carbonyl oxygen of the amide bond, followed by side-chain attack leading to cleavage of the amide bond C-terminal to the amino acid residue.24,46,50,75 A preference for this charge-localized Lys-Xaa bond cleavage, between adjacent Lys-Pro and between adjacent Lys-His residues, has also been recently reported for the gas-phase dissociation of multiply protonated protein ions under certain conditions.68,76 Another observed pairwise effect is enhanced cleavage (more than the sum of their individual abundances) for Asn-Gly, which is interesting given that this cleavage is known to be labile in solution.77

Figure 4. Boxplots showing CIR values for Xaa-Pro cleavages (pairwise effects) for doubly protonated peptides from the no cutoff data set for (A) peptides classified as mobile, (B) peptides classified as partially mobile, and (C) peptides classified as nonmobile. The order of the residues (identity of Xaa as well as number of peptides) along the x-axis is determined by the calculated average CIR value for Xaa-Pro cleavage (highest to lowest). The global average CIR value for Xaa-Pro cleavage is indicated by the dashed line (- - -) (e.g., CIR of ∼3.8 for mobile, ∼3.5 for partially mobile, and ∼2.5 for nonmobile), while the global average CIR of 1.0 for all cleavages (Xaa-Yaa) is indicated by the dash-dot line (- · -). Boxplots were generated using R with default parameters. The box represents the scores between 25% and 75%, with the median at 50%; outliers are points that are more than 1.5 times the interquartile range (25%-75%) from the box and are denoted by circles (O); and the whiskers represent the most extreme point that is not an outlier.

Figure 5. Boxplots showing CIR values for Asp-Xaa cleavages (pairwise effects) for doubly protonated peptides from the no cutoff data set for (A) peptides classified as mobile, (B) peptides classified as partially mobile, and (C) peptides classified as nonmobile. The order of the residues (identity of Xaa as well as number of peptides) along the x-axis is determined by the calculated average CIR value for Asp-Xaa cleavage (highest to lowest). The global average CIR value for Asp-Xaa cleavage is indicated by the dashed line (- - -) (e.g., CIR of ∼1.0 for mobile, ∼2.0 for partially mobile, and ∼5.0 for nonmobile), while the global average CIR of 1.0 for all cleavages (Xaa-Yaa) is indicated by the dash-dot line (- · -).

(75) Yalcin, T.; Harrison, A. G. J. Mass Spectrom. 1996, 31, 1237-43.
(76) Engel, B. J.; Pan, P.; Reid, G. E.; Wells, J. M.; McLuckey, S. A. Int. J. Mass Spectrom. 2002, 219, 171-87.

Figure 6. Boxplots showing cleavage intensity ratio (CIR) values for Pro-Xaa cleavages (pairwise effects) for doubly protonated peptides from the no cutoff data set for (A) peptides classified as mobile, (B) peptides classified as partially mobile, and (C) peptides classified as nonmobile. The order of the residues (identity of Xaa as well as number of peptides) along the x-axis is determined by the calculated average CIR value for Pro-Xaa cleavage (highest to lowest). The global average CIR value for a Pro-Xaa cleavage is indicated by the dashed line (- - -) (e.g., CIR of ∼0.25 for all mobility sets), while the global average CIR of 1.0 for all cleavages (Xaa-Yaa) is indicated by the dash-dot line (- · -).

Figure 7. Three-dimensional plot summarizing pairwise amino acid cleavage abundances for all residue combinations based on calculated average CIR values for doubly protonated peptides classified as partially mobile from the no cutoff data set. The order of the N- and C-terminal residue along each axis is determined by the specific residue's individual contribution to cleavage (i.e., residues giving rise to enhanced cleavages are positioned toward the back of the plot such that synergistic pairwise interactions are highlighted).

Determination and Ranking of the Top Pairwise Cleavage Sites Based on the Relative Proton Mobility Scale. A problem encountered when attempting to predict product ion abundances is the variability of ion abundance due to competing fragmentation channels or the multiple occurrence of specific residues within a given peptide sequence. This becomes more acute as the peptide ions become larger and of higher charge states. To determine how this variability might affect future predictive models based on the “relative proton mobility” scale proposed here, the three most abundant cleavage sites (using the CIR values for individual amide bond cleavages) for each peptide spectrum were determined. Table 2 shows the overall frequencies of residue-specific amide bond cleavage (Xaa-Yaa bond cleavages limited to the top 25 for illustrative purposes) for the mobile, partially mobile, and nonmobile sets of the no cutoff MS/MS data for all doubly protonated peptides. The amino acid residue pairs giving rise to the most abundant cleavages are consistent with our earlier observations detailing the individual amino acid contributions to cleavage (Figures 2 and 3). For peptides containing proline, Xaa-Pro bond cleavages are consistently the most abundant. The amino acid pairs Val-Xaa, Ile-Xaa, and Leu-Xaa give rise to the most abundant bond cleavages for peptides classified as mobile. The amino acid pairs Asp-Xaa, His-Xaa, Lys-Xaa and, to a lesser extent, Val-Xaa, Ile-Xaa, and Leu-Xaa give rise to abundant bond cleavages for peptides classified as partially mobile. Asp-Xaa, Glu-Xaa, and Arg-Xaa (Xaa not Asp or Glu) bond cleavages are abundant for peptides classified as nonmobile. In the absence of Pro (Xaa-Yaa bond cleavages), Yaa = Gly, Ser, Tyr, Phe, or Trp, and Xaa = Val or Ile for peptides classified as mobile, Xaa = His, Asp, Lys, or Trp for peptides classified as partially mobile, and Xaa = Asp, Glu, Arg, or Lys for peptides classified as nonmobile. Similar observations were made for the singly and triply protonated peptide data sets based on the relative proton mobility scale (data not shown).

(77) Radkiewicz, J. L.; Zipse, H.; Clarke, S.; Houk, K. N. J. Am. Chem. Soc. 2001, 123, 3499-506.
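The tabulation behind Table 2 can be sketched as follows: for every peptide spectrum, rank the amide bonds by CIR, mark the three highest, and count how often each residue pair is among them relative to how often the pair occurs. The Python code below is an illustration under stated assumptions rather than the authors' procedure: the function name and input layout are hypothetical, ties are broken by sequence position, and a bond with a CIR of zero is never counted as a top cleavage.

```python
from collections import Counter


def top_site_frequencies(peptides, top_n=3):
    """Count how often each residue pair is among the top-N CIR sites.

    `peptides` is an iterable of (sequence, cir_values), where
    cir_values[i] is the CIR of the bond between residues i and i+1.
    Returns {pair: (hits, occurrences, percent)}.
    """
    hits = Counter()
    totals = Counter()
    for sequence, cirs in peptides:
        if len(cirs) != len(sequence) - 1:
            raise ValueError("expected one CIR value per amide bond")
        # Rank bonds from highest to lowest CIR (stable for ties).
        ranked = sorted(range(len(cirs)), key=lambda i: cirs[i], reverse=True)
        top = {i for i in ranked[:top_n] if cirs[i] > 0}
        for i in range(len(cirs)):
            pair = sequence[i:i + 2]
            totals[pair] += 1
            if i in top:
                hits[pair] += 1
    return {
        pair: (hits[pair], totals[pair], 100.0 * hits[pair] / totals[pair])
        for pair in totals
    }


if __name__ == "__main__":
    # Two made-up spectra; CIR values are illustrative only.
    data = [("AVEDK", [0.5, 2.0, 1.0, 0.5]), ("VEGK", [3.0, 0.0, 0.0])]
    for pair, stats in sorted(top_site_frequencies(data).items()):
        print(pair, stats)
```

Each returned tuple mirrors one Table 2 entry: the hit count and occurrence count give the fraction (for example, 31/49 for Val-Glu in the mobile category), and the third element is the corresponding percentage.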


These findings indicate that cleavage abundance is predictable for amino acid pairs giving rise to primary N- and C-terminal positive cleavage effects. Similarly, we can tabulate those amino acid pairs giving rise to primary N- and C-terminal negative cleavage effects (data not shown). It is important that a predictive fragmentation model take into account both positive and negative effects so as to maximize the level of discrimination and hence differentiate between closely scoring (or homologous) peptide sequences. A new statistical scoring function, which incorporates peptide fragment ion intensity data, has recently been described.78 While this scoring function claims to perform better than the current SEQUEST algorithm, it does not take into account any of the global factors influencing fragmentation determined in this or previous studies.

(78) Havilio, M.; Haddad, Y.; Smilansky, Z. Anal. Chem. 2003, 75, 435-44.

Evaluation of MS/MS Search Algorithm Scores Based on the Relative Proton Mobility Scale. Each of the 5500 unique manually validated peptide MS/MS spectra positively identified using SEQUEST was resubmitted to the Mascot search algorithm (using the same search parameters as used for the SEQUEST search), and then the distributions of top scoring search results obtained from both algorithms were plotted with respect to their proton mobility classification within each charge state (Figures 8

Table 2. Summary of the Top 25 Pairwise Cleavage Sites in the Doubly Charged No Cutoff Data Set

Each entry gives the residue pair, the percentage of its observed cleavages that rank in the top 3 CIR values of their peptide spectra, and (in parentheses) the corresponding count/number observed.

| mobile peptides^a, non-Pro containing peptides | mobile peptides^a, all peptides | partially mobile peptides^b, non-Pro containing peptides | partially mobile peptides^b, all peptides | nonmobile peptides^c, non-Pro containing peptides | nonmobile peptides^c, all peptides |
|---|---|---|---|---|---|
| VE^d 63.3 (31/49) | LP 79.5 (93/117) | DL 43.8 (70/160) | DP 89.3 (92/103) | DR 100.0 (19/19) | DI 100.0 (23/23) |
| VS 71.9 (23/32) | VP 82.9 (63/76) | DA 55.8 (48/86) | HP 83.5 (71/85) | DI 100.0 (13/13) | DF 100.0 (15/15) |
| IA 64.1 (25/39) | IP 83.6 (56/67) | DG 61.4 (35/57) | LP 73.4 (102/139) | DF 100.0 (10/10) | DV 100.0 (13/13) |
| VA 51.9 (27/52) | EP 78.8 (67/85) | VG 51.4 (37/72) | VP 77.6 (76/98) | DS 100.0 (9/9) | DL 92.3 (24/26) |
| VL 51.0 (26/51) | TP 71.4 (60/84) | VT 49.3 (36/73) | IP 83.1 (54/65) | DE 92.3 (12/13) | DR 87.2 (34/39) |
| LG 46.4 (26/56) | SP 68.8 (44/64) | DF 50.8 (31/61) | KP 70.0 (91/130) | EG 92.3 (12/13) | DS 93.8 (15/16) |
| LS 48.0 (24/50) | DP 68.4 (39/57) | DS 49.2 (32/65) | DL 44.6 (123/276) | DL 88.2 (15/17) | DG 100.0 (9/9) |
| VN 60.7 (17/28) | AP 60.0 (48/80) | DY 58.1 (25/43) | EP 66.3 (65/98) | DV 100.0 (7/7) | DE 86.4 (19/22) |
| IS 48.9 (22/45) | VS 51.4 (54/105) | NG 65.5 (19/29) | DA 52.7 (89/169) | DD 88.9 (8/9) | DA 85.0 (17/20) |
| IG 45.7 (21/46) | NP 56.1 (46/82) | dH 100.0 (5/5) | TP 72.2 (39/54) | DG 100.0 (5/5) | DP 90.9 (10/11) |
| VT 60.0 (15/25) | QP 71.8 (28/39) | IG 46.8 (29/62) | MP 100.0 (8/8) | DK 100.0 (5/5) | RP 68.0 (34/50) |
| VV 57.1 (16/28) | LG 43.3 (52/120) | KF 57.5 (23/40) | DG 55.6 (60/108) | EK 100.0 (5/5) | DK 100.0 (7/7) |
| IT 57.1 (16/28) | VE 45.4 (49/108) | IS 51.0 (26/51) | AP 57.9 (55/95) | DA 78.6 (11/14) | EG 84.2 (16/19) |
| VD 47.5 (19/40) | LS 39.3 (53/135) | DW 78.6 (11/14) | VG 50.0 (65/130) | DQ 81.8 (9/11) | DN 88.9 (8/9) |
| LY 66.7 (12/18) | IG 51.9 (42/81) | VS 43.8 (28/64) | DS 46.3 (62/134) | DY 85.7 (6/7) | DY 84.6 (11/13) |
| YG 55.2 (16/29) | VA 42.3 (47/111) | IH 51.1 (23/45) | QP 67.9 (36/53) | DT 83.3 (5/6) | EP 80.0 (12/15) |
| IV 58.3 (14/24) | IA 49.4 (40/81) | HG 54.1 (20/37) | DF 48.3 (56/116) | EA 50.0 (12/24) | DQ 80.0 (12/15) |
| LT 46.2 (18/39) | VD 41.2 (42/102) | DH 51.2 (21/41) | NP 55.0 (44/80) | RH 100.0 (3/3) | EK 100.0 (5/5) |
| II 60.0 (12/20) | IS 45.2 (38/84) | HA 44.0 (22/50) | IG 46.6 (48/103) | LH 100.0 (3/3) | DD 81.8 (9/11) |
| VG 48.4 (15/31) | VG 43.2 (38/88) | QG 51.4 (19/37) | YP 67.5 (27/40) | RD 50.0 (8/16) | FP 85.7 (6/7) |
| IL 44.4 (16/36) | IT 44.6 (37/83) | LH 45.7 (21/46) | DN 50.6 (41/81) | EI 47.1 (8/17) | LP 75.0 (9/12) |
| VY 75.0 (6/8) | YP 63.2 (24/38) | KG 53.1 (17/32) | DH 49.3 (36/73) | EQ 46.7 (7/15) | DT 77.8 (7/9) |
| FF 75.0 (6/8) | VN 49.2 (29/59) | KM 100.0 (4/4) | IH 47.3 (35/74) | EN 46.7 (7/15) | EY 64.7 (11/17) |
| IN 43.3 (13/30) | YG 47.3 (26/55) | Ka 100.0 (4/4) | DY 50.8 (32/63) | ED 46.7 (7/15) | EI 50.0 (11/22) |
| FG 46.2 (12/26) | FP 52.6 (20/38) | VF 50.0 (18/36) | DW 70.4 (19/27) | KR 100.0 (2/2) | DH 100.0 (3/3) |

a Mobile peptides: number of basic residues (i.e., combined Arg, Lys, and His residues) less than the total number of protons. b Partially mobile peptides: all peptides not classified as mobile or nonmobile. c Nonmobile peptides: number of Arg residues is greater than or equal to the total number of protons. d Thirty-one of the 49 observed Val-Glu amide bond cleavages in the mobile proton category are in the top 3 calculated CIR values of their respective peptide spectra.
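The classification rules in footnotes a-c of Table 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `classify_proton_mobility`, the one-letter sequence input, and the example sequences are assumptions made here, and the total number of protons is taken to be the charge state of the peptide ion.

```python
def classify_proton_mobility(sequence: str, n_protons: int) -> str:
    """Classify a peptide ion on the relative proton mobility scale.

    sequence  -- peptide in one-letter amino acid code (hypothetical input format)
    n_protons -- total number of protons, assumed equal to the charge state
    """
    if n_protons < 1:
        raise ValueError("n_protons must be at least 1")

    seq = sequence.upper()
    n_arg = seq.count("R")
    # Basic residues as defined in footnote a: combined Arg, Lys, and His.
    n_basic = n_arg + seq.count("K") + seq.count("H")

    # Footnote c: Arg residues >= number of protons.
    if n_arg >= n_protons:
        return "nonmobile"
    # Footnote a: basic residues < number of protons.
    if n_basic < n_protons:
        return "mobile"
    # Footnote b: everything else.
    return "partially mobile"


if __name__ == "__main__":
    # Illustrative sequences only; they are not taken from the data set.
    for seq, z in [("LVNELTEFAK", 2), ("VEADIAGHGQEVLIK", 2), ("LVNELTEFAR", 1)]:
        print(seq, z, classify_proton_mobility(seq, z))
```

The two tests cannot both succeed for the same ion, because every Arg is also counted as a basic residue, so the order in which they are applied does not affect the result.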

and 9, respectively). Both the SEQUEST Sp and Xcorr values (shown in Figure 8A and B, respectively) and normalized Mascot scores (determined by subtraction of the homology score from the ion score, Figure 9) were found to vary considerably depending on charge state and proton mobility classification. It was also observed that many of the peptide MS/MS spectra examined in this study had search scores below currently acceptable cutoff filters or thresholds63,64,79 and therefore would not have resulted in positive protein identifications by automated data validation methods alone. In particular, nonmobile singly protonated peptide spectra were most affected, with over half of the spectra falling below the SEQUEST Sp and Xcorr cutoff scores of 500 and 1.9 (ref 64), respectively, and a normalized Mascot cutoff score of 0. This observation is particularly relevant to MS-based tryptic peptide sequencing approaches employing MALDI as the ionization method for subsequent low-energy CID MS/MS, as these approaches typically generate ions in the singly protonated, nonmobile proton category,80 i.e., the classification found in this study to return the lowest search scores. SEQUEST and Mascot search results for the doubly protonated mobile and partially mobile peptide classifications typically returned scores greater than the currently accepted cutoff values (SEQUEST Sp >500 and Xcorr >2.2, and normalized Mascot score >0). However, a significant proportion of peptides in the doubly protonated nonmobile category still fall below these cutoff values. Triply protonated peptide MS/MS spectra search results showed trends similar to those of the doubly protonated peptide spectra, albeit with slightly lower overall scores. Again, those peptide spectra classified as nonmobile consistently returned lower scores than peptides in the mobile or partially mobile categories. These findings are not surprising, given that nonmobile spectra are often characterized by a few dominant product ions and exhibit fragmentation patterns that deviate significantly from the ideal model (i.e., a continuous series of b- and y-type ions representative of the amino acid sequence). Overall (Figures 8 and 9), it can be seen that the Mascot scoring function performs more poorly (i.e., gives less significant scores) than the cross-correlation function (Xcorr) used by SEQUEST. This could be because low-abundance peaks in the spectrum are artificially enhanced by the normalization routine employed by the SEQUEST cross-correlation function.11

(79) Moore, R. E.; Young, M. K.; Lee, T. D. J. Am. Soc. Mass Spectrom. 2002, 13, 378-86.
(80) Krause, E.; Wenschuh, H.; Jungblut, P. R. Anal. Chem. 2000, 71, 4160-4165.

CONCLUSIONS

A limitation of current proteomic approaches for protein identification using peptide ion MS/MS data is the lack of fully automated data analysis systems for analyzing proteins in complex mixtures. To overcome this bottleneck, more rigorous search and scoring algorithms need to be developed, which consider those factors that influence the gas-phase fragmentation of protonated peptides and the resultant product ion abundances. Our analysis of 5500 unique singly, doubly, and triply protonated peptides indicates that proton mobility is the most important factor influencing fragmentation (and subsequent database search result scores). Other factors, such as local

Analytical Chemistry, Vol. 75, No. 22, November 15, 2003


Figure 9. Distribution of normalized Mascot scores (determined by subtracting the homology score from the ion score) for all unique, manually validated, peptide identifications for all charge states based on the relative proton mobility scale classification. Normalized scores falling below zero, indicated by the dashed line (- - -), would be considered to be random matches and would not be identified without manual interrogation and intervention by an experienced analyst. M denotes peptides classified as mobile, PM denotes peptides classified as partially mobile, and NM denotes peptides classified as nonmobile.
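The normalization described in the Figure 9 caption amounts to a single subtraction followed by a comparison with zero. The sketch below only illustrates that arithmetic; the function names and example scores are hypothetical, and treating a score of exactly zero as not passing is an assumption based on the ">0" cutoff quoted in the text.

```python
def normalized_mascot_score(ion_score: float, homology_score: float) -> float:
    """Normalized score as described in the caption: ion score minus homology score."""
    return ion_score - homology_score


def passes_mascot_cutoff(ion_score: float, homology_score: float,
                         cutoff: float = 0.0) -> bool:
    """True when the normalized score exceeds the cutoff (0 by default).

    Assumption: the comparison is strict, following the '>0' cutoff in the text.
    """
    return normalized_mascot_score(ion_score, homology_score) > cutoff


if __name__ == "__main__":
    # Made-up example scores, not values from the study.
    print(normalized_mascot_score(45.0, 30.0), passes_mascot_cutoff(45.0, 30.0))
    print(normalized_mascot_score(25.0, 30.0), passes_mascot_cutoff(25.0, 30.0))
```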

Figure 8. Distribution of SEQUEST Sp and Xcorr scores for all unique, manually validated, peptide identifications for all charge states based on the relative proton mobility scale classification: (A) Sp scores (preliminary scores); (B) Xcorr scores (final scores that determine a peptide's ranking). Currently accepted thresholds or cutoff filters are indicated by the dashed line (- - -). Identifications falling below accepted cutoffs would not be identified without manual interrogation and intervention by an experienced analyst. M denotes peptides classified as mobile, PM denotes peptides classified as partially mobile, and NM denotes peptides classified as nonmobile.
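The comparison summarized in Figure 8 can be approximated by counting, for each charge state and mobility class, the fraction of identifications that fall below the cutoff filters. The following is a rough sketch under stated assumptions: the cutoffs are those quoted in the text (Sp >500; Xcorr >1.9 for singly and >2.2 for doubly protonated peptides), no cutoff is quoted for triply protonated peptides so that case is rejected, requiring both Sp and Xcorr to pass is an assumption, and all names and example records are hypothetical.

```python
from collections import defaultdict

SP_CUTOFF = 500.0
# Xcorr cutoffs quoted in the text, keyed by charge state.
# No value is given there for triply protonated peptides, so none is assumed.
XCORR_CUTOFFS = {1: 1.9, 2: 2.2}


def passes_sequest_cutoffs(sp: float, xcorr: float, charge: int) -> bool:
    """True when both Sp and Xcorr exceed their cutoffs (joint test is an assumption)."""
    if charge not in XCORR_CUTOFFS:
        raise ValueError(f"no Xcorr cutoff defined for charge state {charge}")
    return sp > SP_CUTOFF and xcorr > XCORR_CUTOFFS[charge]


def fraction_below_cutoff(records):
    """Return {(charge, mobility_class): fraction failing the cutoffs}.

    records -- iterable of (mobility_class, charge, sp, xcorr) tuples
    """
    totals = defaultdict(int)
    failed = defaultdict(int)
    for mobility_class, charge, sp, xcorr in records:
        key = (charge, mobility_class)
        totals[key] += 1
        if not passes_sequest_cutoffs(sp, xcorr, charge):
            failed[key] += 1
    return {key: failed[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Made-up records for illustration only: (class, charge, Sp, Xcorr).
    example = [
        ("NM", 1, 300.0, 1.2),
        ("NM", 1, 800.0, 2.5),
        ("M", 2, 900.0, 3.0),
    ]
    print(fraction_below_cutoff(example))
```

Grouping by both charge state and mobility class mirrors the way the score distributions are broken down in Figures 8 and 9.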

secondary structure and conformation (side-chain/main-chain electrostatic interactions), are also clearly important, because of the consistent preference that certain residues show for N- or for C-terminal cleavage. The use of the linear model that we have developed also demonstrates that the position of a residue within a peptide is an important factor, since product ions are generally more abundant toward the middle of a peptide. Additionally, we have shown that currently acceptable MS/MS search algorithm cutoff filters or score thresholds perform


poorly for certain categories of peptide ion spectra (e.g., nonmobile peptide ions) when applied in an automated protein identification environment. Therefore, until MS/MS algorithms account for these global factors influencing fragmentation, it is necessary to reevaluate current thresholds by taking into account the composition and charge state of the peptide ions based on the relative proton mobility scale described here. Models for predicting product ion abundances, which are based on peptide sequence, charge state, and proton mobility classification, are currently being developed to improve scoring functions for automated high-throughput MS/MS search algorithms and de novo peptide sequence analysis.

ACKNOWLEDGMENT

We thank Tom Taverner for his assistance during the early stages of this work. We also thank Lisa Connolly and David Frecklington for generating much of the mass spectrometry data and Asa Wirapati for generating the 3-D plot using Mathematica V4.0.

Received for review June 6, 2003. Accepted August 27, 2003. AC034616T