
Anal. Chem. 2006, 78, 6855-6862

Infrared Multiphoton Dissociation for Enhanced de Novo Sequence Interpretation of N-Terminal Sulfonated Peptides in a Quadrupole Ion Trap

Jeffrey J. Wilson and Jennifer S. Brodbelt*

Department of Chemistry and Biochemistry, 1 University Station A5300, University of Texas at Austin, Austin, Texas 78712

Infrared multiphoton dissociation (IRMPD) of N-terminal sulfonated peptides improves de novo sequencing capabilities in a quadrupole ion trap mass spectrometer. Not only does IRMPD promote highly efficient dissociation of the N-terminal sulfonated peptides, but the entire series of y ions down to the y1 fragment may also be detected because IRMPD alleviates the low-mass cutoff problem associated with conventional collisional activated dissociation (CAD) methods in a quadrupole ion trap. Commercial de novo sequencing software was applied for the interpretation of CAD and IRMPD MS/MS spectra collected for seven unmodified peptides and the corresponding N-terminal sulfonated species. In most cases, the additional information obtained by N-terminal sulfonation in combination with IRMPD provided significant improvements in sequence identification. The software sequence tag results were combined with a commercial database searching algorithm to interpret sequence information of a tryptic digest of α-casein s1. Energy-variable CAD studies confirmed a 30-40% reduction in the critical energies of the N-terminal sulfonated peptides relative to unmodified peptides. This reduction in dissociation energy facilitates IRMPD in a quadrupole ion trap.

Mass spectrometry in proteomic applications has become an indispensable tool for the study of biological systems over the past decade.1,2 Its unmatched combination of sensitivity, speed, and applicability to complex mixtures, especially in tandem mass spectrometry techniques, which allow specific modification sites to be rapidly pinpointed, has played a crucial role in peptide and protein sequencing research.3-5 A number of tandem MS techniques have been explored for peptide sequencing over the years, including collisional activated dissociation (CAD),6-8 infrared multiphoton dissociation (IRMPD),9-11 electron capture dissociation,12 surface-induced dissociation,13 and postsource decay (PSD).14 Two of the most popular, CAD in ion trap systems15-17 and PSD of metastable ions in MALDI-TOF instruments,18,19 facilitate low-energy fragmentation pathways producing mainly y and b ions via cleavage along the peptide backbone.

MS/MS spectra of peptides from enzymatically digested proteins contain a vast amount of information that is often tedious to interpret manually, especially as the complexity of the samples increases or for new proteomes where the primary sequence is unknown. Two types of software algorithms, both of which rely on scoring systems, have emerged to attack this problem. The first type scores experimental tandem mass spectra against a database of theoretical MS/MS spectra for all possible peptides to determine the degree of matching. Sequest20 and Mascot21 are two popular algorithms employing this approach to MS/MS spectral interpretation for automation and high-throughput applications. Although highly successful, these approaches have the disadvantage of relying upon genomic information to create the databases, meaning that some proteins are absent from the database due to incomplete or incorrect genomic information. This is in part due to the occurrence of alternative gene splicing and mutations/polymorphisms within coding regions of the genomic sequence that can create novel proteins. De novo sequencing algorithms were created as an alternative spectral interpretation approach to avoid these database inadequacies.22 De novo interpretation relies on direct reconstruction of the peptide sequence from the MS/MS spectrum. Since these algorithms do not rely on genomic databases, posttranslational modifications and single point mutations in a protein can be assigned with less difficulty than with database searching algorithms. There are several adaptations of this method in practice, the most common being Lutefisk,23-25 Sherenga,26 and PEAKS.27 PEAKS Studio is based on a sophisticated dynamic programming algorithm for efficient interpretation and has been shown to outperform other current algorithms.27 For this reason, PEAKS Studio v2.4 software was used for the comparison of tandem mass spectra collected in the present study.

One drawback to de novo sequencing is related to the complexity of the peptide MS/MS spectra. Dissociation methods typically cleave the backbone of the peptide in a characteristic manner, and a large portion of the peaks within these spectra yield redundant information by representing the same fragmentation site. These redundant peaks add no information and complicate interpretation, both manually and, by producing false positives, in de novo-based software algorithms. This problem has driven research efforts to improve algorithm performance in a variety of ways28-32 and to alter the MS/MS spectra for peptides via chemical means.33-38 For example, one approach used to reduce MS/MS spectral complexity was reported by Keough et al. based on conversion of peptides to N-terminal sulfonate derivatives.33 This chemical modification alters the sequestration and mobility of protons,33 thus facilitating the nearly exclusive formation of y ions and lowering the critical energy for fragmentation, which enhances the dissociation efficiency, a factor that is especially important in MALDI/PSD applications. N-Terminal sulfonation leads to relatively simple mass spectra by MALDI/PSD in TOF instruments33,35,37 or by ESI/CAD in quadrupole ion trap instruments,34,39-41 which can be readily interpreted in a de novo manner.

* To whom correspondence should be addressed. E-mail: jbrodbelt@mail.utexas.edu.
10.1021/ac060760d CCC: $33.50 Published on Web 08/23/2006
© 2006 American Chemical Society
Analytical Chemistry, Vol. 78, No. 19, October 1, 2006

(1) Godovac-Zimmermann, J.; Brown, L. R. Mass Spectrom. Rev. 2001, 20, 1-57.
(2) Winston, R. L.; Fitzgerald, M. C. Mass Spectrom. Rev. 1997, 16, 165-179.
(3) Wysocki, V. H.; Resing, K. A.; Zhang, Q. F.; Cheng, G. L. Methods 2005, 35, 211-222.
(4) Regnier, F. E.; Riggs, L.; Zhang, R. J.; Xiong, L.; Liu, P. R.; Chakraborty, A.; Seeley, E.; Sioma, C.; Thompson, R. A. J. Mass Spectrom. 2002, 37, 133-145.
(5) Lill, J. Mass Spectrom. Rev. 2003, 22, 182-194.
(6) Koy, C.; Mikkat, S.; Raptakis, E.; Sutton, C.; Resch, M.; Tanaka, K.; Glocker, M. O. Proteomics 2003, 3, 851-858.
(7) Moyer, S. C.; Cotter, R. J.; Woods, A. S. J. Am. Soc. Mass Spectrom. 2002, 13, 274-283.
(8) Demelbauer, U. M.; Zehl, M.; Plematl, A.; Allmaier, G.; Rizzi, A. Rapid Commun. Mass Spectrom. 2004, 18, 1575-1582.
(9) Payne, A. H.; Glish, G. L. Anal. Chem. 2001, 73, 3542-3548.
(10) Little, D. P.; Speir, J. P.; Senko, M. W.; O'Connor, P. B.; McLafferty, F. W. Anal. Chem. 1994, 66, 2809-2815.
(11) Crowe, M. C.; Brodbelt, J. S. J. Am. Soc. Mass Spectrom. 2004, 15, 1581-1592.
(12) Zubarev, R. A. Curr. Opin. Biotechnol. 2004, 15, 12-16.
(13) Dongre, A. R.; Somogyi, A.; Wysocki, V. H. J. Mass Spectrom. 1996, 31, 339-350.
(14) Chaurand, P.; Luetzenkirchen, F.; Spengler, B. J. Am. Soc. Mass Spectrom. 1999, 10, 91-103.
(15) Kaiser, R. E., Jr.; Cooks, R. G.; Syka, J. E. P.; Stafford, G. C., Jr. Rapid Commun. Mass Spectrom. 1990, 4, 30-33.
(16) Jakubowski, J. A.; Sweedler, J. V. Anal. Chem. 2004, 76, 6541-6547.
(17) Li, W.; Boykins, R. A.; Backlund, P. S.; Wang, G. Y.; Chen, H. C. Anal. Chem. 2002, 74, 5701-5710.
(18) Spengler, B.; Kirsch, D.; Kaufmann, R. Rapid Commun. Mass Spectrom. 1991, 5, 198-202.
(19) Grunert, T.; Pock, K.; Buchacher, A.; Allmaier, G. Rapid Commun. Mass Spectrom. 2003, 17, 1815-1824.
(20) Eng, J. K.; McCormack, A. L.; Yates, J. R. J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.
(21) Perkins, D. N.; Pappin, D. J. C.; Creasy, D. M.; Cottrell, J. S. Electrophoresis 1999, 20, 3551-3567.
(22) Chen, B. L. T. Drug Discovery Today: Biosilico 2004, 2, 85-90.
(23) Taylor, J. A.; Johnson, R. S. Anal. Chem. 2001, 73, 2594-2604.
(24) Johnson, R. S.; Taylor, J. A. Mol. Biotechnol. 2002, 22, 301-315.
(25) Taylor, J. A.; Johnson, R. S. Rapid Commun. Mass Spectrom. 1997, 11, 1067-1075.
(26) Dancik, V.; Addona, T. A.; Clauser, K. R.; Vath, J. E.; Pevzner, P. A. J. Comput. Biol. 1999, 6, 327-342.
(27) Ma, B.; Zhang, K. Z.; Hendrie, C.; Liang, C. Z.; Li, M.; Doherty-Kirby, A.; Lajoie, G. Rapid Commun. Mass Spectrom. 2003, 17, 2337-2342.
(28) Yan, B.; Pan, C.; Olman, V. N.; Hettich, R. L.; Xu, Y. Bioinformatics 2005, 21, 563-574.
(29) Frank, A.; Pevzner, P. Anal. Chem. 2005, 77, 964-973.
(30) Ma, B.; Zhang, K. Z.; Liang, C. Z. J. Comput. Syst. Sci. 2005, 70, 418-430.
(31) Bruni, R.; Gianfranceschi, G.; Koch, G. J. Pept. Sci. 2005, 11, 225-234.
(32) Yan, B.; Qu, Y. X.; Mao, F. L.; Olman, V. N.; Xu, Y. J. Comput. Sci. Technol. 2005, 20, 483-490.
(33) Keough, T.; Youngquist, R. S.; Lacey, M. P. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 7131-7136.
(34) Lee, Y. H.; Han, H.; Chang, S. B.; Lee, S. W. Rapid Commun. Mass Spectrom. 2004, 18, 3019-3027.
(35) Wang, D. X.; Kalb, S. R.; Cotter, R. J. Rapid Commun. Mass Spectrom. 2004, 18, 96-102.
(36) Keough, T.; Lacey, M. P.; Youngquist, R. S. Rapid Commun. Mass Spectrom. 2002, 16, 1003-1015.
(37) Keough, T.; Lacey, M. P.; Youngquist, R. S. Rapid Commun. Mass Spectrom. 2000, 14, 2348-2356.
(38) Keough, T.; Youngquist, R. S.; Lacey, M. P. Anal. Chem. 2003, 75, 156A-165A.
(39) Bauer, M. D.; Sun, Y. P.; Keough, T.; Lacey, M. P. Rapid Commun. Mass Spectrom. 2000, 14, 924-929.
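The direct reconstruction of a sequence from a spectrum dominated by y ions can be illustrated with a minimal sketch. It is not the PEAKS algorithm. It assumes a complete, singly charged y-ion ladder (including the protonated precursor as the last rung), standard monoisotopic residue masses that are not taken from this paper, and a hypothetical 0.3 Da matching tolerance chosen to resemble a low-resolution ion trap. The function names `y_ions` and `read_ladder` are illustrative only.

```python
# Standard monoisotopic residue masses (Da); only the residues used below.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "E": 129.04259,
    "Q": 128.05858, "K": 128.09496, "F": 147.06841, "R": 156.10111,
    "W": 186.07931,
}
WATER = 18.01056   # added to every y ion
PROTON = 1.00728   # singly charged ions assumed


def y_ions(sequence):
    """Return singly charged y-ion m/z values, y1 first."""
    ladder = []
    total = WATER + PROTON
    for residue in reversed(sequence):
        total += RESIDUE_MASS[residue]
        ladder.append(total)
    return ladder


def read_ladder(mz_values, tolerance=0.3):
    """Assign residues from the spacing between consecutive y ions.

    Returns the calls in N- to C-terminal order. A spacing that matches
    more than one residue within the tolerance is reported as 'K/Q' style
    ambiguity; a spacing that matches nothing is reported as '?'.
    """
    ladder = sorted(mz_values)
    # y1 itself encodes the C-terminal residue once water and the proton are removed.
    steps = [ladder[0] - WATER - PROTON]
    steps += [high - low for low, high in zip(ladder, ladder[1:])]
    calls = []
    for step in steps:
        matches = sorted(r for r, m in RESIDUE_MASS.items()
                         if abs(m - step) <= tolerance)
        calls.append("/".join(matches) if matches else "?")
    return list(reversed(calls))


ladder = y_ions("FSWGAEGQR")
print(read_ladder(ladder))                   # low-resolution tolerance
print(read_ladder(ladder, tolerance=0.01))   # tighter, hypothetical tolerance
```

With the 0.3 Da tolerance, the eighth position is reported as the ambiguous call K/Q, because the two residue masses differ by less than 0.04 Da. A missing low-mass rung such as y1 would leave the C-terminal residues unassigned in this scheme.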


Although de novo sequencing of N-terminal sulfonated peptides in quadrupole ion traps has proven to be a promising approach,34,39-41 the confident assignment of sequences is hindered by the lack of the lower series of y ions due to the low-mass cutoff (LMCO) problem inherent to CAD in quadrupole ion traps. Effective trapping and activation of a selected precursor ion in a quadrupole ion trap generally requires that the rf trapping voltage be set at a level high enough to prevent simultaneous storage of approximately the lower third of the m/z range relative to the precursor ion, the so-called low-mass cutoff problem. The low-mass cutoff has been noted as a shortcoming in ESI-ion trap studies conducted by other groups for these N-terminally sulfonated peptides.34,39,40 This concern has sparked our interest in applying IRMPD in a quadrupole ion trap system to overcome these disadvantages. The infrared photoabsorption process avoids translational excitation of the precursor ions, thus reducing signal losses due to ion scattering, and most importantly averts the low-mass cutoff encountered in traditional CAD experiments. Several groups have reported successful applications of IRMPD in quadrupole ion traps,9,11,42-49 including several recent studies focused on the characterization of peptides, entailing the development of thermal-assisted IRMPD by Payne and Glish9 and collisionally assisted IRMPD by Hashimoto et al.48 to provide efficient dissociation of peptides at optimal ion trap pressures. Our group has reported IRMPD for differentiation of phosphorylated and nonphosphorylated peptides, including those in tryptic digests, and for rapid screening of peptide mixtures by HPLC-IRMPD.49

Recently, two methods based on relatively more complex scan functions have emerged that also show promise for resolving the low-mass cutoff encountered in quadrupole ion trap instruments. Murrell et al. first discovered that sufficient ion activation could be achieved in shorter periods of ∼100 µs by using very high excitation voltages.50 The large excitation voltage required to obtain efficient dissociation also required larger trapping voltages (i.e., to maintain a qz value of 0.4) to prevent gross precursor ion population losses, thus resulting in a more severe low-mass cutoff problem for the resulting fragment ions. This same concept was further explored more recently by Cunningham et al. in a technique termed high-amplitude short-time excitation,51 which entailed 1-2-ms activation times with the standard qz value of 0.25. This method was shown to have promise in overcoming the LMCO problem in a three-dimensional quadrupole ion trap, via use of an external waveform generator, as well as in a two-dimensional (linear) ion trap.51 Unfortunately, restrictions within the software of some current commercial quadrupole ion trap instruments prevent improvement of the normal LMCO by more than 50 additional m/z units. Schwartz has also developed a new method based on fast excitation referred to as pulsed-q dissociation (PQD).52 This method uses a high qz value of ∼0.7 during ion activation to allow fast and efficient dissociation within a ∼100-µs excitation pulse. After the initial activation stage, the rf trapping voltage is rapidly lowered to create a qz value of 0.1 or less to efficiently trap low-mass ions during fragmentation, which essentially alleviates the LMCO problem encountered in CAD experiments. This pulsed-q dissociation method appears promising for future generations of ion trap instruments. IRMPD, the method highlighted in the present study, likewise affords the ability to generate and retain low-mass fragment ions regardless of the m/z value of the precursor ion, yielding effective qz values of ≤0.05 during MS/MS experiments.

We demonstrate the application of IRMPD in a quadrupole ion trap mass spectrometer to improve the sequence information obtained for N-terminal sulfonated tryptic-like peptides. MS/MS was employed for an array of both arginine C-terminated and guanidinated lysine C-terminated peptides, chosen to mimic tryptic peptides. The benefits of N-terminal sulfonation in combination with IRMPD were further established by evaluating sequence identification obtained from MS/MS spectra by PEAKS Studio v2.4 software and Mascot search results. The impact of N-terminal sulfonation on the critical energies for dissociation of peptides was also evaluated by energy-variable CAD, supporting the conclusion that the lower dissociation energies of the N-terminal sulfonated peptides enhance their IRMPD efficiencies in a quadrupole ion trap.

(40) Lee, Y. H.; Kim, M. S.; Choie, W. S.; Min, H. K.; Lee, S. W. Proteomics 2004, 4, 1684-1694.
(41) Keough, T.; Lacey, M. P.; Strife, R. J. Rapid Commun. Mass Spectrom. 2001, 15, 2227-2239.
(42) Stephenson, J. L.; Booth, M. M.; Shalosky, J. A.; Eyler, J. R.; Yost, R. A. J. Am. Soc. Mass Spectrom. 1994, 5, 886-893.
(43) Shen, J.; Brodbelt, J. S. Analyst 2000, 125, 641-650.
(44) Keller, K. M.; Brodbelt, J. S. Anal. Biochem. 2004, 326, 200-210.
(45) Goolsby, B. J.; Brodbelt, J. S. J. Mass Spectrom. 1998, 33, 705-712.
(46) Colorado, A.; Shen, J. X. X.; Vartanian, V. H.; Brodbelt, J. Anal. Chem. 1996, 68, 4033-4043.
(47) Crowe, M. C.; Brodbelt, J. S.; Goolsby, B. J.; Hergenrother, P. J. Am. Soc. Mass Spectrom. 2002, 13, 630-649.
(48) Hashimoto, Y.; Hasegawa, H.; Yoshinari, K.; Waki, I. Anal. Chem. 2003, 75, 420-425.
(49) Crowe, M. C.; Brodbelt, J. S. Anal. Chem. 2005, 77, 5726-5734.
(50) Murrell, J.; Despeyroux, D.; Lammert, S. A.; Stephenson, J. L.; Goeringer, D. E. J. Am. Soc. Mass Spectrom. 2003, 14, 785-789.
(51) Cunningham, C.; Glish, G. L.; Burinsky, D. J. J. Am. Soc. Mass Spectrom. 2006, 17, 81-84.

EXPERIMENTAL SECTION

Mass Spectrometry.
A ThermoFinnigan LCQ Duo (San Jose, CA) and a modified ThermoFinnigan LCQ Deca XP were used for the mass spectrometry experiments. Solutions were directly infused into the mass spectrometer at a flow rate of 3 µL/min with a Harvard Apparatus PHD 2000 syringe pump (Holliston, MA). Solutions were prepared at a concentration of 10 µM in a 99/1/1 MeOH/H2O/HOAc (v/v) solvent mixture for ESI-MS analysis. For CAD experiments, the ions were activated for 30 ms at the default qz value of 0.25. The isolation window was centered on the ion of interest and maintained at 5 mass units for all peptides. For energy-variable CAD experiments, the collision activation voltage was incrementally stepped, and the voltage necessary to reduce the precursor ion abundance to 50% of its original abundance was recorded. All data were collected in triplicate within the same day. The same peptide species was analyzed at the beginning and end of these experiments to determine a drift of [...]

[...] 75% seen for some unmodified peptides in Table 1, even though most of the amino acid sequence is incorrect compared to the true sequence. For an unknown protein sequence, these relatively high PEAKS scores convey confidence in a correct match, leading to


Table 3. PEAKS Interpretation of MS/MS Spectra Collected for a Tryptic Digest of α-Casein Protein by CAD, for a Guanidinated Tryptic Digest of α-Casein Protein by CAD, and for a Guanidinated, N-Terminal Sulfonated Tryptic Digest of α-Casein Protein by CAD or IRMPDᵃ

ᵃ The specific peptide location within the protein is shown in the left-most column. The best hit is shown for the various MS/MS spectra, and correctly interpreted amino acids are underlined in boldface type. The percentage values convey a confidence-weighted level compared to other possible matches for which higher values are more significant. The total number of amino acids correctly interpreted relative to the total number of amino acids possible for these 10 tryptic peptides is shown at the bottom of the columns. Guanidinated lysine residues are denoted K*.

false positive sequence tags. The use of IRMPD for analysis of the N-terminally sulfonated peptides yields fewer of these inaccurate sequence tags.

De novo sequencing based on the CAD mass spectra of the N-terminal sulfonated peptides offers a significant enhancement in sequence assignment (Table 1). At least five consecutive amino acids are correctly identified for each of the seven peptides, and the three short enkephalin peptide sequences are completely identified, even in the absence of the y1 ion. The overall de novo algorithm performance is better as well, leading to the correct identification of 50 of the 63 amino acids. The software analysis tends to break down near the C-terminal side of the peptide, where there is heavy reliance on the low-mass y ions. Unfortunately, the low-mass cutoff in these CAD experiments prevents observation of these key y ions.

When IRMPD was used for the analysis of the seven N-terminal sulfonated peptides, a complete series of y ions down to the y1 fragment was obtained in every case, significantly increasing the accuracy of the de novo algorithm and resulting in correct assignment of 60 of the 63 total amino acids in the series of peptides (Table 1). The three missed amino acids included a Lys (K) for a Gln (Q) in FSWGAEGQR and an Asn (N) for Gly-Gly (GG) in fibrinopeptide A. The first error arises from a common problem inherent in lower resolution mass spectrometers due to the small mass difference (∆M ∼ 0.04 Da) between Lys and Gln. The second best hit from the PEAKS algorithm did in fact provide the entire correct sequence, with a Gln (Q) in place of the Lys (K) at the eighth position. This problem could be avoided in peptides from tryptic digests when guanidination of lysine residues is conducted prior to analysis.
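The effect of the low-mass cutoff on the y-ion series can be estimated with a short sketch. It assumes the standard ion trap relation in which the cutoff equals the precursor m/z multiplied by qz/0.908 (0.908 being the Mathieu qz value at the stability boundary); this relation comes from general ion trap theory and is not stated in the text. The example values are the calculated [M+H]+ and first three y ions of unmodified FSWGAEGQR, obtained from standard monoisotopic masses rather than from measured spectra, and the function names are illustrative.

```python
Q_EJECT = 0.908  # Mathieu q_z at the stability boundary (a_z = 0); assumption from ion trap theory


def low_mass_cutoff(precursor_mz, qz):
    """Lowest m/z that remains trapped while the precursor sits at the given qz."""
    return precursor_mz * qz / Q_EJECT


def retained(fragment_mzs, precursor_mz, qz):
    """Fragments at or above the low-mass cutoff."""
    cutoff = low_mass_cutoff(precursor_mz, qz)
    return [mz for mz in fragment_mzs if mz >= cutoff]


# Calculated values for unmodified FSWGAEGQR (singly protonated).
precursor = 1037.48
y1_to_y3 = [175.12, 303.18, 360.20]

for qz in (0.25, 0.05):
    print(qz, round(low_mass_cutoff(precursor, qz), 2), retained(y1_to_y3, precursor, qz))
```

At qz = 0.25 the cutoff falls near m/z 286, roughly 28% of the precursor m/z, so y1 is not stored, whereas at qz = 0.05 all three fragments lie above the cutoff. A heavier derivatized precursor would raise the cutoff proportionally.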
The second problem (identification of Asn (N) instead of Gly-Gly (GG) in fibrinopeptide A) is associated with an isomeric overlap stemming from the very low abundance of the fragment ion due to cleavage between the two glycine residues. Despite these three sequencing errors, the overall improvement in peptide sequence identification demonstrates the power of IRMPD analysis in combination with N-terminal sulfonation.

Bovine α-casein s1 phosphoprotein was used as a model to demonstrate the advantages of N-terminal sulfonation with IRMPD in a quadrupole ion trap for protein identification. Three forms of the tryptic digest sample were analyzed by ESI-MS/MS experiments: without modification, after guanidination, and finally after both guanidination and N-terminal sulfonation. This systematic approach allowed identification of singly charged tryptic peptides in which guanidination and/or N-terminal sulfonation had occurred by monitoring the characteristic mass additions of 42 or 184 Da, respectively. All of these different singly charged tryptic peptide species were analyzed by CAD, in addition to IRMPD analysis of the N-terminal sulfonated tryptic peptides. Table 3 shows the location of each peptide within the protein along with the best-fit tryptic peptide sequences derived from the PEAKS software de novo interpretation of the MS/MS spectral data. Underlined amino acids in Table 3 designate a correct interpretation by the PEAKS software algorithm relative to the known sequence for the protein. The total number of amino acids correctly identified is shown at the bottom of each column for 10 tryptic peptides. The unmodified tryptic peptides analyzed by CAD provided reasonable sequence coverage for the protein with 54 of the 84 amino acids correctly identified, but the CAD performance was generally worse for the larger peptides in this sample. The guanidinated protein digest sample analyzed by CAD resulted in somewhat poorer amino acid identification, with only 48 of the 84 correctly identified. The same overall level of accuracy was assumed for the unmodified arginine C-terminated peptides in the guanidinated CAD results.
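Tracking a peptide across the three digest forms by its characteristic mass additions can be sketched as a simple peak-pairing step. The nominal shifts of 42 and 184 Da are those quoted in the text; the peak lists, the 0.5 Da tolerance, and the function name `shifted_pairs` are hypothetical and serve only to illustrate the bookkeeping.

```python
GUANIDINATION_SHIFT = 42.0   # Da, nominal value quoted in the text
SULFONATION_SHIFT = 184.0    # Da, nominal value quoted in the text


def shifted_pairs(before, after, shift, tolerance=0.5):
    """Pair singly charged peaks whose m/z values differ by the expected shift."""
    pairs = []
    for low in before:
        for high in after:
            if abs((high - low) - shift) <= tolerance:
                pairs.append((low, high))
    return pairs


# Hypothetical singly charged peak lists (m/z); not measured data.
unmodified = [748.4, 1267.7]      # a Lys-terminated and an Arg-terminated peptide
guanidinated = [790.4, 1267.7]    # only the Lys-terminated peptide shifts
sulfonated = [974.4, 1451.7]      # both peptides gain the N-terminal label

print(shifted_pairs(unmodified, guanidinated, GUANIDINATION_SHIFT))
print(shifted_pairs(guanidinated, sulfonated, SULFONATION_SHIFT))
```

In this illustration, the arginine C-terminated peptide shows no 42 Da partner after guanidination but does show the 184 Da partner after sulfonation, which is the pattern used to recognize which modifications have occurred.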
The CAD data for the N-terminal sulfonated tryptic peptides did not provide any improvement over that obtained for the guanidinated sample, with 48 of the 84 amino acids accurately characterized. By using IRMPD to analyze these N-terminally sulfonated tryptic peptides, the sequence accuracy was significantly enhanced with 60 of the 84 amino acids correctly


Table 4. Mascot Search Results from MS Sequence Tags Created in PEAKS from the Data in Table 3 Acquired for a Tryptic Digest of an α-Casein Protein Sample by CAD, for a Guanidinated Tryptic Digest of an α-Casein Protein Sample by CAD, and for a Guanidinated, N-Terminal Sulfonated Tryptic Digest of an α-Casein Protein Sample by CAD or IRMPDᵃ

Top hit

Mascot sequence query            protein code   Mascot score   expectation    top hit
tryptic peptides (CAD)           KABOSB         143            1.2 × 10⁻⁸     α-casein s1 precursor
guanidinated peptides (CAD)      T15599         84             8.7 × 10⁻³     hypothetical protein C25A11.3 Caenorhabditis
N-sulfonated peptides (CAD)      Q19G56•IRV6    75             6.8 × 10⁻²     042R. chilo iridescent virus (CIV) (insect iridescent virus type 6)
N-sulfonated peptides (IRMPD)    KABOSB         335            7.4 × 10⁻²⁸    α-casein s1 precursor

α-Casein s1

Mascot sequence query            hit number     Mascot score   expectation
tryptic peptides (CAD)           #1             143            1.2 × 10⁻⁸
guanidinated peptides (CAD)      #7             68             4.0 × 10⁻¹
N-sulfonated peptides (CAD)      #7             68             4.0 × 10⁻¹
N-sulfonated peptides (IRMPD)    #1             335            7.4 × 10⁻²⁸

ᵃ The Mascot score and expectation value are given for each set of MS/MS results for the best protein hit, which is identified in the upper half of the table. The lower half of the table shows the Mascot score and expectation value of α-casein s1, including the hit number for the Mascot search for the correct protein. Higher Mascot scores and lower expectation values indicate strong statistical identification of the protein.

characterized. The overall improvement in sequence interpretation was not as dramatic as observed for the individual peptides shown in Table 1; however, the two N-terminal sulfonation strategies, with either IRMPD or CAD analysis, were the only methods to correctly identify the single posttranslational modification, a phosphoserine at position 130, present within this series of tryptic peptides. This modest degree of improvement is likely related to the relative complexity of the protein digest compared to the individual peptide samples. Sample losses from cleanup steps and incomplete reaction yields lead to significantly lower levels of modified tryptic peptides within the samples. Additionally, ESI signal suppression could be considerable in these complex mixtures due to the presence of residual reagents and buffers. An overall decrease in the ESI-MS signal response for the protein digest was observed as the number of modifications to the tryptic peptides increased (data not shown). These lower abundance ions degrade the spectral quality of the MS/MS experiments enough to impair the performance of the de novo interpretation process.27 Despite this fact, the IRMPD analysis of the N-terminally sulfonated tryptic peptides still provides the most accurate sequence information when compared to the CAD results.

The sequence tags, including their m/z values, were entered into the Mascot database search for a comparative analysis of the overall data quality from PEAKS for protein identification. The Mascot results for these sequence tags are summarized in Table 4, including the top protein hit with the Mascot score and expectation value in the upper half of the table and these same scores for the correct protein, α-casein s1, including the hit number, in the lower half of the table.
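Because the expectation values in Table 4 span many powers of ten, they are easiest to compare on a logarithmic scale. The following sketch transcribes the expectation values reported for the correct protein in each search and computes how many orders of magnitude separate two of them; the dictionary and function names are illustrative only.

```python
import math

# Expectation values for the correct protein, transcribed from Table 4.
EXPECTATION = {
    "tryptic peptides (CAD)": 1.2e-8,
    "guanidinated peptides (CAD)": 4.0e-1,
    "N-sulfonated peptides (CAD)": 4.0e-1,
    "N-sulfonated peptides (IRMPD)": 7.4e-28,
}


def orders_of_magnitude(reference, improved):
    """Base-10 logarithm of the ratio between two expectation values."""
    return math.log10(reference / improved)


gain = orders_of_magnitude(
    EXPECTATION["tryptic peptides (CAD)"],
    EXPECTATION["N-sulfonated peptides (IRMPD)"],
)
print(round(gain, 1))
```

For the unmodified CAD and N-sulfonated IRMPD searches, the ratio corresponds to a little over 19 orders of magnitude.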
The Mascot score is calculated such that larger values indicate stronger statistical matching; it reflects the absolute probability of a random match, the expectation value, and the size of the sequence database searched. Analysis of the unmodified tryptic peptides by CAD resulted in sufficiently accurate sequence tags to obtain the correct protein identification with a Mascot score of 143, thus providing a significant match. The sequence tags obtained upon CAD of the guanidinated and N-sulfonated peptides were not accurate enough to produce the proper protein identification in either of these two cases, as shown in the upper half of Table 4. The correct identification of α-casein s1 protein appears seventh on the list in both of these searches, suggesting that the accuracy of the sequence tags is critical to protein identification. In contrast, the N-sulfonated sample analyzed by IRMPD produces the correct protein identification with a Mascot score of 335. This high Mascot score conveys high confidence in the identification of the protein when N-terminal sulfonation is combined with IRMPD in an ion trap mass analyzer. CAD analysis of the unmodified tryptic peptide sample also provided accurate protein identification, albeit with a lower statistical confidence in the Mascot score (143). Although both methods provided correct significant protein matches, the dramatic difference in expectation value for N-sulfonated IRMPD over that of the unmodified CAD sequence tags, nearly 20 orders of magnitude, suggests better de novo interpretation for these spectra by the PEAKS software analysis.

CONCLUSION

N-Terminal sulfonation of peptides in conjunction with IRMPD in a quadrupole ion trap mass spectrometer provides a versatile and powerful strategy for de novo sequencing. Conventional CAD in a quadrupole ion trap prohibits the analysis of low-mass y fragment ions due to the low-mass cutoff problem and thus has limited success for the sequencing of the C-terminal side of peptides. N-Terminal sulfonation can be used to reduce the critical energies of peptides to facilitate IRMPD within an ion trap at optimal bath gas pressure, and IRMPD allows detection of a complete series of y ions. This additional sequence information can improve de novo analysis, which is essential when the primary sequence is unknown or posttranslational modifications are present.

ACKNOWLEDGMENT

Funding from the Welch Foundation (F1155) and the National Science Foundation (CHE-0315337) is gratefully acknowledged.
Received for review April 21, 2006. Accepted July 18, 2006. AC060760D