
Anal. Chem. 2008, 80, 8089–8094

Analytical Utility of Small Neutral Losses from Reduced Species in Electron Capture Dissociation Studied Using SwedECD Database

Maria Fälth,† Mikhail M. Savitski,‡ Michael L. Nielsen,‡ Frank Kjeldsen,‡ Per E. Andren,† and Roman A. Zubarev*,‡

Molecular Biometry, Department for Cell and Molecular Biology, Uppsala University, Box 596, SE-75 124 Uppsala, Sweden, and Medical Mass Spectrometry, Department of Pharmaceutical Biosciences, Uppsala University, Box 583, SE-75 123 Uppsala, Sweden

Small neutral losses from charge-reduced species [M + nH](n−1)+• are among the most abundant fragmentation channels in both electron capture dissociation (ECD) and electron transfer dissociation (ETD). Several groups have previously studied these losses on particular examples. Now, the availability of a large (11 491 entries) SwedECD database (http://www.bmms.uu.se/CAD/indexECD.html) of high-resolution ECD data sets on doubly charged tryptic peptides has made possible a systematic study involving statistical evaluation of neutral losses from [M + 2H]+• ions. Several new types of losses are discovered, and 16 specific (>94%) losses are characterized according to their specificity and sensitivity, as well as their occurrence for peptides of different lengths. On average, there is more than one specific loss per ECD mass spectrum, and two-thirds of all MS/MS data sets in SwedECD contain at least one specific loss. Therefore, specific neutral losses are analytically useful for improved database searching and de novo sequencing. In particular, the isomeric sequences N and GG can be distinguished. The pattern of neutral losses was found to be remarkably dissimilar to the losses from radical z• fragment ions: e.g., there is no direct formation of w ions from the reduced species. This finding emphasizes the difference in fragmentation behavior between hydrogen-abundant and hydrogen-deficient species.
Peptide fragmentation is a cornerstone of mass-spectrometry-based proteomics.1-3 The most commonly employed fragmentation techniques are collisionally activated dissociation (CAD)4 and electron capture/transfer dissociation (ECD5 and ETD6). Recently we designed and made publicly available the SwedCAD database,7 one of several public databases containing CAD mass spectra of tryptic peptides.8-13 The uniqueness of SwedCAD lies in the fact that it contains high-resolution, high-accuracy MS/MS data. The SwedCAD database has now been complemented with an ECD database, SwedECD, containing high-resolution MS/MS spectra of over 11 000 tryptic peptides. For most of these peptides, SwedCAD contains a CAD mass spectrum.

MS/MS databases have multiple areas of application. They can serve as reference points for testing bioinformatics tools, such as de novo sequencing programs.14,15 Peptide identification can be performed by matching unidentified experimental spectra against the annotated spectra stored in the database.8-11 The matching procedure is much faster and in principle more reliable than conventional database searching, but it is limited by the available annotated MS/MS spectra. However, the main area of application of MS/MS databases is currently the study of peptide fragmentation mechanisms.16-20

ECD, along with its sibling ETD, is a well-established fragmentation technique that is complementary to CAD.19 Both techniques have found wide application in MS-based proteomics experiments in recent years.21-26 A deeper fundamental understanding of ECD is likely to result in advances in the proteomics area. The results obtained by mining large data sets

* Corresponding author. Roman A. Zubarev, e-mail: [email protected].
† Department of Pharmaceutical Biosciences, Uppsala University.
‡ Department for Cell and Molecular Biology, Uppsala University.
10.1021/ac800944u CCC: $40.75 © 2008 American Chemical Society. Published on Web 10/07/2008.

(1) Aebersold, R.; Goodlett, D. R. Chem. Rev. 2001, 101, 269–295.
(2) Aebersold, R.; Mann, M. Nature 2003, 422, 198–207.
(3) Steen, H.; Mann, M. Nat. Rev. Mol. Cell Biol. 2004, 5, 699–711.
(4) Paizs, B.; Suhai, S. Mass Spectrom. Rev. 2005, 24, 508–548.
(5) Zubarev, R. A.; Kelleher, N. L.; McLafferty, F. W. J. Am. Chem. Soc. 1998, 120, 3265–3266.
(6) Syka, J. E. P.; Coon, J. J.; Schroeder, M. J.; Shabanowitz, J.; Hunt, D. F. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 9528–9533.
(7) Falth, M.; Savitski, M. M.; Nielsen, M. L.; Kjeldsen, F.; Andren, P. E.; Zubarev, R. A. J. Proteome Res. 2007, 6, 4063–4067.
(8) Craig, R.; Cortens, J. C.; Fenyo, D.; Beavis, R. C. J. Proteome Res. 2006, 5, 1843–1849.
(9) Desiere, F.; Deutsch, E. W.; King, N. L.; Nesvizhskii, A. I.; Mallick, P.; Eng, J.; Chen, S.; Eddes, J.; Loevenich, S. N.; Aebersold, R. Nucleic Acids Res. 2006, 34, D655–D658.
(10) Desiere, F.; Deutsch, E. W.; Nesvizhskii, A. I.; Mallick, P.; King, N. L.; Eng, J. K.; Aderem, A.; Boyle, R.; Brunner, E.; Donohoe, S.; Fausto, N.; Hafen, E.; Hood, L.; Katze, M. G.; Kennedy, K. A.; Kregenow, F.; Lee, H. K.; Lin, B. Y.; Martin, D.; Ranish, J. A.; Rawlings, D. J.; Samelson, L. E.; Shiio, Y.; Watts, J. D.; Wollscheid, B.; Wright, M. E.; Yan, W.; Yang, L. H.; Yi, E. C.; Zhang, H.; Aebersold, R. Genome Biol. 2005, 6.
(11) Lam, H.; Deutsch, E. W.; Eddes, J. S.; Eng, J. K.; King, N.; Stein, S. E.; Aebersold, R. Proteomics 2007, 7, 655–667.
(12) Martens, L.; Hermjakob, H.; Jones, P.; Adamski, M.; Taylor, C.; States, D.; Gevaert, K.; Vandekerckhove, J.; Apweiler, R. Proteomics 2005, 5, 3537–3545.
(13) McLaughlin, T.; Siepen, J. A.; Selley, J.; Lynch, J. A.; Lau, K. W.; Yin, H. J.; Gaskell, S. J.; Hubbard, S. J. Nucleic Acids Res. 2006, 34, D649–D654.
(14) Frank, A. M.; Savitski, M. M.; Nielsen, M. L.; Zubarev, R. A.; Pevzner, P. A. J. Proteome Res. 2007, 6, 114–123.
(15) Savitski, M. M.; Nielsen, M. L.; Kjeldsen, F.; Zubarev, R. A. J. Proteome Res. 2005, 4, 2348–2354.

Analytical Chemistry, Vol. 80, No. 21, November 1, 2008


of ECD MS/MS20,27 and their statistical evaluation will be highly useful for the development of better search engines28-30 for processing ECD data. Now, the large SwedECD database of high-resolution ECD data sets on doubly charged tryptic peptides is made available at http://www.bmms.uu.se/CAD/indexECD.html. The high-resolution nature of the data is of great importance for studies of ECD fragmentation, since there are still various uncharacterized ion types in ECD, and the determination of their unique elemental composition is essential for spectra rationalization.

In this paper we introduce the SwedECD database and present a study of small losses in ECD from charge-reduced molecular species. Small neutral losses from charge-reduced species are among the most abundant fragmentation channels in both ECD and ETD.5 Several groups have previously studied these losses on particular examples.31-34 Now, the SwedECD database has made possible a systematic study involving statistical evaluation of the observed regularities.

EXPERIMENTAL SECTION

The data were collected from the proteomics analysis of lysates of human cell lines (K562 and A-431), human milk samples, and Escherichia coli (all samples except the milk are from Sigma Aldrich, St. Louis, MO) performed on a 7 T LTQ FT mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) using consecutive ECD and CAD fragmentation of peptides eluting from the analytical column of a nanoLC system (Agilent 1100). MS/MS spectra were accumulated at resolutions of 25 000 and 50 000. The .raw files were processed by BioWorks software

(16) Huang, Y. Y.; Triscari, J. M.; Pasa-Tolic, L.; Anderson, G. A.; Lipton, M. S.; Smith, R. D.; Wysocki, V. H. J. Am. Chem. Soc. 2004, 126, 3034–3035.
(17) Huang, Y. Y.; Triscari, J. M.; Tseng, G. C.; Pasa-Tolic, L.; Lipton, M. S.; Smith, R. D.; Wysocki, V. H. Anal. Chem. 2005, 77, 5800–5813.
(18) Savitski, M. M.; Kjeldsen, F.; Nielsen, M. L.; Garbuzynskiy, S. O.; Galzitskaya, O. V.; Surin, A. K.; Zubarev, R. A. Angew. Chem., Int. Ed. 2007, 46, 1481–1484.
(19) Savitski, M. M.; Kjeldsen, F.; Nielsen, M. L.; Zubarev, R. A. Angew. Chem., Int. Ed. 2006, 45, 5301–5303.
(20) Savitski, M. M.; Kjeldsen, F.; Nielsen, M. L.; Zubarev, R. A. J. Am. Soc. Mass Spectrom. 2007, 18, 113–120.
(21) Chi, A.; Huttenhower, C.; Geer, L. Y.; Coon, J. J.; Syka, J. E. P.; Bai, D. L.; Shabanowitz, J.; Burke, D. J.; Troyanskaya, O. G.; Hunt, D. F. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 2193–2198.
(22) Hunt, D. F. Mol. Cell. Proteomics 2006, 5, S343–S343.
(23) Nielsen, M. L.; Savitski, M. M.; Zubarev, R. A. Mol. Cell. Proteomics 2005, 4, 835–845.
(24) Savitski, M. M.; Nielsen, M. L.; Zubarev, R. A. Mol. Cell. Proteomics 2005, 4, 1180–1188.
(25) Savitski, M. M.; Nielsen, M. L.; Zubarev, R. A. Mol. Cell. Proteomics 2006, 5, 935–948.
(26) Sweet, S. M. M.; Creese, A. J.; Cooper, H. J. Anal. Chem. 2006, 78, 7563–7569.
(27) Savitski, M. M.; Nielsen, M. L.; Zubarev, R. A. Anal. Chem. 2007, 79, 2296–2302.
(28) Craig, R.; Beavis, R. C. Bioinformatics 2004, 20, 1466–1467.
(29) Eng, J. K.; McCormack, A. L.; Yates, J. R. J. Am. Soc. Mass Spectrom. 1994, 5, 976–989.
(30) Perkins, D. N.; Pappin, D. J. C.; Creasy, D. M.; Cottrell, J. S. Electrophoresis 1999, 20, 3551–3567.
(31) Cooper, H. J.; Hakansson, K.; Marshall, A. G.; Hudgins, R. R.; Haselmann, K. F.; Kjeldsen, F.; Budnik, B. A.; Polfer, N. C.; Zubarev, R. A. Eur. J. Mass Spectrom. 2003, 9, 221–222.
(32) Cooper, H. J.; Hudgins, R. R.; Hakansson, K.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 2002, 13, 241–249.
(33) Haselmann, K. F.; Budnik, B. A.; Kjeldsen, F.; Polfer, N. C.; Zubarev, R. A. Eur. J. Mass Spectrom. 2002, 8, 461–469.
(34) Fung, Y. M. E.; Chan, T. W. D. J. Am. Soc. Mass Spectrom. 2005, 16, 1523–1535.


(Thermo Fisher), and so-called dta files (text files containing the molecular mass and charge state of the precursor ion as well as the m/z values and abundances of the fragment ions) were created, with the noise threshold set at 0.3. The dta files of CAD and ECD MS/MS for the same peptide were compared, and consensus information was extracted into the “merged” dta files.23,24 These merged files were submitted to the Mascot search engine (Matrix Science, U.K.), and the peptides were identified with the threshold score suggested by Mascot. The reliability of the assigned sequences in the database was above 95%. The Mascot-assigned ECD dta files were put in the SwedECD database.

Database Structure. SwedECD contains 11 491 annotated MS/MS spectra of doubly charged tryptic peptide cations. Each MS/MS data set contains at least five c, z, or w ions. The average length of peptides in the database is 10.6 residues, and the average mass is 1196.6 Da. Because of the presence of lysine or arginine in the C-terminal position of almost all tryptic peptides, the spectra are dominated by C-terminal fragments: 55 345 z′ or z• ions are present as compared to 25 403 c′ or c• ions. More than 95% of the peptide sequences in SwedECD have a corresponding CAD spectrum stored in the SwedCAD database.

The search functions in SwedECD are the same as in SwedCAD.7 Briefly, it is possible to search for peptides according to their mass, presence or absence of missed cleavages, and partial or exact sequences. The search parameters can be used individually or in combination. Clicking on the “view” icon next to the found sequence produces a graphic plot of the corresponding mass spectrum and a table of fragments. The spectral visualization was inspired by the way the Global Proteome Machine Organization8 visualizes tandem mass spectra. Additionally, deviations of the experimentally measured masses of assigned fragments from their theoretical values are shown.
c, w, z•, and z′ ions are marked with different colors. A reliable distinction between z′ and the first isotope of z• cannot be made in each individual case; however, good resolution between the statistical distributions of 13C(z•) and 12C(z′) is achieved.20 Hence, if a z• peak is present, it is marked and its deviation from the theoretical mass is calculated, while any z′ peak is ignored; if z• is not present, the z′ peak is marked. Since the efficiency of ECD rarely exceeds 15%, we chose to remove the peaks corresponding to the precursor ion from the graphical representation. These peaks and their abundances are, however, kept intact in the downloadable spectral data sets. In some cases an abundant peak is still present in the area of the precursor m/z. This is due to the isolation of additional peaks in the vicinity of the precursor m/z during the experiment.

Small Neutral Losses from rs in ECD. Analytically, losses of small groups from the charge-reduced species (rs) could be used in the same way as immonium ions in CAD, namely, to confirm the presence of some amino acid residues in the peptide sequence.31-34 In order to be useful in a database search or de novo sequencing, these losses have to be at least 95% specific, meaning that the presence of a loss must coincide with the presence of the corresponding amino acid in at least 95% of the cases. Note that, similar to immonium ions in CAD, the absence of a specific neutral loss in an ECD mass spectrum does not mean the absence of the corresponding amino acid in the sequence. The desired 95% level of certainty can only be obtained after the

Figure 1. Neutral loss mass spectrum obtained by integration of 11 491 ECD mass spectra of doubly charged tryptic peptides with the channel resolution of 0.002 Da. A 0 Da loss corresponds to the monoisotopic mass of charge-reduced species [M + 2H]+•. The height of a peak at -X Da corresponds to the number of mass spectra in which an X Da loss from the reduced species was found.

analysis of thousands of spectra. For performing the statistical evaluation of neutral losses, the ions with masses ≤150 Da below the theoretical mass of the [M + 2H]+• rs were considered for each MS/MS data set. A vector V of length 75 000 was constructed such that if an ion was present X Da below the rs mass, then the position integer (X/0.002) in V was incremented by 1. The resulting overall spectrum of lost neutral masses is shown in Figure 1. This spectrum contains hundreds of peaks (several peaks are often present for the same nominal mass).

To simplify the data analysis, the data in Figure 1 were separated into subsets. In the composite Figure 2, there are 20 plots, each corresponding to a different amino acid (AA). These plots are subsets of Figure 1, modified for each AA in such a way that a channel was zeroed unless more than 95% of all MS/MS data sets contributing to this channel were due to peptides containing AA. Thus each spectrum in Figure 2 shows neutral losses occurring with >95% specificity for a given amino acid. For instance, the plot for aspartic acid (D) contains an abundant peak at -60.021 Da. This means that more than 95% of the MS/MS data sets which had a neutral loss from rs between 60.020 and 60.022 Da were assigned to sequences containing aspartic acid. The last four plots in Figure 2 were generated for the appearance of either of two amino acids, and since the noise level in this case was higher, the specificity threshold was set at 98%. For the D/E plot that means that in all nonzero channels >98% of the contributing sequences contain either D or E.

The high mass accuracy of Fourier transform mass spectrometry (FTMS) enables one to determine the elemental composition of the specific losses. Though the suggested compositions are not always unique, there was always only one which made sense chemically.
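The channel-counting procedure described above (the vector V and the per-amino-acid filtering behind Figure 2) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the record layout (`sequence`, `rs_mass`, `peaks`), the function names, and the choice to count each channel at most once per spectrum are assumptions made here for clarity.

```python
from collections import Counter, defaultdict

CHANNEL_WIDTH = 0.002  # Da, channel resolution quoted in the text
MAX_LOSS = 150.0       # Da, largest loss considered
N_CHANNELS = round(MAX_LOSS / CHANNEL_WIDTH)  # 75 000 channels, the length of V


def loss_channel(rs_mass, peak_mass):
    """Channel index for a peak X Da below the rs mass, or None if out of range."""
    loss = rs_mass - peak_mass
    if loss < 0 or loss >= MAX_LOSS:
        return None
    return int(loss / CHANNEL_WIDTH)


def build_histograms(spectra):
    """Count, per channel, the spectra showing that loss (overall and per residue).

    `spectra` is a hypothetical list of dicts with keys 'sequence', 'rs_mass'
    and 'peaks'; this layout is an assumption, not the SwedECD file format.
    """
    total = Counter()
    per_residue = defaultdict(Counter)
    for spec in spectra:
        channels = set()  # assumption: each spectrum counts once per channel
        for peak in spec["peaks"]:
            channel = loss_channel(spec["rs_mass"], peak)
            if channel is not None:
                channels.add(channel)
        for channel in channels:
            total[channel] += 1
            for residue in set(spec["sequence"]):
                per_residue[residue][channel] += 1
    return total, per_residue


def specific_channels(total, per_residue, residue, threshold=0.95):
    """Keep only channels where more than `threshold` of the spectra contain `residue`."""
    counts = per_residue.get(residue, Counter())
    return {ch: n for ch, n in total.items() if counts[ch] / n > threshold}
```

Sparse counters are used instead of a dense 75 000-element list only to keep the sketch short; a dense vector indexed by `loss_channel` would be equivalent.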
The experimental mass values of the losses measured at the apex of the peaks in Figure 2 did not deviate from the elemental compositions by more than 0.003 Da except for the

(35) Kjeldsen, F.; Haselmann, K. F.; Sorensen, E. S.; Zubarev, R. A. Anal. Chem. 2003, 75, 1267–1274.
(36) Kjeldsen, F.; Zubarev, R. J. Am. Chem. Soc. 2003, 125, 6628–6629.

71.0371 Da loss from glutamine-containing peptides, where the deviation was 0.007 Da, most likely due to the low statistics for this loss. For all the peaks that are tagged in Figure 2 with elemental compositions or ion types (for K, S, T, V), there is less than a 1% chance of random (unrelated to amino acid content) appearance in MS/MS of peptides containing the amino acid in question. For rare AAs like Y and M, the peak height for the 1% threshold is naturally lower than for more common amino acids, such as L and A. The losses that are not marked in Figure 2 are either ubiquitous, like the H• and NH2• losses from the reduced species, or located in areas of the mass spectra where the resolution was not sufficient to separate them from other losses of similar masses.

After identifying all statistically significant amino-acid-specific losses, one can proceed to determine their exact specificity and sensitivity, as well as the basic characteristics of the peptides from which these losses occur. For each peak tagged with an elemental composition in Figure 2, the following statistics were gathered. As a first step, all sequences assigned to the MS/MS data sets containing the given neutral mass loss within the 0.01 Da mass window were extracted. The sensitivity was calculated as the percentage of sequences that gave the AA-specific peak among all sequences that contained the AA. The specificity was calculated as the percentage of sequences that gave the loss specific to AA and contained AA among all sequences that gave that loss. As can be seen in Table 1, most of the losses exhibit specificity close to or above 95%, except for three losses from arginine and the side-chain loss from glutamine. The average length of the sequences with AA-specific losses was also calculated and can be compared to the average length of all sequences that contain AA. The average position of AA in the sequences exhibiting the specific loss and in all sequences with AA was also calculated.
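The sensitivity and specificity definitions above translate into a few lines of code. The sketch below is illustrative only: the input layout (pairs of a sequence and its list of observed neutral-loss masses) and the function name are hypothetical, and the ±0.010 Da tolerance follows the mass window quoted with Table 1.

```python
def loss_statistics(spectra, residue, loss_mass, window=0.010):
    """Return (sensitivity, specificity) of a neutral loss for one residue.

    `spectra` is an iterable of (sequence, losses) pairs, where `losses`
    holds the neutral-loss masses (Da) observed below the reduced species.
    Both the layout and the tolerance handling are assumptions of this sketch.
    """
    n_with_residue = n_with_loss = n_with_both = 0
    for sequence, losses in spectra:
        has_residue = residue in sequence
        has_loss = any(abs(x - loss_mass) <= window for x in losses)
        n_with_residue += has_residue
        n_with_loss += has_loss
        n_with_both += has_residue and has_loss
    # sensitivity: sequences giving the loss among all sequences containing the residue
    sensitivity = n_with_both / n_with_residue if n_with_residue else None
    # specificity: sequences containing the residue among all sequences giving the loss
    specificity = n_with_both / n_with_loss if n_with_loss else None
    return sensitivity, specificity
```

For example, `loss_statistics(spectra, "D", 60.0211)` would evaluate the C2H4O2 loss for aspartic acid on whatever set of annotated spectra is supplied.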
Positions were counted from the N-terminus, and if the same AA occurred twice in a sequence, only the position of the first occurrence was recorded. An immediate observation is that the combined NH3 and partial side-chain losses (w-type losses) occur from peptides that are shorter on average, with AA located close to the N-terminus. Whole side-chain losses are observed from H, D, L, R, Y, and Q, and partial side-chain losses together with NH3 losses are observed from E, L, and M, amino acids particularly prone to w-ion formation.27 The double ammonia loss happens preferentially, but not exclusively, from peptides with missed cleavage sites. The frequent occurrence of CO losses from the reduced species of D/E-containing peptides is rather puzzling. Given that this loss occurs from longer-than-average peptides, a possible explanation is the presence of zwitterionic structures, in which deprotonated side chains of aspartic/glutamic acids lose CO upon electronic excitation in collisions with electrons possessing higher than average kinetic energies.37,38

An analytically important question is how often the 16 above (>94%)-specific losses occur in tryptic peptides. Figure 3 answers this question. The overall shape of the experimentally derived histogram is reminiscent of the binomial distribution. On average, there are 1.1 specific losses per ECD mass spectrum (11 491 peptides produced 12 527 specific losses). A total of 65% of all

(37) Savitski, M. M.; Kjeldsen, F.; Nielsen, M. L.; Zubarev, R. A. J. Proteome Res. 2007, 6, 2669–2673.
(38) Kjeldsen, F.; Silivra, O. A.; Zubarev, R. A. Chem. Eur. J. 2006, 12, 7720–7728.


Figure 2. Subsets from Figure 1 for 20 individual amino acids and four combinations of two amino acids (AA) with similar chemical properties. At least 95% (98% in the case of double-AA combinations) of the sequences contributing to nonzero channels contain the amino acid in question.

peptides gave at least one specific loss and 30% gave at least two specific losses. Peptides that did not have small losses either did not have the amino acids which preferentially give rise to these losses (C, D, N/Q, H, E, M) or had a length that was not optimal. For instance, a peptide of length 17 containing histidine is unlikely to give rise to an 82.0531 Da loss because long peptides do not favor this loss, as can be seen from Table 1. Note that the data in Figure 3 are presented for single-acquisition, real-life, online ECD mass


spectra and that integration of several acquisitions, which is typical for off-line experiments, would yield specific losses even more frequently. The occurrence of the specific losses in the majority of ECD mass spectra justifies incorporation of the neutral-loss feature in MS/MS search engines. The usefulness of small losses is emphasized by the fact that the amino acids that promote the strongest losses (C, D, N/Q, H, E, and M) are different from the amino acids that preferentially promote immonium ions in high-

Table 1. Specific Neutral Losses from Reduced Species of Doubly-Charged Tryptic Peptides in ECD (Mass Window Is ±0.010 Da)

AA type   elemental composition  exact mass  sensitivity  specificity  avg peptide length  (a)    (b)   (c)   (d)
C         •C2H4NOS + NH3         107.0279    63%          >99%         10.5                11.6   4.7   5.7   326
C         •C2H4NOS               90.0013     39%          >96%         12.6                11.6   6.6   5.7   326
D         C2H4O2                 60.0211     54%          >98%         10.8                11.2   4.7   5.0   5252
E(e)      •C2H3O2 + NH3          76.0398     21%          >94%         8.6                 10.9   3.4   4.8   6941
E(e)      C3H4O2 + NH3           89.0477     6%           >96%         10.1                10.9   4.0   4.8   6941
H         C4H6N2                 82.0531     38%          >99%         9.0                 11.0   4.1   5.1   2053
I(e)      •C2H5 + NH3            46.0657     6%           >99%         7.7                 10.9   2.1   5.1   4854
L(e)      •C3H7 + NH3            60.0813     13%          >99%         7.9                 10.8   2.5   4.6   7663
L(e)      C4H8                   56.0626     5%           >96%         12.0                10.8   4.5   4.6   7663
L(e)      C4H8 + NH3             73.0891     12%          >95%         10.0                10.8   3.7   4.6   7663
M(e)      C3H6S + NH3            91.0456     18%          >99%         10.6                11.2   5.9   5.3   1552
Q(e)      C3H5NO                 71.0371     4%           >85%         12.9                11.1   4.8   5.3   4622
R         •CH3N2                 43.0296     15%          >88%         10.7                10.3   8.7   9.3   6367
R(e)      •CH4NO2                62.0242     6%           >89%         8.8                 10.3   8.1   9.3   6367
R         NH3 + NH3              34.0531     3%           >86%         9.7                 10.3   4.4   9.3   6367
R         C4H11N3                101.0953    3%           >94%         8.9                 10.3   5.9   9.3   6367
Y         C7H8O                  108.0575    3%           >96%         9.9                 11.1   2.5   5.0   2635
D/E(e)    CO                     27.9949     10%          >95%         14.3                10.8   4.5   4.2   9152
N/Q       CH3NO                  45.0215     30%          >98%         12.2                11.0   4.5   4.8   7054
S/T(e)    H2O + NH3              35.0371     6%           >98%         11.7                11.2   2.5   4.3   7703

(a) Average peptide length in all peptides with AA. (b) AA position in the sequence counting from the N-terminal. (c) AA position in the sequence counting from the N-terminal in all peptides with AA. (d) Number of peptides in SwedECD with a given AA. (e) Novel side chain losses.
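The exact masses in Table 1 follow directly from the listed elemental compositions. As a rough check, the sketch below recomputes a few of them from standard monoisotopic atomic masses; the parser, the function names, and the selection of rows are illustrative assumptions rather than part of the original analysis.

```python
import re

# Standard monoisotopic masses (Da) of the elements appearing in Table 1.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}


def formula_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C2H4NOS' (no brackets)."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * int(count or 1)
    return mass


def loss_mass(composition):
    """Mass of a Table 1 entry such as '•C2H4NOS + NH3' (radical dot ignored)."""
    parts = composition.split("+")
    return sum(formula_mass(part.strip().lstrip("•")) for part in parts)


# A few rows of Table 1: composition -> tabulated exact mass (Da).
TABLE_1_SUBSET = {
    "•C2H4NOS + NH3": 107.0279,
    "C2H4O2": 60.0211,
    "C4H6N2": 82.0531,
    "CH3NO": 45.0215,
    "H2O + NH3": 35.0371,
}

for composition, tabulated in TABLE_1_SUBSET.items():
    deviation = loss_mass(composition) - tabulated
    print(f"{composition:16s} {loss_mass(composition):9.4f} Da  (deviation {deviation:+.4f} Da)")
```

The same helper also illustrates why N and GG cannot be separated by mass alone: `formula_mass("C4H6N2O2")` for an asparagine residue equals twice `formula_mass("C2H3NO")` for a glycine residue.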

Figure 3. The occurrence of the 16 specific (>94%) neutral losses in 11 491 ECD MS/MS spectra.

energy CAD (H, I, L, F, P, W, and Y, data from http://www.matrixscience.com/help/fragmentation_help.html), with the exception of histidine. A number of the specific losses reported here (those marked with footnote e in Table 1) are novel, while others have been reported before but without specificity and sensitivity data.31-34 Some losses reported before are not listed in Table 1 because of their insufficiently high specificity, which is often due to overlap with other losses that are close in mass. With better mass accuracy these losses will most likely also become specific.

In de novo sequencing, one of the difficult problems is distinguishing isomeric amino acid combinations, e.g., [N, GG] and [Q, GA, AG]. Cleavage C-terminal to glycine is relatively weak in CAD and not particularly strong in ECD.19 We have previously suggested a strategy that in many cases solves this problem.36 The strategy is based on the SwedCAD-derived observation that ammonia losses from b-ions are >95% specific for AAs with nitrogen in the side chain. This approach is not,

Figure 4. Upper panel, CAD MS/MS spectrum; lower panel, ECD MS/MS spectrum of the doubly charged peptide NHEEEMK.

however, applicable in all cases. Consider the two MS/MS spectra in Figure 4, for which our de novo program15 suggested with 95% a priori reliability the sequence [N/GG]HEEEMK. The ammonia loss is present in the CAD spectrum (Figure 4 upper panel) but cannot be used to make the N vs GG distinction, as the smallest N-terminal fragment can be assigned as b2 in the case of NH and


b3 in the case of GGH, and in both cases the ammonia loss may come from the histidine side chain. However, in the ECD spectrum (Figure 4 lower panel) a CH3NO loss from rs is present, which is indicative of either an asparagine or a glutamine in the sequence. Since the only possibility is the presence of asparagine in the first position, GG is ruled out and hence the peptide sequence is deduced as NHEEEMK. Detection in the same sample of several other peptides from the same protein supported that assignment.

Hydrogen-Deficient vs Hydrogen-Abundant Radical Species. Both rs and z• ions are radical species, but of different types: while reduced species are hydrogen-abundant radicals, z• ions are of the hydrogen-deficient type.39 Because of this difference, one might expect a difference in fragmentation behavior. Indeed, no neutral losses from rs identical to the side-chain losses from z• ions during the formation of w ions27,34,35 are present in Figure 2 in significant quantity, which supports the distinction between hydrogen-abundant and hydrogen-deficient species. In Table 1, radical losses leading to wn ions do occur from rs but only in concert with the loss of the ammonia group for peptides containing I, L, and E. This is understandable, since the NH3 loss converts a hydrogen-abundant rs into a hydrogen-deficient radical. This radical has the same elemental composition (and, if the loss occurs from the N-terminus, the same structure) as the z• ions with the same sequence. In principle, the difference between hydrogen-deficient and hydrogen-abundant species is relative:39 these species are not prohibited from interconversion, e.g.,

$$[\mathrm{R{-}(CO){-}NH{-}CH({-}CH_3){-}R}]^{+\bullet} \;\leftrightarrow\; \mathrm{R{-}({}^{\bullet}COH){-}NH{-}C({=}CH_2){-}(RH^{+})}$$

(39) Zubarev, R. A. Mass Spectrom. Rev. 2003, 22, 57–77.


The inhibition of w ion formation from hydrogen-abundant reduced species indicates that small neutral losses occur in rs prior to their conversion to hydrogen-deficient species and likely in competition with such a conversion. Further research is needed to determine whether such behavior exhibits nonergodic features.

CONCLUSIONS

With the help of the SwedECD database, a systematic study of small neutral losses from the reduced species in ECD was performed. Several new types of losses were discovered, and 16 specific (>94%) losses were classified according to their specificity and sensitivity, as well as their occurrence for peptides of different lengths. On average, there is one specific loss per typical ECD mass spectrum of a doubly charged tryptic peptide. The losses can be used for improved database search and de novo sequencing, in particular for distinguishing between N and GG isomeric sequence variants. The fragmentation findings in this work should also be directly applicable to peptides fragmented by ETD.

ACKNOWLEDGMENT

M.F. and M.M.S. contributed equally to this work. This work was supported by the Knut and Alice Wallenberg Foundation, the European Union (grant to consortium EPITOPE), and the Swedish Research Council (Grants 621-2004-4897 and 621-2003-4877 to R.A.Z.). P.E.A. was sponsored by the Swedish Research Council (VR), Grant No. 11565, 2004-3417, the Swedish Foundation for International Cooperation in Research and Higher Education (STINT) Institutional grant, the K&A Wallenberg Foundation, and the Karolinska Institutet Centre for Medical Innovations, Research Program in Medical Bioinformatics.

Received for review May 7, 2008. Accepted July 31, 2008.

AC800944U